# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm). Thank goodness we can search for these things.

## Setup: Import what you'll need to search and scrape with Selenium

In [54]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

In [55]:
driver = webdriver.Chrome()

## Starting from `https://arlweb.msha.gov/drs/drshome.htm`, search for every operator with 'dirt' in their name, including abandoned mines.

> - *Tip: If you can't make an element work using name, class or ID, try to use the XPath*

In [56]:
driver.get('https://arlweb.msha.gov/drs/drshome.htm')

In [57]:
# find_element_by_name was removed in Selenium 4.3; 'name' is the value of By.NAME
text_input = driver.find_element('name', 'OperSearch')

In [58]:
driver.execute_script("arguments[0].scrollIntoView(true)", text_input)

In [59]:
text_input.send_keys('dirt')

In [60]:
search_button = driver.find_element('xpath', '//*[@id="content"]/form[1]/table/tbody/tr[3]/td[3]/table/tbody/tr/td/input')
search_button.click()

In [61]:
button = driver.find_element('xpath', '//*[@id="content"]/form[1]/table/tbody/tr[7]/td[3]/input[1]')
driver.execute_script("arguments[0].scrollIntoView(true)", button)
button.click()

In [81]:
contractors = driver.find_elements('tag name', 'tr')
for contractor in contractors:
    print('_______')
    print(contractor.text)

_______
Operator Name or Mine Name
Search  
_______
Abandoned*
Indicates Mine is Abandoned and Sealed
*CM (Coal or Metal Mine/Nonmetal Mine)
C
M ...... Coal
...... Metal/Nonmetal
_______
Abandoned*
_______
Indicates Mine is Abandoned and Sealed
_______
*CM (Coal or Metal Mine/Nonmetal Mine)
_______
C
M ...... Coal
...... Metal/Nonmetal
_______
ID State Operator Mine Name Type CM* Status Commodity More Info
_______
3503598
OR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC 
_______
0502030
CO  Allied Dirt Moving Company   Allied Dirt Moving Co Pit & Plant Surface M  Abandoned  Construction Sand and Gravel 
_______
4801789
ND  AM Dirtworks & Aggregate Sales   AM Dirtworks & Aggregate Sales Surface M  Abandoned  Construction Sand and Gravel 
_______
4201449
UT  Atlas-Dirty Devil Mining   Unit Train Loading Facility Facility C  Abandoned  Coal (Bituminous) 
_______
4201450
UT  Atlas-Dirty Devil Mining   Blackie Surface Mine & Prep Plant Surface C  Ab

PA  M R Dirt   Forbes Pit Surface M  Temporarily Idled  Construction Sand and Gravel 
_______
3800709
SC  M.C. Dirt LLC   Middleton Site Surface M  Abandoned  Sand, Common 
_______
3609931
PA  M.R. Dirt Inc.   Camptown Quarry Surface M  Intermittent  Dimension Stone NEC 
_______
1601257
LA  Maurice Dirt & Sand   Maurice Dirt And Sand Surface M  Abandoned  Construction Sand and Gravel 
_______
0801275
FL  Mc Dirt Industries Inc   BELLVIEW Surface M  Abandoned  Construction Sand and Gravel 
_______
1601379
LA  Mike Duhon Dirt Pit   REDS PIT Surface M  Abandoned  Construction Sand and Gravel 
_______
1601380
LA  Mike Duhon Dirt Pit   WAINWRIGHT PIT Surface M  Abandoned  Construction Sand and Gravel 
_______
1601381
LA  Mike Duhon Dirt Pit   COX PIT Surface M  Abandoned  Construction Sand and Gravel 
_______
1601134
LA  Moss Dirt Company   Cook Pit Surface M  Abandoned  Construction Sand and Gravel 
_______
1601165
LA  Moss Dirt Company   Moss Dirt Pit Surface M  Abandoned  Construction Sa

## Scrape the results page, saving it as `dirt-operators.csv`

> - *Tip: Think about what each row in your dataset will be, and start by looping through that*
> - *Tip: Printing is cool and good! Print everything! Move it into a dictionary later.*
> - *Tip: If you don't want a row, think about what's in the row that makes it different. You can use an `if` statement or list slicing to skip the ones you aren't interested in.*
> - *Tip: Make sure your dictionary and your loop variable have DIFFERENT NAMES*
> - *Tip: After you've made your dictionary (and printed it, of course), you'll want to add it to your list of rows*
> - *Tip: Be sure to import pandas to convert it to a dataframe*
> - *Tip: Make sure you don't include the index when saving your dataframe*

### Hopefully you know that each `tr` is supposed to be a row of your data. What is the index of the first row element that is actually a result?

> - *Tip: `.text` will help you here.*
> - *Tip: You aren't interested in annotations or anything, just mines and where they are from*
> - *Tip: Using `print("-----")` will help you keep track of different rows*
> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third. You can use this to skip ahead to the 'good' data if you want*

In [79]:
contractors = driver.find_elements('tag name', 'tr')
for contractor in contractors:
    print('_______')
    print(contractor.text)

_______
Operator Name or Mine Name
Search  
_______
Abandoned*
Indicates Mine is Abandoned and Sealed
*CM (Coal or Metal Mine/Nonmetal Mine)
C
M ...... Coal
...... Metal/Nonmetal
_______
Abandoned*
_______
Indicates Mine is Abandoned and Sealed
_______
*CM (Coal or Metal Mine/Nonmetal Mine)
_______
C
M ...... Coal
...... Metal/Nonmetal
_______
ID State Operator Mine Name Type CM* Status Commodity More Info
_______
3503598
OR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC 
_______
0502030
CO  Allied Dirt Moving Company   Allied Dirt Moving Co Pit & Plant Surface M  Abandoned  Construction Sand and Gravel 
_______
4801789
ND  AM Dirtworks & Aggregate Sales   AM Dirtworks & Aggregate Sales Surface M  Abandoned  Construction Sand and Gravel 
_______
4201449
UT  Atlas-Dirty Devil Mining   Unit Train Loading Facility Facility C  Abandoned  Coal (Bituminous) 
_______
4201450
UT  Atlas-Dirty Devil Mining   Blackie Surface Mine & Prep Plant Surface C  Ab

3609624
PA  M R Dirt   Forbes Pit Surface M  Temporarily Idled  Construction Sand and Gravel 
_______
3800709
SC  M.C. Dirt LLC   Middleton Site Surface M  Abandoned  Sand, Common 
_______
3609931
PA  M.R. Dirt Inc.   Camptown Quarry Surface M  Intermittent  Dimension Stone NEC 
_______
1601257
LA  Maurice Dirt & Sand   Maurice Dirt And Sand Surface M  Abandoned  Construction Sand and Gravel 
_______
0801275
FL  Mc Dirt Industries Inc   BELLVIEW Surface M  Abandoned  Construction Sand and Gravel 
_______
1601379
LA  Mike Duhon Dirt Pit   REDS PIT Surface M  Abandoned  Construction Sand and Gravel 
_______
1601380
LA  Mike Duhon Dirt Pit   WAINWRIGHT PIT Surface M  Abandoned  Construction Sand and Gravel 
_______
1601381
LA  Mike Duhon Dirt Pit   COX PIT Surface M  Abandoned  Construction Sand and Gravel 
_______
1601134
LA  Moss Dirt Company   Cook Pit Surface M  Abandoned  Construction Sand and Gravel 
_______
1601165
LA  Moss Dirt Company   Moss Dirt Pit Surface M  Abandoned  Constru
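
Counting the separators in the output above, the first row that starts with a mine ID is at index 7. A minimal sketch of finding that index in code rather than by eye, assuming `row_texts` stands in for `[row.text for row in contractors]` and that every ID is seven digits, as in the results shown. The strings are copied from the printed output, and `first_result_index` is a hypothetical helper, not part of the assignment:

```python
import re

# Stand-in for [row.text for row in contractors]; copied from the output above.
row_texts = [
    "Operator Name or Mine Name\nSearch  ",
    "Abandoned*\nIndicates Mine is Abandoned and Sealed\n"
    "*CM (Coal or Metal Mine/Nonmetal Mine)\nC\nM ...... Coal\n...... Metal/Nonmetal",
    "Abandoned*",
    "Indicates Mine is Abandoned and Sealed",
    "*CM (Coal or Metal Mine/Nonmetal Mine)",
    "C\nM ...... Coal\n...... Metal/Nonmetal",
    "ID State Operator Mine Name Type CM* Status Commodity More Info",
    "3503598\nOR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC ",
    "0502030\nCO  Allied Dirt Moving Company   Allied Dirt Moving Co Pit & Plant Surface M  Abandoned  Construction Sand and Gravel ",
]


def first_result_index(texts):
    """Return the index of the first row whose text starts with a 7-digit ID."""
    for index, text in enumerate(texts):
        # Assumption: a result row begins with exactly seven digits (the mine ID).
        if re.match(r"\d{7}\b", text):
            return index
    return None  # no result rows found


start = first_result_index(row_texts)
print(start)
```

With these sample rows the helper returns 7, so `row_texts[start:]` begins at the first real result.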

### Loop through each operator result, printing its name

> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third.*
> - *Tip: You can use list slicing or an `if` statement to skip the non-data row(s). List slicing is probably easier, even if you aren't comfortable with it.*
> - *Tip: Or, honestly, you can use `try` and `except` if you know how they work.*
> - *Tip: Once you have the "right" rows of data, you're going to be looking for a certain tag inside*
> - *Tip: Sometimes you can't say "give me this class," and instead you have to say "give me all of the `div` elements, and then give me the third one."*
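
A small sketch of the slicing and `try` / `except` ideas from the tips, written so it runs without a browser. It assumes each row has already been reduced to a list of its cell texts (for a Selenium row, that would be the `.text` of each `td`). The sample rows come from the search results, with cell boundaries inferred from the column header, and `rows_of_cells` is a hypothetical name:

```python
# Stand-in for the cell texts of each `tr`; the first two rows are non-data rows.
rows_of_cells = [
    ["Abandoned*"],
    ["ID", "State", "Operator", "Mine Name", "Type", "CM*", "Status", "Commodity", "More Info"],
    ["3503598", "OR", "Newberg Rock & Dirt", "Newberg Rock & Dirt",
     "Surface", "M", "Active", "Crushed, Broken Stone NEC"],
    ["0502030", "CO", "Allied Dirt Moving Company", "Allied Dirt Moving Co Pit & Plant",
     "Surface", "M", "Abandoned", "Construction Sand and Gravel"],
    [],  # a trailing row with no cells
]

operator_names = []

# Slice off the two leading non-data rows, just like animals[2:].
for cells in rows_of_cells[2:]:
    try:
        name = cells[2]  # the operator name is the third cell
    except IndexError:
        continue  # skip rows that do not have enough cells
    print(name)
    operator_names.append(name)
```

The slice handles the rows you know are bad at the start, and the `try` / `except` catches any short row that is still left over.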

### Loop through each operator result, printing its ID

There should be ONE ID per row, and NO empty rows between them.

In [73]:
contractors = driver.find_elements('tag name', 'tr')
# Skip the first seven rows (search form, legend, header) and the last row
for contractor in contractors[7:-1]:
    items = contractor.find_elements('tag name', 'td')
    print('_______')
    print(items[0].text)

_______
3503598
_______
0502030
_______
4801789
_______
4201449
_______
4201450
_______
1002257
_______
1601167
_______
4103265
_______
1401575
_______
1700776
_______
1601251
_______
0301963
_______
1601082
_______
3401751
_______
1600916
_______
3401211
_______
0301267
_______
1600956
_______
2200033
_______
0504953
_______
3401929
_______
1302445
_______
1601106
_______
3400915
_______
1600983
_______
4503200
_______
3401266
_______
3401468
_______
5001797
_______
4608254
_______
1510279
_______
2103723
_______
0100776
_______
4104016
_______
2103914
_______
4104757
_______
0301729
_______
0404851
_______
2200734
_______
5002028
_______
1513393
_______
3800602
_______
3101630
_______
3200860
_______
3401762
_______
2103517
_______
2402626
_______
2103181
_______
1601124
_______
1601150
_______
4703427
_______
0801306
_______
2501216
_______
3200965
_______
2901371
_______
2901544
_______
2901709
_______
4102355
_______
4102420
_______
4102869
_______
4102951
_______
4102958
_______


## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.

> - *Tip: Start with an empty dictionary, then add the keys one at a time like we did during class*
> - *Tip: You might want to save all of the cells in a variable, then use indexes to get the second, third, fourth, etc.*
> - *Tip: I know you already skipped a bunch of rows, but one of them still might be bad! Which one is it? How can you skip it? You might need to slice out some of the end of your list, too. Use `print` to help you debug, or just look at the page closely.*
> - *Tip: Or, if you did the other homework already, `try` / `except` is also an option*

In [74]:
rows = []

contractors = driver.find_elements('tag name', 'tr')
for contractor in contractors[7:-1]:
    # All of the cells in this row, in column order
    items1 = contractor.find_elements('tag name', 'td')

    # A fresh dictionary for each row
    row = {}

    row['ID'] = items1[0].text
    row['State'] = items1[1].text
    row['Operator'] = items1[2].text
    row['Mine Name'] = items1[3].text
    row['Type'] = items1[4].text
    row['CM'] = items1[5].text
    row['Status'] = items1[6].text
    row['Commodity'] = items1[7].text

    rows.append(row)
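
An alternative sketch of the same row-building step. It pairs column names with cell texts using `zip`, and it skips any row that is too short instead of relying on an exact slice. It assumes the cell texts have already been pulled out of each `tr`. The sample lists are taken from the results above, with cell boundaries inferred from the column header, and `COLUMNS`, `cell_rows`, `build_rows`, and `sample_rows` are hypothetical names:

```python
COLUMNS = ["ID", "State", "Operator", "Mine Name", "Type", "CM", "Status", "Commodity"]


def build_rows(cell_rows):
    """Turn lists of cell texts into dictionaries, skipping incomplete rows."""
    result = []
    for cells in cell_rows:
        if len(cells) < len(COLUMNS):
            continue  # annotations, fragments, or empty trailing rows
        # zip stops at the shorter input, so extra cells (e.g. "More Info") are ignored
        result.append(dict(zip(COLUMNS, (cell.strip() for cell in cells))))
    return result


cell_rows = [
    ["Abandoned*"],
    ["4201449", "UT", "Atlas-Dirty Devil Mining", "Unit Train Loading Facility",
     "Facility", "C", "Abandoned", "Coal (Bituminous) ", ""],
    [],
]

sample_rows = build_rows(cell_rows)
print(sample_rows)
```

A header row with a full set of cells would still pass the length check, so this complements the slice rather than replacing it.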

### Save that to a CSV named `dirt-operators.csv`

In [82]:
import pandas as pd

df = pd.DataFrame(rows)

In [84]:
# index=False leaves out the unnamed index column
df.to_csv("dirt-operators.csv", index=False)

### Open the CSV file and examine the first few rows.

Make sure you didn't save that extra weird unnamed index column.
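
One way to check, sketched with two sample rows and in-memory buffers instead of a file on disk (both are assumptions for illustration). Reading the IDs back with `dtype=str` also keeps leading zeros such as the one in `0502030`, which pandas would otherwise parse as an integer:

```python
import io

import pandas as pd

sample = pd.DataFrame([
    {"ID": "3503598", "State": "OR", "Operator": "Newberg Rock & Dirt"},
    {"ID": "0502030", "State": "CO", "Operator": "Allied Dirt Moving Company"},
])

# Without index=False, the row index is written as an extra, unnamed first column.
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())

# With index=False, only the real columns are written.
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
check = pd.read_csv(without_index, dtype={"ID": str})
print(check.head())
```

For the real file, the same check would be `pd.read_csv("dirt-operators.csv", dtype={"ID": str}).head()`.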