# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm). Thank goodness we can search for these things.

## Setup: Import what you'll need to search and scrape with Selenium

In [15]:
import requests
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

## Starting from `https://arlweb.msha.gov/drs/drshome.htm`, search for every operator with 'dirt' in their name, including abandoned mines.

> - *Tip: If you can't make an element work using name, class or ID, try to use the XPath*
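
To see what an XPath actually selects, here is a small sketch that runs without a browser. The form markup is a made-up, simplified stand-in (the field name `abandoned` is an assumption, not the real MSHA page), and Python's built-in `xml.etree.ElementTree` only understands a subset of XPath, so treat it as an illustration of the idea rather than of Selenium itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a search form (NOT the real MSHA markup).
FORM = """
<form>
  <table>
    <tr><td>Operator name</td><td><input type="text" name="OperSearch"/></td></tr>
    <tr><td>Include abandoned</td><td><input type="checkbox" name="abandoned"/></td></tr>
    <tr><td><input type="submit" value="Search"/></td></tr>
  </table>
</form>
"""

root = ET.fromstring(FORM)

# Positional path: "second row, second cell, the input inside it".
positional = root.find("./table/tr[2]/td[2]/input")

# Attribute path: "any input that is a checkbox, wherever it sits".
by_attribute = root.find(".//input[@type='checkbox']")

# Both expressions land on the same element in this sample.
print(positional is by_attribute)
print(by_attribute.get("name"))
```

The positional version breaks as soon as a row is added to the table, while the attribute version keeps working, which is why a copied XPath is best kept as a fallback.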

In [16]:
driver = webdriver.Chrome()
driver.get('https://arlweb.msha.gov/drs/drshome.htm')

In [17]:
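# Type the search term into the operator-name box.
# Newer Selenium releases (4.3+) drop the find_element_by_* helpers in favor of
# driver.find_element(By.NAME, 'OperSearch').
text_input = driver.find_element_by_name('OperSearch')
text_input.send_keys('dirt')

# Tick the checkbox next to the search box (located by XPath) so abandoned mines are included
tick_box = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table/tbody/tr[3]/td[3]/table/tbody/tr/td/input')
tick_box.click()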

In [18]:
button = driver.find_element_by_xpath('//*[@id="content"]/form[1]/table/tbody/tr[7]/td[3]/input[1]')
button.click()

## Scrape the results page, saving it as `dirt-operators.csv`

> - *Tip: Think about what each row in your dataset will be, and start by looping through that*
> - *Tip: Printing is cool and good! Print everything! Move it into a dictionary later.*
> - *Tip: If you don't want a row, think about what's in the row that makes it different. You can use an `if` statement or list slicing to skip the ones you aren't interested in.*
> - *Tip: Make sure your dictionary and your loop variable have DIFFERENT NAMES*
> - *Tip: After you've made your dictionary (and printed it, of course), you'll want to add it to your list of rows*
> - *Tip: Be sure to import pandas to convert it to a dataframe*
> - *Tip: Make sure you don't include the index when saving your dataframe*

### Hopefully you know that each `tr` is supposed to be a row of your data. What is the index of the first row element that is actually a result?

> - *Tip: `.text` will help you here.*
> - *Tip: You aren't interested in annotations or anything, just mines and where they are from*
> - *Tip: Using `print("-----")` will help you keep track of different rows*
> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third. You can use this to skip ahead to the 'good' data if you want*
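
One way to find that first index without counting by hand is to look for the first row that starts with a mine ID. This is only a sketch: `row_texts` stands in for the `.text` of each `tr`, the two non-data rows are invented, and the "seven digits" rule is an assumption based on the IDs printed in this notebook.

```python
# Hypothetical stand-in for [tr.text for tr in driver.find_elements_by_tag_name('tr')].
# The first two strings are invented; the data rows are copied from this notebook's output.
row_texts = [
    "Search results",
    "ID State Operator name",
    "3503598\nOR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC ",
    "0502030\nCO  Allied Dirt Moving Company   Allied Dirt Moving Co Pit & Plant Surface M  Abandoned  Construction Sand and Gravel ",
]


def looks_like_result(text):
    """Assumption: a result row starts with a 7-digit mine ID."""
    tokens = text.split()
    return bool(tokens) and len(tokens[0]) == 7 and tokens[0].isdigit()


# Index of the first row that looks like data.
first_result = next(i for i, text in enumerate(row_texts) if looks_like_result(text))

# Everything from that index onward is what the loop should visit.
data_rows = row_texts[first_result:]

print(first_result)  # 2 for this made-up sample
for text in data_rows:
    print(text)
    print("-----")
```

On the real results page the same check is what justifies a slice like `operators[7:]`.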

In [23]:
operators = driver.find_elements_by_tag_name('tr')
for operator in operators[7:]:
    print(operator.text)

3503598
OR  Newberg Rock & Dirt   Newberg Rock & Dirt Surface M  Active  Crushed, Broken Stone NEC 
0502030
CO  Allied Dirt Moving Company   Allied Dirt Moving Co Pit & Plant Surface M  Abandoned  Construction Sand and Gravel 
4801789
ND  AM Dirtworks & Aggregate Sales   AM Dirtworks & Aggregate Sales Surface M  Abandoned  Construction Sand and Gravel 
4201449
UT  Atlas-Dirty Devil Mining   Unit Train Loading Facility Facility C  Abandoned  Coal (Bituminous) 
4201450
UT  Atlas-Dirty Devil Mining   Blackie Surface Mine & Prep Plant Surface C  Abandoned  Coal (Bituminous) 
1002257
ID  Babe's Dirt Work   Hitt Pit, Inc. Surface M  Abandoned  Construction Sand and Gravel 
1601167
LA  Bar-Lin Dirt Company   Bar-Lin Dirt Pit Surface M  Abandoned  Construction Sand and Gravel 
4103265
TX  Barber'S Dirt Pit   Barber'S Dirt Pit Surface M  Abandoned  Construction Sand and Gravel 
1401575
KS  Bender Sand & Dirt   BENDER SAND & DIRT Surface M  Intermittent  Construction Sand and Gravel 
1700776
ME 

1601194
LA  Nelson & Sons Dirt Haulers Inc   Nelson & Sons Dirt Haulers Incorporated Surface M  Abandoned  Construction Sand and Gravel 
4104054
TX  Nelson'S Dirt Pit   NELSON'S DIRT PIT Surface M  Abandoned  Construction Sand and Gravel 
4801674
WY  Nicholson Dirt Contracting   Eagle One Surface M  Abandoned  Construction Sand and Gravel 
2402474
MT  Nitty Gritty Dirt LLC   Rolling Glen Ranch Sand and Gravel Surface M  Abandoned  Construction Sand and Gravel 
1600920
LA  Northest Louisiana Dirt Contractors   Calhoun And Mcguire Pit Surface M  Abandoned  Construction Sand and Gravel 
4102955
TX  Orvil Carter Dirt Contractor Inc   Ed Meek Pit Surface M  Abandoned  Crushed, Broken Limestone NEC 
4103107
TX  Orvil Carter Dirt Contractor Inc   Seitz Pit Surface M  Abandoned  Crushed, Broken Limestone NEC 
1512530
KY  P B Dirt Movers Inc   No 1 Surface Surface C  Abandoned  Coal (Bituminous) 
1515619
KY  P B Dirt Movers Inc   No 5 Surface Surface C  Abandoned  Coal (Bituminous) 
1518318
KY 

In [34]:
operators = driver.find_elements_by_tag_name('tr')
for operator in operators[7:]:
    columns = operator.find_elements_by_tag_name('td')
    print(columns[2].text)

Newberg Rock & Dirt  
Allied Dirt Moving Company  
AM Dirtworks & Aggregate Sales  
Atlas-Dirty Devil Mining  
Atlas-Dirty Devil Mining  
Babe's Dirt Work  
Bar-Lin Dirt Company  
Barber'S Dirt Pit  
Bender Sand & Dirt  
BERT'S DIRT  
Big D Dirt Service Inc  
Big Red Dirt Farm LLC  
Big River Dirt Pit  
Bob Harris Dirt Contracting  
Bohannon Sand & Dirt  
Bratcher'S Sand & Dirt  
Brewer Dirt Works  
Buck'S Dirt Pit  
C & G Dirt Hauling  
C N C Dirt Movers, Inc.  
Cambridge Dirt Sand and Gravel LLC  
Central Iowa Dirt & Demo LLC  
Crowes Trucking & Dirt Pit Services  
D & H Dirt  
Diez Dirt & Sand Hauling Inc  
Dirt Cheap  
Dirt Company  
Dirt Company  
Dirt Company  
Dirt Con  
Dirt Diggers Inc  
Dirt Doctor Inc  
Dirt Inc  
Dirt Pit  
Dirt Work Specialists LLC  
Dirt Works  
Dirtco Inc  
Dirtman Trucking  
DIRTWORKS, INC.  
Dirtworks, Inc.  
Dirty Coal  
Dorchester Dirt Company Inc  
Douglas Dirt Sand & Gravel Company  
Ell Dirt Works LLC.  
Floyd Smith Dirt Pit  
Gary Kelm Dirt Servi

IndexError: list index out of range
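
An `IndexError` like this usually means some row has fewer cells than the index being asked for. A quick way to find the culprit is to print how many cells each row has. The sketch below uses plain lists in place of Selenium elements, and the short last row is hypothetical.

```python
# Each inner list stands in for the .text of the td cells in one tr.
# The first two rows come from this notebook's output; the last one is a
# hypothetical non-data row with too few cells.
rows_of_cells = [
    ["3503598", "OR ", "Newberg Rock & Dirt  "],
    ["0502030", "CO ", "Allied Dirt Moving Company  "],
    ["(not a result row)"],
]

NEEDED = 3  # columns[2] needs at least three cells

# Collect (row position, cell count) for every row that is too short.
short_rows = [
    (position, len(cells))
    for position, cells in enumerate(rows_of_cells)
    if len(cells) < NEEDED
]

for position, count in short_rows:
    print("Row", position, "only has", count, "cell(s):", rows_of_cells[position])
```

Once you know where the short rows are, you can decide whether to slice them away or skip them inside the loop.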

### Loop through each operator result, printing its name

> - *Tip: If you have a list called `animals`, `animals[2:]` will skip the first two and start with the third.*
> - *Tip: You can use list slicing or an `if` statement to skip the non-data row(s). List slicing is probably easier, even if you aren't comfortable with it.*
> - *Tip: Or, honestly, you can use `try` and `except` if you know how they work.*
> - *Tip: Once you have the "right" rows of data, you're going to be looking for a certain tag inside*
> - *Tip: Sometimes you can't say "give me this class," and instead you have to say "give me all of the `div` elements, and then give me the third one."*
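
Here is a rough side-by-side of the two skipping strategies from the tips, slicing and `try`/`except`. The lists stand in for the cells of each `tr`; the `header` and `footer` rows are made up, and the real page may need different slice positions.

```python
# Stand-ins for the td texts of each tr. The two middle rows are from this
# notebook's output; the first and last rows are hypothetical non-data rows.
rows_of_cells = [
    ["header"],
    ["1601194", "LA ", "Nelson & Sons Dirt Haulers Inc  "],
    ["4104054", "TX ", "Nelson'S Dirt Pit  "],
    ["footer"],
]

# Option 1: slice off the rows you know are bad at both ends.
sliced = [cells[2].strip() for cells in rows_of_cells[1:-1]]

# Option 2: try every row and skip the ones that don't have a third cell.
tried = []
for cells in rows_of_cells:
    try:
        tried.append(cells[2].strip())
    except IndexError:
        continue  # not a result row, move on

print(sliced)
print(tried)
```

Slicing is shorter when the bad rows are always in the same place; `try`/`except` is more forgiving if the page layout shifts.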

In [36]:
operators = driver.find_elements_by_tag_name('tr')
for operator in operators[7:]:
    column_ID = operator.find_elements_by_tag_name('td')
    print(column_ID[0].text)

3503598
0502030
4801789
4201449
4201450
1002257
1601167
4103265
1401575
1700776
1601251
0301963
1601082
3401751
1600916
3401211
0301267
1600956
2200033
0504953
3401929
1302445
1601106
3400915
1600983
4503200
3401266
3401468
5001797
4608254
1510279
2103723
0100776
4104016
2103914
4104757
0301729
0404851
2200734
5002028
1513393
3800602
3101630
3200860
3401762
2103517
2402626
2103181
1601124
1601150
4703427
0801306
2501216
3200965
2901371
2901544
2901709
4102355
4102420
4102869
4102951
4102958
4104876
3003502
4103258
3901432
2103556
1601250
1600908
1600953
4104185
2901536
3609624
3800709
3609931
1601257
0801275
1601379
1601380
1601381
1601134
1601165
3901042
1601194
4104054
4801674
2402474
1600920
4102955
4103107
1512530
1515619
1518318
4405366
4407196
1519685
1519799
4407379
4407003
2602570
2402503
4407296
1519273
4407270
4102682
0801259
0203332
0302015
2901986
1601127
4105017
1600986
4103324
4202013
0801417
0801371
2402115
4300748
4300768
4300776
0103209
1601159
2302283
4102586
4104475


### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.
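
If a few blank lines sneak into the output, you can filter them out before printing. A minimal sketch, assuming `id_texts` holds the `.text` of the first cell in each row (the two blank entries are hypothetical):

```python
# The two IDs are from this notebook's output; the blank entries are
# hypothetical leftovers from non-data rows.
id_texts = ["4102586", "4104475", "", " "]

# Keep only entries that still contain something after trimming whitespace.
ids = [text.strip() for text in id_texts if text.strip()]

for mine_id in ids:
    print(mine_id)
```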

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.

> - *Tip: Start with an empty dictionary, then add the keys one at a time like we did during class*
> - *Tip: You might want to save all of the cells in a variable, then use indexes to get the second, third, fourth, etc.*
> - *Tip: I know you skipped a bunch of rows already, but one of them still might be bad! Which one is it? How can you skip it? You might need to slice out some of the end of your list, too. Use `print` to help you debug, or just look at the page closely.*
> - *Tip: Or, if you did the other homework already, `try` / `except` is also an option*
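
One possible shape for the row-to-dictionary step, sketched with plain strings instead of Selenium elements. The key names follow the ones used in this notebook; the helper name `row_to_dict`, the whitespace stripping, and the hypothetical short row are additions for illustration.

```python
# Column order as it appears on the results page, using this notebook's key names.
FIELDS = [
    "column_ID", "column_state", "column_name", "column_mine_name",
    "column_type", "column_coal_metal", "column_status", "column_commodity",
]


def row_to_dict(cell_texts):
    """Return a dictionary for a full row, or None if the row is not a result."""
    if len(cell_texts) != len(FIELDS):
        return None
    # Pair each field name with its cell and trim the trailing spaces.
    return {field: text.strip() for field, text in zip(FIELDS, cell_texts)}


# First list: cell texts copied from this notebook's output.
# Second list: a hypothetical short row that should be skipped.
sample_rows = [
    ["3503598", "OR ", "Newberg Rock & Dirt  ", "Newberg Rock & Dirt",
     "Surface", "M ", "Active ", "Crushed, Broken Stone NEC "],
    ["(not a result row)"],
]

rows = []
for cell_texts in sample_rows:
    row = row_to_dict(cell_texts)
    if row is not None:
        print("My dictionary looks like", row)
        rows.append(row)
```

With Selenium, `cell_texts` would be something like `[td.text for td in operator.find_elements_by_tag_name('td')]`.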

In [62]:
operators = driver.find_elements_by_tag_name('tr')
rows = []

# Skip the seven non-data rows at the top of the page
for operator in operators[7:]:
    # Fetch the row's cells once, then pick each one out by position.
    # A row with fewer than eight cells raises IndexError here.
    cells = operator.find_elements_by_tag_name('td')

    row = {}
    row['column_ID'] = cells[0].text
    row['column_state'] = cells[1].text
    row['column_name'] = cells[2].text
    row['column_mine_name'] = cells[3].text
    row['column_type'] = cells[4].text
    row['column_coal_metal'] = cells[5].text
    row['column_status'] = cells[6].text
    row['column_commodity'] = cells[7].text

    print('My dictionary looks like', row)
    rows.append(row)

My dictionary looks like {'column_ID': '3503598', 'column_state': 'OR ', 'column_name': 'Newberg Rock & Dirt  ', 'column_mine_name': 'Newberg Rock & Dirt', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Active ', 'column_commodity': 'Crushed, Broken Stone NEC '}
My dictionary looks like {'column_ID': '0502030', 'column_state': 'CO ', 'column_name': 'Allied Dirt Moving Company  ', 'column_mine_name': 'Allied Dirt Moving Co Pit & Plant', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Construction Sand and Gravel '}
My dictionary looks like {'column_ID': '4801789', 'column_state': 'ND ', 'column_name': 'AM Dirtworks & Aggregate Sales  ', 'column_mine_name': 'AM Dirtworks & Aggregate Sales', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Construction Sand and Gravel '}
My dictionary looks like {'column_ID': '4201449', 'column_state': 'UT ', 'column_nam

My dictionary looks like {'column_ID': '4608254', 'column_state': 'WV ', 'column_name': 'Dirt Con  ', 'column_mine_name': 'Hog Lick Quarry', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Crushed, Broken Limestone NEC '}
My dictionary looks like {'column_ID': '1510279', 'column_state': 'KY ', 'column_name': 'Dirt Diggers Inc  ', 'column_mine_name': 'Debco Mine', 'column_type': 'Surface', 'column_coal_metal': 'C ', 'column_status': 'Abandoned ', 'column_commodity': 'Coal (Bituminous) '}
My dictionary looks like {'column_ID': '2103723', 'column_state': 'MN ', 'column_name': 'Dirt Doctor Inc  ', 'column_mine_name': 'Rock Lake Plant', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Intermittent ', 'column_commodity': 'Construction Sand and Gravel '}
My dictionary looks like {'column_ID': '0100776', 'column_state': 'AL ', 'column_name': 'Dirt Inc  ', 'column_mine_name': 'Harrison Pit', 'column_type': 'Surface',

My dictionary looks like {'column_ID': '4102420', 'column_state': 'TX ', 'column_name': 'Jake Diel Dirt & Paving Inc  ', 'column_mine_name': 'Black Pit', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Crushed, Broken Limestone NEC '}
My dictionary looks like {'column_ID': '4102869', 'column_state': 'TX ', 'column_name': 'Jake Diel Dirt & Paving Inc  ', 'column_mine_name': 'Bailey Pit', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Crushed, Broken Stone NEC '}
My dictionary looks like {'column_ID': '4102951', 'column_state': 'TX ', 'column_name': 'Jake Diel Dirt & Paving Inc  ', 'column_mine_name': 'Toscosa Pit', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Crushed, Broken Limestone NEC '}
My dictionary looks like {'column_ID': '4102958', 'column_state': 'TX ', 'column_name': 'Jake Diel Dirt & Paving Inc  ', 'col

My dictionary looks like {'column_ID': '1600920', 'column_state': 'LA ', 'column_name': 'Northest Louisiana Dirt Contractors  ', 'column_mine_name': 'Calhoun And Mcguire Pit', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Construction Sand and Gravel '}
My dictionary looks like {'column_ID': '4102955', 'column_state': 'TX ', 'column_name': 'Orvil Carter Dirt Contractor Inc  ', 'column_mine_name': 'Ed Meek Pit', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Crushed, Broken Limestone NEC '}
My dictionary looks like {'column_ID': '4103107', 'column_state': 'TX ', 'column_name': 'Orvil Carter Dirt Contractor Inc  ', 'column_mine_name': 'Seitz Pit', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Crushed, Broken Limestone NEC '}
My dictionary looks like {'column_ID': '1512530', 'column_state': 'KY ', 'column_name': 'P 

My dictionary looks like {'column_ID': '2402115', 'column_state': 'MT ', 'column_name': 'Sierra Rock & Dirt, Inc.  ', 'column_mine_name': 'Sierra Rock & Dirt Inc', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Construction Sand and Gravel '}
My dictionary looks like {'column_ID': '4300748', 'column_state': 'VT ', 'column_name': 'Simpson Dirtworx llc  ', 'column_mine_name': 'Simpson Sand and Gravel', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Abandoned ', 'column_commodity': 'Construction Sand and Gravel '}
My dictionary looks like {'column_ID': '4300768', 'column_state': 'VT ', 'column_name': 'SIMPSON DIRTWORX LLC  ', 'column_mine_name': 'Pettibone Jaw Crusher', 'column_type': 'Surface', 'column_coal_metal': 'M ', 'column_status': 'Intermittent ', 'column_commodity': 'Construction Sand and Gravel '}
My dictionary looks like {'column_ID': '4300776', 'column_state': 'VT ', 'column_name': 'SIMPSON DIRTW

IndexError: list index out of range

### Save that to a CSV named `dirt-operators.csv`

In [72]:
df = pd.DataFrame(rows)
df.head(5)

| Unnamed: 0 | column_ID | column_coal_metal | column_commodity | column_mine_name | column_name | column_state | column_status | column_type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3503598 | M | Crushed, Broken Stone NEC | Newberg Rock & Dirt | Newberg Rock & Dirt | OR | Active | Surface |
| 1 | 502030 | M | Construction Sand and Gravel | Allied Dirt Moving Co Pit & Plant | Allied Dirt Moving Company | CO | Abandoned | Surface |
| 2 | 4801789 | M | Construction Sand and Gravel | AM Dirtworks & Aggregate Sales | AM Dirtworks & Aggregate Sales | ND | Abandoned | Surface |
| 3 | 4201449 | C | Coal (Bituminous) | Unit Train Loading Facility | Atlas-Dirty Devil Mining | UT | Abandoned | Facility |
| 4 | 4201450 | C | Coal (Bituminous) | Blackie Surface Mine & Prep Plant | Atlas-Dirty Devil Mining | UT | Abandoned | Surface |


In [68]:
df.to_csv('dirt-operators.csv', index=False)

### Open the CSV file and examine the first few rows.

Make sure you didn't save that extra weird unnamed index column.

In [73]:
df = pd.read_csv('dirt-operators.csv')
df.head(5)

| Unnamed: 0 | column_ID | column_coal_metal | column_commodity | column_mine_name | column_name | column_state | column_status | column_type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3503598 | M | Crushed, Broken Stone NEC | Newberg Rock & Dirt | Newberg Rock & Dirt | OR | Active | Surface |
| 1 | 502030 | M | Construction Sand and Gravel | Allied Dirt Moving Co Pit & Plant | Allied Dirt Moving Company | CO | Abandoned | Surface |
| 2 | 4801789 | M | Construction Sand and Gravel | AM Dirtworks & Aggregate Sales | AM Dirtworks & Aggregate Sales | ND | Abandoned | Surface |
| 3 | 4201449 | C | Coal (Bituminous) | Unit Train Loading Facility | Atlas-Dirty Devil Mining | UT | Abandoned | Facility |
| 4 | 4201450 | C | Coal (Bituminous) | Blackie Surface Mine & Prep Plant | Atlas-Dirty Devil Mining | UT | Abandoned | Surface |
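
Notice that the scraped ID `0502030` shows up as `502030` after the round trip: by default pandas reads a digits-only column as numbers, which drops leading zeros. A small sketch of one way around that, assuming you want the IDs kept as text. It writes to an in-memory buffer instead of a real file, and the two-column rows are trimmed-down examples.

```python
import io

import pandas as pd

# Trimmed-down rows; the IDs and states are from this notebook's output.
rows = [
    {"column_ID": "3503598", "column_state": "OR"},
    {"column_ID": "0502030", "column_state": "CO"},
]

# Write the CSV to memory, leaving out the index column.
buffer = io.StringIO()
pd.DataFrame(rows).to_csv(buffer, index=False)

# Default read: the ID column is parsed as integers.
buffer.seek(0)
as_numbers = pd.read_csv(buffer)

# Reading the ID column as text keeps the leading zero.
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={"column_ID": str})

print(as_numbers["column_ID"].tolist())  # [3503598, 502030]
print(as_text["column_ID"].tolist())     # ['3503598', '0502030']
print(list(as_text.columns))             # no "Unnamed: 0" column
```

The same `dtype` argument works when reading the saved CSV file by name.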
