# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm#MID). Thank goodness we can search for these things.

## Preparation: Knowing your tags

These questions are the same for every data set, so they might not fit yours exactly.

**Search for every operator with 'dirt' in their name, including abandoned mines.**

### What is the tag and class name for every row of data?

In [2]:
## tr

### What is the tag and class name for every mine operator's name?

In [9]:
## the third td

### What is the tag and class name for every mine's name?

In [None]:
## the fourth td

## Being lazy

If you only needed these results, what would you do instead of scraping them?

In [None]:
## Copy and paste the results table into Excel

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [27]:
import requests as rq
from bs4 import BeautifulSoup


## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[-1].text` to get the text of the last `<tr>` element.

- If the result starts with **Total Number of Mines Found**, you were successful.

In [29]:
data = {
    "OperSearch": "dirt",
    "MineName": "",
    "StateSearch": "None",
    "CM": "All",
    "x": "20",
    "y": "14",
    "MC": "Opersearch"
}
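
It can help to see what this dictionary becomes before sending it. The sketch below builds the same POST request locally with `requests.Request(...).prepare()` and prints the form body, without contacting the server. The variable name `prepared` is only for illustration, and the field values are copied from the cell above.

```python
import requests

# Same form fields as the search above
data = {
    "OperSearch": "dirt",
    "MineName": "",
    "StateSearch": "None",
    "CM": "All",
    "x": "20",
    "y": "14",
    "MC": "Opersearch",
}
url = "https://arlweb.msha.gov/drs/ASP/OprNameStatesearch.asp"

# Build the POST request locally; nothing is sent over the network here
prepared = requests.Request("POST", url, data=data).prepare()

print(prepared.method)                   # the HTTP verb
print(prepared.headers["Content-Type"])  # how the body is encoded
print(prepared.body)                     # the form fields as one encoded string
```

If the body does not contain the fields you expect, the search form probably uses different field names than the ones in your dictionary.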



In [31]:
url = "https://arlweb.msha.gov/drs/ASP/OprNameStatesearch.asp"
response = rq.post(url, data=data)
doc = BeautifulSoup(response.text, "html.parser")

In [32]:
doc.find_all('tr')[-1].text

'\nTotal Number of Mines Found:\xa0\xa0129'
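
The same check can be wrapped in a small function so it is easy to rerun after changing the form fields. This is a sketch, not part of the assignment: `looks_like_results` is a made-up helper name, and the two pages are tiny stand-ins rather than real responses.

```python
from bs4 import BeautifulSoup

def looks_like_results(html):
    """Return True if the last <tr> starts with the results-page footer text."""
    doc = BeautifulSoup(html, "html.parser")
    rows = doc.find_all("tr")
    if not rows:
        return False
    return rows[-1].text.strip().startswith("Total Number of Mines Found")

# Tiny stand-in pages, made up for this example
good_page = (
    "<table><tr><td>3503598</td></tr>"
    "<tr><td>\nTotal Number of Mines Found:\xa0\xa0129</td></tr></table>"
)
bad_page = "<html><body><p>Something went wrong</p></body></html>"

print(looks_like_results(good_page))
print(looks_like_results(bad_page))
```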

## Actually scraping

### Hopefully you know that each `tr` is supposed to be your data. What is the index of the first row element that is actually a result?

`.text` will help you here.

In [66]:
doc.find_all("tr")[7].text


'\n\n\n3503598\n\nOR\xa0\n Newberg Rock & Dirt \xa0\nNewberg Rock & Dirt\nSurface             \nM\xa0\nActive\xa0 \nCrushed, Broken Stone NEC\xa0 \n'

### Loop through each operator result, printing its name

Use LIST SLICING to skip the non-data row(s).
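
As a quick refresher, here is the slice on a plain list standing in for `doc.find_all("tr")`. The row labels are made up; only the `[7:-1]` pattern matters.

```python
# Stand-in for doc.find_all("tr"): 7 non-data rows, 3 results, 1 footer row
rows = [f"junk {n}" for n in range(7)] + ["result A", "result B", "result C"] + ["total"]

# Start at index 7 and stop just before the last element
data_rows = rows[7:-1]
print(data_rows)
```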

In [87]:
for row in doc.find_all("tr")[7:-1]:
    # The operator name is the third <font> cell in each row.
    # .text returns only the name, without the DNT markers around it.
    operator_name = row.find_all("font")[2]
    print(operator_name.text.strip())

Newberg Rock & Dirt
Allied Dirt Moving Company
AM Dirtworks & Aggregate Sales
Atlas-Dirty Devil Mining
Atlas-Dirty Devil Mining
Babe's Dirt Work
Bar-Lin Dirt Company
Barber'S Dirt Pit
Bender Sand & Dirt
BERT'S DIRT
Big D Dirt Service Inc
Big Red Dirt Farm LLC
Big River Dirt Pit
Bob Harris Dirt Contracting
Bohannon Sand & Dirt
Bratcher'S Sand & Dirt
Brewer Dirt Works
Buck'S Dirt Pit
C & G Dirt Hauling
C N C Dirt Movers, Inc.
Cambridge Dirt Sand and Gravel LLC
Central Iowa Dirt & Demo LLC
Crowes Trucking & Dirt Pit Services
D & H Dirt
Diez Dirt & Sand Hauling Inc
Dirt Cheap
...

### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.
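
One way to confirm the "one code per row, no empty rows" requirement is to check every value against a pattern. This sketch assumes each ID is exactly seven digits, which holds for the IDs in this search but is an assumption about the site. `check_ids` is a made-up helper name.

```python
import re

# The first few IDs from this search
ids = ["3503598", "0502030", "4801789", "4201449"]

def check_ids(values):
    """Return the positions of entries that are not exactly seven digits."""
    return [i for i, value in enumerate(values) if not re.fullmatch(r"\d{7}", value)]

print(check_ids(ids))

# An empty row or a stray marker would be reported by position
print(check_ids(ids + ["", " DNT "]))
```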

In [85]:
for row in doc.find_all("tr")[7:-1]:
    # The ID is the first <font> cell in each row
    operator_id = row.find_all("font")[0]
    print(operator_id.text.strip())

3503598
0502030
4801789
4201449
4201450
1002257
1601167
4103265
1401575
1700776
1601251
0301963
1601082
3401751
1600916
3401211
0301267
1600956
2200033
0504953
3401929
1302445
1601106
3400915
1600983
4503200
3401266
3401468
5001797
4608254
1510279
2103723
0100776
4104016
4104757
0301729
0404851
2200734
5002028
1513393
3800602
3101630
3200860
3401762
2103517
2402626
2103181
1601124
1601150
4703427
0801306
2501216
3200965
2901371
2901544
2901709
4102355
4102420
4102869
4102951
4102958
4104876
3003502
4103258
3901432
2103556
1601250
1600908
1600953
4104185
2901536
3609624
3800709
3609931
1601257
0801275
1601379
1601380
1601381
1601134
1601165
3901042
1601194
4104054
4801674
2402474
1600920
4102955
4103107
1512530
1515619
1518318
4405366
4407196
1519685
1519799
4407003
2602570
2402503
4407296
1519273
4407270
4102682
0801259
0203332
0302015
2901986
1601127
4105017
1600986
4103324
4202013
0801371
2402115
4300748
4300768
4300776
0103209
1601159
2302283
4102586
4104475
3800617
1601234
4104648


## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.
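
A compact way to build one dictionary per row is to pair a list of field names with the cleaned cell text. The sketch below uses the first result row as sample input. The `FIELDS` names are illustrative and the order is assumed from how the cells appear in that row.

```python
# Field names in the order the cells appear in each result row (assumed)
FIELDS = ["ID", "State", "Operator", "Mine Name", "Type", "CM", "Status", "Commodity"]

# Raw cell text for one row, including the stray whitespace the page adds
raw_cells = [
    "3503598",
    "OR\xa0",
    " Newberg Rock & Dirt \xa0",
    "Newberg Rock & Dirt",
    "Surface             ",
    "M\xa0",
    "Active\xa0 ",
    "Crushed, Broken Stone NEC\xa0 ",
]

# strip() removes ordinary spaces and non-breaking spaces (\xa0) alike
row = dict(zip(FIELDS, (cell.strip() for cell in raw_cells)))
print(row)
```

Calling `dict(...)` inside the loop gives every row its own fresh dictionary, so rows never overwrite each other.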

In [92]:
dirt = []
for row in doc.find_all("tr")[7:-1]:
    # This assumes every result row has eight <font> cells, always in the same order
    fonts = row.find_all("font")[:8]
    for font in fonts:
        print(font.text)  # raw cell text, before whitespace is stripped
    cells = [font.text.strip() for font in fonts]
    dirt.append({
        "ID": cells[0],
        "State": cells[1],
        "Operator": cells[2],
        "Mine Name": cells[3],
        "Type": cells[4],
        "CM": cells[5],
        "Status": cells[6],
        "Commodity": cells[7],
    })

3503598
OR 
 Newberg Rock & Dirt  
Newberg Rock & Dirt
Surface             
M 
Active  
Crushed, Broken Stone NEC  
0502030
CO 
Allied Dirt Moving Company  
Allied Dirt Moving Co Pit & Plant
Surface             
M 
Abandoned  
Construction Sand and Gravel  
4801789
ND 
AM Dirtworks & Aggregate Sales  
AM Dirtworks & Aggregate Sales
Surface             
M 
Intermittent  
Construction Sand and Gravel  
4201449
UT 
Atlas-Dirty Devil Mining  
Unit Train Loading Facility
Facility            
C 
Abandoned  
Coal (Bituminous)  
4201450
UT 
Atlas-Dirty Devil Mining  
Blackie Surface Mine & Prep Plant
Surface             
C 
Abandoned  
Coal (Bituminous)  
1002257
ID 
Babe's Dirt Work  
Hitt Pit, Inc.
Surface             
M 
Abandoned  
Construction Sand and Gravel  
1601167
LA 
Bar-Lin Dirt Company  
Bar-Lin Dirt Pit
Surface             
M 
Abandoned  
Construction Sand and Gravel  
4103265
TX 
Barber'S Dirt Pit  
Barber'S Dirt Pit
Surface             
M 
Abandoned  
Construction Sand and Grav

In [98]:
dirt

[{'CM': 'M',
  'Commodity': 'Crushed, Broken Stone NEC',
  'ID': '3503598',
  'Mine Name': 'Newberg Rock & Dirt',
  'Operator': 'Newberg Rock & Dirt',
  'State': 'OR',
  'Status': 'Active',
  'Type': 'Surface'},
 {'CM': 'M',
  'Commodity': 'Construction Sand and Gravel',
  'ID': '0502030',
  'Mine Name': 'Allied Dirt Moving Co Pit & Plant',
  'Operator': 'Allied Dirt Moving Company',
  'State': 'CO',
  'Status': 'Abandoned',
  'Type': 'Surface'},
 {'CM': 'M',
  'Commodity': 'Construction Sand and Gravel',
  'ID': '4801789',
  'Mine Name': 'AM Dirtworks & Aggregate Sales',
  'Operator': 'AM Dirtworks & Aggregate Sales',
  'State': 'ND',
  'Status': 'Intermittent',
  'Type': 'Surface'},
 {'CM': 'C',
  'Commodity': 'Coal (Bituminous)',
  'ID': '4201449',
  'Mine Name': 'Unit Train Loading Facility',
  'Operator': 'Atlas-Dirty Devil Mining',
  'State': 'UT',
  'Status': 'Abandoned',
  'Type': 'Facility'},
 {'CM': 'C',
  'Commodity': 'Coal (Bituminous)',
  'ID': '4201450',
  'Mine Name'

### Save that to a CSV

In [101]:
import pandas as pd

df = pd.DataFrame(dirt)

df.head()
df.to_csv("dirt.csv", index=False)


### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.

In [102]:
# Read the ID column as text so leading zeros (e.g. 0502030) are kept
df_dirt = pd.read_csv("dirt.csv", dtype={"ID": str})
df_dirt.head()

| | CM | Commodity | ID | Mine Name | Operator | State | Status | Type |
|---|---|---|---|---|---|---|---|---|
| 0 | M | Crushed, Broken Stone NEC | 3503598 | Newberg Rock & Dirt | Newberg Rock & Dirt | OR | Active | Surface |
| 1 | M | Construction Sand and Gravel | 0502030 | Allied Dirt Moving Co Pit & Plant | Allied Dirt Moving Company | CO | Abandoned | Surface |
| 2 | M | Construction Sand and Gravel | 4801789 | AM Dirtworks & Aggregate Sales | AM Dirtworks & Aggregate Sales | ND | Intermittent | Surface |
| 3 | C | Coal (Bituminous) | 4201449 | Unit Train Loading Facility | Atlas-Dirty Devil Mining | UT | Abandoned | Facility |
| 4 | C | Coal (Bituminous) | 4201450 | Blackie Surface Mine & Prep Plant | Atlas-Dirty Devil Mining | UT | Abandoned | Surface |
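
Two things are worth checking after a round trip through CSV: whether an extra index column was written, and whether IDs that start with a zero, such as `0502030`, are still intact. The sketch below shows both on a two-row sample, writing to an in-memory string instead of a file. The sample rows and variable names are illustrative.

```python
import io
import pandas as pd

rows = [{"ID": "0502030", "State": "CO"}, {"ID": "3503598", "State": "OR"}]
sample = pd.DataFrame(rows)

# With no path, to_csv returns the CSV text instead of writing a file
with_index = sample.to_csv()                # default index=True also writes the row labels
without_index = sample.to_csv(index=False)

# The saved row labels come back as an extra unnamed column
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
print(pd.read_csv(io.StringIO(without_index)).columns.tolist())

# By default the ID column is inferred as integers, which drops the leading zero
as_numbers = pd.read_csv(io.StringIO(without_index))
as_text = pd.read_csv(io.StringIO(without_index), dtype={"ID": str})
print(as_numbers["ID"].tolist())
print(as_text["ID"].tolist())
```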
