# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm#MID). Thank goodness we can search for these things.

## Preparation: Knowing your tags

These questions are the same for every data set, and might not work exactly for yours.

**Search for every operator with 'dirt' in their name, including abandoned mines.**

```python
import pandas as pd
from bs4 import BeautifulSoup
import requests as rq
```


```python
# Form fields submitted by the MSHA operator-name search
data = {
    "OperSearch": "dirt",
    "Abandoned": "No",
    "MineName": "",
    "StateSearch": "None",
    "CM": "All",
    "x": "0",
    "y": "0",
    "MC": "Opersearch",
}

# Send the search page as the Referer, as a browser would
headers = {
    "Referer": "https://arlweb.msha.gov/drs/drshome.htm"
}

response = rq.post("https://arlweb.msha.gov/drs/ASP/OprNameStatesearch.asp", data=data, headers=headers)
```


```python
response
```

```
<Response [200]>
```

```python
doc = BeautifulSoup(response.text, "html.parser")
# len() of a BeautifulSoup object counts its top-level nodes, not the rows of data
len(doc)
```

```
86
```
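
To see exactly what that search submits without hitting the MSHA server, you can build the same POST with `requests.Request` and call `.prepare()`. This is a sketch for inspection only: the form fields are copied from the search cell and nothing is sent.

```python
import requests as rq

# Same form fields and headers as the search cell (copied so this runs on its own)
data = {
    "OperSearch": "dirt",
    "Abandoned": "No",
    "MineName": "",
    "StateSearch": "None",
    "CM": "All",
    "x": "0",
    "y": "0",
    "MC": "Opersearch",
}
headers = {"Referer": "https://arlweb.msha.gov/drs/drshome.htm"}

# Build the POST without sending it, then look at what would go over the wire
prepared = rq.Request(
    "POST",
    "https://arlweb.msha.gov/drs/ASP/OprNameStatesearch.asp",
    data=data,
    headers=headers,
).prepare()

print(prepared.method)
print(prepared.headers["Content-Type"])
print(prepared.body)
```

Comparing `prepared.body` with the request your browser sends (visible in its developer tools) is a quick way to spot a missing or misspelled form field.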

### What is the tag and class name for every row of data?

```python
# Every row of data is a <tr>; index 6 is the column-header row, so results start at index 7
operator_tags = doc.find_all('tr')[7:]
for tag in operator_tags:
    print(tag.text.strip())
```


```
3503598

OR
 Newberg Rock & Dirt
Newberg Rock & Dirt
Surface
M
Active
Crushed, Broken Stone NEC
4801789

ND
AM Dirtworks & Aggregate Sales
AM Dirtworks & Aggregate Sales
Surface
M
Intermittent
Construction Sand and Gravel
5001797

AK
Dirt Company
Bush Pilot
Surface
M
Intermittent
Construction Sand and Gravel
4608254

WV
Dirt Con
Hog Lick Quarry
Surface
M
Temporarily Idled
Crushed, Broken Limestone NEC
2103723

MN
Dirt Doctor Inc
Rock Lake Plant
Surface
M
Intermittent
Construction Sand and Gravel
4104757

TX
Dirt Works
Portable #1
Surface
M
Intermittent
Construction Sand and Gravel
0801306

FL
Holley Dirt Company, Inc
River Road Pit
Surface
M
Active
Sand, Common
3901432

SD
Krueger Brothers Gravel & Dirt
PORTABLE SCREENER
Surface
M
Intermittent
Construction Sand and Gravel
3609624

PA
M R Dirt
Forbes Pit
Surface
M
Interm
```
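
To answer the class-name half of the question, you can ask each row for its `class` attribute: BeautifulSoup's `Tag.get('class')` returns `None` when a tag has no class and a list of class names when it does. The snippet below is made-up markup shaped like the results table, and the `footer` class is invented for contrast; on the real page you would loop over `doc.find_all('tr')` instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the results table; the real page may differ
html = """
<table>
  <tr><td>ID</td><td>State</td><td>Operator</td></tr>
  <tr><td>3503598</td><td>OR</td><td>Newberg Rock &amp; Dirt</td></tr>
  <tr><td>4801789</td><td>ND</td><td>AM Dirtworks &amp; Aggregate Sales</td></tr>
  <tr class="footer"><td colspan="3">Total Number of Mines Found: 2</td></tr>
</table>
"""
sample = BeautifulSoup(html, "html.parser")

# Print and collect the tag name and class list of every row
classes = []
for row in sample.find_all("tr"):
    print(row.name, row.get("class"))
    classes.append(row.get("class"))
```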

### What is the tag and class name for every mine operator's name?

```python
operator_tags = doc.find_all('tr')[7:]
for operator in operator_tags:
    td_list = operator.find_all('td')
    # The operator name is the third <td>; skip rows that are too short, like the footer
    if len(td_list) >= 3:
        print(td_list[2].text.strip())
```

```
Newberg Rock & Dirt
AM Dirtworks & Aggregate Sales
Dirt Company
Dirt Con
Dirt Doctor Inc
Dirt Works
Holley Dirt Company, Inc
Krueger Brothers Gravel & Dirt
M R Dirt
M.R. Dirt Inc.
P B Dirt Movers, Inc
PB Dirt Movers
PB Dirt Movers, Inc
Prescott Dirt, LLC
R D Blankenship Dirt Work LLC
SIMPSON DIRTWORX LLC
SIMPSON DIRTWORX LLC
Spry's Dirt & Gravel, Inc.
Vogt Dirt Service
```
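
Every column lookup in this notebook repeats the same pattern: take the `td` at a fixed position and skip rows that are too short. As a sketch, that pattern could be pulled into a small helper. `column_text` is a hypothetical name, and the HTML is invented sample markup with the same column order as the results (ID, State, Operator, Mine Name).

```python
from bs4 import BeautifulSoup

def column_text(rows, index):
    """Return the stripped text of cell `index` from every row that has that many cells."""
    values = []
    for row in rows:
        cells = row.find_all("td")
        if len(cells) > index:  # skip rows that are too short, such as the footer
            values.append(cells[index].text.strip())
    return values

# Hypothetical snippet with the results' column order: ID, State, Operator, Mine Name
html = """
<table>
  <tr><td>3503598</td><td>OR</td><td> Newberg Rock &amp; Dirt  </td><td>Newberg Rock &amp; Dirt</td></tr>
  <tr><td>5001797</td><td>AK</td><td>Dirt Company  </td><td>Bush Pilot</td></tr>
  <tr><td colspan="4">Total Number of Mines Found: 2</td></tr>
</table>
"""
rows = BeautifulSoup(html, "html.parser").find_all("tr")

print(column_text(rows, 2))  # operator names
print(column_text(rows, 3))  # mine names
```

On the real page, `column_text(doc.find_all('tr')[7:], 2)` would give the operator names and index 3 the mine names.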


### What is the tag and class name for every mine's name?

```python
for row in doc.find_all('tr')[7:]:
    cells = row.find_all('td')
    # The mine name is the fourth <td>; skip rows that are too short, like the footer
    if len(cells) >= 4:
        print(cells[3].text.strip())
```

```
Newberg Rock & Dirt
AM Dirtworks & Aggregate Sales
Bush Pilot
Hog Lick Quarry
Rock Lake Plant
Portable #1
River Road Pit
PORTABLE SCREENER
Forbes Pit
Camptown Quarry
Fedscreek Surface
Mine No. 6
Surface Mine No. 1
Sandretto Drive
R D BLANKENSHIP DIRT WORK
Pettibone Jaw Crusher
Chieftan 1400
Mike's Money Pit
Crusher
```
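
BeautifulSoup can also find the cell with a CSS selector through `select_one`, which relies on the soupsieve package that recent bs4 versions install as a dependency. Here `td:nth-of-type(4)` means the fourth `td` in its row, matching the `[3]` index above. The markup is a made-up sample with the header row at index 0.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like the results table
html = """
<table>
  <tr><td>ID</td><td>State</td><td>Operator</td><td>Mine Name</td></tr>
  <tr><td>4608254</td><td>WV</td><td>Dirt Con</td><td>Hog Lick Quarry</td></tr>
  <tr><td>2103723</td><td>MN</td><td>Dirt Doctor Inc</td><td>Rock Lake Plant</td></tr>
  <tr><td colspan="4">Total Number of Mines Found: 2</td></tr>
</table>
"""
rows = BeautifulSoup(html, "html.parser").find_all("tr")[1:]  # skip the header row

mine_names = []
for row in rows:
    cell = row.select_one("td:nth-of-type(4)")  # None when the row has fewer than 4 cells
    if cell is not None:
        mine_names.append(cell.text.strip())

print(mine_names)
```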


## Being lazy

If you only needed these results, what would you do instead of scraping them?

```python
# Just copy and paste the results into an Excel file
```


## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

```python
# Already imported at the top of the notebook:
# import pandas as pd
# from bs4 import BeautifulSoup
# import requests as rq
```


## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[-1].text` to get the text of the last `<tr>` element.

- If the result starts with **Total Number of Mines Found**, you were successful.
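
The same check can be wrapped in a function that also pulls out the count, which is handy later for confirming you scraped one row per mine. `mines_found` is a hypothetical helper, and its regular expression assumes the footer text reads exactly as in the output below; the HTML is an invented stand-in for the results page.

```python
import re
from bs4 import BeautifulSoup

def mines_found(doc):
    """Return the count from the 'Total Number of Mines Found' row, or None if the page lacks it."""
    rows = doc.find_all("tr")
    if not rows:
        return None
    match = re.match(r"Total Number of Mines Found:\s*(\d+)", rows[-1].text.strip())
    return int(match.group(1)) if match else None

# Hypothetical stand-in for the results page
html = """
<table>
  <tr><td>3503598</td><td>OR</td></tr>
  <tr><td colspan="2">Total Number of Mines Found:  19</td></tr>
</table>
"""
print(mines_found(BeautifulSoup(html, "html.parser")))
print(mines_found(BeautifulSoup("<p>No results</p>", "html.parser")))
```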

```python
doc = BeautifulSoup(response.text, "html.parser")
print(doc.find_all('tr')[-1].text.strip())
```

```
Total Number of Mines Found:  19
```


## Actually scraping

### Hopefully you know that each `tr` is supposed to be your data. What is the index of the first row element that is actually a result?

`.text` will help you here.

```python
# Index 6 is the column-header row, so the first actual result is at index 7
header_row = doc.find_all('tr')[6]
print(header_row.text.strip())
```

```
ID
State
Operator
Mine Name
Type
CM*
Status
Commodity
More Info
```
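
Rather than hard-coding 7, you could look for the header row and start one past it. This sketch assumes the header row's first cell reads `ID`, as in the output above, and checks both `td` and `th` cells because the notebook does not show which one the header uses. `first_result_index` and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

def first_result_index(rows, header_first_cell="ID"):
    """Return the index just past the header row whose first cell reads header_first_cell."""
    for i, row in enumerate(rows):
        cells = row.find_all(["td", "th"])
        if cells and cells[0].text.strip() == header_first_cell:
            return i + 1
    raise ValueError("header row not found")

# Hypothetical page: one layout row, the header row, then one result
html = """
<table><tr><td>Mine Data Retrieval System</td></tr></table>
<table>
  <tr><th>ID</th><th>State</th><th>Operator</th></tr>
  <tr><td>3503598</td><td>OR</td><td>Newberg Rock &amp; Dirt</td></tr>
</table>
"""
rows = BeautifulSoup(html, "html.parser").find_all("tr")
start = first_result_index(rows)
print(start)
print(rows[start].find("td").text)
```

On the real page this would only match the `[7:]` slice if the header is the sole row whose first cell is `ID`, so it is worth printing the row at the returned index once to confirm.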


### Loop through each operator result, printing its name

Use LIST SLICING to skip the non-data row(s).

```python
# [7:-1] skips the rows before the first result and the "Total Number of Mines Found" row at the end
for operator in doc.find_all('tr')[7:-1]:
    print(operator.find_all('td')[2].text.strip())
```
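
Here is the same slicing on a plain list standing in for the rows: seven non-data entries, three results and the footer. The values are placeholders, not the page's real content.

```python
# Placeholder rows: indices 0-6 are non-data, then three results, then the footer
rows = ["layout"] * 6 + ["header", "3503598", "4801789", "5001797", "Total Number of Mines Found:  3"]

print(rows[7:])    # results plus the footer row
print(rows[7:-1])  # results only: -1 stops before the last element
```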


### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.

```python
# Start at index 7 to skip the non-data rows, stop before the last row to leave out the
# "Total Number of Mines Found" footer, and strip the whitespace that printed as empty lines
for operator in doc.find_all('tr')[7:-1]:
    print(operator.find_all('td')[0].text.strip())
```

```
3503598
4801789
5001797
4608254
2103723
4104757
0801306
3901432
3609624
3609931
1519799
4407296
4407270
0203332
2901986
4300768
4300776
2302283
2103518
```
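
A quick sanity check on the IDs: every one in this output is a 7-digit string, and some start with a zero (0801306, 0203332), so they should stay strings rather than be converted to integers. The sketch uses a few IDs copied from that output; in practice you would collect them in the loop and also compare the count with the 19 reported by the footer row.

```python
import re

# A few IDs copied from the output above; a real run would collect all of them in the loop
ids = ["3503598", "0801306", "0203332", "2103518"]

# Flag anything that is not exactly seven digits
bad_ids = [mine_id for mine_id in ids if not re.fullmatch(r"\d{7}", mine_id)]
print(bad_ids)

# Converting to int would silently drop the leading zero
print(int("0801306"))
```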


## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.
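
Because the header row fixes the column order, each row's dictionary can also be built by zipping a list of column names with the row's cell texts. `FIELDS` and `row_to_dict` are hypothetical names, and the cell values are copied from the first result, padded with the whitespace the page returns. In the loop you would call `row_to_dict([td.text for td in element.find_all('td')])`.

```python
# Column order from the header row: ID, State, Operator, Mine Name, Type, CM*, Status,
# Commodity (the trailing "More Info" column is deliberately left out)
FIELDS = ["Operator_ID", "State", "Operator_name", "Mine_name",
          "Mine_type", "Coal_or_metal", "Status", "Commodity"]

def row_to_dict(cells):
    """Pair each cell's stripped text with its column name; zip stops at the shorter sequence."""
    return dict(zip(FIELDS, (cell.strip() for cell in cells)))

# Hypothetical cell texts for one results row (the last entry stands in for "More Info")
cells = ["3503598", "OR ", " Newberg Rock & Dirt  ", "Newberg Rock & Dirt",
         "Surface             ", "M ", "Active  ", "Crushed, Broken Stone NEC", ""]
row = row_to_dict(cells)
print(row)
```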

```python
operator_dictionary = []

# Skip the non-data rows at the start and the "Total Number of Mines Found" footer
for element in doc.find_all('tr')[7:-1]:
    td_list = [td.text.strip() for td in element.find_all('td')]
    operator_row = {
        'Operator_ID': td_list[0],
        'State': td_list[1],
        'Operator_name': td_list[2],
        'Mine_name': td_list[3],
        'Mine_type': td_list[4],
        'Coal_or_metal': td_list[5],
        'Status': td_list[6],
        'Commodity': td_list[7],
    }
    operator_dictionary.append(operator_row)

operator_dictionary
```



```
[{'Coal_or_metal': 'M',
  'Commodity': 'Crushed, Broken Stone NEC',
  'Mine_name': 'Newberg Rock & Dirt',
  'Mine_type': 'Surface',
  'Operator_ID': '3503598',
  'Operator_name': 'Newberg Rock & Dirt',
  'State': 'OR',
  'Status': 'Active'},
 {'Coal_or_metal': 'M',
  'Commodity': 'Construction Sand and Gravel',
  'Mine_name': 'AM Dirtworks & Aggregate Sales',
  'Mine_type': 'Surface',
  'Operator_ID': '4801789',
  'Operator_name': 'AM Dirtworks & Aggregate Sales',
  'State': 'ND',
  'Status': 'Intermittent'},
 {'Coal_or_metal': 'M',
  'Commodity': 'Construction Sand and Gravel',
  'Mine_name': 'Bush Pilot',
  'Mine_type': 'Surface',
  'Operator_ID': '5001797',
  'Operator_name': 'Dirt Company',
  'State': 'AK',
  'Status': 'Intermittent'},
 {'Coal_or_metal': 'M',
  'Commodity': 'Crushed, Broken Limestone NEC',
  'Mine_name': 'Hog Lick Quarry',
  'Mine_type': 'Surface',
  'Operator_ID': '4608254',
  'Operator_name': 'Dirt Con',
  'State': 'WV',
  'Status': 'Temporarily Idled'},
 {'Coal_
```

### Save that to a CSV

```python
df = pd.DataFrame(operator_dictionary)
# index=False keeps the DataFrame's row numbers from being written as an extra unnamed column
df.to_csv("operator_dictionary.csv", index=False)
df.head()
```
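
One pitfall when reading the file back: `pd.read_csv` guesses column types, so an ID like `0801306` comes back as the integer `801306`. This sketch round-trips two rows through an in-memory buffer (`io.StringIO`) instead of a real file to show the difference `dtype={'Operator_ID': str}` makes.

```python
import io
import pandas as pd

df = pd.DataFrame([
    {"Operator_ID": "0801306", "Operator_name": "Holley Dirt Company, Inc", "State": "FL"},
    {"Operator_ID": "3503598", "Operator_name": "Newberg Rock & Dirt", "State": "OR"},
])

buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Default parsing treats Operator_ID as a number and drops the leading zero
buffer.seek(0)
as_numbers = pd.read_csv(buffer)

# Declaring the column as text keeps each ID exactly as scraped
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={"Operator_ID": str})

print(as_numbers["Operator_ID"].tolist())
print(as_text["Operator_ID"].tolist())
```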



### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.

```python
# Read Operator_ID as text so IDs with leading zeros (such as 0801306) keep them
operator_dictionary_df = pd.read_csv("operator_dictionary.csv", dtype={'Operator_ID': str})
operator_dictionary_df.head()
```

|   | Coal_or_metal | Commodity | Mine_name | Mine_type | Operator_ID | Operator_name | State | Status |
|---|---|---|---|---|---|---|---|---|
| 0 | M | Crushed, Broken Stone NEC | Newberg Rock & Dirt | Surface | 3503598 | Newberg Rock & Dirt | OR | Active |
| 1 | M | Construction Sand and Gravel | AM Dirtworks & Aggregate Sales | Surface | 4801789 | AM Dirtworks & Aggregate Sales | ND | Intermittent |
| 2 | M | Construction Sand and Gravel | Bush Pilot | Surface | 5001797 | Dirt Company | AK | Intermittent |
| 3 | M | Crushed, Broken Limestone NEC | Hog Lick Quarry | Surface | 4608254 | Dirt Con | WV | Temporarily Idled |
| 4 | M | Construction Sand and Gravel | Rock Lake Plant | Surface | 2103723 | Dirt Doctor Inc | MN | Intermittent |
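
The unnamed column comes from `to_csv` writing the index, which it does by default. This sketch round-trips a one-row DataFrame through in-memory strings both ways and checks for the `Unnamed: 0` header that `read_csv` gives a blank column name; `has_unnamed_column` is a hypothetical helper.

```python
import io
import pandas as pd

def has_unnamed_column(frame):
    """True if any column name starts with 'Unnamed', the name pandas gives a blank CSV header."""
    return any(str(col).startswith("Unnamed") for col in frame.columns)

df = pd.DataFrame({"Operator_ID": ["3503598"], "State": ["OR"]})

with_index = pd.read_csv(io.StringIO(df.to_csv()))  # to_csv writes the index by default
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())
print(without_index.columns.tolist())
print(has_unnamed_column(with_index), has_unnamed_column(without_index))
```

If a file was already saved with its index, `pd.read_csv(path, index_col=0)` reads that first column back as the index rather than as an `Unnamed: 0` column.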
