# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm#MID). Thank goodness we can search for these things.

## Preparation: Knowing your tags

These questions are the same for every data set, so they might not fit yours exactly.

**Search for every operator with 'dirt' in their name, including abandoned mines.**

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
data = {
    "OperSearch":"dirt",
    "Abandoned":"No",
    "MineName":"",
    "StateSearch":"None",
    "CM":"All",
    "x":"0",
    "y":"0",
    "MC":"Opersearch",
}

In [3]:
response = requests.post("https://arlweb.msha.gov/drs/ASP/OprNameStatesearch.asp", data=data)

doc = BeautifulSoup(response.text, "html.parser")


### What is the tag and class name for every row of data?

Every row of data is in a `tr` tag.

### What is the tag and class name for every mine operator's name?

The operator's name is in the third `td` tag of each row (index 2).

## Being lazy

If you only needed these results, what would you do instead of scraping them?

In [4]:
# Export it to Excel and be happy ever after

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [5]:
# Already imported in the first cell above.

## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[-1].text` to get the text of the last `<tr>` element.

- If the result starts with **Total Number of Mines Found**, you were successful.

In [6]:
doc.find_all('tr')[-1].text

'\nTotal Number of Mines Found:\xa0\xa019'
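
The count in that last row can also double as a sanity check on how many rows get scraped later. Here is a minimal sketch, assuming the footer text keeps the format shown above; `parse_total` is a hypothetical helper and not part of the original notebook:

```python
import re


def parse_total(last_row_text):
    """Pull the integer out of text like 'Total Number of Mines Found:\xa0\xa019'.

    Hypothetical helper: returns None when the text does not match.
    """
    # \s also matches the non-breaking spaces (\xa0) used on the page
    match = re.search(r"Total Number of Mines Found:\s*(\d+)", last_row_text)
    return int(match.group(1)) if match else None


total = parse_total('\nTotal Number of Mines Found:\xa0\xa019')
print(total)
```

Comparing this number with the length of the list of scraped rows is a cheap way to notice a slice that is off by one.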

## Actually scraping

### Hopefully you know that each `tr` is supposed to be your data. What is the index of the first row element that is actually a result?

`.text` will help you here.

In [7]:
doc.find_all('tr')[7].find_all('td')[0].text

'\n\n3503598\n'

In [8]:
doc.find_all('tr')[7].find_all('td')[0].text.strip()

'3503598'
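
Instead of finding index 7 by trial and error, the first result row could probably be located in code. The sketch below assumes that a result row is any row whose first cell contains only digits. `SAMPLE_HTML` is made-up markup for illustration, not the real page, and `first_data_row_index` is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the results page: two non-data rows, two results, a footer
SAMPLE_HTML = """
<table>
  <tr><td>Search results</td></tr>
  <tr><td>ID</td><td>State</td><td>Operator</td></tr>
  <tr><td>3503598</td><td>OR</td><td>Newberg Rock &amp; Dirt</td></tr>
  <tr><td>4801789</td><td>ND</td><td>AM Dirtworks &amp; Aggregate Sales</td></tr>
  <tr><td>Total Number of Mines Found: 2</td></tr>
</table>
"""


def first_data_row_index(rows):
    """Return the index of the first row whose first cell is all digits."""
    for index, row in enumerate(rows):
        cells = row.find_all("td")
        if cells and cells[0].text.strip().isdigit():
            return index
    return None


sample_rows = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all("tr")
start = first_data_row_index(sample_rows)
print(start)
```

On the real page the same function would be called with `doc.find_all('tr')`, though whether the digits-only assumption holds for every search is untested here.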

### Loop through each operator result, printing its name

Use LIST SLICING to skip the non-data row(s).

In [9]:
rows = doc.find_all("tr")

for element in rows[7:26]:
    # The operator name is the third cell (index 2) in each row
    operator = element.find_all('td')[2]
    print(operator.text)
    print("-----")



 Newberg Rock & Dirt  
-----
AM Dirtworks & Aggregate Sales  
-----
Dirt Company  
-----
Dirt Con  
-----
Dirt Doctor Inc  
-----
Dirt Works  
-----
Holley Dirt Company, Inc  
-----
Krueger Brothers Gravel & Dirt  
-----
M R Dirt  
-----
M.R. Dirt Inc.  
-----
P B Dirt Movers, Inc  
-----
PB Dirt Movers  
-----
PB Dirt Movers, Inc  
-----
Prescott Dirt, LLC  
-----
R D Blankenship Dirt Work LLC  
-----
SIMPSON DIRTWORX LLC  
-----
SIMPSON DIRTWORX LLC  
-----
Spry's Dirt & Gravel, Inc.  
-----
Vogt Dirt Service  
-----
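
The slice `rows[7:26]` hard-codes both ends, so it only fits a search with exactly 19 results. A possible alternative, sketched here on a plain list rather than the real page, is a negative end index that drops the trailing total row however many results there are. `sample_rows` and its contents are illustrative stand-ins:

```python
# Stand-in for doc.find_all("tr"): two non-data rows, three results, one footer
sample_rows = ["banner", "header", "row A", "row B", "row C", "total"]

# Skip the first two rows and stop before the last one
data_rows = sample_rows[2:-1]
print(data_rows)
```

This only works if the total is the single row after the results, which the `[-1]` check earlier suggests but does not prove.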


### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.

In [10]:
for element in rows[7:26]:
    ID = element.find_all('td')[0]
    # strip() removes the newlines around each ID so no empty lines are printed
    print(ID.text.strip())
    print("-----")

3503598
-----
4801789
-----
5001797
-----
4608254
-----
2103723
-----
4104757
-----
0801306
-----
3901432
-----
3609624
-----
3609931
-----
1519799
-----
4407296
-----
4407270
-----
0203332
-----
2901986
-----
4300768
-----
4300776
-----
2302283
-----
2103518
-----


## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain:

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.

In [11]:
all_info = []
for element in rows[7:26]:
    # Look up the cells once per row instead of once per field
    cells = element.find_all('td')
    current = {
        'ID': cells[0].text.strip(),
        'state': cells[1].text.strip(),
        'operator': cells[2].text.strip(),
        'mine': cells[3].text.strip(),
        'mine_type': cells[4].text.strip(),
        'COM': cells[5].text.strip(),
        'status': cells[6].text.strip(),
        'commodity': cells[7].text.strip(),
    }
    all_info.append(current)
print(all_info[0])

{'ID': '3503598', 'state': 'OR', 'operator': 'Newberg Rock & Dirt', 'mine': 'Newberg Rock & Dirt', 'mine_type': 'Surface', 'COM': 'M', 'status': 'Active', 'commodity': 'Crushed, Broken Stone NEC'}
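
Since the columns always come in the same order, the dictionary could also be built by pairing a list of column names with the cell texts. This is a minimal sketch on plain strings. `COLUMNS` and `row_to_dict` are hypothetical names, and the sample texts copy the first result above:

```python
# Column names in the order the cells appear in each result row
COLUMNS = ["ID", "state", "operator", "mine", "mine_type", "COM", "status", "commodity"]


def row_to_dict(cell_texts):
    """Pair each column name with the stripped text of the matching cell."""
    return {name: text.strip() for name, text in zip(COLUMNS, cell_texts)}


# Raw cell texts for the first result, including the stray whitespace
first_row = ["\n\n3503598\n", "OR", " Newberg Rock & Dirt  ", "Newberg Rock & Dirt",
             "Surface", "M", "Active", "Crushed, Broken Stone NEC"]
print(row_to_dict(first_row))
```

With BeautifulSoup rows, the input would be something like `[td.text for td in element.find_all('td')]`.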


### Save that to a CSV

In [12]:
import pandas as pd

In [13]:
df = pd.DataFrame(all_info)
df.to_csv("Mine_info.csv", index=False)
df

| | COM | ID | commodity | mine | mine_type | operator | state | status |
|---|---|---|---|---|---|---|---|---|
| 0 | M | 3503598 | Crushed, Broken Stone NEC | Newberg Rock & Dirt | Surface | Newberg Rock & Dirt | OR | Active |
| 1 | M | 4801789 | Construction Sand and Gravel | AM Dirtworks & Aggregate Sales | Surface | AM Dirtworks & Aggregate Sales | ND | Intermittent |
| 2 | M | 5001797 | Construction Sand and Gravel | Bush Pilot | Surface | Dirt Company | AK | Intermittent |
| 3 | M | 4608254 | Crushed, Broken Limestone NEC | Hog Lick Quarry | Surface | Dirt Con | WV | Temporarily Idled |
| 4 | M | 2103723 | Construction Sand and Gravel | Rock Lake Plant | Surface | Dirt Doctor Inc | MN | Intermittent |
| 5 | M | 4104757 | Construction Sand and Gravel | Portable #1 | Surface | Dirt Works | TX | Intermittent |
| 6 | M | 0801306 | Sand, Common | River Road Pit | Surface | Holley Dirt Company, Inc | FL | Active |
| 7 | M | 3901432 | Construction Sand and Gravel | PORTABLE SCREENER | Surface | Krueger Brothers Gravel & Dirt | SD | Intermittent |
| 8 | M | 3609624 | Construction Sand and Gravel | Forbes Pit | Surface | M R Dirt | PA | Intermittent |
| 9 | M | 3609931 | Dimension Stone NEC | Camptown Quarry | Surface | M.R. Dirt Inc. | PA | Intermittent |


### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.

In [14]:
# Read the saved file back in rather than reusing df; dtype keeps leading zeros in the IDs
pd.read_csv("Mine_info.csv", dtype={"ID": str}).head()

| | COM | ID | commodity | mine | mine_type | operator | state | status |
|---|---|---|---|---|---|---|---|---|
| 0 | M | 3503598 | Crushed, Broken Stone NEC | Newberg Rock & Dirt | Surface | Newberg Rock & Dirt | OR | Active |
| 1 | M | 4801789 | Construction Sand and Gravel | AM Dirtworks & Aggregate Sales | Surface | AM Dirtworks & Aggregate Sales | ND | Intermittent |
| 2 | M | 5001797 | Construction Sand and Gravel | Bush Pilot | Surface | Dirt Company | AK | Intermittent |
| 3 | M | 4608254 | Crushed, Broken Limestone NEC | Hog Lick Quarry | Surface | Dirt Con | WV | Temporarily Idled |
| 4 | M | 2103723 | Construction Sand and Gravel | Rock Lake Plant | Surface | Dirt Doctor Inc | MN | Intermittent |
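
The "weird unnamed column" appears when the DataFrame index is written to the CSV and then read back. A small sketch of that round trip, using an in-memory string instead of a file; the two-column sample row is illustrative and reuses one ID from the results:

```python
import io

import pandas as pd

# One illustrative row; the ID is a string with a leading zero
sample = pd.DataFrame([{"ID": "0801306", "state": "FL"}])

# Default to_csv() writes the index, which comes back as "Unnamed: 0"
with_index = pd.read_csv(io.StringIO(sample.to_csv()))
print(list(with_index.columns))

# index=False leaves the index out; dtype=str keeps the leading zero in the ID
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)), dtype={"ID": str})
print(list(without_index.columns))
print(without_index["ID"][0])
```

Without the `dtype` argument, `read_csv` parses the ID column as integers, so `0801306` comes back as `801306`.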
