# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm#MID). Thank goodness we can search for these things.

## Preparation: Knowing your tags

These questions are the same for every data set, so they might not fit yours exactly.

**Search for every operator with 'dirt' in their name, including abandoned mines.**

In [37]:
import requests
from bs4 import BeautifulSoup

In [38]:
# Form fields sent with the search; they must exist before the request is made.
data = {
    'OperSearch': 'dirt',
    'Abandoned': 'No',
    'MineName': '',
    'StateSearch': 'None',
    'CM': 'All',
    'x':'0',
    'y':'0',
    'MC':'Opersearch'
}

In [39]:
# POST the form fields, then parse the returned HTML.
response = requests.post('https://arlweb.msha.gov/drs/ASP/OprNameStatesearch.asp', data=data)
doc = BeautifulSoup(response.text, 'html.parser')
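To see what `data=` actually sends, the form fields can be encoded by hand. This is a minimal sketch rather than part of the original notebook: `build_search_payload` is a hypothetical helper, the field names are copied from the dictionary above, and the `abandoned` parameter only passes a value through without claiming which values the form accepts.

```python
from urllib.parse import urlencode


def build_search_payload(operator_text, abandoned='No'):
    """Return the search form fields as a dict (hypothetical helper)."""
    return {
        'OperSearch': operator_text,
        'Abandoned': abandoned,
        'MineName': '',
        'StateSearch': 'None',
        'CM': 'All',
        'x': '0',
        'y': '0',
        'MC': 'Opersearch',
    }


payload = build_search_payload('dirt')
# urlencode produces the same kind of key=value&key=value body that
# requests builds when a dict is passed as data=.
encoded = urlencode(payload)
print(encoded)
```

Wrapping the fields in a function makes it easy to rerun the search with a different operator name without retyping the whole dictionary.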

### What is the tag and class name for every row of data?

In [74]:
# Every row of data is in a 'tr' tag

### What is the tag and class name for every mine operator's name?

In [76]:
# Every mine operator's name is in the third 'td' tag of its row (index 2)

### What is the tag and class name for every mine's name?

In [78]:
# Every mine's name is in the fourth 'td' tag of its row (index 3)
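Counting cells by eye is error-prone, so it can help to print each cell next to its index. The sketch below runs on a tiny made-up table instead of the live page: `SAMPLE_HTML` and every value in it are hypothetical, and the positions used for the operator name (index 2) and mine name (index 3) are assumptions that should be checked against the real rows.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table; the real page has more rows and more columns.
SAMPLE_HTML = """
<table>
  <tr><td>Mine ID</td><td>State</td><td>Operator</td><td>Mine</td></tr>
  <tr><td> 0000001 </td><td>XX</td><td>Example Operator</td><td>Example Pit</td></tr>
</table>
"""

sample_doc = BeautifulSoup(SAMPLE_HTML, 'html.parser')
row = sample_doc.find_all('tr')[1]          # skip the header row
cells = [td.text.strip() for td in row.find_all('td')]

# Print every cell with its zero-based index to see what lives where.
for index, text in enumerate(cells):
    print(index, text)

operator_name = cells[2]
mine_name = cells[3]
```

Remember that Python indexes from zero, so index 2 is the third `td` in the row.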

## Being lazy

If you only needed these results, what would you do instead of scraping them?

In [None]:
# Export it to Excel and be happy ever after

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [79]:
import requests
from bs4 import BeautifulSoup

## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[-1].text` to get the text of the last `<tr>` element.

- If the result starts with **Total Number of Mines Found**, you were successful.

In [80]:
doc.find_all('tr')[-1].text

'\nTotal Number of Mines Found:\xa0\xa019'
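The same check can be done in code instead of by eye. This is a small sketch, assuming the summary row always has the wording shown above; `parse_total` is a hypothetical helper name, and the input string is the output of the previous cell.

```python
import re


def parse_total(last_row_text):
    """Return the mine count from the summary row, or None if it is missing."""
    # split() with no arguments also splits on non-breaking spaces (\xa0),
    # so joining the pieces leaves single ordinary spaces.
    cleaned = ' '.join(last_row_text.split())
    match = re.match(r'Total Number of Mines Found:\s*(\d+)', cleaned)
    return int(match.group(1)) if match else None


total = parse_total('\nTotal Number of Mines Found:\xa0\xa019')
print(total)
```

Having the count as a number is handy later: it can be compared with how many rows the scraper actually collected.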

## Actually scraping

### Hopefully you know that each `tr` is supposed to be your data. What is the index of the first row element that is actually a result?

`.text` will help you here.

In [111]:
doc.find_all('tr')[7].find_all('td')[0].text.strip()

'3503598'
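Rather than trying indexes one at a time, a loop can find the first result row. The sketch below uses plain lists of cell text as a stand-in for the `td` elements, so it runs without the page; `first_result_index` is a hypothetical helper, and it assumes mine IDs are always seven digits, as they are in the results shown here.

```python
def first_result_index(rows):
    """Return the index of the first row whose first cell looks like a mine ID."""
    for index, cells in enumerate(rows):
        if not cells:
            continue  # some layout rows may have no cells at all
        first = cells[0].strip()
        # Assumption: mine IDs are exactly seven digits.
        if first.isdigit() and len(first) == 7:
            return index
    return None


# Hypothetical stand-in for the page: each row is a list of cell texts.
rows = [
    ['Search results'],
    [],
    ['Mine ID', 'State', 'Operator'],
    ['\n0000001\n', 'XX', 'Example Operator'],
    ['Total Number of Mines Found: 1'],
]
print(first_result_index(rows))
```

With BeautifulSoup, each inner list would come from something like `[td.text for td in tr.find_all('td')]`.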

### Loop through each operator result, printing its name

Use LIST SLICING to skip the non-data row(s).

In [145]:
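operators = doc.find_all('tr')

# Rows 7 through 25 hold the 19 results; the row after them is the total.
for operator in operators[7:26]:
    # The fourth cell (index 3) holds the mine's name.
    name = operator.find_all('td')[3]
    print(name.text)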
    print('------')

Newberg Rock & Dirt
------
AM Dirtworks & Aggregate Sales
------
Bush Pilot
------
Hog Lick Quarry
------
Rock Lake Plant
------
Portable #1
------
River Road Pit
------
PORTABLE SCREENER
------
Forbes Pit
------
Camptown Quarry
------
Fedscreek Surface
------
Mine No. 6
------
Surface Mine No. 1
------
Sandretto Drive
------
R D BLANKENSHIP DIRT WORK
------
Pettibone Jaw Crusher
------
Chieftan 1400
------
Mike's Money Pit
------
Crusher
------
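The slice above hard-codes where the results end. A negative stop index is one alternative, sketched here on a made-up list: the row contents are hypothetical, and the layout (seven non-data rows, then results, then one total row) is assumed from the output above.

```python
# Hypothetical stand-in rows: seven non-data rows, three results, one total row.
rows = ['header'] * 7 + ['mine A', 'mine B', 'mine C'] + ['Total Number of Mines Found: 3']

# A stop of -1 drops the final summary row without hard-coding the count.
results = rows[7:-1]
print(results)

# The same slice with an explicit stop, which must change whenever the count does.
explicit = rows[7:10]
```

Both slices select the same rows here, but only the first keeps working when a different search returns a different number of mines.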


### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.

In [142]:
operators = doc.find_all('tr')

# Stop before the final row, which holds the total rather than a mine.
for operator in operators[7:-1]:
    mine_id = operator.find_all('td')[0]
    # strip() removes the surrounding whitespace that otherwise prints as empty rows.
    print(mine_id.text.strip())
    print('------')

3503598
------
4801789
------
5001797
------
4608254
------
2103723
------
4104757
------
0801306
------
3901432
------
3609624
------
3609931
------
1519799
------
4407296
------
4407270
------
0203332
------
2901986
------
4300768
------
4300776
------
2302283
------
2103518
------


## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.

In [155]:
mine_information = []

# Assumed cell order (verify against the page): ID, state, operator name,
# mine name, mine type, coal or metal, status, commodity.
for operator in operators[7:26]:
    cells = operator.find_all('td')
    current = {}
    current['ID'] = cells[0].text.strip()
    current['Operator_name'] = cells[2].text.strip()
    current['Mine_name'] = cells[3].text.strip()
    current['State'] = cells[1].text.strip()
    current['Mine_type'] = cells[4].text.strip()
    current['Coal_or_metal'] = cells[5].text.strip()
    current['Status'] = cells[6].text.strip()
    current['Commodity'] = cells[7].text.strip()

    mine_information.append(current)
print(mine_information)
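When a row has this many fields, it is easy to attach a value to the wrong key. One way to reduce that risk is to list the column names once and pair them with the cells using `zip`. This is only a sketch: `COLUMNS` reflects an assumed cell order that should be checked against the page, `row_to_dict` is a hypothetical helper, and the example row is made up.

```python
# Assumed column order; adjust if the page lays its cells out differently.
COLUMNS = ['ID', 'State', 'Operator name', 'Mine name',
           'Mine type', 'Coal or metal', 'Status', 'Commodity']


def row_to_dict(cell_texts):
    """Pair each cell with its column name, stripping stray whitespace."""
    if len(cell_texts) != len(COLUMNS):
        # Fail loudly instead of silently shifting values into the wrong keys.
        raise ValueError('expected %d cells, got %d' % (len(COLUMNS), len(cell_texts)))
    return {name: text.strip() for name, text in zip(COLUMNS, cell_texts)}


# Hypothetical row; with BeautifulSoup the texts would come from
# [td.text for td in row.find_all('td')].
example = row_to_dict(['\n0000001\n', 'XX', 'Example Operator', 'Example Pit',
                       'Surface', 'M', 'Active', 'Example commodity'])
print(example)
```

The length check catches header or summary rows that slip through the slice, since they will not have the expected number of cells.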

### Save that to a CSV

In [160]:
import pandas as pd
df = pd.DataFrame(mine_information)
# index=False keeps pandas from writing the row index as an extra unnamed column.
df.to_csv('mine_information.csv', index=False)

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.

In [159]:
# Read the saved file back in, so the check covers what was actually written.
# dtype keeps IDs as text, preserving leading zeros such as 0801306.
pd.read_csv('mine_information.csv', dtype={'ID': str}).head()
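An unnamed first column usually appears when the DataFrame's row index is written to the file and then read back as if it were data. The sketch below shows the difference using in-memory buffers instead of a file on disk; the two rows are hypothetical stand-ins for `mine_information`.

```python
import io

import pandas as pd

# Hypothetical rows standing in for mine_information.
rows = [{'ID': '0000001', 'Mine name': 'Example Pit'},
        {'ID': '0000002', 'Mine name': 'Example Quarry'}]
df = pd.DataFrame(rows)

with_index = io.StringIO()
df.to_csv(with_index)                    # default: the row index is written too
without_index = io.StringIO()
df.to_csv(without_index, index=False)    # the index is left out

# Read both back; dtype=str keeps the leading zeros in the IDs.
reloaded_with = pd.read_csv(io.StringIO(with_index.getvalue()), dtype=str)
reloaded_without = pd.read_csv(io.StringIO(without_index.getvalue()), dtype=str)
print(list(reloaded_with.columns))
print(list(reloaded_without.columns))
```

Checking the column list after reading the file back is a quick way to confirm that only the intended fields were saved.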
