# Mine Safety

We're interested in [US mine safety](https://arlweb.msha.gov/drs/drshome.htm#MID). Thank goodness we can search for these things.

## Preparation: Knowing your tags

These questions are the same for every data set, so they might not fit yours exactly.

**Search for every operator with 'dirt' in their name, including abandoned mines.**

### What is the tag and class name for every row of data?

In [15]:
'tr'  # each row of data is a <tr>

### What is the tag and class name for every mine operator's name?

In [12]:
'td'  # the operator's name is the third <td> (index 2) of each result row

### What is the tag and class name for every mine's name?

In [None]:
'td'  # the mine's name is the fourth <td> (index 3) of each result row

## Being lazy

If you only needed these results, what would you do instead of scraping them?

In [None]:
# Look at them! The results are already listed on the search page.

## Setup: Import what you'll need to scrape the page

Use `requests`, not `urllib`.

In [25]:
import requests
from bs4 import BeautifulSoup


In [23]:
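data = {
    # The task says to include abandoned mines; check on the search form
    # which 'Abandoned' value does that before relying on 'No'
    'OperSearch': 'dirt',
    'Abandoned': 'No',
    'MineName': '',
    'StateSearch': 'None',
    'CM': 'All',
    'x': 28,
    'y': 4,
    'MC': 'Opersearch',
}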
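To see what this form actually sends, you can URL-encode the same dictionary with the standard library. As far as I know, `requests` form-encodes a `data=` dictionary for a POST the same way, so this is a quick offline way to inspect the request body before hitting the site. A minimal sketch:

```python
from urllib.parse import urlencode

# Same form fields as the `data` dict above
data = {
    'OperSearch': 'dirt',
    'Abandoned': 'No',
    'MineName': '',
    'StateSearch': 'None',
    'CM': 'All',
    'x': 28,
    'y': 4,
    'MC': 'Opersearch',
}

# Encode the fields as an application/x-www-form-urlencoded body
body = urlencode(data)
print(body)
```

Empty fields such as `MineName` still appear as `MineName=`, and numbers are converted to strings.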

In [24]:
response = requests.post('https://arlweb.msha.gov/drs/ASP/OprNameStatesearch.asp', data=data)
doc = BeautifulSoup(response.text, 'html.parser')

## Try to scrape the page

To test if you requested the page correctly, save the BeautifulSoup document as `doc` and run the code `doc.find_all('tr')[-1].text` to get the text of the last `<tr>` element.

- If the result starts with **Total Number of Mines Found**, you were successful.
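Note that `.text` keeps the leading newline, so a literal "starts with" check needs `.strip()` first. The footer also carries the number of mines found, which is handy for checking how many result rows you should get. A sketch, where `search_succeeded` and `total_found` are hypothetical helper names and the sample string is the `.text` value this notebook gets from the page:

```python
import re

# Text of the last <tr>, as returned by doc.find_all('tr')[-1].text
last_row_text = '\nTotal Number of Mines Found:\xa0\xa019'


def search_succeeded(text):
    """Return True if the row text is the 'Total Number of Mines Found' footer."""
    # Strip the surrounding whitespace (including the leading newline) before comparing
    return text.strip().startswith('Total Number of Mines Found')


def total_found(text):
    """Return the mine count at the end of the footer text, or None if absent."""
    match = re.search(r'(\d+)\s*$', text)
    return int(match.group(1)) if match else None


print(search_succeeded(last_row_text))
print(total_found(last_row_text))
```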

In [26]:
doc.find_all('tr')[-1].text

'\nTotal Number of Mines Found:\xa0\xa019'

## Actually scraping

### Hopefully you know that each `tr` is supposed to be your data. What is the index of the first row element that is actually a result?

`.text` will help you here.
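Instead of checking indexes one at a time, you can loop with `enumerate` and stop at the first row that looks like a result. The sketch below assumes (a heuristic, not something the site guarantees) that result rows start with a 7-digit ID, like the IDs printed later in this notebook. `first_result_index` is a hypothetical helper, and the sample texts only mimic the shape of the real rows:

```python
import re

# Hypothetical stand-in for [row.text for row in doc.find_all('tr')]
row_texts = [
    '\n\nOperator Name or Mine Name Search\n\xa0',
    '\nAbandoned*\nIndicates Mine is Abandoned and Sealed\n',
    '\n\n\n3503598\n\n\n Newberg Rock & Dirt  \n',
]


def first_result_index(texts):
    """Return the index of the first row whose text starts with a 7-digit ID."""
    for i, text in enumerate(texts):
        # Heuristic: result rows begin with the 7-digit ID once whitespace is stripped
        if re.match(r'\d{7}\b', text.strip()):
            return i
    return None


print(first_result_index(row_texts))
```

On the real page you would pass in the row texts from `doc.find_all('tr')` and get back the index to start your slice from.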

In [42]:
doc.find_all('tr')[0].text

'\n\nOperator Name or Mine Name Search\n\xa0'

### Loop through each operator result, printing its name

Use LIST SLICING to skip the non-data row(s).
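Slicing with `rows[7:26]` hard-codes where the results end. Because the "Total Number of Mines Found" footer is the last `<tr>`, a negative end index drops it without counting rows. A sketch on a hypothetical stand-in list with 7 non-data rows, 19 results, and the footer:

```python
# Hypothetical stand-in for doc.find_all('tr')
rows = ['header'] * 7 + [f'result {n}' for n in range(19)] + ['Total Number of Mines Found: 19']

FIRST_RESULT = 7  # index of the first result row, found in the previous step

# Start at the first result and stop just before the footer row
results = rows[FIRST_RESULT:-1]

print(len(results))
print(results[0], '...', results[-1])
```

This only matches `rows[7:26]` while the footer stays the last row, so rerun the success check above if the page layout changes.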

In [63]:
#body= doc.find('body)
#body

rows= doc.find_all('tr')
for item in rows[7:26]:
    operator = item.find_all('td')[2]
    print(operator.text)
#doc.find_all('tr')[0].find_all('td')[2]

 Newberg Rock & Dirt  
AM Dirtworks & Aggregate Sales  
Dirt Company  
Dirt Con  
Dirt Doctor Inc  
Dirt Works  
Holley Dirt Company, Inc  
Krueger Brothers Gravel & Dirt  
M R Dirt  
M.R. Dirt Inc.  
P B Dirt Movers, Inc  
PB Dirt Movers  
PB Dirt Movers, Inc  
Prescott Dirt, LLC  
R D Blankenship Dirt Work LLC  
SIMPSON DIRTWORX LLC  
SIMPSON DIRTWORX LLC  
Spry's Dirt & Gravel, Inc.  
Vogt Dirt Service  


In [57]:
doc.find_all('tr')

[<tr>
 <td width="30%"><a href="/drs/drshome.htm"><img alt="Mine Data Retrieval System" border="0" height="75" src="/drs/images/drslogo.png" width="300"/></a></td>
 <th width="40%"><font style="FONT-SIZE:1.20em;">Operator Name or Mine Name<br/> Search</font></th>
 <td width="30%"> </td></tr>, <tr>
 <td valign="top" width="50%">
 <table width="100%">
 <tr>
 <td><font style="FONT-SIZE:.80em;"><b>Abandoned*</b></font></td></tr>
 <tr>
 <td valign="top" width="95%"><font style="FONT-SIZE:.75em;">Indicates Mine is Abandoned and Sealed</font></td></tr></table></td>
 <td align="right" valign="top" width="50%">
 <table align="right" width="100%">
 <tr>
 <td align="right" colspan="2"><font style="FONT-SIZE:.80em;"><b>*CM (Coal or Metal Mine/Nonmetal Mine)</b></font></td></tr>
 <tr>
 <td align="right" width="46%"><font style="FONT-SIZE:.80em;">C<br/>M</font></td>
 <td width="54%"><font style="FONT-SIZE:.80em;">...... Coal<br/>...... Metal/Nonmetal</font></td></tr>
 </table></td></tr>, <tr>
 ...

### Loop through each operator result, printing its ID

There should be ONE code per row, and NO empty rows between them.
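The blank lines come from the newlines around each ID inside its cell. `str.strip()` removes them (BeautifulSoup's `get_text(strip=True)` does the same on the tag itself). Keep the IDs as strings, because some of them, such as `0801306`, start with a zero. A sketch on hypothetical cell texts shaped like the output below:

```python
# Hypothetical cell texts: each ID is wrapped in newlines, as on the page
cell_texts = ['\n\n3503598\n\n', '\n\n4801789\n\n', '\n\n0801306\n\n']

# strip() removes surrounding whitespace, leaving one clean ID per entry
ids = [text.strip() for text in cell_texts]
print('\n'.join(ids))
```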

In [64]:
rows = doc.find_all('tr')
for item in rows[7:26]:
    ID = item.find_all('td')[0]
    # .strip() removes the newlines around each ID, so there are no blank lines
    print(ID.text.strip())

3503598
4801789
5001797
4608254
2103723
4104757
0801306
3901432
3609624
3609931
1519799
4407296
4407270
0203332
2901986
4300768
4300776
2302283
2103518

## Saving the results

### Loop through each `tr` to create a list of dictionaries

Each dictionary must contain:

- Operator ID
- Operator name
- Mine name
- State
- Mine type
- Coal or metal
- Status
- Commodity

Create a new dictionary for each row.
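One way to build each dictionary is to pair a list of column names with the row's cell texts using `zip`. The column order below is assumed from the cell indexes this notebook uses (0 through 6); "Mine type" is left out because its column isn't identified here. `row_to_dict` is a hypothetical helper, and the sample cells other than the ID and operator are placeholders:

```python
# Column order assumed from the scraping code in this notebook (cell indexes 0-6)
COLUMNS = ['ID', 'State', 'operator', 'mine name', 'CM', 'Status', 'Commodity']


def row_to_dict(cell_texts):
    """Pair each stripped cell text with its column name.

    zip() stops at the shorter sequence, so extra cells are ignored.
    """
    return dict(zip(COLUMNS, (text.strip() for text in cell_texts)))


# Placeholder cell texts for illustration only
sample = ['\n3503598\n', 'XX', ' Newberg Rock & Dirt  ', 'Example Mine',
          'X', 'Example Status', 'Example Commodity']
print(row_to_dict(sample))
```

In the loop you would call it as `row_to_dict([td.text for td in row.find_all('td')])` and append the result to your list.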

In [None]:
# Look at every <td> cell on the page to see which cell holds which column
products = doc.find_all('td')
products

In [85]:
info = []
for row in rows[7:26]:
    cells = row.find_all('td')
    # Create a new dictionary for each row from the stripped text of each cell
    current = {
        'ID': cells[0].text.strip(),
        'State': cells[1].text.strip(),
        'operator': cells[2].text.strip(),
        'mine name': cells[3].text.strip(),
        'CM': cells[4].text.strip(),
        'Status': cells[5].text.strip(),
        'Commodity': cells[6].text.strip(),
    }
    info.append(current)
print(info)

### Save that to a CSV

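In [None]:
import pandas as pd
df = pd.DataFrame(info)  # one column per dictionary key
df.to_csv("mines.csv", index=False)  # index=False avoids an extra unnamed column

### Open the CSV file and examine the first few rows. Make sure you didn't save an extra weird unnamed column.

In [None]:
df = pd.read_csv("mines.csv", dtype={'ID': str})  # keep leading zeros in IDs
df.head()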
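If you'd rather not use pandas, the standard-library `csv` module can write a list of dictionaries directly with `csv.DictWriter`. The sketch writes to an in-memory buffer so it runs anywhere; for a real file you would use `open('mines.csv', 'w', newline='')` instead. The two sample rows pair IDs and operator names from the same rows printed earlier in this notebook:

```python
import csv
import io

# Hypothetical list of dictionaries shaped like `info`
info = [
    {'ID': '0801306', 'operator': 'Holley Dirt Company, Inc'},
    {'ID': '3503598', 'operator': 'Newberg Rock & Dirt'},
]

buffer = io.StringIO()  # stands in for open('mines.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=list(info[0].keys()))
writer.writeheader()
writer.writerows(info)

# Read it back: the header holds only our columns (no unnamed index column),
# and csv keeps every value as a string, so '0801306' keeps its leading zero
buffer.seek(0)
saved = list(csv.DictReader(buffer))
print(list(saved[0].keys()))
print(saved[0]['ID'])
```

Values containing commas, like "Holley Dirt Company, Inc", are quoted automatically, so they stay in one column.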