To successfully complete this assignment, you need to participate both individually and in groups during class on **Wednesday, January 29**.

# In-Class Assignment: Instructor template

We will be using some of the extensive datasets available at the National Oceanic and Atmospheric Administration (NOAA). 

<a href="http://www.noaa.gov/"><img width=200 align='center' src="http://www.nssl.noaa.gov/projects/debrisflow09/NOAA%20Circle.gif"></a>

Image From: https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/us-climate-reference-network-uscrn

### Agenda for today's class (80 minutes)


1. [(20 minutes) Review pre-class assignment](#Review_pre-class_assignment)
2. [(25 minutes) NOAA Example](#NOAA_Example)
3. [(10 minutes) Installing Beautiful Soup](#Installing_Beautiful_Soup)
4. [(25 minutes) Presidential data example](#Presidential_data_example)

----
<a name="Review_pre-class_assignment"></a>

# 1. Review pre-class Assignment

- [0128--Web_Scraping-pre-class-assignment](0128--Web_Scraping-pre-class-assignment.ipynb)

----
<a name="NOAA_Example"></a>

# 2. NOAA Example and Coding Standards.

In the course git repository there is now a ```noaa_scraper.py``` file.  Import its ```get_noaa_temperatures``` function with the following command:

In [None]:
%matplotlib inline 
import matplotlib.pyplot as plt

from noaa_scraper import get_noaa_temperatures

&#9989; <font color=red>**DO THIS:**</font> Please run the ```get_noaa_temperatures``` function as follows:

In [None]:
air_temperatures = get_noaa_temperatures('http://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/', 'Gaylord', 100)
plt.plot(air_temperatures)
# plt.axis([0,1000,-20,80])

&#9989; <font color=red>**DO THIS:**</font> With your group, do a code review of the contents of the **noaa_scraper.py** file and figure out what it does. What are the main parts of this module and what do they do? Be prepared to discuss this with the class. 

Put your notes here

&#9989; <font color=red>**DO THIS:**</font> Let's look more closely at the code.  Run the following ```pylint``` command and figure out what the output is telling you (Hint: use the Internet).

In [None]:
!pylint noaa_scraper

&#9989; <font color=red>**DO THIS:**</font> Install and run the ```autopep8``` library. There are two options depending on your setup:

Install ```autopep8``` globally or in an environment:

In [None]:
#!pip install autopep8

In [None]:
#!autopep8 noaa_scraper.py > noaa_scraper2.py

Install ```autopep8``` in a local packages folder:

In [None]:
#!pip install -t packages autopep8

In [None]:
#Use this if autopep8 was installed in the local packages folder
#!python ./packages/autopep8.py noaa_scraper.py > noaa_scraper2.py

In [None]:
!pylint noaa_scraper2

&#9989; <font color=red>**DO THIS:**</font> In your group, discuss the ```pylint``` and ```autopep8``` commands.  We will be using these commands in an upcoming Project assignment. What do they do? Are they useful, or do they just add more work?  Be prepared to discuss your answers with the rest of the class. 

Put your discussion notes here. 

----
<a name="Installing_Beautiful_Soup"></a>

# 3. Installing Beautiful Soup

For this class we will be trying out [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/), a Python library for parsing HTML and XML documents. 

&#9989; <font color=red>**DO THIS:**</font> Install the ```beautifulsoup4``` library on your computer (the following commands were written for JupyterHub but should work anywhere).  When you are done, help your neighbor. Raise your hand if you need help.

In [None]:
#!mkdir packages

In [None]:
#!pip install -t ./packages/ beautifulsoup4

In [None]:
import sys
sys.path.append('./packages/')

----
<a name="Presidential_data_example"></a>
# 4. Presidential data example

The idea for this example comes from the following blog post: https://blog.exploratory.io/scraping-us-presidents-list-from-web-and-transforming-it-to-be-useful-fff534470bb6

&#9989; <font color=red>**DO THIS:**</font> Click on the following link and review the page source with your neighbor.  Discuss which tags you need to look for to isolate only the table data.  Ideally we want to create a pandas ```DataFrame``` of this data:
https://www.loc.gov/rr/print/list/057_chron.html


Put notes on what you find here.

## Download and view html

The following code should download the above website and parse it into a ```BeautifulSoup``` object:

In [None]:
#The requests library downloads the page and stores the response in the page variable
import requests
page = requests.get("https://www.loc.gov/rr/print/list/057_chron.html")
page.raise_for_status()  #Stop early with an error if the download failed

In [None]:
#Import BeautifulSoup and parse the page content with html.parser
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')

&#9989; <font color=red>**DO THIS:**</font> Explore the ```soup``` variable using Python functions such as ```type```, ```dir``` and ```help```.


In [None]:
#Put your answer to the above here

In [None]:
#Print out the raw html using "pretty print" 
print(soup.prettify())

## Find the Tables

Next, the following code finds all of the ```table``` sections in the website:

In [None]:
tables = soup.find_all('table')

In [None]:
type(tables)

In [None]:
len(tables)

The output above shows that there are 9 ```table``` objects in the document.  We are looking for the one that has our data in it. 



&#9989; <font color=red>**DO THIS:**</font> Find the one table among the nine that has only the data we want. Make a variable ```table``` that includes only that information. Hint: it is not the first table, as the following code shows. 

In [None]:
table = tables[0]
print(table.prettify())
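
Printing every table in full can be hard to read. As a rough sketch of another way to compare candidates, the following summarizes each table by its number of rows and header cells. The HTML here is a small made-up stand-in (not the Library of Congress page), and the names ```sample_html```, ```sample_tables``` and ```summary``` are assumptions chosen for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML: one layout table and one data table.
sample_html = """
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Alpha</td><td>1901</td></tr>
  <tr><td>Beta</td><td>1902</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_tables = sample_soup.find_all("table")

# Collect (index, number of rows, number of header cells) for each table.
summary = []
for index, candidate in enumerate(sample_tables):
    n_rows = len(candidate.find_all("tr"))
    n_headers = len(candidate.find_all("th"))
    summary.append((index, n_rows, n_headers))
    print(f"table {index}: {n_rows} rows, {n_headers} header cells")
```

Keep in mind that ```find_all``` searches all descendants by default, so if one table is nested inside another, the outer table's counts will include the inner table's rows.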

## Parse out all the rows

The rows of a table are marked by the ```tr``` (table row) tag, and the cells within each row are marked by the ```td``` (table data) tag; header cells use ```th``` instead. The following code finds all of the rows in the table:

In [None]:
rows = table.find_all('tr')
rows
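
To make the relationship between ```tr```, ```th``` and ```td``` concrete, here is a minimal sketch on a made-up two-row table (not the real page). It pulls the cells out of a single row only; the variable names are hypothetical and chosen so they do not overwrite ```table``` or ```rows``` in this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in table with one header row and one data row.
sample_html = """
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td> Alpha </td><td>1901</td></tr>
</table>
"""

sample_table = BeautifulSoup(sample_html, "html.parser").find("table")
sample_rows = sample_table.find_all("tr")

# Header cells live in <th> tags, data cells in <td> tags.
header_cells = sample_rows[0].find_all("th")
data_cells = sample_rows[1].find_all("td")

# get_text(strip=True) removes the whitespace around each cell's text.
first_row_values = [cell.get_text(strip=True) for cell in data_cells]
print(first_row_values)
```

Note that calling ```find_all('td')``` on the header row of this sample returns an empty list, because that row contains only ```th``` cells.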

## Get the column labels

The first row is the column header row as can be seen by running the following code:

In [None]:
rows[0]

In [None]:
labels = []
for c in rows[0].find_all('th'):
    labels.append(c.get_text())
labels

## Parse Rows

&#9989; <font color=red>**DO THIS:**</font> The next step is to loop through the remaining rows and save the data as a list of lists.

In [None]:
#put your code here

## Convert list of lists to pandas DataFrame

Assuming the above works and your list of lists is stored in a variable named ```data```, we can convert it and the labels to a pandas DataFrame:

In [None]:
import pandas as pd

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=labels)

In [None]:
df
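
For reference, the shape of input that ```pd.DataFrame``` expects can be sketched with made-up values (these are placeholders, not the scraped data). Each inner list is one row, and its length should match the number of labels. Scraped text arrives as strings, so the sketch also shows one possible way to convert a column to integers.

```python
import pandas as pd

# Hypothetical labels and rows standing in for the scraped values.
sample_labels = ["Name", "Year"]
sample_data = [["Alpha", "1901"], ["Beta", "1902"]]

# Each inner list becomes one row; columns= supplies the header names.
sample_df = pd.DataFrame(sample_data, columns=sample_labels)

# Text scraped from HTML is stored as strings; convert where numbers are needed.
sample_df["Year"] = sample_df["Year"].astype(int)
print(sample_df.dtypes)
```

If a row has a different number of cells than there are labels, the constructor raises an error, which is a useful signal that the wrong table or the header row was included in the data.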

-----
### Congratulations, we're done!

### Course Resources:

- [Syllabus](https://docs.google.com/document/d/e/2PACX-1vTW4OzeUNhsuG_zvh06MT4r1tguxLFXGFCiMVN49XJJRYfekb7E6LyfGLP5tyLcHqcUNJjH2Vk-Isd8/pub)
- [Preliminary Schedule](https://docs.google.com/spreadsheets/d/e/2PACX-1vRsQcyH1nlbSD4x7zvHWAbAcLrGWRo_RqeFyt2loQPgt3MxirrI5ADVFW9IoeLGSBSu_Uo6e8BE4IQc/pubhtml?gid=2142090757&single=true)
- [D2L Page](https://d2l.msu.edu/d2l/home/912152)
- [Git Repository](https://gitlab.msu.edu/colbrydi/cmse802-s20)

&#169; Copyright 2020,  Michigan State University Board of Trustees