<center><h1>PyData Indy - Jupyter Web Scraper - Wunderground</h1></center>
![Image of Wunderground Logo](http://cordcutting.com/images/kodi-images/weather-wunderground.png)

<b>INTRO:</b>

One of the most critical elements of data science is simply having access to information, which arrives in a variety of modern file formats.  This exercise highlights a few examples of collecting data through web scraping and working with the results using the modern Python data stack.  Whether you find yourself working with flat files, XML, or JSON, the tools demonstrated here will be a valuable asset to your efforts.  In this example we will primarily use the NumPy, Pandas, and Matplotlib libraries.
<br>
<br>

<center><a href="http://docs.scipy.org/doc/numpy-dev/dev"> NumPy Docs </a></center>
<center><a href="http://pandas.pydata.org/pandas-docs/stable"> Pandas Docs </a></center>
<center><a href="http://matplotlib.org/contents.html#"> Matplotlib Docs </a></center>


<b>ABOUT OUR DATA:</b>
<br><br>
<a href="https://www.wunderground.com"> Wunderground</a> is a popular source for weather data.  This demo uses a few pre-built .py files to scrape historical weather data from the web: (1) urllib.request fetches the HTML pages, and (2) BeautifulSoup parses their contents, which are then written to a local CSV file for analysis and visualization.

To follow along with this example, we will use "%run filename.py" as the convention for active code cells.  Our order of operations is:
<br><br>
<i>(1) %run wunderground_scraper.py (uses urllib.request to dynamically fetch HTML)<br>
(2) %run wunderground_parser.py (uses BeautifulSoup to parse the stored HTML and return data as a local CSV)<br>
(3) Use NumPy, Pandas, and Matplotlib to explore the data we've collected.</i>
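
The fetch step in (1) can be sketched roughly as follows. This is a minimal illustration, not the contents of wunderground_scraper.py: the URL pattern, the `daily_urls` and `fetch_html` helpers, and the User-Agent string are all assumptions made for the example. Only the station code KIND and the March 2015 to March 2016 date range come from this notebook.

```python
from datetime import date, timedelta
from urllib.request import Request, urlopen

# Hypothetical URL pattern (an assumption); check the site's current layout
# and terms of use before requesting anything.
URL_TEMPLATE = ("https://www.wunderground.com/history/airport/"
                "{station}/{y}/{m}/{d}/DailyHistory.html")


def daily_urls(station, start, end):
    """Yield (day, url) pairs for every day in the half-open range [start, end)."""
    day = start
    while day < end:
        yield day, URL_TEMPLATE.format(station=station, y=day.year,
                                       m=day.month, d=day.day)
        day += timedelta(days=1)


def fetch_html(url, timeout=30):
    """Download one page and return its body as text."""
    request = Request(url, headers={"User-Agent": "pydata-indy-demo"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Build the list of pages to request; no network traffic happens here.
pages = list(daily_urls("KIND", date(2015, 3, 1), date(2016, 3, 1)))
print(len(pages))
print(pages[0][1])
```

Calling `fetch_html(url)` for each pair and saving the result to disk would give the parser in step (2) a set of stored HTML files to work from. Pausing between requests is a sensible courtesy to the server.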

In [None]:
pwd

In [None]:
ls

In [None]:
#%run wunderground_scraper.py

In [None]:
#%run wunderground_parser.py

In [None]:
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
weather_data = pd.read_csv('KIND-FULL-YEAR.csv', parse_dates=['date'])

In [None]:
weather_data.shape

In [None]:
weather_data.set_index(['date'])
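
Note that `set_index` returns a new DataFrame rather than modifying `weather_data`, so the cell above only previews the date-indexed view. The later cells still filter on the `date` column, so they rely on this behavior. A small sketch on a made-up two-row frame (the values are illustrative, not scraped data):

```python
import pandas as pd

# Toy frame standing in for the scraped CSV (values are made up).
demo = pd.DataFrame({
    "date": pd.to_datetime(["2015-03-01", "2015-03-02"]),
    "actual_max_temp": [30, 35],
})

# set_index returns a new DataFrame; demo itself is left unchanged.
indexed = demo.set_index("date")

print(demo.columns.tolist())   # 'date' is still a regular column here
print(indexed.index.name)      # the returned frame uses 'date' as its index
```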

In [None]:
# Establish a 'day_order' as a given range for all parsed dates from our scrape.
weather_data_subset = weather_data[weather_data['date'] >= datetime(year=2015, month=3, day=1)]
weather_data_subset = weather_data_subset[weather_data_subset['date'] < datetime(year=2016, month=3, day=1)].copy()
weather_data_subset['day_order'] = range(len(weather_data_subset))

# Create derived data for our 'day_order' axis & obtain the values of individual fields.
day_order = weather_data_subset['day_order']
record_max_temps = weather_data_subset['record_max_temp'].values
record_min_temps = weather_data_subset['record_min_temp'].values
average_max_temps = weather_data_subset['average_max_temp'].values
average_min_temps = weather_data_subset['average_min_temp'].values
actual_max_temps = weather_data_subset['actual_max_temp'].values
actual_min_temps = weather_data_subset['actual_min_temp'].values
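
The two-step date filter above can also be written as a single boolean mask. The sketch below shows the idea on a made-up five-day frame; the `demo`, `start`, and `end` names are illustrative only.

```python
import pandas as pd

# Five consecutive days straddling the start of the range (illustrative data).
demo = pd.DataFrame({"date": pd.date_range("2015-02-27", periods=5, freq="D")})
start = pd.Timestamp(2015, 3, 1)
end = pd.Timestamp(2015, 3, 3)

# Keep rows with start <= date < end, then number them 0, 1, 2, ...
in_range = (demo["date"] >= start) & (demo["date"] < end)
subset = demo.loc[in_range].copy()
subset["day_order"] = range(len(subset))

print(subset)
```

Taking `.copy()` before adding `day_order` avoids assigning into a view of the original frame.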


In [None]:
# Name the primary axes ax1 so the twin axes below can be created from it.
fig, ax1 = plt.subplots(figsize=(15, 7))

# Layer #1 - Red range of bars showing RECORD highs and lows
record_temp_plot = plt.bar(day_order, record_max_temps - record_min_temps, bottom=record_min_temps,
        edgecolor='none', color='#CD5C5C', width=1, label='Record')

# Layer #2 - Blue range of bars showing AVERAGE highs and lows
average_temp_plot = plt.bar(day_order, average_max_temps - average_min_temps, bottom=average_min_temps,
        edgecolor='none', color='#5F9EA0', width=1, label='Average')

# Layer #3 - Grey range of bars showing ACTUAL highs and lows
actual_temp_plot = plt.bar(day_order, actual_max_temps - actual_min_temps, bottom=actual_min_temps,
        edgecolor='black', linewidth=0.5, color='#808080', width=1, label='Actual')

# Formatting the Limits for our X and Y axes.
plt.ylim(-15, 111)
plt.xlim(-5, 370)

# Scale and label our Y axis for temperature (Fahrenheit)
plt.yticks(range(-10, 111, 10), [r'{}$^\circ$'.format(x)
                                 for x in range(-10, 111, 10)], fontsize=10)
plt.ylabel(r'Temperature ($^\circ$F)', fontsize=12)

# Label our X axis at the first day of each month
month_beginning_df = weather_data_subset[weather_data_subset['date'].dt.day == 1]
month_beginning_indeces = list(month_beginning_df['day_order'].values)
month_beginning_names = list(month_beginning_df['date'].dt.strftime("%B").values)
month_beginning_names[0] += '\n\'15'
month_beginning_names[10] += '\n\'16'

# Manually add the last month label.
month_beginning_indeces += [weather_data_subset['day_order'].values[-1]]
month_beginning_names += ['March']
plt.xticks(month_beginning_indeces,
           month_beginning_names,
           fontsize=10)

ax2 = ax1.twiny()
plt.xticks(month_beginning_indeces,
           month_beginning_names,
           fontsize=10)

plt.xlim(-5, 370)
plt.grid(False)

ax3 = ax1.twinx()
plt.yticks(range(-10, 111, 10), [r'{}$^\circ$'.format(x)
                                 for x in range(-10, 111, 10)], fontsize=10)
plt.ylim(-15, 111)
plt.grid(False)

#plt.legend()
#plt.title('Indianapolis, IN\'s weather, March 2015 - March 2016\n\n', fontsize=20)
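
All three layers in the plot above rely on the same pattern: each bar's height is the size of the range (high minus low), and `bottom` lifts the bar so it starts at the low. A stripped-down sketch with made-up temperatures for three days:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up lows and highs for three days, purely for illustration.
lows = np.array([20, 25, 18])
highs = np.array([41, 39, 30])
days = np.arange(len(lows))

fig, ax = plt.subplots(figsize=(4, 3))
# Height is the size of the range; bottom makes each bar start at the low.
bars = ax.bar(days, highs - lows, bottom=lows, width=1, color='#808080')
ax.set_ylabel(r'Temperature ($^\circ$F)')
```

Each bar then spans from its day's low to its day's high, which is why the record, average, and actual layers nest inside one another when drawn in that order.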
