<img src="support_files/images/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Python Bootcamp</h1> 
<h3 align="center">August 20-21, 2022</h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h1>Exercise: Pandas, Matplotlib, Numpy</h1></left>
<p>
**Seattle tracks bike crossings across the Fremont Bridge, one of the major north/south crossings of the Ship Canal, and makes data available online**
</p>
<p>
This exercise uses that data to demonstrate some basic Pandas functionality, including:
<ul style="list-style-type:disc">
  <li>Sorting data</li>
  <li>Working with datetime objects</li>
  <li>Using Pandas built-in plotting methods</li>
  <li>Continued practice with Matplotlib to generate custom plots</li>
</ul>
</p>
</div>


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>We'll need the following libraries</h2></left>

<ul style="list-style-type:disc">
  <li>numpy (import as np)</li>
  <li>pandas (import as pd)</li>
  <li>matplotlib.pyplot (import as plt)</li>
</ul>

<p>
Also remember to enable inline plotting with the %matplotlib inline (or %matplotlib notebook) magic.
</p>

</div>


In [None]:
# Import packages
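
A minimal version of the import cell, using the aliases listed above. The plotting magic is IPython syntax rather than plain Python, so it appears only as a comment here; in Jupyter it goes in a cell of its own.

```python
# Standard aliases used throughout this exercise.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# In Jupyter, also run this IPython magic (not valid in a plain .py file):
# %matplotlib inline
```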

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>Download and open the data, then do some initial formatting</h2></left>

<p>The data cover October 2012 through the end of last month.  

<p>Load the data from the following URL with pd.read_csv() (requires a web connection):  
https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD

</div>

In [None]:
# Read the CSV from the above link
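
A sketch of how pd.read_csv() is called. So that it runs offline, it reads three made-up rows from an in-memory string instead of the URL; the headers 'Direction 1' and 'Direction 2' are placeholders, and the real file's column names and date format may differ.

```python
import io
import pandas as pd

URL = 'https://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD'

# Made-up rows standing in for the real download (placeholder headers).
sample_csv = io.StringIO(
    'Date,Direction 1,Direction 2\n'
    '10/03/2012 12:00:00 AM,9,4\n'
    '10/03/2012 01:00:00 AM,6,4\n'
    '10/03/2012 02:00:00 AM,1,1\n'
)

# pd.read_csv accepts a URL, a local path, or any file-like object.
# With a web connection you would write: df = pd.read_csv(URL)
df = pd.read_csv(sample_csv)
print(df.shape)  # (3, 3)
```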

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Look at the first few rows using the .head() method.

</div>

In [None]:
# Display the head

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Shorten the column names to make them easier to reference

</div>

In [None]:
# Rename the two count columns to 'northbound' and 'southbound'
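
Two common ways to rename columns, shown on a one-row made-up frame. The original headers 'Direction 1' and 'Direction 2' are placeholders; check df.columns (and the dataset description) to see which real column is northbound before mapping names.

```python
import pandas as pd

# Placeholder headers standing in for the long names in the real file.
df = pd.DataFrame({'Date': ['10/03/2012 12:00:00 AM'],
                   'Direction 1': [9],
                   'Direction 2': [4]})

# Option 1: map old names to new ones; columns not listed are left alone.
df = df.rename(columns={'Direction 1': 'northbound', 'Direction 2': 'southbound'})

# Option 2: assign the complete list of names, in column order.
# df.columns = ['Date', 'northbound', 'southbound']
print(list(df.columns))  # ['Date', 'northbound', 'southbound']
```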

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Add a column containing the total crossings for each hour

</div>

In [None]:
df['total'] = df['northbound'] + df['southbound']  # add a total column
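
Column arithmetic in Pandas is vectorized, so adding two columns adds them row by row. This sketch, on made-up counts, also shows the equivalent .sum(axis=1) form.

```python
import pandas as pd

df = pd.DataFrame({'northbound': [9, 6, 1], 'southbound': [4, 4, 1]})

# Element-wise addition of the two columns, one result per row (hour).
df['total'] = df['northbound'] + df['southbound']

# Equivalent here: sum across the selected columns (axis=1 means "per row").
assert (df['total'] == df[['northbound', 'southbound']].sum(axis=1)).all()
print(df['total'].tolist())  # [13, 10, 2]
```

The two forms differ only when a count is missing: + gives NaN for that hour, while .sum(axis=1) skips the NaN and treats it as 0.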

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Take a look at the beginning and end of the dataset. How many entries (rows) are in the table?

</div>

In [None]:
#display the head again

In [None]:
#display the tail

In [None]:
#print the length
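
The three calls for this step, sketched on 48 made-up hourly rows: head() and tail() default to 5 rows, and len() counts rows.

```python
import numpy as np
import pandas as pd

# Made-up hourly data for two days (48 rows).
dates = pd.date_range('2012-10-03', periods=48, freq='h')
df = pd.DataFrame({'Date': dates, 'total': np.arange(48)})

print(df.head())   # first 5 rows
print(df.tail(3))  # last 3 rows
print(len(df))     # 48; df.shape[0] gives the same number
```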


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>Take advantage of Pandas datetime functionality to make filtering easy</h2></left>
<p>Take a look at one of the date entries. What is its data type?

</div>

In [None]:
#print the type of one entry
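
One way to inspect a single entry, assuming the 'Date' column was read as text (made-up values below): .iloc[0] selects the first row by position, and type() reports the Python type of that value.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/03/2012 12:00:00 AM', '10/03/2012 01:00:00 AM']})

first_date = df['Date'].iloc[0]
print(type(first_date))  # <class 'str'>
print(df['Date'].dtype)  # dtype of the whole column
```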



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Convert the column to datetime objects, which Pandas recognizes, so the data can easily be filtered and parsed by date.

</div>

In [None]:
# look up the pd.to_datetime() method

In [None]:
# look at the head again, how have the dates changed?
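
A sketch of pd.to_datetime() on made-up strings. The format string assumes dates look like "10/03/2012 01:00:00 PM"; check a real entry first. Giving an explicit format avoids per-row guessing, which matters on a file with many rows.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['10/03/2012 12:00:00 AM', '10/03/2012 01:00:00 PM']})

# %I is the 12-hour clock and %p the AM/PM marker (assumed format).
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p')

print(df['Date'].dtype)    # a datetime64 dtype
print(df['Date'].iloc[1])  # 2012-10-03 13:00:00
```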

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>Now plot the total column vs. date</h2></left>
<p>Notice how easily Pandas handles the date column: it automatically parses the dates and labels the x-axis sensibly.


</div>

In [None]:
# Use the df.plot() method with x='Date' and y='total'
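
A sketch of DataFrame.plot() on one week of random hourly totals (not real bridge data). The method draws with Matplotlib and returns the Axes, so the plot can be customized further with ordinary Matplotlib calls.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up hourly totals for one week.
dates = pd.date_range('2012-10-03', periods=24 * 7, freq='h')
df = pd.DataFrame({'Date': dates,
                   'total': np.random.default_rng(0).integers(0, 500, len(dates))})

ax = df.plot(x='Date', y='total', figsize=(10, 4), legend=False)
ax.set_ylabel('Crossings per hour')
```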


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>To make parsing by date easier, add some columns that explicitly list year, month, hour, day of week</h2></left>
<p>The handy .dt accessor (added in Pandas 0.15) makes this very easy:  

<p>http://pandas.pydata.org/pandas-docs/version/0.15.0/basics.html#dt-accessor


</div>

In [None]:
# make new columns for year, month, hour, and day of week. Here's how to make the year column:
df['year']=df['Date'].dt.year
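
Extending the year example to the other columns, on two made-up timestamps. The column names 'month', 'hour', and 'dayofweek' are a choice made for this sketch; .dt.dayofweek numbers days with Monday as 0 and Sunday as 6 (.dt.day_name() gives the names instead).

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2012-10-03 08:00', '2013-01-06 17:00'])})

# Each .dt attribute returns one component for the whole column at once.
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['hour'] = df['Date'].dt.hour
df['dayofweek'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6
print(df)
```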


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>What is the most common hourly count?</h2></left>
<p>Make a histogram of hourly counts


</div>

In [None]:
#make a histogram of the values in the total column
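
A sketch of the histogram using Series.hist() on random counts standing in for df['total']. The tallest bar marks the most common range of counts; value_counts().idxmax() returns the single most frequent exact value.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Random stand-in for the hourly 'total' column.
df = pd.DataFrame({'total': np.random.default_rng(1).poisson(60, 1000)})

# bins sets the number of bars.
ax = df['total'].hist(bins=50)
ax.set_xlabel('Crossings per hour')
ax.set_ylabel('Number of hours')

most_common = df['total'].value_counts().idxmax()
print(most_common)
```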


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>Find the busiest month for total crossings</h2></left>
<p>One approach is to use nested for-loops to search over all combinations of unique years and months, checking against the maximum value on each iteration


</div>

In [None]:
#try writing a for-loop to do this. But don't try too hard - there's a one-line way of doing this instead!
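
A sketch of the nested-loop search, on six months of random hourly data. It assumes 'year' and 'month' columns like the ones built with the .dt accessor. Year/month combinations with no data sum to 0, so they never win.

```python
import numpy as np
import pandas as pd

# Random hourly data from October 2012 through March 2013.
dates = pd.date_range('2012-10-01', '2013-03-31 23:00', freq='h')
df = pd.DataFrame({'total': np.random.default_rng(2).integers(0, 400, len(dates))})
df['year'] = dates.year
df['month'] = dates.month

# Sum each (year, month) and keep the largest total seen so far.
best_total, best_year, best_month = -1, None, None
for year in df['year'].unique():
    for month in df['month'].unique():
        month_total = df.loc[(df['year'] == year) & (df['month'] == month), 'total'].sum()
        if month_total > best_total:
            best_total, best_year, best_month = month_total, year, month
print(best_year, best_month, best_total)
```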



<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Another approach is to use the Pandas "groupby" method


</div>

In [None]:
# Instead of a for-loop, you can use the 'groupby' method, grouping by year and month

In [None]:
#print the maximum month from the grouped dataframe
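
The groupby version of the same search, again on random hourly data with assumed 'year' and 'month' columns. Grouping on two keys gives a Series indexed by (year, month) pairs; idxmax() returns the pair with the largest sum.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2012-10-01', '2013-03-31 23:00', freq='h')
df = pd.DataFrame({'total': np.random.default_rng(2).integers(0, 400, len(dates))})
df['year'] = dates.year
df['month'] = dates.month

# One summed total per (year, month) group.
monthly = df.groupby(['year', 'month'])['total'].sum()

# The one-line answer: label of the busiest month, then its total.
print(monthly.idxmax(), monthly.max())
```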


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>Make a bar plot showing crossings for each month</h2></left>
<p>Start with the "groupby" method


</div>

In [None]:
#using the grouped dataframe, make a bar plot with the total crossings for each month
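
A sketch of the monthly bar plot built straight from the grouped Series (random data, assumed 'year' and 'month' columns). With kind='bar', each (year, month) pair becomes one bar and its tick label.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range('2012-10-01', '2013-03-31 23:00', freq='h')
df = pd.DataFrame({'total': np.random.default_rng(2).integers(0, 400, len(dates))})
df['year'] = dates.year
df['month'] = dates.month

monthly = df.groupby(['year', 'month'])['total'].sum()

ax = monthly.plot(kind='bar', figsize=(10, 4))
ax.set_ylabel('Total crossings')
```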


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>To gain a bit more control over the plot, make a temporary dataframe called "monthdf" that contains only the data we're interested in plotting

</div>

In [None]:
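# Build one row per (year, month) holding that month's summed crossings.
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
# and create the dataframe once at the end.
rows = []
for year in df.year.unique():
    for month in df.month.unique():
        rows.append({'month': month,
                     'year': year,
                     'total': df[(df.month == month) & (df.year == year)].total.sum()})
monthdf = pd.DataFrame(rows, columns=['month', 'year', 'total'])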
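
For comparison, a loop-free sketch that builds a similar table with groupby, on random data. One difference: the loop also creates rows with a total of 0 for year/month combinations that have no data, while groupby only returns combinations that actually occur.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2012-10-01', '2013-03-31 23:00', freq='h')
df = pd.DataFrame({'total': np.random.default_rng(2).integers(0, 400, len(dates))})
df['year'] = dates.year
df['month'] = dates.month

# as_index=False keeps 'year' and 'month' as ordinary columns.
monthdf = df.groupby(['year', 'month'], as_index=False)['total'].sum()
print(monthdf.head())
```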


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Now make another version of the plot where months are grouped and color coded by year
</div>

In [None]:
# Make the plot here
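
One way to group and color-code by year, sketched on random data: pivot the long monthdf table so rows are months and columns are years, then DataFrame.plot(kind='bar') draws side-by-side bars per month with one color per year. Months missing from a partial year become NaN and show no bar.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range('2012-10-01', '2014-03-31 23:00', freq='h')
df = pd.DataFrame({'total': np.random.default_rng(3).integers(0, 400, len(dates))})
df['year'] = dates.year
df['month'] = dates.month
monthdf = df.groupby(['year', 'month'], as_index=False)['total'].sum()

# Rows = month (1-12), columns = year, values = summed crossings.
wide = monthdf.pivot(index='month', columns='year', values='total')
ax = wide.plot(kind='bar', figsize=(10, 4))
ax.set_xlabel('Month')
ax.set_ylabel('Total crossings')
```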


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<left><h2>Make a bar plot showing crossings by day of week, separated by year</h2></left>
<p>Again, make a temporary dataframe containing only the data we need for the plot

<p>Make sure to normalize each sum by the number of days of data in that year, since the first year (and usually the last) covers only part of the calendar!
</div>

In [None]:
# Try making another intermediate dataframe that contains the data grouped by day of week
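
A sketch of the intermediate day-of-week table on random hourly data. It sums crossings per (year, weekday) and then divides by the number of distinct calendar days of data in that year, as the instructions above suggest. The names 'dayofweek', 'day', 'daydf', and 'per_day' are illustrative choices.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2012-10-01', '2014-03-31 23:00', freq='h')
df = pd.DataFrame({'Date': dates,
                   'total': np.random.default_rng(4).integers(0, 400, len(dates))})
df['year'] = df['Date'].dt.year
df['dayofweek'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6
df['day'] = df['Date'].dt.normalize()      # midnight of each calendar date

# Summed crossings for each (year, weekday) pair.
daydf = df.groupby(['year', 'dayofweek'], as_index=False)['total'].sum()

# Distinct days of data per year; 2012 and 2014 are partial here.
days_per_year = df.groupby('year')['day'].nunique()
daydf['per_day'] = daydf['total'] / daydf['year'].map(days_per_year)
print(daydf.head())
```

Dividing by the count of that particular weekday in each year instead would give the average crossings on a typical Monday, Tuesday, and so on.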


<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>Make a bar plot where days of week are grouped and color-coded by year, using the temporary dataframe from the previous step.


</div>

In [None]:
# make a similar plot below
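
The same pivot-then-plot pattern used for months, sketched for weekdays on random data. The weekday labels assume the Monday=0 numbering of .dt.dayofweek, and the normalization repeats the per-year day count from the previous step.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range('2012-10-01', '2014-03-31 23:00', freq='h')
df = pd.DataFrame({'Date': dates,
                   'total': np.random.default_rng(4).integers(0, 400, len(dates))})
df['year'] = df['Date'].dt.year
df['dayofweek'] = df['Date'].dt.dayofweek
df['day'] = df['Date'].dt.normalize()

daydf = df.groupby(['year', 'dayofweek'], as_index=False)['total'].sum()
daydf['per_day'] = daydf['total'] / daydf['year'].map(df.groupby('year')['day'].nunique())

# Rows = weekday, columns = year: one group of bars per weekday, one color per year.
wide = daydf.pivot(index='dayofweek', columns='year', values='per_day')
wide.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
ax = wide.plot(kind='bar', figsize=(10, 4), rot=0)
ax.set_ylabel('Crossings per day of data')
```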