![alt](https://raw.githubusercontent.com/callysto/callysto-sample-notebooks/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg)

## A Glimpse Into the Future: Interactive Textbooks

Jupyter notebooks are easy to maintain and keep up to date.

Multiple Jupyter Notebooks can be combined to create Interactive Textbooks.

One example is the [interactive textbook](https://ds8.gitbooks.io/textbook/content/) for UC Berkeley's _The Foundations of Data Science_ course.

# NHL Data

This notebook shows how to work with open data in a Jupyter notebook, and compares that with the traditional approach of completing an open data assignment with standard desktop tools.

The goal of the exercise is to use current National Hockey League (NHL) results to determine whether a team is on pace for making the playoffs. 
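Being "on pace" can be made concrete with a little arithmetic. The following is a minimal sketch that assumes the same approximation the analysis below uses: a team needs roughly 96 points over an 82-game season to make the playoffs, a win is worth 2 points and an overtime or shootout loss is worth 1. The function names and the sample record are hypothetical, chosen only for illustration.

```python
# Assumption: ~96 points over an 82-game season is treated as the playoff cut-off.
PLAYOFF_POINTS = 96
SEASON_GAMES = 82


def points(wins: int, ot_losses: int) -> int:
    """NHL standings points: 2 per win, 1 per overtime/shootout loss."""
    return 2 * wins + ot_losses


def playoff_pace(games_played: int) -> float:
    """Points a team would need after games_played games to finish on ~96."""
    return games_played * PLAYOFF_POINTS / SEASON_GAMES


def on_pace(games_played: int, wins: int, ot_losses: int) -> bool:
    """True if the team's points meet or beat the pace line."""
    return points(wins, ot_losses) >= playoff_pace(games_played)


# Hypothetical record: 12 wins and 3 overtime losses after 20 games.
print(points(12, 3))               # 27
print(round(playoff_pace(20), 2))  # 23.41
print(on_pace(20, 12, 3))          # True
```

Scaling the cut-off linearly by games played is a simplification; it ignores strength of schedule and how the rest of the league is doing.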

## Traditional approach

### Tool 1
Traditionally, students would have had to go to a particular website to access the data:
    http://www.hockey-reference.com/teams/CGY/2018_games.html

<img src="images/cgy_standings.png" width="800px" />

### Tool 2
From there, they would have to manually copy and paste the data into a tool such as Microsoft Excel. 

<img src="images/cgy_excel.png" width="100%"/>

<img src="images/cgy_excel_graph.png" width="80%" />

### Tool 3
Finally, the table and graph are copied and pasted into Microsoft Word to write up the final report.

<img src="images/cgy_word.png" width="80%" />

In total, that means the students would need to use the following tools:
- a web browser
- Microsoft Excel or something like it
- Microsoft Word or something similar

The final product is usually a static snapshot in time. 

## Jupyter notebooks approach

Using Jupyter notebooks, the entire analysis can be done in one tool that requires only a web browser. The end product is an interactive notebook that combines live code with a narrative explaining how the analysis was conducted, so anyone can read it and re-run it.

In [None]:
import urllib.request

import pandas as pd
# The "html5lib" parser used below must also be installed alongside BeautifulSoup
from bs4 import BeautifulSoup

In [None]:
# Query the hockey-reference website for each team's game log.
# hockey-reference labels a season by the year it ends, so "2017" is the 2016-17 season.
html1 = urllib.request.urlopen("https://www.hockey-reference.com/teams/CGY/2017_games.html").read()
html2 = urllib.request.urlopen("https://www.hockey-reference.com/teams/VAN/2017_games.html").read()
soup1 = BeautifulSoup(html1, "html5lib")
soup2 = BeautifulSoup(html2, "html5lib")
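The download cell above repeats the same steps for each team. One way to remove that duplication, and to make it easy to re-run the analysis for another season, is a small helper keyed on the team abbreviation and season. This is a sketch: `build_request` and `fetch_games_html` are hypothetical names, the URL pattern is copied from the cell above, and sending a browser-like `User-Agent` header is an assumption (some sites refuse Python's default one), not something hockey-reference documents.

```python
import urllib.request

# Hypothetical helpers; the URL pattern mirrors the download cell above.
BASE_URL = "https://www.hockey-reference.com/teams/{team}/{season}_games.html"


def build_request(team: str, season: int) -> urllib.request.Request:
    """Build a request for one team's game log, e.g. team='CGY', season=2017."""
    url = BASE_URL.format(team=team.upper(), season=season)
    # Assumption: a descriptive User-Agent is less likely to be rejected.
    headers = {"User-Agent": "Mozilla/5.0 (classroom NHL notebook)"}
    return urllib.request.Request(url, headers=headers)


def fetch_games_html(team: str, season: int) -> bytes:
    """Download the raw HTML for one team and season (needs network access)."""
    with urllib.request.urlopen(build_request(team, season), timeout=30) as response:
        return response.read()


req = build_request("cgy", 2017)
print(req.full_url)  # https://www.hockey-reference.com/teams/CGY/2017_games.html
```

With these helpers, `soup1 = BeautifulSoup(fetch_games_html("CGY", 2017), "html5lib")` would replace the first download, and changing `season` is all that is needed to look at a later year.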

In [None]:
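# The first table on each page holds the regular-season games
table1 = soup1.find_all('table')[0]
table_body1 = table1.find('tbody')
rows1 = table_body1.find_all('tr')  # one row per game, plus repeated header rows
table2 = soup2.find_all('table')[0]
table_body2 = table2.find('tbody')
rows2 = table_body2.find_all('tr')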

In [None]:
# Column names come from the header cells (<th>) in the table's first row
column_headers = [ch.getText() for ch in table1.find_all('tr')[0].find_all('th')]

In [None]:
# Each game becomes a list of cell texts (the first cell is a <th>, the rest are <td>)
team1_data = [[cell.getText() for cell in row.find_all(['th', 'td'])] for row in rows1]
team2_data = [[cell.getText() for cell in row.find_all(['th', 'td'])] for row in rows2]

In [None]:
df1 = pd.DataFrame(team1_data, columns=column_headers)
df2 = pd.DataFrame(team2_data, columns=column_headers)

In [None]:
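# Drop the header rows that the site repeats inside the table body
# (they sit at these positions in an 82-game season)
df1 = df1.drop(df1.index[[20, 41, 62, 83]])
df2 = df2.drop(df2.index[[20, 41, 62, 83]])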
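Dropping rows 20, 41, 62 and 83 by position assumes the repeated header rows always land in exactly those places. A more defensive alternative, sketched below, converts the needed columns to numbers first and then drops any row with a value that could not be parsed, which removes repeated header rows wherever they appear. The `clean_games` helper is hypothetical and the tiny DataFrame is made-up sample data used only to show the behaviour.

```python
import pandas as pd


def clean_games(df: pd.DataFrame, cols=("GP", "W", "OL")) -> pd.DataFrame:
    """Keep only rows where every column in cols parses as a number."""
    numeric = df[list(cols)].apply(pd.to_numeric, errors="coerce")
    # Header rows (e.g. GP == "GP") become NaN and are removed here.
    numeric = numeric.dropna(subset=list(cols))
    return numeric.astype(int).reset_index(drop=True)


# Made-up sample: three games with a repeated header row in the middle.
sample = pd.DataFrame({
    "GP": ["1", "2", "GP", "3"],
    "W": ["1", "1", "W", "2"],
    "OL": ["0", "1", "OL", "1"],
})
clean = clean_games(sample)
print(clean)
```

Because the filtering is based on content rather than position, it keeps working if the site changes how often it repeats the header.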

In [None]:
# Extracted and cleaned data from the hockey-reference website
print(df1)

In [None]:
cols = ['GP','W', 'OL']
df1_clean = df1[cols].apply(pd.to_numeric, errors='coerce')
df2_clean = df2[cols].apply(pd.to_numeric, errors='coerce')

In [None]:
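# Approximate playoff pace: about 96 points over an 82-game season, scaled to games played
df1_clean['Playoff_Pace'] = df1_clean['GP'] * 96 / 82
# Standings points: 2 per win plus 1 per overtime/shootout loss
df1_clean['CGY_Points'] = df1_clean['W'] * 2 + df1_clean['OL']
df2_clean['VAN_Points'] = df2_clean['W'] * 2 + df2_clean['OL']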

In [None]:
# Copy Vancouver's points into the Calgary frame; pandas matches the rows by index
df1_clean['VAN_Points'] = df2_clean['VAN_Points']
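This assignment relies on both frames sharing the same row index (they come from identical table layouts with the same rows dropped), because pandas aligns the column by index, not by games played. A more explicit alternative is to join the two teams on the `GP` column with `pandas.merge`. The frames below are made-up sample data, and the 96-point cut-off is the same approximation used above.

```python
import pandas as pd

# Made-up sample data: points after each game for two teams.
cgy = pd.DataFrame({"GP": [1, 2, 3], "CGY_Points": [2, 2, 4]})
van = pd.DataFrame({"GP": [1, 2, 3], "VAN_Points": [0, 2, 3]})

# Join on games played so each row compares both teams after the same number of games.
# how="inner" keeps only GP values present in both frames.
combined = pd.merge(cgy, van, on="GP", how="inner")
combined["Playoff_Pace"] = combined["GP"] * 96 / 82  # assumed ~96-point cut-off
print(combined)
```

Joining on `GP` makes the intent visible in the code and does not depend on the two frames having been cleaned in exactly the same way.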

In [None]:
# Data analysis of my two favorite hockey teams
df_combined = df1_clean.drop(['W', 'OL'], axis=1)
print(df_combined)

In [None]:
import matplotlib.pyplot as plt

In [None]:
# Calgary Flames points pace
plt.plot('GP', 'Playoff_Pace', data=df_combined, marker='', color='black', linewidth=4, label="Playoff pace");
plt.plot('GP', 'CGY_Points', data=df_combined, marker='', color='red', linewidth=4, linestyle='dashed', label="CGY");
plt.xlabel('GP');
plt.ylabel('Points');
plt.legend();

In [None]:
# Calgary Flames and Vancouver Canucks points pace
plt.plot('GP', 'Playoff_Pace', data=df_combined, marker='', color='black', linewidth=4, label="Playoff pace");
plt.plot('GP', 'CGY_Points', data=df_combined, marker='', color='red', linewidth=4, linestyle='dashed', label="CGY");
plt.plot('GP', 'VAN_Points', data=df_combined, marker='', color='blue', linewidth=4, linestyle='dashed', label="VAN");
plt.xlabel('GP');
plt.ylabel('Points');
plt.legend();
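The plots answer the question by eye. To check it numerically, one option is to compare each team's points with the pace line game by game and list the games where the team was at or above it. A minimal sketch, assuming a frame shaped like `df_combined` (a `GP` column, a `Playoff_Pace` column and one points column per team); `games_on_pace` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd


def games_on_pace(df: pd.DataFrame, points_col: str) -> pd.Series:
    """Return the GP values at which the team's points met or beat the pace line."""
    # Comparisons involving NaN are False, so missing rows are never counted as on pace.
    on_pace = df[points_col] >= df["Playoff_Pace"]
    return df.loc[on_pace, "GP"]


# Made-up sample data for four games.
sample = pd.DataFrame({
    "GP": [1, 2, 3, 4],
    "CGY_Points": [2, 2, 2, 6],
    "VAN_Points": [0, 2, 4, 4],
})
sample["Playoff_Pace"] = sample["GP"] * 96 / 82  # assumed ~96-point cut-off

print(games_on_pace(sample, "CGY_Points").tolist())  # [1, 4]
print(games_on_pace(sample, "VAN_Points").tolist())  # [3]
```

Applied to the real data, for example `games_on_pace(df_combined, "VAN_Points")`, a non-empty result would mean the team was on pace at some point in the season.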

### In conclusion, the Calgary Flames and Vancouver Canucks were on pace (at some point) to make the playoffs last year 😜 

I can save this analysis as a snapshot in time and I can also re-run this analysis next year in the _same_ Jupyter notebook to see how the results have changed.

![](https://raw.githubusercontent.com/callysto/callysto-sample-notebooks/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg)