# Using Open Data with Python

(This content was adapted from [intro.syzygy.ca](http://intro.syzygy.ca/python-for-computing/) and content developed by Dr. Michael Lamoureux.)

This chapter focuses on how to use Python within a Jupyter notebook. Doing simple calculations in Python is very straightforward. However, once you try to do something complex, there are a few tricks to learn: in particular, how to get plots to appear in the notebook, how to do animations, and a few other niceties.

# A simple demo using financial data in Python
This notebook is based on course notes from Lamoureux’s course Math 651 at the University of Calgary, Winter 2016.

This was an exercise to try out some resources in Python. Specifically, we want to download stock-price data from the web, display it in a pandas DataFrame, and then do some basic data analysis on it.

We take advantage of the fact that there is a lot of financial data freely accessible on the web, and lots of people post information about how to use it.

## Plotting in a Python notebook

You need to do three things before you can plot:

- tell Jupyter that you want plots to appear inline

- load NumPy (numerical Python) so you can work with numerical arrays

- load matplotlib's pyplot module to do your plotting.
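The `%pylab inline` magic used in the next cell does all three at once, but recent IPython versions discourage it in favour of explicit imports. A minimal sketch of the explicit equivalent, assuming NumPy and matplotlib are installed. In a notebook you would also run `%matplotlib inline`; that magic is left out here (and the non-interactive `Agg` backend selected instead) so the sketch also runs as a plain script:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# A small numerical array to plot
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Explicit figure and axes objects instead of pylab's global state
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("sin(x)")
```

Working with `fig` and `ax` explicitly avoids relying on whichever plot pylab considers "current".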

In [None]:
# Get some basic tools
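%pylab inline
# %pylab imports numpy and matplotlib.pyplot into the notebook namespace,
# which is why later cells can call subplot(), plot(), title() and pylab.* directly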
from pandas import Series, DataFrame
import pandas as pd
import pandas_datareader.data as web
import datetime



## Accessing financial data

For free historical data on commodities such as oil, and on stock prices, you can try http://www.databank.rbs.com, which can download data directly into spreadsheets for you. In this notebook, we instead use the `pandas_datareader` package to pull stock prices from Google Finance directly into the notebook.

In [None]:
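# Download daily price data for each ticker from 2001-01-01 to today.
# Note: the 'google' data source has since been deprecated in pandas_datareader,
# so on a recent install you may need to substitute another supported source.
end = datetime.date.today()
G = web.DataReader("GOOG", 'google', '2001-01-01', end)
M = web.DataReader("MSFT", 'google', '2001-01-01', end)
A = web.DataReader("AAPL", 'google', '2001-01-01', end)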


The commands above downloaded three datasets into the Jupyter notebook - stock data from Google, Microsoft, and Apple. You can look at the data by simply typing G, M, or A and running the cell (not done here).
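Each download is a pandas DataFrame indexed by date. The plotting code below selects the closing price with `iloc[:, 3]`, that is, by column position. A small offline sketch, using made-up prices and assuming the columns come back in the order Open, High, Low, Close, Volume, shows that positional selection and selecting the `'Close'` column by name return the same Series:

```python
import pandas as pd

# Made-up prices for illustration only; the column order is an assumption
# about the downloaded data (Open, High, Low, Close, Volume).
dates = pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"])
df = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 99.5],
        "High": [102.0, 101.5, 100.0],
        "Low": [99.0, 98.5, 97.0],
        "Close": [101.0, 99.0, 98.0],
        "Volume": [1000, 1200, 900],
    },
    index=dates,
)

by_position = df.iloc[:, 3]  # all rows, fourth column (zero-based position 3)
by_name = df["Close"]        # the same column, selected by label
print(df)
```

Selecting by name is the more robust choice if a data source orders its columns differently.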

Often a good way to get an overview of the data is to plot it and take a look. Here we plot the closing prices of all three stocks as stacked panels in one figure.

In [None]:
subplot(3,1,1)
plot(M.iloc[:,3])  # iloc selects by position: all rows, column 3 (zero-based), i.e. the closing price
title('Microsoft')
pylab.tick_params(labelbottom=False)  # hide the x tick labels on the upper panels
pylab.xlabel('')
subplot(3,1,2)
plot(G.iloc[:,3])
title('Google')
pylab.tick_params(labelbottom=False)
pylab.xlabel('')
subplot(3,1,3)
plot(A.iloc[:,3])
title('Apple')
pylab.tight_layout()
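The same stacked layout can be sketched with matplotlib's object-oriented API. This version uses made-up random-walk series in place of the downloaded prices. With `sharex=True`, `plt.subplots` shows x tick labels only on the bottom panel, so the `tick_params` calls are not needed:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up random-walk "prices" standing in for the downloaded closing prices
rng = np.random.default_rng(0)
dates = pd.bdate_range("2016-01-01", periods=250)
prices = {
    name: pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)
    for name in ["Microsoft", "Google", "Apple"]
}

# Three stacked panels sharing one date axis
fig, axes = plt.subplots(3, 1, sharex=True)
for ax, (name, series) in zip(axes, prices.items()):
    ax.plot(series.index, series.values)
    ax.set_title(name)
fig.tight_layout()
```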

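Let's calculate and plot the daily changes in the stock prices. `pct_change` expresses each change as a fraction of the previous day's closing price (multiply by 100 to get a percentage).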
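`pct_change` computes (p_t - p_(t-1)) / p_(t-1) for each row, and the first row is `NaN` because it has no predecessor. A small check with made-up prices, comparing it with the same formula written out using `shift`:

```python
import pandas as pd

# Made-up closing prices for illustration
close = pd.Series([100.0, 110.0, 99.0])

rets = close.pct_change()            # fractional change from the previous row
manual = close / close.shift(1) - 1  # the same formula written out
print(rets.tolist())                 # first entry is NaN; then roughly 0.10 and -0.10
```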

In [None]:
# Calculate the daily fractional change in each closing price
A_rets = A.iloc[:,3].pct_change()
M_rets = M.iloc[:,3].pct_change()
G_rets = G.iloc[:,3].pct_change()

# Plot the daily changes, one stock per panel
subplot(3,1,1)
plot(M_rets)
title('Microsoft')
pylab.xlim([datetime.date(2001, 1, 1), end])
subplot(3,1,2)
plot(G_rets)
title('Google')
pylab.xlim([datetime.date(2001, 1, 1), end])
subplot(3,1,3)
plot(A_rets)
title('Apple')
pylab.xlim([datetime.date(2001, 1, 1), end])
pylab.tight_layout()

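Next, let's calculate and visualize the rolling correlation between the daily changes of a couple of the stocks. Each point on the plot is the correlation over the preceding 250 trading days, roughly one year.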
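A rolling correlation is the ordinary Pearson correlation recomputed over a sliding window. The sketch below uses made-up random returns and a short window of 20 (instead of 250) and checks that the last value from `Series.rolling(window).corr(other)` matches `numpy.corrcoef` on the last 20 observations. The first `window - 1` values are `NaN` because the window is not yet full:

```python
import numpy as np
import pandas as pd

# Made-up daily returns for two hypothetical stocks
rng = np.random.default_rng(1)
a = pd.Series(rng.normal(0, 0.01, 100))
b = 0.5 * a + pd.Series(rng.normal(0, 0.01, 100))  # partly correlated with a

window = 20
rolling = a.rolling(window).corr(b)

# Pearson correlation of the final window, computed directly
last = np.corrcoef(a.iloc[-window:], b.iloc[-window:])[0, 1]
print(rolling.iloc[-1], last)
```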

In [None]:
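# First, Apple and Microsoft: correlation of daily changes over a 250-day rolling window
# (pd.rolling_corr has been removed from pandas; Series.rolling(...).corr(...) replaces it)
A_rets.rolling(250).corr(M_rets).plot()
title('Rolling correlation: Apple and Microsoft daily price changes')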

In [None]:
M_rets.rolling(250).corr(G_rets).plot()
title('Rolling correlation: Google and Microsoft daily price changes')

## Getting fancy

Now, we can use some more sophisticated statistical tools, like least squares regression. 
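For a single predictor, the least squares slope has a closed form, beta = cov(x, y) / var(x). A minimal sketch with made-up data checks that closed form against `numpy.linalg.lstsq` applied to a design matrix with an intercept column, the same kind of model that a rolling regression fits inside each window:

```python
import numpy as np

# Made-up data: y depends linearly on x plus noise
rng = np.random.default_rng(2)
x = rng.normal(0, 1, 500)
y = 0.8 * x + 0.1 + rng.normal(0, 0.5, 500)

# Closed-form slope for simple linear regression (both use ddof=1)
beta_closed = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# The same fit via a general least squares solve with an intercept column
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, beta_lstsq = coef
```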

However, I had to do some work to get Python to recognize these tools. I didn't have to work too hard, though: I just followed the error messages. Having to install additional packages is quite common in a Jupyter notebook.

If a package you need is not installed globally on the JupyterHub for all users, you can run the following in a terminal to install it in your own Jupyter container:

    pip install --user -U <package-name>

(The terminal is available from your home screen under New --> Terminal.)

Alternatively, you can use the `%%bash` cell magic to run the same command from within the notebook:

    %%bash

    pip install --user -U <package-name>

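The key is the `--user` flag, which installs the package under the `.local` directory in your home directory rather than into the system-wide Python installation. Without this flag, you would likely encounter permission errors when trying to install. (The `-U` flag, short for `--upgrade`, upgrades the package if an older version is already present.)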
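Before installing, you can check from Python whether a package is already importable, and where `pip install --user` would put it. A small sketch using only the standard library; `is_installed` is a hypothetical helper name, and `pandas_datareader` is just an example package:

```python
import importlib.util
import site


def is_installed(package_name):
    """Return True if `package_name` can be imported in this environment (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None


# Directory where `pip install --user` places packages for the current user
# (on Linux this is typically under ~/.local)
user_site = site.getusersitepackages()
print(user_site)
print(is_installed("pandas_datareader"))
```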

In [None]:
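# We may also try a rolling least squares regression of Microsoft's daily changes on Google's.
# pd.ols has been removed from pandas; statsmodels' RollingOLS is a replacement.
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

data = pd.concat({'M': M_rets, 'G': G_rets}, axis=1).dropna()  # align dates, drop missing rows
model = RollingOLS(data['M'], sm.add_constant(data[['G']]), window=256).fit()

In [None]:
# Intercept ('const') and slope ('G') for each 256-day window
model.params

In [None]:
# Plotting the rolling least squares slope
model.params['G'].plot()
title('Rolling least squares slope: Microsoft on Google daily price changes')

The correlation plot and least squares plot look quite similar. Let's take a closer look. You can easily plot them one above the other by using the subplot function and the same plotting lines you used above.

In [None]:
subplot(2,1,1)
M_rets.rolling(250).corr(G_rets).plot()
title('Rolling correlation')
pylab.tick_params(labelbottom=False)
pylab.xlabel('')
pylab.xlim([datetime.date(2001, 1, 1), end])
subplot(2,1,2)
model.params['G'].plot()
pylab.xlim([datetime.date(2001, 1, 1), end])
title('Rolling least squares slope')
pylab.tight_layout()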
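The two curves look alike for a reason: in a simple regression of y on x, the slope equals the correlation scaled by the ratio of standard deviations, beta = corr(x, y) * std(y) / std(x). So the shapes roughly track each other whenever that volatility ratio changes slowly. A sketch with made-up returns checks the identity window by window using pandas rolling statistics (the window of 60 is arbitrary):

```python
import numpy as np
import pandas as pd

# Made-up daily returns standing in for Google (x) and Microsoft (y)
rng = np.random.default_rng(3)
x = pd.Series(rng.normal(0, 0.02, 300))
y = 0.6 * x + pd.Series(rng.normal(0, 0.01, 300))

window = 60
roll_corr = x.rolling(window).corr(y)
# Rolling least squares slope from its closed form cov(x, y) / var(x)
roll_beta = x.rolling(window).cov(y) / x.rolling(window).var()
# Identity: beta = corr * std(y) / std(x)
via_corr = roll_corr * y.rolling(window).std() / x.rolling(window).std()
```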

## Wrap-up

This was just a quick introduction to using one type of open data in Jupyter with Python. There are countless other analyses you could perform on this data or on other datasets. For example, are there other stocks you would like to analyze and compare with the ones we looked at here? Or what entirely different datasets would interest you?

Check out some of the following data sources: 

- [Government of Canada Open Data Portal](http://open.canada.ca/en/working-datab)
- [Government of Alberta Open Data Portal](https://open.alberta.ca/interact/for-developers)
- [World Bank Data](https://blogs.worldbank.org/opendata/accessing-world-bank-data-apis-python-r-ruby-stata)