<h1 align="center"> Using Simple Line Charts to Visualize Change </h1>

### Contents  
   
#### Introduction
[Introduction](#intro)  

#### Obtain Population Data
[Import Basic Libraries](#libs)  
[Collect Population Data from Quandl](#CPI)  
   
#### Visualize Change Methods
[Plot Actual Values](#plot)  
[Subplot Actual Values](#subplot)  
[Use Dual Axes](#dualaxes)  
[Using Stats Models for Ordinary Least Squares Regression](#stats)  
[Using Sci-Kit Learn for Ordinary Least Squares Regression](#scikit)  
   
#### Conclusion
[Conclusion](#conclusion)  


<a id='libs'></a>
### Import Basic Libraries

In [2]:
%matplotlib inline
import pandas as pd
import Quandl as qd
import warnings
warnings.filterwarnings('ignore')

# Quandl auth token (replace the placeholder with your own)
auth = 'YOUR_QUANDL_AUTH_TOKEN'

In [3]:
cd ..

C:\Users\Nick\Documents\Data_Analytics


In [4]:
cd blog

C:\Users\Nick\Documents\Data_Analytics\blog


#### What is Quandl?

Quandl is an online data warehouse that hosts millions of public datasets. Its API pulls data directly into a pandas DataFrame and automatically sets the date as the index. For more information on using Quandl with Python, visit: https://www.quandl.com/help/python

Quandl houses the World Bank's public data. The north_america_codes.json file lists the Quandl dataset code for the total population series of each country in North America, including Central America and the Caribbean.


In [5]:
df_codes = pd.read_json('north_america_codes.json')

In [6]:
df_codes.head()

|    | code                      | country               |
|----|---------------------------|-----------------------|
| 0  | WORLDBANK/USA_SP_POP_TOTL | USA                   |
| 1  | WORLDBANK/CAN_SP_POP_TOTL | Canada                |
| 10 | WORLDBANK/HTI_SP_POP_TOTL | Haiti                 |
| 11 | WORLDBANK/JAM_SP_POP_TOTL | Jamaica               |
| 12 | WORLDBANK/KNA_SP_POP_TOTL | Saint Kitts and Nevis |
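
Each code in the table has the shape `WORLDBANK/<three-letter country>_SP_POP_TOTL`, so the country abbreviation can be recovered from the code string itself. The sketch below shows one way to do that by splitting on the separators; `country_abbrev` is a hypothetical helper introduced here for illustration, not part of Quandl or of the original notebook.

```python
# Dataset codes copied from the df_codes.head() output above.
codes = [
    "WORLDBANK/USA_SP_POP_TOTL",
    "WORLDBANK/CAN_SP_POP_TOTL",
    "WORLDBANK/HTI_SP_POP_TOTL",
]


def country_abbrev(code):
    """Return the lowercase three-letter country part of a code (hypothetical helper)."""
    dataset = code.split("/", 1)[1]          # drop the 'WORLDBANK/' prefix
    return dataset.split("_", 1)[0].lower()  # keep the text before the first '_'


abbrevs = [country_abbrev(c) for c in codes]
print(abbrevs)
```

For these codes, splitting gives the same three letters as slicing characters 10 through 12 of the string, but it does not depend on the prefix having a fixed length.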


Using the Quandl API, I loop through each country to pull population data. As each country's data is pulled, it is concatenated into a single Pandas DataFrame (df).

In [16]:
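df = pd.DataFrame()  # start from an empty DataFrame instance

for x in df_codes.code:
    # Pull one country's series; the population values arrive in a 'Value' column
    df_temp = qd.get(x, authtoken=auth)
    # Name the column after the country, e.g. 'WORLDBANK/USA_SP_POP_TOTL' -> 'USA'
    df_temp.rename(columns={'Value': x[10:13]}, inplace=True)

    if df.empty:
        df = df_temp
    else:
        # Align on the date index and add the new country as another column
        df = pd.concat([df, df_temp], axis=1)

df.columns = [x.lower() for x in df.columns]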
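
The rename-and-concatenate pattern can be tried offline without a Quandl account. This is a minimal sketch, assuming each download is a date-indexed frame with a single `Value` column; the two small frames are stand-ins built from the 2013 and 2014 figures in this notebook's own output, and collecting the frames in a list is an alternative to the empty-check used in the loop, not the notebook's method.

```python
import pandas as pd

# Stand-ins for two downloaded series: a date index and one 'Value' column.
dates = pd.to_datetime(["2013-12-31", "2014-12-31"])
usa = pd.DataFrame({"Value": [316497531, 318857056]}, index=dates)
can = pd.DataFrame({"Value": [35158304, 35540419]}, index=dates)

frames = []
for name, frame in [("usa", usa), ("can", can)]:
    # Give each series a distinct column name before combining
    frames.append(frame.rename(columns={"Value": name}))

# axis=1 places the series side by side, aligned on the shared date index
combined = pd.concat(frames, axis=1)
print(combined)
```

Concatenating once at the end avoids re-copying the growing DataFrame on every iteration, which matters little for a few dozen countries but keeps the loop body simpler.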

I then calculate the total for North America. For the purpose of this analysis, we are going to compare USA, Mexico, and Canada in addition to the North American total. The DataFrame is then limited to just these four columns.

In [17]:
df.insert(0,'north america',df.sum(axis=1))
df = df[['north america', 'usa', 'mex', 'can']]
df.tail(5)

| Date       | north america | usa       | mex       | can      |
|------------|---------------|-----------|-----------|----------|
| 2010-12-31 | 540355247     | 309347057 | 117886404 | 34005274 |
| 2011-12-31 | 545608318     | 311721632 | 119361233 | 34342780 |
| 2012-12-31 | 550985148     | 314112078 | 120847477 | 34754312 |
| 2013-12-31 | 556361769     | 316497531 | 122332399 | 35158304 |
| 2014-12-31 | 561674093     | 318857056 | 123799215 | 35540419 |


### Visualize That Data

In [18]:
import plotly.offline as py
import plotly.graph_objs as go  # provides go.Layout, used below
import cufflinks as cf

py.init_notebook_mode()
cf.go_offline()

layout = go.Layout(
    xaxis=dict(
        autotick=False,
        ticks='inside',
        tick0=0,
        dtick=0.25,
        ticklen=8,
        tickwidth=4,
        tickcolor='#000'
    ),
    yaxis=dict(
        autotick=False,
        ticks='inside',
        tick0=0,
        dtick=0.25,
        ticklen=8,
        tickwidth=4,
        tickcolor='#000'
    ),
    width = 2.5
)

In [19]:
colors = ['orange', 'blue', 'green', 'red']
dims = (800,500)
width = 2.5

In [20]:
title = """North America Population"""
fig = df.iplot(theme='white',dimensions=dims,colors=colors,title=title,width=width, asFigure=True )
py.iplot(fig)

In [11]:
title = """North America Population"""
fig = df.iplot(subplots = True,theme='white',dimensions=dims,colors=colors,title=title,width=width, asFigure=True )
py.iplot(fig)

In [12]:
title = 'North America Population'
df.iplot(theme='white', dimensions=dims, colors=colors, title=title,
         secondary_y=['mex', 'can'], legend=True, width=width)

In [13]:
df_diff = df.diff()
title = """Annual Change in North American Population"""
df_diff.iplot(theme='white',dimensions=dims,colors=colors,title=title,width=width)

In [14]:
df_pct_change = df.pct_change() * 100
title = """Annual Percent Change in North American Population"""
df_pct_change.iplot(theme='white',dimensions=dims,colors=colors,title=title,width=width)
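
To make the two transformations concrete, here is a small worked sketch using the 2012 to 2014 values for the USA and Canada from the table above. It assumes nothing beyond pandas; `sample`, `annual_change`, `annual_pct`, and `manual_pct` are illustrative names.

```python
import pandas as pd

dates = pd.to_datetime(["2012-12-31", "2013-12-31", "2014-12-31"])
sample = pd.DataFrame(
    {
        "usa": [314112078, 316497531, 318857056],
        "can": [34754312, 35158304, 35540419],
    },
    index=dates,
)

# diff(): each row minus the previous row (the first row has no predecessor)
annual_change = sample.diff()

# pct_change(): the same difference divided by the previous row, here in percent
annual_pct = sample.pct_change() * 100

# Spelling out the percent change by hand gives the same numbers
manual_pct = sample.diff() / sample.shift(1) * 100

print(annual_change)
print(annual_pct.round(3))
```

The first row of both results is `NaN` because 2012 has no prior year in this sample, which is why the charts of change start one year later than the chart of actual values.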

In [15]:
x = df[df.index == df.index.min()].squeeze()
df_1960 = 100 + ((df - x) / x) * 100
x

north_america    268076376
usa              180671000
mex               38676974
can               17909009
Name: 1960-12-31 00:00:00, dtype: float64

In [16]:
df_1960.head()

| Date       | north_america | usa        | mex        | can        |
|------------|---------------|------------|------------|------------|
| 1960-12-31 | 100.0         | 100.0      | 100.0      | 100.0      |
| 1961-12-31 | 102.029384    | 101.671547 | 103.263691 | 102.021279 |
| 1962-12-31 | 104.012274    | 103.247339 | 106.612141 | 103.936516 |
| 1963-12-31 | 105.966402    | 104.743982 | 110.050073 | 105.89084  |
| 1964-12-31 | 107.920963    | 106.209076 | 113.585406 | 107.906585 |


In [17]:
title = """North American Population Index (1960=100)"""
df_1960.iplot(theme='white',dimensions=dims,colors=colors,title=title,width=width)
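
The index calculation rebases every column so that its 1960 value equals 100. The sketch below, using only the 1960 and 2014 figures printed in this notebook, checks that the formula `100 + ((df - x) / x) * 100` is algebraically the same as `df / x * 100`; `sample`, `base`, `index_a`, and `index_b` are illustrative names.

```python
import pandas as pd

dates = pd.to_datetime(["1960-12-31", "2014-12-31"])
sample = pd.DataFrame(
    {
        "usa": [180671000, 318857056],
        "mex": [38676974, 123799215],
        "can": [17909009, 35540419],
    },
    index=dates,
)

base = sample.iloc[0]  # the 1960 row, one value per country

# Form used in the notebook: 100 plus the percent change since the base year
index_a = 100 + ((sample - base) / base) * 100

# Equivalent shorter form: each value as a percentage of its base-year value
index_b = sample / base * 100

print(index_a.round(1))
```

Because each country is divided by its own starting value, the index puts countries of very different sizes on one scale, so relative growth can be compared without dual axes.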

<a id='conclusion'></a>
### Conclusion

This notebook plotted the same population data several ways: actual values, subplots, dual axes, annual change, annual percent change, and an index with 1960 = 100. I hope you found it a useful introduction to Python for economic data analysis!