

The direct Colab link to this notebook is: https://colab.research.google.com/github/D3Mlab/ppandas/blob/experiments/use_case_examples/2014_Toronto_Mayoral_Election_CoLab.ipynb

# Package Imports

In [0]:
!pip install -i https://test.pypi.org/simple/ PPandas==0.0.1.6.9

Looking in indexes: https://test.pypi.org/simple/
Collecting PPandas==0.0.1.6.9
  Downloading https://test-files.pythonhosted.org/packages/38/4d/39a27346c88a4c7c76f12bf346f839b9037f86b0009ae516acd9322c562a/PPandas-0.0.1.6.9-py3-none-any.whl
Installing collected packages: PPandas
Successfully installed PPandas-0.0.1.6.9


In [0]:
!pip install pgmpy==0.1.9
!pip install networkx==2.4
!pip install matplotlib
!pip install python-intervals
!pip install geopandas
!pip install geovoronoi

Collecting pgmpy==0.1.9
  Downloading https://files.pythonhosted.org/packages/5a/b1/18dfdfcb10dcce71fd39f8c6801407e9aebd953939682558a5317e4a021c/pgmpy-0.1.9-py3-none-any.whl (331kB)
Installing collected packages: pgmpy
Successfully installed pgmpy-0.1.9
Collecting python-intervals
  Downloading https://files.pythonhosted.org/packages/4e/51/b29570d4a820610be14d232aec77e6f0c66bca3d400f4903e98cc00012cb/python_intervals-1.10.0.post1-py2.py3-none-any.whl
Installing collected packages: python-intervals
Successfully installed python-intervals-1.10.0.post1
Collecting geopandas
  Downloading https://files.pythonhosted.org/packages/83/c5/3cf9cdc39a6f2552922f79915f36b45a95b71fd343cfc51170a5b6ddb6e8/geopandas-0.7.0-py2.py3-none-any.whl (928kB)
Collecting pyproj>=2.2.0
  Downloading https://files.pythonhosted.org/packages/e5/c3/071e080230ac4b6c64f1a2e2f9161c973

In [0]:
import pandas as pd
import numpy as np
from ppandas import PDataFrame

# Data Processing

### Toronto Election Study (TES)


Create a dataframe using the TES dataset columns of Age, Ward, and mayoral candidate preference. The TES dataset can be found [here](http://www.torontoelectionstudy.com/data).

In [0]:
# AGE = age, CPS5 = ward, CPS9 = mayoral candidate preference
TES_df = pd.read_stata('Toronto+Election+Study.dta')[['AGE','CPS5','CPS9']]
TES_df.head()

|   | AGE | CPS5 | CPS9 |
|---|-----|------|------|
| 0 | 68.0 | Ward 23 Willowdale (Current Councillor: John F... | John Tory |
| 1 | 65.0 | Ward 13 Parkdale-High Park (Current Councillor... | John Tory |
| 2 | 65.0 | Ward 35 Scarborough Southwest (Current Council... | Don't know or haven't decided |
| 3 | 68.0 | Ward 13 Parkdale-High Park (Current Councillor... | John Tory |
| 4 | 49.0 | Ward 9 York Centre (Current Councillor: Maria ... | Other |
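Selecting the three columns after loading reads the whole file first. As a hedged alternative, `pd.read_stata` accepts a `columns=` argument that loads only the named variables. The sketch below writes a small invented stand-in file (`demo.dta`, with made-up values and a hypothetical `EXTRA` column) so that it runs without the TES download.

```python
import os
import tempfile

import pandas as pd

# Invented stand-in data so the snippet runs without the TES file.
demo = pd.DataFrame({'AGE': [68.0, 49.0], 'WARD': [23, 9], 'EXTRA': [1, 2]})
path = os.path.join(tempfile.mkdtemp(), 'demo.dta')
demo.to_stata(path, write_index=False)

# columns= restricts which variables are read from the Stata file.
subset = pd.read_stata(path, columns=['AGE', 'WARD'])
print(subset)
```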


The TES dataset contains 3,000 respondents.

In [0]:
TES_df.describe()

|       | AGE |
|-------|-----|
| count | 3000.0 |
| mean  | 50.006 |
| std   | 15.4017 |
| min   | 18.0 |
| 25%   | 37.0 |
| 50%   | 50.0 |
| 75%   | 62.0 |
| max   | 114.0 |


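Convert the Age attribute into right-closed bins: (17,23], (23,28], ..., (63,114]. Then reduce the ward label to its number, and drop respondents with no ward number or no stated candidate preference.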


In [0]:
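# Age: bin into right-closed intervals (pd.cut default)
age_values = [17,23,28,33,38,43,48,53,58,63,114]
TES_df['AGE'] = pd.cut(TES_df['AGE'],age_values)
# astype returns a new object, so assign it back to store the bins as strings
TES_df['AGE'] = TES_df['AGE'].astype(str)

# Ward: keep only the ward number and drop respondents without one
# regex=True is required in recent pandas, where str.replace is literal by default
TES_df['CPS5'] = TES_df['CPS5'].str.replace(r"\D", '', regex=True)
TES_df.rename({'CPS5':'WARD44'},axis=1, inplace=True)
# .copy() avoids SettingWithCopyWarning on the assignments below
TES_df = TES_df[TES_df['WARD44'] !=''].copy()

# Candidate vote: who respondents were leaning towards before the election
mapDict = {'Doug Ford':'Doug Ford', 'Olivia Chow':'Olivia Chow', 'John Tory':'John Tory', 'Other':'Other', "Don't know or haven't decided": 'Unknown'}
TES_df['CPS9'] = TES_df['CPS9'].map(mapDict)
TES_df.rename({'CPS9':'VOTE'},axis=1, inplace=True)
TES_df = TES_df[TES_df['VOTE'] !='Unknown']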
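
To make the effect of the three cleaning steps concrete, here is a minimal sketch on a made-up frame. The rows and the name `toy` are invented for illustration and are not TES data; only the bin edges are taken from the cell above.

```python
import pandas as pd

# Invented rows for illustration only; not taken from the TES dataset.
toy = pd.DataFrame({
    'AGE': [18.0, 23.0, 24.0, 70.0],
    'CPS5': ['Ward 23 Willowdale', 'Ward 9 York Centre',
             "Don't know", 'Ward 13 Parkdale-High Park'],
    'CPS9': ['John Tory', 'Other', 'Olivia Chow',
             "Don't know or haven't decided"],
})

age_values = [17, 23, 28, 33, 38, 43, 48, 53, 58, 63, 114]
# pd.cut builds right-closed bins by default: 23 lands in (17, 23], 24 in (23, 28].
toy['AGE'] = pd.cut(toy['AGE'], age_values).astype(str)

# Keep only the digits of the ward label; answers without a number become ''.
toy['WARD44'] = toy['CPS5'].str.replace(r'\D', '', regex=True)

# Relabel undecided respondents so they can be filtered out.
toy['VOTE'] = toy['CPS9'].replace({"Don't know or haven't decided": 'Unknown'})

# Drop rows with no ward number or no stated preference.
clean = toy[(toy['WARD44'] != '') & (toy['VOTE'] != 'Unknown')]
print(clean[['AGE', 'WARD44', 'VOTE']])
```

Only the first two toy rows survive: the third has no ward number and the fourth is undecided.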

### Census

In [0]:
#Marginal distribution of Age
age_marginal_df = pd.read_csv('census2011_age.csv')
age_marginal_df.head()

In [0]:
#Marginal distribution of Ward
ward_marginal_df = pd.read_csv('census2011_ward.csv')
ward_marginal_df.head()
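
The contents of the two census CSV files are not shown in this notebook. As a rough sketch only, assuming a hypothetical layout with one row per category and a `count` column (both the layout and the numbers below are invented), a marginal distribution can be derived from counts and checked against the labels used in the survey before joining:

```python
import pandas as pd

# Hypothetical layout and invented counts; the real CSV schema is not shown here.
ward_counts = pd.DataFrame({'WARD44': ['1', '2', '3'], 'count': [500, 300, 200]})

# Convert counts to a marginal probability distribution.
ward_counts['prob'] = ward_counts['count'] / ward_counts['count'].sum()

# Labels seen in a (made-up) survey sample should all appear in the census table.
sample_wards = pd.Series(['1', '3', '3', '7'])
missing = sorted(set(sample_wards) - set(ward_counts['WARD44']))

print(ward_counts)
print('labels missing from census table:', missing)
```

A non-empty `missing` list would indicate that the survey and census use different category labels, which is worth resolving before any join.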

# ppandas Analysis

In [0]:
#Create TES PDataFrame
indep_vars = ['AGE','WARD44']
all_vars = indep_vars + ['VOTE']
TES_pdf = PDataFrame(indep_vars, TES_df[all_vars])

In [0]:
# Read in aggregate-level census data
age_marginal_pdf = PDataFrame.from_populational_data(["AGE"],age_marginal_df,2615090)
ward_marginal_pdf = PDataFrame.from_populational_data(["WARD44"],ward_marginal_df,2615090)

In [0]:
#Replace TES marginal distributions of Age and Ward with census distributions
join_pdf = age_marginal_pdf.pjoin(TES_pdf,mismatches={"AGE":'numerical'})
join_pdf = ward_marginal_pdf.pjoin(join_pdf)
join_pdf.visualise()
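
One way to read this step: the survey supplies the conditional distribution of VOTE given the independent variables, while the census supplies their marginal distribution. The plain-pandas sketch below illustrates that idea for a single variable with invented data. It does not reproduce how ppandas is implemented, and the names `survey`, `census_age`, `cond`, and `reweighted` are hypothetical.

```python
import pandas as pd

# Invented survey rows; not TES data.
survey = pd.DataFrame({
    'AGE': ['young', 'young', 'young', 'old'],
    'VOTE': ['A', 'A', 'B', 'B'],
})
# Invented population marginal P(AGE); stands in for the census table.
census_age = pd.Series({'young': 0.5, 'old': 0.5})

# P(VOTE | AGE) estimated from the survey: one row per age group.
cond = pd.crosstab(survey['AGE'], survey['VOTE'], normalize='index')

# Raw survey estimate uses the survey's own age mix (75% young here).
raw = survey['VOTE'].value_counts(normalize=True).sort_index()

# Reweighted: P(VOTE) = sum over ages of P_census(AGE) * P_survey(VOTE | AGE).
reweighted = cond.mul(census_age, axis=0).sum(axis=0)

print(raw)
print(reweighted)
```

In this toy case the survey over-represents the young group, so swapping in the population age mix shifts the estimate towards the candidate preferred by the old group.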

In [0]:
#Query VOTE
print('ppandas 2011 Census + TES(n = {}):'.format(join_pdf.num_of_records))
queryResults= join_pdf.query(['VOTE'])
print(queryResults)