<a href="https://colab.research.google.com/github/DonnaVakalis/Urban.dat/blob/master/CensusDat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Analyze Toronto's Rooftop Solar Capacity

 

## Table of Contents
- [Introduction](#intro)
- [Part I - Setup](#partone)
- [Part II - Calculations from scratch](#parttwo)
- [Part III - Graphs](#partthree)


<a id='intro'></a>
### Introduction
 


<a id='partone'></a>
### Part I - Setup

In [None]:
# import libraries

!pip install statsmodels

import pandas as pd
import numpy as np
import requests
import os
from google.colab import drive
import matplotlib.pyplot as plt
import statsmodels.api as sm  # the conventional entry point for the modelling API

%matplotlib inline

In [3]:
# Mount Google Drive
drive.mount('/content/gdrive')
os.chdir("/content/gdrive/My Drive/")

Mounted at /content/gdrive


In [11]:
# Download 2016 Census data from StatsCan

# Download 2016 place of work and place of residence sample set
# Manually downloaded and unzipped from here: "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/dt-td/CompDataDownload.cfm?LANG=E&PID=111332&OFT=CSV"
# file name is  

# Download 2016 population and population density full set
csv_url = "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/CompFile.cfm?Lang=Eng&T=301&OFT=FULLCSV"
req = requests.get(csv_url)
req.raise_for_status()  # stop here if the download failed
with open('statcan_2016_census_subdivisions.csv', 'wb') as csv_file:
    csv_file.write(req.content)

# For now, manually put both files in same folder as this main script 
 

In [26]:
# Load csv files
base_dir =  "/content/gdrive/My Drive/Colab Notebooks/SSG/Census/"

# 2016 25% sample file with place of work and place of residence 
file = base_dir + "98-400-X2016325_English_CSV_data.csv"
df1 = pd.read_csv(file)

In [None]:
# 2016 full set of population and density for census subdivisions
file = base_dir + "rename1.csv"
df2 = pd.read_csv(file, sep=",", encoding='Latin-1')  # note: reading fails with the default utf-8 encoding
df2.info()

Cleaning notes:
- df1: check rows with a non-zero DATA_QUALITY_FLAG.1 (see the StatCan description: [https://www12.statcan.gc.ca/wds-sdw/cr2016geo-eng.cfm]) and rename columns 10, 14 and 17.

- df2: there are 5178 rows of census subdivisions but only 5148 rows with a population value. Look at the rows with empty population and determine whether they can be dropped. Also check the remaining rows for completeness, i.e., that the entries are sensible.
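
A minimal sketch of the df2 check described in the cleaning notes, run on a small made-up frame. The rows are invented for illustration and are not StatCan data; only the column names come from the real file. It separates rows with no population value from rows reporting zero people, so each group can be inspected before anything is dropped.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df2; the values are invented for illustration only
toy = pd.DataFrame({
    "Geographic code": ["1001", "1002", "1003", "Note:"],
    "Geographic name, english": ["A", "B", "C", np.nan],
    "Population, 2016": [120.0, 0.0, np.nan, np.nan],
})

# Rows with no population value at all
missing_pop = toy[toy["Population, 2016"].isna()]

# Rows that have a population value but report zero people
zero_pop = toy[toy["Population, 2016"] == 0]

# Rows with at least one person (NaN compares as False, so it is excluded too)
toy_clean = toy[toy["Population, 2016"] >= 1]

print(len(missing_pop), len(zero_pop), len(toy_clean))
```

Looking at `missing_pop` directly (for example its geographic code and name columns) is one way to judge whether those rows are real subdivisions or leftover non-data lines.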
 

Tidying notes:
- Organize into three base dataframes: 1) Place of Work (POW) tallies, 2) Census ID data and 3) a FUTURE non-census dataframe.

- 1) One POW frame: name of each census subdivision and the count of people for whom it is a place of work (will need to group by subdivision and generate a new standalone frame).

- 2) One ID frame containing all subdivisions, with ID variables (like geocode, CSD types, English and French names...).

- 3) One NON-CENSUS frame for a sample of census subdivisions: commercial and residential floor area (need more data).

From these three, we can create a merged frame with, for each census subdivision: ID, English name, population, area, POW totals, population density per area and POW per area, and start to explore relationships from known
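
A rough sketch of the merged frame described in the tidying notes, using two tiny made-up frames. The frame names `id_df` and `pow_df`, the short column names, and all values are assumptions for illustration, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical ID frame: one row per census subdivision (values invented)
id_df = pd.DataFrame({
    "geo_code": [1, 2, 3],
    "geo_name": ["A", "B", "C"],
    "population": [1000, 500, 200],
    "land_area_km2": [10.0, 5.0, 4.0],
})

# Hypothetical place-of-work frame: worker counts for a subset of subdivisions
pow_df = pd.DataFrame({
    "geo_code": [1, 2],
    "num_workers": [400, 50],
})

# A left merge keeps every subdivision, even those absent from the POW sample
merged = id_df.merge(pow_df, on="geo_code", how="left")

# Densities per square kilometre
merged["pop_density"] = merged["population"] / merged["land_area_km2"]
merged["pow_density"] = merged["num_workers"] / merged["land_area_km2"]

print(merged)
```

With the real files, the key columns would need a common dtype before merging, since the geographic code is read as text in one file and may be numeric in the other.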


In [60]:
# df1: rename columns 8, 10, 12, 14 and 17
old = df1.columns[[8, 10, 12, 14, 17]]
new = ['geo_code', 'geo_name', 'flag', 'alt_geo_code', 'num_workers']
df1.rename(columns=dict(zip(old, new)), inplace=True)

# Examine the subset where data quality has been flagged
flagged = df1[df1['flag'] != 0][['geo_name', 'flag', 'num_workers']]
flagged.head(50)  # flags 1-5 indicate response rates below 90%; leave these rows in

# Create place-of-work dataframe (named pow_df to avoid shadowing the built-in pow)
pow_df = df1.groupby(['geo_code', 'alt_geo_code', 'geo_name'])['num_workers'].sum().reset_index()
print('Number of census subdivisions in Place-Of-Work data is:', len(pow_df))

# Are there any rows where geo_code differs from alt_geo_code?
len(pow_df[pow_df['alt_geo_code'] != pow_df['geo_code']])  # none, so drop one of them
pow_df.drop('alt_geo_code', axis=1, inplace=True)

# How many census subdivisions have 0 num_workers? (Leave them in anyway)
len(pow_df[pow_df['num_workers'] == 0])  # 52

Number of census subdivisions in Place-Of-Work data is: 3331


52

In [65]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5178 entries, 0 to 5177
Data columns (total 29 columns):
 #   Column                                                                Non-Null Count  Dtype  
---  ------                                                                --------------  -----  
 0   Geographic code                                                       5178 non-null   object 
 1   Geographic name, english                                              5167 non-null   object 
 2   Geographic name, french                                               5162 non-null   object 
 3   CSD type, english                                                     5162 non-null   object 
 4   CSD type, french                                                      5162 non-null   object 
 5   Province / territory, english                                         5162 non-null   object 
 6   Province / territory, french                                          5162 non-null   object 
 7

In [81]:
# df2: drop rows where the 2016 population is NaN or 0
len(df2)  # originally 5178 rows
df2.drop(df2[(df2['Population, 2016'] < 1) | (df2['Population, 2016'].isna())].index, inplace=True)
len(df2)  # 4869 rows remain

# Clean up the column names
#new = [['Geographic code','Geographic name, english', 'CSD type, english', 'Population, 2016', 'Total private dwellings, 2016', 'Land area in square kilometres, 2016','Population density per square ']]

# Create main 'dat' dataframe 
#dat = df2.iloc[:, [0,1,2,3,10,11,17,25]]


4869

In [79]:
df2.tail(5)

Unnamed: 0,Geographic code,"Geographic name, english","Geographic name, french","CSD type, english","CSD type, french","Province / territory, english","Province / territory, french","Geographic code, Province / territory","Geographic code, Census division","Geographic code, Census metropolitan area / census agglomeration","Population, 2016","Incompletely enumerated Indian reserves and Indian settlements, 2016","Population, 2011",2011 adjusted population flag,"Incompletely enumerated Indian reserves and Indian settlements, 2011",2011 population review or received update flag,"Population, % change","Total private dwellings, 2016","Total private dwellings, 2011",2011 adjusted total private dwellings flag,"Total private dwellings, % change","Private dwellings occupied by usual residents, 2016","Private dwellings occupied by usual residents, 2011",2011 adjusted private dwellings occupied by usual residents flag,"Private dwellings occupied by usual residents, % change","Land area in square kilometres, 2016","Population density per square kilometre, 2016","National population rank, 2016","Provincial/territorial population rank, 2016"
5154,6208047,Kugaaruk,Kugaaruk,Hamlet,Hamlet,Nunavut,Nunavut,62.0,6208.0,,933.0,F,771.0,F,F,,21.0,180.0,173.0,F,4.0,176.0,155.0,F,13.5,4.97,187.6,2186.0,15.0
5155,6208059,Kugluktuk,Kugluktuk,Hamlet,Hamlet,Nunavut,Nunavut,62.0,6208.0,,1491.0,F,1450.0,F,F,,2.8,469.0,448.0,F,4.7,430.0,401.0,F,7.2,549.65,2.7,1644.0,8.0
5158,6208073,Cambridge Bay,Cambridge Bay,Hamlet,Hamlet,Nunavut,Nunavut,62.0,6208.0,,1766.0,F,1608.0,F,F,,9.8,646.0,573.0,F,12.7,542.0,501.0,F,8.2,202.35,8.7,1472.0,5.0
5159,6208081,Gjoa Haven,Gjoa Haven,Hamlet,Hamlet,Nunavut,Nunavut,62.0,6208.0,,1324.0,F,1279.0,F,F,,3.5,311.0,286.0,F,8.7,284.0,276.0,F,2.9,28.47,46.5,1773.0,11.0
5160,6208087,Taloyoak,Taloyoak,Hamlet,Hamlet,Nunavut,Nunavut,62.0,6208.0,,1029.0,F,899.0,F,F,,14.5,261.0,228.0,F,14.5,231.0,206.0,F,12.1,37.65,27.3,2055.0,14.0


<a id='parttwo'></a>
### Part II - Calculations from scratch

Assumptions:
-  
Other:
- Use a performance factor of 75% to account for system losses (e.g., climatic factors, inverter and conversion losses...)
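
As a rough illustration of how the 75% factor might enter a rooftop estimate, the sketch below uses the common simplified relation: annual energy = area × annual irradiation × module efficiency × performance factor. Only the 0.75 value comes from the assumption above. The roof area, irradiation and module efficiency are placeholder numbers, and the function name `annual_output_kwh` is hypothetical.

```python
PERFORMANCE_FACTOR = 0.75  # from the assumption above: accounts for system losses

# Placeholder inputs for illustration only; not values from this notebook
roof_area_m2 = 50.0                      # usable roof area
annual_irradiation_kwh_per_m2 = 1300.0   # solar energy reaching the panel plane per year
module_efficiency = 0.18                 # fraction of sunlight converted to electricity


def annual_output_kwh(area_m2, irradiation_kwh_per_m2, efficiency,
                      performance_factor=PERFORMANCE_FACTOR):
    """Simplified annual PV output: area x irradiation x efficiency x performance factor."""
    return area_m2 * irradiation_kwh_per_m2 * efficiency * performance_factor


estimate = annual_output_kwh(roof_area_m2, annual_irradiation_kwh_per_m2, module_efficiency)
print(f"{estimate:.0f} kWh/year")
```

Keeping the performance factor as a named constant makes it easy to rerun the estimate under a different loss assumption.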


In [None]:
# Set Assumptions as named variables

PART IIA - SSG RESIDENTIAL

<a id='partthree'></a>
### Part III - Graphs
 

 Assumptions:
 

Part III A:  

Part III B:  


In [5]:
# Download 2016 Census data from StatsCan
import io
import zipfile

# Download 2016 place of work and place of residence sample set (served as a zip archive)
zip_url = "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/dt-td/CompDataDownload.cfm?LANG=E&PID=111332&OFT=CSV"
req = requests.get(zip_url)
req.raise_for_status()  # stop here if the download failed
with open('statcan_2016_POW_sample.zip', 'wb') as zip_file:
    zip_file.write(req.content)

# Read the csv straight out of the archive. The member name is assumed to match
# the manually unzipped file loaded in Part I.
with zipfile.ZipFile(io.BytesIO(req.content)) as zf:
    with zf.open("98-400-X2016325_English_CSV_data.csv") as pow_csv:
        pow_sample = pd.read_csv(pow_csv)

# Download 2016 population and population density full set
csv_url = "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Tables/CompFile.cfm?Lang=Eng&T=301&OFT=FULLCSV"
req = requests.get(csv_url)
req.raise_for_status()
with open('statcan_2016_census_subdivisions.csv', 'wb') as csv_file:
    csv_file.write(req.content)
 