# Let's look at some Canadian census data!

Census time is the best: we get a brand-new look at detailed, interesting data about Canada. Let's slice it up so we can make some maps. As always, we start by importing pandas and reading in [data downloaded from StatsCan](https://www12.statcan.gc.ca/census-recensement/2021/dp-pd/dt-td/Index-eng.cfm?LANG=E&SUB=98P1016&SR=0&RPP=25&SORT=date).

In [52]:
import pandas as pd

raw = pd.read_csv("../raw/RAW 2021 STATSCAN CENSUS 1.csv", encoding="unicode_escape", engine="python")
raw_csd = pd.read_csv("../raw/RAW 2021 STATSCAN CENSUS CSD.csv", encoding="unicode_escape", engine="python")
tracts_raw = pd.read_csv("../raw/RAW 2021 STATSCAN CENSUS TRACTS.csv", encoding="unicode_escape", engine="python")

tracts_raw["ALT_GEO_CODE"] = tracts_raw["ALT_GEO_CODE"].astype(float)

display(raw.head(3))

Unnamed: 0,CENSUS_YEAR,DGUID,ALT_GEO_CODE,GEO_LEVEL,GEO_NAME,DATA_QUALITY_FLAG,CHARACTERISTIC_ID,CHARACTERISTIC_NAME,CHARACTERISTIC_NOTE,C1_COUNT_TOTAL,SYMBOL
0,2021,2021S0503001,1,Census metropolitan area,St. John's,0,1,"Population, 2021",1.0,212579.0,
1,2021,2021S0503001,1,Census metropolitan area,St. John's,0,2,"Population, 2016",1.0,208418.0,r
2,2021,2021S0503001,1,Census metropolitan area,St. John's,0,3,"Population percentage change, 2016 to 2021",,2.0,


### Census subdivision

In [49]:
# Combine both conditions into one mask so it lines up with raw_csd row for row
csd = (raw_csd
       .loc[(raw_csd["GEO_LEVEL"] == "Census subdivision")
            & raw_csd["CHARACTERISTIC_NAME"].isin(["Population percentage change, 2016 to 2021", "Population, 2021"])]
       .pivot(index=["DGUID", "GEO_NAME", "ALT_GEO_CODE"], columns="CHARACTERISTIC_NAME", values="C1_COUNT_TOTAL")
       .reset_index()
       )

display(csd.head())

CHARACTERISTIC_NAME,DGUID,GEO_NAME,ALT_GEO_CODE,"Population percentage change, 2016 to 2021","Population, 2021"
0,2021A00051001101,"Division No. 1, Subd. V, Subdivision of unorg...",1001101,52.8,55.0
1,2021A00051001105,"Portugal Cove South, Town (T)",1001105,-42.7,86.0
2,2021A00051001113,"Trepassey, Town (T)",1001113,-15.8,405.0
3,2021A00051001120,"St. Shott's, Town (T)",1001120,-16.7,55.0
4,2021A00051001124,"Division No. 1, Subd. U, Subdivision of unorg...",1001124,-15.5,1373.0


The biggest question in this release is how population has shifted since the last census in 2016. Let's filter down to the percentage-change rows first, so we can see which regions are included and which geographic level each one belongs to (census tract, CMA, etc.).

In [50]:
population = (raw
              .loc[(raw["CHARACTERISTIC_NAME"] == "Population percentage change, 2016 to 2021"), :]
              .pivot(index=["GEO_NAME", "GEO_LEVEL"], columns="CHARACTERISTIC_NAME", values="C1_COUNT_TOTAL")
              .reset_index()
              .set_index("GEO_NAME")
              .sort_values("Population percentage change, 2016 to 2021", ascending=False)
              )
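
To see what the filter-and-pivot step does, here is a minimal sketch on a toy long-format frame. The place names and numbers are invented for illustration; only the column names mirror the StatsCan file.

```python
import pandas as pd

change = "Population percentage change, 2016 to 2021"

# Toy long-format data: one row per (place, characteristic).
# "Exampleville" and "Sampleton" and their values are hypothetical.
long_df = pd.DataFrame({
    "GEO_NAME": ["Exampleville", "Exampleville", "Sampleton", "Sampleton"],
    "GEO_LEVEL": ["Census agglomeration", "Census agglomeration",
                  "Census metropolitan area", "Census metropolitan area"],
    "CHARACTERISTIC_NAME": ["Population, 2021", change] * 2,
    "C1_COUNT_TOTAL": [1000.0, 5.0, 2000.0, -1.5],
})

# Pivot from long to wide: one row per place, one column per characteristic.
wide = (long_df
        .pivot(index=["GEO_NAME", "GEO_LEVEL"], columns="CHARACTERISTIC_NAME", values="C1_COUNT_TOTAL")
        .reset_index()
        .sort_values(change, ascending=False)
        )

print(wide)
```

Note that `pivot` raises a `ValueError` if the same index/column pair appears more than once, so it doubles as a quick check that each place has a single row per characteristic.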

Now let's break them out into different dataframes based on their geographic level.
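
As an aside, one alternative to filtering level by level is to split everything in a single pass with `groupby`. This is only a sketch, not what the cells in this notebook do: the toy frame reuses four rows that appear in this notebook's outputs, and the `by_level` name is made up for the example.

```python
import pandas as pd

change = "Population percentage change, 2016 to 2021"

# Small stand-in for the `population` frame, indexed by GEO_NAME.
toy = pd.DataFrame(
    {
        "GEO_LEVEL": ["Census agglomeration", "Census metropolitan area",
                      "Census agglomeration", "Census metropolitan area"],
        change: [21.8, 14.0, 20.3, 12.1],
    },
    index=pd.Index(["Squamish", "Kelowna", "Wasaga Beach", "Chilliwack"], name="GEO_NAME"),
)

# Build a dict mapping each geographic level to its own dataframe.
# `by_level` is a hypothetical name used only in this sketch.
by_level = {level: frame for level, frame in toy.groupby("GEO_LEVEL")}

print(by_level["Census agglomeration"])
```

The explicit one-filter-per-level approach used in this notebook is easier to read cell by cell; the dict version mostly pays off when there are many levels to loop over.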

### Census agglomeration

In [47]:
df = population[population["GEO_LEVEL"] == "Census agglomeration"]

display(df.head(5))

GEO_NAME,GEO_LEVEL,"Population percentage change, 2016 to 2021"
Squamish,Census agglomeration,21.8
Wasaga Beach,Census agglomeration,20.3
Tillsonburg,Census agglomeration,17.3
Canmore,Census agglomeration,14.3
Collingwood,Census agglomeration,13.8


### Census metropolitan areas

In [46]:
df = population[population["GEO_LEVEL"] == "Census metropolitan area"]

display(df.head(5))

GEO_NAME,GEO_LEVEL,"Population percentage change, 2016 to 2021"
Kelowna,Census metropolitan area,14.0
Chilliwack,Census metropolitan area,12.1
Nanaimo,Census metropolitan area,10.0
Kamloops,Census metropolitan area,10.0
London,Census metropolitan area,10.0


### Census tracts

In [53]:
# Combine both conditions into one mask so it lines up with tracts_raw row for row
tracts_population = (tracts_raw
                     .loc[(tracts_raw["CHARACTERISTIC_NAME"] == "Population percentage change, 2016 to 2021")
                          & (tracts_raw["GEO_LEVEL"] == "Census tract")]
                     .pivot(columns="CHARACTERISTIC_NAME", index=["DGUID", "GEO_NAME"], values="C1_COUNT_TOTAL")
                     .reset_index()
                     .set_index("DGUID")
                     )

### Census divisions

In [None]:
# Select rows and columns in a single .loc call, using one combined mask
census_divisions = (raw_csd
                    .loc[(raw_csd["GEO_LEVEL"] == "Census division")
                         & (raw_csd["CHARACTERISTIC_NAME"] == "Population percentage change, 2016 to 2021"),
                         ["ALT_GEO_CODE", "GEO_NAME", "C1_COUNT_TOTAL"]]
                    .set_index("ALT_GEO_CODE")
                    )

# Strip the division-type suffixes, then keep only the part before " / "
census_divisions["GEO_NAME"] = (census_divisions["GEO_NAME"]
                                .astype(str)
                                .str.replace(pat=", Region (REG)", repl="", regex=False)
                                .str.replace(pat=", Regional district (RD)", repl="", regex=False)
                                .str.replace(pat=", District (DIS)", repl="", regex=False)
                                .str.replace(pat=", County (CTY)", repl="", regex=False)
                                .str.split(pat=" / ", expand=True)[0]
                                )

display(census_divisions)
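
If the list of suffixes grows, the chain of `str.replace` calls above could be collapsed into a single regular expression. The sketch below is one possible way to do that, not the method used in this notebook. The place names ("Alpha", "Beta" and so on) are hypothetical; the four suffixes are the ones stripped in the cell above.

```python
import re

import pandas as pd

# The same four suffixes removed above, escaped so the parentheses are literal.
SUFFIXES = [", Region (REG)", ", Regional district (RD)", ", District (DIS)", ", County (CTY)"]
pattern = "|".join(re.escape(suffix) for suffix in SUFFIXES)

# Hypothetical names, including one with a " / " separator.
names = pd.Series([
    "Alpha, County (CTY)",
    "Beta, Regional district (RD)",
    "Gamma / Gamma-alt",
    "Delta, District (DIS)",
])

cleaned = (names
           .str.replace(pattern, "", regex=True)  # drop any of the listed suffixes
           .str.split(" / ")                      # split on the separator
           .str[0]                                # keep the first part
           )

print(cleaned.tolist())
```

Keeping the suffixes in a list means adding a new division type is a one-line change, at the cost of a slightly less obvious regex.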