# 2011 Census Profile by Planning District

Sources:

* `2011 Census all.csv`: Census data extracted from Beyond 20/20
* `planning-district-codes.csv`: Waterloo Region planning districts (district code in the `id` column, name, municipality); copied from /data/waterloo-region/

Project files:

* `census-planning-districts-indexed.csv`
* `census-planning-districts-columns.txt`
* `census-planning-districts.csv`
* `age-groups.csv`
* `households-persons.csv`
* `households-family-size.csv`
* `households-family-structure.csv`
* `occupied-dwellings.csv`
* `official-languages.csv`

Notes:

Census data is available for 102 of the 105 planning districts in Waterloo Region.

## Setup

In [1]:
%%bash
cp ../data/waterloo-region/planning-district-codes.csv ../sources/canada/census-planningdistrict

In [2]:
%cd ../sources/canada/census-planningdistrict/

In [3]:
import agate
import re

from operator import add

def extract_text(text, match=r"WAT - PD - (.*)"):
    """Extract text from a column, e.g., district code from Geography column"""
    _match = re.match(match, text)
    if _match is None:
        raise ValueError("no match for %r in %r" % (match, text))
    return _match.group(1)

def rename_columns(file_name, new_column_names):
    """Overwrite a csv file with new column names"""
    old_names = agate.Table.from_csv(file_name)
    new_names = old_names.rename(new_column_names)
    new_names.to_csv(file_name)
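
`extract_text` expects every Geography value to follow the `WAT - PD - <code>` pattern. A minimal sketch of a more forgiving variant, assuming some rows (for example a regional total) might not follow the pattern; `extract_code` and the sample values other than the two district labels are hypothetical and not part of this notebook:

```python
import re

# Hypothetical variant of extract_text: returns None instead of raising
# when the value is missing or does not match the pattern.
PATTERN = re.compile(r"WAT - PD - (.*)")

def extract_code(text, pattern=PATTERN):
    """Return the captured district code, or None if there is no match."""
    match = pattern.match(text or "")
    return match.group(1) if match else None

# "Waterloo Region" and None are made-up inputs used to show the fallback.
samples = ["WAT - PD - C10", "WAT - PD - C10N", "Waterloo Region", None]
codes = [extract_code(value) for value in samples]
print(codes)
```

Rows that come back as `None` can then be reported or filtered before the join, rather than stopping the cell.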


## Index and list of columns for Census data

In [4]:
%%bash
# add line_number column as a "primary key" for joining to other tables
csvcut -l  2011\ Census\ all.csv > "census-planning-districts-indexed.csv"

csvcut -n census-planning-districts-indexed.csv > census-planning-districts-columns.txt

# list "top level" columns by excluding column names that start with a space
grep -v ":  " census-planning-districts-columns.txt

  1: line_number
  2: Geography
  3: Total population by age groups
 27: Median age of the population
 28: % of the population aged 15 and over
 81: Total population 15 years and over by marital status
108: Total number of persons in private households 
126: Total number of persons aged 65 years and over in private households
144: Total number of census families in private households by family size
149: Total number of census families in private households by family structure and number of children
172: Total children in census families in private households
178: Average number of children at home per census family
179: Average number of persons per census family
180: Total number of private households by household type
197: Total number of private households by household size
204: Total number of persons in private households
205: Average number of persons in private households
206: Total number of occupied private dwellings by structural type of dwelling
216: Detailed Mother Tongue -
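
The same "top level" filter can be sketched in Python, assuming the listing has the `number: name` layout printed by `csvcut -n` above. The function name `top_level` is illustrative; the sample lines are taken from the listings in this notebook:

```python
# A few lines in the layout printed by `csvcut -n`.
listing = [
    "  1: line_number",
    "  2: Geography",
    "  3: Total population by age groups",
    "  4:   0 to 4 years ",
    " 27: Median age of the population",
]

def top_level(lines):
    """Return (number, name) pairs for columns whose name has no leading space."""
    result = []
    for line in lines:
        number, name = line.split(": ", 1)
        # Nested columns are indented, so their names start with a space.
        if not name.startswith(" "):
            result.append((int(number), name.strip()))
    return result

columns = top_level(listing)
print(columns)
```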

## Extract planning district codes from Census data

In [5]:
%%bash
# create a temporary table with line_number and the Geography field, which contains the code for each district
csvcut -c 1,2 census-planning-districts-indexed.csv > census-planning-districts-tmp.csv
head -n 10 census-planning-districts-tmp.csv | csvlook

|--------------+-------------------|
|  line_number | Geography         |
|--------------+-------------------|
|  1           | WAT - PD - C10    |
|  2           | WAT - PD - C1014  |
|  3           | WAT - PD - C103   |
|  4           | WAT - PD - C109   |
|  5           | WAT - PD - C10N   |
|  6           | WAT - PD - C10S   |
|  7           | WAT - PD - C11    |
|  8           | WAT - PD - C116   |
|  9           | WAT - PD - C12    |
|--------------+-------------------|


In [6]:
districts = agate.Table.from_csv("census-planning-districts-tmp.csv")

column_names = ["index", "code"]
column_types = [agate.Number(), agate.Text()]
column_values = [[row["line_number"], extract_text(row["Geography"])] for row in districts.rows]

districts_table = agate.Table(column_values, column_names, column_types)

districts_table.to_csv("census-planning-districts-codes-tmp.csv")

In [7]:
%%bash
# create a table we can join with census-planning-districts-indexed.csv
csvjoin -c "code,id" census-planning-districts-codes-tmp.csv planning-district-codes.csv | \
csvcut -c 1,2,4,5 > census-planning-districts.csv

head -n 10 census-planning-districts.csv | csvlook

|--------+-------+----------------------------+---------------|
|  index | code  | name                       | municipality  |
|--------+-------+----------------------------+---------------|
|  1     | C10   | Blair                      | Cambridge     |
|  2     | C1014 | Silver Heights/Blackbridge | Cambridge     |
|  3     | C103  | Riverside                  | Cambridge     |
|  4     | C109  | Centennial/River Flats     | Cambridge     |
|  5     | C10N  | Shades Mills North         | Cambridge     |
|  6     | C10S  | Shades Mills South         | Cambridge     |
|  7     | C11   | Preston Heights            | Cambridge     |
|  8     | C116  | Riverview                  | Cambridge     |
|  9     | C12   | Central Park               | Cambridge     |
|--------+-------+----------------------------+---------------|
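
`csvjoin` performs an inner join by default, so codes that appear in only one of the two files drop out without a message. A rough sketch of the same join with the standard library that also lists the misses on each side; the data is an in-memory sample, the `id` column name is inferred from the `-c "code,id"` argument, and `Z99` is a made-up code:

```python
import csv
import io

# Sample rows standing in for census-planning-districts-codes-tmp.csv and
# planning-district-codes.csv. "Z99" is a made-up code used to show a miss.
census_csv = "index,code\n1,C10\n2,C1014\n3,Z99\n"
lookup_csv = (
    "id,name,municipality\n"
    "C10,Blair,Cambridge\n"
    "C1014,Silver Heights/Blackbridge,Cambridge\n"
    "C103,Riverside,Cambridge\n"
)

census = list(csv.DictReader(io.StringIO(census_csv)))
lookup = {row["id"]: row for row in csv.DictReader(io.StringIO(lookup_csv))}

# Inner join on the district code.
joined = [
    {
        "index": row["index"],
        "code": row["code"],
        "name": lookup[row["code"]]["name"],
        "municipality": lookup[row["code"]]["municipality"],
    }
    for row in census
    if row["code"] in lookup
]

census_codes = {row["code"] for row in census}
unmatched_census = sorted(census_codes - set(lookup))  # census rows with no district
without_census = sorted(set(lookup) - census_codes)    # districts with no census row

print(joined)
print(unmatched_census, without_census)
```

Run against the real files, the second list would name the planning districts that have no census data.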


## Population by age group

In [8]:
%%bash
# temporary file for population by age group: columns 1, 3-7, 13-26
csvcut -c 1,3,4,5,6,7,13,14,15,16,17,18,19,20,21,22,23,24,25,26 census-planning-districts-indexed.csv > \
age-groups-tmp.csv

csvcut -n age-groups-tmp.csv

  1: line_number
  2: Total population by age groups
  3:   0 to 4 years 
  4:   5 to 9 years 
  5:   10 to 14 years 
  6:   15 to 19 years 
  7:   20 to 24 years 
  8:   25 to 29 years 
  9:   30 to 34 years 
 10:   35 to 39 years 
 11:   40 to 44 years 
 12:   45 to 49 years 
 13:   50 to 54 years 
 14:   55 to 59 years 
 15:   60 to 64 years 
 16:   65 to 69 years 
 17:   70 to 74 years 
 18:   75 to 79 years 
 19:   80 to 84 years 
 20:   85 years and over 


In [9]:
new_column_names = [
    "line_number",
    "total",
    "00-04", "05-09", "10-14", "15-19", "20-24", "25-29",
    "30-34", "35-39", "40-44", "45-49", "50-54", "55-59",
    "60-64", "65-69", "70-74", "75-79", "80-84", "85-00"
]
rename_columns("age-groups-tmp.csv", new_column_names)

In [10]:
%%bash
# add planning district data, remove line_number column
csvjoin -c "index,line_number" census-planning-districts.csv age-groups-tmp.csv | \
csvcut -C 5 > age-groups.csv

csvcut -n age-groups.csv

  1: index
  2: code
  3: name
  4: municipality
  5: total
  6: 00-04
  7: 05-09
  8: 10-14
  9: 15-19
 10: 20-24
 11: 25-29
 12: 30-34
 13: 35-39
 14: 40-44
 15: 45-49
 16: 50-54
 17: 55-59
 18: 60-64
 19: 65-69
 20: 70-74
 21: 75-79
 22: 80-84
 23: 85-00
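
One way to sanity-check a table like `age-groups.csv` is to compare the sum of the group columns with `total`. The sketch below is an illustration only: `check_totals` is a hypothetical helper, the rows are made-up numbers with three of the age columns, and the `tolerance` parameter assumes that published counts may be rounded and so may not add up exactly:

```python
def check_totals(rows, columns, tolerance=0):
    """Return (code, difference) for rows whose parts differ from the total."""
    problems = []
    for row in rows:
        parts = sum(int(row[column]) for column in columns)
        difference = parts - int(row["total"])
        if abs(difference) > tolerance:
            problems.append((row["code"], difference))
    return problems

# Made-up rows with only three age columns, to keep the example short.
age_columns = ["00-04", "05-09", "10-14"]
rows = [
    {"code": "C10", "total": "30", "00-04": "10", "05-09": "10", "10-14": "10"},
    {"code": "C11", "total": "40", "00-04": "10", "05-09": "15", "10-14": "10"},
]

strict = check_totals(rows, age_columns)
lenient = check_totals(rows, age_columns, tolerance=5)
print(strict, lenient)
```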


## Households

### Persons

In [11]:
%%bash
csvcut -c 1,108,113,109,110,111,112,126,130 census-planning-districts-indexed.csv > \
households-persons-tmp.csv

csvcut -n households-persons-tmp.csv

  1: line_number
  2: Total number of persons in private households 
  3:   Number of census family persons 
  4:   Number of persons not in census families 
  5:     Living with relatives 
  6:     Living with non-relatives only
  7:     Living alone 
  8: Total number of persons aged 65 years and over in private households
  9:     Living alone


In [12]:
new_column_names = [
    "line_number",
    "total",
    "family",
    "other", "other-relatives", "other-non-relatives", "other-alone",
    "65-plus", "65-plus-alone"
]
rename_columns("households-persons-tmp.csv", new_column_names)

In [13]:
%%bash
# add planning district data, remove line_number column
csvjoin -c "index,line_number" census-planning-districts.csv households-persons-tmp.csv | \
csvcut -C 5 > households-persons.csv

csvcut -n households-persons.csv

  1: index
  2: code
  3: name
  4: municipality
  5: total
  6: family
  7: other
  8: other-relatives
  9: other-non-relatives
 10: other-alone
 11: 65-plus
 12: 65-plus-alone


### Family size

In [14]:
%%bash
csvcut -c 1,144,145,146,147,148 census-planning-districts-indexed.csv > \
households-family-size-tmp.csv

csvcut -n households-family-size-tmp.csv

  1: line_number
  2: Total number of census families in private households by family size
  3:   Size of census family: 2 persons
  4:   Size of census family: 3 persons
  5:   Size of census family: 4 persons
  6:   Size of census family: 5 or more persons


In [15]:
new_column_names = [
    "line_number",
    "total",
    "total-2", "total-3", "total-4", "total-5-plus"
]
rename_columns("households-family-size-tmp.csv", new_column_names)

In [16]:
%%bash
# add planning district data, remove line_number column
csvjoin -c "index,line_number" census-planning-districts.csv households-family-size-tmp.csv | \
csvcut -C 5 > households-family-size.csv

csvcut -n households-family-size.csv

  1: index
  2: code
  3: name
  4: municipality
  5: total
  6: total-2
  7: total-3
  8: total-4
  9: total-5-plus


### Family structure

In [17]:
%%bash
csvcut -c 1,149,150,151,152,153,157,158,159,163 \
census-planning-districts-indexed.csv > households-family-structure-tmp.csv

csvcut -n households-family-structure-tmp.csv

  1: line_number
  2: Total number of census families in private households by family structure and number of children
  3:   Total couple families by family structure and number of children
  4:     Married couples
  5:       Without children at home
  6:       With children at home
  7:     Common-law couples
  8:       Without children at home
  9:       With children at home
 10:   Total lone-parent families by sex of parent and number of children


In [18]:
new_column_names = [
    "line_number",
    "total", "couples",
    "married", "married-no-children", "married-children",
    "common-law", "common-law-no-children", "common-law-children",
    "lone-parent"
]
rename_columns("households-family-structure-tmp.csv", new_column_names)

# compute totals for couples with/without children
families_data = agate.Table.from_csv("households-family-structure-tmp.csv")
couples_data = families_data.compute([
    ("couples-children", agate.Formula(
        agate.Number(), lambda row: add(row["married-children"], row["common-law-children"]))),
    ("couples-no-children", agate.Formula(
        agate.Number(), lambda row: add(row["married-no-children"], row["common-law-no-children"])))
])
couples_data.to_csv("households-family-structure-tmp.csv")

In [19]:
%%bash
# add planning district data, remove line_number column, married/common-law columns
csvjoin -c "index,line_number" \
census-planning-districts.csv households-family-structure-tmp.csv | \
csvcut -c 1,2,3,4,6,7,15,16,14 > households-family-structure.csv

csvcut -n households-family-structure.csv

  1: index
  2: code
  3: name
  4: municipality
  5: total
  6: couples
  7: couples-children
  8: couples-no-children
  9: lone-parent
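
`operator.add` raises a `TypeError` if either value is `None`, which is what `agate.Number` produces for an empty cell. Whether this data contains empty cells is not shown here, so the following is only a sketch of a null-safe alternative; `add_or_none` is a hypothetical helper, not an agate function:

```python
from decimal import Decimal

def add_or_none(*values):
    """Sum the values, or return None if any of them is missing."""
    if any(value is None for value in values):
        return None
    return sum(values, Decimal(0))

# Made-up values standing in for row["married-children"], etc.
complete = add_or_none(Decimal(5), Decimal(10))
missing = add_or_none(Decimal(5), None)
print(complete, missing)
```

It could be used inside the `agate.Formula` lambdas in place of `add`, leaving the computed cell empty when an input is missing.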


## Occupied private dwellings by structural type

In [20]:
%%bash
# temporary file for dwellings: columns 206-215
csvcut -c 1,206,207,208,209,210,211,212,213,214,215 census-planning-districts-indexed.csv > \
occupied-dwellings-tmp.csv

csvcut -n occupied-dwellings-tmp.csv

  1: line_number
  2: Total number of occupied private dwellings by structural type of dwelling
  3:   Single-detached house
  4:   Apartment, building that has five or more storeys
  5:   Movable dwelling
  6:   Other dwelling
  7:     Semi-detached house
  8:     Row house
  9:     Apartment, duplex
 10:     Apartment, building that has fewer than five storeys
 11:     Other single-attached house


In [21]:
new_column_names = [
    "line_number",
    "total",
    "single-detached", "apartment-5-plus", "movable", "other", "semi-detached",
    "row-house", "apartment-duplex", "apartment", "single-attached"
]
rename_columns("occupied-dwellings-tmp.csv", new_column_names)

In [22]:
%%bash
# add planning district data, remove line_number and "other" columns, re-order dwelling types alphabetically
csvjoin -c "index,line_number" census-planning-districts.csv occupied-dwellings-tmp.csv | \
csvcut -c 1,2,3,4,6,14,8,13,9,12,11,15,7 > occupied-dwellings.csv

csvcut -n occupied-dwellings.csv

  1: index
  2: code
  3: name
  4: municipality
  5: total
  6: apartment
  7: apartment-5-plus
  8: apartment-duplex
  9: movable
 10: row-house
 11: semi-detached
 12: single-attached
 13: single-detached
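
The index list passed to `csvcut -c` is hard to check by eye. As an alternative sketch, the same selection can be written with column names using the standard library; `select_columns` is a hypothetical helper and the data row is made up:

```python
import csv
import io

# Desired output order, matching the listing above.
ORDER = [
    "index", "code", "name", "municipality", "total",
    "apartment", "apartment-5-plus", "apartment-duplex", "movable",
    "row-house", "semi-detached", "single-attached", "single-detached",
]

def select_columns(source, columns):
    """Return CSV text with only the named columns, in the given order."""
    reader = csv.DictReader(source)
    out = io.StringIO()
    # extrasaction="ignore" drops columns not listed (line_number, other).
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Header as produced by the join; the values in the data row are made up.
joined_csv = (
    "index,code,name,municipality,line_number,total,single-detached,"
    "apartment-5-plus,movable,other,semi-detached,row-house,"
    "apartment-duplex,apartment,single-attached\n"
    "1,C10,Blair,Cambridge,1,100,60,0,0,40,10,10,5,15,0\n"
)

result = select_columns(io.StringIO(joined_csv), ORDER)
print(result)
```

`csvcut -c` also accepts column names, so the same idea applies on the command line.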


## Knowledge of official languages

In [23]:
%%bash
# temporary file for official languages: columns 549-553
csvcut -c 1,549,550,551,552,553 census-planning-districts-indexed.csv > \
official-languages-tmp.csv

csvcut -n official-languages-tmp.csv

  1: line_number
  2: Knowledge of official languages - Total population excluding institutional residents
  3:     English only
  4:     French only
  5:     English and French
  6:     Neither English nor French


In [24]:
new_column_names = [
    "line_number",
    "total",
    "english", "french", "both", "neither"
]
rename_columns("official-languages-tmp.csv", new_column_names)

In [25]:
%%bash
# add planning district data, remove line_number column
csvjoin -c "index,line_number" census-planning-districts.csv official-languages-tmp.csv | \
csvcut -C 5 > official-languages.csv

csvcut -n official-languages.csv

  1: index
  2: code
  3: name
  4: municipality
  5: total
  6: english
  7: french
  8: both
  9: neither


## Clean up temporary files

In [26]:
%%bash
rm *tmp.csv
ls

2011 Census all.csv
age-groups.csv
census-planning-districts-columns.txt
census-planning-districts-indexed.csv
census-planning-districts.csv
households-family-size.csv
households-family-structure.csv
households-persons.csv
occupied-dwellings.csv
official-languages.csv
planning-district-codes.csv
readme.md
