# The Geography of Economic Opportunity: Mapping Canada's Urban Transformation

## Introduction

Canada's urban landscape has undergone dramatic transformation over the past decade. From the tech boom in Toronto and Vancouver to the resource economy cycles in Calgary and Edmonton, each metropolitan area tells a unique story of economic adaptation and demographic change. But beyond the headlines and anecdotal evidence, what does the data reveal about the fundamental shifts reshaping Canadian cities?

This analysis explores three interconnected questions about Canada's evolving urban geography:

1. **The Education-Economy Nexus**: How has the relationship between educational attainment and economic outcomes evolved across different metropolitan areas?

2. **Housing as Economic Indicator**: Can housing characteristics serve as a proxy for understanding regional economic transformations?

3. **Commuting and Economic Structure**: How do commuting patterns reflect the changing nature of work and urban structure in the post-industrial economy?

Using Census data from 2016 and 2021, we'll map these changes across Canada's major metropolitan areas, revealing patterns that speak to broader questions about inequality, opportunity, and the future of Canadian cities.

---

## Data and Methods

Our analysis draws on Statistics Canada's Census data accessed through the CensusMapper API, focusing on Census Metropolitan Areas (CMAs) - functional economic regions that capture the full extent of Canada's major urban areas. We examine changes between 2016 and 2021, a period that encompasses both the pre-pandemic economy and the initial impacts of COVID-19 on urban structure.

In [None]:
# Setup and imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)

# Import pycancensus
import pycancensus as pc

# Set up clean plotting style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False

print(f"API key status: {'Set' if pc.get_api_key() else 'Not set'}")
print("Libraries loaded successfully")

## Exploring Available Data

Before formulating specific hypotheses, let's examine what data is available and identify Canada's major metropolitan areas.

In [None]:
# Explore available datasets
datasets = pc.list_census_datasets()
print("Available Census datasets:")
census_datasets = datasets[datasets['dataset'].str.contains('CA\\d{2}$', regex=True)]
print(census_datasets[['dataset', 'description']].to_string(index=False))

# Get list of CMAs to understand Canada's major urban areas
print("\nIdentifying Canada's Census Metropolitan Areas...")
# Query every province and territory at the CMA level so that no CMA is missed
all_provinces = ['10', '11', '12', '13', '24', '35', '46', '47', '48', '59', '60', '61', '62']

cma_data = pc.get_census(
    dataset='CA21',
    regions={'PR': all_provinces},
    level='CMA',
    use_cache=True
)

print(f"Found {len(cma_data)} Census Metropolitan Areas")
print("\nLargest CMAs by population:")
top_cmas = cma_data.nlargest(15, 'Population ')[['Region Name', 'Population ']]
for _, row in top_cmas.iterrows():
    print(f"  {row['Region Name']}: {row['Population ']:,}")
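
The cell above indexes the population column as `'Population '`, with a trailing space. A minimal sketch of a more defensive lookup, assuming only that the label may vary in whitespace or case; `find_column` is a hypothetical helper, and the toy frame holds made-up figures rather than Census data:

```python
import pandas as pd

def find_column(df: pd.DataFrame, name: str) -> str:
    """Return the column whose stripped, case-folded label equals `name`."""
    target = name.strip().casefold()
    matches = [col for col in df.columns if str(col).strip().casefold() == target]
    if not matches:
        raise KeyError(f"No column matching {name!r} in {list(df.columns)}")
    return matches[0]

# Toy frame with made-up figures; note the trailing space in 'Population '
toy = pd.DataFrame({
    "Region Name": ["Alpha", "Beta", "Gamma"],
    "Population ": [300, 100, 200],
})

pop_col = find_column(toy, "Population")
largest = toy.nlargest(2, pop_col)[["Region Name", pop_col]]
print(largest.to_string(index=False))
```

Resolving the label once and reusing `pop_col` keeps later cells working if the API's column naming changes.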

## Research Hypotheses

Based on the available data and current economic trends, we'll test three specific hypotheses:

### Hypothesis 1: The Education Premium Varies by Metropolitan Context
**Prediction**: The economic returns to higher education vary significantly across CMAs, with knowledge economy centers (Toronto, Vancouver, Ottawa) showing stronger education-income relationships than resource-dependent cities.

### Hypothesis 2: Housing Density Reflects Economic Transition
**Prediction**: Metropolitan areas experiencing rapid economic growth will show increased housing density and changing housing types, particularly growth in multi-unit dwellings.

### Hypothesis 3: Commuting Patterns Reveal Economic Structure
**Prediction**: Cities with more diverse, knowledge-based economies will show higher rates of public transit use and active transportation, while resource-dependent cities remain car-dependent.

Let's gather the data needed to test these hypotheses.

In [None]:
# Define variables for our analysis
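# We'll focus on ten major CMAs spanning knowledge-economy, resource, and regional centres
# (CMA identifiers are prefixed with the province code; confirm each one against the cma_data listing above)
major_cmas = ['35535', '59933', '24462', '48825', '48835', '12205', '47725', '59935', '24421', '47705']
# Toronto, Vancouver, Montreal, Calgary, Edmonton, Halifax, Saskatoon, Victoria, Quebec City, Regina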

# Education and income variables
education_income_vars_2021 = {
    'v_CA21_1': 'population',
    'v_CA21_5825': 'no_certificate',          # No certificate, diploma or degree
    'v_CA21_5831': 'high_school',             # Secondary school diploma or equivalency
    'v_CA21_5840': 'trades_certificate',      # Apprenticeship or trades certificate
    'v_CA21_5849': 'college_diploma',         # College, CEGEP or other non-university certificate
    'v_CA21_5855': 'bachelor_degree',         # Bachelor's degree
    'v_CA21_5858': 'graduate_degree',         # University certificate, diploma or degree above bachelor
    'v_CA21_906': 'median_income',             # Median total income
}

education_income_vars_2016 = {
    'v_CA16_1': 'population',
    'v_CA16_5045': 'no_certificate',
    'v_CA16_5051': 'high_school', 
    'v_CA16_5060': 'trades_certificate',
    'v_CA16_5069': 'college_diploma',
    'v_CA16_5075': 'bachelor_degree',
    'v_CA16_5078': 'graduate_degree',
    'v_CA16_2397': 'median_income',
}

# Housing variables
housing_vars_2021 = {
    'v_CA21_408': 'total_dwellings',
    'v_CA21_409': 'single_detached',
    'v_CA21_410': 'apartment_5plus',
    'v_CA21_411': 'other_attached',
    'v_CA21_434': 'owned',
    'v_CA21_437': 'rented',
}

housing_vars_2016 = {
    'v_CA16_408': 'total_dwellings',
    'v_CA16_409': 'single_detached',
    'v_CA16_410': 'apartment_5plus', 
    'v_CA16_411': 'other_attached',
    'v_CA16_4838': 'owned',
    'v_CA16_4841': 'rented',
}

# Transportation variables
transport_vars_2021 = {
    'v_CA21_7999': 'total_commuters',
    'v_CA21_8002': 'car_driver',
    'v_CA21_8005': 'car_passenger', 
    'v_CA21_8008': 'public_transit',
    'v_CA21_8011': 'walked',
    'v_CA21_8014': 'bicycle',
}

transport_vars_2016 = {
    'v_CA16_7999': 'total_commuters',
    'v_CA16_8002': 'car_driver',
    'v_CA16_8005': 'car_passenger',
    'v_CA16_8008': 'public_transit', 
    'v_CA16_8011': 'walked',
    'v_CA16_8014': 'bicycle',
}

print("Variables defined for comprehensive urban analysis")
print(f"Focusing on {len(major_cmas)} major CMAs")
print(f"Education variables: {len(education_income_vars_2021)}")
print(f"Housing variables: {len(housing_vars_2021)}")
print(f"Transportation variables: {len(transport_vars_2021)}")
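
Because the 2016 and 2021 dictionaries are later concatenated by their clean names, it may help to confirm that both years map to the same set of names before calling the API. A small sketch, assuming nothing beyond plain dictionaries; `check_year_alignment` is a hypothetical helper and the vector codes below are placeholders, not real Census vectors:

```python
def check_year_alignment(vars_a: dict, vars_b: dict) -> dict:
    """Compare the clean names two vector dictionaries map to."""
    names_a, names_b = list(vars_a.values()), list(vars_b.values())
    return {
        "only_in_first": sorted(set(names_a) - set(names_b)),
        "only_in_second": sorted(set(names_b) - set(names_a)),
        # A clean name used twice in one dictionary would create duplicate columns
        "duplicates": sorted({n for n in names_a + names_b
                              if names_a.count(n) > 1 or names_b.count(n) > 1}),
    }

# Placeholder codes for illustration only
demo_2021 = {"v_X21_1": "total_commuters", "v_X21_2": "car_driver", "v_X21_3": "walked"}
demo_2016 = {"v_X16_1": "total_commuters", "v_X16_2": "car_driver", "v_X16_3": "bicycle"}

report = check_year_alignment(demo_2021, demo_2016)
print(report)
```

An empty report for every key suggests the two years can be stacked with `pd.concat` without introducing columns that are missing in one year.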

## Data Collection

We'll gather data for both 2016 and 2021 to examine how these relationships have evolved over time.

In [None]:
def collect_cma_data(year, variables_dict, cma_list):
    """
    Collect data for specific CMAs and variables
    """
    print(f"Collecting {year} data for {len(cma_list)} CMAs...")
    
    data = pc.get_census(
        dataset=f'CA{str(year)[2:]}',
        regions={'CMA': cma_list},
        vectors=list(variables_dict.keys()),
        level='CMA',
        use_cache=True
    )
    
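    # Rename columns for clarity
    rename_dict = {}
    for vector_code, clean_name in variables_dict.items():
        # Find the actual column name (API returns descriptive names).
        # Reject columns where the code is followed by another digit, so a
        # short code such as v_CA21_1 cannot match v_CA21_10, v_CA21_11, ...
        matching_cols = [col for col in data.columns
                         if col.startswith(vector_code)
                         and not col[len(vector_code):len(vector_code) + 1].isdigit()]
        if matching_cols:
            rename_dict[matching_cols[0]] = clean_name
        else:
            print(f"  Warning: no column found for {vector_code} ({clean_name})")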
    
    data = data.rename(columns=rename_dict)
    data['year'] = year
    
    # Handle population column
    if 'Population ' in data.columns:
        data = data.rename(columns={'Population ': 'total_population'})
    elif 'Population' in data.columns:
        data = data.rename(columns={'Population': 'total_population'})
    
    print(f"Collected data for {len(data)} CMAs")
    return data

# Collect all data
print("=" * 50)
education_2021 = collect_cma_data(2021, education_income_vars_2021, major_cmas)
education_2016 = collect_cma_data(2016, education_income_vars_2016, major_cmas)

print("\n" + "=" * 50)
housing_2021 = collect_cma_data(2021, housing_vars_2021, major_cmas)
housing_2016 = collect_cma_data(2016, housing_vars_2016, major_cmas)

print("\n" + "=" * 50)
transport_2021 = collect_cma_data(2021, transport_vars_2021, major_cmas)
transport_2016 = collect_cma_data(2016, transport_vars_2016, major_cmas)

print("\n" + "=" * 50)
print("Data collection complete")
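
Matching returned columns by vector code is sensitive to prefix collisions, since `v_CA21_1` is also a prefix of `v_CA21_10`. The sketch below illustrates the issue on a toy column list; the `code: label` column format and the labels themselves are assumptions made for illustration, and `match_vector_columns` is a hypothetical helper:

```python
def match_vector_columns(columns, vector_codes):
    """Map each vector code to the first column that starts with exactly that code."""
    mapping = {}
    for code in vector_codes:
        for col in columns:
            rest = col[len(code):]
            # Accept the column only if the code is not followed by another digit
            if col.startswith(code) and not rest[:1].isdigit():
                mapping[code] = col
                break
    return mapping

# Hypothetical column labels; the longer code is deliberately listed first
cols = ["GeoUID", "v_CA21_10: Hypothetical label A", "v_CA21_1: Hypothetical label B"]

naive = [c for c in cols if c.startswith("v_CA21_1")][0]
strict = match_vector_columns(cols, ["v_CA21_1"])["v_CA21_1"]

print("naive match: ", naive)
print("strict match:", strict)
```

With a plain `startswith` test, the naive match picks up the longer code whenever it happens to appear first in the frame.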

## Hypothesis 1: The Education Premium Across Metropolitan Areas

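Our first analysis examines whether educational attainment and income move together across different types of metropolitan areas. Because the Census profile data are CMA-level aggregates, this is a cross-city comparison: it shows whether CMAs with more degree holders also report higher median incomes, not the individual-level return to a degree within each city. We expect knowledge economy centers to combine high attainment with high median income.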

In [None]:
# Prepare education data
def prepare_education_analysis(data_2016, data_2021):
    """
    Combine and process education data for analysis
    """
    # Combine years
    combined = pd.concat([data_2016, data_2021], ignore_index=True)
    
    # Calculate education proportions
    education_levels = ['no_certificate', 'high_school', 'trades_certificate', 
                       'college_diploma', 'bachelor_degree', 'graduate_degree']
    
    total_pop_over_15 = combined[education_levels].sum(axis=1)
    
    for level in education_levels:
        combined[f'{level}_pct'] = (combined[level] / total_pop_over_15) * 100
    
    # Calculate higher education rate (bachelor's + graduate)
    combined['higher_ed_pct'] = combined['bachelor_degree_pct'] + combined['graduate_degree_pct']
    
    return combined

education_data = prepare_education_analysis(education_2016, education_2021)

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Education and Economic Outcomes Across Canadian Metropolitan Areas', 
             fontsize=16, fontweight='bold', y=0.98)

# 1. Higher education rates by CMA (2021)
data_2021_ed = education_data[education_data['year'] == 2021].copy()
data_2021_ed = data_2021_ed.sort_values('higher_ed_pct', ascending=True)

axes[0,0].barh(range(len(data_2021_ed)), data_2021_ed['higher_ed_pct'], 
               color='steelblue', alpha=0.7)
axes[0,0].set_yticks(range(len(data_2021_ed)))
axes[0,0].set_yticklabels(data_2021_ed['Region Name'], fontsize=10)
axes[0,0].set_xlabel('Higher Education Rate (%)')
axes[0,0].set_title('Higher Education Attainment (2021)')
axes[0,0].grid(axis='x', alpha=0.3)

# 2. Education vs Income relationship
axes[0,1].scatter(data_2021_ed['higher_ed_pct'], data_2021_ed['median_income'], 
                  s=100, alpha=0.7, color='darkgreen')
# Add trend line
z = np.polyfit(data_2021_ed['higher_ed_pct'], data_2021_ed['median_income'], 1)
p = np.poly1d(z)
axes[0,1].plot(data_2021_ed['higher_ed_pct'], p(data_2021_ed['higher_ed_pct']), 
               "r--", alpha=0.8, linewidth=2)
axes[0,1].set_xlabel('Higher Education Rate (%)')
axes[0,1].set_ylabel('Median Income ($)')
axes[0,1].set_title('Education-Income Relationship (2021)')

# Calculate correlation
correlation = data_2021_ed['higher_ed_pct'].corr(data_2021_ed['median_income'])
axes[0,1].text(0.05, 0.95, f'r = {correlation:.3f}', transform=axes[0,1].transAxes,
               bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

# 3. Change in higher education rates (2016-2021)
pivot_data = education_data.pivot_table(index='Region Name', columns='year', 
                                        values='higher_ed_pct').reset_index()
pivot_data['change'] = pivot_data[2021] - pivot_data[2016]
pivot_data = pivot_data.sort_values('change', ascending=True)

colors = ['red' if x < 0 else 'green' for x in pivot_data['change']]
axes[1,0].barh(range(len(pivot_data)), pivot_data['change'], color=colors, alpha=0.7)
axes[1,0].set_yticks(range(len(pivot_data)))
axes[1,0].set_yticklabels(pivot_data['Region Name'], fontsize=10)
axes[1,0].set_xlabel('Change in Higher Education Rate (percentage points)')
axes[1,0].set_title('Education Growth 2016-2021')
axes[1,0].axvline(x=0, color='black', linestyle='-', alpha=0.3)
axes[1,0].grid(axis='x', alpha=0.3)

# 4. Education composition comparison
education_composition = education_data[education_data['year'] == 2021].copy()
education_composition = education_composition.sort_values('higher_ed_pct')

bottom = np.zeros(len(education_composition))
education_levels = ['no_certificate_pct', 'high_school_pct', 'trades_certificate_pct', 
                   'college_diploma_pct', 'bachelor_degree_pct', 'graduate_degree_pct']
colors = ['#d62728', '#ff7f0e', '#2ca02c', '#1f77b4', '#9467bd', '#8c564b']
labels = ['No Certificate', 'High School', 'Trades', 'College', "Bachelor's", 'Graduate']

for level, color, label in zip(education_levels, colors, labels):
    axes[1,1].barh(range(len(education_composition)), education_composition[level], 
                   left=bottom, color=color, alpha=0.8, label=label)
    # Accumulate as a plain array so the running total is not re-aligned on the DataFrame index
    bottom += education_composition[level].to_numpy()

axes[1,1].set_yticks(range(len(education_composition)))
axes[1,1].set_yticklabels(education_composition['Region Name'], fontsize=10)
axes[1,1].set_xlabel('Percentage of Population')
axes[1,1].set_title('Education Composition by CMA (2021)')
axes[1,1].legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=9)

plt.tight_layout()
plt.show()

# Print key findings
print("\nKey Findings - Education Analysis:")
print(f"Highest higher education rate: {data_2021_ed.iloc[-1]['Region Name']} ({data_2021_ed.iloc[-1]['higher_ed_pct']:.1f}%)")
print(f"Lowest higher education rate: {data_2021_ed.iloc[0]['Region Name']} ({data_2021_ed.iloc[0]['higher_ed_pct']:.1f}%)")
print(f"Education-income correlation: {correlation:.3f}")
print(f"Largest education growth: {pivot_data.iloc[-1]['Region Name']} ({pivot_data.iloc[-1]['change']:+.1f} pts)")
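
With only ten CMAs, a single Pearson coefficient can hinge on one city. One way to gauge that, sketched here as an illustration rather than as part of the original analysis, is to recompute the correlation with each city left out in turn. The arrays hold made-up values for ten hypothetical CMAs, and `leave_one_out_r` is a hypothetical helper:

```python
import numpy as np

def leave_one_out_r(x, y):
    """Pearson r recomputed n times, dropping one observation each time."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep_masks = ~np.eye(len(x), dtype=bool)  # row i keeps every point except i
    return np.array([np.corrcoef(x[m], y[m])[0, 1] for m in keep_masks])

# Made-up values for ten hypothetical CMAs (not Census figures)
higher_ed = np.array([22.0, 25.0, 27.0, 30.0, 31.0, 33.0, 35.0, 36.0, 38.0, 41.0])
income = np.array([38.0, 43.0, 39.0, 41.0, 44.0, 40.0, 45.0, 42.0, 43.0, 52.0])  # $ thousands

full_r = np.corrcoef(higher_ed, income)[0, 1]
loo = leave_one_out_r(higher_ed, income)
print(f"r = {full_r:.3f}; leave-one-out range: {loo.min():.3f} to {loo.max():.3f}")
```

A wide leave-one-out range suggests the reported `r` is driven by one or two CMAs and should be read cautiously.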

## Hypothesis 2: Housing Density and Economic Transformation

Housing markets often reflect underlying economic changes. Growing, economically dynamic cities tend to see increased density as land values rise and populations grow. Let's examine how housing characteristics have evolved across Canadian CMAs.

In [None]:
# Prepare housing data
def prepare_housing_analysis(data_2016, data_2021):
    """
    Combine and process housing data for analysis
    """
    # Combine years
    combined = pd.concat([data_2016, data_2021], ignore_index=True)
    
    # Calculate housing type proportions
    combined['single_detached_pct'] = (combined['single_detached'] / combined['total_dwellings']) * 100
    combined['apartment_5plus_pct'] = (combined['apartment_5plus'] / combined['total_dwellings']) * 100
    combined['other_attached_pct'] = (combined['other_attached'] / combined['total_dwellings']) * 100
    
    # Calculate ownership rates
    total_tenure = combined['owned'] + combined['rented']
    combined['ownership_rate'] = (combined['owned'] / total_tenure) * 100
    
    # Dwellings per 1,000 residents: a rough proxy only, since it tracks
    # household size rather than dwellings per unit of land
    combined['dwellings_per_1000'] = (combined['total_dwellings'] / combined['total_population']) * 1000
    
    return combined

housing_data = prepare_housing_analysis(housing_2016, housing_2021)

# Create housing analysis visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Housing Transformation Across Canadian Metropolitan Areas', 
             fontsize=16, fontweight='bold', y=0.98)

# 1. Housing type composition (2021)
housing_2021 = housing_data[housing_data['year'] == 2021].copy()
housing_2021 = housing_2021.sort_values('apartment_5plus_pct')

x = range(len(housing_2021))
width = 0.6

axes[0,0].barh(x, housing_2021['single_detached_pct'], width, label='Single Detached', 
               color='lightcoral', alpha=0.8)
axes[0,0].barh(x, housing_2021['apartment_5plus_pct'], width, 
               left=housing_2021['single_detached_pct'], label='Apartments (5+ storeys)', 
               color='skyblue', alpha=0.8)
axes[0,0].barh(x, housing_2021['other_attached_pct'], width,
               left=housing_2021['single_detached_pct'] + housing_2021['apartment_5plus_pct'],
               label='Other Attached', color='lightgreen', alpha=0.8)

axes[0,0].set_yticks(x)
axes[0,0].set_yticklabels(housing_2021['Region Name'], fontsize=10)
axes[0,0].set_xlabel('Percentage of Housing Stock')
axes[0,0].set_title('Housing Type Composition (2021)')
axes[0,0].legend()

# 2. Change in apartment share (2016-2021)
apartment_change = housing_data.pivot_table(index='Region Name', columns='year', 
                                           values='apartment_5plus_pct').reset_index()
apartment_change['change'] = apartment_change[2021] - apartment_change[2016]
apartment_change = apartment_change.sort_values('change', ascending=True)

colors = ['red' if x < 0 else 'blue' for x in apartment_change['change']]
axes[0,1].barh(range(len(apartment_change)), apartment_change['change'], 
               color=colors, alpha=0.7)
axes[0,1].set_yticks(range(len(apartment_change)))
axes[0,1].set_yticklabels(apartment_change['Region Name'], fontsize=10)
axes[0,1].set_xlabel('Change in Apartment Share (percentage points)')
axes[0,1].set_title('Densification Trends 2016-2021')
axes[0,1].axvline(x=0, color='black', linestyle='-', alpha=0.3)
axes[0,1].grid(axis='x', alpha=0.3)

# 3. Ownership rates vs population growth
# Calculate population growth
pop_change = housing_data.pivot_table(index='Region Name', columns='year', 
                                     values='total_population').reset_index()
pop_change['pop_growth'] = ((pop_change[2021] - pop_change[2016]) / pop_change[2016]) * 100

ownership_2021 = housing_data[housing_data['year'] == 2021][['Region Name', 'ownership_rate']]
growth_ownership = pop_change.merge(ownership_2021, on='Region Name')

axes[1,0].scatter(growth_ownership['pop_growth'], growth_ownership['ownership_rate'], 
                  s=100, alpha=0.7, color='purple')

# Add labels for each point
for i, row in growth_ownership.iterrows():
    axes[1,0].annotate(row['Region Name'].split(',')[0], 
                      (row['pop_growth'], row['ownership_rate']),
                      xytext=(5, 5), textcoords='offset points', fontsize=9, alpha=0.8)

axes[1,0].set_xlabel('Population Growth 2016-2021 (%)')
axes[1,0].set_ylabel('Homeownership Rate (%)')
axes[1,0].set_title('Growth vs Homeownership')
axes[1,0].grid(alpha=0.3)

# 4. Housing density trends
density_data = housing_data.pivot_table(index='Region Name', columns='year', 
                                        values='dwellings_per_1000').reset_index()
density_data['density_change'] = density_data[2021] - density_data[2016]
density_data = density_data.sort_values('density_change', ascending=True)

colors = ['red' if x < 0 else 'orange' for x in density_data['density_change']]
axes[1,1].barh(range(len(density_data)), density_data['density_change'], 
               color=colors, alpha=0.7)
axes[1,1].set_yticks(range(len(density_data)))
axes[1,1].set_yticklabels(density_data['Region Name'], fontsize=10)
axes[1,1].set_xlabel('Change in Dwellings per 1000 People')
axes[1,1].set_title('Housing Density Change 2016-2021')
axes[1,1].axvline(x=0, color='black', linestyle='-', alpha=0.3)
axes[1,1].grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

# Print key findings
print("\nKey Findings - Housing Analysis:")
highest_apt = housing_2021.iloc[-1]
print(f"Highest apartment share: {highest_apt['Region Name']} ({highest_apt['apartment_5plus_pct']:.1f}%)")
biggest_densification = apartment_change.iloc[-1]
print(f"Largest densification: {biggest_densification['Region Name']} ({biggest_densification['change']:+.1f} pts)")
fastest_growth = growth_ownership.loc[growth_ownership['pop_growth'].idxmax()]
print(f"Fastest growing CMA: {fastest_growth['Region Name']} ({fastest_growth['pop_growth']:.1f}% growth)")
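
The `dwellings_per_1000` measure is the reciprocal of persons per dwelling, so it mostly reflects household size. The sketch below, using two made-up regions, shows that it can be identical for places with very different physical density; `land_area_km2` is a hypothetical column that the data pull above does not include:

```python
import pandas as pd

# Two invented regions: same population and dwelling count, different land area
toy = pd.DataFrame({
    "Region Name": ["Compact", "Sprawling"],
    "total_population": [1_000_000, 1_000_000],
    "total_dwellings": [400_000, 400_000],
    "land_area_km2": [500.0, 5_000.0],  # hypothetical column
})

toy["dwellings_per_1000"] = toy["total_dwellings"] / toy["total_population"] * 1000
toy["persons_per_dwelling"] = toy["total_population"] / toy["total_dwellings"]
toy["dwellings_per_km2"] = toy["total_dwellings"] / toy["land_area_km2"]

print(toy[["Region Name", "dwellings_per_1000",
           "persons_per_dwelling", "dwellings_per_km2"]].to_string(index=False))
```

If a land-area field is available for the CMAs, dwellings per square kilometre would be a more direct density measure than the per-capita proxy.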

## Hypothesis 3: Commuting Patterns and Economic Structure

Transportation choices reflect both urban form and economic structure. Knowledge economy centers often have better public transit infrastructure and more walkable urban cores, while resource-dependent cities may remain more car-centric.

In [None]:
# Prepare transportation data
def prepare_transport_analysis(data_2016, data_2021):
    """
    Combine and process transportation data for analysis
    """
    # Combine years
    combined = pd.concat([data_2016, data_2021], ignore_index=True)
    
    # Calculate transportation mode shares
    combined['car_driver_pct'] = (combined['car_driver'] / combined['total_commuters']) * 100
    combined['public_transit_pct'] = (combined['public_transit'] / combined['total_commuters']) * 100
    combined['active_transport_pct'] = ((combined['walked'] + combined['bicycle']) / combined['total_commuters']) * 100
    combined['car_total_pct'] = ((combined['car_driver'] + combined['car_passenger']) / combined['total_commuters']) * 100
    
    # Create sustainable transport index (public transit + active)
    combined['sustainable_transport_pct'] = combined['public_transit_pct'] + combined['active_transport_pct']
    
    return combined

transport_data = prepare_transport_analysis(transport_2016, transport_2021)

# Create transportation analysis visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Transportation Patterns and Urban Structure in Canadian Cities', 
             fontsize=16, fontweight='bold', y=0.98)

# 1. Public transit usage by CMA (2021)
transport_2021 = transport_data[transport_data['year'] == 2021].copy()
transport_2021 = transport_2021.sort_values('public_transit_pct', ascending=True)

axes[0,0].barh(range(len(transport_2021)), transport_2021['public_transit_pct'], 
               color='green', alpha=0.7)
axes[0,0].set_yticks(range(len(transport_2021)))
axes[0,0].set_yticklabels(transport_2021['Region Name'], fontsize=10)
axes[0,0].set_xlabel('Public Transit Mode Share (%)')
axes[0,0].set_title('Public Transit Usage (2021)')
axes[0,0].grid(axis='x', alpha=0.3)

# 2. Transportation mode composition
transport_comp = transport_2021.sort_values('sustainable_transport_pct')

x = range(len(transport_comp))
width = 0.6

axes[0,1].barh(x, transport_comp['car_total_pct'], width, label='Car (Driver + Passenger)', 
               color='red', alpha=0.8)
axes[0,1].barh(x, transport_comp['public_transit_pct'], width, 
               left=transport_comp['car_total_pct'], label='Public Transit', 
               color='green', alpha=0.8)
axes[0,1].barh(x, transport_comp['active_transport_pct'], width,
               left=transport_comp['car_total_pct'] + transport_comp['public_transit_pct'],
               label='Active Transport', color='blue', alpha=0.8)

axes[0,1].set_yticks(x)
axes[0,1].set_yticklabels(transport_comp['Region Name'], fontsize=10)
axes[0,1].set_xlabel('Mode Share (%)')
axes[0,1].set_title('Transportation Mode Composition (2021)')
axes[0,1].legend()

# 3. Change in sustainable transport (2016-2021)
sustainable_change = transport_data.pivot_table(index='Region Name', columns='year', 
                                               values='sustainable_transport_pct').reset_index()
sustainable_change['change'] = sustainable_change[2021] - sustainable_change[2016]
sustainable_change = sustainable_change.sort_values('change', ascending=True)

colors = ['red' if x < 0 else 'green' for x in sustainable_change['change']]
axes[1,0].barh(range(len(sustainable_change)), sustainable_change['change'], 
               color=colors, alpha=0.7)
axes[1,0].set_yticks(range(len(sustainable_change)))
axes[1,0].set_yticklabels(sustainable_change['Region Name'], fontsize=10)
axes[1,0].set_xlabel('Change in Sustainable Transport Share (percentage points)')
axes[1,0].set_title('Sustainable Transport Trends 2016-2021')
axes[1,0].axvline(x=0, color='black', linestyle='-', alpha=0.3)
axes[1,0].grid(axis='x', alpha=0.3)

# 4. Transit usage vs dwellings-per-capita proxy
# Merge with housing data to get the proxy (separate name so housing_2021 is not overwritten)
density_2021 = housing_data[housing_data['year'] == 2021][['Region Name', 'dwellings_per_1000']]
transport_density = transport_2021.merge(density_2021, on='Region Name')

axes[1,1].scatter(transport_density['dwellings_per_1000'], transport_density['public_transit_pct'], 
                  s=100, alpha=0.7, color='darkblue')

# Add labels for each point
for i, row in transport_density.iterrows():
    axes[1,1].annotate(row['Region Name'].split(',')[0], 
                      (row['dwellings_per_1000'], row['public_transit_pct']),
                      xytext=(5, 5), textcoords='offset points', fontsize=9, alpha=0.8)

# Add trend line
z = np.polyfit(transport_density['dwellings_per_1000'], transport_density['public_transit_pct'], 1)
p = np.poly1d(z)
axes[1,1].plot(transport_density['dwellings_per_1000'], p(transport_density['dwellings_per_1000']), 
               "r--", alpha=0.8, linewidth=2)

correlation = transport_density['dwellings_per_1000'].corr(transport_density['public_transit_pct'])
axes[1,1].text(0.05, 0.95, f'r = {correlation:.3f}', transform=axes[1,1].transAxes,
               bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

axes[1,1].set_xlabel('Dwellings per 1000 People (Density Proxy)')
axes[1,1].set_ylabel('Public Transit Mode Share (%)')
axes[1,1].set_title('Density vs Transit Usage')
axes[1,1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Print key findings
print("\nKey Findings - Transportation Analysis:")
highest_transit = transport_2021.iloc[-1]
print(f"Highest transit usage: {highest_transit['Region Name']} ({highest_transit['public_transit_pct']:.1f}%)")
biggest_sustainable_growth = sustainable_change.iloc[-1]
print(f"Largest sustainable transport growth: {biggest_sustainable_growth['Region Name']} ({biggest_sustainable_growth['change']:+.1f} pts)")
print(f"Density-transit correlation: {correlation:.3f}")
most_car_dependent = transport_2021.loc[transport_2021['car_total_pct'].idxmax()]
print(f"Most car-dependent: {most_car_dependent['Region Name']} ({most_car_dependent['car_total_pct']:.1f}% car usage)")
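
The five modes tracked above need not add up to all commuters, because the Census also records other methods of commuting. A quick sketch of how the residual share could be computed and checked, using invented counts for two hypothetical regions:

```python
import pandas as pd

modes = ["car_driver", "car_passenger", "public_transit", "walked", "bicycle"]

# Invented commuter counts (not Census figures)
toy = pd.DataFrame({
    "Region Name": ["Alpha", "Beta"],
    "total_commuters": [1000, 2000],
    "car_driver": [600, 1500],
    "car_passenger": [50, 120],
    "public_transit": [200, 200],
    "walked": [80, 100],
    "bicycle": [30, 20],
})

# Share of each listed mode, then whatever is left over
shares = toy[modes].div(toy["total_commuters"], axis=0) * 100
other_pct = 100 - shares.sum(axis=1)
shares["other_pct"] = other_pct

print(shares.round(1).assign(region=toy["Region Name"]).to_string(index=False))
```

A negative residual would indicate that the numerator and denominator vectors do not belong to the same table, which is a useful sanity check on the vector selection.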

## Synthesis: Mapping Canada's Urban Transformation

Let's create a comprehensive view that synthesizes our findings across all three dimensions of urban change.

In [None]:
# Create comprehensive synthesis analysis
def create_urban_typology():
    """
    Combine all three dimensions to create urban typology
    """
    # Get 2021 data for each dimension
    ed_2021 = education_data[education_data['year'] == 2021][['Region Name', 'higher_ed_pct', 'median_income']]
    housing_2021_sub = housing_data[housing_data['year'] == 2021][['Region Name', 'apartment_5plus_pct', 'ownership_rate']]
    transport_2021_sub = transport_data[transport_data['year'] == 2021][['Region Name', 'public_transit_pct', 'sustainable_transport_pct']]
    
    # Merge all data
    urban_profile = ed_2021.merge(housing_2021_sub, on='Region Name')
    urban_profile = urban_profile.merge(transport_2021_sub, on='Region Name')
    
    # Clean city names
    urban_profile['City'] = urban_profile['Region Name'].str.split(',').str[0]
    
    # Calculate composite scores (normalized)
    from sklearn.preprocessing import StandardScaler
    
    scaler = StandardScaler()
    features = ['higher_ed_pct', 'apartment_5plus_pct', 'public_transit_pct']
    urban_profile[['education_score', 'density_score', 'transit_score']] = scaler.fit_transform(urban_profile[features])
    
    # Create composite "urban modernity" index
    urban_profile['urban_modernity_index'] = (urban_profile['education_score'] + 
                                             urban_profile['density_score'] + 
                                             urban_profile['transit_score']) / 3
    
    return urban_profile

urban_typology = create_urban_typology()

# Create final synthesis visualization
fig = plt.figure(figsize=(20, 12))
gs = fig.add_gridspec(3, 4, hspace=0.3, wspace=0.3)

# Main scatter plot: Education vs Transit with Housing as color
ax_main = fig.add_subplot(gs[0:2, 0:2])
scatter = ax_main.scatter(urban_typology['higher_ed_pct'], urban_typology['public_transit_pct'], 
                         c=urban_typology['apartment_5plus_pct'], s=200, alpha=0.8, 
                         cmap='viridis', edgecolors='white', linewidth=2)

# Add city labels
for i, row in urban_typology.iterrows():
    ax_main.annotate(row['City'], (row['higher_ed_pct'], row['public_transit_pct']),
                    xytext=(8, 8), textcoords='offset points', fontsize=11, 
                    fontweight='bold', alpha=0.9)

ax_main.set_xlabel('Higher Education Rate (%)', fontsize=12)
ax_main.set_ylabel('Public Transit Usage (%)', fontsize=12)
ax_main.set_title('Canadian Metropolitan Areas: Education, Transit, and Housing Density', 
                 fontsize=14, fontweight='bold')
ax_main.grid(alpha=0.3)

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax_main)
cbar.set_label('Apartment Share (%)', fontsize=11)

# Urban modernity ranking
ax_rank = fig.add_subplot(gs[0:2, 2])
ranking = urban_typology.sort_values('urban_modernity_index', ascending=True)
colors = plt.cm.RdYlGn(np.linspace(0.2, 0.8, len(ranking)))

bars = ax_rank.barh(range(len(ranking)), ranking['urban_modernity_index'], color=colors, alpha=0.8)
ax_rank.set_yticks(range(len(ranking)))
ax_rank.set_yticklabels(ranking['City'], fontsize=11)
ax_rank.set_xlabel('Urban Modernity Index', fontsize=11)
ax_rank.set_title('Urban Modernity Ranking\n(Education + Density + Transit)', fontsize=12, fontweight='bold')
ax_rank.grid(axis='x', alpha=0.3)
ax_rank.axvline(x=0, color='black', linestyle='-', alpha=0.5)

# Regional comparison radar chart
ax_radar = fig.add_subplot(gs[0:2, 3], projection='polar')

# Select top 5 cities for radar
top_cities = ranking.tail(5)
categories = ['Education', 'Density', 'Transit']
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]  # Complete the circle

colors_radar = ['red', 'blue', 'green', 'purple', 'orange']
for i, (_, city) in enumerate(top_cities.iterrows()):
    values = [city['education_score'], city['density_score'], city['transit_score']]
    values += values[:1]  # Complete the circle
    
    ax_radar.plot(angles, values, 'o-', linewidth=2, label=city['City'], color=colors_radar[i])
    ax_radar.fill(angles, values, alpha=0.1, color=colors_radar[i])

ax_radar.set_xticks(angles[:-1])
ax_radar.set_xticklabels(categories, fontsize=11)
ax_radar.set_title('Top 5 Cities Profile\n(Standardized Scores)', fontsize=12, fontweight='bold', pad=20)
ax_radar.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0))
ax_radar.grid(True)

# Summary statistics table
ax_table = fig.add_subplot(gs[2, :])
ax_table.axis('off')

# Create summary statistics
summary_stats = {
    'Dimension': ['Higher Education Rate (%)', 'Apartment Share (%)', 'Public Transit Usage (%)', 
                 'Median Income ($)', 'Urban Modernity Index'],
    'Highest': [
        f"{urban_typology.loc[urban_typology['higher_ed_pct'].idxmax(), 'City']}: {urban_typology['higher_ed_pct'].max():.1f}%",
        f"{urban_typology.loc[urban_typology['apartment_5plus_pct'].idxmax(), 'City']}: {urban_typology['apartment_5plus_pct'].max():.1f}%",
        f"{urban_typology.loc[urban_typology['public_transit_pct'].idxmax(), 'City']}: {urban_typology['public_transit_pct'].max():.1f}%",
        f"{urban_typology.loc[urban_typology['median_income'].idxmax(), 'City']}: ${urban_typology['median_income'].max():,.0f}",
        f"{urban_typology.loc[urban_typology['urban_modernity_index'].idxmax(), 'City']}: {urban_typology['urban_modernity_index'].max():.2f}"
    ],
    'Lowest': [
        f"{urban_typology.loc[urban_typology['higher_ed_pct'].idxmin(), 'City']}: {urban_typology['higher_ed_pct'].min():.1f}%",
        f"{urban_typology.loc[urban_typology['apartment_5plus_pct'].idxmin(), 'City']}: {urban_typology['apartment_5plus_pct'].min():.1f}%",
        f"{urban_typology.loc[urban_typology['public_transit_pct'].idxmin(), 'City']}: {urban_typology['public_transit_pct'].min():.1f}%",
        f"{urban_typology.loc[urban_typology['median_income'].idxmin(), 'City']}: ${urban_typology['median_income'].min():,.0f}",
        f"{urban_typology.loc[urban_typology['urban_modernity_index'].idxmin(), 'City']}: {urban_typology['urban_modernity_index'].min():.2f}"
    ]
}

summary_df = pd.DataFrame(summary_stats)
table = ax_table.table(cellText=summary_df.values, colLabels=summary_df.columns,
                      cellLoc='center', loc='center', bbox=[0, 0, 1, 1])
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 2)

# Style the table
for i in range(len(summary_df.columns)):
    table[(0, i)].set_facecolor('#4CAF50')
    table[(0, i)].set_text_props(weight='bold', color='white')

plt.suptitle("The Geography of Economic Opportunity: Canada's Urban Transformation 2016-2021",
             fontsize=18, fontweight='bold', y=0.98)

plt.show()

# Print final insights
print("\n" + "=" * 80)
print("FINAL INSIGHTS: Canada's Urban Transformation")
print("=" * 80)

print(f"\nMost 'Modern' Urban Area: {ranking.iloc[-1]['City']} (Index: {ranking.iloc[-1]['urban_modernity_index']:.2f})")
print(f"Most 'Traditional' Urban Area: {ranking.iloc[0]['City']} (Index: {ranking.iloc[0]['urban_modernity_index']:.2f})")

print("\nKey Patterns Identified:")
print("1. Knowledge economy centers (Toronto, Vancouver, Ottawa) combine high educational attainment with high transit usage")
print("2. Western resource cities maintain lower density and higher car dependence")
print("3. Montreal combines European-style density with high public transit usage")
print("4. Smaller cities show diverse patterns reflecting local economic structures")

print("\nPolicy Implications:")
print("- Transit usage correlates with educational and economic outcomes")
print("- Housing densification varies significantly with economic structure")
print("- Regional economic diversification may require urban form transformation")

## Conclusions

This analysis reveals three key insights about Canada's evolving urban geography:

### The Education-Economy Nexus is Place-Dependent

On our first question, the economic returns to higher education vary substantially across metropolitan contexts. Knowledge economy centers like Toronto, Vancouver, and Ottawa show the strongest education-income relationships, while resource-dependent cities often maintain high incomes despite lower educational attainment rates. This pattern is consistent with natural resource endowments continuing to shape Canada's economic geography, even as the knowledge economy grows.
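
One way to make the comparison of education-income relationships concrete is to fit a separate least-squares slope for each city. The sketch below is illustrative only: it assumes sub-CMA observations (for example census tracts), which this analysis does not necessarily use, and the `tracts` table, its values, and the `education_income_slope` helper are hypothetical. Only the column names `City`, `higher_ed_pct`, and `median_income` follow the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical tract-level data: the numbers are made up for illustration,
# they are NOT Census values.
tracts = pd.DataFrame({
    "City": ["A"] * 4 + ["B"] * 4,
    "higher_ed_pct": [20, 30, 40, 50, 20, 30, 40, 50],
    "median_income": [30000, 40000, 50000, 60000, 52000, 53000, 54000, 55000],
})


def education_income_slope(group: pd.DataFrame) -> float:
    """Least-squares slope: dollars of median income per percentage point of higher education."""
    slope, _intercept = np.polyfit(group["higher_ed_pct"], group["median_income"], deg=1)
    return float(slope)


# One slope per city; a steeper slope means a stronger education-income relationship.
slopes = {city: education_income_slope(group) for city, group in tracts.groupby("City")}

for city, slope in slopes.items():
    print(f"{city}: ${slope:,.0f} per percentage point")
```

In this toy data, city B has the higher incomes but the flatter slope, which is the kind of profile the paragraph above attributes to resource-dependent cities.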

### Housing Markets Reflect Economic Dynamism

On our second question, the evidence is mixed. Growing metropolitan areas tend to show increased housing density, but the relationship is not uniform. Vancouver and Toronto's high apartment shares reflect both economic growth and geographic constraints, while Montreal's density reflects a different history of development. Notably, some of the fastest-growing areas (like Calgary) have seen density decrease, which suggests that Canadian urban growth often proceeds through suburban expansion rather than intensification.

### Transportation Patterns Reveal Urban Maturity

On our third question, the evidence is strongest: commuting patterns clearly separate different types of urban economies. Montreal leads in public transit usage, followed by Toronto and Vancouver, while western resource cities remain highly car-dependent. The correlation between density and transit usage is consistent with urban form and transportation infrastructure co-evolving, although a cross-sectional correlation cannot establish the direction of that relationship. The pattern has important implications for climate policy and urban planning.
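
The density-transit correlation mentioned above can be computed as a Pearson coefficient across cities. This is a minimal sketch, assuming a table shaped like `urban_typology` with the `apartment_5plus_pct` and `public_transit_pct` columns. The rows shown are placeholders, not Census values.

```python
import pandas as pd

# Placeholder rows standing in for urban_typology; values are illustrative only.
cities = pd.DataFrame({
    "City": ["A", "B", "C", "D", "E"],
    "apartment_5plus_pct": [5.0, 10.0, 20.0, 30.0, 40.0],
    "public_transit_pct": [2.0, 4.0, 9.0, 14.0, 20.0],
})

# Pearson correlation between apartment share (a density proxy) and transit usage.
r = cities["apartment_5plus_pct"].corr(cities["public_transit_pct"])
print(f"Density-transit correlation: r = {r:.2f} (n = {len(cities)})")
```

With only a handful of metropolitan areas, a coefficient like this is descriptive rather than inferential, so it is worth reporting the number of cities alongside it.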

### Implications for Policy and Planning

These findings suggest that Canada's major metropolitan areas are diverging rather than converging in their urban characteristics. Knowledge economy centers are becoming more dense, transit-oriented, and education-focused, while resource-dependent cities maintain more traditional suburban, car-oriented patterns. This divergence has important implications for federal and provincial policies around infrastructure investment, climate action, and economic development.
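
A simple way to check a divergence claim like this is to compare the cross-city spread of an indicator in each census year, an approach sometimes called sigma-convergence. The sketch below assumes a wide table with one column per year. The column names `transit_2016` and `transit_2021`, the helper `dispersion_change`, and all values are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format panel: one row per city, one column per census year.
# Values are illustrative only, not Census figures.
panel = pd.DataFrame({
    "City": ["A", "B", "C"],
    "transit_2016": [10.0, 15.0, 20.0],
    "transit_2021": [6.0, 15.0, 24.0],
})


def dispersion_change(df: pd.DataFrame, col_start: str, col_end: str) -> float:
    """Change in the cross-city sample standard deviation between two years.

    A positive value means cities moved further apart (divergence);
    a negative value means they moved closer together (convergence).
    """
    return float(df[col_end].std() - df[col_start].std())


change = dispersion_change(panel, "transit_2016", "transit_2021")
label = "diverging" if change > 0 else "converging or unchanged"
print(f"Change in cross-city spread: {change:+.1f} points ({label})")
```

Applying the same comparison to each indicator (education, apartment share, transit usage) would show whether the divergence holds across all three dimensions or only some of them.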

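The 2021 Census was collected during the COVID-19 pandemic, so some of the 2016-2021 changes, particularly in commuting, may reflect temporary disruption rather than lasting structural change. The pandemic may accelerate some of these trends while disrupting others. Future research should examine how remote work and changing commuting patterns affect these urban transformation processes.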

---

*This analysis demonstrates the power of Census data for understanding complex urban transformations. By combining demographic, economic, and infrastructure data, we can move beyond anecdotal accounts to identify the underlying patterns that shape Canada's urban future.*