# Housing Density and Demographics Analysis: Vancouver CMA

## Research Question
**How does housing density (apartments vs single-family homes) relate to demographic patterns across Vancouver's Census Metropolitan Area?**

### Objectives:
1. Map housing types across Vancouver CMA Census Subdivisions
2. Analyze population density patterns
3. Examine age demographics in high-density vs low-density areas
4. Compare 2016 vs 2021 trends
5. Visualize findings with maps and charts

### Data Sources:
- **pycancensus**: Canadian Census data via CensusMapper API
- **Census years**: 2016 (CA16) and 2021 (CA21)
- **Geographic level**: Census Subdivisions (CSD) within Vancouver CMA
- **Variables**: Housing types, population, age groups

## Setup and Data Loading

In [ ]:
# Import required libraries
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Import pycancensus
%load_ext autoreload
%autoreload 2
import pycancensus as pc

print("📊 Libraries loaded successfully!")
print(f"🔑 API key status: {'✅ Set' if pc.get_api_key() else '❌ Not set'}")

# Note: Install pycancensus with: pip install git+https://github.com/dshkol/pycancensus.git

## 1. Data Collection

Let's start by exploring available datasets and collecting housing and demographic data for Vancouver CMA.

In [2]:
# Check available datasets
print("📋 Available Census datasets:")
datasets = pc.list_census_datasets()
census_datasets = datasets[datasets['dataset'].str.contains(r'CA\d{2}$', regex=True)]
print(census_datasets[['dataset', 'description']].to_string(index=False))

# We'll focus on 2016 and 2021 Census
VANCOUVER_CMA = '59933'  # Vancouver CMA region code
print(f"\n🎯 Analyzing Vancouver CMA (region {VANCOUVER_CMA})")

📋 Available Census datasets:
Querying CensusMapper API for available datasets...
Retrieved 29 datasets
dataset                description
   CA01         2001 Canada Census
   CA06         2006 Canada Census
   CA11 2011 Canada Census and NHS
   CA16         2016 Canada Census
   CA21         2021 Canada Census

🎯 Analyzing Vancouver CMA (region 59933)


In [ ]:
# Use the vector search functions to discover relevant variables
print("🔍 Searching for housing and demographic variables...")

# Search for housing-related variables using enhanced search
housing_search = pc.find_census_vectors("CA21", "dwelling")
print(f"Found {len(housing_search)} dwelling-related variables")

# Search for demographic variables  
demo_search = pc.search_census_vectors("population", "CA21")
age_search = pc.search_census_vectors("age", "CA21")

print(f"Found {len(demo_search)} population variables")
print(f"Found {len(age_search)} age-related variables")

# Define our analysis vectors for 2021 Census
# (check each code and label against the search results above before relying on them)
housing_vectors_2021 = [
    'v_CA21_4868',  # Total private dwellings by structural type
    'v_CA21_4869',  # Single-detached house
    'v_CA21_4872',  # Apartment building, five or more storeys
    'v_CA21_4870',  # Semi-detached house  
    'v_CA21_4871',  # Row house
    'v_CA21_4873',  # Apartment building, fewer than five storeys
]

demographic_vectors_2021 = [
    'v_CA21_1',     # Total population
    'v_CA21_8',     # Population aged 0-14
    'v_CA21_23',    # Population aged 15-64  
    'v_CA21_31',    # Population aged 65+
]

all_vectors_2021 = housing_vectors_2021 + demographic_vectors_2021

print(f"📊 Selected {len(all_vectors_2021)} variables for analysis:")
print("Housing variables:", len(housing_vectors_2021))
print("Demographic variables:", len(demographic_vectors_2021))

In [ ]:
# Collect 2021 Census data with geography
print("🔄 Fetching 2021 Census data with geography...")
vancouver_2021 = pc.get_census(
    dataset='CA21',
    regions={'CMA': VANCOUVER_CMA},
    vectors=all_vectors_2021,
    level='CSD',  # Census Subdivision level
    geo_format='geopandas'
)

print(f"✅ 2021 data loaded: {vancouver_2021.shape[0]} regions, {vancouver_2021.shape[1]} columns")
print(f"📍 Geographic data: {vancouver_2021.crs if vancouver_2021.crs else 'No CRS specified'}")
print(f"🏘️  Sample regions: {vancouver_2021['Region Name'].head(3).tolist()}")

# Show available columns
print(f"\n📋 Available data columns:")
for col in vancouver_2021.columns:
    print(f"  {col}")

In [5]:
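# Collect a minimal 2021 Census pull for the 2016 vs 2021 comparison
# (this reassigns vancouver_2021, replacing the fuller pull from the previous cell)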
print("🔄 Fetching 2021 Census data...")

# 2021 vectors (may have different codes)
basic_vectors_2021 = [
    'v_CA21_1',    # Total population 2021
    'v_CA21_408',  # Try housing data (codes might be different)
]

try:
    vancouver_2021 = pc.get_census(
        dataset='CA21',
        regions={'CMA': VANCOUVER_CMA},
        vectors=basic_vectors_2021,
        level='CSD',
        geo_format='geopandas'
    )
    print(f"✅ 2021 data loaded: {vancouver_2021.shape[0]} regions, {vancouver_2021.shape[1]} columns")
    has_2021_data = True
except Exception as e:
    print(f"⚠️  2021 data collection issue: {e}")
    print("📊 Continuing with 2016 data analysis...")
    has_2021_data = False

🔄 Fetching 2021 Census data...
Querying CensusMapper API...
Retrieved data for 38 regions
✅ 2021 data loaded: 38 regions, 15 columns
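
The cells that follow read from `vancouver_2016`, which none of the cells above create. Below is a minimal sketch of how a 2016 pull could mirror the 2021 one, assuming `pc.get_census` takes the same arguments for `CA16`. The helper name `census_request` is hypothetical, and the vector list is left empty because this notebook does not give the CA16 codes.

```python
def census_request(dataset, cma_code, vectors=(), level="CSD"):
    """Build keyword arguments shaped like the get_census calls used above."""
    return {
        "dataset": dataset,
        "regions": {"CMA": cma_code},
        "vectors": list(vectors),
        "level": level,
        "geo_format": "geopandas",
    }


# Hypothetical usage: the CA16 vector codes are not listed in this notebook,
# so the vector list stays empty here
request_2016 = census_request("CA16", "59933")
print(request_2016)
```

With an API key set, and assuming `get_census` accepts an empty vector list, `vancouver_2016 = pc.get_census(**request_2016)` would then provide the frame used in Section 2.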


## 2. Data Exploration and Cleaning

In [6]:
# Explore the 2016 dataset structure
print("📋 Dataset Overview - Vancouver CMA 2016")
print(f"Shape: {vancouver_2016.shape}")
print(f"\n📊 Column types:")
print(vancouver_2016.dtypes)

print(f"\n🏘️  Region summary:")
print(f"Total regions: {len(vancouver_2016)}")
# Base columns arrive as strings (see the dtypes printed above), so convert before summing
print(f"Total population: {pd.to_numeric(vancouver_2016['pop'], errors='coerce').sum():,.0f}")
print(f"Total area: {pd.to_numeric(vancouver_2016['a'], errors='coerce').sum():.1f} sq km")

📋 Dataset Overview - Vancouver CMA 2016
Shape: (39, 12)

📊 Column types:
geometry    geometry
a             object
t             object
dw            object
hh            object
id            object
pop           object
name          object
pop2          object
rgid          object
rpid          object
ruid          object
dtype: object

🏘️  Region summary:
Total regions: 39

In [None]:
# Clean and prepare the data for analysis
df = vancouver_2016.copy()

# Create more readable column names
column_mapping = {
    'name': 'Region_Name',
    'pop': 'Population_2016',
    'a': 'Area_sqkm',
    'dw': 'Dwellings',
    'hh': 'Households'
}

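# Vector columns already carry their descriptive label in the column name
vector_columns = [col for col in df.columns if col.startswith('v_CA16_')]
print(f"\n📊 Found {len(vector_columns)} data variables:")
for col in vector_columns:
    print(f"  {col}")

# The API returns counts and areas as strings (object dtype, see above);
# convert them to numbers so sums and ratios work. Unparseable values become NaN.
numeric_columns = [c for c in ['pop', 'a', 'dw', 'hh'] + vector_columns if c in df.columns]
df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce')

df = df.rename(columns=column_mapping)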
print(f"\n✅ Data cleaned and prepared for analysis")

In [None]:
# Calculate key housing metrics
print("🏠 Calculating housing density metrics...")

# Find the housing columns by their descriptive text
housing_cols = {}
for col in df.columns:
    if 'Single-detached house' in str(col):
        housing_cols['single_detached'] = col
    elif 'five or more storeys' in str(col):
        housing_cols['high_rise'] = col
    elif 'structural type of dwelling data' in str(col):
        housing_cols['total_dwellings'] = col
    elif 'fewer than five storeys' in str(col):
        housing_cols['low_rise'] = col
    elif 'Row house' in str(col):
        housing_cols['row_house'] = col

print(f"Found housing columns: {list(housing_cols.keys())}")

# Calculate housing ratios
if 'single_detached' in housing_cols and 'total_dwellings' in housing_cols:
    df['single_family_ratio'] = df[housing_cols['single_detached']] / df[housing_cols['total_dwellings']]
    
if 'high_rise' in housing_cols and 'total_dwellings' in housing_cols:
    df['high_rise_ratio'] = df[housing_cols['high_rise']] / df[housing_cols['total_dwellings']]

# Calculate population density
df['pop_density'] = df['Population_2016'] / df['Area_sqkm']

# Handle infinite values (division by zero)
df = df.replace([np.inf, -np.inf], np.nan)

print("✅ Housing metrics calculated")
print(f"📊 Mean population density across regions (unweighted): {df['pop_density'].mean():.1f} people/sq km")
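
The mean printed above averages the per-region densities, so a tiny dense municipality counts as much as a large sparse one. Here is a minimal sketch with made-up numbers (not census data) that contrasts this unweighted mean with the CMA-wide density, computed as total population over total area:

```python
import pandas as pd

# Illustrative values only; these are not census figures
toy = pd.DataFrame({
    "Population_2016": [1000.0, 9000.0],
    "Area_sqkm": [1.0, 99.0],
})
toy["pop_density"] = toy["Population_2016"] / toy["Area_sqkm"]

# Each region counts equally, regardless of its size
unweighted_mean = toy["pop_density"].mean()

# Every person and every square kilometre counts once
overall_density = toy["Population_2016"].sum() / toy["Area_sqkm"].sum()

print(f"Unweighted mean of regional densities: {unweighted_mean:.1f}")
print(f"Overall density (total pop / total area): {overall_density:.1f}")
```

The two figures answer different questions: the first describes a typical region, the second describes the CMA as a whole.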

## 3. Housing Type Analysis

In [None]:
# Analyze housing composition across Vancouver CMA
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Housing Analysis - Vancouver CMA (2016 Census)', fontsize=16, fontweight='bold')

# 1. Population density distribution
axes[0,0].hist(df['pop_density'].dropna(), bins=20, alpha=0.7, color='skyblue', edgecolor='black')
axes[0,0].set_xlabel('Population Density (people/sq km)')
axes[0,0].set_ylabel('Number of Regions')
axes[0,0].set_title('Population Density Distribution')
axes[0,0].axvline(df['pop_density'].median(), color='red', linestyle='--', label=f'Median: {df["pop_density"].median():.0f}')
axes[0,0].legend()

# 2. Single-family vs High-rise ratio
if 'single_family_ratio' in df.columns:
    axes[0,1].scatter(df['single_family_ratio'], df['pop_density'], alpha=0.6, s=60)
    axes[0,1].set_xlabel('Single-Family Home Ratio')
    axes[0,1].set_ylabel('Population Density')
    axes[0,1].set_title('Single-Family Ratio vs Population Density')
    
    # Add correlation
    corr = df[['single_family_ratio', 'pop_density']].corr().iloc[0,1]
    axes[0,1].text(0.05, 0.95, f'Correlation: {corr:.3f}', transform=axes[0,1].transAxes, 
                   bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

# 3. Top 10 most populous regions
top_regions = df.nlargest(10, 'Population_2016')
axes[1,0].barh(range(len(top_regions)), top_regions['Population_2016'])
axes[1,0].set_yticks(range(len(top_regions)))
axes[1,0].set_yticklabels(top_regions['Region_Name'], fontsize=9)
axes[1,0].set_xlabel('Population')
axes[1,0].set_title('Top 10 Most Populous Regions')

# 4. Population vs Area
axes[1,1].scatter(df['Area_sqkm'], df['Population_2016'], alpha=0.6, s=60, color='green')
axes[1,1].set_xlabel('Area (sq km)')
axes[1,1].set_ylabel('Population')
axes[1,1].set_title('Population vs Area')

plt.tight_layout()
plt.show()

# Print summary statistics
print("\n📊 Summary Statistics:")
print(f"Total CMA Population: {df['Population_2016'].sum():,}")
print(f"Total CMA Area: {df['Area_sqkm'].sum():.1f} sq km")
print(f"Mean Regional Population Density (unweighted): {df['pop_density'].mean():.1f} people/sq km")
print(f"Most Dense Region: {df.loc[df['pop_density'].idxmax(), 'Region_Name']} ({df['pop_density'].max():.0f} people/sq km)")
print(f"Least Dense Region: {df.loc[df['pop_density'].idxmin(), 'Region_Name']} ({df['pop_density'].min():.0f} people/sq km)")

## 4. Interactive Geographic Visualization

In [None]:
# Create an interactive map showing population density
print("🗺️  Creating interactive population density map...")

# Ensure we have coordinate system for mapping
if df.crs is None:
    df = df.set_crs('EPSG:4326')  # Assume WGS84 if no CRS

# Convert to geographic coordinates for web mapping
df_map = df.to_crs('EPSG:4326')

# Create the choropleth map
fig = px.choropleth_mapbox(
    df_map,
    geojson=df_map.geometry.__geo_interface__,
    locations=df_map.index,
    color='pop_density',
    hover_name='Region_Name',
    hover_data={
        'Population_2016': ':,',
        'Area_sqkm': ':.1f',
        'pop_density': ':.1f'
    },
    color_continuous_scale='Viridis',
    mapbox_style='open-street-map',
    zoom=9,
    center={'lat': df_map.geometry.centroid.y.mean(), 'lon': df_map.geometry.centroid.x.mean()},
    title='Population Density - Vancouver CMA (2016)',
    labels={'pop_density': 'People per sq km'}
)

fig.update_layout(height=600)
fig.show()

print("✅ Interactive map created!")

In [None]:
# Create housing type visualization
if 'single_family_ratio' in df.columns and 'high_rise_ratio' in df.columns:
    print("🏠 Creating housing type analysis...")
    
    # Create housing type categories
    df['housing_category'] = 'Mixed'
    df.loc[df['single_family_ratio'] > 0.7, 'housing_category'] = 'Suburban (70%+ Single-Family)'
    df.loc[df['high_rise_ratio'] > 0.3, 'housing_category'] = 'High-Density (30%+ High-Rise)'
    df.loc[(df['single_family_ratio'] < 0.3) & (df['high_rise_ratio'] < 0.3), 'housing_category'] = 'Mixed Medium-Density'
    
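    # Rebuild the map frame so it includes the new housing_category column
    df_map = df.to_crs('EPSG:4326')
    
    # Housing category map
    fig2 = px.choropleth_mapbox(
        df_map,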
        geojson=df_map.geometry.__geo_interface__,
        locations=df_map.index,
        color='housing_category',
        hover_name='Region_Name',
        hover_data={
            'single_family_ratio': ':.2f',
            'high_rise_ratio': ':.2f',
            'Population_2016': ':,'
        },
        color_discrete_map={
            'Suburban (70%+ Single-Family)': '#2E8B57',
            'High-Density (30%+ High-Rise)': '#FF6347',
            'Mixed Medium-Density': '#4682B4',
            'Mixed': '#DDA0DD'
        },
        mapbox_style='open-street-map',
        zoom=9,
        center={'lat': df_map.geometry.centroid.y.mean(), 'lon': df_map.geometry.centroid.x.mean()},
        title='Housing Type Categories - Vancouver CMA (2016)'
    )
    
    fig2.update_layout(height=600)
    fig2.show()
    
    # Print housing category summary
    print("\n🏘️  Housing Category Summary:")
    category_summary = df.groupby('housing_category').agg({
        'Region_Name': 'count',
        'Population_2016': 'sum',
        'single_family_ratio': 'mean',
        'high_rise_ratio': 'mean'
    }).round(3)
    category_summary.columns = ['Regions', 'Total Population', 'Avg Single-Family %', 'Avg High-Rise %']
    print(category_summary)
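
The category rules above are applied as successive overwrites, so the last matching rule wins. Here is a sketch of the same thresholds written with `numpy.select`, where the first matching condition wins instead. The function name `categorize_housing` and the sample ratios are illustrative assumptions, not part of the analysis.

```python
import numpy as np
import pandas as pd


def categorize_housing(single_family_ratio, high_rise_ratio):
    """Label each region using the same thresholds as the cell above."""
    conditions = [
        high_rise_ratio > 0.3,
        single_family_ratio > 0.7,
        (single_family_ratio < 0.3) & (high_rise_ratio < 0.3),
    ]
    labels = [
        "High-Density (30%+ High-Rise)",
        "Suburban (70%+ Single-Family)",
        "Mixed Medium-Density",
    ]
    # np.select takes the first true condition per row; unmatched rows get the default
    return np.select(conditions, labels, default="Mixed")


# Made-up ratios, one row per rule plus one that matches nothing
sample = pd.DataFrame({
    "single_family_ratio": [0.85, 0.10, 0.20, 0.50],
    "high_rise_ratio": [0.02, 0.45, 0.10, 0.20],
})
sample["housing_category"] = categorize_housing(
    sample["single_family_ratio"], sample["high_rise_ratio"]
)
print(sample)
```

Listing the high-rise rule first preserves the priority of the original cell, where the high-rise assignment runs after the suburban one and would override it.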

## 5. Demographic Analysis

In [None]:
# Analyze age demographics by housing density
print("👥 Analyzing age demographics by housing type...")

# Find age-related columns
age_cols = {}
for col in df.columns:
    if 'aged 0' in str(col) or '0-14' in str(col):
        age_cols['youth'] = col
    elif 'aged 15' in str(col) or '15-64' in str(col):
        age_cols['working_age'] = col
    elif 'aged 65' in str(col) or '65+' in str(col):
        age_cols['seniors'] = col

print(f"Found age columns: {list(age_cols.keys())}")

if age_cols:
    # Calculate age ratios
    for age_group, col in age_cols.items():
        df[f'{age_group}_ratio'] = df[col] / df['Population_2016']
    
    # Create age analysis visualization
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Age Demographics Analysis - Vancouver CMA (2016)', fontsize=16, fontweight='bold')
    
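    # Age distribution by housing category
    if 'housing_category' in df.columns:
        # Fix the group order so the legend labels match the plotted columns
        age_labels = {'youth': 'Youth (0-14)', 'working_age': 'Working Age (15-64)', 'seniors': 'Seniors (65+)'}
        age_order = [group for group in age_labels if group in age_cols]
        age_by_housing = df.groupby('housing_category')[[age_cols[group] for group in age_order]].sum()
        age_by_housing.plot(kind='bar', ax=axes[0,0], stacked=True)
        axes[0,0].set_title('Age Distribution by Housing Category')
        axes[0,0].set_xlabel('Housing Category')
        axes[0,0].set_ylabel('Population')
        axes[0,0].legend([age_labels[group] for group in age_order])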
        axes[0,0].tick_params(axis='x', rotation=45)
    
    # Youth ratio vs population density
    if 'youth_ratio' in df.columns:
        axes[0,1].scatter(df['pop_density'], df['youth_ratio'], alpha=0.6, color='orange')
        axes[0,1].set_xlabel('Population Density')
        axes[0,1].set_ylabel('Youth Ratio (0-14 years)')
        axes[0,1].set_title('Youth Population vs Density')
    
    # Seniors ratio vs single-family ratio
    if 'seniors_ratio' in df.columns and 'single_family_ratio' in df.columns:
        axes[1,0].scatter(df['single_family_ratio'], df['seniors_ratio'], alpha=0.6, color='purple')
        axes[1,0].set_xlabel('Single-Family Home Ratio')
        axes[1,0].set_ylabel('Seniors Ratio (65+ years)')
        axes[1,0].set_title('Seniors Population vs Single-Family Housing')
    
    # Age ratios distribution
    age_ratio_cols = [col for col in df.columns if col.endswith('_ratio') and any(age in col for age in ['youth', 'working', 'seniors'])]
    if age_ratio_cols:
        df[age_ratio_cols].boxplot(ax=axes[1,1])
        axes[1,1].set_title('Age Ratio Distributions')
        axes[1,1].set_ylabel('Ratio of Total Population')
        axes[1,1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()
    
    # Calculate correlations
    print("\n📊 Demographic Correlations:")
    corr_vars = ['pop_density'] + age_ratio_cols
    if 'single_family_ratio' in df.columns:
        corr_vars.append('single_family_ratio')
    
    correlations = df[corr_vars].corr()
    print(correlations.round(3))

## 6. Statistical Analysis

In [None]:
# Perform statistical tests and modeling
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

print("📈 Performing statistical analysis...")

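# 1. Correlation analysis (drop rows missing either value so the pairs stay aligned)
if 'single_family_ratio' in df.columns and 'pop_density' in df.columns:
    paired = df[['single_family_ratio', 'pop_density']].dropna()
    corr_coef, p_value = stats.pearsonr(paired['single_family_ratio'], paired['pop_density'])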
    print(f"\n🔗 Single-Family Housing vs Population Density:")
    print(f"   Correlation coefficient: {corr_coef:.3f}")
    print(f"   P-value: {p_value:.6f}")
    print(f"   Significance: {'Significant' if p_value < 0.05 else 'Not significant'} at α=0.05")

# 2. Cluster analysis - identify similar regions
clustering_vars = ['pop_density', 'Population_2016']
if 'single_family_ratio' in df.columns:
    clustering_vars.append('single_family_ratio')
if 'high_rise_ratio' in df.columns:
    clustering_vars.append('high_rise_ratio')

# Prepare data for clustering
cluster_data = df[clustering_vars].dropna()
scaler = StandardScaler()
scaled_data = scaler.fit_transform(cluster_data)

# Perform k-means clustering (n_init set explicitly so results do not depend on the library default)
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(scaled_data)

# Add cluster labels to dataframe
df.loc[cluster_data.index, 'cluster'] = cluster_labels

print(f"\n🎯 K-means clustering (k={n_clusters}):")
cluster_summary = df.groupby('cluster')[clustering_vars].mean()
print(cluster_summary.round(2))

# Visualize clusters
fig, ax = plt.subplots(figsize=(10, 8))
scatter = ax.scatter(df['pop_density'], df['Population_2016'], 
                    c=df['cluster'], cmap='viridis', alpha=0.7, s=60)
ax.set_xlabel('Population Density (people/sq km)')
ax.set_ylabel('Total Population')
ax.set_title('Regional Clusters - Vancouver CMA')
plt.colorbar(scatter, label='Cluster')

# Add cluster centroids
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
for i, centroid in enumerate(centroids):
    ax.scatter(centroid[0], centroid[1], marker='x', s=200, c='red', linewidth=3)
    ax.annotate(f'C{i}', (centroid[0], centroid[1]), xytext=(5, 5), 
               textcoords='offset points', fontweight='bold', color='red')

plt.show()

print("\n📊 Cluster Interpretation:")
for i in range(n_clusters):
    cluster_regions = df[df['cluster'] == i]
    avg_density = cluster_regions['pop_density'].mean()
    avg_pop = cluster_regions['Population_2016'].mean()
    region_count = len(cluster_regions)
    
    print(f"   Cluster {i}: {region_count} regions")
    print(f"      Avg density: {avg_density:.0f} people/sq km")
    print(f"      Avg population: {avg_pop:,.0f}")
    
    if region_count > 0:
        sample_regions = cluster_regions['Region_Name'].head(3).tolist()
        print(f"      Examples: {', '.join(sample_regions)}")
    print()
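
Correlation needs both values for every region. The sketch below uses invented numbers to show how dropping missing values from each column separately can pair up different regions, while dropping incomplete rows keeps the pairs aligned:

```python
import numpy as np
import pandas as pd

# Invented values; NaN marks a region with a missing measurement
toy = pd.DataFrame({
    "single_family_ratio": [0.9, np.nan, 0.5, 0.2, 0.7],
    "pop_density": [100.0, 400.0, np.nan, 3000.0, 500.0],
})

# Dropping per column keeps four values each, but from different sets of regions
x_separate = toy["single_family_ratio"].dropna()
y_separate = toy["pop_density"].dropna()
print("Rows kept separately:", list(x_separate.index), list(y_separate.index))

# Dropping incomplete rows keeps only regions that have both values
pairs = toy.dropna(subset=["single_family_ratio", "pop_density"])
r = np.corrcoef(pairs["single_family_ratio"], pairs["pop_density"])[0, 1]
print(f"Complete pairs: {len(pairs)}, correlation: {r:.3f}")
```

Passing the separately cleaned series to a function that works on positions would compare region 2 with region 1, which belong to different places.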

## 7. Comparative Analysis (2016 vs 2021)

In [None]:
# Compare 2016 vs 2021 data if available
if has_2021_data:
    print("📊 Comparing 2016 vs 2021 Census data...")
    
    # Merge 2016 and 2021 data
    # First, let's align the datasets by region name or geographic identifier
    
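    # Prepare 2021 data (population arrives as strings, so convert it to numbers)
    df_2021 = vancouver_2021.copy()
    df_2021 = df_2021.rename(columns={'pop': 'Population_2021', 'name': 'Region_Name'})
    df_2021['Population_2021'] = pd.to_numeric(df_2021['Population_2021'], errors='coerce')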
    
    # Create comparison dataset
    comparison_cols = ['Region_Name', 'Population_2016']
    df_comparison = df[comparison_cols].merge(
        df_2021[['Region_Name', 'Population_2021']], 
        on='Region_Name', 
        how='inner'
    )
    
    # Calculate population change
    df_comparison['pop_change'] = df_comparison['Population_2021'] - df_comparison['Population_2016']
    df_comparison['pop_change_pct'] = (df_comparison['pop_change'] / df_comparison['Population_2016']) * 100
    
    print(f"✅ Matched {len(df_comparison)} regions between 2016 and 2021")
    
    # Visualize population changes
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Population change distribution
    axes[0].hist(df_comparison['pop_change_pct'], bins=15, alpha=0.7, color='lightblue', edgecolor='black')
    axes[0].set_xlabel('Population Change (%)')
    axes[0].set_ylabel('Number of Regions')
    axes[0].set_title('Population Change Distribution (2016-2021)')
    axes[0].axvline(df_comparison['pop_change_pct'].median(), color='red', linestyle='--', 
                   label=f'Median: {df_comparison["pop_change_pct"].median():.1f}%')
    axes[0].legend()
    
    # 2016 vs 2021 scatter plot
    axes[1].scatter(df_comparison['Population_2016'], df_comparison['Population_2021'], alpha=0.7)
    axes[1].plot([df_comparison['Population_2016'].min(), df_comparison['Population_2016'].max()],
                [df_comparison['Population_2016'].min(), df_comparison['Population_2016'].max()],
                'r--', label='No Change Line')
    axes[1].set_xlabel('2016 Population')
    axes[1].set_ylabel('2021 Population')
    axes[1].set_title('Population: 2016 vs 2021')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()
    
    # Fastest-growing and slowest-growing regions
    print("\n🔝 Top 5 Fastest Growing Regions (2016-2021):")
    fastest_growing = df_comparison.nlargest(5, 'pop_change_pct')
    for _, row in fastest_growing.iterrows():
        print(f"   {row['Region_Name']}: {row['pop_change_pct']:+.1f}% ({row['pop_change']:+,.0f} people)")
    
    # The five lowest growth rates; these may still be positive
    print("\n🔻 5 Slowest-Growing or Declining Regions (2016-2021):")
    fastest_declining = df_comparison.nsmallest(5, 'pop_change_pct')
    for _, row in fastest_declining.iterrows():
        print(f"   {row['Region_Name']}: {row['pop_change_pct']:+.1f}% ({row['pop_change']:+,.0f} people)")
    
    # Overall statistics
    total_change = df_comparison['pop_change'].sum()
    total_2016 = df_comparison['Population_2016'].sum()
    overall_change_pct = (total_change / total_2016) * 100
    
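    print(f"\n📈 Overall Vancouver CMA Change (2016-2021):")
    print(f"   Total population change: {total_change:,.0f} people")
    print(f"   Overall growth rate: {overall_change_pct:.1f}%")
    # Compound annual rate over the five-year interval
    annual_growth_pct = ((1 + overall_change_pct / 100) ** (1 / 5) - 1) * 100
    print(f"   Average annual growth (compound): {annual_growth_pct:.1f}%")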
    
else:
    print("⚠️  2021 data not available for comparison")
    print("📊 Analysis focused on 2016 Census data")
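
The recorded outputs above show different region counts for the two pulls (39 for 2016, 38 for 2021), so an inner join on names silently drops anything that fails to match. The sketch below uses placeholder names to show how `indicator=True` on `DataFrame.merge` exposes the unmatched rows. Joining on a stable geographic identifier instead of the name would be a natural refinement, assuming both frames carry one.

```python
import pandas as pd

# Placeholder region names and counts, not census data
df_a = pd.DataFrame({"Region_Name": ["A", "B", "C"], "Population_2016": [100, 200, 300]})
df_b = pd.DataFrame({"Region_Name": ["A", "B", "D"], "Population_2021": [110, 190, 50]})

# An outer join keeps every row; _merge records which side each row came from
check = df_a.merge(df_b, on="Region_Name", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Region_Name", "_merge"]])

# Rows present in both years are the ones an inner join would keep
matched = check[check["_merge"] == "both"].drop(columns="_merge")
print(f"Matched {len(matched)} of {len(check)} regions")
```

Reviewing `unmatched` before computing changes makes it clear whether a region was renamed, merged, or is genuinely absent from one census.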

## 8. Key Findings and Conclusions

In [None]:
# Summarize key findings
print("🎯 KEY FINDINGS: Housing and Demographics in Vancouver CMA")
print("=" * 60)

# Population and density insights
total_pop = df['Population_2016'].sum()
total_area = df['Area_sqkm'].sum()
avg_density = df['pop_density'].mean()
median_density = df['pop_density'].median()

print(f"\n📊 POPULATION OVERVIEW:")
print(f"   • Total CMA Population (2016): {total_pop:,}")
print(f"   • Total CMA Area: {total_area:.1f} sq km")
print(f"   • Mean Regional Population Density (unweighted): {avg_density:.0f} people/sq km")
print(f"   • Median Population Density: {median_density:.0f} people/sq km")
print(f"   • Number of Census Subdivisions: {len(df)}")

# Housing insights
if 'single_family_ratio' in df.columns and 'high_rise_ratio' in df.columns:
    avg_sf_ratio = df['single_family_ratio'].mean()
    avg_hr_ratio = df['high_rise_ratio'].mean()
    
    print(f"\n🏠 HOUSING PATTERNS:")
    print(f"   • Average Single-Family Home Ratio: {avg_sf_ratio:.1%}")
    print(f"   • Average High-Rise Apartment Ratio: {avg_hr_ratio:.1%}")
    
    # Correlation insights
    if 'pop_density' in df.columns:
        sf_density_corr = df[['single_family_ratio', 'pop_density']].corr().iloc[0,1]
        print(f"   • Single-Family vs Density Correlation: {sf_density_corr:.3f}")
        
        if sf_density_corr < -0.3:
            print(f"     → Negative correlation (r < -0.3): more single-family housing goes with lower density")
        elif sf_density_corr > 0.3:
            print(f"     → Positive correlation (r > 0.3): more single-family housing goes with higher density")
        else:
            print(f"     → Weak correlation (|r| ≤ 0.3)")

# Housing category insights
if 'housing_category' in df.columns:
    print(f"\n🏘️  HOUSING CATEGORIES:")
    category_counts = df['housing_category'].value_counts()
    for category, count in category_counts.items():
        pct = (count / len(df)) * 100
        print(f"   • {category}: {count} regions ({pct:.1f}%)")

# Demographic insights
age_ratio_cols = [col for col in df.columns if col.endswith('_ratio') and any(age in col for age in ['youth', 'working', 'seniors'])]
if age_ratio_cols:
    print(f"\n👥 DEMOGRAPHIC PATTERNS:")
    for col in age_ratio_cols:
        avg_ratio = df[col].mean()
        age_group = col.replace('_ratio', '').replace('_', ' ').title()
        print(f"   • Average {age_group} Ratio: {avg_ratio:.1%}")

# Most interesting regions
print(f"\n🌟 NOTABLE REGIONS:")
densest = df.loc[df['pop_density'].idxmax()]
largest = df.loc[df['Population_2016'].idxmax()]
print(f"   • Most Dense: {densest['Region_Name']} ({densest['pop_density']:.0f} people/sq km)")
print(f"   • Most Populous: {largest['Region_Name']} ({largest['Population_2016']:,} people)")

if 'single_family_ratio' in df.columns:
    most_suburban = df.loc[df['single_family_ratio'].idxmax()]
    print(f"   • Most Suburban: {most_suburban['Region_Name']} ({most_suburban['single_family_ratio']:.1%} single-family)")

print(f"\n💡 POLICY IMPLICATIONS:")
print(f"   • Housing density varies widely across the CMA")
print(f"   • Urban planning should consider demographic-housing relationships")
print(f"   • Transit and infrastructure needs are likely to differ by housing type")
print(f"   • Age demographics may vary with housing type (see the correlations above)")

print(f"\n📈 METHODOLOGY:")
print(f"   • Data source: Statistics Canada via CensusMapper API")
print(f"   • Geographic level: Census Subdivisions (CSD)")
print(f"   • Analysis includes: Population, housing types, age demographics")
print(f"   • Visualization: Interactive maps and statistical charts")

print(f"\n✅ Analysis complete! This notebook demonstrates the power of pycancensus")
print(f"   for Canadian census data analysis and visualization.")

## Appendix: Data Dictionary

### Geographic Variables
- **Region_Name**: Census subdivision name
- **Population_2016**: Total population from 2016 Census
- **Area_sqkm**: Area in square kilometers
- **pop_density**: Population per square kilometer

### Housing Variables
- **single_family_ratio**: Proportion of single-detached houses
- **high_rise_ratio**: Proportion of apartments in buildings 5+ storeys
- **housing_category**: Classification based on housing mix

### Demographic Variables
- **youth_ratio**: Proportion aged 0-14 years
- **working_age_ratio**: Proportion aged 15-64 years  
- **seniors_ratio**: Proportion aged 65+ years

### Analysis Variables
- **cluster**: K-means cluster assignment
- **pop_change**: Population change 2016-2021 (if available)
- **pop_change_pct**: Percentage population change 2016-2021

---

**About pycancensus**: This analysis was powered by the pycancensus Python package, which provides convenient access to Canadian Census data through the CensusMapper API. The package enables researchers, analysts, and policymakers to easily explore demographic and housing patterns across Canada.

**Next Steps**: This analysis could be extended to include:
- Income and employment data
- Transportation and commuting patterns
- Language and immigration demographics
- Comparison with other Canadian metropolitan areas
- Time series analysis across multiple census years