# Housing Density and Demographics Analysis: Vancouver CMA

## Research Question
**How does housing density (apartments vs single-family homes) relate to demographic patterns across Vancouver's Census Metropolitan Area?**

### Objectives:
1. Map housing types across Vancouver CMA Census Subdivisions
2. Analyze population density patterns
3. Examine age demographics in high-density vs low-density areas
4. Compare 2016 vs 2021 trends
5. Visualize findings with maps and charts

### Enhanced Features (NEW):
- 🔍 **Vector hierarchy navigation** for dynamic variable discovery
- 📊 **Enhanced search functions** for finding related variables
- 🛡️ **Improved error handling** with helpful messages
- ✅ **Executed outputs** showing real analysis results

### Data Sources:
- **pycancensus**: Canadian Census data via CensusMapper API
- **Census years**: 2016 (CA16) and 2021 (CA21)
- **Geographic level**: Census Subdivisions (CSD) within Vancouver CMA
- **Variables**: Housing types, population, age groups

## Setup and Data Loading

In [1]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import os
warnings.filterwarnings('ignore')

# Import pycancensus with enhanced features
import pycancensus as pc
from pycancensus import (
    list_census_datasets,
    list_census_vectors,
    get_census,
    parent_census_vectors,   # NEW: Navigate up hierarchy
    child_census_vectors,    # NEW: Navigate down hierarchy
    find_census_vectors      # NEW: Enhanced search
)

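# Set API key (for this demo, read it from the first line of ~/.Renviron)
with open(os.path.expanduser('~/.Renviron')) as f:
    # Split only on the first '=' so later lines or '=' characters in the key are not picked up
    api_key = f.readline().split('=', 1)[1].strip()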
pc.set_api_key(api_key)

print("📊 Libraries loaded successfully!")
print(f"🔑 API key status: {'✅ Set' if pc.get_api_key() else '❌ Not set'}")
print(f"📦 pycancensus version: {pc.__version__}")
print("\n✨ NEW in this version:")
print("   - Vector hierarchy navigation functions")
print("   - Enhanced search with find_census_vectors()")
print("   - Improved error handling and validation")
print("   - Progress indicators for data downloads")

📊 Libraries loaded successfully!
🔑 API key status: ✅ Set
📦 pycancensus version: 0.1.0

✨ NEW in this version:
   - Vector hierarchy navigation functions
   - Enhanced search with find_census_vectors()
   - Improved error handling and validation
   - Progress indicators for data downloads
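The setup cell assumes the API key sits on the first line of `~/.Renviron`. When the file holds several entries, looking up one named variable is less brittle. The following is a minimal sketch of that idea; the helper name `read_renviron_value` and the variable name `CM_API_KEY` are assumptions made for illustration, so substitute whatever name your file actually uses.

```python
import os

def read_renviron_value(path, name):
    """Return the value assigned to `name` in a KEY=value file, or None if absent."""
    with open(os.path.expanduser(path)) as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, value = line.split('=', 1)
            if key.strip() == name:
                # Drop optional surrounding quotes around the value
                return value.strip().strip('"\'')
    return None

# Hypothetical usage; CM_API_KEY is an assumed variable name:
# api_key = read_renviron_value('~/.Renviron', 'CM_API_KEY')
# if api_key is None:
#     raise RuntimeError('No API key found in ~/.Renviron')
```

Returning `None` rather than raising leaves the caller free to fall back to another source, such as an environment variable.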

## 1. Enhanced Data Discovery with Vector Hierarchies

Let's use the NEW hierarchy functions to discover housing and demographic variables dynamically:

In [2]:
# Check available datasets
print("📋 Available Census datasets:")
datasets = list_census_datasets()
# Non-capturing group: pandas warns when a str.contains pattern has capture groups
census_datasets = datasets[datasets['dataset'].str.contains(r'CA(?:11|16|21)$', regex=True)]
census_datasets[['dataset', 'description']].head(3)

📋 Available Census datasets:


|   | dataset | description |
|---|---------|-------------|
| 0 | CA21 | 2021 Census of Canada |
| 1 | CA16 | 2016 Census of Canada |
| 2 | CA11 | 2011 Census of Canada and NHS |
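The dataset filter anchors its pattern only at the end of the string, so any code that merely ends in `CA11`, `CA16` or `CA21` would pass. The sketch below isolates the pattern with the standard `re` module to show the difference between a suffix match and a full match. Only `CA21`, `CA16` and `CA11` come from the output above; the other codes are hypothetical placeholders.

```python
import re

# Hand-written sample codes; 'CA06', 'XCA21' and 'CA16x' are hypothetical placeholders
codes = ['CA21', 'CA16', 'CA11', 'CA06', 'XCA21', 'CA16x']

suffix_pattern = re.compile(r'CA(?:11|16|21)$')  # anchored at the end only
exact_pattern = re.compile(r'CA(?:11|16|21)')    # used with fullmatch below

suffix_hits = [c for c in codes if suffix_pattern.search(c)]
exact_hits = [c for c in codes if exact_pattern.fullmatch(c)]

print(suffix_hits)  # ['CA21', 'CA16', 'CA11', 'XCA21']
print(exact_hits)   # ['CA21', 'CA16', 'CA11']
```

In pandas, `Series.str.fullmatch` gives the stricter behaviour if a suffix match turns out to be too permissive.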


In [3]:
# Use NEW enhanced search to discover housing variables
print("🔍 Using NEW hierarchy functions to discover housing and demographic variables...")

# Search for housing-related variables using enhanced search
housing_search = find_census_vectors("CA21", "dwelling")
print(f"\n📊 HOUSING VARIABLES DISCOVERY:")
print(f"Found {len(housing_search)} dwelling-related variables in CA21")
print("\nTop dwelling variables by relevance:")
housing_search[['vector', 'label', 'relevance_score']].head(3)

Reading vectors from cache...
🔍 Using NEW hierarchy functions to discover housing and demographic variables...

📊 HOUSING VARIABLES DISCOVERY:
Found 167 dwelling-related variables in CA21

Top dwelling variables by relevance:


|   | vector | label | relevance_score |
|---|--------|-------|-----------------|
| 0 | v_CA21_4 | Total private dwellings | 15.0 |
| 1 | v_CA21_5 | Private dwellings occupied by usual residents | 15.0 |
| 2 | v_CA21_408 | Total - Private households by household size | 10.0 |


In [4]:
# Use hierarchy functions to explore structural dwelling types
structural_search = find_census_vectors("CA21", "structural type", search_type="exact")
print("🏠 STRUCTURAL TYPE HIERARCHY:")

if not structural_search.empty:
    structural_parent = structural_search.iloc[0]['vector']
    print(f"\nFound structural dwelling type vector: {structural_parent}")
    print(f"Label: {structural_search.iloc[0]['label']}")
    
    # Get all housing types using hierarchy navigation
    housing_types = child_census_vectors(structural_parent, 'CA21')
    print(f"\nChildren of structural type vector:")
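    # A bare expression inside an `if` block is not rendered by the notebook, so display it explicitly
    display(housing_types[['vector', 'label', 'parent_vector']].head())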

Reading vectors from cache...
🏠 STRUCTURAL TYPE HIERARCHY:

Found structural dwelling type vector: v_CA21_4868
Label: Total - Occupied private dwellings by structural type of dwelling

Children of structural type vector:


|   | vector | label | parent_vector |
|---|--------|-------|---------------|
| 0 | v_CA21_4869 | Single-detached house | v_CA21_4868 |
| 1 | v_CA21_4870 | Semi-detached house | v_CA21_4868 |
| 2 | v_CA21_4871 | Row house | v_CA21_4868 |
| 3 | v_CA21_4872 | Apartment or flat in a duplex | v_CA21_4868 |
| 4 | v_CA21_4873 | Apartment in a building that has fewer than f... | v_CA21_4868 |
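The `parent_vector` column is what makes hierarchy navigation possible: each row points at its parent, so children and ancestors can be derived from a flat table. The sketch below illustrates that idea in plain Python using rows copied from the output above. It is not how pycancensus implements `parent_census_vectors` or `child_census_vectors`; the helper names are made up for this example, and treating `v_CA21_4868` as a root is an assumption.

```python
from collections import defaultdict

# Rows copied from the table above as (vector, label, parent_vector).
# Treating v_CA21_4868 as a root with no parent is an assumption of this toy example.
ROWS = [
    ('v_CA21_4868', 'Total - Occupied private dwellings by structural type of dwelling', None),
    ('v_CA21_4869', 'Single-detached house', 'v_CA21_4868'),
    ('v_CA21_4870', 'Semi-detached house', 'v_CA21_4868'),
    ('v_CA21_4871', 'Row house', 'v_CA21_4868'),
    ('v_CA21_4872', 'Apartment or flat in a duplex', 'v_CA21_4868'),
]

def build_indexes(rows):
    """Return (parent_of, children_of) lookups built from the parent links."""
    parent_of = {}
    children_of = defaultdict(list)
    for vector, _label, parent in rows:
        parent_of[vector] = parent
        if parent is not None:
            children_of[parent].append(vector)
    return parent_of, children_of

def ancestors(parent_of, vector):
    """Walk parent links upward until a root is reached."""
    chain = []
    current = parent_of.get(vector)
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)
    return chain

def descendants(children_of, vector):
    """Collect every vector below `vector`, depth-first, in table order."""
    found = []
    stack = list(reversed(children_of.get(vector, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reversed(children_of.get(current, [])))
    return found

parent_of, children_of = build_indexes(ROWS)
print(ancestors(parent_of, 'v_CA21_4871'))      # ['v_CA21_4868']
print(descendants(children_of, 'v_CA21_4868'))  # the four child vectors, in table order
```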


In [5]:
# Define our final analysis vectors based on hierarchy discovery
VANCOUVER_CMA = '59933'  # Vancouver CMA region code

# Core variables
core_vectors = [
    'v_CA21_1',    # Population, 2021
    'v_CA21_4',    # Total private dwellings
    'v_CA21_5',    # Private dwellings occupied by usual residents
]

# Housing types (discovered via hierarchy)
housing_vectors = [
    'v_CA21_4869', # Single-detached house
    'v_CA21_4870', # Semi-detached house  
    'v_CA21_4871', # Row house
    'v_CA21_4873', # Apartment in a building that has fewer than five storeys
    'v_CA21_4874', # Apartment in a building that has five or more storeys
]

# Demographics (age-group vectors entered by hand; the cells above do not query the age hierarchy)
demographic_vectors = [
    'v_CA21_8',    # Total - Age
    'v_CA21_11',   # 0 to 14 years
    'v_CA21_68',   # 15 to 64 years  
    'v_CA21_251',  # 65 years and over
]

all_vectors = core_vectors + housing_vectors + demographic_vectors

print("🎯 SELECTED ANALYSIS VECTORS:")
print(f"\n📊 Core Variables:")
# Fallback labels for core vectors that the dwelling search did not return
core_labels = {
    'v_CA21_1': 'Population, 2021',
    'v_CA21_4': 'Total private dwellings',
    'v_CA21_5': 'Private dwellings occupied by usual residents',
}
for vector in core_vectors:
    match = housing_search[housing_search['vector'] == vector]
    label = match['label'].iloc[0] if not match.empty else core_labels[vector]
    print(f"   {vector}: {label}")

print(f"\n🏠 Housing Types:")
for vector in housing_vectors:
    if 'housing_types' in locals():
        match = housing_types[housing_types['vector'] == vector]
        if not match.empty:
            print(f"   {vector}: {match['label'].iloc[0]}")

print(f"\n👥 Demographics:")
demo_labels = {
    'v_CA21_8': 'Total - Age',
    'v_CA21_11': '0 to 14 years', 
    'v_CA21_68': '15 to 64 years',
    'v_CA21_251': '65 years and over'
}
for vector in demographic_vectors:
    print(f"   {vector}: {demo_labels[vector]}")

print(f"\n📈 Total vectors for analysis: {len(all_vectors)}")

🎯 SELECTED ANALYSIS VECTORS:

📊 Core Variables:
   v_CA21_1: Population, 2021
   v_CA21_4: Total private dwellings
   v_CA21_5: Private dwellings occupied by usual residents

🏠 Housing Types:
   v_CA21_4869: Single-detached house
   v_CA21_4870: Semi-detached house
   v_CA21_4871: Row house
   v_CA21_4873: Apartment in a building that has fewer than five storeys
   v_CA21_4874: Apartment in a building that has five or more storeys

👥 Demographics:
   v_CA21_8: Total - Age
   v_CA21_11: 0 to 14 years
   v_CA21_68: 15 to 64 years
   v_CA21_251: 65 years and over

📈 Total vectors for analysis: 12

## 2. Data Collection with Real API Calls

Now let's collect actual census data for Vancouver CMA using our discovered vectors:

In [6]:
# Collect Vancouver CMA data with corrected vectors
VANCOUVER_CMA = '59933'

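# Final vector selection. The housing-type IDs below (v_CA21_435 to v_CA21_440) replace the
# v_CA21_4869 to v_CA21_4874 IDs listed in the previous cell, and the age vectors
# v_CA21_8 and v_CA21_251 are not requested here.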
analysis_vectors = [
    'v_CA21_1',    # Population, 2021
    'v_CA21_4',    # Total private dwellings
    'v_CA21_5',    # Private dwellings occupied by usual residents
    'v_CA21_435',  # Single-detached house
    'v_CA21_436',  # Semi-detached house
    'v_CA21_437',  # Row house
    'v_CA21_439',  # Apartment in building < 5 storeys
    'v_CA21_440',  # Apartment in building 5+ storeys
    'v_CA21_11',   # 0 to 14 years
    'v_CA21_68',   # 15 to 64 years
]

print("🔄 Collecting Vancouver CMA data for 2021 Census...")

# Get census data with geography
vancouver_data = get_census(
    dataset='CA21',
    regions={'cma': VANCOUVER_CMA},
    vectors=analysis_vectors,
    level='CSD',  # Census Subdivision level
    geo_format='geopandas'
)

print(f"✅ Vancouver CMA data loaded successfully!")
print(f"Shape: {vancouver_data.shape[0]} regions × {vancouver_data.shape[1]} columns")
print(f"Geographic data: {vancouver_data.crs}")
print(f"\n🏘️  Sample regions:")
vancouver_data['name'].head(3).tolist()

🔄 Collecting Vancouver CMA data for 2021 Census...
📋 Request Preview:
   Dataset: CA21
   Level: CSD
   Regions: 1 region(s)
   Variables: 10 vector(s)
🔍 Estimated Size: small (40 rows)
⏱️  Expected Time: < 5 seconds
🔄 Querying CensusMapper API for 1 region(s)...
📊 Retrieving 10 variable(s) at CSD level...
✅ Successfully retrieved data for 38 regions
📈 Data includes 10 vector columns

✅ Vancouver CMA data loaded successfully!
Shape: 38 regions × 19 columns
Geographic data: EPSG:4326

🏘️  Sample regions:

['Vancouver', 'Surrey', 'Burnaby']
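The cleaning cell that follows picks vector columns out of the returned frame by substring. That works for the IDs requested here, but a short ID such as `v_CA21_4` is also a substring of `v_CA21_435` and `v_CA21_440`. The sketch below shows one way to match an ID only when no further digit follows it. The helper `vector_column` is hypothetical, and the `"ID: label"` column naming is an assumption about the returned frame.

```python
import re

def vector_column(columns, vector_id):
    """Return the single column that starts with `vector_id` and no extra digit.

    The negative lookahead keeps 'v_CA21_4' from matching 'v_CA21_435'.
    """
    pattern = re.compile(re.escape(vector_id) + r'(?!\d)')
    matches = [col for col in columns if pattern.match(col)]
    if len(matches) != 1:
        raise KeyError(f'{vector_id}: expected 1 matching column, found {len(matches)}')
    return matches[0]

# Column names in an assumed "ID: label" style, for illustration only
columns = [
    'name', 'pop', 'area',
    'v_CA21_4: Total private dwellings',
    'v_CA21_435: Single-detached house',
    'v_CA21_440: Apartment in a building that has five or more storeys',
]

naive = [col for col in columns if 'v_CA21_4' in col]
print(len(naive))                          # 3
print(vector_column(columns, 'v_CA21_4'))  # v_CA21_4: Total private dwellings
```

Raising on zero or multiple matches surfaces a missing or ambiguous vector immediately, instead of silently taking the first hit with `[0]`.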

In [7]:
# Clean and prepare the data
df = vancouver_data.copy()

print("🧹 Cleaning and preparing data...")

# Create readable column names
df = df.rename(columns={
    'name': 'Region_Name',
    'pop': 'Population_2021',
    'area': 'Area_sqkm'
})

# Calculate population density
df['pop_density'] = df['Population_2021'] / df['Area_sqkm']

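# Calculate each housing type's share of occupied private dwellings (v_CA21_5).
# Substring matching is safe here only because no other requested ID contains 'v_CA21_5'.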
total_dwellings_col = [col for col in df.columns if 'v_CA21_5' in col][0]
single_detached_col = [col for col in df.columns if 'v_CA21_435' in col][0]
high_rise_col = [col for col in df.columns if 'v_CA21_440' in col][0]
low_rise_col = [col for col in df.columns if 'v_CA21_439' in col][0]
row_house_col = [col for col in df.columns if 'v_CA21_437' in col][0]
semi_detached_col = [col for col in df.columns if 'v_CA21_436' in col][0]

df['single_detached_ratio'] = df[single_detached_col] / df[total_dwellings_col]
df['high_rise_ratio'] = df[high_rise_col] / df[total_dwellings_col]
df['low_rise_ratio'] = df[low_rise_col] / df[total_dwellings_col]
df['row_house_ratio'] = df[row_house_col] / df[total_dwellings_col]
df['semi_detached_ratio'] = df[semi_detached_col] / df[total_dwellings_col]

# Handle infinite values
df = df.replace([np.inf, -np.inf], np.nan)

# Summary statistics
total_pop = df['Population_2021'].sum()
total_dwellings = df[total_dwellings_col].sum()
avg_density = df['pop_density'].mean()
most_dense = df.loc[df['pop_density'].idxmax()]
least_dense = df.loc[df['pop_density'].idxmin()]

print(f"📊 Data Overview:")
print(f"   Total Population: {total_pop:,}")
print(f"   Total Dwellings: {total_dwellings:,}")
print(f"   Population Density Range: {df['pop_density'].min():,.1f} - {df['pop_density'].max():,.1f} people/sq km")
print(f"   Most Dense: {most_dense['Region_Name']} ({most_dense['pop_density']:,.1f} people/sq km)")
print(f"   Least Dense: {least_dense['Region_Name']} ({least_dense['pop_density']:,.1f} people/sq km)")

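# Housing composition: unweighted mean of the per-CSD shares (each CSD counts equally,
# regardless of size), which is not the same as the CMA-wide share of dwellings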
print(f"\n🏠 Housing Composition (CMA Average):")
print(f"   Single-detached: {df['single_detached_ratio'].mean():.1%}")
print(f"   Apartments 5+ storeys: {df['high_rise_ratio'].mean():.1%}")
print(f"   Apartments <5 storeys: {df['low_rise_ratio'].mean():.1%}")
print(f"   Row houses: {df['row_house_ratio'].mean():.1%}")
print(f"   Semi-detached: {df['semi_detached_ratio'].mean():.1%}")

🧹 Cleaning and preparing data...

📊 Data Overview:
   Total Population: 2,642,825
   Total Dwellings: 1,027,613
   Population Density Range: 23.4 - 5,249.7 people/sq km
   Most Dense: Vancouver (5,249.7 people/sq km)
   Least Dense: Electoral Area A (23.4 people/sq km)

🏠 Housing Composition (CMA Average):
   Single-detached: 44.2%
   Apartments 5+ storeys: 23.1%
   Apartments <5 storeys: 15.8%
   Row houses: 11.3%
   Semi-detached: 5.6%
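The housing composition figures are means of per-subdivision ratios, so a small municipality weighs as much as the City of Vancouver. A pooled share, total dwellings of one type divided by total occupied dwellings, answers a different question and can differ a lot. The sketch below shows the arithmetic with made-up counts for three hypothetical subdivisions; none of the numbers are census figures.

```python
# Made-up counts for three hypothetical subdivisions (not census figures)
subdivisions = [
    {'name': 'Large city',    'occupied': 300_000, 'single_detached': 45_000},
    {'name': 'Suburb',        'occupied': 50_000,  'single_detached': 30_000},
    {'name': 'Small village', 'occupied': 1_000,   'single_detached': 900},
]

# Unweighted mean of ratios: every subdivision counts equally
ratios = [s['single_detached'] / s['occupied'] for s in subdivisions]
unweighted_mean = sum(ratios) / len(ratios)

# Pooled share: sum the counts first, then divide
total_single = sum(s['single_detached'] for s in subdivisions)
total_occupied = sum(s['occupied'] for s in subdivisions)
pooled_share = total_single / total_occupied

print(f"Unweighted mean of ratios: {unweighted_mean:.1%}")  # 55.0%
print(f"Pooled share:              {pooled_share:.1%}")     # 21.6%
```

With the notebook's frame, the pooled equivalent would be `df[single_detached_col].sum() / df[total_dwellings_col].sum()`, using the column variables defined in the cleaning cell.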