# Getting Started with pycancensus

This notebook provides a quick introduction to using the enhanced pycancensus package for accessing Canadian Census data.

## What is pycancensus?

pycancensus is a Python package for accessing Canadian Census data through the CensusMapper API. It is the Python counterpart of the R `cancensus` package and offers **full R library equivalence**.

### Key Features:
- ✅ **Full R Library Equivalence** (verified through automated testing)
- 🔍 **Enhanced Variable Discovery** with hierarchy navigation
- 🛡️ **Production-Grade Error Handling** with helpful messages
- ⚡ **Progress Indicators** for large downloads
- 📊 **Improved Data Quality** with clean column processing
- 🗺️ **Geographic Data Integration** with GeoPandas
- 📅 **Multi-Year Support** (1996-2021)
- 🚀 **High Performance** with intelligent caching

## Installation

Install directly from GitHub (not yet on PyPI):
```bash
pip install git+https://github.com/dshkol/pycancensus.git
```

## Setup

In [None]:
# Import required libraries
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import pycancensus as pc

print("✅ Libraries imported successfully!")
print(f"📦 pycancensus version: {pc.__version__}")

## API Key Setup

You'll need a free API key from CensusMapper to access the data.

In [None]:
# Set your API key (get one free at https://censusmapper.ca/users/sign_up)
# pc.set_api_key("your_api_key_here")

# Check if API key is set
if pc.get_api_key():
    print("✅ API key is configured")
else:
    print("⚠️  Please set your API key with: pc.set_api_key('your_key')")
    print("   Get a free key at: https://censusmapper.ca/users/sign_up")
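
To keep the key out of the notebook itself, it can be read from an environment variable and then passed to `pc.set_api_key`. The sketch below is a small standalone helper. The variable name `CENSUSMAPPER_API_KEY` and the function `resolve_api_key` are illustrative assumptions, not part of pycancensus.

```python
import os


def resolve_api_key(env=None, var="CENSUSMAPPER_API_KEY"):
    """Return a stripped API key from the environment, or None if unset.

    `var` is a hypothetical variable name chosen for this example.
    """
    env = os.environ if env is None else env
    key = (env.get(var) or "").strip()
    return key or None


key = resolve_api_key()
if key is None:
    print("No key found; set the variable or call pc.set_api_key() directly.")
else:
    print("Key found; pass it to pc.set_api_key(key).")
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.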

## 1. Explore Available Data

In [None]:
# List available datasets
datasets = pc.list_census_datasets()
print("📋 Available Census datasets:")
print(datasets)

In [None]:
# List regions for the 2021 Census and preview the first few
regions = pc.list_census_regions("CA21")
print(f"📍 Found {len(regions)} regions in the 2021 Census")
print("\nSample regions:")
print(regions.head())

## 2. NEW: Enhanced Variable Discovery

The enhanced pycancensus adds functions for searching census variables by keyword and for navigating their parent and child relationships.

In [None]:
# NEW: Use enhanced search to find variables
print("🔍 Enhanced Variable Discovery:")

# Find income-related variables
income_vars = pc.find_census_vectors("CA21", "income")
print(f"Found {len(income_vars)} income-related variables")

# Traditional search still works
pop_vars = pc.search_census_vectors("population", "CA21")
print(f"Found {len(pop_vars)} population variables")

# Show some examples
print("\nExample population variables:")
print(pop_vars[['vector', 'label']].head())
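
The vector identifiers used in this notebook, such as `v_CA21_1`, appear to follow a `v_<dataset>_<number>` pattern. Assuming that pattern holds, a small check can catch typos or mixed datasets before a request is sent. `parse_vector_id` and `mismatched_vectors` are hypothetical helpers, not pycancensus functions.

```python
import re

# Pattern inferred from IDs like "v_CA21_1"; treat it as an assumption.
_VECTOR_RE = re.compile(r"^v_([A-Za-z0-9]+)_(\d+)$")


def parse_vector_id(vector_id):
    """Split a vector ID into (dataset, number) or raise ValueError."""
    match = _VECTOR_RE.match(vector_id.strip())
    if match is None:
        raise ValueError(f"Unrecognized vector ID: {vector_id!r}")
    return match.group(1), int(match.group(2))


def mismatched_vectors(vector_ids, dataset):
    """Return the IDs whose dataset part differs from `dataset`."""
    return [v for v in vector_ids if parse_vector_id(v)[0] != dataset]


print(parse_vector_id("v_CA21_1"))
print(mismatched_vectors(["v_CA21_1", "v_CA16_401"], "CA21"))
```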

In [None]:
# NEW: Navigate variable hierarchies
print("🌳 Variable Hierarchy Navigation:")

base_vector = "v_CA21_1"  # Total population

# Find parent variables
parents = pc.parent_census_vectors(base_vector, dataset="CA21")
print(f"Found {len(parents)} parent variables for {base_vector}")

# Find child variables (breakdowns)
children = pc.child_census_vectors(base_vector, dataset="CA21")
print(f"Found {len(children)} child variables for {base_vector}")

if len(children) > 0:
    print("\nChild variables (population breakdowns):")
    print(children[['vector', 'label']].head())
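
Conceptually, each vector may point to a parent, and hierarchy navigation walks those links up or down. The toy example below illustrates the idea on a hand-made table. The IDs are invented for illustration, and the helpers are a sketch of the concept, not the pycancensus implementation.

```python
# Hypothetical vector -> parent mapping (None marks a top-level vector).
PARENTS = {
    "v_DEMO_1": None,
    "v_DEMO_2": "v_DEMO_1",
    "v_DEMO_3": "v_DEMO_1",
    "v_DEMO_4": "v_DEMO_2",
}


def ancestors(vector, parents=PARENTS):
    """Walk upward from `vector`, nearest parent first."""
    chain = []
    current = parents.get(vector)
    while current is not None:
        chain.append(current)
        current = parents.get(current)
    return chain


def descendants(vector, parents=PARENTS):
    """Collect every vector below `vector`, breadth-first."""
    found, frontier = [], [vector]
    while frontier:
        node = frontier.pop(0)
        kids = [v for v, p in parents.items() if p == node]
        found.extend(kids)
        frontier.extend(kids)
    return found


print(ancestors("v_DEMO_4"))
print(descendants("v_DEMO_1"))
```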

## 3. Get Census Data

In [None]:
# Get population data for Toronto CMA
print("📊 Retrieving census data for Toronto CMA...")

toronto_data = pc.get_census(
    dataset="CA21",
    regions={"CMA": "35535"},  # Toronto CMA
    vectors=["v_CA21_1", "v_CA21_2", "v_CA21_3"],  # Population 2021, population 2016, % change 2016 to 2021
    level="CSD"  # Census Subdivision (municipalities)
)

print(f"✅ Retrieved data for {len(toronto_data)} municipalities")
print(f"📋 Columns: {list(toronto_data.columns)}")
print("\nSample data:")
print(toronto_data.head())

In [None]:
# Basic analysis
print("📈 Quick Analysis:")
print(f"Total Toronto CMA Population: {toronto_data['Population'].sum():,}")
print(f"Largest municipality: {toronto_data.loc[toronto_data['Population'].idxmax(), 'Region Name']}")
print(f"Smallest municipality: {toronto_data.loc[toronto_data['Population'].idxmin(), 'Region Name']}")

# Plot population by municipality
plt.figure(figsize=(12, 6))
top_10 = toronto_data.nlargest(10, 'Population')
plt.barh(range(len(top_10)), top_10['Population'])
plt.yticks(range(len(top_10)), top_10['Region Name'])
plt.gca().invert_yaxis()  # Show the largest municipality at the top
plt.xlabel('Population')
plt.title('Top 10 Most Populous Municipalities in Toronto CMA (2021)')
plt.tight_layout()
plt.show()
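
A common follow-up to the totals above is each municipality's share of the overall population. The sketch below uses a small invented table with the same `Region Name` and `Population` column names as the cell above, so it runs without an API key. `add_population_share` is a hypothetical helper, and the values are placeholders rather than census figures.

```python
import pandas as pd


def add_population_share(df, pop_col="Population"):
    """Return a copy of `df` with a percentage-share column, largest first."""
    if pop_col not in df.columns:
        raise KeyError(f"Expected a '{pop_col}' column, got {list(df.columns)}")
    out = df.copy()
    # Coerce to numeric so stray strings become NaN instead of raising.
    out[pop_col] = pd.to_numeric(out[pop_col], errors="coerce")
    out["Population Share (%)"] = 100 * out[pop_col] / out[pop_col].sum()
    return out.sort_values(pop_col, ascending=False, na_position="last")


# Invented placeholder data, not real census values.
demo = pd.DataFrame(
    {"Region Name": ["Area B", "Area A", "Area C"], "Population": [300, 600, 100]}
)
result = add_population_share(demo)
print(result)
```

The same function could be applied to `toronto_data`, provided the returned frame has a `Population` column as the analysis cell assumes.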

## 4. Geographic Data

In [None]:
# Get data with geographic boundaries
print("🗺️  Retrieving geographic data...")

geo_data = pc.get_census(
    dataset="CA21",
    regions={"PR": "11"},  # Prince Edward Island (smaller for demo)
    vectors=["v_CA21_1"],  # Total population
    level="CSD",
    geo_format="geopandas"  # Returns GeoDataFrame with geometry
)

print(f"✅ Retrieved geographic data for {len(geo_data)} regions")
print(f"📐 Coordinate system: {geo_data.crs}")
print(f"🗺️  Geometry type: {geo_data.geometry.geom_type.iloc[0]}")

In [None]:
# Simple map
fig, ax = plt.subplots(figsize=(10, 8))
geo_data.plot(column='Population', 
              cmap='viridis', 
              legend=True,
              ax=ax)
ax.set_title('Population by Municipality - Prince Edward Island (2021)')
ax.set_axis_off()
plt.tight_layout()
plt.show()
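
The map above uses a continuous color scale. For skewed variables such as population, grouping values into a few quantile classes can make a choropleth easier to read. The sketch below shows the classification step on invented numbers using only the standard library. `quantile_classes` is a hypothetical helper, not a pycancensus or GeoPandas function.

```python
import bisect
import statistics


def quantile_classes(values, n_classes=4):
    """Assign each value a class index from 0 to n_classes - 1."""
    # statistics.quantiles returns n_classes - 1 cut points.
    cuts = statistics.quantiles(values, n=n_classes)
    return [bisect.bisect_right(cuts, v) for v in values]


# Invented values standing in for a population column.
populations = [10, 20, 30, 40, 50, 60, 70, 80]
classes = quantile_classes(populations)
print(classes)
```

The resulting class indices could be stored as a new column on `geo_data` and passed to `plot` through `column=`. GeoPandas can also classify directly through the `scheme` argument of `plot`, which requires the optional `mapclassify` package.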

## 5. Enhanced Error Handling

The new pycancensus reports failures, such as a request for an invalid region, with descriptive error messages.

In [None]:
# Demonstrate enhanced error handling
print("🛡️  Testing enhanced error handling...")

try:
    # Try to get data with an invalid region
    pc.get_census(
        dataset="CA21",
        regions={"INVALID": "99999"},
        vectors=["v_CA21_1"],
        level="PR"
    )
except Exception as e:
    print(f"✅ Caught error gracefully: {type(e).__name__}")
    print(f"💡 Helpful message: {str(e)}")
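
Catching `Exception` is fine for a demonstration, but in longer scripts it can help to retry only failures that are likely to be transient, such as network errors, and let everything else surface immediately. The source does not state which exception types pycancensus raises, so the `retry_on` default below is an assumption, and `call_with_retry` is a hypothetical helper.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `func`, retrying listed exception types with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the error.
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for a network call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = call_with_retry(flaky_request, base_delay=0.0)
print(result, calls["count"])
```

A `pc.get_census(...)` call could be wrapped in a `lambda` and passed as `func`. Errors outside `retry_on`, such as an invalid argument, are raised on the first attempt.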

## Next Steps

Now you're ready to explore Canadian Census data with pycancensus! Here are some ideas:

### 📚 Learn More:
- Check out the other example notebooks
- Read the comprehensive documentation
- Explore the cross-validation tests with R

### 🔍 Explore Data:
- Use the hierarchy functions to discover related variables
- Compare different Census years (2016 vs 2021)
- Analyze demographic trends in your area of interest

### 🗺️ Create Maps:
- Use GeoPandas for spatial analysis
- Create choropleth maps of demographic variables
- Combine with other geographic datasets

### 📊 Analyze Trends:
- Study population growth patterns
- Examine housing and income data
- Compare urban vs rural demographics

**Happy analyzing! 🎉**