# Tutorial 10. Cartography

Maps visualize data with geographic or spatial information, such as locations, regions, and boundaries. In this tutorial, you'll learn to create choropleth maps and other geographic visualizations using Altair.

## Learning Goals
Those who actively work through this notebook will be able to:
- Describe data formats for representing geographic features
- Create choropleth maps
- Use `transform_lookup` to join tabular data to geographic shapes

```python
import pandas as pd
import altair as alt
from vega_datasets import data
```

## Geographic Data: GeoJSON and TopoJSON

Geographic data requires special formats to represent shapes like country boundaries and regions. We'll use two common formats:

- **GeoJSON**: Stores geographic features (points, lines, polygons) with their properties (e.g., name, population)
- **TopoJSON**: A more compact format that stores shared boundaries only once instead of duplicating them

Altair works with both formats. TopoJSON files are smaller and are recommended for web visualizations because they eliminate redundant coordinate data. For example, when two states share a border, TopoJSON stores that boundary's coordinates just once, whereas GeoJSON repeats them for each state.

### Loading TopoJSON Data

Let's load a TopoJSON file of world countries at 1:110m (1:110,000,000) scale:


```python
# Load world country boundaries
world = data.world_110m.url

# Extract the 'countries' features from the TopoJSON file
countries = alt.topo_feature(world, 'countries')
```


- The `alt.topo_feature()` function tells Altair which TopoJSON file to load and which named object inside it to extract as geographic features; the file is fetched and converted into shapes when the chart is rendered.
- The second argument (`'countries'`) specifies which feature set to extract—in this case, country boundaries.

## Geoshape Marks

To visualize geographic data, Altair provides the `geoshape` mark type. Let's create a basic world map:

```python
alt.Chart(countries).mark_geoshape()
```

This creates a simple map where each country is drawn as a separate shape.



In [3]:
alt.Chart(countries).mark_geoshape()


In the example above, Altair fills every country with its default blue. As with the other mark types we have used, we can customize the fill color, the boundary stroke color, and the stroke width using standard mark properties.


```python
alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
    fill='black', stroke='white', strokeWidth=0.5
)
```

### Projections

By default, Altair uses the `equalEarth` projection. Many other projections are available, including `albers`, `mercator`, and `azimuthalEquidistant`.

A full list of projections is available in [Altair's projection documentation](https://altair-viz.github.io/user_guide/generated/core/altair.Projection.html).

You can specify a different projection using the `project()` method:
    

```python
alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
    fill='black', stroke='white', strokeWidth=0.5
).project(
    type='mercator'
)
```

By default, Altair automatically fits the projection so that all the data lies within the chart's width and height. We can instead set projection parameters such as `scale` (zoom level) and `translate` (the pixel position of the projection's center, i.e. panning). Here we adjust `scale` and `translate` to focus on Europe:

```python
alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
    fill='black', stroke='blue', strokeWidth=2  # note the changed stroke color and strokeWidth
).project(
    type='mercator', scale=400, translate=[100, 550]
)
```

## Choosing a Projection

Map projections always involve tradeoffs: no flat map can preserve area, shape, distance, and bearing simultaneously.

**For most data science applications:**

| Region | Recommended Projection | Code |
|--------|----------------------|------|
| United States | `albersUsa` | `project(type='albersUsa')` |
| Canada | `conicEqualArea` | `project(type='conicEqualArea', rotate=[95, 0], parallels=[49, 77])` |
| Global view | `naturalEarth1` or `equalEarth` | `project(type='naturalEarth1')` |
| Polar regions | `azimuthalEqualArea` | `project(type='azimuthalEqualArea', rotate=[0, -90])` (North Pole) |

**Key parameters:**
- `scale`: Zoom level (higher = more zoomed in)
- `center`: `[longitude, latitude]` to center the map
- `rotate`: Rotation angles in degrees, `[lambda, phi]`, applied before projecting; `rotate=[95, 0]` puts longitude 95°W on the central meridian, and `rotate=[0, -90]` centers an azimuthal map on the North Pole
- `parallels`: Standard parallels for conic projections (ignored by other projection types)

Experiment with projections: [Vega-Lite Cartographic Projections](https://observablehq.com/@vega/vega-lite-cartographic-projections)

## Choropleth Maps

A [choropleth map](https://en.wikipedia.org/wiki/Choropleth_map) uses colors or shading to represent data values across geographic regions. 

Let's create one showing total alcohol consumption per capita in 2018.

We'll need two data sources:
1. TopoJSON with country boundaries (we already have this)
2. Alcohol data

### Wrangling Geographic Data - Data Preparation for Choropleth Maps

Before creating our map, we need to prepare our data carefully. This involves two key steps: getting country IDs and cleaning our alcohol consumption data.

#### The Problem: Why We Need Country IDs

**The Challenge:** The world TopoJSON file identifies countries by **numeric IDs** (ISO 3166-1 numeric codes, such as 840 for the USA), but most datasets use **country names** (such as "United States"). We need a way to connect the two!

```
TopoJSON Map          Your Data           What We Need
┌─────────────┐      ┌─────────────┐     ┌──────────────────┐
│ id: 840     │  +   │ USA         │  =  │ id: 840          │
│ (shape)     │      │ 8 liters    │     │ Country: USA     │
└─────────────┘      └─────────────┘     │ Liters: 8        │
                                         └──────────────────┘
```
#### Step 1: Load Country ID Mapping
Load a table that maps each country name to its numeric ID, and keep its `Country` column as `relevant_country_names` so we can filter the alcohol data later.


```python
country_ids = pd.read_csv('https://raw.githubusercontent.com/kemiolamudzengi/dsci-320-datasets/main/country-ids-and-continents.csv')
relevant_country_names = country_ids["Country"]
country_ids.sample(10)
```


| | ID | Country | Continent |
|---|---|---|---|
| 159 | 795 | Turkmenistan | Asia |
| 162 | 784 | United Arab Emirates | Asia |
| 64 | 324 | Guinea | Africa |
| 172 | 894 | Zambia | Africa |
| 56 | 266 | Gabon | Africa |
| 108 | 516 | Namibia | Africa |
| 89 | 422 | Lebanon | Asia |
| 8 | 40 | Austria | Europe |
| 67 | 332 | Haiti | North America |
| 40 | 203 | Czechia | Europe |


#### Step 2: Load and Clean Alcohol Data
 - Step 2a: Load the Raw Data
 - Step 2b: Rename Columns (Make Them Shorter!)
 - Step 2c: Drop Unnecessary Columns
 - Step 2d: Filter to Relevant Data
 - Step 2e: Add the Country IDs (The Critical Step!)

---

#### The Complete Data Flow

```
Step 1: Load country IDs
┌──────────────────────────┐
│ Country ID Mapping       │
│ ├─ ID: 840               │
│ └─ Country: USA          │
└──────────────────────────┘
           │
           ▼
Step 2a-d: Load & clean alcohol data
┌──────────────────────────┐
│ Alcohol Dataset          │
│ ├─ Country: USA          │
│ └─ Liters: 8.9           │
└──────────────────────────┘
           │
           ▼
Step 2e: Merge (add IDs to alcohol data)
┌──────────────────────────┐
│ Combined Dataset         │
│ ├─ ID: 840               │
│ ├─ Country: USA          │
│ └─ Liters: 8.9           │
└──────────────────────────┘
           │
           ▼
Step 3: Join to map shapes (lookup transform)
┌──────────────────────────┐
│ World Map                │
│ USA colored by 8.9       │
└──────────────────────────┘
```

---


```python
# STEP 2: Load and prepare the alcohol consumption data
alcohol = pd.read_csv(  # Step 2a: load the raw data
    'https://raw.githubusercontent.com/kemiolamudzengi/dsci-320-datasets/main/alcohol-consumption-vs-gdp-per-capita.csv',
    parse_dates=['Year']  # Convert Year to datetime for easier filtering
).rename(  # Step 2b: shorten the extremely long column names
    columns={
        'Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age)': 'Liters_per_capita',
        'GDP per capita, PPP (constant 2017 international $)': 'GDP_per_capita',
        'Population (historical estimates)': 'Population',
        'Entity': 'Country'  # Standardize to 'Country' to match country_ids
    }
).drop(  # Step 2c: remove Continent; country_ids supplies it, and keeping both
    columns=['Continent']  # would make merge() join on Continent as well as Country
).query(  # Step 2d: keep only 2018 rows for countries that have an ID
    'Year == "2018" & Country in @relevant_country_names'  # @ refers to a Python variable
).merge(  # Step 2e: add the ID (and Continent) columns from country_ids
    country_ids,  # joins on 'Country', the only column the two tables share
)

# Preview the final, cleaned dataset
alcohol.sample(10)
```

| | Country | Code | Year | Liters_per_capita | GDP_per_capita | Population | ID | Continent |
|---|---|---|---|---|---|---|---|---|
| 71 | Ireland | IRL | 2018 | 12.88 | 84303.340892 | 4818694.0 | 372 | Europe |
| 33 | Cote d'Ivoire | CIV | 2018 | 2.71 | 5033.478826 | 25069230.0 | 384 | Africa |
| 85 | Liberia | LBR | 2018 | 6.12 | 1497.004787 | 4818976.0 | 430 | Africa |
| 161 | Yemen | YEM | 2018 | 0.051 | NaN | 28498680.0 | 887 | Asia |
| 34 | Croatia | HRV | 2018 | 9.23 | 27799.843323 | 4156407.0 | 191 | Europe |
| 29 | China | CHN | 2018 | 7.05 | 15242.985858 | 1427648000.0 | 156 | Asia |
| 31 | Congo | COG | 2018 | 9.27 | 3933.425551 | 5244363.0 | 178 | Africa |
| 53 | Gambia | GMB | 2018 | 3.55 | 2158.077558 | 2280092.0 | 270 | Africa |
| 142 | Tajikistan | TJK | 2018 | 3.28 | 3415.322336 | 9100847.0 | 762 | Asia |
| 21 | Burkina Faso | BFA | 2018 | 12.03 | 2120.330268 | 19751470.0 | 854 | Africa |
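To see each wrangling step in isolation, here is a minimal sketch of the same rename, drop, filter, and merge chain on a tiny hand-made table whose result you can check by eye. The alcohol values are invented, the long column name is shortened, and the three IDs are the ISO 3166-1 numeric codes for these countries. It also shows that the `query` filter with `@relevant_country_names` is equivalent to an `isin` boolean mask.

```python
import pandas as pd

# Hypothetical country-ID table (ISO 3166-1 numeric codes)
country_ids = pd.DataFrame({
    "ID": [840, 124, 250],
    "Country": ["United States", "Canada", "France"],
    "Continent": ["North America", "North America", "Europe"],
})
relevant_country_names = country_ids["Country"]

# Hypothetical raw data: made-up values, an aggregate "World" row, and a row from another year
raw = pd.DataFrame({
    "Entity": ["United States", "Canada", "France", "World", "Canada"],
    "Year": [2018, 2018, 2018, 2018, 2017],
    "Total alcohol consumption per capita": [9.0, 8.0, 11.0, 5.5, 8.1],
    "Continent": ["North America", "North America", "Europe", None, "North America"],
})

renamed = raw.rename(columns={
    "Entity": "Country",
    "Total alcohol consumption per capita": "Liters_per_capita",
}).drop(columns=["Continent"])  # otherwise merge() would also join on Continent

# Two equivalent filters: query() with an @-variable, and a boolean mask with isin()
via_query = renamed.query("Year == 2018 & Country in @relevant_country_names")
via_mask = renamed[(renamed["Year"] == 2018) & renamed["Country"].isin(relevant_country_names)]

# Inner join on the shared 'Country' column adds ID and Continent
tidy = via_query.merge(country_ids, on="Country")
print(tidy)
```

The "World" row disappears because it has no ID, and the 2017 Canada row disappears because of the year filter; the merge then attaches exactly one ID to each remaining country.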


### Lookup Transform

To create our map, we work through four steps.

#### 1. Load the World Map Boundaries
_We've done this before._
```
world_map = alt.topo_feature(data.world_110m.url, 'countries')
```
#### 2. Draw a Map from This Data
_We've done this before._
```
alt.Chart(world_map).mark_geoshape()
```

#### 3. Join the Data
To integrate our two data sources, we use the `lookup` transform, which augments the TopoJSON-based `geoshape` data with the alcohol data.

For more details, see the Altair documentation on the [lookup transform](https://altair-viz.github.io/user_guide/transform/lookup.html#combining-datasets-with-a-lookup-transform).
```
.transform_lookup(
    lookup='id',
    from_=alt.LookupData(alcohol, 'ID', ['Country', 'Liters_per_capita'])
)
```

This is the **critical step** that connects your data to the map:

- **`transform_lookup()`**: Joins two datasets together
- **`lookup='id'`**: The field in the **map data** to match on (country ID number)
- **`from_=alt.LookupData(...)`**: Where to get the matching data:
  - **`alcohol`**: Your DataFrame with alcohol consumption data
  - **`'ID'`**: The field in **your data** to match on (must contain the same country IDs)
  - **`['Country', 'Liters_per_capita']`**: Which columns to add to each country shape

**Visual explanation:**
```
World Map (TopoJSON)          Alcohol DataFrame
┌──────────────┐             ┌─────────────────────┐
│ id: 840      │────match───▶│ ID: 840             │
│ (country     │             │ Country: USA        │
│  shape)      │◀───attach───│ Liters_per_capita: 8│
└──────────────┘             └─────────────────────┘


```
#### 4. Now we encode the visual properties and set the projection and map size. 
```
.encode(
    tooltip='Country:N',
    color='Liters_per_capita:Q'
).project(
    'equalEarth',
    scale=150
).properties(
    width=700,
    height=420
)
```

```python
# Load world country boundaries from built-in TopoJSON file
world_map = alt.topo_feature(data.world_110m.url, 'countries')

# Create the choropleth map
alcohol_choropleth = alt.Chart(world_map).mark_geoshape().transform_lookup(
    # Join map data with alcohol data using country IDs
    lookup='id',                    # Country ID in the map (numeric)
    from_=alt.LookupData(
        alcohol,                    # Your DataFrame with statistics
        'ID',                       # Matching ID field in your data
        ['Country',                 # Attach country name
         'Liters_per_capita']       # Attach alcohol consumption value
    )
).encode( # Visual encodings
    tooltip='Country:N',            # Show country name on hover
    color='Liters_per_capita:Q'     # Color by alcohol consumption
).project(
    'equalEarth',                   # Equal Earth projection (preserves area)
    scale=150                       # Zoom level (higher = more zoomed)
).properties(
    width=700,                      # Map width in pixels
    height=420                      # Map height in pixels
)

# Display the map
alcohol_choropleth
```

### Practice
**Pause.** Take a breath and go over what we just did. Then open a new Jupyter notebook and make sure you can re-create the map above.
Remember: process over product!

#### The Complete Flow

1. **Load** → Get country boundary shapes from TopoJSON
2. **Draw** → Use `mark_geoshape()` to draw each country
3. **Match** → Link country IDs in map to IDs in your data
4. **Attach** → Add `Country` and `Liters_per_capita` to each shape
5. **Color** → Shade countries based on alcohol consumption
6. **Tooltip** → Show country name on hover
7. **Project** → Use Equal Earth projection at scale 150
8. **Size** → Display at 700×420 pixels



#### Quick Reference

| Component  | Purpose | Example |
|-----------|---------|---------|
| `alt.topo_feature()`  | Load map shapes | `alt.topo_feature(url, 'countries')` |
| `.mark_geoshape()` | Draw geographic shapes | Creates the country boundaries |
| `.transform_lookup()` | Join data to shapes | Match IDs to attach statistics |
| `lookup='id'` | Map's ID field | Which field in the map to match |
| `from_=alt.LookupData()` | Data source | Where to get the statistics |
| `.encode(color=...)` | Set colors | Color by your metric |
| `.project()` | Set projection | How to display the globe |
| `.properties()` | Set dimensions | Width and height |

---


### Canadian Map

```python
import altair as alt
import pandas as pd

# URL to a GeoJSON file (a FeatureCollection, not TopoJSON)
# containing the boundaries of Canada's provinces and territories
canada_url = 'https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/canada.geojson'

# Load the geographic data, using the file's 'features' array as the rows
canada_geo = alt.Data(url=canada_url,
                      format=alt.DataFormat(property='features', type='json'))

# Create a basic map of Canada
base_map = alt.Chart(canada_geo).mark_geoshape(
    stroke='white',
    strokeWidth=1
).project(
    type='mercator',   # Simple, but inflates northern areas; conic projections suit Canada better
    center=[-95, 60],  # Centered roughly on Canada
    scale=400          # Adjust for zoom level
).properties(
    width=800,
    height=600,
    title='Canadian Provinces and Territories'
)

base_map
```

### Canadian Choropleth Map
We've done a considerable amount of data wrangling today, so we will use a toy dataset to reduce the cognitive load: a DataFrame of simulated provincial renewable energy data. In a real analysis, you would get this data from Statistics Canada. Note that the `name` values must match the `name` property of the GeoJSON features exactly, because that is the key the lookup joins on.

```python
provincial_energy = pd.DataFrame({
    'name': [
        'British Columbia', 'Alberta', 'Saskatchewan', 'Manitoba',
        'Ontario', 'Quebec', 'New Brunswick', 'Nova Scotia',
        'Prince Edward Island', 'Newfoundland and Labrador',
        'Yukon Territory', 'Northwest Territories', 'Nunavut'
    ],
    'renewables_share': [
        95.2, 12.5, 25.3, 68.1,       # BC, AB, SK, MB
        35.4, 47.8, 32.1, 28.5,       # ON, QC, NB, NS
        21.3, 42.7, 32.2, 15.4, 3.1   # PE, NL, YT, NT, NU
    ],
    'hydro_share': [
        92.1, 3.2, 18.5, 97.2,
        24.8, 42.1, 28.3, 9.2,
        1.0, 38.5, 7.8, 13.2, 4.0
    ],
    'wind_share': [
        2.1, 8.3, 5.8, 0.5,
        8.9, 4.2, 2.8, 17.8,
        19.3, 2.8, 5.0, 1.2, 3.0
    ],
    'population': [
        5214805, 4543111, 1194803, 1386333,
        14826276, 8604495, 789225, 1004767,
        164318, 528448, 42986, 45504, 40817
    ]
})

provincial_energy.head(13)
```

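| | name | renewables_share | hydro_share | wind_share | population |
|---|---|---|---|---|---|
| 0 | British Columbia | 95.2 | 92.1 | 2.1 | 5214805 |
| 1 | Alberta | 12.5 | 3.2 | 8.3 | 4543111 |
| 2 | Saskatchewan | 25.3 | 18.5 | 5.8 | 1194803 |
| 3 | Manitoba | 68.1 | 97.2 | 0.5 | 1386333 |
| 4 | Ontario | 35.4 | 24.8 | 8.9 | 14826276 |
| 5 | Quebec | 47.8 | 42.1 | 4.2 | 8604495 |
| 6 | New Brunswick | 32.1 | 28.3 | 2.8 | 789225 |
| 7 | Nova Scotia | 28.5 | 9.2 | 17.8 | 1004767 |
| 8 | Prince Edward Island | 21.3 | 1.0 | 19.3 | 164318 |
| 9 | Newfoundland and Labrador | 42.7 | 38.5 | 2.8 | 528448 |
| 10 | Yukon Territory | 32.2 | 7.8 | 5.0 | 42986 |
| 11 | Northwest Territories | 15.4 | 13.2 | 1.2 | 45504 |
| 12 | Nunavut | 3.1 | 4.0 | 3.0 | 40817 |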


```python
# Create choropleth map showing renewable energy share
canada_renewables = alt.Chart(canada_geo).mark_geoshape(
    stroke='white',
    strokeWidth=1.5
).transform_lookup(
    lookup='properties.name',
    from_=alt.LookupData(
        data=provincial_energy,
        key='name',
        fields=['renewables_share', 'population']
    )
).encode(
    color=alt.Color(
        'renewables_share:Q',
        scale=alt.Scale(
            scheme='greens',
            domain=[0, 100]
        ),
        legend=alt.Legend(
            title='Renewable Energy Share (%)',
            format='.1f',
            orient='bottom',
            offset=-150,
            direction='horizontal',
            titleOrient='top',
            titleAnchor='middle',
            gradientLength=450,     # Width of the gradient bar
            gradientThickness=10,   # Height of the gradient bar
            labelFontSize=11,
            titleFontSize=13
        )
    ),
    tooltip=[
        alt.Tooltip('properties.name:N', title='Province/Territory'),
        alt.Tooltip('renewables_share:Q', title='Renewable Share (%)', format='.1f'),
        alt.Tooltip('population:Q', title='Population', format=',')
    ]
).project(
    type='mercator',
    center=[-95, 60],
    scale=400
    # parallels only apply to conic projections, so none are set for Mercator
).properties(
    width=800,
    height=600,
    title='Renewable Energy Share by Province/Territory (2022)'
).configure_view(
    stroke=None
)

canada_renewables
```

## Color Schemes
Altair supports the many color schemes defined by Vega; the full list is in the [Vega color scheme documentation](https://vega.github.io/vega/docs/schemes/).



```python
# Utility function to generate a map with a specific color scheme
def canada_map(scheme):
    return alt.Chart(canada_geo).mark_geoshape(
        stroke='white',
        strokeWidth=1
    ).transform_lookup(
        lookup='properties.name',
        from_=alt.LookupData(
            data=provincial_energy,
            key='name',
            fields=['renewables_share', 'population']
        )
    ).encode(
        color=alt.Color(
            'renewables_share:Q',
            scale=alt.Scale(scheme=scheme),
            legend=alt.Legend(
                title='Renewable Energy Share (%)',
                format='.1f',
                orient='bottom',
                direction='horizontal',
                titleOrient='top',
                titleAnchor='middle',
                gradientLength=250,   # Width of the gradient bar
                gradientThickness=5   # Height of the gradient bar
            )
        ),
        tooltip=[
            alt.Tooltip('properties.name:N', title='Province/Territory'),
            alt.Tooltip('renewables_share:Q', title='Renewable Share (%)', format='.1f'),
            alt.Tooltip('population:Q', title='Population', format=',')
        ]
    ).project(
        type='mercator',
        center=[-95, 60]
    ).properties(
        width=305,
        height=200
    )

# Compare three color schemes side by side, giving each map its own color scale and legend
alt.hconcat(
    canada_map('greens'),
    canada_map('viridis'),
    canada_map('redyellowgreen')
).resolve_scale(
    color='independent'
).configure_view(
    stroke=None
)
```

## Troubleshooting Common Issues

**Map doesn't appear:**
- ✅ Verify the TopoJSON/GeoJSON URL is accessible
- ✅ Check the feature name matches (e.g., `'countries'`, `'states'`, `'counties'`)

**Data not showing on map:**
- ✅ Ensure IDs match between geographic data and your data table
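- ✅ For the world TopoJSON, use ISO 3166-1 numeric country codes; for US states in `us_10m`, use FIPS codes; for the Canadian GeoJSON, use the exact province/territory names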

**Map looks distorted:**
- ✅ Use the correct projection for your region
- ✅ Adjust `center`, `scale`, and `parallels` parameters

**Colors not showing:**
- ✅ Check that the field exists in your data
- ✅ Verify `transform_lookup` includes the color field in `fields=[...]`
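When shapes stay blank, it helps to list exactly which keys failed to match before changing the chart. A minimal pandas sketch using `merge(..., indicator=True)`; the helper name `unmatched_rows` is hypothetical, and the GeoJSON names below are a made-up sample (in practice you would collect them from the map file's features). Check key types and formats too: in some TopoJSON files IDs are zero-padded strings such as `'004'`, which will not match the integer `4`.

```python
import pandas as pd

def unmatched_rows(data, keys, data_key, keys_key):
    """Return the rows of `data` whose join key does not appear in `keys` (hypothetical helper)."""
    # Rename the key column to avoid clashing with columns already in `data`
    right = keys[[keys_key]].drop_duplicates().rename(columns={keys_key: "_key"})
    merged = data.merge(
        right,
        how="left",
        left_on=data_key,
        right_on="_key",
        indicator=True,   # adds a '_merge' column: 'both' or 'left_only'
    )
    return merged.loc[merged["_merge"] == "left_only", list(data.columns)]

# Made-up sample of the names stored in the map file's features
geo_names = pd.DataFrame({"name": ["British Columbia", "Yukon Territory", "Nunavut"]})

# This table uses "Yukon", so that row would never be colored on the map
energy = pd.DataFrame({
    "name": ["British Columbia", "Yukon", "Nunavut"],
    "renewables_share": [95.2, 32.2, 3.1],
})

print(unmatched_rows(energy, geo_names, "name", "name"))
```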

## Working with Your Own Geographic Data

**Common data sources:**
- **Canada:** [Statistics Canada](https://www12.statcan.gc.ca/), [Natural Resources Canada](https://www.nrcan.gc.ca/)
- **US:** [US Census Bureau](https://www.census.gov/geographies.html), [USGS](https://www.usgs.gov/)
- **Global:** [Natural Earth Data](https://www.naturalearthdata.com/)

**Simple workflow:**
1. Download shapefiles from Natural Earth Data
2. Convert to TopoJSON using [MapShaper](https://mapshaper.org/)
3. Load in Altair with `alt.topo_feature()` or `alt.Data()`

For advanced geospatial work, explore the `geopandas` library.


**Examples in the Altair gallery**

- https://altair-viz.github.io/gallery/groupby-map.html

- https://altair-viz.github.io/gallery/choropleth_repeat.html

- https://altair-viz.github.io/gallery/us_incomebrackets_by_state_facet.html

- https://altair-viz.github.io/gallery/point_map.html

- https://altair-viz.github.io/gallery/world_projections.html

## Exercises

### Exercise 1: Projection Exploration (Easy)
Create a world map using the orthographic projection (which shows Earth as a globe). Customize the map with:

- Land in color '#f0e6d2'
- Country borders in color '#8b7355' with strokeWidth of 0.3
- Center the projection on longitude -100 and latitude 40 (North America)


### Exercise 2: US Choropleth Map (Easy-Medium)
Create a choropleth map showing median condo prices by state.
Start by loading the TopoJSON file of US states:
```python
import altair as alt
from vega_datasets import data

us_states = alt.topo_feature(data.us_10m.url, 'states')
```

The data is available here:
https://raw.githubusercontent.com/kemiolamudzengi/dsci-320-datasets/main/state_market_tracker.csv

Your map should:
 - Use the `albersUsa` projection
 - Color states by median condo price using the `'oranges'` color scheme
 - Include tooltips showing the state name and the median condo price
 - Add an appropriate title


### Data Wrangling

For these last two questions, most of your time will go into data wrangling, so revisit the alcohol example above.
First explore the data and choose a year of interest (e.g. 2020).

The Our World in Data (OWID) energy dataset is here:
https://raw.githubusercontent.com/kemiolamudzengi/dsci-320-datasets/main/owid_energy_clean.csv


### Exercise 3: Energy Comparison Map (Medium-Hard)
Create a side-by-side comparison of two choropleth maps showing:

- Coal share of energy (coal_share_energy)
- Renewable energy share (renewables_share_energy)

Both maps should show the same year and the same set of countries.

Requirements:

- Use `alt.hconcat()` to place maps side by side
- Use contrasting color schemes (e.g., 'greys' for coal, 'greens' for renewables)
- Ensure both maps use the same projection and dimensions
- Add clear titles to each map
- Use `resolve_scale(color='independent')` so each map has its own legend


### Exercise 4: Interactive Energy Dashboard (Hard)
For the same year you used in the preceding question, create an interactive choropleth map that allows users to select which energy metric to display using a dropdown menu.
Requirements:
- Allow selection between at least 3 different metrics:
   - Renewable energy share
   - Fossil fuel share
   - Nuclear energy share
   - Coal energy share
- Use a parameter binding with a dropdown selector
- Include appropriate tooltips
- Add a title that updates to show which metric is being displayed



## Summary

At this point, we've only dipped our toes into the waters of map-making. 
For example, we left untouched topics such as [_cartograms_](https://en.wikipedia.org/wiki/Cartogram) and conveying [_topography_](https://en.wikipedia.org/wiki/Topography) &mdash; as in Imhof's illuminating book [_Cartographic Relief Presentation_](https://books.google.com/books?id=cVy1Ms43fFYC). 
Nevertheless, you should now be well-equipped to create a rich array of geo-visualizations. 
For more, MacEachren's book [_How Maps Work: Representation, Visualization, and Design_](https://books.google.com/books?id=xhAvN3B0CkUC) provides a valuable overview of map-making from the perspective of data visualization. 