
# Urban Population Growth and CO₂ Emissions: Conflicting Perspectives  
### Data Story Project – M2 Draft  
**Group B5 – #5**  
Jens Menkveld (15768643) • Paul Elsinghorst (15002608) • Benjamin Didic (14986876) • Jonne Bijman (15801179)  
*Draft created: June 16, 2025*



## Table of Contents  
1. [Introduction](#Introduction)  
2. [Perspectives & Arguments](#Perspectives)  
3. [Datasets & Pre‑processing](#Datasets)  
4. [Visualization Drafts](#Visualizations)  
5. [Structure & Next Steps](#NextSteps)  



## 1  Introduction <a id="Introduction"></a>  
Cities are magnets for people and opportunity. In 1990 roughly 2.3 billion people lived in urban areas; by 2020 that figure had grown to almost 4.4 billion.  
Many economists see this rapid urbanisation as the engine of productivity and innovation, lifting millions out of poverty. Environmental scientists, however, warn that the same concentration of people and activity intensifies CO₂ emissions, traffic congestion and energy demand, jeopardising climate goals.  

This data‑story explores that tension through the lenses of urban population growth and carbon‑intensive development, using publicly available global datasets.  
We aim to answer:  

*“Can we reap the economic benefits of big cities without locking‑in unsustainable levels of CO₂?”*  
We also examine country-level examples of this trade-off.



## 2  Perspectives & Arguments <a id="Perspectives"></a>  

| Perspective | Key Arguments | Planned Visualisations |
|-------------|---------------|------------------------|
| **1 Economic Upside**: Growing cities fuel development | • Larger labour pools raise productivity<br>• Urban density fosters innovation clusters<br>• Consumer concentration stimulates markets | **V1** Line chart — Urban population trends (1990‑2020)<br>**V3** Scatter — Urban pop vs GDP<br>*(links Perspective 1 to V1 & V3)* |
| **2 Environmental Downside**: Rapid urbanisation drives emissions | • Cities account for >70 % of energy‑related CO₂<br>• Transport, heating & industry scale with population density<br>• High‐growth megacities show steep CO₂ per‑capita trajectories | **V2** Line chart — CO₂ per capita trends<br>**V4** Choropleth — CO₂ / capita latest year<br>**V5** Dual‑axis — Urban pop vs CO₂ for China<br>*(links Perspective 2 to V2, V4, V5)* |



## 3  Datasets & Pre‑processing <a id="Datasets"></a>  

Dataset 1 – World Development Indicators (WDI)  
*Indicator:* `SP.URB.TOTL` – Total urban population (1960‑2023)  
Source: World Bank Databank  


Dataset 2 – OWID CO₂ & Greenhouse Gas Emissions  
*Variable of interest:* `co2_per_capita` (t CO₂ per person)  
Source: Our World in Data (Global Carbon Project, etc.)

*Reproducibility:* The notebook downloads the latest CSVs directly from the providers.  


In [14]:

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import numpy as np

pd.options.display.float_format = '{:,.2f}'.format



In [15]:

# -----------------------------
# 1. Load & tidy Urban Population
# -----------------------------
# Download once; the ZIP is cached locally afterwards
import os
import warnings
import zipfile

wdi_url = "https://api.worldbank.org/v2/en/indicator/SP.URB.TOTL?downloadformat=csv"
wdi_zip = "wdi_urban.zip"

try:
    if not os.path.exists(wdi_zip):
        import requests  # imported lazily so a missing package triggers the fallback
        print("Downloading World Bank data…")
        r = requests.get(wdi_url, timeout=60)
        r.raise_for_status()  # do not cache an error page as the ZIP
        with open(wdi_zip, "wb") as f:
            f.write(r.content)

    with zipfile.ZipFile(wdi_zip) as z:
        # The data CSV has a long file name; pick the first one that starts with 'API_'
        urban_csv = next((n for n in z.namelist()
                          if n.startswith("API_") and n.endswith(".csv")), None)
        if urban_csv is None:
            raise FileNotFoundError("no 'API_*.csv' member in the ZIP")
        with z.open(urban_csv) as fh:
            urban_pop = pd.read_csv(fh, skiprows=4)
except Exception as e:
    warnings.warn(f"Automatic download failed: {e}\nFalling back to local stub…")
    # Fallback stub if offline
    urban_pop = pd.read_csv("data/urban_population_stub.csv")
urban_pop.head(n=5)


Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
0,Aruba,ABW,Urban population,SP.URB.TOTL,27887.0,28212.0,28580.0,28917.0,29221.0,29502.0,...,46961.0,47075.0,47278.0,47554.0,47449.0,47244.0,47272.0,47511.0,,
1,Africa Eastern and Southern,AFE,Urban population,SP.URB.TOTL,18960189.0,19796785.0,20690699.0,21653526.0,22685293.0,23779650.0,...,217677754.0,226557045.0,236107161.0,245939557.0,256139740.0,266650602.0,277426261.0,288380305.0,,
2,Afghanistan,AFG,Urban population,SP.URB.TOTL,759034.0,800151.0,844139.0,890912.0,940801.0,993966.0,...,8682093.0,9011456.0,9367638.0,9749465.0,10168092.0,10525708.0,10800465.0,11165011.0,,
3,Africa Western and Central,AFW,Urban population,SP.URB.TOTL,14361518.0,15050171.0,15775196.0,16550889.0,17374634.0,18251823.0,...,195289605.0,203221168.0,211219592.0,219276647.0,227465935.0,235827038.0,244365364.0,253228069.0,,
4,Angola,AGO,Urban population,SP.URB.TOTL,545923.0,572465.0,599897.0,628663.0,658872.0,690469.0,...,18720648.0,19603967.0,20504018.0,21425222.0,22353719.0,23295577.0,24260684.0,25242775.0,,


In [16]:
# Keep only needed years 1990‑2023
cols_keep = ['Country Name', 'Country Code'] + [str(y) for y in range(1990, 2024)]
urban_pop = urban_pop[cols_keep]

# Convert wide ➜ long
urban_long = urban_pop.melt(id_vars=['Country Name', 'Country Code'],
                            var_name='Year', value_name='Urban_Pop')
urban_long['Year'] = urban_long['Year'].astype(int)
urban_long.dropna(subset=['Urban_Pop'], inplace=True)

# Preview
urban_long.head()


Unnamed: 0,Country Name,Country Code,Year,Urban_Pop
0,Aruba,ABW,1990,31577.0
1,Africa Eastern and Southern,AFE,1990,78899581.0
2,Afghanistan,AFG,1990,2550909.0
3,Africa Western and Central,AFW,1990,64881850.0
4,Angola,AGO,1990,4318495.0


Pre-processing starts by downloading the World Bank urban population dataset as a ZIP archive, which contains a CSV with yearly urban population figures per country. The script picks the data CSV from the archive and reads it into a DataFrame, skipping the first four rows of metadata. If the download fails, it falls back to a local stub file.

Next, the script keeps only the country name, the country code and the year columns for 1990 to 2023, which drops the indicator metadata and the earlier years. The data arrives in wide format, with one column per year, so it is reshaped into long format with `melt()`, giving one row per country-year combination. The `Year` column is converted to integers and rows with missing population values are removed. Note that the file also contains regional aggregates such as "Africa Eastern and Southern", so not every row is a country. The result is a tidy dataset suited to time-series analysis and visualisation of urban population trends.
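
The wide-to-long reshape described above can be tried on a toy table. This is a minimal sketch, not the notebook's own cell: the country names, codes and values are made up, and `wide` and `tidy` are hypothetical variable names.

```python
import pandas as pd

# Toy wide-format frame mimicking the WDI layout (all values are made up)
wide = pd.DataFrame({
    "Country Name": ["Aland", "Borduria"],
    "Country Code": ["ALD", "BRD"],
    "1990": [100.0, 250.0],
    "1991": [110.0, None],  # one missing value to show the dropna step
})

tidy = (
    wide.melt(id_vars=["Country Name", "Country Code"],
              var_name="Year", value_name="Urban_Pop")  # one row per country-year
        .astype({"Year": int})                           # year labels "1990" -> 1990
        .dropna(subset=["Urban_Pop"])                    # drop missing observations
        .reset_index(drop=True)
)
print(tidy)
```

The missing 1991 value for the second country is dropped, so three of the four country-year pairs remain.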

In [17]:
# -----------------------------
# 2. Load & tidy CO₂ per capita
# -----------------------------
owid_url = "https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv"
try:
    co2 = pd.read_csv(owid_url)
except Exception as e:
    import warnings
    warnings.warn(f"Download failed: {e}\nTrying local stub…")
    co2 = pd.read_csv("data/owid_co2_stub.csv")

vars_we_need = ['country', 'iso_code', 'year', 'co2_per_capita', 'population', 'gdp']
# Select, rename and filter in one chain instead of mutating in place
co2 = (co2[vars_we_need]
       .rename(columns={'country': 'Country Name',
                        'iso_code': 'Country Code',
                        'year': 'Year'})
       .query("Year >= 1990"))

# Preview
co2.head()


Unnamed: 0,Country Name,Country Code,Year,co2_per_capita,population,gdp
240,Afghanistan,AFG,1990,0.17,12045664.0,13065984000.0
241,Afghanistan,AFG,1991,0.16,12238879.0,12047362048.0
242,Afghanistan,AFG,1992,0.11,13278982.0,12677539840.0
243,Afghanistan,AFG,1993,0.1,14943174.0,9834582016.0
244,Afghanistan,AFG,1994,0.09,16250799.0,7919856640.0


In [18]:

# Inner merge on country name, ISO code and year; rows whose name is spelled
# differently in the two sources do not match and are dropped
merged = pd.merge(urban_long, co2, on=['Country Name', 'Country Code', 'Year'], how='inner')

# Quick sanity check
merged.describe()[['Urban_Pop', 'co2_per_capita', 'population', 'gdp']]


Unnamed: 0,Urban_Pop,co2_per_capita,population,gdp
count,6086.0,6011.0,6086.0,4717.0
mean,16475529.78,5.17,33485093.68,498586553475.69
std,59350531.0,8.36,136029655.48,1806305908612.36
min,3577.0,0.02,8821.0,257172000.0
25%,525195.25,0.62,1166395.25,18819405824.0
50%,3080348.0,2.65,6169678.0,57735684096.0
75%,8677608.25,7.12,18640599.25,267081351168.0
max,910895447.0,364.69,1438069597.0,26966017179648.0



## 4  Visualization Drafts <a id="Visualizations"></a>  
Below are six visual components. The first four are close to finished;  
V5–V6 are rough prototypes or code‑skeletons that we will refine after peer feedback.


In [19]:
# TODO: maybe combine the 1st and 2nd visualisations
focus_countries = ['China', 'India', 'United States', 'Germany', 'Nigeria']
fig1 = px.line(urban_long[urban_long['Country Name'].isin(focus_countries)],
               x='Year', y='Urban_Pop', color='Country Name',
               title='Urban Population Growth, 1990‑2023')
fig1.update_yaxes(title='Urban population')
fig1.show()


In [20]:

fig2 = px.line(co2[co2['Country Name'].isin(focus_countries)],
               x='Year', y='co2_per_capita', color='Country Name',
               title='CO₂ Emissions per Capita, 1990‑2023',
               labels={'co2_per_capita':'t CO₂ per person'})
fig2.show()


In [21]:
# TODO: maybe colour by continent / development level + set a minimum marker size; is a legend needed?
latest = merged[merged['Year'] == 2020]
fig3 = px.scatter(latest, x='Urban_Pop', y='co2_per_capita',
                  size='population', hover_name='Country Name',
                  title='Urban Population vs CO₂ per Capita (2020)',
                  labels={'Urban_Pop':'Urban population',
                          'co2_per_capita':'t CO₂ per person'})
fig3.update_xaxes(type='log')
fig3.show()


In [None]:
fig = px.choropleth(
    co2,
    locations='Country Code',
    color='co2_per_capita',
    hover_name='Country Name',
    animation_frame='Year',  # <-- This creates the slider
    color_continuous_scale='Reds',
    title='t CO₂ per Capita by Country (per year)',
    labels={'co2_per_capita': 't CO₂ per Capita'}  # Label formatting
)

fig.update_layout(
    coloraxis_colorbar=dict(
        title="t CO₂<br>per Capita"
    )
)

fig.show()

In [25]:
countries = ['China', 'India', 'United States', 'Germany', 'Brazil']

# Create figure
fig = go.Figure()

# Add traces for each country, but make only the first one visible
for i, country in enumerate(countries):
    df = merged[merged['Country Name'] == country]
    fig.add_trace(go.Bar(
        x=df['Year'], y=df['Urban_Pop'],
        name='Urban Population',
        opacity=0.6,
        yaxis='y1',
        visible=(i == 0)
    ))
    fig.add_trace(go.Scatter(
        x=df['Year'], y=df['co2_per_capita'],
        name='CO₂ per Capita',
        yaxis='y2',
        visible=(i == 0)
    ))

# Create dropdown buttons to toggle visibility
buttons = []
for i, country in enumerate(countries):
    visibility = [False] * len(countries) * 2  # 2 traces per country
    visibility[2*i] = True      # Bar for this country
    visibility[2*i + 1] = True  # Line for this country

    buttons.append(dict(
        label=country,
        method='update',
        args=[{'visible': visibility},
              {'title': f'{country}: Urban Population (bars) vs CO₂ per Capita (line)'}]
    ))

# Update layout with both y-axes and dropdown menu
fig.update_layout(
    updatemenus=[dict(
        active=0,
        buttons=buttons,
        x=0.1,
        y=1.15,
        xanchor='left',
        yanchor='top'
    )],
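    # Initial title matches the country shown first (dropdown has active=0)
    title=f'{countries[0]}: Urban Population (bars) vs CO₂ per Capita (line)',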
    yaxis=dict(title='Urban Population'),
    yaxis2=dict(title='CO₂ per Capita (t)', overlaying='y', side='right')
)

fig.show()

In [None]:
# TODO: maybe use a logarithmic scale + add a chart below showing the change in GDP per capita as well
urb90 = urban_long.query("Year == 1990")[['Country Code', 'Urban_Pop']].rename(columns={'Urban_Pop':'Pop1990'})
urb20 = urban_long.query("Year == 2020")[['Country Code', 'Urban_Pop']].rename(columns={'Urban_Pop':'Pop2020'})
chg = pd.merge(urb90, urb20, on='Country Code')
chg['pct_change'] = (chg['Pop2020'] - chg['Pop1990']) / chg['Pop1990'] * 100

# draft map
fig6 = px.choropleth(chg,
                     locations='Country Code',
                     color='pct_change',
                     color_continuous_scale='Viridis',
                     title='Urban Population Growth 1990‑2020 (% change)')
fig6.show()



## 5  Structure & Next Steps <a id="NextSteps"></a>  

* What’s complete so far  
  * **Introduction** for a general audience  
  * Two perspectives & mapped arguments  
  * Cleaned datasets stored in `data/` or downloaded programmatically  
  * **4/6** visualisations ~90 % ready  
  * Draft code & skeleton for remaining visuals  

* Planned refinements  
  1. Add GDP variables to strengthen economic perspective (V3 enhancement)  
  2. Improve colour palettes & accessibility annotations  
  3. Write narrative captions linking each visual to an argument  
  4. Implement guided scrolling (“scrollytelling”) layout in final deployment  
  5. Peer‑review code for reproducibility and performance issues
  6. Add charts of GDP per capita against urbanisation  
  7. Work the arguments clearly into the narrative
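
Refinements 1 and 6 both need GDP per capita, which could be derived from the `gdp` and `population` columns already present in `merged`. A minimal sketch on made-up rows follows; `demo`, `gdp_per_capita` and `urban_share_pct` are hypothetical names, and dividing the WDI urban population by the OWID population assumes the two sources count population comparably.

```python
import pandas as pd

# Toy rows with the same column names as `merged` (all values are made up)
demo = pd.DataFrame({
    "Country Name": ["Aland", "Borduria"],
    "Year": [2020, 2020],
    "Urban_Pop": [600_000.0, 2_000_000.0],
    "population": [1_000_000.0, 8_000_000.0],
    "gdp": [5.0e10, 4.0e10],
    "co2_per_capita": [9.5, 1.2],
})

demo = demo.assign(
    # Total GDP divided by total population
    gdp_per_capita=demo["gdp"] / demo["population"],
    # Share of the population living in urban areas, in percent
    urban_share_pct=demo["Urban_Pop"] / demo["population"] * 100,
)
print(demo[["Country Name", "gdp_per_capita", "urban_share_pct"]])
```

The two derived columns could then serve as the axes of the planned GDP scatter, with `co2_per_capita` mapped to colour or marker size.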
