# COVID-19 Global Tracker 🌍

This notebook analyzes cases, deaths, recoveries, and vaccinations across countries and time.
We’ll clean and process real-world data, perform EDA, generate insights, and visualize trends using Python data tools.

Why this matters:
- To understand how different communities navigated the pandemic.
- To communicate trends clearly and compassionately for supporters and stakeholders.
- To keep the analysis reproducible and modular for future use.



## Setup and styling

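We import the core libraries (pandas, NumPy, Plotly) and set a neutral Plotly theme: the `plotly_white` template with the soft `Set2` palette.
`ipywidgets` is optional; the `HTML` and `Markdown` display helpers are imported for the KPI cards and section headers rendered later.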

In [1]:
# 📦 Core imports
import os
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# 💻 Display helpers are needed with or without widgets
from IPython.display import display, HTML, Markdown

# 💻 Optional widgets (for Jupyter)
try:
    import ipywidgets as widgets
    WIDGETS_AVAILABLE = True
except ImportError:
    widgets = None
    WIDGETS_AVAILABLE = False

# 🖥️ Pandas formatting
pd.set_option("display.max_columns", 100)
pd.set_option("display.width", 160)

# 🎨 Plotly styling (neutral)
px.defaults.template = "plotly_white"
px.defaults.color_discrete_sequence = px.colors.qualitative.Set2  # Soft, readable

def apply_plotly_theme(fig):
    fig.update_layout(
        font=dict(size=14),
        margin=dict(l=40, r=20, t=60, b=40),
        legend=dict(bordercolor="LightGray", borderwidth=1),
        xaxis=dict(gridcolor="LightGray"),
        yaxis=dict(gridcolor="LightGray"),
    )
    return fig
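
Later cells emit HTML that uses the classes `kpi`, `label`, `value`, `delta`, `section`, and `note`, but no stylesheet for them appears in this notebook. A minimal sketch of one follows. The helper name `build_css` and every color and size value are assumptions chosen to suggest a blush beige look, not the original styling.

```python
# Hypothetical stylesheet helper. The class names come from the HTML emitted
# in later cells; the colors and spacing are illustrative assumptions.
def build_css(bg="#f5ebe0", accent="#d5bdaf", text="#3a3a3a"):
    return f"""
<style>
.section h3 {{ color: {text}; border-bottom: 2px solid {accent}; padding-bottom: 4px; }}
.kpi {{ display: inline-block; min-width: 160px; margin: 6px; padding: 12px 16px;
        background: {bg}; border: 1px solid {accent}; border-radius: 10px; }}
.kpi .label {{ font-size: 13px; color: {text}; }}
.kpi .value {{ font-size: 22px; font-weight: 600; color: {text}; }}
.kpi .delta {{ font-size: 12px; color: {text}; opacity: 0.7; }}
.note {{ padding: 8px 12px; background: {bg}; border-left: 4px solid {accent}; }}
</style>
"""

css = build_css()
# In a notebook: from IPython.display import HTML, display; display(HTML(css))
```

Running `display(HTML(css))` once near the top of the notebook would apply the styles to every later HTML output in the same page.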

## Load data

We’ll read a local CSV (preferred for reproducibility). If you’re using the OWID dataset, point to `owid-covid-data.csv`.
Columns are auto-detected; the pipeline gracefully handles missing fields like recoveries.


In [2]:
# Configure your dataset path here
DATA_PATHS = [
    "covid_data.csv",               # your cleaned/curated file
    "owid-covid-data.csv",          # OWID canonical file (if present locally)
    os.path.join("data", "covid_data.csv"),
    os.path.join("data", "owid-covid-data.csv"),
]

def first_existing(paths):
    for p in paths:
        if os.path.exists(p):
            return p
    return None

DATA_FILE = first_existing(DATA_PATHS)
if DATA_FILE is None:
    raise FileNotFoundError(
        "No data file found. Place `covid_data.csv` or `owid-covid-data.csv` in the project or data/ folder."
    )

print(f"Using data file: {DATA_FILE}")

df_raw = pd.read_csv(DATA_FILE)
print(f"Raw shape: {df_raw.shape}")
df_raw.head(3)

Using data file: owid-covid-data.csv
Raw shape: (344779, 67)


Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million,total_tests,new_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,new_vaccinations_smoothed_per_million,new_people_vaccinated_smoothed,new_people_vaccinated_smoothed_per_hundred,stringency_index,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,population,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-01-03,,0.0,,,0.0,,,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,41128772.0,,,,
1,AFG,Asia,Afghanistan,2020-01-04,,0.0,,,0.0,,,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,41128772.0,,,,
2,AFG,Asia,Afghanistan,2020-01-05,,0.0,,,0.0,,,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,41128772.0,,,,
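
The `Unnamed: 0` column in the preview is the pattern pandas produces when a CSV that was written with its index is read back without `index_col`. Below is a small sketch of dropping such columns, assuming they carry no information. The helper name `drop_saved_index` is hypothetical and the rest of the notebook does not depend on it.

```python
import io
import pandas as pd

def drop_saved_index(df):
    """Drop columns that pandas auto-named 'Unnamed: N' while reading a CSV."""
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed:")]
    return df.drop(columns=unnamed)

# Toy CSV with a leading, unnamed index column (illustrative data only)
csv_text = ",location,date,new_cases\n0,A,2020-01-01,1\n1,A,2020-01-02,2\n"
toy = pd.read_csv(io.StringIO(csv_text))
clean = drop_saved_index(toy)
print(list(toy.columns), "->", list(clean.columns))
```

Passing `index_col=0` to `pd.read_csv` is an alternative when the first column is known to be a saved index.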


## Standardize columns

We normalize column names to lowercase and strip spaces.
We also detect commonly used fields and provide fallbacks.

Key fields we aim for:
- location, iso_code, date
- total_cases, new_cases
- total_deaths, new_deaths
- total_vaccinations, new_vaccinations, people_vaccinated_per_hundred
- total_recoveries (if available; often not in OWID)
- population

In [3]:
df = df_raw.copy()
df.columns = [c.strip().lower() for c in df.columns]

# Common aliases for recoveries used in other datasets
RECOVERY_CANDIDATES = ["total_recoveries", "total_recovered", "recovered", "cumulative_recovered"]

# Ensure required columns exist
required_base = ["location", "date"]
for col in required_base:
    if col not in df.columns:
        raise ValueError(f"Missing required column: {col}")

# Optional helpful columns
maybe_cols = [
    "iso_code", "population",
    "total_cases", "new_cases",
    "total_deaths", "new_deaths",
    "total_vaccinations", "new_vaccinations",
    "people_vaccinated_per_hundred", "new_cases_smoothed", "new_deaths_smoothed"
]
present_cols = [c for c in maybe_cols if c in df.columns]

# Try to identify a recovery column if present
recovery_col = next((c for c in RECOVERY_CANDIDATES if c in df.columns), None)
print("Detected recovery column:", recovery_col)

# Parse dates
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df = df.dropna(subset=["date"])
df = df.sort_values(["location", "date"]).reset_index(drop=True)

df.head(3)

Detected recovery column: None


Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million,total_tests,new_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,new_vaccinations_smoothed_per_million,new_people_vaccinated_smoothed,new_people_vaccinated_smoothed_per_hundred,stringency_index,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,population,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-01-03,,0.0,,,0.0,,,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,41128772.0,,,,
1,AFG,Asia,Afghanistan,2020-01-04,,0.0,,,0.0,,,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,41128772.0,,,,
2,AFG,Asia,Afghanistan,2020-01-05,,0.0,,,0.0,,,0.0,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,54.422,18.6,2.581,1.337,1803.987,,597.029,9.59,,,37.746,0.5,64.83,0.511,41128772.0,,,,
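
Where another dataset names the same field differently, one option is to rename the first matching alias to a canonical name before running the checks in the cell above. This is a sketch under that assumption; the `ALIASES` mapping and the `standardize_columns` helper are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical alias table: canonical name -> alternative spellings
ALIASES = {
    "location": ["country", "country_region"],
    "total_recoveries": ["total_recovered", "recovered", "cumulative_recovered"],
}

def standardize_columns(df, aliases=ALIASES):
    out = df.copy()
    out.columns = [str(c).strip().lower() for c in out.columns]
    rename = {}
    for canonical, candidates in aliases.items():
        if canonical in out.columns:
            continue  # already present, nothing to rename
        found = next((c for c in candidates if c in out.columns), None)
        if found is not None:
            rename[found] = canonical
    return out.rename(columns=rename)

toy = pd.DataFrame({" Country ": ["A"], "Date": ["2020-01-01"], "Recovered": [5]})
standardized = standardize_columns(toy)
print(list(standardized.columns))
```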


## Cleaning and feature engineering

- Forward-fill cumulative fields within each country.
- Compute daily deltas where missing.
- Compute per-capita and rates where possible.
- Derive active cases if recoveries are available: active = cases - deaths - recoveries.
- Clip negative daily values to 0 (to handle data corrections).

In [4]:
group_cols = ["location"]

def ffill_group(df_in, cols):
    return df_in.groupby(group_cols, group_keys=False)[cols].ffill()

# Identify cumulative-like fields to forward-fill (if present)
cum_candidates = [
    "total_cases", "total_deaths", "total_vaccinations",
]
if recovery_col:
    cum_candidates.append(recovery_col)

cum_to_ffill = [c for c in cum_candidates if c in df.columns]
if cum_to_ffill:
    df[cum_to_ffill] = ffill_group(df, cum_to_ffill)

# Compute daily deltas if missing
if "new_cases" not in df.columns and "total_cases" in df.columns:
    df["new_cases"] = df.groupby(group_cols)["total_cases"].diff().fillna(0)

if "new_deaths" not in df.columns and "total_deaths" in df.columns:
    df["new_deaths"] = df.groupby(group_cols)["total_deaths"].diff().fillna(0)

if "new_vaccinations" not in df.columns and "total_vaccinations" in df.columns:
    df["new_vaccinations"] = df.groupby(group_cols)["total_vaccinations"].diff().fillna(0)

# Clip negatives created by data backfills
for daily_col in ["new_cases", "new_deaths", "new_vaccinations"]:
    if daily_col in df.columns:
        df[daily_col] = df[daily_col].clip(lower=0)

# Per-capita vaccinations if population exists and percentage not provided
if "people_vaccinated_per_hundred" not in df.columns and \
   "total_vaccinations" in df.columns and "population" in df.columns:
    # total_vaccinations per 100 people (approximation; OWID provides better fields when available)
    df["vaccinations_per_hundred_est"] = (df["total_vaccinations"] / df["population"]) * 100

# Case fatality rate (CFR) and mortality per 100k if population available
if "total_cases" in df.columns and "total_deaths" in df.columns:
    df["cfr"] = np.where(df["total_cases"] > 0, df["total_deaths"] / df["total_cases"], np.nan)

if "population" in df.columns and "total_deaths" in df.columns:
    df["deaths_per_100k"] = (df["total_deaths"] / df["population"]) * 100000

# Active cases if we have a recovery field
if recovery_col and "total_cases" in df.columns and "total_deaths" in df.columns:
    df["active_cases"] = df["total_cases"] - df["total_deaths"] - df[recovery_col]
    df["active_cases"] = df["active_cases"].where(df["active_cases"] >= 0, np.nan)

print("Cleaned shape:", df.shape)
df.tail(3)

Cleaned shape: (344779, 69)


Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,total_cases_per_million,new_cases_per_million,new_cases_smoothed_per_million,total_deaths_per_million,new_deaths_per_million,new_deaths_smoothed_per_million,reproduction_rate,icu_patients,icu_patients_per_million,hosp_patients,hosp_patients_per_million,weekly_icu_admissions,weekly_icu_admissions_per_million,weekly_hosp_admissions,weekly_hosp_admissions_per_million,total_tests,new_tests,total_tests_per_thousand,new_tests_per_thousand,new_tests_smoothed,new_tests_smoothed_per_thousand,positive_rate,tests_per_case,tests_units,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,new_vaccinations,new_vaccinations_smoothed,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,new_vaccinations_smoothed_per_million,new_people_vaccinated_smoothed,new_people_vaccinated_smoothed_per_hundred,stringency_index,population_density,median_age,aged_65_older,aged_70_older,gdp_per_capita,extreme_poverty,cardiovasc_death_rate,diabetes_prevalence,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,population,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million,cfr,deaths_per_100k
344776,ZWE,Africa,Zimbabwe,2023-09-25,265748.0,0.0,0.0,5718.0,0.0,0.0,16283.041,0.0,0.0,350.356,0.0,0.0,,,,,,,,,,,,,,,,,,,12222754.0,,,,,,,,,,,,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,16320539.0,,,,,0.021517,35.035608
344777,ZWE,Africa,Zimbabwe,2023-09-26,265748.0,0.0,0.0,5718.0,0.0,0.0,16283.041,0.0,0.0,350.356,0.0,0.0,,,,,,,,,,,,,,,,,,,12222754.0,,,,,,,,,,,,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,16320539.0,,,,,0.021517,35.035608
344778,ZWE,Africa,Zimbabwe,2023-09-27,265748.0,0.0,0.0,5718.0,0.0,0.0,16283.041,0.0,0.0,350.356,0.0,0.0,,,,,,,,,,,,,,,,,,,12222754.0,,,,,,,,,,,,,,42.729,19.6,2.822,1.882,1899.775,21.4,307.846,1.82,1.6,30.7,36.791,1.7,61.49,0.571,16320539.0,,,,,0.021517,35.035608
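
To see what these steps do, here is a toy run on two made-up countries: forward-fill a cumulative column within each group, difference it into a daily series, and clip the negative value caused by a downward correction. The data is invented for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "location": ["A"] * 4 + ["B"] * 3,
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
         "2020-01-01", "2020-01-02", "2020-01-03"]
    ),
    # A has a reporting gap (NaN) and then a downward correction (12 -> 11)
    "total_cases": [10, np.nan, 12, 11, 5, 7, np.nan],
})

# Forward-fill within each country so a gap never borrows another country's value
toy["total_cases"] = toy.groupby("location")["total_cases"].ffill()

# Daily delta per country; the first row of each group has no predecessor
toy["new_cases"] = toy.groupby("location")["total_cases"].diff().fillna(0)

# Corrections show up as negative deltas; clip them to zero
toy["new_cases"] = toy["new_cases"].clip(lower=0)

print(toy)
```

After clipping, the daily values for A sum to 2 while the cumulative total only rose by 1 (from 10 to 11), so clipped daily series and cumulative totals should not be expected to reconcile exactly.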


## Controls: country and date selection

Use the widgets to filter by country (single or multiple) and date range.
Note on dates: widgets return `datetime.date` objects — we convert them with `pd.to_datetime` to match `datetime64[ns]`.

In [5]:
# Defaults for convenience
DEFAULT_COUNTRIES = ["Kenya"] if "Kenya" in df["location"].unique() else [df["location"].iloc[0]]
min_date = df["date"].min().date()
max_date = df["date"].max().date()

if WIDGETS_AVAILABLE:
    countries_widget = widgets.SelectMultiple(
        options=sorted(df["location"].unique()),
        value=tuple(DEFAULT_COUNTRIES),
        description="Countries",
        rows=10,
        layout=widgets.Layout(width="50%")
    )
    start_widget = widgets.DatePicker(
        description="Start date",
        value=min_date
    )
    end_widget = widgets.DatePicker(
        description="End date",
        value=max_date
    )
    display(Markdown("### Filter panel"))
    display(widgets.HBox([countries_widget]))
    display(widgets.HBox([start_widget, end_widget]))
else:
    countries_widget = None
    start_widget = None
    end_widget = None
    print("ipywidgets not available. Set filters via variables below.")
    # Fallback manual settings:
    fallback_countries = DEFAULT_COUNTRIES
    fallback_start = min_date
    fallback_end = max_date

def apply_filters(df_in, countries, start_date, end_date):
    # Convert possible datetime.date -> Timestamp
    start_ts = pd.to_datetime(start_date)
    end_ts = pd.to_datetime(end_date)
    mask = (df_in["location"].isin(countries)) & (df_in["date"] >= start_ts) & (df_in["date"] <= end_ts)
    return df_in.loc[mask].copy()

if WIDGETS_AVAILABLE:
    selected_countries = list(countries_widget.value) if countries_widget.value else DEFAULT_COUNTRIES
    selected_start = start_widget.value or min_date
    selected_end = end_widget.value or max_date
else:
    selected_countries = fallback_countries
    selected_start = fallback_start
    selected_end = fallback_end

df_sel = apply_filters(df, selected_countries, selected_start, selected_end)

Markdown(f"""
<div class='note'>
<b>Selection:</b> {', '.join(selected_countries)} | {selected_start} → {selected_end}
</div>
""")

### Filter panel

HBox(children=(SelectMultiple(description='Countries', index=(114,), layout=Layout(width='50%'), options=('Afg…

HBox(children=(DatePicker(value=datetime.date(2020, 1, 1), description='Start date', step=1), DatePicker(value…


<div class='note'>
<b>Selection:</b> Kenya | 2020-01-01 → 2023-09-28
</div>
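
The date conversion can be checked in isolation. In this sketch a plain `datetime.date`, the type the note above attributes to the date pickers, is turned into a `Timestamp` and used in a range mask. The toy frame is made up.

```python
import datetime as dt
import pandas as pd

toy = pd.DataFrame({
    "location": ["Kenya", "Kenya", "Kenya"],
    "date": pd.to_datetime(["2020-01-01", "2020-06-01", "2021-01-01"]),
})

picked_start = dt.date(2020, 3, 1)    # what a date picker would hand back
picked_end = dt.date(2020, 12, 31)

# Convert to Timestamp so the values compare cleanly with a datetime64 column
start_ts = pd.to_datetime(picked_start)
end_ts = pd.to_datetime(picked_end)

mask = (toy["date"] >= start_ts) & (toy["date"] <= end_ts)
print(type(start_ts).__name__, toy.loc[mask, "date"].dt.date.tolist())
```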


## KPI summary

Key figures after filters:
- Total cases, deaths, vaccinations
- CFR (deaths / cases) and deaths per 100k (if population is available)
- Active cases (if recoveries exist)

In [6]:
def fmt_int(x):
    return "N/A" if x is None or pd.isna(x) else f"{int(round(x)):,}"

def latest_value(series):
    if series.empty:
        return None
    return series.dropna().iloc[-1] if not series.dropna().empty else None

# Aggregate across selected countries for the latest date in range
if not df_sel.empty:
    last_date = df_sel["date"].max()
    df_latest = df_sel[df_sel["date"] == last_date]
else:
    last_date = None
    df_latest = df_sel

total_cases = fmt_int(df_latest["total_cases"].sum()) if "total_cases" in df_sel.columns else "N/A"
total_deaths = fmt_int(df_latest["total_deaths"].sum()) if "total_deaths" in df_sel.columns else "N/A"
total_vax = fmt_int(df_latest["total_vaccinations"].sum()) if "total_vaccinations" in df_sel.columns else "N/A"

# Mean CFR and deaths per 100k as a simple roll-up
mean_cfr = df_latest["cfr"].mean() if "cfr" in df_sel.columns else np.nan
mean_cfr_txt = f"{mean_cfr:.2%}" if pd.notna(mean_cfr) else "N/A"

mean_deaths_100k = df_latest["deaths_per_100k"].mean() if "deaths_per_100k" in df_sel.columns else np.nan
mean_deaths_100k_txt = f"{mean_deaths_100k:,.2f}" if pd.notna(mean_deaths_100k) else "N/A"

# Active cases if available
if "active_cases" in df_sel.columns:
    active_cases = fmt_int(df_latest["active_cases"].sum())
else:
    active_cases = "N/A"

HTML(f"""
<div class="section"><h3>KPI summary (as of {last_date.date() if last_date else '—'})</h3></div>
<div class="kpi"><div class="label">🧪 Total cases</div><div class="value">{total_cases}</div></div>
<div class="kpi"><div class="label">☠️ Total deaths</div><div class="value">{total_deaths}</div></div>
<div class="kpi"><div class="label">💉 Total vaccinations</div><div class="value">{total_vax}</div></div>
<div class="kpi"><div class="label">📉 CFR</div><div class="value">{mean_cfr_txt}</div><div class="delta">Deaths / Cases</div></div>
<div class="kpi"><div class="label">⚖️ Deaths per 100k</div><div class="value">{mean_deaths_100k_txt}</div></div>
<div class="kpi"><div class="label">🩺 Active cases</div><div class="value">{active_cases}</div></div>
""")

## Time series trends

We visualize cumulative totals and daily new metrics. Use the filters to compare multiple countries.

In [7]:
def plot_ts(df_in, y, title, color="location", mode="lines"):
    fig = px.line(df_in, x="date", y=y, color=color, title=title)
    fig.update_traces(mode=mode)
    return apply_plotly_theme(fig)

def plot_ts_bar(df_in, y, title, color="location"):
    fig = px.bar(df_in, x="date", y=y, color=color, title=title, barmode="group")
    return apply_plotly_theme(fig)

# Cumulative totals
fig_cases = plot_ts(df_sel, "total_cases", "Total Cases Over Time") if "total_cases" in df_sel.columns else None
fig_deaths = plot_ts(df_sel, "total_deaths", "Total Deaths Over Time") if "total_deaths" in df_sel.columns else None
fig_vax = plot_ts(df_sel, "total_vaccinations", "Total Vaccinations Over Time") if "total_vaccinations" in df_sel.columns else None

# Daily new
fig_new_cases = plot_ts_bar(df_sel, "new_cases", "Daily New Cases") if "new_cases" in df_sel.columns else None
fig_new_deaths = plot_ts_bar(df_sel, "new_deaths", "Daily New Deaths") if "new_deaths" in df_sel.columns else None
fig_new_vax = plot_ts_bar(df_sel, "new_vaccinations", "Daily New Vaccinations") if "new_vaccinations" in df_sel.columns else None

for f in [fig_cases, fig_deaths, fig_vax, fig_new_cases, fig_new_deaths, fig_new_vax]:
    if f:
        f.show()
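
Daily bars are often noisy because of weekly reporting cycles. The dataset loaded above already has a `new_cases_smoothed` column; when such a column is absent, a 7-day rolling mean per country can be derived instead. This is a sketch on made-up data, and the column name `new_cases_7d` is an assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    "location": ["A"] * 8,
    "date": pd.date_range("2020-01-01", periods=8, freq="D"),
    "new_cases": [0, 7, 0, 7, 0, 7, 0, 7],
})

toy = toy.sort_values(["location", "date"])
# Rolling mean within each country; min_periods=1 keeps the first days defined
toy["new_cases_7d"] = (
    toy.groupby("location")["new_cases"]
       .transform(lambda s: s.rolling(window=7, min_periods=1).mean())
)
print(toy[["date", "new_cases", "new_cases_7d"]])
```

The derived column can be passed to `plot_ts` like any other metric, for example `plot_ts(df_sel, "new_cases_7d", "New Cases (7-day mean)")`.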

## Vaccination rollout

We compare cumulative vaccinations and, when available, people vaccinated per hundred (falling back to an estimated doses-per-hundred figure when that column is absent).
Per-capita values add context beyond absolute counts.

In [8]:
# Prefer provided per-hundred column; else use estimate we computed if present
per_hundred_col = None
if "people_vaccinated_per_hundred" in df_sel.columns:
    per_hundred_col = "people_vaccinated_per_hundred"
elif "vaccinations_per_hundred_est" in df_sel.columns:
    per_hundred_col = "vaccinations_per_hundred_est"

if "total_vaccinations" in df_sel.columns:
    fig_rollout = plot_ts(df_sel, "total_vaccinations", "Vaccination Rollout (Total Vaccinations)")
    fig_rollout.show()

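if per_hundred_col:
    # Title the chart after the column actually plotted
    percap_title = (
        "People Vaccinated per Hundred"
        if per_hundred_col == "people_vaccinated_per_hundred"
        else "Total Vaccinations per Hundred (estimate)"
    )
    fig_percap = plot_ts(df_sel, per_hundred_col, percap_title)
    fig_percap.show()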
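
The two per-hundred measures answer different questions. `people_vaccinated_per_hundred` counts people, while the fallback estimate divides total doses by population and can exceed 100 once second doses and boosters accumulate. Here is a toy calculation with invented numbers:

```python
import pandas as pd

toy = pd.DataFrame({
    "location": ["A"],
    "population": [1_000],
    "total_vaccinations": [1_500],   # doses, including second doses and boosters
    "people_vaccinated": [700],      # people with at least one dose
})

toy["doses_per_hundred"] = toy["total_vaccinations"] / toy["population"] * 100
toy["people_per_hundred"] = toy["people_vaccinated"] / toy["population"] * 100
print(toy[["doses_per_hundred", "people_per_hundred"]])
```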

## Global snapshot (choropleth)

We render a choropleth map for a selected date to visualize spatial differences.
Note: We use the closest available data on or before the selected end date to avoid sparse maps.

In [9]:
# Prepare a snapshot on the latest selected date per country
# If iso_code missing, Plotly can still plot using location but better with ISO-3.
snapshot_date = pd.to_datetime(selected_end)
df_snap = (
    df[df["date"] <= snapshot_date]
    .sort_values(["location", "date"])
    .groupby("location", as_index=False)
    .tail(1)
)

# Pick a metric for the map
map_metric = "total_cases" if "total_cases" in df_snap.columns else None

if map_metric:
    title = f"Global {map_metric.replace('_',' ').title()} — up to {snapshot_date.date()}"
    if "iso_code" in df_snap.columns:
        fig_map = px.choropleth(
            df_snap, locations="iso_code", color=map_metric,
            hover_name="location", color_continuous_scale="Peach",
            title=title
        )
    else:
        # Fallback using names (less reliable)
        fig_map = px.choropleth(
            df_snap, locations="location", locationmode="country names",
            color=map_metric, hover_name="location",
            color_continuous_scale="Peach", title=title
        )
    apply_plotly_theme(fig_map)
    fig_map.show()
else:
    display(Markdown("> Map skipped — no suitable metric available (e.g., total_cases)."))
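
The snapshot logic keeps, for each location, the last row on or before the chosen date. OWID files also contain aggregate rows (for example World or continents) whose `iso_code` starts with `OWID_`. These cannot be matched to a map region, so filtering them out is an optional extra step. Below is a sketch on invented rows, with `latest_on_or_before` as a hypothetical helper name:

```python
import pandas as pd

def latest_on_or_before(df, when, drop_aggregates=True):
    """Last available row per location on or before `when`."""
    snap = (
        df[df["date"] <= pd.to_datetime(when)]
        .sort_values(["location", "date"])
        .groupby("location")
        .tail(1)
    )
    if drop_aggregates and "iso_code" in snap.columns:
        # Assumption: aggregate rows carry an OWID_-prefixed code
        snap = snap[~snap["iso_code"].astype(str).str.startswith("OWID_")]
    return snap.reset_index(drop=True)

toy = pd.DataFrame({
    "iso_code": ["KEN", "KEN", "KEN", "UGA", "OWID_WRL"],
    "location": ["Kenya", "Kenya", "Kenya", "Uganda", "World"],
    "date": pd.to_datetime(
        ["2021-01-01", "2021-01-05", "2021-01-09", "2021-01-03", "2021-01-05"]
    ),
    "total_cases": [100, 150, 200, 40, 9000],
})
snap = latest_on_or_before(toy, "2021-01-06")
print(snap[["location", "date", "total_cases"]])
```

Kenya contributes its 2021-01-05 row rather than the later 2021-01-09 one, and Uganda falls back to 2021-01-03 because it has nothing closer to the cutoff.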

## Insights

We extract simple interpretable insights:
- Peak dates for daily new cases/deaths
- Recent 7-day trend direction
- Notes for supporter-facing summaries

In [10]:
def peak_info(df_in, metric, by="location"):
    out = []
    if metric not in df_in.columns or df_in.empty:
        return out
    for loc, g in df_in.groupby(by):
        g = g.dropna(subset=[metric])
        if g.empty:
            continue
        idx = g[metric].idxmax()
        peak_val = g.loc[idx, metric]
        peak_date = g.loc[idx, "date"]
        out.append((loc, peak_date.date(), float(peak_val)))
    return sorted(out, key=lambda x: x[0])

def recent_trend(df_in, metric, days=7, by="location"):
    out = []
    if metric not in df_in.columns or df_in.empty:
        return out
    end = df_in["date"].max()
    start = end - pd.Timedelta(days=days-1)
    window = df_in[(df_in["date"] >= start) & (df_in["date"] <= end)]
    for loc, g in window.groupby(by):
        g = g.dropna(subset=[metric]).sort_values("date")
        if len(g) < 2:
            continue
        change = g[metric].iloc[-1] - g[metric].iloc[0]
        direction = "up" if change > 0 else "down" if change < 0 else "flat"
        out.append((loc, direction, float(change)))
    return sorted(out, key=lambda x: x[0])

insights_md = []

# Peaks
for metric in ["new_cases", "new_deaths"]:
    peaks = peak_info(df_sel, metric)
    if peaks:
        lines = [f"- {loc}: peak {metric.replace('_',' ')} on {d} with {int(v):,}" for loc, d, v in peaks]
        insights_md.append(f"**Peaks in {metric.replace('_',' ')}:**\n" + "\n".join(lines))

# Trends (7-day change)
for metric in ["new_cases", "new_deaths"]:
    trends = recent_trend(df_sel, metric, days=7)
    if trends:
        lines = [f"- {loc}: {direction} over last 7 days (Δ {int(change):,})" for loc, direction, change in trends]
        insights_md.append(f"**7-day trend for {metric.replace('_',' ')}:**\n" + "\n".join(lines))

if insights_md:
    display(Markdown("### Insight highlights"))
    display(Markdown("\n\n".join(insights_md)))
else:
    display(Markdown("> No insights available for the current selection."))

### Insight highlights

**Peaks in new cases:**
- Kenya: peak new cases on 2021-12-26 with 3,749

**Peaks in new deaths:**
- Kenya: peak new deaths on 2021-09-25 with 37

**7-day trend for new cases:**
- Kenya: flat over last 7 days (Δ 0)

**7-day trend for new deaths:**
- Kenya: flat over last 7 days (Δ 0)
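
The 7-day trend above compares only the first and last day of the window, so a single reporting gap can flip the direction. One alternative, offered as a sketch rather than a replacement, compares the mean of the latest 7 rows with the mean of the 7 rows before them. The helper name `week_over_week` and the toy series are assumptions, and the sketch presumes one row per day.

```python
import pandas as pd

def week_over_week(df, metric, by="location"):
    """Return {location: (direction, change)} from two consecutive 7-row means."""
    out = {}
    for loc, g in df.dropna(subset=[metric]).groupby(by):
        g = g.sort_values("date")
        if len(g) < 14:
            continue  # not enough history for two full weeks
        last = g[metric].iloc[-7:].mean()
        prev = g[metric].iloc[-14:-7].mean()
        change = float(last - prev)
        direction = "up" if change > 0 else "down" if change < 0 else "flat"
        out[loc] = (direction, change)
    return out

toy = pd.DataFrame({
    "location": ["A"] * 14,
    "date": pd.date_range("2021-01-01", periods=14, freq="D"),
    "new_cases": [10] * 7 + [20] * 6 + [0],  # last day is a reporting gap
})
result = week_over_week(toy, "new_cases")
print(result)
```

On this toy series the first-versus-last comparison within the final week reads as a drop (0 minus 20), while the week-over-week means still point up.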

## Export

- Save filtered data to CSV for sharing or downstream analysis.
- Export figures to self-contained HTML files.
- Convert the notebook to HTML or PDF via `nbconvert` for distribution.

In [11]:
EXPORT_DIR = "exports"
os.makedirs(EXPORT_DIR, exist_ok=True)

# Save filtered data
filtered_csv_path = os.path.join(EXPORT_DIR, "filtered_data.csv")
df_sel.to_csv(filtered_csv_path, index=False)
print("Saved:", filtered_csv_path)

# Save figures (if created)
def save_fig(fig, name):
    out = os.path.join(EXPORT_DIR, f"{name}.html")
    fig.write_html(out, include_plotlyjs="cdn", full_html=True)
    print("Saved figure:", out)

if 'fig_cases' in globals() and fig_cases:
    save_fig(fig_cases, "total_cases_ts")
if 'fig_deaths' in globals() and fig_deaths:
    save_fig(fig_deaths, "total_deaths_ts")
if 'fig_vax' in globals() and fig_vax:
    save_fig(fig_vax, "total_vaccinations_ts")
if 'fig_new_cases' in globals() and fig_new_cases:
    save_fig(fig_new_cases, "daily_new_cases")
if 'fig_new_deaths' in globals() and fig_new_deaths:
    save_fig(fig_new_deaths, "daily_new_deaths")
if 'fig_new_vax' in globals() and fig_new_vax:
    save_fig(fig_new_vax, "daily_new_vaccinations")
if 'fig_map' in globals():
    save_fig(fig_map, "choropleth_snapshot")

print("\nTo export this notebook to HTML:")
print("!jupyter nbconvert --to html --TemplateExporter.exclude_input=False --TemplateExporter.exclude_output=False your_notebook.ipynb")

Saved: exports\filtered_data.csv
Saved figure: exports\total_cases_ts.html
Saved figure: exports\total_deaths_ts.html
Saved figure: exports\total_vaccinations_ts.html
Saved figure: exports\daily_new_cases.html
Saved figure: exports\daily_new_deaths.html
Saved figure: exports\daily_new_vaccinations.html
Saved figure: exports\choropleth_snapshot.html

To export this notebook to HTML:
!jupyter nbconvert --to html --TemplateExporter.exclude_input=False --TemplateExporter.exclude_output=False your_notebook.ipynb


## Streamlit alignment

If you also deploy via Streamlit, here’s a minimal app scaffold that mirrors the notebook filters for a single country.
The cell writes `COVID_Tracker.py` into the `exports/` folder so you can run: `cd exports && streamlit run COVID_Tracker.py`.

In [12]:
APP_PY = os.path.join(EXPORT_DIR, "COVID_Tracker.py")

app_code = f"""\
import pandas as pd
import streamlit as st

st.set_page_config(page_title="COVID-19 Global Tracker 🌍", layout="wide")

@st.cache_data
def load_data():
    # Absolute path resolved when the notebook generated this file
    df = pd.read_csv({os.path.abspath(DATA_FILE)!r})
    df.columns = [c.strip().lower() for c in df.columns]
    df['date'] = pd.to_datetime(df['date'], errors='coerce')
    df.dropna(subset=['date'], inplace=True)
    df.sort_values(['location', 'date'], inplace=True)
    return df

df = load_data()

st.sidebar.title("🔎 Filters")
countries = sorted(df['location'].dropna().unique())
default = ['Kenya'] if 'Kenya' in countries else [countries[0]]
selected_country = st.sidebar.selectbox("Country", options=countries, index=countries.index(default[0]))
min_date = df[df['location'] == selected_country]['date'].min().date()
max_date = df[df['location'] == selected_country]['date'].max().date()
start_date = st.sidebar.date_input("Start date", value=min_date, min_value=min_date, max_value=max_date)
end_date = st.sidebar.date_input("End date", value=max_date, min_value=min_date, max_value=max_date)

# Convert date_input (date) -> pandas Timestamp
start_ts = pd.to_datetime(start_date)
end_ts = pd.to_datetime(end_date)

df_country = df[df['location'] == selected_country].copy()
df_country = df_country[(df_country['date'] >= start_ts) & (df_country['date'] <= end_ts)]

st.title("🦠 COVID-19 Data Tracker")
st.markdown(f"Showing data for **{{selected_country}}** from **{{start_date}}** to **{{end_date}}**")

def metric_val(series):
    return "N/A" if series.dropna().empty else f"{{int(series.dropna().iloc[-1]):,}}"

col1, col2, col3 = st.columns(3)
with col1:
    st.metric("🧪 Total Cases", metric_val(df_country.get('total_cases', pd.Series(dtype=float))))
with col2:
    st.metric("☠️ Total Deaths", metric_val(df_country.get('total_deaths', pd.Series(dtype=float))))
with col3:
    st.metric("💉 Total Vaccinations", metric_val(df_country.get('total_vaccinations', pd.Series(dtype=float))))

# Plots
import plotly.express as px

def show_line(y, title):
    if y not in df_country.columns or df_country[y].dropna().empty:
        return
    fig = px.line(df_country, x="date", y=y, title=title, markers=False)
    st.plotly_chart(fig, use_container_width=True)

def show_bar(y, title):
    if y not in df_country.columns or df_country[y].dropna().empty:
        return
    fig = px.bar(df_country, x="date", y=y, title=title)
    st.plotly_chart(fig, use_container_width=True)

show_line("total_cases", "Total Cases Over Time")
show_line("total_deaths", "Total Deaths Over Time")
show_line("total_vaccinations", "Total Vaccinations Over Time")

show_bar("new_cases", "Daily New Cases")
show_bar("new_deaths", "Daily New Deaths")
show_bar("new_vaccinations", "Daily New Vaccinations")

st.caption("Built with care, clarity, and a blush beige vibe ✨")
"""

with open(APP_PY, "w", encoding="utf-8") as f:
    f.write(app_code)

print("Wrote Streamlit app to:", APP_PY)
print("To run:")
print(f"cd {EXPORT_DIR} && streamlit run COVID_Tracker.py")

Wrote Streamlit app to: exports\COVID_Tracker.py
To run:
cd exports && streamlit run COVID_Tracker.py
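
Because the app source is assembled with an f-string, brace handling decides what ends up in the generated file. Single braces are interpolated when the notebook cell runs, while doubled braces are written out as literal single braces. Doubled braces therefore belong only inside the generated file's own f-strings; around a bare name they would emit a set literal such as `{start_date}`. Here is a small demonstration with a hypothetical template:

```python
data_name = "owid-covid-data.csv"  # stands in for the notebook's DATA_FILE

template = f"""\
df = pd.read_csv({data_name!r})
label = f"from {{start_date}}"
start_ts = pd.to_datetime(start_date)
"""
print(template)

# Doubling the braces around a bare expression emits a set literal instead
wrong = f"start_ts = pd.to_datetime({{start_date}})"
print(wrong)
```

The `!r` conversion writes the path as a quoted Python literal, which also keeps backslashes in Windows paths correctly escaped.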


## Supporter-facing message 

Use this block when exporting to HTML or embedding in reports. It brings warmth and context.

> This tracker reflects how communities across Kenya and beyond navigated the pandemic with resilience, data, and care.
> May these numbers guide decisions and deepen empathy for the people behind every line and dot.