---
title: Downloading and Harmonizing Dengue Data from OpenDengue
short_title: Dengue Cases
---

This guide demonstrates how to download and harmonize [**OpenDengue**](https://opendengue.org/data.html) case data for use with DHIS2. The same approach can be applied to local dengue case counts from official Ministry of Health data.

The notebook focuses on **data harmonization and preparation** using a worked example for **Nepal (districts / admin2)** and **monthly** data.

## Inputs

This workflow expects two local input files under `../../guides/data/`:

- `nepal-opendengue.csv` — [**OpenDengue**](https://opendengue.org/data.html) export containing Nepal dengue case counts
- `nepal-locations.geojson` — Nepal district geometries (admin2)

## Output

The workflow produces:

- `nepal-dengue-harmonized.csv` — harmonized monthly dengue cases per district (`time_period`, `location`, `disease_cases`)


In [None]:
from pathlib import Path

import pandas as pd
import geopandas as gpd

pd.set_option("display.max_columns", 200)


## Paths

In [None]:
DATA_FOLDER = Path("../../guides/data")

LOCATIONS_GEOJSON = DATA_FOLDER / "nepal-locations.geojson"
OPENDENGUE_SOURCE_PATH = DATA_FOLDER / "nepal-opendengue.csv"

# Output
OUT_CSV = DATA_FOLDER / "nepal-dengue-harmonized.csv"

for p in [LOCATIONS_GEOJSON, OPENDENGUE_SOURCE_PATH]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required input: {p}")

print("Using inputs:")
print(" -", LOCATIONS_GEOJSON)
print(" -", OPENDENGUE_SOURCE_PATH)


## Load district locations

In [None]:
locations = gpd.read_file(LOCATIONS_GEOJSON)

# The DHIS2 organisation unit UID is expected in the GeoJSON "id" property
if "id" not in locations.columns:
    raise KeyError(f"Expected DHIS2 UID in GeoJSON 'id'. Found: {list(locations.columns)}")

locations["location"] = locations["id"].astype(str).str.strip()

# Join helper (district name)
if "name" not in locations.columns:
    raise KeyError(f"Expected district name in GeoJSON 'name'. Found: {list(locations.columns)}")

locations["district_name"] = (
    locations["name"].astype(str)
    .str.replace(r"^\s*\d+\s+", "", regex=True)  # drop a leading numeric code such as "101 " from location names
    .str.upper()
    .str.strip()
)

# Keep only what we need
locations = locations[["location", "district_name", "geometry"]].dropna(subset=["location"]).copy()
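
The district-name cleaning above has to match the cleaning applied later to the OpenDengue names, otherwise the join silently loses rows. A minimal sketch of one shared helper follows. The function name `normalize_district_name` and the sample labels are hypothetical, and the helper combines the prefix stripping used here with the whitespace collapsing used on the OpenDengue side.

```python
import pandas as pd


def normalize_district_name(names: pd.Series) -> pd.Series:
    """Return upper-cased district names without numeric prefixes or extra spaces."""
    return (
        names.astype(str)
        .str.replace(r"^\s*\d+\s+", "", regex=True)  # drop a leading numeric code
        .str.replace(r"\s+", " ", regex=True)        # collapse repeated whitespace
        .str.upper()
        .str.strip()
    )


# Hypothetical raw labels, for illustration only
sample = pd.Series(["101 Taplejung", "  Kathmandu ", "27  Lalitpur"])
cleaned = normalize_district_name(sample)
print(cleaned.tolist())
```

Applying the same helper to both `locations["name"]` and the OpenDengue `adm_2_name` column would keep the two sides of the crosswalk consistent.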


## Load OpenDengue

In [None]:
df_raw = pd.read_csv(OPENDENGUE_SOURCE_PATH)
print("Loaded:", OPENDENGUE_SOURCE_PATH)
print("Columns:", df_raw.columns.tolist())
df_raw.head()


An OpenDengue export can contain several administrative levels in the same file, so we keep only the Admin 2 (district) rows.

In [None]:
df_adm2 = df_raw[df_raw["S_res"] == "Admin2"].copy()
print("Number of rows after filtering to admin2 units:", len(df_adm2))
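
Before filtering, it can help to see which spatial and temporal resolutions the export contains. The sketch below uses a toy frame with made-up values. It assumes the export also carries a `T_res` column with labels such as `"Week"`, `"Month"` and `"Year"`, which should be checked against the actual file.

```python
import pandas as pd

# Toy frame standing in for the OpenDengue export (all values are made up)
toy = pd.DataFrame({
    "S_res": ["Admin0", "Admin2", "Admin2", "Admin1"],
    "T_res": ["Year", "Month", "Week", "Month"],  # assumed column and labels
    "dengue_total": [100, 10, 3, 40],
})

# Count rows per spatial/temporal resolution combination
resolution_counts = (
    toy.groupby(["S_res", "T_res"]).size().rename("n_rows").reset_index()
)
print(resolution_counts)

# Keep only district-level rows reported at monthly resolution
adm2_monthly = toy[(toy["S_res"] == "Admin2") & (toy["T_res"] == "Month")]
print(len(adm2_monthly), "admin2 monthly rows")
```

If the same district is reported at more than one temporal resolution, summing all rows would count cases twice, so a check like this is worth running once per export.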

## Column mapping

In [None]:
# OpenDengue export columns
DATE_COL = "calendar_start_date"
CASES_COL = "dengue_total"
ADMIN2_COL = "adm_2_name"

missing = [c for c in [DATE_COL, CASES_COL, ADMIN2_COL] if c not in df_raw.columns]
if missing:
    raise KeyError(
        f"Input CSV is missing required columns: {missing}. "
        f"Available columns: {df_raw.columns.tolist()}"
    )

print("Using columns:", {"date": DATE_COL, "cases": CASES_COL, "admin2": ADMIN2_COL})


## Normalize OpenDengue (Nepal districts / admin2)

In [None]:
# Build from the admin2 subset so national and admin1 rows are not mixed in
df_norm = pd.DataFrame({
    "date": pd.to_datetime(df_adm2[DATE_COL], errors="coerce"),
    "cases": pd.to_numeric(df_adm2[CASES_COL], errors="coerce"),
    "district_name": df_adm2[ADMIN2_COL],   # <-- not location yet
})

# Normalize district name for the crosswalk join
df_norm["district_name"] = (
    df_norm["district_name"]
    .astype(str)
    .str.upper()
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
)

# Map district_name -> DHIS2 orgUnit UID
df_norm = df_norm.merge(
    locations[["district_name", "location"]],
    on="district_name",
    how="left",
)

# Report rows whose district name has no match in the GeoJSON, then drop them
unmapped = df_norm["location"].isna().mean()
print(f"Unmapped dengue rows: {unmapped:.2%}")
if unmapped > 0:
    print("Examples:", df_norm.loc[df_norm["location"].isna(), "district_name"].drop_duplicates().head(15).tolist())

df_norm = df_norm.dropna(subset=["location"]).copy()

# Keep only valid rows: parsable date, numeric case count, non-empty district name
df_norm = df_norm.dropna(subset=["date", "cases", "district_name"])
df_norm = df_norm[df_norm["district_name"].ne("")]

df_norm.head()
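
When the share of unmapped rows is not zero, the usual cause is a spelling difference between the two sources. The sketch below shows one way to list mismatches on both sides with an outer merge and `indicator=True`. The district spellings, the UIDs and the `NAME_FIXES` mapping are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Toy stand-ins for the two sides of the crosswalk (names and UIDs are made up)
cases = pd.DataFrame({"district_name": ["KATHMANDU", "LALITPUR", "KAVREPALANCHOK"]})
backbone = pd.DataFrame({
    "district_name": ["KATHMANDU", "LALITPUR", "KAVREPALANCHOWK"],
    "location": ["uid_a", "uid_b", "uid_c"],
})

# An outer merge with an indicator column shows which side each name came from
check = cases.drop_duplicates().merge(
    backbone, on="district_name", how="outer", indicator=True
)
only_in_cases = check.loc[check["_merge"] == "left_only", "district_name"].tolist()
only_in_backbone = check.loc[check["_merge"] == "right_only", "district_name"].tolist()
print("Only in case data:", only_in_cases)
print("Only in GeoJSON:", only_in_backbone)

# Hypothetical manual fix-ups, applied before the real join
NAME_FIXES = {"KAVREPALANCHOK": "KAVREPALANCHOWK"}
cases["district_name"] = cases["district_name"].replace(NAME_FIXES)
```

Keeping the fix-ups in an explicit dictionary makes every manual decision visible and easy to review.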


## Monthly aggregation

In [None]:
# Convert to month period label (YYYY-MM)
df_norm["time_period"] = df_norm["date"].dt.to_period("M").astype(str)

# Aggregate within month + location
disease = (
    df_norm.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)

print("Aggregated rows:", len(disease))
disease.head()
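
The aggregation step can be checked on a small example. The sketch below uses made-up dates, UIDs and counts to show how each record is assigned to the month of its start date and then summed per location.

```python
import pandas as pd

# Toy records (made-up values): three for uid_a, one for uid_b
toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-16", "2023-02-06", "2023-01-09"]),
    "location": ["uid_a", "uid_a", "uid_a", "uid_b"],
    "cases": [3, 2, 5, 1],
})

# Label each record with the month of its start date (YYYY-MM)
toy["time_period"] = toy["date"].dt.to_period("M").astype(str)

# Sum cases within each month and location
monthly = (
    toy.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)
print(monthly)
```

Because the label comes from the start date, a record that spans two months is counted entirely in the month where it begins.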


## Filter to districts and align time axis

In [None]:
# Keep only locations present in the GeoJSON backbone
before = len(disease)
disease = disease.merge(locations[["location"]], on="location", how="inner")
after = len(disease)
print(f"Backbone filter kept {after}/{before} rows")

# Build full (time_period x location) grid and fill missing with 0
all_months = pd.period_range(disease["time_period"].min(), disease["time_period"].max(), freq="M").astype(str)
all_locations = locations["location"].sort_values().unique()

grid = pd.MultiIndex.from_product([all_months, all_locations], names=["time_period", "location"]).to_frame(index=False)

disease_full = grid.merge(disease, on=["time_period", "location"], how="left")
# Month/location pairs with no reported rows are treated as zero cases, then cast to int
disease_full["disease_cases"] = disease_full["disease_cases"].fillna(0).astype(int)

print("Final rows (complete grid):", len(disease_full))
disease_full.head()
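
A few sanity checks on the completed grid can catch join or aggregation mistakes before the file is written. The sketch below builds a toy grid the same way as above and validates it. The helper name `check_complete_grid` and all values are hypothetical.

```python
import pandas as pd


def check_complete_grid(df: pd.DataFrame) -> None:
    """Raise AssertionError unless df is a complete, duplicate-free period x location grid."""
    keys = ["time_period", "location"]
    assert not df.duplicated(subset=keys).any(), "duplicate (time_period, location) rows"
    n_expected = df["time_period"].nunique() * df["location"].nunique()
    assert len(df) == n_expected, f"expected {n_expected} rows, found {len(df)}"
    assert df["disease_cases"].ge(0).all(), "negative case counts"


# Toy grid: three months x two locations, with two observed values (made up)
months = pd.period_range("2023-01", "2023-03", freq="M").astype(str)
grid = pd.MultiIndex.from_product(
    [months, ["uid_a", "uid_b"]], names=["time_period", "location"]
).to_frame(index=False)
observed = pd.DataFrame({
    "time_period": ["2023-01", "2023-03"],
    "location": ["uid_a", "uid_b"],
    "disease_cases": [4, 7],
})

full = grid.merge(observed, on=["time_period", "location"], how="left")
full["disease_cases"] = full["disease_cases"].fillna(0).astype(int)

check_complete_grid(full)
print(full)
```

Calling `check_complete_grid(disease_full)` just before writing the CSV would apply the same checks to the real data.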


## Write output CSV

In [None]:
disease_full.to_csv(OUT_CSV, index=False)
print("Wrote:", OUT_CSV)
OUT_CSV


## Next steps

This guide ends once the harmonized, DHIS2-ready CSV has been produced.

To import the resulting data into DHIS2, see our guides for [importing data to DHIS2](../../import-data/intro.md).
