---
title: Downloading and Harmonizing Dengue Data from OpenDengue
short_title: Dengue Cases
---

This guide demonstrates how to download and harmonize [**OpenDengue**](https://opendengue.org/data.html) case data for use with DHIS2. The same approach can be applied to local dengue case counts from official Ministry of Health sources.

The notebook focuses on **data harmonization and preparation**, using a worked example of **monthly** data for **Nepal districts (admin2)**.

## Inputs

This workflow expects two local input files under `../../data/`:

- `nepal-opendengue.csv` — [**OpenDengue**](https://opendengue.org/data.html) export containing Nepal dengue case counts
- `nepal-locations.geojson` — Nepal district organisation units (admin2)

## Output

The workflow produces:

- `nepal-dengue-harmonized.csv` — harmonized monthly dengue cases per district (`time_period`, `location`, `disease_cases`)


In [1]:
from pathlib import Path

import pandas as pd
import geopandas as gpd

pd.set_option("display.max_columns", 200)

## Paths

In [2]:
DATA_FOLDER = Path("../../data")

LOCATIONS_GEOJSON = DATA_FOLDER / "nepal-locations.geojson"
OPENDENGUE_SOURCE_PATH = DATA_FOLDER / "nepal-opendengue.csv"

# Output
OUT_CSV = DATA_FOLDER / "nepal-dengue-harmonized.csv"

for p in [LOCATIONS_GEOJSON, OPENDENGUE_SOURCE_PATH]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required input: {p}")

print("Using inputs:")
print(" -", LOCATIONS_GEOJSON)
print(" -", OPENDENGUE_SOURCE_PATH)


Using inputs:
 - ..\..\data\nepal-locations.geojson
 - ..\..\data\nepal-opendengue.csv


## Load DHIS2 district locations

In [3]:
locations = gpd.read_file(LOCATIONS_GEOJSON)

# The DHIS2 organisation unit UID is expected in the GeoJSON "id" property
uid_col = "id" if "id" in locations.columns else None
if uid_col is None:
    raise KeyError(f"Expected DHIS2 UID in GeoJSON 'id'. Found: {list(locations.columns)}")

locations["location"] = locations[uid_col].astype(str).str.strip() 

# Join helper (district name)
if "name" not in locations.columns:
    raise KeyError(f"Expected district name in GeoJSON 'name'. Found: {list(locations.columns)}")

locations["district_name"] = (
    locations["name"].astype(str)
    .str.replace(r"^\s*\d+\s+", "", regex=True)  # drop leading numeric codes (e.g. "101 ") from location names
    .str.upper()
    .str.strip()
)

# Keep only what we need
locations = locations[["location", "district_name", "geometry"]].dropna(subset=["location"]).copy()
locations

Unnamed: 0,location,district_name,geometry
0,BdLcDbLQd88,TAPLEJUNG,"POLYGON ((87.6988 27.2911, 87.6983 27.2909, 87..."
1,uHEl9oRZm8L,SANKHUWASABHA,"POLYGON ((87.5735 27.8614, 87.5739 27.8611, 87..."
2,Wep3D4POB3H,SOLUKHUMBU,"POLYGON ((86.817 27.4329, 86.8166 27.4327, 86...."
3,B7X957nA1lM,OKHALDHUNGA,"POLYGON ((86.2589 27.4428, 86.2592 27.4439, 86..."
4,LnJ8MTOmgGa,KHOTANG,"POLYGON ((86.8376 27.4306, 86.8396 27.4304, 86..."
...,...,...,...
72,uyGkoQ4F9rT,DADELDHURA,"POLYGON ((80.5543 29.4184, 80.5546 29.4179, 80..."
73,HkwA8YQgJK6,DOTI,"POLYGON ((81.1366 29.3093, 81.1368 29.3092, 81..."
74,GnpIwWbdxdl,ACHHAM,"POLYGON ((81.4891 29.1729, 81.4903 29.1713, 81..."
75,RSAZqrdfXy5,KAILALI,"POLYGON ((81.2894 28.6491, 81.2895 28.6488, 81..."
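
The name clean-up used for the join key can be tried in isolation. The snippet below is a minimal sketch on made-up names; the numeric prefixes are an assumption about how codes such as `101` appear in the GeoJSON `name` property.

```python
import pandas as pd

# Made-up raw names; the numeric prefixes mimic codes that may precede district names
raw_names = pd.Series(["101 Taplejung", "  102 Sankhuwasabha ", "Doti"])

clean_names = (
    raw_names.astype(str)
    .str.replace(r"^\s*\d+\s+", "", regex=True)  # remove a leading numeric code, if present
    .str.upper()
    .str.strip()
)

print(clean_names.tolist())
```

Names without a numeric prefix pass through unchanged apart from upper-casing and trimming.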


## Load OpenDengue

In [27]:
df_raw = pd.read_csv(OPENDENGUE_SOURCE_PATH)
print("Loaded:", OPENDENGUE_SOURCE_PATH)
print("Columns:", df_raw.columns.tolist())
df_raw.head()

Loaded: ..\..\data\nepal-opendengue.csv
Columns: ['adm_0_name', 'adm_1_name', 'adm_2_name', 'full_name', 'ISO_A0', 'FAO_GAUL_code', 'RNE_iso_code', 'IBGE_code', 'calendar_start_date', 'calendar_end_date', 'Year', 'dengue_total', 'case_definition_standardised', 'S_res', 'T_res', 'UUID', 'region']


Unnamed: 0,adm_0_name,adm_1_name,adm_2_name,full_name,ISO_A0,FAO_GAUL_code,RNE_iso_code,IBGE_code,calendar_start_date,calendar_end_date,Year,dengue_total,case_definition_standardised,S_res,T_res,UUID,region
0,NEPAL,,,NEPAL,NPL,175,NPL,,1987-01-01,1987-12-31,1987,0,Total,Admin0,Year,WHOSEARO-ALL-19852009-Y01-00,SEARO
1,NEPAL,,,NEPAL,NPL,175,NPL,,1985-01-01,1985-12-31,1985,0,Total,Admin0,Year,WHOSEARO-ALL-19852009-Y01-00,SEARO
2,NEPAL,,,NEPAL,NPL,175,NPL,,1986-01-01,1986-12-31,1986,0,Total,Admin0,Year,WHOSEARO-ALL-19852009-Y01-00,SEARO
3,NEPAL,,,NEPAL,NPL,175,NPL,,1991-01-01,1991-12-31,1991,0,Total,Admin0,Year,WHOSEARO-ALL-19852009-Y01-00,SEARO
4,NEPAL,,,NEPAL,NPL,175,NPL,,1988-01-01,1988-12-31,1988,0,Total,Admin0,Year,WHOSEARO-ALL-19852009-Y01-00,SEARO


OpenDengue stores multiple administrative levels in the same file, so we keep only the Admin2 rows (`S_res == "Admin2"`).

In [28]:
df_raw = df_raw[df_raw["S_res"] == "Admin2"].copy()
print('Number of rows after filtering to admin2 units:', len(df_raw))

Number of rows after filtering to admin2 units: 2772
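
The export also has a `T_res` column for temporal resolution (the sample rows above show `Year`). If Admin2 rows were present at more than one temporal resolution, summing them by month could count cases twice. The snippet below is a minimal sketch on made-up rows showing how to inspect and filter that column. The label `"Month"` is an assumption and should be checked against the values in the actual file.

```python
import pandas as pd

# Made-up rows that mimic the OpenDengue columns used in this guide
sample = pd.DataFrame({
    "adm_2_name": ["DOTI", "DOTI", "DOTI"],
    "S_res": ["Admin2", "Admin2", "Admin2"],
    "T_res": ["Month", "Month", "Year"],  # "Month" is an assumed label
    "calendar_start_date": ["2022-01-01", "2022-02-01", "2022-01-01"],
    "dengue_total": [1, 2, 3],
})

# Inspect which temporal resolutions are present
t_res_counts = sample["T_res"].value_counts()
print(t_res_counts.to_dict())

# Keep only monthly rows so a yearly total is not added to January
monthly_rows = sample[sample["T_res"] == "Month"].copy()
print("Monthly rows:", len(monthly_rows))
```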


## Column mapping

In [29]:
# OpenDengue export columns
DATE_COL = "calendar_start_date"
CASES_COL = "dengue_total"
ADMIN2_COL = "adm_2_name"

missing = [c for c in [DATE_COL, CASES_COL, ADMIN2_COL] if c not in df_raw.columns]
if missing:
    raise KeyError(
        f"Input CSV is missing required columns: {missing}. "
        f"Available columns: {df_raw.columns.tolist()}"
    )

print("Using columns:", {"date": DATE_COL, "cases": CASES_COL, "admin2": ADMIN2_COL})

Using columns: {'date': 'calendar_start_date', 'cases': 'dengue_total', 'admin2': 'adm_2_name'}


## Normalize OpenDengue (Nepal districts)

In [36]:
df_norm = pd.DataFrame({
    "date": pd.to_datetime(df_raw[DATE_COL], errors="coerce"),
    "cases": pd.to_numeric(df_raw[CASES_COL], errors="coerce"),
    "district_name": df_raw[ADMIN2_COL],   # <-- not location yet
})

# Normalize district name for the crosswalk join
df_norm["district_name"] = (
    df_norm["district_name"]
    .astype(str)
    .str.upper()
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
)

# Map district_name -> DHIS2 orgUnit UID
df_norm = df_norm.merge(
    locations[["district_name", "location"]],
    on="district_name",
    how="left",
)

# Report the share of rows whose district name has no match in the GeoJSON
unmapped = df_norm["location"].isna().mean()
print(f"Unmapped dengue rows: {unmapped:.2%}")
if unmapped > 0:
    print("Examples:", df_norm.loc[df_norm["location"].isna(), "district_name"].drop_duplicates().head(15).tolist())

# Drop unmapped rows and rows with unparseable dates or case counts.
# Note: a district dropped here is zero-filled later by the complete grid.
df_norm = df_norm.dropna(subset=["location", "date", "cases", "district_name"]).copy()

df_norm

Unmapped dengue rows: 1.30%
Examples: ['CHITAWAN']


Unnamed: 0,date,cases,district_name,location
0,2022-01-01,0,ACHHAM,GnpIwWbdxdl
1,2022-01-01,0,ARGHAKHANCHI,cMWLZfK0O4z
2,2022-01-01,0,BAGLUNG,A3TeVhjjS2u
3,2022-01-01,0,BAITADI,NSLL7YIXBJH
4,2022-01-01,1,BAJHANG,JBTkOU5m0Bu
...,...,...,...,...
2767,2022-01-01,0,DOLPA,RZxElQxEbZN
2768,2022-01-01,0,DOTI,HkwA8YQgJK6
2769,2022-01-01,0,NAWALPARASI WEST,ebbAyOhorzo
2770,2022-01-01,0,MYAGDI,q7VB2VrUr83
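
The report above shows one spelling, `CHITAWAN`, that does not match any GeoJSON district name. One way to keep its cases is a small manual crosswalk applied before the merge. The snippet below is a minimal sketch on toy data: `NAME_FIXES` is a hypothetical dictionary, the target spelling `CHITWAN` is an assumption to confirm against `locations["district_name"]`, and `uid_chitwan` is a placeholder rather than a real UID.

```python
import pandas as pd

# Hypothetical crosswalk: OpenDengue spelling -> spelling used in the GeoJSON.
# "CHITWAN" is an assumption; confirm it against locations["district_name"].
NAME_FIXES = {"CHITAWAN": "CHITWAN"}

cases = pd.DataFrame({"district_name": ["CHITAWAN", "DOTI"], "cases": [5, 0]})
lookup = pd.DataFrame({
    "district_name": ["CHITWAN", "DOTI"],
    "location": ["uid_chitwan", "HkwA8YQgJK6"],  # "uid_chitwan" is a placeholder UID
})

# Apply the crosswalk, then join; validate guards against duplicate names in the lookup
cases["district_name"] = cases["district_name"].replace(NAME_FIXES)
merged = cases.merge(lookup, on="district_name", how="left", validate="many_to_one")

still_unmapped = merged.loc[merged["location"].isna(), "district_name"].unique().tolist()
print("Unmapped:", still_unmapped)
```

Keeping the fixes in a dictionary leaves a visible record of every manual renaming.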


## Monthly aggregation

In [37]:
# Convert to month period label (YYYY-MM)
df_norm["time_period"] = df_norm["date"].dt.to_period("M").astype(str)

# Aggregate within month + location
disease = (
    df_norm.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)

print("Aggregated rows:", len(disease))
disease

Aggregated rows: 2736


Unnamed: 0,time_period,location,disease_cases
0,2022-01,A3R7UT64jHf,0
1,2022-01,A3TeVhjjS2u,0
2,2022-01,B7X957nA1lM,0
3,2022-01,BITtrV4c0xf,0
4,2022-01,BdLcDbLQd88,0
...,...,...,...
2731,2024-12,wYKHZ0iyjfd,2
2732,2024-12,xGQwwqLQ819,0
2733,2024-12,yP2GELa6inf,0
2734,2024-12,ycBkX3TlIrL,0
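
Because the month label is derived from `calendar_start_date`, a record is counted entirely in the month in which it starts. The snippet below is a minimal sketch on made-up weekly rows (weekly input is an assumption; the Nepal file used here may already be monthly) showing that a week starting on 31 January is attributed to January.

```python
import pandas as pd

# Made-up weekly rows for one district
rows = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-31", "2022-02-07"]),
    "location": ["HkwA8YQgJK6"] * 3,
    "cases": [1, 2, 4],
})

# Label each row with the month of its start date, then sum per month and location
rows["time_period"] = rows["date"].dt.to_period("M").astype(str)
monthly = (
    rows.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)
print(monthly)
```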


## Filter to districts and align time axis

In [38]:
# Keep only locations present in the GeoJSON backbone
before = len(disease)
disease = disease.merge(locations[["location"]], on="location", how="inner")
after = len(disease)
print(f"Backbone filter kept {after}/{before} rows")

# Build full (time_period x location) grid and fill missing with 0
all_months = pd.period_range(disease["time_period"].min(), disease["time_period"].max(), freq="M").astype(str)
all_locations = locations["location"].sort_values().unique()

grid = pd.MultiIndex.from_product([all_months, all_locations], names=["time_period", "location"]).to_frame(index=False)

disease_full = grid.merge(disease, on=["time_period", "location"], how="left")
# Treat missing (time_period, location) combinations as 0 cases and store counts as integers
disease_full["disease_cases"] = pd.to_numeric(disease_full["disease_cases"], errors="coerce").fillna(0).astype(int)

print("Final rows (complete grid):", len(disease_full))
disease_full

Backbone filter kept 2736/2736 rows
Final rows (complete grid): 2772


Unnamed: 0,time_period,location,disease_cases
0,2022-01,A3R7UT64jHf,0
1,2022-01,A3TeVhjjS2u,0
2,2022-01,B7X957nA1lM,0
3,2022-01,BITtrV4c0xf,0
4,2022-01,BdLcDbLQd88,0
...,...,...,...
2767,2024-12,wYKHZ0iyjfd,2
2768,2024-12,xGQwwqLQ819,0
2769,2024-12,yP2GELa6inf,0
2770,2024-12,ycBkX3TlIrL,0
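
Zero-filling treats a missing month and district pair as zero cases, and afterwards it cannot be told apart from a reported zero. The snippet below is a minimal sketch on toy data of recording that distinction before filling. `was_reported` is a hypothetical column name that is not part of the output layout in this guide, so it would need to be dropped before writing the CSV.

```python
import pandas as pd

# Toy aggregated data: location "A" has two months, location "B" has no rows at all
disease = pd.DataFrame({
    "time_period": ["2022-01", "2022-03"],
    "location": ["A", "A"],
    "disease_cases": [1, 2],
})
all_locations = ["A", "B"]

all_months = pd.period_range(
    disease["time_period"].min(), disease["time_period"].max(), freq="M"
).astype(str)

# Every month paired with every location
grid = pd.MultiIndex.from_product(
    [all_months, all_locations], names=["time_period", "location"]
).to_frame(index=False)

full = grid.merge(disease, on=["time_period", "location"], how="left")
full["was_reported"] = full["disease_cases"].notna()  # hypothetical flag column
full["disease_cases"] = full["disease_cases"].fillna(0).astype(int)
print(full)
```

Rows where `was_reported` is `False` are the ones introduced by the grid, which makes a fully unmapped district easy to spot.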


## Write output CSV

In [39]:
disease_full.to_csv(OUT_CSV, index=False)
print("Wrote:", OUT_CSV)
OUT_CSV

Wrote: ..\..\data\nepal-dengue-harmonized.csv


WindowsPath('../../data/nepal-dengue-harmonized.csv')
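
Before importing, it can help to confirm that the result matches the layout listed under Output. The snippet below is one possible check, shown on a toy frame; `check_harmonized` is a hypothetical helper written for this sketch, not part of any library.

```python
import pandas as pd


def check_harmonized(df: pd.DataFrame) -> None:
    """Raise ValueError if the frame breaks the expected output layout."""
    expected = ["time_period", "location", "disease_cases"]
    if list(df.columns) != expected:
        raise ValueError(f"Unexpected columns: {list(df.columns)}")
    if df.duplicated(["time_period", "location"]).any():
        raise ValueError("Duplicate (time_period, location) rows")
    if not df["time_period"].str.fullmatch(r"\d{4}-\d{2}").all():
        raise ValueError("time_period is not in YYYY-MM form")
    if (df["disease_cases"] < 0).any():
        raise ValueError("Negative case counts")
    # A complete grid has one row per month and location combination
    n_expected = df["time_period"].nunique() * df["location"].nunique()
    if len(df) != n_expected:
        raise ValueError(f"Grid incomplete: {len(df)} rows, expected {n_expected}")


# Toy frame in the output layout: 2 months x 2 locations
toy = pd.DataFrame({
    "time_period": ["2022-01", "2022-01", "2022-02", "2022-02"],
    "location": ["A", "B", "A", "B"],
    "disease_cases": [0, 1, 2, 0],
})
check_harmonized(toy)
print("Checks passed")
```

In the notebook, the same call could be made on `disease_full` just before `to_csv`.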

## Next steps

This guide ends once the OpenDengue data for Nepal has been harmonized into a DHIS2-ready CSV.

To import the resulting data into DHIS2, see the section on [importing data into DHIS2](../../import-data/intro.md).