---
title: Full workflow for importing Dengue data into DHIS2
short_title: Import Dengue Data
---

This workflow demonstrates the end-to-end preparation of dengue case data for import into DHIS2. The example uses [**OpenDengue**](https://opendengue.org/data.html) data; in practice, countries are expected to use official Ministry of Health data.

The notebook focuses on **data harmonization and preparation** using a worked example for **Nepal (districts / admin2)** and **monthly** data. The final DHIS2 import step follows the same approach as the WorldPop and CHIRPS workflows and is therefore not repeated in full here.

## Inputs

This workflow expects two local input files under `../../guides/data/`:

- `nepal-opendengue.csv` — [**OpenDengue**](https://opendengue.org/data.html) export containing Nepal dengue case counts
- `nepal-locations.geojson` — Nepal district geometries (admin2)

## Output

The workflow produces:

- `nepal-dengue-harmonized.csv` — harmonized monthly dengue cases per district (`time_period`, `location`, `disease_cases`)
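The output is a three-column contract (`time_period`, `location`, `disease_cases`). Below is a minimal sketch of a validator for that contract. The function name `validate_harmonized` and its specific checks are assumptions, not part of the original workflow. The checks are: a six-digit `YYYYMM` period with a valid month, one row per period/location pair, and no negative counts. Missing counts are allowed because the workflow deliberately keeps them as NaN.

```python
import pandas as pd

# Expected output columns, as listed in the Output section
REQUIRED_COLUMNS = ["time_period", "location", "disease_cases"]


def validate_harmonized(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the frame looks valid."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # time_period must be a six-digit YYYYMM string with a month between 01 and 12
    periods = df["time_period"].astype(str)
    bad_format = ~periods.str.fullmatch(r"\d{4}(?:0[1-9]|1[0-2])")
    if bad_format.any():
        problems.append(f"non-YYYYMM time_period in {int(bad_format.sum())} row(s)")
    # exactly one row per (time_period, location) pair
    if df.duplicated(["time_period", "location"]).any():
        problems.append("duplicate (time_period, location) pairs")
    # counts may be missing (NaN) but must never be negative
    cases = pd.to_numeric(df["disease_cases"], errors="coerce")
    if (cases < 0).any():
        problems.append("negative disease_cases")
    return problems


# Toy frame (illustrative values): "202213" has an invalid month
example = pd.DataFrame({
    "time_period": ["202201", "202201", "202213"],
    "location": ["BdLcDbLQd88", "uHEl9oRZm8L", "BdLcDbLQd88"],
    "disease_cases": [0, None, 3],
})
print(validate_harmonized(example))
```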


In [51]:
from pathlib import Path

import json
import pandas as pd
import geopandas as gpd

pd.set_option("display.max_columns", 200)

## Paths

In [52]:
# Root folder
DATA_FOLDER = Path("../../guides/data")

LOCATIONS_GEOJSON = DATA_FOLDER / "nepal-locations.geojson"
OPENDENGUE_SOURCE_PATH = DATA_FOLDER / "nepal-opendengue.csv"

# Output
OUT_CSV = DATA_FOLDER / "nepal-dengue-harmonized.csv"

for p in [LOCATIONS_GEOJSON, OPENDENGUE_SOURCE_PATH]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required input: {p}")

print("Using inputs:")
print(" -", LOCATIONS_GEOJSON)
print(" -", OPENDENGUE_SOURCE_PATH)

Using inputs:
 - ..\..\guides\data\nepal-locations.geojson
 - ..\..\guides\data\nepal-opendengue.csv


## Load district locations

In [53]:
def norm_name(s: pd.Series) -> pd.Series:
    return (
        s.fillna("")                                # missing names -> "" (instead of the string "NAN")
         .astype(str)
         .str.upper()
         .str.replace(r"\d+", " ", regex=True)      # replace digits (e.g. the "101" district code) with a space
         .str.replace(r"\s+", " ", regex=True)      # normalize all whitespace
         .str.strip()                               # remove any leading or trailing whitespace
    )

with open(LOCATIONS_GEOJSON, "r", encoding="utf-8") as f:
    gj = json.load(f)

districts = pd.DataFrame([{
    "location": feat.get("id"),                      # DHIS2 level-2 orgUnit UID
    "name_raw": feat["properties"].get("name", ""),  # e.g. "101 TAPLEJUNG"
} for feat in gj["features"]])

districts["district_name"] = norm_name(districts["name_raw"])

# sanity
assert districts["location"].notna().all()
assert districts["district_name"].ne("").all()   # every feature has a non-empty name
assert not districts.duplicated("district_name").any()

print("districts:", len(districts))
districts

districts: 77


|     | location    | name_raw          | district_name |
|-----|-------------|-------------------|---------------|
| 0   | BdLcDbLQd88 | 101 TAPLEJUNG     | TAPLEJUNG     |
| 1   | uHEl9oRZm8L | 102 SANKHUWASABHA | SANKHUWASABHA |
| 2   | Wep3D4POB3H | 103 SOLUKHUMBU    | SOLUKHUMBU    |
| 3   | B7X957nA1lM | 104 OKHALDHUNGA   | OKHALDHUNGA   |
| 4   | LnJ8MTOmgGa | 105 KHOTANG       | KHOTANG       |
| ... | ...         | ...               | ...           |
| 72  | uyGkoQ4F9rT | 705 DADELDHURA    | DADELDHURA    |
| 73  | HkwA8YQgJK6 | 706 DOTI          | DOTI          |
| 74  | GnpIwWbdxdl | 707 ACHHAM        | ACHHAM        |
| 75  | RSAZqrdfXy5 | 708 KAILALI       | KAILALI       |
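Name normalization is what makes the OpenDengue rows joinable to these districts, so missing names deserve care. In pandas 2.x, calling `astype(str)` on a missing value yields the string `"nan"`, which becomes `"NAN"` after upper-casing. That value is not removed by `dropna` or by a non-empty filter. The sketch below is a standalone variant of the normalization helper, named `norm_name_safe` here for illustration only. It maps missing names to empty strings before normalizing, so they can be filtered out explicitly.

```python
import pandas as pd


def norm_name_safe(s: pd.Series) -> pd.Series:
    """Illustrative variant of the notebook's normalizer that maps missing names to ''."""
    return (
        s.fillna("")                               # missing -> "" rather than "nan"/"NAN"
         .astype(str)
         .str.upper()
         .str.replace(r"\d+", " ", regex=True)     # drop numeric codes such as "101"
         .str.replace(r"\s+", " ", regex=True)     # collapse runs of whitespace
         .str.strip()
    )


# Toy inputs: a GeoJSON-style name, messy spacing/case, and a missing value
raw = pd.Series(["101 TAPLEJUNG", "  102   sankhuwasabha ", float("nan")])
names = norm_name_safe(raw)
print(names.tolist())               # ['TAPLEJUNG', 'SANKHUWASABHA', '']
print(names[names.ne("")].tolist()) # ['TAPLEJUNG', 'SANKHUWASABHA']
```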


## Load OpenDengue

In [54]:
df_raw = pd.read_csv(OPENDENGUE_SOURCE_PATH)
print("Loaded:", OPENDENGUE_SOURCE_PATH)
print("Columns:", df_raw.columns.tolist())
df_raw

Loaded: ..\..\guides\data\nepal-opendengue.csv
Columns: ['adm_0_name', 'adm_1_name', 'adm_2_name', 'full_name', 'ISO_A0', 'FAO_GAUL_code', 'RNE_iso_code', 'IBGE_code', 'calendar_start_date', 'calendar_end_date', 'Year', 'dengue_total', 'case_definition_standardised', 'S_res', 'T_res', 'UUID', 'region']


|      | adm_0_name | adm_1_name | adm_2_name | full_name | ISO_A0 | FAO_GAUL_code | RNE_iso_code | IBGE_code | calendar_start_date | calendar_end_date | Year | dengue_total | case_definition_standardised | S_res | T_res | UUID | region |
|------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0    | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1987-01-01 | 1987-12-31 | 1987 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 1    | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1985-01-01 | 1985-12-31 | 1985 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 2    | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1986-01-01 | 1986-12-31 | 1986 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 3    | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1991-01-01 | 1991-12-31 | 1991 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 4    | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1988-01-01 | 1988-12-31 | 1988 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| ...  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2948 | NEPAL | KARNALI PROVINCE | DOLPA | NEPAL, KARNALI PROVINCE, DOLPA | NPL | 22359 | NP-KA |  | 2022-01-01 | 2022-01-31 | 2022 | 0 | Total | Admin2 | Month | MOH-NPL-2022-Y04-00 | SEARO |
| 2949 | NEPAL | SUDURPASCHIM PROVINCE | DOTI | NEPAL, SUDURPASCHIM PROVINCE, DOTI | NPL | 22357 | NP-MA |  | 2022-01-01 | 2022-01-31 | 2022 | 0 | Total | Admin2 | Month | MOH-NPL-2022-Y04-00 | SEARO |
| 2950 | NEPAL | LUMBINI PROVINCE | NAWALPARASI WEST | NEPAL, LUMBINI PROVINCE, NAWALPARASI WEST | NPL | 22364 | NP-LU |  | 2022-01-01 | 2022-01-31 | 2022 | 0 | Total | Admin2 | Month | MOH-NPL-2022-Y04-00 | SEARO |
| 2951 | NEPAL | GANDAKI PROVINCE | MYAGDI | NEPAL, GANDAKI PROVINCE, MYAGDI | NPL | 22363 | NP-GA |  | 2022-01-01 | 2022-01-31 | 2022 | 0 | Total | Admin2 | Month | MOH-NPL-2022-Y04-00 | SEARO |


OpenDengue stores several administrative levels in the same file. The `S_res` column distinguishes, for example, national `Admin0` rows from district `Admin2` rows, so we keep only the Admin2 units.

In [55]:
df_adm2 = df_raw[df_raw["S_res"] == "Admin2"].copy()
print('Number of rows after filtering to admin2 units:', len(df_adm2))

Number of rows after filtering to admin2 units: 2772
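In this Nepal export, the 2772 Admin2 rows equal 77 districts x 36 months. That suggests every Admin2 row is monthly, but other exports may mix temporal resolutions (`T_res`). Later cells aggregate by the month of `calendar_start_date`, so a yearly district total starting on 1 January would be silently added to January. A hedged sketch of a defensive check follows, shown on a toy frame whose values are illustrative only. It cross-tabulates `S_res` against `T_res` and then filters on both columns.

```python
import pandas as pd

# Toy rows mimicking the OpenDengue columns used here (illustrative values, not real data)
toy = pd.DataFrame({
    "S_res": ["Admin0", "Admin1", "Admin2", "Admin2", "Admin2"],
    "T_res": ["Year", "Month", "Month", "Month", "Year"],
    "adm_2_name": [None, None, "DOTI", "DOLPA", "DOTI"],
    "dengue_total": [0, 5, 0, 1, 3],
})

# Inspect which spatial/temporal combinations are present before subsetting
print(pd.crosstab(toy["S_res"], toy["T_res"]))

# Keep district-level monthly rows only, so yearly totals cannot leak into January
df_adm2_monthly = toy[(toy["S_res"] == "Admin2") & (toy["T_res"] == "Month")].copy()
print(len(df_adm2_monthly))  # 2
```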


## Column mapping

In [56]:
# OpenDengue export columns (Nepal example)
DATE_COL = "calendar_start_date"
CASES_COL = "dengue_total"
ADMIN2_COL = "adm_2_name"

missing = [c for c in [DATE_COL, CASES_COL, ADMIN2_COL] if c not in df_adm2.columns]
if missing:
    raise KeyError(
        f"Input CSV is missing required columns: {missing}. "
        f"Available columns: {df_adm2.columns.tolist()}"
    )

print("Using columns:", {"date": DATE_COL, "cases": CASES_COL, "admin2": ADMIN2_COL})

Using columns: {'date': 'calendar_start_date', 'cases': 'dengue_total', 'admin2': 'adm_2_name'}


## Normalize OpenDengue (Nepal districts / admin2)

In [57]:
df_norm = pd.DataFrame({
    "date": pd.to_datetime(df_adm2[DATE_COL], errors="coerce"),
    "cases": pd.to_numeric(df_adm2[CASES_COL], errors="coerce"),
    "district_name": df_adm2[ADMIN2_COL],
})

df_norm["district_name"] = norm_name(df_norm["district_name"])

df_norm = df_norm.dropna(subset=["date", "cases", "district_name"])
df_norm = df_norm[df_norm["district_name"].ne("")]

# Map district_name -> DHIS2 UID (location)
df_norm = df_norm.merge(
    districts[["district_name", "location"]],
    on="district_name",
    how="left",
)

unmapped = df_norm["location"].isna().mean()
print(f"Unmapped dengue rows: {unmapped:.2%}")
if unmapped > 0:
    print("Unmapped examples:", df_norm.loc[df_norm["location"].isna(), "district_name"].drop_duplicates().head(20).tolist())

# Keep only mapped rows (otherwise they will never join downstream)
df_norm = df_norm.dropna(subset=["location"]).copy()

df_norm

Unmapped dengue rows: 1.30%
Unmapped examples: ['CHITAWAN']


|      | date       | cases | district_name    | location    |
|------|------------|-------|------------------|-------------|
| 0    | 2022-01-01 | 0     | ACHHAM           | GnpIwWbdxdl |
| 1    | 2022-01-01 | 0     | ARGHAKHANCHI     | cMWLZfK0O4z |
| 2    | 2022-01-01 | 0     | BAGLUNG          | A3TeVhjjS2u |
| 3    | 2022-01-01 | 0     | BAITADI          | NSLL7YIXBJH |
| 4    | 2022-01-01 | 1     | BAJHANG          | JBTkOU5m0Bu |
| ...  | ...        | ...   | ...              | ...         |
| 2767 | 2022-01-01 | 0     | DOLPA            | RZxElQxEbZN |
| 2768 | 2022-01-01 | 0     | DOTI             | HkwA8YQgJK6 |
| 2769 | 2022-01-01 | 0     | NAWALPARASI WEST | ebbAyOhorzo |
| 2770 | 2022-01-01 | 0     | MYAGDI           | q7VB2VrUr83 |
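The unmapped rows all come from one spelling, `CHITAWAN`. 1.30% of 2772 rows is 36 rows, which is one district over 36 months. This is consistent with the 2736 (76 x 36) aggregated rows below, so that district ends up with missing values for every month in the final grid. One way to recover it is an explicit alias table applied before the merge. The sketch below assumes the GeoJSON spells the district `CHITWAN`. That spelling is an assumption; confirm it against `districts["district_name"]` before relying on it. The UIDs in the toy frame are placeholders. `indicator=True` is a standard `DataFrame.merge` option that flags which rows failed to match.

```python
import pandas as pd

# Hypothetical alias table: OpenDengue spelling -> spelling assumed to be used in the GeoJSON
NAME_ALIASES = {"CHITAWAN": "CHITWAN"}

# Toy lookup table with placeholder UIDs (not real DHIS2 ids)
districts = pd.DataFrame({
    "district_name": ["CHITWAN", "DOTI"],
    "location": ["UID_CHITWAN", "UID_DOTI"],
})
# Toy dengue rows; "NOT A DISTRICT" stands in for a genuinely unknown name
dengue = pd.DataFrame({
    "district_name": ["CHITAWAN", "DOTI", "NOT A DISTRICT"],
    "cases": [4, 0, 1],
})

# Apply aliases before the merge; names without an alias are left unchanged
dengue["district_name"] = dengue["district_name"].replace(NAME_ALIASES)

# _merge == "left_only" marks rows that still found no matching district
merged = dengue.merge(districts, on="district_name", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "district_name"].unique().tolist()
print(unmatched)  # ['NOT A DISTRICT']
```

Keeping the alias table explicit, rather than using fuzzy matching, makes every renaming decision reviewable.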


## Monthly aggregation

In [58]:
df_norm["time_period"] = (
    df_norm["date"]
    .dt.to_period("M")
    .astype(str)
    .str.replace("-", "", regex=False)  # YYYYMM
)

disease = (
    df_norm.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)

print("Aggregated rows:", len(disease))
disease

Aggregated rows: 2736


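|      | time_period | location    | disease_cases |
|------|-------------|-------------|---------------|
| 0    | 202201      | A3R7UT64jHf | 0             |
| 1    | 202201      | A3TeVhjjS2u | 0             |
| 2    | 202201      | B7X957nA1lM | 0             |
| 3    | 202201      | BITtrV4c0xf | 0             |
| 4    | 202201      | BdLcDbLQd88 | 0             |
| ...  | ...         | ...         | ...           |
| 2731 | 202412      | wYKHZ0iyjfd | 2             |
| 2732 | 202412      | xGQwwqLQ819 | 0             |
| 2733 | 202412      | yP2GELa6inf | 0             |
| 2734 | 202412      | ycBkX3TlIrL | 0             |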
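The aggregation cell builds the `YYYYMM` code via `to_period("M")` plus string replacement. `Series.dt.strftime("%Y%m")` is an equivalent one-step alternative. `YYYYMM` is the format DHIS2 uses for monthly periods. The toy sketch below uses illustrative values. It shows that several records falling in the same month are summed into a single `disease_cases` value per location.

```python
import pandas as pd

# Toy records for one location; two fall in January 2022 (illustrative values only)
records = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-15", "2022-02-03"]),
    "location": ["BdLcDbLQd88"] * 3,
    "cases": [2, 3, 1],
})

# strftime("%Y%m") produces the YYYYMM monthly period string directly
records["time_period"] = records["date"].dt.strftime("%Y%m")

# Sum all records that share a (month, location) pair
monthly = (
    records.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)
print(monthly)
```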


## Filter to districts and align time axis

In [59]:
# Keep only locations that appear in the DHIS2 district list (districts["location"])
before = len(disease)
disease = disease.merge(districts[["location"]].drop_duplicates(), on="location", how="inner")
after = len(disease)
print(f"Districts  kept {after}/{before} rows")

# Build full (time_period x location) grid; preserve missing as NaN (no imputation)
# Parse the YYYYMM strings with an explicit format instead of relying on Period string parsing
months = pd.to_datetime(disease["time_period"], format="%Y%m")
all_months = pd.Index(
    pd.period_range(months.min(), months.max(), freq="M").strftime("%Y%m"),  # YYYYMM
    name="time_period",
)

all_locations = pd.Index(districts["location"].dropna().astype(str).sort_values().unique(), name="location")

grid = pd.MultiIndex.from_product([all_months, all_locations], names=["time_period", "location"]).to_frame(index=False)

disease_full = grid.merge(disease, on=["time_period", "location"], how="left")

# Preserve missingness; just ensure numeric dtype
disease_full["disease_cases"] = pd.to_numeric(disease_full["disease_cases"], errors="coerce")

print("Final rows (complete grid):", len(disease_full))
disease_full

Districts  kept 2736/2736 rows
Final rows (complete grid): 2772


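|      | time_period | location    | disease_cases |
|------|-------------|-------------|---------------|
| 0    | 202201      | A3R7UT64jHf | 0.0           |
| 1    | 202201      | A3TeVhjjS2u | 0.0           |
| 2    | 202201      | B7X957nA1lM | 0.0           |
| 3    | 202201      | BITtrV4c0xf | 0.0           |
| 4    | 202201      | BdLcDbLQd88 | 0.0           |
| ...  | ...         | ...         | ...           |
| 2767 | 202412      | wYKHZ0iyjfd | 2.0           |
| 2768 | 202412      | xGQwwqLQ819 | 0.0           |
| 2769 | 202412      | yP2GELa6inf | 0.0           |
| 2770 | 202412      | ycBkX3TlIrL | 0.0           |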
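The complete grid has 2772 rows but only 2736 observed values, so some location/month cells are NaN. Those cells mean "not reported", which is different from a reported zero. The toy sketch below uses illustrative location codes `A` and `B`. It repeats the grid construction in miniature and adds a per-location coverage share (fraction of months with a value). That share is one way to spot districts with no data at all, such as one that failed name matching.

```python
import pandas as pd

# Toy aggregated data: location "B" has no record for 202202
disease = pd.DataFrame({
    "time_period": ["202201", "202202", "202201"],
    "location": ["A", "A", "B"],
    "disease_cases": [0, 2, 1],
})

# Full month axis between the observed min and max, as YYYYMM strings
months = pd.to_datetime(disease["time_period"], format="%Y%m")
all_months = pd.period_range(months.min(), months.max(), freq="M").strftime("%Y%m")
all_locations = sorted(disease["location"].unique())

# Cartesian product of months x locations, then left-join the observations
grid = pd.MultiIndex.from_product(
    [all_months, all_locations], names=["time_period", "location"]
).to_frame(index=False)
full = grid.merge(disease, on=["time_period", "location"], how="left")

# Share of months with a non-missing value, per location
coverage = full["disease_cases"].notna().groupby(full["location"]).mean()
print(full)
print(coverage)
```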


## Write output CSV

In [None]:
disease_full.to_csv(OUT_CSV, index=False)
print("Wrote:", OUT_CSV)
OUT_CSV

Wrote: ..\..\guides\data\nepal-dengue-harmonized.csv


WindowsPath('../../guides/data/nepal-dengue-harmonized.csv')
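By default, `to_csv` writes NaN as an empty field. Reading the file back without dtypes turns the `YYYYMM` codes into integers. The sketch below uses an in-memory stand-in for the written file with two toy rows. It shows a round-trip that keeps `time_period` and `location` as text and preserves missing values.

```python
import io

import pandas as pd

# Stand-in for the written CSV: two toy rows, the second value missing
csv_text = "time_period,location,disease_cases\n202201,BdLcDbLQd88,0.0\n202202,BdLcDbLQd88,\n"

naive = pd.read_csv(io.StringIO(csv_text))
print(naive.dtypes["time_period"])  # int64: the period code was parsed as a number

typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"time_period": str, "location": str},  # keep period codes and UIDs as text
)
print(typed["time_period"].tolist())           # ['202201', '202202']
print(typed["disease_cases"].isna().tolist())  # [False, True]
```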

## Import into DHIS2

This workflow stops after producing a harmonized, DHIS2-ready dataset.

To import the resulting data into DHIS2:

- create a data element for dengue case counts
- check that the `location` values (the GeoJSON feature ids, i.e. DHIS2 organisation unit UIDs) match the target organisation units
- submit the data using the DHIS2 Web API

The import mechanics are identical to those used in the WorldPop and CHIRPS workflows and are not repeated here.
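For orientation only, here is a minimal sketch of reshaping the harmonized rows into a DHIS2 `dataValueSets`-style JSON payload. It is not the import code from the WorldPop or CHIRPS workflows. The data element UID `deDengueCas` and the helper `to_data_value_set` are hypothetical. Missing counts are skipped rather than sent as zero. The actual submission (a POST to the instance's `/api/dataValueSets` endpoint) is left as a comment.

```python
import json

import pandas as pd

# Hypothetical UID of the dengue data element created in DHIS2
DATA_ELEMENT_UID = "deDengueCas"

# Toy harmonized rows (illustrative values); NaN means "not reported"
harmonized = pd.DataFrame({
    "time_period": ["202201", "202201", "202202"],
    "location": ["BdLcDbLQd88", "uHEl9oRZm8L", "BdLcDbLQd88"],
    "disease_cases": [0.0, float("nan"), 3.0],
})


def to_data_value_set(df: pd.DataFrame, data_element: str) -> dict:
    """Build a dataValueSets-style payload, skipping missing values."""
    values = []
    for row in df.itertuples(index=False):
        if pd.isna(row.disease_cases):
            continue  # missing is not zero: send nothing for this cell
        values.append({
            "dataElement": data_element,
            "period": row.time_period,  # YYYYMM monthly period
            "orgUnit": row.location,    # DHIS2 organisation unit UID
            "value": str(int(row.disease_cases)),
        })
    return {"dataValues": values}


payload = to_data_value_set(harmonized, DATA_ELEMENT_UID)
print(json.dumps(payload, indent=2))
# Submission (not shown): POST this JSON to <DHIS2 base URL>/api/dataValueSets
```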
