---
title: Full workflow for importing Dengue data into DHIS2
short_title: Import Dengue Data
---

This workflow demonstrates the end-to-end preparation of dengue case data for import into DHIS2. The example uses [**OpenDengue**](https://opendengue.org/data.html) data; countries are expected to use official Ministry of Health data instead.

The notebook focuses on **data harmonization and preparation** using a worked example for **Nepal (districts / admin2)** and **monthly** data. The final DHIS2 import step follows the same approach as the WorldPop and CHIRPS workflows and is therefore not repeated in full here.

## Inputs

This workflow expects two local input files under `../../guides/data/`:

- `nepal-opendengue.csv` — [**OpenDengue**](https://opendengue.org/data.html) export containing Nepal dengue case counts
- `nepal-locations.geojson` — Nepal district geometries (admin2)

## Output

The workflow produces:

- `nepal-dengue-harmonized.csv` — harmonized monthly dengue cases per district (`time_period`, `location`, `disease_cases`)
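
The output schema above can be checked before handing the file to an import step. The following is a minimal sketch, not part of the original workflow: `check_harmonized` and `EXPECTED_COLUMNS` are hypothetical names, and the rules (exact column order, `YYYYMM` periods, one row per period and location) are assumptions drawn from the description of the output file.

```python
import pandas as pd

# Assumed schema, taken from the output description above.
EXPECTED_COLUMNS = ["time_period", "location", "disease_cases"]


def check_harmonized(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {list(df.columns)}")
        return problems

    # time_period must look like YYYYMM with a month between 01 and 12.
    periods = df["time_period"].astype(str)
    bad = ~periods.str.fullmatch(r"\d{4}(0[1-9]|1[0-2])")
    if bad.any():
        problems.append(f"invalid time_period values: {periods[bad].unique().tolist()}")

    # Each (time_period, location) pair should appear once.
    if df.duplicated(["time_period", "location"]).any():
        problems.append("duplicate (time_period, location) pairs")
    return problems


example = pd.DataFrame({
    "time_period": ["202201", "202202"],
    "location": ["BdLcDbLQd88", "BdLcDbLQd88"],
    "disease_cases": [3, None],  # missing counts are allowed and stay NaN
})
print(check_harmonized(example))
```

The `astype(str)` call matters when the CSV is read back with `pd.read_csv`, because `time_period` is then typically parsed as an integer column.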


In [None]:
from pathlib import Path

import json
import pandas as pd
import geopandas as gpd

pd.set_option("display.max_columns", 200)


## Paths

In [None]:
# Root folder
DATA_FOLDER = Path("../../guides/data")

LOCATIONS_GEOJSON = DATA_FOLDER / "nepal-locations.geojson"
OPENDENGUE_SOURCE_PATH = DATA_FOLDER / "nepal-opendengue.csv"

# Output
OUT_CSV = DATA_FOLDER / "nepal-dengue-harmonized.csv"

for p in [LOCATIONS_GEOJSON, OPENDENGUE_SOURCE_PATH]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required input: {p}")

print("Using inputs:")
print(" -", LOCATIONS_GEOJSON)
print(" -", OPENDENGUE_SOURCE_PATH)


## Load district locations

In [120]:
def norm_name(s: pd.Series) -> pd.Series:
    return (
        s.astype(str)
         .str.upper()
         .str.strip()
         .str.replace(r"\s+", " ", regex=True)
    )

with open(LOCATIONS_GEOJSON, "r", encoding="utf-8") as f:
    gj = json.load(f)

districts = pd.DataFrame([{
    "location": feat.get("id"),                      # DHIS2 level-2 orgUnit UID
    "name_raw": feat["properties"].get("name", ""),  # e.g. "101 TAPLEJUNG"
} for feat in gj["features"]])

districts["district_name"] = norm_name(districts["name_raw"])

# sanity
assert districts["location"].notna().all()
assert districts["district_name"].notna().all()
assert not districts.duplicated("district_name").any()

print("districts:", len(districts))
districts.head()


districts: 77


|   | location | name_raw | district_name |
|---|---|---|---|
| 0 | BdLcDbLQd88 | 101 TAPLEJUNG | 101 TAPLEJUNG |
| 1 | uHEl9oRZm8L | 102 SANKHUWASABHA | 102 SANKHUWASABHA |
| 2 | Wep3D4POB3H | 103 SOLUKHUMBU | 103 SOLUKHUMBU |
| 3 | B7X957nA1lM | 104 OKHALDHUNGA | 104 OKHALDHUNGA |
| 4 | LnJ8MTOmgGa | 105 KHOTANG | 105 KHOTANG |
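
The normalized names shown above still carry a numeric prefix such as `101`. If the `adm_2_name` values in the OpenDengue export do not carry the same prefix (an assumption that should be checked against the actual file), the later name-based merge would leave those rows unmapped. A minimal sketch of stripping such a prefix follows; `strip_leading_code` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def strip_leading_code(s: pd.Series) -> pd.Series:
    """Drop a leading run of digits and the whitespace after it."""
    return s.astype(str).str.replace(r"^\d+\s+", "", regex=True).str.strip()


names = pd.Series(["101 TAPLEJUNG", "102 SANKHUWASABHA", "KATHMANDU"])
print(strip_leading_code(names).tolist())
```

Names without a numeric prefix pass through unchanged, so the helper could be applied to both sides of the merge.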


## Load OpenDengue

In [122]:
df_raw = pd.read_csv(OPENDENGUE_SOURCE_PATH)
print("Loaded:", OPENDENGUE_SOURCE_PATH)
print("Columns:", df_raw.columns.tolist())
df_raw.head()


Loaded: ../../guides/data/nepal-opendengue.csv
Columns: ['adm_0_name', 'adm_1_name', 'adm_2_name', 'full_name', 'ISO_A0', 'FAO_GAUL_code', 'RNE_iso_code', 'IBGE_code', 'calendar_start_date', 'calendar_end_date', 'Year', 'dengue_total', 'case_definition_standardised', 'S_res', 'T_res', 'UUID', 'region']


|   | adm_0_name | adm_1_name | adm_2_name | full_name | ISO_A0 | FAO_GAUL_code | RNE_iso_code | IBGE_code | calendar_start_date | calendar_end_date | Year | dengue_total | case_definition_standardised | S_res | T_res | UUID | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1987-01-01 | 1987-12-31 | 1987 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 1 | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1985-01-01 | 1985-12-31 | 1985 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 2 | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1986-01-01 | 1986-12-31 | 1986 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 3 | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1991-01-01 | 1991-12-31 | 1991 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |
| 4 | NEPAL |  |  | NEPAL | NPL | 175 | NPL |  | 1988-01-01 | 1988-12-31 | 1988 | 0 | Total | Admin0 | Year | WHOSEARO-ALL-19852009-Y01-00 | SEARO |


## Column mapping

In [None]:
# OpenDengue export columns (Nepal example)
DATE_COL = "calendar_start_date"
CASES_COL = "dengue_total"
ADMIN2_COL = "adm_2_name"

missing = [c for c in [DATE_COL, CASES_COL, ADMIN2_COL] if c not in df_raw.columns]
if missing:
    raise KeyError(
        f"Input CSV is missing required columns: {missing}. "
        f"Available columns: {df_raw.columns.tolist()}"
    )

print("Using columns:", {"date": DATE_COL, "cases": CASES_COL, "admin2": ADMIN2_COL})
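
The export also carries `S_res` and `T_res` columns, and the preview rows are national (`Admin0`) yearly (`Year`) records. Because only the start date is used later, a yearly record that names a district would be counted entirely in January. The sketch below shows one way to keep a single resolution before aggregating. The labels `"Admin2"` and `"Month"` are assumptions here; inspect `df_raw["S_res"].unique()` and `df_raw["T_res"].unique()` to confirm the values in your file. `keep_resolution` is a hypothetical helper.

```python
import pandas as pd

# Toy rows imitating the relevant columns of the export.
toy = pd.DataFrame({
    "adm_2_name": [None, "TAPLEJUNG", "TAPLEJUNG", "KHOTANG"],
    "calendar_start_date": ["2022-01-01", "2022-01-01", "2022-01-01", "2022-02-01"],
    "dengue_total": [500, 12, 1, 4],
    "S_res": ["Admin0", "Admin2", "Admin2", "Admin2"],
    "T_res": ["Year", "Year", "Month", "Month"],
})


def keep_resolution(df: pd.DataFrame, spatial: str = "Admin2", temporal: str = "Month") -> pd.DataFrame:
    """Keep only rows reported at the requested spatial and temporal resolution."""
    mask = df["S_res"].eq(spatial) & df["T_res"].eq(temporal)
    return df.loc[mask].copy()


monthly_admin2 = keep_resolution(toy)
print(monthly_admin2[["adm_2_name", "calendar_start_date", "dengue_total"]])
```

Filtering on resolution also avoids double counting when the same cases are reported at more than one temporal resolution.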


## Normalize OpenDengue (Nepal districts / admin2)

In [None]:
df_norm = pd.DataFrame({
    "date": pd.to_datetime(df_raw[DATE_COL], errors="coerce"),
    "cases": pd.to_numeric(df_raw[CASES_COL], errors="coerce"),
    "district_name": df_raw[ADMIN2_COL],
})

# Drop incomplete rows before normalizing: norm_name() casts to str,
# which would turn a missing district (NaN) into the literal name "NAN"
df_norm = df_norm.dropna(subset=["date", "cases", "district_name"])
df_norm["district_name"] = norm_name(df_norm["district_name"])
df_norm = df_norm[df_norm["district_name"].ne("")]

# Map district_name -> DHIS2 UID (location)
df_norm = df_norm.merge(
    districts[["district_name", "location"]],
    on="district_name",
    how="left",
)

unmapped = df_norm["location"].isna().mean()
print(f"Unmapped dengue rows: {unmapped:.2%}")
if unmapped > 0:
    print("Unmapped examples:", df_norm.loc[df_norm["location"].isna(), "district_name"].drop_duplicates().head(20).tolist())

# Keep only mapped rows (otherwise they will never join downstream)
df_norm = df_norm.dropna(subset=["location"]).copy()

df_norm.head()
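
The share of unmapped rows does not show how many cases are lost when those rows are dropped. As a sketch with toy data, the merge can be run with `indicator=True` to summarise unmatched names by row count and case total, and with `validate="many_to_one"` to fail early if the lookup contains duplicate names. The frames `lookup` and `cases` are illustrative only, and `KHOTAN` is a deliberately misspelled name.

```python
import pandas as pd

lookup = pd.DataFrame({
    "district_name": ["TAPLEJUNG", "KHOTANG"],
    "location": ["BdLcDbLQd88", "LnJ8MTOmgGa"],
})
cases = pd.DataFrame({
    "district_name": ["TAPLEJUNG", "TAPLEJUNG", "KHOTAN"],
    "cases": [1, 2, 3],
})

merged = cases.merge(
    lookup,
    on="district_name",
    how="left",
    validate="many_to_one",  # raises if a name maps to more than one UID
    indicator=True,          # adds a `_merge` column: "both" or "left_only"
)

# Rows and cases that would be dropped, per unmatched name.
unmatched = (
    merged.loc[merged["_merge"] == "left_only"]
    .groupby("district_name")["cases"]
    .agg(rows="size", cases="sum")
)
print(unmatched)
```

A summary like this makes it easier to decide whether a spelling fix in the lookup is worth the effort before rows are discarded.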


## Monthly aggregation

In [None]:
df_norm["time_period"] = (
    df_norm["date"]
    .dt.to_period("M")
    .astype(str)
    .str.replace("-", "", regex=False)  # YYYYMM
)

disease = (
    df_norm.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)

print("Aggregated rows:", len(disease))
disease.head()
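
As a small illustration with toy data, the `YYYYMM` label built above through `to_period("M")` matches what `strftime("%Y%m")` produces, and all dates within a month collapse into one row per location. The values below are invented for the example.

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-31", "2022-02-01"]),
    "location": ["BdLcDbLQd88"] * 3,
    "cases": [2, 5, 1],
})

# Two equivalent ways to derive a YYYYMM label from a date.
via_period = toy["date"].dt.to_period("M").astype(str).str.replace("-", "", regex=False)
via_strftime = toy["date"].dt.strftime("%Y%m")
assert via_period.tolist() == via_strftime.tolist()

toy["time_period"] = via_strftime
monthly = (
    toy.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)
print(monthly)
```

Note that a record is assigned to the month of its start date, so a weekly record that spans two months is counted entirely in the first one.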


## Filter to districts and align time axis

In [None]:
# Keep only rows whose location is a known DHIS2 district UID
before = len(disease)
disease = disease.merge(districts[["location"]].drop_duplicates(), on="location", how="inner")
after = len(disease)
print(f"Districts: kept {after}/{before} rows")

# Build full (time_period x location) grid; preserve missing as NaN (no imputation)
# Parse YYYYMM explicitly rather than relying on date-string inference
first_month = pd.to_datetime(disease["time_period"].min(), format="%Y%m")
last_month = pd.to_datetime(disease["time_period"].max(), format="%Y%m")
all_months = pd.Index(
    pd.period_range(first_month, last_month, freq="M").strftime("%Y%m"),  # YYYYMM
    name="time_period",
)

all_locations = pd.Index(districts["location"].dropna().astype(str).sort_values().unique(), name="location")

grid = pd.MultiIndex.from_product([all_months, all_locations], names=["time_period", "location"]).to_frame(index=False)

disease_full = grid.merge(disease, on=["time_period", "location"], how="left")

# Preserve missingness; just ensure numeric dtype
disease_full["disease_cases"] = pd.to_numeric(disease_full["disease_cases"], errors="coerce")

print("Final rows (complete grid):", len(disease_full))
disease_full.head()
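
The grid step can be illustrated on toy data. In this sketch the complete grid has one row per month and location, a reported zero stays `0`, and a combination with no report stays `NaN`. The months, UIDs, and counts are example values only.

```python
import pandas as pd

months = pd.period_range("2022-11", "2023-02", freq="M").strftime("%Y%m")
locations = ["BdLcDbLQd88", "uHEl9oRZm8L"]

grid = pd.MultiIndex.from_product(
    [months, locations], names=["time_period", "location"]
).to_frame(index=False)

observed = pd.DataFrame({
    "time_period": ["202211", "202302"],
    "location": ["BdLcDbLQd88", "uHEl9oRZm8L"],
    "disease_cases": [5, 0],  # the 0 is a reported zero, not a gap
})

full = grid.merge(observed, on=["time_period", "location"], how="left")

# The grid is complete when it has exactly months x locations rows.
assert len(full) == len(months) * len(locations)
print("missing cells:", int(full["disease_cases"].isna().sum()))
```

Keeping `NaN` distinct from `0` lets downstream consumers tell "no report" apart from "no cases".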


## Write output CSV

In [None]:
disease_full.to_csv(OUT_CSV, index=False)
print("Wrote:", OUT_CSV)
OUT_CSV


## Import into DHIS2

This workflow stops after producing a harmonized, DHIS2-ready dataset.

To import the resulting data into DHIS2:

- create a data element for dengue case counts
- use the `location` column as the organisation unit, since it already holds DHIS2 organisation unit UIDs
- submit the data using the DHIS2 Web API

The import mechanics are identical to those used in the WorldPop and CHIRPS workflows and are not repeated here.
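
For orientation only, the sketch below shows how the harmonized rows could be turned into a JSON payload of the shape accepted by the DHIS2 `dataValueSets` endpoint (`dataElement`, `period`, `orgUnit`, `value`). It makes no network call. `DATA_ELEMENT_UID` is a hypothetical placeholder, `to_data_value_set` is a hypothetical helper, and skipping rows with missing counts is an assumption about how gaps should be handled rather than a rule from the referenced workflows.

```python
import json

import pandas as pd

# Hypothetical placeholder: replace with the UID of your dengue data element.
DATA_ELEMENT_UID = "DENGUE_DE_UID"


def to_data_value_set(df: pd.DataFrame, data_element: str) -> dict:
    """Build a dataValueSets-style payload, skipping rows with missing counts."""
    rows = df.dropna(subset=["disease_cases"])
    return {
        "dataValues": [
            {
                "dataElement": data_element,
                "period": str(row.time_period),   # monthly period, YYYYMM
                "orgUnit": row.location,          # organisation unit UID
                "value": str(int(row.disease_cases)),
            }
            for row in rows.itertuples(index=False)
        ]
    }


harmonized = pd.DataFrame({
    "time_period": ["202201", "202202"],
    "location": ["BdLcDbLQd88", "BdLcDbLQd88"],
    "disease_cases": [3.0, None],
})

payload = to_data_value_set(harmonized, DATA_ELEMENT_UID)
print(json.dumps(payload, indent=2))
```

The resulting dictionary would then be sent as the JSON body of a POST request to `/api/dataValueSets` on the target DHIS2 instance, with authentication handled as in the referenced workflows.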
