---
title: Aggregate climate data to DHIS2 organisation units
short_title: Org unit aggregation
---

In this notebook we show how to load daily temperature data from a NetCDF file using [earthkit](https://ecmwf.github.io/earthkit-website/) and aggregate it to DHIS2 organisation units.

```python
import earthkit.data
from dhis2eo.utils import aggregate
from dhis2eo.integrations.pandas import dataframe_to_dhis2_json
```

## Loading the data

Our sample NetCDF file contains daily temperature data for Sierra Leone in July 2025. Let's load the file using earthkit:

```python
file = "../data/era5-land-daily-mean-temperature-2m-july-2025-sierra-leone.nc"
data = earthkit.data.from_source("file", file)
```

See [more examples](https://earthkit-data.readthedocs.io/en/latest/examples/) of loading data with earthkit, or watch the video below.

:::{iframe} https://www.youtube.com/embed/no01ovW1pF8
:width: 100%
How to get data with earthkit
:::

To make the dataset easier to work with and display, we can convert it to an [xarray](https://xarray.dev) dataset. The output shows that the file has three dimensions (latitude, longitude and valid_time) and one data variable, "t2m" (air temperature at 2 m above the surface). The data come from the European Centre for Medium-Range Weather Forecasts ([ECMWF](https://www.ecmwf.int)).

```python
data_array = data.to_xarray()
data_array
```

| | Array | Chunk |
|---|---|---|
| Bytes | 203.56 kiB | 203.56 kiB |
| Shape | (31, 41, 41) | (31, 41, 41) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


## Loading the organisation units

We can also use earthkit to load the DHIS2 organisation units that we [saved as a GeoJSON file](organization-units).

```python
district_file = "../data/sierra-leone-districts.geojson"
features = earthkit.data.from_source("file", district_file)
```

The GeoJSON file contains the boundaries of 13 named organisation units (districts) in Sierra Leone. For the aggregation, we are mainly interested in each org unit's id and its geometry (a polygon or multipolygon):

```python
features
```

| | type | id | name | hasCoordinatesDown | hasCoordinatesUp | level | grandParentParentGraph | grandParentId | parentGraph | parentId | parentName | dimensions | weight | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Polygon | O6uvpzGd5pu | Bo | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-11.5914 8.4875, -11.5906 8.4769, -1... |
| 1 | Polygon | fdc6uOvgoji | Bombali | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-11.8091 9.2032, -11.8102 9.1944, -1... |
| 2 | MultiPolygon | lc3eMKXaEfw | Bonthe | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-12.5568 7.3832, -12.5574 7.38... |
| 3 | Polygon | jUb8gELQApl | Kailahun | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-10.7972 7.5866, -10.8002 7.5878, -1... |
| 4 | MultiPolygon | PMa2VCrupOd | Kambia | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-13.1349 8.8471, -13.1343 8.84... |
| 5 | Polygon | kJq2mPyFEHo | Kenema | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-11.3596 8.5317, -11.3513 8.5234, -1... |
| 6 | Polygon | qhqAxPSTUXp | Koinadugu | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-10.585 9.0434, -10.5877 9.0432, -10... |
| 7 | Polygon | Vth0fbpFcsO | Kono | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-10.585 9.0434, -10.5848 9.0432, -10... |
| 8 | MultiPolygon | jmIPBj66vD6 | Moyamba | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-12.6351 7.6613, -12.6346 7.66... |
| 9 | MultiPolygon | TEQlaapDQoK | Port Loko | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-13.119 8.4718, -13.1174 8.470... |
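If you want to check which ids and geometry types a GeoJSON file holds without earthkit, a minimal sketch with Python's standard `json` module could look like the one below. It assumes the file is a standard GeoJSON `FeatureCollection` in which each feature stores the DHIS2 org unit id in its top-level `id` member and the name in `properties`; that layout is an assumption, and the inline sample feature (including its square coordinates) is made up for illustration.

```python
import json

# Hypothetical inline sample with made-up coordinates. For the real file you
# could instead use: json.load(open("../data/sierra-leone-districts.geojson"))
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "O6uvpzGd5pu",
      "properties": {"name": "Bo"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-12.0, 7.5], [-11.5, 7.5], [-11.5, 8.0], [-12.0, 8.0], [-12.0, 7.5]]]
      }
    }
  ]
}
"""


def summarise_features(collection: dict) -> list[tuple[str, str, str]]:
    """Return (id, name, geometry type) for each feature in a FeatureCollection."""
    summary = []
    for feature in collection["features"]:
        # Assumption: the org unit id sits in the feature's top-level "id" member
        org_unit_id = feature.get("id")
        name = feature.get("properties", {}).get("name")
        geometry_type = feature["geometry"]["type"]
        summary.append((org_unit_id, name, geometry_type))
    return summary


collection = json.loads(SAMPLE)
for row in summarise_features(collection):
    print(row)
```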


## Aggregating the data to organisation units

To aggregate the data to the org unit features, we use the `aggregate` module from `dhis2eo.utils`. We keep the daily period type and only aggregate the data spatially, to the org unit features. To do this, we pass the `variables` parameter: a dict that maps each variable to aggregate to the aggregation method to use for it (here, the mean of `t2m`).

```python
agg_df = aggregate.to_org_units(data, features, variables={'t2m': 'mean'})
agg_df
```

| | valid_time | org_unit_id | number | t2m |
|---|---|---|---|---|
| 0 | 2025-07-01 | 0 | 0 | 296.813660 |
| 1 | 2025-07-01 | 1 | 0 | 297.073822 |
| 2 | 2025-07-01 | 2 | 0 | 297.624634 |
| 3 | 2025-07-01 | 3 | 0 | 296.213226 |
| 4 | 2025-07-01 | 4 | 0 | 297.634094 |
| ... | ... | ... | ... | ... |
| 398 | 2025-07-31 | 8 | 0 | 297.373383 |
| 399 | 2025-07-31 | 9 | 0 | 297.727386 |
| 400 | 2025-07-31 | 10 | 0 | 296.751312 |
| 401 | 2025-07-31 | 11 | 0 | 296.744843 |


The aggregated data is returned as a `pandas.DataFrame` with one row per org unit and day, holding the mean temperature for that org unit and day.
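To make the idea of spatial aggregation more concrete, here is a simplified sketch of a zonal mean: it averages the grid cells whose centre points fall inside an org unit polygon, using a ray-casting point-in-polygon test. This is not necessarily how `aggregate.to_org_units` works internally (it may, for example, weight cells by how much of their area lies inside the polygon), and the grid values and square polygon below are hypothetical.

```python
def point_in_polygon(x: float, y: float, ring: list[tuple[float, float]]) -> bool:
    """Ray casting: count how many polygon edges a ray from (x, y) towards +x crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(lons, lats, values, ring):
    """Mean of grid values whose cell centres lie inside the ring (None if no cell does)."""
    selected = [
        values[j][i]
        for j, lat in enumerate(lats)
        for i, lon in enumerate(lons)
        if point_in_polygon(lon, lat, ring)
    ]
    return sum(selected) / len(selected) if selected else None


# Hypothetical 3 x 3 grid of temperatures in kelvin (rows = latitudes, columns = longitudes)
lons = [-12.0, -11.9, -11.8]
lats = [8.0, 7.9, 7.8]
values = [
    [296.0, 297.0, 298.0],
    [296.5, 297.5, 298.5],
    [297.0, 298.0, 299.0],
]
# Hypothetical square org unit that covers the two western grid columns
ring = [(-12.05, 7.75), (-11.85, 7.75), (-11.85, 8.05), (-12.05, 8.05)]
print(zonal_mean(lons, lats, values, ring))
```

Using cell centres keeps the sketch short; it treats a cell as entirely in or out of an org unit, which matters most for small or narrow org units relative to the grid spacing.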

## Post-processing

Next, we convert the temperatures from kelvin to degrees Celsius by subtracting 273.15 from the values.
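For reference, the conversion is a constant offset. With a hypothetical daily mean of 296.81 K, and because subtracting a constant commutes with averaging, it does not matter whether we convert before or after the spatial aggregation:

$$
T_{\mathrm{C}} = T_{\mathrm{K}} - 273.15, \qquad 296.81\,\mathrm{K} - 273.15 = 23.66\,{}^\circ\mathrm{C}, \qquad \frac{1}{n}\sum_{i=1}^{n}\left(T_{\mathrm{K},i} - 273.15\right) = \bar{T}_{\mathrm{K}} - 273.15
$$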

```python
agg_df['t2m'] -= 273.15
agg_df
```

| | valid_time | id | number | t2m |
|---|---|---|---|---|
| 0 | 2025-07-01 | O6uvpzGd5pu | 0 | 23.683960 |
| 1 | 2025-07-01 | fdc6uOvgoji | 0 | 23.959045 |
| 2 | 2025-07-01 | lc3eMKXaEfw | 0 | 24.520996 |
| 3 | 2025-07-01 | jUb8gELQApl | 0 | 23.063782 |
| 4 | 2025-07-01 | PMa2VCrupOd | 0 | 24.454498 |
| ... | ... | ... | ... | ... |
| 398 | 2025-07-31 | jmIPBj66vD6 | 0 | 24.256500 |
| 399 | 2025-07-31 | TEQlaapDQoK | 0 | 24.608307 |
| 400 | 2025-07-31 | bL4ooGhyHRQ | 0 | 23.627899 |
| 401 | 2025-07-31 | eIQbndfxQMb | 0 | 23.557404 |


Two decimal places are sufficient for our use, so we round all the temperature values:

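```python
# Cast to float64 before rounding, so each rounded value is the nearest
# double-precision number to its two-decimal form
agg_df['t2m'] = agg_df['t2m'].astype('float64').round(decimals=2)
agg_df
```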

| | valid_time | id | number | t2m |
|---|---|---|---|---|
| 0 | 2025-07-01 | O6uvpzGd5pu | 0 | 23.68 |
| 1 | 2025-07-01 | fdc6uOvgoji | 0 | 23.96 |
| 2 | 2025-07-01 | lc3eMKXaEfw | 0 | 24.52 |
| 3 | 2025-07-01 | jUb8gELQApl | 0 | 23.06 |
| 4 | 2025-07-01 | PMa2VCrupOd | 0 | 24.45 |
| ... | ... | ... | ... | ... |
| 398 | 2025-07-31 | jmIPBj66vD6 | 0 | 24.26 |
| 399 | 2025-07-31 | TEQlaapDQoK | 0 | 24.61 |
| 400 | 2025-07-31 | bL4ooGhyHRQ | 0 | 23.63 |
| 401 | 2025-07-31 | eIQbndfxQMb | 0 | 23.56 |
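The `astype('float64')` step is worth a short explanation. The dataset reports its values as `float32`, and a single-precision value that displays as 23.68 is not the same number as the double-precision 23.68; once it is widened to a Python float (for example when values are serialised to JSON) it can show up with extra digits. The sketch below is our own illustration of that effect, not code from `dhis2eo`: it emulates single precision with the standard library's `struct` module and compares rounding before and after widening.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision (like numpy.float32)."""
    return struct.unpack("f", struct.pack("f", x))[0]


celsius = 23.683960  # a converted daily mean, as in the table above

# Rounding while the data is still single precision: the stored value is the
# float32 nearest to 23.68, which widens to a slightly different double.
rounded_in_float32 = as_float32(round(as_float32(celsius), 2))

# Widening to double precision first, then rounding: exactly the double 23.68.
rounded_in_float64 = round(float(celsius), 2)

print(rounded_in_float32)  # not exactly 23.68
print(rounded_in_float64)  # 23.68
```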


## Converting to DHIS2 format

Use the `dhis2eo` utility function `dataframe_to_dhis2_json` to convert the `pandas.DataFrame` into the JSON structure used by the DHIS2 Web API:

```python
json_dict = dataframe_to_dhis2_json(
    df=agg_df,                      # aggregated pandas.DataFrame
    org_unit_col='id',              # column containing the org unit id
    period_col='valid_time',        # column containing the period
    value_col='t2m',                # column containing the value
    data_element_id='VJwwPOOvge6',  # id of the DHIS2 data element
)
```
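To show what this conversion produces, here is a minimal pure-Python sketch that builds the same `dataValues` structure from a list of row dictionaries. The helper `rows_to_data_values` is hypothetical and only mirrors the output printed further down; the real `dataframe_to_dhis2_json` works on a `pandas.DataFrame` and may handle more cases (other period types, missing values, extra fields). Daily periods are written as `yyyyMMdd` strings, matching that output.

```python
from datetime import date


def rows_to_data_values(rows, org_unit_col, period_col, value_col, data_element_id):
    """Build a {"dataValues": [...]} payload from row dicts (hypothetical helper)."""
    data_values = []
    for row in rows:
        period = row[period_col]
        # Daily periods as yyyyMMdd, e.g. 2025-07-01 -> "20250701".
        # datetime (and pandas.Timestamp) are subclasses of date, so they match too.
        if isinstance(period, date):
            period = period.strftime("%Y%m%d")
        data_values.append({
            "orgUnit": row[org_unit_col],
            "period": period,
            "value": row[value_col],
            "dataElement": data_element_id,
        })
    return {"dataValues": data_values}


rows = [
    {"valid_time": date(2025, 7, 1), "id": "O6uvpzGd5pu", "t2m": 23.68},
    {"valid_time": date(2025, 7, 1), "id": "fdc6uOvgoji", "t2m": 23.96},
]
payload = rows_to_data_values(rows, "id", "valid_time", "t2m", "VJwwPOOvge6")
print(payload["dataValues"][0])
```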

Let's display the first three data values. Each entry holds one temperature value for a single org unit and period combination.

```python
json_dict['dataValues'][:3]
```

```
[{'orgUnit': 'O6uvpzGd5pu',
  'period': '20250701',
  'value': 23.68,
  'dataElement': 'VJwwPOOvge6'},
 {'orgUnit': 'fdc6uOvgoji',
  'period': '20250701',
  'value': 23.96,
  'dataElement': 'VJwwPOOvge6'},
 {'orgUnit': 'lc3eMKXaEfw',
  'period': '20250701',
  'value': 24.52,
  'dataElement': 'VJwwPOOvge6'}]
```

At this point we have aggregated the temperature data to org units and converted it into a JSON format that DHIS2 can import. To learn how to import this JSON data into DHIS2, see [our guide for uploading data values using the Python DHIS2 client](../import-data/using-python-client.ipynb).
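For orientation only: a payload with a `dataValues` list is what the DHIS2 Web API's `dataValueSets` endpoint accepts in a `POST` request. The sketch below builds (but does not send) such a request with the standard library. The server URL, username and password are placeholders, the use of HTTP Basic authentication is an assumption about your server setup, and you should check the Web API documentation for your DHIS2 version before relying on it.

```python
import base64
import json
import urllib.request


def build_data_value_set_request(base_url, username, password, payload):
    """Prepare (but do not send) a POST of a dataValues payload to /api/dataValueSets."""
    body = json.dumps(payload).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/dataValueSets",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # HTTP Basic authentication (assumed); replace the placeholder credentials
            "Authorization": f"Basic {token}",
        },
    )


payload = {"dataValues": [{"orgUnit": "O6uvpzGd5pu", "period": "20250701",
                           "value": 23.68, "dataElement": "VJwwPOOvge6"}]}
# Placeholder server and credentials
request = build_data_value_set_request("https://dhis2.example.org/", "username", "password", payload)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```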