---
title: Aggregate climate data to DHIS2 organisation units
short_title: Org unit aggregation
---

In this notebook we will show how to load daily climate data from NetCDF using [earthkit](https://ecmwf.github.io/earthkit-website/) and aggregate temperature and precipitation climate variables to DHIS2 organisation units. 

```python
import geopandas as gpd
import earthkit.data
from earthkit import transforms
from dhis2eo.integrations.pandas import dataframe_to_dhis2_json
```

## Loading the data

Our sample NetCDF file contains daily temperature and precipitation data for Sierra Leone in July 2025. Let's load the file using earthkit:

```python
file = "../data/era5-daily-temp-precip-july-2025-sierra-leone.nc"
data = earthkit.data.from_source("file", file)
```

See [more examples](https://earthkit-data.readthedocs.io/en/latest/examples/) of how to load data with earthkit, or watch the video below.

:::{iframe} https://www.youtube.com/embed/no01ovW1pF8
:width: 100%
How to get data with earthkit
:::

To make the dataset easier to work with and display, we convert it to an [xarray](https://xarray.dev) dataset. The output shows that the file has three dimensions (`latitude`, `longitude` and `valid_time`) and two data variables: `t2m` (air temperature 2 m above the surface) and `tp` (total precipitation). The data comes from the European Centre for Medium-Range Weather Forecasts ([ECMWF](https://www.ecmwf.int)).

```python
data_array = data.to_xarray()
data_array
```

The output summarises the two data variables, each stored as a dask array with the following properties:

| | Array | Chunk |
|---|---|---|
| Bytes | 19.80 kiB | 19.80 kiB |
| Shape | (30, 13, 13) | (30, 13, 13) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


## Loading the organisation units

We next use geopandas to load our organisation units that we've [downloaded from DHIS2 as a GeoJSON file](../org-units/download-manual.md): 

```python
district_file = "../data/sierra-leone-districts.geojson"
org_units = gpd.read_file(district_file)
```

The GeoJSON file contains the boundaries of 13 named organisation units in Sierra Leone. For the aggregation, we are mainly interested in the `id` and `geometry` (polygon or multipolygon) columns of each org unit:

```python
org_units
```

| | type | id | name | hasCoordinatesDown | hasCoordinatesUp | level | grandParentParentGraph | grandParentId | parentGraph | parentId | parentName | dimensions | weight | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Polygon | O6uvpzGd5pu | Bo | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-11.5914 8.4875, -11.5906 8.4769, -1... |
| 1 | Polygon | fdc6uOvgoji | Bombali | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-11.8091 9.2032, -11.8102 9.1944, -1... |
| 2 | MultiPolygon | lc3eMKXaEfw | Bonthe | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-12.5568 7.3832, -12.5574 7.38... |
| 3 | Polygon | jUb8gELQApl | Kailahun | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-10.7972 7.5866, -10.8002 7.5878, -1... |
| 4 | MultiPolygon | PMa2VCrupOd | Kambia | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-13.1349 8.8471, -13.1343 8.84... |
| 5 | Polygon | kJq2mPyFEHo | Kenema | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-11.3596 8.5317, -11.3513 8.5234, -1... |
| 6 | Polygon | qhqAxPSTUXp | Koinadugu | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-10.585 9.0434, -10.5877 9.0432, -10... |
| 7 | Polygon | Vth0fbpFcsO | Kono | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | POLYGON ((-10.585 9.0434, -10.5848 9.0432, -10... |
| 8 | MultiPolygon | jmIPBj66vD6 | Moyamba | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-12.6351 7.6613, -12.6346 7.66... |
| 9 | MultiPolygon | TEQlaapDQoK | Port Loko | True | False | 2 | | | ImspTQPwCqd | ImspTQPwCqd | Sierra Leone | { } | 1 | MULTIPOLYGON (((-13.119 8.4718, -13.1174 8.470... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
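`gpd.read_file` turns each GeoJSON feature into a row, with the feature's properties as columns and its shape in the `geometry` column. To show what the `id` and `geometry` columns correspond to in the raw file, here is a minimal plain-Python sketch using only the `json` module. It assumes the DHIS2 attributes are stored under each feature's `properties`, as in a standard GeoJSON FeatureCollection; the two features, their ids and their coordinates are hypothetical.

```python
import json

# Hypothetical two-feature FeatureCollection (ids and coordinates are made up);
# the DHIS2 attributes are assumed to live under "properties".
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"id": "OU_A", "name": "District A"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"id": "OU_B", "name": "District B"},
     "geometry": {"type": "MultiPolygon",
                  "coordinates": [[[[2, 2], [3, 2], [3, 3], [2, 2]]]]}}
  ]
}
"""

def id_and_geometry_type(collection):
    """Return (org unit id, geometry type) for every feature in a FeatureCollection."""
    return [
        (feature["properties"]["id"], feature["geometry"]["type"])
        for feature in collection["features"]
    ]

org_unit_summary = id_and_geometry_type(json.loads(geojson_text))
print(org_unit_summary)  # [('OU_A', 'Polygon'), ('OU_B', 'MultiPolygon')]
```

Both `Polygon` and `MultiPolygon` geometries occur in the Sierra Leone file, so any aggregation step has to handle both.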


## Aggregating the data to organisation units

To aggregate the data to the org unit features we use the `spatial.reduce` function of [earthkit-transforms](https://earthkit-transforms.readthedocs.io). We keep the daily period type and only aggregate the data spatially to the org unit features. 

Since our climate data variables need to be aggregated using different statistics, we do separate aggregations for each variable. 

### Temperature

To aggregate temperature, we extract the `t2m` variable and pass it to `spatial.reduce` together with our organisation units (`org_units`). Setting `mask_dim='id'` gives one aggregated value for every unique value in the `id` column of the organisation units, and setting `how='mean'` returns the average temperature of all grid cells that fall inside each organisation unit.

```python
temp = data_array['t2m']
agg_temp = transforms.spatial.reduce(temp, org_units, mask_dim='id', how='mean')
agg_temp
```
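`spatial.reduce` handles the masking and the statistic for us. To illustrate the idea behind `how='mean'` and `how='sum'`, here is a simplified, dependency-free sketch: it keeps the grid cells whose centre falls inside an org unit polygon (a ray-casting point-in-polygon test) and then averages or sums their values. This is not how earthkit-transforms implements it; the sketch handles only a single polygon ring (no holes or multipolygons), ignores partial cell overlap and area weighting, and the grid and polygons are hypothetical.

```python
import math

def point_in_ring(x, y, ring):
    """Ray casting: True if point (x, y) lies inside the closed ring of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def reduce_to_ring(cells, ring, how="mean"):
    """Aggregate the (lon, lat, value) grid-cell centres that fall inside one ring."""
    values = [v for lon, lat, v in cells if point_in_ring(lon, lat, ring)]
    if how == "mean":
        return sum(values) / len(values) if values else math.nan
    if how == "sum":
        return float(sum(values))
    raise ValueError(f"unsupported statistic: {how}")

# Hypothetical 3 x 2 grid of cell centres with temperatures in kelvin
cells = [
    (0.5, 0.5, 297.0), (1.5, 0.5, 298.0), (2.5, 0.5, 299.0),
    (0.5, 1.5, 296.0), (1.5, 1.5, 297.0), (2.5, 1.5, 300.0),
]
# Hypothetical square org unit containing the four left-hand cell centres
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
# Hypothetical org unit that contains no cell centres at all
far_away = [(10, 10), (11, 10), (11, 11), (10, 11), (10, 10)]

print(reduce_to_ring(cells, square, "mean"))  # 297.0
print(reduce_to_ring(cells, square, "sum"))   # 1188.0
```

In this sketch, an org unit that contains no grid-cell centres gets `NaN` for the mean but `0.0` for the sum, because there is nothing to average while an empty sum is zero.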

The result of `spatial.reduce` is an `xarray` object, which is awkward to inspect for aggregated values like ours. Instead, we convert it to a pandas DataFrame with one row per org unit and day, which makes the results easier to read:

```python
agg_temp_df = agg_temp.to_dataframe().reset_index()
agg_temp_df
```

| | id | valid_time | number | t2m |
|---|---|---|---|---|
| 0 | O6uvpzGd5pu | 2025-07-01 | 0 | 297.221893 |
| 1 | O6uvpzGd5pu | 2025-07-02 | 0 | 297.939362 |
| 2 | O6uvpzGd5pu | 2025-07-03 | 0 | 297.781494 |
| 3 | O6uvpzGd5pu | 2025-07-04 | 0 | 297.493011 |
| 4 | O6uvpzGd5pu | 2025-07-05 | 0 | 297.603973 |
| ... | ... | ... | ... | ... |
| 385 | at6UHUQatSo | 2025-07-26 | 0 | NaN |
| 386 | at6UHUQatSo | 2025-07-27 | 0 | NaN |
| 387 | at6UHUQatSo | 2025-07-28 | 0 | NaN |
| 388 | at6UHUQatSo | 2025-07-29 | 0 | NaN |


The aggregated dataframe contains temperature values in kelvin (the unit ECMWF uses for `t2m` in ERA5 data) for each organisation unit and each day.
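The rows shown for `at6UHUQatSo` have no temperature value, so it is worth checking which org units are missing data before submitting anything. A minimal plain-Python sketch that counts missing values per org unit, using a few `(id, valid_time, t2m)` rows copied from the table above; representing the missing cells as `math.nan` is an assumption.

```python
import math
from collections import Counter

def missing_values_per_org_unit(rows):
    """Count missing (None or NaN) values per org unit id."""
    counts = Counter()
    for org_unit, _day, value in rows:
        if value is None or math.isnan(value):
            counts[org_unit] += 1
    return dict(counts)

rows = [
    ("O6uvpzGd5pu", "2025-07-01", 297.221893),
    ("O6uvpzGd5pu", "2025-07-02", 297.939362),
    ("at6UHUQatSo", "2025-07-26", math.nan),  # missing value, assumed to be NaN
    ("at6UHUQatSo", "2025-07-27", math.nan),
]
print(missing_values_per_org_unit(rows))  # {'at6UHUQatSo': 2}
```

On the real DataFrame, the pandas equivalent would be `agg_temp_df.loc[agg_temp_df['t2m'].isna(), 'id'].value_counts()`.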

### Precipitation

We use the same approach for precipitation by extracting the `tp` variable. The main difference is that we set `how='sum'`, since precipitation is typically reported as a total for an area rather than as an average.

```python
precip = data_array['tp']
agg_precip = transforms.spatial.reduce(precip, org_units, mask_dim='id', how='sum')
agg_precip_df = agg_precip.to_dataframe().reset_index()
agg_precip_df
```

| | id | valid_time | number | tp |
|---|---|---|---|---|
| 0 | O6uvpzGd5pu | 2025-07-01 | 0 | 0.031982 |
| 1 | O6uvpzGd5pu | 2025-07-02 | 0 | 0.045636 |
| 2 | O6uvpzGd5pu | 2025-07-03 | 0 | 0.070486 |
| 3 | O6uvpzGd5pu | 2025-07-04 | 0 | 0.126077 |
| 4 | O6uvpzGd5pu | 2025-07-05 | 0 | 0.126461 |
| ... | ... | ... | ... | ... |
| 385 | at6UHUQatSo | 2025-07-26 | 0 | 0.000000 |
| 386 | at6UHUQatSo | 2025-07-27 | 0 | 0.000000 |
| 387 | at6UHUQatSo | 2025-07-28 | 0 | 0.000000 |
| 388 | at6UHUQatSo | 2025-07-29 | 0 | 0.000000 |


The aggregated dataframe contains precipitation values in meters (the unit ECMWF uses for `tp` in ERA5 data) for each organisation unit and each day.
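Note that `how='sum'` adds up the values of every grid cell inside an org unit, so the result depends on how many grid cells fall inside it, not only on how much rain fell. A small worked illustration with hypothetical numbers in millimeters: two org units receive the same 10 mm in every grid cell, but one covers 4 grid cells and the other 40.

```python
def sum_and_mean(cell_values):
    """Return the sum and the mean of the grid-cell values inside one org unit."""
    total = sum(cell_values)
    return total, total / len(cell_values)

small_org_unit = [10] * 4    # hypothetical: 4 grid cells, 10 mm each
large_org_unit = [10] * 40   # hypothetical: 40 grid cells, 10 mm each

print(sum_and_mean(small_org_unit))  # (40, 10.0)
print(sum_and_mean(large_org_unit))  # (400, 10.0)
```

The sum grows tenfold while the average depth stays at 10 mm. If an average rainfall depth per org unit is what you want to report, `how='mean'`, as used for temperature, may be the better fit.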

## Post-processing

We have now aggregated the temperature and precipitation data to our organisation units. Before we submit the results to DHIS2, however, we want to make sure the values are reported in units that make sense to most users.

For temperature, we convert the values from kelvin to degrees Celsius by subtracting 273.15:

```python
agg_temp_df['t2m'] -= 273.15
agg_temp_df
```

| | id | valid_time | number | t2m |
|---|---|---|---|---|
| 0 | O6uvpzGd5pu | 2025-07-01 | 0 | 24.071899 |
| 1 | O6uvpzGd5pu | 2025-07-02 | 0 | 24.789368 |
| 2 | O6uvpzGd5pu | 2025-07-03 | 0 | 24.631500 |
| 3 | O6uvpzGd5pu | 2025-07-04 | 0 | 24.343018 |
| 4 | O6uvpzGd5pu | 2025-07-05 | 0 | 24.453979 |
| ... | ... | ... | ... | ... |
| 385 | at6UHUQatSo | 2025-07-26 | 0 | NaN |
| 386 | at6UHUQatSo | 2025-07-27 | 0 | NaN |
| 387 | at6UHUQatSo | 2025-07-28 | 0 | NaN |
| 388 | at6UHUQatSo | 2025-07-29 | 0 | NaN |
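The conversion is a fixed offset, because 0 °C is 273.15 K. A minimal sketch of the same arithmetic in plain Python; as in the pandas column above, a missing value (`NaN`) simply stays `NaN`.

```python
import math

KELVIN_AT_ZERO_CELSIUS = 273.15

def kelvin_to_celsius(kelvin):
    """Convert kelvin to degrees Celsius; NaN in gives NaN out."""
    return kelvin - KELVIN_AT_ZERO_CELSIUS

print(kelvin_to_celsius(273.15))                # 0.0
print(round(kelvin_to_celsius(297.221893), 6))  # 24.071893
print(kelvin_to_celsius(math.nan))              # nan
```

The last digits differ slightly from the 24.071899 in the table; that is most likely because the dataset stores `t2m` as float32, which has less precision than Python's 64-bit floats.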


For precipitation, to avoid small decimal numbers, we convert the reporting unit from meters to millimeters:

```python
agg_precip_df['tp'] *= 1000
agg_precip_df
```

| | id | valid_time | number | tp |
|---|---|---|---|---|
| 0 | O6uvpzGd5pu | 2025-07-01 | 0 | 31.982395 |
| 1 | O6uvpzGd5pu | 2025-07-02 | 0 | 45.636105 |
| 2 | O6uvpzGd5pu | 2025-07-03 | 0 | 70.486000 |
| 3 | O6uvpzGd5pu | 2025-07-04 | 0 | 126.076889 |
| 4 | O6uvpzGd5pu | 2025-07-05 | 0 | 126.460922 |
| ... | ... | ... | ... | ... |
| 385 | at6UHUQatSo | 2025-07-26 | 0 | 0.000000 |
| 386 | at6UHUQatSo | 2025-07-27 | 0 | 0.000000 |
| 387 | at6UHUQatSo | 2025-07-28 | 0 | 0.000000 |
| 388 | at6UHUQatSo | 2025-07-29 | 0 | 0.000000 |
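Since 1 m equals 1000 mm, the conversion is a single multiplication. A plain-Python sketch of the same step, applied to the first precipitation value from the earlier table (0.031982 m):

```python
MILLIMETERS_PER_METER = 1000

def meters_to_millimeters(meters):
    """Convert a precipitation depth from meters to millimeters; NaN stays NaN."""
    return meters * MILLIMETERS_PER_METER

print(round(meters_to_millimeters(0.031982), 6))  # 31.982
```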


## Converting to DHIS2 format

Before we can send the data to DHIS2, we use the `dhis2eo` utility function `dataframe_to_dhis2_json` to translate each aggregated `pandas.DataFrame` into the JSON structure used by the DHIS2 Web API.
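We use `dataframe_to_dhis2_json` as-is and do not look at its internals here. To make the target structure concrete, here is a hypothetical plain-Python sketch that builds the same shape of payload as the output printed later in this notebook: a dictionary with a `dataValues` list, where each entry has `orgUnit`, `period` (a daily DHIS2 period written as `yyyyMMdd`), `value` and `dataElement`. The function name `rows_to_dhis2_json` is made up for illustration and is not part of dhis2eo.

```python
from datetime import date

def rows_to_dhis2_json(rows, data_element_id):
    """Hypothetical helper: turn (org unit id, date, value) rows into a DHIS2 payload."""
    data_values = [
        {
            "orgUnit": org_unit,
            "period": day.strftime("%Y%m%d"),  # daily period, e.g. "20250701"
            "value": value,
            "dataElement": data_element_id,
        }
        for org_unit, day, value in rows
    ]
    return {"dataValues": data_values}

payload = rows_to_dhis2_json(
    [("O6uvpzGd5pu", date(2025, 7, 1), 24.0718994140625)],
    data_element_id="VJwwPOOvge6",
)
print(payload["dataValues"][0]["period"])  # 20250701
```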

First, for temperature:

```python
agg_temp_json_dict = dataframe_to_dhis2_json(
    df = agg_temp_df,               # aggregated pandas.DataFrame
    org_unit_col = 'id',            # column containing the org unit id
    period_col = 'valid_time',      # column containing the period
    value_col = 't2m',              # column containing the value
    data_element_id = 'VJwwPOOvge6' # id of the DHIS2 data element
)
```

We can display the first 3 items to see that we have one temperature value for each org unit and period combination.

```python
agg_temp_json_dict['dataValues'][:3]
```

```
[{'orgUnit': 'O6uvpzGd5pu',
  'period': '20250701',
  'value': 24.0718994140625,
  'dataElement': 'VJwwPOOvge6'},
 {'orgUnit': 'O6uvpzGd5pu',
  'period': '20250702',
  'value': 24.78936767578125,
  'dataElement': 'VJwwPOOvge6'},
 {'orgUnit': 'O6uvpzGd5pu',
  'period': '20250703',
  'value': 24.631500244140625,
  'dataElement': 'VJwwPOOvge6'}]
```
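Three items only show the structure. To check the whole list for accidental duplicates, here is a short sketch that counts each `(orgUnit, period, dataElement)` combination with `collections.Counter`. It could be applied to `agg_temp_json_dict['dataValues']`; the sample list below is a hypothetical stand-in so the snippet runs on its own.

```python
from collections import Counter

def duplicate_keys(data_values):
    """Return (orgUnit, period, dataElement) combinations that occur more than once."""
    counts = Counter(
        (dv["orgUnit"], dv["period"], dv["dataElement"]) for dv in data_values
    )
    return [key for key, n in counts.items() if n > 1]

sample = [
    {"orgUnit": "O6uvpzGd5pu", "period": "20250701", "value": 24.07, "dataElement": "VJwwPOOvge6"},
    {"orgUnit": "O6uvpzGd5pu", "period": "20250702", "value": 24.79, "dataElement": "VJwwPOOvge6"},
]
print(duplicate_keys(sample))               # []
print(duplicate_keys(sample + sample[:1]))  # [('O6uvpzGd5pu', '20250701', 'VJwwPOOvge6')]
```

An empty list means no org unit and period combination appears more than once for a data element.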

And we do the same for precipitation:

```python
agg_precip_json_dict = dataframe_to_dhis2_json(
    df = agg_precip_df,             # aggregated pandas.DataFrame
    org_unit_col = 'id',            # column containing the org unit id
    period_col = 'valid_time',      # column containing the period
    value_col = 'tp',               # column containing the value
    data_element_id = 'eHFmngLqpj4' # id of the DHIS2 data element
)
```

And inspect the results:

```python
agg_precip_json_dict['dataValues'][:3]
```

```
[{'orgUnit': 'O6uvpzGd5pu',
  'period': '20250701',
  'value': 31.98239517211914,
  'dataElement': 'eHFmngLqpj4'},
 {'orgUnit': 'O6uvpzGd5pu',
  'period': '20250702',
  'value': 45.636104583740234,
  'dataElement': 'eHFmngLqpj4'},
 {'orgUnit': 'O6uvpzGd5pu',
  'period': '20250703',
  'value': 70.48600006103516,
  'dataElement': 'eHFmngLqpj4'}]
```

## Next steps

At this point we have aggregated temperature and precipitation data to DHIS2 organisation units and converted the results to the JSON format used by DHIS2. To learn how to import this JSON data into DHIS2, see [our guide for uploading data values using the Python DHIS2 client](../import-data/using-python-client.ipynb).
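If you want to send temperature and precipitation in a single request, one option is to combine the two `dataValues` lists, since each entry carries its own `dataElement`; DHIS2's data value set import accepts values for several data elements in one payload. A minimal sketch under two assumptions: the payloads have the shape shown above, and any missing values arrive as `NaN`, which is not valid JSON and is therefore dropped. `merge_payloads` is a hypothetical helper, and the sample entries are stand-ins (the `NaN` entry is made up).

```python
import json
import math

def is_missing(value):
    """True for None or NaN; non-numeric values are treated as present."""
    try:
        return value is None or math.isnan(value)
    except TypeError:
        return False

def merge_payloads(*payloads):
    """Hypothetical helper: merge {'dataValues': [...]} dicts, skipping missing values."""
    merged = [
        dv
        for payload in payloads
        for dv in payload["dataValues"]
        if not is_missing(dv["value"])
    ]
    return {"dataValues": merged}

temp = {"dataValues": [
    {"orgUnit": "O6uvpzGd5pu", "period": "20250701", "value": 24.0718994140625, "dataElement": "VJwwPOOvge6"},
    {"orgUnit": "at6UHUQatSo", "period": "20250726", "value": math.nan, "dataElement": "VJwwPOOvge6"},  # made-up missing value
]}
precip = {"dataValues": [
    {"orgUnit": "O6uvpzGd5pu", "period": "20250701", "value": 31.98239517211914, "dataElement": "eHFmngLqpj4"},
]}

combined = merge_payloads(temp, precip)
# allow_nan=False makes json.dumps raise ValueError if a NaN slipped through
body = json.dumps(combined, allow_nan=False)
print(len(combined["dataValues"]))  # 2
```

The actual upload of `body` (or of the two dictionaries separately) is covered in the linked guide.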