---
title: Aggregate data from NetCDF to DHIS2 organisation units
short_title: Aggregate data from NetCDF
---

Load data from NetCDF using [earthkit](https://ecmwf.github.io/earthkit-website/) and aggregate the data to DHIS2 organisation units.  

In [1]:
import earthkit.data
from earthkit.transforms import aggregate
from dhis2eo.integrations.xarray import data_array_to_dhis2_json

Load a NetCDF file using earthkit:

In [2]:
file = "../data/era5-land-daily-mean-temperature-2m-july-2025-sierra-leone.nc"
data = earthkit.data.from_source("file", file)

See [more examples](https://earthkit-data.readthedocs.io/en/latest/examples/) of how you can load data with earthkit, or watch a [video of how to get data with earthkit](https://www.youtube.com/watch?v=no01ovW1pF8).

[![How to get data with earthkit](https://img.youtube.com/vi/no01ovW1pF8/2.jpg)](https://www.youtube.com/watch?v=no01ovW1pF8)

To display the contents of the dataset, we can convert it to an [xarray](https://xarray.dev) dataset. The output shows that the file has three dimensions (`latitude`, `longitude` and `valid_time`) and one data variable, `t2m` (air temperature at 2 m above the surface). The data source is the European Centre for Medium-Range Weather Forecasts ([ECMWF](https://www.ecmwf.int)).

In [3]:
data.to_xarray()

| | Array | Chunk |
|---|---|---|
| Bytes | 203.56 kiB | 203.56 kiB |
| Shape | (31, 41, 41) | (31, 41, 41) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |


Earthkit can also be used to load the organisation units from DHIS2 that we [saved as a GeoJSON file](organization-units).

In [4]:
district_file = "../data/sierra-leone-districts.geojson"
features = earthkit.data.from_source("file", district_file)

We can display the first feature to see the information we have for each org unit. For the aggregation, we are particularly interested in the id and the geometry (polygon) of the org unit. 

In [5]:
features[:1]

[type                                                                Polygon
 id                                                              O6uvpzGd5pu
 name                                                                     Bo
 hasCoordinatesDown                                                     True
 hasCoordinatesUp                                                      False
 level                                                                     2
 grandParentParentGraph                                                     
 grandParentId                                                              
 parentGraph                                                     ImspTQPwCqd
 parentId                                                        ImspTQPwCqd
 parentName                                                     Sierra Leone
 dimensions                                                              { }
 weight                                                                    1

To aggregate the data to the org unit features, we use the `aggregate` module of [earthkit-transforms](https://earthkit-transforms.readthedocs.io). We keep the daily period type and only aggregate the data spatially to the org unit features. `mask_dim` names the dimension (here the org unit id) that is created when the spatial dimensions (the longitude/latitude grid) are reduced.

In [6]:
agg_data = aggregate.spatial.reduce(data, features, mask_dim="id")

The aggregated data is returned as an xarray with two dimensions (`id` representing the org unit id and `valid_time` as the time period), and the same temperature variable. 

In [7]:
agg_data
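To make the spatial reduction more concrete, here is a minimal NumPy sketch of the idea: for each org unit, average the grid cells that fall inside its polygon, once per day. The grid values, the unit names (`unitA`, `unitB`) and the boolean masks are hypothetical, and a plain unweighted mean is an assumption; earthkit-transforms builds the masks from the geometries itself and may treat cells differently.

```python
import numpy as np

# Hypothetical temperatures in kelvin with shape (valid_time, latitude, longitude)
grid = np.array([
    [[296.0, 297.0, 298.0],
     [299.0, 300.0, 301.0]],
    [[297.0, 298.0, 299.0],
     [300.0, 301.0, 302.0]],
])

# Hypothetical masks: True where a grid cell lies inside the org unit polygon
masks = {
    "unitA": np.array([[True, True, False], [False, False, False]]),
    "unitB": np.array([[False, False, True], [False, False, True]]),
}

# Average the masked cells for every day; the spatial dimensions disappear
# and a new "id" dimension (one row per org unit) takes their place.
ids = list(masks)
reduced = np.stack([grid[:, mask].mean(axis=1) for mask in masks.values()])

print(ids)            # org unit ids along the first axis
print(reduced.shape)  # (id, valid_time)
print(reduced)
```

The result has one row per org unit and one column per day, which mirrors the `id` and `valid_time` dimensions of the aggregated data.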

Next, we select the variable we would like to import into DHIS2 (`t2m`). We also convert the temperatures from kelvin to degrees Celsius by subtracting 273.15 from the values.

In [8]:
data_array = agg_data['t2m'] - 273.15

This returns an `xarray.DataArray` with two dimensions (`id` and `valid_time`). Two decimals are sufficient for our use, so we round all the temperature values:

In [9]:
rounded_array = data_array.astype('float64').round(decimals = 2)
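The text does not say why the values are cast to `float64` before rounding. One plausible reason (an assumption on our part) is that `float32` cannot represent most two-decimal values exactly, so rounded `float32` numbers pick up extra digits when converted to Python floats or strings. The sketch below uses hypothetical kelvin values in plain NumPy to show the difference.

```python
import numpy as np

# Hypothetical float32 temperatures in kelvin, as stored in the NetCDF file
kelvin = np.array([296.83, 297.11], dtype=np.float32)

# The subtraction keeps the float32 data type
celsius = kelvin - 273.15

# Rounding in float32: the nearest float32 to 23.68 is not exactly 23.68
rounded32 = celsius.round(decimals=2)

# Rounding after casting to float64, as in the cell above
rounded64 = celsius.astype("float64").round(decimals=2)

print(rounded32.tolist())  # extra digits appear after conversion to Python floats
print(rounded64.tolist())  # clean two-decimal values
```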

Flatten the two-dimensional array into one dimension using the [xarray stack method](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.stack.html):

In [10]:
flat_array = rounded_array.stack(index=[...])
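As a small illustration of what stacking does, the sketch below builds a hypothetical 2 x 2 array (two org units, two days, made-up values) and stacks all of its dimensions into a single `index` dimension. Each element of the flattened array is then addressed by an (`id`, `valid_time`) pair.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical rounded temperatures for two org units over two days
temps = xr.DataArray(
    np.array([[23.68, 24.1], [23.96, 24.3]]),
    dims=("id", "valid_time"),
    coords={
        "id": ["O6uvpzGd5pu", "fdc6uOvgoji"],
        "valid_time": pd.to_datetime(["2025-07-01", "2025-07-02"]),
    },
    name="t2m",
)

# [...] means "all dimensions": id and valid_time are combined into "index"
flat = temps.stack(index=[...])

# Every entry now carries its (org unit, date) pair and a single value
for (org_unit, day), value in zip(flat.indexes["index"], flat.values):
    print(org_unit, day.date(), value)
```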

Use the `dhis2eo` utility function `data_array_to_dhis2_json` to translate the data array into the JSON structure used by the DHIS2 Web API:

In [11]:
json_dict = data_array_to_dhis2_json(
    data_array=flat_array,          # flattened data array
    org_unit_dim='id',              # dimension containing the org unit id
    period_dim='valid_time',        # dimension containing the period
    data_element_id='VJwwPOOvge6',  # id of the DHIS2 data element
)

We can display the first 3 items to see that we have one temperature value for each org unit and period combination.

In [12]:
json_dict['dataValues'][:3]

[{'dataElement': 'VJwwPOOvge6',
  'orgUnit': 'O6uvpzGd5pu',
  'period': '20250701',
  'value': '23.68'},
 {'dataElement': 'VJwwPOOvge6',
  'orgUnit': 'fdc6uOvgoji',
  'period': '20250701',
  'value': '23.96'},
 {'dataElement': 'VJwwPOOvge6',
  'orgUnit': 'lc3eMKXaEfw',
  'period': '20250701',
  'value': '24.52'}]
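To show how this structure relates to the underlying values, here is a minimal standard-library sketch that builds the same kind of payload from (org unit, date, value) rows. The helper `to_data_values` is hypothetical and is not the `dhis2eo` implementation; the rows reuse ids and values from the output above, and the daily period is formatted as `yyyyMMdd`.

```python
import json
from datetime import date

DATA_ELEMENT_ID = "VJwwPOOvge6"  # data element id used in this notebook

# (org unit id, day, temperature in degrees Celsius), taken from the output above
rows = [
    ("O6uvpzGd5pu", date(2025, 7, 1), 23.68),
    ("fdc6uOvgoji", date(2025, 7, 1), 23.96),
]


def to_data_values(rows, data_element_id):
    """Hypothetical helper: one data value per org unit and day."""
    return {
        "dataValues": [
            {
                "dataElement": data_element_id,
                "orgUnit": org_unit,
                "period": day.strftime("%Y%m%d"),  # daily period, e.g. 20250701
                "value": str(value),
            }
            for org_unit, day, value in rows
        ]
    }


payload = to_data_values(rows, DATA_ELEMENT_ID)
print(json.dumps(payload, indent=2))
```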