---
title: Aggregate data from NetCDF to DHIS2 organisation units
short_title: Aggregate data from CICERO
---

Load data from a NetCDF file using [earthkit](https://ecmwf.github.io/earthkit-website/) and aggregate it to DHIS2 organisation units.

In [1]:
import earthkit.data
from earthkit.transforms import aggregate
from dhis2eo import dataArrayToJson 

Load a NetCDF file using earthkit. See [more examples](https://earthkit-data.readthedocs.io/en/latest/examples/) of how to load data with earthkit.

In [2]:
file = "data/pm_final_srilanka_linearp.nc"
data = earthkit.data.from_source("file", file)

To display the contents of the dataset, we can convert it to an [xarray](https://xarray.dev) dataset. It shows that the file includes three dimensions (latitude, longitude and valid_time) and one data variable, "t2m" (temperature at 2 m above the surface). The data source is the European Centre for Medium-Range Weather Forecasts ([ECMWF](https://www.ecmwf.int)).

In [3]:
data.to_xarray()

| | Array | Chunk |
|---|---|---|
| Bytes | 2.11 GiB | 2.11 GiB |
| Shape | (1401, 450, 450) | (1401, 450, 450) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |
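
The reported size follows directly from the shape and the data type. As a quick check in plain Python (no xarray needed), multiplying the three dimension lengths by the 8 bytes of a float64 value gives the figure shown in the output:

```python
import math

shape = (1401, 450, 450)   # shape reported in the output above
bytes_per_value = 8        # a float64 value occupies 8 bytes

n_values = math.prod(shape)
n_bytes = n_values * bytes_per_value
size_gib = n_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{n_values:,} values, {size_gib:.2f} GiB")
```

Because the array is held in a single chunk, an operation that needs the values may load the whole array into memory at once.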


earthkit can also be used to load the organisation units from DHIS2 that we [saved as a GeoJSON file](organization-units).

In [4]:
district_file = "data/sri-lanka-provinces.geojson"
features = earthkit.data.from_source("file", district_file)

We can display the first feature to see the information we have for each org unit. For the aggregation, we are particularly interested in the id and the geometry (polygon) of the org unit. 

In [5]:
features[:1]

[shapeName                                     Northern Province
 shapeISO                                                   LK-4
 shapeID                                 99731895B93054189817547
 shapeGroup                                                  LKA
 shapeType                                                  ADM1
 geometry      MULTIPOLYGON (((79.9138052 8.9418344, 79.91834...
 Name: 0, dtype: object]
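
To make the structure of such a feature concrete, here is a minimal sketch that reads a GeoJSON string with Python's standard `json` module. The property names are taken from the output above; the rectangle used as geometry is a made-up placeholder, not the real boundary of the province.

```python
import json

# Hypothetical, heavily simplified FeatureCollection with one org unit
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "shapeName": "Northern Province",
        "shapeISO": "LK-4",
        "shapeType": "ADM1"
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[79.9, 8.9], [80.9, 8.9], [80.9, 9.8], [79.9, 9.8], [79.9, 8.9]]
        ]
      }
    }
  ]
}
"""

collection = json.loads(geojson_text)

for feature in collection["features"]:
    props = feature["properties"]        # attributes describing the org unit
    geometry = feature["geometry"]       # outline used for the spatial mask
    outer_ring = geometry["coordinates"][0]
    print(props["shapeName"], props["shapeISO"], geometry["type"], len(outer_ring))
```

Each feature pairs a set of properties with one geometry, which is the pairing the aggregation step relies on.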

To aggregate the data to the org unit features we use the aggregate package of [earthkit-transforms](https://earthkit-transforms.readthedocs.io). We keep the daily period type and only aggregate the data spatially to the org unit features. mask_dim is the dimension (org unit id) that will be created after the reduction of the spatial dimensions (longitude/latitude grid). 

In [6]:
agg_data = aggregate.spatial.reduce(data, features, mask_dim="id")
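
Conceptually, the spatial reduction replaces the latitude/longitude grid with one value per org unit and time step. The sketch below illustrates the idea in plain Python, assuming a mean reduction and precomputed boolean masks; the grid, the masks and the unit names are hypothetical, and this is not how earthkit-transforms is implemented.

```python
from statistics import mean

# Hypothetical gridded values: values[time][row][col]
values = [
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]],
    [[10.0, 20.0, 30.0],
     [40.0, 50.0, 60.0]],
]

# Hypothetical masks: True where a grid cell lies inside the org unit
masks = {
    "unit_a": [[True, True, False],
               [True, True, False]],
    "unit_b": [[False, False, True],
               [False, False, True]],
}

def spatial_mean(grid, mask):
    """Average the grid cells that the mask marks as inside."""
    selected = [
        value
        for grid_row, mask_row in zip(grid, mask)
        for value, inside in zip(grid_row, mask_row)
        if inside
    ]
    return mean(selected)

# One value per org unit and time step; the grid dimensions are gone
aggregated = {
    unit: [spatial_mean(grid, mask) for grid in values]
    for unit, mask in masks.items()
}
print(aggregated)
```

The keys of `aggregated` play the role of the new `id` dimension, and each list runs along the time dimension.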

The aggregated data is returned as an xarray dataset with two dimensions (id and valid_time) and the same temperature variable.

In [7]:
agg_data

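Next, we select the variable we would like to import to DHIS2. In the aggregated dataset it is stored under the name `__xarray_dataarray_variable__`, the placeholder name xarray uses when saving a data array that has no name. If the values were temperatures in kelvin, they could be converted to Celsius at this point by subtracting 273.15.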

In [15]:
dataArray = agg_data['__xarray_dataarray_variable__']
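
The kelvin to Celsius conversion mentioned above is a constant offset. A minimal sketch in plain Python, using hypothetical values, looks like this; with xarray the same arithmetic is applied element-wise, so subtracting 273.15 from a data array has the same effect.

```python
KELVIN_OFFSET = 273.15

def kelvin_to_celsius(values):
    """Subtract the offset and round to two decimals."""
    return [round(value - KELVIN_OFFSET, 2) for value in values]

temps_k = [300.15, 295.65, 273.15]  # hypothetical temperatures in kelvin
temps_c = kelvin_to_celsius(temps_k)
print(temps_c)
```

The conversion only makes sense when the source values really are in kelvin, so it is worth checking the units attribute of the variable first.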

This returns a multidimensional `xarray.DataArray` with two dimensions (id and valid_time). We rename the dimensions so that one is named "orgUnit" and the other "period".

In [16]:
formatted = dataArray.rename(id='orgUnit', time='period')

The two dimensions are "stacked" into one dimension using the [xarray stack method](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.stack.html). Passing a list containing an ellipsis (`[...]`) stacks over all dimensions:

In [17]:
stacked = formatted.stack(index=[...])
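
Stacking turns a two-dimensional table of values into a flat sequence in which every entry is identified by a pair of coordinates. The following plain-Python sketch shows the effect with hypothetical values; it illustrates the idea only and does not reproduce xarray's multi-index.

```python
periods = ["2020-03-01", "2020-03-02"]
org_units = [0, 1]

# Hypothetical values: one row per period, one column per org unit
values = [
    [34.5, 26.4],   # 2020-03-01
    [33.9, 27.1],   # 2020-03-02
]

# Flatten the table: each entry is keyed by an (orgUnit, period) pair
stacked = [
    ((org_unit, period), values[t][u])
    for t, period in enumerate(periods)
    for u, org_unit in enumerate(org_units)
]

for key, value in stacked:
    print(key, value)
```

A 2 x 2 table becomes four entries, one for each combination of org unit and period.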

This can be translated into a JSON string with the dataArrayToJson function.

In [18]:
json = dataArrayToJson(stacked)

We can display the first part of this JSON string to see that we have one temperature value for each org unit and period combination.

In [19]:
json[:500]

'[{"orgUnit": 0, "period": "2020-03-01", "value": 34.55402113409197}, {"orgUnit": 1, "period": "2020-03-01", "value": 26.434494378715307}, {"orgUnit": 2, "period": "2020-03-01", "value": 22.86223157780822}, {"orgUnit": 3, "period": "2020-03-01", "value": 33.436081392788545}, {"orgUnit": 4, "period": "2020-03-01", "value": 33.29435921316265}, {"orgUnit": 5, "period": "2020-03-01", "value": 31.949300981054762}, {"orgUnit": 6, "period": "2020-03-01", "value": 37.46315641974694}, {"orgUnit": 7, "peri'
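
Beyond inspecting the first characters, the string can be parsed and checked programmatically. The sketch below uses a short hypothetical payload with the same structure as the output above and verifies that every combination of org unit and period occurs exactly once.

```python
import json
from collections import Counter

# Hypothetical payload with the same structure as the output above
payload = (
    '[{"orgUnit": 0, "period": "2020-03-01", "value": 34.5}, '
    '{"orgUnit": 1, "period": "2020-03-01", "value": 26.4}, '
    '{"orgUnit": 0, "period": "2020-03-02", "value": 33.9}, '
    '{"orgUnit": 1, "period": "2020-03-02", "value": 27.1}]'
)

records = json.loads(payload)

# Count how often each (orgUnit, period) pair appears
pair_counts = Counter((r["orgUnit"], r["period"]) for r in records)
duplicates = [pair for pair, count in pair_counts.items() if count > 1]

# Every org unit should be paired with every period
org_units = sorted({r["orgUnit"] for r in records})
periods = sorted({r["period"] for r in records})
complete = len(pair_counts) == len(org_units) * len(periods)

print(len(records), duplicates, complete)
```

In the notebook the string is stored in a variable named `json`, which shadows the standard module of the same name, so running a check like this in the same session would require a different variable name such as `payload`.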