---
title: Automated Data Import from Climate Data Store
short_title: Automated Data Import
---

In this notebook we demonstrate a full workflow for using Climate Tools to automate regular downloads of data from the [Climate Data Store (CDS)](https://cds.climate.copernicus.eu/datasets), aggregate the data to DHIS2 organisation units, and import the aggregated climate data back into DHIS2.

For our example we will connect to a local DHIS2 instance containing the Sierra Leone demo database, set up new data elements for daily temperature and daily total precipitation, and show how to use `dhis2eo` to download the latest [daily data from the Climate Data Store ERA5 dataset](https://cds.climate.copernicus.eu/datasets/derived-era5-single-levels-daily-statistics?tab=download) and import it into DHIS2.

----------------------------------------
## Requirements

### 1. Connect to DHIS2

To run this notebook, you first need to connect to an instance of DHIS2. For our example, we connect to a local instance of DHIS2 containing the standard Sierra Leone demo database, but you should be able to swap in the URL and credentials of your own instance.

In [1]:
from dhis2_client import DHIS2Client
from dhis2_client.settings import ClientSettings

# Create DHIS2 client connection
cfg = ClientSettings(
  base_url="http://localhost:8080",
  username="admin",
  password="district"
)
client = DHIS2Client(settings=cfg)

# Verify connection
info = client.get_system_info()
print("Current DHIS2 version:", info["version"])

Current DHIS2 version: 2.42.2
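
When pointing the notebook at your own instance, you may prefer not to hardcode credentials in the notebook itself. The following is a minimal sketch that reads them from environment variables instead. The variable names `DHIS2_BASE_URL`, `DHIS2_USERNAME` and `DHIS2_PASSWORD` are hypothetical choices for illustration, and the fallbacks are the demo values used above.

```python
import os


def load_dhis2_settings(env=None):
    """Collect DHIS2 connection settings from environment variables.

    The variable names are hypothetical; the fallbacks are the demo
    values used in this notebook.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("DHIS2_BASE_URL", "http://localhost:8080"),
        "username": env.get("DHIS2_USERNAME", "admin"),
        "password": env.get("DHIS2_PASSWORD", "district"),
    }


settings = load_dhis2_settings()
# The dict uses the same keyword names as the ClientSettings call above,
# so it could be passed on as: cfg = ClientSettings(**settings)
```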


### 2. Create DHIS2 data elements

We also need to create the data elements that the data will be imported into. If you haven't already created them manually, you can follow the steps below to create them using the `dhis2-python-client`.

First create the temperature data element: 

In [6]:
data_element = {
    "name": "2m Temperature (ERA5)",
    "shortName": "Temperature (ERA5)",
    "valueType": "NUMBER",
    "aggregationType": "AVERAGE",
    "domainType": "AGGREGATE"
}
temperature_de = client.create_data_element(data_element)
print(f"Data element creation status: {temperature_de['status']} and UID: {temperature_de['response']['uid']}")

Data element creation status: OK and UID: eHFmngLqpj4


Next, create the total precipitation data element: 

In [12]:
data_element = {
    "name": "Total precipitation (ERA5)",
    "shortName": "Total precipitation (ERA5)",
    "valueType": "NUMBER",
    "aggregationType": "SUM",
    "domainType": "AGGREGATE"
}
precipitation_de = client.create_data_element(data_element)
print(f"Data element creation status: {precipitation_de['status']} and UID: {precipitation_de['response']['uid']}")

{"ts": "2025-10-23T13:13:52+02:00", "level": "ERROR", "logger": "dhis2_client", "message": "HTTP 409 on /api/dataElements: {'httpStatus': 'Conflict', 'httpStatusCode': 409, 'status': 'ERROR', 'message': 'One or more errors occurred, please see full details in import report.', 'response': {'uid': 'O09ozP9xt6Z', 'errorReports': [{'message': 'Property `name` with value `Total precipitation (ERA5)` on object Total precipitation (ERA5) [O09ozP9xt6Z] (DataElement) already exists on object tZkUaALx9sy', 'args': ['name', 'Total precipitation (ERA5)', 'Total precipitation (ERA5) [O09ozP9xt6Z] (DataElement)', 'tZkUaALx9sy'], 'mainKlass': 'org.hisp.dhis.dataelement.DataElement', 'errorCode': 'E5003', 'mainId': 'tZkUaALx9sy', 'errorProperty': 'name', 'errorProperties': ['name', 'Total precipitation (ERA5)', 'Total precipitation (ERA5) [O09ozP9xt6Z] (DataElement)', 'tZkUaALx9sy']}, {'message': 'Property `shortName` with value `Total precipitation (ERA5)` on object Total precipitation (ERA5) [O09ozP

DHIS2HTTPError: ERROR: One or more errors occurred, please see full details in import report.
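
The `HTTP 409` error above means that a data element with the same name already exists on the server. The logged error report names the existing object in its `mainId` field (`tZkUaALx9sy`). As a minimal sketch, the helper below pulls that UID out of an error body shaped like the one in the log. It works on a plain dict; how to obtain that dict from the raised `DHIS2HTTPError` is not shown in this notebook, and `find_existing_uid` is a hypothetical name.

```python
def find_existing_uid(error_body, property_name="name"):
    """Return the UID of the object that already owns `property_name`, or None."""
    reports = error_body.get("response", {}).get("errorReports", [])
    for report in reports:
        is_duplicate = report.get("errorCode") == "E5003"
        if is_duplicate and report.get("errorProperty") == property_name:
            return report.get("mainId")
    return None


# Abbreviated copy of the error body logged above
conflict = {
    "httpStatusCode": 409,
    "status": "ERROR",
    "response": {
        "uid": "O09ozP9xt6Z",
        "errorReports": [
            {"errorCode": "E5003", "mainId": "tZkUaALx9sy", "errorProperty": "name"},
        ],
    },
}
print(find_existing_uid(conflict))  # prints tZkUaALx9sy
```

The returned UID can then be used in place of a newly created one in the later steps.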

Since we plan to import daily data values, we also create a new dataset for climate variables with the `Daily` period type and assign both data elements to it:

In [None]:
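data_set = {
    "name": "Daily climate data",
    "shortName": "Daily climate data",
    "periodType": "Daily",
    # Each data element needs its own entry in the list; repeating the
    # "dataElement" key inside one dict would keep only the last value
    "dataSetElements": [
        {"dataElement": {"id": temperature_de['response']['uid']}},
        {"dataElement": {"id": precipitation_de['response']['uid']}},
    ]
}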

data_set_response = client.create_data_set(data_set)
print(f"Data set creation status: {data_set_response['status']} and UID: {data_set_response['response']['uid']}")

Data set creation status: OK and UID: G3O2w8XEfyk
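
If you need to assign more data elements later, building the payload from a list of UIDs keeps the structure consistent, with one `dataSetElements` entry per data element. This is only a sketch: `build_data_set_payload` is a hypothetical helper, and it simply reuses the name as the short name as in the cell above.

```python
def build_data_set_payload(name, period_type, data_element_ids):
    """Build a data set payload with one dataSetElements entry per data element."""
    return {
        "name": name,
        "shortName": name,
        "periodType": period_type,
        "dataSetElements": [
            {"dataElement": {"id": uid}} for uid in data_element_ids
        ],
    }


payload = build_data_set_payload(
    "Daily climate data", "Daily", ["eHFmngLqpj4", "tZkUaALx9sy"]
)
print(len(payload["dataSetElements"]))  # prints 2
```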


### 3. Register for ERA5 Data Access

#### Authenticate with your ECMWF user

Before you can download the dataset programmatically, you need to [create an ECMWF user](https://www.ecmwf.int/user/login), and authenticate using your user credentials:

- Go to the [CDSAPI Setup page](https://cds.climate.copernicus.eu/how-to-api) and make sure to log in.
- Once logged in, scroll down to the section "Setup the CDS API personal access token". 
  - This should show your login credentials, and look something like this:

        url: https://cds.climate.copernicus.eu/api
        key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

- Copy those two lines to a file `.cdsapirc` in your user's $HOME directory.
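
If you prefer to create the file from Python, the sketch below writes the same two lines to a `.cdsapirc` file in a directory of your choice. `write_cdsapirc` is a hypothetical helper, and the example only performs a dry run in a temporary directory with a placeholder key.

```python
import tempfile
from pathlib import Path


def write_cdsapirc(key, directory, url="https://cds.climate.copernicus.eu/api"):
    """Write the two-line CDS API configuration file and return its path."""
    path = Path(directory) / ".cdsapirc"
    path.write_text(f"url: {url}\nkey: {key}\n", encoding="utf-8")
    return path


# Dry run in a temporary directory, using a placeholder key
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_cdsapirc("xxxxxxxx", tmp)
    demo_text = demo_path.read_text(encoding="utf-8")
print(demo_text)
```

To write the real file, you would call the function with your own key and `Path.home()` as the directory.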

#### Accept the dataset license

ECMWF requires that you manually accept the user license for each dataset that you download. 

- Start by visiting the Download page of the dataset we are interested in: ["ERA5 post-processed daily statistics on single levels from 1940 to present"](https://cds.climate.copernicus.eu/datasets/derived-era5-single-levels-daily-statistics?tab=download). 
- Scroll down until you get to the "Terms of Use" section.
- Click the button to accept the terms, and log in with your user account if you haven't already.

----------------------
## Quickstart: Keeping DHIS2 up to date with ERA5 climate data

Many users will want to download and import a standard set of climate variables from ERA5 into DHIS2. For this reason, Climate Tools provides a simple way to synch DHIS2 with ERA5 climate data for a defined time period.

Let's run the function to import daily ERA5 data from 1 June 2025 until today:

In [3]:
import dhis2eo
import dhis2eo.org_units
import dhis2eo.data.cds
import dhis2eo.synch
import dhis2eo.utils.aggregate

# Map each ERA5 variable to the DHIS2 data element it is imported into
# and the method used to aggregate it to organisation units
variables = {
    't2m': {'data_element_id': 'eHFmngLqpj4', 'method': 'mean'},
    'tp': {'data_element_id': 'tZkUaALx9sy', 'method': 'sum'},
}

# run the synch function
org_unit_level = 2
start_year = 2025
start_month = 6
dhis2eo.synch.synch_dhis2_data(
    client,
    dhis2eo.data.cds.get_daily_era5_data,
    start_year,
    start_month,
    variables=variables,
    org_unit_level=org_unit_level,
)

dhis2eo.synch - INFO - Period: 2025-6
dhis2eo.synch - INFO - Getting data...
dhis2eo.data.cds - INFO - Loading from cache: C:\Users\karimba\AppData\Local\Temp\cds_daily-era5_params-ca5bab_region-af652a_2025-06.nc


ValueError: Failed to decode variable 'valid_time': unable to decode time units 'days since 2025-06-01' with "calendar 'proleptic_gregorian'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.

Note: Since the data is updated on a daily basis, running this data import function multiple times in the same month will download and import the entire month each time. The results from the data import report how many data values already existed and were ignored, and how many new data values were imported since last time.
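
The `Period: 2025-6` log line above suggests that the synch works through the time range one month at a time. As a rough illustration of which periods a given start year and month cover, the sketch below lists every month from the start month up to a given end date. `iter_months` is a hypothetical helper and not part of `dhis2eo`; how the library iterates internally is an assumption.

```python
from datetime import date


def iter_months(start_year, start_month, end=None):
    """Yield (year, month) pairs from the start month up to and including the month of `end`."""
    end = end or date.today()
    year, month = start_year, start_month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


print(list(iter_months(2025, 6, end=date(2025, 10, 23))))
# prints [(2025, 6), (2025, 7), (2025, 8), (2025, 9), (2025, 10)]
```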

--------------------------------------------
## Custom workflows: Importing ERA5 data into DHIS2

The previous convenience function will likely be sufficient for many use-cases. In the following section we also demonstrate how to do the same process step-by-step, while allowing more control of the process, such as processing the downloaded data before importing. 

In [None]:
import dhis2eo
import dhis2eo.org_units
import dhis2eo.data.cds
import dhis2eo.synch
import dhis2eo.utils.aggregate

### Step 1: Retrieve organisation units

Before we can download the data, we first need to load our organisation units in order to limit which region to download data for.

First we retrieve the organisation units as a GeoJSON dict using the `dhis2-python-client`:

In [4]:
org_unit_level = 2
org_units_geojson = client.get_org_units_geojson(level=org_unit_level)

Next, load this GeoJSON dict as a `geopandas.GeoDataFrame` by using the `dhis2eo.org_units` module. This makes it easier to work with the organisation units in later steps:

In [5]:
org_units = dhis2eo.org_units.from_dhis2_geojson(org_units_geojson)
print(org_units)

    org_unit_id          name  level  \
0   O6uvpzGd5pu            Bo      2   
1   fdc6uOvgoji       Bombali      2   
2   lc3eMKXaEfw        Bonthe      2   
3   jUb8gELQApl      Kailahun      2   
4   PMa2VCrupOd        Kambia      2   
5   kJq2mPyFEHo        Kenema      2   
6   qhqAxPSTUXp     Koinadugu      2   
7   Vth0fbpFcsO          Kono      2   
8   jmIPBj66vD6       Moyamba      2   
9   TEQlaapDQoK     Port Loko      2   
10  bL4ooGhyHRQ       Pujehun      2   
11  eIQbndfxQMb     Tonkolili      2   
12  at6UHUQatSo  Western Area      2   

                                             geometry  
0   POLYGON ((-11.5914 8.4875, -11.5906 8.4769, -1...  
1   POLYGON ((-11.8091 9.2032, -11.8102 9.1944, -1...  
2   MULTIPOLYGON (((-12.5568 7.3832, -12.5574 7.38...  
3   POLYGON ((-10.7972 7.5866, -10.8002 7.5878, -1...  
4   MULTIPOLYGON (((-13.1349 8.8471, -13.1343 8.84...  
5   POLYGON ((-11.3596 8.5317, -11.3513 8.5234, -1...  
6   POLYGON ((-10.585 9.0434, -10.5877 9.0432, 
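
The geometries are what later limit the download to a region. To illustrate the idea, the sketch below computes a bounding box directly from a GeoJSON dict using only the standard library. This is an assumption about the general approach, not a description of how `dhis2eo` computes the region, and the sample coordinates are made up for illustration.

```python
def iter_positions(coords):
    """Recursively yield (lon, lat) pairs from nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def bounding_box(geojson):
    """Return (west, south, east, north) for a GeoJSON FeatureCollection."""
    lons, lats = [], []
    for feature in geojson["features"]:
        for lon, lat in iter_positions(feature["geometry"]["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


# Tiny hand-made FeatureCollection, for illustration only
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": {"type": "Polygon", "coordinates": [
            [[-11.6, 8.5], [-11.5, 8.4], [-11.4, 8.6], [-11.6, 8.5]]
        ]}},
        {"type": "Feature", "geometry": {"type": "MultiPolygon", "coordinates": [
            [[[-12.6, 7.4], [-12.5, 7.3], [-12.4, 7.5], [-12.6, 7.4]]]
        ]}},
    ],
}
print(bounding_box(sample))  # prints (-12.6, 7.3, -11.4, 8.6)
```

With a `geopandas.GeoDataFrame` such as `org_units`, the `total_bounds` attribute gives the same four values in the order (minx, miny, maxx, maxy). Note that the sketch does not handle features without a geometry.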

### Step 2: Download daily ERA5 data

In order to get users started, we provide a convenience function for downloading the most commonly requested climate variables from the [ERA5 post-processed daily statistics on single levels from 1940 to present](https://cds.climate.copernicus.eu/datasets/derived-era5-single-levels-daily-statistics). 

Simply provide the year, month, and org_units you want to download for. The region to download data for is automatically calculated from the provided organisation units:

In [6]:
data = dhis2eo.data.cds.get_daily_era5_data(2021, 1, org_units)
data.to_xarray()

dhis2eo.data.cds - INFO - Loading from cache: C:\Users\karimba\AppData\Local\Temp\cds_daily-era5_params-ca5bab_region-37098a_2021-01.nc


[Output: xarray dataset view listing four arrays, each with shape (27, 13, 13), data type float32, size 17.82 kiB, stored as a single Dask chunk.]


Note: Since data downloads can be slow, this function also caches the download results and reuses them if the file has already been downloaded.
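
The cache file name in the log above includes short codes for the request parameters and region, followed by the year and month. The sketch below shows one way such a download cache could work: derive a deterministic file name from the request, and only download when that file is missing. The naming scheme, the hash function, and both helper names are assumptions for illustration and do not reproduce the actual `dhis2eo` implementation.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_path(year, month, params, cache_dir=None):
    """Return a deterministic file path for one month and one set of request parameters."""
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha1(encoded).hexdigest()[:6]
    folder = Path(cache_dir) if cache_dir else Path(tempfile.gettempdir())
    return folder / f"example_cache_params-{digest}_{year}-{month:02d}.nc"


def fetch_with_cache(year, month, params, download, cache_dir=None):
    """Call `download(path)` only when the cached file is missing; return the path."""
    path = cache_path(year, month, params, cache_dir)
    if not path.exists():
        download(path)
    return path


# The same parameters always map to the same file, regardless of key order
first = cache_path(2021, 1, {"variable": "t2m", "statistic": "mean"}, "cache")
second = cache_path(2021, 1, {"statistic": "mean", "variable": "t2m"}, "cache")
print(first == second)  # prints True
```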

### Step 3: Data processing

After downloading the data, we might also want to do some processing of the climate data to get the values or units that we want. 
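
A common example is unit conversion: ERA5 provides 2m temperature (`t2m`) in kelvin and total precipitation (`tp`) in metres, while many users prefer degrees Celsius and millimetres. The sketch below shows the arithmetic on plain numbers. The function names are hypothetical, and whether to convert before or after aggregation is left to the reader.

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - 273.15


def metres_to_millimetres(metres):
    """Convert a precipitation depth from metres to millimetres."""
    return metres * 1000.0


print(round(kelvin_to_celsius(299.973450), 2))    # prints 26.82
print(round(metres_to_millimetres(0.012494), 3))  # prints 12.494
```

The same expressions can be applied to whole arrays or table columns, since they only use subtraction and multiplication.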

### Step 4: Aggregate the data to organisation units

The next step is using the `aggregate.to_org_units` convenience function to aggregate the climate data to a set of input organisation units. Let's try it for our previously downloaded test data:

In [None]:
variables = {
    't2m': 'mean',
    'tp': 'sum',
}
agg = dhis2eo.utils.aggregate.to_org_units(data, org_units, variables=variables)
print(agg)

    valid_time  org_unit_id  number         t2m        tp
0   2021-01-04  O6uvpzGd5pu       0  299.973450  0.012494
1   2021-01-04  fdc6uOvgoji       0  301.919495  0.000844
2   2021-01-04  lc3eMKXaEfw       0  300.367737  0.021607
3   2021-01-04  jUb8gELQApl       0  300.000153  0.004786
4   2021-01-04  PMa2VCrupOd       0  301.595978  0.000691
..         ...          ...     ...         ...       ...
346 2021-01-30  jmIPBj66vD6       0  300.576996  0.007868
347 2021-01-30  TEQlaapDQoK       0  300.861786  0.000128
348 2021-01-30  bL4ooGhyHRQ       0  299.664490  0.049795
349 2021-01-30  eIQbndfxQMb       0  301.195099  0.000279
350 2021-01-30  at6UHUQatSo       0  300.352264  0.000029

[351 rows x 5 columns]


We see that the aggregated data contains temperature (`t2m`) and precipitation (`tp`) values for each of the 13 organisation units (`org_unit_id`) and each of the 27 days (4 to 30 January 2021) contained in the downloaded NetCDF data, giving 351 rows in total.
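
Before the aggregated values can be imported, each row has to be expressed as DHIS2 data values, with a data element, an organisation unit, a period and a value. DHIS2 identifies daily periods in the form `YYYYMMDD`. The sketch below converts rows shaped like the table above into such a payload. `to_data_values` is a hypothetical helper, not a `dhis2eo` function, and the data element UIDs are the ones used in the Quickstart.

```python
from datetime import date


def to_data_values(rows, data_element_ids):
    """Convert aggregated rows into DHIS2 data value dicts with daily period IDs."""
    data_values = []
    for row in rows:
        for variable, data_element_id in data_element_ids.items():
            data_values.append({
                "dataElement": data_element_id,
                "orgUnit": row["org_unit_id"],
                "period": row["valid_time"].strftime("%Y%m%d"),
                "value": str(row[variable]),
            })
    return data_values


# First row of the aggregated output above
rows = [
    {"valid_time": date(2021, 1, 4), "org_unit_id": "O6uvpzGd5pu",
     "t2m": 299.973450, "tp": 0.012494},
]
ids = {"t2m": "eHFmngLqpj4", "tp": "tZkUaALx9sy"}
payload = {"dataValues": to_data_values(rows, ids)}
print(payload["dataValues"][0]["period"])  # prints 20210104
```

For a pandas DataFrame such as `agg`, `agg.to_dict("records")` produces a list of row dicts that could be passed in as `rows`.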

### Step 5: Synch data with DHIS2 time periods

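The `dhis2eo.synch.iter_dhis2_monthly_synch_status` function iterates over each month from the given start year and month and reports, for each data element, whether a synch is needed. The download, aggregation, and import steps can then be run inside the loop: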

In [8]:
data_element_ids = ['gPPVvS6u23w', 'i9W7DhW60kK']
start_year = 2025
start_month = 3
for month_synch_status in dhis2eo.synch.iter_dhis2_monthly_synch_status(client, start_year, start_month, data_element_ids, org_unit_level):
    print(month_synch_status)
    # Download data...
    # Aggregate data...
    # Convert and import to DHIS2...

{'year': 2025, 'month': 3, 'synch_needed': {'gPPVvS6u23w': False, 'i9W7DhW60kK': False}}
{'year': 2025, 'month': 4, 'synch_needed': {'gPPVvS6u23w': False, 'i9W7DhW60kK': False}}
{'year': 2025, 'month': 5, 'synch_needed': {'gPPVvS6u23w': False, 'i9W7DhW60kK': False}}
{'year': 2025, 'month': 6, 'synch_needed': {'gPPVvS6u23w': False, 'i9W7DhW60kK': False}}
{'year': 2025, 'month': 7, 'synch_needed': {'gPPVvS6u23w': False, 'i9W7DhW60kK': False}}
{'year': 2025, 'month': 8, 'synch_needed': {'gPPVvS6u23w': False, 'i9W7DhW60kK': False}}
{'year': 2025, 'month': 9, 'synch_needed': {'gPPVvS6u23w': False, 'i9W7DhW60kK': False}}
{'year': 2025, 'month': 10, 'synch_needed': {'gPPVvS6u23w': True, 'i9W7DhW60kK': True}}
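
In the output above, only October 2025 is flagged as needing a synch. As a minimal sketch, the loop body could skip months that are already up to date by checking the `synch_needed` flags. The helper below works on status dicts shaped like the printed output; `months_needing_synch` is a hypothetical name.

```python
def months_needing_synch(statuses):
    """Return (year, month, data_element_ids) for months where any data element needs a synch."""
    pending = []
    for status in statuses:
        needed = [uid for uid, flag in status["synch_needed"].items() if flag]
        if needed:
            pending.append((status["year"], status["month"], needed))
    return pending


# Last two status dicts from the output above
statuses = [
    {"year": 2025, "month": 9, "synch_needed": {"gPPVvS6u23w": False, "i9W7DhW60kK": False}},
    {"year": 2025, "month": 10, "synch_needed": {"gPPVvS6u23w": True, "i9W7DhW60kK": True}},
]
print(months_needing_synch(statuses))
# prints [(2025, 10, ['gPPVvS6u23w', 'i9W7DhW60kK'])]
```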


## Next steps

In this notebook we have shown how to use the `dhis2eo` convenience functions to automatically synch ERA5 climate data into DHIS2. The synch function can be run at regular intervals, e.g. every day or week, to fetch and import the latest climate data for your organisation units. We still need a way to run the script, which can be done either manually or automatically via a `cron` job. Further guidance on how to schedule a script will be added in the future.
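
Until that guidance is available, the sketch below shows one possible shape for a script that a scheduler could run: it wraps the synch call, logs the outcome, and returns an exit code so that failures are visible to the scheduler. The function names, the file paths and the schedule in the comments are all hypothetical.

```python
import logging


def run_job(job):
    """Run `job()` and return a process exit code: 0 on success, 1 on failure."""
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    try:
        job()
    except Exception:
        logging.exception("Climate data synch failed")
        return 1
    logging.info("Climate data synch finished")
    return 0


def synch_job():
    # Placeholder: call dhis2eo.synch.synch_dhis2_data(...) here,
    # with the same arguments as in the Quickstart section.
    pass


# In a standalone script, finish with:
#     raise SystemExit(run_job(synch_job))
#
# Example crontab entry (hypothetical paths), running every day at 06:00:
#     0 6 * * * /usr/bin/python3 /path/to/synch_era5.py >> /path/to/synch_era5.log 2>&1
```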