[![logo](https://climate.copernicus.eu/sites/default/files/custom-uploads/branding/LogoLine_horizon_C3S.png)](https://climate.copernicus.eu)

# Downloading and preprocessing data for agroclimatic indicators

This notebook shows how to download and preprocess climate model data for bias correction and further use. To apply a bias adjustment method and generate agroclimatic indicators, three datasets are needed:
1) Observation or reanalysis data;
2) Historical climate model data over the same reference period for which observations are available; and
3) Climate model data for a future or, more generally, application period that is to be used for generating agroclimatic indicators.

Here we will download and preprocess CMIP6 data as the climate model input and AgERA5 as the reanalysis dataset, both from the Climate Data Store (CDS).

There are many ways to access climate data at different temporal or spatial resolutions. This notebook illustrates one possible way to download data at daily resolution, which is currently the primary temporal resolution supported in ibicus, although some methods can also be applied at monthly resolution.

## Installing and importing necessary libraries

To run this notebook, the Python environment has to be prepared by installing a number of additional libraries:
* `cdsapi` (https://pypi.org/project/cdsapi/) - Downloading data through CDS API requests
* `xarray` (https://pypi.org/project/xarray/) - Working with N-D labeled arrays and datasets
* `netCDF4` (https://pypi.org/project/netCDF4/) - Backend for reading and writing NetCDF files
* `dask` (https://pypi.org/project/dask/) - Parallel processing and data chunking
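
Before installing anything, it can be useful to check which of these libraries are already present in the environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the original notebook.

```python
from importlib.util import find_spec

# Import names of the libraries listed above
REQUIRED = ["cdsapi", "xarray", "netCDF4", "dask"]


def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Only the packages reported as missing then need to be passed to `pip install`.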

In [12]:
! pip install cdsapi xarray netCDF4 dask

Collecting dask
  Downloading dask-2025.3.0-py3-none-any.whl.metadata (3.8 kB)
Collecting click>=8.1 (from dask)
  Using cached click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting cloudpickle>=3.0.0 (from dask)
  Downloading cloudpickle-3.1.1-py3-none-any.whl.metadata (7.1 kB)
Collecting fsspec>=2021.09.0 (from dask)
  Using cached fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting partd>=1.4.0 (from dask)
  Downloading partd-1.4.2-py3-none-any.whl.metadata (4.6 kB)
Collecting toolz>=0.10.0 (from dask)
  Downloading toolz-1.0.0-py3-none-any.whl.metadata (5.1 kB)
Collecting locket (from partd>=1.4.0->dask)
  Downloading locket-1.0.0-py2.py3-none-any.whl.metadata (2.8 kB)
Downloading dask-2025.3.0-py3-none-any.whl (1.4 MB)
   ---------------------------------------- 0.0/1.4 MB ? eta -:--:--
   ---------------------------------------- 1.4/1.4 MB 18.7 MB/s eta 0:00:00
Using cached click-8.1.8-py3-none-any.whl (98 kB)
Downloading cloudpickle-3.1.1-py3-none-any.whl (20 kB)
Us

In [1]:
import cdsapi              # Downloading data via CDS API
import zipfile             # Extracting downloaded .zip files
from pathlib import Path   # Working with system paths and directories

import xarray as xr        # Working with data arrays

## 1. Downloading data

First, let us prepare the directory structure for storing the downloaded data.

*Choose the parent directory where all data will be stored*:

In [2]:
PARENT_PATH = "./Data/agroclim_data" 

In [3]:
# Define the directory structure, starting at the parent path
parent_path = Path(PARENT_PATH)

# Historical data directories
hist_model_path = parent_path / "historical_model_data"
hist_obs_path = parent_path / "historical_observation_data"

# Future projection data directory
future_model_path = parent_path / "future_model_data"

# Create the above-defined directories
hist_model_path.mkdir(parents=True, exist_ok=True)
hist_obs_path.mkdir(parents=True, exist_ok=True)
future_model_path.mkdir(parents=True, exist_ok=True)

### 1.1 Downloading climate model data

To request climate data from the Climate Data Store (CDS) we will use the CDS API.

Here we set the CDS API credentials manually. This requires two variables, URL and KEY, which together form your CDS API credentials. KEY is a personal string of characters tied to your CDS account.

*To obtain these, first register or log in to the CDS (http://cds.climate.copernicus.eu), then visit https://cds.climate.copernicus.eu/api-how-to and copy the string of characters listed after "key:". Replace the ######### below with your key.*

In [4]:
URL = 'https://cds.climate.copernicus.eu/api'
KEY = '#########' # enter your key instead

In [5]:
# Initialise CDS API client
cds_api = cdsapi.Client(url=URL, key=KEY)

2025-03-25 12:30:15,252 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
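
If you prefer not to store the key in the notebook itself, one option is to read the credentials from environment variables. The sketch below assumes the variable names `CDSAPI_URL` and `CDSAPI_KEY`; the helper `load_cds_credentials` is hypothetical and only illustrates the idea.

```python
import os

# Default URL, taken from the cell above
DEFAULT_URL = "https://cds.climate.copernicus.eu/api"


def load_cds_credentials(env=os.environ):
    """Read URL and KEY from environment variables (assumed names).

    Falls back to DEFAULT_URL if no URL is set and raises if no key is set.
    """
    url = env.get("CDSAPI_URL", DEFAULT_URL)
    key = env.get("CDSAPI_KEY")
    if not key:
        raise RuntimeError("CDSAPI_KEY is not set; define it before running.")
    return url, key
```

With the key exported in the shell, `URL, KEY = load_cds_credentials()` could replace the hard-coded assignment before `cdsapi.Client(url=URL, key=KEY)` is called.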


Now let us select the model, variables, area and time period we are interested in:

In [6]:
# Choose model and future projection experiment
MODEL = 'ec_earth3_cc'
EXPERIMENT_FUTURE = 'ssp5_8_5'

# Choose climate variables to extract
VARIABLES = [
    "near_surface_air_temperature", 
    "daily_minimum_near_surface_air_temperature", 
    "daily_maximum_near_surface_air_temperature",
    "precipitation"
]

# Choose area to extract, given as [North, West, South, East] in degrees
AREA = [44, -10, 36, 1] # Approximate bounds of the Iberian Peninsula

# Choose years to download for historical data
# 2000 - 2013
YEARS_HIST = [
    "2000",
    "2001",
    "2002",
    "2003",
    "2004",
    "2005",
    "2006",
    "2007",
    "2008",
    "2009",
    "2010",
    "2011",
    "2012",
    "2013",
]

# Choose months to download for historical data
# (all 12 months)
MONTHS_HIST = [
    "01", "02", "03",
    "04", "05", "06",
    "07", "08", "09",
    "10", "11", "12"
]

# Choose days to download for historical data
# (all 31 days)
DAYS_HIST = [
    "01", "02", "03",
    "04", "05", "06",
    "07", "08", "09",
    "10", "11", "12",
    "13", "14", "15",
    "16", "17", "18",
    "19", "20", "21",
    "22", "23", "24",
    "25", "26", "27",
    "28", "29", "30",
    "31"
]
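
A bounding box with swapped coordinates is an easy mistake to make, so a quick sanity check before sending requests can help. The sketch below assumes that the area is ordered as [north, west, south, east] in degrees, with longitudes between -180 and 180; the function `check_area` is hypothetical.

```python
def check_area(area):
    """Validate a bounding box given as [north, west, south, east] in degrees.

    Returns the latitude and longitude extents of the box.
    """
    north, west, south, east = area
    if not (-90 <= south < north <= 90):
        raise ValueError(f"Invalid latitudes: south={south}, north={north}")
    if not (-180 <= west < east <= 180):
        raise ValueError(f"Invalid longitudes: west={west}, east={east}")
    return {"lat_extent": north - south, "lon_extent": east - west}


# The Iberian Peninsula box used in this notebook
extents = check_area([44, -10, 36, 1])
print(extents)
```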

In [7]:
# Choose years to download for future projection data
# 2040-2049
YEARS_FUTURE = [
    "2040",
    "2041",
    "2042",
    "2043",
    "2044",
    "2045",
    "2046",
    "2047",
    "2048",
    "2049",
]

# Months and days are the same as historical data.
# (all possible months and all possible days)
MONTHS_FUTURE = MONTHS_HIST
DAYS_FUTURE = DAYS_HIST
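
The year, month and day lists above are written out explicitly for readability. As a sketch of an equivalent, more compact alternative, the same zero-padded strings can be generated with list comprehensions (the names `MONTHS` and `DAYS` are used here only for illustration):

```python
# Historical period 2000-2013 and future period 2040-2049, as strings
YEARS_HIST = [str(year) for year in range(2000, 2014)]
YEARS_FUTURE = [str(year) for year in range(2040, 2050)]

# Zero-padded months "01".."12" and days "01".."31"
MONTHS = [f"{month:02d}" for month in range(1, 13)]
DAYS = [f"{day:02d}" for day in range(1, 32)]

print(len(YEARS_HIST), len(YEARS_FUTURE), len(MONTHS), len(DAYS))
```

This makes it easier to change the reference or application period by editing only the range bounds.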

#### 1.1.1 Downloading historical climate model data

Executing the cell below will retrieve the historical climate model data for the selected set of climate variables:

In [60]:
# Loop over selected variables
for variable in VARIABLES:
    print(f"Downloading {variable}...")
    
    # Choose a filename for the historical model data
    filename = f"{variable}_historical_{MODEL}.zip"
    filepath = hist_model_path/filename

    # Choose a directory name for extracting the downloaded data into
    extract_dir = f"{variable}_historical_{MODEL}_extracted"
    extract_path = hist_model_path/extract_dir

    # Download the zip file with selected data
    cds_api.retrieve(
        name = 'projections-cmip6',
        request = {
            "temporal_resolution": "daily",
            "model": MODEL,
            "experiment": "historical",
            "variable": variable,
            "year": YEARS_HIST,
            "month": MONTHS_HIST,
            "day": DAYS_HIST,
            "area": AREA
        },
        target = filepath
    )

    # Extract the zip file
    # ("//?/" to prevent issues with long file paths in Windows)
    with zipfile.ZipFile(r"//?/"+f"{filepath.resolve()}", 'r') as zip_ref:
        zip_ref.extractall(r"//?/"+f"{extract_path.resolve()}")

Downloading near_surface_air_temperature...


2025-03-20 12:23:30,677 INFO Request ID is 6bd186df-b493-4eac-957b-018254cbdb91
2025-03-20 12:23:30,779 INFO status has been updated to accepted
2025-03-20 12:23:38,173 INFO status has been updated to successful


d86cc2672760b93d6b98ff1e31b0330.zip:   0%|          | 0.00/4.35M [00:00<?, ?B/s]

Downloading daily_minimum_near_surface_air_temperature...


2025-03-20 12:23:39,766 INFO Request ID is 8d43cc8b-f49e-4888-b2a7-de2a2984b472
2025-03-20 12:23:39,850 INFO status has been updated to accepted
2025-03-20 12:23:47,206 INFO status has been updated to running
2025-03-20 12:23:52,325 INFO status has been updated to successful


d8ba18abd30364299ed96431ce38e913.zip:   0%|          | 0.00/4.36M [00:00<?, ?B/s]

Downloading daily_maximum_near_surface_air_temperature...


2025-03-20 12:23:56,900 INFO Request ID is 3defb251-28cf-4445-9f59-fed07f6f4b8d
2025-03-20 12:23:56,961 INFO status has been updated to accepted
2025-03-20 12:24:00,866 INFO status has been updated to running
2025-03-20 12:24:04,308 INFO status has been updated to successful


50d1e3745166e5a6e69d93fe08d63b9c.zip:   0%|          | 0.00/4.35M [00:00<?, ?B/s]

Downloading precipitation...


2025-03-20 12:24:06,555 INFO Request ID is 5cf48ba1-6a3b-464e-aa8b-6d7767990673
2025-03-20 12:24:06,617 INFO status has been updated to accepted
2025-03-20 12:24:13,980 INFO status has been updated to successful


48559aa899a33eed105c31108f03264.zip:   0%|          | 0.00/3.97M [00:00<?, ?B/s]
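
The extraction step above prepends the `"//?/"` long-path prefix unconditionally, which is intended for Windows. As a hedged sketch of a variant that should also run on other operating systems, the prefix can be applied only when `os.name` reports Windows; the helper `extract_zip` is hypothetical and not part of the original notebook.

```python
import os
import zipfile
from pathlib import Path


def extract_zip(zip_path, extract_path):
    """Extract `zip_path` into `extract_path` and return the archive's member names."""
    zip_path = Path(zip_path).resolve()
    extract_path = Path(extract_path).resolve()

    # The "//?/" long-path prefix is only meaningful on Windows
    prefix = "//?/" if os.name == "nt" else ""

    with zipfile.ZipFile(prefix + str(zip_path), "r") as zip_ref:
        zip_ref.extractall(prefix + str(extract_path))
        return zip_ref.namelist()
```

Inside the download loop, `extract_zip(filepath, extract_path)` could then stand in for the two-line `with zipfile.ZipFile(...)` block.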

#### 1.1.2 Downloading future climate model data

Now we go through the same steps to download climate model data for the future (application) period:

In [61]:
# Loop over selected variables
for variable in VARIABLES:
    print(f"Downloading {variable}...")
    
    # Choose a filename for the future model data
    filename = f"{variable}_future_{MODEL}_{EXPERIMENT_FUTURE}.zip"
    filepath = future_model_path/filename

    # Choose a directory name for extracting the downloaded data into
    extract_dir = f"{variable}_future_{MODEL}_{EXPERIMENT_FUTURE}_extracted"
    extract_path = future_model_path/extract_dir

    # Download the zip file with selected data
    cds_api.retrieve(
        name = 'projections-cmip6',
        request = {
            "temporal_resolution": "daily",
            "model": MODEL,
            "experiment": EXPERIMENT_FUTURE,
            "variable": variable,
            "year": YEARS_FUTURE,
            "month": MONTHS_FUTURE,
            "day": DAYS_FUTURE,
            "area": AREA
        },
        target = filepath
    )

    # Extract the zip file
    # ("//?/" to prevent issues with long file paths in Windows)
    with zipfile.ZipFile(r"//?/"+f"{filepath.resolve()}", 'r') as zip_ref:
        zip_ref.extractall(r"//?/"+f"{extract_path.resolve()}")

Downloading near_surface_air_temperature...


2025-03-20 12:26:49,497 INFO Request ID is 0b2688fa-5a3e-47f8-9abc-de2cfcd01ed5
2025-03-20 12:26:49,573 INFO status has been updated to accepted
2025-03-20 12:26:56,954 INFO status has been updated to successful


e0c7104e5173dd662379ef467b1a69fa.zip:   0%|          | 0.00/3.18M [00:00<?, ?B/s]

Downloading daily_minimum_near_surface_air_temperature...


2025-03-20 12:26:58,636 INFO Request ID is 959b459b-9c2d-4998-8a3d-d03052698044
2025-03-20 12:26:58,732 INFO status has been updated to accepted
2025-03-20 12:27:06,087 INFO status has been updated to successful


eb1a6234f32be384edb4bf8b7bad545b.zip:   0%|          | 0.00/3.18M [00:00<?, ?B/s]

Downloading daily_maximum_near_surface_air_temperature...


2025-03-20 12:27:07,529 INFO Request ID is 21747f8b-2b02-4ee0-923a-8410b65f1d0f
2025-03-20 12:27:07,640 INFO status has been updated to accepted
2025-03-20 12:27:20,118 INFO status has been updated to successful


cc0b33279047bfe5907cdeac82512ca9.zip:   0%|          | 0.00/3.18M [00:00<?, ?B/s]

Downloading precipitation...


2025-03-20 12:27:22,019 INFO Request ID is 344b066e-ba01-4d5a-b39d-dc71c68c568b
2025-03-20 12:27:22,090 INFO status has been updated to accepted
2025-03-20 12:27:25,970 INFO status has been updated to running
2025-03-20 12:27:29,411 INFO status has been updated to successful


d3fe8b8442814653a17ff123c261f657.zip:   0%|          | 0.00/2.79M [00:00<?, ?B/s]
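
The historical and future downloads differ only in the experiment name and the selected years. As a sketch of how the shared part could be factored out, the request dictionary can be built by a small function; `build_cmip6_request` is a hypothetical helper whose keys mirror those used in the two cells above.

```python
def build_cmip6_request(model, experiment, variable, years, months, days, area):
    """Assemble the request dictionary for one daily CMIP6 variable."""
    return {
        "temporal_resolution": "daily",
        "model": model,
        "experiment": experiment,
        "variable": variable,
        "year": years,
        "month": months,
        "day": days,
        "area": area,
    }


# Example: the future precipitation request, with shortened date lists
request = build_cmip6_request(
    model="ec_earth3_cc",
    experiment="ssp5_8_5",
    variable="precipitation",
    years=["2040"],
    months=["01"],
    days=["01"],
    area=[44, -10, 36, 1],
)
print(request["experiment"], request["variable"])
```

The resulting dictionary could be passed as the `request` argument of `cds_api.retrieve`, with `name` and `target` set as in the cells above.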

### 1.2 Downloading historical observation (reanalysis) data

Now we need to download historical observation (reanalysis) data.

#### 1.2.1 Downloading AgERA5 data

We will download AgERA5 at daily temporal resolution.

The output of the download is a separate netCDF file of the chosen daily statistic for each month of each year. We then concatenate these files manually. First we need to make some selections (make sure the choices here are consistent with the climate model data downloaded above):

In [8]:
# Choose a combination of climate variables and corresponding statistics
VAR_STATS_AGERA = [
    # (min, mean, max) near surface air temperature
    {
        "variable":"2m_temperature",
        "statistics":[
            "24_hour_minimum",
            "24_hour_mean",
            "24_hour_maximum",
        ]
    },

    # precipitation flux (no statistic applies)
    {
        "variable":"precipitation_flux",
        "statistics":[""]
    },
]
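
To get a feeling for how many separate downloads this selection implies, the combinations can be enumerated up front. The sketch below assumes one request per variable, statistic and year over the historical period 2000-2013; the list name `jobs` is hypothetical.

```python
VAR_STATS_AGERA = [
    {
        "variable": "2m_temperature",
        "statistics": ["24_hour_minimum", "24_hour_mean", "24_hour_maximum"],
    },
    {"variable": "precipitation_flux", "statistics": [""]},
]
YEARS_HIST = [str(year) for year in range(2000, 2014)]

# One (variable, statistic, year) tuple per request
jobs = [
    (entry["variable"], statistic, year)
    for entry in VAR_STATS_AGERA
    for statistic in entry["statistics"]
    for year in YEARS_HIST
]

# (3 temperature statistics + 1 precipitation entry) * 14 years = 56
print(len(jobs))
```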

AgERA5 data has to be downloaded for each year and each statistic separately, in order to keep the request size manageable:

In [9]:
# Loop over different climate variables
for var_stat in VAR_STATS_AGERA:
    variable = var_stat["variable"]
    statistics = var_stat["statistics"]
    
    print(f"Downloading {variable}...")

    # Loop over relevant statistics for the current variable
    for statistic in statistics:
        print(f"Statistic: {statistic}")

        # Loop over relevant years
        for year in YEARS_HIST:
            print(f"----- Requesting year: {year} -----")

            # Choose a filename for the historical observation data
            filename = f"{variable}_{statistic}_historical_obs_{year}.zip"
            filepath = hist_obs_path/filename
            
            # Choose a directory name for extracting the downloaded data into
            extract_dir = f"{variable}_{statistic}_historical_obs_extracted"
            extract_path = hist_obs_path/extract_dir
            
            # Download the zip file with selected data
            if filepath.exists():
                # In case we get a CDS API error, we can rerun the cell without re-downloading already present files
                print(f"{filepath} already exists. Skipping...")
                continue
            else:
                cds_api.retrieve(
                    name = "sis-agrometeorological-indicators",
                    request = {
                        "variable": variable,
                        "statistic": [statistic],
                        "year": f"{year}",
                        "month": MONTHS_HIST,
                        "day": DAYS_HIST,
                        "area": AREA,
                        "version":"1_1"
                    },
                    target = filepath
                )
            
                # Extract the zip file
                # ("//?/" to prevent issues with long file paths in Windows)
                with zipfile.ZipFile(r"//?/"+f"{filepath.resolve()}", 'r') as zip_ref:
                    zip_ref.extractall(r"//?/"+f"{extract_path.resolve()}")
                

Downloading 2m_temperature...
Statistic: 24_hour_minimum
----- Requesting year: 2000 -----
Data\agroclim_data\historical_observation_data\2m_temperature_24_hour_minimum_historical_obs_2000.zip already exists. Skipping...
----- Requesting year: 2001 -----
Data\agroclim_data\historical_observation_data\2m_temperature_24_hour_minimum_historical_obs_2001.zip already exists. Skipping...
----- Requesting year: 2002 -----
Data\agroclim_data\historical_observation_data\2m_temperature_24_hour_minimum_historical_obs_2002.zip already exists. Skipping...
----- Requesting year: 2003 -----
Data\agroclim_data\historical_observation_data\2m_temperature_24_hour_minimum_historical_obs_2003.zip already exists. Skipping...
----- Requesting year: 2004 -----
Data\agroclim_data\historical_observation_data\2m_temperature_24_hour_minimum_historical_obs_2004.zip already exists. Skipping...
----- Requesting year: 2005 -----
Data\agroclim_data\historical_observation_data\2m_temperature_24_hour_minimum_historical_