[![logo](https://climate.copernicus.eu/sites/default/files/custom-uploads/branding/LogoLine_horizon_C3S.png)](https://climate.copernicus.eu)

# Downloading PECD4.2 data from the CDS via cdsapi

This notebook provides a practical introduction to retrieving data from the Copernicus Climate Change Service (C3S) through the Climate Data Store (CDS) Application Program Interface (API), a service providing programmatic access to CDS.

The tutorial will demonstrate how to access climate and energy related variables from the Pan-European Climate Database (PECD4.2) derived from reanalysis and climate projections.
At the following [link](https://cds.climate.copernicus.eu/datasets/sis-energy-pecd) you can find an overview of the dataset, the technical documentation and the interface to download the data.

In this example we will download aggregated data in CSV format for one energy variable, the Solar Photovoltaic Capacity Factor (SPV), covering both a historical window (2011-2014), reconstructed from ERA5 reanalysis data, and a future window (2031-2034), computed from three different CMIP6 climate projection models under one of the available scenarios, SSP245.

> **Note**  
>[ERA5](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5) is the fifth-generation atmospheric reanalysis program developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) in collaboration with the Copernicus Climate Change Service (C3S). It operates on a global scale and has a spatial resolution of $0.25° \times 0.25°$ (latitude and longitude), which corresponds to approximately 31 km; estimates of atmospheric variables are provided hourly throughout a temporal coverage of about eight decades, from 1940 to today.

> **Note**  
>[CMIP6](https://pcmdi.llnl.gov/CMIP6/) (Coupled Model Intercomparison Project Phase 6) is an international effort that brings together climate models from research institutions worldwide. Its goal is to standardize and compare climate simulations to better understand past and future climate behavior. The results are widely used in scientific research and reports like those from the IPCC.

> **Note**  
>The SSP245 (or SSP2-4.5) climate scenario is one of the plausible future pathways that combine assumptions about human development (like population growth, energy use, and policy) with projections of greenhouse gas emissions. Climate models use these scenarios to simulate how the Earth’s climate might respond under different conditions. SSP2-4.5 is a “middle-of-the-road” scenario that assumes moderate population and economic growth, slow and uneven progress toward sustainability, and some mitigation of emissions (though not aggressive climate policies). The "4.5" refers to the projected radiative forcing (the extra energy trapped in the Earth system) of 4.5 W/m² by the year 2100.

## Learning objectives 🧠

In this notebook, you will learn how to use `cdsapi` to download data from the Climate Data Store (CDS): first how to send a single API request from Python code, and then how to split a request that is too large for the CDS to accept into smaller chunks and send them either in a for loop or through parallel calls.

## Target Audience 🎯

**Anyone** interested in learning how to download data from the PECD4.2 dataset.

## Prepare your environment

### Import libraries

We begin by importing the required libraries: the [os](https://docs.python.org/3/library/os.html) module, which provides a way to interact with the operating system and is used here to create a folder to store the downloaded data; [cdsapi](https://github.com/ecmwf/cdsapi?tab=readme-ov-file), which provides programmatic access to the Copernicus Climate Data Store (CDS), allowing you to download data; and [multiprocessing](https://docs.python.org/3/library/multiprocessing.html), which enables the use of multiple processors on your machine and is used here to handle parallel API requests.

In [1]:
import os
import cdsapi
from multiprocessing import Pool

### Set up the CDS API and your credentials


To learn how to use the CDS API, see the [official guide](https://cds.climate.copernicus.eu/how-to-api). If you have already set up your .cdsapirc file locally, you can upload it directly to your home directory.

Alternatively, you can replace `None` in the following code cell with your API token as a string (i.e. enclosed in quotes, like `"your_api_key"`). Your token can be found on the CDS portal at: https://cds.climate.copernicus.eu/profile (you will need to log in to view your credentials).
Remember to agree to the Terms and Conditions of every dataset you intend to download.

In [None]:
# If you have already set up your .cdsapirc file you can leave this as None
cdsapi_key = None
cdsapi_url = "https://cds.climate.copernicus.eu/api"
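Before creating a client, it can help to confirm which credentials will be picked up. The sketch below is not part of the original notebook: `read_cdsapirc` is a hypothetical helper, and it assumes that the `.cdsapirc` file sits in your home directory and uses the two-line `url: ...` / `key: ...` layout described in the official guide.

```python
from pathlib import Path


def read_cdsapirc(path=None):
    """Return the 'url' and 'key' entries found in a .cdsapirc-style file.

    Hypothetical helper for illustration; assumes one 'name: value' pair per line.
    """
    path = Path(path) if path is not None else Path.home() / ".cdsapirc"
    if not path.exists():
        return {}
    config = {}
    for line in path.read_text().splitlines():
        # split on the first colon only, so the URL keeps its own colons
        name, sep, value = line.partition(":")
        if sep and name.strip() in ("url", "key"):
            config[name.strip()] = value.strip()
    return config


settings = read_cdsapirc()
print("url found:", "url" in settings, "| key found:", "key" in settings)
```

If both entries are reported as missing, setting `cdsapi_key` explicitly in the cell above is the alternative described in this section.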

## Create a function to handle the data download

The data can be downloaded from the PECD CDS download form, by ticking the boxes of interest. Once all the required information is manually selected, scroll to the bottom of the form and click on "Show API request". This will reveal a code block that can be copied and pasted directly into a cell of your Jupyter Notebook. This step has already been done for you in the cell below, but if you'd like to try it yourself, visit the [CDS download form](https://cds.climate.copernicus.eu/datasets/sis-energy-pecd?tab=download).

In the following example we use data from PECD version "PECD4.2" across both the "Historical" and "Future Projections" temporal streams. The selected variable is "Solar photovoltaic generation capacity factor" and the chosen technology is "60 (SPV industrial rooftop)", i.e. industrial rooftop installations with Si modules.

The origins of the data include:
- "ERA5 reanalysis" (for historical observations);
- "CMCC-CM2-SR5", "EC-Earth3", and "MPI-ESM1-2-HR" climate models (for future projections).

The emission scenario selected is "SSP2-4.5"; data is retrieved at the country level ("nuts_0" spatial resolution) and for the years 2011, 2012, 2031 and 2032.

The API request reflecting these selections is shown in the next cell.

In [3]:
# define our dataset
dataset = "sis-energy-pecd"

# dictionary of base request
request = {
    "pecd_version": "pecd4_2",
    "temporal_period": ["historical", "future_projections"],
    "origin": ["era5_reanalysis", "cmcc_cm2_sr5", "ec_earth3", "mpi_esm1_2_hr"],
    "emission_scenario": ["ssp2_4_5"],
    "variable": ["solar_photovoltaic_generation_capacity_factor"],
    "technology": ["60"],
    "spatial_resolution": ["nuts_0"],
    "year": ["2011", "2012", "2031", "2032"],
}

Before running the download we can make sure there is a dedicated folder ready to host the data.

In [4]:
# create folder to store downloaded data (no error if it already exists)
folder = "cds_data/dowload_data_from_cds"
os.makedirs(folder, exist_ok=True)

Now that it is all set, we are ready to download the data.

In [6]:
# initialize Client object
client = cdsapi.Client(cdsapi_url, cdsapi_key)
# call the retrieve method that downloads data
client.retrieve(dataset, request, f"{folder}/data.zip")

2025-07-10 14:35:50,067 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.


HTTPError: 403 Client Error: Forbidden for url: https://cds.climate.copernicus.eu/api/retrieve/v1/processes/sis-energy-pecd/execution
cost limits exceeded
Your request is too large, please reduce your selection.

The previously defined request fails because it is too large. We can either split it manually into smaller requests and submit them one by one through the CDS interface, or split it with Python code. We will now build a function that sends a single API request; later we will call it through the multiprocessing library. This lets us split a very large request into many smaller ones that can be processed in parallel.
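To get a feeling for how much smaller a split request is, one can count the combinations of selected values. This is only a rough proxy: how the CDS actually computes its cost limit is not described in this notebook, and `count_combinations` is a hypothetical helper introduced here for illustration.

```python
from math import prod


def count_combinations(request):
    """Multiply the number of values selected for each key of a request dict."""
    return prod(len(v) if isinstance(v, list) else 1 for v in request.values())


# the request that was rejected above
full_request = {
    "pecd_version": "pecd4_2",
    "temporal_period": ["historical", "future_projections"],
    "origin": ["era5_reanalysis", "cmcc_cm2_sr5", "ec_earth3", "mpi_esm1_2_hr"],
    "emission_scenario": ["ssp2_4_5"],
    "variable": ["solar_photovoltaic_generation_capacity_factor"],
    "technology": ["60"],
    "spatial_resolution": ["nuts_0"],
    "year": ["2011", "2012", "2031", "2032"],
}

# one possible smaller request: a single period, a single origin, two years
single_chunk = dict(
    full_request,
    temporal_period=["historical"],
    origin=["era5_reanalysis"],
    year=["2011", "2012"],
)
single_chunk.pop("emission_scenario")  # not needed for historical data

print(count_combinations(full_request))  # 32
print(count_combinations(single_chunk))  # 2
```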

The function `retrieve_cds_data` takes as input several arguments that identify the specific data you need to download:


*   `dataset` (string): the name of the dataset to download from.
*   `pecd_version` (string): the version of the Pan-European Climate Database (PECD) you are interested in.
*   `temporal_period` (list of strings): specifies the time period of the data (e.g., "historical", "future_projections").
*   `origin` (list of strings): indicates the source of the data, such as a specific climate model or reanalysis dataset.
*   `variable` (list of strings): the specific climate or energy variable you want to download (e.g., "2m_temperature", "solar_photovoltaic_generation_capacity_factor").
*   `technology` (list of strings): specifies a technology related to the energy variables.
*   `spatial_resolution` (list of strings): defines the geographical resolution of the data (e.g., "nuts_0").
*   `year` (list of strings): the years for which you want to retrieve data (e.g., "2011", "2012").
*   `emissions` (list of strings, optional, default is None): if applicable, the emissions scenario (e.g., "ssp2_4_5").

In [1]:
def retrieve_cds_data(
    dataset: str,
    pecd_version: str,
    temporal_period: list[str],
    origin: list[str],
    variable: list[str],
    technology: list[str],
    spatial_resolution: list[str],
    year: list[str],
    emissions: list[str] = None,
):
    """Download one chunk of PECD data from the CDS into `folder` as a zip archive."""
    # dictionary of the api request
    request = {
        "pecd_version": pecd_version,
        "temporal_period": temporal_period,
        "origin": origin,
        "variable": variable,
        "technology": technology,
        "spatial_resolution": spatial_resolution,
        "year": year,
    }

    # build the file path to the downloaded data
    file_path = (
        f"{folder}/"
        f"{pecd_version}_{temporal_period[0]}_{origin[0]}_{variable[0]}_"
        f"{technology[0]}_{spatial_resolution[0]}_{year[0]}"
    )

    # add emissions field if needed
    if emissions is not None:
        request["emission_scenario"] = emissions
        file_path += f"_{emissions[0]}"

    file_path += ".zip"

    # initialize Client object
    client = cdsapi.Client(cdsapi_url, cdsapi_key)
    # call retrieve method that downloads the data
    client.retrieve(dataset, request, file_path)
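The archive name is built from the first element of each list argument, so two requests that share all first elements would write to the same file. To preview a name without contacting the CDS, the naming logic can be mirrored in a small pure function. `build_file_path` is a hypothetical helper written for this illustration, not part of `cdsapi`.

```python
def build_file_path(folder, pecd_version, temporal_period, origin, variable,
                    technology, spatial_resolution, year, emissions=None):
    """Mirror the naming logic of retrieve_cds_data without contacting the CDS."""
    name = (
        f"{pecd_version}_{temporal_period[0]}_{origin[0]}_{variable[0]}_"
        f"{technology[0]}_{spatial_resolution[0]}_{year[0]}"
    )
    # the emission scenario is appended only when one is given
    if emissions is not None:
        name += f"_{emissions[0]}"
    return f"{folder}/{name}.zip"


example_path = build_file_path(
    "cds_data/dowload_data_from_cds",
    "pecd4_2",
    ["future_projections"],
    ["ec_earth3"],
    ["solar_photovoltaic_generation_capacity_factor"],
    ["60"],
    ["nuts_0"],
    ["2033", "2034"],
    ["ssp2_4_5"],
)
print(example_path)
```

Only `2033` appears in the printed name even though the chunk covers two years, which is why each chunk needs a distinct first year (or origin) to avoid overwriting another archive.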

## Set up the parameters for data download

This section defines the variables that specify the data to download from the CDS; they act as parameters for the API requests made later. We create a list of years for the historical data and one for the projection data, then divide each list into two-year chunks.

In [None]:
# constants
pecd_version = "pecd4_2"
emissions = ["ssp2_4_5"]
technology = ["60"]
spatial_resolution = ["nuts_0"]

# list of years to download
hist_start, hist_end = 2011, 2014
proj_start, proj_end = 2031, 2034
hist_years = [str(i) for i in range(hist_start, hist_end + 1)]
proj_years = [str(i) for i in range(proj_start, proj_end + 1)]

print(f"hist_years: {hist_years}")
print(f"proj_years: {proj_years}")

# divide our list of years into 2 groups of 2 years each
n = 2
hist_years_list = [hist_years[n * i: n * (i + 1)] for i in range(0, len(hist_years) // n)]
proj_years_list = [proj_years[n * i: n * (i + 1)] for i in range(0, len(proj_years) // n)]

print(f"hist_years_list: {hist_years_list}")
print(f"proj_years_list: {proj_years_list}")

# list of variables to download
vars = ["solar_photovoltaic_generation_capacity_factor"]

# dictionary of origins - projection models
origins = {
    "historical": ["era5_reanalysis"],
    "future_projections": ["cmcc_cm2_sr5", "ec_earth3", "mpi_esm1_2_hr"],
}

hist_years: ['2011', '2012', '2013', '2014']
proj_years: ['2031', '2032', '2033', '2034']
hist_years_list: [['2011', '2012'], ['2013', '2014']]
proj_years_list: [['2031', '2032'], ['2033', '2034']]
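The list comprehensions above use `len(...) // n` groups, so any trailing years are silently dropped when the number of years is not a multiple of `n`. A minimal sketch of a chunking helper that keeps the remainder is shown below; `chunked` is a hypothetical name introduced here, not something defined elsewhere in the notebook.

```python
def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [items[i:i + size] for i in range(0, len(items), size)]


years = [str(y) for y in range(2011, 2016)]  # five years, not a multiple of 2
print(chunked(years, 2))  # [['2011', '2012'], ['2013', '2014'], ['2015']]
```

For the four-year windows used in this notebook both approaches give the same two chunks.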


## Generate a list of API requests

This section of the code focuses on creating a list of requests that will be used to download data from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS). Each item in this list represents a specific data download request.

We will create a nested loop structure. The outer loop iterates through each variable in the `vars` list. For each variable, the code generates requests for both historical and future projection data: the historical loop iterates through each group of years, while the projection loop iterates through each group of years and, within it, through each climate model. Each request is stored as a tuple of arguments, and the resulting list of tuples is what the `starmap` method of multiprocessing expects.

In [9]:
requests = []
# outer loop through variables

for var in vars:
    period = "historical"
    # loop through historical years
    for year in hist_years_list:
        request = (
            dataset,
            pecd_version,
            [period],
            origins[period],
            [var],
            technology,
            spatial_resolution,
            year,
        )
        requests.append(request)
    period = "future_projections"
    # loop through projection years
    for year in proj_years_list:
        for origin in origins[period]:
            request = (
                dataset,
                pecd_version,
                [period],
                [origin],
                [var],
                technology,
                spatial_resolution,
                year,
                emissions,
            )
            requests.append(request)

# print requests
print(f"total requests: {len(requests)}")
for request in requests:
    print(request)

total requests: 8
('sis-energy-pecd', 'pecd4_2', ['historical'], ['era5_reanalysis'], ['solar_photovoltaic_generation_capacity_factor'], ['60'], ['nuts_0'], ['2011', '2012'])
('sis-energy-pecd', 'pecd4_2', ['historical'], ['era5_reanalysis'], ['solar_photovoltaic_generation_capacity_factor'], ['60'], ['nuts_0'], ['2013', '2014'])
('sis-energy-pecd', 'pecd4_2', ['future_projections'], ['cmcc_cm2_sr5'], ['solar_photovoltaic_generation_capacity_factor'], ['60'], ['nuts_0'], ['2031', '2032'], ['ssp2_4_5'])
('sis-energy-pecd', 'pecd4_2', ['future_projections'], ['ec_earth3'], ['solar_photovoltaic_generation_capacity_factor'], ['60'], ['nuts_0'], ['2031', '2032'], ['ssp2_4_5'])
('sis-energy-pecd', 'pecd4_2', ['future_projections'], ['mpi_esm1_2_hr'], ['solar_photovoltaic_generation_capacity_factor'], ['60'], ['nuts_0'], ['2031', '2032'], ['ssp2_4_5'])
('sis-energy-pecd', 'pecd4_2', ['future_projections'], ['cmcc_cm2_sr5'], ['solar_photovoltaic_generation_capacity_factor'], ['60'], ['nuts_0']
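The same list can also be produced by a reusable function built on `itertools.product`, which makes the request count easy to check in isolation. This is a sketch under the assumption that historical requests carry all their origins at once while projection requests carry one model each, as in the loop above; `build_requests` and its `year_chunks` dictionary are hypothetical names.

```python
from itertools import product


def build_requests(dataset, pecd_version, variables, technology,
                   spatial_resolution, origins, year_chunks, emissions):
    """Return one argument tuple per download, in the same order as the loop above."""
    built = []
    for var, period in product(variables, origins):
        if period == "historical":
            # one request per group of years, no emission scenario
            for years in year_chunks[period]:
                built.append((dataset, pecd_version, [period], origins[period],
                              [var], technology, spatial_resolution, years))
        else:
            # one request per group of years and per climate model
            for years, origin in product(year_chunks[period], origins[period]):
                built.append((dataset, pecd_version, [period], [origin],
                              [var], technology, spatial_resolution, years,
                              emissions))
    return built


example_requests = build_requests(
    "sis-energy-pecd",
    "pecd4_2",
    ["solar_photovoltaic_generation_capacity_factor"],
    ["60"],
    ["nuts_0"],
    {
        "historical": ["era5_reanalysis"],
        "future_projections": ["cmcc_cm2_sr5", "ec_earth3", "mpi_esm1_2_hr"],
    },
    {
        "historical": [["2011", "2012"], ["2013", "2014"]],
        "future_projections": [["2031", "2032"], ["2033", "2034"]],
    },
    ["ssp2_4_5"],
)
print(len(example_requests))  # 8
```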

These requests can be parallelized with multiprocessing (option 1), or you can simply loop over the requests list (option 2). Either way, the result will be the same.

In this example we initialize the Pool object with 8 processes and call the starmap method, passing as arguments the function previously defined and the list of tuples created before.

In [None]:
# Option 1
with Pool(8) as p:
    p.starmap(retrieve_cds_data, requests)

As mentioned, the same can be achieved with a for loop.

In [None]:
# Option 2
for request in requests:
    retrieve_cds_data(*request)
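Because the downloads mostly wait on the network, a thread pool is another option. It also sidesteps the case where worker processes cannot import a function defined inside a notebook, which may happen on platforms that start processes with "spawn". The sketch below is an alternative to the two options above, not the notebook's own method: `run_requests` is a hypothetical helper, and `fake_download` is a stand-in so the example runs without CDS access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_requests(func, requests, max_workers=4):
    """Call func(*request) in threads and collect failures instead of raising."""
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(func, *request): request for request in requests}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # keep going if a single download fails
                failed.append((futures[future], exc))
    return failed


# Stand-in for retrieve_cds_data so the sketch runs without CDS access
def fake_download(name, year):
    if year == "2013":
        raise RuntimeError("simulated failure")


failed = run_requests(fake_download, [("a", "2011"), ("a", "2013")])
print([request for request, _ in failed])  # [('a', '2013')]
```

In the notebook, the call would be `run_requests(retrieve_cds_data, requests)`, and any entries in the returned list could then be retried.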

## Unzip downloaded files

Since our data are inside zipped files we need to unzip them. The extracted CSV files are named according to the naming conventions of the Pan-European Climate Database. You can find an explanation of the different fields in the [product user guide](https://confluence.ecmwf.int/pages/viewpage.action?pageId=439598955#ClimateandenergyrelatedvariablesfromthePanEuropeanClimateDatabasederivedfromreanalysisandclimateprojections:Productuserguide(PUG)-Filenamesconventionandcharacteristics) of the PECD.

In [11]:
# Unzip every archive in our folder
for fname in os.listdir(folder):
    if fname.endswith(".zip"):
        os.system(f"unzip {folder}/{fname} -d {folder}")

Archive:  cds_data/dowload_data_from_cds/pecd4_2_future_projections_ec_earth3_solar_photovoltaic_generation_capacity_factor_60_nuts_0_2033_ssp2_4_5.zip
  inflating: cds_data/dowload_data_from_cds/P_CMI6_ECEC_ECE3_SPV_0000m_Pecd_NUT0_S203301010000_E203312312300_CFR_TIM_01h_NA-_noc_org_60_SP245_NA---_PhM03_PECD4.2_fv1.csv  
  inflating: cds_data/dowload_data_from_cds/P_CMI6_ECEC_ECE3_SPV_0000m_Pecd_NUT0_S203401010000_E203412312300_CFR_TIM_01h_NA-_noc_org_60_SP245_NA---_PhM03_PECD4.2_fv1.csv  
Archive:  cds_data/dowload_data_from_cds/pecd4_2_future_projections_cmcc_cm2_sr5_solar_photovoltaic_generation_capacity_factor_60_nuts_0_2033_ssp2_4_5.zip
  inflating: cds_data/dowload_data_from_cds/P_CMI6_CMCC_CMR5_SPV_0000m_Pecd_NUT0_S203301010000_E203312312300_CFR_TIM_01h_NA-_noc_org_60_SP245_NA---_PhM03_PECD4.2_fv1.csv  
  inflating: cds_data/dowload_data_from_cds/P_CMI6_CMCC_CMR5_SPV_0000m_Pecd_NUT0_S203401010000_E203412312300_CFR_TIM_01h_NA-_noc_org_60_SP245_NA---_PhM03_PECD4.2_fv1.csv  
Archi
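The cell above relies on the `unzip` command being installed. Where it is not available, the standard-library `zipfile` module can do the same job. This is a sketch of an alternative rather than the notebook's method, and `unzip_all` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def unzip_all(folder):
    """Extract every .zip archive found in `folder` into that same folder."""
    extracted = []
    for archive in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())
    # names of all extracted members, in archive order
    return extracted


# Usage in this notebook, with `folder` as defined earlier:
# csv_files = unzip_all(folder)
```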

## Take home messages 📌



*  To download data from the CDS (Climate Data Store), you can build your own API request using the CDS interface.
*  There is a size limit for each request, so if you need to download several variables, models or years, you need to split the request into smaller ones; this is easy to do with Python code, and the smaller requests can also be sent in parallel.

