# Validation and intercomparison of AgERA5 and other reanalysis datasets for agricultural applications

Production date: DD-MM-2025

**Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch.**

Dataset version: 2.0.

Produced by: C3S2_521 contract.

## 🌍 Use case: Agricultural yield estimation and prediction based on reanalysis data

## ❓ Quality assessment question
* How do reanalysis datasets compare to observations, and to each other, for agriculturally relevant variables?

A very short introduction before the assessment statement, describing the approach taken to answer the user question. One or two key references could be useful if the assessment summarises literature. References can be given directly in the text or with numerical labels that are also listed at the end; for example, `[[1]](https://doi.org/10.1038/s41598-018-20628-2)` renders as [[1]](https://doi.org/10.1038/s41598-018-20628-2).

[[CDS AgERA5]](https://doi.org/10.24381/cds.6c68c9bb).

## 📢 Quality assessment statement

```{admonition} These are the key outcomes of this assessment
:class: note
* Finding 1
* Finding 2
* Finding 3
* etc
```

## 📋 Methodology

**Agrometeorological indicators from 1979 to present derived from reanalysis** (*AgERA5*; [doi 10.24381/cds.6c68c9bb](https://doi.org/10.24381/cds.6c68c9bb)).

A ‘free text’ introduction to the data analysis steps or a description of the literature synthesis, with a justification of the approach taken, and limitations mentioned. **Mention which CDS catalogue entry is used, including a link, and also any other entries used for the assessment**.

---
Variables of interest for a crop growth simulator such as [PCSE/WOFOST](https://github.com/ajwdewit/pcse):

| Variable | Statistic | Unit | Example assessment |
|----------|-----------|------|--------------------|
| Solar irradiation | 24 h total | J/m²/day | example |
| 2 m temperature | 24 h min | °C | example |
| | 24 h max | °C | |
| | 24 h mean (optional) | °C | |
| Vapour pressure | 24 h mean | hPa | example |
| Rain | 24 h total | cm/day | example |
| 2 m wind speed | 24 h mean | m/s | example |
| Snow depth | ??? | cm | example |

[Source](https://pcse.readthedocs.io/en/stable/code.html#pcse.base.WeatherDataContainer)
E0, ES0 and ET0 (potential evaporation and evapotranspiration) are not listed because they are derived from an evapotranspiration calculation.
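AgERA5 values will generally need unit conversion before they can drive a crop model such as PCSE. A minimal sketch, assuming AgERA5 delivers 2 m temperature in kelvin, precipitation in mm/day and wind speed in m/s at 10 m height (check the units in the downloaded files): temperature is shifted to °C, rain is divided by 10 to give cm/day, and the 10 m wind speed is reduced to 2 m with the logarithmic wind profile from FAO Irrigation and Drainage Paper 56. Solar radiation is assumed to need no conversion. The function names are hypothetical.

```python
import math


def kelvin_to_celsius(temperature_k: float) -> float:
    """Convert a temperature from kelvin to degrees Celsius."""
    return temperature_k - 273.15


def mm_to_cm(precipitation_mm: float) -> float:
    """Convert a daily precipitation total from mm/day to cm/day."""
    return precipitation_mm / 10.0


def wind_speed_at_2m(wind_speed: float, height_m: float = 10.0) -> float:
    """Reduce a wind speed measured at height_m to 2 m (FAO-56 logarithmic profile, eq. 47)."""
    return wind_speed * 4.87 / math.log(67.8 * height_m - 5.42)


# Hypothetical daily AgERA5 values for one grid cell
tmax_c = kelvin_to_celsius(291.15)  # about 18.0 °C
rain_cm = mm_to_cm(4.2)             # about 0.42 cm/day
wind_2m = wind_speed_at_2m(5.0)     # about 3.74 m/s
```

At 10 m the profile factor is roughly 0.75, so 2 m wind speeds are about a quarter lower than the 10 m values.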

---

* These headings can be specific to the quality assessment, and help guide the user through the ‘story’ of the assessment. This means we cannot pre-define the sections and headings here, as they will be different for each assessment.
* Sub-bullets could be used to outline what will be done/shown/discussed in each section
* The list below is just an example, or may need more or fewer sections, with different headings

The analysis and results are organised in the following steps, which are detailed in the sections below:

**[](section-setup)**
 * Sub-steps or key points listed in bullet below. No strict requirement to match and link to sub-headings.

**[](section-download)**
 * Sub-steps or key points listed in bullet below. No strict requirement to match and link to sub-headings.

**[](section-4)**
 * Sub-steps or key points listed in bullet below. No strict requirement to match and link to sub-headings.
 
**[](section-results)** 
 * Sub-steps or key points listed in bullet below. No strict requirement to match and link to sub-headings.

Any further notes on the method could go here (explanations, caveats or limitations).

## 📈 Analysis and results

(section-setup)=
### 1. Code setup

#### Import required packages
```{note}
This notebook uses [earthkit](https://github.com/ecmwf/earthkit) for 
downloading ([earthkit-data](https://github.com/ecmwf/earthkit-data)) 
and visualising ([earthkit-plots](https://github.com/ecmwf/earthkit-plots)) data.
Because earthkit is in active development, some functionality may change after this notebook is published.
If any part of the code stops functioning, please raise an issue on our GitHub repository so it can be fixed.
```

```python
import earthkit.data as ekd
import earthkit.plots as ekp
import numpy as np
from matplotlib import pyplot as plt
```

(section-download)=
### 2. Download data
#### General setup
We will be downloading multiple datasets in this notebook.
In this section, we define the parameters common to all datasets: time and space.
This way, these only need to be changed in one place if you wish to modify the notebook for your own use case.

In this notebook, we will be looking at data for the United Kingdom and Ireland every day in January–September 2024:

```python
request_domain = {
    "area": [60, -12, 48, 4]  # North, West, South, East
}
```
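The `area` key lists the bounding box as North, West, South, East in degrees. A small, hypothetical check like the one below can catch a swapped order before a request is sent. It assumes longitudes between -180 and 180 and does not handle domains that cross the antimeridian.

```python
def check_area(area: list[float]) -> None:
    """Raise ValueError if a [North, West, South, East] box is out of order or out of range."""
    north, west, south, east = area
    if not -90 <= south < north <= 90:
        raise ValueError(f"Expected -90 <= South < North <= 90, got {area}")
    if not -180 <= west < east <= 180:
        raise ValueError(f"Expected -180 <= West < East <= 180, got {area}")


def area_contains(area: list[float], lat: float, lon: float) -> bool:
    """Return True if the point (lat, lon) lies inside the [North, West, South, East] box."""
    north, west, south, east = area
    return south <= lat <= north and west <= lon <= east


uk_ireland = [60, -12, 48, 4]
check_area(uk_ireland)                          # passes silently
print(area_contains(uk_ireland, 53.35, -6.26))  # Dublin: True
print(area_contains(uk_ireland, 40.42, -3.70))  # Madrid: False
```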

```python
request_time = {
    "year": "2024",
    "month": [f"{mo:02}" for mo in range(1, 10)],  # January to September
    "day": [f"{d:02}" for d in range(1, 32)],
}
```
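Because `request_time` asks for days 01 to 31 in every month, it includes combinations that do not exist, such as 30 February or 31 April. How the CDS treats these is not covered here; the sketch below only counts the real dates in the request, which can help when checking that a downloaded dataset has the expected number of daily fields. The helper `valid_dates` is hypothetical.

```python
import calendar
from datetime import date


def valid_dates(year: str, months: list[str], days: list[str]) -> list[date]:
    """Return only the year/month/day combinations that exist in the calendar."""
    dates = []
    for month in months:
        days_in_month = calendar.monthrange(int(year), int(month))[1]
        dates.extend(date(int(year), int(month), int(day))
                     for day in days if int(day) <= days_in_month)
    return dates


# Same values as in request_time
dates = valid_dates("2024",
                    [f"{mo:02}" for mo in range(1, 10)],
                    [f"{d:02}" for d in range(1, 32)])
print(len(dates), dates[0], dates[-1])  # 274 2024-01-01 2024-09-30
```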

We can define a helper function that adds the time and domain parameters, as well as a dictionary of parameters specific to one dataset (e.g. AgERA5, ERA5-Land), to a number of requests:

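```python
def make_full_request(request_dataset: dict, *requests: dict) -> list[dict]:
    """Combine the shared time and domain parameters with dataset- and variable-specific ones.

    Later dictionaries take precedence when keys overlap, so a variable-specific
    request can override the shared settings.
    """
    base_request = request_time | request_domain | request_dataset
    return [base_request | req for req in requests]
```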
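`make_full_request` relies on the dictionary union operator `|` (Python 3.9 or later), where keys in the right-hand dictionary overwrite those on the left. A variable-specific request can therefore override any shared setting. The values below are hypothetical and use separate names so that they do not overwrite the notebook's own `request_time` or `request_domain`.

```python
time_part = {"year": "2024", "month": ["01", "02"]}
domain_part = {"area": [60, -12, 48, 4]}
dataset_part = {"version": "2_0"}
variable_part = {"variable": "2m_temperature", "month": ["03"]}  # overrides "month"

merged = time_part | domain_part | dataset_part | variable_part
print(merged)
# {'year': '2024', 'month': ['03'], 'area': [60, -12, 48, 4], 'version': '2_0', 'variable': '2m_temperature'}
```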

#### AgERA5
We now define parameters unique to AgERA5:

```python
agera5_ID = "sis-agrometeorological-indicators"

request_agera5 = {
    "version": "2_0",
}
```

Next, we specify the variables of interest:

```python
# Daily total; no statistic is specified
request_irradiation = {
    "variable": "solar_radiation_flux",
}

# Each 2 m temperature statistic is requested separately because of CDS request size limits
request_temperature_min = {
    "variable": "2m_temperature",
    "statistic": ["24_hour_minimum"],
}

request_temperature_max = {
    "variable": "2m_temperature",
    "statistic": ["24_hour_maximum"],
}

request_temperature_mean = {
    "variable": "2m_temperature",
    "statistic": ["24_hour_mean"],
}

request_vapour_pressure = {
    "variable": "vapour_pressure",
    "statistic": ["24_hour_mean"],
}

# Daily total; no statistic is specified
request_rain = {
    "variable": "precipitation_flux",
}

request_wind = {
    "variable": "10m_wind_speed",
    "statistic": ["24_hour_mean"],
}

request_snow = {
    "variable": "snow_thickness",
    "statistic": ["24_hour_mean"],
}
```

The variable-specific requests are combined with the dataset-specific, time, and domain parameters and passed to earthkit for download from the CDS:

```python
requests_agera5_combined = make_full_request(request_agera5,
                                             request_irradiation,
                                             request_temperature_min, request_temperature_max, request_temperature_mean,
                                             request_vapour_pressure,
                                             request_rain,
                                             request_wind,
                                             # request_snow,  # left out because of plotting errors; add back later
                                            )

ds_agera5 = ekd.from_source("cds", agera5_ID, *requests_agera5_combined)
```

As a check, we can inspect the downloaded dataset:

```python
ds_agera5.to_xarray()
```

#### ERA5-Land
We now define parameters unique to ERA5-Land and the variables of interest:

```python
era5land_ID = "derived-era5-land-daily-statistics"

request_era5land = {
    "time_zone": "utc+00:00",
    "frequency": "1_hourly",
}
```

```python
# Solar radiation flux: not available

# This catalogue entry names the statistic key "daily_statistic" (AgERA5 uses "statistic")
request_temperature_min = {
    "variable": "2m_temperature",
    "daily_statistic": "daily_minimum",
}

request_temperature_max = {
    "variable": "2m_temperature",
    "daily_statistic": "daily_maximum",
}

request_temperature_mean = {
    "variable": "2m_temperature",
    "daily_statistic": "daily_mean",
}

# Vapour pressure: not available

# 10 m wind is requested as separate u and v components
request_wind_u = {
    "variable": "10m_u_component_of_wind",
    "daily_statistic": "daily_mean",
}

request_wind_v = {
    "variable": "10m_v_component_of_wind",
    "daily_statistic": "daily_mean",
}

request_snow = {
    "variable": "snow_depth",
    "daily_statistic": "daily_mean",
}
```
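ERA5-Land wind is requested here as daily-mean u and v components, whereas the AgERA5 request asks for a 24 h mean wind speed. Combining daily-mean components gives the speed of the mean wind vector, which can never exceed, and may be much lower than, the mean of the hourly speeds. The hypothetical hourly values below show the extreme case of a wind that reverses direction halfway through the day. Whether AgERA5 averages hourly speeds in this way is an assumption worth checking in its documentation; the function names are hypothetical.

```python
import math


def vector_mean_speed(u: list[float], v: list[float]) -> float:
    """Speed of the mean wind vector: sqrt(mean(u)**2 + mean(v)**2)."""
    return math.hypot(sum(u) / len(u), sum(v) / len(v))


def scalar_mean_speed(u: list[float], v: list[float]) -> float:
    """Mean of the instantaneous wind speeds: mean(sqrt(u**2 + v**2))."""
    return sum(math.hypot(a, b) for a, b in zip(u, v)) / len(u)


# Hypothetical hourly components (m/s): 3 m/s from one direction, then 3 m/s from the opposite one
u_hourly = [3.0] * 12 + [-3.0] * 12
v_hourly = [0.0] * 24
print(vector_mean_speed(u_hourly, v_hourly))  # 0.0
print(scalar_mean_speed(u_hourly, v_hourly))  # 3.0
```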

```python
requests_era5land_combined = make_full_request(request_era5land,
                                               request_temperature_min, request_temperature_max, request_temperature_mean,
                                               request_wind_u, request_wind_v,
                                               request_snow,
                                              )
```
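Vapour pressure is marked as not available for ERA5-Land above, but it can be estimated from the 2 m dewpoint temperature. A minimal sketch, assuming a `2m_dewpoint_temperature` variable is added to the ERA5-Land requests and is delivered in kelvin (both are assumptions to check against the catalogue entry). It uses the Tetens-type saturation vapour pressure formula from FAO-56 evaluated at the dewpoint; the result is in hPa, which should be checked against the unit of the AgERA5 vapour pressure before comparing the two.

```python
import math


def vapour_pressure_from_dewpoint(dewpoint_k: float) -> float:
    """Actual vapour pressure in hPa from 2 m dewpoint temperature in kelvin.

    Saturation vapour pressure (kPa) at the dewpoint, Tetens-type formula as in FAO-56,
    then converted from kPa to hPa.
    """
    dewpoint_c = dewpoint_k - 273.15
    saturation_kpa = 0.6108 * math.exp(17.27 * dewpoint_c / (dewpoint_c + 237.3))
    return 10.0 * saturation_kpa


print(round(vapour_pressure_from_dewpoint(283.15), 2))  # dewpoint 10 °C: about 12.28 hPa
```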


```python
ds_era5land = ekd.from_source("cds", era5land_ID, *requests_era5land_combined)
```

```python
ds_era5land.to_xarray()
```

(section-4)=
### 4. Selecting and plotting data

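For reference, the cells below show how to select AgERA5 fields by date and by variable and plot them on a map of the study domain.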

```python
domain = ekp.geo.domains.union(["United Kingdom", "Ireland"], name="UK & Ireland")
```

```python
# All fields for 5 May 2024
oneday = ds_agera5.sel(date=20240505)
```
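The selection above uses an integer date in YYYYMMDD form. To build such values for a longer period, a small helper like the hypothetical one below converts Python dates; whether `.sel()` accepts a list of these values for a multi-day selection should be checked in the earthkit-data documentation.

```python
from datetime import date, timedelta


def to_yyyymmdd(day: date) -> int:
    """Convert a date to the integer YYYYMMDD form used with .sel(date=...)."""
    return int(day.strftime("%Y%m%d"))


def yyyymmdd_range(start: date, end: date) -> list[int]:
    """Return YYYYMMDD integers for every day from start to end, inclusive."""
    n_days = (end - start).days
    return [to_yyyymmdd(start + timedelta(days=i)) for i in range(n_days + 1)]


week = yyyymmdd_range(date(2024, 5, 5), date(2024, 5, 11))
print(week[0], week[-1], len(week))  # 20240505 20240511 7
```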

```python
ekp.quickplot(oneday, domain=domain, units="celsius")
```

```python
# Maximum 2 m temperature on 5 May 2024, plotted in degrees Celsius
temperature = ds_agera5.sel(variable="Temperature_Air_2m_Max_24h")
chart = ekp.Map(domain=domain)
chart.contourf(temperature.sel(date=20240505), units="celsius", levels={"step": 2})
chart.land()
chart.coastlines()
chart.gridlines()
chart.legend()
chart.title()
chart.show()
```

```python
# Solar radiation flux on 5 May 2024
rad = ds_agera5.sel(variable="Solar_Radiation_Flux")
chart = ekp.Map(domain=domain)
chart.contourf(rad.sel(date=20240505))
chart.land()
chart.coastlines()
chart.gridlines()
chart.legend()
chart.title()
chart.show()
```

(section-results)=
### 5. Results

#### Results Subsections
Describe what is done in this step/section and what the `code` in the cell does (if code is included). 

If this is the **results section**, we expect the final plots to be created here with a description of how to interpret them, and what information can be extracted for the specific use case and user question. The information in the 'quality assessment statement' should be derived here. 
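As one possible starting point for the comparisons in this section, the sketch below computes three commonly used scores for paired daily values from two sources (for example AgERA5 and ERA5-Land at the same grid cell, or a reanalysis against station observations): mean bias, root-mean-square error (RMSE) and the Pearson correlation coefficient. The function name and the choice of scores are assumptions, not the final method of this assessment.

```python
import math


def comparison_scores(reference: list[float], candidate: list[float]) -> dict[str, float]:
    """Mean bias, RMSE and Pearson correlation of candidate values against reference values.

    Both sequences must be paired (same dates, same location) and free of missing values.
    """
    n = len(reference)
    if n != len(candidate) or n < 2:
        raise ValueError("Need two paired sequences with at least two values each")
    diffs = [c - r for r, c in zip(reference, candidate)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_r = sum(reference) / n
    mean_c = sum(candidate) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, candidate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_c = sum((c - mean_c) ** 2 for c in candidate)
    # Correlation is undefined if either series is constant
    corr = cov / math.sqrt(var_r * var_c) if var_r > 0 and var_c > 0 else float("nan")
    return {"bias": bias, "rmse": rmse, "pearson_r": corr}


# Hypothetical daily maximum temperatures (°C) at one location
print(comparison_scores([12.0, 14.0, 13.0, 15.0], [13.0, 15.0, 14.0, 16.0]))
# {'bias': 1.0, 'rmse': 1.0, 'pearson_r': 1.0}
```

A constant offset like the one in this example shows up entirely in the bias and RMSE while leaving the correlation at 1, which is why the scores are usually reported together.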

## ℹ️ If you want to know more

### Key resources

List some key resources related to this assessment. E.g. CDS entries, applications, dataset documentation, external pages.
Also list any code libraries used (if applicable).

Code libraries used:
* [earthkit](https://github.com/ecmwf/earthkit)
  * [earthkit-data](https://github.com/ecmwf/earthkit-data)
  * [earthkit-plots](https://github.com/ecmwf/earthkit-plots)

### References
[[CDS AgERA5]](https://doi.org/10.24381/cds.6c68c9bb) Boogaard, H., Schubert, J., De Wit, A., Lazebnik, J., Hutjes, R., Van der Grijn, G. (2020): Agrometeorological indicators from 1979 to present derived from reanalysis. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.6c68c9bb (Accessed on DD-MMM-YYYY)