# Validation and intercomparison of AgERA5 and other reanalysis datasets for agricultural applications

Production date: DD-MM-2025

**Please note that this repository is used for development and review, so quality assessments should be considered work in progress until they are merged into the main branch.**

Dataset version: 2.0.

Produced by: C3S2_521 contract.

## 🌍 Use case: Agricultural yield estimation and prediction based on reanalysis data

## ❓ Quality assessment question
* How do reanalysis datasets compare to observations, and to each other, for agriculturally relevant variables?

A very short introduction before the assessment statement, describing the approach taken to answer the user question. If the assessment summarises literature, one or two key references may be useful. These can be cited directly in the text or with numerical labels (also listed at the end); for example, `[[1]](https://doi.org/10.1038/s41598-018-20628-2)` gives: [[1]](https://doi.org/10.1038/s41598-018-20628-2).

[[CDS AgERA5]](https://doi.org/10.24381/cds.6c68c9bb).

## 📢 Quality assessment statement

```{admonition} These are the key outcomes of this assessment
:class: note
* Finding 1
* Finding 2
* Finding 3
* etc
```

## 📋 Methodology

**Agrometeorological indicators from 1979 to present derived from reanalysis** (*AgERA5*; [doi 10.24381/cds.6c68c9bb](https://doi.org/10.24381/cds.6c68c9bb)).

A ‘free text’ introduction to the data analysis steps or a description of the literature synthesis, with a justification of the approach taken, and limitations mentioned. **Mention which CDS catalogue entry is used, including a link, and also any other entries used for the assessment**.

---
Variables of interest for a crop growth simulator such as [PCSE/WOFOST](https://github.com/ajwdewit/pcse):

| Variable name | Statistics | Unit | Example assessment |
|---------------|------------|------|--------------------|
| Solar irradiation | 24 h total | J/m²/day | example |
| 2 m temperature | 24 h min | °C | example |
| | 24 h max | | |
| | 24 h mean (optional) | | |
| Vapour pressure | 24 h mean | hPa | example |
| Rain | 24 h total | cm/day | example |
| 2 m wind speed | 24 h mean | m/s | example |
| Snow depth | ??? | cm | example |

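[Source](https://pcse.readthedocs.io/en/stable/code.html#pcse.base.WeatherDataContainer)

E0, ES0 and ET0 (potential evaporation from open water and from bare soil, and reference crop evapotranspiration) are taken from an evapotranspiration calculation.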
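Not every variable is delivered in the units or at the height that PCSE expects. The sketch below shows three conversions, assuming the reanalysis provides temperature in K, precipitation in mm/day and wind speed at 10 m (check the `units` attributes of the downloaded files). The 10 m to 2 m wind conversion uses the logarithmic wind profile from FAO Irrigation and Drainage Paper 56 (eq. 47), which was derived for measurements above short grass; it is one common choice, not necessarily the one used later in this assessment. The function names are hypothetical helpers.

```python
import numpy as np

# Assumed input units (verify against the downloaded file attributes):
# temperature in K, precipitation in mm/day, wind speed in m/s at 10 m.
KELVIN_OFFSET = 273.15


def kelvin_to_celsius(t_k):
    """Convert temperature from K to °C (PCSE TMIN, TMAX, TEMP)."""
    return t_k - KELVIN_OFFSET


def mm_to_cm_per_day(rain_mm):
    """Convert a daily precipitation total from mm/day to cm/day (PCSE RAIN)."""
    return rain_mm / 10.0


def wind_to_2m(wind_z, z=10.0):
    """Scale wind speed measured at height z (m) to 2 m (PCSE WIND).

    Uses the FAO-56 logarithmic wind profile (Allen et al., 1998, eq. 47):
    u2 = uz * 4.87 / ln(67.8 z - 5.42).
    """
    return wind_z * 4.87 / np.log(67.8 * z - 5.42)


print(round(float(wind_to_2m(5.0)), 2))  # 3.74
```

Because these are plain arithmetic operations, the same functions can be applied to xarray DataArrays as well as to scalars.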

---

* These headings can be specific to the quality assessment, and help guide the user through the ‘story’ of the assessment. This means we cannot pre-define the sections and headings here, as they will be different for each assessment.
* Sub-bullets could be used to outline what will be done/shown/discussed in each section
* The list below is just an example, or may need more or fewer sections, with different headings

The analysis and results are organised in the following steps, which are detailed in the sections below:

**[](section-setup)**

**[](section-download)**
 * AgERA5, ERA5-Land, E-OBS, ...

**[](section-test)**
 * Sub-steps or key points listed in bullet below. No strict requirement to match and link to sub-headings.
 
**[](section-results)** 
 * Sub-steps or key points listed in bullet below. No strict requirement to match and link to sub-headings.

Any further notes on the method could go here (explanations, caveats or limitations).

## 📈 Analysis and results

(section-setup)=
### 1. Code setup

#### Import required packages
```{note}
This notebook uses [earthkit](https://github.com/ecmwf/earthkit) for 
downloading ([earthkit-data](https://github.com/ecmwf/earthkit-data)) 
and 
visualising ([earthkit-plots](https://github.com/ecmwf/earthkit-plots)) data.
Because earthkit is in active development, some functionality may change after this notebook is published.
If any part of the code stops functioning, please raise an issue on our GitHub repository so it can be fixed.
```

In [None]:
import earthkit.data as ekd
import earthkit.plots as ekp
import xarray as xr
from matplotlib import pyplot as plt

(section-download)=
### 2. Download data
#### General setup
This notebook uses [earthkit-data](https://github.com/ecmwf/earthkit-data) to download files from the CDS.
If you intend to run this notebook multiple times, it is highly recommended that you [enable caching](https://earthkit-data.readthedocs.io/en/latest/guide/caching.html) to avoid downloading the same files repeatedly.

We will be downloading multiple datasets in this notebook.
In this section, we define the parameters common to all datasets: time and space.
This way, these only need to be changed in one place if you wish to modify the notebook for your own use case.

In this notebook, we look at daily data for the United Kingdom and Ireland from January to September 2024:

In [None]:
request_domain = {
    "area": [60, -12, 48, 4]  # North, West, South, East
}
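The same `[North, West, South, East]` box can also be used to crop data after download, for example to make sure that all datasets cover exactly the same region before they are compared. A minimal sketch, assuming coordinates named `lat` and `lon` with ascending longitudes in the range -180 to 180; other datasets may use `latitude` and `longitude`, so the names are parameters. `subset_to_area` is a hypothetical helper, not part of earthkit.

```python
import numpy as np
import xarray as xr


def subset_to_area(data, area, lat="lat", lon="lon"):
    """Crop an xarray object to a CDS-style area [North, West, South, East].

    Label-based slicing follows the stored order of the coordinate, so the
    latitude slice is flipped when latitudes run from north to south.
    Longitudes are assumed to be ascending and in the range -180 to 180.
    """
    north, west, south, east = area
    lats = data[lat]
    if float(lats[0]) > float(lats[-1]):  # latitudes stored north to south
        lat_slice = slice(north, south)
    else:  # latitudes stored south to north
        lat_slice = slice(south, north)
    return data.sel({lat: lat_slice, lon: slice(west, east)})


# Synthetic 1-degree grid with latitudes stored north to south
grid = xr.DataArray(
    np.zeros((16, 21)),
    coords={"lat": np.arange(62, 46, -1), "lon": np.arange(-14, 7)},
    dims=("lat", "lon"),
)
print(subset_to_area(grid, [60, -12, 48, 4]).shape)  # (13, 17)
```

With the notebook's own variables this would read `subset_to_area(data_agera5, request_domain["area"])`.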

In [None]:
request_time = {
    "year": "2024",
    "month": [f"{mo:02}" for mo in range(1, 10)],
    "day": [f"{d:02}" for d in range(1, 32)],
}
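The request lists days 1 to 31 for every month, so it also contains dates that do not exist, such as 30 February; we assume the CDS simply skips these. Knowing how many daily time steps to expect is a cheap sanity check on each download. A small sketch using Python's standard `calendar` module (`count_valid_days` is a hypothetical helper):

```python
import calendar


def count_valid_days(year: int, months: list[int]) -> int:
    """Return the number of calendar days in the given months of one year."""
    # calendar.monthrange returns (weekday of the 1st, number of days in month)
    return sum(calendar.monthrange(year, month)[1] for month in months)


expected_days = count_valid_days(2024, list(range(1, 10)))
print(expected_days)  # 274 (2024 is a leap year)
```

After downloading, this number can be compared with the length of the time dimension, e.g. `data_agera5.sizes["time"]`.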

We can define a helper function that adds the time and domain parameters, as well as a dictionary of parameters specific to one dataset (e.g. AgERA5, ERA5-Land), to a number of requests:

In [None]:
def make_full_request(request_dataset: dict, *requests: dict) -> list[dict]:
    """Combine the shared time and domain parameters with dataset- and variable-specific parameters."""
    base_request = request_time | request_domain | request_dataset
    updated_requests = [base_request | req for req in requests]
    return updated_requests
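The helper relies on the dictionary union operator `|` (Python 3.9 or later): it returns a new dictionary, and when a key appears on both sides, the right-hand value wins. Because the variable-specific requests are merged last, they could override a shared setting such as the area. A small illustration with hypothetical values:

```python
# Hypothetical request fragments, mirroring the structure used in this notebook
base = {"year": "2024", "area": [60, -12, 48, 4]}
dataset = {"version": "2_0"}
variable = {"variable": "2m_temperature", "area": [55, -8, 51, -5]}

merged = base | dataset | variable

print(merged["area"])     # [55, -8, 51, -5]  (the right-most value wins)
print(merged["version"])  # 2_0
print(base["area"])       # [60, -12, 48, 4]  (the inputs are not modified)
```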

#### AgERA5
We now define parameters unique to AgERA5:

In [None]:
agera5_ID = "sis-agrometeorological-indicators"

request_agera5 = {
    "version": "2_0",
}

Next, we specify the variables of interest:

In [None]:
request_irradiation = {
    "variable": "solar_radiation_flux",
}

# Temperature has to be split into separate requests because of size limits
request_temperature_min = {
    "variable": "2m_temperature",
    "statistic": ["24_hour_minimum"],
}

request_temperature_max = {
    "variable": "2m_temperature",
    "statistic": ["24_hour_maximum"],
}

request_temperature_mean = {
    "variable": "2m_temperature",
    "statistic": ["24_hour_mean"],
}

request_vapour_pressure = {
    "variable": "vapour_pressure",
    "statistic": ["24_hour_mean"],
}

request_rain = {
    "variable": "precipitation_flux",
}

request_wind = {
    "variable": "10m_wind_speed",
    "statistic": ["24_hour_mean"],
}

request_snow = {
    "variable": "snow_thickness",
    "statistic": ["24_hour_mean"],
}

The requests for specific variables are combined with the dataset-specific, time, and domain parameters and passed to earthkit for download from the CDS:

In [None]:
requests_agera5_combined = make_full_request(request_agera5,
                                             request_irradiation,
                                             request_temperature_min, request_temperature_max, request_temperature_mean,
                                             request_vapour_pressure,
                                             request_rain,
                                             request_wind,
                                             request_snow,
                                            )

ds_agera5 = ekd.from_source("cds", agera5_ID, *requests_agera5_combined)

Earthkit-data downloads the dataset as a field list, which can be manipulated directly.
Here, we convert it to an Xarray object for ease of use later (when comparing multiple datasets):

In [None]:
print("AgERA5 data type from earthkit-data:", type(ds_agera5))
data_agera5 = ds_agera5.to_xarray(compat="equals")
print("AgERA5 data type in Xarray:", type(data_agera5))
data_agera5

#### ERA5-Land
We now define parameters unique to ERA5-Land and the variables of interest:

In [None]:
era5land_ID = "derived-era5-land-daily-statistics"

request_era5land = {
    "time_zone": "utc+00:00",
    "frequency": "1_hourly",
}

In [None]:
# solar_radiation_flux : Not available - get from reanalysis-era5-land

# Temperature has to be split into separate requests because of size limits
request_temperature_min = {
    "variable": "2m_temperature",
    "daily_statistic": ["daily_minimum"],
}

request_temperature_max = {
    "variable": "2m_temperature",
    "daily_statistic": ["daily_maximum"],
}

request_temperature_mean = {
    "variable": "2m_temperature",
    "daily_statistic": ["daily_mean"],
}

# request_vapour_pressure : Not available
# request_rain : Not available - get from reanalysis-era5-land

request_wind_u = {
    "variable": "10m_u_component_of_wind",
    "daily_statistic": ["daily_mean"],
}

request_wind_v = {
    "variable": "10m_v_component_of_wind",
    "daily_statistic": ["daily_mean"],
}

request_snow = {
    "variable": "snow_depth",
    "daily_statistic": ["daily_mean"],
}
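ERA5-Land does not provide vapour pressure directly, but it can be derived from the 2 m dewpoint temperature, since the actual vapour pressure equals the saturation vapour pressure at the dewpoint. The sketch below assumes that a daily-mean 2 m dewpoint temperature field in K is downloaded in addition to the variables above (it is not part of the requests in this notebook), and uses the Tetens formula with the constants for saturation over water given in the ECMWF IFS documentation. Because the formula is non-linear, applying it to a daily-mean dewpoint only approximates the daily mean vapour pressure. `vapour_pressure_from_dewpoint` is a hypothetical helper.

```python
import numpy as np

# Tetens formula constants for saturation over water, as given in the
# ECMWF IFS documentation (Part IV: Physical Processes)
A1 = 611.21  # Pa
A3 = 17.502
A4 = 32.19   # K
T0 = 273.16  # K


def vapour_pressure_from_dewpoint(dewpoint_k):
    """Return actual vapour pressure (hPa) from dewpoint temperature (K).

    Works element-wise on floats, NumPy arrays and xarray objects.
    """
    e_pa = A1 * np.exp(A3 * (dewpoint_k - T0) / (dewpoint_k - A4))
    return e_pa / 100.0  # Pa to hPa


print(round(float(vapour_pressure_from_dewpoint(293.15)), 1))  # 23.4
```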

The ERA5-Land dataset is structured differently from AgERA5 and requires more pre-processing before the two can be intercompared.
For this reason, we download the different variables separately.

In [None]:
requests_era5land_combined = make_full_request(request_era5land,
                                               request_temperature_min, request_temperature_max, request_temperature_mean,
                                               request_wind_u, request_wind_v,
                                               request_snow
                                              )

ds_era5land = [ekd.from_source("cds", era5land_ID, req) for req in requests_era5land_combined]
data_era5land = [ds.to_xarray() for ds in ds_era5land]
data_era5land_temperature_min, data_era5land_temperature_max, data_era5land_temperature_mean, data_era5land_wind_u, data_era5land_wind_v, data_era5land_snow = data_era5land

All three temperature statistics are downloaded under the same variable name, `t2m`.
They need to be renamed to match AgERA5 before they can be merged into one dataset:

In [None]:
data_era5land_temperature_min = data_era5land_temperature_min.rename({"t2m": "Temperature_Air_2m_Min_24h"})
data_era5land_temperature_max = data_era5land_temperature_max.rename({"t2m": "Temperature_Air_2m_Max_24h"})
data_era5land_temperature_mean = data_era5land_temperature_mean.rename({"t2m": "Temperature_Air_2m_Mean_24h"})
data_era5land_temperature = xr.merge([data_era5land_temperature_min, data_era5land_temperature_max, data_era5land_temperature_mean], compat="equals")
data_era5land_temperature

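The 10 m wind speed is calculated from the two variables representing its U (east–west) and V (north–south) components.
Because these are daily means of the components, the result is the speed of the daily-mean wind vector, which can be lower than the daily mean of the hourly wind speeds when the wind direction changes during the day: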

In [None]:
data_era5land_wind = xr.merge([data_era5land_wind_u, data_era5land_wind_v], compat="equals")
data_era5land_wind = data_era5land_wind.assign(
    # Wind speed is the magnitude of the (u, v) vector
    {"Wind_Speed_10m_Mean_24h": (data_era5land_wind["u10"]**2 + data_era5land_wind["v10"]**2) ** 0.5}
)
data_era5land_wind
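The difference between the two definitions of daily mean wind speed can be large when the wind direction changes during the day. A synthetic example with hypothetical hourly values:

```python
import numpy as np

# Hypothetical hourly 10 m wind components (m/s) for one grid cell and one day:
# 12 hours of westerly wind (u > 0) followed by 12 hours of easterly wind (u < 0)
u_hourly = np.array([5.0] * 12 + [-5.0] * 12)
v_hourly = np.zeros(24)

# Daily mean of the hourly wind speeds
mean_of_speeds = np.mean(np.hypot(u_hourly, v_hourly))

# Speed of the daily-mean wind vector, i.e. what is computed from daily-mean u10 and v10
speed_of_means = np.hypot(u_hourly.mean(), v_hourly.mean())

print(mean_of_speeds, speed_of_means)  # 5.0 0.0
```

By the triangle inequality, the speed of the mean vector is never larger than the mean speed, so wind speeds derived from daily-mean components may be biased low relative to a daily mean of hourly speeds.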

Lastly, we rename the snow depth parameter to match AgERA5 (precipitation is not available in this dataset):

In [None]:
data_era5land_snow = data_era5land_snow.rename({"sde": "Snow_Thickness_Mean_24h"})
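Matching the variable name does not by itself make the values comparable. We assume that ERA5-Land snow depth (`sde`) is given in metres, while AgERA5 snow thickness and the PCSE snow depth in the table above are in cm; please confirm this against the `units` attributes of the downloaded files. A minimal conversion sketch (`snow_depth_m_to_cm` is a hypothetical helper):

```python
import numpy as np
import xarray as xr


def snow_depth_m_to_cm(ds: xr.Dataset, name: str = "Snow_Thickness_Mean_24h") -> xr.Dataset:
    """Return a copy of ds with the snow depth variable converted from m to cm."""
    converted = ds[name] * 100.0
    # Set the attributes explicitly so the units label matches the new values
    converted.attrs = {**ds[name].attrs, "units": "cm"}
    return ds.assign({name: converted})


# Synthetic example: two grid cells with 0 m and 0.25 m of snow
example = xr.Dataset({"Snow_Thickness_Mean_24h": ("cell", np.array([0.0, 0.25]), {"units": "m"})})
print(snow_depth_m_to_cm(example)["Snow_Thickness_Mean_24h"].values)
```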

Now we can combine the pre-processed variables into a single Xarray dataset.
We also rename the `valid_time` coordinate to match AgERA5.

In [None]:
data_era5land = xr.merge([data_era5land_temperature, data_era5land_wind, data_era5land_snow], compat="equals")
data_era5land = data_era5land.rename({"valid_time": "time"})
data_era5land
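Once both datasets share variable names and a time coordinate, they can be compared point by point. A minimal sketch of two common intercomparison metrics, mean bias and root-mean-square error (RMSE), assuming both inputs are DataArrays on identical coordinates; `bias_and_rmse` is a hypothetical helper, and differing grids would need regridding first.

```python
import numpy as np
import xarray as xr


def bias_and_rmse(reference: xr.DataArray, other: xr.DataArray) -> tuple[float, float]:
    """Return the mean bias (other - reference) and the RMSE.

    Both inputs are aligned with an inner join, so only points with identical
    coordinate values (e.g. time, lat, lon) are compared. Grids whose
    coordinates differ, even by floating-point noise, must be regridded first.
    Missing values (NaN) are skipped in the means.
    """
    reference, other = xr.align(reference, other, join="inner")
    if reference.size == 0:
        raise ValueError("No overlapping coordinates; regrid or rename first.")
    diff = other - reference
    bias = float(diff.mean())
    rmse = float(np.sqrt((diff**2).mean()))
    return bias, rmse


# Synthetic example: 'other' is 1 degree warmer and has one extra day
ref = xr.DataArray([10.0, 12.0, 14.0], coords={"time": [1, 2, 3]}, dims="time")
oth = xr.DataArray([11.0, 13.0, 15.0, 20.0], coords={"time": [1, 2, 3, 4]}, dims="time")
print(bias_and_rmse(ref, oth))  # (1.0, 1.0)
```

For the notebook's data this could be called as `bias_and_rmse(data_agera5["Temperature_Air_2m_Min_24h"], data_era5land["Temperature_Air_2m_Min_24h"])`, although the latitude/longitude coordinate names (and possibly the exact grid points) of the two datasets may first need to be harmonised.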

#### E-OBS
We now define parameters unique to E-OBS and the variables of interest:

(section-test)=
### 3. Plotting


In [None]:
domain = ekp.geo.domains.union(["United Kingdom", "Ireland"], name="UK & Ireland")

In [None]:
agera5_oneday = data_agera5.sel(time="20240505")
agera5_oneday

In [None]:
era5land_oneday = data_era5land.sel(time="20240505")
era5land_oneday

In [None]:
ekp.quickplot(agera5_oneday, domain=domain, units="celsius")
ekp.quickplot(era5land_oneday, domain=domain, units="celsius")

In [None]:
chart = ekp.Map(domain=domain)
agera5_oneday = data_agera5.sel(time="20240505")
chart.imshow(agera5_oneday, z="Temperature_Air_2m_Min_24h")
chart.land()
chart.coastlines()
chart.gridlines()
chart.legend()
chart.title()
chart.show()

In [None]:
chart = ekp.Map(domain=domain)
# Select from the full dataset; era5land_oneday was already reduced to a single day above
era5land_oneday = data_era5land.sel(time="20240505")
chart.contourf(era5land_oneday, z="Temperature_Air_2m_Mean_24h", units="celsius", levels={"step": 2})
chart.land()
chart.coastlines()
chart.gridlines()
chart.legend()
chart.title()
chart.show()

(section-results)=
### 4. Results

#### Results Subsections
Describe what is done in this step/section and what the `code` in the cell does (if code is included). 

If this is the **results section**, we expect the final plots to be created here with a description of how to interpret them, and what information can be extracted for the specific use case and user question. The information in the 'quality assessment statement' should be derived here. 

## ℹ️ If you want to know more

### Key resources

List some key resources related to this assessment. E.g. CDS entries, applications, dataset documentation, external pages.
Also list any code libraries used (if applicable).

Code libraries used:
* Earthkit
  * [earthkit-data](https://github.com/ecmwf/earthkit-data)
  * [earthkit-plots](https://github.com/ecmwf/earthkit-plots)

### References
[[CDS AgERA5]](https://doi.org/10.24381/cds.6c68c9bb) Boogaard, H., Schubert, J., De Wit, A., Lazebnik, J., Hutjes, R., Van der Grijn, G. (2020): Agrometeorological indicators from 1979 to present derived from reanalysis. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.6c68c9bb (Accessed on DD-MMM-YYYY)