# ACDI assessment

## Task 2: ERA5 Land Data Download via CDS API
**Estimated Time:** ~2 hours

### Problem Description
Set up a Python function to download daily ERA5 Land air temperature data from the Copernicus Data Store (https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land-monthly-means) using the CDS API for a user-selected domain (latitude and longitude range) and time (month and year). You will have to register at the CDS website to obtain API access credentials. Download the data and describe its format and structure.

## Solution

ERA5-Land data can be downloaded from the Copernicus Data Store (CDS) using two approaches:

1. **`earthkit.data`**: A high-level ECMWF library that wraps the CDS API. It handles authentication, request submission, caching, and format conversion automatically. Data is returned as an `earthkit` source object that can be converted directly to an `xarray.Dataset` via `.to_xarray()`.

2. **`cdsapi`**: The official low-level CDS API client. Requests are submitted as plain Python dictionaries and the response is downloaded to a local file (e.g. NetCDF or GRIB), which must then be opened manually with a library such as `xarray`.

Both approaches require a valid CDS API key configured in `~/.cdsapirc`.
I load the data with `earthkit.data` first because it is my preferred route, then repeat the download with `cdsapi`, since that is what the task asks for.
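
Before submitting either request, it can help to confirm that the credentials file is present and complete. The sketch below assumes the usual two-line `url:` / `key:` layout of `~/.cdsapirc`. The helper name `read_cdsapirc` is hypothetical and not part of `cdsapi` or `earthkit`.

```python
from pathlib import Path


def read_cdsapirc(path=None):
    """Parse a cdsapi-style config file into a dict (hypothetical helper).

    Assumes the file holds `name: value` lines such as `url: ...` and `key: ...`.
    """
    rc_path = Path(path) if path is not None else Path.home() / ".cdsapirc"
    if not rc_path.is_file():
        raise FileNotFoundError(f"No CDS credentials file at {rc_path}")

    settings = {}
    for line in rc_path.read_text().splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, because the URL itself contains one.
        name, value = line.split(":", 1)
        settings[name.strip()] = value.strip()

    missing = {"url", "key"} - settings.keys()
    if missing:
        raise ValueError(f"{rc_path} is missing entries: {sorted(missing)}")
    return settings
```

Calling `read_cdsapirc()` with no argument checks the default location. Passing a path makes it easy to test against a throwaway file.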

In [None]:
import xarray as xr
import earthkit.data as ek

ds = ek.from_source(
    "cds",
    "reanalysis-era5-land-monthly-means",
    product_type="monthly_averaged_reanalysis",
    variable="2m_temperature",
    year="2025",
    month=["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"],
    time="00:00",
    data_format="netcdf",
    area=[-22, 16, -35, 33],
)

2026-02-18 12:17:20,287 INFO Request ID is f2470d35-ef06-403b-b937-3d7d482ce151
2026-02-18 12:17:20,561 INFO status has been updated to accepted
2026-02-18 12:17:27,733 INFO status has been updated to running
2026-02-18 12:17:37,999 INFO status has been updated to successful


ee9d41137a1b1b86d32d59c4c9acafed.zip:   0%|          | 0.00/368k [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

In [5]:
ds.to_xarray()

Output (Dask array summaries recovered from the `xarray` HTML repr):

| Bytes | Shape | Chunk shape | Dask graph | Data type |
|---|---|---|---|---|
| 192 B | (12,) | (12,) | 1 chunk in 2 graph layers | (not shown) |
| 1.03 MiB | (12, 131, 171) | (12, 131, 171) | 1 chunk in 2 graph layers | float32 numpy.ndarray |


In [12]:
import cdsapi

dataset = "reanalysis-era5-land-monthly-means"
request = {
    "product_type": ["monthly_averaged_reanalysis"],
    "variable": ["2m_temperature"],
    "year": ["2025"],
    "month": [
        "01", "02", "03",
        "04", "05", "06",
        "07", "08", "09",
        "10", "11", "12"
    ],
    "time": "00:00",
    "data_format": "netcdf",
    "download_format": "unarchived",
    "area": [-22, 16, -35, 33]
}

client = cdsapi.Client()
target = 'download.netcdf'
client.retrieve(dataset, request, target)

2026-02-18 12:22:29,219 INFO Request ID is 0dbf288d-159e-42b1-859e-8aa0da60de11
2026-02-18 12:22:29,427 INFO status has been updated to accepted
2026-02-18 12:22:38,519 INFO status has been updated to running
2026-02-18 12:22:43,786 INFO status has been updated to successful


69351f699d918b1fa37fb91900b21c79.nc:   0%|          | 0.00/368k [00:00<?, ?B/s]

'download.netcdf'
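
The task asks for a function that takes a user-selected domain and period, so the request above could be wrapped as follows. This is a minimal sketch: the names `build_era5_land_request` and `download_era5_land` are my own, the request keys mirror the dictionary used above, and `cdsapi` is imported lazily so that the request builder can be checked without credentials.

```python
def build_era5_land_request(north, west, south, east, year, months):
    """Build a CDS request dict for monthly-mean 2 m temperature.

    The CDS `area` key expects [North, West, South, East] in degrees.
    """
    if not (-90 <= south < north <= 90):
        raise ValueError("Latitudes must satisfy -90 <= south < north <= 90")
    if not west < east:
        raise ValueError("Longitudes must satisfy west < east")
    months = [int(m) for m in months]
    if not months or any(m < 1 or m > 12 for m in months):
        raise ValueError("Months must be integers between 1 and 12")

    return {
        "product_type": ["monthly_averaged_reanalysis"],
        "variable": ["2m_temperature"],
        "year": [str(year)],
        "month": [f"{m:02d}" for m in months],
        "time": "00:00",
        "data_format": "netcdf",
        "download_format": "unarchived",
        "area": [north, west, south, east],
    }


def download_era5_land(target, north, west, south, east, year, months):
    """Submit the request with cdsapi and return the path of the saved file."""
    import cdsapi  # lazy import: only needed when a download is actually run

    request = build_era5_land_request(north, west, south, east, year, months)
    client = cdsapi.Client()
    client.retrieve("reanalysis-era5-land-monthly-means", request, target)
    return target
```

For example, `download_era5_land("download.netcdf", -22, 16, -35, 33, 2025, range(1, 13))` would reproduce the request shown above.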

### Reading the downloaded file

The downloaded file is a standard NetCDF4 file. Key options for `xr.open_dataset`:

- `engine="netcdf4"` explicitly selects the NetCDF4 backend, which xarray would normally pick on its own when the `netCDF4` package is installed.
- `mask_and_scale=True` applies any `scale_factor`/`add_offset` packing attributes and replaces fill values with `NaN`.
- `decode_times=True` converts the raw numeric time axis to `datetime64` for easy indexing.

The last two are already the defaults; they are spelled out here for clarity.

For large multi-year or global downloads, add `chunks={"valid_time": 1}` to enable lazy Dask-backed loading. We don't need this here because the file is small.
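
A quick back-of-envelope estimate shows why chunking is unnecessary for this download but becomes useful at larger scales. The sketch assumes 4-byte `float32` values and, for the comparison, a global 0.1° grid of 1801 × 3600 points. The helper `array_size_mib` is hypothetical.

```python
def array_size_mib(shape, bytes_per_value=4):
    """Return the in-memory size of a dense array in MiB (float32 by default)."""
    n_values = 1
    for size in shape:
        n_values *= size
    return n_values * bytes_per_value / 2**20


# This download: 12 months on a 131 x 171 grid.
regional = array_size_mib((12, 131, 171))

# Assumed comparison case: 12 months on a global 0.1 degree grid.
global_year = array_size_mib((12, 1801, 3600))

print(f"Regional subset: {regional:.2f} MiB")
print(f"Global, one year of monthly means: {global_year:.0f} MiB")
```

The regional array is about 1 MiB and fits comfortably in memory. A single variable on the global grid is a few hundred MiB per year of monthly means, which is where lazy loading starts to pay off.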

In [None]:
ds_cds = xr.open_dataset(
    "download.netcdf",
    engine="netcdf4",
    mask_and_scale=True,
    decode_times=True,
)
ds_cds

print("=== Dimensions ===")
print(dict(ds_cds.sizes))  # .sizes gives the name-to-length mapping; dict(ds.dims) is deprecated in recent xarray

print("\n=== Coordinates ===")
for name, coord in ds_cds.coords.items():
    print(f"  {name}: dtype={coord.dtype}, shape={coord.shape}")

print("\n=== Data Variables ===")
for name, var in ds_cds.data_vars.items():
    print(f"  {name}: dtype={var.dtype}, shape={var.shape}, units={var.attrs.get('units','?')}")

print("\n=== Global Attributes ===")
for k, v in ds_cds.attrs.items():
    print(f"  {k}: {v}")

### NetCDF File Structure

The downloaded file is a **CF-1.7 compliant NetCDF4** file produced by ECMWF. Its structure is as follows:

#### Dimensions
| Dimension | Size | Description |
|---|---|---|
| `valid_time` | 12 | One timestep per month (Jan–Dec 2025) |
| `latitude` | 131 | 0.1° grid, −22.0° to −35.0° N |
| `longitude` | 171 | 0.1° grid, 16.0° to 33.0° E |

The **0.1° × 0.1°** spacing (~11 km in latitude) is the regular grid on which the CDS distributes ERA5-Land; the model's native resolution is roughly 9 km.
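
The dimension sizes in the table follow directly from the requested bounds and the 0.1° spacing, since both end points are included. A small check, using a hypothetical helper `grid_points`:

```python
def grid_points(start, stop, step=0.1):
    """Count the points on a regular grid from start to stop, both inclusive."""
    # round() absorbs floating-point error in the division by 0.1.
    return round(abs(stop - start) / step) + 1


n_lat = grid_points(-22.0, -35.0)  # 13 degrees of latitude
n_lon = grid_points(16.0, 33.0)    # 17 degrees of longitude

print(n_lat, n_lon)
```

This gives 131 latitude and 171 longitude points, matching the shape reported by `xarray`.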

#### Data Variable
- **`t2m`** (`float32`, shape `12 × 131 × 171`): Monthly mean 2-metre air temperature in **Kelvin (K)**. Subtract 273.15 to convert to °C.
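
The unit conversion is a constant offset. A minimal sketch on plain Python floats (the function name `kelvin_to_celsius` is hypothetical):

```python
KELVIN_OFFSET = 273.15


def kelvin_to_celsius(values):
    """Convert an iterable of temperatures from Kelvin to degrees Celsius."""
    return [value - KELVIN_OFFSET for value in values]


sample_kelvin = [273.15, 293.15, 300.0]
sample_celsius = kelvin_to_celsius(sample_kelvin)
print([round(t, 2) for t in sample_celsius])
```

With `xarray`, the same subtraction applies element-wise to the whole array, e.g. `ds_cds["t2m"] - 273.15`. Arithmetic on a `DataArray` generally drops its attributes, so the `units` attribute would need to be set to `"degC"` afterwards.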

#### Coordinates
- **`valid_time`**: `datetime64[ns]` timestamps marking the first day of each month.
- **`latitude` / `longitude`**: Regular lat/lon grid in decimal degrees (WGS84).
- **`expver`**: ECMWF experiment version string (internal versioning).
- **`number`**: Ensemble member ID (0 for deterministic reanalysis).

#### Global Attributes
- **`Conventions: CF-1.7`**: File follows the Climate and Forecast metadata conventions, ensuring interoperability with standard tools (xarray, CDO, NCO, QGIS, etc.).
- **`institution`**: European Centre for Medium-Range Weather Forecasts (ECMWF).
- **`history`**: Records the GRIB→NetCDF conversion performed server-side by cfgrib before download.