# Monthly Data Workflow v2: Downloading and Verifying Multi-Variable Data

**Goal:** To download one full year (1995) of monthly-averaged data for our key climate variables and perform a sanity check to ensure the file is valid and readable. This workflow is based on the successful test download.

In [1]:
# Cell 1: Download Multi-Variable Data for 1995
import cdsapi
import os

# --- Configuration ---
output_dir = '../data/climate_monthly/'
os.makedirs(output_dir, exist_ok=True)
output_file = os.path.join(output_dir, 'era5_land_monthly_1995_multi-variable.grib')

# --- API Request ---
request_dictionary = {
    'product_type': 'monthly_averaged_reanalysis',
    'variable': [
        '2m_temperature', 'total_precipitation', 'volumetric_soil_water_layer_1',
        'surface_net_solar_radiation', 'potential_evaporation',
    ],
    'year': '1995',
    'month': [f'{m:02d}' for m in range(1, 13)],
    'time': '00:00',
    'format': 'grib',
    # The crucial key that ensures a clean, unarchived file
    'download_format': 'unarchived',
}

# --- Execute Download ---
try:
    if not os.path.exists(output_file):
        c = cdsapi.Client()
        print("Submitting API request for 1995 multi-variable monthly data...")
        c.retrieve(
            'reanalysis-era5-land-monthly-means',
            request_dictionary,
            output_file
        )
        print(f"\nDownload complete! File saved to: {output_file}")
        print(f"File size: {os.path.getsize(output_file) / 1e6:.2f} MB")
    else:
        print(f"File already exists, skipping download: {output_file}")
except Exception as e:
    print(f"\nAn error occurred: {e}")

2025-09-10 12:15:27,364 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-09-10 12:15:27,537 INFO Request ID is 081410fb-c45b-4f1c-bd1a-ec685e100d2b


Submitting API request for 1995 multi-variable monthly data...


2025-09-10 12:15:27,624 INFO status has been updated to accepted
2025-09-10 12:15:41,308 INFO status has been updated to running
2025-09-10 12:16:17,642 INFO status has been updated to successful
                                                                                                                       


Download complete! File saved to: ../data/climate_monthly/era5_land_monthly_1995_multi-variable.grib
File size: 393.84 MB
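
Before opening the file with cfgrib, a cheap first check is to look at its leading bytes: a GRIB message starts with the ASCII marker `GRIB`, whereas a ZIP archive (which the `download_format` key is meant to avoid) starts with `PK`. The sketch below is a minimal illustration. The helper name `sniff_container` is hypothetical and not part of `cdsapi` or `cfgrib`, and it only inspects the first four bytes rather than validating the whole file.

```python
from pathlib import Path


def sniff_container(path):
    """Classify a file as 'grib', 'zip', or 'unknown' from its first four bytes."""
    with Path(path).open("rb") as fh:
        magic = fh.read(4)
    if magic == b"GRIB":
        return "grib"
    if magic[:2] == b"PK":
        return "zip"
    return "unknown"
```

Calling `sniff_container(output_file)` after Cell 1 and getting anything other than `'grib'` would suggest the download was archived or incomplete.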




## Sanity Check

Now we will attempt to open the GRIB file we just downloaded to confirm it is valid and contains all the variables we requested.

In [2]:
# Cell 2: Sanity Check the Downloaded File
import xarray as xr

# --- File Path ---
file_path = '../data/climate_monthly/era5_land_monthly_1995_multi-variable.grib'

# --- Attempt to Open the File ---
try:
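    # We use the cfgrib engine, which we know works with the server's GRIB format.
    # Caveat: cfgrib skips any variable whose coordinates conflict with the others
    # (e.g. a different 'step'), so a successful open does not prove that every
    # requested variable is present. Compare the listed variables with the request.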
    ds_climate = xr.open_dataset(file_path, engine='cfgrib')
    
    print("--- SUCCESS! ---")
    print("The multi-variable GRIB file was loaded correctly.")
    print("\nDataset structure:")
    print(ds_climate)
    
    print("\nVariables found in file:")
    for var in ds_climate.data_vars:
        print(f"- {var}")

except Exception as e:
    print("--- FAILED ---")
    print(f"An error occurred while trying to open the file: {e}")

skipping variable: paramId==39 shortName='swvl1'
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\envs\climarisc\lib\site-packages\cfgrib\dataset.py", line 725, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "C:\ProgramData\miniconda3\envs\climarisc\lib\site-packages\cfgrib\dataset.py", line 641, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='step' value=Variable(dimensions=(), data=np.float64(24.0)) new_value=Variable(dimensions=(), data=np.float64(0.0))


--- SUCCESS! ---
The multi-variable GRIB file was loaded correctly.

Dataset structure:
<xarray.Dataset> Size: 1GB
Dimensions:     (time: 12, latitude: 1801, longitude: 3600)
Coordinates:
    number      int64 8B ...
  * time        (time) datetime64[ns] 96B 1995-01-01 1995-02-01 ... 1995-12-01
    step        timedelta64[ns] 8B ...
    surface     float64 8B ...
  * latitude    (latitude) float64 14kB 90.0 89.9 89.8 ... -89.8 -89.9 -90.0
  * longitude   (longitude) float64 29kB 0.0 0.1 0.2 0.3 ... 359.7 359.8 359.9
    valid_time  (time) datetime64[ns] 96B ...
Data variables:
    t2m         (time, latitude, longitude) float32 311MB ...
    tp          (time, latitude, longitude) float32 311MB ...
    ssr         (time, latitude, longitude) float32 311MB ...
    pev         (time, latitude, longitude) float32 311MB ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_su

  vars, attrs, coord_names = xr.conventions.decode_cf_variables(
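
The run above reports success even though cfgrib logged `skipping variable: paramId==39 shortName='swvl1'`, so the check is stronger if the variables found are compared against the ones requested. A minimal sketch, assuming the names cfgrib assigns in the dataset (`t2m`, `tp`, `ssr`, `pev` as listed above, and `swvl1` taken from the skip message). The helper `missing_variables` is hypothetical:

```python
def missing_variables(expected, found):
    """Return the expected variable names that are absent from `found`, in order."""
    found_set = set(found)
    return [name for name in expected if name not in found_set]


# Names as they appear in the dataset printed above, plus 'swvl1' from the skip message.
expected = ["t2m", "tp", "swvl1", "ssr", "pev"]
found = ["t2m", "tp", "ssr", "pev"]  # in the notebook: list(ds_climate.data_vars)

missing = missing_variables(expected, found)
print(missing)  # ['swvl1']
```

A non-empty result means the file opened but the dataset is incomplete, which is the situation in this run.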


## Analysis: Linking Yield to Climate (1995)

Now that we have successfully downloaded the multi-variable climate data for 1995, we will process it and link it to the 1995 maize yield data.

In [8]:
# Cell 3 (Corrected): Load Data for Analysis using correct GRIB short names

import xarray as xr
import numpy as np

# --- 1. Load 1995 Maize Yield Data ---
YIELD_PATH = '../data/maize/yield_1995.nc4'
ds_yield = xr.open_dataset(YIELD_PATH)
print("--- Yield Data for 1995 ---")
print(ds_yield)

# --- 2. Load 1995 Monthly Climate Data (Variable by Variable) ---
CLIMATE_PATH = '../data/climate_monthly/era5_land_monthly_1995_multi-variable.grib'

# Define the correct GRIB short names for the variables we want.
variables_to_load = {
    '2t': '2m_temperature',
    'tp': 'total_precipitation',
    'swvl1': 'volumetric_soil_water_layer_1',
    'ssr': 'surface_net_solar_radiation',
    'pev': 'potential_evaporation'
}

data_arrays = []
print("\n--- Loading Monthly Climate Data for 1995 (Variable by Variable) ---")

for short_name in variables_to_load.keys():
    try:
        # Open the SAME file multiple times, but filter for only ONE variable each time
        ds_single_var = xr.open_dataset(
            CLIMATE_PATH, 
            engine='cfgrib',
            backend_kwargs={'filter_by_keys': {'shortName': short_name}}
        )
        print(f"Successfully loaded '{short_name}'")
        data_arrays.append(ds_single_var)
    except Exception as e:
        # Report the underlying error so a genuinely missing variable (e.g. 'swvl1')
        # can be told apart from other failures.
        print(f"Could not load variable with shortName '{short_name}'. Skipping. Error: {e}")

# --- 3. Merge the individual variables into a single Dataset ---
if data_arrays:
    # Note: compat='no_conflicts' does NOT drop the conflicting 'step' coordinate.
    # It only tolerates values that agree, so this call raises the MergeError shown below.
    ds_climate_monthly = xr.merge(data_arrays, compat='no_conflicts')
    
    print("\n--- Merged Monthly Climate Data for 1995 ---")
    print(ds_climate_monthly)
else:
    print("No climate variables were successfully loaded.")

--- Yield Data for 1995 ---
<xarray.Dataset> Size: 1MB
Dimensions:  (lon: 720, lat: 360)
Coordinates:
  * lon      (lon) float64 6kB 0.25 0.75 1.25 1.75 ... 358.2 358.8 359.2 359.8
  * lat      (lat) float64 3kB -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
Data variables:
    var      (lat, lon) float32 1MB ...

--- Loading Monthly Climate Data for 1995 (Variable by Variable) ---
Successfully loaded '2t'
Successfully loaded 'tp'
Successfully loaded 'swvl1'
Successfully loaded 'ssr'
Successfully loaded 'pev'




MergeError: conflicting values for variable 'step' on objects to be combined. You can skip this check by specifying compat='override'.
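
The merge fails because each single-variable dataset carries its own scalar `step` coordinate (the earlier traceback shows the values 24.0 and 0.0). One possible way around it, sketched here on small synthetic datasets rather than the real GRIB file, is to drop the conflicting non-dimension coordinates before merging. Passing `compat='override'`, as the error message suggests, is the alternative. `make_var` and the toy values are assumptions for illustration, and dropping `valid_time` is a precaution on the assumption that it differs whenever `step` does.

```python
import numpy as np
import xarray as xr


def make_var(name, step_hours):
    """Build a tiny synthetic stand-in for one variable loaded via cfgrib."""
    return xr.Dataset(
        {name: (("time", "latitude"), np.zeros((2, 3), dtype="float32"))},
        coords={
            "time": np.array(["1995-01-01", "1995-02-01"], dtype="datetime64[ns]"),
            "latitude": [10.0, 0.0, -10.0],
            "step": np.timedelta64(step_hours, "h"),  # scalar coordinate, differs per variable
        },
    )


# Two variables whose scalar 'step' coordinates disagree (0 h versus 24 h).
parts = [make_var("t2m", 0), make_var("tp", 24)]

# Drop the coordinates that can conflict; errors="ignore" skips names that are absent.
cleaned = [ds.drop_vars(["step", "valid_time"], errors="ignore") for ds in parts]

merged = xr.merge(cleaned)
print(list(merged.data_vars))
```

Dropping `step` discards the information about how each field relates to its forecast step, so it may be worth recording it elsewhere (for example in each variable's attributes) before merging.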

In [9]:
# Cell to verify all variables using the correct method
import xarray as xr

# --- Configuration ---
file_path = r'../data/climate_monthly/era5_land_monthly_1995_multi-variable.grib'

# The correct GRIB short names for the variables we requested.
variables_to_check = {
    '2t': '2m_temperature',
    'tp': 'total_precipitation',
    'swvl1': 'volumetric_soil_water_layer_1',
    'ssr': 'surface_net_solar_radiation',
    'pev': 'potential_evaporation'
}

print(f"--- Verifying variables in file: {file_path} ---")

# --- Loop and Check Each Variable ---
for short_name, long_name in variables_to_check.items():
    print(f"\nChecking for: {long_name} (shortName: {short_name})")
    try:
        # indexpath="" stops cfgrib from writing or reusing an on-disk index file,
        # so each filtered open reads the GRIB file directly.
        ds_var = xr.open_dataset(
            file_path,
            engine="cfgrib",
            backend_kwargs={"indexpath": "", "filter_by_keys": {"shortName": short_name}},
        )
        print(f"  -> SUCCESS: Found and loaded '{long_name}'.")
        # print(ds_var) # Optional: uncomment to see the structure
    except Exception as e:
        print(f"  -> FAILED: Variable '{long_name}' not found or could not be loaded. Error: {e}")

--- Verifying variables in file: ../data/climate_monthly/era5_land_monthly_1995_multi-variable.grib ---

Checking for: 2m_temperature (shortName: 2t)


  -> SUCCESS: Found and loaded '2m_temperature'.

Checking for: total_precipitation (shortName: tp)


  -> SUCCESS: Found and loaded 'total_precipitation'.

Checking for: volumetric_soil_water_layer_1 (shortName: swvl1)


  -> SUCCESS: Found and loaded 'volumetric_soil_water_layer_1'.

Checking for: surface_net_solar_radiation (shortName: ssr)


  -> SUCCESS: Found and loaded 'surface_net_solar_radiation'.

Checking for: potential_evaporation (shortName: pev)
  -> SUCCESS: Found and loaded 'potential_evaporation'.

