# Pre-processing UERRA and EURO-CORDEX datasets

- A workflow from the CLIMAAX [Handbook](https://handbook.climaax.eu/) and [MULTI_infrastructure](https://github.com/CLIMAAX/MULTI_infrastructure) GitHub repository.
- See our [how to use risk workflows](https://handbook.climaax.eu/notebooks/workflows_how_to.html) page for information on how to run this notebook.

## Preparation Work

### Load libraries

In [1]:
import os

import cdsapi
import xarray as xr

### Path configuration

In [2]:
# Download folder for UERRA datasets
input_folder = "../data"
os.makedirs(input_folder, exist_ok=True)

# Data after unit format conversion
output_folder = "../data_conv"
os.makedirs(output_folder, exist_ok=True)

### CDS client setup

To access the data on CDS, you need an ECMWF account.
See the [guide on how to set up the API](https://cds.climate.copernicus.eu/how-to-api) for more information.

In [None]:
URL = "https://cds.climate.copernicus.eu/api"
KEY = None  # add your key or provide via ~/.cdsapirc

client = cdsapi.Client(url=URL, key=KEY)
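To keep the key out of the notebook itself, it can be read from the environment before the client is created. The following is a minimal sketch, not part of the original workflow: `resolve_cds_key` is a hypothetical helper, and the variable name `CDSAPI_KEY` is an assumption about how your environment is set up.

```python
import os


def resolve_cds_key(explicit_key=None, environ=None):
    """Return an API key from an explicit value or the environment, else None.

    Returning None leaves the key lookup to the ~/.cdsapirc file.
    """
    environ = os.environ if environ is None else environ
    # CDSAPI_KEY is an assumed variable name; adapt it to your own setup
    return explicit_key or environ.get("CDSAPI_KEY") or None
```

With this helper, `KEY = resolve_cds_key()` can replace the hard-coded `KEY = None` line above.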

## Step 1: Download datasets
The [UERRA regional reanalysis](https://cds.climate.copernicus.eu/datasets/reanalysis-uerra-europe-single-levels?tab=overview) dataset can be downloaded from the Climate Data Store.
It contains analyses of surface and near-surface essential climate variables from the UERRA-HARMONIE and MESCAN-SURFEX systems. For this assessment, two variables from MESCAN-SURFEX are used:

- **Air Temperature**: Measured at 2 meters above the surface (commonly referred to as 2m temperature).
- **Total Precipitation**: The total amount of water (liquid and solid) falling onto the ground or water surface. This variable includes all types of precipitation and is accumulated over 24 hours, from 06:00 on one day to 06:00 on the following day.

The cell below shows an example download of 2m temperature for the year 1981:

In [None]:
dataset = "reanalysis-uerra-europe-single-levels"
request = {
    "origin": "mescan_surfex",
    "variable": "2m_temperature",
    "year": ["1981"],
    "month": [
        "01", "02", "03",
        "04", "05", "06",
        "07", "08", "09",
        "10", "11", "12"
    ],
    "day": [
        "01", "02", "03",
        "04", "05", "06",
        "07", "08", "09",
        "10", "11", "12",
        "13", "14", "15",
        "16", "17", "18",
        "19", "20", "21",
        "22", "23", "24",
        "25", "26", "27",
        "28", "29", "30",
        "31"
    ],
    "time": [
        "00:00", "06:00", "12:00",
        "18:00"
    ],
    "data_format": "netcdf"
}
filename = os.path.join(input_folder, "UERRA-1981.nc")

client.retrieve(dataset, request).download(filename)
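Because the processing step covers 1981-2010, the same request usually has to be repeated for each year. The sketch below shows one way to generate the requests in a loop. It is an illustration rather than the workflow's own code: `build_request`, `download_years` and the file naming scheme are hypothetical, and the variable name and time steps to pass in should be checked against the CDS download form.

```python
import os

DATASET = "reanalysis-uerra-europe-single-levels"


def build_request(variable, year, times):
    """Build a CDS request for one variable and one full year of MESCAN-SURFEX data."""
    return {
        "origin": "mescan_surfex",
        "variable": variable,
        "year": [str(year)],
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": list(times),
        "data_format": "netcdf",
    }


def download_years(client, variable, years, times, folder):
    """Download one file per year, skipping files that already exist."""
    paths = []
    for year in years:
        # Hypothetical naming scheme: variable and year in the file name
        path = os.path.join(folder, f"UERRA-{variable}-{year}.nc")
        if not os.path.exists(path):
            request = build_request(variable, year, times)
            client.retrieve(DATASET, request).download(path)
        paths.append(path)
    return paths
```

For example, `download_years(client, "2m_temperature", range(1981, 2011), ["00:00", "06:00", "12:00", "18:00"], input_folder)` would request the 30 years one file at a time.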

## Step 2: Process the data
The function below processes all NetCDF files (`.nc`) in the input folder.
For each file, it converts temperature from Kelvin to Celsius and sets the `units` attribute for temperature (°C) and precipitation (mm/day).
It also clips the data to the period 1981-2010 and saves the result under the same file name in the output folder.
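The precipitation values need no numerical conversion, only a new units label. As a short worked check, assuming a water density of $\rho = 1000\ \mathrm{kg\,m^{-3}}$, a mass of 1 kg spread over 1 m² forms a layer of depth

$$
h = \frac{m}{\rho A} = \frac{1\ \mathrm{kg}}{1000\ \mathrm{kg\,m^{-3}} \times 1\ \mathrm{m^2}} = 10^{-3}\ \mathrm{m} = 1\ \mathrm{mm}
$$

so a 24-hour accumulation of 1 kg/m² corresponds to 1 mm/day.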

In [5]:
def process_netcdf_files(input_folder, output_folder):
    # Create the output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
        print(f'Created output folder: {output_folder}')

    # Loop through all the files in the input folder
    for filename in os.listdir(input_folder):
        if filename.endswith('.nc'):
            input_file_path = os.path.join(input_folder, filename)
            output_file_path = os.path.join(output_folder, filename)
            print(f'Processing file: {input_file_path}')

            # Load and clip the dataset
            ds = xr.open_dataset(input_file_path)
            ds_subset = ds.sel({'valid_time': slice('1981', '2010')})

            # Convert temperature from Kelvin to Celsius
            if 't2m' in ds_subset:
                ds_subset['t2m'] = ds_subset['t2m'] - 273.15
                ds_subset['t2m'].attrs['units'] = 'C'  # Add units attribute
                print('Converted t2m to Celsius')

            # Precipitation: 1 kg/m² of water equals a 1 mm layer, so the daily
            # accumulations are already in mm/day and only the units label changes
            if 'tp' in ds_subset:
                ds_subset['tp'].attrs['units'] = 'mm/d'  # Add units attribute
                print('Set tp units to mm/day')

            # Save the processed and clipped data to a new NetCDF file
            ds_subset.to_netcdf(output_file_path)
            print(f'Saved processed file to: {output_file_path}')

In [6]:
process_netcdf_files(input_folder, output_folder)

Processing file: ../data/UERRA-1981.nc
Converted t2m to Celsius
Saved processed file to: ../data_conv/UERRA-1981.nc
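To see the clipping and the unit conversion in isolation, the same two operations can be applied to a small synthetic dataset. This is a sketch for illustration only: the values and the eight time steps are made up, and only the names `t2m` and `valid_time` are taken from the processing function above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a UERRA file: eight 6-hourly steps across the 2010/2011 boundary
times = pd.date_range("2010-12-31", periods=8, freq="6h")
ds = xr.Dataset(
    {"t2m": ("valid_time", np.full(times.size, 273.15))},
    coords={"valid_time": times},
)
ds["t2m"].attrs["units"] = "K"

# Partial-string slicing keeps every time step up to the end of 2010
clipped = ds.sel(valid_time=slice("1981", "2010"))

# Kelvin to Celsius; the units label is set explicitly after the arithmetic
celsius = clipped["t2m"] - 273.15
celsius.attrs["units"] = "C"

print(clipped.sizes["valid_time"], float(celsius.max()), celsius.attrs["units"])
```

The four time steps on 1 January 2011 fall outside the slice and are dropped, while the four on 31 December 2010 are kept.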


## Next step

The UERRA-based hazard assessment workflow continues with the [indicators calculation](02_UERRA_indicatorsCalculation.ipynb).