# Hazard assessment for infrastructure using EURO-CORDEX datasets
## Calculation of return levels for return periods of 10, 20, 30, 50, 100, and 150 years

- See our [how to use risk workflows](https://handbook.climaax.eu/notebooks/workflows_how_to.html) page for information on how to run this notebook.

## **Hazard assessment methodology**
We used outputs from 14 models within the EURO-CORDEX framework to evaluate hazards affecting infrastructure. In this notebook, the extreme percentiles of total daily precipitation serve as the hazard indicator. The analysis covers three Representative Concentration Pathways (RCPs): RCP2.6, RCP4.5, and RCP8.5. Each RCP scenario was analyzed over three future timeframes: 2021–2050, 2041–2070, and 2071–2100. The historical period (1981–2010) serves as the baseline against which changes in climate hazards are evaluated.

For each time period, we computed the return levels of total daily precipitation for return periods of 10, 20, 30, 50, 100, and 150 years.

### Analysis of the Return Levels of Precipitation
We computed the return levels of total daily precipitation for each model and time period by fitting a Gumbel distribution to the annual maxima at every grid cell. These calculations were performed for both the historical and future RCP scenarios. To quantify changes, we calculated anomalies by subtracting, for each individual model, the historical return levels from those of its corresponding future projection (RCP2.6, RCP4.5, and RCP8.5).
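
As a minimal sketch of the return-level step for a single grid cell, the example below fits a Gumbel distribution to a synthetic series of annual maxima and evaluates it at the non-exceedance probability 1 - 1/T for each return period T. The synthetic values and the names `annual_maxima`, `non_exceedance`, `levels_ppf`, and `levels_closed` are assumptions made for illustration; they are not taken from the EURO-CORDEX data.

```python
import numpy as np
from scipy.stats import gumbel_r

# Synthetic annual maxima (mm/day) drawn from a known Gumbel distribution;
# illustrative values only, not EURO-CORDEX output.
rng = np.random.default_rng(42)
annual_maxima = gumbel_r.rvs(loc=60.0, scale=15.0, size=30, random_state=rng)

# Fit location and scale by maximum likelihood
loc, scale = gumbel_r.fit(annual_maxima)

return_periods = np.array([10, 20, 30, 50, 100, 150])
# A T-year event is exceeded with probability 1/T in any given year, so the
# return level is the quantile at the non-exceedance probability 1 - 1/T.
non_exceedance = 1 - 1 / return_periods

# Quantiles from SciPy and from the closed-form Gumbel quantile function
levels_ppf = gumbel_r.ppf(non_exceedance, loc, scale)
levels_closed = loc - scale * np.log(-np.log(non_exceedance))

for T, level in zip(return_periods, levels_ppf):
    print(f"{T:>3}-year return level: {level:.1f} mm/day")
```

Both expressions give the same values, and the return level grows with the return period.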

To account for uncertainties and provide a robust projection, we computed the average across all 14 models for each return level of precipitation, scenario, and time period. The ensemble averaging process involved aggregating anomalies for all models and then calculating the mean, yielding a single representative dataset for each RCP scenario and timeframe.
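
The averaging described above can be sketched on a toy ensemble. The three models, the 2x2 grid, and all numbers below are hypothetical placeholders chosen so the result is easy to check by hand.

```python
import numpy as np
import xarray as xr

# Hypothetical 10-year return levels (mm/day) on a 2x2 grid for three models
models = ["model_a", "model_b", "model_c"]
historical = xr.DataArray(
    np.array([[[50.0, 60.0], [70.0, 80.0]],
              [[55.0, 65.0], [75.0, 85.0]],
              [[45.0, 55.0], [65.0, 75.0]]]),
    dims=("model", "y", "x"),
    coords={"model": models},
)
# Each model projects a different uniform increase (placeholder values)
increase = xr.DataArray([5.0, 10.0, 0.0], dims="model", coords={"model": models})
future = historical + increase

# Anomaly per model first, then the mean across the model dimension
anomalies = future - historical
ensemble_mean = anomalies.mean(dim="model")
print(ensemble_mean.values)
```

Subtracting within each model before averaging keeps each model's own baseline bias out of the ensemble anomaly.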

## **Limitations of the EURO-CORDEX dataset**
The EURO-CORDEX (Coordinated Regional Climate Downscaling Experiment for Europe) project provides a set of high-resolution regional climate projections for Europe, designed to support impact, adaptation, and vulnerability assessments under various climate change scenarios. EURO-CORDEX integrates global climate model (GCM) outputs with regional climate models (RCMs), enabling the simulation of climatic patterns and extremes. The models explore different Representative Concentration Pathways (RCPs) from CMIP5 (RCP2.6, RCP4.5, RCP8.5) and Shared Socioeconomic Pathways (SSPs) from CMIP6 (SSP1-2.6, SSP5-8.5). The simulations cover historical periods (1950–2005) and future projections (2006–2100). These models are validated against observational data and reanalysis datasets.

Some of the limitations:
- Although EURO-CORDEX offers high-resolution data (typically 0.11°, about 12.5 km, and 0.44°, about 50 km), it may still not fully capture localized phenomena such as urban heat islands, small-scale topographic effects, and small-scale meteorological events.
- Like all climate models, EURO-CORDEX RCMs and their driving GCMs exhibit biases compared to observed data. These biases can vary regionally and seasonally, and the models may struggle to accurately simulate extreme weather events such as heatwaves, heavy precipitation, or storms.
- While the dataset captures trends in extremes, very high thresholds (>45°C or >100 mm/day rainfall) may have higher uncertainty due to limited observational data.

### Select area of interest
Before downloading the data, we define the coordinates of the area of interest. For this workflow we selected Italy. Using the country's shapefile, we can clip the datasets for further processing and display hazard and damage maps for this area.
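
The workflow clips with a country shapefile. As a simpler stand-in, the sketch below subsets a field with a rectangular bounding box on 2D `lat`/`lon` coordinates. The synthetic grid and the box limits (roughly enclosing Italy) are assumptions for illustration, not values used by the authors.

```python
import numpy as np
import xarray as xr

# Synthetic field on a (y, x) grid with 2D lat/lon coordinates, mimicking the
# layout of the return-level files; the data values are random placeholders.
lon2d, lat2d = np.meshgrid(np.linspace(0, 25, 26), np.linspace(30, 50, 21))
field = xr.DataArray(
    np.random.default_rng(0).random(lat2d.shape),
    dims=("y", "x"),
    coords={"lat": (("y", "x"), lat2d), "lon": (("y", "x"), lon2d)},
)

# Approximate bounding box around Italy (assumed limits, illustration only)
lon_min, lon_max, lat_min, lat_max = 6.0, 19.0, 36.0, 47.5

inside = (
    (field.lon >= lon_min) & (field.lon <= lon_max)
    & (field.lat >= lat_min) & (field.lat <= lat_max)
)
# Keep cells inside the box and drop rows/columns that lie entirely outside it
subset = field.where(inside, drop=True)
print(subset.shape)
```

A bounding box keeps neighbouring sea and land cells that a shapefile clip would mask, so it is only a first approximation of the area of interest.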

### Load libraries

In [1]:
import cftime
import datetime
import os
import re
import xarray as xr
import numpy as np
from scipy.stats import gumbel_r

### Create the directory structure

In [None]:
# Paths and subfolders
nc_files = "/climax/data/cordex/precip"
general_path = "/climax/indicators/cordex2"
subfolders = ['historical', 'rcp26', 'rcp45', 'rcp85']

In [None]:
# Time ranges to process
rcp_time_ranges = [('2021', '2050'), ('2041', '2070'), ('2071', '2100')]
historical_time_range = [('1981', '2010')]

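# Define return periods
return_periods = np.array([10, 20, 30, 50, 100, 150])
# Despite the name, this holds the non-exceedance probability 1 - 1/T,
# which is the quantile level passed to gumbel_r.ppf
exceedance_probs = 1 - (1 / return_periods)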

In [None]:
# Function to calculate and save return levels
def calculate_return_levels(file_path, save_path, start_year, end_year):
    print("---------------------------------------------------")
    print(f"Processing {file_path} for time range {start_year}-{end_year}")
    ds = xr.open_dataset(file_path)

    # Select the time range and convert the precipitation flux (kg m-2 s-1) to mm/day
    ds_sliced = ds.sel(time=slice(start_year, end_year))
    dailyMaxPre = (ds_sliced['pr'] * 86400).resample(time='D').max()  # One value per day, in mm/day
    dailyMaxPre.attrs['units'] = 'mm/d'

    # Calculate annual maximum precipitation ('YS' labels each year by its start date)
    annual_max_precip = dailyMaxPre.resample(time='YS').max()

    # Initialize dataset for return levels
    return_levels_ds = xr.Dataset()

    # Loop over each grid point and fit the Gumbel distribution
    for x in annual_max_precip['x']:
        for y in annual_max_precip['y']:
            # Extract the annual maximum series for the grid cell
            annual_max_values = annual_max_precip.sel(x=x, y=y).values
            annual_max_values = annual_max_values[~np.isnan(annual_max_values)]  # Remove NaNs

            if len(annual_max_values) > 0:
                # Fit the Gumbel distribution
                loc, scale = gumbel_r.fit(annual_max_values)

                # Calculate return levels
                return_levels = gumbel_r.ppf(exceedance_probs, loc, scale)

                # Store return levels for each period
                for rp, rl in zip(return_periods, return_levels):
                    return_period_label = f"return_period_{rp}_y"
                    if return_period_label not in return_levels_ds:
                        return_levels_ds[return_period_label] = xr.full_like(annual_max_precip.isel(time=0), np.nan)
                    return_levels_ds[return_period_label].loc[{'x': x, 'y': y}] = rl


    # Add metadata to the dataset
    return_levels_ds.attrs['description'] = f'Return levels for time range {start_year}-{end_year}'

    # Save the dataset
    filename = os.path.basename(file_path)
    file_name_no_ext = os.path.splitext(filename)[0]
    new_filename = f"{file_name_no_ext}_return_levels_{start_year}-{end_year}.nc"
    save_file_path = os.path.join(save_path, new_filename)
    return_levels_ds.to_netcdf(save_file_path)

    print(f"Saved return levels to {save_file_path}")
    return save_file_path
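
The nested loop over `x` and `y` in the function above can be slow on a full EUR-11 grid. One possible alternative, sketched here on synthetic data, applies the same per-cell Gumbel fit through `xr.apply_ufunc`. The helper name `gumbel_return_levels`, the minimum of two valid values per cell, and the `return_period` output dimension are assumptions of this sketch. The notebook itself stores one variable per return period instead.

```python
import numpy as np
import xarray as xr
from scipy.stats import gumbel_r

return_periods = np.array([10, 20, 30, 50, 100, 150])
non_exceedance = 1 - 1 / return_periods

def gumbel_return_levels(series):
    """Return levels for one grid cell; NaN when fewer than two valid values."""
    values = series[~np.isnan(series)]
    if values.size < 2:
        return np.full(return_periods.shape, np.nan)
    loc, scale = gumbel_r.fit(values)
    return gumbel_r.ppf(non_exceedance, loc, scale)

# Synthetic annual maxima on a small (time, y, x) grid, with one empty cell
rng = np.random.default_rng(1)
data = rng.gumbel(loc=60.0, scale=15.0, size=(30, 3, 4))
data[:, 0, 0] = np.nan
annual_max = xr.DataArray(data, dims=("time", "y", "x"))

# Apply the fit along 'time' for every (y, x) cell
levels = xr.apply_ufunc(
    gumbel_return_levels,
    annual_max,
    input_core_dims=[["time"]],
    output_core_dims=[["return_period"]],
    vectorize=True,
).assign_coords(return_period=return_periods)

print(levels.dims)
```

With `vectorize=True` the fit still runs once per cell in Python, so the gain is mainly in clarity. The same call can be combined with Dask chunks if parallel execution is needed.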

In [None]:
# Main processing loop
for subfolder in subfolders:
    print(subfolder)
    folder_path = os.path.join(nc_files, subfolder)
    save_subfolder = os.path.join(general_path, 'returnLevels', subfolder)

    # Create destination subfolder if it doesn't exist
    os.makedirs(save_subfolder, exist_ok=True)

    # Select time ranges based on the scenario
    if subfolder == 'historical':
        time_ranges = historical_time_range
    else:
        time_ranges = rcp_time_ranges

    # Loop through each NetCDF file in the subfolder
    for file in os.listdir(folder_path):
        file_path = os.path.join(folder_path, file)

        # Check if it's a NetCDF file
        if file.endswith('.nc'):
            # Loop through the defined time ranges
            for start_year, end_year in time_ranges:
                print(f"Processing return levels for time range {start_year}-{end_year}")

                # Calculate and save return levels for each time range
                calculate_return_levels(file_path, save_subfolder, start_year, end_year)

print("Return levels calculation complete!")

### **Anomaly Calculation**

In [None]:
# Directories (built from general_path so they match the return-level folders written above)
return_levels_dir = os.path.join(general_path, "returnLevels")
historical_dir = os.path.join(return_levels_dir, "historical")
rcp26_dir = os.path.join(return_levels_dir, "rcp26")
rcp45_dir = os.path.join(return_levels_dir, "rcp45")
rcp85_dir = os.path.join(return_levels_dir, "rcp85")
output_dir = os.path.join(return_levels_dir, "anomalies")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

In [None]:
# Function to parse filenames and extract key components
def parse_filename(filename):
    # The dot before "nc" is escaped and the end is anchored so only real .nc names match
    pattern = r"pr_EUR-11_(.+?)_(historical|rcp26|rcp45|rcp85)_r\d+i\d+p\d+_(.+?)_day_\d+_return_levels_([\d-]+)\.nc$"

    match = re.match(pattern, filename)
    if match:
        model = match.group(1)
        scenario = match.group(2)
        rcm = match.group(3)
        time_period = match.group(4)
        return model, scenario, rcm, time_period
    return None
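
To show what the parser returns, the sketch below runs a pattern with the same structure as the one in the cell above (written with an escaped dot and an end anchor) on a sample name. The filename is hypothetical and was constructed only to fit the pattern; real input names may differ.

```python
import re

# Same capture groups as the notebook pattern: driving model, scenario,
# RCM (with version suffix), and time period
PATTERN = re.compile(
    r"pr_EUR-11_(.+?)_(historical|rcp26|rcp45|rcp85)_r\d+i\d+p\d+_(.+?)"
    r"_day_\d+_return_levels_([\d-]+)\.nc$"
)

def parse_filename(filename):
    """Return (model, scenario, rcm, time_period), or None if the name does not match."""
    match = PATTERN.match(filename)
    return match.groups() if match else None

# Hypothetical filename built to fit the pattern
example = (
    "pr_EUR-11_ICHEC-EC-EARTH_rcp26_r12i1p1_CLMcom-CCLM4-8-17_v1"
    "_day_2006_return_levels_2021-2050.nc"
)
parsed = parse_filename(example)
print(parsed)
print(parse_filename("notes.txt"))
```

Because non-matching names return `None`, callers need to skip those entries before unpacking the tuple, as the matching loop in this notebook does.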

In [None]:
# Load filenames into dictionaries
historical_files = {parse_filename(f): os.path.join(historical_dir, f) for f in os.listdir(historical_dir) if f.endswith(".nc")}
rcp26_files = {parse_filename(f): os.path.join(rcp26_dir, f) for f in os.listdir(rcp26_dir) if f.endswith(".nc")}
rcp45_files = {parse_filename(f): os.path.join(rcp45_dir, f) for f in os.listdir(rcp45_dir) if f.endswith(".nc")}
rcp85_files = {parse_filename(f): os.path.join(rcp85_dir, f) for f in os.listdir(rcp85_dir) if f.endswith(".nc")}

In [None]:
# Function to perform subtraction for all return period variables and save
def subtract_return_levels(historical_file, future_file, output_file):
    # Load the datasets using xarray
    historical_ds = xr.open_dataset(historical_file)
    future_ds = xr.open_dataset(future_file)
    print(f"Processing: {historical_file} and {future_file}")

    # Read the future time period from the filename instead of relying on a global variable
    period_match = re.search(r"_return_levels_([\d-]+)\.nc$", os.path.basename(future_file))
    time_period = period_match.group(1) if period_match else "unknown period"

    # Initialize dataset for anomalies
    anomaly_ds = xr.Dataset()

    # Iterate through all return period variables
    return_period_vars = [var for var in historical_ds.data_vars if var.startswith("return_period")]
    for var in return_period_vars:
        if var in future_ds:
            # Subtract historical return levels from future return levels
            anomaly = future_ds[var] - historical_ds[var]
            anomaly_ds[var] = anomaly

            # Calculate percentage change relative to the historical return level
            percentage_change = (anomaly / historical_ds[var]) * 100
            anomaly_ds[f"{var}_percentage_change"] = percentage_change

    # Add metadata and coordinates
    anomaly_ds.attrs['description'] = f"Anomalies for return levels ({time_period}) relative to historical (1981-2010)"
    anomaly_ds = anomaly_ds.assign_coords({
        'lat': (('y', 'x'), historical_ds['lat'].values),  # Assign lat as 2D coordinates
        'lon': (('y', 'x'), historical_ds['lon'].values)   # Assign lon as 2D coordinates
    })

    # Save the dataset to the output file
    anomaly_ds.to_netcdf(output_file)
    print(f"Saved anomalies to: {output_file}")
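
The percentage change divides by the historical return level, which becomes infinite or undefined wherever that level is zero. A possible guard, shown here on hypothetical 2x2 values, masks those cells before dividing. This is an optional variation, not part of the notebook's function.

```python
import numpy as np
import xarray as xr

# Hypothetical return levels (mm/day); one historical cell is zero on purpose
historical = xr.DataArray([[40.0, 0.0], [80.0, 100.0]], dims=("y", "x"))
future = xr.DataArray([[50.0, 5.0], [72.0, 100.0]], dims=("y", "x"))

# Absolute anomaly: future minus historical
anomaly = future - historical

# Mask zero baselines so the division yields NaN instead of inf
safe_baseline = historical.where(historical != 0)
percentage_change = (anomaly / safe_baseline) * 100

print(anomaly.values)
print(percentage_change.values)
```

Cells with a zero baseline end up as NaN in the percentage field while the absolute anomaly stays defined.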

In [None]:
# Match files and process
for key, hist_file in historical_files.items():
    if key is None:
        continue
    model, _, rcm, _ = key

    # Iterate over all future scenarios
    for future_files in [rcp26_files, rcp45_files, rcp85_files]:
        for f_key, fut_file in future_files.items():
            if f_key is None:
                continue
            fut_model, scenario, fut_rcm, time_period = f_key

            # Match by model and RCM
            if model == fut_model and rcm == fut_rcm:
                output_filename = f"pr_EUR-11_{model}_{scenario}_diff_{rcm}_{time_period}.nc"
                output_path = os.path.join(output_dir, output_filename)
                subtract_return_levels(hist_file, fut_file, output_path)

### **Outputs and Hazard Assessment**

For each time period and RCP scenario, we calculated the average across all 14 models, producing a single representative dataset for each timeframe, for both the historical period and the RCP scenarios. The final outputs of the analysis include both the individual model anomalies and the ensemble-averaged datasets for each RCP scenario and time period. Together, these datasets show how the return levels of precipitation may change relative to the historical baseline under different climate scenarios. The averaged datasets were used to visualize the spatial distribution and magnitude of these hazards, as demonstrated in the notebook 07_cordex_returnLevelsPrecip_plots.ipynb.

In [4]:
# Directory containing the anomaly files
anomalies_dir = "/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies"
output_dir = "/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/averaged_ensembles"

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

In [5]:
# Function to parse filenames and extract key components
def parse_filename(filename):
    # The dot before "nc" is escaped and the end is anchored so only real .nc names match
    pattern = r"pr_EUR-11_(.+?)_(rcp26|rcp45|rcp85)_diff_(.+?)_v\d+_([\d-]+)\.nc$"
    match = re.match(pattern, filename)
    if match:
        model = match.group(1)
        scenario = match.group(2)
        rcm = match.group(3)
        time_period = match.group(4)
        return model, scenario, rcm, time_period
    return None


# Group files by scenario and time period
files = [f for f in os.listdir(anomalies_dir) if f.endswith(".nc")]
grouped_files = {}

for f in files:
    parsed = parse_filename(f)
    if parsed:
        _, scenario, _, time_period = parsed
        key = (scenario, time_period)
        if key not in grouped_files:
            grouped_files[key] = []
        grouped_files[key].append(os.path.join(anomalies_dir, f))
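
The grouping step can be checked without touching the file system. The sketch below groups a short list of names by `(scenario, time_period)` using `collections.defaultdict`. The first two names follow the anomaly files listed in this notebook's output; the third name and the `README.txt` entry are hypothetical additions.

```python
import re
from collections import defaultdict

# Same structure as the anomaly filename pattern used in this notebook
pattern = re.compile(
    r"pr_EUR-11_(.+?)_(rcp26|rcp45|rcp85)_diff_(.+?)_v\d+_([\d-]+)\.nc$"
)

names = [
    "pr_EUR-11_ICHEC-EC-EARTH_rcp26_diff_CLMcom-CCLM4-8-17_v1_2021-2050.nc",
    "pr_EUR-11_NCC-NorESM1-M_rcp26_diff_SMHI-RCA4_v1_2021-2050.nc",
    "pr_EUR-11_NCC-NorESM1-M_rcp85_diff_SMHI-RCA4_v1_2071-2100.nc",  # hypothetical
    "README.txt",  # hypothetical non-matching entry, silently skipped
]

# Group names by (scenario, time_period); defaultdict creates lists on demand
grouped = defaultdict(list)
for name in names:
    match = pattern.match(name)
    if match:
        _, scenario, _, time_period = match.groups()
        grouped[(scenario, time_period)].append(name)

for key, members in grouped.items():
    print(key, len(members))
```

Printing the group sizes is a quick way to confirm that every scenario and period contains the expected number of ensemble members before averaging.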


In [None]:
# Function to average files for all return level variables and save the ensemble
def average_ensemble(files, output_file):
    print(f"Processing files: {files}")
    print("-------------------------")

    # Open all datasets
    datasets = [xr.open_dataset(f) for f in files]

    # Get all return level variables (e.g., "return_period_10_y", "return_period_20_y", etc.)
    return_level_vars = [var for var in datasets[0].data_vars if var.startswith("return_period")]

    # Create an empty dataset for the ensemble mean
    ensemble_mean_ds = xr.Dataset()

    # Average each return level variable across all datasets
    for var in return_level_vars:
        print(f"Averaging variable: {var}")
        var_data = [ds[var] for ds in datasets]  # Extract the variable from all datasets
        var_mean = xr.concat(var_data, dim='model').mean(dim='model')  # Compute the mean across models
        ensemble_mean_ds[var] = var_mean  # Add the averaged variable to the output dataset

    # Assign coordinates from the first dataset
    first_ds = datasets[0]
    ensemble_mean_ds = ensemble_mean_ds.assign_coords({'lon': first_ds['lon'], 'lat': first_ds['lat']})

    # Add metadata
    ensemble_mean_ds.attrs['description'] = f"Ensemble mean of anomalies for scenario {scenario}, time period {time_period}"

    # Save the averaged dataset
    ensemble_mean_ds.to_netcdf(output_file)
    print(f"Averaged ensemble saved: {output_file}")

In [None]:
# Process each group
for key, file_list in grouped_files.items():
    scenario, time_period = key
    output_filename = f"pr_EUR-11_{scenario}_ensemble_{time_period}.nc"
    output_path = os.path.join(output_dir, output_filename)
    average_ensemble(file_list, output_path)

print("Ensemble averaging complete!")

### **Standard Deviation Calculation**

In [6]:
output_dir_std = "/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/std_ensembles"

# Ensure the output directory exists
os.makedirs(output_dir_std, exist_ok=True)

In [8]:
# Function to calculate the standard deviation of all return level variables across the ensemble
def std_ensemble(files, output_file, scenario, time_period):
    print(f"Processing files: {files}")
    print("-------------------------")

    # Open all datasets
    datasets = [xr.open_dataset(f) for f in files]

    # Get all return level variables (e.g., "return_period_10_y", "return_period_20_y", etc.)
    return_level_vars = [var for var in datasets[0].data_vars if var.startswith("return_period")]

    # Create an empty dataset for the ensemble std
    ensemble_std_ds = xr.Dataset()

    # Compute the standard deviation of each return level variable across all datasets
    for var in return_level_vars:
        print(f"Std variable: {var}")
        var_data = [ds[var] for ds in datasets]  # Extract the variable from all datasets
        var_std = xr.concat(var_data, dim='model').std(dim='model')  # Compute the std across models
        ensemble_std_ds[var] = var_std  # Add the std variable to the output dataset

    # Assign coordinates from the first dataset
    first_ds = datasets[0]
    ensemble_std_ds = ensemble_std_ds.assign_coords({'lon': first_ds['lon'], 'lat': first_ds['lat']})

    # Add metadata
    ensemble_std_ds.attrs['description'] = f"Ensemble std of anomalies for scenario {scenario}, time period {time_period}"

    # Save the std dataset
    ensemble_std_ds.to_netcdf(output_file)
    print(f"Std ensemble saved: {output_file}")

# Process each group
for key, file_list in grouped_files.items():
    scenario, time_period = key
    output_filename = f"pr_EUR-11_{scenario}_ensemble_std_{time_period}.nc"
    output_path = os.path.join(output_dir_std, output_filename)  # Write to the std folder, not the mean folder
    std_ensemble(file_list, output_path, scenario, time_period)

print("Ensemble std complete!")

Processing files: ['/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies/pr_EUR-11_ICHEC-EC-EARTH_rcp26_diff_CLMcom-CCLM4-8-17_v1_2021-2050.nc', '/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies/pr_EUR-11_NCC-NorESM1-M_rcp26_diff_SMHI-RCA4_v1_2021-2050.nc', '/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies/pr_EUR-11_ICHEC-EC-EARTH_rcp26_diff_DMI-HIRHAM5_v2_2021-2050.nc', '/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies/pr_EUR-11_NCC-NorESM1-M_rcp26_diff_GERICS-REMO2015_v1_2021-2050.nc', '/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies/pr_EUR-11_ICHEC-EC-EARTH_rcp26_diff_KNMI-RACMO22E_v1_2021-2050.nc', '/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies/pr_EUR-11_CNRM-CERFACS-CNRM-CM5_rcp26_diff_CNRM-ALADIN63_v2_2021-2050.nc', '/work/cmcc/dg07124/climax/indicators/cordex2/returnLevels/anomalies/pr_EUR-11_MOHC-HadGEM2-ES_rcp26_diff_DMI-HIRHAM5_v2_2021-2050.nc', '/work/cmcc/dg0712
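
The mean and the standard deviation can also be taken from a single stacked dataset. The sketch below uses four hypothetical ensemble members on a 1x2 grid and compares xarray's default population standard deviation (`ddof=0`) with the sample version (`ddof=1`). All values are placeholders.

```python
import numpy as np
import xarray as xr

# Hypothetical anomalies (mm/day) from four models on a 1x2 grid
members = [
    xr.Dataset({"return_period_10_y": (("y", "x"), np.array([[v, 2 * v]]))})
    for v in (1.0, 2.0, 3.0, 4.0)
]

# Stack once along a new 'model' dimension, then reduce
stacked = xr.concat(members, dim="model")
ensemble_mean = stacked.mean(dim="model")
ensemble_std = stacked.std(dim="model")                  # population std (ddof=0)
ensemble_std_sample = stacked.std(dim="model", ddof=1)   # sample std (ddof=1)

print(ensemble_mean["return_period_10_y"].values)
print(ensemble_std["return_period_10_y"].values)
print(ensemble_std_sample["return_period_10_y"].values)
```

With only 14 models the two definitions differ noticeably, so it helps to state which one a map of ensemble spread shows.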

## Contributors
- Giuseppe Giugliano (giuseppe.giugliano@cmcc.it)
- Carmela de Vivo (carmela.devivo@cmcc.it)
- Daniela Quintero (daniela.quintero@cmcc.it)