# Notebook for adding Penman-Monteith PET to an extension
Use the code in this notebook to create a new version of your extension, matching the Caravan core dataset changes of version 1.5. No additional data download is required.

## Background
In version 1.5 of the Caravan dataset, Penman-Monteith PET was added as an additional time series feature. It is computed from existing ERA5-Land forcings, so no new data needs to be downloaded. In addition to the new time series feature, all PET-related climate indices were recomputed using the new Penman-Monteith PET. For consistency, the old ERA5-Land `potential_evaporation` time series and climate indices were kept, but renamed with an `_ERA5_LAND` suffix so that they are easy to tell apart from the new `_FAO_PM` variants.
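
As a quick pre-flight check, the sketch below lists the ERA5-Land forcings that this notebook later passes to `pet.get_fao_pm_pet` and reports which of them are missing from a given set of column names. The helper `missing_pm_inputs` and the example column list are hypothetical and not part of the Caravan code; the required names are copied from the `_add_pm_pet` function defined further down.

```python
# Forcing columns that this notebook passes to pet.get_fao_pm_pet.
PM_INPUT_COLUMNS = [
    "surface_pressure_mean",
    "temperature_2m_mean",
    "dewpoint_temperature_2m_mean",
    "u_component_of_wind_10m_mean",
    "v_component_of_wind_10m_mean",
    "surface_net_solar_radiation_mean",
    "surface_net_thermal_radiation_mean",
]


def missing_pm_inputs(columns):
    """Return the required forcing columns that are absent from `columns`."""
    available = set(columns)
    return [name for name in PM_INPUT_COLUMNS if name not in available]


# Illustrative, deliberately incomplete set of column names.
example_columns = ["temperature_2m_mean", "surface_pressure_mean", "total_precipitation_sum"]
print(missing_pm_inputs(example_columns))
```

With a real extension, the same function could be called on `ds.to_dataframe().columns` for one of the netCDF files before running the full update.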

## Usage
Make sure to have a local Python environment with recent versions of:

- NumPy
- pandas (2.1 or newer, because the code below uses `DataFrame.map`)
- xarray
- tqdm

Execute this notebook from within the `code/` directory of the [Caravan repository](https://github.com/kratzert/Caravan) for the `pet` and `caravan_utils` imports to work.

This notebook first creates a copy of your extension and then updates the files of that local copy in place; the original extension is left untouched. Make sure to specify `extension_dir`, `extension_name` and `new_extension_dir` in the code block below the imports.
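
The cells below read from `timeseries/netcdf/<extension_name>` and write to `timeseries/csv/<extension_name>` and `attributes/<extension_name>`. A minimal sketch of a layout check, assuming exactly these three folders are needed, could look as follows. The helper `missing_extension_dirs` is hypothetical and not part of the Caravan code, and the demonstration uses a temporary directory instead of a real extension.

```python
import pathlib
import tempfile


def missing_extension_dirs(extension_dir, extension_name):
    """Return the sub-directories this notebook relies on that do not exist."""
    root = pathlib.Path(extension_dir)
    expected = [
        root / "timeseries" / "netcdf" / extension_name,
        root / "timeseries" / "csv" / extension_name,
        root / "attributes" / extension_name,
    ]
    return [path for path in expected if not path.is_dir()]


# Demonstration on a throwaway directory that only contains the netCDF folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = pathlib.Path(tmp)
    (demo_root / "timeseries" / "netcdf" / "camelses").mkdir(parents=True)
    missing = missing_extension_dirs(demo_root, "camelses")
    # Store paths relative to the root so the result does not depend on the temp location.
    missing_relative = [path.relative_to(demo_root).as_posix() for path in missing]

print(missing_relative)
```

An empty list means all three folders are present; anything else is worth fixing before the time series are rewritten.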

# Imports

In [1]:
import pathlib
import shutil

import numpy as np
import pandas as pd
import xarray as xr
import tqdm

import pet
import caravan_utils

# Global Settings

In [17]:
extension_dir = pathlib.Path("/home/chus-casado/Datos/CAMELS-ES/v102")
extension_name = "camelses"

new_extension_dir = pathlib.Path("/home/chus-casado/Datos/CAMELS-ES/v110") 

print(f"Creating copy of {extension_dir} at {new_extension_dir}")
if new_extension_dir.is_dir():
    print("The copy already exists")
    #raise FileExistsError(f"{new_extension_dir} already exists.")
else:
    shutil.copytree(extension_dir, new_extension_dir)
    print("Finished creating copy")

Creating copy of /home/chus-casado/Datos/CAMELS-ES/v102 at /home/chus-casado/Datos/CAMELS-ES/v110
The copy already exists


# Function definitions

In [11]:
def _add_pm_pet(df):
    """Add FAO Penman-Monteith PET and rename the original ERA5-Land PET column."""
    df["potential_evaporation_sum_FAO_PENMAN_MONTEITH"] = pet.get_fao_pm_pet(
        surface_pressure_mean=df["surface_pressure_mean"],
        temperature_2m_mean=df["temperature_2m_mean"],
        dewpoint_temperature_2m_mean=df["dewpoint_temperature_2m_mean"],
        u_component_of_wind_10m_mean=df["u_component_of_wind_10m_mean"],
        v_component_of_wind_10m_mean=df["v_component_of_wind_10m_mean"],
        surface_net_solar_radiation_mean=df["surface_net_solar_radiation_mean"],
        surface_net_thermal_radiation_mean=df["surface_net_thermal_radiation_mean"],
    )
    # Keep the original ERA5-Land PET under an explicit name.
    df.rename(columns={'potential_evaporation_sum': 'potential_evaporation_sum_ERA5_LAND'}, inplace=True)
    # Limit all values to two decimals (DataFrame.map requires pandas 2.1 or newer).
    return df.round(2).map('{:.2f}'.format).map(float)


def _create_new_xr_dataset(df, attrs):
    ds = xr.Dataset.from_dataframe(df).astype(np.float32)
    ds.attrs = attrs
    new_metadata = caravan_utils.get_metadata_info(ds)
    unit_info = ""
    for k in sorted(new_metadata.keys()):
        unit_info = unit_info + f"{k}: {new_metadata[k]}\n"
    ds.attrs["Units"] = unit_info

    return ds


def _save_timeseries_data(df, ds, old_nc_file):
    # Convert to a Path object if it isn't already one
    nc_path = pathlib.Path(old_nc_file)

    # Rebuild the path by replacing the folder named "netcdf" with "csv".
    new_parts = [("csv" if part == "netcdf" else part) for part in nc_path.parts]
    csv_path = pathlib.Path(*new_parts)
   
    # Change the extension from .nc to .csv
    csv_path = csv_path.with_suffix('.csv')
    
    df.to_csv(csv_path)
    ds.to_netcdf(old_nc_file)
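
The path handling in `_save_timeseries_data` can be tried out in isolation. The sketch below repeats the same replace-and-rename logic in a hypothetical helper, `netcdf_to_csv_path`, on made-up example paths. It uses `PurePosixPath` so that the result is the same on every operating system, whereas the notebook itself uses `pathlib.Path`.

```python
import pathlib


def netcdf_to_csv_path(nc_file):
    """Map .../netcdf/<name>.nc to .../csv/<name>.csv, as _save_timeseries_data does."""
    nc_path = pathlib.PurePosixPath(nc_file)
    # Every path component that is exactly "netcdf" becomes "csv".
    parts = ["csv" if part == "netcdf" else part for part in nc_path.parts]
    return pathlib.PurePosixPath(*parts).with_suffix(".csv")


# Hypothetical example paths, not taken from a real extension.
typical = netcdf_to_csv_path("/data/ext/timeseries/netcdf/camelses/camelses_1080.nc")
# Caveat: a parent folder that happens to be called "netcdf" is replaced as well.
surprising = netcdf_to_csv_path("/data/netcdf/ext/timeseries/netcdf/camelses/camelses_1080.nc")

print(typical)
print(surprising)
```

The second call shows why the extension should not live below another directory named `netcdf`: the CSV file would then be written to a location that may not exist.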

# Add Penman-Monteith timeseries

In [12]:
nc_files = list((new_extension_dir / "timeseries" / "netcdf" / extension_name).glob("*.nc"))
if not nc_files:
    raise RuntimeError("No netCDF files found.")

In [13]:
climate_indices = {}
for nc_file in tqdm.tqdm(nc_files):
    # Load from netCDF file to have the attributes dict.
    ds = xr.load_dataset(nc_file)

    df = _add_pm_pet(ds.to_dataframe())
    df = df[sorted(df.columns)]
    ds_new = _create_new_xr_dataset(df, ds.attrs)

    _save_timeseries_data(df, ds_new, nc_file)

    climate_indices[nc_file.stem] = caravan_utils.calculate_climate_indices(df)

df_caravan = pd.DataFrame.from_dict(climate_indices, orient='index')
df_caravan = df_caravan.sort_index(axis='index').sort_index(axis='columns')
df_caravan.index.name = "gauge_id"
df_caravan.head()

100%|███████████████████████████████████████████████████████| 269/269 [02:39<00:00,  1.69it/s]


| gauge_id | aridity_ERA5_LAND | aridity_FAO_PM | frac_snow | high_prec_dur | high_prec_freq | low_prec_dur | low_prec_freq | moisture_index_ERA5_LAND | moisture_index_FAO_PM | p_mean | pet_mean_ERA5_LAND | pet_mean_FAO_PM | seasonality_ERA5_LAND | seasonality_FAO_PM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| camelses_1080 | 1.837725 | 0.550231 | 0.0 | 1.208955 | 0.036959 | 3.163093 | 0.511498 | -0.354531 | 0.409516 | 3.561555 | 6.545159 | 1.959678 | 0.884612 | 1.199666 |
| camelses_1103 | 2.498961 | 0.613082 | 0.0 | 1.209169 | 0.038511 | 3.284091 | 0.527469 | -0.537191 | 0.340782 | 3.267835 | 8.166194 | 2.003452 | 0.639122 | 1.267111 |
| camelses_1105 | 1.766062 | 0.495438 | 0.0 | 1.196319 | 0.03559 | 3.162818 | 0.499909 | -0.370075 | 0.46705 | 4.053904 | 7.159447 | 2.008458 | 0.752065 | 1.05144 |
| camelses_1106 | 1.28037 | 0.501086 | 0.0 | 1.213855 | 0.036777 | 3.258913 | 0.508852 | -0.13595 | 0.467408 | 3.879794 | 4.967573 | 1.944111 | 0.978018 | 1.07472 |
| camelses_1109 | 2.973065 | 0.5422 | 0.0 | 1.215569 | 0.037051 | 3.137465 | 0.508213 | -0.626863 | 0.409344 | 3.696472 | 10.989852 | 2.004226 | 0.464819 | 1.15699 |
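
One way to sanity-check the output is to compare related columns. The numbers printed above are consistent with each aridity value being the ratio of the corresponding `pet_mean` to `p_mean`. This is an observation from the printed values only, not a statement about how `caravan_utils.calculate_climate_indices` is implemented. The sketch below checks it for the first two gauges, with the values copied by hand from the output.

```python
import math

# gauge_id -> (p_mean, pet_mean_ERA5_LAND, pet_mean_FAO_PM, aridity_ERA5_LAND, aridity_FAO_PM)
rows = {
    "camelses_1080": (3.561555, 6.545159, 1.959678, 1.837725, 0.550231),
    "camelses_1103": (3.267835, 8.166194, 2.003452, 2.498961, 0.613082),
}

consistent = {}
for gauge_id, (p_mean, pet_era5, pet_pm, aridity_era5, aridity_pm) in rows.items():
    # The printed values have six decimals, so allow a small absolute tolerance.
    consistent[gauge_id] = (
        math.isclose(pet_era5 / p_mean, aridity_era5, abs_tol=1e-5)
        and math.isclose(pet_pm / p_mean, aridity_pm, abs_tol=1e-5)
    )

print(consistent)
```

Both gauges pass the check. The much lower `aridity_FAO_PM` values follow directly from the lower `pet_mean_FAO_PM` values in the same rows.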


# Save new climate indices

In [18]:
df_caravan.to_csv(new_extension_dir / "attributes" / extension_name / f"attributes_caravan_{extension_name}.csv")
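
After saving, it can be useful to confirm that the attributes file contains every PET-related index in both variants. The sketch below is one possible check, using only the standard library. The helper `paired_pet_indices` is hypothetical, and the in-memory text stands in for the real file; its header is copied from the output shown above.

```python
import csv
import io

ERA5_SUFFIX = "_ERA5_LAND"
PM_SUFFIX = "_FAO_PM"


def paired_pet_indices(fieldnames):
    """Return index names that exist with both the ERA5-Land and the FAO-PM suffix."""
    era5 = {n[: -len(ERA5_SUFFIX)] for n in fieldnames if n.endswith(ERA5_SUFFIX)}
    pm = {n[: -len(PM_SUFFIX)] for n in fieldnames if n.endswith(PM_SUFFIX)}
    return sorted(era5 & pm)


# In-memory stand-in for the attributes file, holding only the header row.
csv_text = (
    "gauge_id,aridity_ERA5_LAND,aridity_FAO_PM,frac_snow,high_prec_dur,"
    "high_prec_freq,low_prec_dur,low_prec_freq,moisture_index_ERA5_LAND,"
    "moisture_index_FAO_PM,p_mean,pet_mean_ERA5_LAND,pet_mean_FAO_PM,"
    "seasonality_ERA5_LAND,seasonality_FAO_PM\n"
)
reader = csv.DictReader(io.StringIO(csv_text))
paired = paired_pet_indices(reader.fieldnames)
print(paired)
```

To run the same check on the saved file, replace the `io.StringIO` object with a file handle opened via `open(path, newline="")`.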