# Drought Metrics

This notebook calculates two drought metrics, the Palmer Drought Severity Index (PDSI) and the Evaporative Demand Drought Index (EDDI), using WRF data in the AE catalog. Both PDSI and EDDI require Potential Evapotranspiration (PET), so PET is computed first using the Penman-Monteith method. By the end of the notebook, the user will be able to export monthly PDSI and EDDI under different global warming levels for a specific lat/lon as netCDF files for further analysis.

**Intended Application:** As a user, I want to <font color="red"> export future drought indices for different global warming levels </font> by:
1. Calculating PET using the Penman-Monteith method
2. Calculating PDSI from PET and precipitation and exporting the monthly timeseries to a netCDF file
3. Calculating EDDI from PET and exporting the monthly timeseries to a netCDF file

**Runtime:** With the default settings, this notebook takes approximately 25 minutes to run from start to finish. Modifications to selections may increase the runtime.

**Troubleshooting:** Getting an `IndexError: index 40 is out of bounds for axis 0 with size 40` when running the PDSI calculation? Try moving your lat/lon point of interest farther from the boundaries of our 3 km WRF domain (as seen in [<span style="color:blue">this graphic</span>](https://analytics.cal-adapt.org/faq/#what-data-is-available)).

### Using the Penman-Monteith method (physically based, combining energy balance and aerodynamic terms) to calculate Potential Evapotranspiration

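**Variables needed:**
- `tasmin` (minimum air temperature)
- `tasmax` (maximum air temperature)
- `hurs` (relative humidity)
- Surface radiation fluxes:
    - `rsds` (downwelling shortwave)
    - `rsus` (upwelling shortwave)
    - `rlds` (downwelling longwave)
    - `rlus` (upwelling longwave)
- `wspd10mean` (10 m wind speed, which is converted to 2 m wind speed for the Penman-Monteith calculation)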
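The 10 m to 2 m wind conversion commonly follows the logarithmic wind-profile relation from FAO Irrigation and Drainage Paper 56, u2 = uz * 4.87 / ln(67.8 z - 5.42), where z is the measurement height in metres. The sketch below is a standalone illustration of that formula; the helper name `wind_10m_to_2m` is hypothetical, and we assume (rather than confirm) that this is the form xclim applies internally.

```python
import math


def wind_10m_to_2m(u_z: float, z: float = 10.0) -> float:
    """Scale a wind speed measured at height z (m) to 2 m using the FAO-56 log profile.

    Hypothetical helper for illustration; not part of xclim or climakitae.
    """
    return u_z * 4.87 / math.log(67.8 * z - 5.42)


# For a 10 m measurement the factor is about 0.75, so 4 m/s at 10 m is roughly 3 m/s at 2 m
print(round(wind_10m_to_2m(4.0), 2))
```

Note that at z = 2 m the factor is essentially 1, which is a quick sanity check on the formula.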

### Imports

In [3]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import rioxarray  # noqa: F401 (registers the `.rio` accessor used when writing the CRS)
import xarray as xr
import xclim
from pyproj import CRS

from climakitae.core.data_export import export
from climakitae.core.data_interface import get_data
from climakitae.core.data_load import load
from climakitae.util.utils import add_dummy_time_to_wl, reproject_data

### Initial Setup

In [4]:
lat = 37.787964
lon = -122.065063

variables_dict = {
    "tasmax": "Maximum air temperature at 2m",
    "tasmin": "Minimum air temperature at 2m",
    "hurs": "Relative humidity",
    "rsds": "Instantaneous downwelling shortwave flux at bottom",
    "rsus": "Instantaneous upwelling shortwave flux at bottom",
    "rlds": "Instantaneous downwelling longwave flux at bottom",
    "rlus": "Instantaneous upwelling longwave flux at bottom",
    "wspd10mean": "Mean wind speed at 10m",
    "precip": "Precipitation (total)",
}

In [None]:
# Making a `tmp_data` folder to hold the intermediate variables needed for PET
os.makedirs("tmp_data", exist_ok=True)

**IMPORTANT NOTE**: The following cell saves the intermediate data required for PET into a `tmp_data` directory. Cached files are named by variable only, not by location, so if you change the lat/lon coordinate above and DON'T delete or move the files already inside `tmp_data`, the cell will re-read the old files and silently use data from the previous location.
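One way to avoid reusing stale data is to build the location into the cache file name. The helper below is a hypothetical sketch (not part of climakitae); to use it, you would replace the line `file_path = f"tmp_data/{variable}_daily.nc"` in the next cell with a call such as `cache_path(variable, lat, lon)`.

```python
import os


def cache_path(variable: str, lat: float, lon: float, cache_dir: str = "tmp_data") -> str:
    """Build a cache file name that encodes both the variable and the point of interest.

    Hypothetical helper for illustration only.
    """
    # Fixed precision keeps names stable; replace characters that are awkward in file names
    lat_tag = f"{lat:.4f}".replace(".", "_").replace("-", "m")
    lon_tag = f"{lon:.4f}".replace(".", "_").replace("-", "m")
    return os.path.join(cache_dir, f"{variable}_lat{lat_tag}_lon{lon_tag}_daily.nc")


print(cache_path("tasmax", 37.787964, -122.065063))
```

With this naming, a new lat/lon produces new cache files instead of silently re-reading the old ones.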

In [None]:
### Retrieving the different variables needed for PET (and precip, for PDSI)
datas = []

# Daily aggregation applied to each variable; anything not listed here is averaged
daily_agg = {"tasmin": "min", "tasmax": "max", "precip": "sum"}

for variable, var_long_name in variables_dict.items():

    file_path = f"tmp_data/{variable}_daily.nc"

    if os.path.exists(file_path):
        # Reuse the daily file cached by a previous run
        print(f"Reading {variable} from file.")
        da = xr.open_dataarray(file_path)

    else:
        print(f"Computing {var_long_name}")
        # rsus and rlus are retrieved hourly and averaged to daily below
        timescale = "hourly" if variable in ("rlus", "rsus") else "daily"
        da = get_data(
            variable=var_long_name,
            resolution="3 km",
            timescale=timescale,
            latitude=(lat - 0.1, lat + 0.1),
            longitude=(lon - 0.1, lon + 0.1),
            approach="Warming Level",
            warming_level=[0.8, 1.5, 2.0, 3.0],
            downscaling_method="Dynamical",
        )
        # Give the warming-level data a dummy time axis so it can be resampled
        da = load(add_dummy_time_to_wl(da), progress_bar=True)
        resampled = da.squeeze().resample(time="D")
        da = getattr(resampled, daily_agg.get(variable, "mean"))()
        da.to_netcdf(file_path)  # Save for reuse

    datas.append(da)

In [7]:
# Naming the daily variables by key (in `variables_dict`, `tasmax` comes before `tasmin`)
daily = dict(zip(variables_dict, datas))
tasmin = daily["tasmin"]
tasmax = daily["tasmax"]
hurs = daily["hurs"] / 100  # Convert from % to fraction
new_hurs = hurs.assign_attrs(units="1")
rsds = daily["rsds"]
rsus = daily["rsus"]
rlds = daily["rlds"]
rlus = daily["rlus"]
sfcWind = daily["wspd10mean"]
precip = daily["precip"]

In [8]:
# Calculating PET from xclim
pet_calc = xclim.indices.potential_evapotranspiration(
    tasmin=tasmin,
    tasmax=tasmax,
    hurs=new_hurs,
    rsds=rsds,
    rsus=rsus,
    rlds=rlds,
    rlus=rlus,
    sfcWind=sfcWind,
    method="FAO_PM98"
)
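For reference, the FAO-56 Penman-Monteith equation for daily reference evapotranspiration is ET₀ = [0.408 Δ (Rn - G) + γ (900 / (T + 273)) u2 (es - ea)] / [Δ + γ (1 + 0.34 u2)], with ET₀ in mm/day, net radiation Rn in MJ m-2 day-1 (1 W m-2 is 0.0864 MJ m-2 day-1), and u2 the 2 m wind speed. The sketch below evaluates that equation for a single day, assuming soil heat flux G = 0 and a psychrometric constant derived from a fixed surface pressure. It is a simplified illustration, not a re-implementation of xclim's `FAO_PM98` method, and the function name `fao56_et0` is hypothetical.

```python
import math


def fao56_et0(tmin_c, tmax_c, rh_frac, rn_mj, u2_ms, pressure_kpa=101.3):
    """Daily FAO-56 Penman-Monteith reference ET (mm/day), assuming soil heat flux G = 0.

    tmin_c / tmax_c: daily min / max air temperature (deg C); rh_frac: mean relative humidity (0-1);
    rn_mj: net radiation (MJ m-2 day-1); u2_ms: wind speed at 2 m (m/s);
    pressure_kpa: surface pressure, defaulting to standard sea-level pressure (assumption).
    """

    def e_sat(t):
        # Saturation vapour pressure (kPa) at temperature t (deg C)
        return 0.6108 * math.exp(17.27 * t / (t + 237.3))

    gamma = 0.000665 * pressure_kpa                       # psychrometric constant (kPa/deg C)
    t_mean = (tmin_c + tmax_c) / 2
    es = (e_sat(tmin_c) + e_sat(tmax_c)) / 2              # mean saturation vapour pressure
    ea = rh_frac * es                                     # actual vapour pressure
    delta = 4098 * e_sat(t_mean) / (t_mean + 237.3) ** 2  # slope of the saturation curve
    radiative = 0.408 * delta * rn_mj
    aerodynamic = gamma * 900 / (t_mean + 273) * u2_ms * (es - ea)
    return (radiative + aerodynamic) / (delta + gamma * (1 + 0.34 * u2_ms))


# A mild, moderately dry day with 15 MJ m-2 of net radiation and a 2 m/s wind
print(round(fao56_et0(15, 25, 0.6, 15, 2.0), 1))
```

The numerator separates the radiation-driven term from the aerodynamic (vapour pressure deficit) term, which is why saturated air with no net radiation yields zero ET₀.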

In [9]:
# Preserving CRS and a spatial mask for later
crs = CRS.from_cf(pet_calc['Lambert_Conformal'].attrs)
spatial_mask = ~pet_calc.isel(warming_level=0, time=0, simulation=0).isnull()

## PDSI

Here, we use the `pdsi` function from the `climate_indices` library, the implementation referenced by drought.gov. This notebook relies on a specific commit of the package that is compatible with the AE hub environment, so that version must be installed before running the cells below.
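Internally, the Palmer method runs a monthly soil water balance, turns the departure of precipitation from the "climatically appropriate" amount into a moisture anomaly (the Z-index), and then accumulates it with the recursion X_i = 0.897 X_(i-1) + Z_i / 3. The sketch below shows only that final recursion; it deliberately omits the water balance and the backtracking rules that decide when a wet or dry spell starts or ends, so it is not a substitute for `climate_indices.palmer.pdsi`. The function name is hypothetical.

```python
def palmer_recursion(z_index, x0=0.0):
    """Accumulate monthly Z-index values with Palmer's X_i = 0.897 * X_(i-1) + Z_i / 3.

    Hypothetical, simplified illustration of the PDSI accumulation step only.
    """
    x_prev = x0
    out = []
    for z in z_index:
        x_prev = 0.897 * x_prev + z / 3.0
        out.append(x_prev)
    return out


# Three consecutive months with Z = 3 build a progressively wetter index
print([round(x, 3) for x in palmer_recursion([3.0, 3.0, 3.0])])
# [1.0, 1.897, 2.702]
```

The 0.897 factor gives the index memory: a single anomalous month fades gradually instead of disappearing, which is why PDSI responds slowly to short events.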

In [11]:
# Making imports from `climate_indices` package
import climate_indices
from climate_indices.palmer import pdsi

In [12]:
# Resampling PET and precip to monthly totals in inches, since `pdsi` expects monthly inputs
# (PET: kg m-2 s-1 * 86400 -> mm/day; both variables: mm / 25.4 -> inches)
mon_pet = (pet_calc * 86400 / 25.4).resample(time='1ME').sum()
mon_precip = (precip / 25.4).resample(time='1ME').sum()
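The scaling factors above follow from the units involved, assuming (as the `* 86400` factor implies) that xclim returns PET as a flux in kg m-2 s-1, which equals mm/s of water: multiplying by 86400 s/day gives mm/day, summing the daily values gives mm/month, and dividing by 25.4 mm/in gives inches. Precipitation is assumed to already be in mm per day after the daily sum. A worked example for a hypothetical constant PET over a 30-day month (the function name is hypothetical):

```python
SECONDS_PER_DAY = 86400
MM_PER_INCH = 25.4


def pet_flux_to_monthly_inches(daily_flux_kg_m2_s):
    """Convert daily-mean PET fluxes (kg m-2 s-1, i.e. mm/s) to a monthly total in inches."""
    daily_mm = [f * SECONDS_PER_DAY for f in daily_flux_kg_m2_s]  # mm/day
    return sum(daily_mm) / MM_PER_INCH                            # mm/month -> inches/month


# 3.0e-5 kg m-2 s-1 is about 2.6 mm/day; over 30 days that is about 3.06 inches
print(round(pet_flux_to_monthly_inches([3.0e-5] * 30), 2))
# 3.06
```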

In [13]:
### Combining WL objects together: the baseline WL fills dummy years 2000-2029 and each future WL fills 2030-2059 (for 30-year WL windows)
def combine_wl_to_dummy_time(
    da: xr.DataArray,
    baseline_wl: float,
    future_wls: list[float],
    start_date: str = "2000-01-31",
) -> xr.DataArray:
    """
    Combine baseline warming level with multiple future warming levels into one
    DataArray along a new 'combined_wl' dimension.

    Parameters
    ----------
    da : xr.DataArray
        Original data with dims including 'warming_level' and 'time'.
    baseline_wl : float
        The warming level used for the first time segment.
    future_wls : list of float
        Warming levels to concatenate after baseline.
    start_date : str
        Start date for the combined time series (monthly freq).
    
    Returns
    -------
    xr.DataArray
        Combined DataArray with new dimension 'combined_wl' and coordinate labels like "08_to_15".
    """
    months_per_wl = da.sizes['time']
    total_months = 2 * months_per_wl
    new_time = pd.date_range(start_date, periods=total_months, freq='ME')

    combined_list = []
    combined_labels = []

    for fw in future_wls:
        da_base = da.sel(warming_level=baseline_wl)
        da_future = da.sel(warming_level=fw)

        combined = xr.concat([da_base, da_future], dim='time')
        combined = combined.assign_coords(time=new_time)

        wl_flag = np.array([baseline_wl] * months_per_wl + [fw] * months_per_wl)
        combined = combined.assign_coords(warming_level_flag=('time', wl_flag))

        combined_list.append(combined)
        # round() rather than int() so floating-point error cannot truncate the label
        combined_labels.append(f"{round(baseline_wl * 10):02d}_to_{round(fw * 10):02d}")

    combined_da = xr.concat(combined_list, dim='combined_wl')
    combined_da = combined_da.assign_coords(combined_wl=combined_labels)

    return combined_da
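To make the layout of the combined time axis concrete, the pure-Python sketch below rebuilds the per-month warming-level flag and the label for one baseline/future pair, assuming 30 years (360 months) of monthly data per warming level, as the export cells' `time=slice(360, 720)` assumes. The future segment occupies indices 360 through 719, which is exactly what that slice keeps. The helper name is hypothetical.

```python
def combined_layout(baseline_wl: float, future_wl: float, months_per_wl: int = 360):
    """Return the per-month WL flag, the label, and the slice of the future segment.

    Hypothetical helper mirroring the layout built by `combine_wl_to_dummy_time`.
    """
    flag = [baseline_wl] * months_per_wl + [future_wl] * months_per_wl
    label = f"{round(baseline_wl * 10):02d}_to_{round(future_wl * 10):02d}"
    future_slice = slice(months_per_wl, 2 * months_per_wl)
    return flag, label, future_slice


flag, label, future = combined_layout(0.8, 1.5)
print(label, len(flag), flag[359], flag[360], future)
# 08_to_15 720 0.8 1.5 slice(360, 720, None)
```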

In [14]:
# Creating one Dataset of PET and precip with WLs combined
mon_pet_transform = combine_wl_to_dummy_time(mon_pet, baseline_wl=0.8, future_wls=[1.5,2.0,3.0])
mon_precip_transform = combine_wl_to_dummy_time(mon_precip, baseline_wl=0.8, future_wls=[1.5,2.0,3.0])

combined_ds = xr.Dataset({'precip': mon_precip_transform, 'pet': mon_pet_transform})

# Applying spatial mask
combined_ds = combined_ds.where(spatial_mask)

In [15]:
# Helper function applied to each (combined_wl, x, y, simulation) timeseries in `combined_ds`
def calc_pdsi(timeseries: xr.Dataset):
    """
    Compute the Palmer Drought Severity Index (PDSI) from a Dataset.

    Parameters
    ----------
    timeseries : xarray.Dataset
        Dataset containing 'precip' and 'pet' variables with a time dimension.

    Returns
    -------
    xarray.DataArray
        PDSI values along the time dimension.
    """
    # Extracting precip and PET by each timeseries and calculating PDSI
    precip = timeseries['precip'].squeeze()
    pet = timeseries['pet'].squeeze()
    
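    pdsi_calc = pdsi(
        precips=precip.values,
        pet=pet.values,
        awc=5,  # available water capacity of the soil, in inches
        data_start_year=2000,
        calibration_year_initial=2000,
        calibration_year_final=2029,  # inclusive: calibrate on the baseline WL years 2000-2029 only
    )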
    retval = xr.DataArray(pdsi_calc[0], coords={"time": precip.time.values}, dims=['time'])
    
    # Clipping PDSI to realistic values
    return retval.clip(min=-10, max=10)

In [17]:
# Applies the PDSI function across all dimensions so that a timeseries of PET/precip is always being passed into `pdsi`
pdsi_da = combined_ds.groupby([
    'combined_wl',
    'x',
    'y',
    'simulation'
]).map(calc_pdsi)

# Writing crs and reprojecting PDSI to lat/lon
pdsi_da = pdsi_da.rio.write_crs(crs.to_wkt())
pdsi_da = pdsi_da.transpose('time', 'combined_wl', 'simulation', 'y', 'x')
pdsi_latlon = reproject_data(pdsi_da, 'EPSG:4326')
# Drop the fill-value attribute (if present) so it can be set through `encoding` on export
pdsi_latlon.attrs.pop("_FillValue", None)

### Exporting the results

The results will have the following dimensions:
- time
- wl (labels such as `08_to_15`: PDSI was calibrated on the first WL and calculated on the second)
- lat
- lon
- simulation

In [18]:
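# Cleaning and labeling the data before exporting it in the next cell
# Keep only the future-WL half of the time axis (months 360-719); the first 360 months are the baseline period
final_pdsi = pdsi_latlon.isel(time=slice(360, 720))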
final_pdsi = final_pdsi.rename({'combined_wl': 'wl'}).rename("pdsi")
final_pdsi = final_pdsi.assign_attrs({
    "long_name": "Palmer Drought Severity Index",
    "units": "from -10 (dry) to +10 (wet)",
})
pdsi_filename = f"pdsi_wl_lat{str(lat).replace('.', '_')}_lon{str(lon).replace('.', '_')}.nc"

In [19]:
# Exporting the DataArray, refusing to overwrite an existing file
if os.path.exists(pdsi_filename):
    raise FileExistsError(
        f"File {pdsi_filename} exists. "
        "Please either delete that file from the work space "
        "or specify a new file name here."
    )
final_pdsi.to_netcdf(pdsi_filename, encoding={"pdsi": {"_FillValue": -9999.0}})

## EDDI

Now, we will calculate EDDI using PET.
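In its original formulation, EDDI is nonparametric: each month's accumulated ET₀ is ranked against the same calendar month in the other years of the climatology, converted to an empirical probability with the Tukey plotting position P = (i - 0.33) / (n + 0.33) (where rank i = 1 is the largest ET₀), and mapped to a standard normal value. The notebook instead fits a gamma distribution through xclim's `standardized_index`, so its values will differ somewhat. The sketch below illustrates the nonparametric version for a single calendar month; it uses the standard library's exact inverse normal in place of the polynomial approximation used in the original method, and the function name is hypothetical.

```python
from statistics import NormalDist


def eddi_nonparametric(et0_values):
    """Empirical EDDI for one calendar month across years (larger ET0 -> more positive EDDI).

    Hypothetical sketch; not the method used by `calc_eddi` below.
    """
    n = len(et0_values)
    # Rank 1 goes to the largest ET0 (the stable sort breaks ties by order of appearance)
    order = sorted(range(n), key=lambda k: et0_values[k], reverse=True)
    eddi = [0.0] * n
    for rank, k in enumerate(order, start=1):
        p = (rank - 0.33) / (n + 0.33)          # Tukey plotting position (exceedance probability)
        eddi[k] = NormalDist().inv_cdf(1 - p)   # high ET0 -> small p -> positive (dry) EDDI
    return eddi


print([round(v, 2) for v in eddi_nonparametric([1.0, 2.0, 3.0])])
```

Because only ranks matter, this version makes no distributional assumption, whereas the gamma fit used below can extrapolate beyond the range of the calibration years.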

In [20]:
# Import `standardized_index` from xclim, which we will apply to our PET data object to generate EDDI
from xclim.indices.stats import standardized_index

In [21]:
def calc_eddi(timeseries: xr.DataArray):
    """
    Compute the Evaporative Demand Drought Index (EDDI) for a time series.

    Parameters
    ----------
    timeseries : xarray.DataArray
        1D time series of ET₀. NaNs are skipped.

    Returns
    -------
    xarray.DataArray
        EDDI values. Positive = dry, negative = wet.
    """
    eddi = standardized_index(
        da=timeseries,
        freq='MS',
        window=1,
        dist="gamma",
        method="ML",
        zero_inflated=False,
        fitkwargs={},
        cal_start="2000-01-01",  # first month of the baseline WL period (dummy time)
        cal_end="2029-12-31"
    )
    # Clipping EDDI to realistic values
    retval = eddi.clip(min=-2.5, max=2.5)
    return retval

In [22]:
# Applies the `calc_eddi` function across all dimensions so that a timeseries of PET is always being passed into `calc_eddi`
eddi_da = combined_ds['pet'].groupby([
    'combined_wl',
    'x',
    'y',
    'simulation'
]).map(calc_eddi)

# Writing crs and reprojecting EDDI to lat/lon
eddi_da = eddi_da.rio.write_crs(crs.to_wkt())
eddi_da = eddi_da.transpose('time', 'combined_wl', 'simulation', 'y', 'x')
eddi_da_latlon = reproject_data(eddi_da, 'EPSG:4326')
# Drop the fill-value attribute (if present) so it can be set through `encoding` on export
eddi_da_latlon.attrs.pop("_FillValue", None)

### Exporting the results

The results will have the following dimensions:
- time
- wl (labels such as `08_to_15`: EDDI was calibrated on the first WL and calculated on the second)
- lat
- lon
- simulation

In [23]:
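# Cleaning and labeling the data before exporting it in the next cell
# Keep only the future-WL half of the time axis (months 360-719); the first 360 months are the baseline period
final_eddi = eddi_da_latlon.isel(time=slice(360, 720))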
final_eddi = final_eddi.rename({'combined_wl': 'wl'}).rename("eddi")
final_eddi = final_eddi.assign_attrs({
    "long_name": "Evaporative Demand Drought Index",
    "units": "from -2.5 (wet) to +2.5 (dry)",
})
eddi_filename = f"eddi_wl_lat{str(lat).replace('.', '_')}_lon{str(lon).replace('.', '_')}.nc"

In [24]:
# Exporting the DataArray, refusing to overwrite an existing file
if os.path.exists(eddi_filename):
    raise FileExistsError(
        f"File {eddi_filename} exists. "
        "Please either delete that file from the work space "
        "or specify a new file name here."
    )
final_eddi.to_netcdf(eddi_filename, encoding={"eddi": {"_FillValue": -9999.0}})