# Validation of Remotely Sensed Droughts
**Comparing Spatial Trends**


## Overview

In this notebook we continue with the drought indicators introduced in notebook 1. The H SAF ASCAT drought indicator is based on temporal soil moisture anomalies, which require a long-term mean for a robust measure. The anomalies are then calculated as standardized deviations from the long-term mean.
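
As a minimal sketch of this idea, the snippet below standardizes a synthetic monthly soil moisture series against the long-term mean and standard deviation of each calendar month. The series, its values, and all variable names are made up for illustration; the H SAF product may use a different reference period or aggregation.

```python
import numpy as np
import pandas as pd

# Synthetic monthly surface soil moisture (degree of saturation, %); illustrative only
rng = np.random.default_rng(42)
time = pd.date_range("2007-01-01", "2016-12-01", freq="MS")
seasonal = 50 + 20 * np.sin(2 * np.pi * (time.month.to_numpy() - 1) / 12)
ssm = pd.Series(seasonal + rng.normal(0, 5, len(time)), index=time, name="ssm")

# Long-term mean and standard deviation per calendar month (the climatology)
by_month = ssm.groupby(ssm.index.month)
clim_mean = by_month.transform("mean")
clim_std = by_month.transform("std")

# Standardized anomaly: deviation from the long-term mean in units of standard deviation
zscore = (ssm - clim_mean) / clim_std
```

By construction, the z-scores of each calendar month have a mean of zero and a standard deviation of one, which makes dry and wet anomalies comparable across seasons.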


As explained in the first notebook, ASCAT SSM comes in units of degree of saturation. Although spatial patterns,... , relative changes as used for drought anomaly detection

## Imports

In [None]:
import cartopy.crs as ccrs
import datashader as ds
import holoviews as hv
import hvplot.pandas  # noqa
import numpy as np
import pandas as pd

## Standardized Precipitation-Evapotranspiration Index

### Definition

The Standardized Precipitation-Evapotranspiration Index (SPEI) is a more comprehensive index than the Standardized Precipitation Index (SPI): it takes into account both precipitation and potential evapotranspiration (PET), which is a measure of the atmospheric demand for water.

### Calculation

- SPEI is derived from the difference between precipitation and PET, accounting for both water supply (precipitation) and demand (evapotranspiration).
- SPEI values are standardized to facilitate comparison across different regions and timescales.

### Drought Classification

- Similar to SPI, SPEI values range from strongly negative (indicating dry conditions) to strongly positive (indicating wet conditions).
- Drought categories are typically defined similarly to SPI:
    - Extremely Dry: SPEI ≤ -2.0
    - Severely Dry: -2.0 < SPEI ≤ -1.5
    - Moderately Dry: -1.5 < SPEI ≤ -1.0
    - Near Normal: -1.0 < SPEI ≤ 1.0
    - Moderately Wet: 1.0 < SPEI ≤ 1.5
    - Very Wet: 1.5 < SPEI ≤ 2.0
    - Extremely Wet: SPEI > 2.0

### Comparison

**Strengths of SPI**

- Simplicity: SPI is easier to calculate because it only requires precipitation data.
- Wide acceptance: SPI is one of the most widely used and accepted drought indices globally.

**Strengths of SPEI**

- Comprehensiveness: SPEI incorporates both precipitation and evapotranspiration, providing a more complete picture of water availability.
- Climate change: SPEI is better suited for capturing the effects of rising temperatures on drought conditions, as increasing temperatures lead to higher evapotranspiration and thus exacerbate drought conditions even if precipitation remains constant.

**Differences in Drought Classification**

- Sensitivity to temperature: Because SPEI includes PET, it is more sensitive to changes in temperature. In regions experiencing warming trends, SPEI may indicate more severe or more frequent droughts than SPI.
- Seasonal variations: SPEI may show stronger seasonal variations due to differences in evapotranspiration rates across seasons.
- Region-specific differences: In regions with significant temperature variations (e.g., arid regions), SPEI may provide a more accurate assessment of drought conditions than SPI.
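
The drought classes listed in this section can be sketched as a lookup with `pd.cut`. The SPEI values below are hypothetical and chosen to land on class boundaries; the open outer edges (`-np.inf`, `np.inf`) are an assumption made here so that every value falls into a class.

```python
import numpy as np
import pandas as pd

# Class boundaries and labels as listed above; intervals are right-closed, e.g. (-2.0, -1.5]
bins = [-np.inf, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, np.inf]
labels = [
    "Extremely Dry",
    "Severely Dry",
    "Moderately Dry",
    "Near Normal",
    "Moderately Wet",
    "Very Wet",
    "Extremely Wet",
]

# Hypothetical SPEI values (not taken from the dataset)
spei = pd.Series([-2.3, -2.0, -1.2, 0.0, 1.0, 1.7, 2.5])
spei_class = pd.cut(spei, bins=bins, labels=labels, right=True)
print(spei_class.tolist())
```

Because the intervals are right-closed, a value of exactly -2.0 is classified as "Extremely Dry" and a value of exactly 1.0 as "Near Normal", matching the "≤" signs in the list.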
     

In [None]:
%run ./src/download_path.py

url = make_url("spei-6_25_monthly.csv")  # noqa
df1 = pd.read_csv(
    url,
    index_col=["time", "location_id"],
    parse_dates=["time"],
)
df1

Now let's also load our own data.

In [None]:
url = make_url("ascat-6_25_ssm_monthly.csv")  # noqa
df2 = pd.read_csv(
    url,
    index_col=["time", "location_id"],
    parse_dates=["time"],
)[["zscore"]]
df2 = df2[df2.index <= df1.index.max()]
df2

## Joining the Datasets

`merge()` performs join operations similar to relational databases like SQL. Users who are familiar with SQL but new to pandas can reference the "Comparison with SQL" page of the pandas documentation.

Here we use the default operation of `join()`, which is a "left join". This type of join combines rows from two tables based on a related column between them. It returns all rows from the left table and includes matched rows from the right table. If there is no match, the result is `np.nan` for the columns from the right table. Since we assigned indexes in which `time` and `location_id` together define a unique observation, the join operation is based on this `pandas.MultiIndex`.
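
The behaviour of a left join on a shared `MultiIndex` can be illustrated with two toy tables. All values and the names `left`, `right`, and `joined` are made up for this sketch.

```python
import pandas as pd

# Two toy tables indexed by (time, location_id); all values are made up
idx_left = pd.MultiIndex.from_tuples(
    [("2020-01-01", 1), ("2020-01-01", 2), ("2020-02-01", 1)],
    names=["time", "location_id"],
)
idx_right = pd.MultiIndex.from_tuples(
    [("2020-01-01", 1), ("2020-02-01", 1), ("2020-03-01", 1)],
    names=["time", "location_id"],
)
left = pd.DataFrame({"spei": [-1.2, 0.3, -0.4]}, index=idx_left)
right = pd.DataFrame({"zscore": [-0.9, 0.1, 1.5]}, index=idx_right)

# DataFrame.join defaults to how="left": every row of `left` is kept,
# unmatched rows get NaN, and rows only present in `right` are dropped
joined = left.join(right)
print(joined)
```

The observation `("2020-01-01", 2)` has no partner in `right` and therefore receives `NaN` in the `zscore` column, while `("2020-03-01", 1)` exists only in `right` and does not appear in the result.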

In [None]:
df_wide = df1.join(df2)
df_wide

## Simplifying Drought Severity with Data Binning

We will now turn the numeric data of the drought indicators `"spei"` and `"zscore"` into discrete categories by using the pandas `cut` function. In pandas, binning data (also known as discretization or quantization) is a technique where continuous numerical data is divided into discrete bins or intervals. This process can be useful for various purposes such as simplifying data, handling outliers, creating histograms, and preparing data for machine learning algorithms that require categorical input.

We also provide labels for the binned data, which turns the columns into pandas categorical data types. Pandas categorical data types are designed to represent data that takes on a limited and usually fixed number of possible values (categories). This type is often used for categorical variables, such as gender, days of the week, or survey responses. Categoricals provide efficient storage and operations for such data, with the ability to handle category ordering and missing values.
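
The following sketch shows how `pd.cut` treats bin edges, using a handful of invented values. It is meant only to illustrate the mechanics: bins are right-closed by default, so a value equal to the lowest edge is left unassigned unless `include_lowest=True` is passed.

```python
import pandas as pd

# Invented anomaly values; the outer bin edges equal the series minimum and maximum
values = pd.Series([-2.5, -2.0, -1.2, -0.3, 0.8])
bins = [-2.5, -2, -1.5, -1, 0, 0.8]
labels = ["Extreme", "Severe", "moderate", "mild", "normal"]

# Default: the first interval is (-2.5, -2], so -2.5 itself becomes NaN
default_cut = pd.cut(values, bins, labels=labels)

# include_lowest=True closes the first interval on the left as well
inclusive_cut = pd.cut(values, bins, labels=labels, include_lowest=True)

print(default_cut.tolist())
print(inclusive_cut.tolist())
print(inclusive_cut.cat.ordered)  # labelled bins are ordered categoricals by default
```

Since the result is an ordered categorical, the classes can be compared and sorted by severity rather than alphabetically.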

Binning and labelling anomaly data according to drought intensity is a relatively subjective exercise: the bin thresholds are a matter of discussion and to some degree arbitrarily assigned. Here we follow the recommendations of the World Meteorological Organization and the definitions of McKee et al. 1993 for standardized SM-based drought indices, where a "moderate" drought starts at one standard deviation below the mean.

In [None]:
drought_labels = np.array(["Extreme", "Severe", "moderate", "mild", "normal"])
zscore_thresholds = [df_wide["zscore"].min(), -2, -1.5, -1, 0, df_wide["zscore"].max()]
spei_thresholds = [df_wide["spei"].min(), -2, -1.5, -1, 0, df_wide["spei"].max()]

Now we can use the labels and thresholds to bin the columns of the drought indicators. We make a copy of the original data to preserve it.

In [None]:
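df_wide_cat = df_wide.copy()
# include_lowest=True keeps the column minimum in the first bin instead of turning it into NaN
df_wide_cat["zscore"] = pd.cut(df_wide.zscore, zscore_thresholds, labels=drought_labels, include_lowest=True)
df_wide_cat["spei"] = pd.cut(df_wide.spei, spei_thresholds, labels=drought_labels, include_lowest=True)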

The simplified, labelled drought indicators now enable a first step towards assessing the spatial (areal) extent of droughts.

To check on our results we will recreate our plot from notebook 1 but now with categorical data.

In [None]:
df_long = df_wide_cat.melt(id_vars=["latitude", "longitude"], ignore_index=False)
df_long

In [None]:
df_long.hvplot.points(
    x="longitude",
    y="latitude",
    groupby=["variable", "time"],
    x_sampling=0.1,
    y_sampling=0.1,
    rasterize=True,
    aggregator=ds.count_cat("value"),
    datashade=True,
    crs=ccrs.PlateCarree(),
    tiles=True,
    frame_width=500,
    clabel="Drought anomaly",
    cmap={
        "Extreme": "#bb0c0c",
        "Severe": "#c57b19",
        "moderate": "#b1bb29",
        "mild": "#1cd87a",
        "normal": "#ffffff",
    },
)

## Spatial Extent

Let's now turn to calculating the spatial trend. For this we can conveniently use the pandas `value_counts` method on the two categorical columns, SPEI and the SSM z-score.
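
As a small illustration of this step, the sketch below applies `groupby(...).value_counts(normalize=True).unstack()` to a toy table with two time steps and four locations. The table and its labels are invented, and plain strings are used instead of categoricals to keep the example short.

```python
import pandas as pd

# Toy table: two time steps, four locations each; labels are made up
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-01", "2020-02-01"]), [1, 2, 3, 4]],
    names=["time", "location_id"],
)
toy = pd.DataFrame(
    {"spei": ["mild", "mild", "Severe", "normal", "Severe", "Severe", "Severe", "mild"]},
    index=idx,
)

# Share of locations per class for every time step; classes become columns
extent = toy.groupby(level=0)["spei"].value_counts(normalize=True).unstack()
print(extent)
```

Each row holds the fraction of locations per drought class at one time step, so the non-missing entries of a row sum to one. Classes that do not occur at a time step show up as `NaN` in this string-based example.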

In [None]:
col_spei = df_wide_cat.groupby(level=0)["spei"].value_counts(normalize=True).unstack()

In [None]:
col_zscore = (
    df_wide_cat.groupby(level=0)["zscore"].value_counts(normalize=True).unstack()
)

We combine these results.

In [None]:
new_keys = pd.Index(["spei", "zscore"], name="indicator")
df_drought_extend = pd.concat(
    [col_spei, col_zscore],
    keys=new_keys,
)
df_drought_extend

In [None]:
mozambique_droughts = [
    {"time": "2007-01-01", "people_affected": 0.52},
    {"time": "2008-01-01", "people_affected": 0.5},
    {"time": "2010-01-01", "people_affected": 0.46},
    {"time": "2016-01-01", "people_affected": 2.30},
    {"time": "2020-01-01", "people_affected": 2.7},
    {"time": "2021-01-01", "people_affected": 1.56},
]

df_droughts = pd.DataFrame(mozambique_droughts).assign(y=1)
# %m is the month directive; %M would parse the middle field as minutes
df_droughts["time"] = pd.to_datetime(df_droughts["time"], format="%Y-%m-%d")
df_droughts.set_index("time", inplace=True)
labels = df_droughts.hvplot.labels(
    x="time",
    y="y",
    text="{people_affected} mill. people",
    text_baseline="bottom_left",
    hover=False,
    angle=85,
    text_font_size="14px",
)
offset = hv.dim("y") - 0.1
points = df_droughts.hvplot.points(
    x="time", y="y", color="black", hover=False, transforms={"y": offset}
)
df_drought_extend.hvplot.area(
    x="time",
    y=drought_labels[::-1][2:],
    groupby="indicator",
    hover=False,
    frame_width=800,
    padding=((0.1, 0.1), (0, 0.9)),
) * labels * points

In [None]:
url = make_url("drought_indices-6_25_monthly.csv")  # noqa
df_drought_indices = pd.read_csv(
    url,
    index_col=["time", "location_id"],
    parse_dates=["time"],
)
df_drought_indices

In [None]:
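def calc_drought_areal_extend(df):
    """Bin all indicator columns into drought classes and return class shares per time step.

    Note: the indicator columns of `df` are overwritten, so pass a copy.
    """
    # make drought categories
    col_names = df.drop(columns=["longitude", "latitude"]).columns
    for name in col_names:
        min_border = df[name].min()
        max_border = df[name].max()
        # fall back to -2.1 and 0.1 so the bin edges stay strictly increasing
        thresholds = np.array(
            [
                min_border if min_border < -2 else -2.1,
                -2,
                -1.5,
                -1,
                0,
                max_border if max_border > 0 else 0.1,
            ]
        )
        # include_lowest=True keeps the column minimum in the first bin
        df[name] = pd.cut(df[name], thresholds, labels=drought_labels, include_lowest=True)

    # calculate relative extent of drought
    new_df = pd.concat(
        [
            df.groupby(level=0)[col].value_counts(normalize=True).unstack()
            for col in col_names
        ],
        keys=pd.Index(col_names, name="indicator"),
    )
    return new_df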


df_drought_extend = calc_drought_areal_extend(df_drought_indices.copy())
df_drought_extend

In [None]:
df_drought_extend.hvplot.area(
    x="time",
    y=drought_labels[::-1][2:],
    groupby="indicator",
    hover=False,
    frame_width=800,
    padding=((0.1, 0.1), (0, 0.9)),
) * labels * points

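In [None]:
# Collapse the five classes into two before cross-tabulating.
# Assumption: "drought" means moderate or worse; mild and normal count as "no-drought".
drought_classes = ["Extreme", "Severe", "moderate"]
def to_binary(col):
    return col.isin(drought_classes).map({True: "drought", False: "no-drought"})
df_valid = df_wide_cat[["spei", "zscore"]].dropna()  # observations with both indicators
df_confusion = pd.crosstab(to_binary(df_valid["spei"]), to_binary(df_valid["zscore"]), dropna=False)
df_confusion

In [None]:
# SPEI (rows) serves as reference: share of SPEI droughts also flagged by the z-score
tot_drought = df_confusion.loc["drought", :].sum()
sensitivity = df_confusion.loc["drought", "drought"] / tot_drought
sensitivity

In [None]:
# Share of SPEI non-droughts that the z-score also classifies as no drought
tot_no_drought = df_confusion.loc["no-drought", :].sum()
specificity = df_confusion.loc["no-drought", "no-drought"] / tot_no_drought
specificity

In [None]:
balanced_accuracy = (sensitivity + specificity) / 2
balanced_accuracy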