# M2.3 - Climate and Drought Indices

**Contents:**

- [Studying drought with climate data](#Studying-drought-with-climate-data)
  - [Quantifying drought](#Quantifying-drought)
- [Organizing our project files](#Organizing-our-project-files)
- [Reading a climate time series](#Reading-a-climate-time-series)
- [Computing seasonal and rolling averages](#Computing-seasonal-and-rolling-averages)
  - [The Split-Apply-Combine workflow](#The-Split-Apply-Combine-workflow)
  - [Rolling averages](#Rolling-averages)
- [A simple bucket model](#A-simple-bucket-model)

## Studying drought with climate data

We may think we have an intuitive understanding of drought, but drought takes many forms (Wilhite and Glantz 1985).

- **Meteorological drought** is a period in which precipitation (rainfall) is less than some expected amount. The expected amount varies from place to place and also depends on the time of year (Palmer 1965).

- **Agricultural drought** occurs when plant water demand, particularly crop water demand, exceeds water supply, whatever the source.

- **Hydrologic drought** describes the effects of dry conditions on surface or sub-surface hydrology; i.e., it can be used to describe low streamflow or low reservoir conditions. Because of the potential time lag between a moisture deficit and a change in hydrology, hydrologic drought is often out of phase with other kinds of drought.

- **Socio-economic drought** is defined in terms of the socio-economic effects of dry conditions: a change in crop prices or in animal feed or forage; or the loss of farm or fishery livelihoods.

To this list, we might add a kind of drought that has been recognized more recently as the technology for monitoring soil moisture has improved: a **soil-moisture drought,** or deficit of soil moisture in particular. We previously introduced **flash drought,** a kind of soil-moisture drought characterized by its quick onset and rapid decrease in soil moisture.

### Quantifying drought

**There are several approaches to quantifying the impacts of drought from climate data.** **Drought indices,** such as the Palmer Drought Severity Index (PDSI, Palmer 1965) and the Standardized Precipitation-Evapotranspiration Index (SPEI, Vicente-Serrano et al. 2010), are commonly used, as they provide a dimensionless measure of the severity of drought that is easy to interpret. **Percentiles or ranking** of hydrological conditions can also be used; for example, it is common to describe snowpack conditions (and "snowpack drought") in terms of the percentage of the median historical snowpack depth, on a given date.

A hydrological or water-balance approach can also be used, though it requires good data on the components of a **basin-scale water budget.** We'll talk more about water budgets later. For now, we can imagine a simple "bucket model:" water enters the environment as **precipitation** and leaves the environment as **evapotranspiration (ET)** (the sum of evaporation from wet surfaces and transpiration from plants). Mathematically, we might represent the bucket model as:
$$
\text{Available water} = \text{Precipitation} - \text{ET}
$$

We'll use **potential evapotranspiration (PET)** as our measure of ET, as it represents the amount of water that would be evaporated (and transpired) given the amount of energy (primarily heat and solar radiation) that is available to vaporize water. One way to define drought, consistent with the Meteorological, Agricultural, and Hydrologic drought definitions, is as a period of time during which precipitation is much less than the amount of water that could be lost as ET.
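As a rough illustration of the bucket model, the sketch below subtracts PET from precipitation for a few months and flags the months with a deficit. The numbers are made up for illustration and are not taken from any dataset; the variable names are also hypothetical.

```python
import numpy as np

# Hypothetical monthly totals, in millimeters (not real data)
precip = np.array([60.0, 45.0, 30.0, 10.0])
pet = np.array([40.0, 55.0, 90.0, 130.0])

# Bucket model: Available water = Precipitation - PET
available_water = precip - pet

# Months in which more water could be lost as ET than was supplied
deficit_months = available_water < 0

print(available_water)
print(deficit_months)
```

In this made-up example, only the first month has a surplus; the remaining three months show a growing deficit.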

---

## Organizing our project files

Today we'll be working with a dataset called [**Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS).**](https://www.chc.ucsb.edu/data/chirps) CHIRPS combines remote sensing data with global weather station datasets to produce a quasi-global (50°S to 50°N) gridded record of precipitation. CHIRPS is not produced by NASA, but it is one of the better global records of precipitation available.

CHIRPS data can be downloaded from a variety of sources, but there isn't an interface like EarthData Search's cloud access. Instead, individual data files can be [downloaded manually from a server](https://data.chc.ucsb.edu/products/CHIRPS-2.0/) and [this README](https://data.chc.ucsb.edu/products/CHIRPS-2.0/README-CHIRPS.txt) explains how to download the files.

#### &#x1F6A9; <span style="color:red">Pay Attention</span>

**Because we're still learning, instead of downloading all those files manually, we'll use a prepared dataset:** 

&#x1F449; **Click to download:** [`CHIRPS-v2_Africa_monthly_2014-2024.nc`](http://files.ntsg.umt.edu/data/ScienceCore/CHIRPS-v2_Africa_monthly_2014-2024.nc)

This dataset was produced by merging together [the individual, monthly CHIRPS files for Africa](https://data.chc.ucsb.edu/products/CHIRPS-2.0/africa_monthly/tifs/) from January 2014 through May 2024. You can view [the script that was used to merge the files at this link.](https://github.com/OpenClimateScience/M2-Computational-Climate-Science/blob/main/scripts/20240611_process_CHIRPS_monthly_into_stack.py) This roughly 10-year record is shorter than we would typically like to use to infer climate variability, but we're trying to keep the dataset small.

#### &#x1F3AF; Best Practice

**We're starting a new analysis. Let's take a moment to organize our project's file system.** Take a look at the example file tree, below, and use it as inspiration to organize your file system.

![](assets/M2_file_tree_CHIRPS.png)

**As we start to gain more sophistication with writing scientific Python code, one of our goals should be to write code that also serves as documentation of our workflow.** A key challenge for open, reproducible science is linking scientific results to the computer code they were created from. 

One way to link scientific results to our Python script(s) is to use a **consistent naming scheme with a unique identifier that groups files together.** One approach to this is to use the current date, in `YYYYMMDD` (Year, Month, Day) format. This 8-digit number will always be unique, because every day is a new day. If you use today's 8-digit date in the filename of your Python script and any output file(s) it generates, you'll have a way of associating those files together.
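One possible way to apply this naming scheme from within Python is sketched below. The script and figure names are hypothetical; only the `YYYYMMDD` prefix matters.

```python
from datetime import date

# Today's date as an 8-digit string, e.g., "20240610"
stamp = date.today().strftime('%Y%m%d')

# Hypothetical file names that share the same date prefix
script_name = f'{stamp}_analyze_CHIRPS_precip.py'
figure_name = f'results/{stamp}_Tiaret_precip_anomaly.png'

print(script_name)
print(figure_name)
```

Note that a script run on a later day would generate a different prefix; if the outputs should match the date in the script's own filename, the date can be typed in by hand instead.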

---

## Reading a climate time series

**To gain more experience using `xarray` to analyze climate datasets, we'll study a 2024 drought in Tiaret, Algeria.** [The Tiaret months-long drought led to violent riots in the region.](https://apnews.com/article/algeria-drought-rain-tebboune-tiaret-riots-09ce23f4ba235aaf1e3afecc7bfe3574) Can we determine how extensive and severe the Tiaret drought is using CHIRPS data?

As before, we'll use `xr.open_mfdataset()` ("open multiple-file dataset") to open this dataset. Even though there is only one file, `xr.open_mfdataset()` allows us to access some useful features in the `dask` library.

For instance, note that the output associated with the `"precip"` DataArray includes information about the total size of the array (1.12 GiB, or gibibytes, which is about 1.2 gigabytes).

In [None]:
import xarray as xr
import numpy as np

ds = xr.open_mfdataset('data_raw/CHIRPS/CHIRPS-v2_Africa_monthly_2014-2024.nc')
ds['precip']

Those 1.12 GiB haven't been allocated in memory yet; rather, the variable `ds` points to a *representation* of the dataset that is stored on the hard disk. This is another example of **lazy evaluation,** which sounds bad but is actually a good thing: `xarray` won't read the data into memory until we're actually ready to perform some kind of computation. And because we used `xr.open_mfdataset()`, the data are backed by `dask` arrays, which can be processed in smaller pieces (chunks); depending on how the data are chunked, if the entire dataset is larger than our computer's memory, `dask` can load and process one or more pieces at a time instead of the whole array at once.

**The next thing we should do is replace the NoData values in our dataset.** The `xr.where()` function works just like `np.where()`; we provide a conditional expression on an `xarray.DataArray` as the first argument, then the value to return if it is True, and then the value to return if it is False.

In [None]:
ds['precip'] = xr.where(ds['precip'] == -9999, np.nan, ds['precip'])
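To see why this step matters, here is a small sketch that uses `np.where()` on a made-up array; the values are hypothetical, and only the `-9999` NoData flag comes from the cell above.

```python
import numpy as np

# Hypothetical precipitation values (mm); -9999 marks a missing value
raw = np.array([12.5, -9999.0, 0.0, 48.0])

# Same argument order as xr.where(): condition, value if True, value if False
clean = np.where(raw == -9999, np.nan, raw)

# Without masking, the NoData flag is treated as a real (very negative) value
print(raw.mean())
# With masking, NaN-aware functions simply skip the missing value
print(np.nanmean(clean))
```

Note that a true zero (no precipitation) is kept as `0.0`; only the NoData flag is replaced.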

We're now ready to plot some data. We can select an arbitrary date using the `sel()` method, specifying we want to subset the `time` dimension. Any dimension with `datetime64[ns]` coordinates can be subset using a timestamp formatted according to [the ISO 8601 standard](https://en.wikipedia.org/wiki/ISO_8601), e.g., `"2023-07-01"`.

In [None]:
ds['precip'].sel(time = '2023-07-01').plot()

#### &#x1F3C1; Challenge: Calculate mean annual precipitation

Our dataset has monthly precipitation over a ten-year period. Make a plot of mean annual precipitation. **You should do this in two steps:**

- Use the `resample()` method, followed by `sum()` to calculate the *total precipitation* in each year. [Remember that you can consult this table to find what resampling frequencies are available.](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects)
- Then, calculate the mean annual precipitation.

Expand the cell below to see my solution.

In [None]:
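# First, add up monthly precipitation (over 12 months) in each year;
#   we keep only complete years, because the 2024 record ends in May
annual_precip = ds['precip'].sel(time = slice('2014', '2023')).resample(time = 'YS').sum()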
# Then, calculate the average (mean) annual precipitation
mean_annual_precip = annual_precip.mean('time')
mean_annual_precip.plot()
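The order of the two steps matters. The sketch below uses a made-up `pandas` time series (10 mm in every month, an assumption for illustration only) to show that summing within each year and then averaging across years gives an annual quantity, whereas averaging the monthly values directly does not.

```python
import pandas as pd

# Hypothetical record: two years of monthly data, 10 mm in every month
time = pd.date_range('2014-01-01', periods = 24, freq = 'MS')
monthly = pd.Series(10.0, index = time)

# Step 1: total precipitation in each year
annual_total = monthly.resample('YS').sum()
# Step 2: mean of the annual totals
mean_annual = annual_total.mean()

print(mean_annual)     # Mean annual precipitation (mm per year)
print(monthly.mean())  # Mean monthly precipitation (mm per month)
```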

---

## Computing seasonal and rolling averages

In addition to resampling our data to annual intervals, we could resample the data so that it reflects seasonal totals. To do so, we use the `"QS"` (Start of Quarter) frequency. Remember that `resample()` needs to be followed by some kind of aggregation function; we use `sum()` because we want to add up the monthly precipitation in each season.

In [None]:
seasonal_precip = ds['precip'].resample(time = 'QS').sum()
seasonal_precip

&#x1F449; **Note that quarters are represented by the starting month;** that is, Quarter 1 (Q1) begins in January, Quarter 2 (Q2) begins in April, and so on.

In [None]:
seasonal_precip.coords['time']
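The same labeling can be checked on a tiny example. This sketch uses `pandas` (whose resampling frequencies `xarray` borrows) and a made-up series with 1 mm in every month, so each quarterly total also tells us how many months fell in that quarter.

```python
import pandas as pd

# Hypothetical record: one year of monthly data, 1 mm in every month
time = pd.date_range('2023-01-01', periods = 12, freq = 'MS')
monthly = pd.Series(1.0, index = time)

quarterly = monthly.resample('QS').sum()

# Each quarter is labeled by its first month
print(list(quarterly.index.month))
# Each quarter contains three months
print(list(quarterly.values))
```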

### The Split-Apply-Combine workflow

We just used the `resample()` method, above, to aggregate our monthly data to seasonal (or quarterly) data. Another way of aggregating our data would be to compute the overall seasonal mean; for example, what is the mean seasonal precipitation in Q1 (January through March)?

To answer this question, we can take our seasonal data and *group the data by season, then calculate the mean value in each season.* Recall that an `xarray.DataArray` has a `groupby()` method for this purpose.

#### &#x1F6A9; <span style="color:red">Pay Attention</span>

**Below, we group by `"time.month"` even though we don't have monthly data; we have seasonal data (see the previous step).** The `time` coordinates of our seasonal data still have a `month` component and there are only four (4) unique months in our seasonal data: January, April, July, and October, representing the start of each Quarter (or season). Therefore, when we group the data by month, we are actually grouping the data by season.

In [None]:
mean_seasonal_precip = seasonal_precip.groupby('time.month').mean()
mean_seasonal_precip

**When we use `groupby()` and an aggregation function like `mean()`, we are using the Split-Apply-Combine workflow:** 

1. We *split the data* into different groups. In this example, the groups are the unique months (seasons).
2. We then *apply* a function to each group. In this example, we applied the `mean()` function to each group.
3. Finally, we *combine* the result, for each group, together. This last step is done automatically and results in a single data cube for all the groups. In this example, we end up with a data cube that has a new `month` dimension with four (4) elements, one for each (monthly or seasonal) group.
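The three steps can also be written out by hand. The sketch below does so with `pandas` on made-up quarterly totals (the values and variable names are hypothetical), then compares the result to the one-line `groupby()` shortcut.

```python
import pandas as pd

# Hypothetical quarterly totals (mm) for two years, labeled by quarter start
time = pd.date_range('2022-01-01', periods = 8, freq = 'QS')
seasonal = pd.Series(
    [90.0, 40.0, 5.0, 60.0, 110.0, 20.0, 15.0, 80.0], index = time)

# 1. Split: one group per unique starting month (1, 4, 7, 10)
groups = {
    month: group
    for month, group in seasonal.groupby(seasonal.index.month)
}

# 2. Apply: calculate the mean of each group
means = {month: group.mean() for month, group in groups.items()}

# 3. Combine: put the per-group results back into one object
combined = pd.Series(means)
combined.index.name = 'month'

# The shortcut performs all three steps at once
shortcut = seasonal.groupby(seasonal.index.month).mean()
print(combined)
```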

**We can now plot the mean seasonal precipitation in Q1 (January through March).**

In [None]:
mean_seasonal_precip.sel(month = 1).plot()

### Rolling averages

Now let's focus on the Tiaret region. Again, we can use the `sel()` method and the built-in `slice()` function to create a spatial subset around Tiaret, from approximately 0.8 to 1.8 degrees east longitude and from 35.1 to 36.1 degrees north latitude.

#### &#x1F6A9; <span style="color:red">Pay Attention</span>

**Note that, below, we give the latitude or `y` coordinates in descending order: `slice(36.1, 35.1)`.** This is because the starting and ending numbers in a `slice()` must always be in the same order as the data's coordinates, and this dataset's coordinates go from north to south latitudes, or from +90 to -90 degrees latitude.

Let's also calculate the annual precipitation in the Tiaret region, in each year, using `resample()` and `sum()`, as before.

In [None]:
# Note the slice() order for the y coordinate must go from +90 to -90
ds_tiaret = ds.sel(x = slice(0.8, 1.8), y = slice(36.1, 35.1))

tiaret_precip = ds_tiaret['precip'].resample(time = 'YS').sum()
tiaret_precip
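To see what happens when the `slice()` bounds are given in the wrong order, consider this small sketch. It builds a made-up, one-dimensional `xarray.DataArray` with descending latitude coordinates; the coordinate values are hypothetical and much coarser than the CHIRPS grid.

```python
import numpy as np
import xarray as xr

# Hypothetical latitudes stored from north to south (descending order)
lat = np.array([37.0, 36.5, 36.0, 35.5, 35.0, 34.5])
da = xr.DataArray(np.arange(6.0), coords = {'y': lat}, dims = 'y')

# Bounds in the same (descending) order as the coordinates
north_to_south = da.sel(y = slice(36.1, 35.1))
# Bounds in ascending order: nothing is selected, and no error is raised
south_to_north = da.sel(y = slice(35.1, 36.1))

print(north_to_south['y'].values)
print(south_to_north.sizes['y'])
```

Because the wrong order fails silently with an empty result, it is worth checking the size of a subset before using it.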

We can then compute the **annual precipitation anomaly** by subtracting the overall mean annual precipitation. 

To show a line plot of the time series, we must also average over the Tiaret region, the 20-by-20 pixel region we selected using `slice()`, above.

In [None]:
tiaret_anomaly = tiaret_precip - tiaret_precip.mean('time')

# Average over the 20-by-20 pixel region around Tiaret
tiaret_anomaly.mean(['x', 'y']).plot()

2024 looks very negative in the plot above, but let's not forget that our dataset only includes the first five months of 2024. Since we're comparing annual precipitation in each year, it's not appropriate to use partial data from 2024. **Let's instead look at monthly precipitation anomalies.**

Again, we use our **Split-Apply-Combine** workflow to split the data into (monthly) groups, then apply a function to each group, using the `map()` method, to calculate the anomaly. (`map()` is the current name for this method; `apply()` is kept in `xarray` for backward compatibility.) Note that the function we applied is a `lambda` function; it operates on each group, here called `x` (not to be confused with the `x` or longitude dimension), subtracting the group's mean over time, `x.mean('time')`, from every (monthly) time step. Because we grouped by `"time.month"`, our precipitation anomalies are relative to the corresponding calendar month. For example, the `2014-01-01` anomaly is relative to the overall January mean.

In [None]:
precip_anomaly = ds_tiaret['precip'].groupby('time.month').map(lambda x: x - x.mean('time'))
precip_anomaly
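To check our understanding of what a monthly anomaly is, here is a sketch on made-up data in which the second year is exactly 2 mm wetter than the first in every month. It uses a different but equivalent idiom: subtracting the monthly means directly from the grouped data. The values and variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical record: year 1 is 10, 20, ..., 120 mm; year 2 is 2 mm wetter
time = pd.date_range('2020-01-01', periods = 24, freq = 'MS')
year1 = np.arange(1, 13) * 10.0
values = np.concatenate([year1, year1 + 2.0])
da = xr.DataArray(values, coords = {'time': time}, dims = 'time')

# Mean for each calendar month (12 values)
monthly_mean = da.groupby('time.month').mean()

# Subtract each calendar month's mean from the matching time steps
anomaly = da.groupby('time.month') - monthly_mean

print(anomaly.values)
```

Even though the raw values have a strong seasonal cycle, the anomalies do not: every month in the first year is 1 mm below its calendar-month mean and every month in the second year is 1 mm above it.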

We can then plot the monthly precipitation anomalies.

In [None]:
# Again, we average over the spatial domain of our Tiaret subset
precip_anomaly.mean(['x', 'y']).plot()

There's a lot of variation in this plot that makes it hard to see any patterns in the data. **We can apply a rolling average to the data by using the `rolling()` method of an `xarray.DataArray`.**

In [None]:
precip_anomaly.rolling(time = 6).mean().mean(['x', 'y']).plot()

**This has the effect of smoothing the data, suggesting that the Tiaret drought is partly the result of a multi-year drying trend.**
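It can help to see what `rolling()` does on a very short, made-up series. The sketch below uses a window of 3 time steps instead of 6, purely to keep the example small.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical series: the values 1 through 8, one per month
time = pd.date_range('2023-01-01', periods = 8, freq = 'MS')
da = xr.DataArray(np.arange(1.0, 9.0), coords = {'time': time}, dims = 'time')

# Each output value is the mean of the current and two preceding time steps
smoothed = da.rolling(time = 3).mean()

print(smoothed.values)
```

By default, the first two values are NaN because a full window of three time steps is not yet available; likewise, the first five months of a 6-month rolling average are missing.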

---

## A simple bucket model

While the precipitation anomalies we computed, above, give us some information about change in water supply (precipitation), they don't give us any information about change in water demand or losses, such as the result of a warmer or drier atmosphere. Recall that a better measure of hydrologic drought impacts is the *difference* between water supply and water demand, as reflected by precipitation minus potential ET (PET):
$$
\text{Available water} \approx \text{Precipitation} - \text{PET}
$$

Here, we'll consider a slight variation on this hydrologic drought index, **the precipitation-to-PET ratio,** which can be interpreted as the proportion (or percentage) of water demand (PET) that is replenished by precipitation:
$$
\text{Percentage replenished} \approx 100\times \frac{\text{Precipitation}}{\text{PET}}
$$
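As a quick worked example of this ratio, the sketch below uses made-up monthly totals (not real data) chosen so that the arithmetic is easy to check by hand.

```python
import numpy as np

# Hypothetical monthly totals, in millimeters (not real data)
precip = np.array([60.0, 45.0, 30.0, 10.0])
pet = np.array([40.0, 50.0, 100.0, 125.0])

# Percentage of water demand (PET) that is replenished by precipitation
pct_replenished = 100 * precip / pet

print(pct_replenished)
```

Here, the first month supplies 150% of demand (a surplus), while the last month supplies only 8%.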

Although there are [many different ways to calculate PET (Pimentel et al. 2023)](https://doi.org/10.1029/2022WR033447), we'll use [the Hargreaves method](https://www.fao.org/4/X0490E/x0490e07.htm#minimum%20data%20requirements) (Allen et al. 2000), which requires only temperature data.

The temperature data we'll use come from [TerraClimate (Abatzoglou et al. 2018)](https://climatedataguide.ucar.edu/climate-data/terraclimate-global-high-resolution-gridded-temperature-precipitation-and-other-water), a global, gridded temperature dataset. There are many ways to download TerraClimate data; because of its high resolution, file sizes can be quite large, so we'll use a file created from [this convenient, online subsetting tool](https://climate.northwestknowledge.net/NWTOOLBOX/formattedDownloads.php).

&#x1F449; **Click to download:** [`terraclimate_35.3709N_-1.3218W.csv`](http://files.ntsg.umt.edu/data/ScienceCore/terraclimate_35.3709N_-1.3218W.csv)

This file contains climate time-series data for the Tiaret region, from 1958 to the present.

In [None]:
import pandas as pd

pet = pd.read_csv('data_raw/terraclimate_35.3709N_-1.3218W.csv', skiprows = 11)

# Let's look at just the data since 2014
pet = pet[pet['Year'] >= 2014]
pet

The TerraClimate dataset includes PET data, calculated as "reference ET" according to the FAO Penman-Monteith approach. In the next lesson, we'll calculate PET ourselves using the Hargreaves method. 

But, for now, let's combine the PET estimate from TerraClimate with the CHIRPS precipitation data to compute a simple hydrologic index of drought.

In [None]:
ds_tiaret['precip']

We select the first 120 months of the CHIRPS precipitation, compute the average over the Tiaret region, and divide that by TerraClimate PET.

In [None]:
from matplotlib import pyplot

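# NOTE: The CHIRPS data extend monthly through May 2024, but the PET data
#    do not, so we have to subset the CHIRPS data to the first 120 months
precip_tiaret = ds_tiaret['precip'].isel(time = slice(0, 120)).mean(['x','y'])
# Convert the PET column to a NumPy array so that the two series are matched
#    by position (month-by-month) rather than by the DataFrame's row labels
precip_pet_ratio = 100 * precip_tiaret / pet['pet(mm)'].to_numpy()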
precip_pet_ratio.plot()
pyplot.ylabel('Precipitation-to-PET Ratio (%)')
pyplot.title('Precipitation-to-PET Ratio for Tiaret, Algeria')
pyplot.savefig('results/20240610_Tiaret_precip-to-PET_ratio.png', dpi = 172)

In the plot above, values greater than 100% indicate months where precipitation supply exceeds demand. We can infer that the excess precipitation restores soil water supply, buffering against water loss in the dry season.

Here, we again see that the recent Tiaret drought is preceded by multiple years of inadequate wet-season precipitation. While most wet seasons supply more than 100% of the water demanded by PET, recent wet seasons have supplied less precipitation than might have been lost through PET.

---

### References

Abatzoglou, J. T., S. Z. Dobrowski, S. A. Parks, and K. C. Hegewisch. 2018. TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015. *Scientific Data* 5 (1):170191.

Allen, R. G., L. S. Pereira, D. S. Raes, and M. Smith, eds. 2000. [Crop Evapotranspiration: Guidelines for Computing Crop Water Requirements.](https://www.fao.org/4/X0490E/x0490e00.htm) repr. Rome: Food and Agriculture Organization of the United Nations.

Ault, T. R. 2020. On the essentials of drought in a changing climate. *Science* 368 (6488):256–260.

Palmer, W. C. 1965. Meteorological drought. Vol. 30. US Department of Commerce, Weather Bureau.

Wilhite, D. A., and M. H. Glantz. 1985. Understanding the drought phenomenon: The role of definitions. *Water International* 10 (3):111–120.

Vicente-Serrano, S. M., S. Beguería, and J. I. López-Moreno. 2010. A multiscalar drought index sensitive to global warming: The Standardized Precipitation Evapotranspiration Index. *Journal of Climate* 23 (7):1696–1718.

Wu, I-Pai. 1997. A simple evapotranspiration model for Hawaii: The Hargreaves model. Cooperative Extension Service, College of Tropical Agriculture and Human Resources, University of Hawaii at Manoa. https://www.ctahr.hawaii.edu/oc/freepubs/pdf/EN-106.pdf.