# **Regional precipitation variability and extreme events**

---

**Content creators:** Laura Paccini, Raphael Rocha

**Content reviewers:** Marguerite Brown, Ohad Zivan

**Content editors:** Zane Mitrevica, Natalie Steinemann

**Production editors:** TBD

**Our 2023 Sponsors:** TBD

In [None]:
# @title #**Project background** 
#This will be a short video introducing the content creator(s) and motivating the research direction of the template.
#The Tech team will add code to format and display the video

In this project, you will explore rain gauge and satellite data from CHIRPS and MODIS to extract rainfall estimates and vegetation indices derived from land surface reflectance, respectively. These data will enable you to identify extreme events in your region of interest. Besides investigating the relationships between these variables, you are encouraged to study the impact of extreme events on changes in vegetation.

# **Project template**
<p align='center'><a href="https://github.com/ClimatematchAcademy/course-content/blob/main/projects/template-images/precipitation_template_map.svg"><img src="https://github.com/ClimatematchAcademy/course-content/blob/main/projects/template-images/precipitation_template_map.svg?raw=True" alt="Regional precipitation variability and extreme events" vw="100" vh="75" /></a></p>

# **Data exploration notebook**
## **Project setup**


Please run the following cells!
    



In [1]:
# Imports

#Import only the libraries/objects that are necessary for more than one dataset. 
#Dataset-specific imports should be in the respective notebook section.

#If any external library has to be installed, !pip install library --quiet
#follow this order: numpy>matplotlib. 
#import widgets in hidden Figure settings cell


import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Functions

#Only functions that apply to more than one data source (MODIS, CMIP, ERA5, ...) should be here.
#Functions that apply to more than one datafile from a dataset (MODIS: land cover type and MODIS: NPP) 
#should be in the respective section of the notebook.

In [3]:
# Seasonal mean for each year
def seasonal_mean_by_year(ds, last_month, rolling_months=3):
    """
    Calculate the seasonal mean for each year by resampling the input dataset
    to monthly means and then applying a rolling mean over the season length.
    The result is filtered to keep only the last month of each season, whose
    rolling value is the seasonal mean.

    Parameters
    ----------
    ds : xarray.Dataset
        The input dataset containing a time dimension.
    last_month : int
        The last month of the seasonal group. For example, if the season is DJF
        (December-January-February), then last_month=2 (for February).
    rolling_months : int, optional
        The number of months for the rolling mean window, by default 3.

    Returns
    -------
    xarray.Dataset
        The seasonal mean dataset.
    """
    # Resample data to monthly means ('1M' is month-end; newer pandas prefers 'ME')
    ds_ = ds.resample(time='1M').mean()

    # Rolling mean over `rolling_months` consecutive months; incomplete
    # windows at the start of the record are NaN
    ds_ = ds_.rolling(time=rolling_months).mean()

    # Keep the last month of each season, which holds the seasonal mean
    return ds_.sel(time=ds_.time.dt.month == last_month)
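For the four standard meteorological seasons (DJF, MAM, JJA, SON), xarray can also resample daily data directly to quarters anchored in December (`'QS-DEC'`). The sketch below shows this alternative on a hypothetical synthetic series. Two differences from `seasonal_mean_by_year` are worth keeping in mind. First, `resample` averages all days of the season, while `seasonal_mean_by_year` averages the monthly means, so months of different length get equal weight. Second, `resample` keeps incomplete seasons at the start and end of the record, which you may want to drop.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series covering two full years (values are day indices)
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
daily = xr.DataArray(np.arange(time.size, dtype=float),
                     coords={"time": time}, dims="time")

# Quarters anchored in December: labels are the first day of DJF, MAM, JJA, SON
seasonal = daily.resample(time="QS-DEC").mean()

# Keep only the DJF seasons (quarters that start in December)
djf = seasonal.sel(time=seasonal.time.dt.month == 12)

# DJF 2001/02 is the mean over 1 Dec 2001 to 28 Feb 2002
print(djf.sel(time=pd.Timestamp("2001-12-01")).item())  # 378.5
```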

In [4]:
# @title Helper functions


##### Comments on the main function
# seasonal_mean_by_year serves mainly for Q2 and Q3.
# To calculate the seasonal mean for a custom season (e.g. ONDJFM, which ends
# in March and spans 6 months), call the function as follows:
# data_ONDJFM = seasonal_mean_by_year(data, 3, rolling_months=6)

#### helper function: 
def standardize_data(da, dim):
    """
    Performs standardization on a DataArray along a specified dimension.
    
    Parameters:
        - da (xarray.DataArray): Input DataArray to be standardized.
        - dim (str): Dimension along which to perform standardization.
        
    Returns:
        - xarray.DataArray: Standardized DataArray.
    """
    # Calculate mean and standard deviation along the specified dimension
    mean = da.mean(dim=dim)
    std = da.std(dim=dim)
    
    # Perform standardization
    standardized_da = (da - mean) / std
    
    return standardized_da
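`standardize_data` uses a single mean and standard deviation along the chosen dimension. For monthly data with a strong seasonal cycle, a common variant is to standardize each calendar month separately, so that each month is only compared with the same month in other years. The helper below is a hypothetical sketch of this using `groupby("time.month")`, demonstrated on synthetic data.

```python
import numpy as np
import pandas as pd
import xarray as xr

def standardize_by_month(da):
    """Standardize each calendar month separately (hypothetical helper).

    Each month is compared only with the same month in other years, so the
    seasonal cycle does not dominate the anomalies.
    """
    return da.groupby("time.month").map(
        lambda x: (x - x.mean("time")) / x.std("time")
    )

# Hypothetical monthly precipitation: seasonal cycle plus random noise
rng = np.random.default_rng(42)
time = pd.date_range("2001-01-01", periods=240, freq="MS")
months = np.asarray(time.month)
cycle = 100 + 80 * np.sin(2 * np.pi * (months - 1) / 12)
precip = xr.DataArray(cycle + rng.normal(0, 10, time.size),
                      coords={"time": time}, dims="time")

# Each calendar month of `anomalies` has mean 0 and standard deviation 1
anomalies = standardize_by_month(precip)
```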

## **CHIRPS Version 2.0 Global Daily 0.25°**

The Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a high-resolution precipitation dataset developed by the Climate Hazards Group at the University of California, Santa Barbara. It combines satellite-derived precipitation estimates with ground-based station data to provide gridded precipitation data at a quasi-global scale between 50°S-50°N. 

Read more about CHIRPS here:

* [The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes](https://www.nature.com/articles/sdata201566)

* [Climate Hazards Group (CHG) Wiki: CHIRPS FAQ](https://wiki.chc.ucsb.edu/CHIRPS_FAQ)

### **Indices for extreme events**
The Expert Team on Climate Change Detection and Indices ([ETCCDI](http://etccdi.pacificclimate.org/list_27_indices.shtml)) has defined a set of indices that capture different aspects of extreme events, such as their duration or intensity. The following functions show how to compute one index from each of these categories. You can modify them to suit your specific needs or write your own custom functions. Here are some tips:

- Most of the indices require daily data. To select a specific season, subset the daily data with xarray, for example:

  `daily_precip_DJF = chirps_data.sel(time=chirps_data['time.season'] == 'DJF')`

- A common threshold for a wet day is precipitation greater than or equal to 1 mm/day, while a dry (or non-precipitating) day has precipitation less than 1 mm/day.
- Some of the indices are based on percentiles. You can define a base-period climatology to calculate percentile thresholds (such as the 5th, 10th, 90th, and 95th percentiles) and use them to identify extreme events.
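As an example of a percentile-based index, the ETCCDI R95p index is the annual total precipitation on days when precipitation exceeds the 95th percentile of wet-day precipitation in a base period. The ETCCDI base period is 1961-1990, but CHIRPS starts in 1981, so a different base period is needed. The function name, the 1991-2000 base period and the synthetic gamma-distributed data below are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

def calculate_r95p_index(data, base_period=("1991", "2000"), wet_threshold=1.0):
    """
    Annual total precipitation on very wet days (a sketch of ETCCDI R95p).

    A day is "very wet" when its precipitation exceeds the 95th percentile
    of wet-day precipitation (>= wet_threshold) in the base period.
    """
    base = data.sel(time=slice(*base_period))
    # Threshold computed from wet days only; dry days become NaN and are skipped
    p95 = base.where(base >= wet_threshold).quantile(0.95, dim="time").drop_vars("quantile")
    # Keep only days above the threshold and sum them for each year
    very_wet = data.where(data > p95)
    return very_wet.groupby("time.year").sum(dim="time")

# Hypothetical daily precipitation (mm/day): skewed, with many dry days
rng = np.random.default_rng(0)
time = pd.date_range("1991-01-01", "2000-12-31", freq="D")
precip = xr.DataArray(rng.gamma(0.5, 6.0, time.size),
                      coords={"time": time}, dims="time")

r95p = calculate_r95p_index(precip)
```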



In [5]:
def calculate_sdii_index(data):
    """
    This function calculates the Simple Daily Intensity Index (SDII), which
    represents the average amount of precipitation on wet days (days with
    precipitation greater than or equal to 1mm) for each year in the input data.
    The input data should be a Dataset with time coordinates, and the function
    returns a Dataset with the SDII index values for each year in the data.
    ----------
    - data (xarray.Dataset): Input dataset containing daily precipitation data.
    - period (str, optional): Period for which to calculate the SDII index. 
      
    Returns:
    -------
        - xarray.Dataset: Dataset containing the SDII index for the given period.
    """
    # Calculate daily precipitation amount on wet days (PR >= 1mm)
    wet_days = data.where(data >= 1)
    
    # Group by year and calculate the sum precipitation on wet days
    sum_wet_days_grouped = wet_days.groupby('time.year').sum(dim='time')

    # Count number of wet days for each time step
    w = wet_days.groupby('time.year').count(dim='time')

    # Divide by the number of wet days to get SDII index
    sdii = sum_wet_days_grouped/w
    
    return sdii
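Two other ETCCDI indices in the same spirit are Rx5day (the maximum 5-day precipitation total, an intensity index) and R10mm (the number of heavy precipitation days with at least 10 mm). The sketch below computes annual versions of both; the function names and the synthetic data are hypothetical, and 5-day windows are assigned to the year of their last day.

```python
import numpy as np
import pandas as pd
import xarray as xr

def calculate_rx5day_index(data):
    """Maximum 5-day precipitation total in each year (a sketch of ETCCDI Rx5day)."""
    # 5-day running totals, labelled at the last day of each window, so a window
    # that crosses 1 January counts towards the new year
    five_day_totals = data.rolling(time=5).sum()
    return five_day_totals.groupby("time.year").max(dim="time")

def calculate_r10mm_index(data):
    """Number of heavy precipitation days (>= 10 mm) in each year (a sketch of ETCCDI R10mm)."""
    return (data >= 10).groupby("time.year").sum(dim="time")

# Hypothetical data: no rain except a 5-day spell of 12 mm/day in July 2001
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
values = np.zeros(time.size)
values[(time >= "2001-07-01") & (time <= "2001-07-05")] = 12.0
precip = xr.DataArray(values, coords={"time": time}, dims="time")

rx5day = calculate_rx5day_index(precip)  # 0 mm in 2000, 60 mm in 2001
r10mm = calculate_r10mm_index(precip)    # 0 days in 2000, 5 days in 2001
```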

In [6]:
def calculate_cdd_index(data):
    """
    Calculate the Consecutive Dry Days (CDD) index, i.e. the longest run of
    consecutive days with precipitation less than 1 mm, for each year in the
    input data.

    Parameters
    ----------
    data : xarray.DataArray
        A 1-D daily precipitation time series with a time coordinate
        (e.g. chirps_data.precip at a single grid point or averaged over a region).

    Returns
    -------
    cdd : xarray.DataArray
        The CDD index (in days) with a "year" dimension. Dry spells that cross
        a year boundary are split between the two years.
    """
    # Boolean array: True on dry days (PR < 1 mm)
    dry_days = data < 1
    # Unique years as a list
    unique_years = list(data.groupby("time.year").groups.keys())
    # Initialize the CDD array
    cdd = np.zeros(len(unique_years))
    # Iterate over each year, then over each day of that year
    for i, year in enumerate(unique_years):
        longest = 0
        current_count = 0
        for day in dry_days.sel(time=data["time.year"] == year).values:
            if day:
                # Extend the current dry spell and track the longest one
                current_count += 1
                longest = max(longest, current_count)
            else:
                # A wet day ends the current dry spell
                current_count = 0
        # CDD is the largest number of consecutive dry days (0 if there are none)
        cdd[i] = longest
    # Wrap the result in a DataArray indexed by year
    cdd_da = xr.DataArray(cdd, coords={"year": unique_years}, dims="year")
    return cdd_da
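`calculate_cdd_index` loops over every day of a single time series, which becomes very slow when repeated for every grid point. A possible vectorized variant, sketched below with hypothetical helper names, still loops over days but updates all grid points at once with NumPy, and uses `xr.apply_ufunc` to apply this per year. It assumes the data fit in memory; for the dask-backed CHIRPS data, subset your region and call `.load()` first.

```python
import numpy as np
import pandas as pd
import xarray as xr

def longest_dry_spell(dry, axis=-1):
    """Length of the longest run of True values along `axis` of a boolean array."""
    dry = np.moveaxis(np.asarray(dry, dtype=bool), axis, 0)
    current = np.zeros(dry.shape[1:], dtype=int)  # length of the ongoing dry spell
    longest = np.zeros(dry.shape[1:], dtype=int)  # longest dry spell so far
    for day in dry:
        # Extend the spell on dry days and reset it to zero on wet days
        current = np.where(day, current + 1, 0)
        longest = np.maximum(longest, current)
    return longest

def calculate_cdd_gridded(data, threshold=1.0):
    """CDD for every grid point and year of an in-memory DataArray (hypothetical helper)."""
    dry_days = data < threshold
    # apply_ufunc moves the core dimension "time" to the last axis
    return dry_days.groupby("time.year").map(
        lambda one_year: xr.apply_ufunc(longest_dry_spell, one_year,
                                        input_core_dims=[["time"]])
    )

# Hypothetical data at two points: point 0 is wet except for a 10-day dry
# spell in March 2001; point 1 is dry every day
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
values = np.full((time.size, 2), 5.0)
values[:, 1] = 0.0
spell = (time >= "2001-03-01") & (time <= "2001-03-10")
values[spell, 0] = 0.0
precip = xr.DataArray(values, coords={"time": time, "point": [0, 1]},
                      dims=("time", "point"))

cdd = calculate_cdd_gridded(precip)  # dims: ("year", "point")
```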

In [7]:
# Code to retrieve and load the data

#### This works after mounting a personal Drive directory and should be changed (Laura)
#files='drive/MyDrive/project_notebook/chirps-v2.0.*'

### Accessing the shared Drive folder (Raphael)
# open_mfdataset doesn't accept wildcards for this remote path,
# so retrieve the list of file paths with !ls as a workaround
files_pattern = 'chirps-v2.0.*'
files_dir= 'drive/Shareddrives/Academy/Courses/Climate/Climatematch/06-Projects/01-Resources/Data\ exploration\ notebooks/Regional\ precipitation\ variability\ \&\ extreme\ events\ \(Rapha\ \&\ Laura\)/CHIRPS/'
files_SList = !ls {files_dir}/{files_pattern} #(returns a SList object)
files_list = list(files_SList) # transform to list
files = [f.strip('\'') for f in files_list] # removes extra single quotes

#### Load Data
chirps_data = xr.open_mfdataset(files,combine='by_coords')

## It would also be good to point to the tutorial on Extreme events, although
## this is covered by the end of the course

We can now inspect the contents of the dataset.



In [8]:
# Code to print the shape, array names, etc of the dataset
chirps_data

|  | Array | Chunk |
|---|---|---|
| Bytes | 32.92 GiB | 804.20 MiB |
| Shape | (15340, 400, 1440) | (366, 400, 1440) |
| Dask graph | 42 chunks in 85 graph layers | |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
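The full CHIRPS record is about 33 GiB, so it is usually worth restricting it to your region and period of interest before computing indices. The hypothetical helper below subsets a latitude/longitude box and handles latitudes stored in either order. The coordinate names `latitude` and `longitude` are assumptions; check `chirps_data.coords` (and the MODIS coordinates, which may differ) and pass the correct names.

```python
import numpy as np
import pandas as pd
import xarray as xr

def select_region(ds, lat_bounds, lon_bounds, lat_name="latitude", lon_name="longitude"):
    """
    Subset a dataset to a latitude/longitude box (hypothetical helper).

    .sel with a slice follows the stored order of the coordinate, so the
    latitude slice is flipped when latitudes are stored north to south.
    """
    lat_min, lat_max = sorted(lat_bounds)
    lon_min, lon_max = sorted(lon_bounds)
    lats = ds[lat_name].values
    if lats[0] > lats[-1]:
        lat_slice = slice(lat_max, lat_min)  # descending latitudes
    else:
        lat_slice = slice(lat_min, lat_max)  # ascending latitudes
    return ds.sel({lat_name: lat_slice, lon_name: slice(lon_min, lon_max)})

# Hypothetical small grid with descending latitudes
demo = xr.Dataset(
    {"precip": (("time", "latitude", "longitude"), np.zeros((3, 5, 4)))},
    coords={
        "time": pd.date_range("2001-01-01", periods=3, freq="D"),
        "latitude": [10.0, 5.0, 0.0, -5.0, -10.0],
        "longitude": [-60.0, -55.0, -50.0, -45.0],
    },
)
subset = select_region(demo, (-6, 6), (-56, -44))

# With the real data (bounds are placeholders; check chirps_data.coords first):
# chirps_region = select_region(chirps_data, (-35, 5), (-75, -35)).sel(time=slice("1991", "2020")).load()
```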


## **MODIS/Terra Vegetation Indices 1-Month (MOD13C2) Version 6.1 L3 Global 0.05° CMG**

Global MODIS (Moderate Resolution Imaging Spectroradiometer) vegetation indices are designed to provide consistent spatial and temporal comparisons of vegetation conditions. Blue, red, and near-infrared reflectances, centered at 469-nanometers, 645-nanometers, and 858-nanometers, respectively, are used to determine the MODIS daily vegetation indices.

The MODIS Normalized Difference Vegetation Index (NDVI) complements NOAA's Advanced Very High Resolution Radiometer (AVHRR) NDVI products, providing continuity for time series applications over this rich historical archive. MODIS also includes an Enhanced Vegetation Index (EVI) product that minimizes canopy background variations and maintains sensitivity over dense vegetation. The EVI also uses the blue band to remove residual atmospheric contamination caused by smoke and sub-pixel thin clouds. The MODIS NDVI and EVI products are computed from atmospherically corrected bi-directional surface reflectances that have been masked for water, clouds, heavy aerosols, and cloud shadows.

Global MOD13C2 data are cloud-free spatial composites of the gridded 16-day 1-kilometer MOD13A2 product, and are provided as a level-3 product projected on a 0.05 degree (5600-meter) geographic Climate Modeling Grid (CMG).
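The MOD13C2 files already contain NDVI, so you do not need to compute it yourself, but the standard definitions help with interpretation. NDVI = (NIR - Red) / (NIR + Red), and the MODIS EVI is 2.5 (NIR - Red) / (NIR + 6 Red - 7.5 Blue + 1), where NIR, Red and Blue are the surface reflectances described above. The sketch below evaluates both for hypothetical reflectance values. Raw MOD13 files store NDVI as scaled integers (scale factor 0.0001), so check whether the prepared files have already been rescaled to the -1 to 1 range.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from surface reflectances."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    nir, red, blue = (np.asarray(x, dtype=float) for x in (nir, red, blue))
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)

# Hypothetical reflectances for a densely vegetated pixel
print(round(float(ndvi(0.5, 0.1)), 3))       # 0.667
print(round(float(evi(0.5, 0.1, 0.05)), 3))  # 0.58
```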

In [9]:
files_pattern = 'NDVI*'
files_dir= 'drive/Shareddrives/Academy/Courses/Climate/Climatematch/06-Projects/01-Resources/Data\ exploration\ notebooks/Regional\ precipitation\ variability\ \&\ extreme\ events\ \(Rapha\ \&\ Laura\)/MODIS/'
files_SList = !ls {files_dir}/{files_pattern} #(returns a SList object)
files_list = list(files_SList) # transform to list
files = [f.strip('\'') for f in files_list] # removes extra single quotes

#### Load Data
modis_data = xr.open_mfdataset(files,combine='by_coords')


In [24]:
modis_data

|  | Array | Chunk |
|---|---|---|
| Bytes | 2.21 GiB | 101.07 MiB |
| Shape | (400, 1440, 515) | (400, 1440, 23) |
| Dask graph | 24 chunks in 49 graph layers | |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |
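One way to start studying how vegetation responds to rainfall is a lagged correlation between monthly precipitation and NDVI. The sketch below uses hypothetical synthetic series in which NDVI follows rainfall with a one-month delay, and recovers that lag with `xr.corr`. With the real data you would first aggregate CHIRPS to monthly totals, make sure both datasets share the same grid, time stamps and coordinate names, and possibly work with anomalies (for example with `standardize_data`) to remove the seasonal cycle, which otherwise inflates the correlation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series at one grid point, in which NDVI responds to
# rainfall with a one-month delay
rng = np.random.default_rng(1)
time = pd.date_range("2001-01-01", periods=120, freq="MS")
rain_monthly = xr.DataArray(rng.gamma(2.0, 50.0, time.size),
                            coords={"time": time}, dims="time")
ndvi_series = 0.3 + 0.002 * rain_monthly.shift(time=1) + rng.normal(0, 0.01, time.size)

# Correlate NDVI with rainfall from `lag` months earlier; xr.corr only uses
# time steps where both series are valid (shift introduces NaNs at the start)
corrs = {lag: float(xr.corr(rain_monthly.shift(time=lag), ndvi_series, dim="time"))
         for lag in range(4)}
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # 1

# With the real data (variable names are assumptions; check both datasets), e.g.
# rain_monthly = chirps_data.precip.resample(time="1MS").sum()
```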


<a name="dataset2-1"></a>
## **World Bank data: Cereal production**

Cereal production is a crucial component of global agriculture and food security. The World Bank collects and provides data on cereal production, which includes crops such as wheat, rice, maize, barley, oats, rye, sorghum, millet, and mixed grains. The data covers various indicators such as production quantity, area harvested, yield, and production value.

The World Bank also collects data on land under cereals production, which refers to the area of land that is being used to grow cereal crops. This information can be valuable for assessing the productivity and efficiency of cereal production systems in different regions, as well as identifying potential areas for improvement. Overall, the World Bank's data on cereal production and land under cereals production is an important resource for policymakers, researchers, and other stakeholders who are interested in understanding global trends in agriculture and food security.

In [26]:
import pandas as pd

In [27]:
# Code to retrieve and load the data
files= 'drive/Shareddrives/Academy/Courses/Climate/Climatematch/06-Projects/01-Resources/Data exploration notebooks/Regional precipitation variability & extreme events (Rapha & Laura)/data_cereal_land_meta.csv'
ds_cereal_land = pd.read_csv(files)
ds_cereal_land.head() 

|  | Country Name | Country Code | Series Name | Series Code | 1972 [YR1972] | 1973 [YR1973] | 1974 [YR1974] | 1975 [YR1975] | 1976 [YR1976] | 1977 [YR1977] | ... | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | 2015 [YR2015] | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] | 2019 [YR2019] | 2020 [YR2020] | 2021 [YR2021] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Cereal production (metric tons) | AG.PRD.CREL.MT | 3950000 | 4270000 | 4351000 | 4481000 | 4624000 | 4147000 | ... | 6379000 | 6520329 | 6748023.28 | 5808288.0 | 5532695.42 | 4892953.97 | 4133051.85 | 5583461.0 | 6025977.0 | 4663880.79 |
| 1 | Afghanistan | AFG | Land under cereal production (hectares) | AG.LND.CREL.HA | 3923100 | 3337000 | 3342000 | 3404000 | 3394000 | 3388000 | ... | 3143000 | 3182922 | 3344733.0 | 2724070.0 | 2793694.0 | 2419213.0 | 1911652.0 | 2641911.0 | 3043589.0 | 2164537.0 |
| 2 | Albania | ALB | Cereal production (metric tons) | AG.PRD.CREL.MT | 585830 | 625498 | 646200 | 666500 | 857000 | 910400 | ... | 697400 | 702870 | 700370.0 | 695000.0 | 698430.0 | 701734.0 | 678196.0 | 666065.0 | 684023.0 | 691126.7 |
| 3 | Albania | ALB | Land under cereal production (hectares) | AG.LND.CREL.HA | 331220 | 339400 | 334040 | 328500 | 350500 | 357000 | ... | 142800 | 142000 | 143149.0 | 142600.0 | 148084.0 | 145799.0 | 140110.0 | 132203.0 | 131310.0 | 134337.0 |
| 4 | Algeria | DZA | Cereal production (metric tons) | AG.PRD.CREL.MT | 2362625 | 1595994 | 1480275 | 2680452 | 2313186 | 1142509 | ... | 5137455 | 4912551 | 3435535.0 | 3761229.6 | 3445227.37 | 3478175.14 | 6066252.82 | 5633596.78 | 4393336.75 | 2784017.29 |


In [29]:
##Example
ds_cereal_land[(ds_cereal_land['Country Name']=='Brazil')].reset_index(drop=True).iloc[0].transpose()

Country Name                              Brazil
Country Code                                 BRA
Series Name      Cereal production (metric tons)
Series Code                       AG.PRD.CREL.MT
1972 [YR1972]                           22703928
1973 [YR1973]                           23721606
1974 [YR1974]                           26240014
1975 [YR1975]                           26238419
1976 [YR1976]                           31143200
1977 [YR1977]                           30913834
1978 [YR1978]                           24033646
1979 [YR1979]                           27147322
1980 [YR1980]                           33217492
1981 [YR1981]                           32050567
1982 [YR1982]                           33838263
1983 [YR1983]                           29197566
1984 [YR1984]                           32711289
1985 [YR1985]                           36011139
1986 [YR1986]                           37298400
1987 [YR1987]                           44148398
1988 [YR1988]       
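The World Bank table is in wide format, with one column per year. For analysis it is often easier to reshape it to a long format and to compute derived quantities such as cereal yield (production divided by land under cereal production, in tonnes per hectare). The sketch below does this with pandas `melt` and `pivot`. The helper names and the small demo table are hypothetical, and it assumes that missing values appear as `..`, which is how World Bank DataBank exports usually mark them.

```python
import pandas as pd

def worldbank_to_long(df):
    """Reshape a wide World Bank table (one column per year) to long format (hypothetical helper)."""
    year_cols = [c for c in df.columns if "[YR" in str(c)]
    long_df = df.melt(id_vars=["Country Name", "Country Code", "Series Code"],
                      value_vars=year_cols, var_name="year", value_name="value")
    # "1972 [YR1972]" -> 1972
    long_df["year"] = long_df["year"].str.slice(0, 4).astype(int)
    # Missing values exported as ".." become NaN
    long_df["value"] = pd.to_numeric(long_df["value"], errors="coerce")
    return long_df

def cereal_yield(df, country):
    """Cereal yield in tonnes per hectare for one country (hypothetical helper)."""
    long_df = worldbank_to_long(df[df["Country Name"] == country])
    wide = long_df.pivot(index="year", columns="Series Code", values="value")
    # Production (metric tons) divided by land under cereal production (hectares)
    return wide["AG.PRD.CREL.MT"] / wide["AG.LND.CREL.HA"]

# Hypothetical table with the same layout as ds_cereal_land
demo = pd.DataFrame({
    "Country Name": ["Country A", "Country A"],
    "Country Code": ["AAA", "AAA"],
    "Series Name": ["Cereal production (metric tons)",
                    "Land under cereal production (hectares)"],
    "Series Code": ["AG.PRD.CREL.MT", "AG.LND.CREL.HA"],
    "2019 [YR2019]": [2000, 1000],
    "2020 [YR2020]": [3000, 1500],
    "2021 [YR2021]": ["..", 1200],
})
demo_yield = cereal_yield(demo, "Country A")  # 2.0, 2.0 and NaN t/ha

# With the real data:
# brazil_yield = cereal_yield(ds_cereal_land, "Brazil")
```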


Now you are all set to address the questions you are interested in! Just be mindful of the specific coordinate names to avoid any issues. 

You can use the provided functions as examples to compute various indices for extreme events based on duration or intensity. Don't hesitate to modify them according to your specific needs or create your own custom functions.


Happy exploring and analyzing precipitation variability and extreme events in your project!


# **Further reading**


- Zhang, X., Alexander, L., Hegerl, G.C., Jones, P., Klein Tank, A., Peterson, T.C., Trewin, B. and Zwiers, F.W., 2011. Indices for monitoring changes in extremes based on daily temperature and precipitation data. Wiley Interdisciplinary Reviews: Climate Change, 2(6), pp.851-870.
- Schultz, P.A. and Halpert, M.S., 1993. Global correlation of temperature, NDVI and precipitation. Advances in Space Research, 13(5), pp.277-280.
- Seneviratne, S.I. et al., 2021: Weather and Climate Extreme Events in a Changing Climate. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V. et al. (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 1513–1766, https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-11/
- IPCC, 2021: Annex VI: Climatic Impact-driver and Extreme Indices [Gutiérrez J.M. et al.(eds.)]. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V. et al. (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 2205–2214, https://www.ipcc.ch/report/ar6/wg1/downloads/report/IPCC_AR6_WGI_AnnexVI.pdf