#**Regional precipitation variability and extreme events**

---




**Content creators:** Laura Paccini, Raphael Rocha

**Content reviewers:** Marguerite Brown, Ohad Zivan

**Content editors:** Zane Mitrevica, Natalie Steinemann

**Production editors:** TBD

**Our 2023 Sponsors:** TBD

In [None]:
# @title #**Project background** 
#This will be a short video introducing the content creator(s) and motivating the research direction of the template.
#The Tech team will add code to format and display the video

In this project, you will explore rain gauge and satellite data from CHIRPS and MODIS to extract rain estimates and land surface reflectance, respectively. This data will enable identification of extreme events in your region of interest. Besides investigating the relationships between these variables, you are encouraged to study the impact of extreme events on changes in vegetation.

#**Project template**
<p align='center'><a href="https://github.com/ClimatematchAcademy/course-content/tree/main/projects/template_images/precipitation_template_map.svg"><img src="https://github.com/ClimatematchAcademy/course-content/tree/main/projects/template_images/precipitation_template_map.svg" alt="Regional precipitation variability and extreme events" vw="100" vh="75" /></a></p>

#**Data exploration notebook**
##**Project setup**


Please run the following cells!
    



In [None]:
# Imports

#Import only the libraries/objects that are necessary for more than one dataset. 
#Dataset-specific imports should be in the respective notebook section.

#If any external library has to be installed, !pip install library --quiet
#follow this order: numpy>matplotlib. 
#import widgets in hidden Figure settings cell


import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Functions

#Only functions that apply to more than one data source (MODIS, CMIP, ERA5, ...) should be here.
#Functions that apply to more than one datafile from a dataset (MODIS: land cover type and MODIS: NPP) 
#should be in the respective section of the notebook.

In [None]:
# Seasonal mean wrapped into a simple function
def seasonal_mean_by_year(ds, last_month, rolling_months=3):
    """
    This function calculates the seasonal mean by resampling the input dataset
    to monthly means, applying a rolling mean with the specified season length.
    The resulting dataset is then filtered to select the last month of the 
    corresponding season, which represents the seasonal mean.
    
    Parameters
    ----------
    ds : xarray.Dataset
        The input dataset containing a time dimension.
    last_month : int
        The last month of the seasonal group. For example, if the season is DJF
        (December-January-February), then last_month=2 (for February).
    rolling_months : int, optional
        The number of months for the rolling mean window, by default 3.
        
    Returns
    -------
    xarray.Dataset
        The seasonal mean dataset.
    """
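    # Resample the data to monthly means
    ds_ = ds.resample(time='1M').mean()

    # Rolling mean over the season length. Windows with fewer than
    # `rolling_months` months (e.g. at the start of the record) yield NaN.
    ds_ = ds_.rolling(time=rolling_months).mean()

    # Keep only the last month of the season, which holds the seasonal mean
    return ds_.sel(time=ds_.time.dt.month == last_month)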

In [None]:
# @title Helper functions


##### Comments for main function
# The function above (seasonal_mean_by_year) is mainly useful for Q2 and Q3.
# To calculate the seasonal mean for a custom season (e.g. ONDJFM),
# call the function as follows:
# data_ONDJFM = seasonal_mean_by_year(data, 3, rolling_months=6)

#### Helper function:
def standardize_data(da, dim):
    """
    Performs standardization on a DataArray along a specified dimension.
    
    Parameters:
        - da (xarray.DataArray): Input DataArray to be standardized.
        - dim (str): Dimension along which to perform standardization.
        
    Returns:
        - xarray.DataArray: Standardized DataArray.
    """
    # Calculate mean and standard deviation along the specified dimension
    mean = da.mean(dim=dim)
    std = da.std(dim=dim)
    
    # Perform standardization
    standardized_da = (da - mean) / std
    
    return standardized_da
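
The logic behind `seasonal_mean_by_year` and `standardize_data` can be illustrated on a tiny synthetic series. This is only a sketch under stated assumptions: the monthly values are made up, the data is already monthly (so the resampling step is skipped), and the two steps are written inline rather than calling the functions above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series: calendar month plus 10 for every year after 2000
time = pd.date_range("2000-01-01", periods=36, freq="MS")
values = time.month.values + 10.0 * (time.year.values - 2000)
monthly = xr.DataArray(values, coords={"time": time}, dims="time")

# 3-month rolling mean, then keep February: each value is a Dec-Jan-Feb mean
rolled = monthly.rolling(time=3).mean()
djf = rolled.sel(time=rolled.time.dt.month == 2)

# The first February has no preceding December in the record, so it is NaN
print(djf.values)

# Standardized anomalies along time (same arithmetic as standardize_data)
djf_std = (djf - djf.mean(dim="time")) / djf.std(dim="time")
print(djf_std.values)
```

Note that each seasonal mean is labelled with the date of the last month of the season, so a DJF mean carries the year of its February.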

##**CHIRPS Version 2.0 Global Daily 0.25°**

The Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a high-resolution precipitation dataset developed by the Climate Hazards Group at the University of California, Santa Barbara. It combines satellite-derived precipitation estimates with ground-based station data to provide gridded precipitation data at a quasi-global scale between 50°S-50°N. 

Read more about CHIRPS here:

* [The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes](https://www.nature.com/articles/sdata201566)

* [Climate Hazard Group CHG Wiki](https://wiki.chc.ucsb.edu/CHIRPS_FAQ)

### **Indices for extreme events**
The Expert Team on Climate Change Detection and Indices ([ETCCDI](http://etccdi.pacificclimate.org/list_27_indices.shtml)) has defined various indices that focus on different aspects such as duration or intensity of extreme events. The following functions provide examples of how to compute indices for each category. You can modify these functions to suit your specific needs or create your own custom functions. Here are some tips you can use:

- Most of the indices require daily data, so to select a specific season you can use xarray to subset the data. Example:

`daily_precip_DJF = chirps_data.sel(time=chirps_data['time.season'] == 'DJF')`

- A common threshold for a wet event is precipitation greater than or equal to 1 mm/day, while a dry (or non-precipitating) event is defined as precipitation less than 1 mm/day.
- Some of the indices are based on percentiles. You can define a base period climatology to calculate percentile thresholds, such as the 5th, 10th, 90th, and 95th percentiles, to determine extreme events.
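
The percentile tip can be sketched as follows. This is a minimal illustration, not an official ETCCDI implementation: the precipitation series is randomly generated, the base period 1981-2000 is an arbitrary assumption, and every year is given 365 days for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily precipitation (mm/day) at one grid point, 1981-2010
years = np.repeat(np.arange(1981, 2011), 365)
precip = rng.gamma(shape=0.5, scale=8.0, size=years.size)

# Wet days (>= 1 mm/day) within the assumed base period 1981-2000
in_base = (years >= 1981) & (years <= 2000)
wet_base = precip[in_base & (precip >= 1.0)]

# Threshold: 95th percentile of wet-day precipitation in the base period
p95 = np.percentile(wet_base, 95)

# Count the days above the threshold in every year of the record
unique_years = np.unique(years)
very_wet_days = np.array([(precip[years == y] > p95).sum() for y in unique_years])

print(f"95th percentile threshold: {p95:.1f} mm/day")
print(dict(zip(unique_years.tolist(), very_wet_days.tolist())))
```

Computing the threshold from wet days only keeps the many dry days from dragging the percentile towards zero.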



In [None]:
def calculate_sdii_index(data):
    """
    This function calculates the Simple Daily Intensity Index (SDII), which
    represents the average amount of precipitation on wet days (days with
    precipitation greater than or equal to 1mm) for each year in the input data.
    The input data should be a Dataset with time coordinates, and the function
    returns a Dataset with the SDII index values for each year in the data.

    Parameters:
    ----------
        - data (xarray.Dataset): Input dataset containing daily precipitation data.

    Returns:
    -------
        - xarray.Dataset: Dataset containing the SDII index for each year.
    """
    # Keep daily precipitation on wet days only (PR >= 1mm); dry days become NaN
    wet_days = data.where(data >= 1)

    # Group by year and calculate the total precipitation on wet days
    sum_wet_days_grouped = wet_days.groupby('time.year').sum(dim='time')

    # Count the number of wet days in each year
    w = wet_days.groupby('time.year').count(dim='time')

    # Divide by the number of wet days to get the SDII index
    sdii = sum_wet_days_grouped / w

    return sdii

In [None]:
def calculate_cdd_index(data):
    """
    This function takes a daily precipitation time series as input and calculates
    the Consecutive Dry Days (CDD) index, which represents the longest sequence
    of consecutive days with precipitation less than 1mm. The input data should
    be a one-dimensional DataArray with time coordinates (e.g. a single grid
    point or an area mean), and the function returns a DataArray with the CDD
    values for each unique year in the input data.

    Parameters:
    ----------
      - data (xarray.DataArray): The input daily precipitation data should be
      a DataArray (e.g. for chirps_data the DataArray would be chirps_data.precip)

    Returns:
    -------
      - cdd (xarray.DataArray): The calculated CDD index

    """
    # Create a boolean array for dry days (PR < 1mm)
    dry_days = data < 1
    # Get unique years as a list
    unique_years = list(data.groupby("time.year").groups.keys())
    # Initialize CDD array (one value per year)
    cdd = np.zeros(len(unique_years))
    # Iterate over each year
    for i, year in enumerate(unique_years):
        consecutive_trues = []
        current_count = 0
        # Iterate over each day of the year and count runs of dry days
        for day in dry_days.sel(time=data["time.year"] == year).values:
            if day:
                current_count += 1
            else:
                if current_count > 0:
                    consecutive_trues.append(current_count)
                    current_count = 0
        # Store a dry spell that lasts until the end of the year
        if current_count > 0:
            consecutive_trues.append(current_count)
        # CDD is the largest number of consecutive dry days (0 if there are none)
        cdd[i] = np.max(consecutive_trues) if consecutive_trues else 0
    # Transform to DataArray
    cdd_da = xr.DataArray(cdd, coords={"year": unique_years}, dims="year")
    return cdd_da
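
`calculate_cdd_index` loops over a single time series. If you need the longest dry spell at many grid points at once, one possible approach is to keep a running counter per grid point while stepping through time. The sketch below is an assumption-laden alternative, not part of the original template: the function name `longest_dry_spell` is hypothetical, it works on plain NumPy arrays with time as the first axis, and it treats the whole record as one period instead of splitting it by year.

```python
import numpy as np

def longest_dry_spell(precip, wet_threshold=1.0):
    """Longest run of days below `wet_threshold` along the first (time) axis."""
    # NaN compares as False here, so a missing value interrupts a dry spell
    dry = np.asarray(precip) < wet_threshold
    current = np.zeros(dry.shape[1:], dtype=int)  # length of the ongoing spell
    longest = np.zeros(dry.shape[1:], dtype=int)  # longest spell seen so far
    for dry_today in dry:
        # Extend the spell where today is dry, reset it to zero elsewhere
        current = np.where(dry_today, current + 1, 0)
        longest = np.maximum(longest, current)
    return longest

# Hypothetical example: 6 days at 2 grid points (columns)
precip = np.array([[0.0, 2.0],
                   [0.2, 2.0],
                   [5.0, 2.0],
                   [0.0, 2.0],
                   [0.0, 2.0],
                   [0.5, 2.0]])
print(longest_dry_spell(precip))
```

To obtain yearly values you could apply the function to the values of each year separately, for example inside a loop over `groupby('time.year')`.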

In [None]:
# Code to retrieve and load the data

#### this works after mounting my drive directory and should be changed (Laura)
#files='drive/MyDrive/project_notebook/chirps-v2.0.*'

###Accessing the shared Drive folder (Raphael)
# open_mfdataset doesn't work with wildcards for remote access.
# Workaround retrieving a list of paths with !ls
files_pattern = 'chirps-v2.0.*'
files_dir= 'drive/Shareddrives/Academy/Courses/Climate/Climatematch/06-Projects/01-Resources/Data\ exploration\ notebooks/Regional\ precipitation\ variability\ \&\ extreme\ events\ \(Rapha\ \&\ Laura\)/CHIRPS/'
files_SList = !ls {files_dir}/{files_pattern} #(returns a SList object)
files_list = list(files_SList) # transform to list
files = [f.strip('\'') for f in files_list] # removes extra single quotes

#### Load Data
chirps_data = xr.open_mfdataset(files,combine='by_coords')

## It would also be good to point to the tutorial on extreme events, although
## that material is only covered towards the end of the course

We can now visualize the content of the dataset.



In [None]:
# Code to print the shape, array names, etc of the dataset
chirps_data

| | Array | Chunk |
|---|---|---|
| Bytes | 32.92 GiB | 804.20 MiB |
| Shape | (15340, 400, 1440) | (366, 400, 1440) |
| Dask graph | 42 chunks in 85 graph layers | |
| Data type | float32 numpy.ndarray | |



Now you are all set to address the questions you are interested in! Just be mindful of the specific coordinate names to avoid any issues. 
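
Since the full dataset is large, it usually helps to cut out your region of interest before computing anything. The sketch below uses a small synthetic dataset in place of `chirps_data`; the coordinate names `latitude` and `longitude`, the variable name `precip`, and the box limits are assumptions that you should check against the dataset printed above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Coarse synthetic stand-in for the CHIRPS dataset (hypothetical values)
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", periods=10, freq="D")
lat = np.arange(-45.0, 50.0, 10.0)
lon = np.arange(-175.0, 180.0, 10.0)
ds = xr.Dataset(
    {"precip": (("time", "latitude", "longitude"),
                rng.gamma(0.5, 8.0, size=(time.size, lat.size, lon.size)))},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Select a latitude-longitude box; the slice bounds follow the (ascending)
# order of the coordinates and both ends are included
region = ds.sel(latitude=slice(-25, 5), longitude=slice(-75, -45))

# Simple unweighted area mean: a 1-D daily series for the box
series = region["precip"].mean(dim=("latitude", "longitude"))
print(region["precip"].shape, series.shape)
```

A one-dimensional series like `series` is the kind of input that `calculate_cdd_index` expects. For larger boxes, consider weighting the mean by the cosine of latitude, which this sketch omits.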

You can use the provided functions as examples to compute various indices for extreme events based on duration or intensity. Don't hesitate to modify them according to your specific needs or create your own custom functions.


Happy exploring and analyzing precipitation variability and extreme events in your project!


##**MODIS/Terra Vegetation Indices 1-Month (MOD13C2) Version 6.1 L3 Global 0.05° CMG**

Global MODIS (Moderate Resolution Imaging Spectroradiometer) vegetation indices are designed to provide consistent spatial and temporal comparisons of vegetation conditions. Blue, red, and near-infrared reflectances, centered at 469-nanometers, 645-nanometers, and 858-nanometers, respectively, are used to determine the MODIS daily vegetation indices.

The MODIS Normalized Difference Vegetation Index (NDVI) complements NOAA's Advanced Very High Resolution Radiometer (AVHRR) NDVI products, providing continuity for time series applications over this rich historical archive. MODIS also includes a new Enhanced Vegetation Index (EVI) product that minimizes canopy background variations and maintains sensitivity over dense vegetation conditions. The EVI also uses the blue band to remove residual atmospheric contamination caused by smoke and sub-pixel thin clouds. The MODIS NDVI and EVI products are computed from atmospherically-corrected bi-directional surface reflectances that have been masked for water, clouds, heavy aerosols, and cloud shadows.
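
For orientation, the standard definitions of the two indices can be written as a short function. This is a sketch rather than the MODIS production algorithm: the function name and the reflectance values are hypothetical, and the EVI coefficients (gain 2.5, aerosol terms 6 and 7.5, canopy background term 1) are the ones commonly quoted for the MODIS EVI.

```python
import numpy as np

def vegetation_indices(blue, red, nir):
    """Return (NDVI, EVI) from blue, red and near-infrared surface reflectances."""
    blue, red, nir = (np.asarray(band, dtype=float) for band in (blue, red, nir))
    # NDVI: normalized difference between near-infrared and red reflectance
    ndvi_value = (nir - red) / (nir + red)
    # EVI: adds a blue-band aerosol correction and a canopy background term
    evi_value = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    return ndvi_value, evi_value

# Hypothetical reflectances for a vegetated pixel
ndvi_value, evi_value = vegetation_indices(blue=0.05, red=0.10, nir=0.50)
print(float(ndvi_value), float(evi_value))
```

In this project you do not need to compute the indices yourself, because MOD13C2 already provides them.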

Global MOD13C2 data are cloud-free spatial composites of the gridded 16-day 1-kilometer MOD13A2, and are provided as a level-3 product projected on a 0.05 degree (5600-meter) geographic Climate Modeling Grid (CMG).

In [None]:
!pip3 install pyhdf



In [None]:
# Dataset-specific imports
from pyhdf.SD import SD
import pprint

In [None]:
# Code to retrieve and load the data

#### Load Data
###Accessing the shared Drive folder (Raphael)
# open_mfdataset doesn't work with wildcards for remote access.
# Workaround retrieving a list of paths with !ls
files_pattern = 'MOD13C2.*'
files_dir= 'drive/Shareddrives/Academy/Courses/Climate/Climatematch/06-Projects/01-Resources/Data\ exploration\ notebooks/Regional\ precipitation\ variability\ \&\ extreme\ events\ \(Rapha\ \&\ Laura\)/MODIS'
files_SList = !ls {files_dir}/{files_pattern} #(returns a SList object)
files_list = list(files_SList) # transform to list
files = [f.strip('\'') for f in files_list] # removes extra single quotes

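ds_dict = {}
# Loop over all files and record the scientific datasets (SDS) in each one
for file_name in files:
    # Open the HDF4 file for reading
    hdf = SD(file_name)
    # Dictionary describing the SDS in the file, keyed by dataset name
    sds = hdf.datasets()
    ds_dict[file_name] = sds
# After the loop, `hdf` and `sds` refer to the last file in the list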


In [None]:
# List the names of the datasets (SDS) available in the last opened file
for i, name in enumerate(sds.keys()):
    print(i, name)


In [None]:
ndvi = hdf.select('CMG 0.05 Deg Monthly NDVI')
evi = hdf.select('CMG 0.05 Deg Monthly EVI')
pprint.pprint(ndvi.attributes())
pprint.pprint(evi.attributes())

{'_FillValue': -3000,
 'add_offset': 0.0,
 'add_offset_err': 0.0,
 'calibrated_nt': 5,
 'long_name': 'CMG 0.05 Deg Monthly NDVI',
 'scale_factor': 10000.0,
 'scale_factor_err': 0.0,
 'units': 'NDVI',
 'valid_range': [-2000, 10000]}
{'_FillValue': -3000,
 'add_offset': 0.0,
 'add_offset_err': 0.0,
 'calibrated_nt': 5,
 'long_name': 'CMG 0.05 Deg Monthly EVI',
 'scale_factor': 10000.0,
 'scale_factor_err': 0.0,
 'units': 'EVI',
 'valid_range': [-2000, 10000]}
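
The attributes above show that the indices are stored as scaled integers. For the MOD13 products the physical value is the stored value divided by the scale factor, after masking the fill value. A minimal sketch, assuming a small made-up array in place of the real data (the function name is hypothetical):

```python
import numpy as np

# Values taken from the attributes printed above
FILL_VALUE = -3000
SCALE_FACTOR = 10000.0
VALID_RANGE = (-2000, 10000)

def scale_vegetation_index(raw):
    """Convert stored integers to index values, with NaN for invalid pixels."""
    raw = np.asarray(raw, dtype=float)
    valid = (raw != FILL_VALUE) & (raw >= VALID_RANGE[0]) & (raw <= VALID_RANGE[1])
    # MOD13 convention: physical value = stored value / scale_factor
    return np.where(valid, raw / SCALE_FACTOR, np.nan)

# Hypothetical 2 x 2 block of stored NDVI values
raw_ndvi = np.array([[-3000, 2500],
                     [8000, -2000]])
print(scale_vegetation_index(raw_ndvi))
```

With the objects selected earlier, the stored array would typically come from `ndvi.get()` or `evi.get()`.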


In [None]:
!pip install satpy
import satpy


We are all set. Let's go!

#**Further reading**


- Zhang, X., Alexander, L., Hegerl, G.C., Jones, P., Tank, A.K., Peterson, T.C., Trewin, B. and Zwiers, F.W., 2011. Indices for monitoring changes in extremes based on daily temperature and precipitation data. Wiley Interdisciplinary Reviews: Climate Change, 2(6), pp.851-870.