# Rainfall indices based on the Indian monsoon

In this lesson we are going to apply some of the basics we learned in the video course on the simple manipulation of NetCDF files using CDO. The examples are based on the analysis of the GPCP daily rainfall data. To help you download the data quickly, I've placed a copy of an older version (v1.3) on my DODS server, since accessing the latest version from NASA requires a time-consuming registration process. To see how to do this, [read this NASA page](). You can then replace this version of GPCP later if you wish.

```bash
# you need to install:
# MAC     brew install (or port install)
# UBUNTU  sudo apt install
#
# cdo
# wget
# ncview
# libnetcdf-dev (ubuntu), netcdf (brew)

# period to analyse, and location and file-name prefix of the GPCP daily files
year1=1998
year2=2005
ddir="../../DATA/gpcp"
fname=gpcp_v01r03_daily_
```

Here we download the data from the ICTP server using wget. Alternatively, if you prefer, you can follow my video course and use the CDS API to retrieve rainfall data from ERA5 (or another dataset from the Climate Data Store).

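```bash
# get the data: one file per year, saved into ${ddir}
mkdir -p "${ddir}"
for year in $(seq ${year1} ${year2}); do
   # -nc (no-clobber) skips files that already exist instead of saving another copy as *.nc.1
   wget -nc -P "${ddir}" http://clima-dods.ictp.it/Users/tompkins/Observations/GPCP/v1.3/${fname}${year}.nc
done
```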


# Cutting out areas

First of all, we need to cut out an area. Which CDO command is used for this?

<details>
<summary>Click here for answer</summary>
The command we use is ```sellonlatbox```, which takes the box as ```lon1,lon2,lat1,lat2```.
</details>


```bash
#
# Let's first cut out an area
#
# lon1, lon2, lat1, lat2
india="65,90,8,27"
wafrica="-20,20,-5,30"

region=$india
region_tag=_area$(echo $region | tr , _)

for year in $(seq ${year1} ${year2}); do
    ifile=${ddir}/${fname}${year}.nc
    ofile=${ddir}/${fname}${year}${region_tag}.nc
    cdo sellonlatbox,${region} $ifile $ofile
done
```

```
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.08s 43MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.07s 42MB]
cdo    sellonlatbox: Processed 23716800 values from 1 variable over 366 timesteps [0.07s 42MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.06s 41MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.06s 42MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.07s 43MB]
cdo    sellonlatbox: Processed 23716800 values from 1 variable over 366 timesteps [0.07s 42MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.06s 41MB]
```


```bash
# Equivalent alternative to the cell above: set the box corners as separate variables.
# The tag has the same form (_area65_90_8_27), so the merge step below finds these files.
lon1=65
lon2=90
lat1=8
lat2=27

region="${lon1},${lon2},${lat1},${lat2}"
region_tag=_area${lon1}_${lon2}_${lat1}_${lat2}

for year in $(seq ${year1} ${year2}); do
    ifile=${ddir}/${fname}${year}.nc
    ofile=${ddir}/${fname}${year}${region_tag}.nc
    cdo sellonlatbox,${region} $ifile $ofile
done
```

```
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.08s 43MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.07s 43MB]
cdo    sellonlatbox: Processed 23716800 values from 1 variable over 366 timesteps [0.07s 41MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.07s 43MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.07s 42MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.07s 44MB]
cdo    sellonlatbox: Processed 23716800 values from 1 variable over 366 timesteps [0.07s 43MB]
cdo    sellonlatbox: Processed 23652000 values from 1 variable over 365 timesteps [0.07s 43MB]
```


# Merging files

Now we want to merge the files into a single file covering the whole period, and then select only the monsoon months, June to September (JJAS).

```bash
# now the files are much smaller, let's put them together in a single file for ease
cdo -O mergetime ${ddir}/${fname}????${region_tag}.nc ${ddir}/${fname}${region_tag}.nc

# select only the monsoon months, June to September (JJAS)
cdo -O selmon,6/9 ${ddir}/${fname}${region_tag}.nc ${ddir}/${fname}${region_tag}_JJAS.nc
```

```
cdo    mergetime: Processed 1519440 values from 8 variables over 2922 timesteps [0.32s 58MB]
cdo    selmonth: Processed 507520 values from 1 variable over 2922 timesteps [0.17s 48MB]
```


# Examine the files

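Let's take a look at the resulting file with ncview, e.g. ```ncview ${ddir}/${fname}${region_tag}_JJAS.nc```.

Once you are familiar with the data, we can start to construct some more involved indices.

These follow https://arxiv.org/abs/2404.12419.

We first calculate, for each year, the mean rainfall anomaly of the rainy season (JJAS), relative to the JJAS mean over the whole period.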

```bash
# now we have a single file, let's look at the annual anomaly
# climatology: mean over all JJAS days of the whole period (a single timestep)
cdo timmean ${ddir}/${fname}${region_tag}_JJAS.nc ${ddir}/${fname}${region_tag}_JJAS_timmean.nc
# daily anomaly: each day minus the climatology
cdo sub ${ddir}/${fname}${region_tag}_JJAS.nc ${ddir}/${fname}${region_tag}_JJAS_timmean.nc ${ddir}/${fname}${region_tag}_JJAS_anom.nc
# mean anomaly of each year's JJAS season
cdo yearmean ${ddir}/${fname}${region_tag}_JJAS_anom.nc ${ddir}/${fname}${region_tag}_JJAS_anom_yearmean.nc
```

```
cdo(1) timmean: Process started
cdo    sub: Filling up stream2 >(pipe1.4)< by copying the first timestep.
cdo    sub: Processed 508040 values from 2 variables over 977 timesteps [0.18s 47MB]
cdo    yearmean: Processed 507520 values from 1 variable over 976 timesteps [0.07s 39MB]
```


# Spatial averaging

Calculating a spatial average by hand is actually not straightforward. As explained in the video, one needs to account for the grid-cell size, which on a regular latitude-longitude grid means weighting each point by the cosine of its latitude. Luckily, CDO accounts for the different grid-cell areas in its ```fldmean``` operator.
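To see why the weighting matters, here is a minimal sketch in plain shell/awk (not CDO) that averages a toy set of grid points, each given as a `lat value` pair, once with cos(latitude) weights and once without. On a regular grid with uniform latitude spacing the cell area is proportional to the cosine of the cell-centre latitude, so the weighted version gives an area mean. The function names are hypothetical, and CDO's ```fldmean``` works from the grid description in the file rather than from this toy formula.

```bash
# Toy illustration (not CDO): area-weighted vs unweighted mean of grid points.
# Input on stdin: one "lat value" pair per line, latitude in degrees.

# Mean weighted by cos(latitude), proportional to the cell area on a regular lat-lon grid
weighted_fldmean() {
  awk '
    BEGIN { deg2rad = atan2(0, -1) / 180 }   # atan2(0,-1) = pi
    {
      w = cos($1 * deg2rad)                  # weight of this grid point
      sumw  += w
      sumwv += w * $2
    }
    END { printf "%.4f\n", sumwv / sumw }
  '
}

# Plain arithmetic mean, ignoring the grid-cell size
plain_mean() {
  awk '{ s += $2; n++ } END { printf "%.4f\n", s / n }'
}

# A wet point on the equator and a dry point at 60N:
printf '0 10\n60 0\n' | weighted_fldmean   # 6.6667
printf '0 10\n60 0\n' | plain_mean         # 5.0000
```

The unweighted mean gives the point at 60N as much influence as the equatorial one, even though its grid cell covers only half the area.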



```bash
# let's make an all-region index: the area-weighted mean anomaly over the box, one value per year
cdo fldmean ${ddir}/${fname}${region_tag}_JJAS_anom_yearmean.nc ${ddir}/${fname}${region_tag}_JJAS_anom_meanindex.nc
```

```
cdo    fldmean: Processed 4160 values from 1 variable over 8 timesteps [0.02s 33MB]
```


# Now we can start to use CDO to calculate more interesting indices

### TASK 1: A wet-area index

Take a moment to think about this before reading on. You need to use CDO to make the "wet-area" index, which is the fraction of a region that has a <b>positive</b> rainfall anomaly in a given year.

In order to do this you will need to use a logical function, like

- ```ge``` (greater than or equal), which compares two fields and produces a 1 where the first field is greater than or equal to the second, and 0 otherwise, or
- ```gec``` (greater than or equal to a constant), which requires an argument (e.g. ```gec,5```, no spaces!) and gives a 1 where the input field is greater than or equal to that threshold, and 0 otherwise.

<details>
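<summary>Which of these two functions do we need to use here?</summary>
Either can work! We will need *gec* if we want to test whether the anomaly is above zero (strictly, ```gtc,0``` matches a <b>positive</b> anomaly; the two only differ where the anomaly is exactly zero, e.g. at points where it never rains). But you don't necessarily need to calculate the anomaly: you could also use ```cdo ge``` to compare the annual mean precipitation directly with the climatology!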
</details>
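As a sanity check of the logic, here is a minimal awk sketch (not CDO) of the wet-area index for one year. Each grid point, given in a hypothetical `lat anomaly` format, is first turned into 1 or 0 as ```gec,0``` would do, and the 0/1 field is then averaged with cos(latitude) weights, mimicking ```fldmean``` on a regular latitude-longitude grid. The result is the fraction of the *area* that is wet, not of the number of grid points; the function name `wet_area_fraction` is hypothetical.

```bash
# Toy wet-area index for one year: threshold the anomaly, then take an area-weighted mean.
# Input on stdin: "lat anomaly" per line, latitude in degrees.
wet_area_fraction() {
  awk '
    BEGIN { deg2rad = atan2(0, -1) / 180 }
    {
      wet = ($2 >= 0) ? 1 : 0     # like cdo gec,0 : 1 where the anomaly is >= 0
      w = cos($1 * deg2rad)       # area weight on a regular lat-lon grid
      sumw   += w
      sumwet += w * wet
    }
    END { printf "%.4f\n", sumwet / sumw }
  '
}

# wet point on the equator, dry point at 60N: 2/3 of the area is wet, though only 1 of 2 points
printf '0 1.5\n60 -1.0\n' | wet_area_fraction   # 0.6667
```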

```bash
# now we can start to look at the wet area index
# first we set to 1 all points with a positive (>= 0) anomaly, and to 0 elsewhere
cdo gec,0 ${ddir}/${fname}${region_tag}_JJAS_anom_yearmean.nc ${ddir}/${fname}${region_tag}_JJAS_anom_binary.nc

# the area-weighted mean of this 0/1 field is the fraction of the region that was wet
cdo fldmean ${ddir}/${fname}${region_tag}_JJAS_anom_binary.nc ${ddir}/${fname}${region_tag}_JJAS_wetarea_index.nc
```

```
cdo    gec: Processed 4160 values from 1 variable over 8 timesteps [0.02s 33MB]
cdo    fldmean: Processed 4160 values from 1 variable over 8 timesteps [0.02s 33MB]
```


### TASK 2: Number of extreme rainy days

In this task you first need to calculate the 95th percentile of daily rainfall at each location, and then count, for each year, the number of days within the monsoon season that exceed this threshold.

## Percentiles

Let's say you have a collection of $n$ observations $o_i$. The $X$th percentile simply gives the threshold below which $X$% of the distribution lies. If $n$ is large, you could simply line up the numbers in ascending order and make the cut at the point where $X$% of the numbers are smaller than your chosen threshold. However, your sample is usually not large enough, in which case it is common to assume that the sampled values follow a known distribution (e.g. one might assume that they are Normally distributed). The distribution can be fitted by various methods, such as moment matching, and the thresholds are then derived from the fitted distribution.
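The "line them up and cut" approach can be sketched in a few lines of shell: sort the sample and take the value at rank ceil(X n / 100). This is the textbook nearest-rank definition, shown only for illustration; it is not necessarily identical to any of CDO's built-in options, and the function name `pctl_nearest_rank` is hypothetical.

```bash
# Nearest-rank percentile of the numbers read from stdin (one per line).
# Usage: pctl_nearest_rank X   (X in percent, 0 < X <= 100)
pctl_nearest_rank() {
  sort -n | awk -v p="$1" '
    { x[NR] = $1 }                              # values, now in ascending order
    END {
      r = p * NR / 100                          # X% of the sample size
      k = (r == int(r)) ? r : int(r) + 1        # ceiling -> rank of the cut
      if (k < 1) k = 1
      print x[k]
    }
  '
}

seq 20 | pctl_nearest_rank 95    # 19: 95% of the 20 values are <= 19
```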

CDO has a number of built-in options for the percentile method; apart from the first three, they carry the same names as the ```method``` options of the Python function ```numpy.percentile```:

- nrank
- nist
- rtype8
- linear
- lower
- higher
- nearest
- midpoint
- inverted_cdf
- averaged_inverted_cdf
- closest_observation
- interpolated_inverted_cdf
- hazen
- weibull
- median_unbiased
- normal_unbiased

As you can see, the list is long! We will therefore not delve into the details here, but refer you to the [cdo percentile documentation](https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-520001.10) for a description of each method. For large samples there is very little difference between the methods, so we simply use the default method in the following.
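To get a feeling for how much the choice of method matters, the sketch below implements just one of them, ```linear``` (also NumPy's default): the fractional rank (n-1)X/100, counted from the smallest value, is interpolated linearly between the two neighbouring sorted values. It is plain shell/awk for illustration, not CDO's code, and assumes CDO's option of the same name behaves the same way. For the 95th percentile of the numbers 1 to 10 it gives 9.55, whereas the smallest value with at least 95% of the sample at or below it is 10; for 1 to 1000 the two give 950.05 and 950, so the gap becomes negligible relative to the spread of the data as the sample grows.

```bash
# Percentile by linear interpolation between the two nearest sorted values
# (the "linear" definition, NumPy's default). Reads numbers from stdin.
# Usage: pctl_linear X   (X in percent, 0 <= X <= 100)
pctl_linear() {
  sort -n | awk -v p="$1" '
    { x[NR] = $1 }
    END {
      h = (NR - 1) * p / 100 + 1     # fractional rank, counting from 1
      lo = int(h)
      frac = h - lo
      if (lo >= NR) val = x[NR]      # X = 100: the maximum
      else val = x[lo] + frac * (x[lo + 1] - x[lo])
      printf "%.2f\n", val
    }
  '
}

seq 10   | pctl_linear 95   # 9.55
seq 1000 | pctl_linear 95   # 950.05
```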

```bash
# make an index for extremes, e.g. the 95th percentile (P95) at each grid point
ifile=${ddir}/${fname}${region_tag}_JJAS.nc
percen=95
# timpctl takes three inputs: the data, plus its minimum and maximum over time
cdo timpctl,${percen} $ifile -timmin $ifile -timmax $ifile ${ddir}/${fname}${region_tag}_JJAS_p${percen}.nc

# 1 on days when rainfall >= the local P95, 0 otherwise
# (cdo gt would count only days strictly above P95, which matters at dry points where P95 is 0)
cdo ge $ifile ${ddir}/${fname}${region_tag}_JJAS_p${percen}.nc ${ddir}/${fname}${region_tag}_JJAS_p${percen}_binary.nc

# number of extreme rainy days per year
cdo yearsum ${ddir}/${fname}${region_tag}_JJAS_p${percen}_binary.nc ${ddir}/${fname}${region_tag}_JJAS_p${percen}_nevents.nc
```

```
cdo(1) timmin: Process started
cdo(2) timmax: Process started
cdo    timpctl: Processed 508560 values from 3 variables over 978 timesteps [0.20s 50MB]
cdo    ge: Filling up stream2 >../../DATA/gpcp/gpcp_v01r03_daily__area65_90_8_27_JJAS_p95.nc< by copying the first timestep.
cdo    ge: Processed 508040 values from 2 variables over 977 timesteps [0.11s 44MB]
cdo    yearsum: Processed 507520 values from 1 variable over 976 timesteps [0.07s 38MB]
```


### TASK 3: Hands on/Homework

Repeat the above exercises for another region that interests you (e.g. the West African monsoon, the South American monsoon, or Europe). If you are focusing on a monsoon region, remember to restrict the months within the year to those when the rains arrive!
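One way to switch region is to reuse the ```wafrica``` box defined earlier and rebuild the file tag with the same convention as for India. The helper name `make_region_tag` and the choice of July to September as the West African rainy season are assumptions for illustration; pick the months that match your region.

```bash
# Hypothetical helper: file tag for a "lon1,lon2,lat1,lat2" box, same convention as the India example
make_region_tag() {
  printf '_area%s\n' "$(printf '%s' "$1" | tr , _)"
}

wafrica="-20,20,-5,30"
season="7/9"        # assumption: July-September; adjust to your region's rainy season

region=$wafrica
region_tag=$(make_region_tag "$region")
echo "$region_tag"  # _area-20_20_-5_30

# The later steps then apply unchanged, except for the month selection (and the _JJAS file suffix), e.g.:
#   cdo -O selmon,${season} ${ddir}/${fname}${region_tag}.nc ${ddir}/${fname}${region_tag}_JAS.nc
```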


### TASK 4: Homework

Download SST data for the Pacific region from the data server, read up on the Niño 3.4 index and think about how you could use CDO to make a simple ENSO index. This will be the topic of the next lecture, so don't worry if you get stuck!
