# **Preparing Data for LSWMI**
*Notebook created Dec. 18, 2025*

This notebook documents the processing of ERA5 data and the construction of the Local Southwest Monsoon Index (LSWMI). The methodology follows [*Development of Local Southwest Monsoon Index in the Philippines* by Manauis et al. (2024)](https://www.jstage.jst.go.jp/article/sola/advpub/0/advpub_2024-033/_article/-char/en). The technical workflow was provided by Ms. Jianne Pamintuan, a Graduate Fellow working in the Monsoon Watch PH project of DOST-PAGASA.

All processes will be performed at `G:\My Drive\Thesis\Monsoon\Data` or locally, whichever is more efficient.

## **1. Downloading Datasets**
The following datasets will be used in building the Local Southwest Monsoon Index.

| Parameter | ERA5 Dataset | Variable | Daily Statistic | Pressure Level (if any, in hPa) | Unit | Remarks |
| :--- | :--- | :---: | :---: | :---: | :---: | :--- |
| Zonal wind | ERA5 post-processed daily statistics on pressure levels from 1940 to present | `u` | Daily mean | 1000, 200 | ms<sup>-1</sup> | N/A |
| Meridional wind | ERA5 post-processed daily statistics on pressure levels from 1940 to present | `v` | Daily mean | 1000, 200 | ms<sup>-1</sup> | N/A |
| Mean sea level pressure | ERA5 post-processed daily statistics on single levels from 1940 to present | `msl` | Daily mean | Single level | Pa | N/A |
| Top net thermal radiation | ERA5 post-processed daily statistics on single levels from 1940 to present | `ttr` | Daily mean | Single level | Jm<sup>-2</sup> | Proxy for OLR; this is equal to the negative of OLR |
| Total precipitation | ERA5 post-processed daily statistics on single levels from 1940 to present | `tp` | Daily sum | Single level | m | Variable will be used to build a Rainfall Anomaly Index (RAI) and to verify LSWMI |

Other specifications are as follows:
- **Product type:** Reanalysis
- **Time:** +00:00 UTC
- **Geographical Area:** 0°N-25°N, 110°E-135°E
- **Data Format:** NetCDF4
- **Link:** https://cds.climate.copernicus.eu/datasets

Datasets downloaded for this study have the following specifications:
- **Product type:** Reanalysis
- **Time:** +00:00 UTC
- **Period:** May-October, 1979-2025
- **Geographical Area:** 10°S-60°N, 40°E-180°E
- **Data Format:** NetCDF4
- **Link:** https://cds.climate.copernicus.eu/datasets

## **2. Preparing the Datasets**
**Given:** The downloaded ERA5 datasets are in the `Data` folder and arranged into the following structure:

```
data/
└── 1979/
    ├── 1979-05-msl.nc
    ├── 1979-05-tp.nc
    ├── 1979-05-ttr.nc
    ├── 1979-05-u-200.nc
    ├── 1979-05-u-1000.nc
    ├── 1979-05-v-200.nc
    ├── 1979-05-v-1000.nc
    ├── ...
    └── 1979-10-v-1000.nc
└── 1980/
    ├── 1980-05-msl.nc
    ├── ...
    └── 1980-10-v-1000.nc
└── .../
└── 2025/
    ├── 2025-05-msl.nc
    ├── ...
    └── 2025-10-v-1000.nc
```

**Objective:** Generate new .nc files according to the following specifications:
- **Time:** Pentad (+00:00 UTC)
- **Period:** May-September, 1979-2025
- **Geographical Area:** 0°N-25°N, 110°E-135°E
- **Structure:** Monthly files aggregated into yearly files

```
└── 1979/
    ├── 1979-msl.nc
    ├── 1979-tp.nc
    ├── 1979-ttr.nc
    ├── 1979-u-200.nc
    ├── 1979-u-1000.nc
    ├── 1979-v-200.nc
    └── 1979-v-1000.nc
└── 1980/
    ├── 1980-msl.nc
    ├── ...
    └── 1980-v-1000.nc
└── .../
└── 2025/
    ├── 2025-msl.nc
    ├── ...
    └── 2025-v-1000.nc
```

- **Coordinates:**
  - `valid_time`: Change to `time`
  - `latitude`: Change to `lat`
  - `longitude`: Change to `lon`
  - `isobaricInhPa`: Change to `level`
- **Variables:**
  - `msl`: Convert unit to hPa; `msl = msl / 100`
  - `ttr`: Flip signs of values; `OLR = -ttr`
  - `tp`: Convert unit to mm; `tp = tp * 1000`

### **A. Sanity Check**
A sanity check using the `sanity_check.py` script will do the following:
- blah
- blah
- blah

### **B. Changing Coordinates**
Coordinates will be changed using `coor_change.py`, as outlined below. Rewritten .nc files will be generated and put into the `Data/Fixed` folder.

In [None]:
# coor_change.py

### **C. Adjusting Variables**
`msl`, `ttr`, and `tp` will be adjusted using `msl_adj.py`, `ttr_adj.py`, and `tp_adj.py`. Rewritten .nc files will be generated and put into the `Data/Fixed` folder.

In [None]:
# msl_adj.py

In [None]:
# ttr_adj.py

In [None]:
# tp_adj.py
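
The three conversions listed in the objective are simple element-wise operations. The sketch below applies them to plain NumPy arrays; the function name `adjust_variable` is hypothetical and the three scripts named above may be structured differently.

```python
import numpy as np


def adjust_variable(name, values):
    """Apply the conversion listed for `name`; other variables pass through."""
    values = np.asarray(values, dtype=float)
    if name == "msl":
        return values / 100.0   # Pa -> hPa
    if name == "ttr":
        return -values          # OLR = -ttr
    if name == "tp":
        return values * 1000.0  # m -> mm
    return values


# Illustrative values only, not taken from the ERA5 files.
print(adjust_variable("msl", [101325.0]))
print(adjust_variable("ttr", [-2.0e7]))
print(adjust_variable("tp", [0.0125]))
```

Keeping the three rules in one function makes it harder to apply a conversion twice to the same file, which is an easy mistake when each adjustment lives in its own script.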

### **D. Aggregating to Pentads**
The monthly files will be aggregated into pentads and then into yearly files using `pentads.py`, based on the chart below.

![Pentad chart used in the technical workflow](https://cdn.imgchest.com/files/7e341c0d10ff.png)

New .nc files will be generated and put into the `Data/LSWMI_PrepFinal` folder.

In [None]:
# pentads.py
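
A minimal sketch of the pentad aggregation follows. It assumes that pentads are consecutive, non-overlapping 5-day blocks starting at the first time step and that a trailing incomplete block is dropped; the pentad chart above is authoritative and may define the boundaries differently. Using the mean for `u`, `v`, `msl`, and `ttr` and the sum for `tp` is also an assumption, and the function name `to_pentads` is hypothetical.

```python
import numpy as np


def to_pentads(daily, how="mean"):
    """Collapse a (time, lat, lon) daily array into 5-day blocks.

    Assumption: blocks are consecutive and start at the first time step;
    a trailing block shorter than 5 days is dropped.
    """
    daily = np.asarray(daily, dtype=float)
    n_pentads = daily.shape[0] // 5
    blocks = daily[: n_pentads * 5].reshape(n_pentads, 5, *daily.shape[1:])
    if how == "mean":
        return blocks.mean(axis=1)
    if how == "sum":
        return blocks.sum(axis=1)
    raise ValueError(f"unsupported reduction: {how!r}")


# Twelve synthetic days on a 1x1 grid: two full pentads, two days dropped.
daily = np.arange(12, dtype=float).reshape(12, 1, 1)
pentad_mean = to_pentads(daily, how="mean")
pentad_sum = to_pentads(daily, how="sum")
print(pentad_mean.ravel(), pentad_sum.ravel())
```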

## **3. Building the WSI, SSI, MSLP-AI, and OLR-AI**
**Given:** The aggregated ERA5 datasets are in the `Data/LSWMI_PrepFinal` folder and arranged into the following structure:

```
└── 1979/
    ├── 1979-msl.nc
    ├── 1979-tp.nc
    ├── 1979-ttr.nc
    ├── 1979-u-200.nc
    ├── 1979-u-1000.nc
    ├── 1979-v-200.nc
    └── 1979-v-1000.nc
└── 1980/
    ├── 1980-msl.nc
    ├── ...
    └── 1980-v-1000.nc
└── .../
└── 2025/
    ├── 2025-msl.nc
    ├── ...
    └── 2025-v-1000.nc
```

**Objective:** Build `WSI`, `SSI`, `MSLP-AI`, and `OLR-AI` from the `u`, `v`, `msl`, and `ttr` datasets, respectively, and generate their .nc files.

### **A. Westerly Wind Shear Index (`WSI`)**
The Westerly Wind Shear Index (`WSI`) measures the difference between the zonal winds at 1000 hPa (near-surface) and 200 hPa (upper atmosphere), which is then standardized. Climatological means are derived from the period 1991-2020.

$$ WSI_i = \frac{WSI102_i-\overline{WSI102_i}}{WSI102_\sigma} $$

$$ WSI102_i = U1000_i - U200_i $$

The `WSI` for 1991-2020 will be computed using `wsi.py`, which will generate an aggregated `wsi.nc` file in the `Data/LSWMI_PrepFinal` folder.

In [None]:
# wsi.py
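
The two equations for `WSI` can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the contents of `wsi.py`: the mean and standard deviation are taken per pixel over all 1991-2020 time steps, the population standard deviation is used, and the input arrays are synthetic. The function name `standardized_index` and the count of 30 pentads per season are hypothetical.

```python
import numpy as np


def standardized_index(field, years, base_start=1991, base_end=2020):
    """Standardize a (time, lat, lon) field per pixel against a baseline."""
    field = np.asarray(field, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base_start) & (years <= base_end)
    clim_mean = field[in_base].mean(axis=0)
    clim_std = field[in_base].std(axis=0)  # population SD, an assumption
    return (field - clim_mean) / clim_std


# Synthetic winds on a 3x3 grid; 30 pentads per year is illustrative only.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1979, 2026), 30)
u1000 = rng.normal(5.0, 2.0, size=(years.size, 3, 3))
u200 = rng.normal(-10.0, 4.0, size=(years.size, 3, 3))

wsi102 = u1000 - u200                    # WSI102_i = U1000_i - U200_i
wsi = standardized_index(wsi102, years)  # standardized against 1991-2020
print(wsi.shape)
```

The same function applies unchanged to the meridional shear and to the `msl` and `ttr` fields, since all four indices share the same standardization form. A pixel whose baseline standard deviation is zero would produce a division by zero and needs separate handling.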

### **B. Southerly Wind Shear Index (`SSI`)**
The Southerly Wind Shear Index (`SSI`) measures the difference between the meridional winds at 1000 hPa (near-surface) and 200 hPa (upper atmosphere), which is then standardized. Climatological means are derived from the period 1991-2020.

$$ SSI_i = \frac{SSI102_i-\overline{SSI102_i}}{SSI102_\sigma} $$

$$ SSI102_i = V1000_i - V200_i $$

The `SSI` for 1991-2020 will be computed using `ssi.py`, which will generate an aggregated `ssi.nc` file in the `Data/LSWMI_PrepFinal` folder.

In [None]:
# ssi.py

### **C. Mean Sea Level Pressure Anomaly Index (`MSLP-AI`)**
The Mean Sea Level Pressure Anomaly Index is the anomaly of MSLP from its 1991-2020 climatological mean, standardized for each pixel $ i $.

$$ MSLPAI_i = \frac{MSLP_i-\overline{MSLP_i}}{MSLP_\sigma} $$

The `MSLPAI` for 1991-2020 will be computed using `mslai.py`, which will generate an aggregated `mslai.nc` file in the `Data/LSWMI_PrepFinal` folder. For clarity's sake, while the documentation may label MSLP as `mslp`, the variable will be named `msl` instead of `mslp` for all MSLP-related computations.

In [None]:
# mslai.py

### **D. Outgoing Longwave Radiation Anomaly Index (`OLR-AI`)**
The Outgoing Longwave Radiation Anomaly Index is the anomaly of OLR from its 1991-2020 climatological mean, standardized for each pixel $ i $.

$$ OLRAI_i = \frac{OLR_i-\overline{OLR_i}}{OLR_\sigma} $$

The `OLRAI` for 1991-2020 will be computed using `ttrai.py`, which will generate an aggregated `ttrai.nc` file in the `Data/LSWMI_PrepFinal` folder. For clarity's sake, while the documentation may label OLR as `olr`, the variable will be named `ttr` instead of `olr` for all OLR-related computations.

In [None]:
# ttrai.py

### _**\*Rainfall Anomaly Index**_

## **4. Building LSWMI**
The Local Southwest Monsoon Index integrates the four indices computed in the previous section and maps them to rainfall through multiple linear regression (MLR), in both linear and quadratic forms. The estimated rainfall $ LSWMI_i $ is mapped per pixel, and each pixel has its own MLR model with a unique intercept $ \beta_i $ and weights $ \beta_{Wi} $, $ \beta_{Si} $, $ \beta_{Mi} $, and $ \beta_{Oi} $ for `WSI`, `SSI`, `MSLP-AI`, and `OLR-AI`, respectively.

$$ LSWMI_i = \beta_i + \beta_{Wi}(WSI) + \beta_{Si}(SSI) + \beta_{Mi}(MSLPAI) + \beta_{Oi}(OLRAI) $$

For this study, the index will be derived using the `ln-esw` method from the technical workflow, wherein the climatological means during the extended southwest monsoon period (May to September) are used. This method is selected based on the data available and prepared for this study.

**Given:** The indices are aggregated into singular .nc files as structured below:

```
└── LSWMI_PrepFinal/
    ├── mslai.nc
    ├── rai.nc
    ├── ssi.nc
    ├── ttrai.nc
    └── wsi.nc
```

**Objective:** Build `LSWMI` using the `WSI`, `SSI`, `MSLP-AI`, and `OLR-AI` datasets and generate .nc files for 1979-2025.

### **A. Derive the LSWMI**
`lswmi.py` will compute $ LSWMI_i $ as described in the section above. Generated .nc files will be put in the `Data/LSWMI` folder.

In [None]:
# lswmi.py
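
The per-pixel regression in the LSWMI equation can be sketched with ordinary least squares. This is a hedged illustration rather than the contents of `lswmi.py`: it fits only the linear terms shown in the equation, assumes the predictors are taken at the same pixel as the rainfall target, and uses synthetic data. The function names `fit_pixel_mlr` and `predict_lswmi` are hypothetical.

```python
import numpy as np


def fit_pixel_mlr(predictors, rainfall):
    """Fit an ordinary least-squares model at each pixel.

    predictors: (time, 4, lat, lon), ordered WSI, SSI, MSLP-AI, OLR-AI
    rainfall:   (time, lat, lon) target
    Returns coefficients of shape (5, lat, lon): intercept, then weights.
    """
    n_time, n_pred, n_lat, n_lon = predictors.shape
    coefs = np.empty((n_pred + 1, n_lat, n_lon))
    for j in range(n_lat):
        for k in range(n_lon):
            design = np.column_stack([np.ones(n_time), predictors[:, :, j, k]])
            solution = np.linalg.lstsq(design, rainfall[:, j, k], rcond=None)[0]
            coefs[:, j, k] = solution
    return coefs


def predict_lswmi(predictors, coefs):
    """Evaluate the intercept plus weighted indices at every pixel and step."""
    return coefs[0] + np.einsum("tplk,plk->tlk", predictors, coefs[1:])


# Synthetic check: rainfall built from known coefficients on a 2x2 grid.
rng = np.random.default_rng(1)
predictors = rng.normal(size=(200, 4, 2, 2))
true_coefs = np.array([1.5, 0.8, -0.3, -0.5, -0.6])
rainfall = true_coefs[0] + np.einsum("tplk,p->tlk", predictors, true_coefs[1:])

coefs = fit_pixel_mlr(predictors, rainfall)
lswmi = predict_lswmi(predictors, coefs)
print(coefs[:, 0, 0])
```

Quadratic terms could be added by appending squared predictors as extra columns of the design matrix; that extension is not shown here.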

### **B. Performance**
The performance of the LSWMI will be assessed by comparing standardized pentad rainfall datasets from all 11 PAGASA stations to their equivalent point-interpolated pentad values from 1991 to 2020. From these data points, statistical metrics (R, RMSD, and SD) will be computed. This analysis will also be performed for each individual PAGASA station included in this study. The evaluation is limited by the availability of station data.
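
The three metrics can be computed from a pair of matched series as sketched below. The sketch assumes that R is the Pearson correlation, that RMSD is the plain (uncentered) root-mean-square difference, and that SD is the population standard deviation; the workflow may instead use a centered RMSD or a sample SD. The function name `evaluate` and the input values are hypothetical.

```python
import numpy as np


def evaluate(observed, estimated):
    """Return R, RMSD, and the SD of each series for matched 1-D inputs."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    r = np.corrcoef(observed, estimated)[0, 1]
    rmsd = np.sqrt(np.mean((estimated - observed) ** 2))
    return {
        "R": r,
        "RMSD": rmsd,
        "SD_obs": observed.std(),
        "SD_est": estimated.std(),
    }


# Illustrative values only; not station data.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(metrics)
```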