# Preparing Indices

This notebook documents the processing of index files in preparation for evaluating the relationship between SWM and MJO/BSISO.

## Sources
| Source | MJO | BSISO | Period | Temporal Resolution | File Extension |
| :--- | :---: | :---: | :---: | :---: | :---: |
| [Bureau of Meteorology](https://www.bom.gov.au/climate/mjo/#tabs=Monitoring) | <span style="color:green">**Yes**</span> | No | 1974 – Present (Realtime) | Daily | .txt |
| [APEC Climate Center](https://apcc21.org/prediction/bsiso/moni?lang=en) | No | <span style="color:green">**Yes**</span> | 1981 – Present (Realtime) | Daily | .dat |
| [Bimodal ISO Index](http://iprc.soest.hawaii.edu/users/kazuyosh/Bimodal_ISO.html) | <span style="color:green">**Yes**</span> | <span style="color:green">**Yes**</span> | 1979 – 2020 | Daily | .txt |

## Pre-Processing
1. **Objective:** Convert .dat and .txt raw files into .csv files.
2. Save converted .csv files in the `02_processed` folder.
3. Index files will be converted to .csv files via Excel.
- Files:
  - BSISO_APEC.csv
  - BSISO_Kikuchi.csv
  - MJO_BoM.csv
  - MJO_Kikuchi.csv
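
The conversion above is done by hand in Excel. As a scripted alternative, the following is a minimal sketch that assumes the raw `.dat`/`.txt` files are whitespace-delimited tables. The function name `whitespace_table_to_csv`, its parameters, and the toy values are hypothetical and only illustrate the idea; the real files may need header rows skipped or column names supplied.

```python
import io

import pandas as pd


def whitespace_table_to_csv(source, csv_target, header_rows_to_skip=0, column_names=None):
    """Read a whitespace-delimited text table and write it out as CSV.

    Hypothetical helper: `source` and `csv_target` may be paths or file-like objects.
    """
    table = pd.read_csv(
        source,
        sep=r"\s+",                    # one or more spaces/tabs between fields
        skiprows=header_rows_to_skip,  # skip descriptive lines at the top, if any
        names=column_names,            # supply names when the file has no header row
        header=None if column_names is not None else "infer",
    )
    table.to_csv(csv_target, index=False)
    return table


# Toy input standing in for a raw index file (values invented for illustration)
raw_text = io.StringIO(
    "year month day amplitude phase\n"
    "1981 1 1 0.85 3\n"
    "1981 1 2 1.20 4\n"
)
csv_buffer = io.StringIO()
converted = whitespace_table_to_csv(raw_text, csv_buffer)
```

With real data, `source` would be a path to a raw file and `csv_target` a path inside `02_processed`.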

## Processing
1. **Objective #1:** Trim data and standardize headers.
   - MJO: `year`, `month`, `day`, `nrm`, `phase`
   - BSISO: `year`, `month`, `day`, `nrm`, `phase`
2. **Objective #2:** Filter out days where the normalized amplitude (`amplitude` or `nrm`) is greater than 1.
3. Save filtered .csv files in the `03_final` folder.
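
The amplitude filter in Objective #2 can be sketched as a threshold split. This is a minimal illustration, assuming the amplitude column has already been renamed to `nrm`. The function name `split_by_amplitude` and the toy values are hypothetical. Both subsets are returned so that whichever one the analysis needs can be saved to `03_final`.

```python
import pandas as pd


def split_by_amplitude(index_table, amplitude_column="nrm", threshold=1.0):
    """Split rows into those above the threshold and those at or below it.

    Rows with a missing amplitude fall into neither subset.
    """
    amplitude = index_table[amplitude_column]
    above = index_table[amplitude > threshold].reset_index(drop=True)
    at_or_below = index_table[amplitude <= threshold].reset_index(drop=True)
    return above, at_or_below


# Toy data (values invented for illustration)
toy_index = pd.DataFrame(
    {
        "year": [1981, 1981, 1981],
        "month": [1, 1, 1],
        "day": [1, 2, 3],
        "nrm": [0.85, 1.20, 1.00],
        "phase": [3, 4, 4],
    }
)
above_one, at_or_below_one = split_by_amplitude(toy_index)
```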

### ***Objective 1***
Trim data and standardize headers.
- MJO: `year`, `month`, `day`, `nrm`, `phase`
- BSISO: `year`, `month`, `day`, `nrm`, `phase`
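
The header standardization above can be sketched as a rename followed by a column selection. This is an illustration under assumptions: `YEAR`, `MONTH`, and `DAY_OF_MONTH` mirror names used in this notebook, while `AMP`, `PHASE`, and `PC1` are hypothetical stand-ins for whatever the source files actually call those columns.

```python
import pandas as pd

# Target headers shared by the MJO and BSISO files
STANDARD_COLUMNS = ["year", "month", "day", "nrm", "phase"]


def standardize_headers(index_table, rename_map):
    """Rename columns via `rename_map`, then keep only the standard columns."""
    renamed = index_table.rename(columns=rename_map)
    missing = [name for name in STANDARD_COLUMNS if name not in renamed.columns]
    if missing:
        raise KeyError(f"Missing columns after renaming: {missing}")
    return renamed[STANDARD_COLUMNS].copy()


# Toy table (values and the AMP/PHASE/PC1 names are invented for illustration)
toy_table = pd.DataFrame(
    {
        "YEAR": [1981, 1981],
        "MONTH": [1, 1],
        "DAY_OF_MONTH": [1, 2],
        "AMP": [0.85, 1.20],
        "PHASE": [3, 4],
        "PC1": [0.10, 0.20],  # extra column that gets trimmed
    }
)
standardized = standardize_headers(
    toy_table,
    {"YEAR": "year", "MONTH": "month", "DAY_OF_MONTH": "day", "AMP": "nrm", "PHASE": "phase"},
)
```

Raising on missing columns makes a wrong or incomplete `rename_map` fail immediately instead of silently producing a file with the wrong headers.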

Troublesome files: `BSISO_APEC.csv`

In [2]:
# Standardizing BSISO_APEC.csv
import pandas as pd
from pathlib import Path

ROOT = Path("C:/Users/Nitro 5/Documents/MS/Thesis/GitHub/MS_Thesis_SWM")
infile = ROOT / "01_data" / "02_processed" / "BSISO_APEC.csv"
df = pd.read_csv(infile)

# Convert YEAR + DAY to date
df["DATE"] = pd.to_datetime(
    df["YEAR"].astype(str) + "-" + df["DAY"].astype(str),
    format="%Y-%j"
)

# Extract calendar month and day
df["MONTH"] = df["DATE"].dt.month
df["DAY_OF_MONTH"] = df["DATE"].dt.day

# Drop DATE column
df.drop(columns="DATE", inplace=True)

# Output
outdir = ROOT / "01_data" / "02_processed"
outdir.mkdir(parents=True, exist_ok=True)

outfile = outdir / "BSISO_APEC_with_month_day.csv"
# index=False keeps the row index from being written as an extra unnamed column
df.to_csv(outfile, index=False)