In [1]:
%matplotlib inline

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import gaussian_kde, pearsonr, spearmanr
from scipy.spatial.distance import euclidean
from sklearn.cluster import DBSCAN

from plotting import plot_equatorial_pacific, get_dataframe

# Study of the Effects of ENSO on Tropical Depressions in the North Pacific Ocean

## ABSTRACT

The El Niño-Southern Oscillation (ENSO) is a critical driver of climate variability, with profound impacts on weather patterns across the globe, particularly in the North Pacific Ocean. This study investigates the influence of the ENSO phases (El Niño, La Niña, and Neutral) on the formation, frequency, intensity, and tracks of tropical depressions (TD) in the North Pacific Ocean. Utilizing historical sea surface temperature (SST) data and tropical depression records, the study employs statistical analysis and geospatial techniques to explore correlations between ENSO phases and tropical depression activity.

The analysis reveals significant variations in the frequency and intensity of tropical depressions across different ENSO phases, with El Niño years showing a higher frequency of intense tropical depressions in the Central and Eastern Pacific, while La Niña years are associated with a westward shift in tropical depression activity towards the Western Pacific. The study also examines the changes in SST anomalies during different ENSO phases and their relationship with tropical depression formation.

These findings provide a deeper understanding of the complex interactions between ENSO and tropical depression dynamics in the North Pacific, offering valuable insights for improving the prediction and management of tropical depressions in the context of ENSO-driven climate variability. The results have significant implications for forecasting, disaster preparedness, and climate adaptation strategies in regions affected by tropical depressions.

### Introduction

The El Niño–Southern Oscillation is a single climate phenomenon that periodically fluctuates between three phases: Neutral, La Niña or El Niño. La Niña and El Niño are opposite phases in the oscillation which are deemed to occur when specific ocean and atmospheric conditions are reached or exceeded.

**El Niño** is the phase where the ocean waters in the central and eastern Pacific become warmer than usual. This warming can cause significant changes in weather patterns, such as increased rainfall in some areas and droughts in others.

**La Niña** is the opposite phase, where the ocean waters in these regions become cooler than average. This cooling also disrupts typical weather patterns, often leading to different sets of extremes, like more intense storms or colder temperatures in certain areas.

**Neutral** refers to the periods between El Niño and La Niña when sea surface temperatures are closer to their average, and global weather patterns are generally more stable.

ENSO is one of the most important drivers of climate variability on Earth, affecting everything from rainfall and temperatures to the occurrence of natural disasters like floods, droughts, and storms. Information on the phases and temperature deviations can be found on the [NOAA](https://origin.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/ONI_v5.php) website. Warm and cold periods are identified using a threshold of $\pm 0.5^\circ\mathrm{C}$ for the Oceanic Niño Index (ONI) in the area $5^\circ\mathrm{N}$ to $5^\circ\mathrm{S}$, $170^\circ\mathrm{W}$ to $120^\circ\mathrm{W}$ (red rectangle).
On average, an episode starts in July-August and peaks in November-January.

The diagrams below show the **SST** (sea surface temperature) of the ocean during the peak of each ENSO phase. The data were obtained from the [NASA Earth Data](https://www.earthdata.nasa.gov/topics/ocean/ocean-temperature/sea-surface-temperature) website.

In [3]:
plot_equatorial_pacific(path='data/csv_ready/elnino_2015.csv', cond_name='El Niño Period', vmin=20, vmax=35)

In [4]:
plot_equatorial_pacific('data/csv_ready/neutral_2012.csv', 'Neutral Period', vmin=20, vmax=35)

In [5]:
plot_equatorial_pacific('data/csv_ready/lanina_2010.csv', 'La Niña Period', vmin=20, vmax=35)

In [6]:
la_nina = get_dataframe('data/csv_ready/lanina_2010.csv')
neutral = get_dataframe('data/csv_ready/neutral_2012.csv')
el_nino = get_dataframe('data/csv_ready/elnino_2015.csv')

dataframes = [el_nino, neutral, la_nina]
colors = ['r', 'g', 'b']
x = ['October', 'November', 'December', 'January']
labels = ['El Niño', 'Neutral', 'La Niña']
for i, df in enumerate(dataframes):
    sst = df.groupby(df.index).sst.mean()
    plt.plot(x, sst, c=colors[i], label=labels[i])

plt.ylim(26, 29)

plt.legend()
plt.xlabel('Months')
plt.ylabel('SST in deg C')
plt.title('Difference in the SST between El Niño, La Niña and Neutral Periods')
plt.show()

The diagram above compares the mean SST of the **El Niño Region**, shown as a red rectangle in the heatmaps, across the three phases. The difference between the SST during an El Niño or La Niña phase and the SST during the Neutral phase is treated here as the **temperature anomaly**.
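As a rough illustration of the anomaly idea, the sketch below subtracts a Neutral-season monthly mean from El Niño and La Niña monthly means. The numbers and the name `sst_means` are made up for illustration and are not taken from the data files used above. Note also that operational indices such as ONI measure anomalies against a multi-decade climatological baseline rather than a single Neutral season.

```python
import pandas as pd

months = ['October', 'November', 'December', 'January']

# Hypothetical monthly mean SST (deg C) in the El Niño Region; illustrative values only.
sst_means = pd.DataFrame(
    {
        'el_nino': [28.4, 28.7, 28.8, 28.6],
        'neutral': [27.0, 26.9, 26.8, 26.7],
        'la_nina': [26.3, 26.2, 26.1, 26.2],
    },
    index=months,
)

# Anomaly relative to the Neutral season: positive = warmer, negative = cooler.
anomaly = sst_means[['el_nino', 'la_nina']].sub(sst_means['neutral'], axis=0)
print(anomaly.round(2))
```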

### Tropical depressions (TD)

This project focuses on understanding how ENSO affects TDs in the North Pacific Ocean. TDs are low-pressure weather systems that can develop into more severe storms, such as tropical storms or hurricanes. By analyzing historical data on sea surface temperatures (SSTs) and TDs, this study aims to uncover patterns and relationships between the different phases of ENSO and the behavior of TDs in the Eastern, Central, and Western Pacific regions. Understanding these connections can help improve forecasts and prepare for the impacts of these powerful weather systems.

The TD season in the **Eastern and Central North Pacific** officially begins in mid-May (Eastern Pacific) or on June 1 (Central Pacific) and ends on November 30, with peak activity typically occurring from August to September. The **Western North Pacific**, unlike the Eastern and Central Pacific, sees TDs throughout the year. However, the most active period is from July to October, when a significant share of the storms occurs.

ENSO peaks around December, and its conditions typically take several months to fully influence global weather patterns. The primary effects on tropical storms and depressions in the North Pacific would therefore generally be observed during the summer and fall of the following year. Based on this assumption, an ENSO phase is assigned to each year from 1950 to the present. The value in the ENSO column of the table below reflects the ENSO phase of the year: 1 for an El Niño influenced year, 0 for a Neutral year, and -1 for a La Niña influenced year. The phase depends on whether ONI is greater than 0.5, less than -0.5, or between -0.5 and 0.5.
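The encoding described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the table: the names `classify_oni`, `peak_oni`, and `enso_by_year` are hypothetical, the ONI values are invented, and shifting the phase to the following calendar year is one possible reading of the assumption stated above.

```python
import numpy as np
import pandas as pd

def classify_oni(oni):
    """Map ONI values (deg C) to 1 (El Niño), 0 (Neutral), or -1 (La Niña)."""
    oni = np.asarray(oni, dtype=float)
    return np.select([oni > 0.5, oni < -0.5], [1, -1], default=0)

# Hypothetical peak-season ONI values, keyed by the year in which the peak occurred.
peak_oni = pd.Series({2009: 1.6, 2010: -1.6, 2012: -0.2})

# Assumption: the phase at the peak is applied to the following calendar year.
enso_by_year = pd.Series(classify_oni(peak_oni.values), index=peak_oni.index + 1)
print(enso_by_year)
```

Values exactly equal to 0.5 or -0.5 fall into the Neutral class here, because the text describes strict inequalities for the warm and cold phases.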

### Data



### Establishing Hypotheses
**Hypothesis 1**: ENSO and Frequency of Tropical Depressions

- Null Hypothesis $H_0$: There is **no significant difference** in the frequency of tropical depressions across different ENSO phases (El Niño, La Niña, Neutral).
- Alternative Hypothesis $H_1$: There **is a significant difference** in the frequency of tropical depressions across different ENSO phases.

**Hypothesis 2**: ENSO and Intensity of Tropical Depressions

- Null Hypothesis $H_0$: ENSO phases **do not** significantly affect the intensity of tropical depressions (measured by metrics such as maximum wind speed or minimum central pressure).
- Alternative Hypothesis $H_1$: ENSO phases significantly affect the intensity of tropical depressions.

**Hypothesis 3**: ENSO and Tracks of Tropical Depressions

- Null Hypothesis $H_0$: ENSO phases **do not** significantly affect the track of tropical depressions.
- Alternative Hypothesis $H_1$: ENSO phases significantly affect the track of tropical depressions.

The hypotheses will be tested separately for TDs in the NW, Central, and NE Pacific Ocean.
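One way Hypothesis 1 could be tested is with a Kruskal-Wallis H-test, which checks whether several independent samples come from the same distribution without assuming normality. The sketch below is only an illustration of that test: the counts are synthetic Poisson draws, and the choice of test and the significance level `alpha` are assumptions rather than part of this study.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Synthetic annual TD counts per phase (Poisson draws); NOT the study data.
counts = {
    'La Niña': rng.poisson(25, size=20),
    'Neutral': rng.poisson(26, size=30),
    'El Niño': rng.poisson(24, size=20),
}

# Kruskal-Wallis H-test across the three phase samples.
stat, p_value = kruskal(*counts.values())

alpha = 0.05  # assumed significance level
decision = 'reject H0' if p_value < alpha else 'fail to reject H0'
print(f'H = {stat:.2f}, p = {p_value:.3f} -> {decision}')
```

With real data, the three arrays would be the annual frequencies grouped by the `enso` value of each year.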

### Exploratory Data Analysis (EDA)
In the context of studying the effects of ENSO on TDs in the North Pacific Ocean, EDA will involve temporal, geospatial, and intensity analysis.

#### Temporal Analysis
This analysis is used to understand how the frequency of TDs varies over time and how it is influenced by the different ENSO phases. Additionally, it explores how sea surface temperature (SST) anomalies correlate with these variations. We start by calculating the frequency of TDs per calendar year, that is, the number of distinct tropical depressions recorded in each year. This quantifies how active a particular year or ENSO phase was in terms of tropical depression formation.
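To make the counting step concrete, the sketch below applies it to a tiny synthetic track table with one row per observation. The table `tracks` and its storm names are invented for illustration; only the group-by-year and count-distinct-names pattern is the point.

```python
import pandas as pd

# Tiny synthetic track table: one row per observation, indexed by timestamp.
tracks = pd.DataFrame(
    {'name': ['ALPHA', 'ALPHA', 'BRAVO', 'CHARLIE', 'CHARLIE']},
    index=pd.to_datetime(
        ['2001-07-01', '2001-07-02', '2001-08-15', '2002-09-01', '2002-09-02']
    ),
)
tracks.index.name = 'date'

# Count distinct storm names per calendar year.
frequency = tracks.groupby(tracks.index.year)['name'].nunique()
print(frequency)
```

One caveat of counting distinct names: if a dataset labels several unnamed systems in the same year with a shared placeholder, they would be counted as a single system, so a unique storm identifier is a safer key when one is available.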

In [7]:
oni_table = pd.read_csv('data/csv_ready/oni_table.csv', index_col=0)
oni_table.index = pd.to_datetime(oni_table.index)
enso_phase = oni_table.groupby(oni_table.index.year)['enso'].apply(lambda x: x.unique()[0])

jma = pd.read_csv('data/csv_ready/jma_td.csv', index_col=0)
jma.index = pd.to_datetime(jma.index)
frequency_jma = jma.groupby(jma.index.year)['name'].nunique()
frequency_jma = pd.merge(frequency_jma, enso_phase, on='date')
frequency_jma.columns = ['frequency', 'enso']

# Central and NE Pacific (NHC): distinct storms per year, joined with the ENSO phase
nhc = pd.read_csv('data/csv_ready/ne_pacific_td.csv', index_col=0)
nhc.index = pd.to_datetime(nhc.index)
frequency_nhc = nhc.groupby(nhc.index.year)['name'].nunique()
frequency_nhc = pd.merge(frequency_nhc, enso_phase, on='date')
frequency_nhc.columns = ['frequency', 'enso']

# Central Pacific basin only
nhc_cp = nhc[nhc.basin == 'CP']
frequency_nhc_cp = nhc_cp.groupby(nhc_cp.index.year)['name'].nunique()
frequency_nhc_cp = pd.merge(frequency_nhc_cp, enso_phase, on='date')
frequency_nhc_cp.columns = ['frequency', 'enso']

# Eastern Pacific basin only
nhc_ep = nhc[nhc.basin == 'EP']
frequency_nhc_ep = nhc_ep.groupby(nhc_ep.index.year)['name'].nunique()
frequency_nhc_ep = pd.merge(frequency_nhc_ep, enso_phase, on='date')
frequency_nhc_ep.columns = ['frequency', 'enso']

In [8]:
colors = {-1: 'blue', 0: 'green', 1: 'red'}
labels = ['La Niña', 'Neutral', 'El Niño']

frequency_tables = [
    ('NW', frequency_jma), 
    ('Central', frequency_nhc_cp), 
    ('NE', frequency_nhc_ep)]

fig, axs = plt.subplots(1, 3, figsize=(30, 8))

for i, table in enumerate(frequency_tables):
    for year, freq, enso in zip(table[1].index, table[1].frequency, table[1].enso):
        axs[i].bar(year, freq, color=colors[enso])

    handles = [plt.Rectangle((0,0),1,1, color=colors[enso]) for enso in colors]
    axs[i].legend(handles, labels, title='ENSO Phase', loc='upper left')
    axs[i].set_title(f'Frequency of Tropical Depressions, originating in the {table[0]} Pacific, by Year')
    axs[i].set_xlabel('Year')
    axs[i].set_ylabel('Number of Tropical Depressions')
    axs[i].set_xlim(1950, 2024)
    axs[i].set_ylim(0, 37)

plt.show()

In [9]:
def add_value_labels(ax, bars):
    for bar in bars:
        height = bar.get_height()
        ax.annotate(f'{height:.2f}',  # Display the height value with 2 decimal places
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3),  # Offset the text slightly above the bar
                    textcoords="offset points",
                    ha='center', va='bottom')

# Mean annual frequency per ENSO phase, ordered La Niña (-1), Neutral (0), El Niño (1)
means_jma = frequency_jma.groupby('enso').frequency.mean().tolist()
means_nhc = frequency_nhc.groupby('enso').frequency.mean().tolist()
x = np.arange(len(labels))
width = 0.20

fig, ax = plt.subplots(figsize=(10, 8))

bars1 = ax.bar(x - width/2, means_jma, width, label='NW Pacific (JMA)', color='blue')
bars2 = ax.bar(x + width/2, means_nhc, width, label='Central/NE Pacific (NHC)', color='orange')

ax.set_xlabel('ENSO Phase')
ax.set_ylabel('Mean Frequency of Tropical Depressions')
ax.set_title('Mean Frequency of Tropical Depressions by ENSO Phase')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

add_value_labels(ax, bars1)
add_value_labels(ax, bars2)

plt.show()

**The following conclusions can be drawn from the diagrams:**

1. *Frequency of Tropical Depressions in the Northwest Pacific (JMA Data)*:
   - **Neutral Influence**: The highest average frequency of tropical depressions is observed during Neutral years (26.59 TDs per year). The slightly higher frequency during Neutral years suggests that in the NW Pacific, tropical depression formation might be more consistent and less affected by extreme ENSO phases.
   - **La Niña Influence**: During La Niña years, the frequency is slightly lower at 24.65 TDs per year. La Niña years tend to slightly reduce the frequency of tropical depressions compared to Neutral years but are still more active than El Niño years.
   - **El Niño Influence**: The lowest frequency is observed during El Niño years, with 23.84 TDs per year. The reduced frequency during El Niño years aligns with the general understanding that El Niño conditions can shift tropical cyclone activity eastward, reducing activity in the NW Pacific.

2. *Frequency of Tropical Depressions in the Central and Northeast Pacific (NHC Data)*:
   - **La Niña Influence**: The highest average frequency is observed during La Niña years (15.33 TDs per year). The higher frequency during La Niña years suggests that La Niña conditions may enhance tropical depression formation in the Central and NE Pacific regions.
   - **Neutral Influence**: The frequency during Neutral years is slightly lower (14.77 TDs per year).
   - **El Niño Influence**: The lowest frequency is again observed during El Niño years (14.16 TDs per year). An eastward shift of activity during El Niño would be expected to raise the count in these regions, so this result suggests that conditions during El Niño years are not more conducive to tropical depression formation here than during La Niña years. The differences between phases are small (roughly one TD per year), so they should be read with caution until Hypothesis 1 has been tested.

#### Intensity Analysis
The goal of the intensity analysis is to determine how the intensity of tropical depressions varies across different ENSO phases. The intensity can be measured by various metrics, such as maximum wind speed or minimum central pressure. By analyzing these metrics, we assess whether certain ENSO phases are associated with stronger or weaker tropical depressions.

In [10]:
# Grouping by year and finding max wind and min press
jma_max_wind = jma.groupby([jma.index.year]).max_wind_kn.max()
nhc_max_wind = nhc.groupby([nhc.index.year]).max_wind_kn.max()

jma_min_press = jma.groupby([jma.index.year]).min_pressure_mBar.min()
nhc_min_press = nhc.groupby([nhc.index.year]).min_pressure_mBar.min()

jma_max_wind = pd.merge(jma_max_wind, enso_phase, on='date')
jma_max_wind.max_wind_kn = jma_max_wind.max_wind_kn.apply(lambda x: None if x == 0 else x)
jma_max_wind = jma_max_wind.dropna()
nhc_max_wind = pd.merge(nhc_max_wind, enso_phase, on='date')

jma_min_press = pd.merge(jma_min_press, enso_phase, on='date')
nhc_min_press = pd.merge(nhc_min_press, enso_phase, on='date')
nhc_min_press.min_pressure_mBar = nhc_min_press.min_pressure_mBar.apply(lambda x: None if x < 0 else x)
nhc_min_press = nhc_min_press.dropna()

In [11]:
# statistics
def extract_stats(df):
    df = df.groupby(['enso']).describe().T
    df = df.reset_index()
    df = df.drop(columns='level_0')
    df = df.pivot_table(columns='level_1')

    return df
    
array_of_data = [jma_max_wind, nhc_max_wind, jma_min_press, nhc_min_press]
jma_max_wind_stats = extract_stats(jma_max_wind)
nhc_max_wind_stats = extract_stats(nhc_max_wind)
jma_min_press_stats = extract_stats(jma_min_press)
nhc_min_press_stats = extract_stats(nhc_min_press)

In [12]:
# function to plot the boxplot and tables
ocean_area = ['NW', 'Central and NE']
def plot_boxplot_and_table(dataset_array, tables_array, col_name, title, ylim=None):
    
    fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(20, 12))
    
    for i, df in enumerate(dataset_array):
        axs[0][i].boxplot([df[df['enso'] == -1][col_name],
                         df[df['enso'] == 0][col_name],
                         df[df['enso'] == 1][col_name]],
                         labels=labels)
        axs[0][i].set_title(f'{title} by ENSO Phase for {ocean_area[i]} Pacific')
        axs[0][i].set_xlabel('ENSO Phase')
        axs[0][i].set_ylabel(title)
        if ylim is not None:
            axs[0][i].set_ylim(ylim)
    
    for i, dt in enumerate(tables_array):
        axs[1][i].axis('tight')
        axs[1][i].axis('off')
        table = axs[1][i].table(cellText=dt.values.round(1),
                     colLabels=dt.columns,
                     rowLabels=labels,
                     cellLoc='center', loc='center')
    
    plt.tight_layout()
    plt.show()

In [13]:
# Boxplot for Maximum Wind Speed
dataset_boxplot = [jma_max_wind, nhc_max_wind]
dataset_table = [jma_max_wind_stats, nhc_max_wind_stats]

plot_boxplot_and_table(dataset_boxplot, dataset_table, 'max_wind_kn', 'Maximum Wind Speed', ylim=(70, 190))

**The following conclusions can be drawn from the diagrams**:

The statistics describe the annual maximum wind speed, that is, the strongest wind recorded in each year.

1. *Mean Wind Speeds*:
   - **Neutral Years**: The NW Pacific has the highest mean wind speed during Neutral years (117.14 knots), while the NE and Central Pacific also show strong storms but with a slightly lower mean (115.00 knots). This suggests that during Neutral years, both regions experience relatively high-intensity storms, with the NW Pacific showing slightly stronger storms on average.
   - **La Niña Years**: The mean wind speed is higher in the NE and Central Pacific (117.41 knots) than in the NW Pacific (108.82 knots). This suggests that La Niña conditions might be associated with stronger storms in the NE and Central Pacific than in the NW Pacific.
   - **El Niño Years**: The mean wind speed is slightly lower in the NW Pacific (113.13 knots) than in the NE and Central Pacific (114.80 knots), but the difference is minimal. This indicates that El Niño influences storm intensities similarly across both regions.

2. *Variability (Standard Deviation)*:
   - **NE and Central Pacific**: There is considerably more variability in wind speeds during all ENSO phases in the NE and Central Pacific than in the NW Pacific. The standard deviation is highest during El Niño years (28.59 knots) and lowest during La Niña years (22.25 knots). This suggests that while the mean wind speeds are similar, the NE and Central Pacific region experiences a wider range of storm intensities.
   - **NW Pacific**: The NW Pacific shows lower variability in wind speeds, especially during El Niño years (6.80 knots), indicating more consistent storm intensities during these phases.

3. *Range of Wind Speeds (Min and Max)*:
   - **NE and Central Pacific**: The range of wind speeds is much broader, particularly during El Niño years, where the maximum wind speed reaches 185.0 knots. This suggests the presence of exceptionally strong storms during El Niño years in the NE and Central Pacific.
   - **NW Pacific**: The wind speed range is narrower, with a maximum of 140.0 knots during Neutral years. This indicates that the NW Pacific experiences strong storms, but not as extreme as those in the NE and Central Pacific during the same period.

4. *Percentiles (25%, 50%, 75%)*:
   - **NE and Central Pacific**: The wider interquartile ranges (IQR) reflect the greater variability in storm intensities across all ENSO phases. For example, the IQR during El Niño years spans from 90.0 to 130.0 knots, indicating that while some storms are moderate, others are exceptionally strong.
   - **NW Pacific**: The percentiles are more tightly clustered, especially during El Niño years, where the IQR spans from 110.0 to 120.0 knots. This reflects more consistent storm intensities in the NW Pacific.

**Conclusion**:

**Regional Differences:** The analysis reveals that the NE and Central Pacific experiences more variability in storm intensities across all ENSO phases, with some extremely strong storms, particularly during El Niño years. In contrast, the NW Pacific sees slightly higher mean intensities during Neutral years but generally experiences more consistent and less variable storm intensities.

**ENSO Phase Influence:** Both regions exhibit strong storms during Neutral and La Niña years, but the impact of El Niño appears more pronounced in the NE and Central Pacific, where the range and variability of storm intensities are much greater.
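The claim that one region is more variable than the other could be checked with a test for equality of variances. The sketch below uses Levene's test from SciPy on synthetic samples; the data, the sample sizes, and the choice of test are assumptions made for illustration and do not reproduce the figures reported above.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)

# Synthetic annual maximum wind speeds (knots); NOT the study data.
nw_pacific = rng.normal(loc=113, scale=7, size=25)
ne_central_pacific = rng.normal(loc=115, scale=28, size=25)

# Levene's test: the null hypothesis is that both samples have equal variances.
stat, p_value = levene(nw_pacific, ne_central_pacific)
print(f'W = {stat:.2f}, p = {p_value:.4f}')
```

A small p-value would indicate that the spread of annual maximum wind speeds differs between the two regions for the phase being compared.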

In [14]:
# Boxplot for Minimum Central Pressure
dataset_boxplot = [jma_min_press, nhc_min_press]
dataset_table = [jma_min_press_stats, nhc_min_press_stats]

plot_boxplot_and_table(dataset_boxplot, dataset_table, 'min_pressure_mBar', 'Minimum Central Pressure (mBar)', ylim=(865, 975))

**The following conclusions can be drawn from the diagrams**:

1. *Mean Minimum Central Pressure*:
   - **NW Pacific**: During all ENSO phases, the mean minimum central pressure is lower (indicating stronger storms) in the NW Pacific than in the NE and Central Pacific.
     - **La Niña**: The mean is 906.00 mbar, indicating that La Niña years in the NW Pacific are associated with low central pressure and potentially intense storms.
     - **Neutral**: The mean is 899.55 mbar, which is even lower than in La Niña years, suggesting slightly more intense storms during Neutral years.
     - **El Niño**: The lowest mean pressure is observed during El Niño years (896.28 mbar), suggesting that these years might see the most intense storms in terms of central pressure.
   - **NE and Central Pacific**:
     - **La Niña**: The mean is 930.64 mbar, which is considerably higher than in the NW Pacific, suggesting less intense storms during La Niña years in the NE and Central Pacific.
     - **Neutral**: The mean is 936.00 mbar, the highest of the three phases, indicating relatively weak storms during Neutral years.
     - **El Niño**: The mean is 932.57 mbar, slightly higher than during La Niña years and considerably higher than in the NW Pacific, indicating less intense storms overall.

2. *Variability (Standard Deviation)*:
   - **NW Pacific**: The standard deviation is relatively consistent across ENSO phases, with the lowest during Neutral years (11.33 mbar) and the highest during La Niña years (14.78 mbar). This indicates that the NW Pacific has relatively stable storm intensities with moderate variability.
   - **NE and Central Pacific**: The standard deviation is highest during El Niño years (30.74 mbar), suggesting greater variability in storm intensities during these years. This could indicate the presence of both very strong and relatively weak storms during El Niño years. Lower variability during La Niña and Neutral years suggests more consistent storm intensities, but with generally higher minimum central pressures, indicating less intense storms overall.

3. *Range of Pressures (Min and Max)*:
   - **NW Pacific**: The range of minimum central pressures is lower during El Niño years (875.0 to 920.0 mbar) than in other phases, indicating that storms tend to be more intense during these years. The highest maximum pressure during La Niña years (930.0 mbar) suggests less intense storms, though still more intense than those in the NE and Central Pacific.
   - **NE and Central Pacific**: The pressure range is broader during El Niño years (872.0 to 969.0 mbar), suggesting that this region experiences both very strong and relatively weak storms during these years. During Neutral and La Niña years, the minimum pressures are higher (918.0 mbar and 900.0 mbar, respectively), indicating generally weaker storms compared to the NW Pacific.

4. *Percentiles (25%, 50%, 75%)*:
   - **NW Pacific**: The interquartile range (IQR) is narrower during Neutral years, with the 50th percentile (median) at 900.0 mbar, indicating that the central 50% of storms are fairly consistent in intensity. During El Niño years, the 25th percentile is the lowest (885.0 mbar), suggesting that a significant portion of storms during these years are particularly intense.
   - **NE and Central Pacific**: The IQR is broader during El Niño years, with a median of 940.0 mbar, reflecting greater variability in storm intensities. The 25th percentile (925.5 mbar) still indicates less intense storms compared to the NW Pacific.

**Conclusion**:

**Regional Differences**: The NW Pacific tends to experience more intense tropical depressions across all ENSO phases compared to the NE and Central Pacific. This is reflected in the generally lower mean and median central pressures in the NW Pacific, particularly during El Niño years.

**ENSO Influence**:

- **NW Pacific**: The lowest central pressures (indicating stronger storms) are observed during El Niño years, followed by Neutral years, and the least intense storms during La Niña years.
- **NE and Central Pacific**: Storms are generally less intense, with higher variability during El Niño years and more consistent, though weaker, storms during La Niña and Neutral years.

**Implications**: These findings suggest that the NW Pacific is more prone to intense tropical depressions during El Niño years, while the NE and Central Pacific experience a wider range of storm intensities during the same period. This variability in storm intensity across regions and ENSO phases has important implications for forecasting, disaster preparedness, and understanding the broader climatic impacts of ENSO.

#### Geospatial Analysis

The goal of geospatial analysis is to examine the spatial distribution and movement of tropical depressions across different ENSO phases in the Pacific Ocean. This analysis helps identify patterns in where and how tropical depressions form and move, depending on the phase of the ENSO cycle.

In [15]:
enso_phase_dt = enso_phase.copy()
enso_phase_dt.index = pd.to_datetime(enso_phase.index.astype(str))

merged = pd.merge(jma, enso_phase_dt, left_on=jma.index.year, right_on=enso_phase_dt.index.year, how='left')
merged = merged.set_index(jma.index)
jma_enso = merged.drop(columns='key_0')

merged = pd.merge(nhc, enso_phase_dt, left_on=nhc.index.year, right_on=enso_phase_dt.index.year, how='left')
merged = merged.set_index(nhc.index)
nhc_enso = merged.drop(columns='key_0')

gdf = pd.read_csv('data/csv_ready/gdf_pacific.csv')

In [16]:
cmaps = {-1: 'Blues', 0: 'Greens', 1: 'Reds'}
phases = {-1: 'La Niña', 0: 'Neutral', 1: 'El Niño'}

def plot_td_density():
    """Plot the density of TDs for each ENSO phase across the North Pacific Ocean."""
    datasets = [jma_enso, nhc_enso]
    fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(20 * 3, 8))

    for idx, i in enumerate(range(-1, 2, 1)):
        for df in datasets:
            subset = df[df.enso == i]  
            sns.kdeplot(x=subset['lon'], y=subset['lat'], fill=True, cmap=cmaps[i], bw_adjust=0.8, ax=axs[idx])

        axs[idx].scatter(gdf.lon, gdf.lat, s=0.5, color='black')
        
        x_tick = np.arange(100, 300, 10)
        x_label = [f'{x}°E' if x <= 180 else f'{360 - x}°W' for x in x_tick]
        y_tick = np.arange(-20, 70, 10)
        y_label = [f'{np.abs(x)}°S' if x < 0 else f'{np.abs(x)}°N' for x in y_tick]
        axs[idx].set_xticks(ticks=x_tick)
        axs[idx].set_xticklabels(labels=x_label)
        axs[idx].set_yticks(ticks=y_tick)
        axs[idx].set_yticklabels(labels=y_label)
        
        axs[idx].set_title(f'Density of Tropical Depressions for {phases[i]} phase')
        axs[idx].set_xlabel('Longitude')
        axs[idx].set_ylabel('Latitude')
        
    plt.tight_layout()
    plt.show()

def compute_kde_values(df, phase, gridsize=100):
    """ computes KDE values for each phase and region (NW Pacific and NE/Central Pacific)"""
    subset = df[df.enso == phase]
    kde = gaussian_kde(np.vstack([subset['lon'], subset['lat']]))
    lon_min, lon_max = 100, 300
    lat_min, lat_max = -20, 70
    lon_grid, lat_grid = np.linspace(lon_min, lon_max, gridsize), np.linspace(lat_min, lat_max, gridsize)
    lon_grid, lat_grid = np.meshgrid(lon_grid, lat_grid)
    kde_values = kde(np.vstack([lon_grid.ravel(), lat_grid.ravel()])).reshape(gridsize, gridsize)
    return lon_grid, lat_grid, kde_values

def plot_td_density_difference():
    """Plot the difference in TD density between ENSO phases across the North Pacific Ocean."""
    kde_values_jma = {phase: compute_kde_values(jma_enso, phase)[2] for phase in range(-1, 2)}
    kde_values_nhc = {phase: compute_kde_values(nhc_enso, phase)[2] for phase in range(-1, 2)}

    # Subtract KDE values to create difference maps for each region
    diff_maps_jma = {
        'El Niño - La Niña (NW Pacific)': kde_values_jma[1] - kde_values_jma[-1],
        'Neutral - La Niña (NW Pacific)': kde_values_jma[0] - kde_values_jma[-1],
        'Neutral - El Niño (NW Pacific)': kde_values_jma[0] - kde_values_jma[1]
    }
    
    diff_maps_nhc = {
        'El Niño - La Niña (NE/Central Pacific)': kde_values_nhc[1] - kde_values_nhc[-1],
        'Neutral - La Niña (NE/Central Pacific)': kde_values_nhc[0] - kde_values_nhc[-1],
        'Neutral - El Niño (NE/Central Pacific)': kde_values_nhc[0] - kde_values_nhc[1]
    }

    fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(20, 12))
    
    for i, diff_dt in enumerate([diff_maps_jma, diff_maps_nhc]):
        for idx, (title, diff_map) in enumerate(diff_dt.items()):
            im = axs[i, idx].imshow(diff_map, extent=[100, 300, -20, 70], origin='lower', cmap='coolwarm')
            axs[i, idx].set_title(title)
            axs[i, idx].scatter(gdf.lon, gdf.lat, s=0.5, color='black')
            
            x_tick = np.arange(100, 300, 25)
            x_label = [f'{x}°E' if x <= 180 else f'{360 - x}°W' for x in x_tick]
            y_tick = np.arange(-20, 70, 20)
            y_label = [f'{np.abs(x)}°S' if x < 0 else f'{np.abs(x)}°N' for x in y_tick]
            axs[i, idx].set_xticks(ticks=x_tick)
            axs[i, idx].set_xticklabels(labels=x_label)
            axs[i, idx].set_yticks(ticks=y_tick)
            axs[i, idx].set_yticklabels(labels=y_label)
            
            axs[i, idx].set_xlabel('Longitude')
            axs[i, idx].set_ylabel('Latitude')
    
    plt.tight_layout()
    plt.show()

In [17]:
plot_td_density()

Density plots help identify areas where tropical depressions are more likely to form or intensify. These areas of higher density represent regions that consistently experience more tropical activity. They allow for direct comparison between phases. It is clear how the spatial distribution of TDs shifts under different climatic conditions.

The color intensity in the density plot corresponds to the density of tropical depressions in that area. **Darker colors** indicate regions where TDs are more concentrated. These areas are "hotspots" where tropical depressions are more likely to form or pass through during that particular phase.
**Lighter colors** represent areas with lower density, indicating regions where TDs are less frequent.

The **contour lines** represent areas of equal density. The closer the contour lines are, the steeper the change in density, indicating a rapid increase or decrease in the number of TDs in that area.
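To illustrate what the density values mean, the sketch below fits a two-dimensional Gaussian kernel density estimate to synthetic positions and evaluates it at two locations. The coordinates are random draws around an arbitrary centre and are not the study data; the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)

# Synthetic TD positions clustered near 130°E, 15°N; NOT the study data.
lon = rng.normal(loc=130, scale=5, size=500)
lat = rng.normal(loc=15, scale=3, size=500)

# gaussian_kde expects an array of shape (n_dims, n_points).
kde = gaussian_kde(np.vstack([lon, lat]))

# Evaluate the density at the cluster centre and at a remote point.
density_centre = kde([[130], [15]])[0]
density_remote = kde([[250], [40]])[0]
print(f'centre: {density_centre:.2e}, remote: {density_remote:.2e}')
```

The estimate is high where observations are concentrated and falls towards zero away from them, which is what the darker and lighter shading in the plots encodes.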

In [18]:
plot_td_density_difference()

The plots above show the difference in the spatial density of tropical depressions (TDs) between ENSO phases. These difference maps show how the spatial distribution of tropical depressions shifts from one phase to another, highlighting regions where the density increases or decreases.

**Red Areas:** Indicate regions where the density of tropical depressions is higher during the first phase of comparison (e.g., El Niño) compared to the second phase (e.g., La Niña). For instance, in the plot labeled "El Niño - La Niña (NW Pacific)", red areas show where tropical depressions are more common during El Niño years than La Niña years.

**Blue Areas:** Indicate regions where the density is lower during the first phase compared to the second. For example, blue areas in the "El Niño - La Niña (NW Pacific)" plot show where tropical depressions are more frequent during La Niña years than El Niño years.

**White/Neutral Areas:** Regions with little to no color (white or light gray) indicate little difference in the density of tropical depressions between the compared phases.

**Dominance Interpretation:** The overall color gives a quick visual cue as to which ENSO phase is more influential in driving tropical depression activity in the region.
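The reading of red, blue, and white areas relies on the neutral color sitting at a difference of exactly zero. The sketch below shows one way to ensure that: evaluate two density estimates on a shared grid, subtract them, and derive symmetric color limits from the largest absolute difference. The data are synthetic and the helper `synthetic_kde` is hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)

def synthetic_kde(lon_centre, lat_centre, n=400):
    """Fit a KDE to synthetic positions around a chosen centre (not study data)."""
    lon = rng.normal(lon_centre, 6, n)
    lat = rng.normal(lat_centre, 4, n)
    return gaussian_kde(np.vstack([lon, lat]))

# Two hypothetical phases whose activity centres are offset in longitude.
kde_a = synthetic_kde(140, 15)
kde_b = synthetic_kde(125, 15)

# Shared grid so the two density surfaces can be subtracted cell by cell.
lon_grid, lat_grid = np.meshgrid(np.linspace(100, 300, 50), np.linspace(-20, 70, 50))
points = np.vstack([lon_grid.ravel(), lat_grid.ravel()])
diff_map = (kde_a(points) - kde_b(points)).reshape(lon_grid.shape)

# Symmetric limits keep zero difference at the midpoint of a diverging colormap.
limit = np.abs(diff_map).max()
print(f'use vmin={-limit:.2e}, vmax={limit:.2e} with imshow')
```

Passing `vmin=-limit` and `vmax=limit` to `imshow` together with `cmap='coolwarm'` centres the colormap on zero. In `plot_td_density_difference` no limits are passed, so the neutral color corresponds to the midpoint of each panel's own data range, which need not be zero.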

#### Interpretation for the NW Pacific:

**El Niño Period:**

*Observation:* The density difference plot shows higher density of tropical depressions (TDs) to the east and west of the Philippines during El Niño compared to La Niña.

*Interpretation:* This suggests that during El Niño years, tropical depressions tend to remain more concentrated in the western Pacific, particularly around the Philippines, rather than quickly turning northeast and transitioning into extratropical systems. The tendency for TDs to linger in this region could indicate that the steering flow patterns during El Niño are less conducive to an early northeastward turn, potentially leading to prolonged impacts in the western Pacific, including increased storm activity and potential landfall in the Philippines and nearby regions.

**La Niña Period:**

*Observation:* The density difference plot indicates a higher concentration of TDs near Japan during La Niña compared to El Niño.

*Interpretation:* This pattern suggests that during La Niña years, tropical depressions are more likely to take a northeasterly track, moving towards Japan and becoming extratropical systems. The shift in TD concentration toward Japan implies that the atmospheric conditions during La Niña favor an earlier and more consistent northeastward movement of storms, increasing the likelihood of storm impacts in Japan and surrounding areas.

**Neutral Phase:**

*Observation:* The main difference in TD concentration during the Neutral phase is observed around the Marshall Islands.

*Interpretation:* This indicates that during Neutral years, the spatial distribution of TDs shifts somewhat, with a notable increase in activity around the Marshall Islands. This could reflect a more balanced atmospheric circulation pattern during Neutral phases, where the conditions neither strongly favor the patterns observed during El Niño nor La Niña. The increased activity around the Marshall Islands might suggest a broader range of storm tracks, with some TDs originating or intensifying in this region before potentially moving towards either the western Pacific or northeastward.

#### Interpretation for the Central and NE Pacific:

**El Niño Period:**

*Observation:* The density difference plot shows a higher density of tropical depressions (TDs) in the region around 130°W, extending westward toward Hawaii.

*Interpretation:* This suggests that during El Niño years, tropical depressions are more likely to follow a westward track, increasing the likelihood of impacting the Hawaiian Islands compared to La Niña years. The altered atmospheric and oceanic conditions during El Niño may lead to a more westward steering flow, which pushes TDs towards the central Pacific, making landfall on Hawaii more probable.


**La Niña Period:**

*Observation:* The density difference plot indicates a higher concentration of TDs near the coast of Mexico.

*Interpretation:* This pattern suggests that during La Niña years, tropical depressions are more likely to develop and intensify near Mexico, with a greater chance of making landfall there compared to El Niño years. La Niña conditions typically favor stronger easterly trade winds and a more conducive environment for tropical cyclone formation near the eastern Pacific, leading to more frequent landfalls in Mexico.

**Neutral Phase:**

*Observation:* The Neutral phase, when compared to both El Niño and La Niña, shows that tropical depression activity tends to be more evenly distributed across the Pacific, rather than being concentrated in specific regions. However, there are still distinct areas where El Niño and La Niña exert stronger influences.

*Interpretation compared to El Niño:* This suggests that during Neutral years, there is a higher density of tropical depressions (TDs) in the broader Pacific region compared to El Niño years. However, during El Niño, the density of TDs is particularly concentrated southeast of California and around Hawaii. The strong blue southeast of California indicates that during El Niño, more tropical depressions tend to form or pass through this area, while the light blue around Hawaii suggests a similar, but less pronounced, tendency for TDs to impact the region during El Niño. In contrast, during Neutral years, the overall activity is more spread out across the Pacific, but with less concentration in these specific regions.

*Interpretation compared to La Niña:* Neutral years again show more evenly spread activity, while La Niña creates a strong concentration of TDs off the coast of Mexico, indicating a marked difference in behavior between these phases.

In [19]:
def run_dbscan(df, eps=1.0, min_samples=5):
    coords = df[['lon', 'lat']].values
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(coords)
    labels = db.labels_
    unique_labels = set(labels)
    clusters = [coords[labels == label] for label in unique_labels if label != -1]  # Exclude noise (-1)
    return clusters, labels

def compute_cluster_metrics(clusters):
    num_clusters = len(clusters)
    cluster_sizes = [len(cluster) for cluster in clusters]
    average_size = np.mean(cluster_sizes)
    centroids = [np.mean(cluster, axis=0) for cluster in clusters]
    return num_clusters, average_size, centroids

def compare_clusters(centroids1, centroids2):
    distances = []
    for c1 in centroids1:
        for c2 in centroids2:
            distances.append(euclidean(c1, c2))
    return np.mean(distances), np.median(distances), np.min(distances), np.max(distances)

def compare_enso_phases_dbscan(df, eps=1.0, min_samples=5):
    phases = {
        'La Niña': df[df.enso == -1],
        'El Niño': df[df.enso == 1],
        'Neutral': df[df.enso == 0]
    }

    results = {}

    # Compute DBSCAN metrics for each phase
    for phase, phase_df in phases.items():  # avoid shadowing the df argument
        clusters, _ = run_dbscan(phase_df, eps=eps, min_samples=min_samples)
        metrics = compute_cluster_metrics(clusters)
        results[phase] = metrics

    # Compare clusters between phases
    comparisons = {}
    phases_list = list(phases.keys())
    for i in range(len(phases_list)):
        for j in range(i + 1, len(phases_list)):
            phase1, phase2 = phases_list[i], phases_list[j]
            mean_dist, median_dist, min_dist, max_dist = compare_clusters(results[phase1][2], results[phase2][2])
            comparisons[f'{phase1} vs {phase2}'] = {
                'Mean Distance': mean_dist,
                'Median Distance': median_dist,
                'Min Distance': min_dist,
                'Max Distance': max_dist
            }

    return results, comparisons

In [20]:
def plot_clusters(results, comparisons, region):
    """Plots the number of clusters with average cluster size for each ENSO phase."""
    
    phases = list(results.keys())
    num_clusters = [results[phase][0] for phase in phases]
    avg_cluster_sizes = [results[phase][1] for phase in phases]

    fig, ax1 = plt.subplots(figsize=(10, 6))

    # Bar chart for the number of clusters
    bars = ax1.bar(phases, num_clusters, color='skyblue', alpha=0.7, label='Number of Clusters')
    ax1.set_xlabel('ENSO Phases')
    ax1.set_ylabel('Number of Clusters', color='skyblue')
    ax1.tick_params(axis='y', labelcolor='skyblue')
    add_value_labels(ax1, bars)
    
    # Scatter markers for the average cluster size on a secondary y-axis
    ax2 = ax1.twinx()
    ax2.scatter(phases, avg_cluster_sizes, color='orange', marker='o', label='Average Cluster Size')
    ax2.set_ylabel('Average Cluster Size', color='orange')
    ax2.tick_params(axis='y', labelcolor='orange')

    for i, value in enumerate(avg_cluster_sizes):
        ax2.annotate(f'{value:.2f}', 
                    xy=(phases[i], value), 
                    xytext=(8, 0), 
                    textcoords="offset points",
                    ha='left', va='center', 
                    color='orange')
    
    # Title and layout
    fig.suptitle(f'Number of Clusters and Average Cluster Size by ENSO Phase in {region} Pacific')
    fig.tight_layout()
    plt.show()

def plot_centroid_distance(comparisons, region):
    """Plots the centroid distance comparison between ENSO phases individually."""
    
    labels = ['Min Distance', 'Mean Distance', 'Median Distance', 'Max Distance']
    
    # Create a subplot for each phase comparison
    fig, axs = plt.subplots(nrows=1, ncols=len(comparisons), figsize=(18, 6))
    
    for idx, (comparison, metrics) in enumerate(comparisons.items()):
        distances = [metrics['Min Distance'], metrics['Mean Distance'], 
                     metrics['Median Distance'], metrics['Max Distance']]
        
        bars = axs[idx].bar(labels, distances, color=['blue', 'green', 'red', 'orange'])
        axs[idx].set_title(f'Centroid Distance: {comparison}')
        axs[idx].set_ylabel('Distance (Degrees)')
        axs[idx].set_ylim(0, max(distances) * 1.2)
        add_value_labels(axs[idx], bars)
    
    fig.suptitle(f'Centroid Distance Comparisons Between ENSO Phases in {region} Pacific')
    plt.tight_layout()
    plt.show()

def plot_centroid_model(results, comparisons, region):
    """Combines the cluster and centroid distance plots into one function."""
    
    plot_clusters(results, comparisons, region)
    plot_centroid_distance(comparisons, region)

In [21]:
def print_enso_comparison(results, comparisons):
    # Print cluster numbers and average sizes
    for phase, metrics in results.items():
        print(f"Number of {phase} Clusters: {metrics[0]}")
        print(f"Average {phase} Cluster Size: {metrics[1]:.2f}")
        print()

    # Print centroid distance comparisons
    print("Centroid Distance Comparisons:")
    for comparison, metrics in comparisons.items():
        print(f"{comparison}:")
        print(f"  Mean Distance: {metrics['Mean Distance']:.2f}")
        print(f"  Median Distance: {metrics['Median Distance']:.2f}")
        print(f"  Min Distance: {metrics['Min Distance']:.2f}")
        print(f"  Max Distance: {metrics['Max Distance']:.2f}")
        print()

In [22]:
results, comparisons = compare_enso_phases_dbscan(jma_enso)
plot_centroid_model(results, comparisons, 'NW')

# numerical representation of the results
# print_enso_comparison(results, comparisons)

#### Analysis and Explanation of the Figures for the NW Pacific

##### Cluster Analysis

**Interpretation of number of clusters:**

The El Niño phase has the highest number of clusters (36), indicating a more fragmented distribution of tropical depressions (TDs) across the NW Pacific during El Niño years. This fragmentation suggests that TDs are spread over a wider area, forming more distinct clusters.
The La Niña phase has the fewest clusters (18), suggesting that TDs tend to be more concentrated in fewer areas, likely due to stronger and more consistent steering currents or favorable conditions in specific regions.
The Neutral phase falls in between, with 28 clusters, indicating a distribution that is less concentrated than La Niña but not as widespread as El Niño.

**Interpretation of average cluster size:**

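The La Niña phase has the largest average cluster size (1299.33 records per cluster), suggesting that when clusters do form during La Niña, they contain more records and the activity is more spatially concentrated. Cluster size counts records rather than wind speed or pressure, so it does not by itself indicate storm intensity. The concentration could be due to more favorable conditions for TD development in certain regions.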
The El Niño phase has the smallest average cluster size (589.81), which aligns with the higher number of clusters. This suggests that while there are more clusters, they are generally smaller and less concentrated, possibly due to more variable conditions across the region.
The Neutral phase has a moderate average cluster size (838.07), indicating a balance between the concentration seen in La Niña and the fragmentation seen in El Niño.
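To make the two metrics concrete, the sketch below runs DBSCAN on synthetic points with the same default parameters as `run_dbscan` (`eps=1.0`, `min_samples=5`). The blobs and the isolated point are invented for illustration. It shows that a cluster's size is simply the number of points assigned to it, so a region with more records yields a larger cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
# Two synthetic blobs of (lon, lat) points plus one isolated point.
blob_a = rng.normal(loc=[130.0, 15.0], scale=0.3, size=(200, 2))
blob_b = rng.normal(loc=[150.0, 20.0], scale=0.3, size=(50, 2))
outlier = np.array([[170.0, 5.0]])
coords = np.vstack([blob_a, blob_b, outlier])

labels = DBSCAN(eps=1.0, min_samples=5).fit(coords).labels_

# Label -1 marks noise; every other label identifies one cluster.
cluster_ids = sorted(set(labels) - {-1})
sizes = [int(np.sum(labels == k)) for k in cluster_ids]
n_noise = int(np.sum(labels == -1))

print("Number of clusters:", len(cluster_ids))
print("Cluster sizes:", sizes)
print("Noise points:", n_noise)
```

Since the number of records differs between ENSO phases, it may be worth comparing cluster sizes as a share of each phase's total records as well as in absolute terms.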

##### Centroid Distance Comparisons:

**La Niña vs El Niño:**

The centroid distances between La Niña and El Niño clusters show a moderate mean (31.03) and median (31.96) distance, indicating that while there are some differences in where TDs tend to form during these phases, there is also significant overlap.
The small minimum distance (0.38) suggests that in some cases, TDs can form in very similar locations during both phases, while the maximum distance (62.24) indicates that TDs can also form in significantly different locations depending on the phase.

**La Niña vs Neutral:**

The centroid distances between La Niña and Neutral phases are slightly larger than between La Niña and El Niño, with a mean of 34.84 and a median of 36.31. This suggests that the spatial distribution of TDs during Neutral phases is somewhat different from La Niña, with TDs likely forming in more varied locations.
The maximum distance (92.36) is quite large, indicating that there can be substantial differences in where TDs form between these two phases, possibly reflecting different steering currents or atmospheric conditions.

**El Niño vs Neutral:**

The centroid distances between El Niño and Neutral phases are similar to those between La Niña and Neutral phases, with a mean of 35.03 and a median of 34.89. This suggests that the distribution of TDs during Neutral phases has significant overlap with El Niño, but also shows substantial differences in certain areas.
The maximum distance (90.99) indicates that in some cases, TDs can form in very different regions during these two phases, further emphasizing the variability in TD formation locations during Neutral years compared to El Niño.

**Overall Summary:**

*La Niña Phase:* TDs are more concentrated, forming fewer but larger clusters. The overlap with El Niño and Neutral phases is moderate, but there are significant areas where TDs form in distinct locations during La Niña compared to the other phases.

*El Niño Phase:* TDs are more widely dispersed, forming many smaller clusters. This phase shows more variability in where TDs form, with some overlap with both La Niña and Neutral phases, but also significant differences in certain areas.

*Neutral Phase:* The distribution of TDs during Neutral years is intermediate between La Niña and El Niño, showing both some concentration and some dispersion. The centroid distances indicate that while there is overlap with both La Niña and El Niño, there are also distinct regions where TDs form differently during Neutral years.
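The centroid distances above are Euclidean distances in degrees of longitude and latitude. As a hedged cross-check, the sketch below compares that planar measure with a great-circle (haversine) distance in kilometres. The function `haversine_km` and the two centroids are hypothetical. Whether the antimeridian case applies depends on the longitude convention of the data, which is not confirmed here.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Two hypothetical centroids on either side of the antimeridian,
# assuming longitudes are stored in the -180..180 convention.
c1 = (179.0, 15.0)
c2 = (-179.0, 15.0)

planar_deg = float(np.hypot(c1[0] - c2[0], c1[1] - c2[1]))
great_circle_km = float(haversine_km(*c1, *c2))

print(f"Planar distance: {planar_deg:.1f} degrees")
print(f"Great-circle distance: {great_circle_km:.1f} km")
```

In this constructed case the planar measure reports a very large separation for two points that are only two degrees of longitude apart, so large maximum distances may be worth checking against a great-circle measure.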

In [23]:
results, comparisons = compare_enso_phases_dbscan(nhc_enso)
plot_centroid_model(results, comparisons, 'Central and NE')

# numerical representation of the results
# print_enso_comparison(results, comparisons)

#### Analysis and Explanation of the Figures for the Central and NE Pacific

##### Cluster Analysis

**Interpretation of number of clusters:**

The Neutral phase has the highest number of clusters (30), suggesting that tropical depressions (TDs) are more widely distributed across the Central and NE Pacific during Neutral years. This could indicate more variable atmospheric conditions that allow TDs to form in a broader range of locations.
The El Niño phase has a moderate number of clusters (26), indicating that while TDs are still fairly widespread, there may be slightly more focused areas of formation compared to Neutral years.
The La Niña phase has the fewest clusters (21), suggesting that during La Niña years, TDs tend to form in more concentrated areas, possibly due to more stable and predictable atmospheric patterns that limit the regions where TDs can develop.

**Interpretation of average cluster size:**

The La Niña phase has the largest average cluster size (542.29 records per cluster), indicating that when clusters do form during La Niña, they contain more records. Cluster size counts records, not wind speed or pressure, so this points to spatial concentration rather than to larger or more impactful systems. It could reflect more robust or sustained conditions that support TD activity in specific regions.
The El Niño phase has a smaller average cluster size (369.31), which suggests that while TDs are still forming, they are generally more dispersed, leading to smaller clusters.
The Neutral phase has the smallest average cluster size (289.37), which aligns with the higher number of clusters. This suggests that during Neutral years, TDs are more scattered, resulting in a greater number of smaller clusters.

##### Centroid Distance Comparisons:

**La Niña vs El Niño:**

The centroid distances between La Niña and El Niño clusters show a moderate mean (31.07) and median (22.41) distance, indicating that while there are differences in the locations where TDs form during these phases, there is also some overlap.
The small minimum distance (1.03) suggests that in certain cases, TDs can form in very similar locations during both La Niña and El Niño phases. However, the large maximum distance (112.10) indicates that TDs can also form in significantly different regions depending on the phase, reflecting the different atmospheric and oceanic conditions associated with La Niña and El Niño.

**La Niña vs Neutral:**

The centroid distances between La Niña and Neutral phases are similar to those between La Niña and El Niño, with a mean of 30.77 and a median of 25.34. This suggests that while there are notable differences in where TDs tend to form, there is also some overlap, particularly in regions that are conducive to TD formation regardless of the ENSO phase.
The maximum distance (156.85) is the largest among all comparisons, indicating that during Neutral years, TDs can form in regions quite different from those during La Niña. This could reflect the more variable and less predictable nature of Neutral years, where the distribution of TDs is more spread out.

**El Niño vs Neutral:**

The centroid distances between El Niño and Neutral phases are similar to the other comparisons, with a mean of 30.38 and a median of 23.31. This suggests that while El Niño and Neutral phases have differences in where TDs form, there is also some overlap, particularly in regions that are consistently favorable for TD formation.
The minimum distance (0.27) is very small, indicating that in some cases, TDs can form in almost the exact same locations during both phases. However, the maximum distance (146.83) suggests that there can also be significant shifts in where TDs form, likely reflecting the varying impacts of El Niño and Neutral conditions on atmospheric circulation patterns.

**Overall Summary:**

*La Niña Phase:* TDs during La Niña years tend to form in fewer, larger clusters, suggesting more concentrated areas of favorable conditions. The overlap with El Niño and Neutral phases is moderate, but there are also significant differences in where TDs form, reflecting the distinct atmospheric patterns associated with La Niña.

*El Niño Phase:* TDs during El Niño years are more dispersed than during La Niña, forming an intermediate number of clusters (26) of intermediate average size (369.31). The overlap with La Niña and Neutral phases suggests that while some regions remain favorable for TD formation across phases, El Niño conditions can also lead to shifts in where TDs are likely to develop.

*Neutral Phase:* The distribution of TDs during Neutral years is the most dispersed, with the highest number of clusters but the smallest average cluster size. This suggests that TDs are more spread out during Neutral years, with conditions that allow for TD formation across a broader range of locations. The centroid distances indicate that Neutral years can exhibit significant differences in TD formation locations compared to both La Niña and El Niño, likely due to the less predictable and more variable nature of Neutral conditions.

### Statistical Analysis


In [24]:
def custom_wind_mean(series):
    """Return the mean of the valid (positive) wind values, or NaN if there are none."""
    filtered_series = series[series > 0]
    if len(filtered_series) > 0:
        return filtered_series.mean()
    else:
        return np.nan

def custom_pressure_mean(series):
    """Return the mean of the valid (positive) pressure values, or NaN if there are none."""
    filtered_series = series[series > 0]
    if len(filtered_series) > 0:
        return filtered_series.mean()
    else:
        return np.nan

def combine_date_month(df):
    # Build a first-of-month timestamp from the separate year and month columns
    df['date'] = pd.to_datetime(df[['year', 'month']].assign(day=1))
    df.index = pd.Index(df.date)

    df = df.drop(columns=['year', 'month', 'date'])
    return df

def prepare_statistical_dataframe(df1, corr_period):
    """Aggregate TD count, mean pressure, mean wind and mean position per month ('m') or per year, then merge with the matching ONI data."""

    if corr_period == 'm':
        df2 = oni_table
        query = [df1.index.year, df1.index.month]
    else:
        df2 = oni_temp
        query = [df1.index.year]
    
    if 'category' in df1:
        counted = 'category'
    else:
        counted = 'type_of_depression'
    
    df1 = df1.groupby(query).agg({
        counted: 'count',
        'min_pressure_mBar': custom_pressure_mean,  
        'max_wind_kn': custom_wind_mean,
        'lat': 'mean',
        'lon': 'mean',
    }).rename(columns={
        counted: 'frequency', 
        'min_pressure_mBar': 'average_min_pressure', 
        'max_wind_kn': 'average_max_wind',
        'lat': 'average_lat',
        'lon': 'average_lon'
    })

    if corr_period == 'm':
        df1.index.names = ['year', 'month']
        df1 = df1.reset_index()
        df1 = combine_date_month(df1) 
       
    return pd.merge(df1, df2, on='date')

def perform_correlation(df, corr_period='m', corr_type='pearson'):
    
    df = prepare_statistical_dataframe(df, corr_period)
    
    correlation = {
        'pearson': pearsonr,
        'spearman': spearmanr
    }
    if corr_period == 'm':
        # shift the anomaly forward by 6 months to align with the assumption that an earlier SST anomaly affects the following TD season
        df['sst_anomaly'] = df['sst_anomaly'].shift(freq=pd.DateOffset(months=6))
        
    df = df.dropna()

    return {
        'corr_type': corr_type,
        'corr_period': corr_period,
        'Frequency': correlation[corr_type](df['sst_anomaly'], df['frequency'])[0],
        'Pressure': correlation[corr_type](df['sst_anomaly'], df['average_min_pressure'])[0],
        'Wind': correlation[corr_type](df['sst_anomaly'], df['average_max_wind'])[0],
        'Latitude': correlation[corr_type](df['sst_anomaly'], df['average_lat'])[0],
        'Longitude': correlation[corr_type](df['sst_anomaly'], df['average_lon'])[0]
    }
    
    # return f'{corr_type[0].upper() + corr_type[1:]} Correlation\nFrequency: {frequency}\nPressure: {pressure}\nWind: {wind}\nLat: {latitude}\nLon: {longitude}'

In [29]:
def plot_correlations(corr_period='m', corr_type='pearson'):
    """Calculate Pearson or Spearman correlations for both basins and plot them as heatmaps."""
    dataframes = [jma_enso, nhc_enso]
    correlations = [perform_correlation(df, corr_period=corr_period, corr_type=corr_type) for df in dataframes]   
    
    periods = {
        'm': 'Monthly',
        'y': 'Yearly'
    }

    fig, axs = plt.subplots(1, 2, figsize=(12, 6))
    
    for i in range(2):
        data = correlations[i]
        period = periods[data['corr_period']]
        name = data['corr_type'][0].upper() + data['corr_type'][1:]

        # removing unwanted keys
        del data['corr_period']
        del data['corr_type']
        
        variables = list(data.keys())
        correlation_values = list(data.values())
        df = pd.DataFrame(correlation_values, index=variables, columns=[f'{name} Correlation'])
        # Draw the shared colorbar only on the right-hand panel
        sns.heatmap(df, annot=True, cmap='coolwarm', vmin=-1, vmax=1, linewidths=0.5, ax=axs[i], cbar=(i == 1))
        axs[i].set_title(f'{name} {period} Correlation Heatmap for {ocean_area[i]} Pacific')
    plt.tight_layout()
    plt.show()

In [30]:
plot_correlations()
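The heatmaps show only the correlation coefficients. As a minimal sketch of how the lag and the significance of a correlation could be inspected, the example below builds a synthetic monthly series in which the frequency responds to the SST anomaly six months earlier, then reports both the coefficient and the p-value. The data and the helper `lagged_correlation` are illustrative assumptions, not part of the analysis above.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)
months = pd.date_range("2000-01-01", periods=120, freq="MS")

# Synthetic series (assumption): frequency follows the anomaly from 6 months earlier.
sst_anomaly = pd.Series(rng.normal(size=120), index=months)
frequency = 5.0 + 2.0 * sst_anomaly.shift(6) + rng.normal(scale=0.5, size=120)
df = pd.DataFrame({"sst_anomaly": sst_anomaly, "frequency": frequency})

def lagged_correlation(df, lag_months, method=pearsonr):
    """Correlate frequency with the SST anomaly observed lag_months earlier."""
    shifted = df["sst_anomaly"].shift(lag_months)
    paired = pd.DataFrame({"x": shifted, "y": df["frequency"]}).dropna()
    r, p_value = method(paired["x"], paired["y"])
    return float(r), float(p_value)

r_lag0, p_lag0 = lagged_correlation(df, 0)
r_lag6, p_lag6 = lagged_correlation(df, 6)
rho_lag6, _ = lagged_correlation(df, 6, method=spearmanr)

print(f"Pearson, no lag:       r = {r_lag0:+.2f} (p = {p_lag0:.3f})")
print(f"Pearson, 6-month lag:  r = {r_lag6:+.2f} (p = {p_lag6:.3g})")
print(f"Spearman, 6-month lag: rho = {rho_lag6:+.2f}")
```

Scanning several lags in this way, and keeping the p-values alongside the coefficients, may help judge whether the chosen six-month shift is well supported by the data.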