# Lab Exercise 7: Principal Component Analysis (PCA) of Surface Winds

In this exercise, we will explore how Principal Component Analysis (PCA), or Empirical Orthogonal Function (EOF) analysis, can be used to identify dominant modes of variability in surface winds.

Using monthly 10 m winds from ERA5, we will extract spatial modes and time series (Principal Components) that represent large-scale patterns such as the Amihan–Habagat monsoon reversal and the ENSO influence on wind variability.

By the end of this exercise, you should be able to:
1. Load gridded wind data efficiently with xarray and dask.
2. Compute EOFs using the xeofs library.
3. Interpret the first mode as the seasonal monsoon cycle.
4. Relate higher modes (e.g., Mode 4) to ENSO phases using the Niño 3.4 index.

### Step 0 – Setup and Data Preparation

For this exercise we’ll analyze surface winds from the ERA5 Reanalysis using `xarray`, `dask`, and `xeofs`.

Make sure your environment is ready before running the next steps. Install the required packages either from the Anaconda Prompt (run the command without the leading `!`) or by running the `!conda install` cell below in the notebook.

```bash
conda install -c conda-forge -n meteo203 xeofs dask -y
```

In [None]:
!conda install -c conda-forge -n meteo203 xeofs dask -y

Download the monthly surface wind data from [this link](https://drive.google.com/file/d/1jLCPelp5nwUqFLbcD25Bl7SVXKv2Tk6F/view?usp=drive_link). In the same directory as this notebook, create a new folder named `era5-surface-winds`, then extract the contents of the downloaded file into that folder.

Check if the setup is working using the following cell.

In [None]:
# Step 0 – Verify setup and environment
import xeofs as xe
import xarray as xr
from pathlib import Path
import numpy as np

data_path = Path("era5-surface-winds")
files = sorted(data_path.glob("era5-surface-wind-monthly-*.nc"))

print(f"Found {len(files)} NetCDF files in {data_path}")
print("First few files:", [f.name for f in files[:5]])


If the cell above lists the NetCDF files, we can proceed with Step 1.

---
### Step 1 – Load the ERA5 surface wind dataset

ERA5 monthly 10 m wind components (u10, v10) are stored in yearly NetCDF files.

We’ll use `xarray` and `dask` to load them efficiently.

**Note:** When you run the next cell, you might see messages like
```bash
sh: 1: getfattr: not found
```
These are harmless warnings related to file attributes.
You can safely ignore them — they do not affect the dataset or results.

In [None]:
from pathlib import Path

data_path = Path("era5-surface-winds")
ds = xr.open_mfdataset(
    str(data_path / "era5-surface-wind-monthly-*.nc"),
    combine="by_coords",
    parallel=True
)

#### Inspect the dataset

Let’s check what variables and dimensions are included.

In [None]:
ds

In [None]:
# Display dataset summary
ds.info()

# Optional: view coordinate ranges
print("Time coverage:", str(ds.time[0].values)[:10], "to", str(ds.time[-1].values)[:10])
print("Latitude range:", float(ds.latitude.min()), "to", float(ds.latitude.max()))
print("Longitude range:", float(ds.longitude.min()), "to", float(ds.longitude.max()))


#### Questions
1. What are the main dimensions of this dataset? 
2. Which variables are included?
3. How many years of monthly data do we have?
4. What is the spatial extent (lat, lon) of this dataset?

---
### Step 2 – Plot the long-term mean of the zonal wind (u10)
The zonal wind (`u10`) represents the **east–west component** of the near-surface wind:

- Positive values → westerly winds (blowing from west to east)
- Negative values → easterly winds (blowing from east to west)
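The sign convention above can be checked with a few made-up wind vectors. The sketch below is illustrative only: the helper name `wind_speed_and_direction` and the sample values are hypothetical and are not part of the ERA5 dataset or the lab code. It converts `u` and `v` components into wind speed and the meteorological direction (the compass bearing the wind blows *from*).

```python
import numpy as np

def wind_speed_and_direction(u, v):
    """Return wind speed (m/s) and the direction the wind blows FROM (degrees)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)
    # arctan2(u, v) is the bearing the wind blows toward; adding 180 gives "from"
    direction_from = (180.0 + np.degrees(np.arctan2(u, v))) % 360.0
    return speed, direction_from

# Hypothetical sample vectors (not ERA5 values)
u = np.array([5.0, -5.0, 0.0])
v = np.array([0.0, 0.0, 5.0])
speed, direction = wind_speed_and_direction(u, v)

for ui, s, d in zip(u, speed, direction):
    label = "westerly" if ui > 0 else "easterly" if ui < 0 else "no zonal flow"
    print(f"u = {ui:+.1f} m/s -> {label}, speed = {s:.1f} m/s, from {d:.0f} deg")
```

A positive `u` with zero `v` gives a wind from 270° (west), and a negative `u` gives a wind from 90° (east), which matches the list above.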

In this step, we’ll compute the long-term mean of `u10` (averaged from 1970–2024)
and visualize the resulting climatological pattern.

In [None]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

u_mean = ds["u10"].mean(dim="time")

fig, ax = plt.subplots(figsize=(10,5), subplot_kw={"projection": ccrs.PlateCarree()})
pcm = u_mean.plot(ax=ax, transform=ccrs.PlateCarree(), cmap="RdBu_r", add_colorbar=True)
ax.coastlines(resolution="50m")
ax.add_feature(cfeature.BORDERS, linestyle=":")
ax.set_title("Mean Zonal Wind (u10) – ERA5 1970–2024")
plt.show()


**Interpreting the map.** The sign of `u10` tells us the predominant wind direction.

| `u10` value | Direction | Interpretation |
|--------------|------------|----------------|
| **Negative (blue)** | Easterlies | Winds blowing **from east to west** |
| **Positive (red)** | Westerlies | Winds blowing **from west to east** |



Over the tropical Pacific, the long-term mean shows strong negative values,
meaning persistent easterlies — these are the trade winds that drive equatorial upwelling and influence ENSO variability.

Meanwhile, positive values appear over the mid-latitudes (especially north of ~25°N),
corresponding to the mid-latitude surface westerlies on the poleward side of the subtropical high.

#### Questions
1. What do you notice about the transition from easterlies (blue) to westerlies (red)?
2. How does this relate to the Hadley cell and the intertropical convergence zone (ITCZ)?

---
### Step 3 – Plot Seasonal Averages of Zonal Wind (JJA vs SON)

Before we apply PCA, let’s first explore the seasonal structure of the 10 m zonal wind (u10).
We’ll compute and compare the climatological means for two key monsoon seasons:
- JJA (June–August) → Habagat / Southwest Monsoon
- SON (September–November) → Transition toward Amihan / Northeast Monsoon

This helps us visualize how the prevailing wind direction changes through the year.
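To see what the monthly-climatology approach does before applying it to ERA5, here is a minimal sketch on a synthetic series. The array `u_demo` is hypothetical: its value is simply the calendar month number, so the seasonal means can be verified by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 4-year monthly series whose value equals the calendar month (1-12)
time = pd.date_range("2000-01-01", periods=48, freq="MS")
u_demo = xr.DataArray(
    time.month.values.astype(float),
    coords={"time": time},
    dims="time",
    name="u10",
)

# Mean of every January, every February, ... -> 12-value climatology
clim = u_demo.groupby("time.month").mean("time")

# Seasonal means as the average of three monthly climatologies
jja = clim.sel(month=[6, 7, 8]).mean("month")
son = clim.sel(month=[9, 10, 11]).mean("month")

print("JJA mean:", float(jja))
print("SON mean:", float(son))
```

Averaging the three monthly climatologies gives each month equal weight regardless of its length in days; for monthly means this difference is usually small.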

In [None]:
# Compute monthly climatology (mean over all years)
u_monthly_clim = ds["u10"].groupby("time.month").mean("time")

# Select representative monsoon seasons
u_JJA = u_monthly_clim.sel(month=[6,7,8]).mean("month")
u_SON = u_monthly_clim.sel(month=[9,10,11]).mean("month")

# --- Plot ---
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

fig, axes = plt.subplots(1, 2, figsize=(12,5), subplot_kw={'projection': ccrs.PlateCarree()})

for ax, data, title in zip(
    axes,
    [u_JJA, u_SON],
    ["JJA (Jun–Aug) – Southwest Monsoon", "SON (Sep–Nov) – Transition to Northeast Monsoon"]
):
    pcm = data.plot(
        ax=ax, transform=ccrs.PlateCarree(), cmap="RdBu_r",
        vmin=-6, vmax=6, add_colorbar=False
    )
    ax.coastlines(resolution="50m")
    ax.add_feature(cfeature.BORDERS, linestyle=":")
    ax.set_title(title, fontsize=11)

# Shared colorbar below both subplots
cbar_ax = fig.add_axes([0.25, 0.08, 0.5, 0.03])  # [left, bottom, width, height]
fig.colorbar(pcm, cax=cbar_ax, orientation="horizontal", label="u10 (m/s)")

plt.suptitle("Seasonal Mean Zonal Wind (u10) – ERA5 1970–2024", fontsize=13, y=0.98)
plt.tight_layout(rect=[0, 0.1, 1, 0.95])
plt.show()


#### Questions

1. Which regions show negative u10 values (easterlies) and which show positive u10 values (westerlies)?
2. How does the latitude affect the direction and strength of the zonal wind?
3. In what regions do you notice the strongest winds during JJA and SON?

---
### Step 4 – Extracting Dominant Wind Patterns using EOF Analysis

Empirical Orthogonal Function (EOF) analysis — also known as Principal Component Analysis (PCA) — helps us identify the dominant spatial patterns of variability in a dataset and their time evolution.
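The idea can be illustrated with plain NumPy before using a dedicated library. The following is a minimal sketch on synthetic data, not the method `xeofs` uses internally: all names (`pattern_a`, `amp_a`, and so on) are hypothetical. It builds a field from a strong annual cycle and a weaker slow oscillation, then recovers the spatial patterns and time series with a singular value decomposition (SVD) of the anomaly matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_space = 144, 30  # 12 years of monthly data on 30 grid points

# Two hypothetical spatial patterns (orthogonal on this grid)
x = np.linspace(0, np.pi, n_space)
pattern_a = np.sin(x)
pattern_b = np.sin(2 * x)

# Their time evolution: strong annual cycle and a weaker 4-year oscillation
months = np.arange(n_time)
amp_a = 3.0 * np.cos(2 * np.pi * months / 12)
amp_b = 1.0 * np.sin(2 * np.pi * months / 48)

field = np.outer(amp_a, pattern_a) + np.outer(amp_b, pattern_b)
field += 0.1 * rng.standard_normal(field.shape)  # small random noise

# Remove the time mean at each grid point, then decompose
anom = field - field.mean(axis=0)
U, s, Vt = np.linalg.svd(anom, full_matrices=False)

eofs = Vt                          # rows are spatial patterns, shape (mode, space)
pcs = U * s                        # columns are time series, shape (time, mode)
explained = s**2 / np.sum(s**2)    # fraction of variance per mode

print("Explained variance of first 3 modes:", np.round(explained[:3], 3))
print("|corr(PC1, annual cycle)| =",
      round(abs(np.corrcoef(pcs[:, 0], amp_a)[0, 1]), 3))
```

Because the annual cycle carries most of the variance in this synthetic field, it emerges as the first mode. The sign of each EOF/PC pair is arbitrary: flipping both leaves their product unchanged, which is why the correlation is reported as an absolute value.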

In this step, we’ll apply EOF analysis to the monthly 10 m zonal wind (u10) to see how large-scale wind patterns (like the monsoon reversal) emerge as principal modes.

In [None]:
# Select u10 for EOF analysis
u = ds["u10"]

# --- Initialize EOF model ---
# 'use_coslat=True' weights grid cells by latitude area (important for global data)
model = xe.single.EOF(n_modes=5, use_coslat=True)

# --- Fit the model ---
model.fit(u, dim="time")

# --- Extract results ---
components = model.components()  # spatial patterns (EOFs)
scores = model.scores()          # time series (PCs)

# Show available modes
components


#### Understanding what happens here.

| Step           | What It Does                                                   | Meteorological Meaning                                     |
| -------------- | -------------------------------------------------------------- | ---------------------------------------------------------- |
| `fit()`        | Computes the covariance matrix of `u10` and finds eigenvectors | Identifies patterns that explain the largest variance      |
| `components()` | Spatial maps of each mode                                      | *Where* the variability happens (e.g. monsoon regions)     |
| `scores()`     | Time evolution of each mode                                    | *When* the variability happens (e.g. seasonal cycle, ENSO) |
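The `use_coslat=True` option in the cell above compensates for grid cells shrinking toward the poles on a regular latitude-longitude grid. A common convention, assumed here rather than taken from the `xeofs` source, is to multiply the anomalies by the square root of cos(latitude), so that each grid point's contribution to the variance scales with cos(latitude). The sketch below uses hypothetical latitudes and synthetic anomalies to show the effect.

```python
import numpy as np

# Hypothetical grid latitudes in degrees
lat = np.array([0.0, 30.0, 60.0, 80.0])

# Square-root weights: variance (a squared quantity) then scales with cos(lat)
weights = np.sqrt(np.cos(np.deg2rad(lat)))

# Synthetic anomalies with similar variance at every latitude
rng = np.random.default_rng(1)
anom = rng.standard_normal((240, lat.size))
anom -= anom.mean(axis=0)

# Broadcasting applies one weight per column (grid point)
weighted = anom * weights

var_ratio = weighted.var(axis=0) / anom.var(axis=0)
print("Variance ratio after weighting:", np.round(var_ratio, 3))
print("cos(lat):                      ", np.round(np.cos(np.deg2rad(lat)), 3))
```

Without this weighting, the many small high-latitude cells would be over-represented in the covariance and could dominate the leading modes.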


The first mode (Mode 1) typically represents the largest-scale, most consistent pattern in your data — often the annual or monsoon cycle.

Now let's plot EOF1 and PC1. 

In [None]:
# --- Plot EOF1 ---
fig, ax = plt.subplots(figsize=(6,5), subplot_kw={'projection': ccrs.PlateCarree()})
components.sel(mode=1).plot(
    ax=ax, transform=ccrs.PlateCarree(),
    cmap="RdBu_r", vmin=-0.05, vmax=0.05
)
ax.coastlines(resolution="50m")
ax.add_feature(cfeature.BORDERS, linestyle=":")
ax.set_title("EOF 1 – Dominant Wind Pattern (u10)")
plt.show()



In [None]:
# --- Plot PC1 (time series) ---
fig, ax = plt.subplots(figsize=(14, 4))  # wider aspect ratio
scores.sel(mode=1).plot(ax=ax, color='darkred', lw=1.2)

ax.set_title("PC 1 – Temporal Evolution of EOF 1 (u10)", fontsize=12)
ax.set_ylabel("PC amplitude")
ax.set_xlabel("Year")
ax.set_ylim(-1500, 1500)
ax.grid(alpha=0.3, linestyle="--", linewidth=0.5)

# Optional: add a horizontal zero line for reference
ax.axhline(0, color="black", lw=0.8, alpha=0.7)

plt.tight_layout()
plt.show()


The PC 1 time series shows how the leading wind pattern changes through time.

The full PC 1 time series spans several decades, so it can be difficult to see the individual seasonal cycle. Let’s zoom in to one year to visualize how the monsoon reversal happens within a typical year.

In [None]:
# Select one representative year (e.g., 2024)
pc1 = scores.sel(mode=1)

# Zoom in to one year
pc1_2024 = pc1.sel(time=slice("2024-01", "2024-12"))

# Center monthly timestamps by shifting 15 days earlier
pc1_2024_centered = pc1_2024.copy()
pc1_2024_centered["time"] = pc1_2024_centered["time"] - np.timedelta64(15, "D")

# Plot
pc1_2024_centered.plot(marker="o", color="firebrick")
plt.title("PC 1 (u10) – Year 2024: Seasonal Monsoon Cycle")
plt.ylabel("PC amplitude")
plt.grid(alpha=0.3)
plt.show()


The first principal component (PC 1) represents the **dominant mode of variability** in the 10 m zonal wind and summarizes how the large-scale near-surface wind pattern evolves over time. Examine both the long-term record (1970–2024) and the single-year cycle (2024) before answering the questions below.

#### Questions
1. Long-Term Variability (1970 – 2024)
    - Describe the general shape and rhythm of the PC 1 time series across the full period.
    - What patterns or fluctuations stand out when you look across decades?
    - How would you characterize the years with unusually high or low amplitudes?
    - In what ways might long-term climate variations (e.g., ENSO, Pacific warming trends) influence this pattern? <br>
  
2. Seasonal Evolution (Year 2024 Example)
    - How does the PC 1 amplitude progress through the months of 2024?
    - Which parts of the year correspond to stronger or weaker zonal winds near the Philippines?
    - What do the rising and falling segments of the curve suggest about seasonal transitions?
    - Around which months does the shift between prevailing wind regimes occur, and how does this compare with the observed Amihan onset (mid-November 2024)?


---
### Step 5 – Linking Wind Variability (EOF 4) to Sea Surface Temperature (SST) Anomalies

In this final step, we’ll explore whether one of the higher EOF modes of the surface wind field
corresponds to ENSO-related variability — similar to what you studied in Lesson 06.

To do this, we’ll load the same ERSST v6 dataset and compute the Niño 3.4 index, which represents the standardized SST anomaly in the central-eastern equatorial Pacific.

You’ll then compare this index with Mode 4 of the u10 EOF analysis to see if the winds respond to oceanic warming and cooling patterns.

In [None]:
# --- Load SST data (edit the path if needed) ---
file_path = '../../lessons/06_pca/ersst.v6.195001_202412.nc'   # ← Update this if your file is in another folder
ds_sst = xr.open_dataset(file_path)

# Inspect structure
ds_sst





#### Step 5a. Reproduce the Niño 3.4 anomaly and standardized index computation

Edit the `file_path` variable in the cell above so that it points to your local copy of the SST file. You can copy your earlier code cells and adjust as needed.

To recap, here are the key steps that are implemented in [Lesson 06](../../lessons/06_pca/06a_intro_to_pca.ipynb):
1. Extract the SST field
2. Compute the baseline climatology (1950–1979)
3. Select Niño 3.4 region and compute anomalies
4. Apply a 5-month rolling mean and standardize
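The four steps above can be sketched on synthetic data as follows. This is an illustrative outline under stated assumptions, not the Lesson 06 code: the array `sst_demo`, its coordinate names (`lat`, `lon`), and the result name `nino34_demo_std` are hypothetical, and the real dataset may use different variable and coordinate names. The Niño 3.4 box is 5°S–5°N, 170°W–120°W, which is 190°E–240°E on a 0–360 longitude grid.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly "SST" on a coarse 2-degree grid (stand-in for the real file)
rng = np.random.default_rng(42)
time = pd.date_range("1950-01-01", "2024-12-01", freq="MS")
lat = np.arange(-10.0, 11.0, 2.0)
lon = np.arange(180.0, 251.0, 2.0)  # degrees east, 0-360 convention
seasonal = 1.5 * np.cos(2 * np.pi * (time.month.values - 1) / 12)
noise = 0.5 * rng.standard_normal((time.size, lat.size, lon.size))
sst_demo = xr.DataArray(
    27.0 + seasonal[:, None, None] + noise,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Steps 1-2: monthly climatology over the 1950-1979 baseline
clim = sst_demo.sel(time=slice("1950", "1979")).groupby("time.month").mean("time")

# Step 3: select the Nino 3.4 box, subtract the climatology, average over the box
box = dict(lat=slice(-5, 5), lon=slice(190, 240))
anom = sst_demo.sel(**box).groupby("time.month") - clim.sel(**box)
nino34 = anom.mean(dim=("lat", "lon"))

# Step 4: 5-month centered rolling mean, then standardize (NaNs are skipped)
nino34_smooth = nino34.rolling(time=5, center=True).mean()
nino34_demo_std = (nino34_smooth - nino34_smooth.mean("time")) / nino34_smooth.std("time")

print(nino34_demo_std.sizes, "NaN count:", int(nino34_demo_std.isnull().sum()))
```

The centered 5-month window leaves two missing values at each end of the series, so any later comparison with another time series needs to ignore those points. The box average here is unweighted; within 5° of the equator the effect of latitude weighting is small.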

Remember, we are calculating the ONI, **not** a PCA of the SSTs. After successfully computing the index, reproduce the ONI plot from **Lesson 6, Part 7** in additional cells below; you may copy the script from that lesson. Reproducing the plot also confirms that all the variables you need are defined.

Before proceeding to the next part, ensure that you have set the `nino34_std` variable.

#### Step 5b. Which Wind Mode Resembles the ENSO Signal?

The Niño 3.4 index represents the standardized SST anomaly in the central–eastern equatorial Pacific. ENSO events affect wind patterns across the tropics, often weakening or reversing the trade winds.

In this step, we’ll check which EOF mode of the 10 m wind field shows the strongest similarity
to the Niño 3.4 index — first numerically (correlation), then visually.
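The numerical comparison relies on the Pearson correlation between two time series. The sketch below shows that calculation on synthetic series, with two practical details: missing values (such as those left by a centered rolling mean) are ignored, and one series can be shifted by a lag. The helper `lagged_corr` and the series `driver` and `response` are hypothetical names introduced for illustration only.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag), ignoring NaN pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    ok = ~(np.isnan(x) | np.isnan(y))
    return np.corrcoef(x[ok], y[ok])[0, 1]

# Synthetic example: the response repeats the driver 3 time steps later
rng = np.random.default_rng(7)
t = np.arange(240)
driver = np.sin(2 * np.pi * t / 48) + 0.2 * rng.standard_normal(t.size)
response = np.roll(driver, 3)
response[:3] = np.nan  # values wrapped around by np.roll are not valid

lags = np.arange(-6, 7)
lag_corrs = np.array([lagged_corr(driver, response, k) for k in lags])
best_lag = int(lags[np.argmax(lag_corrs)])

print("Correlation at lag 0:", round(float(lag_corrs[lags == 0][0]), 3))
print("Best lag (steps):", best_lag)
```

A positive lag here means the second series follows the first. Scanning a small range of lags is one simple way to check whether a wind mode leads or trails the SST index.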

In [None]:
components

In [None]:
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

# --- Make sure nino34_std is already computed from Step 5a ---
# We'll align it with the wind data period (up to 2024)
nino34_std['time'] = pd.to_datetime(nino34_std['time'].values).to_period('M').to_timestamp()
nino34_std = nino34_std.sel(time=slice("1970", "2024"))

# --- Step 1: Compute correlation for each EOF mode ---
corrs = []

for mode_num in scores["mode"].values:
    pc = scores.sel(mode=mode_num)  # reuse the PCs extracted in Step 4

    # Smooth and standardize (same as Niño 3.4)
    pc_smooth = pc.rolling(time=5, center=True).mean()
    pc_std = (pc_smooth - pc_smooth.mean(dim='time')) / pc_smooth.std(dim='time')

    # Align in time
    pc_std['time'] = pd.to_datetime(pc_std['time'].values).to_period('M').to_timestamp()
    pc_std = pc_std.sel(time=slice("1970", "2024"))
    pc_std, nino_aligned = xr.align(pc_std, nino34_std, join='inner')

    # Compute correlation
    r = float(xr.corr(pc_std, nino_aligned, dim='time').values)
    corrs.append((mode_num, r))

# --- Step 2: Display results ---
corr_df = pd.DataFrame(corrs, columns=['Mode', 'Correlation']).set_index('Mode')
display(corr_df.sort_values('Correlation', ascending=False))

# --- Step 3: Plot correlation as bar chart ---
corr_df.plot.bar(color='teal', legend=False)
plt.ylabel('Correlation with Niño 3.4')
plt.title('Which Wind EOF Mode Resembles ENSO?')
plt.grid(alpha=0.3)
plt.show()


In [None]:
# --- Choose the mode that looks most similar to the ENSO pattern ---
ens_mode = 

pc_ens = model.scores().sel(mode=ens_mode)

# Apply 5-month centered rolling mean
pc_ens_smooth = pc_ens.rolling(time=5, center=True).mean()

# Standardize (zero mean, unit variance)
pc_ens_std = (pc_ens_smooth - pc_ens_smooth.mean(dim='time')) / pc_ens_smooth.std(dim='time')

# import pandas as pd

pc_ens_std['time'] = pd.to_datetime(pc_ens_std['time'].values).to_period('M').to_timestamp()
nino34_std['time'] = pd.to_datetime(nino34_std['time'].values).to_period('M').to_timestamp()

pc_ens_std = pc_ens_std.sel(time=slice("1970", "2016"))
nino34_std = nino34_std.sel(time=slice("1970", "2016"))


# import matplotlib.pyplot as plt
# import xarray as xr

# Align and plot
pc_ens_std, nino34_std_aligned = xr.align(pc_ens_std, nino34_std, join='inner')

plt.figure(figsize=(10,4))
pc_ens_std.plot(label=f'Wind PC {ens_mode}', color='k')
nino34_std_aligned.plot(label='Niño 3.4 Index', color='r')
plt.legend(); plt.grid(True)
plt.ylabel('Standardized amplitude')
plt.title(f'Wind EOF mode {ens_mode} vs Niño 3.4 (5-month smoothing)')
plt.show()

# Compute correlation
r = xr.corr(pc_ens_std, nino34_std_aligned, dim='time')
print(f'Correlation (pc_ens vs Niño 3.4): {float(r.values):.2f}')


#### Questions

1. How closely do the peaks and troughs of the two series align across time?
2. What does the sign of the correlation (positive or negative) tell you about the wind response to El Niño and La Niña?
3. Do the strongest El Niño years (e.g., 1982–83, 1997–98, 2015–16) appear clearly in your chosen mode’s PC?
4. Does the relationship remain stable across decades, or does it vary?
5. What could explain any lag or mismatch between the wind and SST anomalies?

---

In this exercise, you explored how EOF analysis helps separate different scales of wind variability — the seasonal monsoon cycle and the interannual ENSO influence.
Review your results and reflect on what they reveal about tropical climate dynamics.

### Final Questions
1. How does EOF 1 capture the large-scale monsoon reversal between Amihan and Habagat?
2. What distinguishes your chosen ENSO-related mode (spatially and temporally) from EOF 1?
3. How do the Niño 3.4 index and your wind mode’s PC move together during major El Niño and La Niña events?
4. What might explain periods when their relationship weakens or shifts over time?
5. In what ways do seasonal and interannual variability interact to influence Philippine or western Pacific climate patterns?
6. How could similar EOF methods be extended to study other variables (e.g., pressure, rainfall, SST)?