# PCA Analysis on Spot Yields

This notebook performs **Principal Component Analysis (PCA)** on spot yield data, 
applying the following steps:

1. Generate or load spot yield data.
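2. Compute the **12-period proportionate change** in yields (12 months for monthly data; the synthetic data below is daily, so the lag there is 12 days).
3. Apply a **log transformation** (`log1p`), which turns each proportionate change into a log change and helps stabilize variance.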
4. Standardize the data for PCA.
5. Perform **PCA** to extract principal components.
6. Analyze eigenvectors and explained variance.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Set seed for reproducibility
np.random.seed(42)


In [None]:
# Generate synthetic spot yield data (500 days, 5 maturities)
n_days = 500
maturities = ["1Y", "2Y", "5Y", "10Y", "30Y"]

# Simulate yield movements (base yield + random walk component)
base_yield_curve = np.array([2.0, 2.5, 3.0, 3.5, 4.0])  # Base rates for maturities
random_changes = np.cumsum(np.random.normal(scale=0.02, size=(n_days, len(maturities))), axis=0)
spot_yields = base_yield_curve + random_changes

# Convert to DataFrame
spot_yield_df = pd.DataFrame(spot_yields, columns=maturities)
spot_yield_df.head()
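
Each maturity above follows its own independent random walk, so the simulated data has no built-in cross-maturity correlation and PCA has no common factor to find beyond sampling noise. As a minimal sketch of an alternative, the block below adds a shared "level" shock to every maturity. The one-factor structure, the noise scales, and the names `common_shock`, `idiosyncratic`, and `correlated_yields_df` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Separate generator so the notebook's global seed is left untouched
rng = np.random.default_rng(42)

n_days = 500
maturities = ["1Y", "2Y", "5Y", "10Y", "30Y"]
base_yield_curve = np.array([2.0, 2.5, 3.0, 3.5, 4.0])

# Assumed one-factor structure: one shock shared by all maturities
# plus a smaller maturity-specific shock (scales are arbitrary choices)
common_shock = rng.normal(scale=0.02, size=(n_days, 1))
idiosyncratic = rng.normal(scale=0.005, size=(n_days, len(maturities)))

# Broadcasting adds the shared shock to every column before accumulating
correlated_changes = np.cumsum(common_shock + idiosyncratic, axis=0)
correlated_yields_df = pd.DataFrame(
    base_yield_curve + correlated_changes, columns=maturities
)

# Correlation of daily changes across maturities
daily_corr = correlated_yields_df.diff().dropna().corr()
```

Passing `correlated_yields_df` through the same transformation and PCA steps should concentrate most of the variance in the first component, since every maturity shares the same dominant shock.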


In [None]:
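# Compute the proportionate change over a 12-row lag
# (12 months for monthly data; 12 days for the daily synthetic data above)
prop_change = spot_yield_df.pct_change(periods=12)

# Apply log1p, i.e. log(1 + change) = log(y_t / y_{t-12}).
# This is only defined for changes > -1, so it assumes yields stay positive.
log_transformed = np.log1p(prop_change)

# Drop NaN rows (the first 12 rows have no lagged value to compare against)
log_transformed = log_transformed.dropna()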

# Display transformed data
log_transformed.head()
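
For strictly positive yields, taking `log1p` of a proportionate change is the same as differencing the log yields, because log(1 + (y_t - y_{t-k}) / y_{t-k}) = log(y_t) - log(y_{t-k}). The short check below illustrates this on a made-up toy series; the values and the lag of 2 are hypothetical and chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series of strictly positive yields (in percent)
toy_yields = pd.Series([2.00, 2.10, 2.05, 2.20, 2.15, 2.30])
lag = 2

# Route 1: proportionate change followed by log1p (as in the cell above)
via_log1p = np.log1p(toy_yields.pct_change(periods=lag)).dropna()

# Route 2: difference of log yields over the same lag
via_log_diff = np.log(toy_yields).diff(periods=lag).dropna()

# Both routes should agree up to floating-point error
same = np.allclose(via_log1p, via_log_diff)
print(same)
```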


In [None]:
# Standardize data (PCA is sensitive to scale)
scaler = StandardScaler()
log_transformed_scaled = scaler.fit_transform(log_transformed)

# Perform PCA
pca = PCA()
pca.fit(log_transformed_scaled)

# Explained variance
explained_variance = pca.explained_variance_ratio_

# Eigenvectors: each row of components_ is one principal axis,
# holding unit-length weights on the maturities (one column per maturity)
eigenvectors = pca.components_

# Convert to DataFrame for better readability
pca_results_df = pd.DataFrame(
    eigenvectors,
    columns=maturities,
    index=[f"PC{i+1}" for i in range(len(maturities))]
)

# Display PCA results
pca_results_df
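
Because the data is standardized first, the PCA above is equivalent to an eigendecomposition of the correlation matrix. The sketch below checks that equivalence on stand-in random data (hypothetical, not the notebook's yields): eigenvalues divided by their sum should match `explained_variance_ratio_`, and the eigenvectors should match the rows of `components_` up to sign. All `demo_*` names are introduced here for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 observations of 5 correlated columns
latent = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.25])
demo_data = latent @ rng.normal(size=(5, 5))

# Same pipeline as above: standardize, then fit PCA
demo_scaled = StandardScaler().fit_transform(demo_data)
demo_pca = PCA().fit(demo_scaled)

# Direct route: eigendecomposition of the correlation matrix
demo_corr = np.corrcoef(demo_data, rowvar=False)
demo_eigvals, demo_eigvecs = np.linalg.eigh(demo_corr)

# eigh returns ascending eigenvalues; reorder to descending like PCA
order = np.argsort(demo_eigvals)[::-1]
demo_eigvals = demo_eigvals[order]
demo_eigvecs = demo_eigvecs[:, order]

ratios_match = np.allclose(
    demo_eigvals / demo_eigvals.sum(), demo_pca.explained_variance_ratio_
)
# Eigenvector signs are arbitrary, so compare absolute values
vectors_match = np.allclose(
    np.abs(demo_eigvecs.T), np.abs(demo_pca.components_), atol=1e-6
)
print(ratios_match, vectors_match)
```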


In [None]:
# Plot explained variance
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(maturities) + 1), explained_variance, marker="o", linestyle="-")
plt.xlabel("Principal Component")
plt.ylabel("Explained Variance Ratio")
plt.title("Explained Variance by Principal Component")
plt.grid()
plt.show()
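
A common follow-up to the plot above is to ask how many components are needed to reach a given share of total variance. The helper below is a minimal sketch of that calculation; `components_needed`, its 90% default threshold, and the example ratios are hypothetical and are not results from this notebook.

```python
import numpy as np

def components_needed(variance_ratios, threshold=0.90):
    """Return how many leading components reach `threshold` cumulative variance."""
    cumulative = np.cumsum(variance_ratios)
    # searchsorted gives the first index where cumulative >= threshold
    count = int(np.searchsorted(cumulative, threshold)) + 1
    # Clamp in case rounding leaves the final cumulative sum just below threshold
    return min(count, len(cumulative))

# Hypothetical ratios for illustration only
example_ratios = np.array([0.60, 0.25, 0.10, 0.05])
example_count = components_needed(example_ratios, threshold=0.90)
print(example_count)
```

In this notebook, the same helper could be called as `components_needed(explained_variance)`.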
