# Systemic Risk Index (SRI) Creation

This notebook constructs the Systemic Risk Index (SRI) by applying Principal Component Analysis (PCA) to a set of key financial stress indicators.

The process involves the following steps:

1.  **Load Data**: The cleaned, weekly market data from the previous phase is loaded.
2.  **Select Inputs**: Three indicators are selected as inputs for the index:
    *   `VIX`: CBOE Volatility Index
    *   `MOVE`: Merrill Lynch Option Volatility Estimate (Treasury Volatility)
    *   `BAMLC0A0CMEY`: ICE BofA US Corporate Index Effective Yield
3.  **Standardize Inputs**: The selected indicators are standardized using `StandardScaler` to convert them to z-scores. This ensures each variable has a mean of 0 and a standard deviation of 1, preventing any single indicator from dominating the analysis due to its scale.
4.  **Apply PCA**: Principal Component Analysis is performed on the standardized data to identify the primary axis of shared variance among the indicators.
5.  **Extract & Orient PC1**: The first principal component (PC1) is extracted. Because the sign of a principal component is arbitrary, we check its loadings to ensure that a higher index value corresponds to higher risk (e.g., a positive loading on `VIX`). If the orientation is inverted, we multiply the component by -1.
6.  **Rescale to 0-100**: The final, oriented index is rescaled to a more intuitive 0-100 range using `MinMaxScaler`, where 0 represents the lowest systemic risk in the sample period and 100 represents the highest.
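Steps 3 to 6 can be sketched on synthetic data to make the orientation rule and the resulting value ranges concrete. This is a minimal sketch, not the notebook's data: the `toy` frame, the latent `stress` driver, and all numeric constants are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical stand-in for the three indicators: one shared "stress" driver plus noise.
rng = np.random.default_rng(0)
stress = rng.normal(size=200)
toy = pd.DataFrame(
    {
        "VIX": 20 + 8 * stress + rng.normal(scale=2, size=200),
        "MOVE": 90 + 25 * stress + rng.normal(scale=8, size=200),
        "BAMLC0A0CMEY": 4 + 1.0 * stress + rng.normal(scale=0.3, size=200),
    }
)

# Step 3: z-scores (StandardScaler divides by the population standard deviation).
z = StandardScaler().fit_transform(toy)

# Step 4: keep only the first principal component.
pca = PCA(n_components=1)
pc1 = pca.fit_transform(z).ravel()
loadings = pd.Series(pca.components_[0], index=toy.columns)

# Step 5: flip the sign if VIX loads negatively, so that higher means riskier.
if loadings["VIX"] < 0:
    pc1 = -pc1
    loadings = -loadings

# Step 6: map the oriented scores onto 0-100 within this sample.
sri = MinMaxScaler(feature_range=(0, 100)).fit_transform(pc1.reshape(-1, 1)).ravel()

print(loadings)
print(sri.min(), sri.max())
```

Because the rescaling uses the sample minimum and maximum, the 0 and 100 endpoints move whenever the sample period changes.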

In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

In [3]:
# Load Data
df = pd.read_csv("../data/cleaned_market_data.csv", index_col=0, parse_dates=True)

In [4]:
df.head()

Unnamed: 0,GLD,SPY,TLT,UUP,MOVE,VIX,T10Y2Y,BAMLC0A0CMEY,USRECP
2007-03-02,63.709999,138.669998,90.199997,24.959999,76.699997,18.610001,-0.04,5.46,0.0
2007-03-09,64.25,140.779999,89.400002,25.16,63.599998,14.09,-0.07,5.55,0.0
2007-03-16,64.620003,138.529999,89.849998,24.870001,67.0,16.790001,-0.03,5.52,0.0
2007-03-23,65.150002,143.389999,88.790001,24.93,68.699997,12.95,0.02,5.58,0.0
2007-03-30,65.739998,142.0,88.279999,24.790001,67.900002,14.64,0.07,5.6,0.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 959 entries, 2007-03-02 to 2025-07-11
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   GLD           959 non-null    float64
 1   SPY           959 non-null    float64
 2   TLT           959 non-null    float64
 3   UUP           959 non-null    float64
 4   MOVE          959 non-null    float64
 5   VIX           959 non-null    float64
 6   T10Y2Y        959 non-null    float64
 7   BAMLC0A0CMEY  959 non-null    float64
 8   USRECP        959 non-null    float64
dtypes: float64(9)
memory usage: 74.9 KB
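The `df.info()` output shows all 959 rows are non-null in every column, which matters because scikit-learn's `PCA` rejects input containing NaN. A small guard such as the one below could make that assumption explicit before the index is built. The helper name `check_pca_inputs` and the toy values are hypothetical and are not part of the notebook.

```python
import numpy as np
import pandas as pd

REQUIRED = ["VIX", "MOVE", "BAMLC0A0CMEY"]


def check_pca_inputs(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return frame[columns] after confirming the columns exist and contain no NaN."""
    missing = [c for c in columns if c not in frame.columns]
    if missing:
        raise KeyError(f"Missing input columns: {missing}")
    subset = frame[columns]
    nan_counts = subset.isna().sum()
    if nan_counts.any():
        raise ValueError(f"NaN values found:\n{nan_counts[nan_counts > 0]}")
    return subset


# Illustrative three-row frame (made-up values, not the real dataset).
toy = pd.DataFrame(
    {
        "VIX": [18.6, 14.1, 16.8],
        "MOVE": [76.7, 63.6, 67.0],
        "BAMLC0A0CMEY": [5.46, 5.55, 5.52],
    }
)
clean = check_pca_inputs(toy, REQUIRED)

# A single missing value should be caught before it reaches PCA.
broken = toy.copy()
broken.loc[1, "MOVE"] = np.nan
try:
    check_pca_inputs(broken, REQUIRED)
    nan_detected = False
except ValueError:
    nan_detected = True
print(nan_detected)
```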


In [None]:
# STEP 2: Select the inputs for the index
risk_factors = df[["VIX", "MOVE", "BAMLC0A0CMEY"]]

# STEP 3: Standardize the inputs to z-scores
scaler = StandardScaler()
scaled_factors = scaler.fit_transform(risk_factors)

# STEP 4: Apply PCA, keeping only the first principal component
pca = PCA(n_components=1)
principal_component = pca.fit_transform(scaled_factors)

# Put the raw component scores into a Series aligned with the original dates
sri_raw = pd.Series(
    principal_component.flatten(), index=risk_factors.index, name="SRI_raw"
)

# STEP 5: Extract and orient PC1
# The sign of a principal component is arbitrary, so check the loadings:
# a higher index value should mean higher risk (positive loading on VIX).
loadings = pd.Series(pca.components_[0], index=risk_factors.columns)
if loadings["VIX"] < 0:
    loadings = -loadings
    sri_raw = -sri_raw
print("PCA Component Loadings:")
print(loadings)

# STEP 6: Rescale the oriented index to a 0-100 range
min_max_scaler = MinMaxScaler(feature_range=(0, 100))
sri_scaled = min_max_scaler.fit_transform(sri_raw.to_numpy().reshape(-1, 1))

# Add the rescaled index to the main DataFrame
df["SRI"] = sri_scaled.ravel()

# The 0-100 version is for easier dashboarding; the unscaled `sri_raw` is the one to use for correlation analysis.
print("\nFirst 5 values of the new Systemic Risk Index:")
print(df["SRI"].head())

PCA Component Loadings:
VIX             0.517378
MOVE            0.626638
BAMLC0A0CMEY    0.582791
dtype: float64

First 5 values of the new Systemic Risk Index:
2007-03-02    20.405234
2007-03-09    15.742449
2007-03-16    17.796485
2007-03-23    16.167461
2007-03-30    17.062623
Name: SRI, dtype: float64
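All three printed loadings are positive, so PC1 already points in the higher-risk direction and no sign flip is needed for this sample. A natural follow-up check, which the notebook does not perform, is how much of the total variance PC1 captures. The sketch below shows one way to read that from `explained_variance_ratio_`. The helper `summarize_pc1` and the synthetic `toy` data are assumptions for illustration, so the printed numbers say nothing about the real index.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def summarize_pc1(frame: pd.DataFrame) -> tuple[float, pd.Series]:
    """Return PC1's share of total variance and its loadings for the given columns."""
    z = StandardScaler().fit_transform(frame)
    pca = PCA(n_components=1).fit(z)
    loadings = pd.Series(pca.components_[0], index=frame.columns)
    return float(pca.explained_variance_ratio_[0]), loadings


# Synthetic indicators sharing one common driver (hypothetical data).
rng = np.random.default_rng(42)
common = rng.normal(size=300)
toy = pd.DataFrame(
    {
        "VIX": common + rng.normal(scale=0.5, size=300),
        "MOVE": common + rng.normal(scale=0.5, size=300),
        "BAMLC0A0CMEY": common + rng.normal(scale=0.5, size=300),
    }
)

share, toy_loadings = summarize_pc1(toy)
print(f"PC1 share of total variance: {share:.3f}")
print(f"Sum of squared loadings: {(toy_loadings ** 2).sum():.6f}")
```

The loading vector has unit length, so the squared loadings sum to 1. With three standardized inputs, a PC1 share close to 1/3 would suggest the indicators have little in common, while a share near 1 would suggest they move almost in lockstep.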


In [31]:
# Save the updated DataFrame
df.to_csv("../data/systemic_risk_index.csv")