# Hidden Regimes in Correlation Space
### Premise:
Correlation structure between assets, rather than price or volatility alone, defines market regimes. When average pairwise correlations spike, markets enter a “systemic” regime. The goal is to test whether transitions into such regimes precede volatility clustering or crashes.

### Hypothesis:
Market regimes are better identified by cross-sectional correlation structure than by single time-series levels; when average pairwise correlations and network centrality spike, the market is in a “systemic” regime that precedes volatility clustering and large drawdowns.
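
To make “average pairwise correlation” concrete, here is a minimal sketch that computes it over a rolling window. The helper name `avg_pairwise_corr`, the 60-day window, and the synthetic two-regime returns are illustrative assumptions, not part of the study's pipeline.

```python
import numpy as np
import pandas as pd

def avg_pairwise_corr(returns: pd.DataFrame, window: int = 60) -> pd.Series:
    """Rolling mean of the off-diagonal entries of the correlation matrix."""
    n_assets = returns.shape[1]
    upper = np.triu_indices(n_assets, k=1)  # index pairs (i, j) with i < j
    values = returns.to_numpy()
    out = np.full(len(returns), np.nan)     # NaN until a full window is available
    for end in range(window, len(returns) + 1):
        corr = np.corrcoef(values[end - window:end], rowvar=False)
        out[end - 1] = corr[upper].mean()
    return pd.Series(out, index=returns.index, name="avg_pairwise_corr")

# Synthetic demo (assumed data): one common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
n_days, n_assets = 300, 5
common = rng.standard_normal(n_days)
idio = rng.standard_normal((n_days, n_assets))
# Weak factor loading in the first half (calm), strong in the second (systemic).
loading = np.where(np.arange(n_days) < 150, 0.2, 2.0)[:, None]
demo_returns = pd.DataFrame(
    0.01 * (loading * common[:, None] + idio),
    index=pd.bdate_range("2020-01-01", periods=n_days),
)
demo_signal = avg_pairwise_corr(demo_returns, window=60)
print(demo_signal.iloc[[149, -1]])
```

On real data the same function could be applied to `etf_returns`; the window length trades responsiveness against estimation noise and would need to be tuned.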

### Research Question:
Can we detect latent market regimes by clustering the time-varying correlation structure across major sectors/assets, and do transitions to a “high-correlation” regime predict subsequent volatility spikes or market crashes?
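
One way the clustering step might look, as a rough sketch: flatten each rolling correlation matrix into its upper-triangle entries and group the windows into two clusters. The version below uses a hand-rolled two-cluster Lloyd's algorithm (k-means) in NumPy to stay dependency-free. The function names, the choice of two clusters, and the window length are all assumptions; a library clusterer or a hidden Markov model could be swapped in.

```python
import numpy as np

def corr_features(returns: np.ndarray, window: int) -> np.ndarray:
    """One row per window: upper-triangle entries of that window's correlation matrix."""
    n_obs, n_assets = returns.shape
    upper = np.triu_indices(n_assets, k=1)
    rows = [
        np.corrcoef(returns[end - window:end], rowvar=False)[upper]
        for end in range(window, n_obs + 1)
    ]
    return np.asarray(rows)

def two_means(features: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """Lloyd's algorithm with k=2, seeded from the lowest- and highest-correlation windows."""
    level = features.mean(axis=1)
    centers = features[[level.argmin(), level.argmax()]].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # Distance of every window to both centers, then nearest-center assignment
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = features[labels == k].mean(axis=0)
    return labels

# Synthetic demo (assumed data): calm first half, highly correlated second half.
rng = np.random.default_rng(0)
n_days, n_assets, window = 300, 5, 60
common = rng.standard_normal((n_days, 1))
idio = rng.standard_normal((n_days, n_assets))
loading = np.where(np.arange(n_days) < 150, 0.2, 2.0)[:, None]
demo_returns = 0.01 * (loading * common + idio)

features = corr_features(demo_returns, window)
labels = two_means(features)  # seeded so that 0 starts as the low-correlation cluster
switches = np.flatnonzero(np.diff(labels) != 0) + 1  # window positions where the label changes
print(len(labels), labels[0], labels[-1], switches)
```

Each label refers to the window ending on that day, so a regime switch detected this way lags the true change by some fraction of the window length.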

### Simple Alternatives:
Does this approach beat just watching VIX or simple correlation metrics?
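
Comparing a regime signal against a baseline such as VIX requires a common target. One possible setup, sketched below, scores each candidate by its rank correlation with forward realized volatility. The helper names `forward_realized_vol` and `rank_corr`, the 21-day horizon, and the synthetic series are assumptions for illustration; the real comparison would plug in VIX and the correlation-based signal.

```python
import numpy as np
import pandas as pd

def forward_realized_vol(returns: pd.Series, horizon: int = 21) -> pd.Series:
    """Annualized std of the NEXT `horizon` daily returns (today's return excluded)."""
    trailing = returns.rolling(horizon).std() * np.sqrt(252)
    return trailing.shift(-horizon)  # value at t now covers t+1 .. t+horizon

def rank_corr(signal: pd.Series, target: pd.Series) -> float:
    """Spearman rank correlation on the dates where both series are available."""
    both = pd.concat([signal, target], axis=1).dropna()
    return both.iloc[:, 0].rank().corr(both.iloc[:, 1].rank())

# Synthetic demo (assumed data): volatility quadruples halfway through the sample.
rng = np.random.default_rng(1)
idx = pd.bdate_range("2021-01-01", periods=500)
sigma = pd.Series(np.where(np.arange(500) < 250, 0.005, 0.02), index=idx)
demo_returns = sigma * rng.standard_normal(500)
target = forward_realized_vol(demo_returns, horizon=21)

informative = sigma  # stands in for a signal that tracks the volatility regime
uninformative = pd.Series(rng.standard_normal(500), index=idx)  # pure noise baseline

score_informative = rank_corr(informative, target)
score_uninformative = rank_corr(uninformative, target)
print(score_informative, score_uninformative)
```

Because the target looks only at future returns, a signal computed from data up to day t is scored without look-ahead.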

## Clean and Collect Data
Assets:
- Broad equities: SPY (S&P 500), IWM (Russell 2000)
- Sector ETFs: XLK (Tech), XLF (Financials), XLE (Energy), XLY (Consumer Discretionary), XLP (Staples), XLI (Industrials), XLV (Health Care), XLU (Utilities), XLB (Materials)
- Other asset classes: TLT (long Treasuries), HYG (high yield), GLD (gold), USO (oil), BTC-USD (crypto)

Data:
- yfinance for ETF prices
- FRED for macroeconomic series

Frequency:
- daily close prices

Time Span:
- 2013 to present, to cover past regime transitions (2015, 2018, 2020, 2022)

In [None]:
import pandas as pd
import numpy as np
import yfinance as yf
from fredapi import Fred
from datetime import date

In [None]:
import os

# Read the key from the environment; never commit API keys to a notebook.
FRED_API_KEY = os.environ["FRED_API_KEY"]
fred = Fred(api_key=FRED_API_KEY)

start_date = "2013-01-01"
end_date = date.today().isoformat()

In [None]:
etf_tickers = [
    "SPY",  # broad market
    "XLK", "XLF", "XLE", "XLY", "XLP", "XLI", "XLV", "XLU", "XLB", "IWM"
]

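# auto_adjust=True puts split- and dividend-adjusted prices in "Close".
# Newer yfinance releases default to this and no longer return "Adj Close".
etf_data = yf.download(etf_tickers, start=start_date, end=end_date, auto_adjust=True, progress=False)["Close"]
etf_data = etf_data.ffill()  # carry the last price across isolated missing days

# Simple daily returns; drop rows with missing values (at least the first row)
etf_returns = etf_data.pct_change().dropna()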

In [None]:
macro_series = {
    "DGS10": "10Y_Treasury_Yield",
    "FEDFUNDS": "Fed_Funds_Rate",
    "CPIAUCSL": "CPI",
    "INDPRO": "Industrial_Production",
    "UNRATE": "Unemployment_Rate"
}

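# Fetch each series, then concatenate on the union of dates. Assigning columns
# one by one into an empty DataFrame aligns everything to the first series'
# index and drops monthly observations dated on weekends.
macro_data = pd.concat(
    {
        label: fred.get_series(series_id, observation_start=start_date, observation_end=end_date)
        for series_id, label in macro_series.items()
    },
    axis=1,
).sort_index()

# Put everything on a daily calendar and carry the last known value forward
macro_data = macro_data.resample("D").last().ffill()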
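
A small illustration of an alignment pitfall when daily and monthly series are combined: adding columns one at a time to an empty DataFrame aligns later columns to the first column's index, so monthly observations dated on a weekend can silently disappear. The dates and values below are made up for the demonstration.

```python
import pandas as pd

# Business-day series (stands in for a daily yield) and a monthly series
# stamped on the first of each month (stands in for CPI or unemployment).
daily = pd.Series(1.0, index=pd.bdate_range("2023-01-02", "2023-04-28"))
monthly = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01"]),
)  # 2023-01-01 is a Sunday and 2023-04-01 is a Saturday

# Column-by-column assignment: the frame adopts the daily index, so the two
# weekend-dated monthly observations are discarded.
by_assignment = pd.DataFrame()
by_assignment["daily"] = daily
by_assignment["monthly"] = monthly

# Concatenation keeps the union of both indexes, so nothing is lost.
by_concat = pd.concat({"daily": daily, "monthly": monthly}, axis=1).sort_index()

kept_by_assignment = int(by_assignment["monthly"].notna().sum())
kept_by_concat = int(by_concat["monthly"].notna().sum())
print(kept_by_assignment, kept_by_concat)
```

After forward-filling, the assignment version would still show the March value throughout April, which is a stale figure rather than an obvious gap.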

In [None]:
merged_df = etf_returns.merge(macro_data, left_index=True, right_index=True, how="left")
merged_df = merged_df.ffill()

In [None]:
print("ETF returns shape:", etf_returns.shape)
print("Macro data shape:", macro_data.shape)
print("Merged dataset shape:", merged_df.shape)

merged_df.tail()