In [None]:
import pandas as pd
import numpy as np
import yfinance as yf
import plotly.express as px

# 1. Data Preparation

Before performing any financial analysis, raw market data must be properly prepared and validated.

Financial datasets often contain missing observations, inconsistent timestamps, or structural differences across assets. 
If not handled correctly, these issues can introduce biases, forward-looking errors, or misleading statistical results.

The objective of the data preparation stage is to transform raw price data into a clean and consistent dataset suitable for quantitative analysis.

This stage includes:

- Loading historical market data
- Validating data integrity
- Handling missing values
- Aligning assets across time
- Computing returns

Proper data preparation ensures that all subsequent risk, performance, and statistical metrics are reliable and comparable across assets.

## 1.1 Data Loading

In this section, we load historical price data for the selected assets.

The dataset contains daily closing prices representing different segments of global financial markets. 
Each asset serves as a proxy for a specific risk factor or market exposure.

Example asset representation:

- Equity Markets (e.g., SPY, QQQ)
- Energy Sector (e.g., XLE)
- Digital Assets (e.g., BTC)

Using closing prices allows consistent comparison across assets and avoids intraday noise.

All assets are combined into a single time-indexed dataframe, where:

- Rows represent trading dates
- Columns represent individual assets
- Values correspond to adjusted closing prices

This structure enables synchronized time-series analysis across markets.

In [None]:
assets = {
    "SPY": "SPY",        # broad US equity market
    "QQQ": "QQQ",        # technology
    "XLE": "XLE",        # energy
    "BTC": "BTC-USD"     # crypto / global liquidity
}

In [None]:
data = yf.download(
    list(assets.values()),
    start="2015-01-01",
    auto_adjust=True
)["Close"]

data.tail()

## 1.2 Data Validation

Before performing any statistical analysis, data integrity must be verified.

Financial time series frequently contain missing observations due to:

- Different asset inception dates
- Market holidays
- Trading suspensions
- Data download inconsistencies

Missing values (NaNs) can severely distort statistical calculations such as:

- Returns
- Volatility
- Sharpe Ratio
- Correlation
- Momentum indicators

Therefore, we validate the dataset by:

1. Checking for missing values
2. Inspecting data availability across assets
3. Aligning time series
4. Removing incomplete observations when necessary

This step ensures that all subsequent calculations are based on synchronized and reliable data.
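
The checks listed above can be sketched on a small synthetic frame. This is a minimal illustration, not the notebook's actual pipeline: the tickers `AAA` and `BBB`, the prices, and the helper name `validate_prices` are all made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded price frame (values are made up).
dates = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.DataFrame(
    {
        "AAA": [100.0, 101.0, np.nan, 103.0, 104.0, 105.0],  # one gap
        "BBB": [np.nan, np.nan, 50.0, 51.0, 52.0, 53.0],     # later inception
    },
    index=dates,
)


def validate_prices(df: pd.DataFrame) -> dict:
    """Collect basic integrity checks for a date-indexed price frame."""
    return {
        "missing_per_asset": df.isna().sum().to_dict(),
        "first_valid_date": {c: df[c].first_valid_index() for c in df.columns},
        "index_is_sorted": df.index.is_monotonic_increasing,
        "duplicate_dates": int(df.index.duplicated().sum()),
        "non_positive_prices": int((df <= 0).sum().sum()),
    }


report = validate_prices(prices)

# Keep only the dates on which every asset has a price.
aligned = prices.dropna()
print(report)
print(aligned)
```

Dropping incomplete rows is the simplest alignment rule; it shortens the sample to the history of the youngest asset, which is worth keeping in mind when comparing long-run statistics.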

In [None]:
# ===============================
# DATA VALIDATION
# ===============================
data.info()  # info() prints its summary directly and returns None

print("Dataset shape:", data.shape)
print("\nMissing values per asset:")
print(data.isna().sum())

print("\nDate range:")
print(data.index.min(), "->", data.index.max())



In [None]:


# ===============================
# DATA CLEANING
# ===============================

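# Keep only dates on which every asset has a price
# (this also drops weekend rows, where only BTC trades)
data = data.dropna()
print("Info after cleaning:")
data.info()  # info() prints directly and returns None
print("Shape after cleaning:", data.shape)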

In [None]:
fig = px.line(
    data,
    title="Asset Prices"
)

fig.show()

In [None]:
normalized = data / data.iloc[0]

fig = px.line(
    normalized,
    title="Normalized Performance (Base = 1)"
)

fig.show()

In [None]:
fig = px.line(
    normalized,
    title="Normalized Performance (Log Scale)",
    log_y=True
)

fig.show()

In [None]:
recent = data.loc["2022":]
normalized_recent = recent / recent.iloc[0]

px.line(
    normalized_recent,
    title="Normalized Performance since 2022",
    log_y=True
)

## 1.3 Return Calculation

Asset prices themselves are non-stationary and cannot be directly compared across assets.

To perform statistical analysis, prices must be transformed into returns.

Returns measure the relative change in price between two consecutive periods and represent the fundamental unit of financial analysis.

We compute logarithmic returns:

$$
r_t = \ln\left(\frac{P_t}{P_{t-1}}\right)
$$

Where:

- $P_t$ = price at time t
- $P_{t-1}$ = price at time t-1

Log returns are preferred because they:

- Are time additive
- Improve statistical stability
- Allow aggregation across periods
- Are widely used in quantitative finance and risk modeling

All subsequent risk and performance metrics are derived from returns rather than prices.
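
To make the time-additivity property concrete, the sketch below uses four made-up prices and checks that summing log returns recovers the same total performance as compounding simple returns. The numbers are illustrative only.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 105.0, 99.75, 110.0])  # made-up prices

log_returns = np.log(prices / prices.shift(1)).dropna()
simple_returns = prices.pct_change().dropna()

# Log returns aggregate by summation ...
total_log = log_returns.sum()

# ... while simple returns aggregate by compounding.
total_simple = (1 + simple_returns).prod() - 1

print("Sum of log returns:     ", total_log)
print("ln(P_T / P_0):          ", np.log(prices.iloc[-1] / prices.iloc[0]))
print("exp(sum) - 1:           ", np.exp(total_log) - 1)
print("Compounded simple return:", total_simple)
```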

### Log Returns Calculation

In [None]:
returns = np.log(data / data.shift(1))

returns = returns.dropna()

returns.head()

In [None]:
from pathlib import Path

BASE_DIR = Path().resolve().parent
DATA_DIR = BASE_DIR / "data"

DATA_DIR.mkdir(exist_ok=True)

data.to_csv(DATA_DIR / "market_prices.csv")
returns.to_csv(DATA_DIR / "market_returns.csv")

### Log Returns Visualization

We visualize log returns to inspect volatility clustering,
extreme events, and potential data anomalies before computing
risk metrics.



In [None]:
fig = px.line(
    returns,
    title="Log Returns",
)

fig.add_hline(y=0, line_dash="dash")

fig.show()

### Cumulative Returns

Cumulative performance is computed by summing log returns
and exponentiating the result:

$$
Cumulative\ Return_t = \exp\left(\sum_{s=1}^{t} r_s\right)
$$

This represents the growth of $1 invested through time.

In [None]:
cumulative_returns = np.exp(returns.cumsum())

fig = px.line(
    cumulative_returns,
    title="Cumulative Performance (Growth of $1)"
)

fig.show()


### Normalized Prices

Prices are normalized by dividing by the initial value,
allowing direct comparison of asset performance regardless
of their absolute price level.

In [None]:
normalized_prices = data / data.iloc[0]

fig = px.line(
    normalized_prices,
    title="Normalized Prices"
)

fig.show()

In [None]:
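# `returns` holds log returns, so compound them by summing within the window
# and exponentiating; the result is the growth factor over the last 252 days
rolling_perf = np.exp(returns.rolling(252).sum())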

In [None]:
px.line(
    rolling_perf,
    title="Rolling 1Y Performance"
)

# 2. Return Distribution & Tail Risk

Financial returns are not normally distributed.

This section analyzes the statistical properties of asset returns,
focusing on distribution shape and extreme events (tail risk).

Understanding return distributions helps identify asymmetry,
fat tails, and crash susceptibility across assets.

## 2.1 Return Distribution Statistics

This section characterizes the statistical properties of asset returns.

Understanding the distribution of returns is essential before performing
risk analysis, regime detection, or portfolio construction.

We compute the annualized mean return and annualized volatility
assuming 252 trading days per year.

### 2.1.1 Annualized Mean Return

The mean return represents the average return of an asset per period.
Because `returns` holds log returns, this is the mean daily log return.

Since returns are computed at daily frequency, we annualize the mean
assuming 252 trading days per year.

Mathematically:

$$
\mu_{annual} = \mu_{daily} \times 252
$$

Where:

- $\mu_{daily}$ = average daily return
- 252 = approximate number of trading days per year

Annualized mean return provides an estimate of the long-term drift
of an asset.
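
Since the annualized figure is built from log returns, it is a log drift rather than a simple percentage return. The sketch below converts one into the other, using a hypothetical daily mean of 0.0004 that is not taken from the dataset.

```python
import numpy as np

TRADING_DAYS = 252

mu_daily_log = 0.0004  # hypothetical mean daily log return

# Annualize by scaling, as in the formula above.
mu_annual_log = mu_daily_log * TRADING_DAYS

# Convert the annual log drift into the equivalent simple annual return.
annual_simple = np.expm1(mu_annual_log)

print(f"Annualized log return:    {mu_annual_log:.4f}")
print(f"Equivalent simple return: {annual_simple:.4f}")
```

The gap between the two numbers grows with the size of the drift, so the distinction matters most for high-return assets.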

In [None]:
mean_annual = returns.mean() * 252
mean_annual

### 2.1.2 Annualized Volatility

Volatility measures the dispersion of returns and represents
the risk of an asset.

We compute volatility as the standard deviation of daily returns
and annualize it assuming 252 trading days per year.

Mathematically:

$$
\sigma_{annual} = \sigma_{daily} \times \sqrt{252}
$$

Where:

- $\sigma_{daily}$ = standard deviation of daily returns
- $\sqrt{252}$ = annualization factor

Annualized volatility allows comparison of risk levels across assets.

In [None]:
volatility_annual = returns.std() * (252 ** 0.5)
volatility_annual


### 2.1.3 Skewness

Skewness measures the asymmetry of return distributions.

Positive skew indicates large upside tail events,
while negative skew suggests crash-prone behavior.

In [None]:
skewness = returns.skew()

skewness

All analyzed assets exhibit negative skewness,
indicating asymmetric downside risk.

Returns tend to experience occasional large losses
rather than extreme positive shocks.

This confirms that financial markets are crash-prone
and deviate significantly from normal distributions.

### 2.1.4 Kurtosis

Kurtosis measures tail heaviness relative to a normal distribution.

High kurtosis indicates frequent extreme events
and elevated tail risk.

In [None]:
kurtosis = returns.kurtosis()

kurtosis

Kurtosis measures the frequency of extreme return events
relative to a normal distribution.

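A normal distribution has a kurtosis of 3. pandas `kurtosis()` uses Fisher's
definition and reports excess kurtosis (kurtosis minus 3), so the benchmark
for the values above is 0 rather than 3.

All analyzed assets exhibit positive excess kurtosis,
indicating fat tails and frequent extreme market moves.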

This confirms that financial returns are not normally distributed
and are characterized by crash risk and volatility clustering.
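
As a quick sanity check on how to read these numbers, the sketch below compares simulated normal data with a fat-tailed Student-t sample. The sample size, seed, and choice of 5 degrees of freedom are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000

samples = pd.DataFrame(
    {
        "normal": rng.standard_normal(n),
        "student_t5": rng.standard_t(df=5, size=n),  # fat-tailed benchmark
    }
)

# pandas reports excess kurtosis: a normal sample should be close to 0.
excess_kurt = samples.kurtosis()

# Adding 3 recovers the raw (Pearson) kurtosis.
raw_kurt = excess_kurt + 3

print(pd.DataFrame({"excess": excess_kurt, "raw": raw_kurt}))
```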

## 2.2 Tail Risk Visualization

Financial returns are not normally distributed and often exhibit fat tails,
meaning extreme events occur more frequently than predicted by a Gaussian distribution.

Tail risk refers to the probability of large negative or positive market moves.

Visualizing return distributions helps identify:

- Crash susceptibility
- Extreme volatility regimes
- Asymmetric risk behavior
- Presence of outliers

Understanding tail behavior is critical for portfolio construction and risk management.
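
One simple way to quantify fat tails alongside the histograms is to count how often standardized returns exceed a threshold and compare that with the Gaussian expectation. The sketch below runs on simulated Student-t data, not on the notebook's `returns`; the helper name `tail_frequency` and the 3-sigma threshold are assumptions.

```python
import math

import numpy as np
import pandas as pd


def tail_frequency(r: pd.Series, k: float = 3.0) -> tuple[float, float]:
    """Return (empirical, Gaussian) probability of a move beyond k std devs."""
    z = (r - r.mean()) / r.std()
    empirical = float((z.abs() > k).mean())
    gaussian = math.erfc(k / math.sqrt(2))  # two-sided normal tail probability
    return empirical, gaussian


rng = np.random.default_rng(1)
fat_tailed = pd.Series(rng.standard_t(df=4, size=100_000))  # simulated returns

empirical, gaussian = tail_frequency(fat_tailed)
print(f"Empirical P(|z| > 3): {empirical:.4%}")
print(f"Gaussian  P(|z| > 3): {gaussian:.4%}")
```

Applied column by column to real returns, the ratio of the two probabilities gives a rough measure of how much more often extreme days occur than a normal model would suggest.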

In [None]:
import plotly.express as px

returns_long = returns.reset_index().melt(
    id_vars="Date",
    var_name="Asset",
    value_name="Return"
)

px.histogram(
    returns_long,
    x="Return",
    facet_col="Asset",
    facet_col_wrap=2,
    nbins=150,
    title="Return Distribution - Tail Visualization"
)

In [None]:
for col in returns.columns:
    
    fig = px.histogram(
        returns,
        x=col,
        nbins=200,
        title=f"{col} Return Distribution"
    )
    
    fig.show()


### Interpretation

If returns were normally distributed, tails would decay rapidly.

However, financial markets typically exhibit:

- Fat left tails (large crashes)
- Occasional extreme positive returns
- Non-symmetric distributions

Assets with heavier tails are exposed to higher tail risk
and require stronger risk management.

Tail behavior can be quantitatively described using:

- Skewness → asymmetry of returns
- Kurtosis → thickness of distribution tails

Higher kurtosis indicates higher probability of extreme events.

In [None]:
skewness = returns.skew()
kurtosis = returns.kurtosis()

skewness, kurtosis


## 2.3 Drawdowns & Crash Behaviour

While return distributions describe daily behavior, drawdowns measure the real
experience of investors during market declines.

A drawdown represents the percentage decline from a previous peak in cumulative performance.

Mathematically:

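$$
Drawdown_t = \frac{V_t}{\max_{s \le t} V_s} - 1
$$

Where:

- $V_t$ = portfolio value (cumulative performance) at time t
- $\max_{s \le t} V_s$ = running maximum of the value up to time t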

Drawdowns allow us to evaluate:

- Maximum loss experienced by investors
- Market crash severity
- Recovery dynamics
- Risk persistence across regimes

Maximum Drawdown (MDD) is one of the most important risk metrics used by
portfolio managers and institutional investors.
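
The definition can be traced by hand on a short value path. The six values below are made up; the point is only to show how the running maximum and the drawdown series relate.

```python
import pandas as pd

# Made-up portfolio values: a peak at 120, a fall to 90, then a new peak at 130.
value = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 117.0])

running_max = value.cummax()          # highest value seen so far
drawdown = value / running_max - 1    # 0 at a peak, negative below it
max_drawdown = drawdown.min()         # deepest decline from any prior peak

print(pd.DataFrame({"value": value, "running_max": running_max, "drawdown": drawdown}))
print("Max drawdown:", max_drawdown)
```

Here the deepest decline is the fall from 120 to 90, a drawdown of 25%.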

### 2.3.1 Depth

In [None]:
cum_returns = np.exp(returns.cumsum())
running_max = cum_returns.cummax()

drawdowns = cum_returns / running_max - 1
px.line(
    drawdowns,
    title="Asset Drawdowns"
)



In [None]:
max_drawdown = drawdowns.min()

max_drawdown.to_frame(name="Max Drawdown")

### 2.3.2 Duration

Drawdown duration measures how long an asset remains below its previous peak.

While drawdown magnitude measures *how much* an asset loses,
drawdown duration measures *how long investors must wait to recover losses*.

This is a critical risk metric because prolonged recovery periods
increase behavioral risk and capital lock-up.

Mathematically:

Drawdown Duration = Number of consecutive periods
during which price remains below its previous maximum.

Long durations indicate slow recoveries and higher psychological risk.
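
The counting logic can be sketched on a single made-up value path. The helper name `drawdown_duration` is hypothetical; it counts consecutive periods below the running peak and resets to zero whenever a new peak is reached.

```python
import pandas as pd


def drawdown_duration(value: pd.Series) -> pd.Series:
    """Count consecutive periods spent below the previous peak."""
    underwater = value < value.cummax()
    # Every period at a peak starts a new block; underwater periods
    # that follow share that block id until the next peak.
    block = (~underwater).cumsum()
    return underwater.astype(int).groupby(block).cumsum()


value = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 117.0])  # made-up path
duration = drawdown_duration(value)
longest = int(duration.max())

print(duration.tolist())
print("Longest drawdown (periods):", longest)
```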

In [None]:
# Cumulative performance
cum_returns = np.exp(returns.cumsum())

# Running peak
running_max = cum_returns.cummax()

# Drawdown
drawdown = cum_returns / running_max - 1

# Boolean: are we in drawdown?
underwater = drawdown < 0

# Duration counter
drawdown_duration = underwater.astype(int)

for col in drawdown_duration.columns:
    drawdown_duration[col] = (
        drawdown_duration[col]
        .groupby((drawdown_duration[col] == 0).cumsum())
        .cumsum()
    )

drawdown_duration.tail()

In [None]:
import plotly.express as px

px.line(
    drawdown_duration,
    title="Drawdown Duration (Days Underwater)"
)

# 3. Risk Characterization

While return distributions describe the statistical properties of assets,
they do not capture how risk evolves through time.

Financial markets are dynamic systems where volatility, uncertainty,
and risk exposure change across different market environments.

Risk Characterization focuses on measuring the time-varying behavior of risk.

The objective of this section is to understand:

- How risk fluctuates over time
- When markets transition between calm and stressed regimes
- Which assets lead or lag risk expansion
- How risk-adjusted performance changes across market cycles

These metrics form the foundation for later regime detection
and dynamic asset allocation models.

In institutional portfolio management, risk is not treated as static,
but as a state variable of the market.

## 3.1 Annualized Volatility

Volatility measures the dispersion of returns and represents
the fundamental risk of holding an asset.

It is computed as the standard deviation of returns and annualized
assuming 252 trading days per year.

Mathematically:

$$
\sigma_{annual} = \sigma_{daily} \times \sqrt{252}
$$

Where:

- $\sigma_{daily}$ is the standard deviation of daily returns
- 252 represents the approximate number of trading days per year

Annualized volatility provides a comparable measure of risk
across assets with different return characteristics.

This metric represents the long-term average risk level,
serving as a baseline before analyzing time-varying risk dynamics.

In [None]:
volatility = returns.std() * (252 ** 0.5)
volatility

## 3.2 Rolling Volatility

Annualized volatility provides a long-term estimate of risk,
but financial markets do not exhibit constant volatility.

Instead, volatility tends to cluster through time,
a phenomenon known as volatility clustering.

Rolling volatility measures how risk evolves dynamically
by computing volatility over a moving time window.

Mathematically:

$$
\sigma_t = Std(r_{t-n+1}, ..., r_t)
$$

Where:

- $n$ is the rolling window length
- $r_t$ represents asset returns

The rolling volatility is then annualized:

$$
\sigma_{annual,t} = \sigma_t \times \sqrt{252}
$$

This metric allows us to identify:

- Risk expansion periods
- Market stress events
- Volatility regimes
- Cross-asset risk transmission

Rising volatility often precedes or accompanies market drawdowns,
making it a key indicator for regime detection.

In [None]:
rolling_window = 90

rolling_volatility = (
    returns.rolling(rolling_window).std()
    * (252 ** 0.5)
)

rolling_volatility

In [None]:
px.line(
    rolling_volatility,
    title="Rolling Annualized Volatility (90D)"
)

## 3.3 Volatility Regimes

Financial markets alternate between periods of low and high volatility,
commonly referred to as market regimes.

Rather than treating volatility as a continuous variable,
we classify market conditions into discrete risk states.

Volatility regimes help identify:

- Calm market environments
- Transitional phases
- Stress or crisis periods

To standardize volatility across time,
we compute the Z-score of rolling volatility.

Mathematically:

$$
Z_t = \frac{\sigma_t - \mu_\sigma}{\sigma_\sigma}
$$

Where:

- $\sigma_t$ is rolling volatility
- $\mu_\sigma$ is the mean volatility
- $\sigma_\sigma$ is the standard deviation of volatility

The Z-score expresses how extreme current volatility is
relative to its historical behavior.

Typical interpretation:

- Z < -1 → Low volatility regime
- -1 ≤ Z ≤ 1 → Normal regime
- 1 < Z ≤ 2 → High volatility regime
- Z > 2 → Stress regime
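
The thresholds above can be turned into discrete labels as follows. This is a sketch under the assumption that the high-volatility band ends at Z = 2 and the stress band starts above it; the helper name `classify_vol_regime` and the sample Z-scores are made up.

```python
import numpy as np
import pandas as pd


def classify_vol_regime(z: pd.Series) -> pd.Series:
    """Map volatility Z-scores to regime labels; NaN inputs stay NaN."""
    values = z.to_numpy()
    labels = np.select(
        [values < -1, values <= 1, values <= 2],  # evaluated in order
        ["Low", "Normal", "High"],
        default="Stress",
    )
    out = pd.Series(labels, index=z.index, dtype="object")
    return out.where(z.notna())


z = pd.Series([-1.5, -1.0, 0.0, 1.0, 1.5, 2.5, np.nan])  # made-up Z-scores
regimes = classify_vol_regime(z)
print(regimes.tolist())
```

For a DataFrame of Z-scores, the same helper can be applied column by column with `DataFrame.apply`.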

In [None]:
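# Note: the mean and standard deviation are taken over the full sample,
# so this Z-score is descriptive and uses information from the whole history
vol_zscore = (
    rolling_volatility -
    rolling_volatility.mean()
) / rolling_volatility.std()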

vol_zscore = vol_zscore.dropna()

vol_zscore

In [None]:
px.line(
    vol_zscore,
    title="Volatility Regime Z-Score"
)

## 3.4 Correlation Structure

While volatility measures the risk of individual assets,
systemic risk emerges when assets begin to move together.

Correlation captures the degree to which asset returns co-move over time.

Low correlation implies diversification benefits,
while high correlation indicates market synchronization,
typically observed during crises and risk-off environments.

Correlation between assets i and j is defined as:

$$
\rho_{i,j} =
\frac{Cov(r_i, r_j)}
{\sigma_i \sigma_j}
$$

Where:

- $Cov(r_i, r_j)$ is the covariance between returns
- $\sigma_i$, $\sigma_j$ are the standard deviations of returns

Correlation ranges between:

- +1 → perfectly positively correlated
- 0 → no linear relationship (not necessarily independence)
- -1 → perfectly negatively correlated

Monitoring correlation dynamics is essential because
diversification tends to fail during stress regimes.
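
The sketch below checks the formula against pandas on a few made-up return pairs, and adds a small caveat example: a variable and its square are fully dependent yet have zero correlation, because correlation only captures linear co-movement.

```python
import numpy as np
import pandas as pd

# Made-up daily returns for two assets i and j.
r = pd.DataFrame(
    {
        "i": [0.010, -0.020, 0.015, 0.000, -0.005],
        "j": [0.020, -0.010, 0.010, 0.005, -0.015],
    }
)

# Correlation from its definition: covariance over the product of std devs.
cov_ij = r["i"].cov(r["j"])
rho_manual = cov_ij / (r["i"].std() * r["j"].std())
rho_pandas = r["i"].corr(r["j"])

# Caveat: y is a deterministic function of x, yet their correlation is zero.
x = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2
rho_xy = x.corr(y)

print("Manual:", rho_manual, "| pandas:", rho_pandas)
print("corr(x, x^2):", rho_xy)
```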

In [None]:
correlation_matrix = returns.corr()

correlation_matrix

In [None]:
px.imshow(
    correlation_matrix,
    text_auto=True,
    title="Return Correlation Matrix"
)

### Rolling Correlation

Market relationships evolve through time.

Rolling correlation allows us to observe how diversification
changes across market regimes.

Correlation typically increases during crises,
a phenomenon known as correlation breakdown.

In [None]:
rolling_corr_btc_spy = (
    returns["BTC-USD"]
    .rolling(90)
    .corr(returns["SPY"])
)

px.line(
    rolling_corr_btc_spy,
    title="Rolling Correlation: BTC vs SPY (90D)"
)

## 3.5 Correlation Regimes

Financial markets transition through different systemic states
characterized by changing relationships between assets.

During normal environments, correlations tend to remain moderate,
allowing diversification benefits.

However, during stress or crisis regimes, correlations typically rise,
indicating synchronized market behavior and systemic risk.

Tracking the evolution of average market correlation provides
a proxy for regime detection.

An increase in cross-asset correlation often signals
risk concentration and potential market instability.
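
When averaging a correlation matrix, the diagonal of ones inflates the result. One way to average only the distinct cross-asset pairs is to take the upper triangle, as sketched below on a made-up 3x3 matrix. The helper name `average_pairwise_correlation` is hypothetical.

```python
import numpy as np
import pandas as pd


def average_pairwise_correlation(corr: pd.DataFrame) -> float:
    """Mean of the off-diagonal entries of a correlation matrix."""
    values = corr.to_numpy()
    upper = np.triu_indices_from(values, k=1)  # strictly above the diagonal
    return float(np.nanmean(values[upper]))


# Made-up correlation matrix for three assets.
corr = pd.DataFrame(
    [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.5],
     [0.2, 0.5, 1.0]],
    index=list("ABC"),
    columns=list("ABC"),
)

avg_offdiag = average_pairwise_correlation(corr)
avg_with_diag = float(corr.to_numpy().mean())

print("Average pairwise correlation:", avg_offdiag)
print("Average including diagonal:  ", avg_with_diag)
```

With only a handful of assets the diagonal carries a large weight, so the two averages can differ noticeably.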

In [None]:
rolling_window = 90

rolling_corr = returns.rolling(rolling_window).corr()

In [None]:
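# Mean of every entry in each date's correlation matrix (diagonal included)
mean_all = (
    rolling_corr
    .groupby(level=0)
    .mean()
    .mean(axis=1)
)

# Remove the diagonal of ones so the average reflects only cross-asset pairs
n_assets = returns.shape[1]
avg_corr = (mean_all * n_assets - 1) / (n_assets - 1)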

In [None]:
px.line(
    avg_corr,
    title="Average Market Correlation (90D Rolling)"
)

In [None]:
avg_corr_smooth = avg_corr.rolling(30).mean()

px.line(
    avg_corr_smooth,
    title="Smoothed Average Market Correlation"
)

### Conclusion — Correlation Regimes

The analysis of rolling market correlation reveals that
asset relationships are not static but evolve through time,
reflecting underlying changes in market structure.

Periods of low average correlation indicate diversified market conditions,
where asset-specific drivers dominate performance and portfolio
diversification remains effective.

Conversely, sharp increases in cross-asset correlation signal
the emergence of systemic risk, as markets begin to move
in a synchronized manner.

Historically, correlation spikes coincide with liquidity shocks,
macro tightening cycles, and broad risk-off events,
during which diversification benefits deteriorate.

Therefore, average market correlation serves as a key
system-level indicator for identifying transitions between:

- Risk-on environments
- Transitional regimes
- Systemic stress or crisis conditions

This metric will later be incorporated as a regime feature,
allowing quantitative models to detect shifts in market behavior
without explicitly defining crisis events.

## 3.6 Volatility of Volatility

Volatility itself is not constant through time.
Periods of market stress are often characterized
by rapid changes in volatility rather than
high volatility alone.

Volatility of Volatility measures the instability
of risk by computing the standard deviation
of rolling volatility.

Higher values indicate unstable market conditions
and regime transitions.

In [None]:
rolling_vol = returns.rolling(90).std()

vol_of_vol = rolling_vol.rolling(90).std()

px.line(
    vol_of_vol,
    title="Volatility of Volatility"
)

## 3.7 Conclusion — Risk Characterization

Risk analysis shows that market behavior cannot be adequately
described using a single volatility measure.

Through time, assets exhibit significant variations in
risk magnitude, stability, and cross-asset interaction.
Annualized and rolling volatility reveal how risk evolves,
while volatility regimes highlight transitions between
stable and stressed environments.

Drawdown analysis captures realized downside risk,
providing insight into capital impairment during market stress.
Additionally, correlation structure analysis demonstrates
that diversification benefits are not constant and tend
to deteriorate during systemic events.

The inclusion of volatility of volatility further shows
that unstable risk dynamics often emerge during regime
transitions, preceding major market dislocations.

Together, these metrics provide a multidimensional
description of market risk conditions, forming the
foundation for subsequent analysis of risk-adjusted
performance and market regime classification.

# 4. Risk-Adjusted Performance

Raw returns alone provide limited information about asset quality,
as higher returns may simply reflect higher levels of risk.

Risk-adjusted performance measures evaluate how efficiently
an asset converts risk exposure into returns.

These metrics allow comparison across assets with different
volatility profiles and help identify periods where markets
reward or penalize risk-taking behavior.

In this section, we analyze performance normalized by risk
to detect changes in market efficiency and regime dynamics.

## 4.1 Sharpe Ratio

The Sharpe Ratio measures the excess return generated
per unit of risk taken.

It evaluates how efficiently an asset compensates investors
for the volatility they endure.

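Mathematically, taking the risk-free rate as zero:

$$
Sharpe = \frac{\mu_{daily}}{\sigma_{daily}}
$$

The ratio is computed on daily returns and is not annualized here.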

Higher Sharpe ratios indicate more efficient risk-adjusted performance,
while low or negative values suggest poor compensation for risk exposure.

This metric enables direct comparison between assets
with different volatility characteristics.
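
For reference, a version that subtracts a risk-free rate and annualizes the ratio could look like the sketch below. The 2% annual risk-free rate, the simulated returns, and the helper name `annualized_sharpe` are assumptions for illustration; spreading an annual rate evenly over 252 days is itself an approximation.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252


def annualized_sharpe(daily_returns: pd.Series, rf_annual: float = 0.0) -> float:
    """Annualized Sharpe ratio from daily returns and an annual risk-free rate."""
    excess = daily_returns - rf_annual / TRADING_DAYS  # approximate daily rf
    return float(excess.mean() / excess.std() * np.sqrt(TRADING_DAYS))


# Simulated daily returns standing in for one asset (not real data).
rng = np.random.default_rng(42)
daily = pd.Series(rng.normal(loc=0.0005, scale=0.01, size=2520))

sharpe_zero_rf = annualized_sharpe(daily)
sharpe_with_rf = annualized_sharpe(daily, rf_annual=0.02)

print("Sharpe (rf = 0): ", sharpe_zero_rf)
print("Sharpe (rf = 2%):", sharpe_with_rf)
```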

In [None]:
sharpe = returns.mean() / returns.std()

sharpe

In [None]:
px.bar(
    sharpe,
    title="Sharpe Ratio by Asset"
)

## 4.2 Rolling Sharpe Ratio

While the static Sharpe Ratio summarizes long-term performance,
it assumes constant risk-return characteristics through time.

Financial markets evolve across different regimes,
where risk efficiency expands and contracts.

The Rolling Sharpe Ratio measures risk-adjusted performance
over a moving window, allowing identification of periods
where assets efficiently reward risk-taking.

Mathematically:

$$
Rolling\ Sharpe_t =
\frac{
\mu_{t-window:t}
}{
\sigma_{t-window:t}
}
$$

Where:

- $\mu$ represents the mean return within the rolling window
- $\sigma$ represents the standard deviation of returns
- $window$ is the rolling lookback period

The ratio is annualized assuming 252 trading days:

$$
Rolling\ Sharpe_{annual} =
Rolling\ Sharpe_t \times \sqrt{252}
$$

In [None]:
rolling_window = 90

rolling_mean = returns.rolling(rolling_window).mean()
rolling_vol = returns.rolling(rolling_window).std()

rolling_sharpe = rolling_mean / rolling_vol
rolling_sharpe_annual = rolling_sharpe * (252 ** 0.5)

In [None]:
px.line(
    rolling_sharpe_annual,
    title="Rolling Sharpe Ratio (90D)"
)

## 4.3 Rolling Z-Score

The Z-Score measures how extreme current performance is
relative to its own historical behavior.

Instead of evaluating absolute returns,
the Z-Score standardizes observations by expressing them
in units of standard deviation from the rolling mean.

This allows identification of statistically abnormal
market conditions such as momentum extremes,
panic selloffs, or euphoric rallies.

Mathematically:

$$
Z_t =
\frac{
X_t - \mu_{t-window:t}
}{
\sigma_{t-window:t}
}
$$

Where:

- $X_t$ is the current observation
- $\mu$ is the rolling mean
- $\sigma$ is the rolling standard deviation

A Z-Score near zero indicates normal conditions,
while large positive or negative values signal
statistical extremes.
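
A variant worth considering is to exclude the current observation from its own window, so that an extreme value does not dampen its own Z-score. The sketch below shows this lagged-window version on four made-up numbers; the helper name `lagged_zscore` is hypothetical and this is not the calculation used in the cells that follow.

```python
import pandas as pd


def lagged_zscore(x: pd.Series, window: int) -> pd.Series:
    """Z-score of x_t against the mean and std of the previous `window` values."""
    mean = x.rolling(window).mean().shift(1)  # statistics up to t-1 only
    std = x.rolling(window).std().shift(1)
    return (x - mean) / std


x = pd.Series([1.0, 2.0, 3.0, 10.0])  # made-up observations with a final jump
z = lagged_zscore(x, window=3)
print(z.tolist())
```

The last value is compared with the mean (2) and standard deviation (1) of the three preceding points, giving a Z-score of 8.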

In [None]:
rolling_window = 90

rolling_mean = returns.rolling(rolling_window).mean()
rolling_std = returns.rolling(rolling_window).std()

z_score = (returns - rolling_mean) / rolling_std

In [None]:
px.line(
    z_score,
    title="Rolling Z-Score of Returns (90D)"
)

## 4.4 Conclusion — Risk Adjusted Performance

Risk-adjusted metrics provide a framework to evaluate
whether returns adequately compensate for the level
of risk assumed by investors.

While raw returns may appear attractive, measures such
as the Sharpe Ratio reveal substantial differences in
return efficiency across assets and time periods.

Rolling Sharpe analysis shows that risk compensation
is highly dynamic, with periods of strong efficiency
often followed by rapid deterioration during market
stress or regime transitions.

Z-Score normalization further allows identification
of statistically extreme performance environments,
highlighting moments where returns deviate significantly
from their historical expectations.

Together, these indicators demonstrate that performance
should not be evaluated solely in absolute terms, but
relative to the risk required to achieve it.

This risk-efficiency perspective provides a critical
bridge between market risk characterization and the
analysis of trend persistence and momentum dynamics
developed in subsequent sections.