In [None]:
import pandas as pd
import numpy as np
import yfinance as yf
import plotly.express as px

# 1. Data Preparation

Before performing any financial analysis, raw market data must be properly prepared and validated.

Financial datasets often contain missing observations, inconsistent timestamps, or structural differences across assets (for example, different trading calendars).
If not handled correctly, these issues can introduce biases, look-ahead errors, or misleading statistical results.

The objective of the data preparation stage is to transform raw price data into a clean and consistent dataset suitable for quantitative analysis.

This stage includes:

- Loading historical market data
- Validating data integrity
- Handling missing values
- Aligning assets across time
- Computing returns

Proper data preparation ensures that all subsequent risk, performance, and statistical metrics are reliable and comparable across assets.

## 1.1 Data Loading

In this section, we load historical price data for the selected assets.

The dataset contains daily closing prices, adjusted for splits and dividends, representing different segments of global financial markets.
Each asset serves as a proxy for a specific risk factor or market exposure.

Example asset representation:

- Equity Markets (e.g., SPY, QQQ)
- Energy Sector (e.g., XLE)
- Digital Assets (e.g., BTC)

Using closing prices allows consistent comparison across assets and avoids intraday noise.

All assets are combined into a single time-indexed dataframe, where:

- Rows represent trading dates
- Columns represent individual assets
- Values correspond to adjusted closing prices

This structure enables synchronized time-series analysis across markets.
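A minimal sketch of that structure, using two hypothetical synthetic price series instead of downloaded data: one follows a weekday calendar (like an equity ETF) and one trades every calendar day (like BTC). Joining them on the date index shows where missing values come from before any cleaning.

```python
import numpy as np
import pandas as pd

# Hypothetical calendars: business days only (equity-like) vs. every day (crypto-like)
equity_dates = pd.bdate_range("2024-01-01", periods=10)
crypto_dates = pd.date_range("2024-01-01", periods=14, freq="D")

equity = pd.Series(np.linspace(100, 109, len(equity_dates)), index=equity_dates, name="SPY")
crypto = pd.Series(np.linspace(40_000, 41_300, len(crypto_dates)), index=crypto_dates, name="BTC")

# Outer join on the date index: rows = dates, columns = assets, values = prices
prices = pd.concat([equity, crypto], axis=1)

# Weekend rows exist only because of the crypto calendar, so SPY is NaN there
print(prices.head(8))
```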

In [None]:
assets = {
    "SPY": "SPY",        # broad US equity market
    "QQQ": "QQQ",        # technology
    "XLE": "XLE",        # energy
    "BTC": "BTC-USD"     # crypto / global liquidity
}

In [None]:
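data = yf.download(
    list(assets.values()),
    start="2015-01-01",
    auto_adjust=True  # "Close" is adjusted for splits and dividends
)["Close"]

# Use the short labels from `assets` (e.g. "BTC" instead of "BTC-USD") as column names
data = data.rename(columns={ticker: label for label, ticker in assets.items()})

data.tail()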

## 1.2 Data Validation

Before performing any statistical analysis, data integrity must be verified.

Financial time series frequently contain missing observations due to:

- Different asset inception dates
- Market holidays and weekends (BTC trades every day, while US-listed ETFs trade only on exchange business days)
- Trading suspensions
- Data download inconsistencies

Missing values (NaNs) can severely distort statistical calculations such as:

- Returns
- Volatility
- Sharpe Ratio
- Correlation
- Momentum indicators

Therefore, we validate the dataset by:

1. Checking for missing values
2. Inspecting data availability across assets
3. Aligning time series
4. Removing incomplete observations when necessary

This step ensures that all subsequent calculations are based on synchronized and reliable data.
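The validation steps above can be sketched as a small summary helper. `validate_prices` is a hypothetical name (not part of pandas or yfinance), and the toy frame below assumes one asset with a late inception date and a single missing observation.

```python
import numpy as np
import pandas as pd

def validate_prices(prices: pd.DataFrame) -> pd.DataFrame:
    """Summarize data availability per asset (hypothetical helper)."""
    return pd.DataFrame({
        "missing": prices.isna().sum(),
        "first_valid": prices.apply(lambda s: s.first_valid_index()),
        "last_valid": prices.apply(lambda s: s.last_valid_index()),
    })

dates = pd.bdate_range("2024-01-01", periods=6)
prices = pd.DataFrame(
    {"OLD": [10.0, 10.1, 10.2, 10.3, 10.4, 10.5],
     "NEW": [np.nan, np.nan, 5.0, 5.1, np.nan, 5.3]},  # late inception + one gap
    index=dates,
)

print(validate_prices(prices))

# Align: keep only dates on which every asset has a price
aligned = prices.dropna()
print(aligned.shape)  # (3, 2)
```

Dropping incomplete rows keeps a common sample for all assets, at the cost of discarding data from assets with longer histories.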

In [None]:
# ===============================
# DATA VALIDATION
# ===============================
data.info()  # prints dtypes and non-null counts directly (returns None)

print("Dataset shape:", data.shape)
print("\nMissing values per asset:")
print(data.isna().sum())

print("\nDate range:")
print(data.index.min(), "->", data.index.max())



In [None]:


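# ===============================
# DATA CLEANING
# ===============================
# Keep only dates on which every asset has a price. With daily BTC data this
# also drops weekends and US market holidays, leaving a common trading calendar.
data = data.dropna()

print("Info after cleaning:")
data.info()
print("Shape after cleaning:", data.shape)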

In [None]:
fig = px.line(
    data,
    title="Asset Prices"
)

fig.show()

In [None]:
normalized = data / data.iloc[0]

fig = px.line(
    normalized,
    title="Normalized Performance (Base = 1)"
)

fig.show()

In [None]:
fig = px.line(
    normalized,
    title="Normalized Performance (Log Scale)",
    log_y=True
)

fig.show()

In [None]:
recent = data.loc["2022":]
normalized_recent = recent / recent.iloc[0]

px.line(
    normalized_recent,
    title="Normalized Performance since 2022",
    log_y=True
)

## 1.3 Return Calculation

Asset prices themselves are non-stationary and cannot be directly compared across assets.

To perform statistical analysis, prices must be transformed into returns.

Returns measure the relative change in price between two consecutive periods and represent the fundamental unit of financial analysis.

We compute logarithmic returns:

$$
r_t = \ln\left(\frac{P_t}{P_{t-1}}\right)
$$

Where:

- $P_t$ = price at time $t$
- $P_{t-1}$ = price at time $t-1$

Log returns are preferred because they:

- Are time-additive: the log return over several periods is the sum of the single-period log returns
- Are unbounded in both directions, unlike simple returns, which cannot fall below -100%
- Are widely used in quantitative finance and risk modeling

All subsequent risk and performance metrics are derived from returns rather than prices.
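Time additivity can be checked on a short hypothetical price path: daily log returns sum to the whole-period log return, while daily simple returns do not sum to the whole-period simple return and must be compounded instead.

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0, 108.9])  # hypothetical price path

log_r = np.log(prices[1:] / prices[:-1])     # daily log returns
simple_r = prices[1:] / prices[:-1] - 1      # daily simple returns

total_log = np.log(prices[-1] / prices[0])   # log return over the whole period

# Time additivity: the sum of daily log returns equals the period log return
print(np.isclose(log_r.sum(), total_log))                                  # True
# Simple returns are not additive (0.10 vs. 0.089 here); they compound
print(np.isclose(simple_r.sum(), prices[-1] / prices[0] - 1))              # False
print(np.isclose(np.prod(1 + simple_r) - 1, prices[-1] / prices[0] - 1))   # True
```

The additivity holds across time only; across assets in a portfolio, it is simple returns that aggregate as a weighted sum.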

### Log Returns Calculation

In [None]:
returns = np.log(data / data.shift(1))

returns = returns.dropna()

returns.head()

In [None]:
from pathlib import Path

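# Assumes the notebook runs one directory below the project root,
# so the data folder is created at <project root>/data
BASE_DIR = Path().resolve().parent
DATA_DIR = BASE_DIR / "data"

DATA_DIR.mkdir(exist_ok=True)

# Save cleaned prices and log returns for reuse in later analyses
data.to_csv(DATA_DIR / "market_prices.csv")
returns.to_csv(DATA_DIR / "market_returns.csv")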

### Log Returns Visualization

We visualize log returns to inspect volatility clustering,
extreme events, and potential data anomalies before computing
risk metrics.



In [None]:
fig = px.line(
    returns,
    title="Log Returns",
)

fig.add_hline(y=0, line_dash="dash")

fig.show()

### Cumulative Returns

Cumulative performance is computed by summing log returns
and exponentiating the result:

$$
G_t = \exp\left(\sum_{s=1}^{t} r_s\right) = \frac{P_t}{P_0}
$$

This represents the growth of $1 invested through time;
the cumulative return itself is this value minus 1.
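A quick check on hypothetical prices that exponentiated cumulative log returns reproduce normalized prices exactly, together with a common pitfall: compounding log returns as if they were simple returns with `(1 + r).cumprod()`.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 105.0, 95.0, 120.0])   # hypothetical prices
log_r = np.log(prices / prices.shift(1)).dropna()

growth = np.exp(log_r.cumsum())                   # growth of $1 from log returns
normalized = (prices / prices.iloc[0]).iloc[1:]   # P_t / P_0 on the same dates

print(np.allclose(growth.values, normalized.values))  # True

# Pitfall: treating log returns as simple returns
wrong = (1 + log_r).cumprod()
print(np.allclose(wrong.values, normalized.values))   # False
```

Because 1 + x is never larger than exp(x), the mistaken version systematically understates growth whenever returns are nonzero.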

In [None]:
cumulative_returns = np.exp(returns.cumsum())

fig = px.line(
    cumulative_returns,
    title="Cumulative Performance (Growth of $1)"
)

fig.show()


### Normalized Prices

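Prices are normalized by dividing by the initial value,
allowing direct comparison of asset performance regardless
of their absolute price level.

Because the exponentiated sum of log returns telescopes to $P_t / P_0$,
normalized prices coincide with the cumulative performance series above,
except for the first date, which is dropped when computing returns.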

In [None]:
normalized_prices = data / data.iloc[0]

fig = px.line(
    normalized_prices,
    title="Normalized Prices"
)

fig.show()

In [None]:
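# Trailing 1-year (252 trading days) growth of $1 from LOG returns:
# exp(sum of log returns). (1 + r).prod() is only valid for simple returns.
rolling_perf = np.exp(returns.rolling(252).sum())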

In [None]:
px.line(
    rolling_perf,
    title="Rolling 1Y Performance"
)

# 2. Return Distribution & Tail Risk

Financial returns are not normally distributed.

This section analyzes the statistical properties of asset returns,
focusing on distribution shape and extreme events (tail risk).

Understanding return distributions helps identify asymmetry,
fat tails, and crash susceptibility across assets.

## 2.1 Return Distribution Statistics

This section characterizes the statistical properties of asset returns.

Understanding the distribution of returns is essential before performing
risk analysis, regime detection, or portfolio construction.

We compute the annualized mean return and annualized volatility
assuming 252 trading days per year.

### 2.1.1 Annualized Mean Return

The mean return is the sample average of daily log returns and serves as an estimate of the asset's expected return.

Since returns are computed at daily frequency, we annualize the mean
assuming 252 trading days per year.

Mathematically:

$$
\mu_{annual} = \mu_{daily} \times 252
$$

Where:

- $\mu_{daily}$ = average daily return
- 252 = approximate number of trading days per year

Annualized mean return provides an estimate of the long-term drift
of an asset.
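A worked example with a hypothetical mean daily log return of 0.04%. Since the inputs are log returns, the annualized mean is a continuously compounded rate; exponentiating converts it to the equivalent simple annual growth rate.

```python
import numpy as np

mu_daily = 0.0004            # hypothetical mean daily log return
mu_annual = mu_daily * 252   # annualized mean log return (continuously compounded)

# Equivalent simple (compounded) annual growth implied by that drift
simple_annual = np.exp(mu_annual) - 1

print(round(mu_annual, 4))       # 0.1008
print(round(simple_annual, 4))   # 0.1061
```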

In [None]:
mean_annual = returns.mean() * 252
mean_annual

### 2.1.2 Annualized Volatility

Volatility measures the dispersion of returns and represents
the risk of an asset.

We compute volatility as the standard deviation of daily returns
and annualize it assuming 252 trading days per year.

Mathematically:

$$
\sigma_{annual} = \sigma_{daily} \times \sqrt{252}
$$

Where:

- $\sigma_{daily}$ = standard deviation of daily returns
- $\sqrt{252}$ = annualization factor

Annualized volatility allows comparison of risk levels across assets.
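The square-root-of-time rule is exact only if daily returns are independent and identically distributed. A small simulation under that assumption (the daily volatility of 1% and the seed are arbitrary choices) shows the standard deviation of annual sums matching the scaled daily volatility; volatility clustering or autocorrelation in real data would break this equality.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_daily = 0.01                                       # hypothetical daily volatility
daily = rng.normal(0.0, sigma_daily, size=(4000, 252))   # 4,000 simulated i.i.d. "years"

annual = daily.sum(axis=1)        # annual log return = sum of daily log returns
empirical = annual.std(ddof=1)
theoretical = sigma_daily * np.sqrt(252)

print(round(theoretical, 4))      # 0.1587
print(empirical / theoretical)    # close to 1
```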

In [None]:
volatility_annual = returns.std() * (252 ** 0.5)
volatility_annual


### 2.1.3 Skewness

Skewness measures the asymmetry of return distributions.

Positive skew indicates large upside tail events,
while negative skew suggests crash-prone behavior.
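For reference, a sketch of the bias-adjusted Fisher-Pearson skewness estimator, which is what `pandas.Series.skew()` computes, applied to a hypothetical return series with one large loss. The helper `sample_skewness` is illustrative only.

```python
import numpy as np
import pandas as pd

def sample_skewness(x: np.ndarray) -> float:
    """Adjusted Fisher-Pearson skewness (hypothetical helper matching pandas)."""
    n = len(x)
    d = x - x.mean()
    m2 = np.mean(d**2)           # second central moment
    m3 = np.mean(d**3)           # third central moment
    g1 = m3 / m2**1.5            # biased (population-style) skewness
    return np.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment

# Hypothetical daily returns: mostly small gains with one large loss
r = np.array([0.01, 0.012, 0.008, 0.011, 0.009, -0.08])

print(sample_skewness(r) < 0)                                # True: left tail
print(np.isclose(sample_skewness(r), pd.Series(r).skew()))  # True
```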

In [None]:
skewness = returns.skew()

skewness

In this sample, all analyzed assets exhibit negative skewness,
indicating asymmetric downside risk.

Returns tend to experience occasional large losses
rather than extreme positive shocks of similar size.

This is consistent with crash-prone behavior and with a departure
from the symmetric normal distribution, although skewness alone
does not establish how significant that departure is.

### 2.1.4 Kurtosis

Kurtosis measures tail heaviness relative to a normal distribution.

High kurtosis indicates frequent extreme events
and elevated tail risk.

In [None]:
kurtosis = returns.kurtosis()

kurtosis

A normal distribution has a kurtosis of 3, i.e. an excess kurtosis of 0.
pandas `kurtosis()` reports **excess** kurtosis (Fisher's definition),
so the values above should be compared with 0, not 3.

All analyzed assets exhibit positive excess kurtosis,
indicating fat tails and frequent extreme market moves.

This is consistent with financial returns not being normally distributed;
fat tails are commonly associated with crash risk and volatility clustering.
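To see the excess-kurtosis convention in action, the sketch below compares a simulated normal sample with a hypothetical fat-tailed sample: a scale mixture in which 10% of days have three times the usual volatility (theoretical excess kurtosis of about 5.3). The mixture is an illustrative assumption, not a model of any asset above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200_000

normal = pd.Series(rng.standard_normal(n))

# Hypothetical fat-tailed returns: 10% of days have 3x volatility
scale = np.where(rng.random(n) < 0.10, 3.0, 1.0)
fat_tailed = pd.Series(rng.standard_normal(n) * scale)

# pandas reports EXCESS kurtosis: a normal sample gives ~0, not ~3
print(round(normal.kurtosis(), 2))      # close to 0
print(round(fat_tailed.kurtosis(), 2))  # well above 0
```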

## 2.2 Tail Risk Visualization

Financial returns are not normally distributed and often exhibit fat tails,
meaning extreme events occur more frequently than predicted by a Gaussian distribution.

Tail risk refers to the probability of large negative or positive market moves.

Visualizing return distributions helps identify:

- Crash susceptibility
- Extreme volatility regimes
- Asymmetric risk behavior
- Presence of outliers

Understanding tail behavior is critical for portfolio construction and risk management.

In [None]:
import plotly.express as px

returns_long = returns.reset_index().melt(
    id_vars="Date",
    var_name="Asset",
    value_name="Return"
)

px.histogram(
    returns_long,
    x="Return",
    facet_col="Asset",
    facet_col_wrap=2,
    nbins=150,
    title="Return Distribution - Tail Visualization"
)

In [None]:
for col in returns.columns:
    
    fig = px.histogram(
        returns,
        x=col,
        nbins=200,
        title=f"{col} Return Distribution"
    )
    
    fig.show()


### Interpretation

If returns were normally distributed, tails would decay rapidly.

However, financial markets typically exhibit:

- Fat left tails (large crashes)
- Occasional extreme positive returns
- Non-symmetric distributions

Assets with heavier tails are exposed to higher tail risk
and require stronger risk management.

Tail behavior can be quantitatively described using:

- Skewness → asymmetry of returns
- Kurtosis → thickness of distribution tails

Higher kurtosis indicates higher probability of extreme events.

In [None]:
distribution_stats = pd.DataFrame({
    "Skewness": returns.skew(),
    "Excess Kurtosis": returns.kurtosis(),  # pandas reports excess kurtosis (normal = 0)
})

distribution_stats


## 3.3 Drawdowns & Crash Behaviour

While return distributions describe daily behavior, drawdowns measure the real
experience of investors during market declines.

A drawdown represents the percentage decline from a previous peak in cumulative performance.

Mathematically:

$$
DD_t = \frac{V_t}{\max_{s \le t} V_s} - 1
$$

where $V_t$ is the portfolio value (cumulative performance) at time $t$
and the denominator is its running maximum.

Drawdowns allow us to evaluate:

- Maximum loss experienced by investors
- Market crash severity
- Recovery dynamics
- Risk persistence across regimes

Maximum Drawdown (MDD), the most negative value of $DD_t$ over the sample,
is one of the most important risk metrics used by
portfolio managers and institutional investors.
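A worked example of the drawdown formula on a hypothetical five-period value path, computing the running peak, the drawdown series, and the maximum drawdown exactly as defined above.

```python
import numpy as np
import pandas as pd

value = pd.Series([100.0, 120.0, 90.0, 130.0, 117.0])  # hypothetical portfolio values

running_max = value.cummax()          # [100, 120, 120, 130, 130]
drawdown = value / running_max - 1    # decline from the running peak
max_drawdown = drawdown.min()         # deepest peak-to-trough loss

print(drawdown.round(2).tolist())     # [0.0, 0.0, -0.25, 0.0, -0.1]
print(max_drawdown)                   # -0.25
```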

### 3.3.1 Depth

In [None]:
cum_returns = np.exp(returns.cumsum())
running_max = cum_returns.cummax()

drawdowns = cum_returns / running_max - 1
px.line(
    drawdowns,
    title="Asset Drawdowns"
)



In [None]:
max_drawdown = drawdowns.min()

max_drawdown.to_frame(name="Max Drawdown")

### 3.3.2 Duration

Drawdown duration measures how long an asset remains below its previous peak.

While drawdown magnitude measures *how much* an asset loses,
drawdown duration measures *how long investors must wait to recover losses*.

This is a critical risk metric because prolonged recovery periods
increase behavioral risk and capital lock-up.

Mathematically:

Drawdown Duration = number of consecutive periods
during which cumulative performance remains below its running maximum.

Long durations indicate slow recoveries and higher psychological risk.
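The duration counter relies on a grouping trick: every return to a peak starts a new group, and a cumulative sum inside each group counts consecutive underwater periods. A minimal sketch on a hypothetical series, wrapped in an illustrative helper `underwater_duration`:

```python
import pandas as pd

def underwater_duration(value: pd.Series) -> pd.Series:
    """Consecutive periods spent below the running peak (hypothetical helper)."""
    underwater = (value < value.cummax()).astype(int)
    # A new group starts every time we are back at a peak (underwater == 0)
    block_id = (underwater == 0).cumsum()
    return underwater.groupby(block_id).cumsum()

value = pd.Series([1.0, 0.9, 0.95, 1.1, 1.0, 1.05, 1.2])  # hypothetical growth of $1
duration = underwater_duration(value)

print(duration.tolist())   # [0, 1, 2, 0, 1, 2, 0]
print(duration.max())      # 2 (longest drawdown, in periods)
```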

In [None]:
# Cumulative performance
cum_returns = np.exp(returns.cumsum())

# Running peak
running_max = cum_returns.cummax()

# Drawdown
drawdown = cum_returns / running_max - 1

# Boolean: are we in drawdown?
underwater = drawdown < 0

# Duration counter
drawdown_duration = underwater.astype(int)

for col in drawdown_duration.columns:
    drawdown_duration[col] = (
        drawdown_duration[col]
        .groupby((drawdown_duration[col] == 0).cumsum())
        .cumsum()
    )

drawdown_duration.tail()

In [None]:
import plotly.express as px

px.line(
    drawdown_duration,
    title="Drawdown Duration (Trading Days Underwater)"
)

## Volatility

Volatility measures the dispersion of returns and represents market risk.

Formula:

$$
\sigma = \operatorname{std}(r_t)
$$

Why it matters:

- Higher volatility implies higher uncertainty
- Risk-adjusted metrics depend on volatility
- Portfolio allocation is primarily risk allocation

### Annualized Volatility

Volatility measures the dispersion of returns and represents the **risk** of an asset.

We compute volatility as the standard deviation of returns and annualize it assuming **252 trading days per year**.

Mathematically:

$$
\sigma_{annual} = \sigma_{daily} \times \sqrt{252}
$$

Where:

- $\sigma_{daily}$ = standard deviation of daily returns  
- 252 = approximate number of trading days in a year  

Volatility allows comparison of risk levels across assets.

In [None]:
volatility = returns.std() * (252 ** 0.5)
volatility

## Risk Adjusted Returns

Risk-adjusted performance evaluates how much return is generated per unit of risk.

### Sharpe Ratio

The Sharpe Ratio measures the return generated per unit of risk.

It evaluates how efficiently an asset compensates investors for the volatility they endure.

Mathematically:

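$$
Sharpe = \frac{E[R] - R_f}{\sigma}
$$

Where:

- $E[R]$ = average return
- $R_f$ = risk-free rate
- $\sigma$ = standard deviation of returns

This notebook sets $R_f = 0$, so the ratio reduces to $E[R] / \sigma$.

Higher Sharpe Ratios indicate better risk-adjusted performance.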

For daily data, the Sharpe Ratio can be annualized as:

$$
Sharpe_{annual} = Sharpe_{daily} \times \sqrt{252}
$$

Why we use it:

- Raw returns ignore how much risk was taken to earn them
- Investors are compensated for bearing risk, not for absolute return
- Enables cross-asset comparison
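A sketch of an annualized Sharpe ratio that also accepts a nonzero risk-free rate. The helper name `annualized_sharpe` and the simple division of the annual rate by 252 (an approximation that is adequate for small daily rates) are assumptions for illustration; with `rf_annual=0` it reproduces the zero risk-free calculation used in this notebook.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(daily_returns: pd.Series, rf_annual: float = 0.0,
                      periods: int = 252) -> float:
    """Annualized Sharpe ratio from daily returns (hypothetical helper)."""
    excess = daily_returns - rf_annual / periods    # approximate daily risk-free rate
    return excess.mean() / excess.std() * np.sqrt(periods)

r = pd.Series([0.01, -0.01, 0.02, 0.0, 0.005])      # hypothetical daily returns

print(annualized_sharpe(r))                  # R_f = 0
print(annualized_sharpe(r, rf_annual=0.04))  # a positive R_f lowers the ratio
```

Subtracting a constant rate leaves the standard deviation unchanged, so the risk-free rate only shifts the numerator.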

In [None]:
sharpe = returns.mean() / returns.std()
sharpe_annual = sharpe * (252 ** 0.5)
sharpe_annual

### Rolling Sharpe Ratio

Markets are non-stationary and risk-adjusted performance changes through time.

The Rolling Sharpe Ratio evaluates how efficiently an asset generates returns relative to risk over a moving window.

For each time period:

$$
Sharpe_t =
\frac{\mu_{t,w}}
{\sigma_{t,w}}
$$

Where:

- $\mu_{t,w}$ = mean return over rolling window $w$
- $\sigma_{t,w}$ = volatility over rolling window $w$

We annualize the metric assuming 252 trading days:

$$
Sharpe_{annual} = Sharpe_{daily} \times \sqrt{252}
$$

Rolling Sharpe helps identify periods of favorable or unfavorable risk-adjusted performance and is commonly used to detect market regimes.

In [None]:
rolling_window = 90

rolling_mean = returns.rolling(rolling_window).mean()
rolling_vol = returns.rolling(rolling_window).std()

rolling_sharpe = rolling_mean / rolling_vol

rolling_sharpe_annual = rolling_sharpe * (252 ** 0.5)

rolling_sharpe_annual = rolling_sharpe_annual.dropna()

rolling_sharpe_annual

In [None]:
px.line(
    rolling_sharpe_annual.dropna(),
    title="Rolling Sharpe Ratio (90D)"
)

## Rolling Z-Score

The Z-Score measures how far current returns deviate from their historical average in units of standard deviation.

It allows us to identify statistically unusual market conditions such as extreme rallies or sell-offs.

Mathematically:

$$
Z_t = \frac{R_t - \mu_t}{\sigma_t}
$$

Where:

- $R_t$ = current return
- $\mu_t$ = rolling mean of returns
- $\sigma_t$ = rolling standard deviation of returns

The rolling window ensures that the comparison is made relative to recent market conditions rather than the full historical sample.
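One design choice worth making explicit: if the rolling window includes $R_t$ itself, an extreme return inflates $\sigma_t$ and dampens its own Z-Score. The sketch below, on hypothetical returns (alternating ±1% followed by a single +10% shock, 20-period window), compares that with statistics lagged by one period via `shift(1)`. The lagged variant is an alternative, not what the notebook computes.

```python
import numpy as np
import pandas as pd

window = 20
# Hypothetical returns: alternating +/-1% followed by a single +10% shock
r = pd.Series(np.r_[np.tile([0.01, -0.01], 50), 0.10])

# Rolling statistics that include the current observation
z_incl = (r - r.rolling(window).mean()) / r.rolling(window).std()

# Rolling statistics from the preceding window only
mu_prev = r.rolling(window).mean().shift(1)
sd_prev = r.rolling(window).std().shift(1)
z_prev = (r - mu_prev) / sd_prev

# The shock looks far more extreme against the preceding window
print(round(z_incl.iloc[-1], 2), round(z_prev.iloc[-1], 2))
```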

### Interpretation

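- Z ≈ 0 → returns are typical relative to recent history
- Z > 1 → above-average performance
- Z > 2 → statistically extreme positive move
- Z < -2 → statistically extreme negative move

Under a normal distribution, |Z| > 2 occurs on roughly 5% of days; with fat-tailed returns it occurs more often.

### Why we use it

Extreme deviations are sometimes followed by reversals, although they can also mark the start of a new trend.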

The Z-Score helps detect:

- momentum overheating
- panic selling
- regime transitions
- mean reversion opportunities

It transforms returns into standardized units, allowing comparison across assets with different volatility levels.

In [None]:
rolling_window = 90

rolling_mean = returns.rolling(rolling_window).mean()
rolling_std = returns.rolling(rolling_window).std()

z_score = (returns - rolling_mean) / rolling_std
z_score = z_score.dropna()
z_score

In [None]:
fig = px.line(
    z_score,
    title="Rolling Z-Score of Returns (90-Day Window)"
)

fig.add_hline(y=2, line_dash="dash")
fig.add_hline(y=-2, line_dash="dash")
fig.add_hline(y=0, line_dash="dot")

fig.show()

## 5. Momentum

Momentum measures the tendency of an asset to continue moving in the same direction.

It captures trend persistence and is one of the most robust empirical factors in financial markets.

We compute momentum as the cumulative return over a rolling window.

Mathematically:

$$
Momentum_t = \frac{P_t}{P_{t-n}} - 1
$$

Where:

- $P_t$ = current price
- $P_{t-n}$ = price $n$ periods ago

Positive momentum indicates an upward trend, while negative momentum signals downward pressure.

Momentum helps identify trend-following regimes and, indirectly, investor positioning.
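The price-based definition and the log-return series computed earlier are two views of the same quantity: an $n$-period momentum equals the exponentiated sum of the last $n$ log returns minus 1. A check on hypothetical prices with a short lookback of 3 periods:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 104.0, 110.0])  # hypothetical
n = 3  # lookback in periods (the notebook uses 90)

momentum_pct = prices.pct_change(n)   # P_t / P_{t-n} - 1

# Same quantity from log returns: exp(sum of the last n log returns) - 1
log_r = np.log(prices / prices.shift(1))
momentum_log = np.exp(log_r.rolling(n).sum()) - 1

print(np.allclose(momentum_pct.dropna(), momentum_log.dropna()))  # True
```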

In [None]:
momentum_window = 90

momentum = data.pct_change(momentum_window)

momentum = momentum.dropna()

momentum

In [None]:
import plotly.express as px

px.line(
    momentum,
    title="90-Day Momentum"
)

## 5.3 Momentum Stability

Momentum stability measures how consistent an asset's trend is through time.

Instead of evaluating only the magnitude of momentum, we analyze the variability of momentum itself.

Stable momentum indicates persistent trends, while unstable momentum suggests noisy or speculative price movements.

Mathematically:

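$$
\text{Momentum Stability}_t = \operatorname{std}\left(Momentum_{t-w+1}, \dots, Momentum_t\right)
$$

Lower values indicate smoother, more persistent trends. Despite the name, higher values of this measure mean *less* stable momentum.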

This metric helps distinguish sustainable trends from volatile price expansions.

In [None]:

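rolling_window = 90

# Rolling standard deviation of 90-day momentum: lower = smoother, more stable trend
momentum_stability = momentum.rolling(rolling_window).std()

px.line(
    momentum_stability,
    title="Momentum Stability (Rolling Std of Momentum)"
)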

## Autocorrelation

Autocorrelation measures the relationship between current returns and past returns.

It evaluates whether price movements tend to persist or reverse over time.

Mathematically:

$$
\rho_k = Corr(R_t, R_{t-k})
$$

Where:

- $R_t$ = return at time $t$
- $R_{t-k}$ = return lagged by $k$ periods

Positive autocorrelation suggests trend persistence,
while negative autocorrelation indicates mean reversion.

Autocorrelation is widely used to identify market regimes and trading behavior.
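To make the sign interpretation concrete, the sketch below simulates two hypothetical AR(1) processes, $r_t = \phi r_{t-1} + \varepsilon_t$, with $\phi = 0.3$ (persistence) and $\phi = -0.3$ (mean reversion). For an AR(1) process the lag-1 autocorrelation equals $\phi$, so `Series.autocorr(lag=1)` should recover it approximately.

```python
import numpy as np
import pandas as pd

def simulate_ar1(phi: float, n: int, seed: int = 0) -> pd.Series:
    """Simulate r_t = phi * r_{t-1} + e_t with Gaussian noise (hypothetical example)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    r = np.empty(n)
    r[0] = e[0]
    for t in range(1, n):
        r[t] = phi * r[t - 1] + e[t]
    return pd.Series(r)

trending = simulate_ar1(phi=0.3, n=20_000)
reverting = simulate_ar1(phi=-0.3, n=20_000)

print(round(trending.autocorr(lag=1), 2))   # close to +0.3 (persistence)
print(round(reverting.autocorr(lag=1), 2))  # close to -0.3 (mean reversion)
```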

In [None]:
rolling_window = 90

# raw=False passes each window as a Series, so Series.autocorr can be used
autocorr_1 = returns.rolling(rolling_window).apply(
    lambda x: x.autocorr(lag=1), raw=False
)

autocorr_1 = autocorr_1.dropna()
autocorr_1

In [None]:
px.line(
    autocorr_1,
    title="Rolling Autocorrelation (Lag 1)"
)


## Momentum Stability Z-Score

Momentum Stability Z-Score measures how unusual the current trend stability is relative to its historical behavior.

Instead of analyzing absolute stability levels, we standardize momentum stability using a rolling Z-score.

Mathematically:

$$
Z_t = \frac{X_t - \mu_t}{\sigma_t}
$$

Where:

- $X_t$ = current momentum stability
- $\mu_t$ = rolling mean of momentum stability (252-period window)
- $\sigma_t$ = rolling standard deviation of momentum stability (252-period window)

Positive values indicate unusually unstable trends,
while negative values indicate unusually smooth and persistent trends.

This metric helps identify regime transitions and trend exhaustion.

In [None]:
stability_mean = momentum_stability.rolling(252).mean()
stability_std = momentum_stability.rolling(252).std()

momentum_stability_z = (
    momentum_stability - stability_mean
) / stability_std

momentum_stability_z = momentum_stability_z.dropna()
momentum_stability_z


In [None]:
px.line(
    momentum_stability_z,
    title="Momentum Stability Z-Score"
)

## 2.3 Drawdowns

Drawdown measures the decline from a historical peak in cumulative returns.

It represents the realized loss experienced by an investor from peak to trough.

Mathematically:

$$
DD_t = \frac{P_t}{Peak_t} - 1
$$

Where:

- $P_t$ = cumulative performance at time $t$
- $Peak_t$ = running maximum of cumulative performance

Drawdowns are essential to understand downside risk and stress periods.

In [None]:
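# Cumulative performance from LOG returns: exp(cumulative sum).
# (1 + returns).cumprod() would only be valid for simple returns.
cum_returns = np.exp(returns.cumsum())

rolling_max = cum_returns.cummax()  # running (historical) maximum

drawdown = cum_returns / rolling_max - 1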

px.line(
    drawdown,
    title="Drawdowns"
)


## 6.2 Trend Velocity

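Trend velocity measures how quickly momentum is changing.

It is computed as the first difference of momentum, $v_t = Momentum_t - Momentum_{t-1}$.
Because momentum is already a rate of change of price, velocity describes how fast the trend
is strengthening or fading rather than the speed of the price itself.

Positive values indicate a strengthening trend,
while negative values indicate a weakening or reversing trend.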

In [None]:
velocity = momentum.diff()

px.line(
    velocity,
    title="Trend Velocity"
)

## 6.3 Trend Acceleration

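Trend acceleration measures the change in trend velocity.

It captures whether a trend's strengthening or weakening is itself speeding up or slowing down.

Since velocity is the first difference of momentum, acceleration is its second difference,
the discrete analogue of the second derivative of momentum.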
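A small illustration on a hypothetical momentum path that grows quadratically ($m_t = 0.001\,t^2$): its first difference (velocity) grows linearly and its second difference (acceleration) is constant, mirroring the `diff()` chain used in these cells.

```python
import numpy as np
import pandas as pd

# Hypothetical momentum path: m_t = 0.001 * t^2
t = np.arange(8)
momentum = pd.Series(0.001 * t**2)

velocity = momentum.diff()        # first difference of momentum
acceleration = velocity.diff()    # second difference of momentum

print(velocity.round(4).tolist()[1:])      # [0.001, 0.003, 0.005, 0.007, 0.009, 0.011, 0.013]
print(acceleration.round(4).tolist()[2:])  # [0.002, 0.002, 0.002, 0.002, 0.002, 0.002]
```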

In [None]:
acceleration = velocity.diff()

px.line(
    acceleration,
    title="Trend Acceleration"
)