# Project 04 — VaR & CVaR (Historical + Monte Carlo) — Interactive Risk Model

**Goal**
- Compute **Value-at-Risk (VaR)** and **Conditional VaR (CVaR / Expected Shortfall)**.
- Compare **Historical**, **Parametric (Normal)**, and **Monte Carlo** approaches.
- Visualize **tail risk** clearly (VaR threshold + CVaR tail mean).
- Add **stress tests** (vol shocks, correlation stress, scenario shocks).
- Provide an **interactive mini risk dashboard** (tickers, weights, confidence level, horizon, #paths).

> Notebook is written in **English**. Interactive charts use **Plotly** and controls use **ipywidgets**.

## 0) Setup (paths + reproducibility)
We store interactive HTML charts in an `assets/` folder next to this notebook.

In [1]:
from __future__ import annotations

from pathlib import Path
import numpy as np

SEED = 42
rng = np.random.default_rng(SEED)

PROJECT_DIR = Path.cwd()
ASSETS_DIR = PROJECT_DIR / "assets"
ASSETS_DIR.mkdir(parents=True, exist_ok=True)

print("CWD:", PROJECT_DIR)
print("ASSETS_DIR:", ASSETS_DIR.resolve())

CWD: c:\Users\Karim\Desktop\quant-finance-portfolio\projects\04_var_cvar_monte_carlo
ASSETS_DIR: C:\Users\Karim\Desktop\quant-finance-portfolio\projects\04_var_cvar_monte_carlo\assets


## 1) Imports

In [2]:
import math
import numpy as np
import pandas as pd

import plotly.graph_objects as go
import plotly.express as px

import ipywidgets as widgets
from IPython.display import display, clear_output

# Optional: market data
try:
    import yfinance as yf
    HAVE_YFINANCE = True
except Exception:
    HAVE_YFINANCE = False

HAVE_YFINANCE

True

## 2) VaR / CVaR — definitions (loss convention)

Let **P&L** be the profit-and-loss over a horizon (e.g., 1 day).
Define **Loss** as:
$$
L = -\text{P\&L}
$$
so that a large negative P&L becomes a large positive loss.

For confidence level $\alpha$ (e.g., 95% or 99%):
- **VaR** is the $\alpha$-quantile of the loss distribution:
$$
\text{VaR}_\alpha = q_\alpha(L)
$$
- **CVaR / Expected Shortfall** is the *average loss in the tail beyond VaR*:
$$
\text{CVaR}_\alpha = \mathbb{E}[L \mid L \ge \text{VaR}_\alpha]
$$

**Key intuition**
- VaR answers: *"What loss threshold will we stay below with probability α?"*
- CVaR answers: *"If we are in the worst (1−α) cases, what is the average loss?"*

**Property (bonus)**
$$
\text{CVaR}_\alpha \ge \text{VaR}_\alpha
$$
because CVaR averages losses in the tail, which are all at least VaR.
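
The definitions above can be checked on a tiny sample. This is a minimal sketch with a hypothetical toy loss vector (not market data), using the same empirical-quantile convention as `np.quantile`:

```python
import numpy as np

# Hypothetical toy sample: 20 losses (positive = money lost).
loss = np.arange(1.0, 21.0)          # 1, 2, ..., 20
alpha = 0.90

var = np.quantile(loss, alpha)       # alpha-quantile of the loss sample
tail = loss[loss >= var]             # losses at or beyond the VaR threshold
cvar = tail.mean()                   # average loss in that tail

print(f"VaR={var:.2f}, CVaR={cvar:.2f}")   # VaR=18.10, CVaR=19.50
```

Only the two worst losses (19 and 20) lie beyond the threshold, so CVaR is their mean and sits above VaR, as the property states.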

## 3) Data pipeline (download or synthetic fallback)

In [3]:
def download_prices(tickers: list[str], start: str = "2018-01-01", end: str | None = None) -> pd.DataFrame:
    if not HAVE_YFINANCE:
        raise RuntimeError("yfinance is not installed. Install it with: pip install yfinance")
    data = yf.download(tickers, start=start, end=end, progress=False, auto_adjust=True)
    # yfinance returns multi-index columns when multiple tickers
    if isinstance(data.columns, pd.MultiIndex):
        px_df = data["Close"].copy()
    else:
        px_df = data[["Close"]].copy()
        px_df.columns = [tickers[0]]
    px_df = px_df.dropna(how="all")
    px_df = px_df.dropna(axis=1, how="all")
    return px_df

def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    return np.log(prices / prices.shift(1)).dropna()

def synthetic_returns(n_days: int = 2000, mu: float = 0.08, sigma: float = 0.20, dt: float = 1/252, seed: int = 42) -> pd.DataFrame:
    rng_ = np.random.default_rng(seed)
    r = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng_.standard_normal(size=n_days)
    return pd.DataFrame({"SYNTH": r})

def get_returns(tickers: list[str], start="2018-01-01") -> pd.DataFrame:
    if HAVE_YFINANCE:
        try:
            prices = download_prices(tickers, start=start)
            rets = log_returns(prices)
            # keep only columns that downloaded well
            rets = rets.dropna(axis=1, how="any")
            if rets.shape[1] == 0:
                raise RuntimeError("No valid returns after cleaning.")
            return rets
        except Exception as e:
            print("⚠️ Data download failed, using synthetic returns. Reason:", e)
            return synthetic_returns()
    else:
        print("⚠️ yfinance not available, using synthetic returns.")
        return synthetic_returns()

rets = get_returns(["SPY"])
rets.head(), rets.shape

(Ticker           SPY
 Date                
 2018-01-03  0.006305
 2018-01-04  0.004206
 2018-01-05  0.006642
 2018-01-08  0.001827
 2018-01-09  0.002261,
 (2002, 1))

## 4) Portfolio construction (weights + P&L)

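We compute the portfolio's daily **log-return**:
- If 1 asset: it is the asset's log-return.
- If multiple assets: the weighted sum of asset log-returns. This is a first-order approximation, because log-returns are additive over time but not across assets.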

Then convert returns to **P&L** using a notional value `V0`:
$$
\text{P\&L} = V_0 \cdot R
$$

**Interpretation**  
A VaR of 20,000 at 99% means: **"With 99% confidence, daily loss should not exceed ~20k"** (under the chosen model).
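
Because $R$ is a log-return, $V_0 \cdot R$ is itself a linear approximation of the exact P&L $V_0(e^{R}-1)$. The sketch below, with hypothetical return values, gives a feel for how large the gap is for small and large moves:

```python
import numpy as np

V0 = 1_000_000.0
log_ret = np.array([-0.10, -0.03, -0.01, 0.01, 0.03])  # hypothetical log-returns

pnl_linear = V0 * log_ret                 # approximation used in this notebook
pnl_exact = V0 * np.expm1(log_ret)        # exact: V0 * (exp(R) - 1)

gap = pnl_linear - pnl_exact              # never positive, since exp(r) - 1 >= r
for r, a, e in zip(log_ret, pnl_linear, pnl_exact):
    print(f"R={r:+.2f}  linear={a:>12,.0f}  exact={e:>12,.0f}")
```

For daily moves of about 1% the two agree to within a few tens of currency units per million. For a 10% drop the linear version overstates the loss by several thousand, which matters more at longer horizons.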

In [4]:
def normalize_weights(w: np.ndarray) -> np.ndarray:
    w = np.asarray(w, dtype=float)
    s = w.sum()
    if s <= 0:
        raise ValueError("Weights must sum to a positive number.")
    return w / s

def portfolio_returns(returns: pd.DataFrame, weights: np.ndarray | None = None) -> pd.Series:
    if returns.shape[1] == 1:
        return returns.iloc[:, 0].rename("port_ret")
    if weights is None:
        weights = np.ones(returns.shape[1]) / returns.shape[1]
    w = normalize_weights(weights)
    port = returns.values @ w
    return pd.Series(port, index=returns.index, name="port_ret")

def pnl_from_returns(port_ret: pd.Series, V0: float = 1_000_000.0) -> pd.Series:
    return (V0 * port_ret).rename("PnL")

# quick demo
port_ret = portfolio_returns(rets)
pnl = pnl_from_returns(port_ret, V0=1_000_000.0)
pnl.head()

Date
2018-01-03    6305.339921
2018-01-04    4206.057495
2018-01-05    6641.867459
2018-01-08    1827.113148
2018-01-09    2260.699964
Name: PnL, dtype: float64

## 5) Core risk functions: VaR / CVaR (historical)

In [5]:
def var_cvar_from_pnl(pnl: np.ndarray | pd.Series, alpha: float = 0.99) -> dict:
    pnl = np.asarray(pnl, dtype=float)
    loss = -pnl  # loss convention (positive = bad)
    var = np.quantile(loss, alpha)
    tail = loss[loss >= var]
    cvar = tail.mean() if tail.size > 0 else var
    return {"VaR": float(var), "CVaR": float(cvar), "mean_loss": float(loss.mean()), "loss": loss}

alpha = 0.99
risk_hist = var_cvar_from_pnl(pnl, alpha=alpha)
risk_hist

{'VaR': 34773.41445605278,
 'CVaR': 52367.1198937182,
 'mean_loss': -522.4868979862617,
 'loss': array([-6305.33992143, -4206.05749462, -6641.86745859, ...,
         2736.07906046, 11064.52310035, -8542.33106719], shape=(2002,))}

### Historical tail visualization (interactive Plotly)

**How to interpret**
- The histogram shows the loss distribution from observed returns.
- The dashed vertical line = VaR threshold (α-quantile).
- The dotted line = CVaR (average loss in the tail beyond VaR).

In [6]:
def plot_loss_hist(loss: np.ndarray, VaR: float, CVaR: float, title: str) -> go.Figure:
    df = pd.DataFrame({"loss": loss})
    fig = px.histogram(df, x="loss", nbins=80, template="plotly_dark", title=title)
    fig.add_vline(x=VaR, line_width=3, line_dash="dash",
                  annotation_text=f"VaR={VaR:,.0f}", annotation_position="top left")
    fig.add_vline(x=CVaR, line_width=3, line_dash="dot",
                  annotation_text=f"CVaR={CVaR:,.0f}", annotation_position="top right")
    fig.update_layout(xaxis_title="Loss (positive = loss)", yaxis_title="Count")
    return fig

fig = plot_loss_hist(risk_hist["loss"], risk_hist["VaR"], risk_hist["CVaR"],
                     title=f"Historical Loss Distribution — alpha={alpha:.2%}")
fig.show()
fig.write_html(ASSETS_DIR / "historical_loss_var_cvar.html")
print("Saved:", ASSETS_DIR / "historical_loss_var_cvar.html")

Saved: c:\Users\Karim\Desktop\quant-finance-portfolio\projects\04_var_cvar_monte_carlo\assets\historical_loss_var_cvar.html


## 6) Parametric (Normal) VaR/CVaR — quick math + limitations

Assume returns are Normal:
$$
R \sim \mathcal{N}(\mu,\sigma^2)
$$
Then loss $L=-V_0R$ is also Normal.

**Parametric VaR**
$$
\text{VaR}_\alpha = -V_0\mu + V_0\sigma\, z_\alpha
$$
where $z_\alpha$ is the $\alpha$ quantile of a standard Normal.

**Parametric CVaR**
For Normal losses,
$$
\text{CVaR}_\alpha = -V_0\mu + V_0\sigma \frac{\varphi(z_\alpha)}{1-\alpha}
$$

**Interpretation**
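Fast, but it often underestimates tail risk when real returns exhibit **fat tails**. On the SPY sample used here, the Normal CVaR (about 32k) is well below the historical CVaR (about 52k).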
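
As a sanity check on the two closed forms, they can be compared with a large Normal simulation. This is a sketch with hypothetical inputs (`mu`, `sigma`, `V0` are made up), and it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` only to stay dependency-light:

```python
from statistics import NormalDist
import numpy as np

V0, mu, sigma, alpha = 1_000_000.0, 0.0005, 0.012, 0.99   # hypothetical inputs

std_normal = NormalDist()
z = std_normal.inv_cdf(alpha)                             # z_alpha, about 2.326
var_closed = -V0 * mu + V0 * sigma * z
cvar_closed = -V0 * mu + V0 * sigma * std_normal.pdf(z) / (1 - alpha)

# Simulation check: draw Normal returns and measure the same quantities.
rng = np.random.default_rng(0)
loss = -V0 * rng.normal(mu, sigma, size=1_000_000)
var_sim = np.quantile(loss, alpha)
cvar_sim = loss[loss >= var_sim].mean()

print(f"closed form: VaR={var_closed:,.0f}  CVaR={cvar_closed:,.0f}")
print(f"simulated:   VaR={var_sim:,.0f}  CVaR={cvar_sim:,.0f}")
```

The simulated values should agree with the formulas up to sampling noise, which confirms the sign conventions (loss = minus P&L, upper-tail quantile).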

In [7]:
from scipy.stats import norm

def var_cvar_parametric_normal(returns: np.ndarray | pd.Series, V0: float, alpha: float) -> dict:
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    sig = r.std(ddof=1)
    z = norm.ppf(alpha)
    VaR = -V0 * mu + V0 * sig * z
    CVaR = -V0 * mu + V0 * sig * (norm.pdf(z) / (1 - alpha))
    return {"VaR": float(VaR), "CVaR": float(CVaR), "mu": float(mu), "sigma": float(sig), "z": float(z)}

V0 = 1_000_000.0
risk_norm = var_cvar_parametric_normal(port_ret, V0=V0, alpha=alpha)

pd.DataFrame({
    "Method": ["Historical", "Parametric Normal"],
    "VaR": [risk_hist["VaR"], risk_norm["VaR"]],
    "CVaR": [risk_hist["CVaR"], risk_norm["CVaR"]],
}).set_index("Method")

                            VaR          CVaR
Method                                       
Historical         34773.414456  52367.119894
Parametric Normal  28108.302149  32278.792292


## 7) Monte Carlo VaR/CVaR (single-asset or multi-asset, correlated)

We fit mean/cov from historical returns and simulate many future returns:
- **Multivariate Normal** (baseline)
- Optional **Student-t** (fatter tails) via a scale-mixture trick

Then compute portfolio P&L and VaR/CVaR from simulated losses.
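
One detail of the scale-mixture trick is worth illustrating: dividing a Normal draw by $\sqrt{\chi^2_\nu/\nu}$ inflates the covariance by $\nu/(\nu-2)$. The sketch below is one possible variant, not the notebook's implementation. It shrinks the scale matrix first so that the simulated covariance matches a target. The covariance values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
df = 6.0
cov = np.array([[1.0e-4, 0.5e-4],
                [0.5e-4, 4.0e-4]])        # hypothetical daily covariance
n = 400_000

# Shrink the scale matrix so that the simulated covariance matches `cov`.
scale_matrix = cov * (df - 2.0) / df
Z = rng.multivariate_normal(np.zeros(2), scale_matrix, size=n)
g = rng.chisquare(df, size=n)
X = Z * np.sqrt(df / g)[:, None]          # multivariate t draws, zero mean

sample_cov = np.cov(X, rowvar=False)
print(np.round(sample_cov / cov, 2))      # ratios should be close to 1
```

Whether to match the covariance or to keep the historical covariance as the scale matrix is a modeling choice. The second option produces both fatter tails and higher overall variance.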

In [8]:
def mc_simulate_returns_normal(returns: pd.DataFrame, n_paths: int, horizon_days: int, seed: int = 42) -> np.ndarray:
    """
    Simulate horizon_days aggregated log-returns using multivariate Normal with historical mean/cov.
    Returns array shape (n_paths, d).
    """
    rng_ = np.random.default_rng(seed)
    mu = returns.mean().values          # daily mean vector
    cov = returns.cov().values          # daily covariance matrix
    Z = rng_.multivariate_normal(mean=mu, cov=cov, size=(n_paths, horizon_days))  # (paths, days, d)
    R_h = Z.sum(axis=1)  # (paths, d)
    return R_h

def mc_simulate_returns_t(returns: pd.DataFrame, n_paths: int, horizon_days: int, df: float = 6.0, seed: int = 42) -> np.ndarray:
    """
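    Multivariate t via: Normal / sqrt(Chi2/df).
    Note: cov is used as the scale matrix, so for df > 2 the simulated daily
    covariance is df / (df - 2) times the historical covariance.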
    """
    rng_ = np.random.default_rng(seed)
    mu = returns.mean().values
    cov = returns.cov().values
    d = returns.shape[1]

    Z = rng_.multivariate_normal(mean=np.zeros(d), cov=cov, size=(n_paths, horizon_days))
    g = rng_.chisquare(df, size=(n_paths, horizon_days))
    scale = np.sqrt(df / g)  # (paths, days)
    X = mu + Z * scale[..., None]
    R_h = X.sum(axis=1)
    return R_h

def mc_var_cvar_portfolio(
    returns: pd.DataFrame, weights: np.ndarray | None, V0: float, alpha: float,
    n_paths: int = 100_000, horizon_days: int = 1,
    dist: str = "normal", df_t: float = 6.0, seed: int = 42
) -> dict:
    if returns.shape[1] == 1:
        weights = np.array([1.0])

    w = normalize_weights(np.ones(returns.shape[1]) / returns.shape[1] if weights is None else weights)

    if dist == "normal":
        R_h = mc_simulate_returns_normal(returns, n_paths, horizon_days, seed=seed)
    elif dist == "t":
        R_h = mc_simulate_returns_t(returns, n_paths, horizon_days, df=df_t, seed=seed)
    else:
        raise ValueError("dist must be 'normal' or 't'")

    port_ret_h = R_h @ w  # (paths,) horizon log-return
    pnl_h = V0 * port_ret_h
    risk = var_cvar_from_pnl(pnl_h, alpha=alpha)
    return {"pnl": pnl_h, **risk, "weights": w, "dist": dist, "horizon_days": horizon_days}

risk_mc = mc_var_cvar_portfolio(rets, weights=None, V0=V0, alpha=alpha, n_paths=50_000, horizon_days=1, dist="normal", seed=SEED)
pd.DataFrame({"VaR":[risk_mc["VaR"]], "CVaR":[risk_mc["CVaR"]]})

            VaR          CVaR
0  28236.256299  32445.599885


### MC loss distribution plot

**Interpretation**
- Compare this MC tail to historical: do we capture fat tails?
- If you switch to Student-t, CVaR typically increases (fatter tails).

In [9]:
fig = plot_loss_hist(-risk_mc["pnl"], risk_mc["VaR"], risk_mc["CVaR"],
                     title=f"Monte Carlo Loss Distribution ({risk_mc['dist']}) — alpha={alpha:.2%}, horizon=1d")
fig.show()
fig.write_html(ASSETS_DIR / "mc_loss_var_cvar.html")

## 8) Tail focus: zoom on worst losses

**Interpretation**
This plot zooms in on the right tail (worst losses). CVaR is the mean of that region beyond VaR.

In [10]:
def plot_tail(loss: np.ndarray, VaR: float, alpha: float, title: str) -> go.Figure:
    loss = np.asarray(loss, float)
    tail = loss[loss >= VaR]
    df = pd.DataFrame({"tail_loss": tail})
    fig = px.histogram(df, x="tail_loss", nbins=60, template="plotly_dark", title=title)
    fig.add_vline(x=VaR, line_width=3, line_dash="dash", annotation_text=f"VaR@{alpha:.0%}",
                  annotation_position="top left")
    fig.update_layout(xaxis_title="Tail losses (L >= VaR)", yaxis_title="Count")
    return fig

fig = plot_tail(-risk_mc["pnl"], risk_mc["VaR"], alpha, title="Tail-only view (losses beyond VaR)")
fig.show()
fig.write_html(ASSETS_DIR / "tail_zoom.html")

## 9) Stress tests (simple but useful)

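We stress the distribution by scaling volatility or forcing all correlations to 1 (a toy setup, but useful for sanity checks). With a single asset, as in the SPY run here, the correlation stress cannot change anything, so the Corr → 1 row matches the base case.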
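
To see what the correlation stress does when there is more than one asset, here is a minimal two-asset sketch. The volatilities, correlation, and weights are hypothetical:

```python
import numpy as np

sig = np.array([0.01, 0.02])              # hypothetical daily vols
corr = np.array([[1.0, 0.3],
                 [0.3, 1.0]])             # hypothetical base correlation
w = np.array([0.5, 0.5])

cov_base = corr * np.outer(sig, sig)
cov_stressed = np.outer(sig, sig)         # corr = 1 for every pair

vol_base = np.sqrt(w @ cov_base @ w)
vol_stressed = np.sqrt(w @ cov_stressed @ w)   # equals w @ sig

print(f"base vol={vol_base:.4%}, stressed vol={vol_stressed:.4%}")
```

With perfect correlation the portfolio volatility collapses to the weighted sum of the individual volatilities, so all diversification benefit disappears.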

In [11]:
def stress_vol_shock(returns: pd.DataFrame, vol_mult: float) -> pd.DataFrame:
    mu = returns.mean()
    centered = returns - mu
    return mu + vol_mult * centered

def stress_corr_to_one(returns: pd.DataFrame) -> pd.DataFrame:
    mu = returns.mean().values
    sig = returns.std(ddof=1).values
    cov_stressed = np.outer(sig, sig)  # corr=1 -> cov = sigma_i sigma_j
    fake = returns.copy()
    fake.attrs["cov_override"] = cov_stressed
    fake.attrs["mu_override"] = mu
    return fake

def mc_simulate_returns_normal_with_override(returns: pd.DataFrame, n_paths: int, horizon_days: int, seed: int = 42) -> np.ndarray:
    rng_ = np.random.default_rng(seed)
    mu = returns.attrs.get("mu_override", returns.mean().values)
    cov = returns.attrs.get("cov_override", returns.cov().values)
    Z = rng_.multivariate_normal(mean=mu, cov=cov, size=(n_paths, horizon_days))
    return Z.sum(axis=1)

def mc_var_cvar_portfolio_normal_override(returns: pd.DataFrame, weights: np.ndarray | None, V0: float, alpha: float,
                                         n_paths: int, horizon_days: int, seed: int = 42) -> dict:
    if returns.shape[1] == 1:
        weights = np.array([1.0])
    w = normalize_weights(np.ones(returns.shape[1]) / returns.shape[1] if weights is None else weights)
    R_h = mc_simulate_returns_normal_with_override(returns, n_paths, horizon_days, seed=seed)
    port_ret_h = R_h @ w
    pnl_h = V0 * port_ret_h
    risk = var_cvar_from_pnl(pnl_h, alpha=alpha)
    return {"pnl": pnl_h, **risk, "weights": w}

n_paths = 50_000
horizon_days = 1

base = mc_var_cvar_portfolio(rets, None, V0, alpha, n_paths=n_paths, horizon_days=horizon_days, dist="normal", seed=SEED)
vol_up = mc_var_cvar_portfolio(stress_vol_shock(rets, 1.5), None, V0, alpha, n_paths=n_paths, horizon_days=horizon_days, dist="normal", seed=SEED)
corr1 = mc_var_cvar_portfolio_normal_override(stress_corr_to_one(rets), None, V0, alpha, n_paths=n_paths, horizon_days=horizon_days, seed=SEED)

pd.DataFrame({
    "Scenario": ["Base (Normal MC)", "Vol shock x1.5", "Corr → 1 (diversification collapse)"],
    "VaR": [base["VaR"], vol_up["VaR"], corr1["VaR"]],
    "CVaR": [base["CVaR"], vol_up["CVaR"], corr1["CVaR"]],
}).set_index("Scenario").round(2)

                                          VaR      CVaR
Scenario                                               
Base (Normal MC)                     28236.26  32445.60
Vol shock x1.5                       42615.63  48929.64
Corr → 1 (diversification collapse)  28236.26  32445.60


## 10) Interactive Risk Dashboard (tickers, alpha, horizon, paths)

**Tip**
MC with 100k paths gives less sampling noise in the tail estimates but runs slower; start with 20k–50k.
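
The trade-off can be quantified by re-running the same VaR estimate with different seeds and looking at the spread. This sketch uses standard Normal losses as a stand-in; `var_estimates` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def var_estimates(n_paths: int, n_repeats: int = 30, alpha: float = 0.99) -> np.ndarray:
    """Re-estimate VaR on standard Normal losses with different seeds."""
    out = []
    for seed in range(n_repeats):
        loss = np.random.default_rng(seed).standard_normal(n_paths)
        out.append(np.quantile(loss, alpha))
    return np.array(out)

for n in (5_000, 50_000):
    est = var_estimates(n)
    print(f"n_paths={n:>6}: mean VaR={est.mean():.3f}, std across seeds={est.std(ddof=1):.4f}")
```

The spread shrinks roughly with the square root of the number of paths, so ten times more paths buys about a threefold reduction in noise.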

In [12]:
# --- Widgets ---
tickers_text = widgets.Text(value="SPY", description="Tickers (csv)")
start_text = widgets.Text(value="2018-01-01", description="Start")

V0_slider = widgets.FloatLogSlider(value=1_000_000, base=10, min=4, max=7, step=0.05, description="V0")
alpha_slider = widgets.FloatSlider(value=0.99, min=0.90, max=0.995, step=0.005, description="alpha")
horizon_slider = widgets.IntSlider(value=1, min=1, max=20, step=1, description="horizon(d)")
paths_slider = widgets.IntSlider(value=30000, min=5000, max=120000, step=5000, description="MC paths")

dist_dd = widgets.Dropdown(options=[("Normal", "normal"), ("Student-t (fat tails)", "t")], value="normal", description="MC dist")
df_slider = widgets.FloatSlider(value=6.0, min=3.0, max=30.0, step=1.0, description="t df")

method_dd = widgets.Dropdown(options=[("Historical", "hist"), ("Parametric Normal", "param"), ("Monte Carlo", "mc")],
                             value="mc", description="Method")

btn = widgets.Button(description="Run model", button_style="success")
out = widgets.Output()

display(widgets.VBox([
    widgets.HBox([tickers_text, start_text]),
    widgets.HBox([V0_slider, alpha_slider, horizon_slider]),
    widgets.HBox([paths_slider, method_dd, dist_dd, df_slider]),
    btn,
    out
]))

def run_model(_):
    with out:
        clear_output(wait=True)

        tickers = [t.strip().upper() for t in tickers_text.value.split(",") if t.strip()]
        start = start_text.value.strip()
        alpha = float(alpha_slider.value)
        V0 = float(V0_slider.value)
        horizon = int(horizon_slider.value)
        n_paths = int(paths_slider.value)
        method = str(method_dd.value)
        dist = str(dist_dd.value)
        df_t = float(df_slider.value)

        returns = get_returns(tickers, start=start)
        port_ret = portfolio_returns(returns)  # equal weights by default

        # horizon aggregation for historical/parametric: rolling sum of log-returns
        # (windows overlap, so the aggregated observations are autocorrelated)
        if horizon > 1:
            port_ret_h = port_ret.rolling(horizon).sum().dropna()
        else:
            port_ret_h = port_ret

        pnl_hist = pnl_from_returns(port_ret_h, V0=V0)
        hist = var_cvar_from_pnl(pnl_hist, alpha=alpha)
        param = var_cvar_parametric_normal(port_ret_h, V0=V0, alpha=alpha)

        if method == "mc":
            risk_mc = mc_var_cvar_portfolio(returns, None, V0, alpha, n_paths=n_paths, horizon_days=horizon, dist=dist, df_t=df_t, seed=SEED)
            chosen_name = f"Monte Carlo ({dist})"
            chosen = {"VaR": risk_mc["VaR"], "CVaR": risk_mc["CVaR"], "pnl": risk_mc["pnl"]}
        elif method == "hist":
            chosen_name = "Historical"
            chosen = {"VaR": hist["VaR"], "CVaR": hist["CVaR"], "pnl": pnl_hist.values}
        else:
            chosen_name = "Parametric Normal"
            mu, sig = param["mu"], param["sigma"]
            sim_r = np.random.default_rng(SEED).normal(mu, sig, size=120000)
            sim_pnl = V0 * sim_r
            chosen = {"VaR": param["VaR"], "CVaR": param["CVaR"], "pnl": sim_pnl}

        summary = pd.DataFrame({
            "Method": ["Historical", "Parametric Normal", chosen_name],
            "VaR": [hist["VaR"], param["VaR"], chosen["VaR"]],
            "CVaR": [hist["CVaR"], param["CVaR"], chosen["CVaR"]],
        }).set_index("Method").round(2)

        display(summary)

        loss = -np.asarray(chosen["pnl"], dtype=float)
        fig = plot_loss_hist(loss, chosen["VaR"], chosen["CVaR"],
                             title=f"{chosen_name} Loss Distribution — alpha={alpha:.2%}, horizon={horizon}d")
        fig.show()

        # In-sample exceedance check on historical VaR (not an out-of-sample backtest):
        # the rate is close to 1 - alpha by construction
        loss_hist = -pnl_hist.values
        exceed = (loss_hist > hist["VaR"]).mean()
        print(f"Historical exceedance rate: {exceed:.2%} (target ~ {(1-alpha):.2%})")

btn.on_click(run_model)
run_model(None)

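
The exceedance rate printed by the dashboard is measured on the same data used to estimate VaR. An out-of-sample variant would estimate VaR from a trailing window and test it on the next day. The sketch below shows one way to do that on hypothetical i.i.d. losses. `rolling_var_exceedance` and the 250-day window are illustrative assumptions, not part of the notebook:

```python
import numpy as np

def rolling_var_exceedance(loss: np.ndarray, alpha: float = 0.99, window: int = 250) -> float:
    """Share of days whose loss exceeds the VaR estimated from the prior `window` days."""
    loss = np.asarray(loss, dtype=float)
    hits = []
    for t in range(window, loss.size):
        var_t = np.quantile(loss[t - window:t], alpha)   # uses past data only
        hits.append(loss[t] > var_t)
    return float(np.mean(hits))

# Hypothetical i.i.d. Normal losses: the exceedance rate should sit near 1 - alpha.
rng = np.random.default_rng(0)
toy_loss = rng.standard_normal(3_000)
rate = rolling_var_exceedance(toy_loss, alpha=0.99, window=250)
print(f"out-of-sample exceedance rate: {rate:.2%}")
```

On real returns, a rate well above 1 − α would suggest the model reacts too slowly to changing volatility.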

## 11) What to say in interviews (short)
- Built a VaR/CVaR engine with **historical**, **parametric**, and **Monte Carlo** approaches.
- Visualized tail risk with interactive charts (VaR threshold + CVaR tail mean).
- Added simple **stress tests** (vol shock, correlation collapse) and interpreted impacts.
- Implemented a small **interactive risk dashboard** for rapid scenario exploration.