# CVaR Portfolio Optimization

The goal of this project is to build a **long-only equity portfolio** that **minimizes Conditional Value-at-Risk (CVaR)** at level Beta. The portfolio is trained on **2019** daily returns, the same allocation is then **evaluated** on **2020**, and both years are compared against a benchmark (**NDX**) treated as a single-asset portfolio.

## Setup & Load

In [1]:
import pandas as pd
import numpy as np
from gurobipy import Model, GRB, quicksum

prices_2019 = pd.read_csv("stocks2019.csv", index_col = 0, parse_dates = True)
prices_2020 = pd.read_csv("stocks2020.csv", index_col = 0, parse_dates = True)


In [2]:
prices_2019[:3]

X,NDX,ATVI,ADBE,AMD,ALXN,ALGN,GOOGL,GOOG,AMZN,AMGN,...,TCOM,ULTA,VRSN,VRSK,VRTX,WBA,WDAY,WDC,XEL,XLNX
2019-01-02,6360.870117,46.35038,224.570007,18.83,98.050003,202.119995,1054.680054,1045.849976,1539.130005,182.458298,...,27.59,247.970001,147.759995,107.364159,164.080002,63.920757,159.740005,36.312473,46.041862,84.600266
2019-01-03,6147.129883,44.704514,215.699997,17.049999,100.209999,184.779999,1025.469971,1016.059998,1500.280029,179.681961,...,26.959999,243.360001,142.589996,104.08345,163.729996,63.253742,154.020004,33.835327,45.860447,81.41497
2019-01-04,6422.669922,46.488358,226.190002,19.0,106.0,186.710007,1078.069946,1070.709961,1575.390015,185.824142,...,28.549999,255.029999,148.970001,108.678413,172.699997,65.358139,163.350006,35.268467,46.309223,85.184715


In [3]:
prices_2020[:3]

Unnamed: 0,NDX,ATVI,ADBE,AMD,ALXN,ALGN,GOOGL,GOOG,AMZN,AMGN,...,TCOM,ULTA,VRSN,VRSK,VRTX,WBA,WDAY,WDC,XEL,XLNX
2020-01-02,8872.219727,58.266792,334.429993,49.099998,107.839996,283.679993,1368.680054,1367.369995,1898.01001,231.917572,...,36.970001,254.550003,196.729996,150.167465,219.449997,56.046688,167.460007,64.771545,60.782249,100.115349
2020-01-03,8793.900391,58.286655,331.809998,48.599998,106.410004,280.440002,1361.52002,1360.660034,1874.969971,230.343124,...,36.18,250.169998,200.880005,151.62822,217.979996,56.046688,168.440002,63.774597,61.074562,97.810677
2020-01-06,8848.519531,59.34967,333.709991,48.389999,106.580002,285.880005,1397.810059,1394.209961,1902.880005,232.110748,...,35.689999,250.949997,202.740005,151.976013,224.029999,56.530506,169.490005,62.550629,60.986866,95.771927


## Convert prices --> returns & define the investable universe

**Goal**
- Convert daily prices to **arithmetic daily returns**
- Keep 'NDX' aside as a **benchmark** and **exclude it** from investable assets
- Keep only tickers present in **BOTH** years to be robust to missing names/delistings
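
As a worked instance of the arithmetic-return definition \(r_t = P_t / P_{t-1} - 1\), the first two NDX closes in the 2019 preview above (2019-01-02 and 2019-01-03) give:

\[
r_{\text{NDX}} = \frac{6147.129883}{6360.870117} - 1 \approx -0.0336
\]

That is a loss of about 3.36% on the index for that day.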

In [4]:
# convert prices to arithmetic daily returns and drop the initial NaNs
rets_2019_all = prices_2019.pct_change().dropna(how = "all")
rets_2020_all = prices_2020.pct_change().dropna(how = "all")

# separate benchmark series; exclude it from investable universe
index_col = "NDX"
if index_col not in rets_2019_all or index_col not in rets_2020_all:
    raise ValueError(f"expected benchmark column '{index_col}' missing from csvs")

ndx_2019 = rets_2019_all[index_col].dropna()
ndx_2020 = rets_2020_all[index_col].dropna()

# investable = everything except the index column
inv_2019 = rets_2019_all.drop(columns = [index_col]).dropna(axis = 1, how = "any")
inv_2020 = rets_2020_all.drop(columns = [index_col]).dropna(axis = 1, how = "any")

# align the investable tickers across years (intersection)
common_cols = sorted(set(inv_2019.columns) & set(inv_2020.columns))
if len(common_cols) == 0:
    raise ValueError("no common investable tickers across 2019 and 2020")

rets_2019 = inv_2019[common_cols].dropna()
rets_2020 = inv_2020[common_cols].dropna()

# small helper to print percents nicely
def as_pct(x):
    return 100.0 * float(x)

We want a **long-only** portfolio that minimizes tail risk, measured by **CVaR** at level Beta. CVaR is the **average loss** over the worst (1-Beta) fraction of days; with Beta = 0.95, that is the worst 5% of trading days.

- Daily **loss** on day \(k\) for weights \(x\) and returns \(y_k\): \(L_k = -x^\top y_k\).
- Decision variables: \(x\) (weights), \(\alpha\) (VaR threshold), \(u_k\) (tail slacks).
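
The definition above can be checked on a toy sample. The sketch below uses made-up losses (not market data) and a hypothetical helper `ru_objective`. It evaluates the Rockafellar-Uryasev objective \(\alpha + \frac{1}{(1-\beta)q}\sum_k \max(L_k - \alpha, 0)\) and compares its minimum with the plain average of the worst days. It assumes the tail size \((1-\beta)q\) is a whole number of days, which keeps the comparison exact.

```python
def ru_objective(alpha, losses, beta):
    """F(alpha) = alpha + 1/((1-beta) q) * sum_k max(L_k - alpha, 0)."""
    q = len(losses)
    excess = sum(max(loss - alpha, 0.0) for loss in losses)
    return alpha + excess / ((1.0 - beta) * q)

# toy daily losses (positive = money lost); illustrative numbers only
losses = [i / 1000 for i in range(-10, 10)]   # 20 days, from -1.0% to +0.9%
beta = 0.90                                   # tail = worst 10% of 20 days = 2 days

# F is convex and piecewise linear with kinks at the sample losses,
# so it is enough to search the sample losses for a minimizer
cvar = min(ru_objective(a, losses, beta) for a in losses)

# direct definition: average of the worst (1 - beta) * q = 2 losses
worst_two = sorted(losses, reverse=True)[:2]
tail_avg = sum(worst_two) / 2

print(f"RU minimum   = {cvar:.4%}")
print(f"tail average = {tail_avg:.4%}")
```

The two numbers agree, which is why minimizing the objective over \(\alpha\) and the slacks \(u_k\) recovers CVaR.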

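**Linear Program (Rockafellar–Uryasev):**

\[
\begin{aligned}
\min_{x,\,\alpha,\,u}\quad & \alpha + \frac{1}{(1-\beta)\,q}\sum_{k=1}^{q} u_k \\
\text{s.t.}\quad & u_k \ge -x^\top y_k - \alpha, && k = 1,\dots,q \\
& u_k \ge 0, && k = 1,\dots,q \\
& \sum_{j=1}^{n} x_j = 1, \quad x_j \ge 0, && j = 1,\dots,n \\
& \mu^\top x \ge R
\end{aligned}
\]

Here \(\beta\) is the level Beta, \(q\) is the number of days, \(n\) is the number of assets, \(\mu\) is the vector of mean daily returns, and \(R\) is the minimum expected daily return.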

## Solve for the weights on the 2019 training sample

Optimize x to minimize Beta-CVaR on 2019, subject to a fully invested long-only budget and a floor R on the expected daily return.  
The function returns the weights, the VaR (alpha), and the CVaR (objective value).

In [5]:
def solve_cvar_min_long_only(returns_df, beta=0.95, min_return=0.0002, quiet=True):
    """
    minimize beta-CVaR of daily losses (Rockafellar-Uryasev LP) on a training sample.

    returns:
        dict with:
          - 'x': pd.Series of optimal weights (sorted descending)
          - 'VaR': optimal alpha (daily VaR at level beta)
          - 'CVaR': optimal objective value (daily CVaR at level beta)
          - 'model': gurobi model (if you want diagnostics)
    """
    # matrices
    Y = returns_df.values              # shape: q x n (rows: days, cols: assets)
    q, n = Y.shape
    mu = returns_df.mean(axis = 0).values  # mean daily returns per asset

    # build model
    m = Model("min_cvar")
    if quiet:
        m.Params.OutputFlag = 0  # silence solver log

    # decision variables
    x = m.addVars(n, lb = 0.0, name = "x")                 # long-only weights
    alpha = m.addVar(lb = -GRB.INFINITY, name = "alpha")   # VaR threshold
    u = m.addVars(q, lb = 0.0, name = "u")                 # tail slack variables

    # VaR + average tail loss --> the objective
    inv_tail_mass = 1.0 / ((1.0 - beta) * q)
    m.setObjective(alpha + inv_tail_mass * quicksum(u[k] for k in range(q)), GRB.MINIMIZE)

    # tail constraints: u_k >= -x^T y_k - alpha  (excess loss above VaR)
    for k in range(q):
        m.addConstr(u[k] >= -quicksum(Y[k, j] * x[j] for j in range(n)) - alpha, name = f"tail_{k}")

    # fully invested, long-only
    m.addConstr(quicksum(x[j] for j in range(n)) == 1.0, name = "budget")

    # small daily expected return floor
    m.addConstr(quicksum(mu[j] * x[j] for j in range(n)) >= min_return, name = "min_return")

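    # optimize
    m.optimize()
    if m.Status != GRB.OPTIMAL:
        # e.g. a return floor that no long-only portfolio can reach makes the LP infeasible
        raise RuntimeError(f"CVaR LP did not solve to optimality (status {m.Status})")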

    # collect solution
    x_series = pd.Series([x[j].X for j in range(n)], index = returns_df.columns, name = "weight")
    x_series = x_series.sort_values(ascending = False)
    return {"x": x_series, "VaR": alpha.X, "CVaR": m.objVal, "model": m}
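
For readers without a Gurobi license, the same LP can be sketched with `scipy.optimize.linprog` (HiGHS). This is a swapped-in solver, not the one used in this notebook. The function name `solve_cvar_min_linprog`, the synthetic returns, and the demo return floor are all assumptions made for illustration. The variables are stacked as `z = [x, alpha, u]`.

```python
import numpy as np
from scipy.optimize import linprog

def solve_cvar_min_linprog(Y, beta=0.95, min_return=0.0):
    """Rockafellar-Uryasev LP with stacked variables z = [x (n), alpha (1), u (q)]."""
    Y = np.asarray(Y, dtype=float)
    q, n = Y.shape
    mu = Y.mean(axis=0)

    # objective: alpha + 1/((1-beta) q) * sum(u)
    c = np.concatenate([np.zeros(n), [1.0], np.full(q, 1.0 / ((1.0 - beta) * q))])

    # tail rows: -Y x - alpha - u <= 0; return floor: -mu^T x <= -min_return
    tail = np.hstack([-Y, -np.ones((q, 1)), -np.eye(q)])
    floor = np.concatenate([-mu, [0.0], np.zeros(q)])[None, :]
    A_ub = np.vstack([tail, floor])
    b_ub = np.concatenate([np.zeros(q), [-min_return]])

    # budget: sum(x) = 1
    A_eq = np.concatenate([np.ones(n), [0.0], np.zeros(q)])[None, :]
    b_eq = [1.0]

    # x >= 0 (long-only), alpha free, u >= 0
    bounds = [(0.0, None)] * n + [(None, None)] + [(0.0, None)] * q

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return {"x": res.x[:n], "VaR": res.x[n], "CVaR": res.fun}

# synthetic returns for a smoke test (60 days, 4 assets); not market data
rng = np.random.default_rng(0)
Y_demo = rng.normal(0.0005, 0.01, size=(60, 4))

# floor set to the equal-weight mean return, so the problem is feasible
floor_demo = float(Y_demo.mean())
out = solve_cvar_min_linprog(Y_demo, beta=0.90, min_return=floor_demo)
print(out["x"].round(4), out["VaR"], out["CVaR"])
```

The dense identity block makes this formulation practical only for small samples. For a year of daily data it is still small enough to solve quickly.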


## Re-evaluate CVaR on new data for a fixed portfolio

Hold x fixed (from 2019) and compute VaR/CVaR on 2020 by optimizing only alpha and u. This isolates the out-of-sample risk of the same allocation.

In [6]:
def evaluate_cvar_fixed_weights(returns_df, weights, beta = 0.95, quiet = True):
    """compute VaR/CVaR at level beta on returns_df for a fixed weight vector."""
    # align weights to columns
    w = weights.reindex(returns_df.columns).fillna(0.0).values
    Y = returns_df.values
    q, n = Y.shape

    # daily portfolio losses for the fixed weights
    losses = -Y.dot(w)

    # build eval model
    m = Model("eval_cvar_fixed_x")
    if quiet:
        m.Params.OutputFlag = 0

    alpha = m.addVar(lb = -GRB.INFINITY, name = "alpha")
    u = m.addVars(q, lb = 0.0, name = "u")

    inv_tail_mass = 1.0 / ((1.0 - beta) * q)
    m.setObjective(alpha + inv_tail_mass * quicksum(u[k] for k in range(q)), GRB.MINIMIZE)

    # tail constraints for the realized losses
    for k in range(q):
        m.addConstr(u[k] >= float(losses[k]) - alpha, name = f"tail_eval_{k}")

    m.optimize()
    return {"VaR": alpha.X, "CVaR": m.objVal}
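
With the weights fixed, the LP above has only one free scalar (alpha), so its optimum can also be obtained by sorting the losses. The sketch below is a solver-free cross-check under that observation. The helper names, tickers, and returns are hypothetical. It also handles the case where the tail size \((1-\beta)q\) is not a whole number of days, by giving the boundary loss a fractional weight.

```python
import math

def portfolio_losses(returns_by_day, weights):
    """returns_by_day: list of {ticker: return}; weights: {ticker: weight}.
    Tickers absent from weights get weight 0, mirroring fillna(0.0) above."""
    return [-sum(weights.get(t, 0.0) * r for t, r in day.items())
            for day in returns_by_day]

def var_cvar_sorted(losses, beta=0.95):
    """VaR/CVaR at level beta from sorted losses, allowing fractional (1-beta)*q."""
    q = len(losses)
    m = (1.0 - beta) * q                      # tail size in days, may be fractional
    srt = sorted(losses, reverse=True)        # worst loss first
    k = min(math.floor(m), q - 1)             # losses that lie fully inside the tail
    var = srt[k]                              # boundary loss: a minimizer of the RU objective
    cvar = (sum(srt[:k]) + (m - k) * var) / m # boundary loss weighted by the leftover mass
    return {"VaR": var, "CVaR": cvar}

# hypothetical two-ticker sample, 7 days
days = [
    {"AAA":  0.010, "BBB": -0.020},
    {"AAA": -0.030, "BBB":  0.010},
    {"AAA":  0.020, "BBB":  0.000},
    {"AAA": -0.010, "BBB": -0.040},
    {"AAA":  0.005, "BBB":  0.015},
    {"AAA":  0.000, "BBB":  0.020},
    {"AAA": -0.005, "BBB":  0.005},
]
weights = {"AAA": 0.6, "BBB": 0.4}

losses = portfolio_losses(days, weights)
stats = var_cvar_sorted(losses, beta=0.80)    # tail size = 0.2 * 7 = 1.4 days
print(stats)                                  # VaR ≈ 0.014, CVaR ≈ 0.0197
```

When \((1-\beta)q\) is a whole number, any alpha between two adjacent sorted losses is optimal, so the LP's reported VaR may differ from this one while the CVaR agrees.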

## Compute benchmark CVaR for single series (NDX)

Treat NDX as a one-asset portfolio with weight 1 and compute its VaR/CVaR for each year.

In [8]:
def evaluate_cvar_single_series(series_returns, beta = 0.95, quiet = True):
    """compute VaR/CVaR at level beta for a single return series (weight 1)."""
    losses = -series_returns.dropna().values  # loss = -return
    q = losses.shape[0]

    m = Model("eval_cvar_single")
    if quiet:
        m.Params.OutputFlag = 0

    alpha = m.addVar(lb = -GRB.INFINITY, name = "alpha")
    u = m.addVars(q, lb = 0.0, name = "u")

    inv_tail_mass = 1.0 / ((1.0 - beta) * q)
    m.setObjective(alpha + inv_tail_mass * quicksum(u[k] for k in range(q)), GRB.MINIMIZE)

    for k in range(q):
        m.addConstr(u[k] >= float(losses[k]) - alpha, name=f"tail_single_{k}")

    m.optimize()
    return {"VaR": alpha.X, "CVaR": m.objVal}

## Run experiment (Beta = 0.95)
- Train on 2019 with Beta = 0.95 and R = 0.02% daily
- Report in-sample (2019) and out-of-sample (2020) VaR/CVaR
- Compare with NDX for both years
- Print top weights

In [9]:
beta_values   = [0.95]   # later: [0.90, 0.95, 0.99] for sensitivity analysis
min_daily_ret = 0.0002   # 0.02% minimum expected daily return

experiment_out = []
for beta in beta_values:
    # 1) train: learn weights on 2019
    train = solve_cvar_min_long_only(rets_2019, beta = beta, min_return = min_daily_ret, quiet = True)
    w_star = train["x"]

    # 2) evaluate: fixed weights on 2019 (in-sample) and 2020 (out-of-sample)
    ins = evaluate_cvar_fixed_weights(rets_2019, w_star, beta = beta, quiet = True)
    oos = evaluate_cvar_fixed_weights(rets_2020, w_star, beta = beta, quiet = True)

    # 3) benchmark: NDX alone in both periods
    ndx_2019_stats = evaluate_cvar_single_series(ndx_2019, beta = beta, quiet = True)
    ndx_2020_stats = evaluate_cvar_single_series(ndx_2020, beta = beta, quiet = True)

    print(f"\n=== beta = {beta:.2f} ===")
    print(f"in-sample  (2019)  VaR = {as_pct(ins['VaR']):.4f}%   CVaR = {as_pct(ins['CVaR']):.4f}%")
    print(f"out-of-sample(2020) VaR = {as_pct(oos['VaR']):.4f}%   CVaR = {as_pct(oos['CVaR']):.4f}%")
    print(f"NDX 2019            VaR = {as_pct(ndx_2019_stats['VaR']):.4f}%   CVaR = {as_pct(ndx_2019_stats['CVaR']):.4f}%")
    print(f"NDX 2020            VaR = {as_pct(ndx_2020_stats['VaR']):.4f}%   CVaR = {as_pct(ndx_2020_stats['CVaR']):.4f}%")

    # show top allocations for interpretability
    top5 = w_star.head(5)
    print("\nTop-5 weights:")
    for t, w in top5.items():
        print(f"  {t:<10s} {as_pct(w):6.2f}%")

    experiment_out.append({
        "beta": beta,
        "weights": w_star,
        "ins": ins,
        "oos": oos,
        "ndx2019": ndx_2019_stats,
        "ndx2020": ndx_2020_stats
    })

=== beta = 0.95 ===
in-sample  (2019)  VaR = 0.8500%   CVaR = 1.1090%
out-of-sample(2020) VaR = 2.5389%   CVaR = 4.6563%
NDX 2019            VaR = 1.6209%   CVaR = 2.4743%
NDX 2020            VaR = 3.9098%   CVaR = 5.6495%

Top-5 weights:
  XEL         30.39%
  CHTR        26.50%
  AMGN         8.08%
  CTXS         6.56%
  CHKP         6.27%


## Results & Interpretation (β = 0.95)

Running the CVaR optimization on 2019 returns gave the following:

- **In-sample (2019)**  
  - Portfolio VaR ≈ **0.85%** and CVaR ≈ **1.11%**  
  - Benchmark NDX VaR ≈ **1.62%** and CVaR ≈ **2.47%**  
  - → The optimized portfolio cuts tail risk roughly in half compared to the index.

- **Out-of-sample (2020)**  
  - Portfolio VaR ≈ **2.54%** and CVaR ≈ **4.66%**  
  - Benchmark NDX VaR ≈ **3.91%** and CVaR ≈ **5.65%**  
  - → Tail risk rose sharply in the pandemic year (the portfolio's CVaR roughly quadrupled), but the optimized portfolio's CVaR still stayed about 18% below the index's.

- **Top allocations (2019 solution)**  
  - XEL ~30%  
  - CHTR ~27%  
  - AMGN ~8%  
  - CTXS ~7%  
  - CHKP ~6%  
____________________________________________________________________________________________
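- The optimizer concentrated heavily in a few defensive or stable-growth names: the top two positions hold about 57% of the portfolio and the top five about 78%. They span utilities (XEL), healthcare (AMGN), and cable and software companies with steady cash flows (CHTR, CTXS, CHKP). This reflects the model’s focus on minimizing downside rather than diversifying broadly.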
- The sharp rise in CVaR from 2019 to 2020 highlights **non-stationarity**: risk patterns change when the market regime shifts. A portfolio that looks “safe” in one year can still be hit hard in an unprecedented shock like COVID.
- Even so, the portfolio had lower VaR and CVaR than the benchmark in both years, which suggests that **systematic risk control pays off**, even if it can’t fully shield against extreme environments.
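
The relative comparisons in this section can be re-derived from the printed output above. The short sketch below hard-codes those reported figures (in percent per day) and computes how far the portfolio sits below NDX, plus the growth of the portfolio's own CVaR. The helper name `relative_reduction` is illustrative.

```python
# VaR/CVaR figures copied from the printed output above (percent per day)
reported = {
    2019: {"portfolio": {"VaR": 0.8500, "CVaR": 1.1090},
           "NDX":       {"VaR": 1.6209, "CVaR": 2.4743}},
    2020: {"portfolio": {"VaR": 2.5389, "CVaR": 4.6563},
           "NDX":       {"VaR": 3.9098, "CVaR": 5.6495}},
}

def relative_reduction(portfolio_value, benchmark_value):
    """fraction by which the portfolio figure sits below the benchmark figure"""
    return 1.0 - portfolio_value / benchmark_value

reductions = {
    year: {metric: relative_reduction(stats["portfolio"][metric], stats["NDX"][metric])
           for metric in ("VaR", "CVaR")}
    for year, stats in reported.items()
}

for year, by_metric in reductions.items():
    for metric, value in by_metric.items():
        print(f"{year} {metric}: portfolio is {value:.1%} below NDX")

# growth of the portfolio's own CVaR from 2019 to 2020
cvar_growth = reported[2020]["portfolio"]["CVaR"] / reported[2019]["portfolio"]["CVaR"]
print(f"portfolio CVaR 2020 / 2019 = {cvar_growth:.2f}x")
```

On these figures the CVaR reduction relative to NDX is roughly 55% in 2019 and 18% in 2020, and the portfolio's CVaR grows by a factor of about 4.2 between the two years.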