<a href="https://colab.research.google.com/github/PercyMayaba/codepipeline-s3-game/blob/main/Market_Risk_Monte_Carlo_Notebook_(JSE).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Environment: imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy.stats import norm

In [None]:
%pip install yfinance



In [None]:
# 1) Portfolio: 8 real JSE underlyings (tickers for yfinance)
# NOTE: These are the underlying equities commonly used for JSE derivatives.
# The notebook assumes exchange-listed derivatives exist (single-stock futures / options)
# You should confirm exact derivative contract codes with the JSE or your broker.
TICKERS = [
'NPN.JO', # Naspers Ltd (Class N)
'SOL.JO', # Sasol Limited
'MTN.JO', # MTN Group
'AGL.JO', # Anglo American plc
'SBK.JO', # Standard Bank Group
'FSR.JO', # FirstRand Limited
'ANG.JO', # AngloGold Ashanti
'BHG.JO', # BHP Group Limited (JSE listing)
]

In [None]:
# Parameters & helper functions
RISK_FREE_RATE = 0.08 # annual, placeholder (replace with actual SA yield curve)
DAYS_PER_YEAR = 252


def fetch_price_series(tickers, start='2018-01-01', end=None):
    data = yf.download(tickers, start=start, end=end, progress=False)
    # Handle multi-level columns if present
    if isinstance(data.columns, pd.MultiIndex):
        # Select 'Adj Close' for all tickers
        prices = data['Adj Close']
    elif 'Adj Close' in data:
        # Single ticker case or older yfinance versions
        prices = data['Adj Close'].to_frame()
    else:
        raise ValueError("Could not find 'Adj Close' in downloaded data.")

    prices = prices.dropna(how='all')
    # Ensure column names are just the ticker symbols
    prices.columns = [col[0] if isinstance(col, tuple) else col for col in prices.columns]
    return prices


def log_returns(price_df):
    return np.log(price_df / price_df.shift(1)).dropna()


# Annualize daily vol
def annualize_vol(std_daily):
    return std_daily * np.sqrt(DAYS_PER_YEAR)

In [None]:
# Monte Carlo - correlated GBM (simulate final prices at horizon)


def simulate_gbm_final(S0, mu, sigma, corr, T, n_paths=50000, seed=42):
    """
    Simulate final prices for correlated GBM under historical (real-world) drift mu.
    S0, mu, sigma are 1D arrays (n_assets,)
    corr is correlation matrix
    T is time horizon in years
    Returns S_T: shape (n_paths, n_assets)
    """
    np.random.seed(seed)
    n_assets = len(S0)
    L = np.linalg.cholesky(corr)
    Z = np.random.normal(size=(n_paths, n_assets))
    correlated = Z @ L.T
    drift = (mu - 0.5*(sigma**2)) * T
    diffusion = sigma * np.sqrt(T) * correlated
    logS = np.log(S0) + drift + diffusion
    S_T = np.exp(logS)
    return S_T

In [None]:
print(TICKERS)

['NPN.JO', 'SOL.JO', 'MTN.JO', 'AGL.JO', 'SBK.JO', 'FSR.JO', 'ANG.JO', 'BHG.JO']


In [None]:
# Define the time horizon for the simulation in days
T_days = 252 # Example: 1 year horizon (assuming 252 trading days)



# Task
Define `S_T` by calling the `simulate_gbm_final` function with the calculated/defined inputs.

## Define initial prices (`S0`)

### Subtask:
Get the most recent closing prices for the tickers defined in the `TICKERS` list.


**Reasoning**:
Fetch the historical price data and extract the most recent closing prices into a NumPy array.



In [None]:
prices_df = fetch_price_series(TICKERS)
S0 = prices_df.iloc[-1].values

  data = yf.download(tickers, start=start, end=end, progress=False)


**Reasoning**:
The error indicates that 'Adj Close' is not directly available, likely due to `auto_adjust=True` being the new default in `yfinance`. This means the data is already adjusted, and the columns are likely just 'Open', 'High', 'Low', 'Close', 'Volume'. The function needs to be updated to handle this. I will modify the `fetch_price_series` function to select the 'Close' column when 'Adj Close' is not present and `auto_adjust=True`.
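The fallback described above can be tried offline on a synthetic frame. This is a minimal sketch: `select_close` is a hypothetical helper, the tickers `AAA.JO` and `BBB.JO` are made up, and the frame assumes the field-first column layout (price field on level 0, ticker on level 1).

```python
import numpy as np
import pandas as pd


def select_close(data):
    """Return 'Adj Close' if present, otherwise 'Close' (hypothetical helper)."""
    if isinstance(data.columns, pd.MultiIndex):
        fields = data.columns.get_level_values(0)
        field = 'Adj Close' if 'Adj Close' in fields else 'Close'
        return data[field]              # columns become the ticker symbols
    field = 'Adj Close' if 'Adj Close' in data.columns else 'Close'
    return data[[field]]                # keep a DataFrame for the single-ticker case


# Synthetic frame shaped like a multi-ticker download without 'Adj Close'
idx = pd.date_range('2024-01-01', periods=3, freq='D')
cols = pd.MultiIndex.from_product([['Close', 'Volume'], ['AAA.JO', 'BBB.JO']])
fake = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4), index=idx, columns=cols)

closes = select_close(fake)
print(list(closes.columns))
```

If neither field is present the lookup raises a `KeyError`, which plays the same role as the explicit `ValueError` in the notebook function.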



In [None]:
def fetch_price_series(tickers, start='2018-01-01', end=None):
    data = yf.download(tickers, start=start, end=end, progress=False)
    # Handle multi-level columns if present
    if isinstance(data.columns, pd.MultiIndex):
        # Select 'Adj Close' if available, otherwise 'Close'
        if 'Adj Close' in data.columns.get_level_values(0):
            prices = data['Adj Close']
        elif 'Close' in data.columns.get_level_values(0):
            prices = data['Close']
        else:
            raise ValueError("Could not find 'Adj Close' or 'Close' in downloaded data.")
    elif 'Adj Close' in data:
        # Single ticker case or older yfinance versions
        prices = data['Adj Close'].to_frame()
    elif 'Close' in data:
        # Single ticker case with auto_adjust=True
        prices = data['Close'].to_frame()
    else:
        raise ValueError("Could not find 'Adj Close' or 'Close' in downloaded data.")

    prices = prices.dropna(how='all')
    # Ensure column names are just the ticker symbols
    prices.columns = [col[0] if isinstance(col, tuple) else col for col in prices.columns]
    return prices

prices_df = fetch_price_series(TICKERS)
S0 = prices_df.iloc[-1].values



## Calculate historical returns

### Subtask:
Calculate the daily log returns from the price data stored in the `prices_df` DataFrame.


**Reasoning**:
Calculate the daily log returns from the price data.
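A small worked example may help here. It is a sketch on made-up prices (the columns `A` and `B` are hypothetical) showing that log returns add up across days, which is why they are convenient inputs for annualization.

```python
import numpy as np
import pandas as pd

# Made-up price history for two hypothetical assets
prices = pd.DataFrame({'A': [100.0, 110.0, 121.0],
                       'B': [50.0, 50.0, 25.0]})

# Same formula as the notebook: log of the day-over-day price ratio
rets = np.log(prices / prices.shift(1)).dropna()

# Summing daily log returns gives the log of the total price ratio
total = rets.sum()
print(total['A'], np.log(121.0 / 100.0))
print(total['B'], np.log(25.0 / 50.0))
```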



In [None]:
daily_log_returns = np.log(prices_df / prices_df.shift(1)).dropna()

## Calculate historical drift (mu)

### Subtask:
Calculate the annualized mean of the daily log returns stored in the `daily_log_returns` DataFrame to represent the historical drift (`mu`).


**Reasoning**:
Calculate the mean of daily log returns and annualize them to get the historical drift (mu).



In [None]:
mu = daily_log_returns.mean() * DAYS_PER_YEAR
mu = mu.values

## Calculate historical volatility (sigma)

### Subtask:
Calculate the annualized standard deviation of the daily log returns stored in the `daily_log_returns` DataFrame to represent the historical volatility (`sigma`).


**Reasoning**:
Calculate the annualized standard deviation of the daily log returns and store it in the `sigma` variable.
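The annualization rules used for `mu` and `sigma` (mean times `DAYS_PER_YEAR`, standard deviation times its square root) can be checked on synthetic data. The sketch below assumes i.i.d. normal daily log returns with made-up parameters; it also illustrates that the drift estimate is far noisier than the volatility estimate, with a standard error of roughly sigma divided by the square root of the number of years.

```python
import numpy as np

DAYS_PER_YEAR = 252
true_mu, true_sigma = 0.10, 0.20    # hypothetical annual log-drift and volatility
years = 40

rng = np.random.default_rng(0)
daily = rng.normal(true_mu / DAYS_PER_YEAR,
                   true_sigma / np.sqrt(DAYS_PER_YEAR),
                   size=years * DAYS_PER_YEAR)

# Under GBM the mean log return estimates (mu - sigma^2 / 2) per unit time,
# so this quantity is a log-drift rather than the arithmetic drift.
mu_hat = daily.mean() * DAYS_PER_YEAR
sigma_hat = daily.std(ddof=1) * np.sqrt(DAYS_PER_YEAR)   # ddof=1 matches pandas .std()

# Approximate standard error of the annualized drift estimate
mu_std_err = sigma_hat / np.sqrt(years)
print(mu_hat, sigma_hat, mu_std_err)
```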



In [None]:
sigma = daily_log_returns.std() * np.sqrt(DAYS_PER_YEAR)
sigma = sigma.values

## Calculate correlation matrix (corr)

### Subtask:
Calculate the correlation matrix of the daily log returns stored in the `daily_log_returns` DataFrame.


**Reasoning**:
Calculate the correlation matrix of the daily log returns and store it as a NumPy array named `corr`.
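Because `simulate_gbm_final` applies a Cholesky factorization to `corr`, it can be worth confirming that the matrix is symmetric, has a unit diagonal, and is positive definite. The following is a sketch on synthetic returns; the three columns are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
common = rng.standard_normal(500)           # shared factor driving A and B
rets = pd.DataFrame({
    'A': 0.010 * (common + rng.standard_normal(500)),
    'B': 0.020 * (common + rng.standard_normal(500)),
    'C': 0.015 * rng.standard_normal(500),  # independent of the shared factor
})

corr = rets.corr().values

# Cholesky needs a symmetric positive-definite matrix
min_eig = np.linalg.eigvalsh(corr).min()
L = np.linalg.cholesky(corr)                # raises LinAlgError if not positive definite

print(np.round(corr, 2))
print('smallest eigenvalue:', min_eig)
```

If the factorization fails on real data (for example with near-duplicate series), one common remedy is to drop or merge the offending series before simulating.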



In [None]:
corr = daily_log_returns.corr().values

## Define portfolio weights

### Subtask:
Define the weights for each asset in the portfolio.

**Reasoning**:
Define an array of equal weights for each asset in the portfolio.

In [None]:
n_assets = len(TICKERS)
weights = np.array([1.0 / n_assets] * n_assets)
print(weights)

[0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125]


# Task
Calculate and report the VaR and ES of the portfolio.

## Calculate portfolio initial value

### Subtask:
Calculate the initial value of the portfolio based on the initial prices (`S0`) and the portfolio weights (`weights`).


**Reasoning**:
Calculate the initial portfolio value by taking the dot product of the initial prices and the weights.
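Note that `np.dot(S0, weights)` treats each weight as a share count, so the result is the value of holding 0.125 shares of each asset. The sketch below, on made-up prices, contrasts that with one possible alternative in which the weights are fractions of a notional; the `notional` figure is a hypothetical value, not something the notebook defines.

```python
import numpy as np

S0 = np.array([100.0, 50.0, 200.0, 25.0])   # made-up prices
weights = np.full(4, 0.25)                   # equal weights

# As in the notebook: weights act as share counts (0.25 shares of each asset)
value_as_in_notebook = np.dot(S0, weights)

# Alternative: interpret weights as fractions of a notional amount
notional = 1_000_000.0                       # hypothetical portfolio size
shares = notional * weights / S0             # shares bought per asset
value_from_shares = np.dot(S0, shares)

print(value_as_in_notebook, value_from_shares)
```

Under the second interpretation the simulated portfolio value would be `S_T @ shares`, so every asset starts with the same money at risk regardless of its price level.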



In [None]:
portfolio_initial_value = np.dot(S0, weights)
print(f"Initial Portfolio Value: {portfolio_initial_value:.2f}")

Initial Portfolio Value: 53154.25


# Task
Calculate and report the Value at Risk (VaR) and Expected Shortfall (ES) for a portfolio of assets using a Geometric Brownian Motion (GBM) simulation. The analysis should include fetching historical price data for a predefined list of tickers, calculating historical risk metrics (drift, volatility, correlation), simulating future price paths using GBM, calculating the portfolio's profit and loss (PnL), and finally computing and reporting the VaR and ES at a 99% confidence level.

## Define initial prices (`S0`)

### Subtask:
Get the most recent closing prices for the tickers defined in the `TICKERS` list.


**Reasoning**:
Fetch the historical price data using the updated `fetch_price_series` function and extract the most recent closing prices into a NumPy array `S0`.



In [None]:
prices_df = fetch_price_series(TICKERS)
S0 = prices_df.iloc[-1].values



## Calculate portfolio initial value

### Subtask:
Calculate the initial value of the portfolio based on the initial prices (`S0`) and the portfolio weights (`weights`).


**Reasoning**:
Calculate the initial portfolio value by taking the dot product of the initial prices and the weights.



In [None]:
portfolio_initial_value = np.dot(S0, weights)
print(f"Initial Portfolio Value: {portfolio_initial_value:.2f}")

Initial Portfolio Value: 53118.88


## Define time horizon (`T`)

### Subtask:
Convert the time horizon in days (`T_days`) to years (`T`).


**Reasoning**:
Convert the time horizon from days to years.



In [None]:
T = T_days / DAYS_PER_YEAR
print(f"Time Horizon (T) in years: {T:.2f}")

Time Horizon (T) in years: 1.00


## Define number of paths (`n_paths`)

### Subtask:
Define the number of simulation paths for the Monte Carlo simulation.


**Reasoning**:
Define the number of simulation paths for the Monte Carlo simulation and print its value.



In [None]:
n_paths = 200000
print(f"Number of simulation paths: {n_paths}")

Number of simulation paths: 200000


## Simulate final prices (`S_T`)

### Subtask:
Call the `simulate_gbm_final` function with the calculated/defined inputs to get `S_T`.


**Reasoning**:
Call the `simulate_gbm_final` function with the calculated inputs and print the shape of the resulting array.
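Before relying on the simulated prices, a quick sanity check on a two-asset toy case can confirm that the correlation and the mean of the output match the inputs. This sketch uses a compact stand-in, `gbm_terminal`, which follows the same formula as `simulate_gbm_final` but draws from `np.random.default_rng`; all parameter values are made up.

```python
import numpy as np


def gbm_terminal(S0, mu, sigma, corr, T, n_paths, seed=0):
    """Terminal prices of correlated GBM (stand-in for simulate_gbm_final)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(corr)
    Z = rng.standard_normal((n_paths, len(S0))) @ L.T   # correlated normals
    return S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)


S0 = np.array([100.0, 50.0])
mu = np.array([0.05, 0.08])
sigma = np.array([0.20, 0.30])
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
T = 1.0

S_T = gbm_terminal(S0, mu, sigma, corr, T, n_paths=200_000)

# Log returns should reproduce the input correlation
log_ret = np.log(S_T / S0)
sample_corr = np.corrcoef(log_ret, rowvar=False)[0, 1]

# Under GBM, E[S_T] = S0 * exp(mu * T), so this ratio should be close to 1
mean_ratio = S_T.mean(axis=0) / (S0 * np.exp(mu * T))

print(sample_corr, mean_ratio)
```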



In [None]:
S_T = simulate_gbm_final(S0, mu, sigma, corr, T, n_paths=n_paths)
print("Shape of simulated final prices (S_T):", S_T.shape)

Shape of simulated final prices (S_T): (200000, 8)


## Calculate simulated portfolio final values

### Subtask:
Calculate the simulated final values of the portfolio based on the simulated final prices (`S_T`) and the portfolio weights (`weights`).


**Reasoning**:
Calculate the simulated final portfolio values by taking the dot product of the simulated final prices and the portfolio weights.



In [None]:
portfolio_final_values = np.dot(S_T, weights)
print("Shape of simulated final portfolio values:", portfolio_final_values.shape)

Shape of simulated final portfolio values: (200000,)


## Calculate portfolio PnL (`pv_pnl`)

### Subtask:
Calculate the portfolio profit and loss (`pv_pnl`) as the difference between the simulated final values and the initial value.


**Reasoning**:
Calculate the portfolio profit and loss by subtracting the initial portfolio value from the simulated final portfolio values and store it in `pv_pnl`.



In [None]:
pv_pnl = portfolio_final_values - portfolio_initial_value
print("Shape of portfolio PnL (pv_pnl):", pv_pnl.shape)

Shape of portfolio PnL (pv_pnl): (200000,)


## Calculate risk metrics

### Subtask:
Calculate VaR and ES using the `var_es` function with the calculated `pv_pnl`.


**Reasoning**:
Calculate VaR and ES at 99% and 95% confidence levels using the calculated pv_pnl.



In [None]:
def var_es(pnl_array, alpha=0.99):
    losses = -pnl_array
    var = np.quantile(losses, alpha)
    es = losses[losses >= var].mean()
    return var, es

# Portfolio Risk Analysis using Monte Carlo Simulation (Geometric Brownian Motion)

This repository contains a Jupyter notebook that performs a risk analysis for a portfolio of South African JSE underlyings using a Monte Carlo simulation based on the Geometric Brownian Motion (GBM) model. The analysis focuses on calculating the Value at Risk (VaR) and Expected Shortfall (ES) of the portfolio over a specified time horizon.

## Methodology

The risk analysis follows these key steps:

1.  **Data Loading**: Historical adjusted closing prices for a predefined list of JSE tickers are fetched using the `yfinance` library.
2.  **Historical Risk Metrics Calculation**: Daily log returns are calculated from the historical price data. The annualized historical drift (mean) and volatility (standard deviation) of these log returns, as well as their correlation matrix, are computed. These historical metrics are used as inputs for the GBM simulation.
3.  **Portfolio Definition**: An equally-weighted portfolio is defined for the selected assets. The initial value of the portfolio is calculated based on the most recent closing prices and the defined weights.
4.  **Monte Carlo Simulation (Correlated GBM)**: The `simulate_gbm_final` function is used to simulate multiple potential price paths for each asset over the specified time horizon (e.g., 1 year). The simulation uses a correlated Geometric Brownian Motion model, incorporating the historical drift, volatility, and the correlation matrix between assets.
5.  **Portfolio Profit and Loss (PnL) Calculation**: For each simulated price path, the final value of the portfolio is calculated based on the simulated asset prices and the portfolio weights. The portfolio PnL for each path is then determined as the difference between the simulated final value and the initial portfolio value.
6.  **Value at Risk (VaR) and Expected Shortfall (ES) Calculation**: Using the distribution of simulated portfolio PnL values, the VaR and ES are calculated at specified confidence levels (e.g., 99% and 95%).
    *   **VaR** is the loss threshold that the portfolio is not expected to exceed over the given time horizon at the stated confidence level; at 99%, losses exceed the VaR in roughly 1% of simulated paths.
    *   **ES** (also known as Conditional VaR) represents the expected loss given that the loss is greater than or equal to the VaR. It provides a more conservative measure of tail risk.
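The VaR and ES definitions above can be checked against a case with a known answer. The sketch below applies the notebook's `var_es` to normally distributed PnL, where both quantities have closed forms; the `scale` value is made up and the standard library's `statistics.NormalDist` supplies the normal quantile and density.

```python
import numpy as np
from statistics import NormalDist


def var_es(pnl_array, alpha=0.99):
    losses = -pnl_array                      # losses are positive numbers
    var = np.quantile(losses, alpha)         # alpha-quantile of the loss distribution
    es = losses[losses >= var].mean()        # average loss at or beyond the VaR
    return var, es


rng = np.random.default_rng(1)
scale = 1000.0                               # hypothetical PnL standard deviation
pnl = rng.normal(0.0, scale, size=500_000)

var_mc, es_mc = var_es(pnl, alpha=0.99)

# Closed forms for zero-mean normal losses
z = NormalDist().inv_cdf(0.99)
var_exact = scale * z
es_exact = scale * NormalDist().pdf(z) / (1 - 0.99)

print(var_mc, var_exact)
print(es_mc, es_exact)
```

The Monte Carlo estimates should sit close to the closed-form values, with ES above VaR in both cases.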

## Results

Based on the analysis performed with a 1-year time horizon (`T=1.00`), 200,000 simulation paths, and an equally-weighted portfolio of the specified JSE tickers, the following risk metrics were calculated:

*   **Initial Portfolio Value**: 53118.88 ZAR
*   **Portfolio VaR 99%**: 28437.74 ZAR
*   **Portfolio ES 99%**: 30693.93 ZAR
*   **Portfolio VaR 95%**: 22672.69 ZAR
*   **Portfolio ES 95%**: 26205.81 ZAR

## Key Findings and Insights

*   The initial portfolio value, based on the most recent closing prices and equal weights for the 8 assets, was calculated to be 53118.88 ZAR.
*   The Geometric Brownian Motion (GBM) simulation was performed for a one-year time horizon (`T=1.00`) using 200,000 simulation paths.
*   The simulated final prices resulted in a portfolio PnL calculated for each of the 200,000 simulation paths.
*   The calculated risk metrics at the 99% confidence level are: Portfolio VaR 99% (ZAR): 28437.74 and Portfolio ES 99% (ZAR): 30693.93.
*   The calculated risk metrics at the 95% confidence level are: Portfolio VaR 95% (ZAR): 22672.69 and Portfolio ES 95% (ZAR): 26205.81.
*   The Expected Shortfall (ES) at both confidence levels (99% and 95%) is higher than the corresponding VaR, as expected, indicating the average loss in the worst-case scenarios is greater than the VaR threshold.

## Next Steps

*   Consider performing a sensitivity analysis to understand how changes in input parameters (e.g., time horizon, number of simulations, confidence level) affect the calculated VaR and ES.
*   Explore alternative risk models beyond GBM (e.g., historical simulation, GARCH models) to compare results and assess model risk.
*   Implement backtesting procedures to evaluate the accuracy of the VaR and ES estimates based on historical data.
*   Extend the analysis to include transaction costs, taxes, and other real-world factors.
*   Consider optimizing the portfolio weights based on risk and return objectives.
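As one possible starting point for the sensitivity analysis, the sketch below looks at how the 99% VaR estimate varies across random seeds for two path counts. It uses a single hypothetical asset with made-up parameters rather than the JSE portfolio, so the numbers are illustrative only.

```python
import numpy as np


def var_99_one_asset(n_paths, seed, S0=100.0, mu=0.05, sigma=0.25, T=1.0):
    """99% VaR of a one-asset GBM position (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    losses = S0 - S_T
    return np.quantile(losses, 0.99)


spread = {}
for n_paths in (1_000, 100_000):
    # Re-estimate the VaR with 20 different seeds to see the Monte Carlo noise
    estimates = [var_99_one_asset(n_paths, seed) for seed in range(20)]
    spread[n_paths] = float(np.std(estimates))
    print(n_paths, float(np.mean(estimates)), spread[n_paths])
```

The spread across seeds should shrink as the number of paths grows, which gives a rough sense of how many paths are needed for a stable tail estimate.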

## Usage

To run this analysis, execute the cells in the provided Jupyter notebook sequentially. Ensure the required libraries are installed (`yfinance`, `numpy`, `pandas`, `matplotlib`, `scipy`).

**Reasoning**:
The function `var_es` was not defined in the current session. I need to redefine the `var_es` function before using it to calculate VaR and ES.



In [None]:
def var_es(pnl_array, alpha=0.99):
    losses = -pnl_array
    var = np.quantile(losses, alpha)
    es = losses[losses >= var].mean()
    return var, es

var_99, es_99 = var_es(pv_pnl, alpha=0.99)
var_95, es_95 = var_es(pv_pnl, alpha=0.95)

print('\nPortfolio VaR 99% (ZAR):', np.round(var_99, 2))
print('Portfolio ES 99% (ZAR):', np.round(es_99, 2))
print('Portfolio VaR 95% (ZAR):', np.round(var_95, 2))
print('Portfolio ES 95% (ZAR):', np.round(es_95, 2))


Portfolio VaR 99% (ZAR): 28437.74
Portfolio ES 99% (ZAR): 30693.93
Portfolio VaR 95% (ZAR): 22672.69
Portfolio ES 95% (ZAR): 26205.81



