In [13]:
#Basic distributional risk metrics for a universe of assets.
#Import necessary packages

import numpy as np
import yfinance as yf
import pandas as pd

#Choose asset tickers and pull daily close prices.

tickers = ['GOOGL', 'AMZN', 'AAPL', 'MSFT', 'META', 'NVDA', 'TSLA']

start_date = '2021-09-01'
end_date = '2025-11-17'

price_data = yf.download(tickers, start=start_date, end=end_date, auto_adjust = False)

df = price_data

print(df.head())

[*********************100%***********************]  7 of 7 completed

Price        Adj Close                                                  \
Ticker            AAPL        AMZN       GOOGL        META        MSFT   
Date                                                                     
2021-09-01  149.158737  173.949997  144.220795  379.709198  292.057800   
2021-09-02  150.273743  173.156006  142.305511  372.980682  291.399841   
2021-09-03  150.909409  173.902496  142.754898  373.954651  291.390259   
2021-09-07  153.246918  175.464493  143.292206  379.838409  290.461182   
2021-09-08  151.701645  176.274994  142.706741  375.256653  290.490265   

Price                                   Close                          ...  \
Ticker           NVDA        TSLA        AAPL        AMZN       GOOGL  ...   
Date                                                                   ...   
2021-09-01  22.396118  244.696671  152.509995  173.949997  145.215500  ...   
2021-09-02  22.351208  244.130005  153.649994  173.156006  143.287003  ...   
2021-09-03  22.79




What have I assumed in this analysis?

Most obviously, this analysis considers close prices only, so it ignores intraday volatility and intraday tail events. An analysis of intraday trading could not ignore these.

Yahoo Finance provides an adjusted close price which accounts for corporate actions such as stock splits, mergers and dividends. I use the adjusted close price in this analysis.

When I calculate risk metrics such as volatility, VaR and CVaR, I assume that the returns are independent and identically distributed (i.i.d.). In practice, financial return distributions are often not stable over time and returns can be autocorrelated. The parametric forms of VaR and CVaR, and the use of volatility as a complete description of risk, additionally assume that returns are approximately Gaussian, whereas market returns tend to show skew and fat tails.

I assume beta is constant and that historical returns represent future behaviour. 

I assume rebalancing at the end of each day using close prices. Overnight gaps are either ignored or absorbed into the single-period (close-to-close) return.

I also assume that asset returns combine linearly and that there are no transaction costs. When I consider options I will need to consider non-linear exposures.

I aim to improve my future analysis by refining these assumptions to create more advanced models and forecasts.
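
The i.i.d. and Gaussian assumptions listed above can be checked before relying on the parametric metrics. The following is a minimal sketch, not part of the original analysis: it uses simulated fat-tailed data (`toy_returns` is a hypothetical stand-in for one column of real returns) and applies a Jarque-Bera normality test and a lag-1 autocorrelation.

```python
import numpy as np
import pandas as pd
from scipy.stats import jarque_bera

rng = np.random.default_rng(6)

# Hypothetical fat-tailed daily returns: Student-t with 3 degrees of freedom,
# scaled so that typical moves are around 1%. This is simulated, not market data.
toy_returns = pd.Series(0.01 * rng.standard_t(df=3, size=5000))

# Jarque-Bera: the null hypothesis is that skewness and kurtosis match a normal distribution.
jb_stat, jb_pvalue = jarque_bera(toy_returns)

# Lag-1 autocorrelation: close to zero when successive returns are independent.
lag1_autocorr = toy_returns.autocorr(lag=1)

print(f"Jarque-Bera p-value: {jb_pvalue:.3g}")
print(f"Lag-1 autocorrelation: {lag1_autocorr:.4f}")
```

A very small p-value suggests the Gaussian assumption is doubtful for that series. The same two calls could be applied column by column to real returns.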

In [14]:
#Data exploration

print(df.isna().sum().sum())
print(df.shape)
print(df.columns.levels)

0
(1057, 42)
[['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume'], ['AAPL', 'AMZN', 'GOOGL', 'META', 'MSFT', 'NVDA', 'TSLA']]


In [15]:
#Create a matrix of day-over-day (DoD) returns from the adjusted close prices for the risk metrics analysis
close_prices = df['Adj Close']
returns = close_prices.pct_change().dropna()
print(returns.head())
print(returns.shape)

Ticker          AAPL      AMZN     GOOGL      META      MSFT      NVDA  \
Date                                                                     
2021-09-02  0.007475 -0.004564 -0.013280 -0.017720 -0.002253 -0.002005   
2021-09-03  0.004230  0.004311  0.003158  0.002611 -0.000033  0.019959   
2021-09-07  0.015489  0.008982  0.003764  0.015734 -0.003188 -0.007923   
2021-09-08 -0.010084  0.004619 -0.004086 -0.012062  0.000100 -0.014253   
2021-09-09 -0.006705 -0.011726 -0.001044  0.001139 -0.009860 -0.007252   

Ticker          TSLA  
Date                  
2021-09-02 -0.002316  
2021-09-03  0.001611  
2021-09-07  0.026378  
2021-09-08  0.001262  
2021-09-09  0.001313  
(1056, 7)


I will consider the main risk metrics: mean return, volatility, skewness, kurtosis, max drawdown, Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR).

Mean return measures the average return per period. The calculation takes the DataFrame of asset returns $r_{i,t}$ over the chosen timeframe and gives a Series with one average return per asset. For a single asset with returns $r_t$: $$\bar{r} = \frac{1}{T}\sum_{t=1}^Tr_t$$

In [17]:
mean_return = returns.mean()
print(mean_return)

Ticker
AAPL     0.000731
AMZN     0.000552
GOOGL    0.000820
META     0.000868
MSFT     0.000669
NVDA     0.002604
TSLA     0.001237
dtype: float64


Volatility measures the standard deviation of returns about their mean. The calculation takes the returns at each timestep and gives a non-negative real number per asset. $$\sigma = \sqrt{\frac{1}{T-1}\sum_{t=1}^T(r_t-\bar{r})^2}$$

In [None]:
vol = returns.std()
print(vol)

#Under the i.i.d. assumption variance scales linearly with time, so volatility scales with the square root of time.
#Annualised volatility is the daily volatility multiplied by the square root of the number of trading days in a year (252).

annual_vol = np.sqrt(252) * vol
print(annual_vol)

Ticker
AAPL     0.017953
AMZN     0.023188
GOOGL    0.020184
META     0.028792
MSFT     0.016770
NVDA     0.034149
TSLA     0.039137
dtype: float64
Ticker
AAPL     0.284992
AMZN     0.368103
GOOGL    0.320419
META     0.457055
MSFT     0.266217
NVDA     0.542106
TSLA     0.621283
dtype: float64
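
The square-root-of-time scaling used above depends on returns being i.i.d. As a rough illustration, the sketch below simulates independent normal daily log returns and compares the scaled daily volatility with the volatility of the simulated one-year sums. The 2% daily volatility, the number of simulated years and all variable names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(5)

daily_vol = 0.02        # assumed daily volatility for the simulation
days_per_year = 252     # trading days per year, as in the cell above
n_years = 4000          # number of simulated years

# Each row is one simulated year of i.i.d. daily log returns.
daily = rng.normal(0.0, daily_vol, size=(n_years, days_per_year))

# Log returns add over time, so the annual log return is the row sum.
annual = daily.sum(axis=1)

scaled_vol = np.sqrt(days_per_year) * daily.std(ddof=1)   # square-root-of-time estimate
simulated_vol = annual.std(ddof=1)                        # volatility of the annual sums

print(scaled_vol, simulated_vol)
```

The two numbers should agree closely for i.i.d. data. With autocorrelated or volatility-clustered returns the scaled figure can drift away from the realised annual volatility.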


Skewness measures the asymmetry of the probability distribution of a real-valued random variable about its mean. Skew (Pearson's moment coefficient of skewness) is the third standardised moment of a random variable $X$. Put simply, it indicates whether large losses are more pronounced than large gains (negative skew) or vice versa (positive skew). The calculation takes the DataFrame of asset returns and gives a Series with the skewness of each stock. $$\gamma_{1} = E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right]$$ If $\sigma$ and $\mu$ are finite (which they are in the vast majority of cases I will consider) we can expand the above to give $$\gamma_{1} = \frac{E[X^3] - 3\mu\sigma^2-\mu^3}{\sigma^3}$$

In [21]:
skew = returns.skew()
print(skew)

Ticker
AAPL     0.504798
AMZN     0.176160
GOOGL    0.075148
META    -0.194329
MSFT     0.234173
NVDA     0.545025
TSLA     0.290225
dtype: float64
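
The expanded form of the skewness formula can be checked numerically. The sketch below is an illustration on a simulated right-skewed sample (the lognormal parameters and variable names are assumptions for this example). It uses population moments (`ddof=0`) throughout, which matches the default of `scipy.stats.skew`. The pandas `skew()` used above applies a bias-corrected estimator, so it differs slightly on small samples.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(4)

# Simulated right-skewed sample (lognormal); not market data.
x = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

mu = x.mean()
sigma = x.std()   # population standard deviation (ddof=0)

# Third standardised moment computed directly from its definition.
direct = np.mean(((x - mu) / sigma) ** 3)

# Expanded form: (E[X^3] - 3*mu*sigma^2 - mu^3) / sigma^3.
expanded = (np.mean(x ** 3) - 3 * mu * sigma ** 2 - mu ** 3) / sigma ** 3

# scipy's default (bias=True) is the same population estimator.
reference = skew(x)

print(direct, expanded, reference)
```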


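Kurtosis measures the degree of tailedness in the probability distribution of a real-valued random variable $X$.
Put simply, kurtosis measures how heavy the tails of the distribution are. Higher kurtosis indicates that extreme events are more likely. The calculation takes the DataFrame of asset returns and gives a Series with the kurtosis of each stock. Kurtosis is the fourth standardised moment, defined as $$Kurt[X] = E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right]$$ A normal distribution has kurtosis 3 under this definition. The pandas `kurt()` method reports excess kurtosis (this quantity minus 3, with a small-sample bias correction), so the values printed below are measured relative to a normal distribution.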

In [22]:
kurt = returns.kurt()
print(kurt)

Ticker
AAPL      7.012126
AMZN      4.852457
GOOGL     3.058533
META     20.094752
MSFT      3.268261
NVDA      4.464531
TSLA      2.823907
dtype: float64
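
To make the difference between kurtosis and excess kurtosis concrete, the sketch below compares the two on a simulated standard normal sample (the sample and variable names are assumptions for this example). pandas `kurt()` should be close to 0 and `scipy.stats.kurtosis` with `fisher=False` should be close to 3.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis

rng = np.random.default_rng(0)

# Simulated standard normal sample; not market data.
sample = pd.Series(rng.standard_normal(200_000))

# pandas reports excess (Fisher) kurtosis: a normal distribution scores about 0.
excess_kurt = sample.kurt()

# scipy with fisher=False reports the fourth standardised moment: a normal scores about 3.
raw_kurt = kurtosis(sample, fisher=False)

print(excess_kurt, raw_kurt)
```

Read this way, every stock in the output above has fatter tails than a normal distribution, with META the most extreme.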


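Max drawdown measures the largest peak-to-trough drop in Net Asset Value (NAV). The drawdown at time $t$ is $DD_t = \frac{NAV_t - \max_{s \le t} NAV_s}{\max_{s \le t} NAV_s}$ and the max drawdown is $\min_t DD_t$, a negative number. The calculation takes the DataFrame of asset returns and gives a Series with the max drawdown of each stock over the timeframe considered.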

In [24]:
nav = (1 + returns).cumprod()
cummax = nav.cummax()
drawdowns = (nav - cummax) / cummax
max_drawdown = drawdowns.min()
print(max_drawdown)

Ticker
AAPL    -0.333605
AMZN    -0.557258
GOOGL   -0.443201
META    -0.767361
MSFT    -0.371485
NVDA    -0.663351
TSLA    -0.736322
dtype: float64
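
A small worked example helps to confirm what the drawdown calculation does. The sketch below uses a hypothetical five-point price path and a helper named `max_drawdown`, both invented for this illustration. It works from price levels rather than from compounded returns, so the first price counts as a candidate peak. The cell above builds NAV from the returns after `dropna()`, so its first NAV value is already one day after the starting price.

```python
import pandas as pd

def max_drawdown(prices: pd.Series) -> float:
    """Largest peak-to-trough decline, as a negative fraction of the running peak."""
    running_peak = prices.cummax()
    drawdowns = prices / running_peak - 1.0
    return float(drawdowns.min())

# Hypothetical price path: the peak is 120 and the lowest later value is 80.
toy_prices = pd.Series([100.0, 120.0, 90.0, 110.0, 80.0])
toy_mdd = max_drawdown(toy_prices)

print(toy_mdd)  # 80 / 120 - 1, i.e. a drop of one third from the peak
```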


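Value-at-Risk (VaR) can be estimated non-parametrically (historical VaR uses the empirical distribution of past returns) or parametrically (Gaussian VaR assumes returns follow a normal distribution). Historical VaR sorts the observed returns from worst to best and reads off the loss that was exceeded only a given fraction of the time. The VaR at a chosen confidence level is the corresponding percentile of returns (the 5th percentile for $95\%$ VaR). Gaussian VaR asks what that percentile would be if returns were normally distributed. Market returns are not normally distributed, so Gaussian VaR can understate tail risk, particularly at high confidence levels. Historical VaR captures the tail events that are in the sample and so can be more accurate in crises.

Parametric VaR is sensitive to the estimated standard deviation. Non-parametric VaR is sensitive to the choice of historical window and can be biased by unusual past events.

Parametric VaR assumes $$R_p \sim \mathcal{N}(\mu, \sigma^{2})$$ Then for a confidence level $c$: $$VaR_c = -(\mu + z_{1-c}\sigma)$$ where $z_{1-c} = \Phi^{-1}(1-c)$ is the lower-tail standard normal quantile (about $-1.645$ for $c = 0.95$), so that VaR is quoted as a positive loss.

Both forms take a DataFrame of asset returns and give a Series of VaR values, one per asset. The historical values in the code below are reported as return quantiles, so they appear as negative numbers rather than positive losses.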

In [27]:
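#Historic VaR: the 5th and 1st percentiles of daily returns (reported as returns, so negative)
var_95 = returns.quantile(0.05)
var_99 = returns.quantile(0.01)
print(var_95)
print(var_99)

#Parametric (Gaussian) VaR
from scipy.stats import norm

confidence = 0.05  #tail probability, i.e. 1 - confidence level
z = norm.ppf(1 - confidence)  #upper-tail quantile, about +1.645
#Note: because z > 0 here, the line below evaluates to -(mean + 1.645 * vol).
#The Gaussian 5th-percentile return comparable to var_95 is mean_return - z * vol.
VaR = -(mean_return + z * vol)
print(VaR)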

Ticker
AAPL    -0.028851
AMZN    -0.033641
GOOGL   -0.030853
META    -0.040668
MSFT    -0.026597
NVDA    -0.051446
TSLA    -0.061009
Name: 0.05, dtype: float64
Ticker
AAPL    -0.047379
AMZN    -0.057697
GOOGL   -0.049088
META    -0.067544
MSFT    -0.042042
NVDA    -0.077289
TSLA    -0.097410
Name: 0.01, dtype: float64
Ticker
AAPL    -0.030260
AMZN    -0.038693
GOOGL   -0.034020
META    -0.048226
MSFT    -0.028253
NVDA    -0.058775
TSLA    -0.065612
dtype: float64
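
The sign conventions for VaR are easy to mix up, so the sketch below puts historical and Gaussian VaR on the same footing, both as positive losses at the same tail probability. It runs on simulated normal returns, where the two estimates should agree closely. The mean, volatility and variable names are assumptions made for this example and are not taken from the data above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

mu, sigma = 0.0005, 0.02     # assumed daily mean and volatility
sim_returns = rng.normal(mu, sigma, size=500_000)

alpha = 0.05                 # tail probability for 95% VaR

# Historical: empirical 5th percentile of returns (a negative return).
hist_quantile = np.quantile(sim_returns, alpha)

# Gaussian: mu + z * sigma with the lower-tail quantile z = norm.ppf(alpha), about -1.645.
param_quantile = mu + norm.ppf(alpha) * sigma

# Quote both as positive losses.
hist_var = -hist_quantile
param_var = -param_quantile

print(hist_var, param_var)
```

On real returns the two figures can differ, and the size of that difference is one indication of how far the data departs from normality.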


Conditional Value-at-Risk (CVaR), or Expected Shortfall (ES), gives the expected loss beyond the VaR threshold. ES is more sensitive to the shape of the tail of the distribution than VaR. If $c$ is the confidence level and VaR is quoted as a positive loss, $$CVaR_{c} = -E[r_t \mid r_t \le -VaR_{c}]$$ Formally, if $X \in L^p(\mathcal{F})$ is the payoff of the portfolio at a future time and $\alpha = 1 - c$ with $0 < \alpha < 1$ is the tail probability, then the expected shortfall is given by $$ES_\alpha(X) = \frac{1}{\alpha}\int^\alpha_0VaR_\gamma(X)d\gamma$$ where $VaR_{\gamma}$ is the value at risk at tail probability $\gamma$.

If we assume $$R_p \sim \mathcal{N}(\mu, \sigma^{2})$$ then the analytical formula for CVaR at confidence level $c$ is $$CVaR_c = -\mu + \sigma\frac{\phi(z_c)}{1-c}$$ where $z_c = \Phi^{-1}(c)$ is the standard normal quantile and $\phi(z_c)$ is the standard normal PDF evaluated at $z_c$. With this sign convention CVaR is positive for losses.

Both forms take a DataFrame of asset returns and output a Series of real CVaR values, one per asset.
The parametric CVaR assumes normal returns, ignoring fat tails and skewness, and captures only linear portfolio exposure. It uses the historical mean and variance (covariance, for a portfolio) as the expected future parameters.

In [31]:
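#Non-parametric CVaR: mean of the returns at or below the historical 95% VaR threshold,
#sign-flipped on printing so that it reads as a positive loss
cvar_95 = returns[returns <= var_95].mean()
print(-cvar_95)

#Parametric (Gaussian) CVaR at confidence level 0.95
confidence2 = 0.95
z2 = norm.ppf(confidence2)  #standard normal quantile, about +1.645

PDF_z = norm.pdf(z2)  #standard normal density at z2
CVaR = -mean_return + vol * PDF_z / (1-confidence2)
print(CVaR)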

Ticker
AAPL     0.039856
AMZN     0.051665
GOOGL    0.044711
META     0.062625
MSFT     0.037038
NVDA     0.069666
TSLA     0.084176
dtype: float64
Ticker
AAPL     0.036301
AMZN     0.047279
GOOGL    0.040815
META     0.058522
MSFT     0.033923
NVDA     0.067837
TSLA     0.079492
dtype: float64
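
As a sanity check on the closed-form Gaussian CVaR, the sketch below compares it with the empirical tail average on simulated normal returns, where the two should match closely. The mean, volatility and variable names are assumptions for this example only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

mu, sigma = 0.0005, 0.02     # assumed daily mean and volatility
sim_returns = rng.normal(mu, sigma, size=500_000)

c = 0.95                     # confidence level

# Empirical CVaR: average of the returns at or below the (1 - c) quantile, as a positive loss.
threshold = np.quantile(sim_returns, 1 - c)
hist_cvar = -sim_returns[sim_returns <= threshold].mean()

# Closed form under normality: -mu + sigma * pdf(z_c) / (1 - c), with z_c = ppf(c).
z_c = norm.ppf(c)
param_cvar = -mu + sigma * norm.pdf(z_c) / (1 - c)

print(hist_cvar, param_cvar)
```

For the real returns above, the historical CVaR exceeds the Gaussian CVaR for every stock, which is consistent with the fat tails seen in the kurtosis figures.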


These formulas should be extended to matrix form for a portfolio, using a vector of portfolio weights $w$ and the covariance matrix of asset returns.
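
One possible shape for the matrix form is sketched below, under the same Gaussian and linear-exposure assumptions as the per-asset formulas. The portfolio mean is $w^\top\mu$ and the portfolio volatility is $\sqrt{w^\top\Sigma w}$. The weights, the simulated `toy_returns` DataFrame and all variable names are hypothetical stand-ins. In the notebook the real `returns` DataFrame and chosen weights would take their place.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

rng = np.random.default_rng(3)

# Hypothetical stand-in for the `returns` DataFrame: three simulated assets.
toy_returns = pd.DataFrame(
    rng.normal(0.0005, 0.02, size=(1000, 3)), columns=["A", "B", "C"]
)

w = np.array([0.5, 0.3, 0.2])              # assumed portfolio weights, summing to 1
mu_vec = toy_returns.mean().to_numpy()     # vector of mean returns
cov = toy_returns.cov().to_numpy()         # sample covariance matrix (ddof=1)

port_mean = float(w @ mu_vec)              # w' mu
port_vol = float(np.sqrt(w @ cov @ w))     # sqrt(w' Sigma w)

c = 0.95
z_lower = norm.ppf(1 - c)                  # lower-tail quantile, about -1.645
port_var = -(port_mean + z_lower * port_vol)
port_cvar = -port_mean + port_vol * norm.pdf(norm.ppf(c)) / (1 - c)

# Cross-check: the volatility of the weighted return series equals sqrt(w' Sigma w).
port_returns = toy_returns.to_numpy() @ w

print(port_vol, port_returns.std(ddof=1), port_var, port_cvar)
```

This treats the weights as fixed, which corresponds to rebalancing back to `w` every day and matches the end-of-day rebalancing assumption stated earlier.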