# Portfolio Variance
Let's build a two-stock portfolio and calculate its variance.

In [1]:
import numpy as np
import pandas as pd
import time
import os
import matplotlib.pyplot as plt

In [2]:
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)

### load data

In [3]:
# load data from csv
all_stocks = pd.read_csv('20200101-20210101.csv').iloc[:,1:]
universe = all_stocks.sort_index(axis=0, ascending=False)
# convert date to standard string format, easy to filter
universe["date"] = pd.to_datetime(universe["trade_date"], format='%Y%m%d')
universe["date"] = universe.date.apply(lambda x: x.strftime("%Y-%m-%d"))
# drop missing data
universe = universe.dropna()
universe = universe.sort_values(by=["date", "ts_code"]).reset_index(drop=True)

In [9]:
# process data
returns_df = universe.pivot(index='date', columns='ts_code', values='close')
returns_df = returns_df.pct_change()[1:].fillna(0)
returns_df

date,000001.SZ,000002.SZ,000004.SZ,000005.SZ,000006.SZ,000007.SZ,000008.SZ,000009.SZ,000010.SZ,000011.SZ,...,688668.SH,688678.SH,688679.SH,688686.SH,688698.SH,688699.SH,688777.SH,688788.SH,688981.SH,689009.SH
2020-01-03,0.018376,-0.015663,0.000896,0.003185,0.014815,-0.004188,0.010782,-0.039039,0.000000,-0.002103,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2020-01-06,-0.006403,-0.016849,-0.026846,0.000000,-0.003650,-0.003155,-0.005333,0.100000,0.005865,-0.010537,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2020-01-07,0.004687,0.007934,0.016092,0.015873,0.005495,0.003165,0.016086,0.017045,0.032070,0.017039,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2020-01-08,-0.028571,-0.002519,-0.014480,-0.015625,-0.018215,-0.011567,-0.018470,-0.006983,-0.008475,-0.027225,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2020-01-09,0.007803,0.016414,0.024793,0.019048,0.014842,0.004255,0.013441,-0.012658,0.051282,0.011841,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-12-25,-0.012048,0.005727,-0.000460,0.003984,-0.012238,-0.006993,0.000000,0.017426,0.056757,0.001693,...,-0.042585,-0.014220,0.000000,0.0,0.000000,0.000845,-0.018178,-0.039874,-0.010739,0.027360
2020-12-28,0.044900,0.011388,-0.058038,-0.011905,-0.026549,-0.002347,-0.019531,-0.031621,-0.081841,-0.029586,...,-0.063722,-0.107212,-0.155100,0.0,0.000000,-0.055743,0.041200,-0.200000,-0.026203,0.066578
2020-12-29,0.016976,0.003519,0.010269,0.004016,0.005455,0.011765,0.000000,-0.002721,0.100279,0.032230,...,0.007075,-0.019214,-0.032358,0.0,0.000000,-0.033810,-0.038495,-0.124932,0.009610,-0.032959
2020-12-30,0.001565,-0.004909,-0.009197,-0.004000,0.001808,-0.100000,0.003984,0.034106,0.005063,0.026160,...,0.000669,-0.009795,0.012219,0.0,-0.033467,0.003518,0.015954,0.038170,0.075005,0.044539


## Let's look at a two stock portfolio

Let's pretend we have a portfolio of two stocks.  We'll pick PingAn (000001.SZ) and WanKe (000002.SZ), the first two columns of `returns_df`, in this example.

In [10]:
pa_col = returns_df.columns[0]
wk_col = returns_df.columns[1]
asset_return_1 = returns_df[pa_col].rename('asset_return_pa')
asset_return_2 = returns_df[wk_col].rename('asset_return_wk')
asset_return_df = pd.concat([asset_return_1,asset_return_2],axis=1)
asset_return_df.head(2)

date,asset_return_pa,asset_return_wk
2020-01-03,0.018376,-0.015663
2020-01-06,-0.006403,-0.016849


## Factor returns
Let's make up a "factor" by taking an average of all stocks in our list.  You can think of this as an equal weighted index of the 490 stocks, kind of like a measure of the "market".  We'll also make another factor by calculating the median of all the stocks.  These are mainly intended to help us generate some data to work with.  We'll go into how some common risk factors are generated later in the lessons.

Also note that we're setting axis=1 so that we calculate a value for each time period (row) instead of one value for each column (assets).

In [11]:
factor_return_1 = returns_df.mean(axis=1)
factor_return_2 = returns_df.median(axis=1)
factor_return_l = [factor_return_1, factor_return_2]
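
To make the `axis=1` behaviour concrete, here is a minimal sketch on a made-up table of returns. The tickers `A`, `B`, `C` and all numbers are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical returns: rows are dates, columns are assets
toy_returns = pd.DataFrame(
    {"A": [0.01, -0.02], "B": [0.03, 0.00], "C": [-0.01, 0.05]},
    index=["day1", "day2"],
)

# axis=1 collapses the columns: one value per row (per date)
per_day_mean = toy_returns.mean(axis=1)
per_day_median = toy_returns.median(axis=1)

# axis=0 collapses the rows: one value per column (per asset)
per_asset_mean = toy_returns.mean(axis=0)

print(per_day_mean.shape, per_asset_mean.shape)
```

The per-date series has one entry per row, which is the shape we need for a factor return time series.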

## Factor exposures

Factor exposures refer to how "exposed" a stock is to each factor.  We'll get into this more later.  For now, just think of this as one number for each stock, for each of the factors.

In [12]:
from sklearn.linear_model import LinearRegression

In [13]:
def get_factor_exposures(factor_return_l, asset_return):
    """
    For now, just assume that we're calculating a number for each
    stock, for each factor, which represents how "exposed" each stock is
    to each factor.
    We'll discuss how factor exposure is calculated later in the lessons.
    """
    lr = LinearRegression()
    # one row per date, one column per factor
    X = np.array(factor_return_l).T
    y = np.array(asset_return.values)
    lr.fit(X,y)
    # the intercept is printed for inspection; only the slopes are returned
    print(lr.intercept_)
    return lr.coef_
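
As a sanity check on the idea behind `get_factor_exposures`, here is a minimal sketch using synthetic data with known exposures. It swaps in NumPy's least-squares solver in place of `LinearRegression` so that it only needs NumPy. The factor series, the "true" exposures, and all variable names are assumptions made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 100

# Two hypothetical, independent factor return series
f1 = rng.normal(0.0, 0.01, n_days)
f2 = rng.normal(0.0, 0.01, n_days)

# Build an asset return with known intercept and exposures, no noise
true_alpha, true_b1, true_b2 = 0.001, 0.5, 1.5
asset = true_alpha + true_b1 * f1 + true_b2 * f2

# Least squares with an explicit intercept column (last column of ones)
design = np.column_stack([f1, f2, np.ones(n_days)])
b1, b2, alpha = np.linalg.lstsq(design, asset, rcond=None)[0]
exposures = np.array([b1, b2])

print(exposures, alpha)
```

With no noise in the synthetic asset, the fitted slopes match the exposures used to build it, which is what "factor exposure" means in this notebook: the regression coefficient of the asset's return on each factor.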

In [14]:
# regress each asset's returns on the factors: one row of exposures per asset
factor_exposure_l = []
for col in asset_return_df.columns:
    factor_exposure_l.append(
        get_factor_exposures(factor_return_l, asset_return_df[col]))

factor_exposure_a = np.array(factor_exposure_l)

0.00614415940460716
0.005453302613336214


In [15]:
print(f"factor_exposures for asset 1 {factor_exposure_a[0]}")
print(f"factor_exposures for asset 2 {factor_exposure_a[1]}")

factor_exposures for asset 1 [-1.56542126  2.39046995]
factor_exposures for asset 2 [-1.88919287  2.5323888 ]


# Portfolio Variance
We first calculate the variance term by term with scalar variables, then do it again with matrices.

## Variance of stock 1

Calculate the variance of stock 1.  
$\textrm{Var}(r_{1}) = \beta_{1,1}^2 \textrm{Var}(f_{1}) + \beta_{1,2}^2 \textrm{Var}(f_{2}) + 2\beta_{1,1}\beta_{1,2}\textrm{Cov}(f_{1},f_{2}) + \textrm{Var}(s_{1})$

In [35]:
factor_exposure_1_1 = factor_exposure_a[0][0]
factor_exposure_1_2 = factor_exposure_a[0][1]
common_return_1 = factor_exposure_1_1 * factor_return_1 + factor_exposure_1_2 * factor_return_2
specific_return_1 = asset_return_1 - common_return_1
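
The formula above can be checked numerically. The following is a minimal sketch on synthetic data, not the notebook's stock data: the factor series, exposures, and names are all made up. As in the cell above, the specific return is taken to be the asset return minus the common return, so it still contains the regression intercept, which does not change its variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 250

# Hypothetical, correlated factor returns and one asset return
f1 = rng.normal(0.0, 0.01, n_days)
f2 = 0.8 * f1 + rng.normal(0.0, 0.005, n_days)
r = 0.5 * f1 + 1.2 * f2 + rng.normal(0.0, 0.02, n_days)

# Exposures from a least-squares fit with an intercept
design = np.column_stack([f1, f2, np.ones(n_days)])
beta_1, beta_2, _ = np.linalg.lstsq(design, r, rcond=None)[0]

# Specific return = asset return minus common (factor-driven) return
s = r - (beta_1 * f1 + beta_2 * f2)

# Sample covariance matrix of the factors
F = np.cov(f1, f2, ddof=1)

var_from_formula = (beta_1**2 * F[0, 0]
                    + beta_2**2 * F[1, 1]
                    + 2 * beta_1 * beta_2 * F[0, 1]
                    + np.var(s, ddof=1))
var_direct = np.var(r, ddof=1)

print(var_from_formula, var_direct)
```

The two numbers agree because least-squares residuals are uncorrelated in-sample with the factors, so no cross term between common and specific return appears.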

## Variance of stock 2
Calculate the variance of stock 2.  
$\textrm{Var}(r_{2}) = \beta_{2,1}^2 \textrm{Var}(f_{1}) + \beta_{2,2}^2 \textrm{Var}(f_{2}) + 2\beta_{2,1}\beta_{2,2}\textrm{Cov}(f_{1},f_{2}) + \textrm{Var}(s_{2})$

In [18]:
factor_exposure_2_1 = factor_exposure_a[1][0]
factor_exposure_2_2 = factor_exposure_a[1][1]
common_return_2 = factor_exposure_2_1 * factor_return_1 + factor_exposure_2_2 * factor_return_2
specific_return_2 = asset_return_2 - common_return_2

## Specific return
Calculate the variance of the specific return by rearranging the formula above, shown here for stock 2.  
$ \textrm{Var}(s_{2}) = \textrm{Var}(r_{2}) - \beta_{2,1}^2 \textrm{Var}(f_{1}) - \beta_{2,2}^2 \textrm{Var}(f_{2}) - 2\beta_{2,1}\beta_{2,2}\textrm{Cov}(f_{1},f_{2}) $

In [40]:
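# common (factor-driven) returns for both assets, shape (2, n_dates)
common_return = factor_exposure_a.dot(factor_return_l)
# specific (residual) returns, shape (n_dates, 2)
specific_return = asset_return_df.values - common_return.T
# note: from here on, specific_return holds the sample variances of the specific returns
specific_return = [np.var(specific_return[:,0],ddof=1), np.var(specific_return[:,1],ddof=1)]
specific_return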

[0.0003200165850523789, 0.0002851129202584598]

## Quiz 2: Do it with Matrices!

Create matrices $\mathbf{F}$, $\mathbf{B}$ and $\mathbf{S}$, where  
$\mathbf{F}= \begin{pmatrix}
\textrm{Var}(f_1) & \textrm{Cov}(f_1,f_2) \\ 
\textrm{Cov}(f_2,f_1) & \textrm{Var}(f_2) 
\end{pmatrix}$
is the covariance matrix of factors,  

$\mathbf{B} = \begin{pmatrix}
\beta_{1,1} & \beta_{1,2}\\ 
\beta_{2,1} & \beta_{2,2}
\end{pmatrix}$ 
is the matrix of factor exposures, and  

$\mathbf{S} = \begin{pmatrix}
\textrm{Var}(s_1) & 0\\ 
0 & \textrm{Var}(s_2)
\end{pmatrix}$
is the matrix of specific variances.  

$\mathbf{X} = \begin{pmatrix}
x_{1} \\
x_{2}
\end{pmatrix}$
is the column vector of portfolio weights.  

We can then calculate the portfolio variance as follows:

$\textrm{Var}(r_p) = \mathbf{X}^T(\mathbf{BFB}^T + \mathbf{S})\mathbf{X}$ 
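
Here is a short sketch of where this formula comes from. It assumes the specific returns are uncorrelated with the factors and with each other, which is what a diagonal $\mathbf{S}$ encodes. The vectors $\mathbf{r}$, $\mathbf{f}$ and $\mathbf{s}$ (stacked asset, factor and specific returns) are notation introduced only for this derivation.

$\mathbf{r} = \mathbf{B}\mathbf{f} + \mathbf{s}, \qquad r_p = \mathbf{X}^T\mathbf{r}$

$\textrm{Cov}(\mathbf{r}) = \mathbf{B}\,\textrm{Cov}(\mathbf{f})\,\mathbf{B}^T + \textrm{Cov}(\mathbf{s}) = \mathbf{BFB}^T + \mathbf{S}$

$\textrm{Var}(r_p) = \mathbf{X}^T\,\textrm{Cov}(\mathbf{r})\,\mathbf{X} = \mathbf{X}^T(\mathbf{BFB}^T + \mathbf{S})\mathbf{X}$

For two stocks this expands to

$\textrm{Var}(r_p) = x_1^2\,\textrm{Var}(r_1) + x_2^2\,\textrm{Var}(r_2) + 2x_1x_2\,\textrm{Cov}(r_1,r_2)$

where $\textrm{Var}(r_1)$ and $\textrm{Var}(r_2)$ are the single-stock formulas above, and

$\textrm{Cov}(r_1,r_2) = \beta_{1,1}\beta_{2,1}\textrm{Var}(f_1) + \beta_{1,2}\beta_{2,2}\textrm{Var}(f_2) + (\beta_{1,1}\beta_{2,2} + \beta_{1,2}\beta_{2,1})\textrm{Cov}(f_1,f_2)$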

## Quiz 3: Calculate portfolio variance using matrices

In [32]:
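# TODO: covariance matrix of factors
# assumption: covm_f1_f2 is the sample covariance (ddof=1) of the two factor return series
covm_f1_f2 = np.cov(factor_return_1, factor_return_2, ddof=1)
F = covm_f1_f2
F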

array([[0.00021041, 0.00020202],
       [0.00020202, 0.00020049]])

In [33]:
# TODO: matrix of factor exposures
B = factor_exposure_a
B

array([[-1.56542126,  2.39046995],
       [-1.88919287,  2.5323888 ]])

In [41]:
# TODO: matrix of specific variances
S = np.diag(specific_return)
S

array([[0.00032002, 0.        ],
       [0.        , 0.00028511]])

#### Hint for column vectors
Try using [reshape](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.reshape.html)

In [42]:
# TODO: make a column vector for stock weights matrix X
weight_1 = 0.60
weight_2 = 0.40
X = np.array([weight_1,weight_2]).reshape(2,1)
X

array([[0.6],
       [0.4]])

In [43]:
# TODO: portfolio variance from the asset covariance matrix B F B^T + S
var_portfolio = (X.T).dot(B.dot(F).dot(B.T)+S).dot(X)
print(f"portfolio variance is \n{var_portfolio[0][0]:.8f}")

portfolio variance is 
0.00029006
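
To connect the matrix result back to the scalar view, here is a minimal sketch showing that the matrix product equals the familiar two-asset expansion. The values of `B`, `F` and `S` below are made-up toy numbers, not the ones estimated in this notebook.

```python
import numpy as np

# Hypothetical inputs (toy values for illustration only)
B = np.array([[0.9, 0.2],
              [1.1, -0.3]])          # factor exposures, one row per asset
F = np.array([[4e-4, 1e-4],
              [1e-4, 2e-4]])         # factor covariance matrix
S = np.diag([3e-4, 5e-4])            # specific variances on the diagonal
X = np.array([0.6, 0.4]).reshape(2, 1)  # portfolio weights as a column vector

# Model-implied covariance matrix of the two assets
asset_cov = B @ F @ B.T + S

# Matrix form of the portfolio variance
var_matrix = (X.T @ asset_cov @ X).item()

# Scalar form: x1^2 Var(r1) + x2^2 Var(r2) + 2 x1 x2 Cov(r1, r2)
x1, x2 = X.ravel()
var_scalar = (x1**2 * asset_cov[0, 0]
              + x2**2 * asset_cov[1, 1]
              + 2 * x1 * x2 * asset_cov[0, 1])

print(var_matrix, var_scalar)
```

Both routes give the same number. The matrix form is simply more convenient once the portfolio holds more than a handful of assets.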


## Solution
[Solution notebook is here](portfolio_variance_solution.ipynb)