In [1]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Multi-factor models

We simply assumed that the residuals $u$ are uncorrelated across assets.

This will never be exactly correct, but we had better be sure that we are not missing any big source of correlation.

If we do miss one, our covariance estimates will be off.

- Suppose we have a long-only portfolio. Are we going to over-estimate or under-estimate our true portfolio variance?

- Suppose we have a long-short portfolio that goes long some assets and short others. Would the same conclusion apply? Why?

- Suppose we choose a portfolio by minimizing variance (like we just did). When are you going to over-estimate the variance of your portfolio?
    - If there is co-movement that you miss?
    - If there is no true co-movement but you assumed there was?
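
One way to build intuition for the first two questions is a small numerical sketch. Every number below (four assets, equal loadings, the variances) is a made-up assumption for illustration, not an estimate from the data. The "model" covariance keeps each asset's total variance but treats the second factor as if it were asset-specific noise.

```python
import numpy as np

# Hypothetical numbers, chosen only for illustration.
N = 4
beta = np.ones(N)    # exposure to the factor we model
gamma = np.ones(N)   # exposure to a second factor that we miss
var_f, var_g, var_e = 0.04, 0.02, 0.01

# True covariance: both factors generate co-movement across assets.
cov_true = (var_f * np.outer(beta, beta)
            + var_g * np.outer(gamma, gamma)
            + var_e * np.eye(N))

# Single-factor model: the missed factor is lumped into a residual that is
# assumed uncorrelated across assets, so it only shows up on the diagonal.
cov_model = var_f * np.outer(beta, beta) + np.diag(var_g * gamma**2 + var_e)


def port_var(x, cov):
    """Variance of a portfolio with weights x under covariance cov."""
    return float(x @ cov @ x)


long_only = np.full(N, 1 / N)
long_short = np.array([0.5, 0.5, -0.5, -0.5])

lo_true, lo_model = port_var(long_only, cov_true), port_var(long_only, cov_model)
ls_true, ls_model = port_var(long_short, cov_true), port_var(long_short, cov_model)

print(f"long-only : true {lo_true:.4f}  model {lo_model:.4f}")
print(f"long-short: true {ls_true:.4f}  model {ls_model:.4f}")
```

With these particular loadings the model understates the variance of the long-only portfolio and overstates the variance of the long-short one. Changing the signs in `gamma` is a quick way to see how much the conclusion depends on which assets share the missing factor.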

Real data are likely to have many factors:

- Industry factors: tech stocks all move together
- Stocks that have high valuation multiples tend to move together with other high-multiple stocks
- Small stocks tend to move with other small firms
- Firms that have recently been investing a lot move with other similar firms
- etc.


We deal with this by simply adding more factors to our model. Say we now have $M$ different factors:

$$r_t^i=b_{i,1}f_t^1+b_{i,2}f_t^2+b_{i,3}f_t^3+...+b_{i,M}f_t^M+u_{i,t}$$


where $b_{i,j}$ measures the exposure of asset $i$ to factor $j$.

If we stack these exposures in an M by 1 vector $B_i=[b_{i,1},b_{i,2},...,b_{i,M}]$ and the factors in an M by 1 vector $F_t=[f^1_t,f^2_t,...,f^M_t]$, we can write this in matrix notation:

$$r_t^i=B_i@F_t+u_{i,t}$$


As before, we can also stack the individual returns:

$$R_t=B@F_t+U_t$$

where

- $R_t$ is an N by 1 vector with the excess returns of the N assets
- $B$ is an N by M matrix where each row has the exposures of one asset to each of the M factors and each column has the exposures of the different assets to one particular factor
- $U_t$, as before, is an N by 1 vector with the residual risk of each of the N assets


## Estimating a multi-factor model

Again, this is a very simple regression!

We simply run a multiple regression for each asset (its return on all $M$ factors), and the coefficients on the different factors are our estimates of that asset's exposures!
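
As a minimal sketch of this step, the snippet below simulates returns from a known exposure matrix and recovers it by least squares. The sizes `T`, `N`, `M`, the scales, and the simulated data are all assumptions made for illustration. It uses `np.linalg.lstsq`, which fits all assets at once and gives the same coefficients as running one OLS regression per asset.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, M = 500, 3, 2                       # hypothetical sample size, assets, factors

B_true = rng.normal(size=(N, M))          # N by M matrix of exposures
F = rng.normal(scale=0.05, size=(T, M))   # row t holds the M factor returns F_t
U = rng.normal(scale=0.02, size=(T, N))   # row t holds the N residuals U_t
R = F @ B_true.T + U                      # row t is (B @ F_t + U_t) transposed

# Regress every asset on a constant and the M factors.
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)   # shape (M + 1, N)

B_hat = coef[1:].T                        # drop the constant, back to N by M
resid = R - X @ coef
var_u = resid.var(axis=0, ddof=1)         # residual variance of each asset

print(B_hat.shape)                        # (N, M)
print(np.round(B_hat - B_true, 3))        # estimation error in the exposures
```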

## Application

Let's apply what we learned!

Let's do an 8-factor model (total overkill for these 50 stocks).

I will get the 3 factors from the Fama-French 3-factor model + 5 industry portfolios.

What are these factors?

* Mkt-RF is the return of the market portfolio in excess of the risk-free rate

* HML is the value strategy that buys high book-to-market firms and sells low book-to-market firms

* SMB is a size strategy that buys firms with low market capitalization and sells firms with high market capitalization

* The others are just portfolios based on the industry code of each firm that trades on the stock exchange

* Here they are split into only 5 sectors, which is obviously too coarse

* One will typically use many more industry factors, say 50 or so

* But since in this example we have only 50 stocks, that would be kind of a silly thing to do








In [2]:

import pandas_datareader.data as web

# import the 50 stocks data set
url = "https://raw.githubusercontent.com/amoreira2/Fin418/main/assets/data/Retuns50stocks.csv"
df = pd.read_csv(url)
# dates are stored as YYYYMM; parse them explicitly
# (the date_parser argument of read_csv is deprecated in pandas 2.x)
df['date'] = pd.to_datetime(df['date'].astype(str), format="%Y%m")
# set the date column as the index
df.set_index("date",inplace=True)
# returns are in percent; convert them to decimals
df=df/100

# import factors


# Define the start and end dates for the data
start_date = '2000-01-01'
end_date = '2023-12-31'

# Get the Fama-French three-factor model data
ff_factors = web.DataReader('F-F_Research_Data_Factors', 'famafrench', start_date, end_date)[0]

# Get the five industry portfolio data
industry_portfolios = web.DataReader('5_Industry_Portfolios', 'famafrench', start_date, end_date)[0]

# Merge the two datasets together
merged_data = pd.merge(ff_factors, industry_portfolios, on='Date')
factors_names=merged_data.columns
merged_data.index = pd.to_datetime(merged_data.index.to_timestamp())
# returns are in percent; convert them to decimals
merged_data=merged_data/100
# Merge with your df dataset to make sure the dates match
merged_data = pd.merge(merged_data, df.drop(columns=['Market']), left_on='Date',right_index=True,how='right')

# Split the merged dataset into factors and individual assets
Rf = merged_data[['RF']]
Factors = merged_data[factors_names].drop(columns=['RF'])
Assets = merged_data.drop(columns=factors_names)

# factors
Factors.info()
Assets.info()


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 180 entries, 2000-01-01 to 2014-12-01
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mkt-RF  180 non-null    float64
 1   SMB     180 non-null    float64
 2   HML     180 non-null    float64
 3   Cnsmr   180 non-null    float64
 4   Manuf   180 non-null    float64
 5   HiTec   180 non-null    float64
 6   Hlth    180 non-null    float64
 7   Other   180 non-null    float64
dtypes: float64(8)
memory usage: 12.7 KB
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 180 entries, 2000-01-01 to 2014-12-01
Data columns (total 50 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   CTL     180 non-null    float64
 1   T       180 non-null    float64
 2   CSCO    180 non-null    float64
 3   FCX     180 non-null    float64
 4   XL      180 non-null    float64
 5   IVZ     180 non-null    float64
 6   AMT     180 non-null    float64
 7   WHR 

In [3]:
# one row per asset; the loop below fills in every stock in Assets
Beta=pd.DataFrame(index=Assets.columns,columns=Factors.columns,dtype=float)
VarU=pd.DataFrame(index=Assets.columns,columns=['VarU'],dtype=float)
# add an intercept to the factor regressors
x= sm.add_constant(Factors)
for stocki in Assets.columns:
    y= Assets[stocki]
    # run the regression
    results= sm.OLS(y,x).fit()
    # store beta parameters skipping the constant
    Beta.loc[stocki,:]=results.params.iloc[1:]
    # store the residual variance
    VarU.loc[stocki,'VarU']=results.resid.var()

Beta

| | Mkt-RF | SMB | HML | Cnsmr | Manuf | HiTec | Hlth | Other |
|---|---|---|---|---|---|---|---|---|
| CTL | 1.773661 | -0.584068 | 0.013224 | 0.591461 | -0.151144 | -0.368082 | 0.024628 | -0.729825 |
| T | -0.765983 | -0.70887 | 0.519897 | 0.565646 | 0.168277 | 0.789765 | 0.338093 | -0.314557 |
| CSCO | -1.209827 | 0.063438 | 0.304196 | -0.0304 | -0.0173 | 1.789234 | 0.132417 | 0.326659 |
| FCX | 0.579567 | 0.240257 | -0.104024 | 0.38478 | 1.676865 | -0.040493 | -0.662841 | -0.3798 |
| XL | -2.138662 | 0.087942 | -0.098236 | 0.743264 | 0.194891 | 0.325159 | -0.088233 | 2.389057 |
| IVZ | -2.943381 | 0.143413 | -0.059142 | 0.369939 | 0.983337 | 1.722916 | 0.600754 | 1.188089 |
| AMT | -3.074392 | 0.521988 | 0.450322 | -0.642305 | 1.21921 | 2.357371 | 0.86701 | 0.144035 |
| WHR | -0.123852 | 0.510154 | 1.116104 | 1.142745 | -0.24719 | 0.323533 | 0.157214 | 0.360687 |
| IR | -1.389932 | 0.025733 | 0.109128 | 1.001246 | 0.943648 | 0.638801 | -0.102587 | 0.521383 |
| WFT | -2.30703 | 0.411222 | -0.381759 | -0.359601 | 2.801833 | 0.651049 | -0.450652 | 0.940941 |


## Reconstructing the co-variance matrix

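We have

$$R_t=B@F_t+U_t$$

Then, since the residuals are uncorrelated with the factors,

$$Var(R_t)=B@Var(F_t)@B'+Var(U_t)$$

The big difference is that now $F_t$ is a vector of factors, so $Var(F_t)$ is an M by M variance-covariance matrix. As before, $Var(U_t)$ is diagonal because the residuals are assumed to be uncorrelated across assets.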

In [4]:
Var_F=Factors.cov()
Cov_F=Beta @ Var_F @ Beta.T + np.diag(VarU['VarU'].values)
Cov_F

| | CTL | T | CSCO | FCX | XL | IVZ | AMT | WHR | IR | WFT | ... | JCI | SWK | DVN | TMO | PEP | LNC | EMR | MLM | CCI | NU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CTL | 0.00655 | 0.001408 | 0.002038 | 0.002353 | 0.002344 | 0.003068 | 0.001478 | 0.002345 | 0.002524 | 0.002014 | ... | 0.001866 | 0.001834 | 0.001465 | 0.001725 | 0.000972 | 0.002767 | 0.001863 | 0.001482 | 0.002324 | 0.000972 |
| T | 0.001408 | 0.004688 | 0.00184 | 0.001529 | 0.002097 | 0.002673 | 0.001342 | 0.002003 | 0.002108 | 0.001364 | ... | 0.001478 | 0.001464 | 0.001073 | 0.001628 | 0.000898 | 0.002289 | 0.001655 | 0.001333 | 0.001848 | 0.000682 |
| CSCO | 0.002038 | 0.00184 | 0.011381 | 0.004141 | 0.003665 | 0.007108 | 0.005885 | 0.004123 | 0.004269 | 0.003808 | ... | 0.003918 | 0.002949 | 0.001776 | 0.003202 | 0.000737 | 0.004472 | 0.002989 | 0.002185 | 0.006565 | 0.000827 |
| FCX | 0.002353 | 0.001529 | 0.004141 | 0.016977 | 0.004431 | 0.006325 | 0.004123 | 0.004492 | 0.005487 | 0.007251 | ... | 0.004302 | 0.004351 | 0.005067 | 0.002811 | 0.000827 | 0.005794 | 0.003777 | 0.003399 | 0.005176 | 0.001782 |
| XL | 0.002344 | 0.002097 | 0.003665 | 0.004431 | 0.016149 | 0.00639 | 0.001929 | 0.006672 | 0.005633 | 0.004734 | ... | 0.005489 | 0.00478 | 0.003223 | 0.002946 | 0.001755 | 0.008175 | 0.004011 | 0.004223 | 0.003393 | 0.001667 |
| IVZ | 0.003068 | 0.002673 | 0.007108 | 0.006325 | 0.00639 | 0.013659 | 0.006441 | 0.006358 | 0.006517 | 0.006437 | ... | 0.005675 | 0.004862 | 0.004039 | 0.004611 | 0.001651 | 0.007721 | 0.004788 | 0.004047 | 0.007738 | 0.001823 |
| AMT | 0.001478 | 0.001342 | 0.005885 | 0.004123 | 0.001929 | 0.006441 | 0.031007 | 0.002818 | 0.003462 | 0.004313 | ... | 0.002842 | 0.002246 | 0.002495 | 0.003106 | 0.000404 | 0.0034 | 0.002591 | 0.001729 | 0.006626 | 0.000725 |
| WHR | 0.002345 | 0.002003 | 0.004123 | 0.004492 | 0.006672 | 0.006358 | 0.002818 | 0.013016 | 0.005424 | 0.004185 | ... | 0.005186 | 0.004573 | 0.002741 | 0.002847 | 0.001674 | 0.007106 | 0.003566 | 0.003761 | 0.003953 | 0.001647 |
| IR | 0.002524 | 0.002108 | 0.004269 | 0.005487 | 0.005633 | 0.006517 | 0.003462 | 0.005424 | 0.010436 | 0.005557 | ... | 0.004651 | 0.004379 | 0.003777 | 0.003135 | 0.001402 | 0.006352 | 0.003864 | 0.003627 | 0.004728 | 0.001701 |
| WFT | 0.002014 | 0.001364 | 0.003808 | 0.007251 | 0.004734 | 0.006437 | 0.004313 | 0.004185 | 0.005557 | 0.015806 | ... | 0.004336 | 0.004459 | 0.006108 | 0.002894 | 0.000642 | 0.006455 | 0.004048 | 0.003746 | 0.005082 | 0.001747 |


- Suppose we compare the in-sample variance of our minimum-variance portfolios for
    - The unrestricted case
    - The single-factor covariance
    - The multi-factor covariance

- Which one will have the lowest variance? Which will have the highest?
- Now split the sample in two. Repeat the covariance estimation procedure for each of these approaches using the first half of the sample, and compute the minimum-variance weights
- Now use those weights to compute the variance of each of the portfolios in the second half
- Is the order likely to change? Why? Why not?
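
The split-sample exercise can be sketched on simulated data as follows. The simulated returns, the sizes, and the helper names `min_var_weights` and `factor_cov` are illustrative assumptions, not part of the notebook above. For brevity it compares only the unrestricted sample covariance with a multi-factor covariance; the single-factor case would follow the same pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, M = 120, 20, 2                       # hypothetical months, assets, factors
B = rng.normal(1.0, 0.5, size=(N, M))
F = rng.normal(scale=0.04, size=(T, M))
U = rng.normal(scale=0.06, size=(T, N))
R = F @ B.T + U                            # simulated returns, T by N


def min_var_weights(cov):
    """Fully invested minimum-variance weights: inv(cov) @ 1, rescaled to sum to 1."""
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return w / w.sum()


def factor_cov(R, F):
    """Factor-model covariance: B Var(F) B' plus a diagonal residual variance."""
    X = np.column_stack([np.ones(len(F)), F])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    B_hat = coef[1:].T
    resid = R - X @ coef
    var_f = np.atleast_2d(np.cov(F, rowvar=False))
    return B_hat @ var_f @ B_hat.T + np.diag(resid.var(axis=0, ddof=1))


half = T // 2
R1, R2, F1 = R[:half], R[half:], F[:half]
S1 = np.cov(R1, rowvar=False)              # first-half sample covariance
S2 = np.cov(R2, rowvar=False)              # second-half sample covariance

w_sample = min_var_weights(S1)
w_factor = min_var_weights(factor_cov(R1, F1))

in_sample = {"sample": w_sample @ S1 @ w_sample, "factor": w_factor @ S1 @ w_factor}
out_sample = {"sample": w_sample @ S2 @ w_sample, "factor": w_factor @ S2 @ w_factor}
print("first half :", in_sample)
print("second half:", out_sample)
```

In the first half, the unrestricted weights have the lowest variance by construction, because they minimize exactly the covariance they are evaluated on. The second-half ranking is the interesting part, and it depends on the simulated draw and on how large `N` is relative to the sample length.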

## Application: How will your portfolio risk change as you add positions


You have portfolio $X_0$ and you want to sell a fraction $w$ of your positions to invest in a fund with portfolio $X_1$. How will your portfolio variance change as a function of your reallocation?

- The answer is simple

$$Var(wX_1R_t+(1-w)X_0R_t)-Var(X_0R_t)$$

- But it is also kind of misleading, since you might not have good data to estimate the variance of the new portfolio

- Now if you know each portfolio's factor betas, $\beta_0=X_0@B$ and $\beta_1=X_1@B$, and at least one of these portfolios is large and well diversified, then for small tilts, i.e. $w$ small, we have


$$\frac{Var(wX_1R_t+(1-w)X_0R_t)-Var(X_0R_t)}{w}\Big|_{w\approx 0} \approx 2\left(\beta_1Var(F)\beta_0'-Var(X_0R_t)\right)$$


- The only term that depends on the new fund is $\beta_1Var(F)\beta_0'$, the factor covariance between the fund and your current portfolio

- The fact that one is well diversified just means that you can ignore the covariance terms between the portfolios' asset-specific risks

- So you see above why a large pool of money, when allocating to an active manager, will want to regulate the manager's factor exposure

- Funds with similar volatilities will be perceived as very different risks depending on how the exposure of your portfolio relates to the exposure of the fund
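
Here is a quick numerical check of the small-tilt logic, on made-up numbers (the exposures, factor covariance, residual variances, and both portfolios are assumptions for illustration). Differentiating the variance of the tilted portfolio at $w=0$ gives $2\,[Cov(X_1R_t,X_0R_t)-Var(X_0R_t)]$. The sketch compares that with a finite difference, and then checks that the covariance term is close to $\beta_1Var(F)\beta_0'$ when $X_0$ is well diversified.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 200, 3                                  # hypothetical assets and factors
B = rng.normal(size=(N, M))                    # factor exposures
A = rng.normal(size=(M, M))
var_F = A @ A.T / 100                          # a valid (PSD) factor covariance
var_U = np.diag(rng.uniform(0.01, 0.04, size=N))
cov_R = B @ var_F @ B.T + var_U                # full asset covariance

X0 = np.full(N, 1 / N)                         # large, well-diversified portfolio
X1 = np.zeros(N)
X1[:10] = 0.1                                  # concentrated fund: 10 names


def port_var(x):
    return float(x @ cov_R @ x)


# Finite-difference change in variance per unit of tilt.
w = 1e-4
finite_diff = (port_var(w * X1 + (1 - w) * X0) - port_var(X0)) / w

# Exact derivative at w = 0.
cov_exact = float(X1 @ cov_R @ X0)
exact = 2 * (cov_exact - port_var(X0))

# Factor-only approximation of the covariance term.
beta0, beta1 = X0 @ B, X1 @ B
cov_factor = float(beta1 @ var_F @ beta0)

print(f"finite difference: {finite_diff:.6f}   exact derivative: {exact:.6f}")
print(f"Cov(X1 R, X0 R): {cov_exact:.6f}   beta1 Var(F) beta0': {cov_factor:.6f}")
```

The gap between the last two numbers is the covariance of the asset-specific risks, which is small here only because `X0` spreads its weight over many assets.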




## Performance Attribution


- We can use factor models to decompose a manager's strategy

- What explains their returns?

- What tilts do they have? What kind of stocks do they like?


### Application: What does Cathie Wood like?

![fig](../../assets/plots/CW_image.jfif)



Cathie Wood is a renowned stock-picker and the founder of ARK Invest, which manages around 60 billion in assets and invests in innovative technologies such as self-driving cars and genomics. She gained fame for her success in the male-dominated world of investing, her persuasive investment arguments, and her proven track record in the stock market. Prior to founding ARK Invest, she gained experience at The Capital Group, Jennison Associates, and AllianceBernstein, and co-founded Tupelo Capital Management, a hedge fund. Wood is known for her unconventional investment strategies and her advocacy for investing in disruptive technologies, which has garnered her a large following in the investing world. Her estimated net worth is around $250 million.


Citations:
https://www.nytimes.com/2021/08/22/business/cathie-wood-ark-stocks.html




In [5]:
df=pd.read_pickle('../../assets/data/df_WarrenBAndCathieW.pkl')
# keep only the dates where every series is available
_temp=df.dropna()
# select the columns to use as factors
Factors=_temp.drop(['BRK','RF','ARKK'],axis=1)
Factors.head(3)
# ARKK return in excess of the risk-free rate
ArK=_temp.ARKK-_temp.RF

What are these factors?

* HML is the value strategy that buys high book-to-market firms and sells low book-to-market firms

* SMB is a size strategy that buys firms with low market capitalization and sells firms with high market capitalization

* RMW is the strategy that buys firms with high gross profitability and sells firms with low gross profitability

* CMA is the strategy that buys firms that are investing little (low CAPEX) and sells firms that are investing a lot (high CAPEX)

* Mom is the momentum strategy that buys stocks that did well in the last 12 months and shorts the ones that did poorly


We will discuss these more later.

For now, just think of them as important trading strategies that practitioners know.

In [6]:

x= sm.add_constant(Factors)
y= ArK
results= sm.OLS(y,x).fit()
results.summary()

| Dep. Variable: | y | R-squared: | 0.781 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.78 |
| Method: | Least Squares | F-statistic: | 1069.0 |
| Date: | Thu, 28 Mar 2024 | Prob (F-statistic): | 0.0 |
| Time: | 16:24:08 | Log-Likelihood: | 5908.9 |
| No. Observations: | 1804 | AIC: | -11800.0 |
| Df Residuals: | 1797 | BIC: | -11770.0 |
| Df Model: | 6 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0004 | 0.000 | 1.821 | 0.069 | -3.03e-05 | 0.001 |
| Mkt-RF | 1.1736 | 0.020 | 58.714 | 0.000 | 1.134 | 1.213 |
| SMB | 0.6944 | 0.037 | 18.984 | 0.000 | 0.623 | 0.766 |
| HML | -0.6521 | 0.038 | -16.938 | 0.000 | -0.728 | -0.577 |
| RMW | -0.9037 | 0.054 | -16.883 | 0.000 | -1.009 | -0.799 |
| CMA | -0.5034 | 0.071 | -7.129 | 0.000 | -0.642 | -0.365 |
| Mom | -0.0397 | 0.025 | -1.559 | 0.119 | -0.090 | 0.010 |

| Omnibus: | 55.31 | Durbin-Watson: | 2.135 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 128.873 |
| Skew: | 0.116 | Prob(JB): | 1.04e-28 |
| Kurtosis: | 4.289 | Cond. No. | 345.0 |


- How much of ARKK's return behavior can we explain?

- What kind of stocks does CW like?

- How much of her portfolio variance comes from market exposure alone?

- If you were to construct a replicating portfolio of her fund, what would be the volatility of your residual risk?
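
The quantities behind these questions can be read off any factor regression. The sketch below uses a simulated fund, not the ARKK data: the two factors, the loadings `1.2` and `-0.6`, and the noise scales are all made-up assumptions. "Market share" here means the variance of the market leg alone, $\beta_{mkt}^2Var(Mkt)$, as a fraction of total variance.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1000
mkt = rng.normal(scale=0.01, size=T)       # hypothetical daily market excess return
hml = rng.normal(scale=0.005, size=T)      # hypothetical value factor
eps = rng.normal(scale=0.008, size=T)      # fund-specific risk
fund = 1.2 * mkt - 0.6 * hml + eps         # made-up fund excess returns

X = np.column_stack([np.ones(T), mkt, hml])
coef, *_ = np.linalg.lstsq(X, fund, rcond=None)
alpha, b_mkt, b_hml = coef
resid = fund - X @ coef

# Share of return variance explained by all factors together.
r2 = 1 - resid.var() / fund.var()

# Share of return variance coming from market exposure alone.
market_share = b_mkt**2 * mkt.var() / fund.var()

# A replicating portfolio holds the factors in proportion to the betas;
# what it fails to track is the residual.
replicating = X[:, 1:] @ coef[1:]
residual_vol = resid.std(ddof=X.shape[1])

print(f"R-squared: {r2:.3f}   market share of variance: {market_share:.3f}")
print(f"residual volatility per period: {residual_vol:.4f}")
```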






## "Endogenous" Benchmarking


* It is common for large portfolio allocators to set benchmarks for the managers that they allocate to

* The most common benchmark is simply the return of the S&P 500, which is almost the same thing as the return of the market portfolio (large caps dominate the returns of any market-cap-weighted portfolio)

* You might also have endogenous benchmarks

* Use a set of factors $F$ and estimate $r^b_t=\sum_j \beta_j F_{j,t}$

* I.e., use as a benchmark the multi-factor combination that best replicates the portfolio

* Typically this is not done contractually but implicitly: you will allocate to the different funds based on their alpha

* This captures the idea that one should pay different prices for alpha (very hard to get) and beta (easier; the gains are in implementation)
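
A minimal sketch of an endogenous benchmark, again on simulated data: the two factors, the fund's loadings, and its `true_alpha` are assumptions made for illustration. The benchmark is the fitted factor combination, and the manager's average return over that benchmark is the regression intercept.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 600
F = rng.normal(loc=0.005, scale=0.04, size=(T, 2))   # two hypothetical factors
true_alpha = 0.002                                    # assumed, for the simulation only
r = true_alpha + F @ np.array([0.9, 0.4]) + rng.normal(scale=0.02, size=T)

# Estimate the fund's factor exposures with an intercept.
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
alpha_hat, betas = coef[0], coef[1:]

benchmark = F @ betas        # r^b_t = sum_j beta_j * F_{j,t}
active = r - benchmark       # return in excess of the endogenous benchmark

print(f"estimated betas: {np.round(betas, 3)}")
print(f"average active return: {active.mean():.5f}   alpha: {alpha_hat:.5f}")
```

The two printed numbers coincide because OLS residuals average to zero when the regression includes a constant, so the mean return over the fitted benchmark is exactly the estimated alpha.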



