## Multi-factor Model
Author: Victor Xiao

A __multi-factor model (MFM)__ is a factor model built on __Arbitrage Pricing Theory (APT)__.

The core idea of __APT__ is that the expected return of any security or portfolio is driven by a set of unknown systematic factors, subject to the law of one price: assets with the same risk-return profile must trade at the same price. (Otherwise an arbitrage opportunity arises.)

The general formula for APT is as follows:

$$r_i = a_i + \sum^K_{k=1}b_{ik}f_k + \epsilon_i$$

- $r_i$ is the return of asset $i$, $a_i$ is its intercept, and $\epsilon_i$ is its asset-specific (idiosyncratic) residual.
- $f_k$ is the $k$th systematic factor that affects asset returns, known as a __risk factor__.
- $b_{ik}$ is the sensitivity (exposure) of asset $i$ to factor $k$, known as the __factor loading__ of asset $i$ on factor $k$.
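
To make the APT equation concrete, the sketch below simulates returns for a single asset from $K = 3$ made-up factors and then recovers the intercept and factor loadings with ordinary least squares. All numbers (`a_true`, `b_true`, the volatilities, the sample size) are illustrative assumptions, not values taken from any Barra model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: 500 periods, 3 systematic factors.
T, K = 500, 3

# Simulated factor realizations f_k (assumed i.i.d. normal for simplicity).
f = rng.normal(0.0, 0.01, size=(T, K))

# Hypothetical "true" intercept a_i and factor loadings b_ik for one asset.
a_true = 0.0002
b_true = np.array([1.2, -0.5, 0.8])

# Idiosyncratic residual epsilon_i.
eps = rng.normal(0.0, 0.005, size=T)

# APT: r_i = a_i + sum_k b_ik * f_k + epsilon_i
r = a_true + f @ b_true + eps

# Recover a_i and b_ik by regressing r on a constant and the factors.
X = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
a_hat, b_hat = coef[0], coef[1:]

print("estimated intercept:", a_hat)
print("estimated loadings: ", b_hat)
```

In practice the factors are not observed directly, which is the gap that models such as Barra's aim to fill.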

While APT provides the foundational framework for understanding what drives asset returns, it does not specify what the factors are. After devoting considerable resources to this problem, Barra proposed the __Barra multi-factor model__, which serves as the industry's baseline multi-factor model today.

This notebook explores the general process of constructing a multi-factor model.

### 1. Categories of Factors

The first step in constructing a multi-factor model is to choose appropriate factors. Generally speaking, factors fall into three classes:

- Factors that reflect external influences.
- Factors that reflect comparative cross-sectional properties.
- Internal or statistical factors.

### Barra China Equity Model (CNE5)

The ten style factors of CNE5 comprise a total of 21 descriptors. Below we go through each style factor and give its definition and formula.

> __Beta__

Components: __Beta__ Beta($\beta$)

Computed as the slope coefficient in a time-series regression of the excess stock return, $r_t - r_{ft}$, against the cap-weighted excess return of the estimation universe, $R_t$:

$$r_t - r_{ft} = a + \beta R_t + e_t$$

The regression coefficients are estimated over the trailing 252 trading days of returns with a half-life of 63 trading days.
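
The Beta regression can be sketched as an exponentially weighted least-squares fit. The helper names `exp_weights` and `ewma_beta` are hypothetical, and the weighting convention (weights halve every 63 days of age and are normalised to sum to one) is an assumption about how the half-life is applied. The input data are synthetic.

```python
import numpy as np

def exp_weights(window, half_life):
    """Exponential weights, oldest first, normalised to sum to 1.

    The most recent observation has age 0; an observation that is
    `half_life` days older receives half its weight.
    """
    ages = np.arange(window - 1, -1, -1)
    w = 0.5 ** (ages / half_life)
    return w / w.sum()

def ewma_beta(stock_excess, market_excess, window=252, half_life=63):
    """Weighted least-squares fit of r_t - r_ft = a + beta * R_t + e_t."""
    y = np.asarray(stock_excess, dtype=float)[-window:]
    x = np.asarray(market_excess, dtype=float)[-window:]
    w = exp_weights(len(y), half_life)

    # Weighted means, then slope = weighted covariance / weighted variance.
    x_bar = np.sum(w * x)
    y_bar = np.sum(w * y)
    beta = np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Synthetic check: a noiseless stock with beta 1.5 against a random market.
rng = np.random.default_rng(1)
R = rng.normal(0.0, 0.01, 300)        # market excess returns (made up)
r_ex = 0.0001 + 1.5 * R               # stock excess returns
alpha, beta = ewma_beta(r_ex, R)
print(alpha, beta)
```

Because the synthetic stock return is an exact linear function of the market return, the fit should recover the slope and intercept up to floating-point error; with real data the residual $e_t$ makes the estimate noisy.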


> __Momentum__

Components: __RSTR__ Relative Strength

Computed as the sum of excess log returns over the trailing $T = 504$ trading days with a lag of $L = 21$ trading days:

$$RSTR = \sum^{T+L}_{t=L}w_t [\ln(1+r_t) - \ln(1+r_{ft})]$$

where $r_t$ is the stock return on day $t$, $r_{ft}$ is the risk-free return, and $w_t$ is an exponential weight with a half-life of 126 trading days.
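
A minimal sketch of the RSTR sum follows. The function name `rstr` is hypothetical, and several details are assumptions rather than part of the definition above: $t$ is treated as the number of days back from the most recent observation, the sum runs over $t = L, \dots, T+L$ exactly as written in the formula, and the weights are left unnormalised with $w_L = 1$.

```python
import numpy as np

def rstr(returns, rf, T=504, L=21, half_life=126):
    """Weighted sum of excess log returns, skipping the most recent L days.

    `returns` and `rf` are daily simple returns ordered oldest to newest.
    """
    r = np.asarray(returns, dtype=float)
    rf = np.asarray(rf, dtype=float)
    if r.shape != rf.shape:
        raise ValueError("returns and rf must have the same length")
    if len(r) < T + L + 1:
        raise ValueError("need at least T + L + 1 observations")

    lags = np.arange(L, T + L + 1)      # t = L, ..., T + L
    idx = len(r) - 1 - lags             # lag 0 is the last element
    excess_log = np.log1p(r[idx]) - np.log1p(rf[idx])

    # Assumed convention: weight halves every `half_life` days of extra lag.
    w = 0.5 ** ((lags - L) / half_life)
    return float(np.sum(w * excess_log))

# Synthetic example: constant 0.1% daily return, zero risk-free rate.
n = 600
r = np.full(n, 0.001)
rf = np.zeros(n)
print(rstr(r, rf))
```

The lag means that returns from the most recent 21 trading days do not enter the sum, so changing them leaves the value unchanged.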

> __Size__

Components: __LNCAP__ Natural log of market cap

Computed by taking the logarithm of the total market capitalization of the firm. 

> __Earnings Yield__

Definition: 0.68 * EPIBS + 0.11 * ETOP + 0.21 * CETOP

Components: 

__EPIBS__: Analyst Predicted Earnings-to-Price. 

The earnings-to-price ratio forecast by analysts.

__ETOP__: Trailing earnings-to-price ratio. 

Computed by dividing the trailing 12-month earnings by the current market capitalization. Trailing earnings are defined 



In [None]:
# Necessary Imports
import numpy as np
import pandas as pd

# Reading the Data into the dataframe