# Portfolio Construction and Analysis -- Factor Analysis

## Objectives
1. Use machine learning algorithms to describe the returns of an asset.
2. Test different algorithms and regression types from scikit-learn.
3. Find a more reasonable and accurate way to calculate expected returns.

In [53]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
import time

import cvxpy as cp
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import r2_score

import sys
sys.path.insert(0, r'C:\Users\user\Documents\GitHub\Portfolio-Construction-and-Analysis\python_files')
import functions_1 as fnc
import FactorModelLibForMOOC as fm
import Portfolio_construction_6_factor_analysis as pc6


### Data
I don't yet have proper factor data, such as a world equity index, Treasury returns, high-yield returns, inflation, or currency factors. For now I will use my other assets as factors to predict S&P 500 returns. The goal here is to build functions that can do factor analysis; I can get factor data later and then do proper research.

In [45]:
df = fnc.local_returns_data(path = r'C:\Users\user\Documents\GitHub\Portfolio-Construction-and-Analysis\Data\cleaned_data\historical_returns_data_1.csv')
r_week = fnc.change_timeframe(df.dropna(), 'W', resample_index=False)
r_week.head(3)

Date,Vanguard High Yield Corporate Fund,Vanguard Total Intl Stock Idx Fund,Vanguard Mid Cap Index Fund,S&P 500,Vanguard Value Index Fund,Vanguard Small Cap Value Index Fund,Vanguard Small Cap Index Fund,iShares J.P. Morgan USD Emerging Markets Bond ETF,Vanguard Emerging Markets Stock Index Fund,Vanguard Real Estate Index Fund,SPDR S&P 500 ETF Trust,SPDR Gold Shares
2007-12-23,0.0,0.017179,0.022814,0.02157,0.013379,0.025857,0.027753,-0.000691,0.028865,0.035314,0.022819,0.010984
2007-12-30,0.0,0.045053,-0.00526,-0.003893,-0.003029,-0.015264,-0.011402,0.005537,0.036896,-0.016299,-0.004505,0.035866
2008-01-06,0.001821,-0.026898,-0.049416,-0.045841,-0.029914,-0.0559,-0.052585,0.005013,-0.034739,-0.049184,-0.033444,0.025824
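
The implementation of `fnc.change_timeframe` is not shown here. As a rough sketch of one common way to move from daily to weekly simple returns, assuming the helper compounds returns within each week, the following uses pandas `resample`. The function name `to_weekly_returns` and the synthetic data are illustrative assumptions, not part of the project's library.

```python
import numpy as np
import pandas as pd

def to_weekly_returns(daily_returns: pd.DataFrame) -> pd.DataFrame:
    """Compound daily simple returns into weekly simple returns.

    Hypothetical helper: weeks are labelled by their Sunday end date,
    which is the pandas default for the "W" frequency.
    """
    return (1.0 + daily_returns).resample("W").prod() - 1.0

# Synthetic daily returns for illustration only (not the notebook's data).
dates = pd.date_range("2024-01-01", periods=10, freq="B")
daily = pd.DataFrame({"asset_a": np.full(10, 0.01)}, index=dates)

weekly = to_weekly_returns(daily)
print(weekly)
```

Compounding, rather than summing, keeps the weekly figure consistent with simple returns; for log returns a plain sum would be the analogous choice.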


In [46]:
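# SPDR S&P 500 ETF Trust tracks the S&P 500, so keeping it as a factor
# would nearly duplicate the target.
r_week = r_week.drop(columns='SPDR S&P 500 ETF Trust')
# Dependent variable: S&P 500 returns. Factors: all remaining columns.
asset = r_week[['S&P 500']]
factors = r_week.drop(columns=asset.columns)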

### 2.2 OLS Regression <a class="anchor" id="2.2"></a>

Now let's return to factor models.  Building a factor model is equivalent to solving for the factor loadings defined in part 1.  Ordinary Least Squares (OLS) regression is the simplest way to do so.

As we mentioned in part 1, OLS regression is equivalent to solving the following optimization problem:

\begin{equation*} 
    \hat{\beta}^{\text{OLS}} = \operatorname*{argmin}_{\beta}\bigg\{\sum_{t=1}^{n} (y_t - X_t^\intercal \beta)^2 \bigg\}
\end{equation*}

In our notation, $n$ is the number of data points.  In this case, OLS regression has a closed-form solution:

\begin{equation*} 
    \hat{\beta}^{\text{OLS}} = ({\bf X}^\intercal {\bf X})^{-1} {\bf X}^\intercal{\bf Y}
\end{equation*}

where ${\bf Y}$ is the vector representation of $y_t$, and ${\bf X}$ is the matrix representation of $X_t$.
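
For reference, the closed form follows from setting the gradient of the objective to zero. This is the standard derivation, sketched here under the assumption that ${\bf X}^\intercal {\bf X}$ is invertible; the symbol $L(\beta)$ for the objective is introduced only for this sketch.

\begin{align*}
    L(\beta) &= ({\bf Y} - {\bf X}\beta)^\intercal ({\bf Y} - {\bf X}\beta) \\
    \nabla_{\beta} L(\beta) &= -2\,{\bf X}^\intercal ({\bf Y} - {\bf X}\beta) = 0 \\
    {\bf X}^\intercal {\bf X}\,\hat{\beta}^{\text{OLS}} &= {\bf X}^\intercal {\bf Y}
\end{align*}

Multiplying both sides of the last line by $({\bf X}^\intercal {\bf X})^{-1}$ gives the expression above.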

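Let's take a second to look at the factor loadings, $\hat{\beta}$.  What do they mean?  They represent the effect on the dependent variable (in this case, the asset return) associated with movement in the underlying factor.  For example, a loading of 0.5 means that a one-percentage-point move in that factor is associated with a 0.5-percentage-point move in the asset return, holding the other factors fixed.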

In this course we will be using the scikit-learn package to build models.  But since we have a closed-form solution, let's check that it gives the same answer as scikit-learn.
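
One way to run that check is sketched below on synthetic data, not on the notebook's returns. The variable names and the simulated coefficients are assumptions made for illustration. Because the scikit-learn model fits an intercept, the closed-form side prepends a column of ones to the factor matrix so that the first coefficient plays the same role.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only (not the notebook's returns).
rng = np.random.default_rng(0)
n_obs, n_factors = 200, 3
X = rng.normal(size=(n_obs, n_factors))
true_beta = np.array([0.5, -0.2, 0.1])
y = 0.01 + X @ true_beta + rng.normal(scale=0.05, size=n_obs)

# Closed form: prepend a column of ones so the first coefficient is the intercept,
# then solve the normal equations (X'X) beta = X'y.
X1 = np.column_stack([np.ones(n_obs), X])
beta_closed = np.linalg.solve(X1.T @ X1, X1.T @ y)

# scikit-learn fit with an intercept.
model = LinearRegression(fit_intercept=True).fit(X, y)
beta_sklearn = np.concatenate([[model.intercept_], model.coef_])

# Compare the two solutions up to floating-point tolerance.
print(np.allclose(beta_closed, beta_sklearn))
```

The sketch uses `np.linalg.solve` on the normal equations instead of forming the inverse explicitly, which is usually the more numerically stable choice.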

In [73]:
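def linear_regression(DependentVar, Factors, return_model=False):
    """Fit an OLS regression of DependentVar on Factors and display the loadings.

    Returns the fitted model if return_model is True, otherwise None.
    """
    lg = LinearRegression(fit_intercept=True)
    lg.fit(Factors, DependentVar)

    # Show the intercept and one loading per factor column.
    pc6.display_factor_loadings(lg.intercept_, lg.coef_, Factors.columns.to_list())

    if return_model:
        return lg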
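
The helper `pc6.display_factor_loadings` comes from a local module that is not shown here. Judging only from the call above and the one-row output it prints, a stand-in might look roughly like the sketch below. The function name `display_factor_loadings_sketch` and its `row_label` parameter are hypothetical and are not the module's actual API.

```python
import numpy as np
import pandas as pd

def display_factor_loadings_sketch(intercept, coefs, factor_names,
                                   row_label="Regression Results"):
    """Hypothetical stand-in: print and return a one-row table of loadings."""
    # np.ravel flattens the (1,) intercept and (1, n_factors) coefficient
    # arrays that LinearRegression produces for a one-column DataFrame target.
    values = np.concatenate([np.ravel(intercept), np.ravel(coefs)])
    table = pd.DataFrame([values],
                         columns=["Intercept"] + list(factor_names),
                         index=[row_label])
    print(table)
    return table

# Usage with made-up numbers, shaped like a fit on a one-column target.
table = display_factor_loadings_sketch(
    np.array([0.001]), np.array([[0.5, -0.2]]), ["factor_a", "factor_b"]
)
```

Flattening the inputs lets the same sketch accept either a 1-D target (scalar intercept, 1-D coefficients) or a one-column DataFrame target.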

In [76]:
linear_regression(asset, factors)

                    Intercept  Vanguard High Yield Corporate Fund  \
Regression Results  -0.000335                           -0.080213   

                    Vanguard Total Intl Stock Idx Fund  \
Regression Results                            0.004322   

                    Vanguard Mid Cap Index Fund  Vanguard Value Index Fund  \
Regression Results                     0.449746                   0.925884   

                    Vanguard Small Cap Value Index Fund  \
Regression Results                             -0.60636   

                    Vanguard Small Cap Index Fund  \
Regression Results                       0.473601   

                    iShares J.P. Morgan USD Emerging Markets Bond ETF  \
Regression Results                                           0.022643   

                    Vanguard Emerging Markets Stock Index Fund  \
Regression Results                                    0.018404   

                    Vanguard Real Estate Index Fund  SPDR Gold Shares  
Regressio