### Project #1

This project is intended to demonstrate capability with basic Risk Factor analysis and Portfolio Optimization.  This is somewhat of a major project so it would be best to split up duties as much as possible to avoid being overwhelmed.

The short summary of the project is:

- Choose 20 stock tickers from at least three distinct industries
- Pull stock data for these industries as well as the Fama-French monthly data
- Do a Fama-MacBeth two-stage regression framework to estimate factor loadings, factor prices, and expected returns
- Create a fully invested market neutral portfolio
- Check to see how your return forecasts performed
- Check to see how your portfolio performed

Your submission will be a Jupyter Notebook with problem setup and answers.  You will be graded on:

- 20% Completing all aspects of the project
- 50% Correctness of the methods
- 15% Clarity of the coding
- 15% Quality of the Commentary

Clarity of coding:

- No additional files should be needed for me to run the code.
- Variable names should help clarify what your code is doing
- Additional comments should be made if you think I will need it to understand your code (but still be concise)
- The code should be linear and not convoluted

Clarity of comments on the results:

- You should have opinions on your work and I need to see that.
- You should be able to explain whether the results make sense or not and talk about regularities that you are seeing.
- You should have at least one comment talking about ways to improve the process.
- Comments are especially important in those sections where you are being asked to judge the results.

In [1]:
import yfinance as yf
import pandas as pd
import datetime as dt
import statsmodels.api as sm
from scipy import stats
from scipy.optimize import minimize
import sympy as sp
import numpy as np


### Step 1:  

Choose 20 stocks.  We are going to use January 1st, 2010 through December 31st, 2019, to avoid Covid.  Choose only stocks that have full data for this time period to make things simple.  If you want to choose companies with less available data, make sure that you cover at least the last four years.

#### 20 Stocks we chose from 5 different sectors
Technology:   
AAPL (Apple Inc.), MSFT (Microsoft Corp.), GOOGL (Alphabet Inc.), INTC (Intel Corporation), ORCL (Oracle)

Healthcare:  
JNJ (Johnson & Johnson), PFE (Pfizer Inc.), MRK (Merck & Co.), LLY (Eli Lilly)

Financials:  
JPM (JPMorgan Chase), BAC (Bank of America), WFC (Wells Fargo), C (Citigroup)

Consumer Discretionary:  
AMZN (Amazon), HD (Home Depot), NKE (Nike)

Energy:  
XOM (Exxon Mobil), CVX (Chevron), BP (BP p.l.c.), COP (ConocoPhillips)

Comments:

We obtained adjusted closing prices for 20 tickers across five industries, covering January 1st, 2010, to December 31st, 2019. The download starts in December 2009 so that a January 2010 return can be computed. The historical data was retrieved with the yfinance library and stored in a DataFrame.

In [2]:
tickers = [
    # Technology
    'AAPL', 'MSFT', 'GOOGL', 'INTC', 'ORCL',
    # Healthcare
    'JNJ', 'PFE', 'MRK', 'LLY',
    # Financials
    'JPM', 'BAC', 'WFC', 'C',
    # Consumer Discretionary
    'AMZN', 'HD', 'NKE',
    # Energy
    'XOM', 'CVX', 'BP', 'COP'
]

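start_date = '2009-12-01'  # one extra month so that January 2010 has a prior price
end_date = '2019-12-31'    # `end` is exclusive, so the last row returned is 2019-12-30

# auto_adjust=False keeps the 'Adj Close' column in recent yfinance versions
stock_data = yf.download(tickers, start=start_date, end=end_date, auto_adjust=False)['Adj Close']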
stock_data.tail(5)


[*********************100%***********************]  20 of 20 completed


Date,AAPL,AMZN,BAC,BP,C,COP,CVX,GOOGL,HD,INTC,JNJ,JPM,LLY,MRK,MSFT,NKE,ORCL,PFE,WFC,XOM
2019-12-23 00:00:00+00:00,68.908989,89.650002,31.230625,29.596033,65.659027,54.042694,97.327332,67.3647,195.350479,52.246304,128.496155,118.73806,123.125572,75.713333,150.861755,94.930695,49.833099,30.583786,47.377075,55.60574
2019-12-24 00:00:00+00:00,68.974503,89.460503,31.275038,29.54167,65.517303,54.101265,97.335403,67.05545,196.65155,52.405083,128.048615,119.066917,123.08799,75.465752,150.833008,95.025574,49.582588,30.560452,47.38588,55.392143
2019-12-26 00:00:00+00:00,70.342979,93.438499,31.541422,29.49507,66.551056,54.377338,97.5457,67.955231,196.785248,52.766739,127.960861,120.330452,123.116165,75.383224,152.069305,95.566467,49.703205,30.630453,47.676426,55.479156
2019-12-27 00:00:00+00:00,70.316292,93.489998,31.390463,29.401886,66.417656,54.3606,97.30307,67.564705,196.027756,52.996082,127.890678,120.417,123.472939,75.515266,152.347305,96.382553,49.601143,30.583786,47.47393,55.289314
2019-12-30 00:00:00+00:00,70.733612,92.344498,31.212875,29.19997,66.284271,54.051071,96.939087,66.820038,193.657318,52.590309,127.495834,119.975639,123.266403,75.127373,151.034302,95.651863,48.896011,30.264879,47.192177,54.964958


In [3]:
fama_french_url = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV.zip"
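# Skip the text header of the file; the first column becomes the index
ff_data = pd.read_csv(fama_french_url, skiprows=3, index_col=0)
ff_data = ff_data[~pd.isnull(ff_data.SMB)]  # drop rows without factor values

# Keep the monthly rows from January 2010 through December 2019
ff_data = ff_data[(ff_data.index >= '2010-01') & (ff_data.index <= '2020-01')]
ff_data.index = pd.to_datetime(ff_data.index, format='%Y%m').strftime('%Y-%m')  # datetime format 'YYYY-MM'

# Move the date from the index into a regular column for merging later
ff_data = ff_data.reset_index()
ff_data.rename(columns={'index': 'date'}, inplace=True)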

In [4]:
ff_data.tail(13)

,date,Mkt-RF,SMB,HML,RF
107,2018-12,-9.57,-2.37,-1.88,0.2
108,2019-01,8.4,2.88,-0.45,0.21
109,2019-02,3.4,2.06,-2.71,0.18
110,2019-03,1.1,-3.05,-4.12,0.19
111,2019-04,3.97,-1.72,2.16,0.21
112,2019-05,-6.94,-1.31,-2.37,0.21
113,2019-06,6.93,0.28,-0.7,0.18
114,2019-07,1.19,-1.93,0.47,0.19
115,2019-08,-2.58,-2.39,-4.79,0.16
116,2019-09,1.43,-0.97,6.77,0.18


In [5]:
# The Fama-French columns are read in as strings; convert them to numeric
# types so that the division below does not fail.
ff_data[['Mkt-RF', 'SMB', 'HML', 'RF']] = ff_data[['Mkt-RF', 'SMB', 'HML', 'RF']].apply(pd.to_numeric, errors='coerce')

# Convert the three factor returns from percent to decimal (RF stays in percent)
ff_data['Mkt-RF'] = ff_data['Mkt-RF'] / 100
ff_data['SMB'] = ff_data['SMB'] / 100
ff_data['HML'] = ff_data['HML'] / 100

a, b, c = ff_data[['Mkt-RF', 'SMB', 'HML']].mean()

print(f'The prices of risk are:  beta {10_000 * a:.2f}, SMB {10_000 * b:.2f}, HML {10_000 * c:.2f}')

The prices of risk are:  beta 109.13, SMB -1.13, HML -19.74
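
The Fama-French file reports all four columns in percent per month. As a minimal standalone sketch, the snippet below rescales every column, RF included, so that later comparisons with decimal stock returns use the same units. The three rows are copied from the table above; `ff_percent` and `ff_decimal` are hypothetical names that do not exist in the notebook.

```python
import pandas as pd

# Three rows copied from the Fama-French table above (percent per month)
ff_percent = pd.DataFrame({
    'date': ['2018-12', '2019-01', '2019-02'],
    'Mkt-RF': [-9.57, 8.40, 3.40],
    'SMB': [-2.37, 2.88, 2.06],
    'HML': [-1.88, -0.45, -2.71],
    'RF': [0.20, 0.21, 0.18],
})

factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RF']

# Hypothetical copy with every factor column, RF included, in decimals
ff_decimal = ff_percent.copy()
ff_decimal[factor_cols] = ff_decimal[factor_cols] / 100

print(ff_decimal)
```

With RF in decimals, the December 2018 rate is 0.002 per month, which is the value that `rf_18/100` produces later in the notebook.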


### Step 2:

Pull monthly data for all of your stocks and Fama & French’s 3-factor model. Use the Adj Close and calculate monthly returns using Pandas function pct_change().
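
To illustrate the month-end return calculation, here is a small sketch on made-up prices. The ticker `XYZ` and its prices are hypothetical, and grouping by calendar month is used here in place of `resample` only to keep the example independent of the pandas version.

```python
import pandas as pd

# Hypothetical daily adjusted closes for one ticker spanning three months
prices = pd.Series(
    [100.0, 102.0, 101.0, 110.0, 108.0, 99.0],
    index=pd.to_datetime(['2010-01-04', '2010-01-29', '2010-02-01',
                          '2010-02-26', '2010-03-01', '2010-03-31']),
    name='XYZ',
)

# Keep the last available close in each calendar month
month_end = prices.groupby(prices.index.to_period('M')).last()

# Simple return P_t / P_{t-1} - 1; the first month has no prior price and is dropped
monthly_returns = month_end.pct_change().dropna()
print(monthly_returns)
```

The first month only serves as the base price, which is why the notebook downloads December 2009 in order to obtain a January 2010 return.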

Procedure:
- Check for missing data (NaN values) for each stock.
- Print the number of missing data points for each stock

If there are no missing data points, we can proceed with calculating the monthly returns.<br>
If missing data points are found, we can fill them using forward fill. Alternatively, we can apply backward fill or linear interpolation.<br>
In our case, there's no need to address missing data, as all 20 stocks have 0 missing data points.
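
The three fill options mentioned above behave differently, as this sketch on a hypothetical price series with two gaps shows.

```python
import numpy as np
import pandas as pd

# Hypothetical price series with two missing observations
prices = pd.Series([10.0, np.nan, 12.0, np.nan, 16.0])

missing_count = int(prices.isnull().sum())  # number of NaN entries
forward_filled = prices.ffill()             # carry the last known price forward
backward_filled = prices.bfill()            # pull the next known price backward
interpolated = prices.interpolate()         # straight line between neighbours

print(missing_count)
print(pd.DataFrame({'ffill': forward_filled,
                    'bfill': backward_filled,
                    'linear': interpolated}))
```

Backward fill and interpolation both use prices that were not yet known on the missing date, so forward fill is usually the safer default for return calculations.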

In [6]:
# Count missing values per ticker; if any appear, forward fill them with:
# stock_data.ffill(inplace=True)
missing_data_count = stock_data.isnull().sum()
missing_data_count

Ticker
AAPL     0
AMZN     0
BAC      0
BP       0
C        0
COP      0
CVX      0
GOOGL    0
HD       0
INTC     0
JNJ      0
JPM      0
LLY      0
MRK      0
MSFT     0
NKE      0
ORCL     0
PFE      0
WFC      0
XOM      0
dtype: int64

In [7]:
monthly_data = stock_data.resample('ME').last()  # 'ME' (month end) replaces the deprecated 'M' alias
stock_returns = monthly_data.pct_change().dropna()
stock_returns.index = stock_returns.index.strftime('%Y-%m')
stock_returns = stock_returns.reset_index()
stock_returns.rename(columns={'Date': 'date'}, inplace=True)

stock_returns.head()



Ticker,date,AAPL,AMZN,BAC,BP,C,COP,CVX,GOOGL,HD,...,JNJ,JPM,LLY,MRK,MSFT,NKE,ORCL,PFE,WFC,XOM
0,2010-01,-0.088597,-0.067722,0.007968,-0.031913,0.003021,-0.060114,-0.063255,-0.14523,-0.031801,...,-0.024065,-0.064392,-0.014281,0.044882,-0.075459,-0.035114,-0.058028,0.025838,0.053354,-0.05514
1,2010-02,0.065397,-0.055897,0.097497,-0.037401,0.024097,0.010246,0.012091,-0.005925,0.113888,...,0.009896,0.077812,-0.01066,-0.034049,0.022145,0.060392,0.068951,-0.050603,-0.036667,0.015428
2,2010-03,0.14847,0.146706,0.07208,0.072543,0.191176,0.066042,0.048824,0.076538,0.044572,...,0.034921,0.066237,0.054746,0.023349,0.021626,0.091636,0.043002,-0.022792,0.138259,0.030462
3,2010-04,0.111021,0.009796,-0.001121,-0.08621,0.079012,0.156732,0.073981,-0.073036,0.089026,...,-0.013804,-0.047427,-0.034511,-0.061848,0.042677,0.032789,0.008152,-0.025073,0.063945,0.011794
4,2010-05,-0.016125,-0.084902,-0.117218,-0.176414,-0.093821,-0.114808,-0.084482,-0.076222,-0.038887,...,-0.085031,-0.070455,-0.049132,-0.038527,-0.151394,-0.046503,-0.127561,-0.079515,-0.132177,-0.101806


In [8]:
stock_returns

Ticker,date,AAPL,AMZN,BAC,BP,C,COP,CVX,GOOGL,HD,...,JNJ,JPM,LLY,MRK,MSFT,NKE,ORCL,PFE,WFC,XOM
0,2010-01,-0.088597,-0.067722,0.007968,-0.031913,0.003021,-0.060114,-0.063255,-0.145230,-0.031801,...,-0.024065,-0.064392,-0.014281,0.044882,-0.075459,-0.035114,-0.058028,0.025838,0.053354,-0.055140
1,2010-02,0.065397,-0.055897,0.097497,-0.037401,0.024097,0.010246,0.012091,-0.005925,0.113888,...,0.009896,0.077812,-0.010660,-0.034049,0.022145,0.060392,0.068951,-0.050603,-0.036667,0.015428
2,2010-03,0.148470,0.146706,0.072080,0.072543,0.191176,0.066042,0.048824,0.076538,0.044572,...,0.034921,0.066237,0.054746,0.023349,0.021626,0.091636,0.043002,-0.022792,0.138259,0.030462
3,2010-04,0.111021,0.009796,-0.001121,-0.086210,0.079012,0.156732,0.073981,-0.073036,0.089026,...,-0.013804,-0.047427,-0.034511,-0.061848,0.042677,0.032789,0.008152,-0.025073,0.063945,0.011794
4,2010-05,-0.016125,-0.084902,-0.117218,-0.176414,-0.093821,-0.114808,-0.084482,-0.076222,-0.038887,...,-0.085031,-0.070455,-0.049132,-0.038527,-0.151394,-0.046503,-0.127561,-0.079515,-0.132177,-0.101806
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,2019-08,-0.016461,-0.048474,-0.103325,-0.054724,-0.088902,-0.116791,-0.033952,-0.022714,0.066545,...,-0.006897,-0.052931,0.042710,0.041933,0.015037,-0.015247,-0.075311,-0.076143,-0.027178,-0.067624
116,2019-09,0.072962,-0.022733,0.067330,0.028146,0.073504,0.091989,0.007475,0.025711,0.024250,...,0.007946,0.071272,-0.010091,-0.020013,0.008487,0.111479,0.057050,0.010689,0.083100,0.031104
117,2019-10,0.110684,0.023475,0.071992,-0.002106,0.040243,-0.023912,-0.020742,0.030840,0.011033,...,0.020560,0.069935,0.018957,0.029461,0.031216,-0.046529,-0.005397,0.067910,0.023593,-0.043053
118,2019-11,0.077554,0.013587,0.065558,0.002727,0.052838,0.085869,0.018462,0.035979,-0.059980,...,0.048489,0.054755,0.035761,0.006001,0.059463,0.046722,0.030281,0.013612,0.064908,0.020448


### Step 3:  

- Calculate factor loadings for all of your stocks using all the data from 2010 through 2018 except the final December.  
- Use that final December to estimate factor prices.  Note that your results are likely to be a bit noisy due to only using 20 stocks.  
- Calculate the expected monthly returns for the stocks using the 1 Mo risk free rate at the end of 2018 and your factor calculations.




#### 1st Pass: Time-Series Regressions

In the first pass, run **time-series regressions** for each stock in the portfolio to estimate its **factor loadings** (also called **betas**).

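For each stock, regress its **excess returns** on the factors:

$R_{i,t} - R_f = \alpha_i + \beta_{i1} F_{1,t} + \beta_{i2} F_{2,t} + \dots + \beta_{ij} F_{j,t} + \varepsilon_{i,t}$

Where:
- $R_{i,t} - R_f$: Excess return of stock $i$ over the risk-free rate at time $t$.
- $\alpha_i$: Regression constant. The code below uses raw returns as the dependent variable, so its constant also absorbs the average risk-free rate, which is why Step 5 compares the constant with that rate.
- $\beta_{i1}, \beta_{i2}, \dots, \beta_{ij}$: **Factor loadings** or **betas** for stock $i$, representing how sensitive stock $i$'s returns are to each factor.
- $F_{1,t}, F_{2,t}, \dots$: Factor realizations at time $t$ (e.g., market risk premium, SMB, HML).
- $\varepsilon_{i,t}$: Residual or unexplained return.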

You do this for **every stock** in your portfolio to get its factor loadings for each factor.

Process:

1. Loop through each stock: For each stock in the dataset, run a regression using its returns data and the Fama-French factors.
2. Define the regression model: Set up the independent variables (Mkt-RF, SMB, HML) as the Fama-French factors, and use the stock's return as the dependent variable.
3. Run the regression: Perform the Ordinary Least Squares (OLS) regression to estimate the relationship between the stock's return and the Fama-French factors.
4. Extract and store the betas: Retrieve the regression coefficients (betas) for Mkt-RF, SMB, and HML and add them to a list for later storage in a DataFrame.
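
The steps above can be sketched on simulated data, where the true loadings are known and the regression should recover them. Everything here is hypothetical: the factor draws, the "true" coefficients, and the use of `numpy.linalg.lstsq` in place of statsmodels, which yields the same OLS point estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 108

# Hypothetical factor realizations (decimal monthly returns): Mkt-RF, SMB, HML
factors = rng.normal(0.0, 0.03, size=(n_months, 3))

# Simulated stock return with known intercept and loadings, plus small noise
true_alpha = 0.002
true_betas = np.array([1.1, -0.3, 0.5])
stock_return = true_alpha + factors @ true_betas + rng.normal(0.0, 0.001, n_months)

# Design matrix with a leading column of ones for the constant
X = np.column_stack([np.ones(n_months), factors])
coef, *_ = np.linalg.lstsq(X, stock_return, rcond=None)

alpha_hat, betas_hat = coef[0], coef[1:]
print("constant:", alpha_hat)
print("loadings:", betas_hat)
```

With real stock returns the residual noise is far larger than in this simulation, so estimated loadings carry much wider standard errors.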

In [9]:
ff_data_estimation_period = ff_data[(ff_data['date'] >= '2010-01') & (ff_data['date'] < '2018-12')]

avg_rf = ff_data_estimation_period['RF'].mean()

print(avg_rf)

0.026261682242990653


In [10]:
betas_list = []

for stock in tickers:
    # Merge this stock's returns with the factors; using a separate name
    # keeps the price DataFrame `stock_data` from being overwritten
    merged = pd.merge(stock_returns[['date', stock]], ff_data, on='date')
    stock_data1 = merged[(merged['date'] >= '2010-01') & (merged['date'] <= '2018-12')]

    X = sm.add_constant(stock_data1[['Mkt-RF', 'SMB', 'HML']])
    y = stock_data1[stock]  # raw (not excess) return, so the constant absorbs the risk-free rate
    model = sm.OLS(y, X).fit()

    betas = model.params[['Mkt-RF', 'SMB', 'HML', 'const']]

    # Step 5: test whether the constant differs from the average risk-free rate.
    # Note: avg_rf is in percent (RF was not divided by 100); the constant is in decimals.
    stock_const = betas['const']
    stock_se = model.bse['const']  # standard error of the constant
    t_stat = (stock_const - avg_rf) / stock_se
    p_value = stats.t.sf(abs(t_stat), df=model.df_resid) * 2  # two-tailed

    betas_list.append({
        'Stock': stock,
        'Mkt-RF': betas['Mkt-RF'],
        'SMB': betas['SMB'],
        'HML': betas['HML'],
        'Constant': betas['const'],
        'p_value': p_value
    })

betas_df = pd.DataFrame(betas_list)
betas_df

,Stock,Mkt-RF,SMB,HML,Constant,p_value
0,AAPL,1.078061,-0.288579,-0.81254,0.007745,0.00485354
1,MSFT,1.204677,-0.859916,0.039241,0.00377,6.176004e-06
2,GOOGL,1.258891,-0.741881,-0.58167,0.000697,8.358984e-06
3,INTC,0.898071,-0.186208,0.042699,0.003619,4.977413e-05
4,ORCL,1.29172,-0.358768,-0.025873,-0.003884,9.542239e-10
5,JNJ,0.668211,-0.540978,-0.060663,0.003312,1.412446e-10
6,PFE,0.785066,-0.26124,-0.221524,0.004357,5.241765e-08
7,MRK,0.537695,-0.339541,-0.31609,0.005331,5.098201e-06
8,LLY,0.356759,0.050602,-0.36199,0.01096,0.0005398776
9,JPM,1.339023,-0.192314,1.162343,0.001047,1.253495e-08


#### 2nd Pass: Cross-Sectional Regressions

In the second pass, run **cross-sectional regressions** at each time period, using the **factor loadings** (`βij`) from the first pass to estimate the **factor risk premiums** (prices of risk, `λj`).

For each time period `t`, the regression is:

$R_{i,t} - R_f = \lambda_1 \beta_{i1} + \lambda_2 \beta_{i2} + \dots + \lambda_j \beta_{ij} + \varepsilon_{i,t}$

Where:
- $R_{i,t} - R_f$: Excess return of stock $i$ at time $t$.
- $\beta_{i1}, \beta_{i2}, \dots, \beta_{ij}$: **Factor loadings** (estimated from the first pass) for stock $i$.
- $\lambda_1, \lambda_2, \dots, \lambda_j$: The **factor risk premiums** (prices of risk) you are estimating, representing the compensation investors demand for bearing each factor risk.

By running this regression for each time period, you can estimate the **average factor risk premium** over time.
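
As an illustration of a single cross-sectional regression, the sketch below uses the nine stocks for which both the first-pass betas and the January 2010 return appear in the tables of this notebook. It is only a subset of the 20 stocks, so the resulting numbers will not match the full regression, and `numpy.linalg.lstsq` stands in for statsmodels.

```python
import numpy as np

tickers_subset = ['AAPL', 'MSFT', 'GOOGL', 'ORCL', 'JNJ', 'PFE', 'MRK', 'LLY', 'JPM']

# First-pass loadings (Mkt-RF, SMB, HML) copied from the betas table
betas = np.array([
    [1.078061, -0.288579, -0.812540],
    [1.204677, -0.859916,  0.039241],
    [1.258891, -0.741881, -0.581670],
    [1.291720, -0.358768, -0.025873],
    [0.668211, -0.540978, -0.060663],
    [0.785066, -0.261240, -0.221524],
    [0.537695, -0.339541, -0.316090],
    [0.356759,  0.050602, -0.361990],
    [1.339023, -0.192314,  1.162343],
])

# January 2010 returns for the same stocks, copied from the returns table
returns_2010_01 = np.array([-0.088597, -0.075459, -0.145230, -0.058028,
                            -0.024065, 0.025838, 0.044882, -0.014281, -0.064392])

# Regress the cross-section of returns on a constant and the loadings
X = np.column_stack([np.ones(len(betas)), betas])
lambdas, *_ = np.linalg.lstsq(X, returns_2010_01, rcond=None)
residuals = returns_2010_01 - X @ lambdas

const, lam_mkt, lam_smb, lam_hml = lambdas
print(f"const {const:.4f}, Mkt-RF {lam_mkt:.4f}, SMB {lam_smb:.4f}, HML {lam_hml:.4f}")
```

Repeating this for every month gives one estimate of each factor price per month; a single month, especially with so few stocks, is very noisy.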

$E(r_i) = r_f + \sum_j \beta_{ij} \lambda_j$<br>

- The assignment suggests using the final December to estimate factor prices. With only 20 stocks, a single month is likely to be noisy.
- We instead estimate factor prices for every month and then take their average.
- Calculate the expected monthly returns for the stocks using the 1 Mo risk free rate at the end of 2018 and your factor calculations.

$r_i = \alpha_i + \beta_1 r_m + \beta_2 \, SMB + \beta_3 \, HML$

- We assume the risk-free rate is 0.
- $r_i$ = expected return of stock $i$
- $r_m$ = expected market return (market risk premium)
- $SMB$ = Small Minus Big: size premium
- $HML$ = High book-to-price Minus Low: value premium
- $\beta_1, \beta_2, \beta_3$ = factor exposure coefficients
- $\alpha_i$ = excess return that cannot be explained by market, size, and value

Run the cross-sectional regression: $R_{i,t} - R_f = \lambda_1 \beta_1 + \lambda_2 \beta_2 + \lambda_3 \beta_3 + \varepsilon$

In [11]:
lambda_mkt_rf = []
lambda_smb = []
lambda_hml = []

# First-pass betas, indexed and sorted by ticker; they are the same for every month
betas_for_stocks = betas_df[['Stock', 'Mkt-RF', 'SMB', 'HML']].set_index('Stock')
betas_for_stocks = betas_for_stocks.sort_index(ascending=True)
X = sm.add_constant(betas_for_stocks)

# Month labels from 2010-01 through 2018-12
date_range = pd.date_range('2010-01', '2019-01', freq='ME').strftime('%Y-%m')

for date in date_range:
    # Cross-section of returns for this month, aligned explicitly to the beta index
    stock_returns_month = stock_returns[stock_returns['date'] == date]
    y = stock_returns_month.drop(columns=['date']).iloc[0].reindex(X.index)

    model = sm.OLS(y, X).fit()

    # Store the estimated factor prices (lambda values) for this month
    lambda_mkt_rf.append(model.params['Mkt-RF'])
    lambda_smb.append(model.params['SMB'])
    lambda_hml.append(model.params['HML'])



Average the estimated factor risk premiums across all months-- from 2010-01 to 2018-12.
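
Besides the average, the month-by-month estimates also give a standard error, which is the usual Fama-MacBeth way of judging whether a factor price differs from zero. The sketch below uses hypothetical monthly values for one factor, not the notebook's estimates.

```python
import numpy as np

# Hypothetical monthly estimates of one factor price (decimal per month)
monthly_lambdas = np.array([0.012, -0.008, 0.021, 0.003, -0.015, 0.009, 0.017, -0.002])

n_periods = len(monthly_lambdas)
mean_lambda = monthly_lambdas.mean()

# Standard error of the mean: time-series standard deviation divided by sqrt(T)
se_lambda = monthly_lambdas.std(ddof=1) / np.sqrt(n_periods)
t_stat = mean_lambda / se_lambda

print(f"mean {mean_lambda:.6f}, se {se_lambda:.6f}, t {t_stat:.2f}")
```

Applied to the lists `lambda_mkt_rf`, `lambda_smb`, and `lambda_hml`, the same calculation would show how precisely each average factor price is estimated.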


In [12]:
average_lambda_mkt_rf = sum(lambda_mkt_rf) / len(lambda_mkt_rf)
average_lambda_smb = sum(lambda_smb) / len(lambda_smb)
average_lambda_hml = sum(lambda_hml) / len(lambda_hml)

print("Estimated Average Factor Prices (Risk Premiums):")
print(f"Average Lambda Mkt-RF: {average_lambda_mkt_rf}")
print(f"Average Lambda SMB: {average_lambda_smb}")
print(f"Average Lambda HML: {average_lambda_hml}")

Estimated Average Factor Prices (Risk Premiums):
Average Lambda Mkt-RF: 0.004921551675032602
Average Lambda SMB: 0.006755679435357526
Average Lambda HML: -0.00686463273864351


Calculate the expected monthly returns for the stocks using the 1 Mo risk free rate at the end of 2018 and your factor calculations.

For each stock:

Expected_Return = rf_2018 + betas_for_stocks['Mkt-RF'] * average_lambda_mkt_rf + betas_for_stocks['SMB'] * average_lambda_smb + betas_for_stocks['HML'] * average_lambda_hml


The `Expected_Return` column is the result for Step 3. We will use it again in Step 5.
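
As a worked example of the formula, the snippet below plugs in AAPL's loadings and the average factor prices printed above. The only assumption is the unit conversion: the December 2018 risk-free rate of 0.20 is in percent, so it enters as 0.20 / 100.

```python
# December 2018 one-month risk-free rate, converted from percent to decimal
rf_dec_2018 = 0.20 / 100

# Average factor prices printed by the notebook
lambdas = {
    'Mkt-RF': 0.004921551675032602,
    'SMB': 0.006755679435357526,
    'HML': -0.00686463273864351,
}

# AAPL loadings from the first-pass regression table
aapl_betas = {'Mkt-RF': 1.078061, 'SMB': -0.288579, 'HML': -0.81254}

# Expected return = risk-free rate + sum of loading times factor price
expected_return = rf_dec_2018 + sum(aapl_betas[f] * lambdas[f] for f in lambdas)
print(round(expected_return, 6))
```

The result is about 1.09% per month, in line with the `Expected_Return` value reported for AAPL once the risk-free rate is expressed in decimals.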

In [13]:
rf_18 = ff_data.loc[ff_data['date'] == '2018-12', 'RF'].values[0]

# RF is in percent, so divide by 100 to match the decimal beta-times-lambda terms.
# The assignment is vectorized over all stocks, so no loop is needed.
betas_for_stocks['Expected_Return'] = rf_18 / 100 + \
            betas_for_stocks['Mkt-RF'] * average_lambda_mkt_rf + \
            betas_for_stocks['SMB'] * average_lambda_smb + \
            betas_for_stocks['HML'] * average_lambda_hml

betas_for_stocks

Stock,Mkt-RF,SMB,HML,Expected_Return
AAPL,1.078061,-0.288579,-0.81254,0.010934
AMZN,1.392425,-0.307618,-1.32968,0.015902
BAC,1.36793,0.20392,1.568716,-0.000659
BP,1.405606,-0.215354,0.466026,0.004264
C,1.6161,-0.043738,1.080023,0.002244
COP,0.920868,0.434121,1.110531,0.001841
CVX,0.995014,0.016683,0.803751,0.001492
GOOGL,1.258891,-0.741881,-0.58167,0.007177
HD,0.976939,0.218111,0.16629,0.00714
INTC,0.898071,-0.186208,0.042699,0.004869


### Step 4:  

Figure out the portfolio weights that create a portfolio where the sum of weights is equal to **one** and the weighted average of the CAPM betas is **zero**.

We have the **CAPM betas** (the `Mkt-RF` loadings) for each stock from Step 3. Now we need to determine the portfolio weights (denoted `w_i` for each stock) that satisfy these two conditions:
1. ∑ w_i = 1 (weights sum to 1)
2. ∑ w_i * β_i = 0 (weighted average of betas is 0)

Two constraints alone do not pin down 20 weights, so among all portfolios that satisfy them we choose the one with minimum variance. Because the objective is quadratic and the constraints are linear, the first-order conditions form a **linear system of equations** in the weights and two Lagrange multipliers.

Procedure: extract the 20 CAPM betas from the DataFrame; set up the two constraints (sum of weights = 1, weighted average of betas = 0); define portfolio variance as the objective function; and solve the Lagrangian first-order conditions for the optimal weights.
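
The same first-order conditions can also be solved numerically as one linear system, which tends to be faster than a symbolic solve. The sketch below does this for a hypothetical four-asset example; the covariance matrix and betas are made up, and this is an alternative illustration, not the notebook's own method.

```python
import numpy as np

# Hypothetical covariance matrix and market betas, listed in the same asset order
cov = np.array([
    [0.0040, 0.0010, 0.0008, 0.0005],
    [0.0010, 0.0030, 0.0006, 0.0004],
    [0.0008, 0.0006, 0.0050, 0.0012],
    [0.0005, 0.0004, 0.0012, 0.0025],
])
betas = np.array([1.2, 0.8, 1.5, 0.4])
n = len(betas)

# Constraints A w = b: weights sum to one, portfolio beta is zero
A = np.vstack([np.ones(n), betas])
b = np.array([1.0, 0.0])

# Setting the gradient of w' cov w + mu' (A w - b) to zero gives
# [2*cov  A.T] [w ]   [0]
# [A      0  ] [mu] = [b]
kkt = np.block([[2 * cov, A.T], [A, np.zeros((2, 2))]])
rhs = np.concatenate([np.zeros(n), b])
solution = np.linalg.solve(kkt, rhs)
weights = solution[:n]

print("weights:", weights)
print("sum:", weights.sum(), "portfolio beta:", weights @ betas)
```

The covariance matrix and the beta vector must list the assets in the same order. In this notebook, `stock_returns[tickers].cov()` would put the covariance matrix in the order of `tickers`, matching `betas_df`.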

In [14]:
# Lagrange Multiplier Method
stock_returns_data = stock_returns.drop(columns=['date'])
covariance_matrix = stock_returns_data.cov()
covariance_matrix = covariance_matrix.values
n = covariance_matrix.shape[0]

w = sp.symbols(f'w:{n}')
lambda1, lambda2 = sp.symbols('lambda1 lambda2')

cov_matrix = sp.Matrix(covariance_matrix)
betas = sp.Matrix(np.array(betas_df['Mkt-RF']))
objective = sum([w[i] * cov_matrix[i,j] * w[j] for i in range(n) for j in range(n)])

constraint1 = sum(w) - 1  # Sum of weights = 1
constraint2 = sum(w[i] * betas[i] for i in range(n))  # Weighted average of betas = 0
L = objective + lambda1 * constraint1 + lambda2 * constraint2

derivatives = [sp.diff(L, w_i) for w_i in w]
derivatives += [sp.diff(L, lambda1), sp.diff(L, lambda2)]

solution = sp.solve(derivatives, *w, lambda1, lambda2)

# Extract the optimized weights
optimized_weights = [solution[w_i] for w_i in w]
portfolio_weights_df = pd.DataFrame({'Ticker': tickers, 'Weight': optimized_weights})

print(portfolio_weights_df) # required format

# Check the sum of weight == 1 & weighted average beta == 0.
sum_weights = sum(optimized_weights)
print(f"\nSum of weights: {sum_weights}")
weighted_average_beta = sum([optimized_weights[i] * betas[i] for i in range(n)])
print(f"Weighted Average Beta: {weighted_average_beta}")



   Ticker                Weight
0    AAPL    0.0363543650502289
1    MSFT    -0.198449281602345
2   GOOGL   -0.0257312457696319
3    INTC    -0.165463123286162
4    ORCL    -0.348392469920482
5     JNJ     0.342765281096500
6     PFE    -0.146412170483616
7     MRK     0.253332722592340
8     LLY     0.474562412073941
9     JPM   -0.0892966957017245
10    BAC  -0.00175973692075305
11    WFC     0.281477533593666
12      C    -0.232727712627465
13   AMZN     0.204920626966557
14     HD     0.102524216388807
15    NKE     0.315899766784070
16    XOM     0.137149519638123
17    CVX     0.303239028052382
18     BP    -0.388747824623800
19    COP     0.144754788699366

Sum of weights: 1.00000000000000
Weighted Average Beta: 1.24900090270330E-15


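The printed results show the optimal weight for each stock. Negative weights indicate that we are shorting those stocks, while positive weights mean we are holding long positions.

Two caveats apply. First, the covariance matrix follows the alphabetical column order of `stock_returns`, whereas the betas and the ticker labels follow the order of `tickers`. Both constraints still hold for the labeled weights, but the variance being minimized pairs some stocks with the wrong covariances; using `stock_returns[tickers].cov()` would align them. Second, the covariance matrix is estimated on all months, including 2019, so the weights use information from the evaluation year.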
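
A quick way to see how much shorting the solution involves is to split the printed weights into long and short exposure. The weights below are copied from the output above.

```python
# Weights copied from the printed optimizer output
weights = {
    'AAPL': 0.0363543650502289, 'MSFT': -0.198449281602345,
    'GOOGL': -0.0257312457696319, 'INTC': -0.165463123286162,
    'ORCL': -0.348392469920482, 'JNJ': 0.342765281096500,
    'PFE': -0.146412170483616, 'MRK': 0.253332722592340,
    'LLY': 0.474562412073941, 'JPM': -0.0892966957017245,
    'BAC': -0.00175973692075305, 'WFC': 0.281477533593666,
    'C': -0.232727712627465, 'AMZN': 0.204920626966557,
    'HD': 0.102524216388807, 'NKE': 0.315899766784070,
    'XOM': 0.137149519638123, 'CVX': 0.303239028052382,
    'BP': -0.388747824623800, 'COP': 0.144754788699366,
}

long_exposure = sum(w for w in weights.values() if w > 0)
short_exposure = sum(w for w in weights.values() if w < 0)
net_exposure = long_exposure + short_exposure    # fully invested: should equal 1
gross_exposure = long_exposure - short_exposure  # total long plus total short

print(f"long {long_exposure:.3f}, short {short_exposure:.3f}, "
      f"net {net_exposure:.3f}, gross {gross_exposure:.3f}")
```

For these weights the long side is roughly 2.6 times capital and the short side roughly 1.6 times, so the portfolio is leveraged even though the net weight is one.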

### Step 5:  

- For each of your time series regressions in step 3, check to see if your regression constant differs significantly from the average 1 Mo risk free rate for the estimation period, indicating the presence of an “in sample” alpha.  
- Check the average monthly return for each stock in 2019 to see if they correspond somewhat to the values you calculated in step 3.


Filter Fama-French data from 2010-01 to 2018-11, and calculate the average 1-month risk-free rate (RF)

In [15]:
ff_data_estimation_period = ff_data[(ff_data['date'] >= '2010-01') & (ff_data['date'] < '2018-12')]

avg_rf = ff_data_estimation_period['RF'].mean()

print(f"Ave 1-Month Rf (Estimation Period): {avg_rf}")


Ave 1-Month Rf (Estimation Period): 0.026261682242990653


For each of your time series regressions in step 3, check to see if your regression constant differs significantly from the average 1 Mo risk free rate for the estimation period, indicating the presence of an “in sample” alpha.

- Assume a significance level of 5%.
- The standard error of the constant is taken from the regression output (`model.bse['const']`).
- Each regression uses n = 108 monthly observations (2010-01 through 2018-12) and k = 3 factors, so the residual degrees of freedom are 108 - 3 - 1 = 104 (`model.df_resid`). The average risk-free rate is computed over the 107 months through 2018-11.
- t-statistic = (betas_df['Constant'] - avg_rf) / standard error of the constant.

The t-statistic and p-value reported in the regression summary test whether the constant (alpha) differs from zero, not from the average 1-month risk-free rate (avg_rf). The code therefore recomputes the t-statistic against avg_rf, reusing the standard error from the regression, and obtains a two-tailed p-value from the t-distribution with n - k - 1 degrees of freedom, where n is the number of observations and k is the number of independent variables.

We use this p-value to check whether the constant is significantly different from the average 1-month risk-free rate. If the p-value is below 0.05, the constant is significantly different from the average risk-free rate; otherwise there is no significant difference.
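
The test described above can be sketched for a single stock as follows. The constant, its standard error, and the degrees of freedom are hypothetical values chosen for illustration, and the risk-free rate is converted from percent to decimal so that both sides of the comparison share the same units.

```python
from scipy import stats

# Hypothetical regression output for one stock (decimal monthly units)
const = 0.0050      # estimated constant
se_const = 0.0030   # standard error of the constant
df_resid = 104      # e.g. 108 observations minus 4 estimated parameters

# Null value: average risk-free rate, converted from percent to decimal
avg_rf_percent = 0.02626
avg_rf_decimal = avg_rf_percent / 100

# t-statistic against the non-zero null, then a two-tailed p-value
t_stat = (const - avg_rf_decimal) / se_const
p_value = 2 * stats.t.sf(abs(t_stat), df=df_resid)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

In this made-up case the constant is not significantly different from the risk-free rate at the 5% level; the outcome depends heavily on expressing both quantities in the same units.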

In [16]:
significant_alpha_stocks = betas_df[betas_df['p_value'] < 0.05]['Stock'].tolist()
non_significant_alpha_stocks = betas_df[betas_df['p_value'] >= 0.05]['Stock'].tolist()

# Print the results
print(f"Stocks with significant alpha (p-value < 0.05): {', '.join(significant_alpha_stocks)}")
print(f"Stocks with non-significant alpha (p-value >= 0.05): {', '.join(non_significant_alpha_stocks)}")


Stocks with significant alpha (p-value < 0.05): AAPL, MSFT, GOOGL, INTC, ORCL, JNJ, PFE, MRK, LLY, JPM, BAC, WFC, C, AMZN, HD, NKE, XOM, CVX, BP, COP
Stocks with non-significant alpha (p-value >= 0.05): 


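Results: **All p-values are less than 0.05**, which suggests that every constant differs significantly from the average risk-free rate and would indicate the presence of **"in-sample" alpha**; we SHOULD include the intercept in the regressions. One caveat: `avg_rf` (0.0263) is still in percent because RF was never divided by 100, whereas the constants are in decimals. The test therefore compares constants of roughly 0.001 to 0.011 against 2.63% per month instead of 0.0263% per month, so the uniform rejection may be an artifact of this unit mismatch and should be rechecked with `avg_rf / 100`.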

- We calculated the actual returns from 2010 to 2018 in order to compare them with the model's results (expected result) later. Since these are in-sample data, the model's predictions should closely match the actual returns. After confirming this, we can proceed to compare the expected returns with the 2019 data.

In [17]:
##2010-2018 monthly return
# for loop to calculate actual return from 2010 to 2018 by using actual data
actual_returns = []
stock_returns_2018 = stock_returns[(stock_returns['date'] >= '2010-01') & (stock_returns['date'] <= '2018-12')]

for stock in tickers:
  stock_actual_returns = stock_returns_2018[stock].values
  average_actual_return = np.mean(stock_actual_returns)
  actual_returns.append({
        'Stock': stock,
        'Average_Actual_Return18': average_actual_return
  })

actual_returns_18 = pd.DataFrame(actual_returns)
actual_returns_18.set_index('Stock', inplace=True)
actual_returns_18

Stock,Average_Actual_Return18
AAPL,0.019373
MSFT,0.015225
GOOGL,0.013624
INTC,0.012265
ORCL,0.008655
JNJ,0.009769
PFE,0.012268
MRK,0.010946
LLY,0.014985
JPM,0.012351


Now we check the average monthly return for each stock in 2019 to see whether it corresponds somewhat to the values calculated in Step 3.

- The average monthly return for each stock in 2019 is shown in the Average_Actual_Return column below.

In [18]:
##2019 monthly return
# for loop to calculate actual return in 2019 by using actual data
actual_returns = []
stock_returns_2019 = stock_returns[(stock_returns['date'] >= '2019-01') & (stock_returns['date'] <= '2019-12')]

for stock in tickers:
  stock_actual_returns = stock_returns_2019[stock].values
  average_actual_return = np.mean(stock_actual_returns)
  actual_returns.append({
        'Stock': stock,
        'Average_Actual_Return19': average_actual_return
  })

actual_returns_df = pd.DataFrame(actual_returns)
actual_returns_df.set_index('Stock', inplace=True)
actual_returns_df


Stock,Average_Actual_Return19
AAPL,0.05597
MSFT,0.03931
GOOGL,0.022093
INTC,0.024864
ORCL,0.016485
JNJ,0.013079
PFE,-0.005357
MRK,0.017762
LLY,0.013827
JPM,0.034216


- The expected monthly returns for the stocks using the 1 Mo risk free rate at the end of 2018 and your factor calculations.

In [19]:
rf_18 = ff_data.loc[ff_data['date'] == '2018-12', 'RF'].values[0]

# RF is in percent, so divide by 100 to match the decimal beta-times-lambda terms.
# The assignment is vectorized over all stocks, so no loop is needed.
betas_for_stocks['Expected_Return'] = rf_18 / 100 + \
            betas_for_stocks['Mkt-RF'] * average_lambda_mkt_rf + \
            betas_for_stocks['SMB'] * average_lambda_smb + \
            betas_for_stocks['HML'] * average_lambda_hml

betas_for_stocks

Stock,Mkt-RF,SMB,HML,Expected_Return
AAPL,1.078061,-0.288579,-0.81254,0.010934
AMZN,1.392425,-0.307618,-1.32968,0.015902
BAC,1.36793,0.20392,1.568716,-0.000659
BP,1.405606,-0.215354,0.466026,0.004264
C,1.6161,-0.043738,1.080023,0.002244
COP,0.920868,0.434121,1.110531,0.001841
CVX,0.995014,0.016683,0.803751,0.001492
GOOGL,1.258891,-0.741881,-0.58167,0.007177
HD,0.976939,0.218111,0.16629,0.00714
INTC,0.898071,-0.186208,0.042699,0.004869


Merge the two DataFrames based on the Stock index.

We aim to compare the Expected Return generated by the model with the Actual Return from 2019 to assess the model's performance.
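
A few summary statistics make the comparison easier to judge than reading row by row. The sketch below uses the ten stocks that appear in this step's comparison table (AAPL through JPM), so it covers only half of the portfolio; the choice of statistics is an assumption about what is useful here.

```python
import numpy as np

# Average 2019 monthly returns and model forecasts for the ten stocks shown
# in the comparison table (AAPL, MSFT, GOOGL, INTC, ORCL, JNJ, PFE, MRK, LLY, JPM)
actual_2019 = np.array([0.055970, 0.039310, 0.022093, 0.024864, 0.016485,
                        0.013079, -0.005357, 0.017762, 0.013827, 0.034216])
expected = np.array([0.010934, 0.001850, 0.007177, 0.004869, 0.006111,
                     0.002050, 0.005620, 0.004522, 0.006583, -0.000688])

errors = actual_2019 - expected
mean_error = errors.mean()              # positive means realized returns exceeded forecasts
mean_abs_error = np.abs(errors).mean()  # typical size of a forecast miss
correlation = np.corrcoef(expected, actual_2019)[0, 1]  # do forecasts rank stocks correctly?

print(f"mean error {mean_error:.4f}, MAE {mean_abs_error:.4f}, correlation {correlation:.2f}")
```

For these ten stocks the forecasts fall short of realized 2019 returns by about 1.8% per month on average, which points to a level problem (2019 was a strong year) in addition to any ranking problem.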

In [20]:
comparison_df = actual_returns_df.merge(betas_for_stocks[['Expected_Return']], left_index=True, right_index=True)
comparison_df['Diff_19r_expected'] = comparison_df['Average_Actual_Return19'] - comparison_df['Expected_Return']

comparison_df = comparison_df.merge(actual_returns_18[['Average_Actual_Return18']], left_index=True, right_index=True)
comparison_df['Diff_18r_expected'] = comparison_df['Average_Actual_Return18'] - comparison_df['Expected_Return']

comparison_df

Stock,Average_Actual_Return19,Expected_Return,Diff_19r_expected,Average_Actual_Return18,Diff_18r_expected
AAPL,0.05597,0.010934,0.045037,0.019373,0.008439
MSFT,0.03931,0.00185,0.03746,0.015225,0.013375
GOOGL,0.022093,0.007177,0.014916,0.013624,0.006448
INTC,0.024864,0.004869,0.019996,0.012265,0.007396
ORCL,0.016485,0.006111,0.010374,0.008655,0.002544
JNJ,0.013079,0.00205,0.011029,0.009769,0.007719
PFE,-0.005357,0.00562,-0.010977,0.012268,0.006648
MRK,0.017762,0.004522,0.01324,0.010946,0.006423
LLY,0.013827,0.006583,0.007245,0.014985,0.008403
JPM,0.034216,-0.000688,0.034904,0.012351,0.013039


The table below compares the model's average expected return with the average actual 2019 return for each sector.

In [21]:
# get expected sector return and actual sector return:
# define sectors
sectors = {
    'AAPL': 'Technology', 'MSFT': 'Technology', 'GOOGL': 'Technology', 'INTC': 'Technology', 'ORCL': 'Technology',
    'JNJ': 'Healthcare', 'PFE': 'Healthcare', 'MRK': 'Healthcare', 'LLY': 'Healthcare',
    'JPM': 'Financials', 'BAC': 'Financials', 'WFC': 'Financials', 'C': 'Financials',
    'AMZN': 'Consumer Discretionary', 'HD': 'Consumer Discretionary', 'NKE': 'Consumer Discretionary',
    'XOM': 'Energy', 'CVX': 'Energy', 'BP': 'Energy', 'COP': 'Energy'
}

comparison_df['Sector'] = comparison_df.index.map(sectors)

sector_avg_returns = comparison_df.groupby('Sector').agg({
    'Average_Actual_Return19': 'mean',
    'Expected_Return': 'mean'
}).rename(columns={
    'Average_Actual_Return19': 'Avg Actual Return by Sector',
    'Expected_Return': 'Avg Expected Return by Sector'
})

comparison_df = comparison_df.merge(sector_avg_returns, on='Sector', how='left')
comparison_df['Diff_18r_expected'] = comparison_df['Average_Actual_Return18'] - comparison_df['Expected_Return']

sector_avg_returns


Sector,Avg Actual Return by Sector,Avg Expected Return by Sector
Consumer Discretionary,0.023839,0.010574
Energy,0.007913,0.002521
Financials,0.032317,0.000329
Healthcare,0.009828,0.004694
Technology,0.031744,0.006188


### Step 6:  
Calculate the average monthly returns for your portfolio in 2019.  Do a quick regression to see if the realized beta in 2019 differs significantly from your previously calculated betas.

- betas_for_stocks represents the stock betas for the period from 2010-01 to 2018-12.

In [22]:
betas_for_stocks

Stock,Mkt-RF,SMB,HML,Expected_Return
AAPL,1.078061,-0.288579,-0.81254,0.010934
AMZN,1.392425,-0.307618,-1.32968,0.015902
BAC,1.36793,0.20392,1.568716,-0.000659
BP,1.405606,-0.215354,0.466026,0.004264
C,1.6161,-0.043738,1.080023,0.002244
COP,0.920868,0.434121,1.110531,0.001841
CVX,0.995014,0.016683,0.803751,0.001492
GOOGL,1.258891,-0.741881,-0.58167,0.007177
HD,0.976939,0.218111,0.16629,0.00714
INTC,0.898071,-0.186208,0.042699,0.004869


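We estimated each stock's 2019 betas by regressing its 2019 monthly returns on the three Fama-French factors, as shown below, and then compared them to the betas we had previously calculated.

To determine whether the 2019 betas differ significantly from those for 2010-2018, we divided each difference by the standard error of the 2019 estimate and calculated a two-tailed p-value from the t distribution.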
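
In symbols, the test applied in the next cell can be written as follows. The notation is ours: $i$ indexes stocks, $k$ indexes the three factors, and $\nu$ is the residual degrees of freedom of the 2019 regression.

$$
t_{i,k} = \frac{\hat\beta^{\,2019}_{i,k} - \hat\beta^{\,2010\text{-}18}_{i,k}}{\mathrm{SE}\!\left(\hat\beta^{\,2019}_{i,k}\right)},
\qquad
p_{i,k} = 2\,P\!\left(T_{\nu} > \lvert t_{i,k} \rvert\right)
$$

With 12 monthly observations and four estimated coefficients (the constant plus three factor betas), $\nu = 12 - 4 = 8$. The denominator uses only the 2019 standard error, so the 2010-2018 betas are treated as known constants.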

In [23]:
betas_list = []  # one dict of 2019 betas and test results per stock

for stock in tickers:

    # Merge the stock returns with Fama-French data for 2019
    stock_data = pd.merge(stock_returns[['date', stock]], ff_data, on='date')
    stock_data2 = stock_data[(stock_data['date'] >= '2019-01') & (stock_data['date'] <= '2019-12')]

    X = sm.add_constant(stock_data2[['Mkt-RF', 'SMB', 'HML']])
    y = stock_data2[stock]
    model = sm.OLS(y, X).fit()

    betas_2019 = model.params[['Mkt-RF', 'SMB', 'HML']]
    betas_2010_2018 = betas_for_stocks.loc[stock]

    beta_diff_mkt_rf = betas_2019['Mkt-RF'] - betas_2010_2018['Mkt-RF']
    beta_diff_smb = betas_2019['SMB'] - betas_2010_2018['SMB']
    beta_diff_hml = betas_2019['HML'] - betas_2010_2018['HML']

    # Standard errors of the 2019 estimates only; the sampling error in the
    # 2010-2018 betas is ignored, so those betas are treated as fixed values.
    se_mkt_rf = model.bse['Mkt-RF']
    se_smb = model.bse['SMB']
    se_hml = model.bse['HML']

    # t-statistics for H0: the 2019 beta equals the 2010-2018 beta
    t_stat_mkt_rf = beta_diff_mkt_rf / se_mkt_rf
    t_stat_smb = beta_diff_smb / se_smb
    t_stat_hml = beta_diff_hml / se_hml

    # Two-tailed p-values from the t distribution
    p_value_mkt_rf = stats.t.sf(abs(t_stat_mkt_rf), df=model.df_resid) * 2
    p_value_smb = stats.t.sf(abs(t_stat_smb), df=model.df_resid) * 2
    p_value_hml = stats.t.sf(abs(t_stat_hml), df=model.df_resid) * 2

    # Reject H0 at the 5% significance level
    significant_mkt_rf = p_value_mkt_rf < 0.05
    significant_smb = p_value_smb < 0.05
    significant_hml = p_value_hml < 0.05

    betas_list.append({
        'Stock': stock,
        'Mkt-RF': betas_2019['Mkt-RF'],
        'SMB': betas_2019['SMB'],
        'HML': betas_2019['HML'],
        'Mkt-RF_Difference': beta_diff_mkt_rf,
        'SMB_Difference': beta_diff_smb,
        'HML_Difference': beta_diff_hml,
        'Mkt-RF_t_stat': t_stat_mkt_rf,
        'SMB_t_stat': t_stat_smb,
        'HML_t_stat': t_stat_hml,
        'Mkt-RF_p_value': p_value_mkt_rf,
        'SMB_p_value': p_value_smb,
        'HML_p_value': p_value_hml,
        'Mkt-RF_Significant': significant_mkt_rf,
        'SMB_Significant': significant_smb,
        'HML_Significant': significant_hml
    })

betas_df_19 = pd.DataFrame(betas_list)
betas_df_19 = betas_df_19[['Stock', 'Mkt-RF', 'SMB', 'HML', 'Mkt-RF_Difference', 'SMB_Difference', 'HML_Difference',
                           'Mkt-RF_t_stat', 'SMB_t_stat', 'HML_t_stat', 'Mkt-RF_p_value', 'SMB_p_value', 'HML_p_value',
                           'Mkt-RF_Significant', 'SMB_Significant', 'HML_Significant']].set_index('Stock')
betas_df_19 = betas_df_19.sort_index(ascending=True)

betas_df_19


Stock,Mkt-RF,SMB,HML,Mkt-RF_Difference,SMB_Difference,HML_Difference,Mkt-RF_t_stat,SMB_t_stat,HML_t_stat,Mkt-RF_p_value,SMB_p_value,HML_p_value,Mkt-RF_Significant,SMB_Significant,HML_Significant
AAPL,1.705673,-1.549978,0.093543,0.627613,-1.261399,0.906083,1.47332,-1.384529,2.110043,0.178888,0.203586,0.067869,False,False,False
AMZN,1.692572,-1.358061,-0.28009,0.300147,-1.050443,1.04959,0.710097,-1.161985,2.463324,0.497818,0.278733,0.039116,False,False,True
BAC,1.623766,0.430271,0.951578,0.255835,0.226351,-0.617138,0.924886,0.382609,-2.213237,0.382077,0.711975,0.057789,False,False,False
BP,0.617524,0.729193,0.049029,-0.788082,0.944547,-0.416997,-2.846051,1.59492,-1.493901,0.02161,0.149396,0.173556,True,False,False
C,1.939659,0.28985,0.886179,0.323559,0.333588,-0.193844,0.873784,0.421215,-0.519301,0.407699,0.684687,0.617608,False,False,False
COP,0.398797,1.739252,0.928536,-0.52207,1.30513,-0.181994,-1.144539,1.337827,-0.395802,0.285486,0.217734,0.702597,False,False,False
CVX,0.717199,0.349766,-0.214588,-0.277816,0.333083,-1.018338,-0.943119,0.528697,-3.429412,0.373226,0.611365,0.008963,False,False,True
GOOGL,0.851976,-0.925654,0.180817,-0.406915,-0.183773,0.762488,-0.845616,-0.178564,1.571881,0.422337,0.862719,0.154622,False,False,False
HD,1.199221,-1.843852,-0.197661,0.222282,-2.061962,-0.363951,0.550273,-2.386708,-0.893789,0.597157,0.044082,0.397525,False,True,False
INTC,0.880896,0.533586,0.243378,-0.017176,0.719794,0.200679,-0.024541,0.480875,0.284446,0.981022,0.643481,0.783293,False,False,False


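Out of 60 beta tests (20 stocks × 3 factors), 55 fail to reject the null hypothesis of no change at the 5% level, while only 5 reject it. With 60 tests at that level, about 3 rejections would be expected by chance alone, so 5 is not far from what unchanged betas would produce.

Based on these results, we have not found systematic differences between the 2019 betas and those from 2010-2018. One caveat is that each 2019 regression uses only 12 monthly observations, so the standard errors are wide and the tests have limited power to detect moderate shifts.

For this model, we would report "no significant concerns".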
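
To put the rejection count in context, here is a rough check of how many rejections chance alone would produce across 60 tests at the 5% level. It assumes the tests are independent, which is a simplification: all stocks are regressed on the same factor data and their returns are correlated. The helper name `prob_at_least` is ours.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_tests = 60   # 20 stocks x 3 factors
alpha = 0.05   # significance level used in the tests above

# Expected number of rejections if no beta actually changed
expected_false_rejections = n_tests * alpha

# Chance of seeing 5 or more rejections under that same assumption
prob_five_or_more = prob_at_least(5, n_tests, alpha)

print(f"Expected rejections by chance: {expected_false_rejections:.1f}")
print(f"P(5 or more rejections by chance): {prob_five_or_more:.3f}")
```

If that probability is not small, the handful of rejections is weak evidence that any individual beta truly shifted.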

### Conclusion on the significant difference between realized betas and predicted betas

Based on the t-tests performed for each stock, we can conclude the following about the significance of the differences between the realized betas and the predicted betas:

Most of the stocks, such as AAPL, GOOGL, BAC, and MSFT, do not show significant differences between their 2019 and 2010-2018 betas across the factors. This suggests that these stocks maintained relatively stable exposures to the Fama-French factors during this period.

Significant differences: a few stocks show significant changes in their betas from 2010-2018 to 2019. For example:

* BP shows a significant difference in the Mkt-RF beta (p ≈ 0.022).
* AMZN has a significant difference in the HML beta (p ≈ 0.039), indicating a shift in its exposure to the value-growth factor.
* NKE also shows a significant difference in the HML beta, suggesting a meaningful change in its relationship with the value-growth factor.
* PFE displays a significant change in the Mkt-RF beta, highlighting an adjustment in its market exposure.
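
One possible improvement to the process: the t-statistics above divide by the 2019 standard error only, which treats the 2010-2018 betas as exact. A common alternative, sketched below, adds the two squared standard errors, assuming the two periods give independent estimates. The function name `beta_change_test` is ours, the 2010-2018 standard error of 0.15 is a made-up placeholder (those standard errors are not shown in this notebook), and reusing the 2019 degrees of freedom is a conservative simplification.

```python
from math import sqrt

from scipy import stats


def beta_change_test(beta_new, se_new, beta_old, se_old, df):
    """Two-tailed t-test for a change in beta, allowing for error in both estimates."""
    se_diff = sqrt(se_new**2 + se_old**2)  # assumes independent estimates
    t_stat = (beta_new - beta_old) / se_diff
    p_value = 2 * stats.t.sf(abs(t_stat), df=df)
    return t_stat, p_value


# AAPL Mkt-RF betas from the tables above
beta_2019, beta_hist = 1.705673, 1.078061
# 2019 standard error implied by the reported t-statistic (difference / t)
se_2019 = (beta_2019 - beta_hist) / 1.47332

# With se_old = 0 this reproduces the test used in the notebook
t_same, p_same = beta_change_test(beta_2019, se_2019, beta_hist, 0.0, df=8)

# With a hypothetical 2010-2018 standard error, the evidence for a change weakens
t_both, p_both = beta_change_test(beta_2019, se_2019, beta_hist, 0.15, df=8)

print(f"2019 SE only: t = {t_same:.3f}, p = {p_same:.3f}")
print(f"Both SEs:     t = {t_both:.3f}, p = {p_both:.3f}")
```

Because the combined standard error is never smaller than the 2019 one, this version can only make rejections less frequent, so the "no significant concerns" reading would not change.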



