# Introduction
The objective of this worksheet is to test the efficacy of a linear regression whose parameters are obtained by minimizing a loss function with exponentially weighted terms. More specifically, given explanatory variables $X$ and response variable $y$, we will find $\beta_t$ such that
$$
y_t = X_t\beta_t + \epsilon_t
$$
but we minimize
$$
L_t^{\kappa}(\beta_t) = \frac{1 - \kappa}{1 - \kappa^t}\sum_{s = 0}^{t - 1} \kappa^s \epsilon_s^2
\qquad\text{instead of}\qquad
L_t(\beta_t) = \frac{1}{t}\sum_{s = 0}^{t - 1} \epsilon_s^2.
$$

The value of $\kappa$ is treated as a tuning parameter in our calculations. Closed-form formulas exist for the best estimate of $\beta_t$ under this objective function. However, the results we obtained with a numerical optimizer were not strong, so we did not go on to produce closed-form $\beta_t$ estimates.
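
As a sketch of what such a formula looks like (standard weighted least squares, stated here for reference rather than taken from our calculations), stack the observations for $s = 0, \ldots, t - 1$ into a matrix $X_{0:t-1}$ and a vector $y_{0:t-1}$, and let $W_t = \operatorname{diag}(\kappa^0, \kappa^1, \ldots, \kappa^{t - 1})$. This stacked notation is introduced only for this illustration. Setting the gradient of $\sum_s \kappa^s \epsilon_s^2$ with respect to $\beta_t$ to zero gives the normal equations $X_{0:t-1}^\top W_t (y_{0:t-1} - X_{0:t-1}\beta_t) = 0$, so
$$
\hat{\beta}_t = \left(X_{0:t-1}^\top W_t X_{0:t-1}\right)^{-1} X_{0:t-1}^\top W_t \, y_{0:t-1},
$$
assuming $X_{0:t-1}^\top W_t X_{0:t-1}$ is invertible. The normalizing constant $\frac{1 - \kappa}{1 - \kappa^t}$ is positive for any $\kappa > 0$ with $\kappa \neq 1$, so it does not change the minimizer.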

The theory behind this experiment is that $\beta_t$ may change over time, so the more recent observations in a time series may be more informative. Because observation $s$ receives weight $\kappa^s$ and larger $s$ is more recent, this would imply $\kappa > 1$.
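
To make the role of $\kappa$ concrete, here is a minimal sketch of the normalized weights. It assumes index $0$ is the oldest observation, as in the code later in this worksheet. The helper name `normalized_weights` and the values $\kappa = 1.05$, $\kappa = 0.95$, and $t = 12$ are illustrative choices only, not values used in the experiment.

```python
import numpy as np

def normalized_weights(kappa, t):
    """Return kappa**s / sum_s kappa**s for s = 0, ..., t - 1 (hypothetical helper)."""
    w = kappa ** np.arange(t)
    return w / w.sum()

t = 12  # illustrative window length

# kappa > 1: the weight grows with s, so recent observations count more
w_recent = normalized_weights(1.05, t)

# kappa < 1: the weight shrinks with s, so older observations count more
w_old = normalized_weights(0.95, t)

# Geometric-series identity behind the normalizing constant (1 - kappa)/(1 - kappa**t)
kappa = 1.05
raw_sum = np.sum(kappa ** np.arange(t))
geometric_sum = (1 - kappa**t) / (1 - kappa)

print(w_recent[-1] / w_recent[0])  # ratio of newest to oldest weight, kappa**(t - 1)
print(w_old[-1] / w_old[0])
print(np.isclose(raw_sum, geometric_sum))
```

The ratio of the newest weight to the oldest is $\kappa^{t - 1}$, so even a value of $\kappa$ close to $1$ produces a noticeable tilt over a long window.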

In our experiment, we used the Fama-French factors Mkt-RF, SMB, HML, and MOM to predict the monthly returns of Apple's stock. We used closing prices for Apple's stock, obtained from Yahoo Finance using the package yfinance. The data run from 1981 to 2020.

# Packages and Setup

In [155]:
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from scipy.optimize import minimize
import os
os.chdir('/Users/charlesrambo/Desktop/GitHub data')

In [156]:
# Load the data for Apple stock
aapl = yf.Ticker('AAPL').history(period = 'max')

# Make the Date index a column
aapl.reset_index(inplace = True)

# Remove unneeded columns
aapl = aapl[['Date', 'Close']].rename(columns = {'Close':'Price'})

# Load Fama-French 3-factor plus momentum
FF = pd.read_csv('FF.csv')

# Convert date column into date-time object
FF['Date'] = pd.to_datetime(FF['Date'], format = '%m/%d/%y')

# Fix bad dates: '%y' maps two-digit years 00-68 to 20xx, so any year after 2020 belongs in the 1900s
FF.loc[FF['Date'].dt.year > 2020, 'Date'] = FF.loc[FF['Date'].dt.year > 2020, 'Date'].apply(lambda date: '19' + str(date)[2:])

# Resave as date-time object
FF['Date'] = pd.to_datetime(FF['Date'], format = '%Y-%m-%d')

# Convert returns to a decimal 
FF.iloc[:, 1:] = FF.iloc[:, 1:].div(100)

# Create column of 1s
FF['alpha'] = 1

# Merge dataframes
data = aapl.merge(FF, on = 'Date')

# Remove data frames that are no longer needed
del aapl, FF

# Apple returns
data['AAPL'] = (data['Price'] - data['Price'].shift(1))/data['Price'].shift(1)

# Drop column
data.drop('Price', axis = 1, inplace = True)

# Drop missing values
data.dropna(axis = 0, inplace = True)

data.sort_values(by = 'Date', inplace = True)

data.reset_index(drop = True, inplace = True)

data.head()

| | Date | Mkt-RF | SMB | HML | MOM | alpha | AAPL |
|---|---|---|---|---|---|---|---|
| 0 | 1981-03-31 | 0.0356 | 0.0358 | 0.0069 | 0.0073 | 1 | -0.282053 |
| 1 | 1981-04-30 | -0.0211 | 0.0442 | 0.0229 | -0.0095 | 1 | 0.158162 |
| 2 | 1981-06-30 | -0.0236 | -0.0083 | 0.051 | -0.059 | 1 | -0.083703 |
| 3 | 1981-07-31 | -0.0154 | -0.0218 | -0.0068 | -0.0252 | 1 | -0.038459 |
| 4 | 1981-08-31 | -0.0704 | -0.0196 | 0.0484 | -0.0112 | 1 | -0.194996 |


As mentioned previously, we used an optimizer to minimize a positive constant times $\sum_s \kappa^s \epsilon_s^2$. For each time $t$, we used data from time $0$ to $t - 1$ to calculate $\beta_t$. We then used $\beta_t$ to predict Apple's stock return at time $t$ from the time-$t$ Fama-French data. We began with $t = 12$, since fitting a linear model with five parameters to fewer than twelve observations seemed unwise.

In [157]:
# Create function to minimize
def obj(X, y, beta, kappa):
    
    # Calculate weights: kappa**i for observation i (oldest first)
    w = kappa ** np.arange(len(y))
    
    # Weighted mean of squared residuals
    return np.sum(w * (y - X @ beta)**2)/np.sum(w)

# Create function to calculate the beta values
def calc_beta(X, y, kappa): 
    
    # Calculate result
    result = minimize(lambda beta: obj(X, y, beta, kappa), x0 = np.zeros(X.shape[1]), method = 'Nelder-Mead')
    
    # Fall back to a zero vector if the optimizer reports failure
    if result.success:
        return result.x
    else:
        return np.zeros(X.shape[1])

# Create function to obtain MSE
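def calc_error(dates, kappa):
    
    SSE = 0
    
    # Skip the first 12 dates so every fit has at least 12 observations
    for date in dates[12:]:
        
        # Expanding window: all observations strictly before the prediction date
        X = data.loc[(data['Date'] >= dates[0]) & (data['Date'] < date), data.columns[1:6]].values       
        y = data.loc[(data['Date'] >= dates[0]) & (data['Date'] < date), 'AAPL'].values
      
        # Fit beta on the window
        beta = calc_beta(X, y, kappa)
        
        # One-step-ahead prediction from the factors observed on the prediction date
        y_pred = np.sum(beta * data.loc[data['Date'] == date, data.columns[1:6]].values)
        
        # Accumulate the squared prediction error
        SSE += (data.loc[data['Date'] == date, 'AAPL'].values[0] - y_pred)**2
        
    # len(dates) also counts the 12 warm-up dates; the divisor is the same for every kappa
    return SSE/len(dates)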
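
For comparison, here is a minimal sketch of the weighted least squares solution for the same objective, checked against the optimizer on synthetic data. The function names `calc_beta_closed_form` and `weighted_loss`, the random data, and the coefficient values are all assumptions made for illustration. None of this was run on the Apple and Fama-French data, and it is not the method used for the results in this worksheet.

```python
import numpy as np
from scipy.optimize import minimize

def calc_beta_closed_form(X, y, kappa):
    """Solve (X' W X) beta = X' W y with W = diag(kappa**0, ..., kappa**(n - 1))."""
    w = kappa ** np.arange(len(y))
    Xw = X * w[:, None]                       # scale each row of X by its weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

def weighted_loss(beta, X, y, kappa):
    """Same objective as obj above, with beta as the first argument."""
    w = kappa ** np.arange(len(y))
    return np.sum(w * (y - X @ beta) ** 2) / np.sum(w)

# Synthetic data for illustration only: four factors plus a column of ones
rng = np.random.default_rng(0)
n_obs = 60
X = np.column_stack([rng.normal(size=(n_obs, 4)), np.ones(n_obs)])
true_beta = np.array([1.0, 0.5, -0.3, 0.2, 0.01])
y = X @ true_beta + 0.05 * rng.normal(size=n_obs)

kappa = 1.0075
beta_cf = calc_beta_closed_form(X, y, kappa)

# The gradient of the weighted loss should vanish at the closed-form solution
w = kappa ** np.arange(n_obs)
grad = -2 * X.T @ (w * (y - X @ beta_cf)) / w.sum()

# Nelder-Mead on the same objective, started from zeros as in calc_beta
result = minimize(weighted_loss, x0=np.zeros(X.shape[1]),
                  args=(X, y, kappa), method='Nelder-Mead')

loss_cf = weighted_loss(beta_cf, X, y, kappa)
loss_nm = weighted_loss(result.x, X, y, kappa)
print(loss_cf, loss_nm, result.success)
```

Because the objective is a convex quadratic in $\beta$, the closed-form loss can never exceed the optimizer's loss. Any gap between the two shows how far Nelder-Mead stopped from the minimum. This matters because `calc_beta` returns a zero vector whenever the optimizer reports failure.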


# Training Set Results
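We used our training set to find an optimal $\kappa$. Among the values with $\kappa > 1$, we found $\kappa = 1.0075$ worked best (MSE 0.035656). Note that $\kappa = 0.995$ produced a marginally lower MSE (0.03546), although it places more weight on older observations, contrary to the theory behind the experiment. We carried $\kappa = 1.0075$ into the testing set.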

In [158]:
# Create boolean to denote training observations
train_data_bool = data['Date'].dt.year < 2000

# Save dates in training observations
dates_train = np.unique(data.loc[train_data_bool, 'Date'])

# Find values of kappa to look over
kappas = [0.995, 1, 1.0025, 1.005, 1.0075, 1.01, 1.025, 1.03, 1.035]

# Create data frame to save errors
errors_train_df = pd.DataFrame(index = range(len(kappas)), columns = ['kappa', 'MSE'])

# Save results
errors_train_df['kappa'] = kappas
errors_train_df['MSE'] = [calc_error(dates_train, kappa) for kappa in kappas]

errors_train_df

| | kappa | MSE |
|---|---|---|
| 0 | 0.995 | 0.03546 |
| 1 | 1.0 | 0.038985 |
| 2 | 1.0025 | 0.039307 |
| 3 | 1.005 | 0.036282 |
| 4 | 1.0075 | 0.035656 |
| 5 | 1.01 | 0.03844 |
| 6 | 1.025 | 0.036788 |
| 7 | 1.03 | 0.036406 |
| 8 | 1.035 | 0.037317 |


# Testing Set Results
Using $\kappa = 1.0075$, we obtained worse results than with $\kappa = 1$ (which is an OLS regression): a test MSE of 0.015358 versus 0.01482. This is a disappointing result. Coincidentally, $\kappa = 1.0300$ worked about as well as (only slightly worse than) $\kappa = 1$ in our testing set. However, there is no justification for using this parameter given our training set results.

In [159]:
# Save test dates
dates_test = np.unique(data.loc[~train_data_bool, 'Date'])

# Only try 1 and best other kappa
kappas = [1, 1.0075]

# Create new data frame
errors_test_df = pd.DataFrame(index = range(len(kappas)), columns = ['kappa', 'MSE'])

# Save errors
errors_test_df['kappa'] = kappas
errors_test_df['MSE'] = [calc_error(dates_test, kappa) for kappa in kappas]

errors_test_df

| | kappa | MSE |
|---|---|---|
| 0 | 1.0 | 0.01482 |
| 1 | 1.0075 | 0.015358 |
