## Applying Machine Learning to Trading Strategies: Using Logistic Regression to Build Momentum-based Trading Strategies - **Patrick Beaudan and Shuoyuan He**

Objectives:

1. Address the drawbacks of the classical approach to building investment strategies.
2. Use a machine learning model, logistic regression, to build a time-series dual momentum trading strategy on the S&P 500 Index.
3. Show how the proposed model outperforms both buy-and-hold and several base-case dual momentum strategies, significantly increasing returns and reducing risk.
4. Apply the algorithm to other U.S. and international large-capitalization equity indices.
5. Analyze the resulting improvements in risk-adjusted performance.

### 1. Fetching data

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
%matplotlib inline 
plt.style.use('seaborn-v0_8-dark-palette')
import yfinance as yf 
from sklearn.preprocessing import PolynomialFeatures 
import warnings
warnings.filterwarnings('ignore') 

#### Tickers 
1. S&P 500 Index: **^GSPC**
2. S&P Small Cap 600 Index (SML): **^SML**
3. S&P Mid Cap 400 Index (MID): **^MID**
4. FTSE 100 Index (UKX): **^FTSE**
5. FTSEurofirst 300 Index (E300): **^FTEU3**
6. Tokyo Stock Exchange Price Index (TPX): **^TPX**
7. Dow Jones Industrial Average Index (INDU): **^DJI**
8. Dow Jones Transportation Average Index (TRAN): **^DJT**

In [2]:
start = '1927-12-30'
end = '2018-12-12'
tickers = ['^GSPC', '^SML', '^MID', '^FTSE', '^FTEU3', '^TPX', '^DJI', '^DJT'] 

In [3]:
data = yf.download('^GSPC',start=start,end=end) 
data.tail(3) 



| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2018-12-07 | 2691.26001 | 2708.540039 | 2623.139893 | 2633.080078 | 2633.080078 | 4242240000 |
| 2018-12-10 | 2630.860107 | 2647.51001 | 2583.22998 | 2637.719971 | 2637.719971 | 4162880000 |
| 2018-12-11 | 2664.439941 | 2674.350098 | 2621.300049 | 2636.780029 | 2636.780029 | 3963440000 |


# I. Classical Time Series Dual-Momentum Trading Strategy

### 2. Defining a class to add the base features: momentum and drawdown

* Momentum features are calculated over time frames of 30, 60, 90, 120, 180, 270, 300, 360
* Drawdown features are calculated over time frames of 15, 60, 90, 120

The features are to be calculated with the most recent month skipped. We follow the convention of 252 business days per calendar year and 21 business days per calendar month.

In [4]:
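n = len(data)

# Skip the last month: drop the final 21 rows (about one month of business days).
# .copy() makes df_21 independent of `data`, so adding feature columns later does not write to a slice.
df_21 = data.iloc[:n-21].copy()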
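
The slice above drops the final 21 rows of the sample once. Another possible reading of the skip-month convention is to lag each feature by 21 business days at every date, so that the most recent month never enters the calculation. The sketch below illustrates that reading on a toy series. The helper name `momentum_skip_month` and the toy prices are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

SKIP = 21  # business days per calendar month, following the convention stated above


def momentum_skip_month(prices: pd.Series, window: int, skip: int = SKIP) -> pd.Series:
    """Price change over `window` days, measured up to `skip` days before each date."""
    lagged = prices.shift(skip)            # price one month before each date
    return lagged - lagged.shift(window)   # change over the window that ends one month ago


# Toy series: the price rises by 1 each day, so every 30-day change is exactly 30
prices = pd.Series(np.arange(100, dtype=float))
mom = momentum_skip_month(prices, window=30)
```

With this approach the first `skip + window` rows are `NaN`, because both lags need history before a value can be computed.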

In [5]:
class IncludeFeatures:
    def __init__(self,data):
        self.data = data 

    def calculate_momentum(self,window):
        # Momentum here is the absolute change in the adjusted close over `window` trading days
        # (a price difference, not a percentage rate of change)
        self.data[f'momntm_{window}'] =  self.data['Adj Close'] - self.data['Adj Close'].shift(window) 

    def calculate_drawdown(self,window):
        # Running all-time high of the adjusted close up to each day
        self.data['Cumulative_Peak'] = self.data['Adj Close'].cummax()
        # Fractional drop from that peak. `window` is used only in the column name,
        # so every drwdwn_* column holds the same values.
        self.data[f'drwdwn_{window}'] = (self.data['Adj Close']-self.data['Cumulative_Peak'])/self.data['Cumulative_Peak']

    def include_features(self):
        
        momentum_windows = [30, 60, 90, 120, 180, 270, 300, 360]
        drawdwn_windows = [15, 60, 90, 120]    

        for days in momentum_windows:
            self.calculate_momentum(days) 

        for days in drawdwn_windows:
            self.calculate_drawdown(days) 
        
        self.data.drop(columns=['Cumulative_Peak','Open','High','Low','Close','Volume'],axis=1,inplace=True)
        return self.data     
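
In the class above, `calculate_drawdown` measures each price against the running all-time peak, so the window length does not affect the result. A minimal sketch of a window-dependent variant follows, assuming drawdown is meant as the drop from the highest price in the trailing `window` days. The function name `rolling_drawdown` and the toy prices are illustrative and not taken from the source.

```python
import pandas as pd


def rolling_drawdown(prices: pd.Series, window: int) -> pd.Series:
    """Fractional drop of each price from the highest price in the trailing `window` days."""
    # min_periods=1 lets the first rows use whatever history is available
    rolling_peak = prices.rolling(window, min_periods=1).max()
    return (prices - rolling_peak) / rolling_peak


# Toy prices chosen so that short and long windows disagree
prices = pd.Series([100.0, 120.0, 90.0, 95.0, 110.0, 80.0])
dd_2 = rolling_drawdown(prices, window=2)   # peak over the last 2 days
dd_6 = rolling_drawdown(prices, window=6)   # peak over the last 6 days
```

On the fourth day the 2-day drawdown is zero (95 is the highest of the last two prices), while the 6-day drawdown is still negative because the earlier peak of 120 remains inside the window.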

In [6]:
include_feat = IncludeFeatures(df_21) 
data_feat = include_feat.include_features()
data_feat.dropna(inplace=True)
print(data_feat.shape) 
data_feat.head(3) 

(22463, 13)


| Date | Adj Close | momntm_30 | momntm_60 | momntm_90 | momntm_120 | momntm_180 | momntm_270 | momntm_300 | momntm_360 | drwdwn_15 | drwdwn_60 | drwdwn_90 | drwdwn_120 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1929-06-10 | 25.27 | -0.309999 | -0.459999 | -0.09 | 2.74 | 4.09 | 5.060001 | 6.380001 | 7.610001 | -0.041714 | -0.041714 | -0.041714 | -0.041714 |
| 1929-06-11 | 25.43 | -0.1 | -0.65 | -0.02 | 2.99 | 4.25 | 5.07 | 6.48 | 7.67 | -0.035647 | -0.035647 | -0.035647 | -0.035647 |
| 1929-06-12 | 25.450001 | -0.49 | -0.59 | -0.289999 | 2.75 | 4.230001 | 5.01 | 6.17 | 7.730001 | -0.034888 | -0.034888 | -0.034888 | -0.034888 |


In [7]:
print(f'Null values : {data_feat.isna().sum().sum()}') 

Null values : 0


### 3. Defining functions to create polynomial features

In [56]:
def degree2(data):
    print('Shape of data : ',data.shape) 
    feature_names = ['Adj Close', 'momntm_30', 'momntm_60', 'momntm_90', 'momntm_120',
                     'momntm_180', 'momntm_270', 'momntm_300', 'momntm_360', 'drwdwn_15',
                     'drwdwn_60', 'drwdwn_90', 'drwdwn_120'] 
    
    if data.shape[1] != len(feature_names):
        raise ValueError("The number of features in the data does not match the length of feature names.")

    poly2 = PolynomialFeatures(degree=2, include_bias=False)
    poly2_feat = poly2.fit_transform(data) 
    
    feature_names_poly2 = poly2.get_feature_names_out(input_features=feature_names)
    
    df_poly2 = pd.DataFrame(poly2_feat, columns=feature_names_poly2, index=data.index) 
    print('Shape of df_poly2 : ',df_poly2.shape) 

    # df_poly2 already contains the 13 degree-1 columns, so this concat repeats them
    df_combined2 = pd.concat([data, df_poly2], axis=1)
    print('Shape of df_combined2 : ',df_combined2.shape)

    return df_combined2 

def degree3(data):
    print('Shape of data : ',data.shape) 
    feature_names = ['Adj Close', 'momntm_30', 'momntm_60', 'momntm_90', 'momntm_120',
                     'momntm_180', 'momntm_270', 'momntm_300', 'momntm_360', 'drwdwn_15',
                     'drwdwn_60', 'drwdwn_90', 'drwdwn_120'] 
    
    if data.shape[1] != len(feature_names):
        raise ValueError("The number of features in the data does not match the length of feature names.")

    df_poly2 = degree2(data) 
    print('Shape of df_poly2 : ',df_poly2.shape) 
    poly3 = PolynomialFeatures(degree=3, include_bias=False)
    poly3_feat = poly3.fit_transform(data) 
    
    feature_names_poly3 = poly3.get_feature_names_out(input_features=feature_names)
    
    df_poly3 = pd.DataFrame(poly3_feat, columns=feature_names_poly3, index=data.index) 
    print('Shape of df_poly3 : ',df_poly3.shape) 

    # df_poly3 already contains every degree-1 and degree-2 column, so this concat repeats them
    df_combined3 = pd.concat([data, df_poly2, df_poly3], axis=1)
    print('Shape of df_combined3 : ',df_combined3.shape)
    
    return df_combined3 

In [55]:
X = degree2(data_feat) 

Shape of data :  (22463, 13)
Shape of df_poly2 :  (22463, 104)
Shape of df_combined2 :  (22463, 117)


In [57]:
Y = degree3(data_feat) 

Shape of data :  (22463, 13)
Shape of data :  (22463, 13)
Shape of df_poly2 :  (22463, 104)
Shape of df_combined2 :  (22463, 117)
Shape of df_poly2 :  (22463, 117)
Shape of df_poly3 :  (22463, 559)
Shape of df_combined3 :  (22463, 689)


In [58]:
print('Length of Y.columns : ',len(Y.columns))
print('Length of unique column names: ',len(set(Y.columns))) 
print('Number of duplicate columns : ',len(Y.columns)-len(set(Y.columns))) 

Length of Y.columns :  689
Length of unique column names:  560
Number of duplicate columns :  129


Removing duplicate columns

In [59]:
print('Shape before removing duplicates : ',Y.shape)
df = pd.DataFrame()
for cols in list(set(Y.columns)):
    df[cols] = Y[cols] 

print('Shape after removing duplicates : ',df.shape) 

Shape before removing duplicates :  (22463, 689)


ValueError: Cannot set a DataFrame with multiple columns to the single column momntm_120_momntm_360
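
The `ValueError` arises because selecting a label that occurs more than once, as `Y[cols]` does, returns a DataFrame with several columns rather than a single Series, and a DataFrame cannot be assigned to one column. A minimal sketch of one way around this keeps the first occurrence of each label with a boolean mask from `columns.duplicated()`. The toy frame `wide` is illustrative, and the approach assumes that columns sharing a name also hold the same values.

```python
import pandas as pd

# Toy frame with the label "x" appearing twice, mimicking the repeated feature names
wide = pd.DataFrame(
    [[1, 2, 1, 3], [4, 5, 4, 6]],
    columns=["x", "y", "x", "x y"],
)

# Selecting a duplicated label yields a DataFrame, not a Series
selected = wide["x"]

# Keep only the first occurrence of every column label
deduped = wide.loc[:, ~wide.columns.duplicated()]
```

Applied to the notebook's frame, the analogous call would be `Y.loc[:, ~Y.columns.duplicated()]`, which also preserves the original column order, unlike iterating over a `set`.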