## Applying Machine Learning to Trading Strategies: Using Logistic Regression to Build Momentum-based Trading Strategies - **Patrick Beaudan and Shuoyuan He**

Objectives:

1. Address the drawbacks of the classical approach to building investment strategies.
2. Use a machine learning model, logistic regression, to build a time-series dual momentum trading strategy on the S&P 500 Index.
3. Show how the proposed model outperforms both buy-and-hold and several base-case dual momentum strategies, significantly increasing returns and reducing risk.
4. Apply the algorithm to other U.S. and international large-capitalization equity indices.
5. Analyze the resulting improvements in risk-adjusted performance.

### 1. Fetching data

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
%matplotlib inline 
plt.style.use('seaborn-v0_8-dark-palette')
import yfinance as yf 
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression 
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

import warnings
warnings.filterwarnings('ignore') 

#### Tickers 
1. S&P 500 Index: **^GSPC**
2. S&P Small Cap 600 Index (SML): **^SML** (data not available via yfinance)
3. S&P Mid Cap 400 Index (MID): **^MID**
4. FTSE 100 Index (UKX): **^FTSE**
5. FTSEurofirst 300 Index (E300): **^FTEU3** (data not available via yfinance)
6. Tokyo Stock Exchange Price Index (TPX): **^TPX** (data not available via yfinance)
7. Dow Jones Industrial Average Index (INDU): **^DJI**
8. Dow Jones Transportation Average Index (TRAN): **^DJT**

In [2]:
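end = '2018-12-12'   # yfinance treats `end` as exclusive, so the last row is 2018-12-11

# auto_adjust=False keeps the 'Adj Close' column used throughout this notebook
# (recent yfinance releases default to auto_adjust=True and drop it; they may also
#  return two-level column labels, which would then need flattening)
# df_sml = yf.download('^SML', start='1993-12-31', end=end)     # data not available
df_mid = yf.download('^MID', start='1990-12-31', end=end, auto_adjust=False)
df_ukx = yf.download('^FTSE', start='1997-12-19', end=end, auto_adjust=False)
# df_e300 = yf.download('^FTEU3', start='1985-12-31', end=end)  # data not available
# df_tpx = yf.download('^TPX', start='1997-12-19', end=end)     # data not available
df_dji = yf.download('^DJI', start='1920-01-02', end=end, auto_adjust=False)
df_djt = yf.download('^DJT', start='1920-01-02', end=end, auto_adjust=False)

data = yf.download('^GSPC', start='1927-12-30', end=end, auto_adjust=False)
print()
df_21 = data.copy()   # independent copy, used for the classical momentum strategy (Section I)
print('Shape of data : ', data.shape)
data.tail(3)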



Shape of data :  (22844, 6)





| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2018-12-07 | 2691.26001 | 2708.540039 | 2623.139893 | 2633.080078 | 2633.080078 | 4242240000 |
| 2018-12-10 | 2630.860107 | 2647.51001 | 2583.22998 | 2637.719971 | 2637.719971 | 4162880000 |
| 2018-12-11 | 2664.439941 | 2674.350098 | 2621.300049 | 2636.780029 | 2636.780029 | 3963440000 |


### 2. Defining a class to add the base features Momentum and Drawdown

* Momentum features are calculated over windows of 30, 60, 90, 120, 180, 270, 300 and 360 trading days
* Drawdown features are calculated over windows of 15, 60, 90 and 120 trading days

The features are also meant to be calculated skipping the most recent month. We follow the convention of 252 business days per calendar year and 21 business days per calendar month.

The features are chosen on the premise that tracking changes in the shape of the price history, through momenta and drawdowns over multiple historical windows, is more pertinent for predicting short-term profitability than other metrics. We therefore use momenta and drawdowns over different time frames as features.
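In the class below, `calculate_drawdown` measures drawdown from the all-time running peak, so its `window` argument only names the column and all four drawdown columns come out identical (visible in the output further down). A minimal sketch of window-dependent features follows, under our own assumptions: momentum is the percentage change over `w` trading days ending 21 days before `t` (so the most recent month is skipped), and drawdown is measured from the highest close within the trailing `w` days. `windowed_features` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

def windowed_features(prices: pd.Series, mom_windows, dd_windows, skip: int = 21) -> pd.DataFrame:
    """Hypothetical helper: momentum and drawdown features that depend on their window."""
    feats = {}
    lagged = prices.shift(skip)                      # price `skip` days ago: the latest month is skipped
    for w in mom_windows:
        # percentage change over w trading days, ending `skip` days before t
        feats[f'momntm_{w}'] = lagged / prices.shift(skip + w) - 1
    for w in dd_windows:
        # drawdown from the highest close within the trailing w trading days (today included)
        peak = prices.rolling(window=w, min_periods=w).max()
        feats[f'drwdwn_{w}'] = prices / peak - 1
    return pd.DataFrame(feats, index=prices.index)
```

Usage: `windowed_features(data['Adj Close'], [30, 60, 90, 120, 180, 270, 300, 360], [15, 60, 90, 120])`. Rows without enough history come out as NaN and can be dropped with `dropna()`, as in the notebook.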

In [3]:
class IncludeFeatures:
    def __init__(self, data):
        self.data = data   # modified in place: features are added to the caller's DataFrame

    def calculate_momentum(self, window):
        # absolute change in the adjusted close over `window` trading days (a price difference, not a percentage)
        self.data[f'momntm_{window}'] = self.data['Adj Close'] - self.data['Adj Close'].shift(window)

    def calculate_drawdown(self, window):
        # drawdown relative to the all-time running peak of the adjusted close;
        # note: `window` only names the column, so every drwdwn_* column is identical
        self.data['Cumulative_Peak'] = self.data['Adj Close'].cummax()   # highest close up to each day
        self.data[f'drwdwn_{window}'] = (self.data['Adj Close'] - self.data['Cumulative_Peak']) / self.data['Cumulative_Peak']

    def include_features(self):
        momentum_windows = [30, 60, 90, 120, 180, 270, 300, 360]
        drawdown_windows = [15, 60, 90, 120]

        for days in momentum_windows:
            self.calculate_momentum(days)

        for days in drawdown_windows:
            self.calculate_drawdown(days)

        # keep only 'Adj Close' and the engineered features
        self.data.drop(columns=['Cumulative_Peak', 'Open', 'High', 'Low', 'Close', 'Volume'], inplace=True)
        return self.data

In [4]:
include_feat = IncludeFeatures(data) 
data_feat = include_feat.include_features()
data_feat.dropna(inplace=True)
print(data_feat.shape) 
data_feat.head(3) 

(22484, 13)


| Date | Adj Close | momntm_30 | momntm_60 | momntm_90 | momntm_120 | momntm_180 | momntm_270 | momntm_300 | momntm_360 | drwdwn_15 | drwdwn_60 | drwdwn_90 | drwdwn_120 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1929-06-10 | 25.27 | -0.309999 | -0.459999 | -0.09 | 2.74 | 4.09 | 5.060001 | 6.380001 | 7.610001 | -0.041714 | -0.041714 | -0.041714 | -0.041714 |
| 1929-06-11 | 25.43 | -0.1 | -0.65 | -0.02 | 2.99 | 4.25 | 5.07 | 6.48 | 7.67 | -0.035647 | -0.035647 | -0.035647 | -0.035647 |
| 1929-06-12 | 25.450001 | -0.49 | -0.59 | -0.289999 | 2.75 | 4.230001 | 5.01 | 6.17 | 7.730001 | -0.034888 | -0.034888 | -0.034888 | -0.034888 |


In [5]:
print(f'Null values : {data_feat.isna().sum().sum()}') 

Null values : 0


### 3. Analyzing Key Performance Indicators of the sample indices over the entire period

The KPIs analysed here are annual return, Sharpe ratio, volatility, maximum drawdown and average daily drawdown.
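The Sharpe ratio returned by `sharpe_ratio` in the class below is a daily figure (mean daily excess return divided by the daily standard deviation), which is why values such as 0.02 look small. A minimal sketch of the usual annualization, assuming 252 trading days per year, the same constant 1% annual risk-free rate, and the common convention of scaling the daily ratio by √252 (which treats daily returns as roughly independent); `annualized_kpis` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252

def annualized_kpis(daily_returns: pd.Series, rf_annual: float = 0.01) -> dict:
    """Hypothetical helper: annualized KPIs from a series of simple daily returns."""
    r = daily_returns.dropna()
    growth = (1 + r).prod()                                   # total growth factor over the sample
    ann_return = growth ** (TRADING_DAYS / len(r)) - 1        # geometric annualization
    ann_vol = r.std() * np.sqrt(TRADING_DAYS)
    daily_sharpe = (r.mean() - rf_annual / TRADING_DAYS) / r.std()
    ann_sharpe = daily_sharpe * np.sqrt(TRADING_DAYS)         # scale the daily ratio by sqrt(252)
    equity = (1 + r).cumprod()                                # compounded equity curve
    drawdowns = equity / equity.cummax() - 1
    return {'annual_return': ann_return, 'volatility': ann_vol,
            'daily_sharpe': daily_sharpe, 'sharpe': ann_sharpe,
            'max_drawdown': drawdowns.min(), 'avg_drawdown': drawdowns.mean()}
```

Under this convention, the S&P 500 daily Sharpe ratio of 0.0200 reported in Section 3.1 corresponds to roughly 0.0200 × √252 ≈ 0.32 on an annual basis.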

In [6]:
class KPIs:
    def __init__(self, data):
        # note: the DataFrame is modified in place (return and drawdown columns are added, the first row is dropped)
        self.datac = data

    def annual_return(self, datac):
        cumulative_returns = (1 + datac['Daily_Return']).prod() - 1
        n_days = datac.shape[0]     # number of trading days
        annualized_return = (1 + cumulative_returns)**(252/n_days) - 1   # geometric annualization
        return annualized_return

    def sharpe_ratio(self, datac):
        average_return = datac['Daily_Return'].mean()
        risk_free_rate = 0.01/252  # constant 1% annual risk-free rate, converted to a daily rate
        std_dev = datac['Daily_Return'].std()
        print(f'Average Return : {average_return:.4f}')
        print(f'Standard Deviation : {std_dev:.4f}')
        print()
        # daily (non-annualized) Sharpe ratio; multiply by np.sqrt(252) to annualize
        sharpe_ratio = (average_return - risk_free_rate)/std_dev
        return sharpe_ratio

    def volatility(self, datac):
        daily_volatility = datac['Daily_Return'].std()
        trading_days_per_year = 252
        annual_volatility = daily_volatility*np.sqrt(trading_days_per_year)   # annualized volatility
        return annual_volatility

    def max_drawdown(self, datac):
        datac['Running_max'] = datac['Adj Close'].cummax()   # highest close up to each day
        datac['Drawdowns'] = (datac['Adj Close'] - datac['Running_max'])/datac['Running_max']

        max_drawdown = datac['Drawdowns'].min()    # deepest drawdown
        avg_drawdown = datac['Drawdowns'].mean()   # average daily drawdown

        return max_drawdown, avg_drawdown

    def calculate_kpi(self):
        self.datac['Log_Return'] = np.log(self.datac['Adj Close']/self.datac['Adj Close'].shift(1))
        self.datac['Daily_Return'] = self.datac['Adj Close'].pct_change()
        self.datac.dropna(inplace=True)   # drops the first row, which has no previous close

        annualized_return = self.annual_return(self.datac)
        sharpe_ratio = self.sharpe_ratio(self.datac)
        annual_volatility = self.volatility(self.datac)
        max_drawdown, avg_drawdown = self.max_drawdown(self.datac)

        print(f'Annual Return : {annualized_return*100:.1f}%')
        print(f'Sharpe Ratio : {sharpe_ratio:.4f}')
        print(f'Volatility : {annual_volatility*100:.0f}%')
        print(f'Maximum Drawdown : {max_drawdown*100:.0f}%')
        print(f'Average Drawdown : {avg_drawdown*100:.0f}%')

#### 3.1 Performance Metrics of S&P 500 Index (SPX) - **^GSPC**

In [7]:
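calc_kpi = KPIs(data)   # `data` is the same object as `data_feat`, so this also adds the return and drawdown columns to the feature set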
calc_kpi.calculate_kpi() 

Average Return : 0.0003
Standard Deviation : 0.0119

Annual Return : 5.3%
Sharpe Ratio : 0.0200
Volatility : 19%
Maximum Drawdown : -86%
Average Drawdown : -22%


#### 3.2 Performance Metrics of S&P Mid Cap 400 Index (MID) - **^MID**

In [8]:
calc_kpi = KPIs(df_mid) 
calc_kpi.calculate_kpi() 

Average Return : 0.0005
Standard Deviation : 0.0120

Annual Return : 10.8%
Sharpe Ratio : 0.0367
Volatility : 19%
Maximum Drawdown : -56%
Average Drawdown : -7%


#### 3.3 Performance Metrics of FTSE 100 Index (UKX) - **^FTSE**

In [9]:
calc_kpi = KPIs(df_ukx) 
calc_kpi.calculate_kpi() 

Average Return : 0.0001
Standard Deviation : 0.0118

Annual Return : 1.5%
Sharpe Ratio : 0.0074
Volatility : 19%
Maximum Drawdown : -53%
Average Drawdown : -16%


#### 3.4 Performance Metrics of Dow Jones Industrial Average (INDU) - **^DJI**

In [10]:
calc_kpi = KPIs(df_dji) 
calc_kpi.calculate_kpi() 

Average Return : 0.0004
Standard Deviation : 0.0106

Annual Return : 7.9%
Sharpe Ratio : 0.0299
Volatility : 17%
Maximum Drawdown : -54%
Average Drawdown : -9%


#### 3.5 Performance Metrics of Dow Jones Transportation Average (TRAN) - **^DJT**

In [11]:
calc_kpi = KPIs(df_djt)  
calc_kpi.calculate_kpi() 

Average Return : 0.0004
Standard Deviation : 0.0143

Annual Return : 7.7%
Sharpe Ratio : 0.0249
Volatility : 23%
Maximum Drawdown : -61%
Average Drawdown : -13%


# I. Classical Time Series Dual-Momentum Trading Strategy

#### Strategy

1. The momentum, i.e. the percentage price change of a security, is calculated over a historical time horizon of twelve months, skipping the most recent month.
2. If momentum > threshold (here 5%), invest.
3. If momentum < threshold, the portfolio is moved to cash in the long-only strategy, or to a short position in the long-short strategy.
4. This investment decision is revisited at regular intervals of one month.
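The four steps above can be sketched as a small backtest. This is a minimal sketch under our own assumptions, not the paper's implementation: momentum is the 12-month change skipping the most recent month (from `t - 252` to `t - 21` trading days), the decision is taken on the last trading day of each month and held for the following month, a decision dated `t` earns the return from `t` to `t + 1`, and cash earns zero. `dual_momentum_returns` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def dual_momentum_returns(prices: pd.Series, threshold: float = 0.05,
                          lookback: int = 252, skip: int = 21,
                          long_short: bool = False) -> pd.Series:
    """Hypothetical helper: daily returns of a monthly-rebalanced time-series momentum rule."""
    # 12-1 momentum: percentage change from t - lookback to t - skip (most recent month skipped)
    momentum = prices.shift(skip) / prices.shift(lookback) - 1
    # invest when momentum >= threshold; otherwise cash (0) or short (-1)
    below = -1.0 if long_short else 0.0
    position = pd.Series(np.where(momentum >= threshold, 1.0, below), index=prices.index)
    position[momentum.isna()] = np.nan           # no decision before enough history exists
    # keep only the decision taken on each month's last trading day and hold it for the month
    month_end = prices.groupby(prices.index.to_period('M')).tail(1).index
    position = position.where(position.index.isin(month_end)).ffill()
    # a decision dated t earns the return from t to t + 1 (no look-ahead); cash earns zero
    daily_ret = prices.pct_change()
    return (position.shift(1) * daily_ret).fillna(0.0)
```

The function expects a price series with a `DatetimeIndex`, e.g. `dual_momentum_returns(data['Adj Close'])`; `(1 + r).cumprod()` of the result gives the strategy's equity curve. Unlike the notebook's `Signals` column, which is evaluated every day, this version only changes position once a month.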

In [12]:
print(f'Shape before slicing : {df_21.shape}')
n = len(df_21)
# Drop the last 21 rows (one month of trading days). Note: this shortens the sample; it does not lag the momentum lookback by a month
df_21 = df_21.iloc[:n-21]
print(f'Shape after slicing : {df_21.shape}') 
df_21.head(3)  

Shape before slicing : (22844, 6)
Shape after slicing : (22823, 6)


| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 1927-12-30 | 17.66 | 17.66 | 17.66 | 17.66 | 17.66 | 0 |
| 1928-01-03 | 17.76 | 17.76 | 17.76 | 17.76 | 17.76 | 0 |
| 1928-01-04 | 17.719999 | 17.719999 | 17.719999 | 17.719999 | 17.719999 | 0 |


Calculating momentum as the percentage price change over twelve months (12 × 21 = 252 trading days)

In [13]:
trading_days_per_month = 21
no_of_months = 12  
time_horizon = trading_days_per_month*no_of_months 

df_21['Momentum'] = df_21['Adj Close'].pct_change(periods=time_horizon)*100
df_21.dropna(inplace=True)
df_21.drop(columns=['Open','High','Low','Close','Volume'],axis=1,inplace=True)

df_21.head(3) 

| Date | Adj Close | Momentum |
|---|---|---|
| 1929-01-03 | 24.860001 | 40.770107 |
| 1929-01-04 | 24.85 | 39.921172 |
| 1929-01-07 | 24.25 | 36.851021 |


In [14]:
threshold = 5 
df_21['Signals'] = (df_21['Momentum']>=threshold).astype(int) 
print('No of invest signals : ',df_21['Signals'].value_counts()) 
print() 
df_21.head(3) 

No of invest signals :  Signals
1    13404
0     9167
Name: count, dtype: int64



| Date | Adj Close | Momentum | Signals |
|---|---|---|---|
| 1929-01-03 | 24.860001 | 40.770107 | 1 |
| 1929-01-04 | 24.85 | 39.921172 | 1 |
| 1929-01-07 | 24.25 | 36.851021 | 1 |


## Machine Learning Approach

### 4. Defining a function to create polynomial features

In [15]:
def degree(data, degree):
    # Expand `data` into all monomials of total degree 1..`degree` (no bias column).
    # note: inside the body, the parameter `degree` shadows the function name
    feature_names = data.columns

    # sanity check (always satisfied here, since feature_names is taken from data itself)
    if data.shape[1] != len(feature_names):
        raise ValueError("The number of features in the data does not match the length of feature names.")

    poly = PolynomialFeatures(degree=degree, include_bias=False)
    poly_feat = poly.fit_transform(data)

    feature_names_poly = poly.get_feature_names_out(input_features=feature_names)

    df_poly = pd.DataFrame(poly_feat, columns=feature_names_poly, index=data.index)
    print('Shape of df_poly of degree 1 : ', data.shape)
    print(f'Shape of df_poly of degree {degree} : ', df_poly.shape)
    print('Number of duplicate columns : ', len(df_poly.columns) - len(set(df_poly.columns)))
    return df_poly

In [16]:
x_quad = degree(data_feat,2)  

Shape of df_poly of degree 1 :  (22483, 17)
Shape of df_poly of degree 2 :  (22483, 170)
Number of duplicate columns :  0


In [17]:
x_cubic = degree(data_feat,3) 

Shape of df_poly of degree 1 :  (22483, 17)
Shape of df_poly of degree 3 :  (22483, 1139)
Number of duplicate columns :  0
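The shapes above follow from a counting argument: with $n$ input features and `include_bias=False`, `PolynomialFeatures` of degree $d$ produces every monomial of total degree 1 to $d$, i.e. $\binom{n+d}{d} - 1$ columns. Here $n = 17$ (the 13 feature columns from Section 2 plus the four columns the KPI cell added in place), giving $\binom{19}{2}-1 = 170$ and $\binom{20}{3}-1 = 1139$. A quick check of the formula against scikit-learn; `n_poly_features` is our own helper name:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def n_poly_features(n_inputs: int, degree: int) -> int:
    """Number of monomials of total degree 1..degree in n_inputs variables (no bias column)."""
    return comb(n_inputs + degree, degree) - 1

# cross-check the formula against scikit-learn on a small random matrix with 17 columns
X = np.random.default_rng(0).normal(size=(5, 17))
for d in (2, 3):
    n_sklearn = PolynomialFeatures(degree=d, include_bias=False).fit_transform(X).shape[1]
    print(d, n_poly_features(17, d), n_sklearn)
```

The count grows roughly as $n^d / d!$, which is why the cubic dataset is about 67 times wider than the linear one.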


### 5. Creating datasets for training with the target variable

#### 5.1 Linear dataset

In [18]:
print('Shape of linear dataset before concatenation : ',data_feat.shape)
x_linear = pd.concat([data_feat, df_21[['Signals']]], axis=1)
x_linear.dropna(inplace=True) 
print('Shape of linear dataset after concatenation : ',x_linear.shape)

Shape of linear dataset before concatenation :  (22483, 17)
Shape of linear dataset after concatenation :  (22462, 18)


#### 5.2 Quadratic dataset

In [19]:
print('Shape of quadratic dataset before concatenation : ',x_quad.shape)
x_quad = pd.concat([x_quad, df_21[['Signals']]], axis=1)
x_quad.dropna(inplace=True) 
print('Shape of quadratic dataset after concatenation : ',x_quad.shape) 

Shape of quadratic dataset before concatenation :  (22483, 170)
Shape of quadratic dataset after concatenation :  (22462, 171)


#### 5.3 Cubic dataset

In [20]:
print('Shape of cubic dataset before concatenation : ',x_cubic.shape)
x_cubic = pd.concat([x_cubic, df_21[['Signals']]], axis=1)
x_cubic.dropna(inplace=True) 
print('Shape of cubic dataset after concatenation : ',x_cubic.shape) 

Shape of cubic dataset before concatenation :  (22483, 1139)
Shape of cubic dataset after concatenation :  (22462, 1140)


### 6. Splitting data into Training and Testing datasets

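Seven pairs of training and testing windows are defined, rolling forward through time. Each training window spans about 36 years, and its test window covers the period that follows it (roughly 7.5 to 8.7 years, except the last, which runs about 5 years to the end of the sample).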

In [21]:
def split_data_train(x_data):
    sliced_tr1 = x_data.loc['1927-12-30':'1964-06-30'] 
    sliced_tr2 = x_data.loc['1936-09-10':'1973-03-21']
    sliced_tr3 = x_data.loc['1945-02-02':'1981-07-15']
    sliced_tr4 = x_data.loc['1953-07-07':'1989-11-02']
    sliced_tr5 = x_data.loc['1961-02-15':'1997-06-04']
    sliced_tr6 = x_data.loc['1968-12-24':'2005-03-08']
    sliced_tr7 = x_data.loc['1977-05-20':'2013-08-09'] 

    return sliced_tr1, sliced_tr2, sliced_tr3, sliced_tr4, sliced_tr5, sliced_tr6, sliced_tr7 

In [22]:
def split_data_test(x_data):
    sliced_ts1 = x_data.loc['1964-08-03':'1973-04-23'] 
    sliced_ts2 = x_data.loc['1973-04-24':'1981-08-14']
    sliced_ts3 = x_data.loc['1981-08-17':'1989-12-05']
    sliced_ts4 = x_data.loc['1989-12-06':'1997-07-07']
    sliced_ts5 = x_data.loc['1997-07-08':'2005-04-08']
    sliced_ts6 = x_data.loc['2005-04-11':'2013-09-11']
    sliced_ts7 = x_data.loc['2013-09-12':'2018-12-12'] 

    return sliced_ts1, sliced_ts2, sliced_ts3, sliced_ts4, sliced_ts5, sliced_ts6, sliced_ts7 

In [23]:
sliced_tr1_lin, sliced_tr2_lin, sliced_tr3_lin, sliced_tr4_lin, sliced_tr5_lin, sliced_tr6_lin, \
    sliced_tr7_lin = split_data_train(x_linear) 

sliced_ts1_lin, sliced_ts2_lin, sliced_ts3_lin, sliced_ts4_lin, sliced_ts5_lin, sliced_ts6_lin, \
    sliced_ts7_lin = split_data_test(x_linear) 

sliced_tr1_quad, sliced_tr2_quad, sliced_tr3_quad, sliced_tr4_quad, sliced_tr5_quad, sliced_tr6_quad,\
      sliced_tr7_quad = split_data_train(x_quad)

sliced_ts1_quad, sliced_ts2_quad, sliced_ts3_quad, sliced_ts4_quad, sliced_ts5_quad, sliced_ts6_quad,\
      sliced_ts7_quad = split_data_test(x_quad)

sliced_tr1_cub, sliced_tr2_cub, sliced_tr3_cub, sliced_tr4_cub, sliced_tr5_cub, sliced_tr6_cub, \
    sliced_tr7_cub = split_data_train(x_cubic)

sliced_ts1_cub, sliced_ts2_cub, sliced_ts3_cub, sliced_ts4_cub, sliced_ts5_cub, sliced_ts6_cub, \
    sliced_ts7_cub = split_data_test(x_cubic)
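The 42 unpacked variables above can be replaced by a list of (train, test) pairs built from the same boundary dates, which keeps the windows in one place and makes them easy to loop over. A minimal sketch; `WINDOWS` and `make_splits` are our own names, and the dates are copied from `split_data_train` and `split_data_test`:

```python
import pandas as pd

# (train_start, train_end, test_start, test_end), copied from split_data_train / split_data_test
WINDOWS = [
    ('1927-12-30', '1964-06-30', '1964-08-03', '1973-04-23'),
    ('1936-09-10', '1973-03-21', '1973-04-24', '1981-08-14'),
    ('1945-02-02', '1981-07-15', '1981-08-17', '1989-12-05'),
    ('1953-07-07', '1989-11-02', '1989-12-06', '1997-07-07'),
    ('1961-02-15', '1997-06-04', '1997-07-08', '2005-04-08'),
    ('1968-12-24', '2005-03-08', '2005-04-11', '2013-09-11'),
    ('1977-05-20', '2013-08-09', '2013-09-12', '2018-12-12'),
]

def make_splits(x_data: pd.DataFrame) -> list:
    """Return (train_df, test_df) pairs sliced by date label (both ends inclusive)."""
    return [(x_data.loc[tr0:tr1], x_data.loc[ts0:ts1]) for tr0, tr1, ts0, ts1 in WINDOWS]
```

Usage: `splits = {name: make_splits(df) for name, df in [('lin', x_linear), ('quad', x_quad), ('cub', x_cubic)]}`, after which `splits['cub'][0]` is the first cubic (train, test) pair. Label slicing with date strings requires a sorted `DatetimeIndex`, which the downloaded data already has.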

### 7. Class for Training and Evaluating the Model

The metrics reported for each model are the cost function, accuracy, confusion matrix and classification report.

The cost function (also called the loss function) for logistic regression is the logistic loss, commonly referred to as cross-entropy loss or log loss.

#### Formula:
The logistic regression cost function $J(\theta)$ is defined as

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log\big(h_\theta(x_i)\big) + (1-y_i)\log\big(1-h_\theta(x_i)\big)\right]$$

where:
* $m$ = number of examples over which the cost is evaluated
* $y_i$ = true label of the $i^{th}$ example
* $h_\theta(x_i)$ = predicted probability for the $i^{th}$ example, given by the sigmoid function

$$h_\theta(x_i) = \frac{1}{1+e^{-\theta^T x_i}}$$
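As a sanity check, the formula can be evaluated directly and compared with scikit-learn's `sklearn.metrics.log_loss`, which computes the same average cross-entropy for binary labels. `cross_entropy` is our own name (it mirrors `cost_func` in the class below), and the labels and probabilities are made up for illustration:

```python
import numpy as np
from sklearn.metrics import log_loss

def cross_entropy(y, p, eps=1e-15):
    """J(theta) for labels y in {0, 1} and predicted probabilities p = h_theta(x)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = [1, 0, 1, 1]
p_hat = [0.9, 0.2, 0.6, 0.99]
print(cross_entropy(y_true, p_hat), log_loss(y_true, p_hat))
```

Confident wrong predictions dominate the average: a single true label with predicted probability 0.01 contributes $-\log(0.01) \approx 4.6$, which helps explain why the polynomial models below can reach high accuracy yet report larger cost values.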

In [24]:
class logistic_regression:
    def __init__(self):
        self.test_size = 0.4       # the last 40% of each slice is held out (no shuffling)
        self.random_state = 42     # has no effect when shuffle=False; kept for reference

    def scaling_x(self, X):
        # note: the scaler is fit on the whole slice, so statistics of the held-out rows leak into training
        scaler = StandardScaler()
        scaled_X = scaler.fit_transform(X)
        return scaled_X

    def cost_func(self, model, x_test, y_test):
        probabilities = model.predict_proba(x_test)[:, 1]   # probabilities for class 1 (positive class)
        epsilon = 1e-15
        probabilities = np.clip(probabilities, epsilon, 1 - epsilon)   # avoid log(0)
        cost = -np.mean(y_test*np.log(probabilities) + (1 - y_test)*np.log(1 - probabilities))
        return cost

    def model_metrics(self, model, x_test, y_test):
        y_pred = model.predict(x_test)
        cost_fn = self.cost_func(model, x_test, y_test)
        accuracy = accuracy_score(y_test, y_pred)
        conf_matrix = confusion_matrix(y_test, y_pred)
        class_report = classification_report(y_test, y_pred)

        print(f'Cost function : {cost_fn}')
        print(f'Accuracy : {accuracy}')
        print('Confusion Matrix : ')
        print(conf_matrix)
        print('Classification Report : ')
        print(class_report)
        return y_pred

    def training_model(self, X, Y):
        scaled_X = self.scaling_x(X)
        x_train, x_test, y_train, y_test = train_test_split(scaled_X, Y, test_size=self.test_size,
                                                            shuffle=False, random_state=self.random_state)
        model = LogisticRegression(C=1.0)   # C is the inverse of the regularization strength
        model.fit(x_train, y_train)
        return self.model_metrics(model, x_test, y_test)   # predictions on the held-out rows

logistic = logistic_regression()
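Because `scaling_x` fits the `StandardScaler` on the whole slice before the chronological 60/40 split, the held-out rows influence the scaling of the training features. A minimal sketch of a leakage-free variant, assuming the same 40% hold-out and `C=1.0`, wraps the scaler and the model in a scikit-learn pipeline so the scaler is fit on the training rows only. `fit_chronological` is a hypothetical name, and `max_iter=1000` is our choice to help the wide polynomial datasets converge:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_chronological(X: pd.DataFrame, y: pd.Series, test_size: float = 0.4, C: float = 1.0):
    """Fit scaler + logistic regression on the first rows; score on the last `test_size` share."""
    x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=False)
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    model.fit(x_train, y_train)            # scaler statistics come from x_train only
    y_pred = model.predict(x_test)         # x_test is scaled with the training mean and std
    return model, y_pred, accuracy_score(y_test, y_pred)
```

Usage: `model, y_pred, acc = fit_chronological(df.drop(columns=['Signals']), df['Signals'])` for any of the sliced datasets.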

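### 8. Evaluation of Linear, Quadratic and Cubic Combinations of features on the Training sets

Within each training window, `training_model` fits the model on the first 60% of the rows and reports the metrics on the remaining 40%.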

#### 8.1 Evaluation on Linear Combination of features

In [25]:
sliced_df_lin = [sliced_tr1_lin, sliced_tr2_lin, sliced_tr3_lin, sliced_tr4_lin, sliced_tr5_lin,
                 sliced_tr6_lin, sliced_tr7_lin] 

for df in sliced_df_lin:
    print('='*90)  
    X = df.drop(columns=['Signals'],axis=1)
    Y = df['Signals'] 
    y_pred_lin = logistic.training_model(X,Y) 

Cost function : 0.17383585921793046
Accuracy : 0.9427513528909143
Confusion Matrix : 
[[1059  190]
 [  11 2251]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.99      0.85      0.91      1249
         1.0       0.92      1.00      0.96      2262

    accuracy                           0.94      3511
   macro avg       0.96      0.92      0.94      3511
weighted avg       0.95      0.94      0.94      3511

Cost function : 0.20830680085567868
Accuracy : 0.9439277899343544
Confusion Matrix : 
[[1340   54]
 [ 151 2111]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.90      0.96      0.93      1394
         1.0       0.98      0.93      0.95      2262

    accuracy                           0.94      3656
   macro avg       0.94      0.95      0.94      3656
weighted avg       0.95      0.94      0.94      3656

Cost function : 0.39655830681254806
Accuracy : 0.9130196936542669
Conf

#### 8.2 Evaluation on Quadratic Combination of features

In [26]:
sliced_df_quad = [sliced_tr1_quad, sliced_tr2_quad, sliced_tr3_quad, sliced_tr4_quad, sliced_tr5_quad,
                 sliced_tr6_quad, sliced_tr7_quad] 

for df in sliced_df_quad:
    print('='*90)  
    X = df.drop(columns=['Signals'],axis=1)
    Y = df['Signals'] 
    y_pred_quad = logistic.training_model(X,Y) 

Cost function : 0.7567384382599395
Accuracy : 0.8968954713756765
Confusion Matrix : 
[[ 999  250]
 [ 112 2150]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.90      0.80      0.85      1249
         1.0       0.90      0.95      0.92      2262

    accuracy                           0.90      3511
   macro avg       0.90      0.88      0.88      3511
weighted avg       0.90      0.90      0.90      3511

Cost function : 1.0184041307017997
Accuracy : 0.8569474835886215
Confusion Matrix : 
[[1346   48]
 [ 475 1787]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.74      0.97      0.84      1394
         1.0       0.97      0.79      0.87      2262

    accuracy                           0.86      3656
   macro avg       0.86      0.88      0.85      3656
weighted avg       0.88      0.86      0.86      3656

Cost function : 0.5288016906897562
Accuracy : 0.9105579868708972
Confusi

#### 8.3 Evaluation on Cubic Combination of features

In [27]:
sliced_df_cub = [sliced_tr1_cub, sliced_tr2_cub, sliced_tr3_cub, sliced_tr4_cub, sliced_tr5_cub,
                 sliced_tr6_cub, sliced_tr7_cub] 

for df in sliced_df_cub:
    print('='*90)  
    X = df.drop(columns=['Signals'],axis=1)
    Y = df['Signals'] 
    y_pred_cub = logistic.training_model(X,Y) 

Cost function : 0.47779480688378434
Accuracy : 0.8857875249216748
Confusion Matrix : 
[[1133  116]
 [ 285 1977]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.80      0.91      0.85      1249
         1.0       0.94      0.87      0.91      2262

    accuracy                           0.89      3511
   macro avg       0.87      0.89      0.88      3511
weighted avg       0.89      0.89      0.89      3511

Cost function : 0.6688470402539024
Accuracy : 0.9100109409190372
Confusion Matrix : 
[[1345   49]
 [ 280 1982]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.83      0.96      0.89      1394
         1.0       0.98      0.88      0.92      2262

    accuracy                           0.91      3656
   macro avg       0.90      0.92      0.91      3656
weighted avg       0.92      0.91      0.91      3656

Cost function : 0.8169607640623379
Accuracy : 0.9053610503282276
Confus

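### 9. Evaluation of Linear, Quadratic and Cubic Combinations of features on the Testing sets

`training_model` again fits a new model on the first 60% of each test window and scores it on the remaining 40%, so these figures are not out-of-sample results for the models trained on the training windows.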

#### 9.1 Evaluation on Linear Combination of features

In [28]:
sliced_df_lin = [sliced_ts1_lin, sliced_ts2_lin, sliced_ts3_lin, sliced_ts4_lin, sliced_ts5_lin,
                 sliced_ts6_lin, sliced_ts7_lin] 

for df in sliced_df_lin:
    print('='*90)  
    X = df.drop(columns=['Signals'],axis=1)
    Y = df['Signals'] 
    y_pred_linear = logistic.training_model(X,Y) 

Cost function : 0.17219668948732228
Accuracy : 0.9286536248561565
Confusion Matrix : 
[[312  33]
 [ 29 495]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.91      0.90      0.91       345
         1.0       0.94      0.94      0.94       524

    accuracy                           0.93       869
   macro avg       0.93      0.92      0.93       869
weighted avg       0.93      0.93      0.93       869

Cost function : 0.5563159604138853
Accuracy : 0.7312722948870393
Confusion Matrix : 
[[224  17]
 [209 391]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.52      0.93      0.66       241
         1.0       0.96      0.65      0.78       600

    accuracy                           0.73       841
   macro avg       0.74      0.79      0.72       841
weighted avg       0.83      0.73      0.74       841

Cost function : 0.256146633396117
Accuracy : 0.9476813317479191
Confusion Matri

#### 9.2 Evaluation on Quadratic Combination of features

In [29]:
sliced_df_quad = [sliced_ts1_quad, sliced_ts2_quad, sliced_ts3_quad, sliced_ts4_quad, sliced_ts5_quad,
                 sliced_ts6_quad, sliced_ts7_quad] 

for df in sliced_df_quad:
    print('='*90)  
    X = df.drop(columns=['Signals'],axis=1)
    Y = df['Signals'] 
    y_pred_quadratic = logistic.training_model(X,Y) 

Cost function : 0.3570732121684744
Accuracy : 0.9125431530494822
Confusion Matrix : 
[[312  33]
 [ 43 481]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.88      0.90      0.89       345
         1.0       0.94      0.92      0.93       524

    accuracy                           0.91       869
   macro avg       0.91      0.91      0.91       869
weighted avg       0.91      0.91      0.91       869

Cost function : 0.5220162797269994
Accuracy : 0.7859690844233056
Confusion Matrix : 
[[154  87]
 [ 93 507]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.62      0.64      0.63       241
         1.0       0.85      0.84      0.85       600

    accuracy                           0.79       841
   macro avg       0.74      0.74      0.74       841
weighted avg       0.79      0.79      0.79       841

Cost function : 0.8330508548975208
Accuracy : 0.9239001189060642
Confusion Matri

#### 9.3 Evaluation on Cubic Combination of features

In [30]:
sliced_df_cub = [sliced_ts1_cub, sliced_ts2_cub, sliced_ts3_cub, sliced_ts4_cub, sliced_ts5_cub,
                 sliced_ts6_cub, sliced_ts7_cub] 

for df in sliced_df_cub:
    print('='*90)  
    X = df.drop(columns=['Signals'],axis=1)
    Y = df['Signals'] 
    y_pred_cubic = logistic.training_model(X,Y) 

Cost function : 0.6590054013156437
Accuracy : 0.9021864211737629
Confusion Matrix : 
[[311  34]
 [ 51 473]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.86      0.90      0.88       345
         1.0       0.93      0.90      0.92       524

    accuracy                           0.90       869
   macro avg       0.90      0.90      0.90       869
weighted avg       0.90      0.90      0.90       869

Cost function : 0.6961237144636168
Accuracy : 0.7740784780023782
Confusion Matrix : 
[[135 106]
 [ 84 516]]
Classification Report : 
              precision    recall  f1-score   support

         0.0       0.62      0.56      0.59       241
         1.0       0.83      0.86      0.84       600

    accuracy                           0.77       841
   macro avg       0.72      0.71      0.72       841
weighted avg       0.77      0.77      0.77       841

Cost function : 1.5324425049618304
Accuracy : 0.9441141498216409
Confusion Matri

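#### **On average, the accuracy score of the cubic model is higher than that of the quadratic and linear models, although the differences are small and not uniform: in the first three test windows shown above, the linear model is the most accurate in two of them.**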
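A walk-forward evaluation fits each model once on a full training window and scores it on the test window that follows, so no test-period data is seen during fitting. A minimal sketch under our own assumptions (`walk_forward` is a hypothetical name; the scaler is fit on the training window only; `max_iter=1000` is our choice):

```python
from typing import Sequence

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def walk_forward(train_sets: Sequence[pd.DataFrame], test_sets: Sequence[pd.DataFrame],
                 target: str = 'Signals') -> list:
    """Fit on each training window and score on the paired test window; return the accuracies."""
    scores = []
    for train_df, test_df in zip(train_sets, test_sets):
        # scaler and model are fit on the training window only
        model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
        model.fit(train_df.drop(columns=[target]), train_df[target])
        y_pred = model.predict(test_df.drop(columns=[target]))
        scores.append(accuracy_score(test_df[target], y_pred))
    return scores
```

Usage: `walk_forward(split_data_train(x_cubic), split_data_test(x_cubic))` returns one accuracy per window, which can be averaged and compared across the linear, quadratic and cubic datasets.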

### 10. Calculating Key Performance Indicators of the Logistic Regression models
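To compute these KPIs for a model, its 0/1 predictions first have to be turned into a daily return series. A minimal sketch, assuming that a prediction of 1 means invested and 0 means cash earning nothing (or short, in the long-short variant), and that a prediction dated `t` is applied to the return from `t` to `t + 1`; `strategy_returns` is a hypothetical helper:

```python
import pandas as pd

def strategy_returns(signals: pd.Series, daily_returns: pd.Series,
                     long_short: bool = False) -> pd.Series:
    """Hypothetical helper: daily returns from following a 0/1 signal series."""
    position = signals.astype(float)
    if long_short:
        position = 2 * position - 1                  # map 0/1 to -1/+1 (short instead of cash)
    # a signal dated t is applied to the return from t to t + 1; cash earns zero
    aligned = position.reindex(daily_returns.index).shift(1)
    return (aligned * daily_returns).fillna(0.0)
```

For example, with a model's predictions stored as a date-indexed Series `pred` (a hypothetical name), `r = strategy_returns(pred, data['Daily_Return'])` gives the strategy's daily returns. `pd.DataFrame({'Adj Close': (1 + r).cumprod()})` can then be passed to the `KPIs` class from Section 3 to report the same metrics as for the SPX benchmark.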

#### 10.1 Benchmark SPX

#### 10.2 Logistic Regression Linear Model

#### 10.3 Logistic Regression Quadratic Model

#### 10.4 Logistic Regression Cubic Model