Hybrid machine learning models combine multiple algorithms to improve predictive performance and robustness by drawing on the strengths of each. They are particularly useful when a single algorithm cannot capture everything in the data, for example both short-term sequential patterns and broader trends. Combining an LSTM for sequence learning with Linear Regression for trend analysis is one such pairing. A hybrid approach is worth considering when individual models score poorly on evaluation metrics, since different models can pick up different patterns in the data.

In [1]:
import pandas as pd
data = pd.read_csv("./SAFCOM.csv")
data.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,12/29/23,14.0,14.0,13.8,13.9,396700
1,12/28/23,13.9,14.1,13.6,13.7,5262500
2,12/27/23,13.55,14.0,13.5,13.6,14199200
3,12/22/23,13.8,13.8,13.5,13.55,1740200
4,12/21/23,13.8,13.8,13.5,13.55,5824300


In [2]:
# converting date column to datetime type
data["Date"] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace= True)
data.head()


Date,Open,High,Low,Close,Volume
2023-12-29,14.0,14.0,13.8,13.9,396700
2023-12-28,13.9,14.1,13.6,13.7,5262500
2023-12-27,13.55,14.0,13.5,13.6,14199200
2023-12-22,13.8,13.8,13.5,13.55,1740200
2023-12-21,13.8,13.8,13.5,13.55,5824300
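
The rows above are listed newest first. The windowing and lag features built later treat each following row as the next step, and `close.index[-1]` is later used as the last observed date, so it may be safer to sort the index chronologically first. A minimal sketch on a small stand-in frame (the `sample` name is illustrative, with values copied from the first rows above); the outputs shown in this walkthrough were produced without this step:

```python
import pandas as pd

# Stand-in for the indexed frame above (newest row first); values copied from the preview
sample = pd.DataFrame(
    {" Close": [13.9, 13.7, 13.6]},
    index=pd.to_datetime(["2023-12-29", "2023-12-28", "2023-12-27"]),
)
sample.index.name = "Date"

# Sort so that the oldest observation comes first and the newest comes last
sample_sorted = sample.sort_index()

print(sample_sorted.index[-1])                      # most recent date
print(sample_sorted.index.is_monotonic_increasing)  # True once sorted
```

With the real data this would be `data = data.sort_index()` right after setting the index.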


In [3]:
data.columns

Index([' Open', ' High', ' Low', ' Close', ' Volume'], dtype='object')
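
The column names above carry a leading space, which is why later cells index with `' Close'`. If cleaner names are preferred, something like the following could be used; this is only a sketch on a stand-in frame (`frame` and `cleaned` are illustrative names), and the rest of this walkthrough keeps the original padded names:

```python
import pandas as pd

# Stand-in frame with the same padded column names as the output above
frame = pd.DataFrame(columns=[" Open", " High", " Low", " Close", " Volume"])

# rename returns a new frame; the original keeps its padded names
cleaned = frame.rename(columns=str.strip)

print(list(cleaned.columns))
```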

In [4]:
close = data[[' Close']]

In [5]:
close.head()

Date,Close
2023-12-29,13.9
2023-12-28,13.7
2023-12-27,13.6
2023-12-22,13.55
2023-12-21,13.55


In [6]:
close.columns

Index([' Close'], dtype='object')

## Choosing the Hybrid Models


The approach uses an LSTM (Long Short-Term Memory) network and a Linear Regression model to build a hybrid system. LSTM was selected for its ability to capture sequential dependencies in time-series data, which makes it well suited to modeling stock price movements that depend on recent history. Linear Regression, being a simple model, is used to capture linear relationships and longer-term trends. Combining the two aims to balance the LSTM's ability to model complex time-dependent patterns with Linear Regression's focus on broader trends, with the goal of a more accurate prediction system.

Scale the Close price data to the range 0 to 1 using MinMaxScaler so that it is compatible with the LSTM model:

In [7]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range= (0,1))
close = close.copy()  # explicit copy, so the assignment below does not trigger SettingWithCopyWarning
close[' Close'] = scaler.fit_transform(close[[' Close']])
close.head()


Date,Close
2023-12-29,0.175097
2023-12-28,0.159533
2023-12-27,0.151751
2023-12-22,0.14786
2023-12-21,0.14786
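
For reference, min-max scaling to the range 0 to 1 maps each value `x` to `(x - min) / (max - min)`, and the inverse multiplies back by the range and adds the minimum. The sketch below reproduces that arithmetic in plain NumPy; the helper names and the `prices` values are illustrative and not part of the notebook:

```python
import numpy as np

def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def min_max_invert(scaled, lo, hi):
    """Undo min_max_scale using the stored minimum and maximum."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

prices = np.array([10.0, 12.5, 15.0, 20.0])  # illustrative values
scaled, lo, hi = min_max_scale(prices)
restored = min_max_invert(scaled, lo, hi)

print(scaled)    # values between 0 and 1
print(restored)  # back on the original price scale
```

Because the minimum and maximum are learned from whatever data the scaler is fitted on, fitting on the training portion only is a common way to keep information about the test range out of training.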


Prepare the data for the LSTM by creating sequences of a fixed length (here, 60 days), each used to predict the next day's price:

In [8]:
import numpy as np
def sequences(data, length= 60):
    X, y = [], []
    for i in range(len(data) - length):
        X.append(data[i:i+length])
        y.append(data[i+length])
    return np.array(X), np.array(y)

sequence_length = 60
X, y = sequences(close[' Close'].values, sequence_length)
X.shape


(187, 60)

In [9]:
y.shape

(187,)
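
To see what the windowing does, it can help to run the same logic on a tiny series. The sketch below uses a hypothetical `make_windows` helper that mirrors the `sequences` function above, with a window length of 3 instead of 60:

```python
import numpy as np

def make_windows(series, length):
    """Return (X, y) where each row of X holds `length` consecutive values
    and y holds the value that immediately follows that window."""
    X, y = [], []
    for i in range(len(series) - length):
        X.append(series[i:i + length])
        y.append(series[i + length])
    return np.array(X), np.array(y)

toy = np.arange(10)                 # 0, 1, ..., 9
X_toy, y_toy = make_windows(toy, length=3)

print(X_toy.shape, y_toy.shape)     # (7, 3) (7,)
print(X_toy[0], y_toy[0])           # [0 1 2] 3
```

A series of `n` values yields `n - length` windows, so the `(187, 60)` shape above implies 187 + 60 = 247 rows in the Close series.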

Split the sequences into training and test sets (here, 80% training and 20% testing):

In [10]:
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
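
The split is positional rather than random, so the test set is simply the last 20% of the windows. As a quick check of the sizes this produces for 187 windows (the `rows` array is a stand-in for `X`):

```python
import numpy as np

rows = np.arange(187)                  # stand-in for the 187 windows
train_size = int(len(rows) * 0.8)      # int() truncates 149.6 down to 149

train_rows, test_rows = rows[:train_size], rows[train_size:]
print(len(train_rows), len(test_rows))
```

The 38 test rows match the shape of the LSTM predictions shown further down.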

Build a sequential LSTM model with stacked layers to capture the temporal dependencies in the data:

In [12]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense


model = Sequential()
model.add(LSTM(50, return_sequences= True, input_shape= (X_train.shape[1], 1)))
model.add(LSTM(50))
model.add(Dense(1))
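
The `input_shape` of `(X_train.shape[1], 1)` means each sample is expected as 60 time steps with 1 feature per step, that is, a 3D array of shape `(samples, timesteps, features)`. The windows built earlier are 2D, so an explicit reshape makes the layout unambiguous. A NumPy-only sketch with a zero-filled stand-in array:

```python
import numpy as np

windows = np.zeros((149, 60))   # stand-in for X_train: 149 windows of 60 steps

# Add a trailing feature axis: (samples, timesteps) -> (samples, timesteps, 1)
windows_3d = windows.reshape(windows.shape[0], windows.shape[1], 1)

print(windows_3d.shape)
```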


In [13]:
model.compile(optimizer= 'adam', loss= 'mean_squared_error')
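# Reshape to (samples, timesteps, features) so the input matches the LSTM's input_shape
model.fit(X_train.reshape(-1, sequence_length, 1), y_train, epochs= 20, batch_size= 32)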


Epoch 1/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 3s 29ms/step - loss: 0.0735
Epoch 2/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - loss: 0.0296
Epoch 3/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - loss: 0.0118
Epoch 4/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 0.0180
Epoch 5/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - loss: 0.0104
Epoch 6/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - loss: 0.0127
Epoch 7/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step - loss: 0.0099
Epoch 8/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 0.0102
Epoch 9/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step - loss: 0.0096
Epoch 10/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - loss: 0.0087
Epoch 11/20
5/5

<keras.src.callbacks.history.History at 0x23977ddfc50>

Next, train the second model. Start by generating lagged features for Linear Regression (here, the previous 3 values as predictors):

In [14]:
data = pd.DataFrame(close[' Close'])  # Reset `data` to avoid conflicts
data['Lag_1'] = data[' Close'].shift(1)
data['Lag_2'] = data[' Close'].shift(2)
data['Lag_3'] = data[' Close'].shift(3)
data = data.dropna()

In [15]:
X_lin = data[['Lag_1', 'Lag_2', 'Lag_3']]
y_lin = data[' Close']
X_train_lin, X_test_lin = X_lin[:train_size], X_lin[train_size:]
y_train_lin, y_test_lin = y_lin[:train_size], y_lin[train_size:]

In [16]:
from sklearn.linear_model import LinearRegression
lin_model = LinearRegression()
lin_model.fit(X_train_lin, y_train_lin)
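
To make the lag columns concrete, here is the same `shift` pattern on a five-row toy frame (the `toy` frame and its `value` column are illustrative). `shift(k)` moves values down by `k` rows, so `Lag_1` holds the value from the row immediately before:

```python
import pandas as pd

toy = pd.DataFrame({"value": [10, 11, 12, 13, 14]})

# Each Lag_k column holds the value from k rows earlier
for k in (1, 2, 3):
    toy[f"Lag_{k}"] = toy["value"].shift(k)

# The first 3 rows have at least one missing lag and are dropped
toy = toy.dropna()
print(toy)
```

Only the rows with all three lags available remain, which is why the lagged frame has 3 fewer rows than the original series.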

Make predictions with the LSTM on the test set and inverse-transform the scaled predictions back to prices:

In [17]:
X_test_lstm = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
lstm_predictions = model.predict(X_test_lstm)
lstm_predictions = scaler.inverse_transform(lstm_predictions)

2/2 ━━━━━━━━━━━━━━━━━━━━ 1s 329ms/step


In [18]:
lstm_predictions

array([[24.022303],
       [24.313515],
       [24.548939],
       [24.727535],
       [24.860954],
       [24.96732 ],
       [25.04322 ],
       [25.101646],
       [25.143208],
       [25.173983],
       [25.206696],
       [25.253387],
       [25.296318],
       [25.333858],
       [25.401903],
       [25.487158],
       [25.536621],
       [25.561373],
       [25.517614],
       [25.404099],
       [25.268202],
       [25.123264],
       [25.004953],
       [24.86805 ],
       [24.725311],
       [24.549929],
       [24.299335],
       [23.923006],
       [23.532635],
       [23.18934 ],
       [22.931847],
       [22.76536 ],
       [22.701828],
       [22.734322],
       [22.86139 ],
       [23.055471],
       [23.315956],
       [23.604107]], dtype=float32)

In [19]:
lstm_predictions.shape

(38, 1)

In [20]:
lin_predictions = lin_model.predict(X_test_lin)
lin_predictions = scaler.inverse_transform(lin_predictions.reshape(-1, 1))

In [21]:
lin_predictions.shape

(95, 1)

In [22]:
lin_predictions

array([[15.01906389],
       [14.98997303],
       [14.83724363],
       [15.61817097],
       [14.14341185],
       [12.74122055],
       [13.27715656],
       [15.01852043],
       [15.5148065 ],
       [15.69546442],
       [16.02702031],
       [15.72938048],
       [16.31652531],
       [16.03102472],
       [16.55036273],
       [16.50467427],
       [16.709585  ],
       [16.68366492],
       [16.43770109],
       [16.69695328],
       [16.45033281],
       [16.69695328],
       [16.858664  ],
       [17.05992432],
       [17.89952585],
       [17.84802684],
       [18.37750812],
       [18.42621604],
       [18.55667577],
       [17.88925655],
       [17.89272996],
       [18.1089463 ],
       [18.03433114],
       [17.91467132],
       [18.51411956],
       [18.45563521],
       [18.83238709],
       [18.84535997],
       [18.93344049],
       [19.03398857],
       [18.83971359],
       [17.68617725],
       [15.95859461],
       [16.98744323],
       [18.17091728],
       [19

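Use a weighted average to create the hybrid predictions. The two prediction arrays have different lengths (38 and 95 rows), so first truncate both to the shorter one: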

In [23]:
min_length = min(len(lstm_predictions), len(lin_predictions))

# Truncate both prediction arrays to the minimum length
lstm_predictions_aligned = lstm_predictions[:min_length]
lin_predictions_aligned = lin_predictions[:min_length]
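
Truncating makes the two arrays the same shape, but it does not by itself guarantee that row `i` of each array refers to the same date: the sequence targets start 60 rows into the series, while the lagged targets start 3 rows in. One possible way to line them up, sketched here with row positions only and under the assumption that both feature sets come from the same 247-row series, is to drop the earliest lag rows before splitting:

```python
import numpy as np

n_rows, sequence_length, n_lags = 247, 60, 3   # sizes implied by the shapes above
positions = np.arange(n_rows)

# Row positions each feature set is trying to predict
seq_targets = positions[sequence_length:]      # one target per 60-step window
lag_targets = positions[n_lags:]               # one target per row with 3 lags available

# Drop the earliest lag rows so both sets predict the same positions
offset = sequence_length - n_lags
lag_targets_aligned = lag_targets[offset:]

# The same train_size now cuts both sets at the same position
train_size = int(len(seq_targets) * 0.8)
seq_test = seq_targets[train_size:]
lag_test = lag_targets_aligned[train_size:]

print(len(seq_test), len(lag_test))
print(np.array_equal(seq_test, lag_test))
```

In the notebook this would correspond to something like `X_lin.iloc[offset:]` and `y_lin.iloc[offset:]` before applying `train_size`; that step is a suggestion and was not used to produce the outputs shown here.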


In [24]:
lstm_predictions_aligned

array([[24.022303],
       [24.313515],
       [24.548939],
       [24.727535],
       [24.860954],
       [24.96732 ],
       [25.04322 ],
       [25.101646],
       [25.143208],
       [25.173983],
       [25.206696],
       [25.253387],
       [25.296318],
       [25.333858],
       [25.401903],
       [25.487158],
       [25.536621],
       [25.561373],
       [25.517614],
       [25.404099],
       [25.268202],
       [25.123264],
       [25.004953],
       [24.86805 ],
       [24.725311],
       [24.549929],
       [24.299335],
       [23.923006],
       [23.532635],
       [23.18934 ],
       [22.931847],
       [22.76536 ],
       [22.701828],
       [22.734322],
       [22.86139 ],
       [23.055471],
       [23.315956],
       [23.604107]], dtype=float32)

In [25]:
lin_predictions_aligned

array([[15.01906389],
       [14.98997303],
       [14.83724363],
       [15.61817097],
       [14.14341185],
       [12.74122055],
       [13.27715656],
       [15.01852043],
       [15.5148065 ],
       [15.69546442],
       [16.02702031],
       [15.72938048],
       [16.31652531],
       [16.03102472],
       [16.55036273],
       [16.50467427],
       [16.709585  ],
       [16.68366492],
       [16.43770109],
       [16.69695328],
       [16.45033281],
       [16.69695328],
       [16.858664  ],
       [17.05992432],
       [17.89952585],
       [17.84802684],
       [18.37750812],
       [18.42621604],
       [18.55667577],
       [17.88925655],
       [17.89272996],
       [18.1089463 ],
       [18.03433114],
       [17.91467132],
       [18.51411956],
       [18.45563521],
       [18.83238709],
       [18.84535997]])

In [26]:
hybrid_predictions = (0.7 * lstm_predictions_aligned) + (0.3 * lin_predictions_aligned)
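
The blend is a convex combination: each hybrid value is 0.7 times the LSTM prediction plus 0.3 times the Linear Regression prediction. A small sketch with a hypothetical `blend` helper, checked against the first row of each array above:

```python
import numpy as np

def blend(pred_a, pred_b, weight_a=0.7):
    """Weighted average of two prediction arrays; weights sum to 1."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    if pred_a.shape != pred_b.shape:
        raise ValueError("prediction arrays must have the same shape")
    return weight_a * pred_a + (1.0 - weight_a) * pred_b

# First LSTM and Linear Regression predictions from the outputs above
first = blend([24.022303], [15.01906389])
print(first)
```

The 0.7/0.3 weights are a fixed choice in this notebook; other weights could be compared on held-out data.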


In [27]:
hybrid_predictions

array([[21.32133005],
       [21.51645259],
       [21.6354306 ],
       [21.99472596],
       [21.64569155],
       [21.29948943],
       [21.51340133],
       [22.07670786],
       [22.25468647],
       [22.33042735],
       [22.45279184],
       [22.39618421],
       [22.60237985],
       [22.54300721],
       [22.74644084],
       [22.79241333],
       [22.88851065],
       [22.89806098],
       [22.79363981],
       [22.79195532],
       [22.62284017],
       [22.59537062],
       [22.56106676],
       [22.52561203],
       [22.67757603],
       [22.53935793],
       [22.52278727],
       [22.2739681 ],
       [22.0398459 ],
       [21.59931519],
       [21.42011086],
       [21.36843581],
       [21.30157856],
       [21.2884267 ],
       [21.55720752],
       [21.67552075],
       [21.97088503],
       [22.17648282]])
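
The notebook does not score these predictions, but a metric such as root mean squared error (RMSE) is one way to compare the individual models with the hybrid. The sketch below uses a hypothetical `rmse` helper and made-up numbers purely to show the calculation:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally sized arrays."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

actual = [10.0, 11.0, 12.0]        # illustrative values, not from the dataset
candidate_a = [10.0, 11.0, 14.0]
candidate_b = [11.0, 12.0, 13.0]

print(rmse(actual, candidate_a), rmse(actual, candidate_b))
```

In this notebook `y_test` is still on the scaled 0 to 1 range, so it would need `scaler.inverse_transform` before being compared with the price-scale predictions above.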

## Predicting using the Hybrid Model


Make predictions for the next 10 days using the hybrid model:

In [28]:
# Recursive 10-step forecast with the LSTM: each prediction is fed back as the newest input
lstm_future = []
last_sequence = X_test[-1].reshape(1, sequence_length, -1)
for _ in range(10):
    prediction = model.predict(last_sequence)[0, 0]
    lstm_future.append(prediction)
    # Drop the oldest step and append the new prediction so the window keeps its length
    prediction_reshaped = np.array([[prediction]]).reshape(1, 1, 1)
    last_sequence = np.append(last_sequence[:, 1:, :], prediction_reshaped, axis=1)

lstm_future = scaler.inverse_transform(np.array(lstm_future).reshape(-1, 1))
lstm_future

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 43ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step


array([[23.604107],
       [23.869902],
       [24.128767],
       [24.387657],
       [24.651371],
       [24.923386],
       [25.206242],
       [25.501835],
       [25.811628],
       [26.136782]], dtype=float32)
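
The loop above is a recursive (rolling) forecast: the window slides forward one step at a time and each prediction becomes the newest input. The sketch below isolates that mechanism with a stand-in predictor so it runs without a trained model; `roll_forecast` and `mean_of_last_two` are hypothetical names, not part of the notebook:

```python
import numpy as np

def roll_forecast(window, predict_fn, steps):
    """Forecast `steps` values, feeding each prediction back into the window."""
    window = np.asarray(window, dtype=float).copy()
    out = []
    for _ in range(steps):
        nxt = float(predict_fn(window))
        out.append(nxt)
        # Drop the oldest value and append the prediction as the newest one
        window = np.append(window[1:], nxt)
    return np.array(out)

def mean_of_last_two(window):
    """Stand-in for a trained model: average of the two most recent values."""
    return window[-2:].mean()

forecast = roll_forecast([1.0, 2.0, 3.0], mean_of_last_two, steps=3)
print(forecast)
```

Since later steps depend entirely on earlier predictions, any error made early on is carried forward, so longer horizons tend to be less reliable.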

In [29]:
# Next 10 Days using Linear Regression
recent = close[' Close'].values[-3:]
lin_future_pred = []
for _ in range(10):
    lin_pred = lin_model.predict(recent.reshape(1, -1))
    lin_future_pred.append(lin_pred)
    recent = np.append(recent[1:], lin_pred)

lin_future_pred = scaler.inverse_transform(np.array(lin_future_pred).reshape(-1, 1))
lin_future_pred



array([[23.67365023],
       [23.6779799 ],
       [23.90148061],
       [23.44773107],
       [23.45247473],
       [23.77173071],
       [23.22158049],
       [23.22048574],
       [23.66431794],
       [22.99458944]])
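
When forecasting recursively with lag features, the order of the inputs has to match the order of the training columns. The model above was trained on `['Lag_1', 'Lag_2', 'Lag_3']`, where `Lag_1` is the immediately preceding row, while `values[-3:]` yields the last three rows in row order, that is, as `[Lag_3, Lag_2, Lag_1]`. The sketch below shows one way to keep the order explicit; `forecast_with_lags` and the fixed-weight `toy_predict` are hypothetical stand-ins for illustration:

```python
import numpy as np

def forecast_with_lags(history, predict_fn, steps, n_lags=3):
    """Recursive forecast where features are ordered [Lag_1, Lag_2, ..., Lag_n]."""
    history = list(history)
    out = []
    for _ in range(steps):
        # Most recent value first, to match the Lag_1, Lag_2, Lag_3 column order
        lags = np.array(history[-n_lags:][::-1], dtype=float)
        nxt = float(predict_fn(lags.reshape(1, -1))[0])
        out.append(nxt)
        history.append(nxt)
    return np.array(out)

# Stand-in linear predictor with made-up weights for [Lag_1, Lag_2, Lag_3]
weights = np.array([0.5, 0.3, 0.2])

def toy_predict(features):
    return features @ weights

future = forecast_with_lags([1.0, 2.0, 3.0], toy_predict, steps=2)
print(future)
```

With the fitted model, `lin_model.predict` could be passed in place of `toy_predict`; this was not used to produce the outputs shown above.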

In [30]:
hybrid_future_pred = (0.7 * lstm_future) + (0.3 * lin_future_pred)
hybrid_future_pred

array([[23.6249699 ],
       [23.81232494],
       [24.0605809 ],
       [24.10567896],
       [24.29170098],
       [24.57788839],
       [24.61084327],
       [24.81742975],
       [25.16743541],
       [25.19412459]])

Create the final DataFrame to compare the predictions:

In [31]:
future = pd.date_range(start=close.index[-1] + pd.Timedelta(days=1), periods=10)
pred_df = pd.DataFrame({
    'Date': future,
    'LSTM': lstm_future.flatten(),
    'Linear Regression': lin_future_pred.flatten(),
    'Hybrid': hybrid_future_pred.flatten()
    })
pred_df.set_index('Date', inplace= True)
pred_df

Date,LSTM,Linear Regression,Hybrid
2023-01-04,23.604107,23.67365,23.62497
2023-01-05,23.869902,23.67798,23.812325
2023-01-06,24.128767,23.901481,24.060581
2023-01-07,24.387657,23.447731,24.105679
2023-01-08,24.651371,23.452475,24.291701
2023-01-09,24.923386,23.771731,24.577888
2023-01-10,25.206242,23.22158,24.610843
2023-01-11,25.501835,23.220486,24.81743
2023-01-12,25.811628,23.664318,25.167435
2023-01-13,26.136782,22.994589,25.194125
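
The index above includes 2023-01-07 and 2023-01-08, which fall on a weekend, because `pd.date_range` generates calendar days by default. For stock data, business days may be a closer fit. A sketch using `pd.bdate_range`, assuming the last observed date is the most recent one in the preview at the top (`last_observed` is an illustrative name):

```python
import pandas as pd

last_observed = pd.Timestamp("2023-12-29")   # latest date in the preview at the top

# Ten weekdays (Monday to Friday) following the last observed date
future_days = pd.bdate_range(start=last_observed + pd.Timedelta(days=1), periods=10)

print(future_days[0], future_days[-1])
```

This only skips weekends; exchange holidays are not accounted for and would need a custom calendar.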
