Hybrid machine learning models combine multiple algorithms to enhance predictive performance and robustness by leveraging their unique strengths. These models are particularly useful when a single algorithm cannot capture the complexity of the data, such as sequential patterns or broader trends. For example, combining LSTM for sequence learning with Linear Regression for trend analysis can improve results. The need for a hybrid approach arises when single models show poor performance based on metrics, as combining different models can address diverse data patterns effectively.

In [35]:
import pandas as pd
data = pd.read_csv("./SAFCOM.csv")
data.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,12/29/23,14.0,14.0,13.8,13.9,396700
1,12/28/23,13.9,14.1,13.6,13.7,5262500
2,12/27/23,13.55,14.0,13.5,13.6,14199200
3,12/22/23,13.8,13.8,13.5,13.55,1740200
4,12/21/23,13.8,13.8,13.5,13.55,5824300


In [36]:
# converting date column to datetime type
data["Date"] = pd.to_datetime(data['Date'])
data.set_index('Date', inplace= True)
data.head()

  data["Date"] = pd.to_datetime(data['Date'])


Unnamed: 0_level_0,Open,High,Low,Close,Volume
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2023-12-29,14.0,14.0,13.8,13.9,396700
2023-12-28,13.9,14.1,13.6,13.7,5262500
2023-12-27,13.55,14.0,13.5,13.6,14199200
2023-12-22,13.8,13.8,13.5,13.55,1740200
2023-12-21,13.8,13.8,13.5,13.55,5824300


In [37]:
data.columns

Index([' Open', ' High', ' Low', ' Close', ' Volume'], dtype='object')

In [38]:
close = data[[' Close']]

In [39]:
close.head()

Unnamed: 0_level_0,Close
Date,Unnamed: 1_level_1
2023-12-29,13.9
2023-12-28,13.7
2023-12-27,13.6
2023-12-22,13.55
2023-12-21,13.55


In [40]:
close.columns

Index([' Close'], dtype='object')

## Choosing the Hybrid Models


The approach involves using LSTM (Long Short-Term Memory) and Linear Regression models to create a hybrid system. LSTM was selected for its ability to capture sequential dependencies and patterns in time-series data, making it well-suited for modeling stock price movements influenced by historical trends. Linear Regression, being a simple model, is used to capture linear relationships and long-term trends in the data. By combining these models, the aim is to balance LSTM's ability to model complex time-dependent patterns with Linear Regression’s focus on broader trends, resulting in a more accurate prediction system. The Close price data is scaled between 0 and 1 using MinMaxScaler to ensure compatibility with the LSTM model.

scale the Close price data between 0 and 1 using MinMaxScaler to ensure compatibility with the LSTM model

In [41]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range= (0,1))
close[' Close'] = scaler.fit_transform(close[[' Close']])
close.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  close[' Close'] = scaler.fit_transform(close[[' Close']])


Unnamed: 0_level_0,Close
Date,Unnamed: 1_level_1
2023-12-29,0.175097
2023-12-28,0.159533
2023-12-27,0.151751
2023-12-22,0.14786
2023-12-21,0.14786


prepare the data for LSTM by creating sequences of a defined length (e.g., 60 days) to predict the next day’s price:

In [42]:
import numpy as np
def sequences(data, length= 60):
    X, y = [], []
    for i in range(len(data) - length):
        X.append(data[i:i+length])
        y.append(data[i+length])
    return np.array(X), np.array(y)

sequence_length = 60
X, y = sequences(close[' Close'].values, sequence_length)
X.shape


(187, 60)

In [43]:
y.shape

(187,)

split the sequences into training and test sets (e.g., 80% training, 20% testing):

In [44]:
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

build a sequential LSTM model with layers to capture the temporal dependencies in the data:

In [45]:
!pip install tensorflow



In [46]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense


model = Sequential()
model.add(LSTM(50, return_sequences= True, input_shape= (X_train.shape[1], 1)))
model.add(LSTM(50))
model.add(Dense(1))

  super().__init__(**kwargs)


In [47]:
model.compile(optimizer= 'adam', loss= 'mean_squared_error')
model.fit(X_train, y_train, epochs= 20, batch_size= 32)


Epoch 1/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 58ms/step - loss: 0.2259
Epoch 2/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 59ms/step - loss: 0.0698
Epoch 3/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 64ms/step - loss: 0.0226
Epoch 4/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 60ms/step - loss: 0.0193
Epoch 5/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step - loss: 0.0181
Epoch 6/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step - loss: 0.0173
Epoch 7/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step - loss: 0.0118
Epoch 8/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step - loss: 0.0138
Epoch 9/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step - loss: 0.0124
Epoch 10/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 43ms/step - loss: 0.0102
Epoch 11/20
[1m5/5

<keras.src.callbacks.history.History at 0x7fa2910bdb10>

let’s train the second model. Start by generating lagged features for Linear Regression (e.g., using the past 3 days as predictors):

In [48]:
data = pd.DataFrame(close[' Close'])  # Reset `data` to avoid conflicts
data['Lag_1'] = data[' Close'].shift(1)
data['Lag_2'] = data[' Close'].shift(2)
data['Lag_3'] = data[' Close'].shift(3)
data = data.dropna()

In [49]:
X_lin = data[['Lag_1', 'Lag_2', 'Lag_3']]
y_lin = data[' Close']
X_train_lin, X_test_lin = X_lin[:train_size], X_lin[train_size:]
y_train_lin, y_test_lin = y_lin[:train_size], y_lin[train_size:]

In [50]:
from sklearn.linear_model import LinearRegression
lin_model = LinearRegression()
lin_model.fit(X_train_lin, y_train_lin)

 how to make predictions using LSTM on the test set and inverse transform the scaled predictions

In [51]:
X_test_lstm = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
lstm_predictions = model.predict(X_test_lstm)
lstm_predictions = scaler.inverse_transform(lstm_predictions)

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 222ms/step


In [52]:
lstm_predictions

array([[22.64787 ],
       [22.951036],
       [23.208296],
       [23.418299],
       [23.587612],
       [23.727718],
       [23.838188],
       [23.92789 ],
       [23.998783],
       [24.055773],
       [24.107807],
       [24.163826],
       [24.215015],
       [24.260744],
       [24.322842],
       [24.395475],
       [24.44743 ],
       [24.482927],
       [24.474264],
       [24.417322],
       [24.338219],
       [24.245817],
       [24.163055],
       [24.065615],
       [23.960938],
       [23.833336],
       [23.65588 ],
       [23.394485],
       [23.111343],
       [22.845837],
       [22.626644],
       [22.463114],
       [22.367418],
       [22.339937],
       [22.382782],
       [22.481598],
       [22.63591 ],
       [22.822466]], dtype=float32)

In [53]:
lstm_predictions.shape

(38, 1)

In [54]:
lin_predictions = lin_model.predict(X_test_lin)
lin_predictions = scaler.inverse_transform(lin_predictions.reshape(-1, 1))

In [55]:
lin_predictions.shape

(95, 1)

In [56]:
lin_predictions

array([[15.01906389],
       [14.98997303],
       [14.83724363],
       [15.61817097],
       [14.14341185],
       [12.74122055],
       [13.27715656],
       [15.01852043],
       [15.5148065 ],
       [15.69546442],
       [16.02702031],
       [15.72938048],
       [16.31652531],
       [16.03102472],
       [16.55036273],
       [16.50467427],
       [16.709585  ],
       [16.68366492],
       [16.43770109],
       [16.69695328],
       [16.45033281],
       [16.69695328],
       [16.858664  ],
       [17.05992432],
       [17.89952585],
       [17.84802684],
       [18.37750812],
       [18.42621604],
       [18.55667577],
       [17.88925655],
       [17.89272996],
       [18.1089463 ],
       [18.03433114],
       [17.91467132],
       [18.51411956],
       [18.45563521],
       [18.83238709],
       [18.84535997],
       [18.93344049],
       [19.03398857],
       [18.83971359],
       [17.68617725],
       [15.95859461],
       [16.98744323],
       [18.17091728],
       [19

use a weighted average to create hybrid predictions:

In [57]:
min_length = min(len(lstm_predictions), len(lin_predictions))

# Truncate both prediction arrays to the minimum length
lstm_predictions_aligned = lstm_predictions[:min_length]
lin_predictions_aligned = lin_predictions[:min_length]


In [58]:
lstm_predictions_aligned

array([[22.64787 ],
       [22.951036],
       [23.208296],
       [23.418299],
       [23.587612],
       [23.727718],
       [23.838188],
       [23.92789 ],
       [23.998783],
       [24.055773],
       [24.107807],
       [24.163826],
       [24.215015],
       [24.260744],
       [24.322842],
       [24.395475],
       [24.44743 ],
       [24.482927],
       [24.474264],
       [24.417322],
       [24.338219],
       [24.245817],
       [24.163055],
       [24.065615],
       [23.960938],
       [23.833336],
       [23.65588 ],
       [23.394485],
       [23.111343],
       [22.845837],
       [22.626644],
       [22.463114],
       [22.367418],
       [22.339937],
       [22.382782],
       [22.481598],
       [22.63591 ],
       [22.822466]], dtype=float32)

In [59]:
lin_predictions_aligned

array([[15.01906389],
       [14.98997303],
       [14.83724363],
       [15.61817097],
       [14.14341185],
       [12.74122055],
       [13.27715656],
       [15.01852043],
       [15.5148065 ],
       [15.69546442],
       [16.02702031],
       [15.72938048],
       [16.31652531],
       [16.03102472],
       [16.55036273],
       [16.50467427],
       [16.709585  ],
       [16.68366492],
       [16.43770109],
       [16.69695328],
       [16.45033281],
       [16.69695328],
       [16.858664  ],
       [17.05992432],
       [17.89952585],
       [17.84802684],
       [18.37750812],
       [18.42621604],
       [18.55667577],
       [17.88925655],
       [17.89272996],
       [18.1089463 ],
       [18.03433114],
       [17.91467132],
       [18.51411956],
       [18.45563521],
       [18.83238709],
       [18.84535997]])

In [60]:
hybrid_predictions = (0.7 * lstm_predictions_aligned) + (0.3 * lin_predictions_aligned)


In [61]:
hybrid_predictions

array([[20.35922716],
       [20.56271724],
       [20.69698074],
       [21.0782602 ],
       [20.7543513 ],
       [20.43176787],
       [20.66987831],
       [21.25507929],
       [21.45359051],
       [21.54768008],
       [21.68357149],
       [21.63349272],
       [21.84546762],
       [21.79182847],
       [21.99109835],
       [22.0282341 ],
       [22.12607664],
       [22.14314765],
       [22.06329504],
       [22.10121092],
       [21.97185254],
       [21.98115858],
       [21.97173799],
       [21.9639074 ],
       [22.14251324],
       [22.0377424 ],
       [22.0723688 ],
       [21.9040035 ],
       [21.7449431 ],
       [21.35886242],
       [21.20646969],
       [21.15686316],
       [21.06749157],
       [21.01235706],
       [21.22218268],
       [21.27380928],
       [21.49485277],
       [21.62933412]])

Predicting using the Hybrid Model


 make predictions for the next 10 days using our hybrid model

In [70]:
# Making prediction with LSTM
lstm_future = []
last_sequence = X_test[-1].reshape(1, sequence_length, -1)
for _ in range(10):
    prediction = model.predict(last_sequence)[0,0]
    lstm_future.append(prediction)
    prediction_reshaped =np.array([[prediction]]).reshape(1, 1, 1)
    last_sequence= np.append(last_sequence[:, 1:, :], prediction_reshaped, axis=1)

lstm_future = scaler.inverse_transform(np.array(lstm_future).reshape(-1, 1))
lstm_future

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 27ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step


array([[22.822466],
       [22.970009],
       [23.090557],
       [23.19028 ],
       [23.274385],
       [23.34717 ],
       [23.412048],
       [23.471655],
       [23.527958],
       [23.582378]], dtype=float32)

In [72]:
# Next 10 Days using Linear Regression
recent = close[' Close'].values[-3:]
lin_future_pred = []
for _ in range(10):
    lin_pred = lin_model.predict(recent.reshape(1, -1))
    lin_future_pred.append(lin_pred)
    recent = np.append(recent[1:], lin_pred)

lin_future_pred = scaler.inverse_transform(np.array(lin_future_pred).reshape(-1, 1))
lin_future_pred



array([[23.67365023],
       [23.6779799 ],
       [23.90148061],
       [23.44773107],
       [23.45247473],
       [23.77173071],
       [23.22158049],
       [23.22048574],
       [23.66431794],
       [22.99458944]])

In [73]:
hybrid_future_pred = (0.7 * lstm_future) + (0.3 * lin_future_pred)
hybrid_future_pred

array([[23.0778212 ],
       [23.18240017],
       [23.33383339],
       [23.26751558],
       [23.32781281],
       [23.4745387 ],
       [23.3549076 ],
       [23.39630434],
       [23.56886554],
       [23.40604056]])

create the final DataFrame to look at the predictions

In [76]:
future = pd.date_range(start=close.index[-1] + pd.Timedelta(days=1), periods=10)
pred_df = pd.DataFrame({
    'Date': future,
    'LSTM': lstm_future.flatten(),
    'Linear Regression': lin_future_pred.flatten(),
    'Hybrid': hybrid_future_pred.flatten()
    })
pred_df.set_index('Date', inplace= True)
pred_df

Unnamed: 0_level_0,LSTM,Linear Regression,Hybrid
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2023-01-04,22.822466,23.67365,23.077821
2023-01-05,22.970009,23.67798,23.1824
2023-01-06,23.090557,23.901481,23.333833
2023-01-07,23.190281,23.447731,23.267516
2023-01-08,23.274385,23.452475,23.327813
2023-01-09,23.34717,23.771731,23.474539
2023-01-10,23.412048,23.22158,23.354908
2023-01-11,23.471655,23.220486,23.396304
2023-01-12,23.527958,23.664318,23.568866
2023-01-13,23.582378,22.994589,23.406041
