<a href="https://colab.research.google.com/github/Dly27/stock-forecast/blob/main/stock_forecast.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stock Price Prediction using LSTM

## Project Description
This project aims to predict stock prices using an LSTM (Long Short-Term Memory) neural network. The dataset used is the historical stock prices of Apple Inc. (AAPL) from January 1, 2021, to January 1, 2024. The main steps involved in this analysis are:

1. Fetching stock data from Alpha Vantage API
2. Preprocessing the data and creating technical indicators (90-day and 30-day SMA)
3. Use features such as : OHLCV , SMAs
4. Splitting the data into training, validation, and testing sets
5. Training an LSTM model on the preprocessed data
6. Evaluating the model's performance on the testing set

## To do


1.   Expand the project scope: Include multiple stocks or a market index to demonstrate your ability to handle a larger and more diverse dataset.
2.   Compare multiple models: Implement and compare the performance of different models to showcase your knowledge of various modeling techniques and your ability to select the most appropriate one for the task.
3. Enhance model interpretation: Provide insights into the model's predictions, identify the most influential features, and discuss the economic or market factors that may impact the stock price.
4. Improve visualizations: Create more informative and visually appealing charts and plots to effectively communicate your findings and make your project more engaging.
5. Conduct error analysis: Analyze the model's errors, identify potential limitations, and propose strategies to improve the model's performance.







In [None]:
!pip install alpha_vantage

In [4]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from alpha_vantage.timeseries import TimeSeries
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

ALPHA_VANTAGE_API_KEY = 'VC2S9T9RSVXMPOP0'

Functions to prepare data by creating features and targets

In [5]:
def fetch_stock_data(symbol, start_date, end_date):
    ts = TimeSeries(key=ALPHA_VANTAGE_API_KEY, output_format='pandas')
    data, meta_data = ts.get_daily(symbol=symbol, outputsize='full')
    data = data[(data.index >= start_date) & (data.index <= end_date)]
    return data

def prepare_data(data):
    features = data[['1. open', '2. high', '3. low', '4. close', '5. volume', '90_day_sma', '30_day_sma']]
    target = data['4. close']
    return features, target


* Use fetch_stock_data to fetch data from Alpha vantage
* Calculate SMAs and handle missing values at end of SMAs
* Prepare data for preprocessing

In [7]:
#stock_data = fetch_stock_data('AAPL', '2021-01-01', '2024-01-01')

stock_data['90_day_sma'] = stock_data['4. close'].rolling(window=90, min_periods=1).mean()
stock_data['30_day_sma'] = stock_data['4. close'].rolling(window=30, min_periods=1).mean()

last_valid_index = min(stock_data['90_day_sma'].last_valid_index(), stock_data['30_day_sma'].last_valid_index()) # Handles missing SMA values

stock_data = stock_data.loc[:last_valid_index]

features, target = prepare_data(stock_data)

In [None]:
print(stock_data.head)

* Plot stock data and SMAs

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(stock_data.index, stock_data['4. close'])
plt.plot(stock_data.index, stock_data['90_day_sma'])
plt.plot(stock_data.index, stock_data['30_day_sma'])
plt.title('Apple (AAPL) Stock Price')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.grid(True)
plt.show()

* Split data for training
* Scale data
* Reshape data to fit lstm model input

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [52]:
X_train, X_temp, y_train, y_temp = train_test_split(features, target, test_size=0.3, random_state=42, shuffle=False)
X_validation, X_test, y_validation, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, shuffle=False)

pipeline = Pipeline([
    ('scaler', MinMaxScaler())
])

datasets = [X_train, X_validation, X_test]
scaled_datasets = []
for dataset in datasets:
    scaled_dataset = pipeline.fit_transform(dataset)
    scaled_datasets.append(scaled_dataset)

X_train_scaled, X_validation_scaled, X_test_scaled = scaled_datasets

# Create input sequences
time_steps = 30
step_size = 5

X_train_lstm = []
y_train_lstm = []

for i in range(time_steps, len(X_train_scaled) - time_steps, step_size):
    X_train_lstm.append(X_train_scaled[i:i + time_steps])
    y_train_lstm.append(y_train[i + time_steps])

X_train_lstm, y_train_lstm = np.array(X_train_lstm), np.array(y_train_lstm)
X_train_lstm = X_train_lstm.reshape(X_train_lstm.shape[0], X_train_lstm.shape[1], X_train_lstm.shape[2])

# Prepare validation and test sequences
X_validation_lstm, y_validation_lstm = [], []
X_test_lstm, y_test_lstm = [], []

for i in range(time_steps, len(X_validation_scaled) - time_steps, step_size):
    X_validation_lstm.append(X_validation_scaled[i:i + time_steps])
    y_validation_lstm.append(y_validation[i + time_steps])

for i in range(time_steps, len(X_test_scaled) - time_steps, step_size):
    X_test_lstm.append(X_test_scaled[i:i + time_steps])
    y_test_lstm.append(y_test[i + time_steps])

X_validation_lstm, y_validation_lstm = np.array(X_validation_lstm), np.array(y_validation_lstm)
X_test_lstm, y_test_lstm = np.array(X_test_lstm), np.array(y_test_lstm)

X_validation_lstm = X_validation_lstm.reshape(X_validation_lstm.shape[0], X_validation_lstm.shape[1], X_validation_lstm.shape[2])
X_test_lstm = X_test_lstm.reshape(X_test_lstm.shape[0], X_test_lstm.shape[1], X_test_lstm.shape[2])

* Train model with early stopping

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True, input_shape=(X_train_lstm.shape[1], X_train_lstm.shape[2])),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=1, clipvalue=1.0)
model.compile(optimizer=optimizer, loss='mean_squared_error')

early_stopping = EarlyStopping(monitor='val_loss',patience=10, verbose=1, restore_best_weights=True)

history = model.fit(
    X_train_lstm, y_train_lstm,
    epochs=100,
    batch_size=32,
    validation_data=(X_validation_lstm, y_validation_lstm),
    callbacks=[early_stopping]
)


Model prediction

In [66]:
X_test_lstm = []
y_test_lstm = []

for i in range(time_steps, len(X_test_scaled)):
    X_test_lstm.append(X_test_scaled[i - time_steps:i])
    y_test_lstm.append(y_test[i])

X_test_lstm, y_test_lstm = np.array(X_test_lstm), np.array(y_test_lstm)
X_test_lstm = X_test_lstm.reshape(X_test_lstm.shape[0], X_test_lstm.shape[1], X_test_lstm.shape[2])

loss = model.evaluate(X_test_lstm, y_test_lstm)
predictions = model.predict(X_test_lstm)



Model Evaluation

In [63]:
mse = mean_squared_error(y_test_lstm, predictions)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test_lstm, predictions)
directional_accuracy = np.sum((np.sign(y_test_lstm[1:] - y_test_lstm[:-1]) == np.sign(predictions[1:] - predictions[:-1]))) / len(y_test_lstm[1:])
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("Mean Absolute Error (MAE):", mae)
print("Directional Accuracy:", directional_accuracy)

Mean Squared Error (MSE): 797.943238225539
Root Mean Squared Error (RMSE): 28.247889093267464
Mean Absolute Error (MAE): 27.450905972184806
Directional Accuracy: 0.0


Plot actual vs predicted price

In [None]:
test_indices = stock_data.index[-len(y_test_lstm):]

plt.figure(figsize=(12, 6))
plt.plot(test_indices, y_test_lstm, label='Actual Price')
plt.plot(test_indices, predictions, label='Predicted Price')
plt.title('Stock Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()

In [None]:
model.summary()