Stock price prediction is a challenging task in finance, with applications ranging from personal investment strategies to algorithmic trading. In this article we will explore how to build a stock price prediction model using TensorFlow and Long Short-Term Memory (LSTM) networks, a type of recurrent neural network (RNN) that is well suited to time-series data like stock prices.
#####################################################################################################
What is LSTM?
LSTM stands for Long Short-Term Memory. It is a type of recurrent neural network (RNN) designed to handle sequence data (time series, text, speech), and it addresses the vanishing gradient problem of vanilla RNNs.
Problem with Vanilla RNN
RNNs pass information through hidden states step by step. But when sequences are long, gradients shrink during backpropagation (the vanishing gradient problem), so the model forgets long-term dependencies. Example sentence: "I grew up in France … I speak fluent ___." To predict "French", the model must remember "France" from many words ago. A vanilla RNN struggles with this.
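A quick numeric illustration of why gradients vanish: if each backpropagation step scales the gradient by a factor smaller than 1, the signal from early timesteps decays exponentially. This is a toy calculation, not a full RNN backward pass:

```python
# Toy illustration: gradient contribution after T steps when each step
# multiplies the gradient by 0.9 (a typical |derivative| < 1 in vanilla RNNs)
factor = 0.9
for T in (10, 50, 100):
    print(T, factor ** T)
# 10  -> ~0.35, 50 -> ~0.005, 100 -> ~0.00003: early steps barely contribute
```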
LSTM to the Rescue
LSTMs introduce a cell state plus gates that control what information to keep or forget. The key idea: instead of blindly passing hidden states along, an LSTM has a memory cell that can store information over long periods, and gates decide what to write, read, and forget.
The LSTM Cell
An LSTM cell has three gates:
- Forget gate $f_t$: decides what information to discard from the cell state. Formula: $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$
- Input gate $i_t$ with candidate memory $\tilde{C}_t$: decides what new information to store in the cell state. Formulas: $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ and $\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$
- Output gate $o_t$: decides what to output as the hidden state. Formula: $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$
Updating Memory
Cell state update: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ (keep some old information, add some new information)
Hidden state update: $h_t = o_t * \tanh(C_t)$
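To make these equations concrete, here is a minimal NumPy sketch of a single LSTM step. This is not TensorFlow's implementation: the weights and inputs are random placeholders, and the helper names `sigmoid` and `lstm_step` are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM time step, following the gate equations above."""
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # forget gate
    i_t = sigmoid(W_i @ z + b_i)         # input gate
    C_tilde = np.tanh(W_C @ z + b_C)     # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde   # cell state update
    o_t = sigmoid(W_o @ z + b_o)         # output gate
    h_t = o_t * np.tanh(C_t)             # hidden state update
    return h_t, C_t

# Toy dimensions: 1 input feature, 4 hidden units, random placeholder weights
n_in, n_h = 1, 4
rng = np.random.default_rng(0)
W = lambda: rng.normal(size=(n_h, n_h + n_in)) * 0.1
b = lambda: np.zeros(n_h)
h, C = np.zeros(n_h), np.zeros(n_h)
h, C = lstm_step(rng.normal(size=n_in), h, C, W(), W(), W(), W(), b(), b(), b(), b())
print(h.shape, C.shape)  # (4,) (4,)
```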
Intuition with an Example
Imagine reading a paragraph:
- Forget gate: drops irrelevant information (e.g., "I went to the store").
- Input gate: stores useful information (e.g., "France").
- Cell state: carries memory across the paragraph.
- Output gate: emits what is relevant for the prediction (e.g., language = "French").
Applications of LSTMs
- Natural language processing (NLP): text generation, translation, sentiment analysis
- Time series forecasting: stock prices, weather
- Speech recognition
- Music generation
Why Important?
Before Transformers (like GPT), LSTMs were state-of-the-art for sequential tasks. They are still used in resource-constrained systems, being smaller and more efficient than Transformers.
#####################################################################################################
STEP 1. Importing Libraries
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
import seaborn as sns
import os
from datetime import datetime
import warnings

warnings.filterwarnings("ignore")
```

- pandas (pd): the primary library for data manipulation and analysis. Used here to read stock price data from CSV files or APIs, clean and preprocess it, organize it in DataFrames, handle missing values and date indexing, and filter, sort, and transform it. Example: `df = pd.read_csv('stock_data.csv')`
- matplotlib.pyplot (plt): the primary plotting library for Python. Used to plot stock price trends over time, visualize training vs. validation loss, create prediction-vs-actual price charts, and plot model performance metrics. Example: `plt.plot(dates, prices)`, `plt.show()`
- numpy (np): the fundamental library for numerical computing. Used for array operations on time-series data, mathematical transformations, reshaping data for neural network input, and statistics (mean, std, etc.). Example: `np.array(stock_prices)`, `np.reshape(data, (samples, timesteps, features))`
- tensorflow (tf): an open-source machine learning platform. Used to build and train the LSTM network, with GPU acceleration for faster training plus model compilation and optimization.
- keras: TensorFlow's high-level API for building neural networks. Used to create LSTM layers (`keras.layers.LSTM()`), build the architecture (`keras.Sequential()`), add Dense and Dropout layers, and compile the model with an optimizer and loss function. Example: `model = keras.Sequential([keras.layers.LSTM(50, return_sequences=True)])`
- seaborn (sns): a statistical data visualization library. Useful for correlation heatmaps between stock features, enhanced statistical plots, better-looking default styles for matplotlib, and distribution plots of prices. Example: `sns.heatmap(correlation_matrix)`
- os: the operating system interface, used for file path operations, checking whether data files exist, creating directories for saved models, and accessing environment variables. Example: `os.path.exists('data.csv')`, `os.mkdir('models')`
- datetime: date and time manipulation, used to convert date strings to datetime objects, do date arithmetic on the time series, generate timestamps for logging, and format dates for plots. Example: `datetime.strptime('2023-01-01', '%Y-%m-%d')`
- warnings.filterwarnings("ignore"): suppresses warning messages (e.g., TensorFlow version compatibility and pandas future warnings) to reduce console clutter during training and keep the output readable.
#####################################################################################################
STEP 2. Loading the Dataset
We will load the dataset containing stock prices over a 5-year period. The read_csv function loads the dataset into a pandas DataFrame for further analysis.
- delimiter=',': specifies that columns in the CSV file are separated by commas.
- on_bad_lines='skip': skips any corrupted or malformed rows instead of crashing.
```python
data = pd.read_csv('all_stocks_5yr.csv', delimiter=',', on_bad_lines='skip')
print(data.shape)
print(data.sample(7))
```

- data.shape: the dimensions of the dataset as (rows, columns), e.g., (125000, 7); the first number is the total number of rows (data points) and the second is the number of columns (features).
- data.sample(7): randomly selects 7 rows from the DataFrame, which is useful for seeing what the actual data looks like, understanding its structure and format, and checking column names and sample values.
Since the data includes a date feature, that column is likely to be loaded as the 'object' data type.
```python
data.info()
```
Whenever we deal with a date or time feature, it should be in the datetime data type. The pandas library lets us convert the object-typed date column to datetime.
```python
data['date'] = pd.to_datetime(data['date'])
data.info()
```

- pd.to_datetime() converts string dates to datetime objects, and the assignment replaces the original date column with the converted version.
- Why this is important: before, dates might be stored as strings like '2013-02-08' or '02/08/2013'; after conversion they are proper datetime objects that pandas can sort, filter, and plot correctly.
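As a quick standalone illustration (with made-up dates, separate from the stock data), the dtype changes like this:

```python
import pandas as pd

s = pd.Series(['2013-02-08', '2013-02-11'])
print(s.dtype)                  # object (plain strings)
print(pd.to_datetime(s).dtype)  # datetime64[ns]
```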
##############################################################################################################
STEP 3. Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a technique used to analyze data through visualization and manipulation. For this project, let us visualize the data of well-known companies such as Nvidia, Google, Apple, and Facebook. First, we consider a few companies and visualize the distribution of their open and close stock prices over the five years.
```python
companies = ['AAPL', 'AMD', 'FB', 'GOOGL', 'AMZN', 'NVDA', 'EBAY', 'CSCO', 'IBM']

plt.figure(figsize=(15, 8))
for index, company in enumerate(companies, 1):
    plt.subplot(3, 3, index)
    c = data[data['Name'] == company]
    plt.plot(c['date'], c['close'], c="r", label="close", marker="+")
    plt.plot(c['date'], c['open'], c="g", label="open", marker="^")
    plt.title(company)
    plt.legend()
plt.tight_layout()
```

- plt.figure(figsize=(15, 8)): creates a figure 15 inches wide by 8 inches tall, large enough to accommodate a 3×3 grid of subplots.
- enumerate(companies, 1): yields (index, company) pairs starting from 1, e.g., (1, 'AAPL'), (2, 'AMD'), (3, 'FB'); index positions the subplot (1-9) and company is the current ticker symbol.
- plt.subplot(3, 3, index): places the current plot in a grid of 3 rows and 3 columns.
- data[data['Name'] == company]: the comparison creates a boolean mask (True/False for each row) and indexing with it keeps only the rows for the current company.
Now let's plot the trading volume for these nine stocks as a function of time.
```python
plt.figure(figsize=(15, 8))
for index, company in enumerate(companies, 1):
    plt.subplot(3, 3, index)
    c = data[data['Name'] == company]
    plt.plot(c['date'], c['volume'], c='purple', marker='*')
    plt.title(f"{company} Volume")
plt.tight_layout()
```

- This second figure, also 15×8 inches, plots trading volume over time: c['volume'] on the y-axis is the number of shares traded each day, drawn in purple with star ('*') markers.
First figure: nine subplots of price trends; red lines with + markers are closing prices, green lines with ^ markers are opening prices, and each subplot shows one company's price movement over time.
Second figure: nine subplots of volume trends; purple lines with * markers show how much trading activity occurred each day.
Purpose: explore the data to understand price patterns and trading activity, compare different stocks visually, spot trends, volatility, and correlations, and decide which stocks to focus on for prediction. This visualization helps you understand your data before building the LSTM model.
Now let's analyze Apple's stock data from 2013 to 2018.
```python
apple = data[data['Name'] == 'AAPL']
prediction_range = apple.loc[(apple['date'] > datetime(2013, 1, 1))
                             & (apple['date'] < datetime(2018, 1, 1))]

plt.plot(apple['date'], apple['close'])
plt.xlabel("Date")
plt.ylabel("Close")
plt.title("Apple Stock Prices")
plt.show()
```

- apple: a DataFrame containing only Apple's records (dates, prices, volume, etc.); data['Name'] == 'AAPL' creates a boolean mask and indexing with it keeps only the matching rows.
- prediction_range: Apple data restricted to the five-year period between January 1, 2013 and January 1, 2018. datetime(2013, 1, 1) and datetime(2018, 1, 1) create the boundary dates, & is the logical AND (both conditions must hold), and apple.loc[...] uses label-based indexing to filter the rows.
Now let's select a subset of the data for training, leaving the remaining portion for validation.
```python
close_data = apple.filter(['close'])
dataset = close_data.values
training = int(np.ceil(len(dataset) * .95))
print(training)
```

- apple.filter(['close']): creates a new DataFrame with only the closing prices. We use only the close because LSTM models here focus on one target variable, the closing price is the most important price for prediction, and a single feature keeps the model simple.
- close_data.values: extracts the underlying NumPy array from the DataFrame.
- training: the number of data points used for training, i.e., 95% of the dataset, rounded up with np.ceil and cast to int. For example, if the dataset had 1,259 trading days, training = ceil(1259 × 0.95) = 1197.
Now that we have the training data length, we apply scaling and prepare the features and labels: x_train and y_train.
```python
from sklearn.preprocessing import MinMaxScaler
```

- MinMaxScaler (from scikit-learn) normalizes data to a specific range, usually 0-1. Neural networks train better on normalized data.
```python
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
```

- feature_range=(0, 1): the target range (minimum 0, maximum 1) for the scaler object.
- fit_transform(): fit learns the min/max values from the data and transform applies the scaling, so scaled_data holds all stock prices mapped to values between 0 and 1.
- One caveat: fitting the scaler on the full dataset lets information from the test period leak into training; strictly speaking, the scaler should be fit on the training split only.
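As a quick standalone illustration (toy numbers, not the stock data), scaling maps the minimum to 0 and the maximum to 1, and inverse_transform undoes it:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

toy = np.array([[10.0], [15.0], [20.0]])
sc = MinMaxScaler(feature_range=(0, 1))
scaled = sc.fit_transform(toy)
print(scaled.ravel())                        # [0.  0.5 1. ]
print(sc.inverse_transform(scaled).ravel())  # [10. 15. 20.]
```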
```python
train_data = scaled_data[0:int(training), :]
```

- 0:int(training): rows from index 0 up to the training size (95% of the data); the lone : keeps all columns (there is only one here). train_data now contains the scaled training portion.
```python
x_train = []
y_train = []
```

- Empty lists for the training features and labels: x_train will hold the input sequences (60 days of prices each), and y_train will hold the target values (the next day's price).
```python
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
```

- range(60, len(train_data)): the loop starts at index 60 because we need 60 previous days to predict the next day.
- train_data[i-60:i, 0]: takes 60 consecutive days of closing prices, from 60 days ago up to (but not including) day i, in the first (and only) column. Example: if i=100, days 40-99 form the input used to predict day 100.
- train_data[i, 0]: the price on day i, which is what we want to predict; if i=100, the target is the price on day 100.
```python
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
```

- np.array(...): converts the Python lists of sequences and targets into NumPy arrays, which is the input format the LSTM requires.
- np.reshape(...): turns the 2D array of shape (samples, 60) into the 3D shape (samples, 60, 1) that LSTM layers expect: samples is the number of sequences, 60 is the timesteps (days) per sequence, and 1 is the number of features (the closing price).
Data structure example:
Before reshaping: x_train.shape = (890, 60) - 890 sequences, each with 60 days.
After reshaping: x_train.shape = (890, 60, 1) - 890 sequences, 60 timesteps, 1 feature.

Sequence creation example, if you have 1000 days of data:
Days 1-60 → input sequence, day 61 → target
Days 2-61 → input sequence, day 62 → target
Days 3-62 → input sequence, day 63 → target
...and so on.

This creates a sliding-window approach in which the LSTM learns to predict the next day's price from the previous 60 days. Why this approach? LSTMs need sequential data to learn patterns; 60 days is enough history to capture trends; normalization helps the LSTM train faster and more accurately; and the sliding window maximizes training data by creating overlapping sequences. This is the standard preparation for LSTM time-series prediction.
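To see the sliding window in miniature, here is a toy example with a window of 3 instead of 60 (made-up prices):

```python
import numpy as np

prices = np.array([1., 2., 3., 4., 5., 6.])
window = 3
X, y = [], []
for i in range(window, len(prices)):
    X.append(prices[i-window:i])  # previous `window` days as input
    y.append(prices[i])           # the next day as target
X, y = np.array(X), np.array(y)
print(X)  # [[1. 2. 3.] [2. 3. 4.] [3. 4. 5.]]
print(y)  # [4. 5. 6.]
```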
###############################################################################################
STEP 4. Building the LSTM Network Using TensorFlow
Using TensorFlow, we can easily create an LSTM model. LSTMs are used in recurrent neural networks for sequence models and time-series data, and they avoid the vanishing gradient issue that commonly occurs when training plain RNNs. To stack multiple LSTM layers in TensorFlow, you must set return_sequences=True on every LSTM layer except the last. Since we are predicting a single continuous value, the output layer has one node and no activation function.
```python
model = keras.models.Sequential()
model.add(keras.layers.LSTM(units=64,
                            return_sequences=True,
                            input_shape=(x_train.shape[1], 1)))
model.add(keras.layers.LSTM(units=64))
model.add(keras.layers.Dense(32))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(1))
model.summary()
```

- keras.models.Sequential(): creates a sequential model in which layers are stacked one after another in a linear flow.
- First LSTM layer: units=64 gives 64 memory units (more units mean more learning capacity but slower training). return_sequences=True makes it return an output at every timestep, which is required because the next layer is also an LSTM; with False it would return only the final output. input_shape=(x_train.shape[1], 1) declares the input dimensions: x_train.shape[1] is 60 timesteps (days) and 1 is the single feature (closing price), giving shape (60, 1).
- Second LSTM layer: also 64 units; return_sequences defaults to False, so it returns only the final output, which is what the following Dense layer needs. This layer learns higher-level patterns from the first LSTM's output.
- Dense(32): a fully connected layer with 32 neurons that combines and processes the LSTM outputs; the default activation is linear (none).
- Dropout(0.5): regularization that randomly sets 50% of the units to 0 during training, preventing the model from memorizing the training data and forcing it to learn more robust patterns.
- Dense(1): the final layer; a single neuron with linear activation outputs one number, the predicted next-day stock price.
- model.summary(): prints the architecture and parameter counts.
Network architecture flow:
Input (60, 1) → LSTM(64) → LSTM(64) → Dense(32) → Dropout(0.5) → Dense(1) → Output

Layer details: the input is 60 days of stock prices; LSTM layer 1 has 64 units and returns sequences; LSTM layer 2 has 64 units and returns the final output; the Dense layer has 32 units for feature combination; Dropout applies 50% dropout for regularization; and the output layer has 1 unit for the price prediction.

Why this architecture? Two LSTM layers let the first learn basic patterns and the second learn more complex ones; 64 units is a good balance between capacity and training speed; the Dense layer combines the LSTM outputs into the final prediction; Dropout prevents overfitting on the training data; and a single output predicts one value, the next day's price.
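As a sanity check on model.summary(), the parameter counts can be worked out by hand: a standard Keras LSTM layer has 4 × units × (input_dim + units + 1) parameters (four gates, each with input weights, recurrent weights, and a bias). A small sketch of the arithmetic:

```python
# Hand-computed parameter counts for the architecture above
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    return input_dim * units + units

total = (lstm_params(1, 64)      # first LSTM:  16,896
         + lstm_params(64, 64)   # second LSTM: 33,024
         + dense_params(64, 32)  # Dense(32):    2,080
         + dense_params(32, 1))  # Dense(1):        33
print(total)                     # 52,033 trainable parameters
```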
###############################################################################################
STEP 5. Model Compilation and Training
When compiling a model we provide these essential parameters:
- optimizer: the method that minimizes the loss function using gradient descent.
- loss: the loss function we monitor to see whether the model improves with training.
- metrics: optional measures used to evaluate the model on the training and validation data (not used in this example).
```python
model.compile(optimizer='adam', loss='mean_squared_error')
```

- optimizer='adam': Adam is an adaptive-learning-rate optimizer that works well for most problems and adjusts the learning rate automatically; alternatives include 'sgd' and 'rmsprop'.
- loss='mean_squared_error': MSE measures the average squared difference between predicted and actual prices, which suits regression problems (predicting continuous values like stock prices). Formula: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
```python
history = model.fit(x_train, y_train, epochs=10)
```

- x_train: the input sequences (60 days of prices per sample); y_train: the actual next-day price for each sequence.
- epochs=10: the number of complete passes through the training data; in each epoch the model sees all training samples once, and this repeats 10 times.
- history: stores the training metrics (loss values over time).
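Because history records the loss for each epoch, you can inspect the training curve; a minimal sketch using the matplotlib import from Step 1 (history.history['loss'] is the standard Keras key):

```python
# Plot the training loss recorded by model.fit()
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('Training loss (MSE)')
plt.title('Training Loss per Epoch')
plt.show()
```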
To make predictions we need testing data, so we first create the test set and then run the model on it.
```python
test_data = scaled_data[training - 60:, :]
```

- training - 60: the slice starts 60 days before the training cutoff because predicting the first test day requires the 60 preceding days as context; the trailing : takes all rows from that point to the end, so test_data holds the scaled test data plus the 60 days of context before it.
- Example: if training = 950, this takes data from index 890 to the end.
```python
x_test = []                     # will hold the 60-day input sequences
y_test = dataset[training:, :]  # actual (unscaled) prices from the training cutoff onward
```

- y_test comes from the original, unscaled dataset because we need the real prices to compare against the predictions, not scaled values.
```python
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
```

- The same logic as for the training data: loop through the test data from index 60, taking 60 consecutive days of scaled prices as the input for predicting the next day.
```python
x_test = np.array(x_test)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
```

- Converts the list of sequences to a NumPy array (the LSTM requires NumPy input) and reshapes it into the 3D format (samples, 60, 1), the same format as the training data.
```python
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
```

- model.predict(): runs a forward pass through the trained network; the output has the same shape as y_test but is in scaled units.
- scaler.inverse_transform(): undoes the MinMax scaling so the predictions are back in original price units (dollars) and can be compared with the actual prices.
```python
mse = np.mean((predictions - y_test) ** 2)
rmse = np.sqrt(mse)
print("MSE", mse)
print("RMSE", rmse)
```

- MSE: the average of the squared differences between the predicted and actual prices.
- RMSE: the square root of MSE; because it is in the same units as the stock prices (dollars), it is easier to interpret.
Now that we have predictions for the test data, let us visualize the final results.
```python
train = apple[:training]
test = apple[training:].copy()   # .copy() avoids pandas' SettingWithCopyWarning
test['Predictions'] = predictions
```
```python
plt.figure(figsize=(10, 8))
plt.plot(train['date'], train['close'])
plt.plot(test['date'], test[['close', 'Predictions']])
plt.title('Apple Stock Close Price')
plt.xlabel('Date')
plt.ylabel("Close")
plt.legend(['Train', 'Test', 'Predictions'])
plt.show()
```
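As a usage example, you could also forecast one step beyond the dataset by feeding the model the most recent 60 days. This is a hedged sketch reusing the scaler and model defined above; the variable names last_60 and next_day are illustrative:

```python
# Predict the day after the last date in the dataset from the final 60 closes
last_60 = scaled_data[-60:].reshape(1, 60, 1)         # (1, timesteps, features)
next_day_scaled = model.predict(last_60)
next_day = scaler.inverse_transform(next_day_scaled)  # back to dollars
print("Predicted next close:", next_day[0, 0])
```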
###############################################################################################
The chart shows Apple's stock closing price over time, with the "Train" data representing the historical prices used for model training, the "Test" data used for evaluation, and "Predictions" showing the model's forecasted values. It visually demonstrates how well the model's predictions align with the actual stock prices, highlighting regions of accurate forecasting and of divergence.