# Stock Price Prediction Using Machine Learning and Time Series Analysis

## Introduction

This project aims to predict stock prices using machine learning techniques and time series analysis. Accurate stock price prediction is vital for investors seeking to make informed decisions. By leveraging historical stock data, we can develop models that forecast future price movements.

We will use the following libraries and tools:

- **Quandl** and **yFinance** for data acquisition.
- **Pandas** and **NumPy** for data manipulation.
- **Scikit-learn** for machine learning models like Linear Regression and SVR.
- **Keras** and **TensorFlow** for deep learning models.
- **Plotly** for visualization.

The steps include:
1. **Data Collection**: Obtaining historical stock prices.
2. **Data Preprocessing**: Cleaning and preparing the data.
3. **Model Training**: Building and training machine learning models.
4. **Evaluation**: Assessing model performance.
5. **Visualization**: Plotting actual vs. predicted prices.

Our goal is to create a model that reliably predicts stock prices, aiding investors in decision-making.


### Importing Libraries for Time Series Analysis and Forecasting


In [22]:
import pandas as pd
import numpy as np
import tensorflow as tf
import keras
from keras.preprocessing.sequence import TimeseriesGenerator
import yfinance as yf


Next, we download the full daily price history (`period='max'`) for the NSE-listed ticker `SUNPHARMA.NS` from Yahoo Finance using the yFinance library.


In [23]:
df = yf.download('SUNPHARMA.NS', period = 'max')



yFinance returns the dates as the DataFrame index. We reset the index so that `Date` becomes an ordinary column alongside the prices.


In [24]:
df.reset_index(inplace=True)

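Below is a minimal illustration of what `reset_index` does to a date-indexed frame like the one yFinance returns. It uses a small synthetic frame built from three closing prices in the `df.tail()` output below, not a real download. The index moves into a `Date` column and is replaced by a 0-based `RangeIndex`.

```python
import pandas as pd

# Synthetic stand-in for the yFinance output: the dates live in an index named "Date"
dates = pd.date_range("2024-05-22", periods=3, freq="B", name="Date")
prices = pd.DataFrame({"Close": [1539.3, 1495.1, 1486.7]}, index=dates)

flat = prices.reset_index()  # the "Date" index becomes an ordinary column

print(flat.columns.tolist())      # ['Date', 'Close']
print(type(flat.index).__name__)  # RangeIndex
```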

In [25]:
df.shape

(7136, 7)

In [26]:
df.head()

|   | Date | Open | High | Low | Close | Adj Close | Volume |
|---|------|------|------|-----|-------|-----------|--------|
| 0 | 1996-01-01 | 1.812932 | 1.804799 | 1.80325 | 1.80325 | 1.414609 | 38730 |
| 1 | 1996-01-02 | 1.800926 | 1.800926 | 1.743219 | 1.743219 | 1.367516 | 77460 |
| 2 | 1996-01-03 | 1.743219 | 1.750578 | 1.750578 | 1.750578 | 1.37329 | 12910 |
| 3 | 1996-01-04 | 1.746705 | 1.758324 | 1.742832 | 1.742832 | 1.367213 | 64550 |
| 4 | 1996-01-05 | 1.738572 | 1.738572 | 1.738572 | 1.738572 | 1.363871 | 12910 |


In [27]:
df.tail()

|   | Date | Open | High | Low | Close | Adj Close | Volume |
|---|------|------|------|-----|-------|-----------|--------|
| 7131 | 2024-05-22 | 1547.0 | 1564.0 | 1505.400024 | 1539.300049 | 1539.300049 | 3948239 |
| 7132 | 2024-05-23 | 1510.0 | 1510.0 | 1467.0 | 1495.099976 | 1495.099976 | 11618479 |
| 7133 | 2024-05-24 | 1504.0 | 1505.699951 | 1477.099976 | 1486.699951 | 1486.699951 | 5307322 |
| 7134 | 2024-05-27 | 1486.699951 | 1501.0 | 1460.550049 | 1466.050049 | 1466.050049 | 3474806 |
| 7135 | 2024-05-28 | 1468.949951 | 1479.099976 | 1455.949951 | 1460.5 | 1460.5 | 937829 |


In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7136 entries, 0 to 7135
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       7136 non-null   datetime64[ns]
 1   Open       7136 non-null   float64       
 2   High       7136 non-null   float64       
 3   Low        7136 non-null   float64       
 4   Close      7136 non-null   float64       
 5   Adj Close  7136 non-null   float64       
 6   Volume     7136 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 390.4 KB


___

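The `Date` column already has the `datetime64[ns]` dtype (see `df.info()` above). We make that explicit with `pd.to_datetime` and keep `Date` as a regular column rather than the index, because the later cells read it as `df['Date']`. We then drop the price and volume columns that are not needed for our analysis.


In [29]:
# Keep the real timestamps; after reset_index(), df.index is only 0..n-1
df['Date'] = pd.to_datetime(df['Date'])
df.drop(columns=['Open', 'High', 'Low', 'Volume'], inplace=True)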


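After `reset_index`, `df.index` is just the row counter `0, 1, 2, ...`. Assigning it to `df['Date']` would therefore replace the timestamps with integers. The small check below uses a hypothetical two-row frame to show the difference between that assignment and `pd.to_datetime`.

```python
import pandas as pd

# Hypothetical frame shaped like df after reset_index()
frame = pd.DataFrame({
    "Date": pd.to_datetime(["2024-05-27", "2024-05-28"]),
    "Close": [1466.05, 1460.5],
})

overwritten = frame.copy()
overwritten["Date"] = overwritten.index       # RangeIndex values 0, 1: the dates are lost

kept = frame.copy()
kept["Date"] = pd.to_datetime(kept["Date"])   # no-op here, but guarantees a datetime dtype

print(overwritten["Date"].tolist())   # [0, 1]
print(kept["Date"].dt.year.tolist())  # [2024, 2024]
```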

Next, we prepare the closing prices for model training.

We extract the `Close` column as a NumPy array and reshape it into a single-feature column of shape `(n, 1)`, the layout expected by the generator and the LSTM below. We then split it chronologically: the first 80% of trading days form the training set and the remaining 20% form the test set. The data is not shuffled, so the model is always evaluated on days that come after everything it was trained on.




In [31]:
closing_prices = df['Close'].values
closing_prices = closing_prices.reshape((-1, 1))

train_test_split_ratio = 0.80
split_index = int(train_test_split_ratio * len(closing_prices))

train_data = closing_prices[:split_index]
test_data = closing_prices[split_index:]

train_dates = df['Date'][:split_index]
test_dates = df['Date'][split_index:]

print(len(train_data))
print(len(test_data))

5708
1428

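The split above is purely positional, so no future prices leak into training. Here is a small sketch of the same logic as a reusable helper. The name `chronological_split` is hypothetical, and the helper is checked on a toy series and on the notebook's row count.

```python
import numpy as np

def chronological_split(values: np.ndarray, ratio: float = 0.80):
    """Split a time-ordered array into (train, test) without shuffling."""
    split_index = int(ratio * len(values))
    return values[:split_index], values[split_index:]

series = np.arange(10, dtype=float).reshape((-1, 1))  # toy prices 0..9 as a column
train, test = chronological_split(series)

print(train.shape, test.shape)    # (8, 1) (2, 1)
# Every training observation precedes every test observation
print(train[-1, 0] < test[0, 0])  # True
```

With 7,136 rows, the same helper reproduces the 5,708 / 1,428 split printed above.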

## Time Series Data Preparation

In this section, we prepare time series data for training and testing using the `TimeseriesGenerator` from Keras. This approach involves generating batches of temporal sequences, which are essential for training recurrent neural networks such as LSTM models.

We set a `look_back` window of 15 time steps. Each input sample is a sequence of 15 consecutive closing prices, and its target is the closing price on the following trading day. This lets the model learn short-term patterns within that window.

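To make the windowing concrete, here is a plain-NumPy sketch of the (input window, target) pairs that `TimeseriesGenerator(data, data, length=look_back)` produces with its default `sampling_rate=1` and `stride=1`. The helper name `make_windows` is hypothetical and is not part of Keras.

```python
import numpy as np

def make_windows(series: np.ndarray, look_back: int):
    """Return X of shape (n - look_back, look_back, 1) and y of shape (n - look_back, 1).

    Sample i uses series[i : i + look_back] as input and series[i + look_back] as target.
    Assumes len(series) > look_back.
    """
    series = series.reshape((-1, 1))
    X = np.stack([series[i:i + look_back] for i in range(len(series) - look_back)])
    y = series[look_back:]
    return X, y

X, y = make_windows(np.arange(6, dtype=float), look_back=3)
print(X[:, :, 0])
# [[0. 1. 2.]
#  [1. 2. 3.]
#  [2. 3. 4.]]
print(y[:, 0])  # [3. 4. 5.]
```

The generator produces the same pairs lazily, batch by batch, instead of materializing every window in memory at once.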
### Training Data Generator

We create a `TimeseriesGenerator` for the training data, passing `train_data` as both the input series and the target series, with windows of length `look_back`. Each batch holds 20 (window, target) pairs; the final batch may be smaller.

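Here is a quick arithmetic check of what these settings imply for the 5,708 training and 1,428 test observations. Each generator yields `len(data) - look_back` samples, grouped into batches of 20. The helper below is a hypothetical illustration, not a Keras function.

```python
import math

def generator_stats(n_obs: int, look_back: int, batch_size: int):
    """Samples and batches for a TimeseriesGenerator with default stride and sampling_rate."""
    n_samples = n_obs - look_back          # the first look_back points have no full window
    n_batches = math.ceil(n_samples / batch_size)
    return n_samples, n_batches

print(generator_stats(5708, 15, 20))  # (5693, 285)
print(generator_stats(1428, 15, 20))  # (1413, 71)
```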
### Testing Data Generator

Similarly, a `TimeseriesGenerator` is created for the testing data, `test_data`, ensuring consistency in data preparation across training and evaluation phases.

These generators will facilitate the training and evaluation of our machine learning model on time series data.

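Since the test windows are built from `test_data` alone, the test generator has no targets for the first `look_back` test days. A common alternative, not used in this notebook, is to prepend the last `look_back` training values so that every test day gets a full input window. Below is a NumPy sketch of that idea; the name `with_context` is hypothetical.

```python
import numpy as np

def with_context(train: np.ndarray, test: np.ndarray, look_back: int) -> np.ndarray:
    """Prefix the test series with the final look_back training points."""
    return np.concatenate([train[-look_back:], test])

train = np.arange(20, dtype=float).reshape((-1, 1))
test = np.arange(20, 26, dtype=float).reshape((-1, 1))
extended = with_context(train, test, look_back=5)

# A window generator over `extended` yields len(extended) - look_back samples,
# i.e. exactly one target per test day, starting with test[0].
n_targets = len(extended) - 5
print(extended.shape, n_targets)  # (11, 1) 6
```

The inputs still contain only values observed before each target, so this changes coverage, not the one-step-ahead nature of the evaluation.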

In [32]:
look_back = 15
train_generator = TimeseriesGenerator(train_data, train_data, length=look_back, batch_size=20)
test_generator = TimeseriesGenerator(test_data, test_data, length=look_back, batch_size=20)

## LSTM Model Training

In this section, we define and train a Long Short-Term Memory (LSTM) neural network model using Keras. LSTMs are a type of recurrent neural network (RNN) particularly effective for sequence prediction tasks, making them suitable for time series analysis.

### Model Architecture

We construct a Sequential model in Keras, which allows us to build the model layer by layer:
- The first layer is an LSTM layer with 10 units and ReLU activation, expecting input sequences of `look_back` time steps with 1 feature.
- The output of the LSTM layer is passed to a Dense layer with 1 unit, which outputs a single value prediction.
  
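For a sense of the model's size, a Keras LSTM layer has four gate blocks, each with an input kernel, a recurrent kernel and a bias. That gives `4 * units * (units + input_dim + 1)` weights, and a Dense layer has `inputs * units + units`. The plain-Python helpers below are hypothetical and apply these formulas to the architecture above.

```python
def lstm_params(units: int, input_dim: int) -> int:
    # 4 gate blocks x (input kernel + recurrent kernel + bias)
    return 4 * units * (units + input_dim + 1)

def dense_params(inputs: int, units: int) -> int:
    return inputs * units + units  # kernel + bias

lstm = lstm_params(units=10, input_dim=1)
dense = dense_params(inputs=10, units=1)
print(lstm, dense, lstm + dense)  # 480 11 491
```

Comparing these numbers with `model.summary()` is a quick way to confirm the layer shapes.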
### Compilation

The model is compiled using the Adam optimizer and Mean Squared Error (MSE) loss function, suitable for regression tasks.

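The MSE loss averages the squared differences between targets and predictions, so large errors are penalized more heavily than small ones. Here is a minimal NumPy version on made-up values.

```python
import numpy as np

def mse(y_true, y_pred) -> float:
    """Mean squared error: mean of (y_true - y_pred) ** 2."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

print(mse([100.0, 102.0, 101.0], [101.0, 100.0, 101.0]))  # (1 + 4 + 0) / 3, about 1.667
```

Because the prices in this notebook are not scaled, the training loss is measured in squared price units.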
### Training

We train the model on `train_generator`, which Keras iterates over batch by batch, for 25 epochs with `verbose=1` so that training progress is displayed. In TensorFlow 2.x, `Model.fit` accepts generators directly, and the older `fit_generator` method is deprecated.

This setup enables us to train an LSTM model to predict future values based on historical time series data.


In [33]:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(
    LSTM(10, activation='relu', input_shape = (look_back, 1))
)

model.add(Dense(1))
model.compile(optimizer='adam', loss = 'mse')

num_epochs = 25
model.fit(train_generator, epochs=num_epochs, verbose=1)  # fit() accepts generators; fit_generator() is deprecated



Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.src.callbacks.History at 0x25889c5cdd0>

Next, we use Plotly to visualize the LSTM model's predictions for the test period.

- It first predicts the stock prices using the trained model and the test data.
- The actual and predicted stock prices, along with the training data, are plotted on a line chart.
- The x-axis represents the dates, while the y-axis represents the closing prices.
- The title of the chart is set to "SUNPHARMA.NS".


In [34]:
import plotly.graph_objects as go


In [35]:

predicted_values = model.predict(test_generator)  # predict() accepts generators; predict_generator() is deprecated

train_data = train_data.reshape((-1))
test_data = test_data.reshape((-1))
predicted_values = predicted_values.reshape((-1))

actual_trace = go.Scatter(
    x = train_dates,
    y = train_data,
    mode = 'lines',
    name = 'Data'
)
predicted_trace = go.Scatter(
    x = test_dates.iloc[look_back:],  # the first look_back test days have no prediction
    y = predicted_values,
    mode = 'lines',
    name = 'Prediction'
)
actual_price_trace = go.Scatter(
    x = test_dates,
    y = test_data,
    mode='lines',
    name = 'Actual Price'
)
plot_layout = go.Layout(
    title = "SUNPHARMA.NS",
    xaxis = {'title' : "Date"},
    yaxis = {'title' : "Close"}
)
fig = go.Figure(data=[actual_trace, predicted_trace, actual_price_trace], layout=plot_layout)
fig.show()




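The plot gives only a visual comparison. One option for quantifying the fit on the test period, not part of the original notebook, is to compute RMSE and MAE on aligned arrays. The predictions cover `test_data[look_back:]`, so the first `look_back` actual values must be skipped. The sketch assumes 1-D NumPy arrays like `test_data` and `predicted_values` above; `evaluate_predictions` is a hypothetical name.

```python
import numpy as np

def evaluate_predictions(actual, predicted, look_back: int) -> dict:
    """RMSE and MAE of one-step-ahead predictions against the aligned actual values."""
    aligned = np.asarray(actual, dtype=float)[look_back:]  # first look_back days have no prediction
    predicted = np.asarray(predicted, dtype=float)
    if aligned.shape != predicted.shape:
        raise ValueError(f"shape mismatch: {aligned.shape} vs {predicted.shape}")
    errors = aligned - predicted
    return {
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mae": float(np.mean(np.abs(errors))),
    }

# Toy check: look_back=2 and 5 actual points leave 3 predictions to compare
print(evaluate_predictions([10, 11, 12, 13, 14], [12, 14, 13], look_back=2))
# rmse = sqrt(2/3), about 0.816; mae = 2/3, about 0.667
```

In the notebook this would be called as `evaluate_predictions(test_data, predicted_values, look_back)` once both arrays have been flattened.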


- **Reshaping Data**: The closing prices are flattened into the 1-D array `reshaped_close_data`, whose last `look_back` values seed the forecast.

- **Prediction Function (`predict_future_values`)**:
  - `predict_future_values` generates future predictions (`prediction_list`) starting from the last `look_back` values of `reshaped_close_data`.
  - At each of the `num_prediction` steps, the LSTM model (`model`) predicts the next value from the most recent `look_back` values. That prediction is appended and fed back in as input for the following step.
  - The returned array begins with the last observed price, followed by the `num_prediction` forecasts.

- **Date Prediction Function (`generate_prediction_dates`)**:
  - `generate_prediction_dates` takes the last date in the dataset (`last_date`) and generates `num_prediction + 1` dates (`prediction_dates`) starting from it, one for each value returned by `predict_future_values`.

- **Forecasting**:
  - `num_prediction` is set to 30, indicating the number of future periods to forecast.
  - `forecast_values` stores the predicted values obtained from `predict_future_values`.
  - `forecast_dates` stores the corresponding dates for these predictions obtained from `generate_prediction_dates`.

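The loop in `predict_future_values` is recursive multi-step forecasting: each prediction becomes part of the next input window, so errors can compound over the 30 steps. The sketch below reproduces the same loop with a stand-in predictor, a hypothetical `predict_next` callable used in place of the Keras model. This lets the indexing be checked without training anything.

```python
from typing import Callable

import numpy as np

def recursive_forecast(history, predict_next: Callable[[np.ndarray], float],
                       look_back: int, num_prediction: int) -> np.ndarray:
    """Return [last observed value, forecast_1, ..., forecast_num_prediction]."""
    values = np.asarray(history, dtype=float)[-look_back:]
    for _ in range(num_prediction):
        window = values[-look_back:]               # latest window, may include earlier forecasts
        values = np.append(values, predict_next(window))
    return values[look_back - 1:]                  # keep only the last observed value and forecasts

# Stand-in "model": next value = last value + 1
out = recursive_forecast(np.arange(20), lambda w: w[-1] + 1.0, look_back=5, num_prediction=3)
print(out)  # [19. 20. 21. 22.]
```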


In [36]:
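# Seed the forecast with the most recent prices (the full series, not only the training split),
# because the forecast dates start right after the last date in df
reshaped_close_data = closing_prices.reshape((-1))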

def predict_future_values(num_prediction, model):
    prediction_list = reshaped_close_data[-look_back:]
    
    for _ in range(num_prediction):
        x = prediction_list[-look_back:]
        x = x.reshape((1, look_back, 1))
        out = model.predict(x, verbose=0)[0][0]  # verbose=0 silences the per-step progress bar
        prediction_list = np.append(prediction_list, out)
        
    prediction_list = prediction_list[look_back-1:]
        
    return prediction_list
    
def generate_prediction_dates(num_prediction):
    last_date = df['Date'].values[-1]
    
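    # 'B' (business days) matches one model step per trading day; NSE holidays are not excluded
    prediction_dates = pd.date_range(last_date, periods=num_prediction+1, freq='B').tolist()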
    return prediction_dates

num_prediction = 30

forecast_values = predict_future_values(num_prediction, model)
forecast_dates = generate_prediction_dates(num_prediction)

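The model steps forward one trading day at a time, so labelling the forecast with consecutive calendar days (the default `freq='D'` of `pd.date_range`) places some forecasts on weekends. The quick comparison below uses the business-day frequency `freq='B'`. It skips weekends, but as an approximation it does not skip NSE exchange holidays.

```python
import pandas as pd

start = pd.Timestamp("2024-05-28")  # last date in the downloaded data (a Tuesday)

calendar = pd.date_range(start, periods=6)            # default freq='D'
business = pd.date_range(start, periods=6, freq="B")  # Monday to Friday only

print([d.strftime("%a %d") for d in calendar])  # ['Tue 28', 'Wed 29', 'Thu 30', 'Fri 31', 'Sat 01', 'Sun 02']
print([d.strftime("%a %d") for d in business])  # ['Tue 28', 'Wed 29', 'Thu 30', 'Fri 31', 'Mon 03', 'Tue 04']
```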



We now visualize the full closing-price history together with the forecast using Plotly.

- **Predictions**: The `model` recomputes the test-period predictions (`predicted_values`) from `test_generator`; they are not used in this particular plot.
  
- **Data Reshaping**: The arrays (`train_data`, `test_data`, `predicted_values`) are flattened to 1-D with `reshape((-1))` so that they match the 1-D date arrays used for the x-axis.
  
- **Plotting**:
  - **Trace 1 (`trace1`)**: Displays the actual historical data (`df['Close']`) over time (`df['Date']`).
  
  - **Trace 2 (`trace2`)**: Plots the forecasted values (`forecast_values`) against their corresponding dates (`forecast_dates`).
  
- **Layout**: Defines the layout (`layout`) of the plot, including the title ("SUNPHARMA") and axis labels for date and closing prices (`xaxis` and `yaxis`).

- **Figure**: Combines the traces (`trace1` and `trace2`) with the layout (`layout`) to create a Figure (`fig`).

- **Display**: Finally, `fig.show()` displays the interactive plot using Plotly.



In [37]:
import plotly.graph_objects as go

predicted_values = model.predict(test_generator)  # predict() accepts generators; predict_generator() is deprecated

train_data = train_data.reshape((-1))
test_data = test_data.reshape((-1))
predicted_values = predicted_values.reshape((-1))

trace1 = go.Scatter(
    x = df['Date'],
    y = df['Close'],
    mode = 'lines',
    name = 'Data'
)

trace2 = go.Scatter(
    x = forecast_dates,
    y = forecast_values,
    mode = 'lines',
    name = 'Prediction'
)

layout = go.Layout(
    title = "SUNPHARMA",
    xaxis = {'title' : "Date"},
    yaxis = {'title' : "Close"}
)

fig = go.Figure(data=[trace1, trace2], layout=layout)
fig.show()




