# TIME SERIES ANALYSIS AND FORECASTING FOR STOCK MARKET - Ashish

Import the necessary libraries

In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from prophet import Prophet
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import plotly.graph_objects as go  # needed by the go.Figure charts in the Visualisations section

In [14]:
!pip install prophet



Training and testing the models with random stock data

In [15]:

np.random.seed(42)
date_rng = pd.date_range(start='2018-01-01', end='2023-12-31', freq='B')
random_walk = np.random.randn(len(date_rng)).cumsum()
trend = np.linspace(start=0, stop=150, num=len(date_rng))
stock_price = 300 + random_walk * 0.75 + trend

stock_df = pd.DataFrame(stock_price, index=date_rng, columns=['Close'])

test_size = 100
train_data = stock_df[:-test_size]
test_data = stock_df[-test_size:]

# Models

ARIMA Model: An ARIMA model was implemented, trained on the training data, and used to forecast stock prices on the test set. The Root Mean Squared Error (RMSE) was calculated to evaluate its performance.

In [16]:
#ARIMA
model = ARIMA(train_data['Close'], order=(5, 1, 0))
model_fit = model.fit()
arima_predictions = model_fit.forecast(steps=test_size)
arima_rmse = np.sqrt(mean_squared_error(test_data['Close'], arima_predictions))
print(f"ARIMA Model RMSE: {arima_rmse:.4f}")

ARIMA Model RMSE: 12.4180
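The RMSE reported above is the square root of the mean squared difference between actual and predicted prices, so it is expressed in the same units as the price. A minimal sketch of that calculation follows; the helper name `rmse` and the toy values are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of elements")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values, not taken from the notebook's data
example = rmse([100.0, 102.0, 104.0], [101.0, 102.0, 102.0])
```

This should agree with `np.sqrt(mean_squared_error(...))` as used in the cells, since both take the square root of the same mean squared error.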


Prophet Model: A Prophet model was implemented, trained, and used for forecasting. Its RMSE was also calculated. (Note: There were initial syntax and data preparation issues that were resolved during the process).

In [17]:
#PROPHET
prophet_train_df = train_data.reset_index().rename(columns={'index': 'ds', 'Close': 'y'})
prophet_model = Prophet()
prophet_model.fit(prophet_train_df)
future = prophet_model.make_future_dataframe(periods=test_size, freq='B')
forecast = prophet_model.predict(future)
prophet_predictions = forecast['yhat'][-test_size:]
prophet_rmse = np.sqrt(mean_squared_error(test_data['Close'], prophet_predictions))
print(f"Prophet Model RMSE: {prophet_rmse:.4f}")

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Prophet Model RMSE: 8.8289


LSTM Model: An LSTM (Long Short-Term Memory) model, a type of neural network suitable for sequence data, was implemented. The data was scaled and prepared into sequences for the LSTM. The model was trained and used for prediction, and its RMSE was calculated.

In [18]:
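# LSTM
# Note: the scaler is fit on the full series (train + test), so the test-period
# price range leaks into the scaling; fitting on train_data only would be stricter.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(stock_df['Close'].values.reshape(-1, 1))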

sequence_length = 60
X, y = [], []
for i in range(sequence_length, len(scaled_data)):
    X.append(scaled_data[i-sequence_length:i, 0])
    y.append(scaled_data[i, 0])

X, y = np.array(X), np.array(y)
X = np.reshape(X, (X.shape[0], X.shape[1], 1))

X_train, X_test = X[:-test_size], X[-test_size:]
y_train, y_test = y[:-test_size], y[-test_size:]

lstm_model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], 1)),
    LSTM(50, return_sequences=False),
    Dense(25),
    Dense(1)
])

lstm_model.compile(optimizer='adam', loss='mean_squared_error')
lstm_model.fit(X_train, y_train, batch_size=32, epochs=20, verbose=0)
lstm_predictions_scaled = lstm_model.predict(X_test)
lstm_predictions = scaler.inverse_transform(lstm_predictions_scaled)
lstm_rmse = np.sqrt(mean_squared_error(test_data['Close'], lstm_predictions))
print(f"LSTM Model RMSE: {lstm_rmse:.4f}")

4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 122ms/step
LSTM Model RMSE: 3.6556
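The sequence preparation in the LSTM cell turns a one-dimensional price series into overlapping windows, each paired with the value that immediately follows it. The sketch below shows the same idea on a toy series; `make_windows` is a hypothetical helper name, and the window length of 3 is chosen only to keep the example readable (the notebook uses 60).

```python
import numpy as np

def make_windows(values, window):
    """Return (X, y): X[k] holds `window` consecutive values, y[k] the value that follows."""
    values = np.asarray(values, dtype=float)
    if len(values) <= window:
        raise ValueError("need more observations than the window length")
    X = np.stack([values[i - window:i] for i in range(window, len(values))])
    y = values[window:]
    # Add a trailing feature axis: (samples, timesteps, features)
    return X[..., np.newaxis], y

series = np.arange(10, dtype=float)  # toy series 0..9
X, y = make_windows(series, window=3)
```

Here `X` has shape `(7, 3, 1)` and `y` has shape `(7,)`; the first window is `[0, 1, 2]` with target `3`. Every target is predicted from actual preceding values, so each prediction is one step ahead.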


# Visualisations

In [33]:
models = ['ARIMA', 'Prophet', 'LSTM']
rmse_values = [arima_rmse, prophet_rmse, lstm_rmse]

fig = go.Figure(data=[go.Bar(x=models, y=rmse_values, marker_color=['blue', 'green', 'red'])])
fig.update_layout(
    title='Comparison of Model RMSE Values',
    xaxis_title='Model',
    yaxis_title='RMSE'
)
fig.show()

In [36]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=test_data.index, y=test_data['Close'], mode='lines', name='Actual Prices'))
fig.add_trace(go.Scatter(x=test_data.index, y=arima_predictions, mode='lines', name='ARIMA Predictions'))
fig.add_trace(go.Scatter(x=test_data.index, y=prophet_predictions, mode='lines', name='Prophet Predictions'))
fig.add_trace(go.Scatter(x=test_data.index, y=lstm_predictions.flatten(), mode='lines', name='LSTM Predictions')) # Flatten predictions for Plotly

fig.update_layout(
    title='Model Predictions vs. Actual Prices',
    xaxis_title='Date',
    yaxis_title='Close Price'
)
fig.show()

## Summary and Analysis

This notebook carries out a time series analysis and forecasting task on stock market data using three different models: ARIMA, Prophet, and LSTM.

**Workflow Followed:**

1.  **Library Imports:** Necessary libraries for data manipulation, modeling, and visualization were imported (pandas, numpy, matplotlib, plotly, statsmodels, sklearn, prophet, tensorflow/keras).
2.  **Data Loading and Preparation:**
    *   Initially, synthetic stock price data (a scaled random walk plus a linear trend over business days from 2018 to 2023) was generated for demonstration purposes.
    *   In the second part of the notebook, an uploaded CSV file is read into a pandas DataFrame, and its shape is verified.
    *   The data was split into training and testing sets, with the final 100 observations held out, to evaluate the models on unseen data.
3.  **Model Implementation and Evaluation:**
    *   **ARIMA Model:** An ARIMA model was implemented, trained on the training data, and used to forecast stock prices on the test set. The Root Mean Squared Error (RMSE) was calculated to evaluate its performance.
    *   **Prophet Model:** A Prophet model was implemented, trained, and used for forecasting. Its RMSE was also calculated.
    *   **LSTM Model:** An LSTM (Long Short-Term Memory) model, a type of neural network suitable for sequence data, was implemented. The data was scaled and prepared into sequences for the LSTM. The model was trained and used for prediction, and its RMSE was calculated.
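The scaling step in the workflow can also be done with statistics taken from the training rows only, so that nothing from the held-out period influences preprocessing. The sketch below illustrates that variant with plain NumPy; it is an alternative to, not a description of, the notebook's cell, and the function names and toy prices are assumptions.

```python
import numpy as np

def fit_minmax(train_values):
    """Return (min, max) computed from the training values only."""
    train_values = np.asarray(train_values, dtype=float)
    lo, hi = train_values.min(), train_values.max()
    if hi == lo:
        raise ValueError("training values are constant; min-max scaling is undefined")
    return lo, hi

def scale(values, lo, hi):
    """Map values so that lo -> 0 and hi -> 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert scale()."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

prices = np.array([10.0, 12.0, 14.0, 20.0, 22.0])  # toy data
train, test = prices[:3], prices[3:]
lo, hi = fit_minmax(train)
scaled_test = scale(test, lo, hi)
```

Test values can fall outside `[0, 1]` under this scheme (here `2.5` and `3.0`), which is expected when prices trend upward. With scikit-learn the equivalent is to call `fit` on the training rows and `transform` on all rows.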

The performance of a model is highly dependent on the specific dataset. While LSTM had the lowest RMSE here (3.66, versus 8.83 for Prophet and 12.42 for ARIMA), ARIMA or Prophet might be more suitable for other time series datasets.

LSTM might offer higher accuracy (lower RMSE) by capturing complex patterns, but ARIMA and Prophet are generally more computationally efficient and faster to train and predict with, which makes them suitable where speed and resource constraints are critical. The choice of model often involves a trade-off between accuracy and efficiency, and should also take the characteristics of the time series into account.

# Using your own stock dataset for analysis

Imports

In [32]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from prophet import Prophet
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import plotly.graph_objects as go

In [22]:
!pip install prophet



Choosing file from local machine

In [23]:
from google.colab import files

uploaded = files.upload()

# Assuming only one file is uploaded, get its name
if uploaded:
  file_name = next(iter(uploaded))
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=file_name, length=len(uploaded[file_name])))

  try:
      stock_df = pd.read_csv(file_name, parse_dates=['Date'], index_col='Date')
  except FileNotFoundError:
      print(f"Error: '{file_name}' not found after upload.")
  except KeyError as e:
      print(f"Error: Missing expected column(s) in the CSV: {e}")
  except Exception as e:
      print(f"An unexpected error occurred while reading the CSV: {e}")
else:
  print("No file was uploaded.")

Saving ASIANPAINT.csv to ASIANPAINT.csv
User uploaded file "ASIANPAINT.csv" with length 622754 bytes


In [24]:
print(stock_df.shape)

test_size = 100
train_data = stock_df[:-test_size]
test_data = stock_df[-test_size:]

(5306, 14)


MODELS

In [25]:

model = ARIMA(train_data['Close'], order=(5, 1, 0))
model_fit = model.fit()
arima_predictions = model_fit.forecast(steps=test_size)
arima_rmse = np.sqrt(mean_squared_error(test_data['Close'], arima_predictions))
print(f"ARIMA Model RMSE: {arima_rmse:.4f}")

prophet_train_df = train_data.reset_index().rename(columns={'Date': 'ds', 'Close': 'y'})
prophet_model = Prophet()
prophet_model.fit(prophet_train_df)
future = prophet_model.make_future_dataframe(periods=test_size, freq='B')
forecast = prophet_model.predict(future)
prophet_predictions = forecast['yhat'][-test_size:]
prophet_rmse = np.sqrt(mean_squared_error(test_data['Close'], prophet_predictions))
print(f"Prophet Model RMSE: {prophet_rmse:.4f}")


scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(stock_df['Close'].values.reshape(-1, 1))

sequence_length = 60
X, y = [], []
for i in range(sequence_length, len(scaled_data)):
    X.append(scaled_data[i-sequence_length:i, 0])
    y.append(scaled_data[i, 0])

X, y = np.array(X), np.array(y)
X = np.reshape(X, (X.shape[0], X.shape[1], 1))

X_train, X_test = X[:-test_size], X[-test_size:]
y_train, y_test = y[:-test_size], y[-test_size:]

lstm_model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], 1)),
    LSTM(50, return_sequences=False),
    Dense(25),
    Dense(1)
])

lstm_model.compile(optimizer='adam', loss='mean_squared_error')
lstm_model.fit(X_train, y_train, batch_size=32, epochs=20, verbose=0)
lstm_predictions_scaled = lstm_model.predict(X_test)
lstm_predictions = scaler.inverse_transform(lstm_predictions_scaled)
lstm_rmse = np.sqrt(mean_squared_error(test_data['Close'], lstm_predictions))

print(f"ARIMA Model RMSE: {arima_rmse:.4f}")
print(f"Prophet Model RMSE: {prophet_rmse:.4f}")
print(f"LSTM Model RMSE: {lstm_rmse:.4f}")

ARIMA Model RMSE: 253.8014
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Prophet Model RMSE: 644.7489
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 117ms/step
ARIMA Model RMSE: 253.8014
Prophet Model RMSE: 644.7489
LSTM Model RMSE: 52.1509
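The Prophet forecast in the cell above is matched to the test set by position: the future frame is built with `freq='B'` and the last `test_size` rows of `yhat` are compared with `test_data['Close']`. For real exchange data this only lines up if the test dates are exactly the next business days after the training period, which market holidays can break. The check below is a rough sketch of how one might verify that; `business_days_match` is a hypothetical helper and the dates are invented for illustration.

```python
import pandas as pd

def business_days_match(train_index, test_index):
    """True if the test dates are exactly the next business days after training ends."""
    last_train = pd.Timestamp(train_index[-1])
    n = len(test_index)
    candidates = pd.date_range(start=last_train, periods=n + 1, freq="B")
    candidates = candidates[candidates > last_train][:n]
    return candidates.tolist() == pd.DatetimeIndex(test_index).tolist()

train_idx = pd.bdate_range("2023-01-02", periods=5)   # Mon 2 Jan .. Fri 6 Jan
aligned = pd.bdate_range("2023-01-09", periods=5)     # the following Mon .. Fri
with_holiday = pd.DatetimeIndex(
    ["2023-01-09", "2023-01-11", "2023-01-12", "2023-01-13", "2023-01-16"]
)  # hypothetical market holiday on Tue 10 Jan

ok = business_days_match(train_idx, aligned)
shifted = business_days_match(train_idx, with_holiday)
```

If a check like this returned `False` for `train_data.index` and `test_data.index`, one option would be to pass a frame whose `ds` column holds the actual test dates to `prophet_model.predict` instead of relying on row position.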


In [37]:

# Comparing RMSE values using a bar chart
models = ['ARIMA', 'Prophet', 'LSTM']
rmse_values = [arima_rmse, prophet_rmse, lstm_rmse]

fig = go.Figure(data=[go.Bar(x=models, y=rmse_values, marker_color=['blue', 'green', 'red'])])
fig.update_layout(
    title='Comparison of Model RMSE Values',
    xaxis_title='Model',
    yaxis_title='RMSE'
)
fig.show()


Plotting the historical stock prices to understand the trend and seasonality.


In [38]:

fig = go.Figure(data=go.Scatter(x=stock_df.index, y=stock_df['Close'], mode='lines'))
fig.update_layout(
    title='Historical Stock Prices',
    xaxis_title='Date',
    yaxis_title='Close Price'
)
fig.show()


Plotting the actual test data and the LSTM predictions to visually assess the model's performance.


In [39]:

fig = go.Figure()
fig.add_trace(go.Scatter(x=test_data.index, y=test_data['Close'], mode='lines', name='Actual Prices'))
fig.add_trace(go.Scatter(x=test_data.index, y=lstm_predictions.flatten(), mode='lines', name='LSTM Predictions'))

fig.update_layout(
    title='LSTM Model Predictions vs. Actual Prices',
    xaxis_title='Date',
    yaxis_title='Close Price'
)
fig.show()


Create a single plot comparing the actual test data with the predictions from all three models for easier comparison.


In [40]:

fig = go.Figure()
fig.add_trace(go.Scatter(x=test_data.index, y=test_data['Close'], mode='lines', name='Actual Prices'))
fig.add_trace(go.Scatter(x=test_data.index, y=arima_predictions, mode='lines', name='ARIMA Predictions'))
fig.add_trace(go.Scatter(x=test_data.index, y=prophet_predictions, mode='lines', name='Prophet Predictions'))
fig.add_trace(go.Scatter(x=test_data.index, y=lstm_predictions.flatten(), mode='lines', name='LSTM Predictions')) # Flatten predictions for Plotly

fig.update_layout(
    title='Model Predictions vs. Actual Prices',
    xaxis_title='Date',
    yaxis_title='Close Price'
)
fig.show()

## Conclusion

This notebook applied time series analysis and forecasting to stock market data using three distinct models: ARIMA, Prophet, and LSTM. The process involved loading and preparing stock data, implementing each model, making predictions on a test set, evaluating performance using Root Mean Squared Error (RMSE), and visualizing the results.

**Key Findings:**

*   **Data Characteristics:** The initial visualization of the historical stock prices revealed a clear upward trend, indicating the non-stationary nature of the data, which is a common characteristic of financial time series.
*   **Model Performance (Based on RMSE):** The RMSE values calculated for each model on the test dataset provide a quantitative measure of their forecasting accuracy. The LSTM model had the lowest RMSE (52.15, versus 253.80 for ARIMA and 644.75 for Prophet), suggesting it provided the most accurate predictions among the three models on this specific dataset and test period.
*   **Visual Assessment of Predictions:** The plots comparing the actual stock prices with the predictions from each model visually reinforce the RMSE findings. The LSTM model's predictions appear to track the actual price movements more closely than those of the ARIMA and Prophet models, particularly in capturing some of the volatility. The ARIMA and Prophet models, while capturing the overall trend, might smooth out some of the finer fluctuations.

*   **Model Suitability and Trade-offs:**
    *   **LSTM:** While showing the best performance in terms of RMSE on this dataset, LSTMs are generally more complex and computationally intensive to train. Their ability to capture non-linear patterns and maintain sequence memory likely contributed to their superior performance on this data.
    *   **ARIMA:** ARIMA is a more traditional time series model that is generally faster and less computationally demanding. Its RMSE on this dataset (253.80) was well above the LSTM's (52.15) but below Prophet's (644.75). ARIMA is well suited to data with linear autocorrelation structure; seasonal patterns call for its seasonal extension (SARIMA).
    *   **Prophet:** Prophet is designed to handle time series data with strong seasonality and holiday effects, and it is generally more robust to missing data and outliers than ARIMA. It is computationally light, but on this dataset it had the highest RMSE of the three (644.75).
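The upward trend noted under Data Characteristics is the kind of non-stationarity that the middle term of the ARIMA order `(5, 1, 0)` is meant to address: with d = 1 the model works on first differences (day-to-day changes) rather than price levels. A small sketch with toy numbers, not the notebook's data, shows how differencing removes a linear trend and how the levels can be rebuilt afterwards.

```python
import numpy as np

trend = np.linspace(300.0, 450.0, num=7)   # toy upward-trending prices
diffs = np.diff(trend)                     # first difference: x[t] - x[t-1]

# Undo the differencing: starting value plus the running sum of the changes
rebuilt = np.concatenate(([trend[0]], trend[0] + np.cumsum(diffs)))
```

The differenced series is constant (25.0 at every step) even though the levels rise steadily, and `rebuilt` recovers the original levels.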

**Overall Conclusion:**

For the stock price data analyzed in this notebook, the LSTM model proved to be the most effective in terms of forecasting accuracy, as indicated by its lower RMSE and visual alignment with actual prices. This suggests that the complex patterns and potential non-linearities in this specific stock data were better captured by the deep learning approach of the LSTM.

However, the choice of the best model depends on various factors beyond just RMSE, including the specific characteristics of the time series data, computational resources available, the need for interpretability, and the importance of training and prediction speed. ARIMA and Prophet remain valuable tools for time series forecasting, especially when computational efficiency is a priority or when the data exhibits clear seasonality and trend without strong non-linear complexities.



**Why LSTM Might Have a Lower RMSE:**

Based on the RMSE values observed in the execution, the LSTM model achieved a lower RMSE compared to ARIMA and Prophet. Several factors can contribute to this:

*   **Handling Non-Linearity:** LSTMs, as neural networks, are capable of capturing complex non-linear relationships and patterns in the data that traditional linear models like ARIMA might miss. Stock price movements can be influenced by many non-linear factors.
*   **Sequence Memory:** LSTMs are specifically designed to handle sequential data and have internal memory mechanisms that allow them to retain information from previous time steps. This can be advantageous in capturing long-term dependencies and patterns in time series data.
*   **Feature Scaling and Data Preparation:** The data preparation steps for LSTM, including scaling and creating sequences, can help the model learn more effectively.
*   **Model Complexity:** LSTMs are generally more complex models than ARIMA and Prophet. With sufficient data and proper tuning, this complexity can allow them to achieve better performance on intricate patterns.

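Two caveats apply to this comparison. First, it is not like-for-like: ARIMA and Prophet forecast all 100 test days from the end of the training data, whereas the LSTM predicts each test day one step ahead from the 60 preceding actual prices, which include test-period values, and its scaler was fit on the full series. That setup alone favors the LSTM. Second, model performance is highly dependent on the specific dataset, so ARIMA or Prophet might be more suitable for other time series.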
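Because the LSTM in this notebook predicts each test day from the 60 preceding actual prices, a natural reference point for its RMSE is a persistence forecast, which predicts that tomorrow's close equals today's. The sketch below computes that one-step-ahead baseline; `persistence_rmse` is a hypothetical helper and the toy prices are invented.

```python
import numpy as np

def persistence_rmse(prices, test_size):
    """One-step-ahead RMSE of the 'tomorrow equals today' forecast over the last test_size points."""
    prices = np.asarray(prices, dtype=float)
    if not 0 < test_size < len(prices):
        raise ValueError("test_size must be between 1 and len(prices) - 1")
    actual = prices[-test_size:]
    predicted = prices[-test_size - 1:-1]  # each forecast is the previous day's actual price
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

toy = np.array([100.0, 101.0, 103.0, 102.0, 105.0])
baseline = persistence_rmse(toy, test_size=3)
```

With the notebook's variables, `persistence_rmse(stock_df['Close'].values, test_size)` would give a figure to set beside the LSTM RMSE. No result for the actual data is claimed here.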

--------------------------------------------------------------------------------