# Stock Price Forecasting (ARIMA vs LSTM)

This notebook demonstrates forecasting stock prices using two different approaches:
1. **Traditional Statistical Model** → ARIMA
2. **Deep Learning Model** → LSTM

We will:
- Preprocess stock price data
- Train ARIMA & LSTM
- Compare performance (RMSE, MAPE)
- Provide recommendations on which model generalizes better
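
For reference, the two comparison metrics can be written out directly. This is a minimal NumPy sketch on made-up numbers (`actual` and `forecast` are illustrative arrays, not notebook data, and the helper names `rmse` and `mape` are hypothetical); the notebook itself relies on scikit-learn's implementations. MAPE is returned as a fraction rather than a percentage, which is also how `sklearn.metrics.mean_absolute_percentage_error` reports it.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Illustrative values only
actual = np.array([100.0, 200.0, 400.0])
forecast = np.array([110.0, 190.0, 400.0])

print("RMSE:", rmse(actual, forecast))
print("MAPE:", mape(actual, forecast))
```

RMSE penalizes large misses more heavily and depends on the price scale, while MAPE is scale-free, so the two can rank models differently.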

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error
from sklearn.preprocessing import MinMaxScaler

import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

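# Load stock data (Samsung Electronics, used as an example)
ticker = "005930.KS"
df = yf.download(ticker, start="2015-01-01", end="2025-01-01")

# Some yfinance versions return MultiIndex columns (field, ticker) even for a
# single ticker; flatten them so df["Close"] is a plain Series.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)

df = df[["Close"]].dropna()

df.head()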

## Phase 1: Data Preprocessing

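- Use the date index returned by yfinance; rows with missing values were dropped on load
- Normalize for LSTM (scaler fit on the training split only)
- Split into Train/Test (80/20) chronologically, without shuffling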
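
The split and scaling steps can be illustrated on a toy series. This is a sketch under stated assumptions: `prices` is a hypothetical array, and `scale`/`unscale` are hand-written stand-ins for what `MinMaxScaler` does with its default `(0, 1)` range.

```python
import numpy as np

# Hypothetical toy price series in time order (not the notebook's data)
prices = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0, 20.0, 22.0, 25.0])

# Chronological 80/20 split: the test set is strictly later in time
split = int(len(prices) * 0.8)
train_prices, test_prices = prices[:split], prices[split:]

# Min-max parameters come from the training split only
lo, hi = train_prices.min(), train_prices.max()

def scale(x):
    """Map training min to 0 and training max to 1."""
    return (x - lo) / (hi - lo)

def unscale(z):
    """Invert scale() to get back to price units."""
    return z * (hi - lo) + lo

train_z = scale(train_prices)
test_z = scale(test_prices)

print("train range:", train_z.min(), train_z.max())
print("test scaled:", test_z)
```

Scaled test values can fall outside [0, 1] when later prices exceed the training range. That is expected when the scaler is fit on training data only, and it avoids leaking test-set information into training.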

In [None]:
# Train/Test split
train_size = int(len(df) * 0.8)
train, test = df.iloc[:train_size], df.iloc[train_size:]

print("Train size:", train.shape, "Test size:", test.shape)

# Scaling for LSTM
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)

## Phase 2: ARIMA Model

In [None]:
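# ARIMA rolling (walk-forward) forecast: fit on all data seen so far,
# predict one step ahead, then append the true value to the history.
# Refitting at every step is slow for a test set of several hundred days.
history = train['Close'].tolist()
predictions = []

for t in range(len(test)):
    model = ARIMA(history, order=(5,1,0))  # AR(5) on first differences, no MA terms
    model_fit = model.fit()
    yhat = model_fit.forecast()[0]  # one-step-ahead forecast
    predictions.append(yhat)
    history.append(test['Close'].iloc[t])

# Evaluate ARIMA (sklearn's MAPE is a fraction: 0.02 means 2%)
arima_rmse = np.sqrt(mean_squared_error(test['Close'], predictions))
arima_mape = mean_absolute_percentage_error(test['Close'], predictions)

print("ARIMA RMSE:", arima_rmse)
print("ARIMA MAPE:", arima_mape)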

# Plot ARIMA predictions
plt.figure(figsize=(12,6))
plt.plot(train.index, train['Close'], label='Train')
plt.plot(test.index, test['Close'], label='Test', color='blue')
plt.plot(test.index, predictions, label='ARIMA Predictions', color='red')
plt.title("ARIMA Forecast vs Actual")
plt.legend()
plt.show()
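
The rolling forecast above follows a general walk-forward pattern: forecast one step, then reveal the true value. The sketch below shows that loop in isolation. The names `walk_forward` and `last_value` are hypothetical, and the forecaster is a naive last-value stand-in, not ARIMA.

```python
import numpy as np

def walk_forward(train_values, test_values, forecast_one_step):
    """One-step-ahead walk-forward evaluation.

    `forecast_one_step` is a hypothetical callable that takes the history
    (a list of floats) and returns a single forecast for the next step.
    """
    history = list(train_values)
    preds = []
    for actual in test_values:
        preds.append(forecast_one_step(history))  # forecast before seeing `actual`
        history.append(actual)                    # then reveal the true value
    return np.array(preds)

def last_value(history):
    """Naive stand-in forecaster: the next value equals the latest one."""
    return history[-1]

# Illustrative values only
train_values = [1.0, 2.0, 3.0]
test_values = [4.0, 6.0, 5.0]

preds = walk_forward(train_values, test_values, last_value)
print(preds)
```

Scoring such a last-value forecast with the same metrics gives a rough lower bar for interpreting the ARIMA numbers, since one-step-ahead forecasts of daily prices tend to stay close to the previous close.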

## Phase 3: LSTM Model
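
The LSTM consumes fixed-length windows of past prices and predicts the value that follows each window. A minimal sketch of that windowing on a toy array (the helper name `make_windows` and the data are illustrative, not part of the notebook):

```python
import numpy as np

def make_windows(series, time_step):
    """Turn a 1-D series into (samples, time_step, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - time_step
    X = np.array([series[i:i + time_step] for i in range(n_samples)])
    y = series[time_step:]  # the value right after each window
    return X.reshape(-1, time_step, 1), y

X, y = make_windows([1, 2, 3, 4, 5, 6], time_step=3)

print("X shape:", X.shape)
print("first window:", X[0, :, 0], "-> target:", y[0])
```

A series of length `n` yields `n - time_step` samples, so the first `time_step` points of any split have no prediction unless earlier history is prepended.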

In [None]:
# Prepare data for LSTM: each sample is `time_step` consecutive scaled prices,
# and the target is the scaled price that immediately follows the window.
def create_dataset(dataset, time_step=60):
    X, Y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:i + time_step, 0])
        Y.append(dataset[i + time_step, 0])
    return np.array(X), np.array(Y)

time_step = 60

# Reuse the scaler fit on the training split so no test-set statistics leak in.
train_data = train_scaled
# Prepend the last `time_step` training points so every test day gets a
# prediction and the LSTM is scored on the same dates as ARIMA.
test_data = np.vstack([train_scaled[-time_step:], test_scaled])

X_train, y_train = create_dataset(train_data, time_step)
X_test, y_test = create_dataset(test_data, time_step)

# Keras LSTM layers expect input of shape (samples, time steps, features)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Build LSTM: two stacked 50-unit layers followed by a single-value output
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(time_step,1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mean_squared_error")

# Train
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=1)

# Predict, then map predictions and targets back to price units
predictions_lstm = model.predict(X_test)
predictions_lstm = scaler.inverse_transform(predictions_lstm)
y_true = scaler.inverse_transform(y_test.reshape(-1,1))

# Metrics (sklearn's MAPE is a fraction: 0.02 means 2%)
lstm_rmse = np.sqrt(mean_squared_error(y_true, predictions_lstm))
lstm_mape = mean_absolute_percentage_error(y_true, predictions_lstm)

print("LSTM RMSE:", lstm_rmse)
print("LSTM MAPE:", lstm_mape)

# Plot against the test dates (one prediction per test day)
plt.figure(figsize=(12,5))
plt.plot(test.index, y_true, label="Actual")
plt.plot(test.index, predictions_lstm, label="LSTM Forecast")
plt.legend()
plt.title("LSTM Forecast vs Actual")
plt.show()

## Phase 4: Performance Comparison

In [None]:
results = pd.DataFrame({
    "Model": ["ARIMA", "LSTM"],
    "RMSE": [arima_rmse, lstm_rmse],
    "MAPE": [arima_mape, lstm_mape]
})
results

## Conclusion

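- **ARIMA** works well on linear, short-term trends but struggles with volatility. Because it is refit on every new observation here, its one-step-ahead forecasts track the most recent price closely.
- **LSTM** can adapt to non-linear patterns and longer dependencies. Whether it achieves lower RMSE and MAPE than ARIMA on this data should be read from the table above; its results also vary between runs unless the random seed is fixed, and only 5 training epochs are used.
- Recommendation: **Use LSTM** for stock price forecasting tasks where capturing long-term dependencies is important and the comparison table confirms lower error. Otherwise, ARIMA remains the simpler baseline.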