# Project 3: Stock Price Prediction

This notebook explores time series forecasting to predict stock prices. We will use two different models:
1. **ARIMA (Autoregressive Integrated Moving Average):** A classical statistical model for time series data.
2. **LSTM (Long Short-Term Memory):** A type of recurrent neural network (RNN) well-suited for sequence prediction problems.
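
To make the "Integrated" part of ARIMA concrete, the following is a minimal sketch (not part of the original notebook) of first-order differencing, the transformation an ARIMA model with differencing order 1 applies before fitting its autoregressive and moving-average terms. The `prices` array is made up for illustration.

```python
import numpy as np

# Made-up prices, for illustration only
prices = np.array([100.0, 102.0, 101.0, 105.0])

# First-order differencing: the series of day-to-day changes
diffs = np.diff(prices)

# Differencing is reversible: the first price plus the cumulative sum
# of the changes restores the original series
restored = np.concatenate(([prices[0]], prices[0] + np.cumsum(diffs)))

print(diffs)
print(restored)
```

The differenced series tends to have a more stable mean than raw prices, which is why differencing is commonly used for trending data.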

We will fetch historical stock data for Apple Inc. (`AAPL`) and compare the performance of these two models.

## 1. Setup and Data Loading

In [None]:
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.arima.model import ARIMA
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

sns.set_style('whitegrid')

In [None]:
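# Download historical data for a stock (e.g., Apple Inc.)
# Note: yfinance treats the end date as exclusive.
ticker = 'AAPL'
data = yf.download(ticker, start='2018-01-01', end='2023-12-31')

# Some yfinance releases return MultiIndex columns (field, ticker) even for a
# single ticker; flatten them so that data['Close'] is a plain Series.
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)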

if data.empty:
    print(f"No data found for ticker {ticker}. Please check the ticker symbol or the date range.")
else:
    print(f"Data for {ticker} downloaded successfully:")
    print(data.head())

## 2. Exploratory Data Analysis (EDA)

In [None]:
# Plot the closing price history
plt.figure(figsize=(14, 7))
plt.plot(data['Close'], label='Close Price')
plt.title(f'{ticker} Stock Price History')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()

In [None]:
# Plot closing price with moving averages
data['MA50'] = data['Close'].rolling(50).mean()
data['MA200'] = data['Close'].rolling(200).mean()

plt.figure(figsize=(14, 7))
plt.plot(data['Close'], label='Close Price')
plt.plot(data['MA50'], label='50-Day Moving Average')
plt.plot(data['MA200'], label='200-Day Moving Average')
plt.title(f'{ticker} Price and Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()
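
The moving-average lines in the plot start later than the price line because `rolling(n).mean()` returns `NaN` until a full window of `n` values is available. A minimal sketch on a made-up series (the name `toy` and its values are illustrative, not from the notebook):

```python
import pandas as pd

# Made-up values, for illustration only
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# 3-point rolling mean: the first 3 - 1 = 2 entries have no full window
ma3 = toy.rolling(3).mean()
n_missing = int(ma3.isna().sum())

print(ma3.tolist())
print(n_missing)
```

By the same logic, `MA200` has no values for the first 199 trading days of the downloaded range.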

## 3. Data Preprocessing

In [None]:
# Create a new dataframe with only the 'Close' column
close_data = data.filter(['Close'])
# Convert the dataframe to a numpy array
dataset = close_data.values
# Get the number of rows to train the model on (80% of the data)
training_data_len = int(np.ceil(len(dataset) * .8))
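
Because this is a time series, the split is chronological rather than shuffled: the first 80% of rows are for training and the remainder for testing. The sketch below illustrates the same index arithmetic on a toy array; `chronological_split` is a hypothetical helper, not something the notebook defines.

```python
import numpy as np

def chronological_split(values, train_frac=0.8):
    """Hypothetical helper: leading rows for training, trailing rows for testing."""
    n_train = int(np.ceil(len(values) * train_frac))
    return values[:n_train], values[n_train:]

# Toy column vector standing in for the (n, 1) array of closing prices
toy = np.arange(10).reshape(-1, 1)
train_part, test_part = chronological_split(toy)

print(train_part.shape, test_part.shape)
```

Every test row comes after every training row, so the models are never evaluated on dates earlier than those they were trained on.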

## 4. Modeling with ARIMA

In [None]:
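# Split data for ARIMA (flatten the (n, 1) array to 1-D)
train_arima = dataset[:training_data_len].flatten()
test_arima = dataset[training_data_len:].flatten()

# Walk-forward validation: refit on everything seen so far, forecast one step
# ahead, then append the true observation. Refitting at every step is slow.
# We use a simple (5,1,0) model as a starting point
history = list(train_arima)
predictions_arima = []

for t in range(len(test_arima)):
    model = ARIMA(history, order=(5,1,0))
    model_fit = model.fit()
    yhat = float(model_fit.forecast()[0])  # one-step-ahead forecast
    predictions_arima.append(yhat)
    history.append(test_arima[t])  # reveal the actual value before the next step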

# Evaluate the model
rmse_arima = np.sqrt(mean_squared_error(test_arima, predictions_arima))
print(f'ARIMA Model RMSE: {rmse_arima:.2f}')
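
An RMSE is easier to interpret next to a trivial reference. One common choice, which the original notebook does not include, is a persistence forecast that predicts each day's close as the previous day's close. A minimal sketch, with `persistence_rmse` as a hypothetical helper and made-up toy values:

```python
import numpy as np

def persistence_rmse(values, n_train):
    """Hypothetical helper: RMSE of predicting each test value as the prior value."""
    values = np.asarray(values, dtype=float).ravel()
    actual = values[n_train:]
    predicted = values[n_train - 1:-1]  # the value one step before each test point
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up series: the last two points form the test set
toy = [1.0, 2.0, 4.0, 7.0, 11.0]
baseline = persistence_rmse(toy, n_train=3)
print(f'Persistence RMSE: {baseline:.2f}')
```

In this notebook the equivalent call would be `persistence_rmse(dataset, training_data_len)`. A model whose RMSE is not clearly below that figure adds little over assuming tomorrow's price equals today's.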

## 5. Modeling with LSTM

### 5.1. Preprocessing for LSTM

In [None]:
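# Scale the data to the [0, 1] range. Fit the scaler on the training rows only
# so that the test period's min and max do not leak into training.
scaler = MinMaxScaler(feature_range=(0,1))
scaler.fit(dataset[:training_data_len])
scaled_data = scaler.transform(dataset)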

# Create the training data set
train_data = scaled_data[0:int(training_data_len), :]
# Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])

# Convert the x_train and y_train to numpy arrays 
x_train, y_train = np.array(x_train), np.array(y_train)

# Reshape the data for the LSTM model
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
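
The loop above turns the series into overlapping windows: each sample holds the previous 60 scaled prices and its target is the price that follows. The sketch below shows the same construction on a toy series with a shorter lookback so the shapes are easy to check; `make_windows` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def make_windows(series, lookback):
    """Hypothetical helper: build (samples, lookback, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float).ravel()
    X = np.array([series[i - lookback:i] for i in range(lookback, len(series))])
    y = series[lookback:]
    # LSTM layers expect 3-D input: (samples, timesteps, features)
    return X.reshape(X.shape[0], lookback, 1), y

X, y = make_windows(np.arange(10), lookback=3)
print(X.shape, y.shape)
print(X[0, :, 0], '->', y[0])
```

A series of length `n` yields `n - lookback` samples, so the first `lookback` points serve only as inputs and never as targets.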

### 5.2. Building and Training the LSTM Model

In [None]:
# Build the LSTM model
model_lstm = Sequential()
model_lstm.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model_lstm.add(LSTM(50, return_sequences=False))
model_lstm.add(Dense(25))
model_lstm.add(Dense(1))

# Compile the model
model_lstm.compile(optimizer='adam', loss='mean_squared_error')

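# Train the model
# batch_size=1 with a single epoch keeps this demo simple; the model is likely
# undertrained, so treat the resulting RMSE as a rough first result.
model_lstm.fit(x_train, y_train, batch_size=1, epochs=1)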
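
To get a feel for the size of this network, the parameter counts can be worked out by hand. The sketch below assumes the standard LSTM formulation with bias terms (four gates, each with input weights, recurrent weights, and a bias) and fully connected `Dense` layers with bias; `lstm_params` and `dense_params` are hypothetical helpers.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * ((input_dim + units + 1) * units)

def dense_params(input_dim, units):
    # One weight per input plus one bias, for each output unit
    return (input_dim + 1) * units

layers = [
    lstm_params(1, 50),    # first LSTM: 1 feature per timestep
    lstm_params(50, 50),   # second LSTM: fed by the first layer's 50 outputs
    dense_params(50, 25),
    dense_params(25, 1),
]
total = sum(layers)
print(layers, total)
```

These figures can be compared against the output of `model_lstm.summary()`.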

### 5.3. Making Predictions with LSTM

In [None]:
# Create the testing data set
test_data = scaled_data[training_data_len - 60:, :]
# Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])

# Convert the data to a numpy array
x_test = np.array(x_test)

# Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

# Get the model's predicted price values
predictions_lstm = model_lstm.predict(x_test)
predictions_lstm = scaler.inverse_transform(predictions_lstm)

# Get the root mean squared error (RMSE)
rmse_lstm = np.sqrt(np.mean(((predictions_lstm - y_test) ** 2)))
print(f'LSTM Model RMSE: {rmse_lstm:.2f}')
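
The network predicts in scaled units, so `inverse_transform` is needed to map its outputs back to dollars before computing the RMSE. A minimal sketch of that round trip on made-up prices (the names `toy_scaler`, `train_prices`, and `later_prices` are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up prices, for illustration only
train_prices = np.array([[10.0], [15.0], [20.0]])
later_prices = np.array([[25.0]])

toy_scaler = MinMaxScaler(feature_range=(0, 1))
toy_scaler.fit(train_prices)  # learns min=10 and max=20

scaled_later = toy_scaler.transform(later_prices)
recovered = toy_scaler.inverse_transform(scaled_later)

print(scaled_later, recovered)
```

The example also shows that a price above the fitted maximum scales to a value above 1: by default `MinMaxScaler` does not clip, so the mapping stays invertible.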

## 6. Visualization and Conclusion

In [None]:
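# Split the closing prices and attach each model's predictions
train = close_data[:training_data_len]
valid = close_data[training_data_len:].copy()  # copy to avoid SettingWithCopyWarning
valid['ARIMA'] = predictions_arima
valid['LSTM'] = predictions_lstm.flatten()  # (n, 1) -> (n,)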

# Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model Comparison')
plt.xlabel('Date')
plt.ylabel('Close Price USD ($)')
plt.plot(train['Close'])
plt.plot(valid[['Close', 'ARIMA', 'LSTM']])
plt.legend(['Train', 'Actual', 'ARIMA', 'LSTM'], loc='lower right')
plt.show()

### Conclusion

In this notebook, we built and compared an ARIMA and an LSTM model for one-step-ahead stock price prediction. Compare the two RMSE values printed above to see which model did better on this test period.
- **ARIMA:** A linear model that provides a solid baseline but cannot capture complex, non-linear patterns.
- **LSTM:** Can learn non-linear, longer-range dependencies, but its results depend heavily on training. With a single epoch, as used here, it will not necessarily beat ARIMA.

**Important Note:** Stock market prediction is extremely difficult. These models are for educational purposes and should not be used for actual trading without extensive further development, validation, and risk management.