<a href="https://www.kaggle.com/code/babaksh/googlestocklstm-tensorflow?scriptVersionId=227949386" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction

This notebook aims to predict Google stock prices using Long Short-Term Memory (LSTM) neural networks. It explores multiple configurations by incorporating Simple Moving Averages (SMAs) of different window sizes as features to enhance predictive performance. The analysis compares:

- A baseline model using raw price data (no SMAs).
- Models with individual SMAs (5, 7, 9, 11, 13, 15, 17 days).
- A model with all SMAs combined.
- Combinations of the best-performing SMAs.

The goal is to assess how these configurations impact prediction accuracy and forecast stock prices for the next 10 days.

# Required Libraries

The following libraries are essential for this project:

- **Keras (via TensorFlow)**: Builds and trains LSTM neural network models.
- **NumPy**: Handles numerical computations.
- **Pandas**: Manages data manipulation and preprocessing.
- **yfinance**: Typically fetches stock data (though here, data is loaded from a CSV).
- **datetime**: Processes date and time data.
- **Matplotlib**: Creates static visualizations.
- **Plotly**: Generates interactive charts.
- **Scikit-learn**: Provides tools for scaling data (`MinMaxScaler`) and calculating evaluation metrics.

These libraries enable data handling, model development, and result visualization.

In [None]:
import keras
import numpy as np
import pandas as pd
import yfinance as yf
from datetime import datetime
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from IPython.display import HTML, display
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, mean_absolute_percentage_error

# Configurations

Key hyperparameters and settings guide the modeling process:

- **EPOCH**: 700 training epochs.
- **RATIO**: 0.8, i.e. 80% training data and 20% testing data.
- **WINDOW_SIZE**: 21-day rolling window used for the Bollinger Bands.
- **NUM_STD**: 4 standard deviations for the Bollinger Bands.
- **BATCH_SIZE**: 128 samples per gradient update.
- **LSTM_UNIT**: 70 units in the LSTM layer.
- **DROPOUT_RATE**: 0.3 (30% dropout) to reduce overfitting.
- **FUTURE_DAYS**: 10 days for future predictions.
- **SEQ_LENGTH**: 45-day input sequences for the LSTM.
- **target_col**: 'Close' (target variable).
- **LEARNING_RATE**: 0.001, the learning rate of the Adam optimizer.
- **L2_REGULARIZER**: 0.0001, the L2 regularization factor used in the model.
- **PATIENCE**: 50 epochs of patience for the training callbacks.

These settings ensure consistency across data preparation, model architecture, and training.

In [None]:
# Configs
EPOCH = 700
RATIO = 0.8
WINDOW_SIZE = 21
NUM_STD = 4
BATCH_SIZE = 128
LSTM_UNIT = 70
DROPOUT_RATE = 0.3
FUTURE_DAYS = 10
SEQ_LENGTH = 45  # Use 45 days to predict the next day
target_col = 'Close'
LEARNING_RATE = 0.001
L2_REGULARIZER = 0.0001
PATIENCE = 50
today = datetime.today()

# Data Preprocessing and Visualization

## Data Preprocessing

This section prepares the Google stock dataset:

- **Loading Data**: Data is loaded from `Google_2025-03-22.csv`.
- **Future Data**: The last 10 rows (`FUTURE_DAYS`) are held out to evaluate the future predictions.
- **Date Handling**: 'Date' column is converted to datetime format.
- **Calculating SMAs**: SMAs are computed for 5, 7, 9, 11, 13, 15, and 17 days.
- **Missing Values**: Initial SMA NaNs are filled with corresponding closing prices.
- **Column Removal**: Drops 'Dividends', 'Stock Splits', and 'Volume'.

In [None]:
# Load data
df = pd.read_csv('/kaggle/input/googlestock/Google_2025-03-22.csv')
last_n_days_data = df.iloc[-FUTURE_DAYS:].copy()  # Only the last FUTURE_DAYS rows (copy avoids SettingWithCopyWarning)
df = df.iloc[:-FUTURE_DAYS].copy()  # All rows except the last FUTURE_DAYS

# Data cleaning
# Convert the datetime column to datetime object
df['Date'] = pd.to_datetime(df['Date'], utc=True)
last_n_days_data['Date'] = pd.to_datetime(last_n_days_data['Date'], utc=True)

# Extract the date part
df['Date'] = df['Date'].dt.date
last_n_days_data['Date'] = last_n_days_data['Date'].dt.date

df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
last_n_days_data['Date'] = pd.to_datetime(last_n_days_data['Date'], format='%Y-%m-%d')

df["5d_sma"] = df["Close"].rolling(5).mean()
df["7d_sma"] = df["Close"].rolling(7).mean()
df["9d_sma"] = df["Close"].rolling(9).mean()
df["11d_sma"] = df["Close"].rolling(11).mean()
df["13d_sma"] = df["Close"].rolling(13).mean()
df["15d_sma"] = df["Close"].rolling(15).mean()
df["17d_sma"] = df["Close"].rolling(17).mean()

# Check missing values
print(f"Missing values: {df.isnull().sum().sum()}")

In [None]:
df['5d_sma'] = df['5d_sma'].fillna(df['Close'])
df['7d_sma'] = df['7d_sma'].fillna(df['Close'])
df['9d_sma'] = df['9d_sma'].fillna(df['Close'])
df['11d_sma'] = df['11d_sma'].fillna(df['Close'])
df['13d_sma'] = df['13d_sma'].fillna(df['Close'])
df['15d_sma'] = df['15d_sma'].fillna(df['Close'])
df['17d_sma'] = df['17d_sma'].fillna(df['Close'])

# Check missing values after refinements
print(f"Missing values: {df.isnull().sum().sum()}")
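
As a small illustration of the SMA and fill steps above, the following sketch computes a 3-day SMA on a made-up five-value series and fills the leading NaNs with the closing price of the same row. The toy values, the `toy` frame, and the `3d_sma` column are assumptions for illustration only and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical closing prices, used only to illustrate the technique
toy = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})

# A 3-day rolling mean leaves the first two rows as NaN (warm-up period)
toy["3d_sma"] = toy["Close"].rolling(3).mean()
n_missing = int(toy["3d_sma"].isnull().sum())

# Fill the warm-up NaNs with the closing price of the same row
toy["3d_sma"] = toy["3d_sma"].fillna(toy["Close"])

print(f"NaNs before filling: {n_missing}")
print(toy)
```

With this fill strategy the SMA column simply equals the closing price during the warm-up rows, so no rows need to be dropped.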

In [None]:
plot_df = df[df['Date'] >= '2025-01-01']
plot_january = df[(df['Date'] >= '2025-01-01') & (df['Date'] <= '2025-01-31')]
plot_last_30_days = df.iloc[-30:]

df = df.drop('Dividends', axis=1)  # Remove constant column
df = df.drop('Stock Splits', axis=1)  # Remove constant column
df = df.drop('Volume', axis=1)  # Remove redundant column

last_n_days_data = last_n_days_data.drop('Dividends', axis=1)  # Remove constant column
last_n_days_data = last_n_days_data.drop('Stock Splits', axis=1)  # Remove constant column
last_n_days_data = last_n_days_data.drop('Volume', axis=1)  # Remove redundant column

In [None]:
print(df.tail(15))

## Data Visualization

Visualizations explore stock price trends:

- **Historical Closing Prices**: Line plot of closing prices over time.
- **2025 Prices**: Plot of open, high, low, and close prices for 2025.
- **January 2025 Prices**: Focused view of January 2025.
- **Last 30 Days**: Recent price trends.
- **Candlestick Chart with SMAs**: Interactive Plotly chart of 2025 data with candlesticks and SMA overlays (5D to 17D).

These steps produce cleaned data and insightful visualizations.

In [None]:
def plot_historical_data(dataframe, title):
    plt.figure(figsize=(12, 6))
    plt.plot(dataframe['Date'], dataframe['Close'], label='Closing Price', color='blue')
    plt.plot(dataframe['Date'], dataframe['High'], label='High Price', color='green')
    plt.plot(dataframe['Date'], dataframe['Low'], label='Low Price', color='red')
    plt.plot(dataframe['Date'], dataframe['Open'], label='Opening Price', color='black')
    plt.title(title)
    plt.xlabel('Date')
    plt.ylabel('Price (USD)')
    plt.legend()
    plt.show()

In [None]:
# Visualize closing price
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Close'])  # Plot against dates so the x-axis matches its label
plt.title('Google Stock Closing Price History')
plt.xlabel('Date')
plt.ylabel('Closing Price (USD)')
plt.show()

In [None]:
# Visualize 2025 price
plot_historical_data(plot_df, 'Google Stock Price History - 2025')

In [None]:
# Visualize Jan 2025 price
plot_historical_data(plot_january, 'Google Stock Price History - January 2025')

In [None]:
# Visualize Last 30 days price
plot_historical_data(plot_last_30_days, 'Google Stock Price History - The last 30 days')

In [None]:
# Create the candlestick chart
fig = go.Figure(data=[go.Candlestick(
    x=plot_df['Date'],  # Date on the x-axis
    open=plot_df['Open'],  # Open prices
    high=plot_df['High'],  # High prices
    low=plot_df['Low'],  # Low prices
    close=plot_df['Close'],  # Close prices
    name='Google Stock Market in 2025'
)])

# Customize the layout
fig.update_layout(
    title='Google Stock Price History - 2025',
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    xaxis_rangeslider_visible=False
)
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['Close'], line_color='green', name='Close', mode='lines'))
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['5d_sma'], line_color='yellow', name='5D-SMA', mode='lines'))
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['7d_sma'], line_color='aqua', name='7D-SMA', mode='lines'))
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['9d_sma'], line_color='red', name='9D-SMA', mode='lines'))
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['11d_sma'], line_color='cyan', name='11D-SMA', mode='lines'))
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['13d_sma'], line_color='darkgreen', name='13D-SMA', mode='lines'))
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['15d_sma'], line_color='darkblue', name='15D-SMA', mode='lines'))
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['17d_sma'], line_color='darkred', name='17D-SMA', mode='lines'))

# Show the plot
display(HTML(fig.to_html(include_plotlyjs=True)))

# Simple Data without SMA

This section creates a baseline LSTM model using raw price data:

- **Sequence Creation**: Generates 45-day sequences (X, length `SEQ_LENGTH`) and next-day targets (y).
- **Feature Selection**:
  - Features: 'Open', 'High', 'Low', 'Close'.
  - Normalized using `MinMaxScaler` (0 to 1).
  - Split into 80% training, 20% testing sets.

The baseline excludes SMAs to evaluate performance without trend indicators.

In [None]:
# Create sequences
def create_sequences(data, seq_length, features):
    X, y = [], []
    for i in range(seq_length, len(data)):
        X.append(data[i-seq_length:i])
        y.append(data[i, features.index(target_col)])
    return np.array(X), np.array(y)
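
To make the resulting shapes concrete, here is a minimal sketch of the same sliding-window idea on made-up data. The helper name `make_windows`, the toy array, and the choice of column 1 as the target are assumptions for illustration; the logic mirrors `create_sequences` but takes the target index directly.

```python
import numpy as np

def make_windows(data, seq_length, target_idx):
    """Return (X, y): X[i] holds seq_length consecutive rows, y[i] is the target value of the row that follows."""
    X, y = [], []
    for i in range(seq_length, len(data)):
        X.append(data[i - seq_length:i])
        y.append(data[i, target_idx])
    return np.array(X), np.array(y)

# Hypothetical data: 10 time steps, 2 features; column 1 plays the role of 'Close'
toy = np.arange(20, dtype=float).reshape(10, 2)
X, y = make_windows(toy, seq_length=3, target_idx=1)

print(X.shape, y.shape)  # (samples, seq_length, features) and (samples,)
```

A series of N rows yields N - seq_length samples, which is why the first `SEQ_LENGTH` rows of the dataset never appear as targets.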

In [None]:
# Feature selection and normalization
def feature_selection(dataframe, features):
    scaler = MinMaxScaler(feature_range=(0,1))
    scaled_data = scaler.fit_transform(dataframe[features])
    
    X, y = create_sequences(scaled_data, SEQ_LENGTH, features)
    
    # Train-Test split
    train_size = int(RATIO * len(X))
    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]

    print(X_train.shape, X_test.shape)
    print(y_train.shape, y_test.shape)
    
    return X_train, X_test, y_train, y_test, scaler

In [None]:
# Feature selection and normalization
features_simple = ['Open', 'High', 'Low', 'Close']
X_train_simple, X_test_simple, y_train_simple, y_test_simple, scaler_simple = feature_selection(df, features_simple)
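
Note that `feature_selection` fits the scaler on the full series before splitting, so the value range of the test period influences the scaling of the training data. One common alternative, shown here as a hedged sketch rather than as the notebook's method, derives the minimum and maximum from the training rows only. The helper `minmax_fit_on_train` and the toy series are hypothetical.

```python
import numpy as np

def minmax_fit_on_train(data, train_rows):
    """Scale all rows using the min and max of the first train_rows rows only."""
    col_min = data[:train_rows].min(axis=0)
    col_max = data[:train_rows].max(axis=0)
    return (data - col_min) / (col_max - col_min), col_min, col_max

# Hypothetical single-feature series that keeps rising after the split point
toy = np.array([[10.0], [12.0], [14.0], [16.0], [20.0]])
train_rows = 4  # first 4 rows for training, last row for testing
scaled, col_min, col_max = minmax_fit_on_train(toy, train_rows)

print(scaled.ravel())
```

With this variant the test values may fall outside the [0, 1] range, which reflects that the model has genuinely not seen that price level during training.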

# Bollinger Bands

## Overview

Bollinger Bands indicate volatility using an SMA and standard deviations:

- **Purpose**: Identify overbought/oversold conditions.
- **Calculation**:
  - Rolling mean of the selected SMA column over `WINDOW_SIZE` (21) days.
  - Rolling standard deviation over the same 21 days.
  - Upper Band = SMA + (4 × Std Dev).
  - Lower Band = SMA - (4 × Std Dev).

In [None]:
def calculate_bollinger_bands(dataframe, nd_sma, model_name):
    # Calculate the rolling mean and standard deviation of the selected SMA column
    rolling_mean = np.convolve(dataframe[nd_sma], np.ones(WINDOW_SIZE)/WINDOW_SIZE, mode='valid')
    rolling_std = np.std([dataframe[nd_sma][i:i+WINDOW_SIZE] for i in range(len(dataframe[nd_sma])-WINDOW_SIZE+1)], axis=1)
     
    # Calculate Bollinger Bands
    upper_band = rolling_mean + NUM_STD * rolling_std
    lower_band = rolling_mean - NUM_STD * rolling_std
    
    # The first complete window ends at index WINDOW_SIZE-1, so shift the bands to align with the prices
    x_band = np.arange(WINDOW_SIZE-1, len(dataframe[nd_sma]))
    plt.figure(figsize=(12,6))
    plt.plot(dataframe['Close'], label='Stock Price')
    plt.plot(x_band, rolling_mean, label=f'{model_name}', color='black')
    plt.plot(x_band, upper_band, label='Upper Bollinger Band', color='green')
    plt.plot(x_band, lower_band, label='Lower Bollinger Band', color='red')
    plt.fill_between(x_band, lower_band, upper_band, color='blue', alpha=0.2)
    plt.title(f'Bollinger Bands ({model_name})')
    plt.xlabel('Days')
    plt.ylabel('Price')
    plt.legend()
    plt.show()
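
The same bands can also be computed with pandas rolling windows. The sketch below, on made-up values with a hypothetical `window` of 3 and `num_std` of 2, checks that the two approaches agree. One detail worth noting: `np.std` defaults to the population standard deviation, so the pandas call needs `ddof=0` to match.

```python
import numpy as np
import pandas as pd

window, num_std = 3, 2  # hypothetical values, smaller than the notebook's 21 and 4
series = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0, 14.0])
values = series.to_numpy()

# pandas: ddof=0 matches np.std's default (population standard deviation)
mean_pd = series.rolling(window).mean().dropna().to_numpy()
std_pd = series.rolling(window).std(ddof=0).dropna().to_numpy()

# NumPy, following the same approach as calculate_bollinger_bands
mean_np = np.convolve(values, np.ones(window) / window, mode="valid")
std_np = np.std([values[i:i + window] for i in range(len(values) - window + 1)], axis=1)

upper = mean_pd + num_std * std_pd
lower = mean_pd - num_std * std_pd

print(np.allclose(mean_pd, mean_np), np.allclose(std_pd, std_np))
```

Both versions produce `len(series) - window + 1` band values, one for each complete window.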

## Implementation

For each SMA (5D to 17D):
- **Data**: Includes 'Open', 'High', 'Low', 'Close', and the SMA.
- **Visualization**: Plots price, SMA, and bands.

This provides volatility insights for each SMA configuration.

## 5,7,9,11,13,15,17 SMA

This section prepares data for each individual SMA and for all SMAs combined:

- **Features**: 'Open', 'High', 'Low', 'Close', plus one SMA per model, or all SMAs (5D to 17D) for the combined model.
- **Process**: Normalizes data, creates sequences, splits into training/testing sets.
- **Purpose**: Tests whether a single SMA or multiple SMAs improve predictions.

Data is sorted by date and indexed for time series consistency.

### 5D SMA

In [None]:
# Feature selection and normalization
features_5d = ['Open', 'High', 'Low', 'Close', '5d_sma']
X_train_5d, X_test_5d, y_train_5d, y_test_5d, scaler_5d = feature_selection(df, features_5d)

# Calculate and Visualize Bollinger Bands 5d price
calculate_bollinger_bands(df, '5d_sma', '5D SMA')

### 7D SMA

In [None]:
# Feature selection and normalization
features_7d = ['Open', 'High', 'Low', 'Close', '7d_sma']
X_train_7d, X_test_7d, y_train_7d, y_test_7d, scaler_7d = feature_selection(df, features_7d)

# Calculate and Visualize Bollinger Bands 7d price
calculate_bollinger_bands(df, '7d_sma', '7D SMA')

### 9D SMA

In [None]:
# Feature selection and normalization
features_9d = ['Open', 'High', 'Low', 'Close', '9d_sma']
X_train_9d, X_test_9d, y_train_9d, y_test_9d, scaler_9d = feature_selection(df, features_9d)

# Calculate and Visualize Bollinger Bands 9d price
calculate_bollinger_bands(df, '9d_sma', '9D SMA')

### 11D SMA

In [None]:
# Feature selection and normalization
features_11d = ['Open', 'High', 'Low', 'Close', '11d_sma']
X_train_11d, X_test_11d, y_train_11d, y_test_11d, scaler_11d = feature_selection(df, features_11d)

# Calculate and Visualize Bollinger Bands 11d price
calculate_bollinger_bands(df, '11d_sma', '11D SMA')

### 13D SMA

In [None]:
# Feature selection and normalization
features_13d = ['Open', 'High', 'Low', 'Close', '13d_sma']
X_train_13d, X_test_13d, y_train_13d, y_test_13d, scaler_13d = feature_selection(df, features_13d)

# Calculate and Visualize Bollinger Bands 13d price
calculate_bollinger_bands(df, '13d_sma', '13D SMA')

### 15D SMA

In [None]:
# Feature selection and normalization
features_15d = ['Open', 'High', 'Low', 'Close', '15d_sma']
X_train_15d, X_test_15d, y_train_15d, y_test_15d, scaler_15d = feature_selection(df, features_15d)

# Calculate and Visualize Bollinger Bands 15d price
calculate_bollinger_bands(df, '15d_sma', '15D SMA')

### 17D SMA

In [None]:
# Feature selection and normalization
features_17d = ['Open', 'High', 'Low', 'Close', '17d_sma']
X_train_17d, X_test_17d, y_train_17d, y_test_17d, scaler_17d = feature_selection(df, features_17d)

# Calculate and Visualize Bollinger Bands 17d price
calculate_bollinger_bands(df, '17d_sma', '17D SMA')

### 5-17D SMA

In [None]:
# Feature selection and normalization
features_all = ['Open', 'High', 'Low', 'Close', '5d_sma', '7d_sma', '9d_sma', '11d_sma', '13d_sma', '15d_sma', '17d_sma']
X_train_all, X_test_all, y_train_all, y_test_all, scaler_all = feature_selection(df, features_all)

In [None]:
df = df.sort_values('Date').set_index('Date')

# Model Architecture

The LSTM model is defined as:

- **LSTM Layer**: `LSTM_UNIT` (70) units, `return_sequences=False`, with L2 kernel and recurrent regularizers (`L2_REGULARIZER`).
- **Dropout**: `DROPOUT_RATE` (0.3).
- **Dense Layer**: 1 unit (output), with an L2 kernel regularizer.
- **Optional Dense Layer**: A dense layer with `LSTM_UNIT/2` units is defined in the code but commented out.

This structure captures temporal patterns while reducing overfitting.

In [None]:
def create_model(X_train_nd):
    lstm_unit_1 = LSTM_UNIT
    lstm_unit_2 = int(LSTM_UNIT/2)
    model = Sequential([
        LSTM(units=lstm_unit_1,
             return_sequences=False,
             input_shape=(X_train_nd.shape[1],
                          X_train_nd.shape[2]),
             kernel_regularizer=l2(L2_REGULARIZER),
             recurrent_regularizer=l2(L2_REGULARIZER)),
        Dropout(DROPOUT_RATE),
        
        # Dense(units=lstm_unit_2, kernel_regularizer=l2(L2_REGULARIZER)),
        Dense(units=1, kernel_regularizer=l2(L2_REGULARIZER))
    ])
    return model

In [None]:
model_simple = create_model(X_train_simple)
model_simple.summary()
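
The parameter counts reported by `summary()` can be cross-checked by hand. The sketch below assumes the standard Keras LSTM and Dense layers with bias enabled (the default); the helper names `lstm_params` and `dense_params` are hypothetical.

```python
def lstm_params(units, n_features):
    """Four gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (n_features + units) + units)

def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

LSTM_UNIT = 70
for name, n_features in [("simple (4 features)", 4), ("all SMAs (11 features)", 11)]:
    # Dropout adds no trainable parameters
    total = lstm_params(LSTM_UNIT, n_features) + dense_params(LSTM_UNIT, 1)
    print(f"{name}: {total} parameters")
```

Adding SMA columns only grows the input weights of the LSTM layer, so the combined model is not much larger than the baseline.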

In [None]:
model_5d = create_model(X_train_5d)
model_7d = create_model(X_train_7d)
model_9d = create_model(X_train_9d)
model_11d = create_model(X_train_11d)
model_13d = create_model(X_train_13d)
model_15d = create_model(X_train_15d)
model_17d = create_model(X_train_17d)

In [None]:
model_all = create_model(X_train_all)
model_all.summary()

# Training the Models

Models are trained for each configuration:

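- **Compilation**: Adam optimizer (learning rate `LEARNING_RATE`), MSE loss.
- **Training**:
  - Batch size: 128.
  - Epochs: 700.
  - Validation: The 20% test split is passed as `validation_data`.
  - Callbacks: Early stopping, learning-rate reduction, and checkpointing are defined but commented out, so each model trains for all epochs.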
- **Configurations**:
  - Simple data.
  - Individual SMAs (5D to 17D).
  - All SMAs combined.
- **Visualization**: Plots training/validation loss.

This produces trained models for evaluation.

In [None]:
def train_model(model, X_train, y_train, X_val, y_val, model_name):
    # Compile the model
    model.compile(optimizer=Adam(learning_rate=LEARNING_RATE), loss='mean_squared_error')
    
    # Define callbacks
    early_stopping = EarlyStopping(monitor='val_loss', patience=PATIENCE, restore_best_weights=True)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=PATIENCE)
    model_checkpoint = ModelCheckpoint(f'{model_name}.keras', monitor='val_loss', save_best_only=True)
    
    history = model.fit(
        X_train, y_train,
        batch_size=BATCH_SIZE,
        epochs=EPOCH,
        validation_data=(X_val, y_val),
        # callbacks=[early_stopping, reduce_lr, model_checkpoint],
        verbose=0
    )
    
    return history
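
For readers who want to enable the callbacks, the following pure-Python sketch illustrates the patience rule that early stopping applies to the validation loss. It is a simplified model of the idea, not the Keras implementation: the helper `early_stopping_epoch` and the loss values are hypothetical, and options such as `min_delta` are ignored.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a patience-based stopping rule."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it and reset the waiting counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to the end without exhausting the patience budget
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: the minimum is reached at epoch 2
val_losses = [0.9, 0.5, 0.4, 0.45, 0.41, 0.42, 0.43]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

With `restore_best_weights=True`, as configured in `train_model`, the model would be rolled back to the weights of the best epoch rather than keeping those of the stopping epoch.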

In [None]:
# Plot training loss
def plot_model_data(history, model_name):
    plt.figure(figsize=(12, 6))
    plt.plot(history.history['loss'], label='Train')
    plt.plot(history.history['val_loss'], label='Test')
    plt.title(f'Model Loss - {model_name}')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend()
    plt.show()

In [None]:
history_simple = train_model(model_simple, X_train_simple, y_train_simple, X_test_simple, y_test_simple, 'no_sma')
plot_model_data(history_simple, 'Without SMA')

In [None]:
history_5d = train_model(model_5d, X_train_5d, y_train_5d, X_test_5d, y_test_5d, '5d_sma')
plot_model_data(history_5d, '5D SMA')

In [None]:
history_7d = train_model(model_7d, X_train_7d, y_train_7d, X_test_7d, y_test_7d, '7d_sma')
plot_model_data(history_7d, '7D SMA')

In [None]:
history_9d = train_model(model_9d, X_train_9d, y_train_9d, X_test_9d, y_test_9d, '9d_sma')
plot_model_data(history_9d, '9D SMA')

In [None]:
history_11d = train_model(model_11d, X_train_11d, y_train_11d, X_test_11d, y_test_11d, '11d_sma')
plot_model_data(history_11d, '11D SMA')

In [None]:
history_13d = train_model(model_13d, X_train_13d, y_train_13d, X_test_13d, y_test_13d, '13d_sma')
plot_model_data(history_13d, '13D SMA')

In [None]:
history_15d = train_model(model_15d, X_train_15d, y_train_15d, X_test_15d, y_test_15d, '15d_sma')
plot_model_data(history_15d, '15D SMA')

In [None]:
history_17d = train_model(model_17d, X_train_17d, y_train_17d, X_test_17d, y_test_17d, '17d_sma')
plot_model_data(history_17d, '17D SMA')

In [None]:
history_all = train_model(model_all, X_train_all, y_train_all, X_test_all, y_test_all, 'all_sma')
plot_model_data(history_all, 'ALL SMAs')

# Model Evaluation

## Process

Models are evaluated on the test set:

- **Prediction**: Scaled predictions are inverse-transformed.
- **Metrics**:
  - MAE (Mean Absolute Error).
  - MSE (Mean Squared Error).
  - RMSE (Root Mean Squared Error).
  - R² (Goodness of fit).
  - MAPE (Mean Absolute Percentage Error).
  - Directional Accuracy (% correct direction).
- **Visualization**: True vs. predicted prices.

In [None]:
the_best_models_list = []

In [None]:
# Create inverse transformation helper
def inverse_transform_prediction(scaler, scaled_prediction, features):
    temp = np.zeros((len(scaled_prediction), len(features)))
    temp[:, features.index(target_col)] = scaled_prediction
    return scaler.inverse_transform(temp)[:, features.index(target_col)]
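
The zero-padding in `inverse_transform_prediction` works because min-max scaling treats each column independently, so the values placed in the other columns do not affect the target column. The NumPy sketch below demonstrates this on made-up data; it emulates the (0, 1) min-max inverse by hand instead of calling scikit-learn, and all names and values in it are assumptions.

```python
import numpy as np

# Hypothetical unscaled data: 3 rows, 2 features; column 1 plays the role of 'Close'
raw = np.array([[1.0, 100.0], [2.0, 150.0], [3.0, 200.0]])
col_min, col_max = raw.min(axis=0), raw.max(axis=0)
target_idx = 1

def inverse_all(scaled):
    """Invert (0, 1) min-max scaling column by column."""
    return scaled * (col_max - col_min) + col_min

scaled_pred = np.array([0.0, 0.5, 1.0])  # model outputs in the scaled space

# Zero-padded approach, following inverse_transform_prediction
padded = np.zeros((len(scaled_pred), raw.shape[1]))
padded[:, target_idx] = scaled_pred
via_padding = inverse_all(padded)[:, target_idx]

# Direct approach using only the target column's min and max
direct = scaled_pred * (col_max[target_idx] - col_min[target_idx]) + col_min[target_idx]

print(via_padding, direct)
```

Both approaches return the same prices; the padded version is convenient because it reuses the fitted scaler unchanged.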

In [None]:
def evaluate_model(model, features, scaler, X_test, y_test, model_name, model_id=None):
    # Predict on test set
    test_predictions = model.predict(X_test).flatten()
    
    # Inverse transform
    y_test_true = inverse_transform_prediction(scaler, y_test, features)
    y_test_pred = inverse_transform_prediction(scaler, test_predictions, features)
    
    # Calculate metrics
    mae = mean_absolute_error(y_test_true, y_test_pred)
    mse = mean_squared_error(y_test_true, y_test_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test_true, y_test_pred)
    mape = mean_absolute_percentage_error(y_test_true, y_test_pred)
    
    # Directional accuracy
    direction_true = np.diff(y_test_true) > 0
    direction_pred = np.diff(y_test_pred) > 0
    directional_acc = np.mean(direction_true == direction_pred)
    
    print(f"{model_name} MAE: {mae:.2f}")
    print(f"{model_name} MSE: {mse:.2f}")
    print(f"{model_name} RMSE: {rmse:.2f}")
    print(f"{model_name} R² Score: {r2:.4f}")
    print(f"{model_name} Directional Accuracy: {directional_acc*100:.2f}%")
    print(f"{model_name} Mean Absolute Percentage Error: {mape*100:.2f}%")
    
    if mape*100 < 2.5 and model_id and 'd_' in model_id:
        the_best_models_list.append(model_id)
    
    # Visual comparison
    plt.figure(figsize=(12,6))
    plt.plot(y_test_true, label='True Price')
    plt.plot(y_test_pred, label='Predicted Price')
    plt.title(f'Test Set Predictions vs Actuals - {model_name}')
    plt.xlabel('Time Steps')
    plt.ylabel('Price (USD)')
    plt.legend()
    plt.show()
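
As a worked example of the metrics computed in `evaluate_model`, the sketch below evaluates four made-up prices with plain NumPy. The values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

# Hypothetical true and predicted prices
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 101.0, 103.0, 104.0])

mae = np.mean(np.abs(y_true - y_pred))               # mean of |errors| = (1 + 1 + 2 + 1) / 4
mse = np.mean((y_true - y_pred) ** 2)                # mean of squared errors = (1 + 1 + 4 + 1) / 4
rmse = np.sqrt(mse)
mape = np.mean(np.abs((y_true - y_pred) / y_true))   # errors relative to the true price

# Directional accuracy: does the predicted day-to-day move have the same sign as the true one?
dir_acc = np.mean((np.diff(y_true) > 0) == (np.diff(y_pred) > 0))

print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} MAPE={mape*100:.2f}% DirAcc={dir_acc*100:.2f}%")
```

The example also shows that a low MAPE and a low directional accuracy can coexist: the predictions stay within about 2% of the true prices, yet only one of the three day-to-day moves has the right direction.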

## Models

- Simple data.
- Individual SMAs (5D to 17D).
- All SMAs.

This assesses predictive accuracy comprehensively.

In [None]:
evaluate_model(model_simple, features_simple, scaler_simple, X_test_simple, y_test_simple, 'Without SMA', 'no_sma')

In [None]:
evaluate_model(model_5d, features_5d, scaler_5d, X_test_5d, y_test_5d, '5D', '5d_sma')

In [None]:
evaluate_model(model_7d, features_7d, scaler_7d, X_test_7d, y_test_7d, '7D', '7d_sma')

In [None]:
evaluate_model(model_9d, features_9d, scaler_9d, X_test_9d, y_test_9d, '9D', '9d_sma')

In [None]:
evaluate_model(model_11d, features_11d, scaler_11d, X_test_11d, y_test_11d, '11D', '11d_sma')

In [None]:
evaluate_model(model_13d, features_13d, scaler_13d, X_test_13d, y_test_13d, '13D', '13d_sma')

In [None]:
evaluate_model(model_15d, features_15d, scaler_15d, X_test_15d, y_test_15d, '15D', '15d_sma')

In [None]:
evaluate_model(model_17d, features_17d, scaler_17d, X_test_17d, y_test_17d, '17D', '17d_sma')

In [None]:
evaluate_model(model_all, features_all, scaler_all, X_test_all, y_test_all, 'ALL', 'all_sma')

# Future Prediction

## Process

The `predict_future_days` function forecasts 10 days:

- **Method**: Iteratively predicts using the last test sequence.
- **Output**: Inverse-transformed predictions with dates.
- **Evaluation**: Compares to actual data (MSE, MAE, RMSE, R², MAPE).
- **Visualization**: Plots the last 50 days together with the predictions.

In [None]:
the_best_future_models_list = []

In [None]:
def predict_future_days(dataframe, model, features, scaler, X_test, model_name, model_id=None):
    # Predict next days
    predictions = []
    last_sequence = X_test[-1].copy()
    
    for _ in range(FUTURE_DAYS):
        current_pred = model.predict(last_sequence.reshape(1, SEQ_LENGTH, len(features)))[0,0]
        new_row = last_sequence[-1].copy()
        new_row[features.index(target_col)] = current_pred
        last_sequence = np.vstack([last_sequence[1:], new_row])
        predictions.append(current_pred)
    
    # Inverse transform
    temp_array = np.zeros((len(predictions), len(features)))
    temp_array[:, features.index(target_col)] = predictions
    predicted_prices = scaler.inverse_transform(temp_array)[:, features.index(target_col)]
    
    # Generate dates
    last_date = dataframe.index[-1]
    prediction_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=FUTURE_DAYS)
    
    # Create DataFrame
    predictions_df = pd.DataFrame({
        'Date': last_n_days_data.Date,
        'Actual Price': last_n_days_data.Close,
        'Predicted Price': predicted_prices
    })
    
    # Print numerical predictions
    print(f"\nGoogle Stock Price Predictions for Next {FUTURE_DAYS} Days:")
    print(predictions_df.round(2).to_string(index=False))
    
    mse = mean_squared_error(last_n_days_data.Close, predicted_prices)
    mae = mean_absolute_error(last_n_days_data.Close, predicted_prices)
    rmse = np.sqrt(mse)
    r2 = r2_score(last_n_days_data.Close, predicted_prices)
    mape = mean_absolute_percentage_error(last_n_days_data.Close, predicted_prices)
    
    print(f'{model_name} MSE based on last {FUTURE_DAYS} days prediction: {mse}')
    print(f'{model_name} MAE based on last {FUTURE_DAYS} days prediction: {mae}')
    print(f'{model_name} RMSE based on last {FUTURE_DAYS} days prediction: {rmse}')
    print(f'{model_name} R^2 based on last {FUTURE_DAYS} days prediction: {r2}')
    print(f'{model_name} MAPE based on last {FUTURE_DAYS} days prediction: {mape*100:.4f}%')

    if mape*100 < 1.5 and model_id and 'd_' in model_id:
        the_best_future_models_list.append(model_id)
    
    df1_subset = dataframe[['Close']] 
    df2_subset = predictions_df[['Date', 'Actual Price']] 
    df2_renamed = df2_subset[['Date', 'Actual Price']].rename(columns={'Actual Price': 'Close'})
    df2 = df2_renamed.set_index('Date')
    result = pd.concat([df1_subset, df2], axis=0)
    
    # Plot predictions
    plt.figure(figsize=(12,6))
    plt.plot(result[-50:], 'b-', label='Historical')
    plt.plot(last_n_days_data['Date'], predicted_prices, 'r-', label='Predicted')
    plt.title(f'Google Stock Price Prediction - Last 50 days - {model_name}')
    plt.xlabel('Date')
    plt.ylabel('Price (USD)')
    plt.legend()
    plt.show()
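
The iterative loop in `predict_future_days` can be isolated as follows. This sketch replaces the trained LSTM with a stand-in function that returns the mean of the target column, purely so the example runs without a model; `recursive_forecast`, `mean_model`, and the toy window are hypothetical.

```python
import numpy as np

def recursive_forecast(predict_one, last_window, target_idx, steps):
    """Feed each prediction back into the window to forecast several steps ahead."""
    window = last_window.copy()
    preds = []
    for _ in range(steps):
        pred = predict_one(window)
        new_row = window[-1].copy()      # non-target features are carried forward unchanged
        new_row[target_idx] = pred       # only the target column receives the new prediction
        window = np.vstack([window[1:], new_row])  # drop the oldest row, append the new one
        preds.append(pred)
    return np.array(preds), window

target_idx = 1
last_window = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

def mean_model(window):
    # Stand-in for model.predict: average of the target column
    return float(window[:, target_idx].mean())

preds, final_window = recursive_forecast(mean_model, last_window, target_idx, steps=2)
print(preds)
print(final_window)
```

The sketch makes one property of this scheme visible: the non-target features (and any SMA columns) of the appended rows stay frozen at their last observed values, so errors can accumulate as the horizon grows.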

## Models

Predictions for:
- Simple data.
- Individual SMAs.
- All SMAs.

This tests forecasting ability.

In [None]:
predict_future_days(df, model_simple, features_simple, scaler_simple, X_test_simple, 'Without SMA', 'no_sma')

In [None]:
predict_future_days(df, model_5d, features_5d, scaler_5d, X_test_5d, '5D', '5d_sma')

In [None]:
predict_future_days(df, model_7d, features_7d, scaler_7d, X_test_7d, '7D', '7d_sma')

In [None]:
predict_future_days(df, model_9d, features_9d, scaler_9d, X_test_9d, '9D', '9d_sma')

In [None]:
predict_future_days(df, model_11d, features_11d, scaler_11d, X_test_11d, '11D', '11d_sma')

In [None]:
predict_future_days(df, model_13d, features_13d, scaler_13d, X_test_13d, '13D', '13d_sma')

In [None]:
predict_future_days(df, model_15d, features_15d, scaler_15d, X_test_15d, '15D', '15d_sma')

In [None]:
predict_future_days(df, model_17d, features_17d, scaler_17d, X_test_17d, '17D', '17d_sma')

In [None]:
predict_future_days(df, model_all, features_all, scaler_all, X_test_all, 'ALL', 'all_sma')

## Combining Models (Best SMAs)

Individual SMA models whose test-set MAPE is below 2.5% are selected automatically.

- **Features**: 'Open', 'High', 'Low', 'Close', plus the selected SMA columns.
- **Process**: Trains, evaluates, predicts.
- **Visualization**: Candlestick chart with best SMAs.

In [None]:
print(f'The selected models: {the_best_models_list}')

In [None]:
# Feature selection and normalization
features_best = ['Open', 'High', 'Low', 'Close'] + the_best_models_list
X_train_best, X_test_best, y_train_best, y_test_best, scaler_best = feature_selection(df, features_best)

In [None]:
model_best = create_model(X_train_best)
model_best.summary()

In [None]:
history_best = train_model(model_best, X_train_best, y_train_best, X_test_best, y_test_best, 'best_sma')
plot_model_data(history_best, 'The Best SMAs')

In [None]:
evaluate_model(model_best, features_best, scaler_best, X_test_best, y_test_best, 'The Best SMAs', 'best_sma')

In [None]:
predict_future_days(df, model_best, features_best, scaler_best, X_test_best, 'The Best SMAs', 'best_sma')

In [None]:
# Create the candlestick chart
fig = go.Figure(data=[go.Candlestick(
    x=plot_df['Date'],  # Date on the x-axis
    open=plot_df['Open'],  # Open prices
    high=plot_df['High'],  # High prices
    low=plot_df['Low'],  # Low prices
    close=plot_df['Close'],  # Close prices
    name='Google Stock Market in 2025'
)])

# Customize the layout
fig.update_layout(
    title='Google Stock Price History - 2025 - The Best SMA Models',
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    xaxis_rangeslider_visible=False
)

colors = ['blue', 'orange', 'red', 'purple', 'brown', 'pink', 'gray']
fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['Close'], line_color='green', name='Close', mode='lines'))

for i, item in enumerate(the_best_models_list):
    fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df[item], line_color=colors[i], name='-'.join(item.upper().split('_')), mode='lines'))


# Show the plot
display(HTML(fig.to_html(include_plotlyjs=True)))

## The Best Models based on Future Predictions

Individual SMA models whose 10-day future-prediction MAPE is below 1.5% are selected automatically.

- **Features**: 'Open', 'High', 'Low', 'Close', plus the selected SMA columns.
- **Process**: Trains, evaluates, predicts.
- **Visualization**: Candlestick chart with the best SMAs based on future predictions.

These combinations aim to enhance prediction accuracy.

In [None]:
print(f'The selected models based on predictions: {the_best_future_models_list}')

In [None]:
# Feature selection and normalization
features_future_best = ['Open', 'High', 'Low', 'Close'] + the_best_future_models_list
X_train_future_best, X_test_future_best, y_train_future_best, y_test_future_best, scaler_future_best = feature_selection(df, features_future_best)

In [None]:
model_future_best = create_model(X_train_future_best)
model_future_best.summary()

In [None]:
history_future_best = train_model(model_future_best, X_train_future_best, y_train_future_best, X_test_future_best, y_test_future_best, 'best_future_sma')
plot_model_data(history_future_best, 'The Best Future SMAs')

In [None]:
evaluate_model(model_future_best, features_future_best, scaler_future_best, X_test_future_best, y_test_future_best, 'The Best Future SMAs', 'best_future_sma')

In [None]:
predict_future_days(df, model_future_best, features_future_best, scaler_future_best, X_test_future_best, 'The Best Future SMAs', 'best_future_sma')

In [None]:
# Create the candlestick chart
fig = go.Figure(data=[go.Candlestick(
    x=plot_df['Date'],  # Date on the x-axis
    open=plot_df['Open'],  # Open prices
    high=plot_df['High'],  # High prices
    low=plot_df['Low'],  # Low prices
    close=plot_df['Close'],  # Close prices
    name='Google Stock Market in 2025'
)])

# Customize the layout
fig.update_layout(
    title='Google Stock Price History - 2025 - The Best Future SMA Models',
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    xaxis_rangeslider_visible=False
)

fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df['Close'], line_color='green', name='Close', mode='lines'))

for i, item in enumerate(the_best_future_models_list):
    fig.add_trace(go.Scatter(x=plot_df['Date'], y=plot_df[item], line_color=colors[i], name='-'.join(item.upper().split('_')), mode='lines'))


# Show the plot
display(HTML(fig.to_html(include_plotlyjs=True)))

## Summary

This notebook explores the use of Long Short-Term Memory (LSTM) neural networks to predict Google stock prices, focusing on the impact of different Simple Moving Average (SMA) configurations. The analysis includes several models:

- **Baseline Model**: An LSTM model without SMAs.
- **Individual SMA Models**: Models incorporating SMAs with window sizes of 5, 7, 9, 11, 13, 15, and 17 days.
- **Combined SMA Model**: A model using all SMA window sizes together.
- **Top-Performing SMA Models**: Two models combining the best-performing SMAs from the individual tests, one selected by test-set MAPE and one by future-prediction MAPE.

The models are evaluated using key metrics:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- R² Score
- Mean Absolute Percentage Error (MAPE)
- Directional Accuracy

### Visualizations
Interactive candlestick charts overlaid with SMA trends are included to highlight price movements and model performance, offering clear visual insights into the effectiveness of the SMA-enhanced predictions.

### Conclusion
Incorporating multiple SMAs into LSTM models significantly improves stock price forecasting accuracy. This approach provides a robust framework for understanding market trends and can be a valuable tool for financial analysis.