### ==========================================
### MODULE E: AI Applications - Individual Open Project
### Project: Gold Market Trend Analysis using LSTM
### Name: Vikash PR
### ==========================================

**Objectives:**
1. Analyze historical gold price data and identify patterns
2. Engineer meaningful technical indicators for feature extraction
3. Build and train a deep learning model (LSTM) for price prediction
4. Evaluate model performance and generate future predictions

**Technology Stack:**
- Python 3.10+
- TensorFlow/Keras for Deep Learning
- Pandas & NumPy for Data Processing
- Matplotlib, Seaborn & Plotly for Visualization

---

## 1. Import Required Libraries

Import all necessary libraries for data processing, visualization, and deep learning model building.

In [None]:
# Data Processing & Analysis
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Visualization Libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Machine Learning & Deep Learning
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.optimizers import Adam

# Data Fetching
import yfinance as yf

# Model Saving
import joblib
import os

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: '%.4f' % x)

print("✅ All libraries imported successfully!")
print(f"📦 TensorFlow Version: {tf.__version__}")
print(f"📦 Pandas Version: {pd.__version__}")
print(f"📦 NumPy Version: {np.__version__}")

## 2. Load and Explore Dataset

We'll fetch gold price data (GC=F - Gold Futures) from Yahoo Finance covering the period from 2013 to present. This provides us with 12+ years of daily OHLCV (Open, High, Low, Close, Volume) data.

In [None]:
# Define date range for data collection
START_DATE = "2013-01-01"
END_DATE = datetime.now().strftime("%Y-%m-%d")

print(f"📅 Fetching Gold Price Data from {START_DATE} to {END_DATE}")
print("-" * 50)

# Fetch Gold Futures data from Yahoo Finance
# GC=F is the ticker symbol for Gold Futures
gold_data = yf.download("GC=F", start=START_DATE, end=END_DATE, progress=False)

# Display basic information
print(f"\n✅ Data Successfully Loaded!")
print(f"📊 Dataset Shape: {gold_data.shape}")
print(f"📅 Date Range: {gold_data.index.min().strftime('%Y-%m-%d')} to {gold_data.index.max().strftime('%Y-%m-%d')}")
print(f"📈 Total Trading Days: {len(gold_data)}")

# Display first few rows
print("\n📋 First 5 Rows:")
gold_data.head()

In [None]:
# Display last few rows
print("📋 Last 5 Rows:")
gold_data.tail()

In [None]:
# Display detailed information about the dataset
print("📊 Dataset Information:")
print("=" * 50)
gold_data.info()  # info() prints its own report and returns None, so it is not wrapped in print()
print("\n" + "=" * 50)
print("\n📈 Statistical Summary:")
gold_data.describe()

In [None]:
# Check for missing values
print("🔍 Missing Values Analysis:")
print("=" * 50)
missing_values = gold_data.isnull().sum()
missing_percentage = (gold_data.isnull().sum() / len(gold_data)) * 100

missing_df = pd.DataFrame({
    'Missing Values': missing_values,
    'Percentage (%)': missing_percentage
})
print(missing_df)
print(f"\n📊 Total Rows with Missing Data: {gold_data.isnull().any(axis=1).sum()}")

## 3. Data Preprocessing and Cleaning

In this section, we'll:
- Handle missing values using forward/backward fill
- Flatten multi-level column headers (if present from yfinance)
- Detect and handle outliers in price data
- Prepare the data for feature engineering
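
To make the header-flattening step concrete, here is a minimal sketch on a hand-built DataFrame. The two-level header below is an assumption about what a single-ticker yfinance download can look like (field name on level 0, ticker on level 1), and `demo` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Hypothetical two-level header shaped like a single-ticker download:
# level 0 holds the field name, level 1 the ticker symbol.
columns = pd.MultiIndex.from_tuples(
    [("Close", "GC=F"), ("High", "GC=F"), ("Low", "GC=F"),
     ("Open", "GC=F"), ("Volume", "GC=F")]
)
demo = pd.DataFrame([[1805.0, 1810.0, 1795.0, 1800.0, 120]], columns=columns)

# Keep only the field names, then select by name so column order is irrelevant
if isinstance(demo.columns, pd.MultiIndex):
    demo.columns = demo.columns.get_level_values(0)
demo = demo[["Open", "High", "Low", "Close", "Volume"]]

print(list(demo.columns))
```

Selecting columns by name rather than by position keeps the labels correct even if the download returns the fields in a different order.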

In [None]:
# Create a copy of the dataframe for processing
df = gold_data.copy()

# Flatten multi-level column headers if present (yfinance sometimes returns multi-level)
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)

# Keep the OHLCV columns by name. Renaming by position is fragile because the
# column order and the presence of 'Adj Close' differ between yfinance versions.
# 'Close' is our primary target, so 'Adj Close' is not needed.
df = df[['Open', 'High', 'Low', 'Close', 'Volume']]

print("📊 Cleaned Dataset Columns:", list(df.columns))
print(f"📈 Dataset Shape: {df.shape}")
df.head()

In [None]:
# Handle missing values using forward fill followed by backward fill
print("🔧 Handling Missing Values...")
print(f"Before: {df.isnull().sum().sum()} missing values")

# Forward fill (use previous day's value)
df = df.ffill()

# Backward fill for any remaining NaN at the start
df = df.bfill()

# Drop any remaining rows with NaN (if any)
df = df.dropna()

print(f"After: {df.isnull().sum().sum()} missing values")
print(f"✅ Dataset Shape After Cleaning: {df.shape}")
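
As a small illustration of why the forward fill is followed by a backward fill, the sketch below uses a made-up price series (`prices` is hypothetical, not part of the dataset). Forward fill cannot repair a gap at the very start because there is no earlier value to copy.

```python
import numpy as np
import pandas as pd

# Made-up closing prices with a leading gap and a two-day gap in the middle
prices = pd.Series([np.nan, 1800.0, np.nan, np.nan, 1812.0])

forward_only = prices.ffill()     # leading NaN survives: nothing precedes it
filled = prices.ffill().bfill()   # backward fill then repairs the leading NaN

print(forward_only.tolist())
print(filled.tolist())
```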

In [None]:
# Outlier Detection using IQR method
def detect_outliers_iqr(data, column):
    """Detect outliers using Interquartile Range method"""
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    return outliers, lower_bound, upper_bound

# Check outliers in Close price
outliers, lower, upper = detect_outliers_iqr(df, 'Close')
print("🔍 Outlier Analysis for Close Price:")
print(f"   Lower Bound: ${lower:.2f}")
print(f"   Upper Bound: ${upper:.2f}")
print(f"   Number of Outliers: {len(outliers)}")
print(f"   Outlier Percentage: {(len(outliers)/len(df))*100:.2f}%")

# Note: For financial data, we typically keep outliers as they represent real market events
print("\n📝 Note: Outliers are kept as they represent real market events (COVID, financial crises, etc.)")
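
The IQR bounds are easiest to check on a tiny example. The numbers below are invented for illustration and the variable names are hypothetical; the arithmetic is the same as in `detect_outliers_iqr`.

```python
import pandas as pd

# Five made-up values; the last one is far from the rest
sample = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 100.0]})

q1 = sample["Close"].quantile(0.25)   # first quartile
q3 = sample["Close"].quantile(0.75)   # third quartile
iqr = q3 - q1
lower_demo = q1 - 1.5 * iqr
upper_demo = q3 + 1.5 * iqr

# Rows outside [lower_demo, upper_demo] are flagged as outliers
flagged = sample[(sample["Close"] < lower_demo) | (sample["Close"] > upper_demo)]
print(lower_demo, upper_demo, flagged["Close"].tolist())
```

On a trending price series the quartiles come from the whole period, so the rule mostly flags the highest price levels rather than isolated bad ticks, which is one more reason to keep the flagged rows here.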

In [None]:
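# Save the cleaned OHLCV data for reference
raw_data_path = 'data/raw/gold_prices_raw.csv'
os.makedirs(os.path.dirname(raw_data_path), exist_ok=True)  # to_csv does not create missing folders
df.to_csv(raw_data_path)
print(f"✅ Raw data saved to: {raw_data_path}")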

## 4. Exploratory Data Analysis and Visualization

Let's visualize the gold price data to understand trends, patterns, and volatility over the 12+ year period.

In [None]:
# Gold Price Time Series Plot
fig, axes = plt.subplots(3, 1, figsize=(14, 12))

# Plot 1: Closing Price Over Time
axes[0].plot(df.index, df['Close'], color='gold', linewidth=1.5, label='Close Price')
axes[0].set_title('Gold Price (USD/oz) - 2013 to Present', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Price (USD)')
axes[0].legend(loc='upper left')
axes[0].grid(True, alpha=0.3)

# Mark significant events
covid_start = pd.Timestamp('2020-03-01')
if covid_start in df.index or covid_start < df.index.max():
    axes[0].axvline(x=covid_start, color='red', linestyle='--', alpha=0.7, label='COVID-19')
    axes[0].annotate('COVID-19', xy=(covid_start, df['Close'].max()), fontsize=9, color='red')

# Plot 2: High and Low Prices
axes[1].fill_between(df.index, df['Low'], df['High'], alpha=0.3, color='blue', label='High-Low Range')
axes[1].plot(df.index, df['Close'], color='gold', linewidth=1, label='Close')
axes[1].set_title('Gold Price Range (High-Low)', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Price (USD)')
axes[1].legend(loc='upper left')
axes[1].grid(True, alpha=0.3)

# Plot 3: Trading Volume
axes[2].bar(df.index, df['Volume'], color='steelblue', alpha=0.7, width=1)
axes[2].set_title('Trading Volume Over Time', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Date')
axes[2].set_ylabel('Volume')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
os.makedirs('images/visualizations', exist_ok=True)  # savefig does not create missing folders
plt.savefig('images/visualizations/gold_price_overview.png', dpi=150, bbox_inches='tight')
plt.show()

print("✅ Chart saved to: images/visualizations/gold_price_overview.png")

In [None]:
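# Interactive Candlestick Chart using Plotly (Last 6 months for clarity)
# DataFrame.last() is deprecated in recent pandas, so filter on the index instead
last_6_months = df.loc[df.index >= df.index.max() - pd.DateOffset(months=6)]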

fig = go.Figure(data=[go.Candlestick(
    x=last_6_months.index,
    open=last_6_months['Open'],
    high=last_6_months['High'],
    low=last_6_months['Low'],
    close=last_6_months['Close'],
    name='Gold Price'
)])

fig.update_layout(
    title='Gold Price Candlestick Chart (Last 6 Months)',
    yaxis_title='Price (USD/oz)',
    xaxis_title='Date',
    template='plotly_white',
    xaxis_rangeslider_visible=False,
    height=500
)

fig.show()

In [None]:
# Distribution and Correlation Analysis
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Price Distribution
axes[0, 0].hist(df['Close'], bins=50, color='gold', edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Gold Close Price Distribution', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Price (USD)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].axvline(df['Close'].mean(), color='red', linestyle='--', label=f'Mean: ${df["Close"].mean():.2f}')
axes[0, 0].legend()

# Daily Returns Distribution
daily_returns = df['Close'].pct_change().dropna() * 100
axes[0, 1].hist(daily_returns, bins=50, color='steelblue', edgecolor='black', alpha=0.7)
axes[0, 1].set_title('Daily Returns Distribution (%)', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Daily Return (%)')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].axvline(0, color='red', linestyle='--')

# Box Plot of OHLC Prices
axes[1, 0].boxplot([df['Open'], df['High'], df['Low'], df['Close']], 
                    labels=['Open', 'High', 'Low', 'Close'])
axes[1, 0].set_title('OHLC Price Distribution (Box Plot)', fontsize=12, fontweight='bold')
axes[1, 0].set_ylabel('Price (USD)')

# Correlation Heatmap
corr_matrix = df[['Open', 'High', 'Low', 'Close', 'Volume']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='YlOrRd', ax=axes[1, 1], fmt='.3f')
axes[1, 1].set_title('Feature Correlation Heatmap', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.savefig('images/visualizations/gold_price_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

print("✅ Chart saved to: images/visualizations/gold_price_distribution.png")

In [None]:
# Yearly and Monthly Analysis
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Add year and month columns for analysis
df_analysis = df.copy()
df_analysis['Year'] = df_analysis.index.year
df_analysis['Month'] = df_analysis.index.month

# Yearly Average Price
yearly_avg = df_analysis.groupby('Year')['Close'].mean()
axes[0].bar(yearly_avg.index, yearly_avg.values, color='gold', edgecolor='black')
axes[0].set_title('Average Gold Price by Year', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Average Price (USD)')
axes[0].tick_params(axis='x', rotation=45)

# Monthly Seasonality
monthly_avg = df_analysis.groupby('Month')['Close'].mean()
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
axes[1].bar(range(1, 13), monthly_avg.values, color='steelblue', edgecolor='black')
axes[1].set_title('Average Gold Price by Month (Seasonality)', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Month')
axes[1].set_ylabel('Average Price (USD)')
axes[1].set_xticks(range(1, 13))
axes[1].set_xticklabels(month_names)

plt.tight_layout()
plt.savefig('images/visualizations/gold_price_seasonality.png', dpi=150, bbox_inches='tight')
plt.show()

print("✅ Chart saved to: images/visualizations/gold_price_seasonality.png")

## 5. Feature Engineering - Technical Indicators

We'll calculate various technical indicators commonly used in financial analysis:
- **Moving Averages (SMA, EMA)**: Trend identification
- **RSI (Relative Strength Index)**: Overbought/oversold conditions
- **MACD**: Momentum and trend changes
- **Bollinger Bands**: Volatility measurement
- **Daily Returns & Volatility**: Price momentum indicators

In [None]:
# Create a copy for feature engineering
df_features = df.copy()

# =====================================
# 1. Moving Averages (SMA and EMA)
# =====================================
# Simple Moving Averages
df_features['SMA_20'] = df_features['Close'].rolling(window=20).mean()
df_features['SMA_50'] = df_features['Close'].rolling(window=50).mean()
df_features['SMA_200'] = df_features['Close'].rolling(window=200).mean()

# Exponential Moving Average
df_features['EMA_20'] = df_features['Close'].ewm(span=20, adjust=False).mean()

print("✅ Moving Averages calculated: SMA_20, SMA_50, SMA_200, EMA_20")
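
To show what `ewm(span=..., adjust=False)` computes, the sketch below rebuilds the EMA by hand on a short made-up series and compares it with pandas. The series and the span of 3 are assumptions chosen to keep the arithmetic readable; the notebook itself uses a span of 20.

```python
import numpy as np
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0])  # made-up prices
span = 3
alpha = 2 / (span + 1)  # smoothing factor implied by the span

# Manual recursion: the first EMA value is the first price, and each later
# value blends the new price with the previous EMA.
manual = [close.iloc[0]]
for price in close.iloc[1:]:
    manual.append(alpha * price + (1 - alpha) * manual[-1])

ema = close.ewm(span=span, adjust=False).mean()
sma = close.rolling(window=span).mean()  # NaN until a full window is available

print(np.allclose(ema.values, manual))
print(sma.tolist())
```

The SMA leaves the first `span - 1` rows empty while the EMA is defined from the first row, which is why the SMA columns contribute most of the NaN rows dropped later.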

In [None]:
# =====================================
# 2. RSI (Relative Strength Index)
# =====================================
def calculate_rsi(data, window=14):
    """Calculate RSI from simple rolling means of gains and losses (not Wilder's smoothing)"""
    delta = data.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi

df_features['RSI'] = calculate_rsi(df_features['Close'])
print("✅ RSI (14-period) calculated")
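
The RSI above averages gains and losses with a simple rolling mean. For comparison, here is a hedged sketch of the variant with Wilder's smoothing, approximated by an exponential average with `alpha = 1 / window`. The function name `calculate_rsi_wilder` is hypothetical and not part of this notebook, and the recursion is seeded from the first observation rather than from an initial simple average, so early values can differ slightly from charting packages.

```python
import pandas as pd

def calculate_rsi_wilder(close, window=14):
    """RSI using Wilder-style smoothing, approximated with ewm(alpha=1/window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # positive moves, zero otherwise
    loss = (-delta).clip(lower=0)     # size of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    rs = avg_gain / avg_loss
    return 100 - (100 / (1 + rs))

# Sanity check on two extreme made-up series
rising = pd.Series(range(1, 31), dtype=float)
falling = rising[::-1].reset_index(drop=True)
rsi_rising = calculate_rsi_wilder(rising)
rsi_falling = calculate_rsi_wilder(falling)

print(rsi_rising.iloc[-1], rsi_falling.iloc[-1])
```

A series that only rises has no losses and sits at the top of the scale, and a series that only falls sits at the bottom, which is a quick way to confirm the sign conventions.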

In [None]:
# =====================================
# 3. MACD (Moving Average Convergence Divergence)
# =====================================
def calculate_macd(data, fast=12, slow=26, signal=9):
    """Calculate MACD, Signal Line, and Histogram"""
    ema_fast = data.ewm(span=fast, adjust=False).mean()
    ema_slow = data.ewm(span=slow, adjust=False).mean()
    macd = ema_fast - ema_slow
    signal_line = macd.ewm(span=signal, adjust=False).mean()
    histogram = macd - signal_line
    return macd, signal_line, histogram

df_features['MACD'], df_features['MACD_Signal'], df_features['MACD_Hist'] = calculate_macd(df_features['Close'])
print("✅ MACD, Signal Line, and Histogram calculated")

In [None]:
# =====================================
# 4. Bollinger Bands
# =====================================
def calculate_bollinger_bands(data, window=20, num_std=2):
    """Calculate Bollinger Bands"""
    sma = data.rolling(window=window).mean()
    std = data.rolling(window=window).std()
    upper_band = sma + (std * num_std)
    lower_band = sma - (std * num_std)
    return upper_band, sma, lower_band

df_features['BB_Upper'], df_features['BB_Middle'], df_features['BB_Lower'] = calculate_bollinger_bands(df_features['Close'])
df_features['BB_Width'] = df_features['BB_Upper'] - df_features['BB_Lower']
print("✅ Bollinger Bands calculated")

In [None]:
# =====================================
# 5. Daily Returns and Volatility
# =====================================
# Daily Returns (percentage change)
df_features['Daily_Return'] = df_features['Close'].pct_change() * 100

# Rolling Volatility (standard deviation of returns over 20 days)
df_features['Volatility_20'] = df_features['Daily_Return'].rolling(window=20).std()

# Price Change (absolute)
df_features['Price_Change'] = df_features['Close'].diff()

# High-Low Range
df_features['HL_Range'] = df_features['High'] - df_features['Low']

# Open-Close Range
df_features['OC_Range'] = df_features['Close'] - df_features['Open']

print("✅ Daily Returns, Volatility, and Range features calculated")

In [None]:
# =====================================
# 6. Lag Features (Previous day values)
# =====================================
# Previous close prices (lag features)
for lag in [1, 2, 3, 5, 7]:
    df_features[f'Close_Lag_{lag}'] = df_features['Close'].shift(lag)

print("✅ Lag features calculated (1, 2, 3, 5, 7 days)")

# Display all features
print("\n📊 Feature Summary:")
print(f"Total Features: {len(df_features.columns)}")
print("\nFeatures Created:")
for i, col in enumerate(df_features.columns, 1):
    print(f"  {i}. {col}")

In [None]:
# Visualize Technical Indicators
fig, axes = plt.subplots(4, 1, figsize=(14, 16))

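# Use last 2 years of data for clearer visualization
# (DataFrame.last() is deprecated in recent pandas, so filter on the index instead)
plot_data = df_features.loc[df_features.index >= df_features.index.max() - pd.DateOffset(years=2)]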

# Plot 1: Price with Moving Averages
axes[0].plot(plot_data.index, plot_data['Close'], label='Close', color='black', linewidth=1.5)
axes[0].plot(plot_data.index, plot_data['SMA_20'], label='SMA 20', color='blue', alpha=0.7)
axes[0].plot(plot_data.index, plot_data['SMA_50'], label='SMA 50', color='orange', alpha=0.7)
axes[0].plot(plot_data.index, plot_data['SMA_200'], label='SMA 200', color='red', alpha=0.7)
axes[0].set_title('Gold Price with Moving Averages', fontsize=12, fontweight='bold')
axes[0].legend(loc='upper left')
axes[0].grid(True, alpha=0.3)

# Plot 2: Price with Bollinger Bands
axes[1].plot(plot_data.index, plot_data['Close'], label='Close', color='black', linewidth=1.5)
axes[1].fill_between(plot_data.index, plot_data['BB_Upper'], plot_data['BB_Lower'], 
                     alpha=0.2, color='blue', label='Bollinger Bands')
axes[1].plot(plot_data.index, plot_data['BB_Upper'], color='blue', linestyle='--', alpha=0.5)
axes[1].plot(plot_data.index, plot_data['BB_Lower'], color='blue', linestyle='--', alpha=0.5)
axes[1].set_title('Gold Price with Bollinger Bands', fontsize=12, fontweight='bold')
axes[1].legend(loc='upper left')
axes[1].grid(True, alpha=0.3)

# Plot 3: RSI
axes[2].plot(plot_data.index, plot_data['RSI'], color='purple', linewidth=1.5)
axes[2].axhline(y=70, color='red', linestyle='--', alpha=0.7, label='Overbought (70)')
axes[2].axhline(y=30, color='green', linestyle='--', alpha=0.7, label='Oversold (30)')
axes[2].fill_between(plot_data.index, 70, 100, alpha=0.1, color='red')
axes[2].fill_between(plot_data.index, 0, 30, alpha=0.1, color='green')
axes[2].set_title('RSI (Relative Strength Index)', fontsize=12, fontweight='bold')
axes[2].set_ylim(0, 100)
axes[2].legend(loc='upper left')
axes[2].grid(True, alpha=0.3)

# Plot 4: MACD
axes[3].plot(plot_data.index, plot_data['MACD'], label='MACD', color='blue', linewidth=1.5)
axes[3].plot(plot_data.index, plot_data['MACD_Signal'], label='Signal', color='orange', linewidth=1.5)
axes[3].bar(plot_data.index, plot_data['MACD_Hist'], label='Histogram', color='gray', alpha=0.5, width=1)
axes[3].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[3].set_title('MACD (Moving Average Convergence Divergence)', fontsize=12, fontweight='bold')
axes[3].legend(loc='upper left')
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('images/visualizations/technical_indicators.png', dpi=150, bbox_inches='tight')
plt.show()

print("✅ Chart saved to: images/visualizations/technical_indicators.png")

In [None]:
# Drop rows with NaN values (from rolling calculations)
print(f"📊 Shape before dropping NaN: {df_features.shape}")
df_features = df_features.dropna()
print(f"📊 Shape after dropping NaN: {df_features.shape}")

# Display sample of the feature-engineered dataset
print("\n📋 Sample of Feature-Engineered Dataset:")
df_features.head()

## 6. Data Normalization and Scaling

Apply MinMaxScaler to bring the features onto a comparable 0-1 scale for LSTM training. Neural networks train more reliably on normalized inputs because gradient descent is better conditioned. The scaler is fit on the training period only, so test-period values that exceed the training range can fall outside [0, 1].

In [None]:
# Select features for the model
# We'll use a subset of features that are most relevant for prediction
feature_columns = [
    'Open', 'High', 'Low', 'Close', 'Volume',
    'SMA_20', 'SMA_50', 'EMA_20',
    'RSI', 'MACD', 'MACD_Signal',
    'BB_Upper', 'BB_Lower', 'BB_Width',
    'Daily_Return', 'Volatility_20',
    'HL_Range', 'OC_Range'
]

# Create feature matrix
data = df_features[feature_columns].values

# Store the original Close prices for later inverse transformation
close_prices = df_features['Close'].values

print(f"📊 Feature Matrix Shape: {data.shape}")
print(f"📋 Features Used ({len(feature_columns)}):")
for i, col in enumerate(feature_columns, 1):
    print(f"   {i}. {col}")

In [None]:
# Number of rows that belong to the training period. This mirrors the 60-day
# lookback and the 80/20 chronological split defined in Sections 7 and 8.
n_fit_rows = 60 + int((len(data) - 60) * 0.8)

# Initialize MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))

# Fit on the training rows only (so test-period minima/maxima do not leak
# into training), then transform the full dataset
scaler.fit(data[:n_fit_rows])
scaled_data = scaler.transform(data)

# Create a separate scaler for the Close price (for inverse transformation),
# fit on the same training rows so it matches the 'Close' column of `scaler`
close_scaler = MinMaxScaler(feature_range=(0, 1))
close_scaler.fit(close_prices[:n_fit_rows].reshape(-1, 1))

print("✅ Data Normalization Complete!")
print(f"📊 Scaled Data Shape: {scaled_data.shape}")
print(f"📈 Value Range: [{scaled_data.min():.4f}, {scaled_data.max():.4f}]")

# Display sample of scaled data
print("\n📋 Sample of Scaled Data (first 5 rows, first 5 features):")
print(pd.DataFrame(scaled_data[:5, :5], columns=feature_columns[:5]))

## 7. Create Sequences for LSTM

LSTMs require 3D input: `(samples, timesteps, features)`. We'll create sliding window sequences with a 60-day lookback period, meaning the model will use the past 60 days of data to predict the next day's closing price.

In [None]:
# Define sequence parameters
LOOKBACK = 60  # Number of previous days to use for prediction
TARGET_COL = feature_columns.index('Close')  # Position of 'Close' in feature_columns (3, 0-indexed)

def create_sequences(data, lookback, target_col):
    """
    Create sequences for LSTM training.
    
    Parameters:
    - data: Scaled feature matrix
    - lookback: Number of time steps to look back
    - target_col: Column index of the target variable (Close price)
    
    Returns:
    - X: Input sequences (samples, timesteps, features)
    - y: Target values (next day's close price)
    """
    X, y = [], []
    
    for i in range(lookback, len(data)):
        # Input: past 'lookback' days of all features
        X.append(data[i-lookback:i])
        # Target: next day's Close price (scaled)
        y.append(data[i, target_col])
    
    return np.array(X), np.array(y)

# Create sequences
X, y = create_sequences(scaled_data, LOOKBACK, TARGET_COL)

print(f"✅ Sequences Created!")
print(f"📊 Input Shape (X): {X.shape} → (samples, timesteps, features)")
print(f"📊 Target Shape (y): {y.shape} → (samples,)")
print(f"\n📋 Sequence Details:")
print(f"   - Lookback Window: {LOOKBACK} days")
print(f"   - Number of Features: {X.shape[2]}")
print(f"   - Total Sequences: {X.shape[0]}")
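
To see the resulting shapes on something small, the sketch below builds the same windows on a toy matrix, once with an explicit loop and once with NumPy's `sliding_window_view` as a vectorized alternative. The names `toy`, `X_loop` and `X_view` are hypothetical, and the vectorized version is an optional variation rather than what this notebook uses.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy matrix: 10 time steps, 2 features (row r is [2r, 2r + 1])
toy = np.arange(20, dtype=float).reshape(10, 2)
lookback, target_col = 3, 1

# Loop version: each sample is the previous `lookback` rows,
# and the target is the value of `target_col` on the following row
X_loop = np.array([toy[i - lookback:i] for i in range(lookback, len(toy))])
y_loop = np.array([toy[i, target_col] for i in range(lookback, len(toy))])

# Vectorized version: windows over all rows except the last one,
# reordered from (samples, features, timesteps) to (samples, timesteps, features)
X_view = sliding_window_view(toy[:-1], lookback, axis=0).transpose(0, 2, 1)
y_view = toy[lookback:, target_col]

print(X_loop.shape, y_loop.shape)
print(np.array_equal(X_loop, X_view), np.array_equal(y_loop, y_view))
```

With 10 rows and a lookback of 3 there are 7 samples, because the first 3 rows have no complete history.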

## 8. Train-Test Split

Split the data chronologically with 80% for training and 20% for testing. It's crucial to maintain temporal order to prevent data leakage.

In [None]:
# Define train-test split ratio
TRAIN_RATIO = 0.8

# Calculate split index
split_idx = int(len(X) * TRAIN_RATIO)

# Split the data (chronological order maintained)
X_train, X_test = X[:split_idx], X[split_idx:]
y_train, y_test = y[:split_idx], y[split_idx:]

# Get corresponding dates for visualization later
dates = df_features.index[LOOKBACK:]
train_dates = dates[:split_idx]
test_dates = dates[split_idx:]

print("✅ Train-Test Split Complete!")
print("=" * 50)
print(f"📊 Training Set:")
print(f"   - X_train shape: {X_train.shape}")
print(f"   - y_train shape: {y_train.shape}")
print(f"   - Date Range: {train_dates[0].strftime('%Y-%m-%d')} to {train_dates[-1].strftime('%Y-%m-%d')}")
print(f"\n📊 Testing Set:")
print(f"   - X_test shape: {X_test.shape}")
print(f"   - y_test shape: {y_test.shape}")
print(f"   - Date Range: {test_dates[0].strftime('%Y-%m-%d')} to {test_dates[-1].strftime('%Y-%m-%d')}")
print("=" * 50)
print(f"\n📈 Split Ratio: {TRAIN_RATIO*100:.0f}% Train / {(1-TRAIN_RATIO)*100:.0f}% Test")

## 9. Build LSTM Model Architecture

We'll build a stacked LSTM model with the following architecture:
- **LSTM Layer 1**: 128 units with return_sequences=True + Dropout(0.2)
- **LSTM Layer 2**: 64 units + Dropout(0.2)
- **Dense Layer**: 32 units with ReLU activation
- **Output Layer**: 1 unit (linear activation for regression)
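
As a rough size check for this architecture, the trainable parameter counts can be worked out by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights and a bias) and the 18 input features selected in Section 6; the helper names are hypothetical. Dropout layers add no parameters.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output unit
    return input_dim * units + units

n_features = 18  # number of entries in feature_columns

counts = {
    "LSTM(128)": lstm_params(n_features, 128),
    "LSTM(64)": lstm_params(128, 64),
    "Dense(32)": dense_params(64, 32),
    "Dense(1)": dense_params(32, 1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"Total: {total:,}")
```

These figures should line up with the totals reported by `model.summary()` when the feature count is 18.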

In [None]:
def build_lstm_model(input_shape):
    """
    Build a stacked LSTM model for gold price prediction.
    
    Architecture:
    - LSTM(128) → Dropout(0.2)
    - LSTM(64) → Dropout(0.2)
    - Dense(32, ReLU)
    - Dense(1, Linear)
    
    Parameters:
    - input_shape: Tuple of (timesteps, features)
    
    Returns:
    - Compiled Keras model
    """
    model = Sequential([
        # First LSTM Layer
        LSTM(units=128, return_sequences=True, input_shape=input_shape),
        Dropout(0.2),
        
        # Second LSTM Layer
        LSTM(units=64, return_sequences=False),
        Dropout(0.2),
        
        # Dense Layer
        Dense(units=32, activation='relu'),
        
        # Output Layer
        Dense(units=1, activation='linear')
    ])
    
    # Compile the model
    model.compile(
        optimizer=Adam(learning_rate=0.001),
        loss='mse',
        metrics=['mae']
    )
    
    return model

# Build the model
input_shape = (X_train.shape[1], X_train.shape[2])  # (timesteps, features)
model = build_lstm_model(input_shape)

# Display model summary
print("🏗️ LSTM Model Architecture:")
print("=" * 60)
model.summary()
print("=" * 60)

## 10. Train the LSTM Model

Train the model with:
- **Epochs**: 100 (with early stopping)
- **Batch Size**: 32
- **Validation Split**: 10% of training data
- **Callbacks**: Early Stopping and Model Checkpoint
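
Keras takes the validation split from the end of the arrays passed to `fit`, before any shuffling, so with chronologically ordered samples the validation set is the most recent part of the training period. The sketch below mimics that slicing with plain NumPy; the sample count of 100 is made up for illustration.

```python
import numpy as np

n_samples = 100            # made-up number of training sequences
validation_split = 0.1

sample_ids = np.arange(n_samples)                   # chronological order
split_at = int(n_samples * (1 - validation_split))  # first validation index

fit_ids = sample_ids[:split_at]   # used for weight updates
val_ids = sample_ids[split_at:]   # held out, most recent samples

print(len(fit_ids), len(val_ids), val_ids[0])
```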

In [None]:
# Define training parameters
EPOCHS = 100
BATCH_SIZE = 32
VALIDATION_SPLIT = 0.1

# Define callbacks
callbacks = [
    # Early Stopping: Stop training if validation loss doesn't improve for 15 epochs
    EarlyStopping(
        monitor='val_loss',
        patience=15,
        restore_best_weights=True,
        verbose=1
    ),
    # Model Checkpoint: Save the best model
    ModelCheckpoint(
        filepath='models/lstm_gold_model.keras',
        monitor='val_loss',
        save_best_only=True,
        verbose=1
    )
]

print("🚀 Starting Model Training...")
print("=" * 50)
print(f"📋 Training Configuration:")
print(f"   - Epochs: {EPOCHS}")
print(f"   - Batch Size: {BATCH_SIZE}")
print(f"   - Validation Split: {VALIDATION_SPLIT*100:.0f}%")
print(f"   - Early Stopping Patience: 15 epochs")
print("=" * 50)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_split=VALIDATION_SPLIT,
    callbacks=callbacks,
    verbose=1
)

print("\n✅ Model Training Complete!")

In [None]:
# Plot Training History
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot Loss
axes[0].plot(history.history['loss'], label='Training Loss', color='blue', linewidth=2)
axes[0].plot(history.history['val_loss'], label='Validation Loss', color='orange', linewidth=2)
axes[0].set_title('Model Loss Over Epochs', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss (MSE)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot MAE
axes[1].plot(history.history['mae'], label='Training MAE', color='blue', linewidth=2)
axes[1].plot(history.history['val_mae'], label='Validation MAE', color='orange', linewidth=2)
axes[1].set_title('Model MAE Over Epochs', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('MAE')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('images/visualizations/training_history.png', dpi=150, bbox_inches='tight')
plt.show()

print("✅ Chart saved to: images/visualizations/training_history.png")

## 11. Make Predictions

Generate predictions on both training and test datasets to evaluate the model's performance.

In [None]:
# Make predictions on training and test sets
print("🔮 Generating Predictions...")

# Predictions on training set
train_predictions_scaled = model.predict(X_train, verbose=0)

# Predictions on test set
test_predictions_scaled = model.predict(X_test, verbose=0)

print(f"✅ Predictions Generated!")
print(f"📊 Training Predictions Shape: {train_predictions_scaled.shape}")
print(f"📊 Test Predictions Shape: {test_predictions_scaled.shape}")

## 12. Inverse Transform Predictions

Convert the scaled predictions back to original USD price scale for interpretable results.
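
The arithmetic behind the inverse transform is simple enough to sketch by hand. The example below uses made-up prices and hypothetical helper names; it mirrors the min-max formula for a (0, 1) feature range rather than calling `MinMaxScaler` itself.

```python
import numpy as np

# Made-up closing prices used to fit the scaling range
fit_close = np.array([1200.0, 1500.0, 2000.0])
lo, hi = fit_close.min(), fit_close.max()

def scale(price):
    # Forward transform: map [lo, hi] onto [0, 1]
    return (price - lo) / (hi - lo)

def unscale(value):
    # Inverse transform: map scaled values back to USD
    return value * (hi - lo) + lo

scaled_predictions = np.array([0.0, 0.5, 1.25])
usd_predictions = unscale(scaled_predictions)

print(usd_predictions.tolist())
```

A scaled value above 1 maps to a price above the largest price seen when the scaler was fit, so the inverse transform does not cap predictions at the fitted range.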

In [None]:
# Inverse transform predictions to original scale
train_predictions = close_scaler.inverse_transform(train_predictions_scaled)
test_predictions = close_scaler.inverse_transform(test_predictions_scaled)

# Inverse transform actual values
y_train_actual = close_scaler.inverse_transform(y_train.reshape(-1, 1))
y_test_actual = close_scaler.inverse_transform(y_test.reshape(-1, 1))

print("✅ Inverse Transformation Complete!")
print(f"\n📊 Price Range in Predictions:")
print(f"   Training Set: ${train_predictions.min():.2f} - ${train_predictions.max():.2f}")
print(f"   Test Set: ${test_predictions.min():.2f} - ${test_predictions.max():.2f}")

# Display sample predictions vs actual
print("\n📋 Sample Test Predictions vs Actual:")
sample_comparison = pd.DataFrame({
    'Date': test_dates[:10],
    'Actual': y_test_actual[:10].flatten(),
    'Predicted': test_predictions[:10].flatten(),
    'Difference': (y_test_actual[:10] - test_predictions[:10]).flatten()
})
sample_comparison['% Error'] = (abs(sample_comparison['Difference']) / sample_comparison['Actual'] * 100)
print(sample_comparison.to_string(index=False))

## 13. Evaluate Model Performance

Calculate and analyze key evaluation metrics:
- **RMSE** (Root Mean Square Error)
- **MAE** (Mean Absolute Error)
- **MAPE** (Mean Absolute Percentage Error)
- **R² Score** (Coefficient of Determination)
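
Before running these metrics on the model output, it can help to see them on three made-up points where the answers are easy to verify by hand. The arrays below are invented for illustration.

```python
import numpy as np

actual_demo = np.array([100.0, 200.0, 300.0])
predicted_demo = np.array([110.0, 190.0, 300.0])
errors_demo = actual_demo - predicted_demo          # -10, 10, 0

rmse_demo = np.sqrt(np.mean(errors_demo ** 2))      # penalizes large errors more
mae_demo = np.mean(np.abs(errors_demo))             # average absolute error in USD
mape_demo = np.mean(np.abs(errors_demo / actual_demo)) * 100  # error relative to price

# R²: share of variance around the mean that the predictions explain
ss_res = np.sum(errors_demo ** 2)
ss_tot = np.sum((actual_demo - actual_demo.mean()) ** 2)
r2_demo = 1 - ss_res / ss_tot

print(f"RMSE={rmse_demo:.3f}  MAE={mae_demo:.3f}  MAPE={mape_demo:.2f}%  R2={r2_demo:.2f}")
```

RMSE and MAE are in the same units as the price, while MAPE and R² are scale-free, which makes them easier to compare across periods with very different price levels.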

In [None]:
def calculate_mape(actual, predicted):
    """Calculate Mean Absolute Percentage Error"""
    return np.mean(np.abs((actual - predicted) / actual)) * 100

def evaluate_model(actual, predicted, set_name=""):
    """Calculate and return all evaluation metrics"""
    rmse = np.sqrt(mean_squared_error(actual, predicted))
    mae = mean_absolute_error(actual, predicted)
    mape = calculate_mape(actual, predicted)
    r2 = r2_score(actual, predicted)
    
    return {
        'Set': set_name,
        'RMSE': rmse,
        'MAE': mae,
        'MAPE (%)': mape,
        'R² Score': r2
    }

# Calculate metrics for training and test sets
train_metrics = evaluate_model(y_train_actual, train_predictions, "Training")
test_metrics = evaluate_model(y_test_actual, test_predictions, "Test")

# Create metrics DataFrame
metrics_df = pd.DataFrame([train_metrics, test_metrics])

print("📊 Model Performance Metrics")
print("=" * 70)
print(metrics_df.to_string(index=False))
print("=" * 70)

# Performance analysis
print("\n📈 Performance Analysis:")
print(f"   ✅ RMSE (Test): ${test_metrics['RMSE']:.2f} USD")
print(f"   ✅ MAE (Test): ${test_metrics['MAE']:.2f} USD")
print(f"   ✅ MAPE (Test): {test_metrics['MAPE (%)']:.2f}%")
print(f"   ✅ R² Score (Test): {test_metrics['R² Score']:.4f}")

# Check against targets
print("\n🎯 Performance vs Targets:")
print(f"   MAE Target (< $20): {'✅ PASS' if test_metrics['MAE'] < 20 else '❌ NEEDS IMPROVEMENT'}")
print(f"   MAPE Target (< 2%): {'✅ PASS' if test_metrics['MAPE (%)'] < 2 else '❌ NEEDS IMPROVEMENT'}")
print(f"   R² Target (> 0.90): {'✅ PASS' if test_metrics['R² Score'] > 0.90 else '❌ NEEDS IMPROVEMENT'}")

In [None]:
# Trend Direction Accuracy
def calculate_trend_accuracy(actual, predicted):
    """Calculate the accuracy of predicting trend direction (up/down)"""
    actual_diff = np.diff(actual.flatten())
    predicted_diff = np.diff(predicted.flatten())
    
    # Check if signs match (both positive or both negative)
    correct_direction = np.sum(np.sign(actual_diff) == np.sign(predicted_diff))
    total = len(actual_diff)
    
    return (correct_direction / total) * 100

train_trend_acc = calculate_trend_accuracy(y_train_actual, train_predictions)
test_trend_acc = calculate_trend_accuracy(y_test_actual, test_predictions)

print("📈 Trend Direction Accuracy:")
print(f"   Training Set: {train_trend_acc:.2f}%")
print(f"   Test Set: {test_trend_acc:.2f}%")
print("\n📝 Note: This compares the sign of day-over-day changes in the predicted series with the sign of day-over-day changes in the actual series.")
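
A one-step-ahead forecast can also be scored against the last known actual price: did the model place tomorrow's price above or below today's close? The sketch below is a hypothetical alternative to the measure above, not part of this notebook's evaluation, and the numbers are made up to show that the two measures can disagree.

```python
import numpy as np

def direction_accuracy_vs_last_actual(actual, predicted):
    """Share of days where the forecast lies on the correct side of the previous actual price."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual_move = actual[1:] - actual[:-1]        # what the price really did
    predicted_move = predicted[1:] - actual[:-1]  # what the forecast implied
    return np.mean(np.sign(actual_move) == np.sign(predicted_move)) * 100

actual_demo = [100.0, 102.0, 101.0, 105.0]
predicted_demo = [99.0, 101.0, 103.0, 102.0]

anchored_acc = direction_accuracy_vs_last_actual(actual_demo, predicted_demo)
# Same comparison as calculate_trend_accuracy: predicted series against itself
series_acc = np.mean(
    np.sign(np.diff(actual_demo)) == np.sign(np.diff(predicted_demo))
) * 100

print(f"Anchored to last actual: {anchored_acc:.2f}%")
print(f"Series-to-series:        {series_acc:.2f}%")
```

Reporting both figures side by side gives a fuller picture of how useful the forecasts are for calling direction.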

## 14. Visualize Predictions vs Actual Prices

Create comprehensive visualizations to compare model predictions with actual gold prices.

In [None]:
# Create comprehensive prediction visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: Full Dataset - Actual vs Predicted
ax1 = axes[0, 0]
ax1.plot(train_dates, y_train_actual, label='Actual (Train)', color='blue', alpha=0.7)
ax1.plot(train_dates, train_predictions, label='Predicted (Train)', color='cyan', alpha=0.7, linestyle='--')
ax1.plot(test_dates, y_test_actual, label='Actual (Test)', color='green', alpha=0.7)
ax1.plot(test_dates, test_predictions, label='Predicted (Test)', color='red', alpha=0.7, linestyle='--')
ax1.axvline(x=test_dates[0], color='black', linestyle='-', linewidth=2, label='Train/Test Split')
ax1.set_title('Gold Price: Actual vs Predicted (Full Dataset)', fontsize=12, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Price (USD)')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)

# Plot 2: Test Set Only - Zoomed View
ax2 = axes[0, 1]
ax2.plot(test_dates, y_test_actual, label='Actual', color='blue', linewidth=2)
ax2.plot(test_dates, test_predictions, label='Predicted', color='red', linewidth=2, linestyle='--')
ax2.fill_between(test_dates, y_test_actual.flatten(), test_predictions.flatten(), 
                  alpha=0.2, color='gray', label='Prediction Gap')
ax2.set_title('Gold Price: Actual vs Predicted (Test Set)', fontsize=12, fontweight='bold')
ax2.set_xlabel('Date')
ax2.set_ylabel('Price (USD)')
ax2.legend(loc='upper left')
ax2.grid(True, alpha=0.3)

# Plot 3: Error Distribution
ax3 = axes[1, 0]
errors = (y_test_actual - test_predictions).flatten()
ax3.hist(errors, bins=50, color='steelblue', edgecolor='black', alpha=0.7)
ax3.axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero Error')
ax3.axvline(x=np.mean(errors), color='green', linestyle='--', linewidth=2, label=f'Mean Error: ${np.mean(errors):.2f}')
ax3.set_title('Prediction Error Distribution', fontsize=12, fontweight='bold')
ax3.set_xlabel('Error (Actual - Predicted) USD')
ax3.set_ylabel('Frequency')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: Scatter Plot - Actual vs Predicted
ax4 = axes[1, 1]
ax4.scatter(y_test_actual, test_predictions, alpha=0.5, color='blue', s=20)
min_val = min(y_test_actual.min(), test_predictions.min())
max_val = max(y_test_actual.max(), test_predictions.max())
ax4.plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect Prediction')
ax4.set_title('Actual vs Predicted Prices (Scatter)', fontsize=12, fontweight='bold')
ax4.set_xlabel('Actual Price (USD)')
ax4.set_ylabel('Predicted Price (USD)')
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()
os.makedirs('images/visualizations', exist_ok=True)  # Ensure the output directory exists
plt.savefig('images/visualizations/predictions_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

print("✅ Chart saved to: images/visualizations/predictions_analysis.png")

In [None]:
# Interactive Plot using Plotly
fig = go.Figure()

# Add actual prices (training)
fig.add_trace(go.Scatter(
    x=train_dates, y=y_train_actual.flatten(),
    name='Actual (Train)', mode='lines',
    line=dict(color='blue', width=1)
))

# Add predicted prices (training)
fig.add_trace(go.Scatter(
    x=train_dates, y=train_predictions.flatten(),
    name='Predicted (Train)', mode='lines',
    line=dict(color='cyan', width=1, dash='dot')
))

# Add actual prices (test)
fig.add_trace(go.Scatter(
    x=test_dates, y=y_test_actual.flatten(),
    name='Actual (Test)', mode='lines',
    line=dict(color='green', width=2)
))

# Add predicted prices (test)
fig.add_trace(go.Scatter(
    x=test_dates, y=test_predictions.flatten(),
    name='Predicted (Test)', mode='lines',
    line=dict(color='red', width=2, dash='dash')
))

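# Add train/test split line
# (x is passed as epoch milliseconds because some Plotly versions fail to
# position the annotation when x is a Timestamp)
split_x = pd.Timestamp(test_dates[0]).timestamp() * 1000
fig.add_vline(x=split_x, line_dash="solid", line_color="black",
              annotation_text="Train/Test Split")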

fig.update_layout(
    title='Gold Price Prediction: LSTM Model Results (Interactive)',
    xaxis_title='Date',
    yaxis_title='Price (USD/oz)',
    template='plotly_white',
    hovermode='x unified',
    height=600
)

fig.show()

## 15. Future Price Prediction

Use the most recent `LOOKBACK` window (60 trading days) of scaled features to predict the next trading day's gold price.

In [None]:
# Get the last 60 days of data for future prediction
last_60_days = scaled_data[-LOOKBACK:]
last_60_days = last_60_days.reshape(1, LOOKBACK, len(feature_columns))

# Make prediction
next_day_prediction_scaled = model.predict(last_60_days, verbose=0)
next_day_prediction = close_scaler.inverse_transform(next_day_prediction_scaled)

# Get the last actual price
last_actual_price = df_features['Close'].iloc[-1]
last_date = df_features.index[-1]
next_date = last_date + pd.offsets.BDay(1)  # Next business day (skips weekends; exchange holidays are not handled)

# Calculate predicted change
predicted_change = next_day_prediction[0][0] - last_actual_price
predicted_change_pct = (predicted_change / last_actual_price) * 100

print("=" * 60)
print("🔮 FUTURE GOLD PRICE PREDICTION")
print("=" * 60)
print(f"\n📅 Last Available Date: {last_date.strftime('%Y-%m-%d')}")
print(f"💰 Last Closing Price: ${last_actual_price:.2f}")
print(f"\n📅 Prediction Date: {next_date.strftime('%Y-%m-%d')}")
print(f"🎯 Predicted Price: ${next_day_prediction[0][0]:.2f}")
print(f"\n📈 Predicted Change: ${predicted_change:.2f} ({predicted_change_pct:+.2f}%)")
print(f"📊 Trend Direction: {'📈 UP' if predicted_change > 0 else '📉 DOWN' if predicted_change < 0 else '➡️ STABLE'}")
print("=" * 60)

# Confidence context based on model performance
print(f"\n📋 Model Confidence Context:")
print(f"   - Based on test set MAPE: {test_metrics['MAPE (%)']:.2f}%")
print(f"   - Typical prediction range: ${next_day_prediction[0][0] - test_metrics['MAE']:.2f} to ${next_day_prediction[0][0] + test_metrics['MAE']:.2f}")
print(f"   - R² Score: {test_metrics['R² Score']:.4f}")
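
The ± MAE range printed above is only a rough guide. One alternative, sketched here as an assumption rather than as this notebook's method, is to take empirical quantiles of the test residuals (actual minus predicted) and add them to the point forecast. `empirical_interval` and the sample numbers are hypothetical.

```python
import numpy as np

def empirical_interval(point_forecast, residuals, coverage=0.90):
    """Return (low, high) by shifting the forecast with residual quantiles.

    `residuals` are actual - predicted on held-out data, so
    actual = predicted + residual.
    """
    residuals = np.asarray(residuals, dtype=float).flatten()
    tail = (1.0 - coverage) / 2.0
    lower_q, upper_q = np.quantile(residuals, [tail, 1.0 - tail])
    return point_forecast + lower_q, point_forecast + upper_q

# Made-up residuals and forecast, used only to show the call
demo_residuals = np.array([-30.0, -10.0, -5.0, 0.0, 5.0, 10.0, 20.0])
low, high = empirical_interval(2000.0, demo_residuals, coverage=0.90)
print(f"90% empirical range: ${low:.2f} to ${high:.2f}")
```

In this notebook, `residuals` would be `(y_test_actual - test_predictions)` and `point_forecast` would be `next_day_prediction[0][0]`. The interval is only as good as the assumption that future errors resemble test-set errors.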

In [None]:
# Multi-day Future Prediction (Next 7 days)
def predict_future_days(model, last_sequence, scaler, close_scaler, n_days=7, close_idx=3):
    """
    Predict gold prices for the next n days using iterative (recursive) prediction.

    Each prediction is fed back as the next day's Close value. All other
    features are carried forward unchanged from the last observed day, so
    accuracy decreases for predictions further into the future.
    `scaler` is kept in the signature but is not used here.
    `close_idx` is the position of the Close column in the feature array.
    """
    predictions = []
    current_sequence = last_sequence.copy()

    for day in range(n_days):
        # Predict next day (in scaled units)
        next_pred_scaled = model.predict(current_sequence, verbose=0)
        predictions.append(next_pred_scaled[0][0])

        # Build the next input row: copy the latest row and overwrite Close
        new_row = current_sequence[0, -1, :].copy()
        new_row[close_idx] = next_pred_scaled[0][0]

        # Shift the window left by one step and append the new row
        current_sequence = np.roll(current_sequence, -1, axis=1)
        current_sequence[0, -1, :] = new_row

    # Inverse transform predictions back to USD
    predictions_array = np.array(predictions).reshape(-1, 1)
    predictions_usd = close_scaler.inverse_transform(predictions_array)

    return predictions_usd.flatten()

# Predict next 7 days (look up the Close column instead of hard-coding its position)
close_idx = list(feature_columns).index('Close')
future_predictions = predict_future_days(model, last_60_days, scaler, close_scaler,
                                         n_days=7, close_idx=close_idx)

# Create future dates (business days only; exchange holidays are not handled)
future_dates = [last_date + pd.offsets.BDay(i + 1) for i in range(7)]

print("\n📅 7-Day Gold Price Forecast:")
print("=" * 50)
forecast_df = pd.DataFrame({
    'Date': [d.strftime('%Y-%m-%d') for d in future_dates],
    'Day': ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5', 'Day 6', 'Day 7'],
    'Predicted Price': [f"${p:.2f}" for p in future_predictions],
    'Change from Last': [f"{((p - last_actual_price) / last_actual_price * 100):+.2f}%" for p in future_predictions]
})
print(forecast_df.to_string(index=False))
print("=" * 50)
print("\n⚠️ Note: Predictions become less reliable further into the future.")
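
The window update inside the forecasting loop is easier to see on a toy array. This sketch uses a 4-step window with 2 features and a made-up predicted value; treating column 0 as the scaled Close price is an assumption made only for the example.

```python
import numpy as np

# Toy window: shape (1, timesteps=4, features=2)
window = np.arange(8, dtype=float).reshape(1, 4, 2)
predicted_close = 99.0  # made-up model output
close_idx = 0           # assumed position of Close in this toy example

# Copy the latest row and overwrite only the Close value
new_row = window[0, -1, :].copy()
new_row[close_idx] = predicted_close

# Shift every row one step back in time, then replace the wrapped-around
# last row with the new row
window = np.roll(window, -1, axis=1)
window[0, -1, :] = new_row

print(window[0])
```

The oldest row drops out, and the new last row carries the predicted Close next to the previous day's value for the other feature. Because indicators such as moving averages are not recomputed from the predicted prices, errors compound over multi-step forecasts.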

In [None]:
# Visualize Future Predictions
fig = go.Figure()

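# Add recent historical prices (last 90 calendar days)
# DataFrame.last() is deprecated in recent pandas releases, so filter on the index instead
recent_data = df_features.loc[df_features.index > df_features.index[-1] - pd.Timedelta(days=90)]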

fig.add_trace(go.Scatter(
    x=recent_data.index,
    y=recent_data['Close'],
    name='Historical Prices',
    mode='lines',
    line=dict(color='blue', width=2)
))

# Add future predictions
fig.add_trace(go.Scatter(
    x=future_dates,
    y=future_predictions,
    name='Predicted Prices',
    mode='lines+markers',
    line=dict(color='red', width=2, dash='dash'),
    marker=dict(size=10, symbol='diamond')
))

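# Add vertical line at last known date
# (x is passed as epoch milliseconds because some Plotly versions fail to
# position the annotation when x is a Timestamp)
last_date_x = pd.Timestamp(last_date).timestamp() * 1000
fig.add_vline(x=last_date_x, line_dash="solid", line_color="gray",
              annotation_text="Last Known Price")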

fig.update_layout(
    title='Gold Price: Historical & 7-Day Forecast',
    xaxis_title='Date',
    yaxis_title='Price (USD/oz)',
    template='plotly_white',
    hovermode='x unified',
    height=500
)

fig.show()

## 16. Save Trained Model

Save the trained LSTM model and the scaler objects for future inference.

In [None]:
# Save the trained model in both formats
os.makedirs('models', exist_ok=True)  # Ensure the output directory exists
model_path_h5 = 'models/lstm_gold_model.h5'
model_path_keras = 'models/lstm_gold_model.keras'

# Save in HDF5 format (legacy)
model.save(model_path_h5)
print(f"✅ Model saved to: {model_path_h5}")

# Save in the native Keras format (recommended by current Keras releases)
model.save(model_path_keras)
print(f"✅ Model saved to: {model_path_keras}")

# Save the scalers using joblib
scaler_path = 'models/feature_scaler.pkl'
close_scaler_path = 'models/close_scaler.pkl'

joblib.dump(scaler, scaler_path)
print(f"✅ Feature scaler saved to: {scaler_path}")

joblib.dump(close_scaler, close_scaler_path)
print(f"✅ Close scaler saved to: {close_scaler_path}")

# Save feature columns for reference
feature_info = {
    'feature_columns': feature_columns,
    'lookback': LOOKBACK,
    'target_col': TARGET_COL
}
joblib.dump(feature_info, 'models/feature_info.pkl')
print(f"✅ Feature info saved to: models/feature_info.pkl")
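
When the saved artifacts are loaded later, the stored `feature_info` can guard against feeding the model a window with the wrong shape or column order. The sketch below is a hypothetical helper (`build_inference_window`) that assumes the DataFrame passed in is already scaled; it does not load the model or the scalers.

```python
import numpy as np
import pandas as pd

def build_inference_window(df_scaled, feature_info):
    """Return an array of shape (1, lookback, n_features) for model.predict.

    Hypothetical helper: columns are selected in the saved order, and the
    function fails early if a column or enough history is missing.
    """
    columns = list(feature_info['feature_columns'])
    lookback = feature_info['lookback']

    missing = [c for c in columns if c not in df_scaled.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    if len(df_scaled) < lookback:
        raise ValueError(f"Need at least {lookback} rows, got {len(df_scaled)}")

    window = df_scaled[columns].to_numpy(dtype=float)[-lookback:]
    return window.reshape(1, lookback, len(columns))

# Made-up, already scaled data with columns in a different order than saved
demo_info = {'feature_columns': ['Open', 'Close'], 'lookback': 3, 'target_col': 'Close'}
demo_df = pd.DataFrame({'Close': [0.1, 0.2, 0.3, 0.4], 'Open': [0.0, 0.1, 0.2, 0.3]})
demo_window = build_inference_window(demo_df, demo_info)
print(demo_window.shape)
```

Selecting columns by name from the saved list keeps the feature order identical to training even if the incoming DataFrame orders its columns differently.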

In [None]:
# Save processed data
processed_data_path = 'data/processed/gold_prices_features.csv'
df_features.to_csv(processed_data_path)
print(f"✅ Processed data saved to: {processed_data_path}")

# Display saved files summary
print("\n📁 Saved Files Summary:")
print("=" * 50)
print("📂 models/")
print("   ├── lstm_gold_model.h5")
print("   ├── lstm_gold_model.keras")
print("   ├── feature_scaler.pkl")
print("   ├── close_scaler.pkl")
print("   └── feature_info.pkl")
print("📂 data/")
print("   ├── raw/gold_prices_raw.csv")
print("   └── processed/gold_prices_features.csv")
print("📂 images/visualizations/")
print("   ├── gold_price_overview.png")
print("   ├── gold_price_distribution.png")
print("   ├── gold_price_seasonality.png")
print("   ├── technical_indicators.png")
print("   ├── training_history.png")
print("   └── predictions_analysis.png")
print("=" * 50)

## 17. Ethical Considerations & Responsible AI

### 17.1 Bias and Fairness
- **Data Bias:** Historical data may not represent future market conditions. The model is trained on 2013-2025 data, which may not capture all market scenarios.
- **Model Bias:** The LSTM may overfit to specific market regimes (bull or bear markets).
- **Mitigation:** Regular model retraining, ensemble approaches, and continuous monitoring are recommended.

### 17.2 Dataset Limitations
- The dataset does not include all market-influencing factors (news sentiment, geopolitical events, interest rates).
- Historical patterns may not repeat in future markets.
- No prices are recorded on weekends and market holidays, which affects the continuity of the series.

### 17.3 Responsible Use of AI
⚠️ **Important Disclaimers:**
- This is an **educational project**, NOT financial advice
- Predictions should NOT be solely relied upon for investment decisions
- Always consult qualified financial advisors for investment choices
- Past performance does NOT guarantee future results
- The model has inherent uncertainty and errors

### 17.4 AI Tool Usage Declaration
- AI tools (GitHub Copilot) were used for code assistance.
- All code logic and analysis were performed by the student.
- Model results require human interpretation and validation.

## 18. Conclusion & Future Scope

### Key Findings

1. **Model Performance:** The LSTM model captured temporal patterns in gold price data. Its test-set RMSE, MAE, MAPE, R², and trend accuracy are printed in the Summary Statistics cell at the end of this section.

2. **Technical Indicators:** Feature engineering with moving averages, RSI, MACD, and Bollinger Bands improved the model's ability to understand market dynamics.

3. **Pattern Recognition:** The model demonstrates an ability to follow overall price trends and identify major turning points.

### Future Improvements

1. **External Data Integration:**
   - Include macroeconomic indicators (interest rates, inflation)
   - Add sentiment analysis from financial news
   - Incorporate USD index and other correlated assets

2. **Model Enhancements:**
   - Implement attention mechanisms (Transformer architecture)
   - Ensemble multiple models for robust predictions
   - Add bidirectional LSTM layers
   - Experiment with different lookback windows

3. **Real-time Deployment:**
   - Build API for real-time predictions
   - Implement automated data pipeline
   - Create interactive dashboard for monitoring
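
Experimenting with different lookback windows mostly means rebuilding the input sequences with a different length. The following sketch uses a hypothetical `make_windows` helper on random data; it assumes the target is one column of the feature matrix (index 3 here, chosen arbitrarily) and is not this notebook's own sequence builder.

```python
import numpy as np

def make_windows(data, lookback, target_idx):
    """Build (X, y) pairs: X holds `lookback` consecutive rows and
    y is the target column of the row that follows them."""
    X, y = [], []
    for end in range(lookback, len(data)):
        X.append(data[end - lookback:end])
        y.append(data[end, target_idx])
    return np.array(X), np.array(y)

# Random stand-in for a scaled feature matrix: 100 days, 5 features
rng = np.random.default_rng(42)
demo_data = rng.random((100, 5))

shapes = {}
for lookback in (30, 60, 90):
    X, y = make_windows(demo_data, lookback, target_idx=3)
    shapes[lookback] = (X.shape, y.shape)
    print(lookback, X.shape, y.shape)
```

Longer windows give the model more context but leave fewer training samples, so each setting should be compared on the same held-out period.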

### Summary Statistics

In [None]:
# Final Summary
print("=" * 70)
print("📊 PROJECT SUMMARY: AI-Powered Gold Price Trend Analysis & Prediction")
print("=" * 70)

print("\n📈 Dataset Information:")
print(f"   • Total Data Points: {len(df_features)}")
print(f"   • Date Range: {df_features.index[0].strftime('%Y-%m-%d')} to {df_features.index[-1].strftime('%Y-%m-%d')}")
print(f"   • Features Used: {len(feature_columns)}")

print("\n🏗️ Model Architecture:")
print(f"   • Type: Stacked LSTM Neural Network")
print(f"   • Layers: LSTM(128) → LSTM(64) → Dense(32) → Dense(1)")
print(f"   • Lookback Window: {LOOKBACK} days")
print(f"   • Total Parameters: {model.count_params():,}")

print("\n📊 Performance Metrics (Test Set):")
print(f"   • RMSE: ${test_metrics['RMSE']:.2f}")
print(f"   • MAE: ${test_metrics['MAE']:.2f}")
print(f"   • MAPE: {test_metrics['MAPE (%)']:.2f}%")
print(f"   • R² Score: {test_metrics['R² Score']:.4f}")
print(f"   • Trend Accuracy: {test_trend_acc:.2f}%")

print("\n🔮 Latest Prediction:")
print(f"   • Last Known Price: ${last_actual_price:.2f} ({last_date.strftime('%Y-%m-%d')})")
print(f"   • Next Day Prediction: ${next_day_prediction[0][0]:.2f}")
print(f"   • Predicted Change: {predicted_change_pct:+.2f}%")

print("\n✅ Project completed successfully!")
print("=" * 70)

---

## 📚 References

1. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. *Neural Computation*, 9(8), 1735-1780.
2. Yahoo Finance - Gold Futures Historical Data (GC=F)
3. TensorFlow/Keras Documentation - https://www.tensorflow.org/
4. Technical Analysis Indicators - Investopedia

---

## 👤 Author Information

- **Module:** E - AI Applications
- **Project Type:** Individual Open Project
- **Track:** Financial AI / Time Series Prediction

---

*Last Updated: January 2026*

⚠️ **Disclaimer:** This notebook is for educational purposes only. The predictions and analyses provided should not be used as financial advice. Always consult with qualified financial professionals before making investment decisions.