# 📈 Stock Price Prediction with LSTM

This notebook demonstrates end-to-end stock price prediction using LSTM neural networks.

## Objectives:
- Fetch historical stock data from Yahoo Finance
- Engineer technical indicators (MA, RSI, MACD)
- Build and train an LSTM model
- Evaluate model performance
- Make future price predictions

---

## 1. Setup & Imports

In [None]:
# Install required packages (uncomment if needed)
# !pip install yfinance pandas numpy keras tensorflow scikit-learn matplotlib seaborn plotly

import warnings
warnings.filterwarnings('ignore')

import sys
import os

# Add src directory to path
sys.path.append(os.path.join(os.getcwd(), '..', 'src'))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Import custom modules
from data_fetcher import StockDataFetcher
from feature_engineering import FeatureEngineer
from preprocessor import DataPreprocessor
from model import LSTMModel
from trainer import ModelTrainer
from visualizer import Visualizer
import config

# Set random seeds for reproducibility
np.random.seed(42)
import tensorflow as tf
tf.random.set_seed(42)

# Configure matplotlib
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✅ All imports successful!")
print(f"TensorFlow version: {tf.__version__}")

## 2. Data Collection

Fetch historical stock data from Yahoo Finance API.

In [None]:
# Configuration
TICKER = 'AAPL'  # Change to any stock ticker
PERIOD = '5y'    # 5 years of data

# Initialize data fetcher
fetcher = StockDataFetcher(cache_dir='../data/')

# Fetch data
print(f"Fetching data for {TICKER}...")
df = fetcher.fetch_stock_data(TICKER, period=PERIOD)

# Get stock information
stock_info = fetcher.get_stock_info(TICKER)

print(f"\n📊 Stock: {stock_info['name']}")
print(f"📅 Data Range: {df['Date'].min()} to {df['Date'].max()}")
print(f"📈 Total Records: {len(df)}")
print("\nStock Info:")
for key, value in stock_info.items():
    print(f"  {key}: {value}")

In [None]:
# Display first and last rows
print("First 5 rows:")
display(df.head())

print("\nLast 5 rows:")
display(df.tail())

In [None]:
# Basic statistics
print("Statistical Summary:")
df[['Open', 'High', 'Low', 'Close', 'Volume']].describe()

## 3. Exploratory Data Analysis

In [None]:
# Plot price history
fig = Visualizer.plot_price_history(df, TICKER)
plt.show()

In [None]:
# Price distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Close price distribution
axes[0].hist(df['Close'], bins=50, edgecolor='black', alpha=0.7)
axes[0].set_title('Close Price Distribution', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Price ($)')
axes[0].set_ylabel('Frequency')
axes[0].grid(True, alpha=0.3)

# Volume distribution
axes[1].hist(df['Volume'], bins=50, edgecolor='black', alpha=0.7, color='orange')
axes[1].set_title('Volume Distribution', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Volume')
axes[1].set_ylabel('Frequency')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Correlation heatmap
plt.figure(figsize=(10, 8))
correlation = df[['Open', 'High', 'Low', 'Close', 'Volume']].corr()
sns.heatmap(correlation, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Feature Correlation Heatmap', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

## 4. Feature Engineering

Add technical indicators to enhance prediction capability.
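
As a rough illustration of what these indicators compute, here is a minimal pandas sketch. It is not the `FeatureEngineer` implementation, which this notebook does not show. The function name `add_indicators`, the simple-average form of RSI (rather than Wilder's smoothing), and the 12/26/9 MACD spans are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

def add_indicators(close: pd.Series, ma_window: int = 50, rsi_period: int = 14) -> pd.DataFrame:
    """Hypothetical helper: moving average, RSI, and MACD from closing prices."""
    out = pd.DataFrame({"Close": close})

    # Simple moving average over the last `ma_window` closes.
    out[f"MA_{ma_window}"] = close.rolling(ma_window).mean()

    # RSI (simple-average variant): average gain vs. average loss, mapped to 0..100.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(rsi_period).mean()
    loss = (-delta.clip(upper=0)).rolling(rsi_period).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)

    # MACD: fast EMA minus slow EMA, plus an EMA of that difference as the signal line.
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    out["MACD"] = ema_fast - ema_slow
    out["MACD_Signal"] = out["MACD"].ewm(span=9, adjust=False).mean()
    return out

# Synthetic random-walk prices stand in for real closes.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())
ind = add_indicators(close)
print(ind.tail(3))
```

The first `ma_window - 1` rows of the moving average are `NaN`, so rows without a full window typically need to be dropped before training.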

In [None]:
# Initialize feature engineer
engineer = FeatureEngineer()

# Add all technical indicators
print("Adding technical indicators...")
df = engineer.add_all_indicators(df, ma_windows=[50, 200], rsi_period=14)

print("\n✅ Features added!")
print(f"Total features: {len(df.columns)}")
print(f"\nFeature columns:")
print(df.columns.tolist())

In [None]:
# Display data with indicators
print("Data with Technical Indicators:")
display(df[['Date', 'Close', 'MA_50', 'MA_200', 'RSI', 'MACD', 'MACD_Signal']].tail(10))

In [None]:
# Visualize technical indicators
fig = Visualizer.plot_technical_indicators(df)
plt.show()

## 5. Data Preprocessing

Normalize data and create sequences for LSTM input.
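
The windowing step can be sketched as follows. This is an assumed, conventional implementation and not necessarily what `DataPreprocessor.create_sequences` does. The names `make_windows`, `X_demo`, and `y_demo` are hypothetical, and the data is random.

```python
import numpy as np

def make_windows(features: np.ndarray, target: np.ndarray, seq_length: int):
    """Slide a window of `seq_length` rows over the data.

    X[i] holds rows i .. i+seq_length-1; y[i] is the target at row i+seq_length.
    """
    X, y = [], []
    for i in range(len(features) - seq_length):
        X.append(features[i:i + seq_length])
        y.append(target[i + seq_length])
    return np.array(X), np.array(y)

# Toy data: 100 days, 5 features, min-max scaled to [0, 1] per column.
rng = np.random.default_rng(42)
raw = rng.normal(size=(100, 5))
scaled = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))

X_demo, y_demo = make_windows(scaled, scaled[:, 0], seq_length=60)
print(X_demo.shape, y_demo.shape)  # (40, 60, 5) (40,)

# Chronological split: the last 20% of windows form the test set, no shuffling.
split = int(len(X_demo) * 0.8)
X_tr, X_te = X_demo[:split], X_demo[split:]
```

Time series are split in order rather than shuffled, so the model is never evaluated on days that precede its training data. Fitting the scaler on the training portion only is a common way to avoid leaking test-set ranges into training.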

In [None]:
# Select features for training
FEATURES = config.FEATURES
TARGET = config.TARGET
SEQ_LENGTH = config.SEQ_LENGTH
TEST_SIZE = config.TEST_SIZE

print(f"Selected features: {FEATURES}")
print(f"Target: {TARGET}")
print(f"Sequence length: {SEQ_LENGTH} days")
print(f"Test size: {TEST_SIZE:.0%}")

In [None]:
# Initialize preprocessor
preprocessor = DataPreprocessor(FEATURES, TARGET)

# Normalize data
print("Normalizing data...")
features, target, original_df = preprocessor.normalize_data(df)

print(f"Features shape: {features.shape}")
print(f"Target shape: {target.shape}")

In [None]:
# Create sequences
print("Creating sequences...")
X, y = preprocessor.create_sequences(features, target, SEQ_LENGTH)

print(f"X shape: {X.shape} (samples, timesteps, features)")
print(f"y shape: {y.shape}")

In [None]:
# Train-test split
print("Splitting data...")
X_train, X_test, y_train, y_test = preprocessor.train_test_split(X, y, TEST_SIZE)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
train_dates = df['Date'].iloc[SEQ_LENGTH:SEQ_LENGTH + len(X_train)]
print(f"\nTraining set date range: {train_dates.min()} to {train_dates.max()}")
print(f"Testing set date range: {df['Date'].iloc[-len(X_test):].min()} to {df['Date'].iloc[-len(X_test):].max()}")

## 6. Model Building

Build LSTM neural network architecture.

In [None]:
# Initialize LSTM model
lstm_model = LSTMModel(
    seq_length=SEQ_LENGTH,
    n_features=len(FEATURES),
    lstm_units=config.LSTM_UNITS,
    dropout_rate=config.DROPOUT_RATE,
    learning_rate=config.LEARNING_RATE
)

# Build model
model = lstm_model.build_model()

print("\n📊 Model Architecture:")
model.summary()

### Model Architecture Explanation:

- **Input Layer**: Takes sequences of 60 days with 5 features each
- **LSTM Layer 1**: 50 units with return_sequences=True for stacking
- **Dropout**: 20% dropout to prevent overfitting
- **LSTM Layer 2**: 50 units for pattern learning
- **Dropout**: Another 20% dropout layer
- **Dense Layer**: 25 units with ReLU activation
- **Output Layer**: Single unit for price prediction

**Optimizer**: Adam with learning rate 0.001  
**Loss Function**: Mean Squared Error (MSE)
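
To get a feel for the model's size, the trainable parameters can be counted by hand with the standard formulas for LSTM and dense layers. The sketch below assumes the layer sizes listed above (5 features, 50 LSTM units, 25 dense units). The actual values come from `config` and may differ, and the helper names are hypothetical.

```python
def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim: int, units: int) -> int:
    # One weight per input-unit pair, plus one bias per unit.
    return input_dim * units + units

n_features, lstm_units, dense_units = 5, 50, 25  # assumed sizes
layer_params = {
    "lstm_1": lstm_params(n_features, lstm_units),
    "lstm_2": lstm_params(lstm_units, lstm_units),
    "dense": dense_params(lstm_units, dense_units),
    "output": dense_params(dense_units, 1),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name:>7}: {count:,}")
print(f"  total: {total_params:,}")  # 32,701
```

Dropout layers add no parameters, and the sequence length does not affect the count because the same LSTM weights are reused at every timestep. If `model.summary()` reports a different total, the configured sizes differ from the ones assumed here.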

## 7. Model Training

Train the model with early stopping and model checkpointing.
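
The patience rule behind early stopping can be illustrated without any framework. This sketch assumes `get_callbacks` wraps something like Keras `EarlyStopping` and `ModelCheckpoint`, which the notebook does not show. The function name and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: this is where a checkpoint would be written.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop.
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.35, 0.36, 0.38]
stop, best = early_stopping_epoch(losses, patience=3)
print(f"stopped at epoch {stop}, best weights from epoch {best}")
```

With `patience=3`, training halts at epoch 8 because epochs 6 through 8 fail to beat the loss at epoch 5, and the checkpoint from epoch 5 is the one kept.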

In [None]:
# Initialize trainer
trainer = ModelTrainer(lstm_model)

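# Get callbacks (create the checkpoint directory first so the path is valid)
os.makedirs('../models', exist_ok=True)
callbacks = lstm_model.get_callbacks(model_path='../models/best_model.h5', patience=10)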

print("🚀 Starting training...\n")

In [None]:
# Train model
history = trainer.train(
    X_train, y_train,
    epochs=config.EPOCHS,
    batch_size=config.BATCH_SIZE,
    validation_split=config.VALIDATION_SPLIT,
    callbacks=callbacks
)

print("\n✅ Training completed!")

In [None]:
# Plot training history
fig = Visualizer.plot_training_history(history)
plt.show()

In [None]:
# Save model and scalers
os.makedirs('../models', exist_ok=True)
lstm_model.save_model('../models/lstm_stock_model.h5')
preprocessor.save_scalers('../models/feature_scaler.pkl', '../models/target_scaler.pkl')

print("✅ Model and scalers saved successfully!")

## 8. Model Evaluation

Evaluate model performance on test set.
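
The metrics reported below can be sketched in NumPy as follows. The definitions are the usual textbook ones. How `trainer.evaluate` defines direction accuracy, and whether it works on scaled or dollar values, is not shown in this notebook, so treat `regression_metrics` as a hypothetical stand-in.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper computing RMSE, MAE, MAPE, R², and direction accuracy."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred

    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100  # undefined if any y_true is 0
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

    # Assumed definition: share of days where the predicted and actual
    # day-over-day moves have the same sign.
    same_sign = np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred))
    direction_accuracy = np.mean(same_sign) * 100

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2,
            "direction_accuracy": direction_accuracy}

m = regression_metrics([100, 102, 101, 105], [101, 101, 102, 104])
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

In this toy case every prediction is off by exactly one dollar, so RMSE and MAE are both 1, yet only one of the three day-over-day moves has the right sign. A model can look accurate on price level while still being poor on direction.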

In [None]:
# Evaluate model
metrics = trainer.evaluate(X_test, y_test)

print("\n📊 Model Performance Metrics:")
print("=" * 50)
print(f"RMSE (Root Mean Squared Error): ${metrics['rmse']:.4f}")
print(f"MAE (Mean Absolute Error):      ${metrics['mae']:.4f}")
print(f"MAPE (Mean Absolute % Error):   {metrics['mape']:.2f}%")
print(f"R² Score:                       {metrics['r2']:.4f}")
print(f"Direction Accuracy:             {metrics['direction_accuracy']:.2f}%")
print("=" * 50)

## 9. Predictions & Visualization

Make predictions and visualize results.

In [None]:
# Make predictions
y_pred = trainer.predict(X_test)

# Inverse transform to get actual prices
y_pred_actual = preprocessor.inverse_transform_target(y_pred)
y_test_actual = preprocessor.inverse_transform_target(y_test)

print(f"Predictions shape: {y_pred_actual.shape}")
print("First 5 predictions:")
for i in range(5):
    print(f"  Actual: ${y_test_actual[i][0]:.2f}, Predicted: ${y_pred_actual[i][0]:.2f}, Error: ${abs(y_test_actual[i][0] - y_pred_actual[i][0]):.2f}")

In [None]:
# Get test dates
test_dates = df['Date'].iloc[-len(y_test):].reset_index(drop=True)

# Plot predictions vs actual
fig = Visualizer.plot_predictions(y_test_actual, y_pred_actual, test_dates)
plt.show()

In [None]:
# Plot prediction errors
fig = Visualizer.plot_prediction_error(y_test_actual, y_pred_actual)
plt.show()

In [None]:
# Create comparison DataFrame
comparison_df = pd.DataFrame({
    'Date': test_dates,
    'Actual': y_test_actual.flatten(),
    'Predicted': y_pred_actual.flatten(),
    'Error': (y_test_actual - y_pred_actual).flatten(),
    'Error_Percent': ((y_test_actual - y_pred_actual) / y_test_actual * 100).flatten()
})

print("\nPrediction Comparison (Last 10 days):")
display(comparison_df.tail(10))

## 10. Future Predictions

Predict stock prices for the next 30 days.
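
Multi-day forecasts from a one-step model are usually produced recursively: each prediction is appended to the input window and fed back in. The sketch below shows that loop with a stand-in model. How `trainer.predict_future` updates the non-target features is not shown in this notebook, so carrying them forward unchanged is an assumption, and `roll_forward` and `toy_model` are hypothetical names.

```python
import numpy as np

def roll_forward(predict_one, last_sequence, n_days, target_idx=0):
    """Roll a one-step model forward `n_days` times.

    predict_one: callable mapping a (seq_length, n_features) window to a scalar.
    """
    window = np.array(last_sequence, dtype=float)
    preds = []
    for _ in range(n_days):
        next_value = float(predict_one(window))
        preds.append(next_value)
        # Build the next row: copy the latest features, overwrite the target column.
        next_row = window[-1].copy()
        next_row[target_idx] = next_value
        # Drop the oldest row and append the new one.
        window = np.vstack([window[1:], next_row])
    return np.array(preds).reshape(-1, 1)

def toy_model(window):
    # Stand-in for a trained network: mean of the target column over the window.
    return window[:, 0].mean()

demo_window = np.tile(np.linspace(0.0, 1.0, 5)[:, None], (1, 3))  # 5 steps, 3 features
demo_forecast = roll_forward(toy_model, demo_window, n_days=4)
print(demo_forecast.ravel())
```

Because every step consumes earlier predictions, errors compound with the horizon, so the later days of a 30-day forecast are far less reliable than the first few.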

In [None]:
# Predict future prices
FUTURE_DAYS = 30

print(f"Predicting next {FUTURE_DAYS} days...")

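# Get the most recent SEQ_LENGTH rows as the starting window.
# Note: if create_sequences pairs each window with the following day's target,
# X[-1] ends one day before the last row, so slice the features directly.
last_sequence = np.asarray(features)[-SEQ_LENGTH:]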

# Make future predictions
future_predictions = trainer.predict_future(last_sequence, FUTURE_DAYS, preprocessor)

print(f"Future predictions shape: {future_predictions.shape}")

In [None]:
# Create future dates (business days only, since markets are closed on weekends)
last_date = df['Date'].iloc[-1]
future_dates = pd.date_range(start=last_date + timedelta(days=1), periods=FUTURE_DAYS, freq='B')

# Create forecast DataFrame
forecast_df = pd.DataFrame({
    'Date': future_dates,
    'Predicted_Price': future_predictions.flatten()
})

print(f"\n📈 {TICKER} Price Forecast for Next {FUTURE_DAYS} Days:")
display(forecast_df)

In [None]:
# Plot future predictions
historical_prices = df['Close'].tail(90)
historical_dates = df['Date'].tail(90)

fig = Visualizer.plot_future_predictions(
    historical_prices, 
    future_predictions.flatten(),
    historical_dates, 
    future_dates, 
    TICKER
)
plt.show()

In [None]:
# Forecast statistics
current_price = df['Close'].iloc[-1]
predicted_price_7d = future_predictions[6][0]
predicted_price_30d = future_predictions[29][0]

change_7d = predicted_price_7d - current_price
change_7d_pct = (change_7d / current_price) * 100

change_30d = predicted_price_30d - current_price
change_30d_pct = (change_30d / current_price) * 100

print("\n" + "="*60)
print(f"📊 {TICKER} Price Forecast Summary")
print("="*60)
print(f"Current Price:              ${current_price:.2f}")
print(f"\nPredicted Price (7 days):   ${predicted_price_7d:.2f}")
print(f"Change:                     ${change_7d:.2f} ({change_7d_pct:+.2f}%)")
print(f"\nPredicted Price (30 days):  ${predicted_price_30d:.2f}")
print(f"Change:                     ${change_30d:.2f} ({change_30d_pct:+.2f}%)")
print("="*60)

# Trend analysis
if change_30d > 0:
    print("\n📈 Trend: BULLISH (Upward trend predicted)")
else:
    print("\n📉 Trend: BEARISH (Downward trend predicted)")

In [None]:
# Save forecast to CSV (build the path once so the message matches the file)
forecast_path = f"../data/{TICKER}_forecast_{datetime.now():%Y%m%d}.csv"
forecast_df.to_csv(forecast_path, index=False)
print(f"\n✅ Forecast saved to {forecast_path}")

## 11. Model Analysis & Insights

In [None]:
# List the model's input features and summarize the latest indicator readings
print("📊 Feature Analysis:")
print("\nFeatures used in model:")
for i, feature in enumerate(FEATURES, 1):
    print(f"{i}. {feature}")

print("\n💡 Technical Indicator Insights:")
print(f"  • MA_50 > MA_200: {'Golden Cross (Bullish)' if df['MA_50'].iloc[-1] > df['MA_200'].iloc[-1] else 'Death Cross (Bearish)'}")
print(f"  • Current RSI: {df['RSI'].iloc[-1]:.2f} - {'Overbought' if df['RSI'].iloc[-1] > 70 else 'Oversold' if df['RSI'].iloc[-1] < 30 else 'Neutral'}")
print(f"  • MACD Signal: {'Bullish' if df['MACD'].iloc[-1] > df['MACD_Signal'].iloc[-1] else 'Bearish'}")

## 12. Conclusion & Next Steps

### Model Performance Summary:
- Built and trained an LSTM model for stock price prediction
- Evaluated it on a held-out test set using RMSE, MAE, MAPE, R², and direction accuracy (see Section 8 for the values)
- Generated 30-day price forecasts

### Limitations:
1. **Market Volatility**: Model assumes patterns continue; unexpected events can cause large deviations
2. **External Factors**: Doesn't account for news, earnings, or economic indicators
3. **Historical Bias**: Based solely on past data patterns
4. **Feature Dependency**: Limited to technical indicators

### Future Improvements:
1. **Add More Features**:
   - Sentiment analysis from news/social media
   - Economic indicators (interest rates, GDP)
   - Company fundamentals (P/E ratio, earnings)

2. **Model Enhancements**:
   - Bidirectional LSTM
   - Attention mechanisms
   - Ensemble methods (combining multiple models)
   - GRU layers as alternative

3. **Advanced Techniques**:
   - Walk-forward validation
   - Multi-step predictions
   - Confidence intervals
   - Anomaly detection

4. **Deployment**:
   - Real-time predictions
   - Automated retraining
   - Alert system for significant predictions
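
Walk-forward validation, listed under Advanced Techniques above, replaces the single train/test split with several chronological folds, each trained on all data before its test block. The sketch below uses an expanding training window. The function name and fold sizes are illustrative assumptions, and scikit-learn's `TimeSeriesSplit` offers a similar ready-made splitter.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices) with an expanding training window."""
    fold_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        test_end = train_end + fold_size
        # Train on everything before the test block; test on the block itself.
        yield range(0, train_end), range(train_end, test_end)

wf_splits = list(walk_forward_splits(n_samples=100, n_folds=4, min_train=60))
for train_idx, test_idx in wf_splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Averaging the metrics across folds gives a less optimistic estimate than one split, because the model is scored on several distinct market periods.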

### ⚠️ Disclaimer:
**This model is for educational purposes only. Stock price predictions are inherently uncertain and should NOT be used as the sole basis for investment decisions. Past performance does not guarantee future results. Always consult with financial advisors and conduct thorough research before making investment decisions.**

In [None]:
print("\n" + "="*60)
print("🎉 Notebook Execution Complete!")
print("="*60)
print("\n✅ Successfully:")
print(f"  • Fetched {len(df)} days of {TICKER} stock data")
print(f"  • Used {len(FEATURES)} input features")
print(f"  • Trained LSTM model for up to {config.EPOCHS} epochs")
print(f"  • Achieved {metrics['direction_accuracy']:.2f}% direction accuracy")
print(f"  • Generated {FUTURE_DAYS}-day price forecast")
print("\n📁 Saved Files:")
print("  • Model: ../models/lstm_stock_model.h5")
print("  • Scalers: ../models/*_scaler.pkl")
print(f"  • Forecast: ../data/{TICKER}_forecast_*.csv")
print("\n🚀 Next: Run the Streamlit dashboard with: streamlit run ../app.py")
print("="*60)