# Advanced Algorithms for StockPredictionPro

This notebook explores state-of-the-art machine learning algorithms for stock price prediction and trading signal generation.

**Advanced Models Covered:**
- **Gradient Boosting Machines**: XGBoost, LightG``` CatBoost
- **Deep Learning Models**: LSTM, GRU, Bidirectional RNNs```**Transformer Architectures**: Attention mechanisms for time```ries
- **Hybrid Models**: Combining ensemble and deep learning approaches

**Objectives:**
- Implement cutting-edge algorithms for maximum prediction accuracy
- Compare performance across different```del architectures
- Optimize hyperparameters for financial```me-series data
- Build production-ready models for deployment
- Analyze feature importance and model interpretability

These advanced models will serve as the core prediction engines for StockPredictionP```


In [10]:
# ============================================
# Advanced Algorithms - Library Imports and Setup
# ============================================

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Gradient Boosting Libraries
import xgboost as xgb
import lightgbm as lgb
from catboost import CatBoostRegressor, Pool

# Deep Learning Libraries
from flax import linen as nn
import jax
import jax.numpy as jnp
import optax

# Plotting configuration
sns.set_theme(style='darkgrid')
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 10
%matplotlib inline

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)


print("✅ Advanced ML libraries loaded successfully")
print(f"   • XGBoost version: {xgb.__version__}")
print(f"   • LightGBM version: {lgb.__version__}")
print(f"   • jax version: {jax.__version__}")
print("📊 Ready for advanced model training")


✅ Advanced ML libraries loaded successfully
   • XGBoost version: 3.0.4
   • LightGBM version: 4.6.0
   • jax version: 0.7.1
📊 Ready for advanced model training


In [12]:
# ============================================
# Load Engineered Features and Target
# ============================================

# Load selected features from feature selection notebook
try:
    features_df = pd.read_csv('./outputs/selected_features.csv', index_col=0)
    print(f"✅ Selected features loaded from outputs")
    print(f"📊 Features shape: {features_df.shape}")
    
except FileNotFoundError:
    print("📝 Selected features not found. Loading technical indicators...")
    try:
        features_df = pd.read_csv('C:\\Users\\Faraz\\Documents\\StockPredictionPro\\notebooks\\outputs\\technical_indicators.csv', index_col=0)
        print(f"✅ Technical indicators loaded as features")
        print(f"📊 Features shape: {features_df.shape}")
    except FileNotFoundError:
        print("❌ No feature files found. Generating sample data...")
        
        # Generate realistic sample data for demonstration
        np.random.seed(42)
        n_samples = 1000
        n_features = 20
        
        feature_names = [f'feature_{i}' for i in range(1, n_features)]
        X_sample = np.random.randn(n_samples, n_features-1)
        y_sample = np.sum(X_sample[:, :5] * [0.3, -0.2, 0.15, 0.25, -0.1], axis=1) + np.random.normal(0, 0.1, n_samples)
        
        features_df = pd.DataFrame(X_sample, columns=feature_names)
        features_df['target'] = y_sample
        
        print("✅ Sample data generated for demonstration")

# Display data overview
print(f"\n📈 Dataset Overview:")
print(f"   • Shape: {features_df.shape}")
print(f"   • Features: {features_df.shape[1] - 1}")
print(f"   • Samples: {len(features_df):,}")

print(f"\n📋 Data Preview:")
display(features_df.head())

print(f"\n📊 Target Statistics:")
if 'target' in features_df.columns:
    target_stats = features_df['target'].describe()
    display(target_stats)
else:
    print("⚠️ No 'target' column found. Please ensure target variable is available.")


📝 Selected features not found. Loading technical indicators...
✅ Technical indicators loaded as features
📊 Features shape: (800, 46)

📈 Dataset Overview:
   • Shape: (800, 46)
   • Features: 45
   • Samples: 800

📋 Data Preview:


Unnamed: 0,ATR_14,ATR_ratio,BB_breakout_lower,BB_breakout_upper,BB_lower,BB_middle,BB_position,BB_squeeze,BB_upper,BB_width,CCI_20,Close_to_Close_Volatility,EMA_12,EMA_26,EMA_50,MACD,MACD_bearish_crossover,MACD_bullish_crossover,MACD_histogram,MACD_signal,MFI_14,Price_Range,Price_Range_Pct,ROC_10,RSI_14,RSI_21,RSI_overbought,RSI_oversold,SMA_10,SMA_20,SMA_200,SMA_5,SMA_50,SMA_ratio_20_50,SMA_ratio_5_20,Stoch_D,Stoch_K,Stoch_bearish_crossover,Stoch_bullish_crossover,Stoch_overbought,Stoch_oversold,True_Range,Ultimate_Oscillator,VWAP,Volatility_Breakout,Williams_R
2020-07-19,2.926241,0.032351,0,0,89.057955,93.631217,0.152641,1,98.204478,9.146523,-103.026571,0.228974,91.992386,91.937699,90.312639,0.054689,0,0,-0.750014,0.804703,45.400758,2.429464,2.685853,-1.727532,45.336951,49.057494,0,0,91.976532,93.631217,87.489987,91.009811,89.627451,1.044671,0.972003,13.424149,12.5727,0,0,0,1,2.429464,50.69698,93.996718,0,-87.4273
2020-07-20,2.973942,0.032497,0,0,88.919456,93.38479,0.290501,1,97.850125,8.930668,-86.753518,0.230226,91.918762,91.906301,90.359745,0.012462,0,0,-0.633792,0.646254,44.708722,1.812649,1.980738,-2.298482,48.880757,51.173281,0,0,91.761241,93.38479,87.448689,91.044406,89.813841,1.039759,0.974938,13.889597,24.9118,0,1,0,0,1.812649,49.562515,93.831541,0,-75.0882
2020-07-21,2.953191,0.031571,0,0,88.90122,93.340634,0.522664,1,97.780049,8.878829,-10.613404,0.235291,92.16847,92.027454,90.484534,0.141017,0,0,-0.40419,0.545207,50.385127,2.848906,3.045595,-0.608593,54.905822,54.934317,0,0,91.703963,93.340634,87.410819,91.41931,90.062329,1.0364,0.979416,33.398995,62.712484,0,0,0,0,2.848906,50.29994,93.803467,0,-37.287516
2020-07-22,2.921271,0.03057,0,0,88.892266,93.349473,0.747987,1,97.806681,8.914415,64.165592,0.245344,92.690264,92.289134,90.683577,0.401131,0,0,-0.115261,0.516392,56.942543,4.199536,4.394652,4.076894,59.963171,58.291428,0,0,92.078291,93.349473,87.367388,92.167356,90.342834,1.03328,0.987337,56.990786,83.348073,0,0,1,0,4.199536,56.468677,93.798594,0,-16.651927
2020-07-23,3.043796,0.032738,0,0,88.857533,93.247981,0.468901,1,97.638429,8.780897,25.308985,0.263562,92.734055,92.339932,90.773433,0.394124,0,0,-0.097814,0.491938,57.641688,4.408181,4.741259,0.861485,51.929375,52.982997,0,0,92.157703,93.247981,87.313211,92.808964,90.561262,1.029667,0.995292,64.858408,48.514667,1,0,0,0,4.408181,49.09738,93.708892,0,-51.485333



📊 Target Statistics:
⚠️ No 'target' column found. Please ensure target variable is available.


In [14]:
# ============================================
# Advanced Data Preparation
# ============================================

# Separate features and target
if 'target' in features_df.columns:
    X = features_df.drop(columns=['target'])
    y = features_df['target']
else:
    # If no target, create a synthetic one for demonstration
    X = features_df
    y = np.random.normal(0, 1, len(features_df))
    print("⚠️ Using synthetic target for demonstration")

print(f"📊 Features matrix shape: {X.shape}")
print(f"📊 Target vector shape: {y.shape}")

# Time series split for financial data (respects temporal order)
tscv = TimeSeriesSplit(n_splits=5)
train_indices, test_indices = next(iter(tscv.split(X)))

X_train = X.iloc[train_indices]
X_test = X.iloc[test_indices]
y_train = y.iloc[train_indices]
y_test = y.iloc[test_indices]

print(f"\n📈 Time Series Split:")
print(f"   • Training samples: {len(X_train):,}")
print(f"   • Test samples: {len(X_test):,}")
print(f"   • Train ratio: {len(X_train)/(len(X_train)+len(X_test))*100:.1f}%")

# Feature scaling
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), 
    columns=X_train.columns, 
    index=X_train.index
)
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), 
    columns=X_test.columns, 
    index=X_test.index
)

# Additional scaling for deep learning (MinMax for better LSTM performance)
minmax_scaler = MinMaxScaler()
X_train_minmax = pd.DataFrame(
    minmax_scaler.fit_transform(X_train), 
    columns=X_train.columns, 
    index=X_train.index
)
X_test_minmax = pd.DataFrame(
    minmax_scaler.transform(X_test), 
    columns=X_test.columns, 
    index=X_test.index
)

print(f"\n✅ Data scaling completed")
print(f"   • StandardScaler applied for tree-based models")
print(f"   • MinMaxScaler applied for neural networks")

# Store model results
model_results = {}


⚠️ Using synthetic target for demonstration
📊 Features matrix shape: (800, 46)
📊 Target vector shape: (800,)


NameError: name 'TimeSeriesSplit' is not defined