# XGBoost Price Prediction (Advanced)

This notebook builds an XGBoost classification pipeline for predicting price direction.
Key enhancements:
- **Advanced Features**: MACD, Bollinger Bands, and ATR, alongside moving averages, RSI, log returns, and volume.
- **Hyperparameter Tuning**: `GridSearchCV` with time-ordered cross-validation selects the best-scoring parameters from a fixed grid.
- **Feature Importance**: A gain-based plot shows which technical indicators the model relies on most.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay replaces it.
from sklearn.metrics import accuracy_score, classification_report, ConfusionMatrixDisplay

from data_generator import generate_gbm_data
from feature_engineering import add_technical_indicators

%matplotlib inline

## 1. Data Prep & Feature Engineering

In [None]:
raw_df = generate_gbm_data(n_samples=2000)
df = add_technical_indicators(raw_df)
print(f"Data Shape: {df.shape}")
df.tail()
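
The `feature_engineering` module is not shown in this notebook, so the exact indicator definitions are unknown. As a rough illustration only, the sketch below computes MACD, Bollinger Bands, and ATR with their conventional default windows (12/26/9, 20, and 14). The function name `sketch_indicators` and the `Close`, `High`, and `Low` column names are assumptions, and the ATR here uses a simple rolling mean rather than Wilder's smoothing.

```python
import numpy as np
import pandas as pd


def sketch_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Append MACD, Bollinger Band, and ATR columns (hypothetical helper)."""
    out = df.copy()
    close = out["Close"]  # assumed column name

    # MACD: fast EMA minus slow EMA; the signal line is an EMA of that difference.
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    out["MACD"] = ema_fast - ema_slow
    out["MACD_Signal"] = out["MACD"].ewm(span=9, adjust=False).mean()

    # Bollinger Bands: 20-period mean plus/minus two rolling standard deviations.
    mid = close.rolling(20).mean()
    std = close.rolling(20).std()
    out["BB_Upper"] = mid + 2 * std
    out["BB_Lower"] = mid - 2 * std

    # ATR: rolling mean of the true range (largest of the three candidate ranges).
    prev_close = close.shift(1)
    true_range = pd.concat(
        [
            out["High"] - out["Low"],
            (out["High"] - prev_close).abs(),
            (out["Low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    out["ATR"] = true_range.rolling(14).mean()

    # Rolling windows leave NaNs at the start; drop those warm-up rows.
    return out.dropna()


# Toy random-walk prices, used only to exercise the function.
rng = np.random.default_rng(0)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 200)))
demo = pd.DataFrame({"Close": close, "High": close * 1.01, "Low": close * 0.99})
result = sketch_indicators(demo)
print(result[["MACD", "BB_Upper", "BB_Lower", "ATR"]].tail())
```

The warm-up rows dropped at the end are one reason the printed data shape can be smaller than the requested `n_samples`.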

In [None]:
features = ['SMA_10', 'SMA_50', 'RSI', 'MACD', 'MACD_Signal', 'BB_Upper', 'BB_Lower', 'ATR', 'Log_Return', 'Volume']
target = 'Target'

X = df[features]
y = df[target]

# Chronological 80/20 split: no shuffling, so every test row comes after every training row
split_point = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split_point], X.iloc[split_point:]
y_train, y_test = y.iloc[:split_point], y.iloc[split_point:]

## 2. Hyperparameter Tuning (Grid Search)
We use `TimeSeriesSplit` for cross-validation so that each fold trains on earlier observations and validates on later ones, respecting temporal order.
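
To make the fold layout concrete, here is a small sketch on a toy array of 12 time-ordered samples (the array and the name `X_demo` are illustrative, not part of the pipeline). It prints the index sets produced by `TimeSeriesSplit(n_splits=3)`.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X_demo = np.arange(12).reshape(-1, 1)  # 12 time-ordered toy samples
folds = list(TimeSeriesSplit(n_splits=3).split(X_demo))

for i, (train_idx, val_idx) in enumerate(folds, start=1):
    # The training window grows each fold; validation is always the next block in time.
    print(f"Fold {i}: train={train_idx.tolist()} validate={val_idx.tolist()}")
```

In every fold the validation indices are strictly later than the training indices, which is the property a shuffled K-fold would break.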

In [None]:
tscv = TimeSeriesSplit(n_splits=3)

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'reg_alpha': [0, 0.1, 0.5] # L1 regularization
}

# use_label_encoder is deprecated in recent XGBoost releases, so it is omitted here.
xgb_clf = xgb.XGBClassifier(eval_metric='logloss')
grid_search = GridSearchCV(
    estimator=xgb_clf,
    param_grid=param_grid,
    cv=tscv,
    scoring='accuracy',
    verbose=1,
    n_jobs=-1,
)

print("Starting Grid Search...")
grid_search.fit(X_train, y_train)

print(f"Best Parameters: {grid_search.best_params_}")
best_model = grid_search.best_estimator_
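
Before launching a search it can help to know how much work it implies. The sketch below counts the candidates in the grid above with scikit-learn's `ParameterGrid` and multiplies by the number of CV splits; the variable names are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the tuning cell above.
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'reg_alpha': [0, 0.1, 0.5],
}

n_candidates = len(ParameterGrid(param_grid))  # every combination of the four lists
n_splits = 3                                   # matches TimeSeriesSplit(n_splits=3)
total_fits = n_candidates * n_splits

print(f"{n_candidates} candidates x {n_splits} folds = {total_fits} fits")
```

With the default `refit=True`, `GridSearchCV` also performs one additional fit of the best candidate on the full training set.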

## 3. Evaluation

In [None]:
predictions = best_model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Test Set Accuracy: {accuracy * 100:.2f}%")
print(classification_report(y_test, predictions))

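# plot_confusion_matrix was removed in scikit-learn 1.2; use ConfusionMatrixDisplay (1.0+) instead.
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(best_model, X_test, y_test, cmap='Blues')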
plt.show()
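
Accuracy on a direction target is easier to interpret next to a naive baseline. The sketch below, using a hypothetical helper `majority_baseline_accuracy` and toy labels, computes the accuracy of always predicting the most frequent training class. If `generate_gbm_data` simulates a geometric Brownian motion, as its name suggests, a test accuracy close to this baseline would not be surprising.

```python
import numpy as np


def majority_baseline_accuracy(y_train, y_test) -> float:
    """Accuracy of always predicting the most frequent training class."""
    classes, counts = np.unique(np.asarray(y_train), return_counts=True)
    majority = classes[np.argmax(counts)]
    return float(np.mean(np.asarray(y_test) == majority))


# Toy labels for illustration only; in the notebook, pass y_train and y_test.
y_train_demo = np.array([1, 1, 1, 0, 0, 1])
y_test_demo = np.array([1, 0, 1, 0])
baseline = majority_baseline_accuracy(y_train_demo, y_test_demo)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```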

## 4. Feature Importance Analysis

In [None]:
fig, ax = plt.subplots(figsize=(10, 8))
xgb.plot_importance(best_model, ax=ax, importance_type='gain')
ax.set_title('Feature Importance (Gain)')
plt.show()