# Crypto Algorithmic Trading Strategy using RSI, Bollinger Bands & SMAs.

---
#### **Crypto Algorithmic Trading Strategy.**
A machine learning-based trading model using **RSI, Bollinger Bands, XGBoost, SMAs**, and **Backtesting** for strategy evaluation.

---

## 📌 Project Overview.
This notebook presents an **algorithmic trading strategy** leveraging:
- **Relative Strength Index (RSI)** for momentum analysis.
- **Bollinger Bands** for volatility-based buy/sell signals.
- **Simple Moving Averages (SMAs)** for trend-based signals.
- **XGBoost** for machine learning-driven trade classification.

The strategy also includes **risk management**, a **backtesting engine**, and **visualizations** using **Plotly**.

## 🚀 Technologies Used.
- **Python** 🐍
- **Jupyter Notebook**
- **Binance API**
- **`ta`** (Technical Analysis library for Python; not the separate TA-Lib package)
- **Scikit-learn**
- **XGBoost**
- **Plotly**
- **Requests**
- **python-binance**
- **imbalanced-learn (`imblearn`)**

## 📊 Trading Strategy Components.
1️⃣ **Fetch Cryptocurrency Data** (Binance API)  
2️⃣ **Compute Technical Indicators** (RSI, Bollinger Bands & SMAs)  
3️⃣ **Generate Trading Labels** (Buy/Sell/Hold)  
4️⃣ **Train ML Model** (XGBoost classifier)  
5️⃣ **Backtest Trading Strategy** (Risk Management)  
6️⃣ **Visualize Candlestick Charts with Bollinger Bands**  

## 📂 Dataset.
- **Cryptocurrency price data** (BTCUSDT, ETHUSDT, XRPUSDT, etc.).
- 4h interval, 1000-candle limit. Adjust both to your preference.


## 1️⃣ Setup: Imports and Configuration

This cell imports the required libraries, loads the Binance API keys from environment variables (via `.env`), and configures logging and warning suppression.

In [None]:
# --- Core Python & Environment ---
import os
import logging
import warnings
from dotenv import load_dotenv

# --- Data Handling & Numerics ---
import pandas as pd
import numpy as np

# --- API & Web ---
import requests # Kept for general purpose, though Binance client is preferred for klines
from requests.exceptions import ConnectionError, Timeout, HTTPError
from binance.client import Client as BinanceClient # Using python-binance

# --- Technical Analysis ---
from ta.momentum import RSIIndicator
from ta.volatility import BollingerBands 
from ta.trend import SMAIndicator # For SMAs

# --- Machine Learning ---
import xgboost as xgb # ML Model
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.utils.class_weight import compute_class_weight
from imblearn.over_sampling import SMOTE # Make sure imbalanced-learn is installed

# --- Plotting ---
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# --- Load Environment Variables (for API keys) ---
load_dotenv()
BINANCE_API_KEY = os.getenv("BINANCE_API_KEY")
BINANCE_SECRET_KEY = os.getenv("BINANCE_SECRET_KEY")

if not BINANCE_API_KEY or not BINANCE_SECRET_KEY:
    print("⚠️ WARNING: Binance API keys not found in .env file or environment variables.")
    print("   The data fetching part might fail or use unauthenticated limits.")
    # For public data like klines, client can be initialized without keys, but it's good practice
    binance_client = BinanceClient(None, None)
else:
    binance_client = BinanceClient(BINANCE_API_KEY, BINANCE_SECRET_KEY)

# --- Configure Logging ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - [%(funcName)s] - %(message)s')

# --- Suppress Specific Warnings ---
warnings.filterwarnings('ignore', category=UserWarning, module='xgboost')
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings("ignore", category=UndefinedMetricWarning)

print("✅ Setup Complete: Libraries imported and configuration set.")

## 2️⃣ Fetch Cryptocurrency Data

This function fetches historical candlestick (kline) data through the `python-binance` client, either for a date range or as the most recent `limit` candles, and converts it into a typed DataFrame.

In [None]:
def fetch_crypto_data_client(client, symbol="BTCUSDT", interval="4h", limit=1000, start_str="1 Jan, 2021", end_str="1 Jul, 2025"):
    """
     Fetch historical candlestick data using python-binance client.
     Can fetch by limit (most recent) or by date range.

     Parameters:
       - client: Initialized BinanceAPI client.
       - symbol (str): Trading pair (e.g., 'BTCUSDT').
       - interval (str): Candlestick interval (e.g., '1m', '1h', '1d').
       - limit (int): Number of candles to fetch if start_str is None.
       - start_str (str, optional): Start date string (e.g., "1 Jan, 2020").
       - end_str (str, optional): End date string (e.g., "1 Jan, 2021").

     Returns:
       DataFrame: Candlestick data (empty if Binance returns no klines), or None if an error occurs.
    """
    logging.info(f"Fetching data for {symbol}, interval {interval}, limit {limit}, start '{start_str}', end '{end_str}'")
    try:
        if start_str:
            klines = client.get_historical_klines(symbol, interval, start_str, end_str=end_str)
            logging.info(f"Fetched {len(klines)} klines from {start_str} to {end_str or 'now'}")
        else:
            klines = client.get_historical_klines(symbol, interval, limit=limit)
            logging.info(f"Fetched {len(klines)} (limit: {limit}) most recent klines.")

        if not klines:
            logging.warning("No klines data returned from Binance.")
            return pd.DataFrame() # Return empty DataFrame

    except Exception as e: # Catches BinanceAPIException, BinanceRequestException, etc.
        logging.error(f"Error fetching klines from Binance: {e}")
        return None

    df = pd.DataFrame(klines, columns=[
        "open_time", "open", "high", "low", "close", "volume",
        "close_time", "quote_asset_volume", "number_of_trades",
        "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore"
    ])

    # Convert timestamp columns
    df["open_time"] = pd.to_datetime(df["open_time"], unit="ms") # Binance provides UTC
    df["close_time"] = pd.to_datetime(df["close_time"], unit="ms")
    df["timestamp"] = df["open_time"] # Use open_time as the primary timestamp for the bar

    # Convert numeric columns
    numeric_cols = ["open", "high", "low", "close", "volume", "quote_asset_volume",
                    "number_of_trades", "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume"]
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors='coerce')

    logging.info(f"Data fetched successfully. Shape: {df.shape}")
    return df

# --- Example Usage (Comment out if running the full pipeline later in main) ---
test_df = fetch_crypto_data_client(binance_client, symbol="BTCUSDT", interval="4h", limit=1000, start_str="1 Jan, 2021", end_str="1 Jul, 2025")
if test_df is not None and not test_df.empty:
    print("\n--- Fetched Data Sample ---")
    print(test_df.head())
    print(f"\nData types:\n{test_df.dtypes}")
else:
    print("Failed to fetch test data or data is empty.")
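
As a small illustration of the conversion step above, the sketch below parses a single kline row using only the standard library. The field names are the ones used in `fetch_crypto_data_client`; the sample row is made up for illustration and is not real market data, and `parse_kline` is a hypothetical helper. Unlike the notebook's `pd.to_datetime` call, this version attaches an explicit UTC timezone.

```python
from datetime import datetime, timezone

KLINE_FIELDS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore",
]

FLOAT_FIELDS = [
    "open", "high", "low", "close", "volume", "quote_asset_volume",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume",
]


def parse_kline(row):
    """Turn one raw kline row (ms timestamps, string prices) into typed values."""
    record = dict(zip(KLINE_FIELDS, row))
    for ts_field in ("open_time", "close_time"):
        # Timestamps arrive as milliseconds since the Unix epoch (UTC).
        record[ts_field] = datetime.fromtimestamp(record[ts_field] / 1000, tz=timezone.utc)
    for field in FLOAT_FIELDS:
        record[field] = float(record[field])
    record["number_of_trades"] = int(record["number_of_trades"])
    return record


# Illustrative placeholder row (values are invented, not fetched from Binance).
sample_row = [
    1609459200000, "29000.00", "29500.00", "28800.00", "29300.00", "1500.5",
    1609473599999, "43500000.0", 35000, "800.2", "23200000.0", "0",
]
parsed = parse_kline(sample_row)
print(parsed["open_time"], parsed["close"])
```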

## 3️⃣ Feature Engineering Functions

These functions calculate technical indicators like RSI, Bollinger Bands, and SMAs using the `ta` library.
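
For intuition about what the RSI measures, here is a minimal pure-Python sketch of the classic Wilder formulation: average gains and losses are smoothed recursively, and RSI = 100 - 100 / (1 + RS) with RS = average gain / average loss. The helper `wilder_rsi` is hypothetical and is not the `ta` implementation; libraries may seed the averages differently, so early values can differ from what `RSIIndicator` returns.

```python
def _rsi_from_averages(avg_gain, avg_loss):
    # With no losses in the window the ratio is undefined; this sketch
    # saturates at 100 (flat windows are treated the same way, a simplification).
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def wilder_rsi(closes, window=14):
    """Return a list of RSI values; None until `window` price changes exist."""
    if len(closes) <= window:
        return [None] * len(closes)

    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]

    # Seed with simple averages over the first `window` changes.
    avg_gain = sum(gains[:window]) / window
    avg_loss = sum(losses[:window]) / window

    rsi = [None] * window
    rsi.append(_rsi_from_averages(avg_gain, avg_loss))

    # Wilder smoothing: each new change gets weight 1 / window.
    for gain, loss in zip(gains[window:], losses[window:]):
        avg_gain = (avg_gain * (window - 1) + gain) / window
        avg_loss = (avg_loss * (window - 1) + loss) / window
        rsi.append(_rsi_from_averages(avg_gain, avg_loss))
    return rsi


rsi_values = wilder_rsi([1, 2, 3, 2, 3], window=2)
print(rsi_values)
```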

In [None]:
def calculate_technical_indicators(df, rsi_window=14, bb_window=20, bb_std_dev=2,
                                   sma_short_window=50, sma_long_window=200):
    """
    Calculate RSI, Bollinger Bands, SMAs, and the volume percentage change.
    Indicator warm-up NaNs are filled by `ta` (fillna=True); any remaining NaNs are dropped later during data preparation.
    """
    logging.info(f"Calculating technical indicators. Initial df shape: {df.shape}")
    if df.empty or 'close' not in df.columns or len(df) < max(rsi_window, bb_window, sma_long_window): # Ensure enough data
        logging.warning("DataFrame is empty or too short for feature calculation.")
        # Add empty columns if df is not empty but too short, to maintain structure
        if not df.empty:
            for col in ["RSI", "BB_mavg", "BB_hband", "BB_lband", "SMA_short", "SMA_long", "volume_pct_change"]:
                if col not in df.columns: df[col] = np.nan
        return df

    # RSI
    rsi_indicator = RSIIndicator(close=df["close"], window=rsi_window, fillna=True) # fillna handles initial NaNs
    df["RSI"] = rsi_indicator.rsi()
    logging.info("RSI calculated.")

    # Bollinger Bands
    bb_indicator = BollingerBands(close=df["close"], window=bb_window, window_dev=bb_std_dev, fillna=True)
    df["BB_mavg"] = bb_indicator.bollinger_mavg()   # Middle Band
    df["BB_hband"] = bb_indicator.bollinger_hband() # Upper Band
    df["BB_lband"] = bb_indicator.bollinger_lband() # Lower Band
    logging.info("Bollinger Bands calculated.")

    # SMAs
    sma_short_indicator = SMAIndicator(close=df["close"], window=sma_short_window, fillna=True)
    df["SMA_short"] = sma_short_indicator.sma_indicator()
    
    sma_long_indicator = SMAIndicator(close=df["close"], window=sma_long_window, fillna=True)
    df["SMA_long"] = sma_long_indicator.sma_indicator()
    logging.info("SMAs calculated.")

    # Volume Percentage Change
    # Assign the result back instead of calling fillna(inplace=True) on a column
    # selection, which newer pandas versions warn about and may not apply.
    df["volume_pct_change"] = df["volume"].pct_change(fill_method=None).fillna(0) # First row has no previous bar
    logging.info("Volume percentage change calculated.")
    
    logging.info(f"Technical indicators calculation complete. df shape: {df.shape}")
    return df

# --- Example Usage (Comment out if running the full pipeline later in main) ---
if 'test_df' in globals() and test_df is not None and not test_df.empty:
    test_df_features = calculate_technical_indicators(test_df.copy())
    print("\n--- Data Sample with Features ---")
    print(test_df_features[["timestamp", "close", "RSI", "BB_mavg", "BB_hband", "BB_lband", "SMA_short", "SMA_long"]].tail())
    print(f"\nNaNs after feature calculation:\n{test_df_features.isnull().sum()}")
else:
    print("Skipping feature calculation example as test_df is not available.")
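
To make the band arithmetic concrete, the following sketch computes Bollinger Bands by hand: the middle band is the SMA over the window, and the upper and lower bands sit `num_std` standard deviations above and below it. It assumes the population standard deviation and leaves warm-up rows as `None`, whereas the notebook passes `fillna=True` to `ta`; `bollinger_bands` here is a hypothetical helper, not the library function.

```python
from statistics import fmean, pstdev


def bollinger_bands(closes, window=20, num_std=2):
    """Return (middle, upper, lower) lists; None until a full window exists."""
    middle, upper, lower = [], [], []
    for i in range(len(closes)):
        if i + 1 < window:
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = closes[i + 1 - window : i + 1]
        mean = fmean(chunk)   # middle band = simple moving average
        spread = num_std * pstdev(chunk)  # band half-width
        middle.append(mean)
        upper.append(mean + spread)
        lower.append(mean - spread)
    return middle, upper, lower


closes = [10, 12, 11, 13, 15]
mid_band, high_band, low_band = bollinger_bands(closes, window=3, num_std=2)
print(mid_band[-1], high_band[-1], low_band[-1])
```

A wider band means the recent window was more volatile, which is why price touching the lower band is often read as a stretched move rather than a fixed price level.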

## 4️⃣ Target Variable Generation
This section defines functions to generate target labels based on RSI quantiles.
Crucially, thresholds are derived *only* from the training set's RSI distribution and then applied to both train and test sets.

In [None]:
def get_rsi_quantile_thresholds(rsi_series_train, lower_quantile=0.10, upper_quantile=0.90):
    """
    Calculates RSI buy/sell thresholds based on quantiles of a given RSI series (training data).
    """
    if rsi_series_train.empty or rsi_series_train.isnull().all():
        logging.warning("RSI series for threshold calculation is empty or all NaN. Using default thresholds.")
        return 35, 65 # Default fallback thresholds

    buy_threshold = rsi_series_train.quantile(lower_quantile)
    sell_threshold = rsi_series_train.quantile(upper_quantile)
    
    # Ensure buy_threshold is less than sell_threshold
    if buy_threshold >= sell_threshold:
        logging.warning(f"Calculated buy_threshold ({buy_threshold:.2f}) >= sell_threshold ({sell_threshold:.2f}). Adjusting or using defaults.")
        # Attempt to adjust, or fall back to ensure separation
        if sell_threshold < 50 : buy_threshold = max(10, sell_threshold - 5) # Ensure some gap
        elif buy_threshold > 50 : sell_threshold = min(90, buy_threshold + 5)
        else: # If they are very close or crossed around 50, use defaults
            buy_threshold, sell_threshold = 35, 65
            logging.warning("Using default thresholds 35/65 due to quantile overlap.")


    logging.info(f"RSI Thresholds calculated from training data: Buy <= {buy_threshold:.2f}, Sell >= {sell_threshold:.2f}")
    return buy_threshold, sell_threshold

def apply_rsi_labels_to_series(rsi_series, buy_threshold, sell_threshold):
    """
    Applies target labels (0: Hold, 1: Buy, 2: Sell) to an RSI series based on pre-calculated thresholds.
    """
    # Buy Signal (1): RSI is below or equal to the buy_threshold.
    # Sell Signal (2): RSI is above or equal to the sell_threshold.
    # Neutral (0): Otherwise.
    target_labels = np.select(
        [rsi_series <= buy_threshold, rsi_series >= sell_threshold], # Conditions
        [1, 2],                                                      # Choices for Buy, Sell
        default=0                                                    # Default is Hold
    )
    return target_labels.astype(int)
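
The two functions above can be illustrated without pandas. The sketch below derives thresholds from a toy "training" RSI list using linear-interpolation quantiles (the pandas default), then labels unseen values with those same thresholds. All numbers are invented for illustration, and `linear_quantile` and `label_rsi` are hypothetical stand-ins for the notebook's functions.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def label_rsi(rsi_values, buy_threshold, sell_threshold):
    """0 = Hold, 1 = Buy (RSI <= buy threshold), 2 = Sell (RSI >= sell threshold)."""
    labels = []
    for value in rsi_values:
        if value <= buy_threshold:
            labels.append(1)
        elif value >= sell_threshold:
            labels.append(2)
        else:
            labels.append(0)
    return labels


rsi_train = [30, 40, 50, 60, 70]   # toy training RSI values
rsi_test = [20, 50, 80]            # toy unseen RSI values

# Thresholds come from the training values only.
buy_threshold = linear_quantile(rsi_train, 0.10)
sell_threshold = linear_quantile(rsi_train, 0.90)

test_labels = label_rsi(rsi_test, buy_threshold, sell_threshold)
print(buy_threshold, sell_threshold, test_labels)
```

Because the test values never influence the thresholds, the labels on the test set reflect what would have been knowable at training time.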

## 5️⃣ Data Preparation for Machine Learning

This involves:
1.  Defining the feature set (`X`).
2.  Handling any remaining NaNs after feature engineering (especially relevant if `fillna=False` was used in `ta` or for initial rolling window periods).
3.  Splitting data into training and testing sets chronologically (`shuffle=False`).
4.  Generating `y_train` and `y_test`.

In [None]:
def prepare_ml_data(df_featurized, model_input_features, target_rsi_series_name="RSI",
                    test_set_size=0.2, random_state_split=42, # random_state_split is not used if shuffle=False, stratify=None
                    stratify_on_rsi_quantiles=True, # This argument will now be ignored if shuffle=False
                    lower_quantile=0.20, upper_quantile=0.80):
    """
    Prepares features (X) and target (y) for ML, splits into train/test, and applies correct labeling.
    For time-series (shuffle=False), stratification is disabled.
    """
    logging.info(f"Preparing ML data. Input featurized df shape: {df_featurized.shape}")
    logging.info(f"Using lower_quantile: {lower_quantile}, upper_quantile: {upper_quantile} for RSI thresholds.")

    # --- 1. Drop NaNs from rows based on selected model input features & target RSI series ---
    cols_to_check_for_nan = model_input_features + [target_rsi_series_name]
    df_cleaned = df_featurized.dropna(subset=cols_to_check_for_nan).copy()
    
    if df_cleaned.empty:
        logging.error("DataFrame is empty after dropping NaNs from critical ML columns. Cannot proceed.")
        return None, None, None, None, None 

    logging.info(f"DataFrame shape after NaN drop from critical columns: {df_cleaned.shape}")

    # --- 2. Define X (features) and the RSI series for labeling ---
    X_full = df_cleaned[model_input_features]
    rsi_for_labeling_full = df_cleaned[target_rsi_series_name]

    if len(X_full) < 20: 
        logging.error(f"Not enough data ({len(X_full)} rows) for a meaningful train/test split. Need at least 20.")
        return None, None, None, None, None

    # --- 3. Split Data (Chronological for Time Series) ---
    if stratify_on_rsi_quantiles:
        logging.warning("Stratification is requested but shuffle=False. Stratification will be disabled for train_test_split.")

    X_train, X_test, rsi_train, rsi_test = train_test_split(
        X_full, rsi_for_labeling_full,
        test_size=test_set_size,
        shuffle=False, 
        stratify=None, # MUST BE NONE if shuffle=False
        random_state=None # Not used if shuffle=False and stratify=None
    )

    logging.info(f"Data split: X_train shape {X_train.shape}, X_test shape {X_test.shape}")

    # --- 4. Generate Target Labels Correctly (Post-Split) ---
    # Ensure rsi_train is not empty before proceeding (can happen if test_size is too large or data too small)
    if rsi_train.empty:
        logging.error("rsi_train is empty after split. Cannot generate labels. Check data size and test_set_size.")
        return None, None, None, None, None

    buy_thresh_train, sell_thresh_train = get_rsi_quantile_thresholds(
        rsi_train,
        lower_quantile=lower_quantile, 
        upper_quantile=upper_quantile  
    )
    
    y_train = apply_rsi_labels_to_series(rsi_train, buy_thresh_train, sell_thresh_train)
    y_test = apply_rsi_labels_to_series(rsi_test, buy_thresh_train, sell_thresh_train) 
    
    logging.info(f"Target labels generated. y_train distribution: {dict(pd.Series(y_train).value_counts())}")
    logging.info(f"y_test distribution: {dict(pd.Series(y_test).value_counts())}")
    
    train_thresholds_info = {'buy_threshold': buy_thresh_train, 'sell_threshold': sell_thresh_train}

    return X_train, X_test, y_train, y_test, train_thresholds_info
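
The chronological split in step 3 amounts to cutting the ordered rows at a single index. A minimal sketch, assuming the test size is rounded up as scikit-learn does for a fractional `test_size`; `chronological_split` is a hypothetical helper and the integer rows stand in for time-ordered bars.

```python
import math


def chronological_split(rows, test_size=0.2):
    """Keep the earliest rows for training and the latest rows for testing."""
    n_test = math.ceil(test_size * len(rows))
    n_train = len(rows) - n_test
    return rows[:n_train], rows[n_train:]


bars = list(range(10))  # stand-in for 10 time-ordered candles
train_bars, test_bars = chronological_split(bars, test_size=0.2)
print(train_bars, test_bars)
```

Every training row precedes every test row, so the model is never evaluated on bars older than the ones it learned from.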

## 6️⃣ SMOTE for Handling Class Imbalance (Applied to Training Data Only)

If class imbalance is an issue in `y_train`, SMOTE can be used to oversample minority classes.

In [None]:
def apply_smote_to_train_data(X_train_orig, y_train_orig, random_state_smote=42):
    """Applies SMOTE to the training data if conditions are met."""
    logging.info(f"Attempting SMOTE. Original y_train distribution: {dict(pd.Series(y_train_orig).value_counts())}")
    
    X_train_resampled = X_train_orig.copy()
    y_train_resampled = y_train_orig.copy()

    unique_classes, counts = np.unique(y_train_orig, return_counts=True)

    if len(unique_classes) < 2:
        logging.warning("Only one class in y_train. SMOTE not applicable.")
        return X_train_resampled, y_train_resampled
    
    min_class_count = min(counts)
    # SMOTE needs at least k_neighbors + 1 samples in the smallest class,
    # so if that class has M samples, k_neighbors must be <= M - 1.
    # With the default k_neighbors=5 this means at least 6 samples per class.
    
    if min_class_count <= 1: # If any class has only 1 sample, k_neighbors cannot be >= 1.
        logging.warning(f"Smallest class has {min_class_count} samples. SMOTE requires k_neighbors+1 samples per class. Skipping SMOTE.")
        return X_train_resampled, y_train_resampled

    # Determine a safe k_neighbors
    k_neighbors_val = min(5, min_class_count - 1) # Cap at 5, ensure it's at least 1

    if k_neighbors_val < 1:
        logging.warning(f"Calculated k_neighbors ({k_neighbors_val}) is < 1. Smallest class count: {min_class_count}. Skipping SMOTE.")
        return X_train_resampled, y_train_resampled
        
    try:
        logging.info(f"Applying SMOTE with k_neighbors={k_neighbors_val}")
        smote = SMOTE(random_state=random_state_smote, k_neighbors=k_neighbors_val)
        X_train_resampled, y_train_resampled = smote.fit_resample(X_train_orig, y_train_orig)
        logging.info(f"SMOTE applied. Resampled y_train distribution: {dict(pd.Series(y_train_resampled).value_counts())}")
    except ValueError as e:
        logging.error(f"Error during SMOTE: {e}. Smallest class count: {min_class_count}, k_neighbors: {k_neighbors_val}. Using original data.")
    except ImportError:
        logging.error("imblearn library not found. SMOTE cannot be applied. Using original data.")
        
    return X_train_resampled, y_train_resampled
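
The core idea behind SMOTE, and the reason for the `k_neighbors` check above, can be sketched in a few lines: each synthetic sample is a random point on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration with a hypothetical `synthesize` helper, not the `imblearn` implementation.

```python
import math
import random


def k_nearest(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda other: math.dist(point, other))[:k]


def synthesize(minority, k_neighbors, n_new, seed=42):
    """Create n_new synthetic minority samples by interpolation."""
    if len(minority) < k_neighbors + 1:
        raise ValueError("need at least k_neighbors + 1 minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(k_nearest(base, others, k_neighbors))
        lam = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = synthesize(minority_points, k_neighbors=2, n_new=5)
print(new_points)
```

With only one minority sample there is no neighbour to interpolate towards, which is why the function above skips SMOTE in that case.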

## 7️⃣ XGBoost Model Training Function

This function encapsulates the two-stage hyperparameter tuning (RandomizedSearchCV then GridSearchCV) and training for an XGBoost classifier. It also handles class weighting, which can be used alternatively or supplementarily to SMOTE.

In [None]:
def train_xgboost_model(X_train_data, y_train_data, n_random_iter=25, use_sample_weights=True):
    """
    Trains an XGBoost classifier with hyperparameter tuning.
    If use_sample_weights is True, computes and applies class weights.
    """
    logging.info(f"--- Starting XGBoost Training --- Input X shape: {X_train_data.shape}, y dist: {dict(pd.Series(y_train_data).value_counts())}")

    fit_params = {}
    if use_sample_weights:
        unique_classes_train = np.unique(y_train_data)
        if len(unique_classes_train) > 1 : # Ensure multiple classes for weighting
            class_weights_values = compute_class_weight("balanced", classes=unique_classes_train, y=y_train_data)
            weights_dict = dict(zip(unique_classes_train, class_weights_values))
            sample_weight_array = np.array([weights_dict[val] for val in y_train_data])
            fit_params['sample_weight'] = sample_weight_array
            logging.info(f"Sample weights computed and will be used. Weights dict: {weights_dict}")
        else:
            logging.warning("Only one class in y_train_data. Skipping sample weight calculation.")
    
    xgb_clf_base = xgb.XGBClassifier(
        objective='multi:softprob',
        eval_metric='mlogloss',
        use_label_encoder=False, # Deprecated; recent XGBoost releases ignore this argument (with a warning), so it can be dropped there
        random_state=42
    )
    
    # --- Stage 1: RandomizedSearchCV ---
    logging.info("--- Stage 1: RandomizedSearchCV ---")
    param_dist = {
        'n_estimators': [200, 300, 500], # Reduced for faster example
        'max_depth': [3, 5, 7],
        'learning_rate': [0.01, 0.05, 0.1, 0.15],
        'subsample': [0.7, 0.8, 0.9, 1.0],
        'colsample_bytree': [0.7, 0.8, 0.9, 1.0],
        'gamma': [0, 0.1, 0.2], # Added gamma for regularization
        'reg_alpha': [0, 0.01, 0.1, 0.5],
        'reg_lambda': [0.5, 1.0, 1.5]
    }

    random_search = RandomizedSearchCV(
        estimator=xgb_clf_base,
        param_distributions=param_dist,
        n_iter=n_random_iter,
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42), # Using StratifiedKFold
        scoring='f1_macro',
        verbose=1,
        n_jobs=-1,
        random_state=42
    )
    
    try:
        random_search.fit(X_train_data, y_train_data, **fit_params)
        logging.info(f"RandomizedSearchCV Best F1 Macro: {random_search.best_score_:.4f}")
        logging.info(f"RandomizedSearchCV Best Params: {random_search.best_params_}")
        best_params_rs = random_search.best_params_
    except Exception as e_rs:
        logging.error(f"RandomizedSearchCV failed: {e_rs}. Returning base model.")
        # Fallback: Fit a base model with default-ish params if search fails
        xgb_clf_base.fit(X_train_data, y_train_data, **fit_params)
        return xgb_clf_base


    # --- Stage 2: GridSearchCV (Refined Search) ---
    logging.info("--- Stage 2: GridSearchCV (Refined Search) ---")
    
    # Helper that builds a narrower grid around the best RandomizedSearchCV parameters
    def get_refined_param_grid(best_params, base_param_dist):
        refined_grid = {}
        for param, value in best_params.items():
            if isinstance(value, (int, float)):
                if param in ['n_estimators', 'max_depth']: # Discrete steps
                    original_options = sorted(list(set(base_param_dist[param])))
                    current_idx = original_options.index(value) if value in original_options else -1
                    low_idx = max(0, current_idx -1)
                    high_idx = min(len(original_options)-1, current_idx + 1)
                    options = [original_options[low_idx], value, original_options[high_idx]]
                    refined_grid[param] = sorted(list(set(o for o in options if o is not None)))

                elif isinstance(value, float): # Continuous, tighter range
                     refined_grid[param] = sorted(list(set(np.clip([value * 0.8, value, value * 1.2], min(base_param_dist[param]), max(base_param_dist[param])))))
                else: # int but not estimators/depth
                    refined_grid[param] = [max(0, value -1), value, value+1] if value > 0 else [0,1,2]
            else:
                refined_grid[param] = [value] # Keep best or define a small range
        # Ensure essential params are present even if not optimized by RS (if they were fixed)
        for p in ['subsample', 'colsample_bytree', 'gamma', 'reg_alpha', 'reg_lambda']:
            if p not in refined_grid: refined_grid[p] = [best_params.get(p, base_param_dist[p][0])] # fallback
        return refined_grid

    param_grid_refined = get_refined_param_grid(best_params_rs, param_dist)
    logging.info(f"Refined Parameter Grid for GridSearchCV: {param_grid_refined}")

    grid_search = GridSearchCV(
        estimator=xgb_clf_base, # Important: use the base estimator
        param_grid=param_grid_refined,
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
        scoring='f1_macro',
        n_jobs=-1,
        verbose=1
    )
    
    try:
        grid_search.fit(X_train_data, y_train_data, **fit_params)
        logging.info(f"GridSearchCV Best F1 Macro: {grid_search.best_score_:.4f}")
        logging.info(f"GridSearchCV Best Params: {grid_search.best_params_}")
        best_model = grid_search.best_estimator_
    except Exception as e_gs:
        logging.error(f"GridSearchCV failed: {e_gs}. Using best model from RandomizedSearch or base if that also failed.")
        best_model = random_search.best_estimator_ # Fallback to RS best

    logging.info("--- XGBoost Training Finished ---")
    return best_model
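
The "balanced" class weights used for `sample_weight` follow the formula n_samples / (n_classes * count_of_class), so rare classes get proportionally larger weights. A small pure-Python sketch of that arithmetic on an invented label list (`balanced_class_weights` is a hypothetical helper mirroring what `compute_class_weight("balanced", ...)` computes):

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


y_toy = [0, 0, 0, 0, 1, 2]  # mostly Hold, one Buy, one Sell
class_weights = balanced_class_weights(y_toy)

# Expand to one weight per sample, as passed via fit(..., sample_weight=...).
sample_weights = [class_weights[label] for label in y_toy]
print(class_weights, sample_weights)
```

Note that if SMOTE has already balanced the training set, these weights all come out close to 1 and add little.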

## 8️⃣ Model Evaluation Functions

Functions to evaluate the trained model's performance using various metrics, including cross-validation.

In [None]:
def evaluate_ml_model(model_to_eval, X_eval_data, y_eval_data, dataset_name="Test"):
    """Evaluates model on a given dataset."""
    if X_eval_data.empty or len(y_eval_data) == 0:
        logging.warning(f"Evaluation data for {dataset_name} is empty. Skipping evaluation.")
        return

    logging.info(f"--- Evaluating Model on {dataset_name} Set ---")
    try:
        predictions = model_to_eval.predict(X_eval_data)
        proba_predictions = model_to_eval.predict_proba(X_eval_data) # For more detailed analysis if needed
    except Exception as e:
        logging.error(f"Error during prediction on {dataset_name} set: {e}")
        return

    acc = accuracy_score(y_eval_data, predictions)
    logging.info(f"✅ {dataset_name} Accuracy: {acc:.4f}")
    
    report_labels = sorted(np.unique(np.concatenate((y_eval_data, predictions))))
    if not report_labels: report_labels = None # Handle case of no labels

    class_report = classification_report(y_eval_data, predictions, labels=report_labels, zero_division=0, target_names=[f"Class {l}" for l in report_labels] if report_labels else None)
    logging.info(f"📊 {dataset_name} Classification Report:\n{class_report}")
    
    if report_labels and len(report_labels) > 1: # Confusion matrix needs at least 2 classes typically
        cm = confusion_matrix(y_eval_data, predictions, labels=report_labels)
        logging.info(f"🟦 {dataset_name} Confusion Matrix (labels: {report_labels}):\n{cm}")
    elif report_labels:
         logging.info(f"Only one class ({report_labels[0]}) present in y_eval_data/predictions for {dataset_name}. Confusion matrix not meaningful.")


def advanced_ml_evaluation(model_to_eval, X_train_data_cv, y_train_data_cv, cv_folds=3):
    """Performs cross-validation and full training set evaluation."""
    if X_train_data_cv.empty or len(y_train_data_cv) == 0:
        logging.warning("Training data for advanced evaluation is empty. Skipping.")
        return

    logging.info("\n--- Advanced Evaluation Metrics (on Training Data variants) ---")
    
    # --- Cross-Validation ---
    # Note: Uses the original X_train_data_cv, y_train_data_cv (could be SMOTE'd or original)
    # CV might not use sample_weights from the initial fit unless model_to_eval inherently carries them
    # or fit_params are passed to cross_val_score. XGBoost objects don't typically store sample_weight after fit.
    # For this demo, we evaluate the final model_to_eval's generalizability on folds of X_train_data_cv.
    unique_cv_labels = np.unique(y_train_data_cv)
    if len(unique_cv_labels) < 2 :
        logging.warning("Not enough classes in y_train_data_cv for stratified cross-validation. Skipping CV.")
    else:
        logging.info(f"⚡ Cross-Validation Metrics (using {cv_folds}-fold StratifiedKFold):")
        skf = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=42)
        
        scoring_metrics = {'accuracy': 'accuracy', 'precision_macro': 'precision_macro',
                           'recall_macro': 'recall_macro', 'f1_macro': 'f1_macro'}
        cv_results = {}
        for metric_name, scorer_name in scoring_metrics.items():
            try:
                # Ensure enough samples per class for each split for some scorers
                cv_score = cross_val_score(model_to_eval, X_train_data_cv, y_train_data_cv, cv=skf, scoring=scorer_name)
                cv_results[metric_name] = (cv_score.mean(), cv_score.std())
                logging.info(f"✔ CV {metric_name.replace('_', ' ').title()}: {cv_score.mean():.4f} ± {cv_score.std():.4f}")
            except ValueError as e_cv:
                 logging.warning(f"Could not compute CV for {metric_name} (e.g., due to class distribution in a fold): {e_cv}")
                 cv_results[metric_name] = (np.nan, np.nan)


    # --- Performance on Full Training Data (Sanity Check) ---
    logging.info("\n📊 Performance on Full Training Data used for this evaluation (Sanity Check):")
    # This evaluates how well the model fits the data it was trained on (or the processed version passed here)
    # The model_to_eval is already fitted.
    try:
        preds_full_train = model_to_eval.predict(X_train_data_cv)
        report_labels_train = sorted(np.unique(np.concatenate((y_train_data_cv, preds_full_train))))
        if not report_labels_train: report_labels_train = None

        full_train_report = classification_report(y_train_data_cv, preds_full_train, labels=report_labels_train, zero_division=0, target_names=[f"Class {l}" for l in report_labels_train] if report_labels_train else None)
        logging.info(f"Classification Report (Full Training Data variant):\n{full_train_report}")

        if report_labels_train and len(report_labels_train) > 1:
            cm_full_train = confusion_matrix(y_train_data_cv, preds_full_train, labels=report_labels_train)
            logging.info(f"Confusion Matrix (Full Training Data variant - labels: {report_labels_train}):\n{cm_full_train}")
        elif report_labels_train:
            logging.info(f"Only one class ({report_labels_train[0]}) present for Full Training Data variant. CM not meaningful.")

    except Exception as e_eval_full:
        logging.error(f"Error evaluating on full training data variant: {e_eval_full}")
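
Since `f1_macro` is the scoring metric throughout, it may help to see how it is built: an F1 score is computed per class from that class's precision and recall, and the per-class scores are averaged with equal weight. The sketch below does this by hand on invented labels; `macro_f1` is a hypothetical helper for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)  # same effect as zero_division=0
    return sum(f1_scores) / len(f1_scores)


y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true_toy, y_pred_toy)
print(round(score, 4))
```

Because every class counts equally, a model that ignores the rare Buy and Sell classes scores poorly even when its accuracy looks high.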

## 9️⃣ Apply Trained Model to DataFrame

This function takes a trained model and a DataFrame (which should have the necessary features) and adds a `predicted_signal` column.

In [None]:
def apply_trained_model_to_df(df_input, trained_ml_model, model_features_list):
    """
    Applies a trained ML model to generate predictions on a DataFrame.
    Handles NaNs in prediction features by creating NaNs in predicted_signal.
    """
    logging.info(f"Applying trained model to DataFrame. Input shape: {df_input.shape}")
    df_with_predictions = df_input.copy()
    
    # Ensure all required model features are present
    missing_cols = [col for col in model_features_list if col not in df_with_predictions.columns]
    if missing_cols:
        logging.error(f"DataFrame missing required features for prediction: {missing_cols}. Cannot apply model.")
        df_with_predictions["predicted_signal"] = np.nan # Add NaN column
        return df_with_predictions

    X_to_predict = df_with_predictions[model_features_list].copy()
    
    # Initialize predicted_signal column with NaN
    df_with_predictions["predicted_signal"] = np.nan
    
    # Identify rows that are valid for prediction (all features are non-NaN)
    valid_rows_mask = X_to_predict.notna().all(axis=1)
    X_valid_to_predict = X_to_predict[valid_rows_mask]

    if X_valid_to_predict.empty:
        logging.warning("No valid rows with all required features non-NaN. No predictions will be made.")
    else:
        logging.info(f"Making predictions on {len(X_valid_to_predict)} valid rows.")
        try:
            predictions_array = trained_ml_model.predict(X_valid_to_predict)
            # Assign predictions only to the valid rows using their original index
            df_with_predictions.loc[X_valid_to_predict.index, "predicted_signal"] = predictions_array
            logging.info("Predictions applied successfully.")
        except Exception as e:
            logging.error(f"Error during model prediction: {e}. 'predicted_signal' will remain NaN for these rows.")
            
    return df_with_predictions
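
The masking pattern used above (predict only on complete rows, leave the rest as NaN) can be shown with plain lists. In this sketch `toy_model` is a hypothetical stand-in for the trained classifier and the feature rows are invented.

```python
import math


def toy_model(rows):
    """Hypothetical classifier: label from the first feature (an RSI-like value)."""
    return [1 if row[0] <= 30 else 2 if row[0] >= 70 else 0 for row in rows]


def predict_valid_rows(rows, predict_fn):
    """Return one signal per row; rows containing NaN keep a NaN signal."""
    signals = [math.nan] * len(rows)
    valid_idx = [i for i, row in enumerate(rows)
                 if not any(math.isnan(value) for value in row)]
    if valid_idx:
        predictions = predict_fn([rows[i] for i in valid_idx])
        # Write predictions back at the original positions.
        for i, prediction in zip(valid_idx, predictions):
            signals[i] = prediction
    return signals


feature_rows = [[25.0, 1.0], [math.nan, 2.0], [80.0, 3.0]]
signals = predict_valid_rows(feature_rows, toy_model)
print(signals)
```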

## 1️⃣0️⃣ Backtesting Function (ML Strategy Version)

This function simulates trading based on the `predicted_signal` from the ML model and other defined entry/exit rules. It includes fees, slippage, and risk management.
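
The risk-management part comes down to a short piece of arithmetic: the position is sized so that hitting the stop loses a fixed fraction of the balance. The sketch below reproduces that calculation on its own; `size_long_position` is a hypothetical helper name, and the default values mirror the defaults of `backtest_ml_strategy`.

```python
def size_long_position(balance, close_price, slippage_rate=0.0005,
                       risk_fraction=0.02, stop_loss_pct=0.02):
    """Return (entry_price, stop_price, quantity) for a risk-based long entry."""
    entry_price = close_price * (1 + slippage_rate)  # slippage makes the fill slightly worse
    stop_price = entry_price * (1 - stop_loss_pct)   # stop sits stop_loss_pct below the fill
    risk_per_unit = entry_price - stop_price         # loss per unit if the stop is hit
    quantity = (balance * risk_fraction) / risk_per_unit
    return entry_price, stop_price, quantity


entry, stop, qty = size_long_position(balance=10_000, close_price=100.0)
notional = entry * qty
print(f"entry={entry:.2f} stop={stop:.2f} qty={qty:.4f} notional={notional:.2f}")
```

Because `quantity = balance * risk_fraction / (entry_price * stop_loss_pct)`, the notional size works out to `balance * risk_fraction / stop_loss_pct`. With the function defaults (2% risk, 2% stop) that is the entire balance, so the affordability check rejects the entry once the fee is added. The pipeline's settings (1% risk, 2% stop) commit half the balance per trade.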

In [None]:
# Using the more detailed backtest_ml_strategy function
def backtest_ml_strategy(df_with_signals, initial_balance=10000, fee_rate=0.001, slippage_rate=0.0005,
                         risk_fraction_per_trade=0.02, stop_loss_pct=0.02, take_profit_pct=0.04,
                         signal_col="predicted_signal", rsi_col="RSI", bb_lower_col="BB_lband",
                         sma_short_col="SMA_short", sma_long_col="SMA_long"):
    """
    Simulate a long-only backtest of a trading strategy based on ML model signals.
    Assumes df_with_signals has 'timestamp', 'open', 'high', 'low', 'close', and all
    specified indicator/signal columns.
    Returns (net_profit, results_summary, trades_log, balance_history).
    """
    balance = initial_balance
    position_qty = 0
    
    num_wins = 0
    num_losses = 0
    total_pnl_realized = 0 # Sum of P&L from closed trades
    
    balance_history = [initial_balance]
    trades_log = [] # To log individual trades
    active_trade = None 

    # Check for required columns
    required_cols_backtest = ['timestamp', 'open', 'high', 'low', 'close', signal_col, rsi_col, bb_lower_col, sma_short_col, sma_long_col]
    missing_backtest_cols = [col for col in required_cols_backtest if col not in df_with_signals.columns]
    if missing_backtest_cols:
        logging.error(f"DataFrame for backtest is missing required columns: {missing_backtest_cols}. Aborting backtest.")
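        # Return the same 4-tuple shape as the normal path so callers can always unpack it
        return 0, {"error": "Missing columns"}, [], [initial_balance]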

    logging.info(f"--- Starting Backtest --- Initial Balance: ${initial_balance:.2f}")
    df_backtest = df_with_signals.copy() # Work on a copy

    for idx, row in df_backtest.iterrows():
        current_dt = row["timestamp"]
        current_price_close = row["close"]
        current_price_low = row["low"]
        current_price_high = row["high"]

        # --- Manage Active Trade (Check Exits First) ---
        if active_trade:
            trade_closed_this_bar = False
            # Trailing Stop (Example: Simple percentage based trail from high since entry)
            if active_trade['type'] == 'long':
                # More robust trailing stop could be ATR based or % of current price
                # current_high_since_entry = df_backtest.loc[active_trade['entry_bar_index']:idx, 'high'].max()
                # potential_new_stop = current_high_since_entry * (1 - (stop_loss_pct * 1.5)) # Trail by 1.5x SL
                # active_trade['stop_loss_price'] = max(active_trade['stop_loss_price'], potential_new_stop)
                pass # Simplified for now, add more complex trailing if needed

            # Check Stop Loss (using current bar's low)
            if active_trade['type'] == 'long' and current_price_low <= active_trade['stop_loss_price']:
                exit_price = active_trade['stop_loss_price'] 
                pnl = (exit_price - active_trade['entry_price_eff']) * active_trade['quantity']
                pnl -= (active_trade['entry_price_eff'] * active_trade['quantity'] * fee_rate) # Entry fee
                pnl -= (exit_price * active_trade['quantity'] * fee_rate) # Exit fee
                
                balance += pnl # P&L is directly added/subtracted
                total_pnl_realized += pnl
                trades_log.append({'entry_dt': active_trade['entry_dt'], 'exit_dt': current_dt, 'type': 'long', 'entry': active_trade['entry_price_eff'], 'exit': exit_price, 'qty': active_trade['quantity'], 'pnl': pnl, 'reason': 'StopLoss'})
                logging.info(f"STOP-LOSS @ {current_dt}: Long exited at {exit_price:.2f}. Qty: {active_trade['quantity']:.4f}. P&L: ${pnl:.2f}. Balance: ${balance:.2f}")
                if pnl > 0: num_wins += 1
                else: num_losses += 1
                active_trade = None; position_qty = 0; trade_closed_this_bar = True

            # Check Take Profit (using current bar's high)
            elif active_trade and not trade_closed_this_bar and active_trade['type'] == 'long' and current_price_high >= active_trade['take_profit_price']:
                exit_price = active_trade['take_profit_price']
                pnl = (exit_price - active_trade['entry_price_eff']) * active_trade['quantity']
                pnl -= (active_trade['entry_price_eff'] * active_trade['quantity'] * fee_rate) 
                pnl -= (exit_price * active_trade['quantity'] * fee_rate) 

                balance += pnl
                total_pnl_realized += pnl
                trades_log.append({'entry_dt': active_trade['entry_dt'], 'exit_dt': current_dt, 'type': 'long', 'entry': active_trade['entry_price_eff'], 'exit': exit_price, 'qty': active_trade['quantity'], 'pnl': pnl, 'reason': 'TakeProfit'})
                logging.info(f"TAKE-PROFIT @ {current_dt}: Long exited at {exit_price:.2f}. Qty: {active_trade['quantity']:.4f}. P&L: ${pnl:.2f}. Balance: ${balance:.2f}")
                if pnl > 0: num_wins += 1
                else: num_losses += 1 # Should be positive for TP
                active_trade = None; position_qty = 0; trade_closed_this_bar = True
        
        # --- Check Entry Conditions (if no active trade) ---
        # Ensure current bar's data isn't NaN for indicators
        if not active_trade and not pd.isna(row[signal_col]) and \
           not pd.isna(row[rsi_col]) and not pd.isna(row[bb_lower_col]) and \
           not pd.isna(row[sma_short_col]) and not pd.isna(row[sma_long_col]):

            if (row[signal_col] == 1 and # ML Buy Signal (Assuming 1 = Buy)
                row[rsi_col] < 35 and    # Additional filter: RSI (configurable)
                row['close'] < row[bb_lower_col] and # Additional filter: Close below BB_lower
                row[sma_short_col] > row[sma_long_col]):   # Additional filter: Trend filter

                entry_price_nominal = current_price_close # Typically use next bar's open, or current close for simplicity
                entry_price_slippage = entry_price_nominal * (1 + slippage_rate)
                
                stop_loss_price_val = entry_price_slippage * (1 - stop_loss_pct)
                take_profit_price_val = entry_price_slippage * (1 + take_profit_pct)

                dollar_risk_per_unit = entry_price_slippage - stop_loss_price_val
                if dollar_risk_per_unit <= 1e-6: # Avoid division by zero/tiny risk
                    logging.warning(f"Calculated dollar_risk_per_unit ({dollar_risk_per_unit:.4f}) is too small or zero. Skipping trade entry @ {current_dt}")
                    continue
                
                capital_to_risk = balance * risk_fraction_per_trade
                quantity = capital_to_risk / dollar_risk_per_unit
                
                cost_of_trade = entry_price_slippage * quantity
                entry_fee = cost_of_trade * fee_rate

                if balance < cost_of_trade + entry_fee : # Check affordability
                    logging.info(f"Insufficient balance for trade @ {current_dt}. Need ${cost_of_trade + entry_fee:.2f}, Have ${balance:.2f}")
                    continue 

                # Execute Buy
                # Balance is NOT reduced by cost_of_trade here; the capital is simply tied up in the asset.
                # The entry fee is not deducted here either: every exit path subtracts both the entry
                # and exit fees from P&L, so deducting it now would charge it twice.
                position_qty = quantity
                active_trade = {
                    'type': 'long', 'entry_dt': current_dt,
                    'entry_price_nominal': entry_price_nominal,
                    'entry_price_eff': entry_price_slippage, # Effective entry price after slippage (fees are charged at exit)
                    'quantity': quantity,
                    'stop_loss_price': stop_loss_price_val,
                    'take_profit_price': take_profit_price_val,
                    'entry_bar_index': idx # Store index for potential later reference (e.g. calculating high since entry)
                }
                logging.info(f"ENTRY @ {current_dt}: Long at {entry_price_slippage:.2f}. Qty: {quantity:.4f}. SL: {stop_loss_price_val:.2f}, TP: {take_profit_price_val:.2f}. Balance: ${balance:.2f}")
        
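        # Update balance history (mark-to-market equity).
        # Balance never had the position cost removed, so add only the unrealized P&L,
        # computed like the exit paths: price move minus entry and exit fees.
        if active_trade:
            unrealized_pnl = (current_price_close - active_trade['entry_price_eff']) * position_qty
            unrealized_pnl -= active_trade['entry_price_eff'] * position_qty * fee_rate
            unrealized_pnl -= current_price_close * position_qty * fee_rate
            current_equity = balance + unrealized_pnl
        else:
            current_equity = balance
        balance_history.append(current_equity)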

    # --- End of Loop: Liquidate any open positions ---
    if active_trade:
        final_close_price = df_backtest.iloc[-1]["close"]
        logging.info(f"End of backtest. Liquidating open {active_trade['type']} position of {active_trade['quantity']:.4f} at market close {final_close_price:.2f}")
        
        exit_price = final_close_price * (1 - slippage_rate) # Assume slippage on market close
        pnl = (exit_price - active_trade['entry_price_eff']) * active_trade['quantity']
        pnl -= (active_trade['entry_price_eff'] * active_trade['quantity'] * fee_rate) 
        pnl -= (exit_price * active_trade['quantity'] * fee_rate)

        balance += pnl
        total_pnl_realized += pnl
        trades_log.append({'entry_dt': active_trade['entry_dt'], 'exit_dt': df_backtest.iloc[-1]["timestamp"], 'type': 'long', 'entry': active_trade['entry_price_eff'], 'exit': exit_price, 'qty': active_trade['quantity'], 'pnl': pnl, 'reason': 'EndOfBacktest'})
        if pnl > 0: num_wins += 1
        else: num_losses += 1
        logging.info(f"   Final P&L on liquidation: ${pnl:.2f}. Final Balance: ${balance:.2f}")

    # --- Calculate Performance Metrics ---
    net_profit = balance - initial_balance
    total_trades_executed = num_wins + num_losses
    win_rate_val = num_wins / total_trades_executed if total_trades_executed > 0 else 0
    
    # Sum of P&Ls for profit factor
    sum_positive_pnl = sum(t['pnl'] for t in trades_log if t['pnl'] > 0)
    sum_negative_pnl = abs(sum(t['pnl'] for t in trades_log if t['pnl'] < 0))
    if sum_negative_pnl > 0:
        profit_factor_val = sum_positive_pnl / sum_negative_pnl
    else:
        # No losing trades: infinite if there were any wins, otherwise 0
        profit_factor_val = float('inf') if sum_positive_pnl > 0 else 0

    # Max Drawdown Calculation from equity curve
    equity_curve = pd.Series(balance_history)
    peak_equity = equity_curve.cummax()
    drawdown = (equity_curve - peak_equity) / peak_equity
    max_dd_val = abs(drawdown.min()) if not drawdown.empty else 0
            
    logging.info(f"\n--- Backtest Results ---")
    logging.info(f"Period: {df_backtest['timestamp'].iloc[0]} to {df_backtest['timestamp'].iloc[-1]}")
    logging.info(f"Initial Balance: ${initial_balance:.2f}")
    logging.info(f"Final Balance: ${balance:.2f}")
    logging.info(f"Net Profit: ${net_profit:.2f} ({(net_profit/initial_balance)*100:.2f}%)")
    logging.info(f"Total Realized P&L: ${total_pnl_realized:.2f}")
    logging.info(f"Total Trades: {total_trades_executed}")
    logging.info(f"Wins: {num_wins}, Losses: {num_losses}")
    logging.info(f"Win Rate: {win_rate_val:.2%}")
    logging.info(f"Profit Factor: {profit_factor_val:.2f}")
    logging.info(f"Max Drawdown: {max_dd_val:.2%}")
    
    results_summary = {
        "initial_balance": initial_balance, "final_balance": balance, "net_profit_dollars": net_profit,
        "net_profit_pct": (net_profit/initial_balance)*100 if initial_balance > 0 else 0,
        "total_realized_pnl": total_pnl_realized,
        "total_trades": total_trades_executed, "wins": num_wins, "losses": num_losses,
        "win_rate": win_rate_val, "profit_factor": profit_factor_val, "max_drawdown_pct": max_dd_val,
    }
    return net_profit, results_summary, trades_log, balance_history
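
The two summary metrics that are easiest to misread, profit factor and max drawdown, can be checked by hand on a tiny example. The following is a minimal sketch using the same formulas as the function above; `max_drawdown`, `profit_factor`, and the demo numbers are illustrative and are not output from a real backtest.

```python
import pandas as pd


def max_drawdown(equity_values):
    """Largest peak-to-trough decline, as a positive fraction of the running peak."""
    equity = pd.Series(equity_values, dtype=float)
    running_peak = equity.cummax()
    drawdown = (equity - running_peak) / running_peak
    return abs(drawdown.min())


def profit_factor(pnls):
    """Gross profit divided by gross loss; inf when there are wins but no losses."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = abs(sum(p for p in pnls if p < 0))
    if gross_loss > 0:
        return gross_profit / gross_loss
    return float("inf") if gross_profit > 0 else 0.0


equity_demo = [10_000, 10_500, 9_450, 9_800, 11_000]  # made-up equity curve
pnl_demo = [500, -1_050, 350, 1_200]                  # made-up per-trade P&L
print(f"max drawdown: {max_drawdown(equity_demo):.2%}")
print(f"profit factor: {profit_factor(pnl_demo):.2f}")
```

In the demo curve the peak is 10,500 and the following trough is 9,450, a 10% drawdown, even though the curve later reaches a new high. The profit factor divides 2,050 of gross profit by 1,050 of gross loss.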

## 1️⃣1️⃣ Plotting Function

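Visualizes price action with Bollinger Bands and SMAs, volume, and RSI with the model's buy/sell signals. If a trades DataFrame from the backtest is supplied, entries and exits are overlaid on the price panel.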

In [None]:
def plot_trading_chart_with_signals(df_to_plot, signal_col_name="predicted_signal",
                                    rsi_thresholds_info=None, trades_df=None):
    """
    Plots candlestick chart, BBands, SMAs, RSI with signals, and optionally trade entry/exit markers.
    - rsi_thresholds_info: dict {'buy_threshold': val, 'sell_threshold': val} for plotting RSI lines.
    - trades_df: DataFrame of trades from backtest log for plotting entry/exits.
    """
    logging.info(f"Plotting chart. Data shape: {df_to_plot.shape}. Signal column: {signal_col_name}")
    
    fig = make_subplots(
        rows=3, cols=1, shared_xaxes=True, vertical_spacing=0.05,
        row_heights=[0.6, 0.2, 0.2], # Adjust row heights
        subplot_titles=("Price Action with Bollinger Bands & Trades", "Volume", "RSI & Signals")
    )
    
    # --- Row 1: Candlestick, Bollinger Bands, Trade Markers ---
    fig.add_trace(go.Candlestick(
        x=df_to_plot["timestamp"], open=df_to_plot["open"], high=df_to_plot["high"],
        low=df_to_plot["low"], close=df_to_plot["close"], name="Price"
    ), row=1, col=1)
    
    fig.add_trace(go.Scatter(x=df_to_plot["timestamp"], y=df_to_plot["BB_hband"], mode="lines", name="BB Upper", line=dict(color="red", dash="dot", width=1)), row=1, col=1)
    fig.add_trace(go.Scatter(x=df_to_plot["timestamp"], y=df_to_plot["BB_mavg"], mode="lines", name="BB Mid", line=dict(color="blue", dash="dash", width=1)), row=1, col=1)
    fig.add_trace(go.Scatter(x=df_to_plot["timestamp"], y=df_to_plot["BB_lband"], mode="lines", name="BB Lower", line=dict(color="green", dash="dot", width=1)), row=1, col=1)

    # Plot SMA lines
    if "SMA_short" in df_to_plot.columns:
        fig.add_trace(go.Scatter(x=df_to_plot["timestamp"], y=df_to_plot["SMA_short"], mode="lines", name="SMA Short", line=dict(color="orange", width=1)), row=1, col=1)
    if "SMA_long" in df_to_plot.columns:
        fig.add_trace(go.Scatter(x=df_to_plot["timestamp"], y=df_to_plot["SMA_long"], mode="lines", name="SMA Long", line=dict(color="purple", width=1)), row=1, col=1)

    # Plot Trade Entry/Exit Markers if trades_df is provided
    if trades_df is not None and not trades_df.empty:
        # Ensure datetime conversion if necessary
        trades_df_copy = trades_df.copy()
        if not pd.api.types.is_datetime64_any_dtype(trades_df_copy['entry_dt']):
            trades_df_copy['entry_dt'] = pd.to_datetime(trades_df_copy['entry_dt'])
        if not pd.api.types.is_datetime64_any_dtype(trades_df_copy['exit_dt']):
            trades_df_copy['exit_dt'] = pd.to_datetime(trades_df_copy['exit_dt'])

        winning_trades = trades_df_copy[trades_df_copy['pnl'] > 0]
        losing_trades = trades_df_copy[trades_df_copy['pnl'] <= 0]

        fig.add_trace(go.Scatter(
            x=winning_trades['entry_dt'], y=winning_trades['entry'], mode="markers", name="Win Entry",
            marker=dict(color="lime", symbol="triangle-up", size=10, line=dict(color='black', width=1))), row=1, col=1)
        fig.add_trace(go.Scatter(
            x=winning_trades['exit_dt'], y=winning_trades['exit'], mode="markers", name="Win Exit",
            marker=dict(color="lime", symbol="circle", size=8, line=dict(color='black', width=1))), row=1, col=1)
        
        fig.add_trace(go.Scatter(
            x=losing_trades['entry_dt'], y=losing_trades['entry'], mode="markers", name="Loss Entry",
            marker=dict(color="magenta", symbol="triangle-down", size=10, line=dict(color='black', width=1))), row=1, col=1)
        fig.add_trace(go.Scatter(
            x=losing_trades['exit_dt'], y=losing_trades['exit'], mode="markers", name="Loss Exit",
            marker=dict(color="magenta", symbol="circle", size=8, line=dict(color='black', width=1))), row=1, col=1)
        
        # Draw lines connecting entry and exit for each trade
        for _, trade in trades_df_copy.iterrows():
            fig.add_trace(go.Scatter(
                x=[trade['entry_dt'], trade['exit_dt']],
                y=[trade['entry'], trade['exit']],
                mode="lines", showlegend=False,
                line=dict(color="lime" if trade['pnl'] > 0 else "magenta", width=1, dash="dash")
            ), row=1, col=1)


    # --- Row 2: Volume ---
    fig.add_trace(go.Bar(x=df_to_plot["timestamp"], y=df_to_plot["volume"], name="Volume", marker_color='grey'), row=2, col=1)


    # --- Row 3: RSI & Signals ---
    fig.add_trace(go.Scatter(x=df_to_plot["timestamp"], y=df_to_plot["RSI"], mode="lines", name="RSI", line=dict(color="cyan", width=2)), row=3, col=1)
    
    # Plot Buy Signals (e.g., signal_col_name == 1)
    buy_signals = df_to_plot[df_to_plot[signal_col_name] == 1]
    fig.add_trace(go.Scatter(
        x=buy_signals["timestamp"], y=buy_signals["RSI"], mode="markers", name="Buy Signal (Model)",
        marker=dict(color="lime", symbol="triangle-up", size=9, line=dict(color='black',width=1))
    ), row=3, col=1)
    
    # Plot Sell Signals (e.g., signal_col_name == 2)
    sell_signals = df_to_plot[df_to_plot[signal_col_name] == 2]
    fig.add_trace(go.Scatter(
        x=sell_signals["timestamp"], y=sell_signals["RSI"], mode="markers", name="Sell Signal (Model)",
        marker=dict(color="red", symbol="triangle-down", size=9, line=dict(color='black',width=1))
    ), row=3, col=1)
    
    # Plot RSI Threshold Lines (from training or fixed)
    if rsi_thresholds_info:
        fig.add_hline(y=rsi_thresholds_info['sell_threshold'], line=dict(color='red', dash='dash', width=1), row=3, col=1,
                      annotation_text=f"Train Sell Thr: {rsi_thresholds_info['sell_threshold']:.2f}", annotation_position="top right")
        fig.add_hline(y=rsi_thresholds_info['buy_threshold'], line=dict(color='lime', dash='dash', width=1), row=3, col=1,
                      annotation_text=f"Train Buy Thr: {rsi_thresholds_info['buy_threshold']:.2f}", annotation_position="bottom right")
    else: # Fallback to fixed lines if no dynamic thresholds provided
        fig.add_hline(y=65, line=dict(color='red', dash='dash', width=1), row=3, col=1, annotation_text="RSI Overbought (65)", annotation_position="top right")
        fig.add_hline(y=35, line=dict(color='lime', dash='dash', width=1), row=3, col=1, annotation_text="RSI Oversold (35)", annotation_position="bottom right")
        
    # --- Layout & Styling ---
    fig.update_layout(
        title_text=f"Trading Strategy Analysis ({df_to_plot['timestamp'].iloc[0].date()} to {df_to_plot['timestamp'].iloc[-1].date()})",
        height=1200, template="plotly_dark",
        xaxis_rangeslider_visible=False,
        legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
    )
    fig.update_yaxes(title_text="Price (USDT)", row=1, col=1)
    fig.update_yaxes(title_text="Volume", row=2, col=1)
    fig.update_yaxes(title_text="RSI", row=3, col=1)
    
    fig.show()

## 1️⃣2️⃣ Main Execution Orchestrator

The `main_trading_pipeline()` function ties together all the steps: data fetching, feature engineering, ML data preparation, model training, evaluation, signal generation, backtesting, and plotting.

In [None]:
def main_trading_pipeline():
    logging.info("--- Starting Main Trading Pipeline ---")

    # --- Configuration ---
    SYMBOL = "BTCUSDT"
    INTERVAL = "4h" # Shorter interval for more data points for ML
    DATA_LIMIT = 1000 # Number of candles to fetch for the entire process
    # For fetching by date range:
    # START_DATE_STR = "1 Jan, 2022"
    # END_DATE_STR = "1 Jan, 2023"
    
    # Features to be used by the ML model
    MODEL_FEATURES = ["close", "RSI", "BB_hband", "BB_lband", "BB_mavg", "SMA_short", "SMA_long", "volume_pct_change"]
    # Ensure these are calculated in calculate_technical_indicators and present after NaN drop
    
    RSI_LOWER_QUANTILE = 0.15 # For defining "buy" based on RSI (more signals)
    RSI_UPPER_QUANTILE = 0.85 # For defining "sell" based on RSI

    TEST_SET_SIZE_ML = 0.20 # 20% for ML model testing
    RANDOM_ITER_XGB = 20    # For RandomizedSearchCV (keep low for speed)
    
    APPLY_SMOTE = True         # Whether to apply SMOTE for class imbalance
    USE_SAMPLE_WEIGHTS_XGB = True # Whether to use sample_weights in XGBoost (can be used with or without SMOTE)

    # Backtest parameters
    INITIAL_BALANCE_BT = 10000
    FEE_RATE_BT = 0.00075 # 0.075% per trade side (Binance spot fee with the BNB discount; the standard rate is 0.1%)
    SLIPPAGE_RATE_BT = 0.0005 # 0.05% slippage
    RISK_FRACTION_BT = 0.01 # Risk 1% of equity per trade
    STOP_LOSS_PCT_BT = 0.02 # 2% stop loss
    TAKE_PROFIT_PCT_BT = 0.05 # 5% take profit (2.5:1 R:R with the 2% stop loss)


    # --- 1. Fetch Data ---
    # df_raw = fetch_crypto_data_client(binance_client, symbol=SYMBOL, interval=INTERVAL, start_str=START_DATE_STR, end_str=END_DATE_STR)
    df_raw = fetch_crypto_data_client(binance_client, symbol=SYMBOL, interval=INTERVAL, limit=DATA_LIMIT)
    if df_raw is None or df_raw.empty:
        logging.error("Failed to fetch initial data. Exiting pipeline.")
        return
    logging.info(f"Raw data fetched: {df_raw.shape[0]} rows from {df_raw['timestamp'].min()} to {df_raw['timestamp'].max()}")

    # --- 2. Calculate Technical Indicators (Features) ---
    df_featurized = calculate_technical_indicators(df_raw.copy()) # Pass a copy
    if df_featurized.empty:
        logging.error("DataFrame empty after feature calculation. Exiting.")
        return

    # --- 3. Prepare Data for ML (Split, Correct Labeling) ---
    X_train, X_test, y_train, y_test, train_rsi_thresholds = prepare_ml_data(
        df_featurized,
        model_input_features=MODEL_FEATURES,
        target_rsi_series_name="RSI", # Assuming target is based on RSI
        test_set_size=TEST_SET_SIZE_ML,
        lower_quantile=RSI_LOWER_QUANTILE, # Passed to get_rsi_quantile_thresholds via prepare_ml_data
        upper_quantile=RSI_UPPER_QUANTILE  # Passed to get_rsi_quantile_thresholds via prepare_ml_data
    )
    if X_train is None or y_train is None:
        logging.error("ML data preparation failed (X_train or y_train is None). Exiting.")
        return
    logging.info(f"ML Data Prepared: X_train shape {X_train.shape}, y_train_dist: {dict(pd.Series(y_train).value_counts())}")


    # --- 4. (Optional) Apply SMOTE to Training Data ---
    X_train_final = X_train.copy()
    y_train_final = y_train.copy()
    if APPLY_SMOTE:
        X_train_final, y_train_final = apply_smote_to_train_data(X_train, y_train)
    

    # --- 5. Train XGBoost Model ---
    ml_model = None
    if len(np.unique(y_train_final)) < 2:
        logging.warning("Not enough unique classes in final training data to train model. Skipping training.")
    else:
        ml_model = train_xgboost_model(X_train_final, y_train_final,
                                       n_random_iter=RANDOM_ITER_XGB,
                                       use_sample_weights=USE_SAMPLE_WEIGHTS_XGB)
    if ml_model is None:
        logging.error("ML model training failed. Exiting pipeline.")
        return
    

    # --- 6. Evaluate ML Model on Test Set ---
    logging.info("\n--- Evaluating ML Model on Unseen Test Set ---")
    evaluate_ml_model(ml_model, X_test, y_test, dataset_name="ML Hold-Out Test")
    advanced_ml_evaluation(ml_model, X_train_final, y_train_final, cv_folds=3) # CV on the data used for training (SMOTE'd or not)
    

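    # --- 7. Generate ML Predictions for Backtesting ---
    # prepare_ml_data does not return the cleaned DataFrame, so reconstruct it here.
    # Note: this frame also contains the rows the model was trained on, so the backtest is largely in-sample.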
    df_cleaned_for_prediction = df_featurized.dropna(subset=MODEL_FEATURES + ["RSI"]).copy() # Align with what was split
    if df_cleaned_for_prediction.empty:
        logging.error("Cleaned DataFrame for prediction signals is empty. Cannot backtest.")
        return

    logging.info("\n--- Generating ML Predictions for Backtesting on Cleaned Featurized Data ---")
    df_for_backtesting_input = apply_trained_model_to_df(df_cleaned_for_prediction, ml_model, MODEL_FEATURES)
    
    if "predicted_signal" not in df_for_backtesting_input.columns or df_for_backtesting_input["predicted_signal"].isnull().all():
        logging.error("Failed to generate 'predicted_signal' for backtesting. Exiting.")
        return
    logging.info(f"'predicted_signal' distribution for backtest: {dict(df_for_backtesting_input['predicted_signal'].value_counts(dropna=False))}")

    # --- 8. Run Backtesting ---
    logging.info("\n--- Running Backtest with ML Model Signals ---")
    # Pass all necessary column names if they differ from defaults in backtest_ml_strategy
    net_profit_bt, backtest_summary, trades_history_bt, balance_curve_bt = backtest_ml_strategy(
        df_for_backtesting_input, # This df has features and 'predicted_signal'
        initial_balance=INITIAL_BALANCE_BT,
        fee_rate=FEE_RATE_BT,
        slippage_rate=SLIPPAGE_RATE_BT,
        risk_fraction_per_trade=RISK_FRACTION_BT,
        stop_loss_pct=STOP_LOSS_PCT_BT,
        take_profit_pct=TAKE_PROFIT_PCT_BT,
        signal_col="predicted_signal", # Explicitly state the signal column
        rsi_col="RSI", bb_lower_col="BB_lband", # Match column names from `ta` library
        sma_short_col="SMA_short", sma_long_col="SMA_long"
    )
    logging.info(f"Backtesting complete. Net Profit: ${net_profit_bt:.2f}")
    logging.info(f"Backtest Summary: {backtest_summary}")

    # Convert trades log to DataFrame for easier plotting/analysis
    trades_df_bt = pd.DataFrame(trades_history_bt)

    # --- 9. Plotting ---
    logging.info("\n--- Plotting Results ---")
    # Plot on the same data used for backtesting input for consistency
    plot_trading_chart_with_signals(
        df_for_backtesting_input,
        signal_col_name="predicted_signal",
        rsi_thresholds_info=train_rsi_thresholds, # Show thresholds learned from training
        trades_df=trades_df_bt # Pass the trades log to plot markers
    )
    
    # Plot equity curve
    if balance_curve_bt:
        fig_equity = go.Figure()
        fig_equity.add_trace(go.Scatter(x=list(range(len(balance_curve_bt))), y=balance_curve_bt, mode='lines', name='Equity Curve'))
        fig_equity.update_layout(title='Backtest Equity Curve', xaxis_title='Time Step (bar)', yaxis_title='Equity (USDT)', template='plotly_dark')
        fig_equity.show()

    logging.info("--- Main Trading Pipeline Finished ---")

# --- Run the Pipeline ---
if __name__ == "__main__":
    main_trading_pipeline()