# Crypto AI Backtest (Multi-Coin)

This notebook trains an AI model on BTC then applies it to BTC, GALA and XRP. It backtests a Bollinger+AI-filtered strategy on each coin and shows performance metrics and equity curves.

**Run in Google Colab or Jupyter.** Uncomment the pip installs if needed.

In [72]:
# Install packages if running in a fresh environment
!pip install yfinance pandas numpy ta scikit-learn joblib vectorbt matplotlib




In [73]:
import yfinance as yf
import pandas as pd
import numpy as np
import ta
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import joblib
import vectorbt as vbt
import matplotlib.pyplot as plt

pd.options.display.max_columns = 100


## Step 1 — Download 30m OHLCV data for BTC, GALA, XRP (60 days)

In [74]:
symbols = ['BTC-USD', 'GALA-USD', 'XRP-USD']
interval = '30m'
period = '60d'

price_data = {}
for sym in symbols:
    print(f'Downloading {sym}...')
    df = yf.download(sym, period=period, interval=interval, progress=False)
    if df.empty:
        print(f'Warning: no data for {sym}')
    price_data[sym] = df.dropna()

# show samples
for s, df in price_data.items():
    print(s, 'rows:', len(df))
    display(df.head())


Downloading BTC-USD...



YF.download() has changed argument auto_adjust default to True



Downloading GALA-USD...



YF.download() has changed argument auto_adjust default to True



Downloading XRP-USD...



YF.download() has changed argument auto_adjust default to True



BTC-USD rows: 2841


Price,Close,High,Low,Open,Volume
Ticker,BTC-USD,BTC-USD,BTC-USD,BTC-USD,BTC-USD
Datetime,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2025-07-28 00:00:00+00:00,119666.90625,119684.445312,119391.039062,119443.953125,1335595008
2025-07-28 00:30:00+00:00,119329.554688,119672.867188,119202.328125,119672.867188,1386631168
2025-07-28 01:00:00+00:00,119366.0625,119515.492188,119129.507812,119252.304688,1314926592
2025-07-28 01:30:00+00:00,119043.179688,119425.992188,119019.257812,119425.992188,2064678912
2025-07-28 02:00:00+00:00,119514.328125,119514.328125,119066.179688,119066.179688,1844445184


GALA-USD rows: 2837


Price,Close,High,Low,Open,Volume
Ticker,GALA-USD,GALA-USD,GALA-USD,GALA-USD,GALA-USD
Datetime,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2025-07-28 00:00:00+00:00,0.018466,0.018499,0.018343,0.018348,2641976
2025-07-28 00:30:00+00:00,0.018372,0.018487,0.01833,0.018457,4361656
2025-07-28 01:00:00+00:00,0.018145,0.018304,0.018145,0.018304,2467960
2025-07-28 01:30:00+00:00,0.018075,0.018148,0.018061,0.018148,255720
2025-07-28 02:00:00+00:00,0.01825,0.01825,0.018078,0.018093,2562280


XRP-USD rows: 2841


Price,Close,High,Low,Open,Volume
Ticker,XRP-USD,XRP-USD,XRP-USD,XRP-USD,XRP-USD
Datetime,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2025-07-28 00:00:00+00:00,3.273717,3.273717,3.229481,3.24068,145593344
2025-07-28 00:30:00+00:00,3.270307,3.280304,3.260545,3.273969,260034048
2025-07-28 01:00:00+00:00,3.243617,3.270296,3.243205,3.270296,127732224
2025-07-28 01:30:00+00:00,3.237179,3.243737,3.227086,3.243737,149325824
2025-07-28 02:00:00+00:00,3.258971,3.258971,3.237911,3.237911,122406400


Step-2




In [75]:
def build_features(df):
    df = df.copy()
    close = df['Close'].squeeze()  # Convert to Series if it's a DataFrame
    volume = df['Volume'].squeeze()
    open_ = df['Open'].squeeze()
    high = df['High'].squeeze()
    low = df['Low'].squeeze()

    df['rsi'] = ta.momentum.RSIIndicator(close).rsi()
    macd = ta.trend.MACD(close)
    df['macd'] = macd.macd()
    df['macd_signal'] = macd.macd_signal()
    bb = ta.volatility.BollingerBands(close)
    df['bb_mid'] = bb.bollinger_mavg()
    df['bb_high'] = bb.bollinger_hband()
    df['bb_low'] = bb.bollinger_lband()
    df['bb_width'] = (df['bb_high'] - df['bb_low']) / df['bb_mid']
    df['percent_b'] = (close - df['bb_low']) / (df['bb_high'] - df['bb_low'])
    df['volume_change'] = volume.pct_change()

    body = abs(close - open_)
    candle_range = high - low
    upper_shadow = high - np.maximum(open_, close)
    lower_shadow = np.minimum(open_, close) - low

    df['shooting_star'] = ((body <= 0.3 * candle_range) & (upper_shadow >= 2 * body) & (lower_shadow <= 0.2 * body)).astype(int)
    df['hammer'] = ((body <= 0.3 * candle_range) & (lower_shadow >= 2 * body) & (upper_shadow <= 0.2 * body)).astype(int)

    return df.dropna()

## Step 3 — Label data for training

Label definition: a future horizon of 3 bars (~90 minutes). Label = 1 if future return > 0.2% (0.002), else 0. You can tune horizon & threshold.

In [89]:
horizon = 3
label_threshold = 0.002

labeled = {}

for sym, df in price_data.items():
    print(f'Preparing data for {sym}...')

    df_feat = build_features(df)
    df_feat['future_return'] = df_feat['Close'].shift(-horizon) / df_feat['Close'] - 1
    df_feat = df_feat.dropna()
    df_feat['label'] = (df_feat['future_return'] > label_threshold).astype(int)

    labeled[sym] = df_feat

    print(sym, 'label distribution:')
    print(df_feat['label'].value_counts(normalize=True).to_string())

Preparing data for BTC-USD...
BTC-USD label distribution:
label
0    0.763337
1    0.236663
Preparing data for GALA-USD...
GALA-USD label distribution:
label
0    0.596943
1    0.403057
Preparing data for XRP-USD...
XRP-USD label distribution:
label
0    0.63697
1    0.36303


## Step 4 — Prepare features and train RandomForest

We use a modest feature set. Train on BTC and save model.

In [95]:
# Step 4: Train model on all symbols

# 🔁 Combine labeled data from all symbols
train_df = pd.concat([labeled[sym] for sym in ['BTC-USD', 'GALA-USD', 'XRP-USD']], axis=0)

# ✅ Define features and labels
X = train_df[feature_cols].replace([np.inf, -np.inf], np.nan).fillna(0)
X = np.clip(X, -100, 100)
y = train_df['label']

# ✅ Train model
model = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)
model.fit(X, y)
print("✅ Model trained on multiple symbols")


✅ Model trained on multiple symbols


In [111]:
results = {}

# 🔁 Train on multiple symbols (optional)
# train_df = pd.concat([labeled[sym] for sym in ['BTC-USD', 'GALA-USD', 'XRP-USD']], axis=0)
# X = train_df[feature_cols].replace([np.inf, -np.inf], np.nan).fillna(0)
# X = np.clip(X, -100, 100)
# y = train_df['label']
# model.fit(X, y)  # retrain model with broader data

for sym in labeled:
    print(f'\n📊 Processing {sym}...')
    df = labeled[sym].copy()

    # ✅ Flatten multi-index columns
    df.columns = ['_'.join(filter(None, col)).strip() if isinstance(col, tuple) else col for col in df.columns]

    # ✅ Prepare features
    X_sym = df[feature_cols].copy()
    X_sym = X_sym.replace([np.inf, -np.inf], np.nan).fillna(0)
    X_sym = np.clip(X_sym, -100, 100)  # 🧠 Tighter clipping

    # ✅ Align lengths
    if len(X_sym) != len(df):
        df = df.iloc[-len(X_sym):].copy()

    # ✅ Predict signals
    try:
        df['ai_signal'] = model.predict(X_sym)
    except Exception as e:
        print(f"❌ Prediction failed for {sym}: {e}")
        continue

    # ✅ Define columns for backtest
    close_col = f'Close_{sym}'
    required_cols = [close_col, 'bb_low', 'bb_high', 'ai_signal']

    print(f"{sym} columns before dropna:", df.columns.tolist())
    missing_cols = [col for col in required_cols if col not in df.columns]
    if missing_cols:
        print(f"❌ Skipping {sym}: missing columns {missing_cols}")
        continue

    # ✅ Clean inf and NaN
    for col in required_cols:
        df[col] = df[col].replace([np.inf, -np.inf], np.nan)
    try:
        df = df.dropna(subset=required_cols)
    except KeyError as e:
        print(f"❌ Skipping {sym}: dropna failed - {e}")
        continue

    # ✅ Volatility filter
    bb_width_ma = df['bb_width'].rolling(50).mean()
    vol_filter = df['bb_width'] < bb_width_ma

    # ✅ Entry & exit signals
    entries = (df[close_col] <= df['bb_low']) & (df['ai_signal'] == 1) & vol_filter
    exits_signal = df[close_col] >= df['bb_high']

    entries_exec = entries.shift(1).fillna(False)

    # ✅ Max holding time as number of bars (3 days in 30min bars)
    max_holding_bars = int(pd.Timedelta('3d') / pd.Timedelta('30min'))

    # ✅ Forced exits after max holding bars
    forced_exits = entries_exec.shift(max_holding_bars).fillna(False)

    # ✅ Combine regular exits with forced exits
    exits_exec = exits_signal | forced_exits
    exits_exec = exits_exec.fillna(False)

    # ✅ Backtest portfolio
    pf = vbt.Portfolio.from_signals(
        close=df[close_col],
        entries=entries_exec,
        exits=exits_exec,
        init_cash=10000,
        fees=0.001,
        slippage=0.0005,
        freq='30min'
    )

    stats = pf.stats()
    print(f'✅ Stats for {sym}')
    display(stats)
    results[sym] = pf


📊 Processing BTC-USD...
BTC-USD columns before dropna: ['Close_BTC-USD', 'High_BTC-USD', 'Low_BTC-USD', 'Open_BTC-USD', 'Volume_BTC-USD', 'rsi', 'macd', 'macd_signal', 'bb_mid', 'bb_high', 'bb_low', 'bb_width', 'percent_b', 'volume_change', 'shooting_star', 'hammer', 'future_return', 'label', 'ai_signal']



X has feature names, but RandomForestClassifier was fitted without feature names


Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`


Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`



✅ Stats for BTC-USD


Unnamed: 0,0
Start,2025-07-28 16:30:00+00:00
End,2025-09-25 03:30:00+00:00
Period,58 days 04:30:00
Start Value,10000.0
End Value,10129.912279
Total Return [%],1.299123
Benchmark Return [%],-4.629261
Max Gross Exposure [%],100.0
Total Fees Paid,20.130062
Max Drawdown [%],0.149825



📊 Processing GALA-USD...
GALA-USD columns before dropna: ['Close_GALA-USD', 'High_GALA-USD', 'Low_GALA-USD', 'Open_GALA-USD', 'Volume_GALA-USD', 'rsi', 'macd', 'macd_signal', 'bb_mid', 'bb_high', 'bb_low', 'bb_width', 'percent_b', 'volume_change', 'shooting_star', 'hammer', 'future_return', 'label', 'ai_signal']
✅ Stats for GALA-USD



X has feature names, but RandomForestClassifier was fitted without feature names


Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`


Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`



Unnamed: 0,0
Start,2025-07-28 16:30:00+00:00
End,2025-09-25 03:30:00+00:00
Period,55 days 21:00:00
Start Value,10000.0
End Value,11095.098412
Total Return [%],10.950984
Benchmark Return [%],-14.734415
Max Gross Exposure [%],100.0
Total Fees Paid,182.911752
Max Drawdown [%],7.298845



📊 Processing XRP-USD...
XRP-USD columns before dropna: ['Close_XRP-USD', 'High_XRP-USD', 'Low_XRP-USD', 'Open_XRP-USD', 'Volume_XRP-USD', 'rsi', 'macd', 'macd_signal', 'bb_mid', 'bb_high', 'bb_low', 'bb_width', 'percent_b', 'volume_change', 'shooting_star', 'hammer', 'future_return', 'label', 'ai_signal']



X has feature names, but RandomForestClassifier was fitted without feature names


Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`


Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`



✅ Stats for XRP-USD


Unnamed: 0,0
Start,2025-07-28 16:30:00+00:00
End,2025-09-25 03:30:00+00:00
Period,53 days 02:00:00
Start Value,10000.0
End Value,10427.882858
Total Return [%],4.278829
Benchmark Return [%],-9.452943
Max Gross Exposure [%],100.0
Total Fees Paid,160.520217
Max Drawdown [%],6.637194


## Step 6 — Plot equity curves for all symbols

In [125]:
for sym, raw_df in labeled.items():
    print(f"\n🔍 Processing {sym}...")

    # Flatten columns if multi-index
    raw_df.columns = ['_'.join(filter(None, col)).strip() if isinstance(col, tuple) else col for col in raw_df.columns]

    df = raw_df.copy()
    close_col = f'Close_{sym}'
    if close_col not in df.columns:
        print(f"❌ Missing {close_col} in {sym}, skipping.")
        continue

    # Dynamically find feature columns by symbol suffix
    try:
        rsi_col = [col for col in df.columns if col.startswith('rsi')][0]
        macd_col = [col for col in df.columns if col.startswith('macd') and 'signal' not in col][0]
        macd_signal_col = [col for col in df.columns if col.endswith('signal') and 'macd' in col][0]
        ai_signal_col = [col for col in df.columns if col.startswith('ai_signal')][0]
    except IndexError:
        print(f"❌ Missing one or more required columns in {sym}, skipping.")
        continue

    # Technical sell signals
    macd_cross_sell = (df[macd_col].shift(1) > df[macd_signal_col].shift(1)) & (df[macd_col] < df[macd_signal_col])
    technical_sell_signal = (df[rsi_col] > 70) | macd_cross_sell

    # AI sell signal (assuming -1 means sell)
    ai_sell_signal = df[ai_signal_col] == -1

    # Combined sell signal (either AI or technical)
    df['combined_sell_signal'] = technical_sell_signal | ai_sell_signal

    # Buy signal (assuming ai_signal == 1 means buy)
    entries_signal = df[ai_signal_col] == 1
    exits_signal = df['combined_sell_signal']

    # Backtest with vectorbt
    pf = vbt.Portfolio.from_signals(
        close=df[close_col],
        entries=entries_signal,
        exits=exits_signal,
        init_cash=10000,
        fees=0.001,
        slippage=0.0005,
        freq='30min'
    )

    print(f"✅ Stats for {sym}")
    print(pf.stats())

    # Plotting code remains the same as before (adjust column names accordingly)



🔍 Processing BTC-USD...
❌ Missing one or more required columns in BTC-USD, skipping.

🔍 Processing GALA-USD...
❌ Missing one or more required columns in GALA-USD, skipping.

🔍 Processing XRP-USD...
❌ Missing one or more required columns in XRP-USD, skipping.


## Next steps

- Tune label threshold, horizon, model hyperparameters.
- Consider training on combined multi-coin dataset for better generalization.
- Add transaction costs & more realistic slippage models.
- Run walk-forward validation and cross-validation for robustness.
