# Backtesting Trading Strategy

This notebook demonstrates how to backtest our trading entry point prediction model on historical data. We'll cover:
- Loading the trained model
- Finding entry points
- Simulating trades based on model predictions
- Analyzing strategy performance
- Visualizing results

In [1]:
# Add parent directory to path to import from src
import sys
import os
sys.path.append(os.path.abspath('..'))

# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Import from src modules
from src.data.loader import load_data, preprocess_data
from src.data.features import prepare_features
from src.models.random_forest_model import RandomForestModel
from src.backtesting.backtester import backtest_strategy, analyze_backtest_results, generate_backtest_report
from src.visualization.charts import plot_entry_points, plot_backtest_results
from src.utils.helpers import set_pandas_display_options

# Set display options
set_pandas_display_options()

# Matplotlib settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = [12, 6]
%matplotlib inline

## 1. Load Trained Model and Data

First, let's load our trained model and the data we'll use for backtesting.

In [2]:
# Try to load the trained model
try:
    model = RandomForestModel.load('../trained_model.pkl')
    print("Loaded saved model successfully")
except FileNotFoundError:
    print("Trained model not found. Please run the model training notebook first.")
    # Create and train a simple model for demonstration
    model = RandomForestModel(
        n_estimators=100,
        max_depth=10,
        random_state=42
    )

Trained model not found. Please run the model training notebook first.


In [3]:
# Load the data for backtesting
# Load the full dataset here; it is split chronologically into training and testing portions in the next section
try:
    # Try to load processed data
    df_features = pd.read_csv('../processed_data.csv', index_col=0, parse_dates=True)
    print(f"Loaded processed dataset with {df_features.shape[1]} columns and {df_features.shape[0]} rows")
except FileNotFoundError:
    print("Processed data file not found. Processing raw data...")
    # Load raw data
    file_path = '../USATECH.IDXUSD_Candlestick_15_M_BID_01.01.2023-18.01.2025.csv'
    df_raw = load_data(file_path)
    df = preprocess_data(df_raw)
    
    # Prepare features
    df_features = prepare_features(df, include_target=True)
    print(f"Prepared dataset with {df_features.shape[1]} columns and {df_features.shape[0]} rows")

Loaded processed dataset with 37 columns and 49815 rows


## 2. Prepare Backtesting Data

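Let's split our data chronologically, with the first 60% for training and the final 40% for testing, so that the backtest runs on out-of-sample data. This only holds if the model was trained on the first portion; a model loaded from disk may have seen some of the test rows.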

In [26]:
# Split data for backtesting
test_size = 0.40  # Use 40% of data for testing
train_size = 0.60  # Use 60% of data for training

# Calculate split indices
train_end_idx = int(len(df_features) * train_size)
test_start_idx = train_end_idx

# Split data
df_train = df_features.iloc[:train_end_idx]
df_test = df_features.iloc[test_start_idx:]

print(f"Training data: {len(df_train)} rows, {df_train.index.min()} to {df_train.index.max()}")
print(f"Testing data: {len(df_test)} rows, {df_test.index.min()} to {df_test.index.max()}")

Training data: 29889 rows, 2023-01-04 07:45:00+07:00 to 2024-03-26 09:15:00+07:00
Testing data: 19926 rows, 2024-03-26 09:30:00+07:00 to 2025-01-18 04:00:00+07:00


### Train Model if Needed

If we couldn't load a saved model, let's train one on our training data.

In [27]:
# Train the model if not loaded from file
if not hasattr(model, 'model') or model.model is None:
    print("Training a new model...")
    X_train, y_train = model.extract_features_target(df_train)
    model.train(X_train, y_train)
    print("Model training completed")

## 3. Find Entry Points for Backtesting

Now, let's use our model to find potential entry points in the test data.

In [28]:
# Extract features and target from test data
X_test, y_test = model.extract_features_target(df_test)

# Set confidence threshold for entry points
confidence_threshold = 0.65

# Find entry points
entry_points = model.find_entry_points(
    X_test, 
    df_test.index, 
    df_test['Close'].values, 
    confidence_threshold=confidence_threshold
)

print(f"Found {len(entry_points)} entry points with confidence >= {confidence_threshold}")

Found 0 entry points with confidence >= 0.65
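
One way to investigate a result of zero entry points is to count how many predictions clear each candidate threshold. The sketch below assumes the classifier exposes per-bar class probabilities and that every column is a trade-direction class; the `proba` array is synthetic and `count_signals` is a hypothetical helper, not part of `RandomForestModel`.

```python
import numpy as np

def count_signals(proba, thresholds):
    """Return {threshold: number of bars whose top class probability >= threshold}."""
    top = np.asarray(proba).max(axis=1)  # highest class probability per bar
    return {t: int((top >= t).sum()) for t in thresholds}

# Synthetic probabilities for 4 bars and 3 classes (illustrative only)
proba = np.array([
    [0.50, 0.30, 0.20],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],
    [0.10, 0.25, 0.65],
])
signal_counts = count_signals(proba, thresholds=[0.5, 0.65, 0.8])
print(signal_counts)
```

If the count drops to zero well below 0.65 on real predictions, the threshold is probably not the only problem and the model itself deserves a closer look.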


In [7]:
# Display some entry points
if entry_points:
    print("Sample entry points:")
    for i, entry in enumerate(entry_points[:5]):
        print(f"\nEntry Point {i+1}:")
        print(f"Date: {entry['date']}")
        print(f"Price: {entry['price']:.2f}")
        print(f"Direction: {entry['direction']}")
        print(f"Confidence: {entry['confidence']:.2%}")
        print(f"Target Price: {entry['target_price']:.2f}")
        print(f"Stop Loss Price: {entry['stop_loss_price']:.2f}")

## 4. Visualize Entry Points

Let's visualize our predicted entry points on a price chart.

In [8]:
# Plot entry points
if entry_points:
    fig = plot_entry_points(df_test, entry_points, days=100)
    plt.tight_layout()
    plt.show()

## 5. Backtest the Strategy

Now, let's simulate trading these entry points to see how our strategy would have performed.

In [16]:
# Set the number of future bars to look ahead
n_future_bars = 150  # Look ahead 150 bars (37.5 hours of price data on 15-minute candles)

# Run the backtest
print(f"Running backtest on {len(entry_points)} entry points...")
backtest_results = backtest_strategy(df_test, entry_points, n_future_bars=n_future_bars)
print(f"Backtest completed with {len(backtest_results)} trades")

Running backtest on 0 entry points...
Backtest completed with 0 trades
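
The internals of `backtest_strategy` are not shown in this notebook. As a rough sketch of what such a simulation typically does for a single long trade, the example below walks forward bar by bar and checks each bar's high and low against the target and stop. The function `simulate_long_trade`, the `TIMEOUT` label, and the stop-first tie-break are assumptions made for illustration.

```python
def simulate_long_trade(highs, lows, closes, entry_price, target, stop, n_future_bars):
    """Walk forward bar by bar and return (outcome, exit_price, bars_held)."""
    horizon = min(n_future_bars, len(closes))
    if horizon == 0:
        return "NO_DATA", entry_price, 0
    for i in range(horizon):
        # Check the stop first: if both levels fall inside one bar,
        # assume the worse outcome because the intrabar order is unknown.
        if lows[i] <= stop:
            return "LOSS", stop, i + 1
        if highs[i] >= target:
            return "WIN", target, i + 1
    # Neither level was hit: exit at the close of the last bar in the window
    return "TIMEOUT", closes[horizon - 1], horizon

# Illustrative bars following an entry at 100.0
highs = [101.0, 102.5, 104.2]
lows = [99.5, 100.8, 102.9]
closes = [100.7, 102.0, 104.0]
trade = simulate_long_trade(highs, lows, closes, entry_price=100.0,
                            target=104.0, stop=98.0, n_future_bars=150)
print(trade)
```

Checking the stop before the target is a deliberately pessimistic choice: candle data cannot tell us which level was touched first within a bar.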


In [17]:
# Display sample results
if backtest_results:
    print("Sample trade results:")
    for i, result in enumerate(backtest_results[:5]):
        print(f"\nTrade {i+1}:")
        print(f"Entry: {result['timestamp']} at {result['entry_price']:.2f} ({result['direction']})")
        print(f"Exit: {result['exit_timestamp']} at {result['exit_price']:.2f}")
        print(f"Outcome: {result['outcome']}")
        print(f"Profit/Loss: {result['profit_pct']:.2f}%")
        print(f"Bars Held: {result['bars_held']}")

## 6. Analyze Backtest Results

Let's analyze the performance of our backtested strategy.

In [18]:
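# Analyze backtest results
# Guard against an empty trade list: with zero trades, the unguarded cell
# raised "ZeroDivisionError: division by zero"
if backtest_results:
    backtest_analysis = analyze_backtest_results(backtest_results)

    # Generate backtest report
    backtest_report = generate_backtest_report(backtest_analysis)
    print(backtest_report)
else:
    print("No trades to analyze. Consider lowering confidence_threshold or retraining the model.")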
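
Summary statistics such as win rate divide by the number of trades, so an empty trade list needs explicit handling. Below is a minimal zero-safe sketch, assuming each trade is a dict with the `outcome` and `profit_pct` keys used elsewhere in this notebook. `summarize_trades` is a hypothetical helper and does not reflect how `analyze_backtest_results` is implemented.

```python
def summarize_trades(trades):
    """Compute basic statistics without dividing by an empty trade count."""
    n = len(trades)
    if n == 0:
        return {"num_trades": 0, "win_rate": 0.0,
                "avg_profit_pct": 0.0, "profit_factor": None}
    wins = [t for t in trades if t["outcome"] == "WIN"]
    gross_profit = sum(t["profit_pct"] for t in trades if t["profit_pct"] > 0)
    gross_loss = -sum(t["profit_pct"] for t in trades if t["profit_pct"] < 0)
    return {
        "num_trades": n,
        "win_rate": len(wins) / n * 100,
        "avg_profit_pct": sum(t["profit_pct"] for t in trades) / n,
        # Profit factor is undefined when there are no losing trades
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else None,
    }

empty_summary = summarize_trades([])
sample_summary = summarize_trades([
    {"outcome": "WIN", "profit_pct": 2.0},
    {"outcome": "LOSS", "profit_pct": -1.0},
])
print(empty_summary)
print(sample_summary)
```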

In [None]:
# Visualize backtest results
if backtest_results:
    fig = plot_backtest_results(backtest_results)
    plt.tight_layout()
    plt.show()

## 7. Monthly Performance Analysis

Let's break down our strategy performance by month to see if there are any seasonal patterns.

In [None]:
# Convert to DataFrame for easier analysis
if backtest_results:
    df_results = pd.DataFrame(backtest_results)
    
    # Convert timestamp to datetime if needed
    if not pd.api.types.is_datetime64_any_dtype(df_results['timestamp']):
        df_results['timestamp'] = pd.to_datetime(df_results['timestamp'])
    
    # Extract month and year
    df_results['month'] = df_results['timestamp'].dt.month
    df_results['year'] = df_results['timestamp'].dt.year
    df_results['month_year'] = df_results['timestamp'].dt.strftime('%Y-%m')
    
    # Group by month and analyze performance
    monthly_performance = df_results.groupby('month_year').agg({
        'profit_pct': ['mean', 'sum', 'count'],
        'outcome': lambda x: (x == 'WIN').mean() * 100  # Win rate as percentage
    })
    
    # Flatten multi-index columns
    monthly_performance.columns = ['avg_profit_pct', 'total_profit_pct', 'num_trades', 'win_rate']
    
    # Sort by date
    monthly_performance = monthly_performance.sort_index()
    
    print("Monthly Performance:")
    print(monthly_performance)

In [None]:
# Visualize monthly performance
if backtest_results and len(monthly_performance) > 0:
    fig, ax1 = plt.subplots(figsize=(14, 7))
    
    # Bar chart for total profit
    bars = ax1.bar(monthly_performance.index, monthly_performance['total_profit_pct'], 
                  color=['green' if x > 0 else 'red' for x in monthly_performance['total_profit_pct']])
    ax1.set_xlabel('Month')
    ax1.set_ylabel('Total Profit (%)', color='black')
    ax1.tick_params(axis='y', labelcolor='black')
    ax1.tick_params(axis='x', rotation=45)  # Rotate labels without overriding the tick locations
    
    # Line chart for win rate on secondary y-axis
    ax2 = ax1.twinx()
    ax2.plot(monthly_performance.index, monthly_performance['win_rate'], 'b-', marker='o')
    ax2.set_ylabel('Win Rate (%)', color='blue')
    ax2.tick_params(axis='y', labelcolor='blue')
    ax2.set_ylim(0, 100)
    
    # Add number of trades as text on bars
    for i, bar in enumerate(bars):
        num_trades = monthly_performance['num_trades'].iloc[i]
        ax1.text(i, bar.get_height() + (0.1 if bar.get_height() >= 0 else -0.5), 
                f"{num_trades} trades", ha='center', va='bottom', rotation=0)
    
    plt.title('Monthly Performance')
    plt.tight_layout()
    plt.show()

## 8. Trade Duration Analysis

Let's analyze how trade duration affects profitability.

In [None]:
# Analyze trade duration vs. profit
if backtest_results:
    # Group by bars held
    duration_analysis = df_results.groupby('bars_held').agg({
        'profit_pct': ['mean', 'count'],
        'outcome': lambda x: (x == 'WIN').mean() * 100  # Win rate as percentage
    })
    
    # Flatten multi-index columns
    duration_analysis.columns = ['avg_profit_pct', 'num_trades', 'win_rate']
    
    print("Trade Duration Analysis:")
    print(duration_analysis)

In [None]:
# Visualize trade duration vs. profit
if backtest_results and len(duration_analysis) > 0:
    fig, ax1 = plt.subplots(figsize=(14, 7))
    
    # Bar chart for average profit
    bars = ax1.bar(duration_analysis.index, duration_analysis['avg_profit_pct'], 
                  color=['green' if x > 0 else 'red' for x in duration_analysis['avg_profit_pct']])
    ax1.set_xlabel('Trade Duration (bars)')
    ax1.set_ylabel('Average Profit per Trade (%)', color='black')
    ax1.tick_params(axis='y', labelcolor='black')
    
    # Line chart for win rate on secondary y-axis
    ax2 = ax1.twinx()
    ax2.plot(duration_analysis.index, duration_analysis['win_rate'], 'b-', marker='o')
    ax2.set_ylabel('Win Rate (%)', color='blue')
    ax2.tick_params(axis='y', labelcolor='blue')
    ax2.set_ylim(0, 100)
    
    # Add number of trades as text on bars
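    for bar, num_trades in zip(bars, duration_analysis['num_trades']):
        # The x-axis is numeric (bars held), so center the label on the bar
        # itself rather than using the loop index as the x position
        ax1.text(bar.get_x() + bar.get_width() / 2,
                 bar.get_height() + (0.1 if bar.get_height() >= 0 else -0.5),
                 f"{num_trades}", ha='center', va='bottom', rotation=0)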
    
    plt.title('Trade Duration vs. Profitability')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

## 9. Equity Curve

Let's create an equity curve to see the cumulative performance of our strategy over time.

In [None]:
# Create equity curve
if backtest_results:
    # Sort by timestamp
    df_results = df_results.sort_values('timestamp')
    
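    # Calculate cumulative returns (assumes trades are taken one after another
    # and each trade's return is compounded on the full account)
    df_results['equity'] = (1 + df_results['profit_pct']/100).cumprod()
    df_results['cum_return'] = df_results['equity'] - 1
    df_results['cum_return_pct'] = df_results['cum_return'] * 100
    
    # Calculate drawdowns relative to the running equity peak
    # (dividing cumulative returns directly fails when the peak return is zero or negative)
    df_results['peak'] = df_results['equity'].cummax()
    df_results['drawdown'] = (df_results['equity'] / df_results['peak'] - 1) * 100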
    
    # Plot equity curve
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10), sharex=True, 
                                  gridspec_kw={'height_ratios': [3, 1]})
    
    # Equity curve
    ax1.plot(df_results['timestamp'], df_results['cum_return_pct'], label='Equity Curve')
    ax1.set_ylabel('Cumulative Return (%)')
    ax1.set_title('Strategy Equity Curve')
    ax1.grid(True, alpha=0.3)
    ax1.legend()
    
    # Drawdown
    ax2.fill_between(df_results['timestamp'], df_results['drawdown'], 0, color='red', alpha=0.3, label='Drawdown')
    ax2.set_ylabel('Drawdown (%)')
    ax2.set_xlabel('Date')
    ax2.grid(True, alpha=0.3)
    ax2.legend()
    
    plt.tight_layout()
    plt.show()
    
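    # Print key statistics
    final_return = df_results['cum_return_pct'].iloc[-1]
    max_drawdown = df_results['drawdown'].min()
    
    print(f"Final Return: {final_return:.2f}%")
    print(f"Max Drawdown: {max_drawdown:.2f}%")
    # The ratio is undefined when the strategy never drew down
    if max_drawdown != 0:
        print(f"Return/Max Drawdown Ratio: {abs(final_return/max_drawdown):.2f}")
    else:
        print("Return/Max Drawdown Ratio: undefined (no drawdown)")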
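
Drawdown is conventionally measured against the running peak of the equity multiple (1 plus the cumulative return), which is always positive, rather than against the cumulative return, which starts near zero. A small worked example with made-up per-trade returns shows the arithmetic:

```python
import pandas as pd

# Illustrative per-trade returns in percent (not real results)
profit_pct = pd.Series([10.0, -5.0, 20.0, -25.0])

equity = (1 + profit_pct / 100).cumprod()   # growth of 1 unit of capital
peak = equity.cummax()                       # running high-water mark
drawdown_pct = (equity / peak - 1) * 100     # 0 at new highs, negative below them

final_return_pct = (equity.iloc[-1] - 1) * 100
max_drawdown_pct = drawdown_pct.min()

print(drawdown_pct.round(2).tolist())
print(round(final_return_pct, 2), round(max_drawdown_pct, 2))
```

Here the equity path is 1.10, 1.045, 1.254, 0.9405, so the strategy ends 5.95% below its starting capital after a 25% drawdown from the 1.254 peak.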

## 10. Strategy Refinement Ideas

Based on our backtest results, let's consider some potential refinements to improve the strategy.

### Analyze Failed Trades

Let's look at the characteristics of losing trades to identify potential improvements.

In [None]:
# Analyze losing trades vs. winning trades
if backtest_results:
    winning_trades = df_results[df_results['outcome'] == 'WIN']
    losing_trades = df_results[df_results['outcome'] == 'LOSS']
    
    print(f"Number of winning trades: {len(winning_trades)}")
    print(f"Number of losing trades: {len(losing_trades)}")
    
    # Calculate average trade characteristics
    print("\nAverage trade characteristics:")
    print(f"Average profit on winning trades: {winning_trades['profit_pct'].mean():.2f}%")
    print(f"Average loss on losing trades: {losing_trades['profit_pct'].mean():.2f}%")
    print(f"Average duration of winning trades: {winning_trades['bars_held'].mean():.2f} bars")
    print(f"Average duration of losing trades: {losing_trades['bars_held'].mean():.2f} bars")
    
    # Analyze direction bias
    if 'direction' in df_results.columns:
        long_trades = df_results[df_results['direction'] == 'LONG']
        short_trades = df_results[df_results['direction'] == 'SHORT']
        
        print("\nDirection analysis:")
        print(f"Long trades: {len(long_trades)}, Win rate: {(long_trades['outcome'] == 'WIN').mean() * 100:.2f}%")
        print(f"Short trades: {len(short_trades)}, Win rate: {(short_trades['outcome'] == 'WIN').mean() * 100:.2f}%")
        print(f"Average profit on long trades: {long_trades['profit_pct'].mean():.2f}%")
        print(f"Average profit on short trades: {short_trades['profit_pct'].mean():.2f}%")

### Potential Strategy Refinements

Based on our analysis, here are some potential refinements to consider:

1. **Adjust position sizing** - Consider using variable position sizing based on prediction confidence.
2. **Optimize profit targets and stop losses** - Analyze the optimal risk-reward ratio.
3. **Apply time filters** - Only trade during certain hours of the day or days of the week.
4. **Apply market condition filters** - Only trade during specific market conditions (e.g., low volatility, trending markets).
5. **Combine with other indicators** - Add additional filters using traditional indicators.
6. **Adjust confidence threshold** - Increase or decrease based on backtest results.
7. **Optimize trade duration** - Set maximum holding period based on the analysis.
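
As an illustration of the first idea, the sketch below scales position size linearly with prediction confidence. The function `position_fraction` and its `min_frac` and `max_frac` parameters are hypothetical choices for this example, not part of the existing code, and any real sizing rule would need to be validated in the backtest.

```python
def position_fraction(confidence, threshold=0.65, min_frac=0.25, max_frac=1.0):
    """Map confidence in [threshold, 1] linearly onto [min_frac, max_frac].

    Returns 0.0 below the threshold (no trade). Requires threshold < 1.
    """
    if confidence < threshold:
        return 0.0
    scale = (confidence - threshold) / (1.0 - threshold)
    return min_frac + scale * (max_frac - min_frac)

# Fraction of the base position size for a few confidence levels
sizes = [round(position_fraction(c), 4) for c in (0.60, 0.65, 0.825, 1.0)]
print(sizes)
```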

## 11. Save Backtest Results

Let's save our backtest results for future reference and comparison.

In [None]:
# Save backtest results and report
if backtest_results:
    # Save DataFrame to CSV
    df_results.to_csv('../backtest_results.csv', index=False)
    print("Saved backtest results to '../backtest_results.csv'")
    
    # Save backtest report to text file
    with open('../backtest_report.txt', 'w') as f:
        f.write(backtest_report)
    print("Saved backtest report to '../backtest_report.txt'")

## Summary

In this notebook, we've walked through a complete backtesting workflow for our trading entry point prediction model:

1. Loaded our trained model and prepared backtesting data
2. Found potential entry points based on model predictions
3. Visualized these entry points on price charts
4. Simulated trades with target prices and stop losses
5. Analyzed overall strategy performance
6. Examined monthly performance patterns
7. Analyzed the relationship between trade duration and profitability
8. Created an equity curve to visualize cumulative returns
9. Identified characteristics of winning and losing trades
10. Proposed potential refinements to improve the strategy

Next, we can use these insights to refine our model and trading strategy for better performance.