# Session 2: Logistic Regression & Threshold Tuning

## Learning Objectives

In this session, we will:

1. **Explore a more complex dataset** - Non-linear relationships, different feature distributions
2. **Build a logistic regression model** - Predict direction (up/down) instead of magnitude
3. **Tune prediction thresholds** - Find optimal thresholds for trading signals
4. **Compare classification vs regression** - When to use each approach
5. **Analyze threshold impact** - How different thresholds affect trading performance
6. **Explore transaction costs** - Understand the impact of trading frequency

## The Problem

In Session 1, we used **linear regression** to predict return magnitudes. But for trading, we often care more about **direction** than magnitude:
- Will the price go **up** or **down**?
- How confident are we in the prediction?

**Logistic regression** predicts probabilities, which we can convert to trading signals by choosing a threshold.

**Key question:** What threshold maximizes our trading performance (e.g., Sharpe ratio)?


## 1. Data Loading


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import sys

# Set style
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)

# Import our modules
sys.path.insert(0, str(Path("..").resolve()))

from eda.analysis import basic_summary
from features.engineering import prepare_features, prepare_target
from backtesting.engine import backtest_strategy, print_backtest_metrics


In [None]:
# Load Session 2 data
data_path = Path("../data/saved/stock_session2.csv")
df = pd.read_csv(data_path, parse_dates=["timestamp"])

print(f"Data loaded: {df.shape[0]} rows, {df.shape[1]} columns")
print(f"\nFeatures: {[col for col in df.columns if col.startswith('X')]}")
df.head()


### Question 2.1: Dataset Characteristics

**What makes this dataset different from Session 1?**

**Answer:**
- **4 features** instead of 3: X1 (normal), X2 (Poisson), X3 (binomial), X4 (normal)
- **Non-linear relationship**: Polynomial terms (X1², X1³), exponential (exp(X3)), conditional logic (if X2 > 1)
- **Time-dependent component**: X4 only affects returns in the second half of the dataset
- **Different distributions**: Poisson and binomial features add complexity

This dataset is more realistic - real financial relationships are often non-linear and time-varying.
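
To make these properties concrete, here is a purely illustrative sketch of how a dataset with this shape could be simulated. The function name `simulate_session2_like`, every coefficient, and the distribution parameters are made up for the example; this is not the generator behind `stock_session2.csv`.

```python
import numpy as np
import pandas as pd

def simulate_session2_like(n=1000, seed=0):
    """Simulate a toy dataset with the listed properties (hypothetical coefficients)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, 1.0, n)     # normal feature
    x2 = rng.poisson(1.0, n)         # Poisson count feature
    x3 = rng.binomial(5, 0.3, n)     # binomial feature
    x4 = rng.normal(0.0, 1.0, n)     # normal feature, active only late in the sample

    second_half = np.arange(n) >= n // 2   # time-dependent switch for X4
    signal = (
        0.002 * x1**2 - 0.001 * x1**3      # polynomial terms
        + 0.0005 * np.exp(0.3 * x3)        # exponential term
        + np.where(x2 > 1, 0.003, -0.001)  # conditional logic on X2
        + 0.008 * x4 * second_half         # X4 only matters in the second half
    )
    returns = signal - signal.mean() + rng.normal(0.0, 0.01, n)
    return pd.DataFrame({"X1": x1, "X2": x2, "X3": x3, "X4": x4, "returns": returns})

toy = simulate_session2_like()
half = len(toy) // 2
corr_early = toy["X4"].iloc[:half].corr(toy["returns"].iloc[:half])
corr_late = toy["X4"].iloc[half:].corr(toy["returns"].iloc[half:])
print(f"corr(X4, returns), first half:  {corr_early:.3f}")
print(f"corr(X4, returns), second half: {corr_late:.3f}")
```

In this toy setup, the correlation between X4 and returns should be weak in the first half and clearly positive in the second, which is the kind of regime change a single static model can struggle with.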


In [None]:
# Quick EDA
basic_summary(df)

# Feature distributions
feature_cols = ["X1", "X2", "X3", "X4"]
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
for i, col in enumerate(feature_cols):
    row, col_idx = i // 2, i % 2
    df[col].hist(bins=50, ax=axes[row, col_idx], edgecolor="black")
    axes[row, col_idx].set_title(f"{col} Distribution")
    axes[row, col_idx].set_xlabel(col)
    axes[row, col_idx].set_ylabel("Frequency")
plt.tight_layout()
plt.show()


## 2. Prepare Features and Target

For logistic regression, we need to convert returns into a binary target: **1 if return > 0, 0 otherwise**.


In [None]:
# Prepare features
X = prepare_features(df, feature_cols=feature_cols)

# Prepare target: binary classification (1 = positive return, 0 = negative/zero)
y_returns = prepare_target(df, target_col="returns")
y_direction = (y_returns > 0).astype(int)  # 1 if return > 0, else 0

print(f"Features shape: {X.shape}")
print(f"Target shape: {y_direction.shape}")
print(f"\nDirection distribution:")
print(y_direction.value_counts())
print(f"\n% Positive returns: {y_direction.mean() * 100:.2f}%")


## 3. Train-Test Split


In [None]:
# Chronological split: 80% train, 20% test
split_idx = int(len(df) * 0.8)

X_train = X.iloc[:split_idx]
X_test = X.iloc[split_idx:]
y_train = y_direction.iloc[:split_idx]
y_test = y_direction.iloc[split_idx:]
y_train_returns = y_returns.iloc[:split_idx]
y_test_returns = y_returns.iloc[split_idx:]

print(f"Train set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")

# Also split the full dataframe for backtesting
df_train = df.iloc[:split_idx].copy()
df_test = df.iloc[split_idx:].copy()


## 4. Logistic Regression Model

**Logistic regression** models the probability that a return will be positive:

$$P(y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)}}$$

The model outputs probabilities between 0 and 1. We convert these to predictions using a **threshold** (typically 0.5).
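
As a small worked example of the formula, the sketch below computes probabilities from a linear score and applies a threshold. The coefficients and the `X_demo` rows are hypothetical values chosen for illustration, not the fitted model. It also shows that thresholding the probability at $t$ is the same as thresholding the linear score at $\log\frac{t}{1-t}$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, for illustration only
beta_0 = 0.1
beta = np.array([0.8, -0.5])
X_demo = np.array([[1.0, 0.5], [-1.0, 1.0], [0.0, 0.0]])

z = beta_0 + X_demo @ beta   # linear score: beta_0 + beta_1*X_1 + beta_2*X_2
proba = sigmoid(z)           # P(y=1 | X)

threshold = 0.6
signal_from_proba = proba >= threshold
# Equivalent rule on the linear score: z >= log(t / (1 - t))
signal_from_score = z >= np.log(threshold / (1 - threshold))

print(proba.round(3))
print(signal_from_proba, signal_from_score)
```

Raising the threshold therefore only moves the cut-off along the linear score; it does not change how the model ranks observations.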


In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix

# Fit logistic regression model
log_model = LogisticRegression(max_iter=1000, random_state=42)
log_model.fit(X_train, y_train)

# Get probabilities (not just predictions)
y_train_proba = log_model.predict_proba(X_train)[:, 1]  # Probability of positive return
y_test_proba = log_model.predict_proba(X_test)[:, 1]

# Predictions with default threshold (0.5)
y_train_pred = log_model.predict(X_train)
y_test_pred = log_model.predict(X_test)

# Model coefficients
print("Model Coefficients:")
for i, col in enumerate(X.columns):
    print(f"  {col}: {log_model.coef_[0][i]:.6f}")
print(f"  Intercept: {log_model.intercept_[0]:.6f}")


### 4.1 Model Evaluation (Classification Metrics)


In [None]:
# Calculate classification metrics
train_acc = accuracy_score(y_train, y_train_pred)
test_acc = accuracy_score(y_test, y_test_pred)
train_prec = precision_score(y_train, y_train_pred)
test_prec = precision_score(y_test, y_test_pred)
train_rec = recall_score(y_train, y_train_pred)
test_rec = recall_score(y_test, y_test_pred)
train_f1 = f1_score(y_train, y_train_pred)
test_f1 = f1_score(y_test, y_test_pred)
train_auc = roc_auc_score(y_train, y_train_proba)
test_auc = roc_auc_score(y_test, y_test_proba)

print("=" * 80)
print("MODEL PERFORMANCE (CLASSIFICATION METRICS)")
print("=" * 80)
print(f"\nTrain Set:")
print(f"  Accuracy: {train_acc:.4f}")
print(f"  Precision: {train_prec:.4f}")
print(f"  Recall: {train_rec:.4f}")
print(f"  F1 Score: {train_f1:.4f}")
print(f"  ROC AUC: {train_auc:.4f}")
print(f"\nTest Set:")
print(f"  Accuracy: {test_acc:.4f}")
print(f"  Precision: {test_prec:.4f}")
print(f"  Recall: {test_rec:.4f}")
print(f"  F1 Score: {test_f1:.4f}")
print(f"  ROC AUC: {test_auc:.4f}")
print("=" * 80)

# Confusion matrix
print("\nTest Set Confusion Matrix:")
cm = confusion_matrix(y_test, y_test_pred)
print(cm)
print(f"\nTrue Negatives: {cm[0,0]}, False Positives: {cm[0,1]}")
print(f"False Negatives: {cm[1,0]}, True Positives: {cm[1,1]}")


## 5. Threshold Tuning

The default threshold of 0.5 may not be optimal for trading. Let's explore different thresholds and see their impact on trading performance.


In [None]:
# Test different thresholds
thresholds = np.arange(0.1, 0.95, 0.05)
results = []

for threshold in thresholds:
    # Convert probabilities to predictions
    y_pred = (y_test_proba >= threshold).astype(int)
    
    # Calculate metrics
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred, zero_division=0)
    rec = recall_score(y_test, y_pred, zero_division=0)
    f1 = f1_score(y_test, y_pred, zero_division=0)
    
    # Calculate signal frequency
    signal_rate = y_pred.mean()
    
    results.append({
        'threshold': threshold,
        'accuracy': acc,
        'precision': prec,
        'recall': rec,
        'f1': f1,
        'signal_rate': signal_rate
    })

results_df = pd.DataFrame(results)
results_df.head(10)


In [None]:
# Visualize threshold impact
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Accuracy
axes[0, 0].plot(results_df['threshold'], results_df['accuracy'], marker='o')
axes[0, 0].set_xlabel('Threshold')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].set_title('Accuracy vs Threshold')
axes[0, 0].grid(True)

# Precision and Recall
axes[0, 1].plot(results_df['threshold'], results_df['precision'], marker='o', label='Precision')
axes[0, 1].plot(results_df['threshold'], results_df['recall'], marker='s', label='Recall')
axes[0, 1].set_xlabel('Threshold')
axes[0, 1].set_ylabel('Score')
axes[0, 1].set_title('Precision & Recall vs Threshold')
axes[0, 1].legend()
axes[0, 1].grid(True)

# F1 Score
axes[1, 0].plot(results_df['threshold'], results_df['f1'], marker='o', color='green')
axes[1, 0].set_xlabel('Threshold')
axes[1, 0].set_ylabel('F1 Score')
axes[1, 0].set_title('F1 Score vs Threshold')
axes[1, 0].grid(True)

# Signal Rate
axes[1, 1].plot(results_df['threshold'], results_df['signal_rate'], marker='o', color='red')
axes[1, 1].set_xlabel('Threshold')
axes[1, 1].set_ylabel('Signal Rate')
axes[1, 1].set_title('Trading Signal Frequency vs Threshold')
axes[1, 1].grid(True)

plt.tight_layout()
plt.show()


## 6. Backtesting with Different Thresholds

Let's see how different thresholds affect actual trading performance (not just ML metrics).
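
For reference, here is a minimal sketch of two of the trading metrics reported below. It assumes daily returns, 252 periods per year, and a zero risk-free rate; `backtest_strategy` may compute these differently, and the helper names `annualized_sharpe` and `max_drawdown` are introduced only for this example.

```python
import numpy as np

def annualized_sharpe(period_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(period_returns, dtype=float)
    std = r.std(ddof=1)
    if std == 0:
        return 0.0  # constant returns: define Sharpe as 0 to avoid dividing by zero
    return float(np.sqrt(periods_per_year) * r.mean() / std)

def max_drawdown(period_returns):
    """Largest peak-to-trough decline of the compounded equity curve (negative fraction)."""
    equity = np.cumprod(1.0 + np.asarray(period_returns, dtype=float))
    running_peak = np.maximum.accumulate(equity)
    return float((equity / running_peak - 1.0).min())

demo_returns = [0.01, -0.02, 0.015, 0.005, 0.01]
print(f"Sharpe: {annualized_sharpe(demo_returns):.2f}")
print(f"Max drawdown: {max_drawdown(demo_returns):.2%}")
```

Because the Sharpe ratio divides mean return by volatility, a threshold that trades less often can score higher even when its total return is lower.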


In [None]:
# Test thresholds on trading performance
thresholds_to_test = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
trading_results = []

# Magnitudes for the return predictions come from TRAINING data only;
# taking them from the test set would leak future information (look-ahead bias).
mean_positive = y_train_returns[y_train_returns > 0].mean()
mean_negative = y_train_returns[y_train_returns <= 0].mean()

for threshold in thresholds_to_test:
    # Convert probabilities to return predictions:
    # if probability >= threshold, predict the mean positive return,
    # otherwise predict the mean negative return
    y_pred_returns = np.where(
        y_test_proba >= threshold,
        mean_positive,
        mean_negative
    )
    
    # Backtest
    results = backtest_strategy(
        df_test,
        y_pred_returns,
        initial_capital=100000,
        transaction_cost=0.001,  # 0.1%
        trade_at="close"
    )
    
    trading_results.append({
        'threshold': threshold,
        'total_return': results['total_return_pct'],
        'sharpe_ratio': results['sharpe_ratio'],
        'max_drawdown': results['max_drawdown_pct'],
        'win_rate': results['win_rate_pct'],
        'signal_rate': (y_test_proba >= threshold).mean()
    })

trading_df = pd.DataFrame(trading_results)
print("Trading Performance by Threshold:")
print(trading_df.round(4))


In [None]:
# Visualize trading performance vs threshold
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Total Return
axes[0, 0].plot(trading_df['threshold'], trading_df['total_return'], marker='o')
axes[0, 0].set_xlabel('Threshold')
axes[0, 0].set_ylabel('Total Return (%)')
axes[0, 0].set_title('Total Return vs Threshold')
axes[0, 0].grid(True)
axes[0, 0].axhline(y=0, color='r', linestyle='--', alpha=0.5)

# Sharpe Ratio
axes[0, 1].plot(trading_df['threshold'], trading_df['sharpe_ratio'], marker='o', color='green')
axes[0, 1].set_xlabel('Threshold')
axes[0, 1].set_ylabel('Sharpe Ratio')
axes[0, 1].set_title('Sharpe Ratio vs Threshold')
axes[0, 1].grid(True)

# Max Drawdown
axes[1, 0].plot(trading_df['threshold'], trading_df['max_drawdown'], marker='o', color='red')
axes[1, 0].set_xlabel('Threshold')
axes[1, 0].set_ylabel('Max Drawdown (%)')
axes[1, 0].set_title('Max Drawdown vs Threshold')
axes[1, 0].grid(True)

# Signal Rate
axes[1, 1].plot(trading_df['threshold'], trading_df['signal_rate'], marker='o', color='orange')
axes[1, 1].set_xlabel('Threshold')
axes[1, 1].set_ylabel('Signal Rate')
axes[1, 1].set_title('Trading Frequency vs Threshold')
axes[1, 1].grid(True)

plt.tight_layout()
plt.show()

# Find optimal threshold (maximize Sharpe ratio)
# Note: the threshold is selected on the test set, so the metrics printed below are optimistic
optimal_idx = trading_df['sharpe_ratio'].idxmax()
optimal_threshold = trading_df.loc[optimal_idx, 'threshold']
print(f"\nOptimal threshold (max Sharpe): {optimal_threshold:.2f}")
print(f"  Sharpe Ratio: {trading_df.loc[optimal_idx, 'sharpe_ratio']:.4f}")
print(f"  Total Return: {trading_df.loc[optimal_idx, 'total_return']:.2f}%")
print(f"  Max Drawdown: {trading_df.loc[optimal_idx, 'max_drawdown']:.2f}%")


## 7. Compare with Linear Regression

Let's compare logistic regression (direction prediction) with linear regression (magnitude prediction) from Session 1.


In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Fit linear regression (predicting returns directly)
lin_model = LinearRegression()
lin_model.fit(X_train, y_train_returns)
y_test_pred_returns = lin_model.predict(X_test)

# Backtest linear regression strategy
lin_results = backtest_strategy(
    df_test,
    y_test_pred_returns,
    initial_capital=100000,
    transaction_cost=0.001,
    trade_at="close"
)

# Backtest logistic regression with optimal threshold
# (magnitudes taken from training returns to avoid look-ahead bias)
y_pred_returns_log = np.where(
    y_test_proba >= optimal_threshold,
    y_train_returns[y_train_returns > 0].mean(),
    y_train_returns[y_train_returns <= 0].mean()
)

log_results = backtest_strategy(
    df_test,
    y_pred_returns_log,
    initial_capital=100000,
    transaction_cost=0.001,
    trade_at="close"
)

# Compare
comparison = pd.DataFrame({
    'Linear Regression': [
        lin_results['total_return_pct'],
        lin_results['sharpe_ratio'],
        lin_results['max_drawdown_pct'],
        lin_results['win_rate_pct'],
        np.sqrt(mean_squared_error(y_test_returns, y_test_pred_returns)),
        r2_score(y_test_returns, y_test_pred_returns)
    ],
    'Logistic Regression': [
        log_results['total_return_pct'],
        log_results['sharpe_ratio'],
        log_results['max_drawdown_pct'],
        log_results['win_rate_pct'],
        np.sqrt(mean_squared_error(y_test_returns, y_pred_returns_log)),
        r2_score(y_test_returns, y_pred_returns_log)
    ]
}, index=['Total Return (%)', 'Sharpe Ratio', 'Max Drawdown (%)', 'Win Rate (%)', 'RMSE', 'R²'])

print("Model Comparison:")
print(comparison.round(4))


### Question 7.1: When to Use Which Model?

**When should we use logistic regression vs linear regression?**

**Answer:**

**Logistic Regression (Direction):**
- ✅ Focus on predicting up/down (binary classification)
- ✅ Can tune threshold to optimize trading metrics
- ✅ Better when magnitude is less important than direction
- ✅ Outputs probabilities, which are easy to read as a confidence level
- ❌ Doesn't capture magnitude information

**Linear Regression (Magnitude):**
- ✅ Predicts actual return values
- ✅ Can use magnitude to size positions
- ✅ Better when magnitude matters (e.g., position sizing)
- ❌ More sensitive to outliers
- ❌ Harder to optimize threshold (need to choose signal threshold on returns)

**Hybrid approach:** Use logistic regression for direction, then use linear regression magnitude for position sizing!
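
The hybrid idea can be sketched as follows, on synthetic data. It assumes a long-or-flat strategy; the 0.6 confidence threshold, the 95th-percentile scaling, and all `*_demo` names are arbitrary choices for illustration, and both models are evaluated in-sample only to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
ret_demo = 0.01 * X_demo[:, 0] + rng.normal(scale=0.01, size=500)  # synthetic returns

clf = LogisticRegression().fit(X_demo, (ret_demo > 0).astype(int))  # direction model
reg = LinearRegression().fit(X_demo, ret_demo)                      # magnitude model

proba_up = clf.predict_proba(X_demo)[:, 1]
pred_size = np.abs(reg.predict(X_demo))

# Trade only when the classifier is confident, then size the position by the
# predicted magnitude, capped at 1.0 (fully invested).
go_long = proba_up >= 0.6
scale = np.quantile(pred_size, 0.95)
position = np.where(go_long, np.clip(pred_size / scale, 0.0, 1.0), 0.0)

print(f"Share of periods with a position: {go_long.mean():.2%}")
print(f"Average position size when long: {position[go_long].mean():.2f}")
```

The classifier acts as a gate and the regressor as a sizing rule, so each model is used for the question it answers best.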


## 8. Impact of Transaction Costs

Let's see how transaction costs affect performance when the threshold is held at the optimum found above.


In [None]:
# Test different transaction costs
costs = [0.000, 0.0005, 0.001, 0.002, 0.005]  # 0%, 0.05%, 0.1%, 0.2%, 0.5%
cost_results = []

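# The predictions do not depend on the cost, so build them once.
# Threshold: optimum from before; magnitudes: training returns (no look-ahead).
y_pred_returns_cost = np.where(
    y_test_proba >= optimal_threshold,
    y_train_returns[y_train_returns > 0].mean(),
    y_train_returns[y_train_returns <= 0].mean()
)

for cost in costs: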
    
    results = backtest_strategy(
        df_test,
        y_pred_returns_cost,
        initial_capital=100000,
        transaction_cost=cost,
        trade_at="close"
    )
    
    cost_results.append({
        'transaction_cost_pct': cost * 100,
        'total_return': results['total_return_pct'],
        'sharpe_ratio': results['sharpe_ratio'],
        'max_drawdown': results['max_drawdown_pct']
    })

cost_df = pd.DataFrame(cost_results)
print("Impact of Transaction Costs:")
print(cost_df.round(4))

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(cost_df['transaction_cost_pct'], cost_df['total_return'], marker='o')
axes[0].set_xlabel('Transaction Cost (%)')
axes[0].set_ylabel('Total Return (%)')
axes[0].set_title('Total Return vs Transaction Cost')
axes[0].grid(True)
axes[0].axhline(y=0, color='r', linestyle='--', alpha=0.5)

axes[1].plot(cost_df['transaction_cost_pct'], cost_df['sharpe_ratio'], marker='o', color='green')
axes[1].set_xlabel('Transaction Cost (%)')
axes[1].set_ylabel('Sharpe Ratio')
axes[1].set_title('Sharpe Ratio vs Transaction Cost')
axes[1].grid(True)

plt.tight_layout()
plt.show()


### Question 8.1: Transaction Costs Impact

**What happens as transaction costs increase?**

**Answer:**
- **Higher costs** = Lower net returns
- **High-frequency strategies** (low threshold, many trades) become unprofitable
- **Optimal threshold may shift higher** to reduce trading frequency
- **Sharpe ratio decreases** as costs eat into returns

**Key insight:** Transaction costs can turn a profitable strategy unprofitable. Always include realistic costs in backtests!
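
The threshold shift can be illustrated with a toy simulation. It assumes synthetic probabilities and returns, and a simplified cost model in which `cost` is paid once per traded period; none of this reflects how `backtest_strategy` charges costs. Here the expected return is $0.02\,(p - 0.5)$, so the expected break-even probability is $0.5 + \text{cost}/0.02$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
proba_demo = rng.uniform(0.3, 0.7, n)  # hypothetical model probabilities
# Synthetic returns whose expected value rises with the predicted probability
ret_demo = 0.02 * (proba_demo - 0.5) + rng.normal(0.0, 0.01, n)

candidate_thresholds = np.linspace(0.30, 0.65, 8)  # 0.30, 0.35, ..., 0.65

def net_total_return(threshold, cost):
    """Sum of returns on traded periods, paying `cost` once per traded period."""
    traded = proba_demo >= threshold
    return float((ret_demo[traded] - cost).sum())

best_by_cost = {}
for cost in [0.0, 0.001, 0.002]:
    totals = [net_total_return(t, cost) for t in candidate_thresholds]
    best_by_cost[cost] = float(candidate_thresholds[int(np.argmax(totals))])
    print(f"cost={cost:.2%}  best threshold={best_by_cost[cost]:.2f}  net={max(totals):.2f}")
```

In this setup the best threshold cannot move lower as the cost rises: a higher cost penalizes every trade equally, so the marginal trades near the cut-off are the first to become unprofitable.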


## Summary

In this session, we:

1. ✅ Explored a complex dataset with non-linear relationships
2. ✅ Built a logistic regression model for direction prediction
3. ✅ Tuned prediction thresholds to optimize trading performance
4. ✅ Compared logistic vs linear regression approaches
5. ✅ Analyzed the impact of transaction costs

**Key Takeaways:**
- **Direction vs Magnitude**: Sometimes predicting direction is more important than magnitude
- **Threshold matters**: The default 0.5 threshold is not guaranteed to be optimal for trading
- **Transaction costs are critical**: They can make or break a strategy
- **Optimize for trading metrics**: ML metrics (accuracy, F1) don't always align with trading performance (Sharpe, returns)

**Next Session:** We'll explore more complex models, overfitting, regularization, and walk-forward validation.


### Bonus question

**Here we fitted a binary classifier - in real life what classification problem would you solve and why?**

**Answer:**

In most cases, and especially at higher trading frequencies, we would fit a multi-class classifier so that the model can also predict "flat". We also get to choose what "flat" means, usually a band of returns around 0.
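
A minimal sketch of such a three-class target is shown below. The helper `label_three_class` and the `flat_band` value of 0.001 are assumptions for illustration; in practice the band would be chosen from transaction costs or the return distribution.

```python
import pandas as pd

def label_three_class(returns, flat_band=0.001):
    """Label returns as -1 (down), 0 (flat) or 1 (up) using a symmetric band around zero."""
    r = pd.Series(returns, dtype=float)
    labels = pd.Series(0, index=r.index, dtype=int)  # default: flat
    labels[r > flat_band] = 1                        # up
    labels[r < -flat_band] = -1                      # down
    return labels

demo_returns = [0.004, -0.0005, 0.0002, -0.003, 0.0011]
print(label_three_class(demo_returns).tolist())  # [1, 0, 0, -1, 1]
```

scikit-learn's `LogisticRegression` accepts a multi-class target like this one, so the rest of the pipeline would need only small changes, such as reading one probability column per class from `predict_proba`.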