# Week 6: Classification - Logistic Regression, SVM, Meta-Labeling

## 🎯 Learning Objectives

By the end of this week, you will understand:
- **Logistic Regression**: Probability-based classification
- **Support Vector Machines (SVM)**: Maximum margin classifiers
- **Meta-Labeling**: Marcos Lopez de Prado's approach to bet sizing
- **Evaluation Metrics**: Precision, Recall, ROC-AUC

---

## Why Classification in Finance?

Many trading problems are classification tasks:
- **Direction prediction**: Will the price go up or down?
- **Event detection**: Will there be a volatility spike?
- **Trade filtering**: Should we take this signal?

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import classification_report, roc_auc_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
print("✅ Libraries loaded!")
print("📚 Week 6: Classification Theory")

---

## Part 1: Logistic Regression

### The Problem

Predict the probability that $y = 1$ given features $X$.

### The Sigmoid Function

$$P(y=1|X) = \sigma(X\beta) = \frac{1}{1 + e^{-X\beta}}$$

### Loss Function (Cross-Entropy)

$$L = -\sum_{i} [y_i \log(p_i) + (1-y_i) \log(1-p_i)]$$
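
The loss above can be checked numerically. The sketch below is illustrative only: it assumes $p_i = \sigma(x_i^T\beta)$ with no intercept, uses simulated data, and the helper names (`cross_entropy`, `cross_entropy_grad`) are hypothetical. It compares the analytic gradient $X^T(p - y)$ with a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(beta, X, y):
    # L = -sum_i [y_i log(p_i) + (1 - y_i) log(1 - p_i)]
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def cross_entropy_grad(beta, X, y):
    # Analytic gradient of the loss with respect to beta: X^T (p - y)
    return X.T @ (sigmoid(X @ beta) - y)

# Simulated data (assumption: two features, coefficients chosen for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
beta_true = np.array([0.5, -0.3])
y_demo = (rng.random(200) < sigmoid(X_demo @ beta_true)).astype(float)

beta0 = np.zeros(2)
analytic = cross_entropy_grad(beta0, X_demo, y_demo)

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (cross_entropy(beta0 + eps * e, X_demo, y_demo)
     - cross_entropy(beta0 - eps * e, X_demo, y_demo)) / (2 * eps)
    for e in np.eye(2)
])

print("Analytic gradient:", analytic)
print("Numeric gradient: ", numeric)
```

The two gradients should agree to several decimal places. The same expression, $X^T(p - y)$, is what a gradient-descent fit would step along.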

### 🤔 Simple Explanation

Logistic regression is like linear regression, except that the linear output is squashed through a sigmoid function so it lands between 0 and 1 and can be read as a probability. It answers: "What's the probability of success given these features?"
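
One way to read a logistic coefficient: a one-unit increase in a feature multiplies the odds $p/(1-p)$ by $e^{\beta}$. The sketch below illustrates this with made-up coefficients (0.5 for momentum, -0.3 for volatility, the values used to simulate data in this notebook). The feature vectors are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed coefficients: [momentum, volatility]
beta = np.array([0.5, -0.3])

x_base = np.array([1.0, 1.0])
x_bumped = x_base + np.array([1.0, 0.0])  # momentum raised by one unit

p_base = sigmoid(x_base @ beta)
p_bumped = sigmoid(x_bumped @ beta)

# Odds = p / (1 - p); their ratio isolates the effect of the momentum bump
odds_ratio = (p_bumped / (1 - p_bumped)) / (p_base / (1 - p_base))

print(f"P(up) before: {p_base:.3f}, after: {p_bumped:.3f}")
print(f"Odds ratio: {odds_ratio:.3f} vs exp(beta_momentum): {np.exp(beta[0]):.3f}")
```

The probability change depends on where you start on the sigmoid, but the odds ratio is the same everywhere. This is why coefficients are usually interpreted on the log-odds scale.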

In [None]:
# Visualize the sigmoid function
x = np.linspace(-6, 6, 100)
sigmoid = 1 / (1 + np.exp(-x))

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(x, sigmoid, 'b-', linewidth=2)
plt.axhline(y=0.5, color='r', linestyle='--', alpha=0.5)
plt.axvline(x=0, color='r', linestyle='--', alpha=0.5)
plt.xlabel('Linear combination (Xβ)')
plt.ylabel('Probability')
plt.title('Sigmoid Function')
plt.grid(True, alpha=0.3)

# Create classification data
n = 500
momentum = np.random.randn(n) * 2
volatility = np.random.randn(n) * 2

# True probability of positive return
linear_comb = 0.5 * momentum - 0.3 * volatility
prob = 1 / (1 + np.exp(-linear_comb))
y = (np.random.random(n) < prob).astype(int)

plt.subplot(1, 2, 2)
colors = ['red' if label == 0 else 'green' for label in y]
plt.scatter(momentum, volatility, c=colors, alpha=0.5, s=20)
plt.xlabel('Momentum Signal')
plt.ylabel('Volatility')
plt.title('Classification: Up (green) vs Down (red)')
plt.tight_layout()
plt.show()

In [None]:
# Fit logistic regression
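X = np.column_stack([momentum, volatility])

# Split first, then fit the scaler on the training set only,
# so that test-set statistics do not leak into the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)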

logit = LogisticRegression()
logit.fit(X_train, y_train)

# Predictions
y_pred = logit.predict(X_test)
y_prob = logit.predict_proba(X_test)[:, 1]

print("Logistic Regression Results")
print("="*50)
print(f"Coefficients: Momentum={logit.coef_[0][0]:.3f}, Volatility={logit.coef_[0][1]:.3f}")
print(f"Intercept: {logit.intercept_[0]:.3f}")
print(f"\nROC-AUC Score: {roc_auc_score(y_test, y_prob):.3f}")
print(f"\n{classification_report(y_test, y_pred)}")

---

## Part 2: Support Vector Machines (SVM)

### The Idea: Maximum Margin

Find the hyperplane that maximizes the margin between classes.

$$\max_{w,b} \frac{2}{||w||} \text{ subject to } y_i(w^Tx_i + b) \geq 1 \text{ for all } i$$
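
To make the constraint and the margin width concrete, here is a small sketch on hand-made, linearly separable data. The points, `w`, and `b` are all hypothetical and chosen by hand, not fitted. The last lines show the hinge loss, which is the penalty a soft-margin SVM applies when a point violates the constraint.

```python
import numpy as np

# Toy data with labels in {-1, +1} (hypothetical)
X_toy = np.array([[0.0, 0.0], [-1.0, 1.0], [2.0, 0.0], [3.0, -1.0]])
y_toy = np.array([-1, -1, 1, 1])

# Candidate hyperplane w.x + b = 0, picked by hand for this example
w = np.array([1.0, 0.0])
b = -1.0

# Hard-margin constraint: y_i (w^T x_i + b) >= 1 for every point
functional_margins = y_toy * (X_toy @ w + b)
feasible = bool(np.all(functional_margins >= 1))

# Geometric width of the margin band: 2 / ||w||
margin_width = 2 / np.linalg.norm(w)

# Soft-margin view: hinge loss is zero for points that satisfy the constraint
hinge = np.maximum(0, 1 - functional_margins)

print("Functional margins:", functional_margins)
print("All constraints satisfied:", feasible)
print("Margin width:", margin_width)
print("Hinge losses:", hinge)
```

The closest opposite-class points here, (0, 0) and (2, 0), are 2 apart, so no separating hyperplane can have a wider margin than this one.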

### Kernel Trick

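The kernel trick implicitly maps data into a higher-dimensional space, where the classes may become linearly separable, by replacing inner products with a kernel function $K$: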

- **Linear**: $K(x_i, x_j) = x_i^T x_j$
- **RBF**: $K(x_i, x_j) = \exp(-\gamma ||x_i - x_j||^2)$
- **Polynomial**: $K(x_i, x_j) = (x_i^T x_j + c)^d$
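
The kernels listed above are short to write out directly. The sketch below is a plain NumPy illustration with hypothetical function names and parameter defaults. It also shows the point of the trick: for 2-D inputs with $c = 0$ and $d = 2$, the polynomial kernel equals an ordinary inner product after the explicit feature map $\phi(x) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)$, without ever computing $\phi$.

```python
import numpy as np

def linear_kernel(a, b):
    return a @ b

def rbf_kernel(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def poly_kernel(a, b, c=0.0, d=2):
    return (a @ b + c) ** d

def phi(v):
    # Explicit feature map matching poly_kernel with c=0, d=2 in two dimensions
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

a = np.array([1.0, 2.0])
b_vec = np.array([3.0, -1.0])

k_implicit = poly_kernel(a, b_vec)   # computed in the original 2-D space
k_explicit = phi(a) @ phi(b_vec)     # computed in the mapped 3-D space

print("Linear:", linear_kernel(a, b_vec))
print("RBF:", rbf_kernel(a, b_vec))
print("Polynomial (implicit):", k_implicit)
print("Polynomial (explicit map):", k_explicit)
```

For the RBF kernel the corresponding feature space is infinite-dimensional, so the explicit route is not available and the kernel form is the only practical one.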

### 🤔 Simple Explanation

SVM finds the best "dividing line" between classes. It's like drawing a line between red and blue points, but making sure the line is as far as possible from both colors.

In [None]:
# Compare SVM kernels (SVC is already imported in the setup cell)

kernels = ['linear', 'rbf', 'poly']
results = {}

print("SVM Kernel Comparison")
print("="*50)

for kernel in kernels:
    svm = SVC(kernel=kernel, probability=True, random_state=42)
    svm.fit(X_train, y_train)
    y_prob = svm.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_prob)
    results[kernel] = auc
    print(f"{kernel.upper():8} kernel: AUC = {auc:.3f}")

print(f"\n✅ Best kernel: {max(results, key=results.get).upper()}")

---

## Part 3: Meta-Labeling

### The Concept (Marcos Lopez de Prado)

Instead of predicting direction, predict **whether to take a trade** given a primary signal.

### Two-Stage Process

1. **Primary Model**: Predicts direction (up/down)
2. **Meta Model**: Predicts whether the primary model's trade will be profitable

### 🤔 Simple Explanation

Think of it as a filter:
- Your main model says "BUY"
- The meta-model asks "Should we actually buy? Is this a good opportunity?"

### Why Meta-Labeling?

- Decouples the direction decision from the sizing decision
- Lets the meta-model reject trades where confidence in the primary signal is low
- Reduces false positives (trades taken that turn out unprofitable)
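
The two-stage idea can be sketched as label construction. All numbers below are made up for illustration. `side` stands in for a primary model's calls, `fwd_return` for the realized return over the holding period, and `meta_prob_demo` for the output of a fitted meta-model.

```python
import numpy as np

# Primary model output: +1 = long, -1 = short (hypothetical signals)
side = np.array([1, 1, -1, -1, 1, -1])

# Realized forward return of the asset over each trade's horizon (hypothetical)
fwd_return = np.array([0.012, -0.008, -0.015, 0.004, 0.020, -0.001])

# A trade is profitable when the signed return is positive
trade_return = side * fwd_return
meta_label = (trade_return > 0).astype(int)  # 1 = take, 0 = skip

# Stand-in for meta_model.predict_proba(...)[:, 1] on these trades
meta_prob_demo = np.array([0.70, 0.35, 0.62, 0.41, 0.55, 0.48])

# Final position: primary direction, switched off when the meta-model says no
position = side * (meta_prob_demo >= 0.5)

print("Meta-labels:", meta_label)
print("Positions:  ", position)
```

The meta-model is trained on `meta_label`, never on direction. It can only turn a primary signal off (or scale it), not flip it.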

In [None]:
# Meta-Labeling Example
np.random.seed(42)

n_trades = 1000

# Primary signal (e.g., momentum crossover)
primary_signal = np.random.choice([1, -1], n_trades)  # Long or Short

# Features for meta-model
volatility = np.random.exponential(0.02, n_trades)
volume = np.random.exponential(1, n_trades)
trend_strength = np.random.random(n_trades)

# True profitability depends on conditions
# Trade is profitable if: low vol + high trend + decent volume
profit_prob = 0.3 + 0.3 * (1 - volatility/volatility.max()) + 0.2 * trend_strength + 0.1 * (volume > 0.5)
profit_prob = np.clip(profit_prob, 0, 1)
is_profitable = (np.random.random(n_trades) < profit_prob).astype(int)

# Actual PnL (for visualization)
returns = np.where(is_profitable, np.abs(np.random.randn(n_trades) * 0.02),
                   -np.abs(np.random.randn(n_trades) * 0.02))

# Create meta-model
X_meta = np.column_stack([volatility, volume, trend_strength])
X_train_m, X_test_m, y_train_m, y_test_m, ret_train, ret_test = train_test_split(
    X_meta, is_profitable, returns, test_size=0.3, random_state=42
)

meta_model = LogisticRegression()
meta_model.fit(X_train_m, y_train_m)

# Predictions
meta_prob = meta_model.predict_proba(X_test_m)[:, 1]

print("Meta-Labeling Results")
print("="*50)
print(f"Meta-Model AUC: {roc_auc_score(y_test_m, meta_prob):.3f}")

In [None]:
# Compare strategies: Take All Trades vs Meta-Filtered
threshold = 0.5
take_trade = meta_prob >= threshold

pnl_all = ret_test.sum()
pnl_filtered = ret_test[take_trade].sum()

n_all = len(ret_test)
n_filtered = take_trade.sum()

print("\nStrategy Comparison")
print("="*50)
print(f"Take All Trades:")
print(f"  Trades: {n_all}, Total PnL: {pnl_all:.2%}, Avg PnL: {pnl_all/n_all:.4%}")
print(f"\nMeta-Filtered (threshold={threshold}):")
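avg_filtered = pnl_filtered / n_filtered if n_filtered > 0 else float('nan')  # guard against zero trades
print(f"  Trades: {n_filtered}, Total PnL: {pnl_filtered:.2%}, Avg PnL: {avg_filtered:.4%}")
print(f"\n✅ Meta-labeling filtered out {n_all-n_filtered} trades below the threshold!")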

---

## Part 4: Classification Metrics for Finance

### Key Metrics

| Metric | Formula | Finance Meaning |
|--------|---------|----------------|
| **Precision** | TP/(TP+FP) | % of predicted wins that are actual wins |
| **Recall** | TP/(TP+FN) | % of actual wins we captured |
| **F1** | 2×(P×R)/(P+R) | Balance of precision and recall |
| **ROC-AUC** | Area under ROC | Overall ranking ability |
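
As a worked example of the table, the sketch below computes each metric by hand on a tiny made-up set of labels and scores. ROC-AUC is computed from its ranking interpretation: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as one half.

```python
import numpy as np

# Hypothetical labels (1 = winning trade) and model scores
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
y_hat = (scores >= 0.5).astype(int)

# Confusion-matrix counts
tp = np.sum((y_hat == 1) & (y_true == 1))
fp = np.sum((y_hat == 1) & (y_true == 0))
fn = np.sum((y_hat == 0) & (y_true == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# ROC-AUC via pairwise comparison of positive and negative scores
pos = scores[y_true == 1]
neg = scores[y_true == 0]
diff = pos[:, None] - neg[None, :]
auc = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")
print(f"ROC-AUC (pairwise): {auc:.4f}")
```

Precision, recall, and F1 change with the 0.5 cutoff, while ROC-AUC does not depend on any threshold.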

### Finance Consideration

**Precision vs Recall Trade-off:**
- High precision = Fewer but higher quality trades
- High recall = Fewer missed opportunities, usually at the cost of more false signals

In [None]:
# Precision-Recall trade-off
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

print("Threshold Analysis")
print("="*60)
print(f"{'Threshold':<12} {'Trades':<10} {'Precision':<12} {'Recall':<10} {'PnL/Trade'}")
print("-"*60)

for thresh in thresholds:
    take = meta_prob >= thresh
    if take.sum() == 0:
        continue
    precision = y_test_m[take].mean()
    recall = y_test_m[take].sum() / y_test_m.sum()
    pnl_per_trade = ret_test[take].mean()
    print(f"{thresh:<12.1f} {take.sum():<10} {precision:<12.3f} {recall:<10.3f} {pnl_per_trade:.4%}")

---

## Interview Questions

### Conceptual
1. When would you use logistic regression over SVM?
2. What is the kernel trick and why is it useful?
3. Explain meta-labeling and its advantages.

### Technical
1. Derive the gradient of logistic regression's loss function.
2. How do you handle class imbalance in trading (many non-events)?
3. What's the difference between hard and soft margin SVM?

### Finance-Specific
1. Should you optimize for precision or recall in a trading system?
2. How would you use classification probabilities for position sizing?
3. What features would you include in a meta-labeling model?

---

## Key Takeaways

| Model | Output | Best For |
|-------|--------|----------|
| Logistic | Probabilities | Interpretable, bet sizing |
| SVM | Decision boundary | Non-linear patterns |
| Meta-Label | Trade filter | Quality over quantity |
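
To illustrate the "bet sizing" entry in the table, the sketch below maps a predicted win probability to a position size. The mapping (a z-score passed through the standard normal CDF) is an assumption chosen for illustration, not a prescription, and `bet_size` is a hypothetical helper name.

```python
import math

def bet_size(p):
    """Map a win probability p in (0, 1) to a size in (-1, 1).

    Assumed form: z = (p - 0.5) / sqrt(p (1 - p)), size = 2 * Phi(z) - 1,
    where Phi is the standard normal CDF.
    """
    z = (p - 0.5) / math.sqrt(p * (1 - p))
    normal_cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return 2 * normal_cdf - 1

for p in [0.5, 0.6, 0.7, 0.8, 0.9]:
    print(f"P(win) = {p:.1f} -> size = {bet_size(p):.3f}")
```

A probability of 0.5 maps to zero size, and size grows with confidence. In a meta-labeling setup the direction would still come from the primary model, with this value scaling the position.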