# Week 13: Neural Networks - MLP for Finance

## 🎯 Learning Objectives

By the end of this week, you will understand:
- **Neural Network Fundamentals**: Architecture, activations, forward pass
- **Backpropagation**: How neural networks learn
- **MLP for Finance**: Regression and classification tasks
- **Regularization**: Dropout, batch normalization, early stopping

---

## Why Neural Networks in Finance?

- Capture complex non-linear relationships
- Handle high-dimensional data
- Learn feature interactions automatically
- Foundation for advanced architectures (LSTM, Transformer)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)
print("✅ Libraries loaded!")
print("📚 Week 13: Neural Networks")

---

## Part 1: Neural Network Architecture

### Building Blocks

**Neuron**: $y = \sigma(w^T x + b)$

**Layer**: Multiple neurons processing inputs

**Network**: Stack of layers

### Forward Pass

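$$h^{(l)} = \sigma(W^{(l)} h^{(l-1)} + b^{(l)})$$

Here $h^{(0)} = x$ is the input, and $W^{(l)}$ and $b^{(l)}$ are the weights and bias of layer $l$.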

### 🤔 Simple Explanation

A neural network is a stack of simple functions. Each layer takes its input, multiplies it by a weight matrix, adds a bias, and applies an activation function. Training consists of finding the weights and biases that make the final output match the target.
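The description above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used later in the notebook: the layer sizes, the random weights, and the helper name `forward_pass` are assumptions made for the example, and the output layer is left linear, as is common for regression.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def forward_pass(x, weights, biases):
    """Apply h = relu(W h_prev + b) for hidden layers; keep the last layer linear."""
    h = x
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = W @ h + b                      # affine step: weights times input plus bias
        h = z if i == last else relu(z)    # activation on hidden layers only
    return h

# Hypothetical network: 5 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
layer_sizes = [5, 4, 1]
weights = [rng.normal(size=(n_out, n_in)) * 0.1
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = rng.normal(size=5)                     # one made-up feature vector
y_hat = forward_pass(x, weights, biases)
print(y_hat.shape)                         # (1,)
```

Each weight matrix has shape `(n_out, n_in)` so that `W @ h` matches the column-vector form of the forward-pass formula.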

In [None]:
# Activation functions
def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def tanh(x):
    return np.tanh(x)

# Visualize
x = np.linspace(-5, 5, 100)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].plot(x, relu(x), 'b-', linewidth=2)
axes[0].set_title('ReLU')
axes[0].grid(True, alpha=0.3)

axes[1].plot(x, sigmoid(x), 'g-', linewidth=2)
axes[1].set_title('Sigmoid')
axes[1].grid(True, alpha=0.3)

axes[2].plot(x, tanh(x), 'r-', linewidth=2)
axes[2].set_title('Tanh')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
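
Backpropagation, listed in the learning objectives, applies the chain rule layer by layer to obtain the gradient of the loss with respect to every weight. The sketch below works through a two-layer ReLU network with mean squared error in NumPy and compares one analytic gradient entry against a finite-difference estimate. It assumes a row convention (`X @ W`, one sample per row) rather than the column form $W h$ used in the forward-pass formula; all names, shapes, and data are hypothetical.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    Z1 = X @ W1 + b1                    # pre-activations, shape (n, hidden)
    H1 = np.maximum(0, Z1)              # ReLU
    y_hat = (H1 @ W2 + b2).ravel()      # linear output, shape (n,)
    return Z1, H1, y_hat

def mse_loss(y_hat, y):
    return np.mean((y_hat - y) ** 2)

def backward(X, y, W1, b1, W2, b2):
    """Gradients of the MSE loss with respect to all parameters (chain rule)."""
    n = X.shape[0]
    Z1, H1, y_hat = forward(X, W1, b1, W2, b2)
    d_yhat = 2.0 * (y_hat - y) / n      # dL/dy_hat, shape (n,)
    dW2 = H1.T @ d_yhat[:, None]        # shape (hidden, 1)
    db2 = d_yhat.sum(keepdims=True)     # shape (1,)
    dH1 = d_yhat[:, None] @ W2.T        # propagate back to hidden layer, (n, hidden)
    dZ1 = dH1 * (Z1 > 0)                # ReLU derivative: 1 where Z1 > 0, else 0
    dW1 = X.T @ dZ1                     # shape (d, hidden)
    db1 = dZ1.sum(axis=0)               # shape (hidden,)
    return dW1, db1, dW2, db2

# Made-up data and parameters for the check
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
W1 = rng.normal(size=(3, 4)) * 0.5
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)) * 0.5
b2 = np.zeros(1)

dW1, db1, dW2, db2 = backward(X, y, W1, b1, W2, b2)

# Central finite difference on a single weight as a sanity check
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (mse_loss(forward(X, W1_plus, b1, W2, b2)[2], y)
           - mse_loss(forward(X, W1_minus, b1, W2, b2)[2], y)) / (2 * eps)
print(f"analytic: {dW1[0, 0]:.6f}, numeric: {numeric:.6f}")
```

A gradient-descent step would then update each parameter as `W -= lr * dW`. Frameworks such as PyTorch compute the same gradients automatically.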

---

## Part 2: MLP with PyTorch

### Network Architecture for Returns Prediction

In [None]:
try:
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset
    
    # Define MLP
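    class FinanceMLP(nn.Module):
        # Tuple default avoids the shared mutable default-argument pitfall
        def __init__(self, input_dim, hidden_dims=(64, 32), dropout=0.2):
            super().__init__()
            layers = []
            prev_dim = input_dim
            
            # Hidden block: Linear -> BatchNorm -> ReLU -> Dropout.
            # Dropout is placed after BatchNorm so the batch statistics
            # are not computed on randomly zeroed activations.
            for hidden_dim in hidden_dims:
                layers.append(nn.Linear(prev_dim, hidden_dim))
                layers.append(nn.BatchNorm1d(hidden_dim))
                layers.append(nn.ReLU())
                layers.append(nn.Dropout(dropout))
                prev_dim = hidden_dim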
            
            layers.append(nn.Linear(prev_dim, 1))
            self.network = nn.Sequential(*layers)
        
        def forward(self, x):
            return self.network(x)
    
    print("PyTorch MLP Architecture")
    print("="*50)
    model = FinanceMLP(input_dim=5)  # five features, matching the data generated in the next cell
    print(model)
    
except ImportError:
    print("⚠️ PyTorch not installed. Using sklearn MLP instead.")

In [None]:
# Generate financial data
n = 2000
np.random.seed(42)

# Features
momentum = np.random.randn(n)
volatility = np.abs(np.random.randn(n))
volume = np.random.exponential(1, n)
rsi = np.random.uniform(20, 80, n)
ma_ratio = 1 + np.random.randn(n) * 0.1

# Non-linear target
target = (
    0.01 * momentum * (volatility < 0.5) +
    0.005 * np.log(volume + 1) * (momentum > 0) +
    0.002 * (rsi - 50) / 50 +
    np.random.randn(n) * 0.01
)

X = np.column_stack([momentum, volatility, volume, rsi, ma_ratio])
y = target

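# Split chronologically first (shuffle=False keeps time order)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Fit the scaler on the training set only, so that test-set
# statistics do not leak into training (look-ahead bias)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)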

In [None]:
# Train with sklearn (fallback)
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(
    hidden_layer_sizes=(64, 32),
    activation='relu',
    solver='adam',
    max_iter=500,
    early_stopping=True,
    validation_fraction=0.1,
    random_state=42
)

mlp.fit(X_train, y_train)

print("MLP Results")
print("="*50)
print(f"Train R²: {mlp.score(X_train, y_train):.4f}")
print(f"Test R²:  {mlp.score(X_test, y_test):.4f}")
print(f"Iterations: {mlp.n_iter_}")

# Learning curve
plt.figure(figsize=(8, 4))
plt.plot(mlp.loss_curve_, label='Training Loss')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.title('MLP Learning Curve')
plt.legend()
plt.show()
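
The `score` method of `MLPRegressor` reports the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes it by hand on made-up return values (the function name `r2_manual` and the numbers are illustrative assumptions) to show the reference points: 1 for a perfect fit, 0 for always predicting the mean, and negative values for anything worse than the mean.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([0.01, -0.02, 0.015, 0.0])      # made-up returns

r2_perfect = r2_manual(y_true, y_true)                       # perfect prediction
r2_mean = r2_manual(y_true, np.full(4, y_true.mean()))       # always predict the mean
r2_wrong_sign = r2_manual(y_true, -y_true)                   # systematically wrong sign

print(r2_perfect)       # 1.0
print(r2_mean)          # 0.0
print(r2_wrong_sign)    # negative
```

Because return data is noisy, a small positive test R² can still be meaningful, while a negative one means the model does worse out of sample than predicting the test-set mean.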

---

## Part 3: Regularization Techniques

### Preventing Overfitting

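1. **Dropout**: Randomly zero a fraction of activations during training; disabled at inference
2. **Batch Normalization**: Normalize each layer's inputs over the mini-batch, which stabilizes training and has a mild regularizing effect
3. **Early Stopping**: Stop training when validation loss stops improving
4. **L2 Regularization**: Penalize large weights (the `alpha` parameter of sklearn's `MLPRegressor`)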
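
Two of the techniques listed above are simple enough to sketch directly in NumPy. The functions below are illustrative assumptions, not the internals of PyTorch or sklearn: `inverted_dropout` zeroes activations with probability `p` and rescales the survivors so the expected activation is unchanged, and `early_stopping_epoch` applies a patience rule to a made-up sequence of validation losses.

```python
import numpy as np

def inverted_dropout(h, p, rng, training=True):
    """Zero each activation with probability p (0 <= p < 1); rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return h                          # dropout is a no-op at inference time
    mask = rng.random(h.shape) >= p       # True for the activations that are kept
    return h * mask / (1.0 - p)

def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if never triggered."""
    best = np.inf
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:       # validation loss improved
            best = loss
            waited = 0
        else:                             # no improvement this epoch
            waited += 1
            if waited >= patience:
                return epoch
    return None

rng = np.random.default_rng(0)
h = np.ones((1000, 50))
h_drop = inverted_dropout(h, p=0.2, rng=rng)
print(f"mean activation after dropout: {h_drop.mean():.3f}")   # close to 1 in expectation

val_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.69]        # made-up values
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)                                              # 5
```

In this example the best loss is reached at epoch 2, and three consecutive epochs without improvement trigger the stop at epoch 5, before the later improvement at epoch 6 is seen. That is the trade-off controlled by `patience`.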

In [None]:
# Compare regularization effects
configs = [
    {'alpha': 0.0001, 'early_stopping': False},  # Baseline
    {'alpha': 0.01, 'early_stopping': False},    # L2 regularization
    {'alpha': 0.0001, 'early_stopping': True},   # Early stopping
]

print("Regularization Comparison")
print("="*70)
# Column width 44 fits the longest config string, so the table stays aligned
print(f"{'Config':<44} {'Train R²':<12} {'Test R²':<12}")
print("-"*70)

for config in configs:
    model = MLPRegressor(
        hidden_layer_sizes=(64, 32),
        max_iter=500,
        random_state=42,
        **config
    )
    model.fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    print(f"{str(config):<44} {train_r2:<12.4f} {test_r2:<12.4f}")

---

## Interview Questions

### Conceptual
1. Why do we need non-linear activation functions?
2. What problem does batch normalization solve?
3. How does dropout prevent overfitting?

### Technical
1. Derive backpropagation for a simple 2-layer network.
2. What's the vanishing gradient problem?
3. Why is ReLU preferred over sigmoid in deep networks?

### Finance-Specific
1. Why might neural networks struggle with financial data?
2. How would you prevent overfitting in a return prediction model?
3. When would you choose MLP over tree-based models?

---

## Key Takeaways

| Concept | Key Point |
|---------|----------|
| Architecture | Input → Hidden → Output with activations |
| Training | Backpropagation + gradient descent |
| Regularization | Dropout, early stopping, L2 |
| Finance | Watch for overfitting on noisy data |