# 🎛️ Hyperparameter Tuning with Optuna

This notebook demonstrates **Bayesian optimization** with Optuna by tuning the learning rate and hidden-layer size of a small PyTorch classifier.

## Why Optuna?

| Method | Sample efficiency |
|--------|-------------------|
| Grid Search | Low |
| Random Search | Medium |
| **Optuna (TPE, the default sampler)** | High |

---
## 1. Setup and Imports

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

---
## 2. Generate Synthetic Dataset

In [None]:
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, n_classes=2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training: {X_train.shape[0]} samples, Validation: {X_val.shape[0]} samples")

### Examine the Data

In [None]:
print("Sample input:", X_train[0][:5], "...")
print(f"Label: {y_train[0]}")

### Convert to PyTorch Tensors

In [None]:
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.long)
print(f"Shapes: X={X_train_t.shape}, y={y_train_t.shape}")

---
## 3. Define the Neural Network

In [None]:
class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2)
        )
    def forward(self, x):
        return self.network(x)

test_model = SimpleNN(20, 64)
print(test_model)

---
## 4. Define the Optuna Objective Function

| Parameter | Type | Range |
|-----------|------|-------|
| learning_rate | Log-uniform | 1e-4 to 1e-1 |
| hidden_dim | Integer | 16 to 128 |
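
To make the "Log-uniform" row concrete, here is a minimal sketch of what sampling uniformly in log space means. It only illustrates the distribution and is not Optuna's internal implementation; `sample_log_uniform` is a hypothetical helper introduced for this example.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 1e-1, rng) for _ in range(3000)]

# Count samples per decade: [1e-4, 1e-3), [1e-3, 1e-2), [1e-2, 1e-1]
counts = [0, 0, 0]
for s in samples:
    decade = int(math.log10(s) + 4)       # 0, 1 or 2 for the three decades
    decade = max(0, min(decade, 2))       # guard against rounding at the edges
    counts[decade] += 1

print(counts)  # each decade receives roughly a third of the samples
```

With a plain uniform distribution over the same range, about 90% of the draws would fall between 1e-2 and 1e-1, and the small learning rates would rarely be tried.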

In [None]:
def objective(trial):
    """Objective function for Optuna optimization.
    
    Note: Training for 20 epochs during search for speed.
    Final model will train for 50 epochs (see section 8).
    """
    lr = trial.suggest_float('learning_rate', 1e-4, 1e-1, log=True)
    hidden_dim = trial.suggest_int('hidden_dim', 16, 128)
    
    model = SimpleNN(20, hidden_dim)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=32, shuffle=True)
    
    # Train for 20 epochs (fast evaluation during search)
    for epoch in range(20):
        model.train()
        for bx, by in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(bx), by)
            loss.backward()
            optimizer.step()
    
    # Evaluate on validation set
    model.eval()
    with torch.no_grad():
        preds = model(X_val_t).argmax(1)
        acc = (preds == y_val_t).float().mean().item()
    return acc

---
## 5. Run Optuna Optimization

In [None]:
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20, show_progress_bar=True)

---
## 6. Analyze Results

In [None]:
print(f"Best accuracy: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

### All Trials

In [None]:
for t in study.trials:
    best = '*' if t.number == study.best_trial.number else ''
    print(f"Trial {t.number}: lr={t.params['learning_rate']:.6f}, hidden={t.params['hidden_dim']}, acc={t.value:.4f} {best}")

---
## 7. Visualization

In [None]:
try:
    optuna.visualization.plot_optimization_history(study).show()
except ImportError:
    # Optuna's plotting functions require plotly
    print('Install plotly: pip install plotly')

---
## 8. Train Final Model with Extended Training and Validation Monitoring

### Important: Epoch Mismatch and Mitigation

**The Issue:**
- Hyperparameter search: 20 epochs (fast evaluation)
- Final training: 50 epochs (better convergence)

**Why This Mismatch?**
- Searching with 20 epochs keeps each trial fast; the final model trains longer so it can converge further
- The risk: hyperparameters tuned for 20 epochs might not be optimal for 50 epochs

**How We Mitigate This Risk:**
1. **Periodic validation** every 10 epochs to monitor generalization
2. **Track improvement** - if validation accuracy stops improving, model may be overfitting
3. **Early stopping indicators** - watch for increasing "No improve" count

**What to Look For:**
- If validation accuracy plateaus or decreases → the model may be overfitting
- If "No improve" count is high → consider reducing epochs or adding regularization
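
The "No improve" count can also be turned into an explicit stopping rule. The following is a minimal sketch under stated assumptions: `PatienceTracker` is a hypothetical helper, and the accuracy values are made up for illustration. The training cell in this section only reports the count and never stops early.

```python
class PatienceTracker:
    """Hypothetical helper: tracks the best validation score and
    counts consecutive checks without improvement."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("-inf")
        self.checks_no_improve = 0

    def update(self, val_acc):
        """Record one validation result; return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.checks_no_improve = 0
        else:
            self.checks_no_improve += 1
        return self.checks_no_improve >= self.patience


# Illustrative (made-up) validation accuracies, one per 10-epoch check
history = [0.80, 0.85, 0.84, 0.85, 0.83]

tracker = PatienceTracker(patience=2)
stopped_at = None
for check, acc in enumerate(history, start=1):
    if tracker.update(acc):
        stopped_at = check * 10  # epoch at which the rule fires
        break

print(stopped_at, tracker.best)
```

A tie with the previous best counts as "no improvement" here, matching the strict `>` comparison used in the training loop.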

In [None]:
final_model = SimpleNN(20, study.best_params['hidden_dim'])
optimizer = optim.Adam(final_model.parameters(), lr=study.best_params['learning_rate'])
criterion = nn.CrossEntropyLoss()
train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val_t, y_val_t), batch_size=32)

print(f"Training final model with best hyperparameters:")
print(f"  Learning rate: {study.best_params['learning_rate']:.6f}")
print(f"  Hidden dim: {study.best_params['hidden_dim']}")
print(f"\n{'Epoch':<8} {'Train Loss':<12} {'Val Accuracy':<15} {'Status':<20}")
print("-" * 55)

best_val_acc = 0
epochs_no_improve = 0

for epoch in range(50):
    # Training phase
    final_model.train()
    train_loss = 0
    for bx, by in train_loader:
        optimizer.zero_grad()
        loss = criterion(final_model(bx), by)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    train_loss /= len(train_loader)
    
    # Validation every 10 epochs (or at final epoch)
    if (epoch + 1) % 10 == 0 or epoch == 49:
        final_model.eval()
        val_acc = 0
        with torch.no_grad():
            for bx, by in val_loader:
                val_acc += (final_model(bx).argmax(1) == by).float().sum().item()
        val_acc /= len(y_val_t)
        
        # Check for improvement
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            epochs_no_improve = 0
            status = "✓ Improved"
        else:
            epochs_no_improve += 1
            status = f"No improve ({epochs_no_improve})"
        
        print(f"{epoch+1:<8} {train_loss:<12.4f} {val_acc:<15.4f} {status:<20}")

print(f"\n{'='*55}")
# best_val_acc is the best value seen at any check, not necessarily the last one
print(f"Best validation accuracy: {best_val_acc:.4f}")
print(f"\nInterpretation:")
if epochs_no_improve == 0:
    print("✓ Validation accuracy was still improving at the last check")
elif epochs_no_improve == 1:
    print("⚠ Validation accuracy plateaued over the last 10 epochs - normal behavior")
else:
    print(f"⚠ Model hasn't improved for {epochs_no_improve*10} epochs - possible overfitting")
    print("  Consider: reducing epochs, adding dropout, or adding L2 regularization (weight decay)")

---
## 9. Key Takeaways

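1. **Optuna's default sampler (TPE) is a form of Bayesian optimization**: it uses the results of earlier trials to choose which hyperparameters to try next
2. **Use `log=True` for learning rate** so that each order of magnitude is sampled equally often
3. **20+ trials** is usually enough to find good hyperparameters in a small search space like this one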
4. **Validate during final training** to detect overfitting and ensure hyperparameters generalize
5. **Monitor improvement** - if validation accuracy plateaus, consider early stopping