# 🎛️ Hyperparameter Tuning with Optuna

This notebook demonstrates **Bayesian Optimization** using Optuna, a state-of-the-art hyperparameter optimization framework.

## What You'll Learn

1. Why Optuna is often more sample-efficient than Grid/Random Search
2. How to define an objective function
3. How to suggest hyperparameters with different distributions
4. How to run optimization and analyze results

## Why Optuna?

| Method | How it Works | Efficiency |
|--------|--------------|------------|
| Grid Search | Try all combinations | Low (wastes time on bad regions) |
| Random Search | Random sampling | Medium (doesn't learn) |
| **Optuna** | Learns from past trials | High (focuses on promising regions) |

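Optuna uses **Bayesian Optimization** to choose the next hyperparameters from the results of previous trials. For single-objective studies like this one, its default sampler is TPE (Tree-structured Parzen Estimator), which models which parameter values led to good versus bad scores and samples the next trial from the more promising regions.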

---
## 1. Setup and Imports

We'll use:
- **PyTorch** for building and training the neural network
- **Optuna** for hyperparameter optimization
- **scikit-learn** for dataset generation and preprocessing

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

---
## 2. Generate Synthetic Dataset

We'll create a binary classification dataset with:
- **1000 samples**
- **20 features**
- **2 classes** (binary classification)

This is a controlled environment to demonstrate hyperparameter tuning without the complexity of real-world data preprocessing.

In [None]:
# Generate synthetic dataset
X, y = make_classification(
    n_samples=1000,      # Total samples
    n_features=20,       # Number of input features
    n_informative=15,    # Features that actually carry information
    n_redundant=5,       # Redundant features (linear combinations)
    n_classes=2,         # Binary classification
    random_state=42      # Reproducibility
)

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(
    X, y, 
    test_size=0.2,       # 20% for validation
    random_state=42
)

print(f"Training set: {X_train.shape[0]} samples")
print(f"Validation set: {X_val.shape[0]} samples")
print(f"Features: {X_train.shape[1]}")
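The setup cell imports `StandardScaler`, while the cells here train on the raw features. If you want standardized inputs, one common pattern is sketched below (an assumption about how you might preprocess, shown standalone): fit the scaler on the training split only, so validation statistics never leak into preprocessing, then apply the same transform to both splits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same dataset and split as above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, n_classes=2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn per-feature mean/std from train
X_val_scaled = scaler.transform(X_val)          # reuse the training statistics

print("Train means (first 3):", X_train_scaled.mean(axis=0)[:3].round(3))
print("Train stds  (first 3):", X_train_scaled.std(axis=0)[:3].round(3))
```

If you adopt this, build the tensors from `X_train_scaled` and `X_val_scaled` instead of the raw arrays.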

### Examine the Data

Let's inspect one training sample:

In [None]:
# View a single sample
print("Sample input (20 features):")
print(X_train[0])
print(f"\nCorresponding label: {y_train[0]}")
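Another quick check before tuning is class balance. The majority-class fraction is the accuracy a model that always predicts one class would get, which gives a floor for judging the validation accuracy Optuna reports later. This sketch regenerates the same data so it runs standalone.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, n_classes=2, random_state=42)

counts = np.bincount(y)                 # number of samples per class label
baseline = counts.max() / counts.sum()  # accuracy of always predicting the majority class

print("Samples per class:", counts)
print(f"Majority-class baseline accuracy: {baseline:.3f}")
```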

### Convert to PyTorch Tensors

PyTorch models and `TensorDataset` work on tensors, not NumPy arrays, so we convert:
- Features (`X`) → `float32` tensors
- Labels (`y`) → `long` (int64) tensors, because `CrossEntropyLoss` expects integer class indices
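A small sketch of why the labels must be `long`: `nn.CrossEntropyLoss` takes float logits of shape `(batch, classes)` and, in the form used in this notebook, one integer class index per sample. It applies log-softmax internally and averages the negative log-probability of the correct class. The logits below are made-up values for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5],   # (batch=3, classes=2), float32 raw scores
                       [0.1, 1.5],
                       [1.0, 1.0]])
targets = torch.tensor([0, 1, 1], dtype=torch.long)  # one class index per sample

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)  # mean over the batch

print(f"logits dtype: {logits.dtype}, targets dtype: {targets.dtype}")
print(f"mean cross-entropy loss: {loss.item():.4f}")
```

Here the per-sample losses are log(1 + e^-1.5), log(1 + e^-1.4), and log 2, so the mean works out to about 0.372.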

In [None]:
# Convert to PyTorch tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.long)

print(f"X_train tensor shape: {X_train_t.shape}")
print(f"y_train tensor shape: {y_train_t.shape}")
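The objective function below feeds these tensors through `TensorDataset` and `DataLoader` with `batch_size=32`. With 800 training samples, that is exactly 25 mini-batches per epoch. This standalone sketch uses random stand-in tensors of the same shapes, not the real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X_demo = torch.randn(800, 20)            # stand-in features, same shape as X_train_t
y_demo = torch.randint(0, 2, (800,))     # stand-in labels (int64 by default)

dataset = TensorDataset(X_demo, y_demo)  # pairs row i of X with entry i of y
loader = DataLoader(dataset, batch_size=32, shuffle=True)

xb, yb = next(iter(loader))              # first mini-batch
print(f"Batches per epoch: {len(loader)}")
print(f"Batch shapes: {tuple(xb.shape)}, {tuple(yb.shape)}")
```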

---
## 3. Define the Neural Network

We'll create a simple feedforward neural network with:
- **Input layer**: 20 features
- **Hidden layer**: Variable size (this is what we'll tune!)
- **Output layer**: 2 classes

The `hidden_dim` parameter controls the network's capacity:
- Too small → Underfitting (can't learn complex patterns)
- Too large → Overfitting (memorizes training data) + slower training
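Capacity grows linearly with `hidden_dim`. Each `nn.Linear(a, b)` has `a*b` weights and `b` biases, so with 20 inputs and 2 outputs the total is `(20h + h) + (2h + 2) = 23h + 2`. The sketch below checks that formula against PyTorch; `make_model` and `expected_params` are hypothetical helpers that mirror the `SimpleNN` layout.

```python
import torch.nn as nn


def make_model(input_dim: int, hidden_dim: int) -> nn.Module:
    # Same layer layout as SimpleNN: Linear -> ReLU -> Linear(hidden_dim, 2)
    return nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 2))


def count_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())


def expected_params(input_dim: int, hidden_dim: int, n_classes: int = 2) -> int:
    # weights + biases of the first layer, then of the output layer
    return (input_dim * hidden_dim + hidden_dim) + (hidden_dim * n_classes + n_classes)


for h in (8, 64, 256):
    print(f"hidden_dim={h:>3}: {count_params(make_model(20, h)):>5,} parameters")
```

For `hidden_dim=64` this gives 1,474 parameters, the same total the model-inspection cell prints.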

In [None]:
class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),  # Input → Hidden
            nn.ReLU(),                          # Non-linearity
            nn.Linear(hidden_dim, 2)            # Hidden → Output (2 classes)
        )
        
    def forward(self, x):
        return self.network(x)

# Test the model architecture
test_model = SimpleNN(input_dim=20, hidden_dim=64)
print(test_model)
print(f"\nTotal parameters: {sum(p.numel() for p in test_model.parameters()):,}")

---
## 4. Define the Optuna Objective Function

The **objective function** is the heart of Optuna optimization. It:

1. **Receives a `trial` object** from Optuna
2. **Suggests hyperparameters** using `trial.suggest_*` methods
3. **Trains a model** with those hyperparameters
4. **Returns a metric** to optimize (accuracy in our case)

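In [None]:
# 1. Define the objective: Optuna calls this once per trial
def objective(trial):
    # 2. Suggest hyperparameters (the search ranges are illustrative choices)
    hidden_dim = trial.suggest_int("hidden_dim", 16, 256)                       # integer range
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)  # log-uniform

    # 3. Build the model, loss, and optimizer from the suggested values
    model = SimpleNN(input_dim=20, hidden_dim=hidden_dim)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    # Training setup: use the tensors from Section 2, not the NumPy arrays
    epochs = 20
    batch_size = 32
    train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(TensorDataset(X_val_t, y_val_t), batch_size=batch_size)

    for epoch in range(epochs):
        model.train()
        for batch_X, batch_y in train_loader:
            optimizer.zero_grad()               # clear gradients from the previous step
            outputs = model(batch_X)            # forward pass: logits of shape (batch, 2)
            loss = criterion(outputs, batch_y)
            loss.backward()                     # backpropagate
            optimizer.step()                    # update weights

    # Validation accuracy: the metric Optuna will maximize
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_X, batch_y in val_loader:
            outputs = model(batch_X)
            predicted = outputs.argmax(dim=1)   # predicted class index per sample
            total += batch_y.size(0)
            correct += (predicted == batch_y).sum().item()

    accuracy = correct / total
    return accuracy

# 4. Run the Optuna optimization
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)

# 5. Print the best hyperparameters
print("Best hyperparameters found:")
print(study.best_params)
print(f"Best validation accuracy: {study.best_value:.4f}")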