# Optuna and Ray Tune: Advanced Hyperparameter Optimization

These are **powerful alternatives** to `GridSearchCV` and `RandomizedSearchCV` that use **smarter search strategies**.

## The Problem with Grid/Random Search

```python
# GridSearchCV tries EVERY combination (exhaustive)
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1]
}
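# Total combinations: 4 × 4 = 16; each extra hyperparameter multiplies this again,
# so the grid grows exponentially with the number of hyperparameters

# RandomizedSearchCV samples a fixed number (n_iter) of random combinations
# Cheaper for a fixed budget, but each sample ignores the results of earlier trials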
```
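One way to see the growth is to count the combinations with `itertools.product`. The two extra SVC hyperparameters (`tol`, `coef0`) and their values are illustrative assumptions, not part of the original grid.

```python
from itertools import product

# The 4 x 4 grid from above
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
}

def grid_size(grid):
    """Number of candidate settings an exhaustive grid search evaluates."""
    return len(list(product(*grid.values())))

print(grid_size(param_grid))      # 16 settings
print(grid_size(param_grid) * 5)  # 80 model fits with cv=5

# Hypothetical: two more hyperparameters with 4 values each multiply the count by 4 * 4
bigger_grid = dict(param_grid,
                   tol=[1e-4, 1e-3, 1e-2, 1e-1],
                   coef0=[0.0, 0.5, 1.0, 2.0])
print(grid_size(bigger_grid))     # 256 settings
```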

**Limitations**:
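- ❌ No learning from past trials
- ❌ Wastes computation on unpromising regions
- ❌ Grid search scales poorly to many hyperparameters (the grid explodes); random search copes better but still samples blindly
- ❌ Conditional parameters (e.g., kernel-specific params) must be expressed as a list of separate grids rather than directly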

## What Optuna and Ray Tune Do Differently

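Both support **model-based (Bayesian) optimization**: the results of earlier trials guide which parameters are tried next. Optuna does this by default with its TPE sampler; Ray Tune defaults to random/grid search and offers Bayesian search through plug-in search algorithms (e.g., Optuna, HyperOpt). Conceptually: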

```
Trial 1: Try random params → Score = 0.75
Trial 2: Try nearby params → Score = 0.78 (getting warmer!)
Trial 3: Focus search here → Score = 0.82 (found good region!)
Trial 4: Refine further → Score = 0.84
...
Trial 50: Best configuration so far (no guarantee it is the global optimum)

vs GridSearch: Blindly tries all combinations
```
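To make "learning from previous trials" concrete, here is a deliberately simplified, stdlib-only sketch of the TPE idea behind Optuna's default sampler: split past trials into a "good" and a "bad" group, model each with a Parzen (kernel density) estimator, and pick the candidate that maximizes l(x)/g(x). The fixed bandwidth, the 25% "good" split, the candidate count, and the toy objective are all assumptions for illustration; Optuna's real `TPESampler` is considerably more refined.

```python
import math
import random

def toy_objective(x):
    # Toy 1-D objective to MAXIMIZE; the true optimum is x = 0.3
    return -(x - 0.3) ** 2

def parzen_density(x, centers, bandwidth):
    """Average of Gaussian kernels centered on past trial values."""
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * bandwidth * math.sqrt(2 * math.pi))

def toy_tpe(n_trials=60, n_startup=10, gamma=0.25, n_candidates=24, seed=0):
    rng = random.Random(seed)
    history = []  # (x, score) pairs
    bw = 0.1      # fixed kernel bandwidth (real TPE adapts this)
    for t in range(n_trials):
        if t < n_startup:
            x = rng.uniform(0.0, 1.0)  # random warm-up trials
        else:
            # Split past trials into the top-gamma "good" group and the rest
            ranked = sorted(history, key=lambda h: h[1], reverse=True)
            n_good = max(1, int(gamma * len(ranked)))
            good = [h[0] for h in ranked[:n_good]]
            bad = [h[0] for h in ranked[n_good:]]
            # Draw candidates near good points (clipped to [0, 1]) ...
            candidates = [min(1.0, max(0.0, rng.gauss(rng.choice(good), bw)))
                          for _ in range(n_candidates)]
            # ... and keep the one that is most "good-like" relative to "bad-like"
            x = max(candidates,
                    key=lambda c: parzen_density(c, good, bw)
                    / (parzen_density(c, bad, bw) + 1e-12))
        history.append((x, toy_objective(x)))
    return history

history = toy_tpe()
best_x, best_score = max(history, key=lambda h: h[1])
print(f"best x = {best_x:.3f}, score = {best_score:.5f}")
```

After the random warm-up, new candidates cluster around the best region found so far, which is the behavior the trace above describes.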

### Optuna
- **From**: Preferred Networks (Japan)
- **Philosophy**: Lightweight, Pythonic, easy to get started
- **Strengths**: Simple API, great for single-node work, excellent pruning

### Ray Tune
- **From**: UC Berkeley (part of Ray ecosystem)
- **Philosophy**: Distributed, scalable, production-grade
- **Strengths**: Multi-node clusters, distributed training, schedulers

---

## 1. Optuna Deep Dive

### Basic Usage (Standalone)

```python
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    """
    Optuna calls this function for each trial
    trial: object that suggests hyperparameters
    """
    # Define search space
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None])
    }
    
    # Create model with suggested params
    model = RandomForestClassifier(**params, random_state=42)
    
    # Evaluate with 5-fold CV (X_train, y_train are assumed to be defined already)
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc').mean()
    
    return score  # Optuna maximizes or minimizes this, depending on the study's direction

# Run optimization
study = optuna.create_study(
    direction='maximize',  # the default is 'minimize', so set this explicitly for AUC
    study_name='rf_optimization',
    sampler=optuna.samplers.TPESampler(seed=42)  # Tree-structured Parzen Estimator
)

study.optimize(
    objective, 
    n_trials=100,  # Stop after 100 trials...
    timeout=3600,  # ...or after 1 hour, whichever comes first
    n_jobs=-1      # Run trials in parallel threads (results are then not exactly reproducible)
)

# Best results
print(f"Best score: {study.best_value}")
print(f"Best params: {study.best_params}")
```

### Key Concepts

**1. Trial Object** - Suggests hyperparameters:

```python
# Inside objective(trial): the main parameter types
trial.suggest_int('n_estimators', 10, 1000)  # Integer
trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)  # Float, sampled on a log scale
trial.suggest_categorical('kernel', ['rbf', 'poly', 'sigmoid'])  # Categorical choice
trial.suggest_float('C', 0.1, 100)  # Float, uniform (replaces the deprecated suggest_uniform)
```
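Why `log=True` matters for a range like `1e-5` to `1e-1`: the stdlib sketch below compares log-uniform and plain uniform sampling over that range, as a random sampler would draw them. It illustrates the idea only and is not Optuna's sampler code; the seed and sample count are arbitrary.

```python
import math
import random

rng = random.Random(42)
low, high, n = 1e-5, 1e-1, 10_000

# log=True: sample uniformly in log space, then exponentiate
log_samples = [math.exp(rng.uniform(math.log(low), math.log(high))) for _ in range(n)]
# log=False: plain uniform sampling over the same range
lin_samples = [rng.uniform(low, high) for _ in range(n)]

def frac_below(samples, threshold):
    return sum(s < threshold for s in samples) / len(samples)

# 1e-3 sits halfway between 1e-5 and 1e-1 on a log axis
print(f"log-uniform below 1e-3: {frac_below(log_samples, 1e-3):.1%}")
print(f"uniform below 1e-3:     {frac_below(lin_samples, 1e-3):.1%}")
```

Roughly half the log-uniform draws land below `1e-3`, versus about 1% of the uniform draws, so without `log=True` the small learning rates that often matter would barely be explored.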

**2. Samplers** - Decide which parameters the next trial tries:

```python
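import optuna

# TPE (Tree-structured Parzen Estimator) - Optuna's default single-objective sampler
optuna.samplers.TPESampler()  # Bayesian optimization

# Random (baseline)
optuna.samplers.RandomSampler()

# Grid (exhaustive); requires the full search space up front
search_space = {'n_estimators': [100, 200, 300], 'max_depth': [5, 10]}
optuna.samplers.GridSampler(search_space)

# CMA-ES (evolution strategy; needs the `cmaes` package)
optuna.samplers.CmaEsSampler()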
```

**3. Pruning** - Stop bad trials early:

```python
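import numpy as np
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def objective_with_pruning(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
    }
    model = RandomForestClassifier(**params, random_state=42)
    fold_scores = []
    
    # Evaluate each fold (X_train, y_train assumed to be NumPy arrays; use .iloc for pandas)
    for fold_idx, (train_idx, val_idx) in enumerate(cv.split(X_train, y_train)):
        X_fold_train = X_train[train_idx]
        y_fold_train = y_train[train_idx]
        X_fold_val = X_train[val_idx]
        y_fold_val = y_train[val_idx]
        
        model.fit(X_fold_train, y_fold_train)
        fold_scores.append(model.score(X_fold_val, y_fold_val))  # accuracy on this fold
        
        # Report the running mean as this step's intermediate value
        trial.report(float(np.mean(fold_scores)), fold_idx)
        
        # Prune if this trial is clearly worse than others at the same step
        if trial.should_prune():
            raise optuna.TrialPruned()  # Stop early, save computation!
    
    return float(np.mean(fold_scores))  # mean over all folds, not just the last one

study = optuna.create_study(
    direction='maximize',
    pruner=optuna.pruners.MedianPruner(  # Prune if below the median of earlier trials
        n_startup_trials=5,  # Don't prune until 5 trials have completed
        n_warmup_steps=2     # Don't prune during the first 2 folds
    )
)
study.optimize(objective_with_pruning, n_trials=50)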
```
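The median rule itself is simple. Below is a simplified, stdlib-only restatement for a maximize study: compare the running trial's best intermediate value so far with the median of completed trials' values at the same step, honoring `n_startup_trials` and `n_warmup_steps`. It is a sketch of the rule, not Optuna's source, and the fold scores are made up for illustration.

```python
from statistics import median

def should_prune_median(current_values, completed_trials, step,
                        n_startup_trials=5, n_warmup_steps=2):
    """Simplified median stopping rule for a MAXIMIZE study.

    current_values: intermediate scores the running trial has reported so far
    completed_trials: one list of intermediate scores per finished trial
    step: index of the step that was just reported
    """
    if len(completed_trials) < n_startup_trials:
        return False  # too few finished trials to compare against
    if step < n_warmup_steps:
        return False  # let every trial run its first few steps
    peers = [vals[step] for vals in completed_trials if len(vals) > step]
    if not peers:
        return False
    # Prune when even the trial's best value so far is below the peers' median
    return max(current_values) < median(peers)

# Hypothetical running-mean scores from five finished trials (3 folds each)
finished = [
    [0.78, 0.80, 0.80],
    [0.83, 0.84, 0.85],
    [0.88, 0.89, 0.90],
    [0.69, 0.70, 0.70],
    [0.74, 0.75, 0.75],
]
print(should_prune_median([0.60, 0.65, 0.70], finished, step=2))  # True: 0.70 < median 0.80
print(should_prune_median([0.80, 0.81, 0.82], finished, step=2))  # False: 0.82 >= 0.80
print(should_prune_median([0.60, 0.65], finished, step=1))        # False: still warming up
```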

---

## 2. Integration with Your Workflow

### A. Optuna + Scikit-learn Pipeline

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import optuna

# Scaler classes keyed by name, so the choice becomes a categorical hyperparameter
SCALERS = {'standard': StandardScaler, 'minmax': MinMaxScaler, 'robust': RobustScaler}

def objective_with_pipeline(trial):
    """
    Optimize entire pipeline including preprocessing
    """
    # Preprocessing params
    scaler_type = trial.suggest_categorical('scaler', ['standard', 'minmax', 'robust'])
    scaler = SCALERS[scaler_type]()
    
    # Model params (conditional on kernel choice)
    kernel = trial.suggest_categorical('kernel', ['rbf', 'poly', 'sigmoid'])
    
    model_params = {
        'C': trial.suggest_float('C', 1e-3, 1e3, log=True),
        'kernel': kernel
    }
    
    if kernel == 'rbf':
        model_params['gamma'] = trial.suggest_float('gamma', 1e-5, 1e-1, log=True)
    elif kernel == 'poly':
        model_params['degree'] = trial.suggest_int('degree', 2, 5)
        model_params['gamma'] = trial.suggest_float('gamma', 1e-5, 1e-1, log=True)
    
    # Build pipeline
    pipeline = Pipeline([
        ('scaler', scaler),
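        # No probability=True: the roc_auc scorer uses decision_function, and
        # probability=True adds a slow internal cross-validation
        ('classifier', SVC(**model_params, random_state=42))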
    ])
    
    # Evaluate
    score = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='roc_auc').mean()
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective_with_pipeline, n_trials=100)
```
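The key idea here is define-by-run: a parameter exists for a trial only if the objective's code path asks for it, which is what lets `degree` appear only for the `poly` kernel. The sketch below illustrates this without Optuna or scikit-learn, using a hypothetical `RandomTrial` class that mimics three `suggest_*` methods by sampling at random; it is not Optuna's `Trial` implementation.

```python
import math
import random

class RandomTrial:
    """Hypothetical stand-in for optuna.Trial that samples at random."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.params = {}

    def suggest_categorical(self, name, choices):
        self.params[name] = self.rng.choice(choices)
        return self.params[name]

    def suggest_int(self, name, low, high):
        self.params[name] = self.rng.randint(low, high)  # both ends inclusive
        return self.params[name]

    def suggest_float(self, name, low, high, log=False):
        if log:  # uniform in log space, then back-transform
            value = math.exp(self.rng.uniform(math.log(low), math.log(high)))
        else:
            value = self.rng.uniform(low, high)
        self.params[name] = value
        return value


def svc_search_space(trial):
    """Same branching as objective_with_pipeline, without training anything."""
    kernel = trial.suggest_categorical('kernel', ['rbf', 'poly', 'sigmoid'])
    trial.suggest_float('C', 1e-3, 1e3, log=True)
    if kernel in ('rbf', 'poly'):
        trial.suggest_float('gamma', 1e-5, 1e-1, log=True)
    if kernel == 'poly':
        trial.suggest_int('degree', 2, 5)
    return trial.params


samples = [svc_search_space(RandomTrial(seed)) for seed in range(200)]
for params in samples[:3]:
    print(params)
```

With `GridSearchCV`, the same space would need a list of three separate grids, one per kernel.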

### B. Optuna + Imbalanced-learn

```python
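from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
import optuna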

def objective_with_imbalanced(trial):
    """
    Optimize sampling strategy + model together
    """
    # Sampling strategy
    sampling_strategy = trial.suggest_categorical(
        'sampling', 
        ['none', 'smote', 'undersample', 'both']
    )
    
    steps = [('scaler', StandardScaler())]
    
    if sampling_strategy == 'smote':
        k_neighbors = trial.suggest_int('smote_k', 3, 10)
        steps.append(('sampler', SMOTE(k_neighbors=k_neighbors, random_state=42)))
    
    elif sampling_strategy == 'undersample':
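        # A float means the desired minority/majority ratio after resampling; it must be
        # at least the current ratio, so this range assumes the minority is under half the majority
        sampling_ratio = trial.suggest_float('under_ratio', 0.5, 1.0)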
        steps.append(('sampler', RandomUnderSampler(
            sampling_strategy=sampling_ratio, 
            random_state=42
        )))
    
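    elif sampling_strategy == 'both':
        # Oversample the minority to 50% of the majority, then undersample the majority
        # to 1:1. With the default 'auto' strategy, SMOTE would already balance the
        # classes and the undersampler would have nothing left to remove.
        steps.append(('smote', SMOTE(sampling_strategy=0.5, random_state=42)))
        steps.append(('under', RandomUnderSampler(sampling_strategy=1.0, random_state=42)))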
    
    # Model params
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'class_weight': trial.suggest_categorical('class_weight', ['balanced', None])
    }
    
    steps.append(('classifier', RandomForestClassifier(**params, random_state=42)))
    
    pipeline = ImbPipeline(steps)
    score = cross_val_score(pipeline, X_train, y_train, cv=5, scoring='roc_auc').mean()
    
    return score

study = optuna.create_study(direction='maximize')
study.optimize(objective_with_imbalanced, n_trials=150)
```
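The class-count arithmetic behind the `'both'` option is worth checking by hand. The sketch below assumes a binary problem and imbalanced-learn's documented meaning of a float `sampling_strategy` (desired minority/majority ratio after resampling); it approximates the resulting counts and is not imbalanced-learn's code. The 100 vs 1000 class sizes are hypothetical.

```python
def counts_after_float_strategy(n_minority, n_majority, over_ratio=None, under_ratio=None):
    """Class counts after optional oversampling, then optional undersampling.

    Each ratio is the desired N_minority / N_majority after that step.
    """
    if over_ratio is not None:
        if over_ratio < n_minority / n_majority:
            raise ValueError("over-sampling ratio is below the current ratio")
        n_minority = int(n_majority * over_ratio)  # synthetic minority rows are added
    if under_ratio is not None:
        if under_ratio < n_minority / n_majority:
            raise ValueError("under-sampling ratio is below the current ratio")
        n_majority = int(n_minority / under_ratio)  # majority rows are dropped
    return n_minority, n_majority

# Hypothetical dataset: 100 minority vs 1000 majority rows
print(counts_after_float_strategy(100, 1000, over_ratio=1.0, under_ratio=1.0))  # (1000, 1000)
print(counts_after_float_strategy(100, 1000, over_ratio=0.5, under_ratio=1.0))  # (500, 500)
print(counts_after_float_strategy(100, 1000, under_ratio=0.5))                  # (100, 200)
```

The first line shows why chaining a balancing SMOTE with a balancing undersampler is a no-op for the second step: only a partial oversampling ratio leaves the undersampler something to do.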

### C. Optuna + MLflow Integration

**MLflow** tracks experiments, parameters, metrics, and artifacts.

```python
import mlflow
import mlflow.sklearn
import optuna
from optuna.integration import MLflowCallback  # recent Optuna: pip install optuna-integration
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# The callback opens one MLflow run per trial and logs its params + objective value.
# By default the MLflow experiment is named after the Optuna study.
mlflow_callback = MLflowCallback(
    tracking_uri="file:./mlruns",  # or a remote tracking server
    metric_name="cv_auc_mean"
)

@mlflow_callback.track_in_mlflow()  # lets the objective log extra items into that same run
def objective_with_mlflow(trial):
    """
    Optuna trial with MLflow tracking
    """
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 300),
        'max_depth': trial.suggest_int('max_depth', 3, 15),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True)
    }
    
    model = GradientBoostingClassifier(**params, random_state=42)
    
    # Cross-validation
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc')
    mean_score = scores.mean()
    
    # Extra metric beyond what the callback logs
    mlflow.log_metric("cv_auc_std", scores.std())
    
    # Log the model refit on all training data, and record which run holds it
    model.fit(X_train, y_train)
    mlflow.sklearn.log_model(model, "model")
    trial.set_user_attr('mlflow_run_id', mlflow.active_run().info.run_id)
    
    return mean_score

study = optuna.create_study(direction='maximize', study_name='hyperparameter-optimization')
study.optimize(
    objective_with_mlflow, 
    n_trials=100,
    callbacks=[mlflow_callback]  # Auto-logs params and the objective value to MLflow
)

# After optimization, retrieve best run
print(f"Best trial: {study.best_trial.number}")
print(f"Best score: {study.best_value}")
print(f"Best params: {study.best_params}")

# Load best model from MLflow, using the run ID stored on the trial above
mlflow.set_tracking_uri("file:./mlruns")
best_run_id = study.best_trial.user_attrs['mlflow_run_id']
best_model = mlflow.sklearn.load_model(f"runs:/{best_run_id}/model")
```

**What MLflow gives you**:
- 📊 **Experiment tracking**: All trials logged automatically
- 📈 **Metric visualization**: Compare trials, plot learning curves
- 💾 **Model versioning**: Save and load models easily
- 🔍 **Reproducibility**: Every parameter/metric/artifact tracked

---

## 3. Complete Workflow Integration

Here's how everything fits together:

```python
import optuna
import mlflow
import mlflow.sklearn
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Setup (X_train, y_train, X_test, y_test are assumed to be defined already)
mlflow.set_experiment("complete-ml-workflow")
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

def build_pipeline(params):
    """Build the uncalibrated pipeline from sampled (or best) params."""
    steps = [('scaler', StandardScaler())]
    if params['use_smote']:
        steps.append(('smote', SMOTE(k_neighbors=params['smote_k'], random_state=42)))
    steps.append(('classifier', RandomForestClassifier(
        n_estimators=params['n_estimators'],
        max_depth=params['max_depth'],
        min_samples_split=params['min_samples_split'],
        class_weight=params['class_weight'],
        random_state=42,
    )))
    return ImbPipeline(steps)

def complete_objective(trial):
    """
    Full pipeline: imbalanced data → model → calibration
    """
    with mlflow.start_run(nested=True):
        # 1. SAMPLING STRATEGY
        params = {'use_smote': trial.suggest_categorical('use_smote', [True, False])}
        if params['use_smote']:
            params['smote_k'] = trial.suggest_int('smote_k', 3, 10)
        
        # 2. MODEL HYPERPARAMETERS
        params.update({
            'n_estimators': trial.suggest_int('n_estimators', 50, 300),
            'max_depth': trial.suggest_int('max_depth', 3, 15),
            'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
            'class_weight': trial.suggest_categorical('class_weight', ['balanced', None]),
        })
        
        # Build pipeline
        pipeline = build_pipeline(params)
        mlflow.log_params(params)
        
        # 3. EVALUATE UNCALIBRATED MODEL
        uncal_scores = cross_val_score(
            pipeline, X_train, y_train, cv=cv, scoring='roc_auc'
        )
        mlflow.log_metric("uncalibrated_auc", uncal_scores.mean())
        mlflow.log_metric("uncalibrated_auc_std", uncal_scores.std())
        
        # 4. CALIBRATION
        calibration_method = trial.suggest_categorical('calibration', ['sigmoid', 'isotonic'])
        
        calibrated_pipeline = CalibratedClassifierCV(
            pipeline,
            method=calibration_method,
            cv=cv,
            n_jobs=-1
        )
        
        mlflow.log_param("calibration", calibration_method)
        
        # 5. EVALUATE CALIBRATED MODEL
        # AUC only measures ranking, so it barely reacts to calibration;
        # scoring='neg_brier_score' or 'neg_log_loss' would reward better probabilities
        cal_scores = cross_val_score(
            calibrated_pipeline, X_train, y_train, cv=cv, scoring='roc_auc'
        )
        cal_mean = cal_scores.mean()
        
        mlflow.log_metric("calibrated_auc", cal_mean)
        mlflow.log_metric("calibrated_auc_std", cal_scores.std())
        
        # 6. FINAL TRAINING & LOGGING
        # The test set is deliberately NOT scored here: doing so in every trial
        # would leak test information into model selection
        calibrated_pipeline.fit(X_train, y_train)
        mlflow.sklearn.log_model(calibrated_pipeline, "calibrated_model")
        
        return cal_mean  # Optimize calibrated CV AUC

# Run optimization (no pruner: this objective never calls trial.report)
study = optuna.create_study(
    direction='maximize',
    study_name='complete-workflow',
    sampler=optuna.samplers.TPESampler(seed=42)
)

study.optimize(complete_objective, n_trials=200, n_jobs=1)  # n_jobs=1 for MLflow

# Get best configuration
print("\n" + "="*60)
print("BEST CONFIGURATION")
print("="*60)
print(f"Best CV AUC: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

# Retrain final model with best params, then score the held-out test set exactly once
best_params = study.best_params
final_model = CalibratedClassifierCV(
    build_pipeline(best_params), method=best_params['calibration'], cv=cv, n_jobs=-1
)
final_model.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, final_model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {test_auc:.4f}")
```
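Each trial in this workflow is expensive because calibration nests a second cross-validation inside the first. A back-of-the-envelope count, assuming scikit-learn's default `ensemble=True` for `CalibratedClassifierCV` (one fitted clone per inner fold) and the 5-fold `cv` used above:

```python
n_splits = 5  # outer StratifiedKFold used by cross_val_score
inner_cv = 5  # CalibratedClassifierCV reuses the same 5-fold splitter

uncalibrated_fits = n_splits              # step 3: one pipeline fit per fold
calibrated_cv_fits = n_splits * inner_cv  # step 5: each outer fold fits 5 inner clones
final_fits = inner_cv                     # step 6: one CalibratedClassifierCV fit = 5 clones
fits_per_trial = uncalibrated_fits + calibrated_cv_fits + final_fits

n_trials = 200
print(fits_per_trial, fits_per_trial * n_trials)  # 35 7000
```

Roughly 7,000 random-forest fits for the full study is why the number of trials and the inner `cv` deserve a second look before launching it.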

---

## 4. Optuna Visualization

```python
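import optuna.visualization as vis  # Plotly-based; needs `pip install plotly`
# (optuna.visualization.matplotlib offers matplotlib versions of the same plots)

# After study.optimize(...)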

# 1. Optimization history
fig = vis.plot_optimization_history(study)
fig.show()

# 2. Parameter importances (which params matter most?)
fig = vis.plot_param_importances(study)
fig.show()

# 3. Parallel coordinate plot (see relationships between params)
fig = vis.plot_parallel_coordinate(study)
fig.show()

# 4. Slice plot (how each param affects score)
fig = vis.plot_slice(study)
fig.show()

# 5. Contour plot (2D interactions)
fig = vis.plot_contour(study, params=['n_estimators', 'max_depth'])
fig.show()
```

---

## 5. Ray Tune (Quick Overview)

**Ray Tune** is more complex but scales to clusters:

```python
from ray import tune
from tune_sklearn import TuneSearchCV  # separate package: pip install tune-sklearn
from sklearn.ensemble import GradientBoostingClassifier

# Define search space (tune.randint's upper bound is exclusive)
param_distributions = {
    'n_estimators': tune.randint(50, 300),
    'max_depth': tune.randint(3, 15),
    'learning_rate': tune.loguniform(0.01, 0.3)
}

# Use like GridSearchCV
tune_search = TuneSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions,
    search_optimization="bayesian",  # Bayesian optimization (backed by scikit-optimize)
    n_trials=100,
    cv=5,
    scoring='roc_auc'
)

tune_search.fit(X_train, y_train)
print(tune_search.best_params_)
```

**When to use Ray Tune**:
- Multi-node clusters
- Distributed deep learning
- Need advanced schedulers (ASHA, PBT)
- Production ML infrastructure

**When to use Optuna**:
- Single machine / small cluster
- Quick prototyping
- Simpler API needed
- Scikit-learn focused
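ASHA, one of the schedulers listed above, builds on successive halving: start many configurations on a small budget, keep the best fraction, and give the survivors more budget. Below is a synchronous, stdlib-only sketch with a hypothetical learning curve; Ray Tune's `ASHAScheduler` is asynchronous and far more complete, so treat this as an illustration of the idea only.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Synchronous successive halving (the idea ASHA makes asynchronous).

    configs: list of hyperparameter dicts
    evaluate(config, budget) -> score to MAXIMIZE
    Each rung keeps the top 1/eta configs and multiplies their budget by eta.
    """
    survivors, budget, total_cost = list(configs), min_budget, 0
    while len(survivors) > 1:
        scored = [(evaluate(c, budget), c) for c in survivors]
        total_cost += budget * len(survivors)
        scored.sort(key=lambda sc: sc[0], reverse=True)
        survivors = [c for _, c in scored[:max(1, len(scored) // eta)]]
        budget *= eta
    return survivors[0], total_cost

configs = [{'lr': lr} for lr in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]]

def toy_evaluate(config, budget):
    # Hypothetical learning curve: best at lr = 0.1; more budget moves the
    # score closer to that config's ceiling
    ceiling = 1.0 - abs(math.log10(config['lr']) + 1.0) / 3.0
    return ceiling * (1.0 - 1.0 / (budget + 1))

best, cost = successive_halving(configs, toy_evaluate)
print(best, cost)  # {'lr': 0.1} 18
```

Nine configurations shrink to three and then one, for 18 budget units in total; in practice the surviving configuration would then be trained to completion.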

---

## 6. Comparison Summary

| Feature | GridSearchCV | RandomizedSearchCV | Optuna | Ray Tune |
|---------|-------------|-------------------|--------|----------|
| **Search Strategy** | Exhaustive grid | Random sampling | Bayesian (smart) | Bayesian (smart) |
| **Efficiency** | ⭐ Worst | ⭐⭐ Better | ⭐⭐⭐⭐ Great | ⭐⭐⭐⭐⭐ Best |
| **Ease of Use** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **Scalability** | Single node | Single node | Single/small cluster | Large clusters |
| **Pruning** | ❌ | ❌ | ✅ | ✅ |
| **Conditional Params** | ❌ | ❌ | ✅ | ✅ |
| **MLflow Integration** | Manual | Manual | ✅ Built-in | ✅ Built-in |
| **Visualization** | ❌ | ❌ | ✅ Excellent | ✅ Good |

---

## When to Use What?

```
Small search space (<20 trials):
→ GridSearchCV (simple, complete)

Medium search (20-100 trials):
→ RandomizedSearchCV (good baseline)

Large search (100+ trials), single machine:
→ Optuna (best ROI)

Very large search, distributed:
→ Ray Tune (scales best)

Complex pipelines + tracking:
→ Optuna + MLflow (production-ready)
```

---

Want me to show:
1. **How to resume interrupted Optuna studies** (save/load progress)?
2. **Multi-objective optimization** (optimize AUC AND calibration together)?
3. **Distributed Optuna** (multiple workers)?