# 👩‍💻 Grid, Random, or Bayesian? Tune and Compare Your Models

## 📋 Overview
This activity challenges you to implement and compare three hyperparameter tuning techniques: Grid Search, Random Search, and Bayesian Optimization. By the end, you'll have a hands-on understanding of how each method works, its strengths and limitations, and where it is best applied. You'll see firsthand how each technique affects model performance on your chosen dataset, which will help you make informed tuning decisions in future machine learning projects.

## 🎯 Learning Outcomes
By the end of this lab, you will be able to:

- ✅ Implement and compare Grid Search, Random Search, and Bayesian Optimization for hyperparameter tuning
- ✅ Evaluate the effectiveness of different tuning methods using performance metrics
- ✅ Make informed decisions on which hyperparameter tuning method to use in various scenarios

## Task 1: Dataset Selection and Preprocessing

**Context:** Proper dataset selection and preprocessing ensures the data is clean and ready for modeling.

**Steps:**

1. Select a dataset suited to a supervised learning task.
2. Preprocess the dataset: handle missing data and normalize features as needed.
3. Split the data into training and testing sets.

In [None]:
# Import libraries
# Task 1: Dataset Selection and Preprocessing
# Load and preprocess dataset

💡 **Tip:** Use `train_test_split` from `sklearn.model_selection` for data splitting.

⚙️ **Test Your Work:**
- Displaying the dataset should show the features and their corresponding labels, with the preprocessing steps applied.
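
The preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming the Iris dataset from scikit-learn as a stand-in; the dataset choice, `test_size=0.3`, `random_state=42`, and the variable names are assumptions rather than requirements of the lab. The imputer and scaler are fitted on the training split only, so no statistics from the test set leak into preprocessing.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: Iris is used here only as a small example dataset.
X, y = load_iris(return_X_y=True)

# Hold out 30% for testing; stratify keeps class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Mean-impute missing values (Iris has none), then standardize each feature.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_train_prepared = preprocess.fit_transform(X_train)  # fit on training data only
X_test_prepared = preprocess.transform(X_test)        # reuse training statistics

print(X_train_prepared.shape, X_test_prepared.shape)
print(y_train[:5])
```

Tree-based models such as Random Forest do not strictly need scaled features, but keeping the step makes the same preprocessing reusable if you later swap in a scale-sensitive model.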

## Task 2: Implementing Grid Search

**Context:** Grid Search involves exhaustively searching through a predefined hyperparameter grid.

**Steps:**

1. Set up a hyperparameter grid for a Random Forest model.
2. Use `GridSearchCV` from scikit-learn to carry out the grid search.
3. Record the best parameters along with performance metrics like accuracy.

In [None]:
# Task 2: Implementing Grid Search

💡 **Tip:** Define a comprehensive grid for key hyperparameters.

⚙️ **Test Your Work:**
- Plots should clearly show the relationship between actual and predicted values for the best model from Grid Search. Legends should correctly identify each data series in the plot.
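
Because Grid Search is exhaustive, its cost grows with the product of the grid sizes. The sketch below counts the configurations before any model is trained; the grid values and the `cv_folds` name are illustrative assumptions.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid for a Random Forest; the values are illustrative.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 5

# ParameterGrid enumerates every combination, exactly as GridSearchCV would.
n_combinations = len(ParameterGrid(param_grid))  # 3 * 3 * 3
n_fits = n_combinations * cv_folds

print(f"{n_combinations} combinations x {cv_folds} folds = {n_fits} model fits")
```

With its default `refit=True`, `GridSearchCV` performs one additional fit of the best configuration on the full training set. Checking the count first helps you judge whether a grid is affordable before launching the search.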

## Task 3: Applying Random Search

**Context:** Random Search involves sampling hyperparameters randomly from a defined space.

**Steps:**

1. Configure a random search space for the same Random Forest model.
2. Implement `RandomizedSearchCV` to sample configurations and carry out the search.
3. Record the best parameters along with performance metrics.

In [None]:
# Task 3: Applying Random Search

💡 **Tip:** Ensure the random search space is broad and varied.

⚙️ **Test Your Work:**
- Plots should clearly show the relationship between actual and predicted values for the best model from Random Search. Legends should correctly identify each data series in the plot.
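
One way to make the search space broad is to sample from distributions rather than short lists. The sketch below uses `ParameterSampler` to preview what `RandomizedSearchCV` would draw from such a space; the ranges, `n_iter=5`, and the seed are assumptions chosen for illustration.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Hypothetical search space; randint(low, high) samples integers in [low, high).
param_distributions = {
    "n_estimators": randint(50, 201),
    "max_depth": [None, 10, 20],
    "min_samples_split": randint(2, 11),
}

# Draw five configurations, as RandomizedSearchCV(n_iter=5) would.
sampled = list(ParameterSampler(param_distributions, n_iter=5, random_state=42))
for params in sampled:
    print(params)
```

The same `param_distributions` dictionary can be passed directly to `RandomizedSearchCV`. Unlike a grid, the number of evaluated configurations is fixed by `n_iter`, regardless of how wide the ranges are.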

## Task 4: Exploring Bayesian Optimization

**Context:** Bayesian Optimization iteratively focuses on promising hyperparameter configurations.

**Steps:**

1. Utilize a tool like `optuna` to apply Bayesian Optimization for the Random Forest model's hyperparameters.
2. Observe and document how Bayesian Optimization iteratively improves the configurations.
3. Record the best parameters along with performance metrics.

In [None]:
# Task 4: Exploring Bayesian Optimization

💡 **Tip:** Define an objective function for `optuna` to optimize.

⚙️ **Test Your Work:**
- Plots should clearly show the relationship between actual and predicted values for the best model from Bayesian Optimization. Legends should correctly identify each data series in the plot.
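
To document how the optimization improves over time, one option is to track the best score seen so far after each trial. The sketch below shows the idea with a small helper; `running_best` is a hypothetical name, and the scores are made up purely for illustration.

```python
import numpy as np


def running_best(values):
    """Return the best (highest) score seen so far after each trial."""
    return np.maximum.accumulate(np.asarray(values, dtype=float))


# Made-up scores for illustration only. With Optuna you could instead use
# something like: [t.value for t in study.trials if t.value is not None]
trial_scores = [0.91, 0.93, 0.92, 0.95, 0.94, 0.96]

for i, (score, best) in enumerate(zip(trial_scores, running_best(trial_scores)), start=1):
    print(f"trial {i}: score={score:.2f}, best so far={best:.2f}")
```

Plotting the running best against the trial number gives a simple convergence curve: a curve that flattens early suggests additional trials are unlikely to help much.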

## Task 5: Comparing Approaches and Performance

**Context:** Comparing results helps evaluate the strengths and weaknesses of different hyperparameter tuning methods.

**Steps:**

1. Systematically compare the results from Grid Search, Random Search, and Bayesian Optimization.
2. Reflect on key aspects such as speed, accuracy, and computational cost.

In [None]:
# Task 5: Comparing Approaches and Performance

💡 **Tip:** Use visualizations or statistical summaries to aid comparison.

⚙️ **Test Your Work:**
- Plots should clearly illustrate the performance comparison between Grid Search, Random Search, and Bayesian Optimization. Legends should correctly identify the tuning method and corresponding performance metrics.
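
A statistical summary can be as simple as one table with a row per tuning method. The sketch below times two searches and collects their scores; the tiny search space, the column names, and the use of Iris are assumptions made so that the example runs quickly.

```python
import time

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Deliberately tiny search space so the sketch runs quickly.
space = {"n_estimators": [10, 25], "max_depth": [None, 5]}
searches = {
    "Grid Search": GridSearchCV(RandomForestClassifier(random_state=42), space, cv=3),
    "Random Search": RandomizedSearchCV(
        RandomForestClassifier(random_state=42), space, n_iter=2, cv=3, random_state=42
    ),
}

rows = []
for name, search in searches.items():
    start = time.perf_counter()
    search.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    rows.append({
        "method": name,
        "cv_score": search.best_score_,                  # mean CV accuracy of best config
        "test_accuracy": search.score(X_test, y_test),   # accuracy of refitted best model
        "fit_seconds": elapsed,
        "n_configs": len(search.cv_results_["params"]),  # configurations evaluated
    })

summary = pd.DataFrame(rows).set_index("method")
print(summary.round(3))
```

A Bayesian Optimization run can be added as a third row by timing `study.optimize(...)` in the same way. Timings vary between machines and runs, so treat them as rough indicators rather than exact measurements.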

### ✅ Success Checklist

- Successfully selected and preprocessed the dataset
- Implemented and tuned hyperparameters using Grid Search, Random Search, and Bayesian Optimization
- Compared and analyzed results from the three tuning methods
- Provided reflections and recommendations based on findings

### 🔍 Common Issues & Solutions

**Problem:** Dataset not loading correctly.   
**Solution:** Verify the data source and ensure proper loading using pandas.

**Problem:** Hyperparameter tuning errors.   
**Solution:** Check the parameter grid and search space to ensure compatibility with the chosen model.

**Problem:** Bayesian Optimization not providing good results.   
**Solution:** Verify the objective function and optimization setup, and try increasing the number of trials.
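
For the hyperparameter tuning errors above, a common cause is a misspelled or unsupported parameter name in the grid. The sketch below checks grid keys against the parameters the model accepts; `unknown_params` is a hypothetical helper, and the grid contains a deliberate typo.

```python
from sklearn.ensemble import RandomForestClassifier


def unknown_params(estimator, param_grid):
    """Return grid keys that the estimator does not accept as parameters."""
    valid = estimator.get_params().keys()
    return sorted(set(param_grid) - set(valid))


# Hypothetical grid with a deliberate typo ("max_dept") for illustration.
param_grid = {"n_estimators": [50, 100], "max_dept": [None, 10]}

print(unknown_params(RandomForestClassifier(), param_grid))
```

An empty list means every key in the grid is a valid parameter name for the model; it does not check that the values themselves are valid.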

### 🔑 Key Points

- Grid Search exhaustively searches through predefined hyperparameter combinations.
- Random Search samples hyperparameter configurations randomly and can be more time-efficient than Grid Search because it evaluates a fixed number of configurations.
- Bayesian Optimization iteratively focuses on promising configurations to refine performance.
- Comparing different tuning methods helps understand their strengths, weaknesses, and suitable applications.

## 💻 Exemplar Solution

<details>    
<summary><strong>Click HERE to see an exemplar solution</strong></summary>    

```python
import numpy as np
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    cross_val_score,
    train_test_split,
)
from sklearn.preprocessing import StandardScaler

# Task 1: Load and preprocess dataset
iris = load_iris()
    
# Splitting data
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
  
# Impute missing values (if any) with the mean of each column
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# Handle feature scaling using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Task 2: Grid Search
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
}
# Fix random_state so the forest's results are reproducible across runs
grid_search = GridSearchCV(estimator=RandomForestClassifier(random_state=42), param_grid=param_grid, cv=5, verbose=2)
grid_search.fit(X_train, y_train)
grid_best_params = grid_search.best_params_
grid_accuracy = accuracy_score(y_test, grid_search.best_estimator_.predict(X_test))

# Task 3: Random Search
# np.arange excludes the stop value, so use 201 to include n_estimators=200
param_distributions = {
    'n_estimators': np.arange(50, 201, 50),
    'max_depth': [None, 10, 20],
    'min_samples_split': np.arange(2, 11),
}
random_search = RandomizedSearchCV(estimator=RandomForestClassifier(random_state=42), param_distributions=param_distributions, n_iter=10, cv=5, verbose=2, random_state=42)
random_search.fit(X_train, y_train)
random_best_params = random_search.best_params_
random_accuracy = accuracy_score(y_test, random_search.best_estimator_.predict(X_test))

# Task 4: Bayesian Optimization using Optuna
def objective(trial):
    # Each trial samples one configuration from the search space
    n_estimators = trial.suggest_int('n_estimators', 50, 200)
    max_depth = trial.suggest_int('max_depth', 5, 30)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, min_samples_split=min_samples_split, random_state=42)
    # Use the same 5-fold CV as the other searches so the scores are comparable
    return cross_val_score(model, X_train, y_train, cv=5).mean()

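# Seed the TPE sampler (Optuna's default) so the study is reproducible
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=50)
bayesian_best_params = study.best_params
# Refit the best configuration on the full training set
model = RandomForestClassifier(**study.best_params, random_state=42)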
model.fit(X_train, y_train)
bayesian_accuracy = accuracy_score(y_test, model.predict(X_test))

# Task 5: Comparing Approaches and Performance
print(f"Grid Search Best Parameters: {grid_best_params}, Accuracy: {grid_accuracy:.3f}")
print(f"Random Search Best Parameters: {random_best_params}, Accuracy: {random_accuracy:.3f}")
print(f"Bayesian Optimization Best Parameters: {bayesian_best_params}, Accuracy: {bayesian_accuracy:.3f}")
```