# M5L2 Screencasts

## M5L2SC1: Hyperparameter Tuning with GridSearchCV

### Step 1: Setting Up Your Environment

Import the libraries and modules needed for this walkthrough.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

### Step 2: Loading and Preparing Your Dataset

Load the Iris dataset and split it into training (70%) and testing (30%) sets.

In [None]:
# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
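
The split above samples rows at random, so the three Iris classes need not be equally represented in the test set. As a minimal sketch of one alternative, assuming you want class proportions preserved, `stratify=y` can be passed to `train_test_split`. The variable name `y_test_s` is illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Plain random split, as in the cell above
_, _, _, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Stratified split: each class keeps its share of the data in both subsets
_, _, _, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Random split test class counts:    ", np.bincount(y_test))
print("Stratified split test class counts:", np.bincount(y_test_s))
```

Whether stratification matters depends on the dataset; it tends to help most when classes are imbalanced or the test set is small.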


### Step 3: Defining the Model and Parameters

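Initialize a Random Forest classifier and define the grid of hyperparameter values to search. No random_state is set on the classifier, so the selected parameters can differ from run to run.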

In [None]:
# Initialize Random Forest
rf_clf = RandomForestClassifier()

# Define hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}


### Step 4: Running GridSearchCV
Run GridSearchCV with 5-fold cross-validation to evaluate every combination in the grid.

In [None]:
# Initialize GridSearchCV
grid_search = GridSearchCV(estimator=rf_clf, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2)

# Fit to the training data
grid_search.fit(X_train, y_train)


Fitting 5 folds for each of 36 candidates, totalling 180 fits
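
The candidate and fit counts in the log line above follow directly from the grid. The short sketch below reproduces that arithmetic with scikit-learn's `ParameterGrid`; the names `n_candidates` and `n_folds` are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in Step 3
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

n_candidates = len(ParameterGrid(param_grid))  # 3 * 4 * 3 combinations
n_folds = 5                                    # cv=5 in GridSearchCV
print(f"{n_candidates} candidates x {n_folds} folds = {n_candidates * n_folds} fits")
```

Because the count is a product over all parameters, adding one more parameter with a few values multiplies the total cost, which is the main practical limit of an exhaustive grid search.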


### Step 5: Evaluating Optimal Parameters and Model Performance
Print the best parameters found and evaluate the refitted best model on the held-out test set.

In [None]:
# Retrieve best parameters
print("Best Parameters:", grid_search.best_params_)

# Evaluate model performance
best_model = grid_search.best_estimator_
accuracy = best_model.score(X_test, y_test)
print(f"Model Test Accuracy: {accuracy:.2f}")

Best Parameters: {'max_depth': 30, 'min_samples_split': 10, 'n_estimators': 50}
Model Test Accuracy: 1.00
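
The best parameters alone do not show how close the other candidates were. One way to check, sketched below, is to load `cv_results_` into a pandas DataFrame. To keep the example quick it uses a reduced grid (`small_grid`) and a fixed `random_state`, both of which are assumptions and not part of the original run.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Reduced grid (an assumption, chosen only to keep this sketch fast)
small_grid = {'n_estimators': [50, 100], 'max_depth': [None, 10]}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42), small_grid, cv=5
)
grid_search.fit(X_train, y_train)

# One row per candidate, sorted so the best-ranked candidate comes first
results = pd.DataFrame(grid_search.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').to_string(index=False))
print(f"Best cross-validation accuracy: {grid_search.best_score_:.3f}")
```

On a small dataset such as Iris, many candidates often score within one standard deviation of each other, so the reported "best" setting may reflect noise as much as a real difference.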


## M5L2SC2: Efficient Hyperparameter Tuning with RandomizedSearchCV

### Step 1: Setting Up Your Workspace
Import the libraries and modules needed for the randomized search.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
import numpy as np


### Step 2: Preparing Your Dataset
Load the Iris dataset and partition it into training and testing sets.


In [None]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


### Step 3: Configuring Randomized Search Parameters
Set up a parameter distribution and configure the Random Forest model.

In [None]:
# Define the Random Forest model
rf_clf = RandomForestClassifier()

# Define a parameter distribution
param_distributions = {
    'n_estimators': np.arange(10, 200, 10),
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': np.arange(2, 10),
    'min_samples_leaf': np.arange(1, 10)
}

### Step 4: Executing RandomizedSearchCV
Run RandomizedSearchCV to evaluate 50 randomly sampled hyperparameter combinations (n_iter=50), each with 5-fold cross-validation.

In [None]:
# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(estimator=rf_clf, param_distributions=param_distributions,
                                   n_iter=50, cv=5, verbose=2, random_state=42, n_jobs=-1)

# Fit the model to training data
random_search.fit(X_train, y_train)

Fitting 5 folds for each of 50 candidates, totalling 250 fits
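
To put the 50 sampled candidates in context, the sketch below counts how many combinations the same distributions would produce as a full grid and previews a few random draws with scikit-learn's `ParameterSampler`. The preview draws are illustrative and are not claimed to match the candidates that `random_search` evaluated.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same distributions as in Step 3
param_distributions = {
    'n_estimators': np.arange(10, 200, 10),
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': np.arange(2, 10),
    'min_samples_leaf': np.arange(1, 10)
}

# Size of the full grid if every combination were tried
total = len(ParameterGrid(param_distributions))
n_iter = 50
print(f"Full grid: {total} combinations; sampled: {n_iter} ({n_iter / total:.1%})")

# Preview a few sampled candidates
for params in ParameterSampler(param_distributions, n_iter=3, random_state=42):
    print(params)
```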


### Step 5: Reviewing Optimal Parameters and Model Accuracy
Print the best parameters found and evaluate the tuned model on the held-out test set.

In [None]:
# Retrieve the best parameters
print("Best Parameters:", random_search.best_params_)

# Evaluate accuracy on the test set
best_model = random_search.best_estimator_
accuracy = best_model.score(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")

Best Parameters: {'n_estimators': np.int64(150), 'min_samples_split': np.int64(9), 'min_samples_leaf': np.int64(3), 'max_depth': 30}
Test Accuracy: 1.00
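
The `np.int64(...)` wrappers in the output above appear because those values were drawn from NumPy arrays created with `np.arange`. If plain Python integers are preferred, for example before logging the parameters or writing them to JSON, a small conversion like the following sketch can be used. The dictionary is copied from the printed output, and `plain_params` is an illustrative name.

```python
import numpy as np

# Best parameters as printed above
best_params = {
    'n_estimators': np.int64(150),
    'min_samples_split': np.int64(9),
    'min_samples_leaf': np.int64(3),
    'max_depth': 30,
}

# .item() converts a NumPy scalar to the matching built-in Python type
plain_params = {
    name: value.item() if isinstance(value, np.generic) else value
    for name, value in best_params.items()
}
print(plain_params)
```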


## M5L2SC3: What Is Bayesian Optimization and How Does It Work?

### Step 1: Setting Up Your Environment
Import the required libraries. scikit-optimize (the skopt package) must be installed first, for example with pip install scikit-optimize.

In [None]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer
from skopt.utils import use_named_args

### Step 2: Implementing Bayesian Optimization with scikit-optimize
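Define the hyperparameter search space and the objective function. Because gp_minimize minimizes its objective, the function returns the negative mean cross-validated accuracy.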

In [None]:
# Load sample data
X, y = load_iris(return_X_y=True)

# Define search space
space = [
    Integer(10, 200, name='n_estimators'),
    Integer(1, 10, name='max_depth')
]

# Objective function
@use_named_args(space)
def objective(n_estimators, max_depth):
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
    return -np.mean(cross_val_score(model, X, y, cv=5, n_jobs=-1))
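
To see the sign convention concretely, the sketch below evaluates a negated cross-validation score by hand for two depths and picks the smaller value. It uses only scikit-learn; the helper `neg_cv_accuracy` and the two depths are hypothetical choices for illustration, not values from the notebook run.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def neg_cv_accuracy(max_depth, n_estimators=50):
    # Negated mean accuracy: lower is better, as a minimizer expects
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=42
    )
    return -np.mean(cross_val_score(model, X, y, cv=5))

# Two illustrative depths (hypothetical values)
losses = {depth: neg_cv_accuracy(depth) for depth in (1, 3)}
best_depth = min(losses, key=losses.get)

for depth, loss in losses.items():
    print(f"max_depth={depth}: objective={loss:.4f}, accuracy={-loss:.4f}")
print("Depth with the lowest objective (highest accuracy):", best_depth)
```

Minimizing the negated score and maximizing the accuracy select the same setting, which is why the final report flips the sign back.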

### Step 3: Executing and Analyzing the Process
Execute the optimization using gp_minimize and analyze the results.

In [None]:
# Run Bayesian optimization
result = gp_minimize(
    func=objective,
    dimensions=space,
    n_calls=30,
    random_state=42,
    verbose=True
)

Iteration No: 1 started. Evaluating function at random point.
Iteration No: 1 ended. Evaluation done at random point.
Time taken: 1.1115
Function value obtained: -0.9667
Current minimum: -0.9667
Iteration No: 2 started. Evaluating function at random point.
Iteration No: 2 ended. Evaluation done at random point.
Time taken: 1.0811
Function value obtained: -0.9667
Current minimum: -0.9667
Iteration No: 3 started. Evaluating function at random point.
Iteration No: 3 ended. Evaluation done at random point.
Time taken: 0.6610
Function value obtained: -0.9467
Current minimum: -0.9667
Iteration No: 4 started. Evaluating function at random point.
Iteration No: 4 ended. Evaluation done at random point.
Time taken: 0.6659
Function value obtained: -0.9667
Current minimum: -0.9667
Iteration No: 5 started. Evaluating function at random point.
Iteration No: 5 ended. Evaluation done at random point.
Time taken: 0.2686
Function value obtained: -0.9600
Current minimum: -0.9667
Iteration No: 6 started. 

In [None]:
# Print best result
print("Best score: %.4f" % -result.fun)
print("Best parameters:")
for name, val in zip([dim.name for dim in space], result.x):
    print(f"  {name}: {val}")

Best score: 0.9667
Best parameters:
  n_estimators: 161
  max_depth: 3
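
The log above shows the first evaluations taken at random points. After such an initial phase, Bayesian optimization fits a surrogate model to the points seen so far and uses an acquisition function to choose where to evaluate next. The following is a minimal sketch of that loop, not scikit-optimize's actual implementation. It assumes a hypothetical one-dimensional `toy_objective`, scikit-learn's `GaussianProcessRegressor` as the surrogate, and expected improvement as the acquisition function.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def toy_objective(x):
    # Hypothetical function to minimize; its true minimum is at x = 2
    return (x - 2.0) ** 2

def expected_improvement(candidates, gp, best_y, xi=0.01):
    # Expected improvement over the best value so far (minimization form)
    mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    improvement = best_y - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(42)
candidates = np.linspace(-5.0, 5.0, 201)

# Initial phase: a few random evaluations
X_obs = rng.uniform(-5.0, 5.0, size=3)
y_obs = toy_objective(X_obs)
initial_best = y_obs.min()

# Guided phase: fit the surrogate, then evaluate where EI is largest
for _ in range(10):
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True, random_state=0
    )
    gp.fit(X_obs.reshape(-1, 1), y_obs)
    ei = expected_improvement(candidates, gp, y_obs.min())
    x_next = candidates[np.argmax(ei)]
    X_obs = np.append(X_obs, x_next)
    y_obs = np.append(y_obs, toy_objective(x_next))

best_x = X_obs[np.argmin(y_obs)]
print(f"Best x found: {best_x:.2f}, objective: {y_obs.min():.4f}")
```

Expected improvement favors points where the surrogate predicts a low value or is still uncertain, which is how the search balances exploitation against exploration.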


## M5L2SC4: Hands-On: Hyperparameter Tuning with Optuna

### Step 1: Setting Up Your Environment
Install Optuna and import essential libraries and datasets.

In [None]:
!pip install optuna


import optuna
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier

Collecting optuna
  Downloading optuna-4.3.0-py3-none-any.whl.metadata (17 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.16.1-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.3.0-py3-none-any.whl (386 kB)
Downloading alembic-1.16.1-py3-none-any.whl (242 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, alembic, optuna
Successfully installed alembic-1.16.1 colorlog-6.9.0 optuna-4.3.0


### Step 2: Preparing the Dataset
Load the Iris dataset and partition it for training and testing.

In [None]:
# Load the Iris dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

### Step 3: Defining the Objective Function
Define the objective function that Optuna will use for hyperparameter optimization.

In [None]:
def objective(trial):
    # Suggest hyperparameters for RandomForest
    n_estimators = trial.suggest_int('n_estimators', 50, 200)
    max_depth = trial.suggest_int('max_depth', 5, 30)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    # Initialize and configure the model
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                   min_samples_split=min_samples_split)
    # Execute cross-validation
    score = cross_val_score(model, X_train, y_train, cv=3)
    return score.mean()

### Step 4: Running the Optimization with Optuna
Execute Optuna's optimization process to find the best hyperparameters.

In [None]:
# Start the optimization process
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)

# Display the best hyperparameters discovered
print("Best Hyperparameters Found: ", study.best_params)
print("Best Score Achieved: ", study.best_value)

[I 2025-05-28 00:06:33,532] A new study created in memory with name: no-name-3054dbe0-2a36-42dc-82bf-7482f08e7f1d
[I 2025-05-28 00:06:35,884] Trial 0 finished with value: 0.9333333333333332 and parameters: {'n_estimators': 196, 'max_depth': 16, 'min_samples_split': 5}. Best is trial 0 with value: 0.9333333333333332.
[I 2025-05-28 00:06:37,165] Trial 1 finished with value: 0.9333333333333332 and parameters: {'n_estimators': 141, 'max_depth': 29, 'min_samples_split': 2}. Best is trial 0 with value: 0.9333333333333332.
[I 2025-05-28 00:06:38,395] Trial 2 finished with value: 0.9333333333333332 and parameters: {'n_estimators': 123, 'max_depth': 23, 'min_samples_split': 9}. Best is trial 0 with value: 0.9333333333333332.
[I 2025-05-28 00:06:39,149] Trial 3 finished with value: 0.9333333333333332 and parameters: {'n_estimators': 60, 'max_depth': 15, 'min_samples_split': 2}. Best is trial 0 with value: 0.9333333333333332.
[I 2025-05-28 00:06:40,376] Trial 4 finished with value: 0.933333333333

Best Hyperparameters Found:  {'n_estimators': 97, 'max_depth': 17, 'min_samples_split': 10}
Best Score Achieved:  0.9523809523809524
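
The best score reported above is a cross-validation score on the training split; the held-out test set has not been used yet. A minimal sketch of a final check follows: retrain one model with the printed best hyperparameters and score it on `X_test`. The dictionary is copied from the output above (in the notebook, `study.best_params` would be used instead), and the fixed `random_state` is an assumption added for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Best hyperparameters as printed above
best_params = {'n_estimators': 97, 'max_depth': 17, 'min_samples_split': 10}

# random_state is fixed here for repeatability (not set in the original objective)
final_model = RandomForestClassifier(**best_params, random_state=42)
final_model.fit(X_train, y_train)

test_accuracy = final_model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Keeping the test set out of the tuning loop until this point means the reported accuracy is not biased by the hyperparameter search itself.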
