# ***`Learn the difference between Parameters and Hyperparameters, and use GridSearchCV to automatically tune your model for the best performance. Key Skill: Optimization (Making the model smarter without more data).`***
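
Before the setup, here is a small sketch of the distinction itself. It assumes the usual scikit-learn convention: hyperparameters are the settings you pass to the constructor before training, while parameters are what `fit()` learns from the data (stored in attributes ending with an underscore). The specific values `n_estimators=10` and `max_depth=5` are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters: chosen by us BEFORE training, passed to the constructor.
model = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=42)
hyperparams = model.get_params()
print("Hyperparameters we set:", hyperparams["n_estimators"], hyperparams["max_depth"])

# Parameters: learned FROM the data during fit().
# For a random forest, these are the trees and the splits inside them.
model.fit(X, y)
n_trees = len(model.estimators_)
first_tree_nodes = model.estimators_[0].tree_.node_count
print("Trees learned:", n_trees)
print("Nodes in the first tree:", first_tree_nodes)
```

Grid search only ever changes the first kind; the second kind is relearned from scratch on every fit.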

# 1: Setup & Load Data

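We will use scikit-learn's Breast Cancer dataset again: 569 samples, 30 numeric features, and a binary target (malignant or benign). It's tricky enough that hyperparameter tuning can make a difference.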

In [9]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

In [14]:
# Load the data
data = load_breast_cancer()
X = data.data
y = data.target

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
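
To see what the split above actually produces, a quick inspection like the following can help. This is a minimal sketch; the variable `class_counts` is introduced here for illustration and is not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

# Overall size of the dataset: (rows, columns)
print("Shape of X:", X.shape)

# How many samples fall into each class
class_counts = np.bincount(y)
print("Class counts:", dict(zip(data.target_names, class_counts)))

# Same split as in the notebook: 20% of the rows are held back for the final test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Training rows:", len(X_train))
print("Test rows:", len(X_test))
```

The test rows stay untouched until the very end; everything the grid search does happens on the training rows only.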

# 2: Define the "Menu" (The Grid)

In [20]:
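# The menu of options: each key is a hyperparameter, each list holds the values to try
param_grid = {
    'n_estimators': [10, 50, 100],   # number of trees in the forest
    'max_depth': [None, 10, 20],     # maximum tree depth (None = no limit)
    'min_samples_split': [2, 5],     # minimum samples needed to split a node
}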
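
It is worth knowing how big the menu is before ordering everything on it. The sketch below uses scikit-learn's `ParameterGrid` helper to expand the grid into its individual combinations; the variables `n_combos` and `n_fits` are illustrative names, and the fold count of 5 is taken from the `cv=5` setting used later in this notebook.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
}

# Expand the grid into every possible combination (3 * 3 * 2)
combos = list(ParameterGrid(param_grid))
n_combos = len(combos)

# With 5-fold cross-validation, every combination is trained 5 times
n_fits = n_combos * 5

print("Combinations to try:", n_combos)
print("Model fits during the search:", n_fits)
print("First combination:", combos[0])
```

The count grows multiplicatively with every list you add, so a grid with many hyperparameters can get slow quickly.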

# 3: Run the Grid Search (The Bake-Off)

We don't train the model directly. We train the GridSearchCV object.

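cv=5 means 5-fold cross-validation. The training data is cut into 5 slices; each recipe is trained on 4 of them and scored on the remaining one, rotating until every slice has been the scoring slice once. Averaging the 5 scores makes sure a good result wasn't just luck.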
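
To make the "5 times" concrete, here is a sketch of what happens for a single recipe, using `cross_val_score`. The recipe chosen (`n_estimators=50`, `max_depth=10`) is just one arbitrary combination from the grid, picked for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One recipe from the menu
model = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=42)

# Train and score it 5 times, each time holding out a different slice
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

print("Score on each slice:", scores)
print("Average score:", scores.mean())
```

GridSearchCV repeats this procedure for every combination in the grid and keeps the one with the highest average.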

In [22]:
# Create the base model (leave the hyperparameters being tuned unset; the grid supplies them)
base_model = RandomForestClassifier(random_state=42)

# Setup the Search
# cv=5 means: "Test each combo 5 times on different data slices"
grid_search = GridSearchCV(estimator=base_model, param_grid=param_grid, cv=5, scoring='accuracy')

# START THE SEARCH (This might take a few seconds)
print("Tuning the model... Please wait.")
grid_search.fit(X_train, y_train)

print("Search Complete.")

Tuning the model... Please wait.
Search Complete.


# 4: The Winner

In [23]:
# best_score_ is the average cross-validation accuracy of the winning combination
print(f"The Best Accuracy : {grid_search.best_score_*100:.2f}%")
print("The Best Parameters (The Perfect Recipes):")
print(grid_search.best_params_)

The Best Accuracy : 95.82%
The Best Parameters (The Perfect Recipes):
{'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}
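
The winner is only one row of the scoreboard. The fitted search object also keeps every combination's results in `cv_results_`, which loads neatly into a pandas DataFrame. The sketch below is self-contained, so it reruns a search on a deliberately reduced grid (`small_grid`, an assumption made here to keep it fast); on the notebook's own `grid_search` object the same DataFrame code should apply.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Reduced grid (illustrative only) so the example runs quickly
small_grid = {'n_estimators': [10, 50], 'min_samples_split': [2, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
    scoring='accuracy',
)
search.fit(X_train, y_train)

# One row per combination; keep the columns that matter most
results = pd.DataFrame(search.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
ranked = results[cols].sort_values('rank_test_score')

print(ranked.to_string(index=False))
```

Looking at the full table shows how close the runners-up are. If several combinations score almost the same, the cheaper one (fewer trees) is often the more practical choice.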


# 5: The FINAL test

In [24]:
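# best_estimator_ is the winning model, refit on the full training set
best_model = grid_search.best_estimator_
# Score it on the held-out test set, which the search never saw
final_accuracy = best_model.score(X_test, y_test)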

print(f"THE FINAL TEST ACCURACY : {final_accuracy*100:.2f}%")

THE FINAL TEST ACCURACY : 96.49%
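
A useful sanity check after tuning is to compare against an untuned baseline. The sketch below rebuilds a model from the winning settings printed in section 4 and scores it next to a plain `RandomForestClassifier` on the same test set. The names `tuned` and `baseline` are illustrative. Note that in recent scikit-learn releases the winning values (`n_estimators=100`, `max_depth=None`, `min_samples_split=2`) happen to match the library defaults, so under that assumption the two scores should come out identical here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Winning settings as reported by the grid search in section 4
best_params = {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}

# Model built from the winning settings
tuned = RandomForestClassifier(random_state=42, **best_params)
tuned.fit(X_train, y_train)
tuned_accuracy = tuned.score(X_test, y_test)

# Baseline: no settings changed at all
baseline = RandomForestClassifier(random_state=42)
baseline.fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)

print(f"Tuned model test accuracy    : {tuned_accuracy*100:.2f}%")
print(f"Baseline model test accuracy : {baseline_accuracy*100:.2f}%")
```

When the grid search lands on the defaults, that is still a useful result: it tells you the defaults were already a good recipe for this dataset, and that a wider or different grid would be needed to find an improvement.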
