You work in the telecommunications industry and are tasked with finding the best machine learning model to predict customer churn. After some research, you decide to run hyperparameter tuning with grid search on a Random Forest Classifier, an SVC, and a Gradient Boosting Classifier, trying different values for each model's parameters to find the best model along with its optimal parameters and accuracy score.

In [13]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.svm import SVC
import numpy as np

Using the imported "make_classification" function, create a synthetic classification dataset with random_state=0.

In [14]:
X, y = make_classification(random_state=0)
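
Before tuning anything, it can help to check what the generated dataset looks like. The sketch below assumes the default arguments of make_classification, which produce 100 samples with 20 features and two classes; the print statements are illustrative additions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same call as in the notebook: defaults plus a fixed seed
X, y = make_classification(random_state=0)

# Shape of the feature matrix: (n_samples, n_features)
print("Feature matrix shape:", X.shape)

# Number of samples in each class (index 0 is class 0, index 1 is class 1)
print("Samples per class:", np.bincount(y))
```

With only 100 samples, the test set created later is small, so test accuracy moves in coarse steps.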

Now that the dataset is created and stored, let’s define the hyperparameter grids for the models.

In [15]:
param_grid_rf = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

param_grid_svm = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': [0.1, 1, 10]
}

param_grid_gb = {
    'n_estimators': [50, 100, 200],
    'learning_rate': [0.01, 0.1, 1],
    'max_depth': [3, 4, 5]
}
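
One detail about the SVM grid: gamma has no effect when the kernel is linear, so pairing every gamma value with kernel='linear' repeats the same fit. A possible alternative, sketched here under the assumption that the same value ranges are wanted, is to pass a list of dictionaries so that gamma is searched only for the RBF kernel. The name param_grid_svm_split is hypothetical; ParameterGrid is used only to count the combinations.

```python
from sklearn.model_selection import ParameterGrid

# Grid as defined in the notebook: every gamma is combined with both kernels
param_grid_svm = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': [0.1, 1, 10]
}

# Hypothetical alternative: one sub-grid per kernel, gamma only for 'rbf'
param_grid_svm_split = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10]},
]

n_original = len(ParameterGrid(param_grid_svm))
n_split = len(ParameterGrid(param_grid_svm_split))

print("Combinations in the original grid:", n_original)
print("Combinations in the split grid:", n_split)
```

GridSearchCV accepts either form as its param_grid argument, so the split grid could be swapped in without changing the rest of the code.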

With the parameter grids defined, let’s split the dataset into training and test sets, holding out 20% of the samples for testing.

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
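
The split above does not pass random_state, so it changes on every run and the scores reported later will not be exactly reproducible. A minimal sketch of a repeatable variant follows; random_state=0 and stratify=y are assumptions added for illustration, and the results shown in this notebook were not produced with them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)

# random_state fixes the shuffle; stratify=y keeps the class ratio
# similar in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
```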

With the dataset split into training and test sets, initialize all the models and store each one in a variable.

In [17]:
rf_model = RandomForestClassifier()
svm_model = SVC()
gb_model = GradientBoostingClassifier()
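
Random forests and gradient boosting both involve randomness during training, so two fits of the same model can differ slightly unless a seed is set. The sketch below shows that passing random_state makes repeated fits agree; the seed value 0, n_estimators=50, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(random_state=0)

# Two random forests with the same seed, fitted on the same data
rf_a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
same_rf = np.allclose(rf_a.predict_proba(X), rf_b.predict_proba(X))

# Same check for gradient boosting
gb_a = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
gb_b = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
same_gb = np.allclose(gb_a.predict_proba(X), gb_b.predict_proba(X))

print("Random forest fits agree:", same_rf)
print("Gradient boosting fits agree:", same_gb)
```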

Before we perform the grid search with cross-validation, let’s create a dictionary that maps each model name to its estimator and parameter grid.

In [18]:
models = {
    'Random Forest': (rf_model, param_grid_rf),
    'SVM': (svm_model, param_grid_svm),
    'Gradient Boosting': (gb_model, param_grid_gb)
}

As the final step, we can run GridSearchCV for each model and print the results.

In [19]:
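# Perform Grid Search CV for each model
for model_name, (model, param_grid) in models.items():
    # 5-fold cross-validated search over the grid, scored by accuracy
    grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
    grid_search.fit(X_train, y_train)

    best_params = grid_search.best_params_
    # best_estimator_ is refit on the full training set with best_params
    best_model = grid_search.best_estimator_

    # Accuracy of the tuned model on the held-out test set
    test_score = best_model.score(X_test, y_test)

    print(f"Model: {model_name}")
    print("Best Hyperparameters:", best_params)
    print("Test Accuracy:", test_score)

# Runs once after the loop, so it prints the tuned estimator of the last
# model in the dictionary (Gradient Boosting), not an overall winner.
print("Best model:", best_model)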

Model: Random Forest
Best Hyperparameters: {'max_depth': None, 'min_samples_split': 10, 'n_estimators': 50}
Test Accuracy: 0.95
Model: SVM
Best Hyperparameters: {'C': 0.1, 'gamma': 0.1, 'kernel': 'linear'}
Test Accuracy: 0.85
Model: Gradient Boosting
Best Hyperparameters: {'learning_rate': 1, 'max_depth': 3, 'n_estimators': 100}
Test Accuracy: 0.95
Best model: GradientBoostingClassifier(learning_rate=1)
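
The "Best model" line shows the tuned estimator from the last pass through the loop, not a comparison across models. One way to pick an overall winner, sketched below, is to compare each search's best_score_ (the mean cross-validated accuracy) and keep the highest, leaving the test set for a final check only. The grids are deliberately small so the example runs quickly, and the names candidates, overall_name, and overall_search, as well as the seeds, are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative grids, not the full grids from the notebook
candidates = {
    'Random Forest': (RandomForestClassifier(random_state=0),
                      {'n_estimators': [50], 'max_depth': [None, 10]}),
    'SVM': (SVC(), {'C': [0.1, 1, 10], 'kernel': ['linear']}),
}

overall_name, overall_search = None, None
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring='accuracy')
    search.fit(X_train, y_train)
    print(f"{name}: mean CV accuracy = {search.best_score_:.3f}")

    # Keep the search with the highest cross-validated accuracy so far
    if overall_search is None or search.best_score_ > overall_search.best_score_:
        overall_name, overall_search = name, search

print("Overall best model:", overall_name)
print("Best hyperparameters:", overall_search.best_params_)
# The test set is touched once, after the model has been chosen
print("Test accuracy:", overall_search.score(X_test, y_test))
```

Selecting on the cross-validation score rather than on test accuracy keeps the test estimate from being biased by the selection itself.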
