Random Forest Hyperparameter Tuning using Sklearn

In [1]:
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

data = pd.read_csv("/content/heart.csv")
data.head(7)

X = data.drop("target", axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",
    max_depth=6,
    max_leaf_nodes=6
)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print(classification_report(y_pred, y_test))

              precision    recall  f1-score   support

           0       0.75      0.90      0.82       110
           1       0.91      0.78      0.84       147

    accuracy                           0.83       257
   macro avg       0.83      0.84      0.83       257
weighted avg       0.84      0.83      0.83       257



. Hyperparameter Tuning using GridSearchCV

In [2]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2],
    'bootstrap': [True, False]
}

grid_search = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Best Estimator:", grid_search.best_estimator_)

Best Parameters: {'bootstrap': False, 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200}
Best Estimator: RandomForestClassifier(bootstrap=False, n_estimators=200)


1.)param_grid: A dictionary containing hyperparameters and their possible values. GridSearchCV will try every combination of these values to find the best-performing set of hyperparameters.
2.)grid_search.fit(X_train, y_train): This trains the model on the training data (X_train, y_train) for every combination of hyperparameters defined in param_grid.
3.)grid_search.best_estimator_: After completing the grid search, this will print the RandomForest model that has the best combination of hyperparameters from the search.

In [3]:
#updating the model
model_grid = RandomForestClassifier(max_depth=3,
                                    max_features="log2",
                                    max_leaf_nodes=3,
                                    n_estimators=50)
model_grid.fit(X_train, y_train)
y_pred_grid = model.predict(X_test)
print(classification_report(y_pred_grid, y_test))

              precision    recall  f1-score   support

           0       0.75      0.90      0.82       110
           1       0.91      0.78      0.84       147

    accuracy                           0.83       257
   macro avg       0.83      0.84      0.83       257
weighted avg       0.84      0.83      0.83       257



 Hyperparameter Tuning using RandomizedSearchCV

In [4]:
random_search = RandomizedSearchCV(RandomForestClassifier(),
                                   param_grid)
random_search.fit(X_train, y_train)
print(random_search.best_estimator_)

RandomForestClassifier(bootstrap=False, max_depth=10)


param_grid specifies the hyperparameters that you want to tune similar to the grid in GridSearchCV.
fit(X_train, y_train) trains the model using the training data.
best_estimator_ shows the model with the best combination of hyperparameters found by the search process.

In [5]:
#updating model
model_random = RandomForestClassifier(max_depth=3,
                                      max_features='log2',
                                      max_leaf_nodes=6,
                                      n_estimators=100)
model_random.fit(X_train, y_train)
y_pred_rand = model.predict(X_test)
print(classification_report(y_pred_rand, y_test))

              precision    recall  f1-score   support

           0       0.75      0.90      0.82       110
           1       0.91      0.78      0.84       147

    accuracy                           0.83       257
   macro avg       0.83      0.84      0.83       257
weighted avg       0.84      0.83      0.83       257

