# Finding Optimal Training Parameters Using Grid Search

When you work with classifiers, you rarely know the best parameter values in advance, and checking every possible combination by hand is impractical. This is where grid search becomes useful. You specify a set of candidate values for each parameter, and grid search trains and cross-validates the classifier on every combination to find the one that scores best on a chosen metric.
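To make "every combination" concrete, here is a minimal sketch that expands a parameter grid using only the standard library. It uses the same grid as the cell further down. The helper name `expand_grid` is hypothetical and is not part of scikit-learn, which performs this expansion internally.

```python
from itertools import product

# Same grid as the one used in this notebook: a list of two sub-grids
parameter_grid = [
    {'n_estimators': [100], 'max_depth': [2, 4, 7, 12, 16]},
    {'max_depth': [4], 'n_estimators': [25, 50, 100, 250]},
]


def expand_grid(grid):
    """Return every parameter combination described by a list of sub-grids.

    Hypothetical helper for illustration only.
    """
    combinations = []
    for sub_grid in grid:
        names = sorted(sub_grid)
        # Cartesian product of the candidate values within one sub-grid
        for values in product(*(sub_grid[name] for name in names)):
            combinations.append(dict(zip(names, values)))
    return combinations


combinations = expand_grid(parameter_grid)
for params in combinations:
    print(params)
print('Total candidates:', len(combinations))
```

Each sub-grid is expanded separately, so this grid yields 5 + 4 = 9 candidates rather than the full 5 x 4 product. The pair `max_depth=4, n_estimators=100` appears in both sub-grids and is therefore listed twice.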

In [5]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import ExtraTreesClassifier

In [23]:
# Load input data
input_file = 'data_random_forests.txt'
data = np.loadtxt(input_file, delimiter=',')
X, y = data[:, :-1], data[:, -1]

# Separate input data into three classes based on labels
class_0 = np.array(X[y==0])
class_1 = np.array(X[y==1])
class_2 = np.array(X[y==2])

# Split the data into training and testing datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)

# Define the parameter grid
parameter_grid = [{'n_estimators': [100], 
                   'max_depth': [2, 4, 7, 12, 16]},
                  {'max_depth': [4],
                   'n_estimators': [25, 50, 100, 250]}]

# Scoring metrics to optimize; a separate grid search is run for each one
metrics = ['precision_weighted', 'recall_weighted']

for metric in metrics:
    classifier = GridSearchCV(ExtraTreesClassifier(random_state=0),
                              parameter_grid, 
                              cv=5,
                              scoring=metric)
    classifier.fit(X_train, y_train)
    
    # Optionally print the mean cross-validated score for each parameter combination
    # print('Grid scores for the parameter grid:')
    # results = classifier.cv_results_
    # for params, score in zip(results['params'], results['mean_test_score']):
    #     print('{} --> {:.3f}'.format(params, score))
    
    # Print the performance report on the test set
    # (predict() uses the best estimator found by the search)
    y_pred = classifier.predict(X_test)
    print('Performance report:')
    print(classification_report(y_test, y_pred))
    

Performance report:
             precision    recall  f1-score   support

        0.0       0.94      0.81      0.87        79
        1.0       0.81      0.86      0.83        70
        2.0       0.83      0.91      0.87        76

avg / total       0.86      0.86      0.86       225

Performance report:
             precision    recall  f1-score   support

        0.0       0.93      0.84      0.88        79
        1.0       0.85      0.86      0.85        70
        2.0       0.84      0.92      0.88        76

avg / total       0.87      0.87      0.87       225
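The two reports above show only how the winning configuration for each metric performs on the test set. They do not show which parameters won. The sketch below shows one way to inspect a fitted search through its `cv_results_`, `best_params_`, and `best_score_` attributes. It assumes synthetic data from `make_classification` as a stand-in for `data_random_forests.txt`, so its scores will not match the reports above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic three-class data stands in for data_random_forests.txt
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=5)

parameter_grid = [
    {'n_estimators': [100], 'max_depth': [2, 4, 7, 12, 16]},
    {'max_depth': [4], 'n_estimators': [25, 50, 100, 250]},
]

search = GridSearchCV(ExtraTreesClassifier(random_state=0),
                      parameter_grid,
                      cv=5,
                      scoring='precision_weighted')
search.fit(X_train, y_train)

# cv_results_ is a dict of parallel arrays, one entry per candidate
results = search.cv_results_
for params, mean_score in zip(results['params'], results['mean_test_score']):
    print('{} --> {:.3f}'.format(params, mean_score))

# The winning combination and its mean cross-validated score
print('Best parameters:', search.best_params_)
print('Best cross-validated score: {:.3f}'.format(search.best_score_))
```

Because `GridSearchCV` refits the best candidate on the full training set by default (`refit=True`), calling `search.predict(X_test)` afterwards uses that refitted model.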

