# Grid Search

Each model takes its hyperparameters as constructor arguments.  We could tune them by hand, but the number of possible combinations grows quickly, so trying them manually isn't practical.  Instead we use grid search: we supply a list of candidate values for each parameter, and the search evaluates every combination in that grid and reports the one that maximizes the chosen score.
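
To make the idea concrete, here is a minimal sketch of what "trying every combination" means, using only the standard library. The grid values and the `toy_score` function are hypothetical stand-ins chosen for illustration; a real search would train and cross-validate a model for each candidate instead.

```python
from itertools import product

# Hypothetical grid, for illustration only.
grid = {
    "n_estimators": [10, 50],
    "max_depth": [2, 4, 8],
}


def toy_score(params):
    """Stand-in for a cross-validated score; not a real model evaluation."""
    return -abs(params["max_depth"] - 4) - abs(params["n_estimators"] - 50) / 100


# The candidates are the Cartesian product of the per-parameter value lists.
names = sorted(grid)
candidates = [
    dict(zip(names, values))
    for values in product(*(grid[name] for name in names))
]

# Grid search scores every candidate and keeps the best one.
best = max(candidates, key=toy_score)

print(len(candidates))
print(best)
```

The number of candidates is the product of the list lengths (here 2 × 3 = 6), which is why adding parameters or values makes the search expensive quickly.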

In [44]:
import numpy
from matplotlib import pyplot
from sklearn.metrics import classification_report
from sklearn import model_selection
from sklearn.ensemble import ExtraTreesClassifier

import sys

sys.path.append("../")

from common import util

## Loading the Data

In [45]:
data = numpy.loadtxt("data_random_forests.txt", delimiter=",")

# X: 2D coordinates
# Y: The integers 0, 1, or 2
X, Y = data[:, :-1], data[:, -1]

# The data set, grouped by class
classes = (X[Y == 0], X[Y == 1], X[Y == 2])

## Splitting the Data

In [46]:
split = model_selection.train_test_split(X, Y, test_size=0.25, random_state=5)

feature_train, feature_test = split[0], split[1]
class_train, class_test = split[2], split[3]

## The Parameter Grid

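In practice, we usually hold some parameters fixed and vary the rest; here `random_state` stays fixed while `n_estimators` and `max_depth` vary.  We also pick a metric or two that we want to maximize, and the search ranks every combination by that metric.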
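
The metrics used in the next cell are the weighted variants of precision and recall. As a rough sketch of what "weighted" means, the function below computes precision per class and averages the results, weighting each class by its share of the true labels. The helper name `weighted_precision` and the sample labels are made up for illustration; this is meant to mirror the idea behind the `precision_weighted` scorer, not to reproduce the library's implementation.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        # A class that is never predicted contributes a precision of 0 here.
        precision = correct / predicted if predicted else 0.0
        score += (count / total) * precision
    return score


# Made-up labels for three classes, two samples each.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

result = weighted_precision(y_true, y_pred)
print(round(result, 5))
```

Applying the same weighting to recall gives the fraction of samples classified correctly, so weighted recall coincides with plain accuracy.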

In [83]:
parameters = {
    "n_estimators": [25, 50, 100, 250],
    "max_depth": [2, 4, 7, 12, 16]
}

# The metrics we want to maximize
metrics = ['precision_weighted', 'recall_weighted']

## Using Grid Search

In [84]:
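for metric in metrics:
    print("{0}:".format(metric))

    # Search the grid with 5-fold cross-validation, scoring each candidate by this metric
    extreme_forest = ExtraTreesClassifier(random_state=0)
    classifier = model_selection.GridSearchCV(extreme_forest, parameters, cv=5, scoring=metric, n_jobs=-1)
    classifier.fit(feature_train, class_train)

    # cv_results_ replaces grid_scores_, which was removed in scikit-learn 0.20
    results = classifier.cv_results_

    for params, average in zip(results["params"], results["mean_test_score"]):
        best = " (BEST)" if params == classifier.best_params_ else ""
        print("\tParams: {0}\t--> {1:.5f}{2}".format(params, average, best))

    # Evaluate the best model (refit on the full training set) on the held-out test data
    predictions = classifier.predict(feature_test)

    print()
    print(classification_report(class_test, predictions))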


precision_weighted:
	Params: {'max_depth': 2, 'n_estimators': 25}	--> 0.83814
	Params: {'max_depth': 2, 'n_estimators': 50}	--> 0.84481
	Params: {'max_depth': 2, 'n_estimators': 100}	--> 0.84684
	Params: {'max_depth': 2, 'n_estimators': 250}	--> 0.84806
	Params: {'max_depth': 4, 'n_estimators': 25}	--> 0.84605
	Params: {'max_depth': 4, 'n_estimators': 50}	--> 0.84002
	Params: {'max_depth': 4, 'n_estimators': 100}	--> 0.84130
	Params: {'max_depth': 4, 'n_estimators': 250}	--> 0.84486
	Params: {'max_depth': 7, 'n_estimators': 25}	--> 0.84291
	Params: {'max_depth': 7, 'n_estimators': 50}	--> 0.84058
	Params: {'max_depth': 7, 'n_estimators': 100}	--> 0.84397
	Params: {'max_depth': 7, 'n_estimators': 250}	--> 0.84847 (BEST)
	Params: {'max_depth': 12, 'n_estimators': 25}	--> 0.83151
	Params: {'max_depth': 12, 'n_estimators': 50}	--> 0.82906
	Params: {'max_depth': 12, 'n_estimators': 100}	--> 0.83634
	Params: {'max_depth': 12, 'n_estimators': 250}	--> 0.82754
	Params: {'max_depth': 16, 'n_est