# Scikit Learn - Parameter Selection

We can optimize our choice of parameters for a model by comparing the results using different metrics. One of the most intuitive metrics to use is the loss value, which we want to minimize.

GridSearchCV is a technique that trains a classifier on every combination of parameter values in a grid, scores each combination with cross-validation, and stores the best parameters and model according to the score of our choice (e.g. a loss function).
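
As a rough illustration of scoring by loss, the sketch below runs GridSearchCV with `scoring="neg_log_loss"`. The synthetic data from `make_classification`, the tiny grid, and the names `X_demo`, `y_demo`, `demo_grid` and `demo_search` are assumptions made for this example only; they are not part of the Kershaw dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class data standing in for a real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# A deliberately small grid so the example runs quickly.
demo_grid = {"hidden_layer_sizes": [(5,), (20,)], "alpha": [0.001, 0.01]}

# scikit-learn maximizes scores, so the log loss is exposed as its negative.
demo_search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid=demo_grid,
    scoring="neg_log_loss",
    cv=3,
)
demo_search.fit(X_demo, y_demo)

print("best parameters:", demo_search.best_params_)
print("best mean log loss:", -demo_search.best_score_)
```

Because the scorer is negated, the highest score corresponds to the lowest loss; leaving `scoring` unset falls back to the estimator's own `score` method, which is accuracy for classifiers.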

Let us explore the same Clayton Kershaw data we saw for multi-classification.

In [1]:
# import useful tools
import csv
import pickle
from sklearn import metrics
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import joblib  # sklearn.externals.joblib was removed from newer scikit-learn releases
from sklearn.model_selection import cross_val_score

In [2]:
# Read the pitches thrown by Clayton Kershaw.
csv_file = 'data/ClaytonKershaw.csv'

# Store each instance (feature row) and its target value.
instances = []
target = []
with open(csv_file, "r") as file:
    reader = csv.reader(file)
    header = next(reader)  # the first row holds the column names
    for row in reader:
        # the first column is the target; the remaining columns are features
        target.append(int(row[0]))
        instances.append([int(col) for col in row[1:]])
data = [instances, target]

In [3]:
# Split X and y values.
X = data[0]
y = data[1]
n_samples = len(X)

# split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
clf = MLPClassifier()
clf.fit(X_train, y_train)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

In [4]:
# Evaluation using metrics
actual = y_test
predicted = clf.predict(X_test)
print("Classification report for classifier %s:\n%s\n"
  % (clf, metrics.classification_report(actual, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(actual, predicted))

Classification report for classifier MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False):
             precision    recall  f1-score   support

          1       0.65      0.83      0.73      1130
          7       0.41      0.25      0.31       564
         10       0.38      0.23      0.29       268

avg / total       0.54      0.58      0.55      1962


Confusion matrix:
[[943 133  54]
 [375 141  48]
 [132  74  62]]


## Idea of iterations (naive Box's Loop)

We want to improve on the result above by fitting the data with a different model.

The choice of our best model results from an iterative method in which we:

(1) select a model
(2) fit a model
(3) evaluate a model
(4) change parameters for a model and repeat the process
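
The four steps above can be sketched by hand as a plain loop. This is a minimal illustration, not the method used in the rest of this notebook: the synthetic data, the single held-out validation split, and the names `X_fit`, `X_val`, `best_score`, `best_params` and `best_model` are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class data standing in for a real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

best_score, best_params, best_model = -1.0, None, None
candidates = ParameterGrid({"hidden_layer_sizes": [(5,), (20,)], "alpha": [0.001, 0.01]})
for params in candidates:
    model = MLPClassifier(max_iter=300, random_state=0, **params)  # (1) select
    model.fit(X_fit, y_fit)                                        # (2) fit
    score = model.score(X_val, y_val)                              # (3) evaluate (accuracy)
    if score > best_score:                                         # keep the best so far
        best_score, best_params, best_model = score, params, model
    # (4) the loop moves on to the next parameter combination

print(best_params, best_score)
```

One difference worth noting: this loop evaluates each candidate on a single validation split, whereas GridSearchCV averages the score over several cross-validation folds.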

The following technique goes through these iteration steps and stores the best model.

In [5]:
# we can define a 'grid' of parameters to iteratively test different models
# note that every combination is tried, so each added parameter multiplies the number of models to fit

param_grid = {'hidden_layer_sizes': [(10, ), (25, ), (50, ), (75, ), (100, )],
                    'activation' : ['relu', 'logistic'],
                    'solver' : ['lbfgs', 'sgd', 'adam'],
                    'alpha' : [0.001, 0.005, 0.01],
                    'batch_size' : [200,400,600],                    
                    }
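
To get a feel for the cost before launching the search, the number of candidate models can be counted with `ParameterGrid`. The sketch repeats the grid from the cell above so that it runs on its own; the fold count of 5 is an assumption (it is the default in recent scikit-learn releases, while older releases used 3).

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above, repeated so this snippet is self-contained.
param_grid = {
    'hidden_layer_sizes': [(10,), (25,), (50,), (75,), (100,)],
    'activation': ['relu', 'logistic'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    'alpha': [0.001, 0.005, 0.01],
    'batch_size': [200, 400, 600],
}

# One candidate per combination: 5 * 2 * 3 * 3 * 3 of them.
n_candidates = len(ParameterGrid(param_grid))

cv_folds = 5  # assumption: default fold count in recent scikit-learn releases
n_fits = n_candidates * cv_folds  # each candidate is fitted once per fold

print(n_candidates, "candidates,", n_fits, "fits during the search")
```

Adding one more parameter with three values would triple both numbers, which is why grids are usually kept small.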

In [6]:
# perform GridSearchCV using the parameter grid (ignore convergence warnings for now)
grid_search = GridSearchCV(clf, param_grid=param_grid)
grid_search.fit(X_train, y_train)
best_clf = grid_search.best_estimator_
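
After fitting, the search object exposes more than `best_estimator_`. The sketch below, on synthetic data with a deliberately small grid (both assumptions, as are the names `small_grid` and `search`), shows one way to silence convergence warnings during the search and to list the mean cross-validated score of every candidate.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class data standing in for a real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

small_grid = {"hidden_layer_sizes": [(5,), (20,)], "solver": ["lbfgs", "adam"]}
search = GridSearchCV(
    MLPClassifier(max_iter=50, random_state=0), param_grid=small_grid, cv=3
)

# A low max_iter tends to trigger ConvergenceWarning; hide it for this block only.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X_demo, y_demo)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", search.best_score_)

# cv_results_ holds one entry per candidate, in the order they were tried.
for params, mean_score in zip(
    search.cv_results_["params"], search.cv_results_["mean_test_score"]
):
    print(params, round(mean_score, 3))
```

Looking at the full table of scores, rather than only the winner, helps to judge whether the best candidate is clearly ahead or within noise of the others.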



In [9]:
# Evaluate the best selected model and compare with the results above.
# Note that the improvement may be negligible if the chosen hypothesis class is a poor fit for the problem.
# Model selection should start from a thorough analysis of the problem, before any parameters are tuned.

actual = y_test
predicted = best_clf.predict(X_test)
print("Classification report for classifier %s:\n%s\n"
  % (best_clf, metrics.classification_report(actual, predicted)))
# confusion matrix
print("Confusion matrix:\n%s" % metrics.confusion_matrix(actual, predicted))

Classification report for classifier MLPClassifier(activation='relu', alpha=0.005, batch_size=200, beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(50,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False):
             precision    recall  f1-score   support

          1       0.67      0.79      0.73      1130
          7       0.41      0.29      0.34       564
         10       0.37      0.31      0.34       268

avg / total       0.55      0.58      0.56      1962


Confusion matrix:
[[895 157  78]
 [336 163  65]
 [103  81  84]]
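
To compare the two runs with single numbers, overall accuracy and macro-averaged recall can be recomputed from the two confusion matrices printed above. The helper names `accuracy` and `macro_recall` are made up for this sketch; only the matrix values come from the outputs in this notebook.

```python
import numpy as np

# Confusion matrices copied from the outputs above (rows: actual, columns: predicted).
cm_default = np.array([[943, 133, 54], [375, 141, 48], [132, 74, 62]])
cm_tuned = np.array([[895, 157, 78], [336, 163, 65], [103, 81, 84]])

def accuracy(cm):
    """Fraction of all samples on the diagonal (correct predictions)."""
    return np.trace(cm) / cm.sum()

def macro_recall(cm):
    """Unweighted mean of the per-class recall (diagonal / row total)."""
    return np.mean(np.diag(cm) / cm.sum(axis=1))

for name, cm in [("default", cm_default), ("tuned", cm_tuned)]:
    print(name, round(accuracy(cm), 3), round(macro_recall(cm), 3))
```

On these numbers the overall accuracy is essentially unchanged (about 0.584 for the default model against 0.582 for the tuned one), while the macro-averaged recall rises from about 0.439 to 0.465: the tuned model gives up some correct predictions on class 1 in exchange for better recall on the two rarer classes.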
