# Tuning models

In this notebook we show how the tuning library can be called for any predictor implementing our `fit` and `evaluate` methods.
The tuning process is controlled by a number of settings, most importantly the set of parameters to explore and the evaluation method. The latter defaults to a train-test split to save time (grid search is computationally expensive). However, `method='CV'` is likely to give more robust results.
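
To make the difference between the two evaluation methods concrete, here is a minimal sketch in plain Python. The exact signatures of `fit` and `evaluate` in this project are not shown in this notebook, so the ones used here are assumptions, and `MeanPredictor`, `holdout_score` and `cv_score` are hypothetical names introduced only for illustration.

```python
import random


class MeanPredictor:
    """Toy predictor (hypothetical) exposing `fit` and `evaluate`."""

    def __init__(self, shrink=1.0):
        self.shrink = shrink
        self.mean_ = None

    def fit(self, xs, ys):
        # Ignore the features and remember a (shrunken) mean of the targets.
        self.mean_ = self.shrink * sum(ys) / len(ys)
        return self

    def evaluate(self, xs, ys):
        # Mean squared error on the given data; lower is better.
        return sum((y - self.mean_) ** 2 for y in ys) / len(ys)


def _take(values, indices):
    return [values[i] for i in indices]


def holdout_score(model_cls, params, xs, ys, test_fraction=0.25, seed=0):
    """Score one parameter set with a single train-test split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train_idx, test_idx = idx[:cut], idx[cut:]
    model = model_cls(**params).fit(_take(xs, train_idx), _take(ys, train_idx))
    return model.evaluate(_take(xs, test_idx), _take(ys, test_idx))


def cv_score(model_cls, params, xs, ys, n_folds=4, seed=0):
    """Score one parameter set as the average over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = model_cls(**params).fit(_take(xs, train_idx), _take(ys, train_idx))
        scores.append(model.evaluate(_take(xs, test_idx), _take(ys, test_idx)))
    return sum(scores) / len(scores)


xs = list(range(20))
ys = [float(i % 2) for i in xs]
single_split = holdout_score(MeanPredictor, {"shrink": 1.0}, xs, ys)
cross_validated = cv_score(MeanPredictor, {"shrink": 1.0}, xs, ys)
```

The cross-validated score fits the model `n_folds` times instead of once, which is why it costs more time but depends less on one particular split.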

In [None]:
import pandas as pd

import sys
sys.path.append('..')

from linear_predictor import LogisticPredictor, SVMPredictor
from tuning import tune, bayesian_optimization
from utils import TAGS
from preprocessing import tf_idf

In [None]:
train = pd.read_csv("../data/train.csv")
test = pd.read_csv("../data/test.csv")

# Preprocess raw text data
train_ys = {tag: train[tag].values for tag in TAGS}
train_x, test_x = tf_idf(train, test)

# Define output file
write_to = '../data/tuning.txt'

### Let's run the tuner using sample parameter choices

We can optionally persist our results to a file for later inspection. This way we will not have to check the same parameter sets again and again.

In [None]:
param_grid = {
    'C': [4, 5],
    'dual': [True, False]
}

best_params, best_score = tune(LogisticPredictor, train_x, train_ys, param_grid, silent=False, persist=False, write_to=write_to)
print("Optimal parameters achieve log loss = {}".format(best_score))
print("Optimal 'C': {}\nOptimal 'dual': {}".format(best_params['C'], best_params['dual']))
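
The idea of persisting results so that a parameter set is never evaluated twice can be sketched as follows. This is not the format that `tune` actually writes to `write_to`; the JSON-lines layout and the helper names `load_results` and `evaluate_cached` are assumptions made for this example.

```python
import json
import os
import tempfile


def load_results(path):
    """Read previously stored scores, keyed by a canonical form of the parameters."""
    results = {}
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line:
                    record = json.loads(line)
                    key = json.dumps(record["params"], sort_keys=True)
                    results[key] = record["score"]
    return results


def evaluate_cached(params, score_fn, path):
    """Return (score, was_cached); call score_fn only for unseen parameter sets."""
    key = json.dumps(params, sort_keys=True)
    cached = load_results(path)
    if key in cached:
        return cached[key], True
    score = score_fn(params)
    with open(path, "a") as fh:
        fh.write(json.dumps({"params": params, "score": score}) + "\n")
    return score, False


# Small demonstration with a stand-in scoring function (hypothetical).
calls = []


def fake_score(params):
    calls.append(params)
    return 0.5


demo_path = os.path.join(tempfile.mkdtemp(), "tuning.jsonl")
first = evaluate_cached({"C": 4, "dual": True}, fake_score, demo_path)
second = evaluate_cached({"dual": True, "C": 4}, fake_score, demo_path)
```

Sorting the keys before comparing means that the same parameter set is recognised regardless of the order in which its entries were written.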

### Comparing linear predictors

How do our classifiers perform in comparison to each other? Let's try them both with minimal tuning.

In [None]:
param_grid = {
    'C': [5, 6],
    'dual': [True, False]
}

best_params, best_score = tune(SVMPredictor, train_x, train_ys, param_grid, silent=False, persist=True, write_to=write_to)
print("Optimal parameters achieve log loss = {}".format(best_score))
print("Optimal 'C': {}\nOptimal 'dual': {}".format(best_params['C'], best_params['dual']))

*The linear SVM predictor seems to outperform logistic regression: it is not only faster but also achieves a better log loss. **This could change with proper tuning, however.***

### Bayesian Optimization

The tuning library contains an implementation of Bayesian Optimization using the GPyOpt package. The inputs are similar to those above, with the following additions:

1. The parameter bounds are specified as a list of dictionaries, where each dict specifies the name, the type ("continuous" or "discrete"), and the domain of the variable. Note that continuous variables must be specified before discrete ones. The `params` list in the cell below is an example.

2. The maximum number of iterations or the maximum allowed time needs to be specified.

3. The model type is either a Gaussian Process ('GP'), ...,

4. The acquisition function determines the next set of parameters to be evaluated.

5. The acquisition weight defines the balance between exploration and exploitation.
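
To illustrate how a weight can trade exploration against exploitation, here is a sketch of one common acquisition rule, the lower confidence bound (for a score that is minimised). Whether the tuning library's `acquisition_weight` maps onto exactly this formula is an assumption; the candidate values and the helper names are invented for the example.

```python
def lower_confidence_bound(mean, std, weight):
    """Optimistic estimate of the loss: predicted mean minus weighted uncertainty."""
    return mean - weight * std


def pick_next(candidates, weight):
    """Return the 'C' value of the candidate with the smallest lower confidence bound."""
    best = min(
        candidates,
        key=lambda c: lower_confidence_bound(c["mean"], c["std"], weight),
    )
    return best["C"]


# Hypothetical surrogate-model predictions for three values of C.
candidates = [
    {"C": 1.0, "mean": 0.30, "std": 0.01},  # well explored, good predicted loss
    {"C": 3.0, "mean": 0.32, "std": 0.05},
    {"C": 5.5, "mean": 0.40, "std": 0.20},  # poorly explored, uncertain
]

exploit_choice = pick_next(candidates, weight=0)  # trusts the predicted mean only
explore_choice = pick_next(candidates, weight=2)  # rewards uncertain regions
```

With a weight of zero the rule picks the point with the best predicted loss; a larger weight pulls the search towards regions the model knows little about.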

Let's test the tuner!

In [None]:
params = [{"name": "C", "type":"continuous", "domain": (0.1, 6.0)},
          {"name": "dual", "type":"discrete", "domain": [False, True]}]

best_params, best_score = bayesian_optimization(
    SVMPredictor, train_x, train_ys, params,
    model_type='GP', acquisition_type='EI', acquisition_weight=2,
    max_iter=10, max_time=None,
    silent=True, persist=False, write_to=write_to)

print("Optimal parameters achieve log loss = {}".format(best_score))
print("Optimal 'C': {}\nOptimal 'dual': {}".format(best_params['C'], best_params['dual']))

As you can see, Bayesian Optimization lets us easily try a wider range of settings at a lower time cost. In fact, we can now evaluate continuous variables on a continuous scale, rather than choosing from a fixed set of values.
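
The difference between a fixed grid and a continuous domain can be shown with a small sketch. Note that the second half uses plain random sampling as a stand-in, not Bayesian Optimization itself; it only illustrates that any value inside the bounds can be proposed. The variable names are hypothetical.

```python
import itertools
import random

# A grid can only ever propose the listed values: 2 x 2 = 4 combinations.
param_grid = {"C": [4, 5], "dual": [True, False]}
grid_points = [
    dict(zip(param_grid, values))
    for values in itertools.product(*param_grid.values())
]

# A continuous domain allows any value of C between the bounds.
rng = random.Random(0)
low, high = 0.1, 6.0
continuous_points = [
    {"C": rng.uniform(low, high), "dual": rng.choice([False, True])}
    for _ in range(4)
]
```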

Note that, at the moment, no parallel processing is implemented for Bayesian Optimization. Adding it would greatly improve the efficiency of this approach.