# Tuning models

In this notebook we show how our tuning module can be used with any predictor that implements our `fit` and `evaluate` methods.
The tuning process can be configured through several settings, most importantly the set of parameters to explore and the evaluation method. The evaluation method defaults to a train-test split to keep run times manageable, since grid search is computationally expensive: every parameter combination has to be trained and scored. Cross-validation (`method='CV'`) is slower but likely to give more robust results.
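To make the moving parts concrete, here is a minimal sketch of a grid search over any object with `fit` and `evaluate` methods, supporting both a single train-test split and k-fold cross-validation. It is not the `tuning` module's implementation: `grid_search`, `split_indices`, `param_combinations` and `ToyPredictor` are hypothetical names, the labels are assumed to be a single sequence rather than the per-tag dict `train_ys` used below, and a lower score is assumed to be better (as with log loss).

```python
import itertools
import random
from statistics import mean
from typing import Any, Dict, List, Sequence, Tuple


def param_combinations(param_grid: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Expand e.g. {'C': [4, 5], 'dual': [True, False]} into all 4 combinations."""
    keys = list(param_grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*(param_grid[k] for k in keys))]


def split_indices(n: int, method: str, n_folds: int = 3, test_size: float = 0.2,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, val_idx) pairs: one pair for 'split', n_folds pairs for 'CV'."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    if method == 'split':
        cut = int(n * (1 - test_size))
        return [(idx[:cut], idx[cut:])]
    if method == 'CV':
        folds = [idx[i::n_folds] for i in range(n_folds)]
        return [([j for k, fold in enumerate(folds) if k != i for j in fold], folds[i])
                for i in range(n_folds)]
    raise ValueError(f"unknown method: {method!r}")


def grid_search(predictor_cls, x: Sequence, y: Sequence,
                param_grid: Dict[str, Sequence[Any]], method: str = 'split'):
    """Fit and evaluate every parameter combination; keep the lowest mean score."""
    best_params, best_score = None, float('inf')
    for params in param_combinations(param_grid):
        scores = []
        for train_idx, val_idx in split_indices(len(x), method):
            model = predictor_cls(**params)  # fresh model per split
            model.fit([x[i] for i in train_idx], [y[i] for i in train_idx])
            scores.append(model.evaluate([x[i] for i in val_idx], [y[i] for i in val_idx]))
        score = mean(scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


class ToyPredictor:
    """Hypothetical predictor: predicts the training-label mean plus a fixed `shift`."""

    def __init__(self, shift: float = 0.0):
        self.shift = shift
        self.value = 0.0

    def fit(self, x, y):
        self.value = mean(y) + self.shift

    def evaluate(self, x, y):
        # Mean squared error of the constant prediction on the held-out labels
        return mean((self.value - t) ** 2 for t in y)


x = list(range(30))
y = [1.0] * 30
params, score = grid_search(ToyPredictor, x, y, {'shift': [-1.0, 0.0, 2.0]}, method='CV')
print(params, score)  # {'shift': 0.0} 0.0
```

With `method='CV'` each combination is fitted once per fold, which is why cross-validation costs roughly `n_folds` times as much as a single split.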

In [None]:
import pandas as pd

import sys
sys.path.append('..')

from linear_predictor import LogisticPredictor
from tuning import tune
from utils import TAGS
from preprocessing import tf_idf

train = pd.read_csv("../data/train.csv")
test = pd.read_csv("../data/test.csv")

# Collect one label vector per tag
train_ys = {tag: train[tag].values for tag in TAGS}
# Preprocess raw text data into TF-IDF features
train_x, test_x = tf_idf(train, test)

### Let's run the tuner using sample parameter choices

We can optionally persist our results to a file for later inspection. This way we will not have to evaluate the same parameter sets again and again.
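The idea behind persisting results can be sketched as a cache keyed by parameter set: before scoring a combination, look it up in the file, and only score (and append) the ones not seen before. The on-disk format `tune` actually uses for `../data/tuning.txt` is not shown in this notebook; this sketch assumes a JSON-lines file, and `param_key`, `load_results`, `tune_with_cache` and `fake_score` are hypothetical names.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Tuple


def param_key(params: Dict[str, Any]) -> str:
    """Canonical string so {'C': 4, 'dual': True} always maps to the same key."""
    return json.dumps(params, sort_keys=True)


def load_results(path: str) -> Dict[str, float]:
    """Read previously persisted scores; a missing file means nothing was tried yet."""
    results: Dict[str, float] = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    results[param_key(record['params'])] = record['score']
    return results


def tune_with_cache(candidates: List[Dict[str, Any]],
                    score_fn: Callable[[Dict[str, Any]], float],
                    path: str) -> Tuple[Dict[str, Any], float]:
    """Score each candidate (lower is better), reusing any score already stored in `path`."""
    cached = load_results(path)
    scores: Dict[str, float] = {}
    with open(path, 'a') as f:
        for params in candidates:
            key = param_key(params)
            if key not in cached:
                cached[key] = score_fn(params)  # only unseen parameter sets are scored
                f.write(json.dumps({'params': params, 'score': cached[key]}) + '\n')
            scores[key] = cached[key]
    best_key = min(scores, key=scores.get)
    return json.loads(best_key), scores[best_key]


calls = []

def fake_score(params):
    # Stand-in for an expensive fit/evaluate run; records each call
    calls.append(params)
    return abs(params['C'] - 4.2) + (0.1 if params['dual'] else 0.0)


grid = [{'C': c, 'dual': d} for c in (4, 5) for d in (True, False)]
path = os.path.join(tempfile.mkdtemp(), 'tuning.jsonl')
first = tune_with_cache(grid, fake_score, path)
second = tune_with_cache(grid, fake_score, path)
print(first[0], len(calls))  # the second run reads every score from the file
```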

In [None]:
write_to = '../data/tuning.txt'

param_grid = {
    'C': [4, 5],
    'dual': [True, False]
}

best_params, best_score = tune(
    LogisticPredictor, train_x, train_ys, param_grid,
    silent=False,
    persist=True,  # save evaluated parameter sets to `write_to`
    write_to=write_to,
)
print("Optimal parameters achieve log loss = {}".format(best_score))
print("Optimal 'C': {}\nOptimal 'dual': {}".format(best_params['C'], best_params['dual']))