# Hyperparameter Tuning for LinearSVC

This notebook provides code for tuning the hyperparameters of LinearSVC. I chose this model because it is the best-performing of the models I have looked at so far: LogisticRegression, NaiveBayes, and LinearSVC trained with word embeddings as features instead of one-hot representations. Here I check whether hyperparameter tuning yields an even better LinearSVC model. I use the expanded set of features for tuning.

In [None]:
import pickle

import feature_extraction_util as extract
import classification_util as classify
import evaluation_util as evaluate

#### The cell below contains the definitions needed to train the model and to classify the test sets with the pretrained model.

In [None]:
selected_features = ['Token', 'POS', 'Chunk', 'Allcaps', 'Cap_after_lower', 'Demonym', 'Comp_suf', 'Poss_mark']

train_file = '../data/conll2003.train.preprocessed.conll'
test_file = '../data/conll2003.test.preprocessed.conll'
outputfile = '../data/conll2003.test.output.conll' # generic pathname for saving results

## Extracting the features, training and saving the model

If the cell below has already been executed, it can be skipped: the trained model is loaded from disk in the next cell.

In [None]:
# extract features and labels
training_features, gold_labels = extract.features_and_labels(train_file,selected_features)

# create classifier
hypertuned_model, vec = classify.create_hypertuned_classifier(training_features,gold_labels)

# save trained model and vectorizer
classifier_pathname = '../models/hypertuned_svm_model_conll2003.sav'
vectorizer_pathname = '../models/hypertuned_svm_vec_conll2003.sav'

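# use context managers so the files are closed (and flushed) after writing
with open(classifier_pathname, 'wb') as model_out:
    pickle.dump(hypertuned_model, model_out)
with open(vectorizer_pathname, 'wb') as vec_out:
    pickle.dump(vec, vec_out)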
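
The helper `classify.create_hypertuned_classifier` is not shown in this notebook, so its internals are unknown here. As a rough illustration of what such a tuning step could look like, the sketch below runs a cross-validated grid search over the `C` parameter of `LinearSVC` on feature dictionaries vectorized with `DictVectorizer`. The function name `tune_linear_svc`, the parameter grid, the `f1_macro` scoring, and the toy data are all assumptions made for this example, not the actual implementation.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC


def tune_linear_svc(feature_dicts, labels, param_grid=None, cv=3):
    """Hypothetical stand-in for classify.create_hypertuned_classifier."""
    if param_grid is None:
        # assumed search space; the real grid may differ
        param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

    # one-hot encode the feature dictionaries
    vec = DictVectorizer()
    X = vec.fit_transform(feature_dicts)

    # cross-validated grid search; the best model is refit on all data
    search = GridSearchCV(LinearSVC(max_iter=10000), param_grid,
                          cv=cv, scoring="f1_macro")
    search.fit(X, labels)
    return search.best_estimator_, vec, search.best_params_


# toy data, only to show the call pattern
toy_features = [
    {"Token": "Paris", "POS": "NNP"},
    {"Token": "London", "POS": "NNP"},
    {"Token": "Berlin", "POS": "NNP"},
    {"Token": "the", "POS": "DT"},
    {"Token": "a", "POS": "DT"},
    {"Token": "an", "POS": "DT"},
]
toy_labels = ["B-LOC", "B-LOC", "B-LOC", "O", "O", "O"]

model, toy_vec, best_params = tune_linear_svc(toy_features, toy_labels)
print(best_params)
```

Returning the vectorizer together with the model mirrors the `hypertuned_model, vec` pair used above, since the same fitted vectorizer is needed to transform the test data.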

## Classifying the test sets with the saved model and evaluating the results

In [None]:
# load saved model and vec
classifier_pathname = '../models/hypertuned_svm_model_conll2003.sav'
vectorizer_pathname = '../models/hypertuned_svm_vec_conll2003.sav'

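# use context managers so the files are closed after reading
with open(classifier_pathname, 'rb') as model_in:
    loaded_hypertuned_model = pickle.load(model_in)
with open(vectorizer_pathname, 'rb') as vec_in:
    loaded_vec = pickle.load(vec_in)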

# classify data and write to file
outputdata = outputfile.replace('.conll','.hypertuned_svm.conll')
classify.classify_data(loaded_hypertuned_model,loaded_vec,selected_features,test_file,outputdata)

# print confusion matrix and classification report
evaluate.get_confusion_matrix_and_classification_report(outputdata,exclude_majority=True)
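
The `evaluation_util` module is not shown here, so what `exclude_majority=True` does exactly is an assumption on my part. One plausible reading is that the majority label `O` is left out of the confusion matrix and the classification report, so that scores reflect only the entity classes. The sketch below illustrates that idea with scikit-learn; the function name `report_without_majority` and the toy labels are hypothetical.

```python
from sklearn.metrics import classification_report, confusion_matrix


def report_without_majority(gold, predicted, majority_label="O"):
    """Assumed behaviour: evaluate on all labels except the majority one."""
    labels = sorted((set(gold) | set(predicted)) - {majority_label})

    # with an explicit label list, pairs involving other labels are ignored
    matrix = confusion_matrix(gold, predicted, labels=labels)

    # per-class precision, recall and F1 restricted to the same labels
    report = classification_report(gold, predicted, labels=labels,
                                   zero_division=0)
    return labels, matrix, report


# toy data, only to show the call pattern
gold = ["B-PER", "O", "B-LOC", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC", "B-LOC"]

labels, matrix, report = report_without_majority(gold, pred)
print(labels)
print(matrix)
print(report)
```

Note that in this sketch errors involving `O` still count against precision and recall in the report, while the confusion matrix only shows confusions among the remaining labels.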