## **Introduction to ML for NLP [Network + Practical]**

### **Linear Support Vector Classifier**

#### **Libraries**

Notebook 7.A showed that a LinearSVC can be trained successfully in each language.

We now tune the model's hyperparameter and store the results of every run.

In [1]:
# general
import pandas as pd

from utility.models_sklearn import CustomLinearSVC

print("> Libraries Imported")

> Libraries Imported
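
`CustomLinearSVC` comes from the project's own `utility` package and its source is not shown in this notebook. Judging only from the log messages it prints further down (TF-IDF computation, a `LinearSVC` model, a classification report on the validation set), a wrapper of this kind might look roughly like the sketch below. The function name `train_and_evaluate`, its arguments and the toy texts are all hypothetical and only illustrate the idea with plain scikit-learn calls.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.svm import LinearSVC


def train_and_evaluate(train_texts, train_labels, val_texts, val_labels, C=1.0):
    """Hypothetical stand-in for CustomLinearSVC(...).train_model()."""
    # fit TF-IDF on the training texts only, then reuse it on the validation texts
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_val = vectorizer.transform(val_texts)

    # in scikit-learn, C is the inverse of the regularization strength
    model = LinearSVC(C=C)
    model.fit(X_train, train_labels)

    # output_dict=True returns a nested dict, e.g. report["macro avg"]["precision"]
    predictions = model.predict(X_val)
    return classification_report(
        val_labels, predictions, output_dict=True, zero_division=0
    )


# toy data (made up for illustration, not taken from the dataset)
train_texts = [
    "state aid decision",
    "state aid ruling",
    "fishing quota regulation",
    "fishing vessel regulation",
]
train_labels = [0, 0, 1, 1]
val_texts = ["state aid case", "fishing quota rule"]
val_labels = [0, 1]

report = train_and_evaluate(train_texts, train_labels, val_texts, val_labels, C=1.0)
print(report["accuracy"], report["macro avg"]["precision"])
```

Returning the report as a dictionary is what makes it possible to pick out a single number per run, as the grid search in this notebook does with `eval_res["macro avg"]["precision"]`.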


#### **Setup**

We only need to load the dataset.

In [2]:
dataframe = pd.read_pickle("data/3_multi_eurlex_encoded.pkl")
dataframe.head(3)

Unnamed: 0,celex_id,labels,labels_new,text_en,text_de,text_it,text_pl,text_sv,text_en_enc,text_de_enc,text_it_enc,text_pl_enc,text_sv_enc,set
0,32010D0395,2,0,commission decision of december on state aid c...,beschluss der kommission vom dezember uber die...,decisione della commissione del dicembre conce...,decyzja komisji z dnia grudnia r w sprawie pom...,kommissionens beslut av den december om det st...,"[[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...","[[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...","[[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...","[[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...","[[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, ...",train
1,32012R0453,2,0,commission implementing regulation eu no of ma...,durchfuhrungsverordnung eu nr der kommission v...,regolamento di esecuzione ue n della commissio...,rozporzadzenie wykonawcze komisji ue nr z dnia...,kommissionens genomforandeforordning eu nr av ...,"[[2, 1275, 1276, 29, 100, 4, 743, 1277, 15, 12...","[[1302, 33, 1303, 3, 4, 5, 807, 15, 1304, 3, 6...","[[453, 10, 1422, 38, 14, 3, 4, 5, 990, 1423, 1...","[[1753, 1754, 3, 34, 24, 4, 5, 829, 7, 1755, 9...","[[2, 1239, 33, 23, 4, 5, 806, 7, 774, 4, 132, ...",train
2,32012D0043,2,0,commission implementing decision of january au...,durchfuhrungsbeschluss der kommission vom janu...,decisione di esecuzione della commissione del ...,decyzja wykonawcza komisji z dnia stycznia r u...,kommissionens genomforandebeslut av den januar...,"[[2, 1275, 3, 4, 1310, 1311, 15, 1015, 4, 1312...","[[1344, 3, 4, 5, 1345, 15, 1346, 74, 1347, 134...","[[2, 10, 1422, 3, 4, 5, 1454, 245, 1455, 24, 1...","[[2, 1791, 3, 4, 5, 1792, 7, 1, 1793, 1794, 65...","[[2, 1279, 4, 5, 1280, 7, 1281, 19, 1282, 1283...",train


#### **Create a Grid Search for Fine Tuning**

First, we set up the needed parameters.

In [3]:
LANGUAGE_LIST = ["en", "de", "it", "pl", "sv"]
C_LIST = [0.1, 1, 5, 10, 50, 100, 1000]
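
As a quick sanity check on the size of the search, the two lists can be expanded into the full set of (language, C) pairs. This is only a sketch using `itertools.product`; the name `grid` is introduced here for illustration. It also assumes that `CustomLinearSVC` passes `C` straight through to scikit-learn's `LinearSVC`, where larger values mean weaker regularization.

```python
from itertools import product

LANGUAGE_LIST = ["en", "de", "it", "pl", "sv"]
C_LIST = [0.1, 1, 5, 10, 50, 100, 1000]

# every (language, C) combination, in the same order a nested loop would visit them
grid = list(product(LANGUAGE_LIST, C_LIST))

print(len(grid))   # 35 models to train: 5 languages x 7 values of C
print(grid[0])     # ('en', 0.1)
print(grid[-1])    # ('sv', 1000)
```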

Then, we execute a grid search: one model is trained for every (language, C) pair and evaluated on the validation set.

In [4]:
# setup placeholders for results
df_language_list = []
df_c_list = []
df_mean_acc = []

# start grid search
for LANGUAGE in LANGUAGE_LIST:
    for C in C_LIST:

        # instantiate the model
        LinearSVC_temp = CustomLinearSVC(
            dataset     = dataframe,
            language    = LANGUAGE,

            C           = C # only hyperparameter
        )

        # train the model
        eval_res = LinearSVC_temp.train_model()

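        # save results
        # note: the stored metric is the macro-averaged precision on the
        # validation set, even though the results column is named
        # "validation_accuracy (mean)"
        df_language_list.append(LANGUAGE)
        df_c_list.append(C)
        df_mean_acc.append(eval_res["macro avg"]["precision"])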

# save results in a dataframe
global_res_df = pd.DataFrame(
    list(zip(df_language_list, df_c_list, df_mean_acc)),
    columns=["language", "c", "validation_accuracy (mean)"],
)

# save them and show them
global_res_df.to_csv("models/LinearSVC/training_results.csv")
global_res_df

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated
> Training completed in 0.081 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8927 

              precision    recall  f1-score   support

           0     0.9361    0.8643    0.8988       339
           1     0.8734    0.8818    0.8776       313
           2     0.8701    0.9351    0.9014       308

    accuracy                         0.8927       960
   macro avg     0.8932    0.8937    0.8926       960
weighted avg     0.8945    0.8927    0.8927       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated
> Training completed in 0.208 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.9115 

              precision    recall  f1-score   support

           0  



> Training completed in 2.1245 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8896 

              precision    recall  f1-score   support

           0     0.9021    0.8968    0.8994       339
           1     0.8963    0.8562    0.8758       313
           2     0.8704    0.9156    0.8924       308

    accuracy                         0.8896       960
   macro avg     0.8896    0.8895    0.8892       960
weighted avg     0.8900    0.8896    0.8895       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 2.0065 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8885 

              precision    recall  f1-score   support

           0     0.9042    0.8909    0.8975       339
           1     0.8970    0.8626    0.8795       313
           2     0.8646    0.9123    0.8878       308

    accuracy                         0.8885       960
   macro avg     0.8886    0.8886    0.8883       960
weighted avg     0.8892    0.8885    0.8885       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 2.0305 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8635 

              precision    recall  f1-score   support

           0     0.8020    0.9322    0.8622       339
           1     0.9091    0.7987    0.8503       313
           2     0.9038    0.8539    0.8781       308

    accuracy                         0.8635       960
   macro avg     0.8716    0.8616    0.8636       960
weighted avg     0.8696    0.8635    0.8634       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated
> Training completed in 0.101 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8792 

              precision    recall  f1-score   support

           0     0.9228    0.8466    0.8831       339
           1     0.8505    0.8722    0.8612       313
           2     0.8659    0.9221    0.8931       308

    accuracy            



> Training completed in 3.4588 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8917 

              precision    recall  f1-score   support

           0     0.9072    0.8938    0.9004       339
           1     0.8703    0.8786    0.8744       313
           2     0.8968    0.9026    0.8997       308

    accuracy                         0.8917       960
   macro avg     0.8914    0.8917    0.8915       960
weighted avg     0.8918    0.8917    0.8917       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 3.6169 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8844 

              precision    recall  f1-score   support

           0     0.9018    0.8938    0.8978       339
           1     0.8662    0.8690    0.8676       313
           2     0.8839    0.8896    0.8867       308

    accuracy                         0.8844       960
   macro avg     0.8840    0.8841    0.8840       960
weighted avg     0.8844    0.8844    0.8844       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 3.6649 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.875 

              precision    recall  f1-score   support

           0     0.8882    0.8909    0.8895       339
           1     0.8576    0.8466    0.8521       313
           2     0.8778    0.8864    0.8821       308

    accuracy                         0.8750       960
   macro avg     0.8746    0.8746    0.8746       960
weighted avg     0.8749    0.8750    0.8749       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated
> Training completed in 0.0881 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8979 

              precision    recall  f1-score   support

           0     0.9304    0.8673    0.8977       339
           1     0.8839    0.8754    0.8796       313
           2     0.8802    0.9545    0.9159       308

    accuracy            



> Training completed in 2.5716 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8865 

              precision    recall  f1-score   support

           0     0.9045    0.8938    0.8991       339
           1     0.8930    0.8530    0.8725       313
           2     0.8620    0.9123    0.8864       308

    accuracy                         0.8865       960
   macro avg     0.8865    0.8864    0.8860       960
weighted avg     0.8871    0.8865    0.8864       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 2.7296 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8802 

              precision    recall  f1-score   support

           0     0.8935    0.8909    0.8922       339
           1     0.8833    0.8466    0.8646       313
           2     0.8634    0.9026    0.8825       308

    accuracy                         0.8802       960
   macro avg     0.8801    0.8800    0.8798       960
weighted avg     0.8805    0.8802    0.8801       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 2.7719 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8677 

              precision    recall  f1-score   support

           0     0.8817    0.8791    0.8804       339
           1     0.8916    0.8147    0.8514       313
           2     0.8333    0.9091    0.8696       308

    accuracy                         0.8677       960
   macro avg     0.8689    0.8676    0.8671       960
weighted avg     0.8694    0.8677    0.8675       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated
> Training completed in 0.118 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8938 

              precision    recall  f1-score   support

           0     0.9367    0.8732    0.9038       339
           1     0.8734    0.8818    0.8776       313
           2     0.8720    0.9286    0.8994       308

    accuracy            



> Training completed in 3.9151 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8938 

              precision    recall  f1-score   support

           0     0.9212    0.8968    0.9088       339
           1     0.8803    0.8690    0.8746       313
           2     0.8785    0.9156    0.8967       308

    accuracy                         0.8938       960
   macro avg     0.8933    0.8938    0.8934       960
weighted avg     0.8942    0.8938    0.8938       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 4.0581 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8885 

              precision    recall  f1-score   support

           0     0.9157    0.8968    0.9061       339
           1     0.8791    0.8594    0.8691       313
           2     0.8696    0.9091    0.8889       308

    accuracy                         0.8885       960
   macro avg     0.8881    0.8884    0.8880       960
weighted avg     0.8889    0.8885    0.8885       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 4.0723 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8792 

              precision    recall  f1-score   support

           0     0.8844    0.9027    0.8934       339
           1     0.8866    0.8243    0.8543       313
           2     0.8669    0.9091    0.8875       308

    accuracy                         0.8792       960
   macro avg     0.8793    0.8787    0.8784       960
weighted avg     0.8795    0.8792    0.8788       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated
> Training completed in 0.089 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8865 

              precision    recall  f1-score   support

           0     0.9439    0.8437    0.8910       339
           1     0.8589    0.8946    0.8764       313
           2     0.8610    0.9253    0.8920       308

    accuracy            



> Training completed in 3.1159 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8854 

              precision    recall  f1-score   support

           0     0.9015    0.8909    0.8961       339
           1     0.8845    0.8562    0.8701       313
           2     0.8696    0.9091    0.8889       308

    accuracy                         0.8854       960
   macro avg     0.8852    0.8854    0.8851       960
weighted avg     0.8857    0.8854    0.8853       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated




> Training completed in 2.9956 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8781 

              precision    recall  f1-score   support

           0     0.8961    0.8909    0.8935       339
           1     0.8656    0.8435    0.8544       313
           2     0.8711    0.8994    0.8850       308

    accuracy                         0.8781       960
   macro avg     0.8776    0.8779    0.8776       960
weighted avg     0.8781    0.8781    0.8780       960

> Parameters imported
> Dataset correctly divided in training set, validation set and test set
> Computed TF-IDF for train, val and test set
> Model 'LinearSVC' Instantiated
> Training completed in 3.1599 seconds

> Testing the model on 'val set'
  - Accuracy Score: 0.8479 

              precision    recall  f1-score   support

           0     0.8088    0.9233    0.8623       339
           1     0.8516    0.7700    0.8087       313
           2     0.8966    0.8442    0.8696       308

    accuracy           



Unnamed: 0,language,c,validation_accuracy (mean)
0,en,0.1,0.893204
1,en,1.0,0.91119
2,en,5.0,0.902952
3,en,10.0,0.893621
4,en,50.0,0.88959
5,en,100.0,0.888606
6,en,1000.0,0.871634
7,de,0.1,0.879717
8,de,1.0,0.912142
9,de,5.0,0.898639
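
To pick the best value of C for each language from a table like this, one option is to group by language and keep the row with the highest score. The sketch below rebuilds only the ten rows visible above, so the German result could change once its remaining rows are included, and the other languages are missing altogether. The names `results` and `best` are introduced here for illustration. Note also that the values in the `validation_accuracy (mean)` column match the macro-averaged precision from the reports rather than the accuracy: for `en` with C = 0.1 the report shows a macro avg precision of 0.8932 and an accuracy of 0.8927.

```python
import pandas as pd

# the ten rows shown in the output above
results = pd.DataFrame(
    {
        "language": ["en"] * 7 + ["de"] * 3,
        "c": [0.1, 1.0, 5.0, 10.0, 50.0, 100.0, 1000.0, 0.1, 1.0, 5.0],
        "validation_accuracy (mean)": [
            0.893204, 0.91119, 0.902952, 0.893621, 0.88959,
            0.888606, 0.871634, 0.879717, 0.912142, 0.898639,
        ],
    }
)

# index of the best-scoring row within each language
best_idx = results.groupby("language")["validation_accuracy (mean)"].idxmax()
best = results.loc[best_idx].reset_index(drop=True)
print(best)
```

On these rows, C = 1 gives the highest score for both English and German, and the score drops steadily as C grows beyond that.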
