#CANINE vs BERT on SST-2

**In this notebook, we fine-tune the pre-trained CANINE model on the following NLP task: predict whether a movie review expresses positive or negative sentiment. We then compare its performance against BERT. We use [SST-2](https://nlp.stanford.edu/sentiment/index.html), a dataset of $70042$ sentences taken from movie reviews with human annotations of their sentiment.**

**Most of the code in this directory is taken from the huggingface notebooks, and is modified to support other models.**

#Setup

**Mount Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount= True)
Folder_name = 'MVA_NLP'
assert Folder_name is not None, "[1] Enter the folder name"

import sys 
# Use an absolute path (leading slash) so imports resolve regardless of the working directory
sys.path.append('/content/drive/MyDrive/{}'.format(Folder_name))
%cd drive/MyDrive/$Folder_name/


Mounted at /content/drive
/content/drive/MyDrive/MVA_NLP


**Check GPU**

In [None]:
import torch

if torch.cuda.is_available():    
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

**Install**

In [None]:
! pip install datasets transformers

Collecting datasets
  Downloading datasets-2.0.0-py3-none-any.whl (325 kB)

In [None]:
!apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (2,878 kB/s)
Selecting previously unselected package git-lfs.
(Reading database ... 156210 files and directories currently installed.)
Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...
Unpacking git-lfs (2.3.4-1) ...
Setting up git-lfs (2.3.4-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...


**Imports** 

In [None]:
import numpy as np

**Push to the Hugging Face Hub: to push your fine-tuned model to your Hugging Face account (sign up [here](https://huggingface.co/join)), you need an authentication token from your account; paste it when prompted after executing the following cell.
To enable this, set `push_hub = True`; otherwise, set `push_hub = False`.**

In [None]:
push_hub = True
if push_hub:
    # Prompts for the Hugging Face access token
    from huggingface_hub import notebook_login
    notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store


#Dataset

**Loading SST-2 Dataset**

In [None]:
from datasets import load_dataset, load_metric

In [None]:
task = "sst2"
dataset = load_dataset("glue", task)


Downloading builder script:   0%|          | 0.00/7.78k [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/4.47k [00:00<?, ?B/s]

Downloading and preparing dataset glue/sst2 (download: 7.09 MiB, generated: 4.81 MiB, post-processed: Unknown size, total: 11.90 MiB) to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...


Downloading data:   0%|          | 0.00/7.44M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/67349 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/872 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1821 [00:00<?, ? examples/s]

Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.


  0%|          | 0/3 [00:00<?, ?it/s]

**The dataset object itself is a `DatasetDict`, which contains one key for each of the training, validation, and test splits.**

In [None]:
dataset 

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})

**To access an element of a specific split (train, validation, test) of the dataset**

In [None]:
dataset["train"][0]

{'idx': 0,
 'label': 0,
 'sentence': 'hide new secretions from the parental units '}

**To see what the dataset looks like, display a few random examples:**

In [None]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    # Draw distinct random indices (sampling without replacement)
    picks = random.sample(range(len(dataset)), num_examples)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [None]:
show_random_elements(dataset["train"])

Unnamed: 0,sentence,label,idx
0,the very definition of what critics have come to term an `` ambitious failure .,negative,34197
1,in cynicism every bit,negative,28874
2,satisfyingly odd and intriguing,positive,41363
3,does n't add anything fresh to the myth,negative,65640
4,companionable,positive,19994
5,let crocodile hunter steve irwin do what he does best,positive,43547
6,attempt to do something different over actually pulling it off,positive,28932
7,memorable and resourceful,positive,65844
8,", loud , painful , obnoxious",negative,19769
9,"nurtures the multi-layers of its characters , allowing us to remember that life 's ultimately a gamble and last orders are to be embraced .",positive,41895


**Fine-Tuning a model on SST-2**

We will fine-tune the pre-trained CANINE models, CANINE-C (CANINE with character loss) and CANINE-S (CANINE with subword loss), as well as BERT (bert-base-uncased), on SST-2.

CANINE-C is pre-trained with autoregressive character loss: $12$-layer, $768$-hidden, $12$-heads, $121M$ parameters.

CANINE-S is pre-trained with subword loss: $12$-layer, $768$-hidden, $12$-heads, $121M$ parameters.

BERT (bert-base-uncased) is pre-trained on lower-cased English text and has $12$ layers, $768$-hidden, $12$ heads, and $110M$ parameters.

In this notebook we use CANINE-C. To use another model, change `model_checkpoint` to `"google/canine-s"` for CANINE-S or to `"bert-base-uncased"` for BERT.

The results for all the models are presented in the report attached to this notebook.




In [None]:
model_checkpoint = "google/canine-c"
batch_size = 16

**Preprocess the dataset: the input to either model, CANINE or BERT, is a sequence of integers representing the text. We therefore process the text before feeding it to the model with the transformers tokenizer, which converts tokens to their IDs in the pre-trained vocabulary (for CANINE, each character is mapped to its Unicode code point).**

In [None]:
from transformers import AutoTokenizer
    
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

Downloading:   0%|          | 0.00/892 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/657 [00:00<?, ?B/s]

Using unk_token, but it is not set yet.


**See an example on how the tokenizer works:**

In [None]:
tokenizer("Hello, this one sentence!", "And this sentence goes with it.")

{'input_ids': [57344, 72, 101, 108, 108, 111, 44, 32, 116, 104, 105, 115, 32, 111, 110, 101, 32, 115, 101, 110, 116, 101, 110, 99, 101, 33, 57345, 65, 110, 100, 32, 116, 104, 105, 115, 32, 115, 101, 110, 116, 101, 110, 99, 101, 32, 103, 111, 101, 115, 32, 119, 105, 116, 104, 32, 105, 116, 46, 57345], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
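The `input_ids` above are not indices into a subword vocabulary: each value is the Unicode code point of one character, and 57344 and 57345 lie in Unicode's Private Use Area, where the output above uses them as the start and separator markers. A minimal sketch for checking this by hand with built-in Python only; the helper name `decode_code_points` is ours and is not part of transformers:

```python
# Hypothetical helper (not part of transformers): turn CANINE input IDs back into text.
PRIVATE_USE_FIRST = 0xE000  # 57344
PRIVATE_USE_LAST = 0xF8FF   # end of the Basic Multilingual Plane's Private Use Area

def decode_code_points(input_ids):
    """Map each ID to its Unicode character, skipping IDs in the Private Use Area."""
    return "".join(
        chr(i) for i in input_ids
        if not (PRIVATE_USE_FIRST <= i <= PRIVATE_USE_LAST)
    )

# First segment of the tokenizer output shown above (start marker ... separator)
ids = [57344, 72, 101, 108, 108, 111, 44, 32, 116, 104, 105, 115, 32,
       111, 110, 101, 32, 115, 101, 110, 116, 101, 110, 99, 101, 33, 57345]
print(decode_code_points(ids))  # Hello, this one sentence!
```

This also explains why CANINE sequences are much longer than BERT's for the same sentence: there is one ID per character rather than one per subword.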

**To preprocess the dataset, the name of the column containing the sentence is needed, thus we define:**

In [None]:
task_key = "sentence"

**Check if it works on the dataset**

In [None]:
print(f"Sentence: {dataset['train'][0][task_key]}")

Sentence: hide new secretions from the parental units 


**To preprocess the dataset, we need a function that takes a batch of samples and tokenizes them.**

In [None]:
def preprocess_function(examples):
  return tokenizer(examples[task_key], truncation=True)

**Example on how this function works:**

In [None]:
preprocess_function(dataset['train'][:5])

{'input_ids': [[57344, 104, 105, 100, 101, 32, 110, 101, 119, 32, 115, 101, 99, 114, 101, 116, 105, 111, 110, 115, 32, 102, 114, 111, 109, 32, 116, 104, 101, 32, 112, 97, 114, 101, 110, 116, 97, 108, 32, 117, 110, 105, 116, 115, 32, 57345], [57344, 99, 111, 110, 116, 97, 105, 110, 115, 32, 110, 111, 32, 119, 105, 116, 32, 44, 32, 111, 110, 108, 121, 32, 108, 97, 98, 111, 114, 101, 100, 32, 103, 97, 103, 115, 32, 57345], [57344, 116, 104, 97, 116, 32, 108, 111, 118, 101, 115, 32, 105, 116, 115, 32, 99, 104, 97, 114, 97, 99, 116, 101, 114, 115, 32, 97, 110, 100, 32, 99, 111, 109, 109, 117, 110, 105, 99, 97, 116, 101, 115, 32, 115, 111, 109, 101, 116, 104, 105, 110, 103, 32, 114, 97, 116, 104, 101, 114, 32, 98, 101, 97, 117, 116, 105, 102, 117, 108, 32, 97, 98, 111, 117, 116, 32, 104, 117, 109, 97, 110, 32, 110, 97, 116, 117, 114, 101, 32, 57345], [57344, 114, 101, 109, 97, 105, 110, 115, 32, 117, 116, 116, 101, 114, 108, 121, 32, 115, 97, 116, 105, 115, 102, 105, 101, 100, 32, 116, 111, 

**Apply this function to all sentences in every split of the dataset: train, validation, and test.**

In [None]:
encoded_dataset = dataset.map(preprocess_function, batched=True)

  0%|          | 0/68 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]
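With `batched=True`, `map` hands the function a dictionary of columns (each a list with one entry per row) and expects a dictionary of new columns back. The sketch below imitates that contract without downloading anything. It assumes a stand-in character-level tokenizer; `toy_tokenizer`, `toy_preprocess`, and the `max_length` value are illustrative names and settings, not the real tokenizer's behavior in every detail.

```python
# Stand-in for the real tokenizer (assumption: CANINE-like, one ID per character).
CLS_ID, SEP_ID = 57344, 57345

def toy_tokenizer(sentences, max_length=64):
    """Encode each sentence as start marker + code points + separator, truncated to max_length."""
    input_ids = []
    for sentence in sentences:
        body = [ord(ch) for ch in sentence][: max_length - 2]  # leave room for the two markers
        input_ids.append([CLS_ID] + body + [SEP_ID])
    return {
        "input_ids": input_ids,
        "attention_mask": [[1] * len(ids) for ids in input_ids],
    }

def toy_preprocess(examples):
    # `examples` is a dict of columns, e.g. {"sentence": [...], "label": [...], "idx": [...]}
    return toy_tokenizer(examples["sentence"])

batch = {"sentence": ["hide new secretions ", "companionable"], "label": [0, 1], "idx": [0, 1]}
encoded = toy_preprocess(batch)
print([len(ids) for ids in encoded["input_ids"]])
```

The returned columns are added next to the existing ones, which is why `sentence` and `idx` are still present in the encoded dataset.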

#Loading Metrics



**Load the metric we want to evaluate our model on. The `load_metric` function loads the metric associated with the SST-2 task, which is accuracy.**

In [None]:
metric_name = "accuracy" 


In [None]:
metric = load_metric('glue', task)

Downloading builder script:   0%|          | 0.00/1.84k [00:00<?, ?B/s]

In [None]:
metric 

Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:

    >>> glue_metric = datasets.load_metric('glue', 'sst2')  # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(res

#Fine-tuning the model


**To fine-tune the model, we instantiate a `Trainer`, which needs the following: the tokenizer (defined above to pre-process the data), the pre-trained model, the training arguments, and a function that computes the metrics from the predictions.**

**For the pre-trained model, we use the `AutoModelForSequenceClassification` class, which takes `model_checkpoint` (the model we selected above) and the number of labels, which is 2 in our case (binary classification).**

In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

num_labels = 2 
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

Downloading:   0%|          | 0.00/698 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/504M [00:00<?, ?B/s]

Some weights of CanineForSequenceClassification were not initialized from the model checkpoint at google/canine-c and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


**The [`TrainingArguments`](https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments) class holds the attributes that customize training. Among them you can set the batch size (`per_device_train_batch_size`, `per_device_eval_batch_size`), the number of training epochs (`num_train_epochs`), the `weight_decay`, and the `learning_rate`. Because we train for several epochs, the best model may not be the one at the end of training, so we set `load_best_model_at_end=True` and select the best checkpoint according to `metric_for_best_model=metric_name`. The last argument, `push_to_hub`, pushes the model to the [Hub](https://huggingface.co/models) regularly during training (the `push_hub` variable is defined above).**

In [None]:
model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
    push_to_hub=push_hub,
)

**For metric computation, we define a function that computes the metrics from the predictions using the `metric` loaded before.**

In [None]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
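To make the two steps inside `compute_metrics` concrete, here is a pure-Python sketch of the same computation: take the highest-scoring class for each row of logits, then report the fraction of predictions that match the labels. The function name `accuracy_from_logits` and the logit values are made up for illustration.

```python
def accuracy_from_logits(logits, labels):
    """Pick the highest-scoring class per row, then return the fraction of correct picks."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

# Made-up logits for four examples (column 0: negative, column 1: positive)
logits = [[2.1, -1.3], [0.2, 0.9], [-0.5, 1.7], [1.0, 0.4]]
labels = [0, 1, 0, 1]
print(accuracy_from_logits(logits, labels))  # {'accuracy': 0.5}
```

The notebook's version does the same thing with `np.argmax` and the loaded GLUE metric, which returns a dictionary keyed by `accuracy` for SST-2.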

**Now that everything needed is defined, we can instantiate the `Trainer`:**

In [None]:
validation_key = "validation_mismatched" if task == "mnli-mm" else "validation_matched" if task == "mnli" else "validation"
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/dinalzein/canine-c-finetuned-sst2 into local empty directory.


**Finetune our model by just calling the `train` method:**

In [None]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 67349
  Num Epochs = 5
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 21050


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3481,0.454396,0.819954
2,0.2333,0.453918,0.849771
3,0.1698,0.572195,0.856651
4,0.1402,0.679275,0.84633
5,0.112,0.773787,0.844037


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/checkpoint-4210
Configuration saved in canine-c-finetuned-sst2/checkpoint-4210/config.json
Model weights saved in canine-c-finetuned-sst2/checkpoint-4210/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/checkpoint-4210/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/checkpoint-4210/special_tokens_map.json
tokenizer config file saved in canine-c-finetuned-sst2/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `C

TrainOutput(global_step=21050, training_loss=0.22230987138816127, metrics={'train_runtime': 4698.8947, 'train_samples_per_second': 71.665, 'train_steps_per_second': 4.48, 'total_flos': 3.344492169723618e+16, 'train_loss': 0.22230987138816127, 'epoch': 5.0})
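The step counts in the log above follow directly from the dataset size and the batch size. A short check, assuming a single device and no gradient accumulation (both as reported in the log); the variable names are ours:

```python
import math

n_train = 67349        # "Num examples" in the training log
per_device_batch = 16  # "Instantaneous batch size per device"
n_epochs = 5           # "Num Epochs"

# The last batch of each epoch is smaller than 16, hence the ceiling.
steps_per_epoch = math.ceil(n_train / per_device_batch)
total_steps = steps_per_epoch * n_epochs
print(steps_per_epoch, total_steps)  # 4210 21050
```

This matches `Total optimization steps = 21050`, and it is why the first epoch's checkpoint is named `checkpoint-4210`.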

**We can check with the `evaluate` method that our `Trainer` did reload the best model properly (if it was not the last one):**

In [None]:
trainer.evaluate()

The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16


{'epoch': 5.0,
 'eval_accuracy': 0.856651376146789,
 'eval_loss': 0.5721946358680725,
 'eval_runtime': 5.0624,
 'eval_samples_per_second': 172.251,
 'eval_steps_per_second': 10.864}
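The reported `eval_accuracy` can be cross-checked against the per-epoch table from training. A minimal sketch, with the (epoch, accuracy) pairs copied from that table and variable names of our own choosing:

```python
# (epoch, validation accuracy) pairs copied from the training table
history = [(1, 0.819954), (2, 0.849771), (3, 0.856651), (4, 0.846330), (5, 0.844037)]

# The best checkpoint is the one with the highest validation accuracy.
best_epoch, best_accuracy = max(history, key=lambda pair: pair[1])
print(best_epoch, best_accuracy)  # 3 0.856651
```

`evaluate` returns 0.8567, the epoch 3 value, rather than the final epoch's 0.8440, which is consistent with the best checkpoint having been reloaded. The `'epoch': 5.0` entry only records how far training ran.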

**To see how your model fared you can compare it to the [GLUE Benchmark leaderboard](https://gluebenchmark.com/leaderboard).**

**Upload the results to the Hub**

In [None]:
trainer.push_to_hub()

# Hyperparameter search

**For hyperparameter search, one of these two libraries is needed: [optuna](https://optuna.org/) or [Ray Tune](https://docs.ray.io/en/latest/tune/).**

In [None]:
! pip install optuna
! pip install ray[tune]

Collecting optuna
  Downloading optuna-2.10.0-py3-none-any.whl (308 kB)
Collecting colorlog
  Downloading colorlog-6.6.0-py2.py3-none-any.whl (11 kB)
Collecting cliff
  Downloading cliff-3.10.1-py3-none-any.whl (81 kB)
Collecting cmaes>=0.8.2
  Downloading cmaes-0.8.2-py3-none-any.whl (15 kB)
Collecting alembic
  Downloading alembic-1.7.7-py3-none-any.whl (210 kB)
Collecting Mako
  Downloading Mako-1.2.0-py3-none-any.whl (78 kB)
Collecting cmd2>=1.0.0
  Downloading cmd2-2.4.0-py3-none-any.whl (150 kB)
Collecting stevedore>=2.0.1
  Downloading stevedore-3.5.0-py3-none-any.whl (49 kB)
Collecting pbr!=2.1.0,>=2.0.0
  Downl

**During hyperparameter search, the `Trainer` runs several trainings, so the model must be defined via a function (so it can be reinitialized at each new run) instead of being passed directly. We just wrap the same model-loading call as before in a function:**

In [None]:
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

**And we can instantiate our `Trainer` like before:**

In [None]:
trainer = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

loading configuration file https://huggingface.co/google/canine-c/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/6b093dfc17fa050a4c019e4c09e2741b9b033068d20773077495920af01a7579.71fffe7f3108fd2f56b687ac1950da52fbfa8d85b6a0f311454ba92945232018
Model config CanineConfig {
  "_name_or_path": "google/canine-c",
  "architectures": [
    "CanineModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 57344,
  "downsampling_rate": 4,
  "eos_token_id": 57345,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "local_transformer_stride": 128,
  "max_position_embeddings": 16384,
  "model_type": "canine",
  "num_attention_heads": 12,
  "num_hash_buckets": 16384,
  "num_hash_functions": 8,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "torch_dtype": "float32",
  "transformers_version": "4.17.0",
  "type_vocab_size": 16,
  "upsampling_kernel_s

**Now call the `hyperparameter_search` method:**
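The call in this notebook passes no search space, so the backend's default one is used. To control what is searched, `hyperparameter_search` also accepts an `hp_space` function. The sketch below assumes the Optuna backend, where that function receives a trial object; the function name and the ranges are illustrative choices of ours, not values used or tuned in this notebook.

```python
def hp_space(trial):
    """Hypothetical search space for the Optuna backend; ranges are illustrative only."""
    return {
        # Sample the learning rate on a log scale
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
    }

# Assumed usage (not run here):
# best_run = trainer.hyperparameter_search(
#     hp_space=hp_space, backend="optuna", n_trials=10, direction="maximize"
# )
```

The keys of the returned dictionary must be names of `TrainingArguments` fields, since the `Trainer` applies them to its arguments before each trial.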

In [None]:
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

[I 2022-04-03 13:40:45,517] A new study created in memory with name: no-name-1006a193-edbc-47dc-ae2a-208a9341133e
Trial:
loading configuration file https://huggingface.co/google/canine-c/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/6b093dfc17fa050a4c019e4c09e2741b9b033068d20773077495920af01a7579.71fffe7f3108fd2f56b687ac1950da52fbfa8d85b6a0f311454ba92945232018
Model config CanineConfig {
  "_name_or_path": "google/canine-c",
  "architectures": [
    "CanineModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 57344,
  "downsampling_rate": 4,
  "eos_token_id": 57345,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "local_transformer_stride": 128,
  "max_position_embeddings": 16384,
  "model_type": "canine",
  "num_attention_heads": 12,
  "num_hash_buckets": 16384,
  "num_hash_functions": 8,
  "num_hidden_layers": 12

Epoch,Training Loss,Validation Loss,Accuracy
1,0.5693,0.52872,0.745413


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-0/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/special_tokens_map.json
tokenizer config file saved in canine-c-finetuned-sst2/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/special_tokens_map.json
Several commits (2) will be pushed upstream.


Training 

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4911,0.441904,0.779817
2,0.3587,0.458243,0.795872


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-1/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.5642,0.633745,0.768349
2,0.573,0.661387,0.813073
3,0.5314,0.847453,0.799312
4,0.48,0.887737,0.818807
5,0.4354,0.898034,0.823394


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-2/checkpoint-16838
Configuration saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/config.json
Model weights saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `Can

Epoch,Training Loss,Validation Loss,Accuracy
1,0.3566,0.417551,0.816514
2,0.2159,0.434812,0.83945


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-3/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4226,0.471082,0.784404
2,0.3075,0.4384,0.822248
3,0.2435,0.59691,0.819954


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-4/checkpoint-4210
Configuration saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/config.json
Model weights saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4226,0.471082,0.784404
2,0.3075,0.4384,0.822248
3,0.2435,0.59691,0.819954
4,0.2004,0.611315,0.825688


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-4/checkpoint-16840
Configuration saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/config.json
Model weights saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-sst2/run-4/checkpoint-16840 (score: 0.8256880733944955).
[I 2022-04-03 

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4526,0.435278,0.802752
2,0.3058,0.427526,0.830275
3,0.223,0.472849,0.823394


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-5/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6109,0.650904,0.659404


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
[I 2022-04-03 19:21:35,477] Trial 6 pruned.
Trial:
loading configuration file https://huggingface.co/google/canine-c/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/6b093dfc17fa050a4c019e4c09e2741b9b033068d20773077495920af01a7579.71fffe7f3108fd2f56b687ac1950da52fbfa8d85b6a0f311454ba92945232018
Model config CanineConfig {
  "_name_or_path": "google/canine-c",
  "architectures": [
    "CanineModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 57344,
  "downsampling_rate": 4,
  "eos_token_id": 57345,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initial

Epoch,Training Loss,Validation Loss,Accuracy
1,0.3642,0.454796,0.800459


***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-7/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-sst2/run-7/checkpoint-1053 (score: 0.8004587155963303).
[I 2022-04-03 19:36:

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6866,0.69764,0.509174


***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
[I 2022-04-03 19:55:24,345] Trial 8 pruned.
Trial:
loading configuration file https://huggingface.co/google/canine-c/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/6b093dfc17fa050a4c019e4c09e2741b9b033068d20773077495920af01a7579.71fffe7f3108fd2f56b687ac1950da52fbfa8d85b6a0f311454ba92945232018
Model config CanineConfig {
  "_name_or_path": "google/canine-c",
  "architectures": [
    "CanineModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 57344,
  "downsampling_rate": 4,
  "eos_token_id": 57345,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initial

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6866,0.690041,0.521789


***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
[I 2022-04-03 20:10:33,132] Trial 9 pruned.


**The `hyperparameter_search` method returns a `BestRun` object, which contains the best value of the objective (by default, the sum of all metrics) and the hyperparameters used for that run.**
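
**As a minimal sketch of what "objective" means here: the first function below mirrors the default rule as we understand it (drop the loss, epoch, and speed entries, then sum what is left), and `accuracy_objective` is a hypothetical replacement that optimizes accuracy alone. The `eval_accuracy` key is an assumption based on the Accuracy column in the logs; both function names are ours, not part of the library.**

```python
from typing import Dict, Optional


def summed_metrics_objective(metrics: Dict[str, float]) -> Optional[float]:
    """Approximate the default objective: sum of the remaining eval metrics."""
    remaining = dict(metrics)  # work on a copy so the caller's dict is untouched
    loss = remaining.pop("eval_loss", None)
    remaining.pop("epoch", None)
    # Speed entries such as eval_runtime are not quality metrics
    for key in list(remaining):
        if key.endswith("_runtime") or key.endswith("_per_second"):
            del remaining[key]
    # Fall back to the loss when no other metric is reported
    return loss if not remaining else sum(remaining.values())


def accuracy_objective(metrics: Dict[str, float]) -> float:
    """Hypothetical objective: validation accuracy only (assumed key name)."""
    return metrics["eval_accuracy"]


# Example metrics shaped like the evaluation logs in this notebook
example = {"eval_loss": 0.434812, "eval_accuracy": 0.83945, "epoch": 2.0}
print(summed_metrics_objective(example))
print(accuracy_objective(example))
```

**With a single metric, both functions return the same value. A custom function of this shape can be passed to the search as `trainer.hyperparameter_search(direction="maximize", compute_objective=accuracy_objective)`.**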

In [None]:
best_run

BestRun(run_id='3', objective=0.8394495412844036, hyperparameters={'learning_rate': 4.491828013369628e-05, 'num_train_epochs': 2, 'seed': 16, 'per_device_train_batch_size': 64})

**To reproduce the best run, set its hyperparameters in your `TrainingArguments` before training; here they are copied onto the existing `trainer.args`:**

In [None]:
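# Copy each tuned hyperparameter onto the existing TrainingArguments
for n, v in best_run.hyperparameters.items():
    setattr(trainer.args, n, v)

# Retrain with the best hyperparameters
trainer.train()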

loading configuration file https://huggingface.co/google/canine-c/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/6b093dfc17fa050a4c019e4c09e2741b9b033068d20773077495920af01a7579.71fffe7f3108fd2f56b687ac1950da52fbfa8d85b6a0f311454ba92945232018
Model config CanineConfig {
  "_name_or_path": "google/canine-c",
  "architectures": [
    "CanineModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 57344,
  "downsampling_rate": 4,
  "eos_token_id": 57345,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "local_transformer_stride": 128,
  "max_position_embeddings": 16384,
  "model_type": "canine",
  "num_attention_heads": 12,
  "num_hash_buckets": 16384,
  "num_hash_functions": 8,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "torch_dtype": "float32",
  "transformers_version": "4.17.0",
  "type_vocab_size": 16,
  "upsampling_kernel_s

Epoch,Training Loss,Validation Loss,Accuracy
1,0.3566,0.417551,0.816514
2,0.2159,0.434812,0.83945


***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/checkpoint-1053/special_tokens_map.json

TrainOutput(global_step=2106, training_loss=0.3360756596042673, metrics={'train_runtime': 1828.7249, 'train_samples_per_second': 73.657, 'train_steps_per_second': 1.152, 'total_flos': 1.736900908619772e+16, 'train_loss': 0.3360756596042673, 'epoch': 2.0})
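
**The numbers in `TrainOutput` can be sanity-checked by hand. The sketch below assumes that the GLUE SST-2 training split has 67,349 sentences (a figure not printed in these logs), that training ran on a single device, and that no gradient accumulation was used. Under those assumptions, the step count and throughput follow from the batch size and epoch count of the best run.**

```python
import math

# Assumption: size of the GLUE SST-2 training split (not printed in the logs)
num_train_examples = 67_349

# Hyperparameters of the best run and the runtime reported in TrainOutput
batch_size = 64
num_epochs = 2
train_runtime_s = 1828.7249

# The last, partially filled batch still counts as one optimizer step
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
samples_per_second = num_train_examples * num_epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

print(steps_per_epoch)               # 1053, matching checkpoint-1053
print(total_steps)                   # 2106, matching global_step
print(round(samples_per_second, 3))  # 73.657, matching train_samples_per_second
print(round(steps_per_second, 3))    # 1.152, matching train_steps_per_second
```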