#CANINE vs BERT on SST-2

**In this notebook, we fine-tune the pre-trained CANINE model on the following NLP task: predict whether a movie review sentence expresses positive or negative sentiment. We then compare its performance against BERT. We use [SST-2](https://nlp.stanford.edu/sentiment/index.html), a dataset of $70042$ sentences taken from movie reviews with human annotations of their sentiment.**

**Notebook adapted from the [Hugging Face text classification guide](https://github.com/huggingface/notebooks/blob/main/examples/text_classification.ipynb)**

#Setup

**Mount Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount= True)
Folder_name = 'MVA_NLP'
assert Folder_name is not None, "[1] Enter the folder name"

import sys
# Use an absolute path so imports work regardless of the current directory
sys.path.append('/content/drive/MyDrive/{}'.format(Folder_name))
%cd drive/MyDrive/$Folder_name/


Mounted at /content/drive
/content/drive/MyDrive/MVA_NLP


**Check GPU**

In [None]:
import torch

if torch.cuda.is_available():    
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

**Install**

In [None]:
! pip install datasets transformers
!apt install git-lfs

**Imports** 

In [None]:
import numpy as np
import random
import pandas as pd

#Dataset

**Loading SST-2 Dataset**

In [None]:
from datasets import load_dataset, load_metric

In [None]:
task = "sst2"
dataset = load_dataset("glue", task)
dataset 

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})
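As a quick sanity check, the three split sizes printed above add up to the $70042$ sentences quoted in the introduction. The sketch below hard-codes the `num_rows` values from the output instead of reading them from `dataset`, so it runs without downloading anything; the name `split_sizes` is introduced here for illustration only.

```python
# Split sizes copied from the DatasetDict output above.
split_sizes = {"train": 67349, "validation": 872, "test": 1821}

total = sum(split_sizes.values())
for name, size in split_sizes.items():
    # Share of each split relative to the whole dataset.
    print(f"{name:<10} {size:>6} ({size / total:.1%})")
print(f"{'total':<10} {total:>6}")
```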

**Take a look at a few random examples from the dataset**

In [None]:
import datasets
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [None]:
show_random_elements(dataset["train"])

Unnamed: 0,sentence,label,idx
0,the very definition of what critics have come to term an `` ambitious failure .,negative,34197
1,in cynicism every bit,negative,28874
2,satisfyingly odd and intriguing,positive,41363
3,does n't add anything fresh to the myth,negative,65640
4,companionable,positive,19994
5,let crocodile hunter steve irwin do what he does best,positive,43547
6,attempt to do something different over actually pulling it off,positive,28932
7,memorable and resourceful,positive,65844
8,", loud , painful , obnoxious",negative,19769
9,"nurtures the multi-layers of its characters , allowing us to remember that life 's ultimately a gamble and last orders are to be embraced .",positive,41895


**Fine-Tuning a model on SST-2**

We will fine-tune three pre-trained models on SST-2: CANINE-C (CANINE with character loss), CANINE-S (CANINE with subword loss), and BERT (bert-base-uncased).

CANINE-C is pre-trained with autoregressive character loss, $12$-layer, $768$-hidden, $12$-heads, $121M$ parameters.

CANINE-S is pre-trained with subword loss, $12$-layer, $768$-hidden, $12$-heads, $121M$ parameters.

BERT (bert-base-uncased) is pre-trained on lower-cased English text, $12$-layer, $768$-hidden, $12$-heads, $110M$ parameters.

This notebook uses CANINE-C. To use another model, change `model_checkpoint` to `model_checkpoint = "google/canine-s"` for CANINE-S or `model_checkpoint = "bert-base-uncased"` for BERT.

The results for all the models are presented in the report attached to this notebook.
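
One way to make switching between models less error-prone is to keep the three checkpoint names in a small lookup table. This is only a sketch: the dictionary `CHECKPOINTS` and the helper `select_checkpoint` are hypothetical names that do not appear in the original notebook; the checkpoint strings are the ones listed above.

```python
# Hypothetical helper: map a short model name to its Hugging Face checkpoint.
CHECKPOINTS = {
    "canine-c": "google/canine-c",
    "canine-s": "google/canine-s",
    "bert": "bert-base-uncased",
}

def select_checkpoint(name):
    """Return the checkpoint string for a short model name (case-insensitive)."""
    try:
        return CHECKPOINTS[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown model {name!r}; choose from {sorted(CHECKPOINTS)}"
        ) from None

model_checkpoint = select_checkpoint("canine-c")
print(model_checkpoint)
```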




In [None]:
model_checkpoint = "google/canine-c"
batch_size = 16

**Data Preprocessing**

In [None]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)

task_key = "sentence"
def preprocess_function(examples):
  return tokenizer(examples[task_key], truncation=True)
encoded_dataset = dataset.map(preprocess_function, batched=True)

  0%|          | 0/68 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]
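
CANINE operates directly on characters, so its tokenizer does not rely on a learned subword vocabulary. The sketch below imitates that idea in plain Python by mapping each character to its Unicode code point. It is an illustration, not the library's implementation: the function name `to_code_points` is hypothetical, and wrapping the sequence with the ids 57344 and 57345 (private-use code points 0xE000 and 0xE001) is an assumption that should be checked against the `input_ids` produced by `tokenizer`.

```python
# Assumed special ids (private-use Unicode code points); verify with the real tokenizer.
CLS_ID = 0xE000  # 57344
SEP_ID = 0xE001  # 57345

def to_code_points(text):
    """Map every character to its Unicode code point and add the assumed special ids."""
    return [CLS_ID] + [ord(ch) for ch in text] + [SEP_ID]

ids = to_code_points("satisfyingly odd")
print(len(ids), ids[:5])
```

Since every character occupies one position, character-level inputs are typically several times longer than subword inputs for the same sentence.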

#Loading Metrics



In [None]:
metric_name = "accuracy" 
metric = load_metric('glue', task)
metric 

Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:

    >>> glue_metric = datasets.load_metric('glue', 'sst2')  # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(res

#Fine-tuning the model


In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

num_labels = 2 
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
)

Downloading:   0%|          | 0.00/698 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/504M [00:00<?, ?B/s]

Some weights of CanineForSequenceClassification were not initialized from the model checkpoint at google/canine-c and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
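
To make explicit what `compute_metrics` does, the following dependency-free sketch reproduces its two steps on toy values: take the index of the largest logit in each row, then compare the result with the labels. The logits and labels are made up for illustration, and `argmax_rows` and `accuracy` are hypothetical helpers, not part of the `glue` metric.

```python
def argmax_rows(logits):
    """Return, for each row, the index of its largest value."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]

def accuracy(predictions, references):
    """Fraction of predictions equal to the reference labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Toy logits for four sentences: column 0 = negative, column 1 = positive.
toy_logits = [[2.1, -1.3], [0.2, 0.9], [-0.5, 1.7], [1.0, 0.4]]
toy_labels = [0, 1, 0, 0]

toy_predictions = argmax_rows(toy_logits)
print(toy_predictions, accuracy(toy_predictions, toy_labels))
```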

In [None]:
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/dinalzein/canine-c-finetuned-sst2 into local empty directory.


In [None]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 67349
  Num Epochs = 5
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 21050


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3481,0.454396,0.819954
2,0.2333,0.453918,0.849771
3,0.1698,0.572195,0.856651
4,0.1402,0.679275,0.84633
5,0.112,0.773787,0.844037


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/checkpoint-4210
Configuration saved in canine-c-finetuned-sst2/checkpoint-4210/config.json
Model weights saved in canine-c-finetuned-sst2/checkpoint-4210/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/checkpoint-4210/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/checkpoint-4210/special_tokens_map.json
tokenizer config file saved in canine-c-finetuned-sst2/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `C

TrainOutput(global_step=21050, training_loss=0.22230987138816127, metrics={'train_runtime': 4698.8947, 'train_samples_per_second': 71.665, 'train_steps_per_second': 4.48, 'total_flos': 3.344492169723618e+16, 'train_loss': 0.22230987138816127, 'epoch': 5.0})
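
The figure of 21050 optimization steps in the log follows from the dataset size, the batch size, and the number of epochs. A minimal sketch of that arithmetic, assuming a single device, no gradient accumulation, and a final partial batch that is kept rather than dropped (the function name `total_steps` is introduced here for illustration):

```python
import math

def total_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps when the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs

steps_per_epoch = math.ceil(67349 / 16)
run_steps = total_steps(67349, 16, 5)
print(steps_per_epoch)  # steps in one epoch
print(run_steps)        # steps for the full five-epoch run
```

The per-epoch value corresponds to the `checkpoint-4210` directory written after the first evaluation.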

In [None]:
trainer.evaluate()

The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16


{'epoch': 5.0,
 'eval_accuracy': 0.856651376146789,
 'eval_loss': 0.5721946358680725,
 'eval_runtime': 5.0624,
 'eval_samples_per_second': 172.251,
 'eval_steps_per_second': 10.864}
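
Because `load_best_model_at_end=True` and `metric_for_best_model` is set to accuracy, the evaluation above reports the epoch-3 checkpoint rather than the last one. The sketch below picks the best epoch from the training table by hand; the values are copied from that table, and `history` is a name introduced for illustration.

```python
# (epoch, validation loss, accuracy) copied from the training table above.
history = [
    (1, 0.454396, 0.819954),
    (2, 0.453918, 0.849771),
    (3, 0.572195, 0.856651),
    (4, 0.679275, 0.846330),
    (5, 0.773787, 0.844037),
]

best_by_accuracy = max(history, key=lambda row: row[2])
best_by_loss = min(history, key=lambda row: row[1])
print("best accuracy:", best_by_accuracy)
print("lowest validation loss:", best_by_loss)
```

The two criteria disagree here: validation loss is lowest at epoch 2 while accuracy peaks at epoch 3, so the choice of `metric_for_best_model` determines which checkpoint is kept.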

**To see how your model fared you can compare it to the [GLUE Benchmark leaderboard](https://gluebenchmark.com/leaderboard).**

# Hyperparameter search

In [None]:
! pip install optuna
! pip install ray[tune]

In [None]:
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

In [None]:
trainer = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

loading configuration file https://huggingface.co/google/canine-c/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/6b093dfc17fa050a4c019e4c09e2741b9b033068d20773077495920af01a7579.71fffe7f3108fd2f56b687ac1950da52fbfa8d85b6a0f311454ba92945232018
Model config CanineConfig {
  "_name_or_path": "google/canine-c",
  "architectures": [
    "CanineModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 57344,
  "downsampling_rate": 4,
  "eos_token_id": 57345,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "local_transformer_stride": 128,
  "max_position_embeddings": 16384,
  "model_type": "canine",
  "num_attention_heads": 12,
  "num_hash_buckets": 16384,
  "num_hash_functions": 8,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "torch_dtype": "float32",
  "transformers_version": "4.17.0",
  "type_vocab_size": 16,
  "upsampling_kernel_s

In [None]:
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

[I 2022-04-03 13:40:45,517] A new study created in memory with name: no-name-1006a193-edbc-47dc-ae2a-208a9341133e

Epoch,Training Loss,Validation Loss,Accuracy
1,0.5693,0.52872,0.745413


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-0/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-0/checkpoint-1053/special_tokens_map.json
tokenizer config file saved in canine-c-finetuned-sst2/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/special_tokens_map.json
Several commits (2) will be pushed upstream.


Training 

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4911,0.441904,0.779817
2,0.3587,0.458243,0.795872


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-1/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-1/checkpoint-1053/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.5642,0.633745,0.768349
2,0.573,0.661387,0.813073
3,0.5314,0.847453,0.799312
4,0.48,0.887737,0.818807
5,0.4354,0.898034,0.823394


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-2/checkpoint-16838
Configuration saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/config.json
Model weights saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-2/checkpoint-16838/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `Can

Epoch,Training Loss,Validation Loss,Accuracy
1,0.3566,0.417551,0.816514
2,0.2159,0.434812,0.83945


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-3/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-3/checkpoint-1053/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4226,0.471082,0.784404
2,0.3075,0.4384,0.822248
3,0.2435,0.59691,0.819954


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-4/checkpoint-4210
Configuration saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/config.json
Model weights saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-4/checkpoint-4210/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4226,0.471082,0.784404
2,0.3075,0.4384,0.822248
3,0.2435,0.59691,0.819954
4,0.2004,0.611315,0.825688


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-4/checkpoint-16840
Configuration saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/config.json
Model weights saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-4/checkpoint-16840/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-sst2/run-4/checkpoint-16840 (score: 0.8256880733944955).
[I 2022-04-03 

Epoch,Training Loss,Validation Loss,Accuracy
1,0.4526,0.435278,0.802752
2,0.3058,0.427526,0.830275
3,0.223,0.472849,0.823394


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-5/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-5/checkpoint-1053/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineFo

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6109,0.650904,0.659404


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
[I 2022-04-03 19:21:35,477] Trial 6 pruned.

Epoch,Training Loss,Validation Loss,Accuracy
1,0.3642,0.454796,0.800459


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/run-7/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/run-7/checkpoint-1053/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-sst2/run-7/checkpoint-1053 (score: 0.8004587155963303).
[I 2022-04-03 19:36:

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6866,0.69764,0.509174


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
[I 2022-04-03 19:55:24,345] Trial 8 pruned.

Epoch,Training Loss,Validation Loss,Accuracy
1,0.6866,0.690041,0.521789


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
[I 2022-04-03 20:10:33,132] Trial 9 pruned.


In [None]:
best_run

BestRun(run_id='3', objective=0.8394495412844036, hyperparameters={'learning_rate': 4.491828013369628e-05, 'num_train_epochs': 2, 'seed': 16, 'per_device_train_batch_size': 64})
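
`hyperparameter_search` was called above without a search space, so the library's default space was used. To control the ranges yourself, `Trainer.hyperparameter_search` accepts an `hp_space` callable. The sketch below is an example under stated assumptions: the keys mirror those reported in `best_run`, but the ranges are illustrative choices and not necessarily the library defaults, and `_DryRunTrial` is a hypothetical stand-in for an Optuna trial so that the function can be exercised without installing Optuna.

```python
def hp_space(trial):
    """Illustrative search space; the ranges are assumptions, not library defaults."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 5),
        "seed": trial.suggest_int("seed", 1, 40),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32, 64]
        ),
    }

class _DryRunTrial:
    """Hypothetical stand-in that always returns the lower bound or first choice."""

    def suggest_float(self, name, low, high, log=False):
        return low

    def suggest_int(self, name, low, high):
        return low

    def suggest_categorical(self, name, choices):
        return choices[0]

sampled = hp_space(_DryRunTrial())
print(sampled)
```

With Optuna installed, the function could be passed as `trainer.hyperparameter_search(hp_space=hp_space, n_trials=10, direction="maximize", backend="optuna")`.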

In [None]:
for n, v in best_run.hyperparameters.items():
    setattr(trainer.args, n, v)

trainer.train()


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3566,0.417551,0.816514
2,0.2159,0.434812,0.83945


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 872
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-sst2/checkpoint-1053
Configuration saved in canine-c-finetuned-sst2/checkpoint-1053/config.json
Model weights saved in canine-c-finetuned-sst2/checkpoint-1053/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-sst2/checkpoint-1053/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-sst2/checkpoint-1053/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: sentence, idx. If sentence, idx are not expected by `CanineForSequenceClassification.forwar

TrainOutput(global_step=2106, training_loss=0.3360756596042673, metrics={'train_runtime': 1828.7249, 'train_samples_per_second': 73.657, 'train_steps_per_second': 1.152, 'total_flos': 1.736900908619772e+16, 'train_loss': 0.3360756596042673, 'epoch': 2.0})
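
The loop before the final `trainer.train()` call copies each tuned value onto `trainer.args` with `setattr`. The sketch below reproduces that pattern on a `types.SimpleNamespace` standing in for `TrainingArguments` (an assumption made so the example runs without `transformers`), using the values reported in `best_run`, and checks that they account for the 2106 steps in the output above.

```python
import math
from types import SimpleNamespace

# Stand-in for trainer.args with this notebook's initial settings.
# seed=42 is assumed here (the notebook does not set a seed explicitly).
args_stub = SimpleNamespace(
    learning_rate=2e-5, num_train_epochs=5, seed=42, per_device_train_batch_size=16
)

# Hyperparameters reported by best_run above.
best_hyperparameters = {
    "learning_rate": 4.491828013369628e-05,
    "num_train_epochs": 2,
    "seed": 16,
    "per_device_train_batch_size": 64,
}

for name, value in best_hyperparameters.items():
    setattr(args_stub, name, value)  # overwrite the attribute in place

# Steps per epoch (last partial batch kept) times the tuned number of epochs.
steps = (
    math.ceil(67349 / args_stub.per_device_train_batch_size)
    * args_stub.num_train_epochs
)
print(args_stub.per_device_train_batch_size, steps)
```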