#CANINE vs BERT on CoLA



**In this notebook, we fine-tune the pre-trained CANINE model on a binary classification NLP task that predicts whether or not an English sentence is grammatically correct, and we compare its performance against BERT. We use [CoLA](https://nyu-mll.github.io/CoLA/), a corpus of $10657$ English sentences, each paired with a label that tells whether the sentence is grammatically correct.**

**Notebook adapted from the [Hugging Face text classification guide](https://github.com/huggingface/notebooks/blob/main/examples/text_classification.ipynb)**

#Setup

**Mount Google Drive**

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Folder_name = 'MVA_NLP'
assert Folder_name is not None, "[1] Enter the folder name"

import sys
# Use an absolute path so the entry stays valid whatever the working directory is
sys.path.append('/content/drive/MyDrive/{}'.format(Folder_name))
%cd /content/drive/MyDrive/$Folder_name/


Mounted at /content/drive
/content/drive/MyDrive/MVA_NLP


**Check GPU**

In [None]:
import torch

if torch.cuda.is_available():    
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

**Install**

In [None]:
! pip install datasets transformers
!apt install git-lfs

**Imports** 

In [None]:
import numpy as np
import random
import pandas as pd

#Dataset

**Loading [CoLA](https://nyu-mll.github.io/CoLA/) Dataset**

In [None]:
from datasets import load_dataset, load_metric

In [None]:
task = "cola"
dataset = load_dataset("glue", task)

dataset 

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 8551
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1043
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1063
    })
})

**Take a look at a few random examples from the dataset**

In [None]:
import datasets
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    # Draw distinct random indices (sampling without replacement)
    picks = random.sample(range(len(dataset)), num_examples)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [None]:
show_random_elements(dataset["train"])

| | sentence | label | idx |
|---|---|---|---|
| 0 | Two miles are as far as they can walk. | unacceptable | 4161 |
| 1 | He has left. | acceptable | 403 |
| 2 | In the classroom, the teacher praised John, whom I also respect. | acceptable | 4941 |
| 3 | Who did that Plato loved seem to be known by everyone. | unacceptable | 8054 |
| 4 | The excellent whisky which I went to the store and have bought was very costly. | unacceptable | 1282 |
| 5 | The man arrived on the train was my brother. | unacceptable | 219 |
| 6 | John saw more horses than Bill saw cows or Pete talked to cats. | acceptable | 6558 |
| 7 | Emma and Harriet were attacked by those bandits. | acceptable | 6964 |
| 8 | I searched for treasure in the cave. | acceptable | 3023 |
| 9 | The apple was bitten by John. | acceptable | 5940 |


**Fine-tuning a model on CoLA:**

**We fine-tune three pre-trained models on CoLA: CANINE-C (CANINE with character loss), CANINE-S (CANINE with subword loss), and BERT (bert-base-uncased).**

**CANINE-C is pre-trained with autoregressive character loss, $12$-layer, $768$-hidden, $12$-heads, $121M$ parameters.**

**CANINE-S is pre-trained with subword loss, $12$-layer, $768$-hidden, $12$-heads, $121M$ parameters.**

**BERT (bert-base-uncased) is pre-trained on lower-cased English text and has $12$ layers, a hidden size of $768$, $12$ attention heads, and $110M$ parameters.**

**This notebook uses CANINE-C. To use another model, change the value of `model_checkpoint` to `"google/canine-s"` for CANINE-S or to `"bert-base-uncased"` for BERT.**

**The results for all the models are presented in the report attached to this notebook.**
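
**Switching between the three models amounts to changing a single string. A minimal sketch, assuming a hypothetical `CHECKPOINTS` lookup table and `select_checkpoint` helper (neither is part of the original notebook) that map a short name to the checkpoint identifiers named above:**

```python
# Hypothetical lookup table; the identifiers are the ones named in this notebook.
CHECKPOINTS = {
    "canine-c": "google/canine-c",
    "canine-s": "google/canine-s",
    "bert": "bert-base-uncased",
}

def select_checkpoint(short_name):
    """Return the checkpoint identifier for one of the three compared models."""
    if short_name not in CHECKPOINTS:
        raise ValueError(
            f"Unknown model {short_name!r}; choose from {sorted(CHECKPOINTS)}"
        )
    return CHECKPOINTS[short_name]

model_checkpoint = select_checkpoint("canine-c")
print(model_checkpoint)
```

**Failing early on an unknown name avoids launching a long download with a mistyped identifier.**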



In [None]:
model_checkpoint = "google/canine-c"
batch_size = 16

**Preprocessing Dataset**

In [None]:
from transformers import AutoTokenizer
#choose a tokenizer that works with the model chosen.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
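
**CANINE operates directly on characters, so its tokenizer needs no vocabulary file: each character is mapped to its Unicode code point. The sketch below imitates that mapping in plain Python. It assumes the special code points `0xE000` for `[CLS]` and `0xE001` for `[SEP]`, which match the `bos_token_id` and `eos_token_id` values 57344 and 57345 printed in the model config further down. `encode_like_canine` is a hypothetical helper for illustration, not the library's API.**

```python
CLS = 0xE000  # assumed code point for [CLS] (57344)
SEP = 0xE001  # assumed code point for [SEP] (57345)

def encode_like_canine(text):
    """Map each character to its Unicode code point and add the special tokens."""
    return [CLS] + [ord(ch) for ch in text] + [SEP]

ids = encode_like_canine("He has left.")
print(len(ids))   # 12 characters plus 2 special tokens
print(ids[1:4])   # code points of "H", "e" and " "
```

**Because every character occupies one position, a CANINE input is typically several times longer than the subword input BERT builds for the same sentence.**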

In [None]:
task_key = "sentence"
def preprocess_function(examples):
  return tokenizer(examples[task_key], truncation=True)
encoded_dataset = dataset.map(preprocess_function, batched=True)


#Loading Metrics



**Load the metric used to evaluate the model. The metric associated with the CoLA task is the Matthews Correlation Coefficient.
For more information on how this metric works, see [Matthews Correlation Coefficient](https://en.wikipedia.org/wiki/Matthews_correlation_coefficient).**
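
**For binary labels the coefficient is computed from the four cells of the confusion matrix: $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. The sketch below is a plain-Python illustration, not the implementation behind the `glue` metric. `matthews_corrcoef_binary` is a hypothetical helper that returns 0.0 when the denominator vanishes, which is the convention scikit-learn follows. It shows why a classifier that always predicts the same class scores exactly 0.0.**

```python
import math

def matthews_corrcoef_binary(predictions, references):
    """Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # degenerate case, e.g. every prediction is the same class
    return (tp * tn - fp * fn) / denominator

references = [1, 1, 0, 1, 0, 1]
always_acceptable = [1] * len(references)
print(matthews_corrcoef_binary(always_acceptable, references))  # 0.0
print(matthews_corrcoef_binary(references, references))         # 1.0
```

**A score of 0.0 on the validation set is therefore worth checking against the distribution of predicted labels before drawing conclusions about the model.**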


In [None]:
metric_name = "matthews_correlation" 
metric = load_metric('glue', task)
metric 

Metric(name: "glue", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: """
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:

    >>> glue_metric = datasets.load_metric('glue', 'sst2')  # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(res

#Fine-tuning the model



In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

num_labels = 2 
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

Some weights of CanineForSequenceClassification were not initialized from the model checkpoint at google/canine-c and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-{task}",
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=5,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name
)


In [None]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

In [None]:
validation_key = "validation_mismatched" if task == "mnli-mm" else "validation_matched" if task == "mnli" else "validation"
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

Cloning https://huggingface.co/dinalzein/canine-c-finetuned-cola into local empty directory.


In [None]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 8551
  Num Epochs = 5
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 2675


Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6136,0.639571,0.0
2,0.6086,0.617789,0.0
3,0.6127,0.617412,0.0
4,0.5957,0.621486,0.0
5,0.5677,0.668877,0.064819


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/checkpoint-535
Configuration saved in canine-c-finetuned-cola/checkpoint-535/config.json
Model weights saved in canine-c-finetuned-cola/checkpoint-535/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/checkpoint-535/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/checkpoint-535/special_tokens_map.json
tokenizer config file saved in canine-c-finetuned-cola/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `Canin

TrainOutput(global_step=2675, training_loss=0.595980549749927, metrics={'train_runtime': 330.9922, 'train_samples_per_second': 129.172, 'train_steps_per_second': 8.082, 'total_flos': 2378238037956300.0, 'train_loss': 0.595980549749927, 'epoch': 5.0})
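
**The step counts in the log follow from the dataset size and the batch size: with 8551 training examples and a batch size of 16, each epoch takes $\lceil 8551/16 \rceil = 535$ optimization steps, so 5 epochs give 2675 steps, and the first epoch's checkpoint is `checkpoint-535`. A small sketch of that arithmetic, assuming a single device and no gradient accumulation, as the log reports:**

```python
import math

num_examples = 8551               # size of the CoLA training split
batch_size = 16                   # per-device batch size, single device
num_epochs = 5
gradient_accumulation_steps = 1   # as reported in the training log

# The last batch of an epoch is smaller, hence the ceiling.
batches_per_epoch = math.ceil(num_examples / batch_size)
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 535 2675
```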

In [None]:
trainer.evaluate()

The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16


{'epoch': 5.0,
 'eval_loss': 0.6688772439956665,
 'eval_matthews_correlation': 0.06481858054613046,
 'eval_runtime': 1.9749,
 'eval_samples_per_second': 528.125,
 'eval_steps_per_second': 33.419}
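
**Because `load_best_model_at_end=True` and `metric_for_best_model` is set to the Matthews correlation, the `Trainer` reloads the checkpoint with the highest validation score once training finishes. Here that is the epoch-5 checkpoint, which is why `trainer.evaluate()` reports the epoch-5 value. The sketch below reproduces the selection on the per-epoch scores from the training table. `pick_best_epoch` is a hypothetical helper, and letting the earliest epoch win ties is an assumption of this sketch.**

```python
# Validation Matthews correlation per epoch, copied from the training table.
scores_by_epoch = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.064819}

def pick_best_epoch(scores, greater_is_better=True):
    """Return (epoch, score) for the best score; the earliest epoch wins ties."""
    best_epoch, best_score = None, None
    for epoch in sorted(scores):
        score = scores[epoch]
        if best_score is None:
            improved = True
        elif greater_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_epoch, best_score = epoch, score
    return best_epoch, best_score

print(pick_best_epoch(scores_by_epoch))  # (5, 0.064819)
```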

**To see how well the model performed, compare its score with the [GLUE Benchmark leaderboard](https://gluebenchmark.com/leaderboard).**

# Hyperparameter search

In [None]:
! pip install optuna
! pip install "ray[tune]"

In [None]:
def model_init():
    return AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)

In [None]:
trainer = Trainer(
    model_init=model_init,
    args=args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset[validation_key],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

loading configuration file https://huggingface.co/google/canine-c/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/6b093dfc17fa050a4c019e4c09e2741b9b033068d20773077495920af01a7579.71fffe7f3108fd2f56b687ac1950da52fbfa8d85b6a0f311454ba92945232018
Model config CanineConfig {
  "_name_or_path": "google/canine-c",
  "architectures": [
    "CanineModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 57344,
  "downsampling_rate": 4,
  "eos_token_id": 57345,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "local_transformer_stride": 128,
  "max_position_embeddings": 16384,
  "model_type": "canine",
  "num_attention_heads": 12,
  "num_hash_buckets": 16384,
  "num_hash_functions": 8,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "torch_dtype": "float32",
  "transformers_version": "4.17.0",
  "type_vocab_size": 16,
  "upsampling_kernel_s

In [None]:
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")

[I 2022-03-27 22:46:22,220] A new study created in memory with name: no-name-50794803-50bd-4726-af43-1295c6580844
Trial:

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6132,0.624533,0.0
2,0.6094,0.623943,0.0
3,0.6085,0.621228,0.0
4,0.6084,0.620064,0.0
5,0.6103,0.619751,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-0/checkpoint-535
Configuration saved in canine-c-finetuned-cola/run-0/checkpoint-535/config.json
Model weights saved in canine-c-finetuned-cola/run-0/checkpoint-535/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-0/checkpoint-535/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-0/checkpoint-535/special_tokens_map.json
tokenizer config file saved in canine-c-finetuned-cola/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/special_tokens_map.json
Several commits (2) will be pushed upstream.
The following c

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,No log,0.624367,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-1/checkpoint-268
Configuration saved in canine-c-finetuned-cola/run-1/checkpoint-268/config.json
Model weights saved in canine-c-finetuned-cola/run-1/checkpoint-268/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-1/checkpoint-268/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-1/checkpoint-268/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-cola/run-1/checkpoint-268 (score: 0.0).
[I 2022-03-27 22:53:30,882] Trial 1

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6057,0.628829,0.0
2,0.6065,0.618925,0.0
3,0.6083,0.61867,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-2/checkpoint-1069
Configuration saved in canine-c-finetuned-cola/run-2/checkpoint-1069/config.json
Model weights saved in canine-c-finetuned-cola/run-2/checkpoint-1069/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-2/checkpoint-1069/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-2/checkpoint-1069/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineF

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,No log,0.617921,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-3/checkpoint-134
Configuration saved in canine-c-finetuned-cola/run-3/checkpoint-134/config.json
Model weights saved in canine-c-finetuned-cola/run-3/checkpoint-134/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-3/checkpoint-134/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-3/checkpoint-134/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-cola/run-3/checkpoint-134 (score: 0.0).
[I 2022-03-27 22:59:13,974] Trial 3

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6254,0.620574,0.0
2,0.626,0.639653,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-4/checkpoint-2138
Configuration saved in canine-c-finetuned-cola/run-4/checkpoint-2138/config.json
Model weights saved in canine-c-finetuned-cola/run-4/checkpoint-2138/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-4/checkpoint-2138/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-4/checkpoint-2138/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineF

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.624,0.620024,0.0
2,0.6161,0.621107,0.0
3,0.608,0.6214,0.0
4,0.6195,0.6206,0.0
5,0.6129,0.620881,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-5/checkpoint-2138
Configuration saved in canine-c-finetuned-cola/run-5/checkpoint-2138/config.json
Model weights saved in canine-c-finetuned-cola/run-5/checkpoint-2138/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-5/checkpoint-2138/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-5/checkpoint-2138/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineF

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6093,0.620018,0.0
2,0.6088,0.618146,0.0
3,0.6142,0.623934,0.0
4,0.5982,0.631749,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-6/checkpoint-535
Configuration saved in canine-c-finetuned-cola/run-6/checkpoint-535/config.json
Model weights saved in canine-c-finetuned-cola/run-6/checkpoint-535/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-6/checkpoint-535/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-6/checkpoint-535/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSeq

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6229,0.62104,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-7/checkpoint-2138
Configuration saved in canine-c-finetuned-cola/run-7/checkpoint-2138/config.json
Model weights saved in canine-c-finetuned-cola/run-7/checkpoint-2138/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-7/checkpoint-2138/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-7/checkpoint-2138/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-cola/run-7/checkpoint-2138 (score: 0.0).
[I 2022-03-27 23:22:29,531] Tr

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6134,0.627335,0.0
2,0.6143,0.623007,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-8/checkpoint-2138
Configuration saved in canine-c-finetuned-cola/run-8/checkpoint-2138/config.json
Model weights saved in canine-c-finetuned-cola/run-8/checkpoint-2138/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-8/checkpoint-2138/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-8/checkpoint-2138/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineF

Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6063,0.624228,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/run-9/checkpoint-2138
Configuration saved in canine-c-finetuned-cola/run-9/checkpoint-2138/config.json
Model weights saved in canine-c-finetuned-cola/run-9/checkpoint-2138/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/run-9/checkpoint-2138/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/run-9/checkpoint-2138/special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from canine-c-finetuned-cola/run-9/checkpoint-2138 (score: 0.0).
[I 2022-03-27 23:29:33,230] Tr

In [None]:
best_run

BestRun(run_id='0', objective=0.0, hyperparameters={'learning_rate': 2.068234516547173e-06, 'num_train_epochs': 5, 'seed': 14, 'per_device_train_batch_size': 16})
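
**The best objective found by the search is 0.0, so the search could not separate good configurations from bad ones. One possible adjustment, offered here as an untested suggestion, is to restrict the search space through the `hp_space` argument of `Trainer.hyperparameter_search`. With the Optuna backend, this argument takes a function that receives a trial and returns a dictionary of hyperparameters. The ranges below are illustrative assumptions, and `FakeTrial` is a hypothetical stand-in used only so that the sketch runs without Optuna installed.**

```python
import math
import random

def hp_space(trial):
    """Illustrative search space; the ranges are assumptions, not tuned values."""
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 3, 10),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [16, 32]
        ),
    }

class FakeTrial:
    """Hypothetical stand-in mimicking the three Optuna trial methods used above."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def suggest_float(self, name, low, high, log=False):
        if log:  # sample uniformly in log space
            return math.exp(self.rng.uniform(math.log(low), math.log(high)))
        return self.rng.uniform(low, high)

    def suggest_int(self, name, low, high):
        return self.rng.randint(low, high)  # both bounds inclusive

    def suggest_categorical(self, name, choices):
        return self.rng.choice(choices)

sample = hp_space(FakeTrial())
print(sample)

# With Optuna installed, the same function could be passed to the trainer:
# best_run = trainer.hyperparameter_search(
#     hp_space=hp_space, n_trials=10, direction="maximize"
# )
```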

In [None]:
for n, v in best_run.hyperparameters.items():
    setattr(trainer.args, n, v)

trainer.train()


Epoch,Training Loss,Validation Loss,Matthews Correlation
1,0.6132,0.624533,0.0
2,0.6094,0.623943,0.0
3,0.6085,0.621228,0.0
4,0.6084,0.620064,0.0
5,0.6103,0.619751,0.0


The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1043
  Batch size = 16
Saving model checkpoint to canine-c-finetuned-cola/checkpoint-535
Configuration saved in canine-c-finetuned-cola/checkpoint-535/config.json
Model weights saved in canine-c-finetuned-cola/checkpoint-535/pytorch_model.bin
tokenizer config file saved in canine-c-finetuned-cola/checkpoint-535/tokenizer_config.json
Special tokens file saved in canine-c-finetuned-cola/checkpoint-535/special_tokens_map.json
The following columns in the evaluation set  don't have a corresponding argument in `CanineForSequenceClassification.forward` and have been ignored: idx, sentence. If idx, sentence are not expected by `CanineForSequenceClassification.forward`, 

TrainOutput(global_step=2675, training_loss=0.6102783602420415, metrics={'train_runtime': 314.4352, 'train_samples_per_second': 135.974, 'train_steps_per_second': 8.507, 'total_flos': 2370458675224140.0, 'train_loss': 0.6102783602420415, 'epoch': 5.0})