<a href="https://colab.research.google.com/github/Chirann/FDU_NLP/blob/main/Bert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pretrained Language Model (PLM)
Recently, there are mainly two types of learning paradigm in NLP: pretrain-then-finetuning and pretrain-prompt-predict. Both of them requires the costly pretraining phase, and the difference is if we need additional task-specific training (i.e., finetuning) or not (i.e., prompt-predict).

Since pretraining is very costly, there are many open-source PLMs available, such as [Huggingface](https://huggingface.co/docs/transformers/index). Today, we will learn how to use these PLMs for downstream tasks. Note that each PLM has two separate learned models:

*   Pre-trained tokenizer
*   Pre-trained model

These two models must be paired well! Otherwise, the input vocabulary will be different and cause errors.

Here are illustrations of the two paradigms:

## pretrain-then-finetuning
![](https://drive.google.com/uc?export=view&id=1DdxxKb15LUofLw3fgKcPw7xrx8ClczHy)

Pretrain-then-finetuning will re-use the parameters (except the output layer) for downstream tasks. For a specific task, e.g., text classification, here are the steps:

1.   We instantiate a tokenizer, whose parameters are loaded from a pre-trained tokenizer.
2.   we randomly initialize a network that has the same architecture with the PLM. Optionally, the output layer may be different or the same.
3.   Load the pre-trained parameters from PLM.
    * if the final output layer is the same, we also copy the parameters.
    * if the final output layer is different, we randomly initialize.
4.   Finetuning. The loaded parameters can be freeze (*required_grad=False*) or tuned (*required_grad=True*). Note that if the final output layer is different, we must tune them.

## pretrain-prompt-predict
![](https://drive.google.com/uc?export=view&id=1vaa3cPB30X7esBpkl392YtuBTNat8qHk)

Pretrain-prompt-predict will re-use all the parameters including the output layer for downstream tasks. And, there will be no further training. The only thing you can do is to design different input template or the verbalizer for better performance.

## prompt-based tuning
Alternatively, there are a new trend to finetuning a small amount of parameters following pretrain-prompt-predict paradigm. The tuned parameters are called soft prompt (e.g., some special tokens in the inputs or randomly initialized hidden vectors in layers). Note that this is different from finetuning because these is no additional output layer.

Today, we will learn how to use Huggingface pre-trained BERT model for a text classification task: Natural Language Inference (NLI).

In [1]:
#@title show your CPU or GPU details
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 16197269057422960445
 xla_global_id: -1,
 name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 14626652160
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 13482051039700097479
 physical_device_desc: "device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5"
 xla_global_id: 416903419]

In [2]:
#@title connect google drive folder

from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/SMU_MITB_NLP/week9/

Mounted at /content/drive
/content/drive/MyDrive/SMU_MITB_NLP/week9


# BERT for text classification

[BERT](https://arxiv.org/abs/1810.04805) is a deep Bidirectional Transformers that can encode a sequence of words and output their contextualized representations. It opens an era of pretraining-then-finetuning and achieves great success in many downstream tasks.

Hugging Face Transformers is a Python library that provides many PLMs including BERT. We can use them for text classification, token classification, masked language, question answer, or even obtain the output hidden states for custom BERT. [Here](https://huggingface.co/docs/transformers/v4.29.1/en/model_doc/bert#transformers.BertConfig) are various implementations based on BERT.

Except for this tutorial, here is another detailed [reference colab tutorial](https://colab.research.google.com/drive/1pxc-ehTtnVM72-NViET_D2ZqOlpOi2LH?usp=sharing#scrollTo=5WzqhpquoD4E).

It is also recommended to request a **GPU** for training.


We're going to go through a few use cases:
* [Tokenizers](https://huggingface.co/docs/transformers/main_classes/tokenizer)
* BERT Models
* Finetuning.

In [3]:
# install hugging face packages
!pip install transformers
!pip install datasets==2.19.2
!pip install accelerate

Collecting datasets==2.19.2
  Downloading datasets-2.19.2-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets==2.19.2)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets==2.19.2)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets==2.19.2)
  Downloading multiprocess-0.70.17-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.3.1,>=2023.1.0 (from fsspec[http]<=2024.3.1,>=2023.1.0->datasets==2.19.2)
  Downloading fsspec-2024.3.1-py3-none-any.whl.metadata (6.8 kB)
INFO: pip is looking at multiple versions of multiprocess to determine which version is compatible with other requirements. This could take a while.
Collecting multiprocess (from datasets==2.19.2)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-2.19.2-py3-none-any.whl (542 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [4]:
#@title import packages
import transformers
from transformers import BertTokenizer, AutoTokenizer, BertModel, BertForSequenceClassification
from transformers import get_linear_schedule_with_warmup
import torch
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler, TensorDataset
from torch.optim import AdamW, lr_scheduler
from datasets import load_metric
from accelerate import Accelerator
import torch.nn as nn
import torch.nn.functional as F

import pickle
import csv
import os
from typing import List, Optional, Union
import dataclasses
import json
import math
import numpy as np
transformers.logging.set_verbosity_error()


## Tokenizer
The tokenizers take raw strings (e.g., sentences) and output a list of tokens in the vocabulary as the model inputs.

You can access tokenizers either with model-specific Tokenizer class (e.g., *BertTokenizer* from BERT model), or with the AutoTokenizer class to decide the tokenizer class automatically.

In [5]:
#@title tokenizer instantiation
emp_tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased') # convenient! Defaults to Fast
print(emp_tokenizer)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

BertTokenizerFast(name_or_path='bert-base-uncased', vocab_size=30522, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	101: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	102: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	103: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}




In [6]:
#@title tokenize a sentence
emp_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = "HuggingFace Transformers is great!"
emp_tokens = emp_tokenizer.tokenize(text)
emp_ids = emp_tokenizer.convert_tokens_to_ids(emp_tokens)
emp_ids_special_tokens = [emp_tokenizer.cls_token_id] + emp_ids + [emp_tokenizer.sep_token_id]
decoded_str = emp_tokenizer.decode(emp_ids_special_tokens)


print("tokenize:             ", emp_tokens)
print("convert_tokens_to_ids:", emp_ids)
print("add special tokens:   ", emp_ids_special_tokens)
print("--------")
print("decode:               ", decoded_str)

tokenize:              ['hugging', '##face', 'transformers', 'is', 'great', '!']
convert_tokens_to_ids: [17662, 12172, 19081, 2003, 2307, 999]
add special tokens:    [101, 17662, 12172, 19081, 2003, 2307, 999, 102]
--------
decode:                [CLS] huggingface transformers is great! [SEP]


In [7]:
#@title tokenize for dataset curation

emp_premise = "No Weapons of Mass Destruction Found in Iraq Yet."
emp_hypothesis = "Weapons of Mass Destruction Found in Iraq."
emp_p_tokens = emp_tokenizer(emp_premise, return_tensors="pt")
emp_h_tokens = emp_tokenizer(emp_hypothesis, return_tensors="pt")
emp_tokens = emp_tokenizer.encode_plus(
                emp_premise,
                emp_hypothesis,
                truncation='longest_first',
                add_special_tokens=True,
                max_length=20,
                return_tensors='pt',
            )

print("premise token ids:    ", emp_p_tokens)
print("hypothesis token ids: ", emp_h_tokens)
print("truncated token ids:  ", emp_tokens)

premise token ids:     {'input_ids': tensor([[ 101, 2053, 4255, 1997, 3742, 6215, 2179, 1999, 5712, 2664, 1012,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
hypothesis token ids:  {'input_ids': tensor([[ 101, 4255, 1997, 3742, 6215, 2179, 1999, 5712, 1012,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
truncated token ids:   {'input_ids': tensor([[ 101, 2053, 4255, 1997, 3742, 6215, 2179, 1999, 5712, 2664,  102, 4255,
         1997, 3742, 6215, 2179, 1999, 5712, 1012,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


## BERT Models

Initializing models is very similar to initializing tokenizers. You can either use the model class specific to your model (e.g., `BertModel`) or you can use an AutoModel class.

For different downstream tasks, Hugging Face sets up the model classes with different heads. The model heads take the high-dimensional vector of hidden states as input and project them onto a different dimension, usually composed of one or a few linear layers.

* ForMaskedLM
* ForMultipleChoice
* ForQuestionAnswering
* ForSequenceClassification
* ForTokenClassification
* ... [more](https://huggingface.co/docs/transformers/model_doc/auto)

> For example, `BertForSequenceClassification` will takes whole sequence embeddings as inputs, and outputs a predefined number of logits for classification. [codes](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py)

Note that different models require different input formats. Read the documents carefully!



In [8]:
#@title PLM instantiation

emp_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tmp_outputs = emp_model(**emp_tokens)

print('inputs: ', emp_tokenizer.decode(emp_tokens['input_ids'][0]))
print('outputs: ', tmp_outputs)
print(f"output distribution over labels: {torch.softmax(tmp_outputs.logits, dim=1)}")

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

inputs:  [CLS] no weapons of mass destruction found in iraq yet [SEP] weapons of mass destruction found in iraq. [SEP]
outputs:  SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1190, -0.0550]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
output distribution over labels: tensor([[0.5434, 0.4566]], grad_fn=<SoftmaxBackward0>)


In [9]:
#@title model architecture

print(emp_model)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

## Finetune for NLI

Basically, there are four components:

1.   prepare data
2.   prepare model
3.   finetuning
4.   evaluate

More examples can be found in [link](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_glue_no_trainer.py)



### Natural Language Inference
NLI, also called textual entailment, is a kind of text classification task. Given two sentences (a premise and a hypothesis), NLI is to predict if we can infer the hypothesis from the premise (a.k.a. entailment), or not (Some datasets define the "not" class as *non-entailment*, like RTE, while others define the "not" class with two classes: *contradiction* and *neutral*).

Here are two examples from RTE:

*   Premise: No Weapons of Mass Destruction Found in Iraq Yet.
*   Hypothesis: Weapons of Mass Destruction Found in Iraq.
*   Label: not_entailment
---
*   Premise: Lin Piao, after all, was the creator of Mao's "Little Red Book" of quotations.
*   Hypothesis: Lin Piao wrote the "Little Red Book".
*   Label: entailment
---

*   Dataset: [Recognizing Textual Entailment (RTE)](https://gluebenchmark.com/tasks), place the three downloaded files (train.tsv, dev.tsv, test.tsv) into the folder "/content/drive/MyDrive/SMU_MITB_NLP/week9/RTE/".


### prepare data

Pipeline:

1. read data from file
2. create data samples (called example here)
3. convert examples to features (tokenization, padding, etc.)
4. convert features to dataset (from list to tensor)
5. create batch data

Other functionalities:

* save and load arguments (in practice, we use `argparse.ArgumentParser`)


In [10]:
#@title class PreProcessor
# codes are revised from https://github.com/huggingface/transformers/blob/main/src/transformers/data/processors/glue.py
class InputExample(object):
    """
    A single training/test example for simple sequence classification.
    Args:
        guid: Unique id for the example.
        text_a: string. The untokenized text of the first sequence. For single
            sequence tasks, only this sequence must be specified.
        text_b: (Optional) string. The untokenized text of the second sequence.
            Only must be specified for sequence pair tasks.
        label: (Optional) string. The label of the example. This should be
            specified for train and dev examples, but not for test examples.
    """
    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label

    def to_json_string(self):
        """Serializes this instance to a JSON string."""
        return json.dumps(dataclasses.asdict(self), indent=2) + "\n"


class InputFeatures(object):
    """
    A single set of features of data. Property names are the same names as the corresponding inputs to a model.
    Args:
        input_ids: Indices of input sequence tokens in the vocabulary.
        attention_mask: Mask to avoid performing attention on padding token indices.
            Mask values selected in `[0, 1]`: Usually `1` for tokens that are NOT MASKED, `0` for MASKED (padded)
            tokens.
        token_type_ids: (Optional) Segment token indices to indicate first and second
            portions of the inputs. Only some models use them.
        label: (Optional) Label corresponding to the input. Int for classification problems,
            float for regression problems.
    """
    def __init__(self, input_ids, attention_mask=None, token_type_ids=None, label=None):
        self.input_ids = input_ids
        self.attention_mask = attention_mask
        self.token_type_ids = token_type_ids
        self.label = label

    def to_json_string(self):
        """Serializes this instance to a JSON string."""
        return json.dumps(dataclasses.asdict(self)) + "\n"

class PreProcessor():
    """Processor for the RTE data set (GLUE version)."""

    def __init__(self):
        self.model_args = {'max_seq_length':64, 'verbose':False}

    def save(self, path):
        f = open(path, 'wb')
        pickle.dump(self, f)
        f.close()

    def load(self, path):
        f = open(path, 'rb')
        proc = pickle.load(f)
        f.close()
        return proc

    def set_model_arg(self, key, value):
        self.model_args[key] = value

    def get_model_arg(self, key):
        return self.model_args.get(key, None)
        return self.avg_seq_length

    def _read_tsv(self, input_file, quotechar=None):
        """Reads a tab separated value file."""
        with open(input_file, "r", encoding="utf-8-sig") as f:
            return list(csv.reader(f, delimiter="\t", quotechar=quotechar))

    def get_train_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_test_examples(self, data_dir):
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

    def get_labels(self):
        return ["entailment", "not_entailment"]

    def _create_examples(self, lines, set_type):
        """Creates examples for the training and dev sets."""
        examples = []
        for (i, line) in enumerate(lines):
            if i == 0:
                continue
            guid = "%s-%s" % (set_type, line[0])
            label = line[-1]
            text_a = line[1]
            text_b = line[2]

            examples.append(
                InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
        return examples

    def convert_examples_to_features(self, tokenizer, examples):
        max_length = self.get_model_arg("max_seq_length")
        label_list = self.get_labels()
        pad_token = tokenizer.convert_tokens_to_ids([tokenizer.pad_token])[0]
        label_map = {label: i for i, label in enumerate(label_list)}

        features = []
        for (ex_index, example) in enumerate(examples):
            inputs = tokenizer.encode_plus(
                example.text_a,
                example.text_b,
                truncation='longest_first',
                add_special_tokens=True,
                max_length=max_length,
            )
            input_ids, token_type_ids = inputs["input_ids"], inputs["token_type_ids"]

            # The mask has 1 for real tokens and 0 for padding tokens. Only real
            # tokens are attended to.
            attention_mask = [1] * len(input_ids)

            # Zero-pad up to the sequence length.
            padding_length = max_length - len(input_ids)

            input_ids = input_ids + ([pad_token] * padding_length)
            attention_mask = attention_mask + ([0] * padding_length)
            token_type_ids = token_type_ids + ([0] * padding_length)

            assert len(input_ids) == max_length, "Error with input length {} vs {}".format(len(input_ids), max_length)
            assert len(attention_mask) == max_length, "Error with input length {} vs {}".format(len(attention_mask), max_length)
            assert len(token_type_ids) == max_length, "Error with input length {} vs {}".format(len(token_type_ids), max_length)

            label = label_map[example.label]

            if ex_index < 5 and self.get_model_arg('verbose'):
                print("*** Example ***")
                print("guid: %s" % (example.guid))
                print("input_ids: %s" % " ".join([str(x) for x in input_ids]))
                print("attention_mask: %s" % " ".join([str(x) for x in attention_mask]))
                print("token_type_ids: %s" % " ".join([str(x) for x in token_type_ids]))
                print("label: %s (id = %d)" % (example.label, label))

            features.append(
                    InputFeatures(input_ids=input_ids,
                                  attention_mask=attention_mask,
                                  token_type_ids=token_type_ids,
                                  label=label))
        return features

    def convert_feature_to_dataset(self, features):
        # Convert to Tensors and build dataset
        all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
        all_attention_mask = torch.tensor([f.attention_mask for f in features], dtype=torch.long)
        all_token_type_ids = torch.tensor([f.token_type_ids for f in features], dtype=torch.long)
        all_labels = torch.tensor([f.label for f in features], dtype=torch.long)

        dataset = TensorDataset(all_input_ids, all_attention_mask, all_token_type_ids, all_labels)

        return dataset

    def get_dataloader(self, features, batch_size, is_test=False, drop_last=True):
        dataset = self.convert_feature_to_dataset(features)

        dataset_sampler = SequentialSampler(dataset) if is_test else RandomSampler(dataset)

        dataloader = DataLoader(dataset, sampler=dataset_sampler, batch_size=batch_size, drop_last=drop_last)
        return dataloader

    def get_data_iter(self, features, batch_size, is_test=False, drop_last=True):
        dataloader = self.get_dataloader(features, batch_size, is_test=is_test, drop_last=(drop_last if not is_test else False))
        return iter(dataloader)

In [11]:
#@title prepare dataset and hyper-parameters for training
proc = PreProcessor()

# hyper-parameters for data
proc.set_model_arg('batch_size', 8)
proc.set_model_arg('max_seq_length', 256)
# hyper-parameters for model
proc.set_model_arg('learning_rate', 2e-5)
proc.set_model_arg('n_epochs', 10)
proc.set_model_arg('warmup_steps', 0.06)
proc.set_model_arg('weight_decay', 0.1)
proc.set_model_arg('adam_epsilon', 1e-8)
proc.set_model_arg('clip', 1)

# arguments for reproduction
proc.set_model_arg('log_step', 150)
proc.set_model_arg('verbose', True)    # if log details
proc.set_model_arg('init_seed', 42)
proc.set_model_arg('checkpoint_path', "./RTE/plm_rte.bin")
proc.set_model_arg('dataset_path', "./RTE/")

# save proc
arg_path = "./RTE/proc_rte.dat"
proc.save(arg_path)

In [12]:
#@title utility functions

def check_gpu():
    # torch.cuda.is_available() checks and returns a Boolean True if a GPU is available, else it'll return False
    is_cuda = torch.cuda.is_available()

    # If we have a GPU available, we'll set our device to GPU. We'll use this device variable later in our code.
    if is_cuda:
        device = torch.device("cuda")
        print("GPU is available")
    else:
        device = torch.device("cpu")
        print("GPU not available, CPU used")
    return device

def set_seed(seed=42):
    # random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

def save_model(model, path):
    torch.save(model.state_dict(), path)

def load_model(model, path):
    model.load_state_dict(torch.load(path))
    return model

In [13]:
#@title preprare model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# get parameters from preprocessor
init_seed = proc.get_model_arg('init_seed')
proc.set_model_arg('verbose', False)

device = check_gpu()
set_seed(init_seed)

# We'll also set the model to the device that we defined earlier (default is CPU)
model = model.to(device)

GPU is available


In [14]:
#@title finetuning function

def train(model, tokenizer, proc, device):
    # fetch hyper-parameters
    batch_size = proc.get_model_arg("batch_size")
    n_epochs = proc.get_model_arg('n_epochs')
    learning_rate = proc.get_model_arg("learning_rate")
    adam_epsilon = proc.get_model_arg('adam_epsilon')
    weight_decay = proc.get_model_arg('weight_decay')
    verbose = proc.get_model_arg("verbose")
    log_step = proc.get_model_arg("log_step")
    checkpoint_path = proc.get_model_arg("checkpoint_path")
    max_seq_length = proc.get_model_arg("max_seq_length")
    clip = proc.get_model_arg("clip")
    data_path = proc.get_model_arg("dataset_path")

    # prepare training dataset
    # Get training examples
    examples = proc.get_train_examples(data_path) # Assuming proc has a get_train_examples method
    # Convert examples to features
    features = proc.convert_examples_to_features(tokenizer, examples) # Assuming proc has a convert_examples_to_features method


    data_iter = proc.get_data_iter(features, batch_size)
    # training steps in each epoch
    examples_total_num = len(features)
    max_steps = math.ceil(float(examples_total_num)/batch_size)
    t_total = max_steps * n_epochs

    # Define Loss, Optimizer
    no_decay = ['bias', 'LayerNorm.weight']
    optimizer_grouped_parameters = [
        {'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': weight_decay},
        {'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
        ]
    # Define optimizer
    optimizer = AdamW(optimizer_grouped_parameters, lr=learning_rate, eps=adam_epsilon) # Assuming AdamW is imported
    # Define scheduler
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=t_total)
    # train!
    for epoch in range(n_epochs):
        total_loss = 0.0
        for step in range(max_steps):
            model.train()
            optimizer.zero_grad() # Clears existing gradients from previous epoch
            # prepare inputs
            try:
                batch = next(data_iter)
            except StopIteration:
                data_iter = proc.get_data_iter(features, batch_size)
                batch = next(data_iter)

            batch = tuple(t.to(device) for t in batch)

            inputs = {'input_ids':      batch[0],
                      'attention_mask': batch[1],
                      'token_type_ids': batch[2],
                      'labels':         batch[3]}



            outputs = model(**inputs)
            loss = outputs[0]
            loss.backward()

            # clip gradients to prevent exploding gradients
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)

            optimizer.step()
            scheduler.step()  # Update learning rate schedule

            total_loss += loss.item()
            if step%log_step==0:
                eval_metric = evaluate(model, tokenizer, proc, device)
                print("step: {}/{}, Loss: {:.4f}, eval_metric: {}".format(step, max_steps, loss.item(), eval_metric))

        eval_metric = evaluate(model, tokenizer, proc, device)
        print("epoch: {}/{}, Loss: {:.4f}, eval_metric: {}, saving model to {}".format(epoch, n_epochs, total_loss/max_steps, eval_metric, checkpoint_path))
        save_model(model, checkpoint_path)

In [15]:
#@title evaluation function
def evaluate(model, tokenizer, proc, device, is_dev=True):
    # fetch hyper-parameters
    batch_size = proc.get_model_arg("batch_size")
    n_epochs = proc.get_model_arg('n_epochs')
    learning_rate = proc.get_model_arg("learning_rate")
    adam_epsilon = proc.get_model_arg('adam_epsilon')
    weight_decay = proc.get_model_arg('weight_decay')
    verbose = proc.get_model_arg("verbose")
    log_step = proc.get_model_arg("log_step")
    checkpoint_path = proc.get_model_arg("checkpoint_path")
    max_seq_length = proc.get_model_arg("max_seq_length")
    clip = proc.get_model_arg("clip")
    data_path = proc.get_model_arg("dataset_path")

    metric = load_metric("glue", "rte")
    accelerator = Accelerator()

    # prepare dataset
    examples = proc.get_dev_examples(data_path) if is_dev else proc.get_test_examples(data_path)
    features = proc.convert_examples_to_features(tokenizer, examples)
    eval_dataloader = proc.get_dataloader(features, batch_size)

    for batch in eval_dataloader:
        model.eval()
        batch = tuple(t.to(device) for t in batch)

        with torch.no_grad():
            inputs = {'input_ids':      batch[0],
                      'attention_mask': batch[1],
                      'token_type_ids': batch[2],
                      'labels':         batch[3]}
            outputs = model(**inputs)
        predictions = outputs[1].argmax(dim=-1)
        predictions, references = accelerator.gather((predictions, inputs["labels"]))

        metric.add_batch(
                predictions=predictions,
                references=references,
            )
        eval_metric = metric.compute()

    return eval_metric

In [16]:
#@title finetuning!
train(model, tokenizer, proc, device)

  metric = load_metric("glue", "rte")
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


Downloading builder script:   0%|          | 0.00/1.84k [00:00<?, ?B/s]

step: 0/312, Loss: 0.8139, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.4598, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.5825, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 0/10, Loss: 0.6581, eval_metric: {'accuracy': 0.625}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.6608, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.8252, eval_metric: {'accuracy': 0.375}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.1860, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 1/10, Loss: 0.4414, eval_metric: {'accuracy': 0.875}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.1956, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0274, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.4127, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 2/10, Loss: 0.2642, eval_metric: {'accuracy': 0.75}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0162, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0094, eval_metric: {'accuracy': 0.5}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0019, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 3/10, Loss: 0.1642, eval_metric: {'accuracy': 1.0}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.5367, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0195, eval_metric: {'accuracy': 0.5}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.2960, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 4/10, Loss: 0.0996, eval_metric: {'accuracy': 0.875}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0017, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0011, eval_metric: {'accuracy': 0.5}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0013, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 5/10, Loss: 0.0590, eval_metric: {'accuracy': 0.875}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0014, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0005, eval_metric: {'accuracy': 0.5}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0019, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 6/10, Loss: 0.0338, eval_metric: {'accuracy': 0.5}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0004, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0008, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0003, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 7/10, Loss: 0.0300, eval_metric: {'accuracy': 0.875}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0006, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0002, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0009, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 8/10, Loss: 0.0232, eval_metric: {'accuracy': 0.625}, saving model to ./RTE/plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0005, eval_metric: {'accuracy': 0.5}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0003, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0003, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 9/10, Loss: 0.0234, eval_metric: {'accuracy': 0.625}, saving model to ./RTE/plm_rte.bin


In [17]:
#@title Inference (load well-trained model)!

# saved model for seq2seq with or without attention
# use your own well-trained model
checkpoint_path = "./RTE/plm_rte.bin"
arg_path = "./RTE/proc_rte.dat"


proc = PreProcessor()
proc = proc.load(arg_path)

# get parameters from preprocessor
init_seed = proc.get_model_arg('init_seed')

device = check_gpu()
set_seed(init_seed)

# We'll also set the model to the device that we defined earlier (default is CPU)
# preprare model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model = load_model(model, checkpoint_path)
model = model.to(device)

eval_metric = evaluate(model, tokenizer, proc, device, is_dev=True)
print("eval_metric: {}".format(eval_metric))

GPU is available


  model.load_state_dict(torch.load(path))
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


*** Example ***
guid: dev-0
input_ids: 101 11271 20726 1010 1996 7794 1997 1996 3364 5696 20726 1010 2038 2351 1997 11192 4456 2012 2287 4008 1010 2429 2000 1996 5696 20726 3192 1012 102 5696 20726 2018 2019 4926 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

## Create your own model

Now, can you build your own model based on BERT by adding additional output layer!

We use BertModel class, which has two outputs:



1.   output[0]: sequence_output (batch_size, sequence_length, 768) contains all tokens' hidden states in the last layer
2.   output[1]: pooled_output (batch_size, 768) is the hidden states of the [CLS] token in the layer, which is regarded as a summary of the content according to the entire input sequence.



In [18]:
class CustomBERTModel(nn.Module):
    def __init__(self, num_labels, dropout=0.1):
          super(CustomBERTModel, self).__init__()
          self.bert = BertModel.from_pretrained("bert-base-uncased")
          ### New layers:
          self.num_labels = num_labels
          self.dropout = nn.Dropout(dropout)
          self.classifier = nn.Linear(768, self.num_labels) ## 2 is the number of classes

          self.loss_fct = nn.CrossEntropyLoss()

    def forward(self,
                input_ids: Optional[torch.Tensor] = None,
                attention_mask: Optional[torch.Tensor] = None,
                token_type_ids: Optional[torch.Tensor] = None,
                labels: Optional[torch.Tensor] = None,
            ):
          outputs = self.bert(
            input_ids=input_ids,             # 填写 input_ids 参数
            attention_mask=attention_mask,   # 填写 attention_mask 参数
            token_type_ids=token_type_ids    # 填写 token_type_ids 参数
          )

          # sequence_output has the following shape: (batch_size, sequence_length, 768)
          # pooled_output has the following shape: (batch_size, 768)
          sequence_output = outputs[0]
          pooled_output = outputs[1]

          pooled_output = self.dropout(pooled_output)
          logits = self.classifier(pooled_output)

          loss = self.loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
          return loss, logits

In [19]:
#@title prepare dataset and hyper-parameters for training
custom_proc = PreProcessor()

# hyper-parameters for data
custom_proc.set_model_arg('batch_size', 8)
custom_proc.set_model_arg('max_seq_length', 256)
# hyper-parameters for model
custom_proc.set_model_arg('learning_rate', 2e-5)
custom_proc.set_model_arg('n_epochs', 10)
custom_proc.set_model_arg('warmup_steps', 0.06)
custom_proc.set_model_arg('weight_decay', 0.1)
custom_proc.set_model_arg('adam_epsilon', 1e-8)
custom_proc.set_model_arg('clip', 1)

# arguments for reproduction
custom_proc.set_model_arg('log_step', 150)
custom_proc.set_model_arg('verbose', False)    # if log details
custom_proc.set_model_arg('init_seed', 42)
custom_proc.set_model_arg('checkpoint_path', "./RTE/custom_plm_rte.bin")
custom_proc.set_model_arg('dataset_path', "./RTE/")

# save proc
arg_path = "./RTE/custom_proc_rte.dat"
custom_proc.save(arg_path)

In [20]:
#@title preprare model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
custom_model = CustomBERTModel(num_labels = 2)

# get parameters from preprocessor
init_seed = proc.get_model_arg('init_seed')

device = check_gpu()
set_seed(init_seed)

# We'll also set the model to the device that we defined earlier (default is CPU)
custom_model = custom_model.to(device)



GPU is available


In [21]:
#@title finetuning!
train(custom_model, tokenizer, custom_proc, device)

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.9729, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.5466, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.5715, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 0/10, Loss: 0.6618, eval_metric: {'accuracy': 0.5}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.6383, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.7485, eval_metric: {'accuracy': 0.375}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.1074, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 1/10, Loss: 0.4389, eval_metric: {'accuracy': 0.75}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0974, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0159, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.2379, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 2/10, Loss: 0.2654, eval_metric: {'accuracy': 0.75}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0116, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.5612, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0065, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 3/10, Loss: 0.1484, eval_metric: {'accuracy': 1.0}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.5130, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.4332, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0023, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 4/10, Loss: 0.0883, eval_metric: {'accuracy': 0.875}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0017, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0008, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0006, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 5/10, Loss: 0.0538, eval_metric: {'accuracy': 0.875}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0011, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0005, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0712, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 6/10, Loss: 0.0327, eval_metric: {'accuracy': 0.625}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0003, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0005, eval_metric: {'accuracy': 0.375}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0003, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 7/10, Loss: 0.0162, eval_metric: {'accuracy': 1.0}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0004, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0002, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0004, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 8/10, Loss: 0.0126, eval_metric: {'accuracy': 0.75}, saving model to ./RTE/custom_plm_rte.bin


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 0/312, Loss: 0.0003, eval_metric: {'accuracy': 0.625}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 150/312, Loss: 0.0002, eval_metric: {'accuracy': 0.75}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


step: 300/312, Loss: 0.0002, eval_metric: {'accuracy': 0.875}


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


epoch: 9/10, Loss: 0.0101, eval_metric: {'accuracy': 0.625}, saving model to ./RTE/custom_plm_rte.bin


In [24]:
#@title Inference (load well-trained model)! You may need to tune your hyperparameters and architectures for better performance!

# saved model for seq2seq with or without attention
# use your own well-trained model
checkpoint_path = "./RTE/custom_plm_rte.bin"
arg_path = "./RTE/custom_proc_rte.dat"


custom_proc = PreProcessor()
custom_proc = custom_proc.load(arg_path)

# get parameters from preprocessor
init_seed = custom_proc.get_model_arg('init_seed')

device = check_gpu()
set_seed(init_seed)

# We'll also set the model to the device that we defined earlier (default is CPU)
# preprare model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
custom_model = CustomBERTModel(num_labels = 2)
custom_model = load_model(custom_model, checkpoint_path)
custom_model = custom_model.to(device)

eval_metric = evaluate(custom_model, tokenizer, custom_proc, device, is_dev=True)
print("eval_metric: {}".format(eval_metric))

GPU is available


  model.load_state_dict(torch.load(path))
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this metric from the next major release of `datasets`.


eval_metric: {'accuracy': 0.75}
