# Guide to Transformers Domain Adaptation
This guide illustrates an end-to-end workflow of domain adaptation, where we domain-adapt a transfomer model for biomedical NLP applications.

It showcases the two domain adaptation techniques we investigated in our research:
1. Data Selection
2. Vocabulary Augmentation

Following that, we demonstrate how such a domain-adapted Transformers model is compatible with 🤗 `transformers`'s training interface and how it outperforms an out-of-the-box (non-domain adapted) model.

These techniques are applied to BERT small but the codebase is written to be generalizable to other classes of Transformers supported by HuggingFace.

### Caveats
For this guide, we use a much smaller subset (<0.05%) of the in-domain corpora due to memory and time constraints.

In [5]:
! pip install -e ./Transformers-Domain-Adaptation

Obtaining file:///content/Transformers-Domain-Adaptation
  Installing build dependencies ... [?25l[?25hdone
  Checking if build backend supports build_editable ... [?25l[?25hdone
  Getting requirements to build editable ... [?25l[?25hdone
  Preparing editable metadata (pyproject.toml) ... [?25l[?25hdone
Collecting datasets<3.0.0,>=2.4.0 (from adatation-metrics==0.3.1)
  Downloading datasets-2.14.5-py3-none-any.whl (519 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m519.6/519.6 kB[0m [31m8.0 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers<0.13.0,>=0.12.0 (from adatation-metrics==0.3.1)
  Downloading tokenizers-0.12.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.6/6.6 MB[0m [31m86.1 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting torch<2.0.0,>=1.12.0 (from adatation-metrics==0.3.1)
  Downloading torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl (887.5 MB)
[2K     [90

### Setup: Install dependencies
We begin by installing `adatation-metrics` using `pip`.

In [1]:

!pip install huggingface-hub


Collecting huggingface-hub
  Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/295.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━[0m [32m122.9/295.0 kB[0m [31m3.6 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m295.0/295.0 kB[0m [31m5.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: huggingface-hub
Successfully installed huggingface-hub-0.17.3


## Constants
We first define some constants, including the appropriate model card and relevant paths to text corpora.

There are two types of corpora in the context of Domain Adaptation:

1. Fine-Tuning Corpus
> Given an NLP task (e.g. text classification, summarization, etc.), the text portion of this dataset is the fine-tuning corpus.

2. In-Domain Corpus
> This is an unsupervised text dataset that is used for domain pre-training. The text domain is the same as, if not broader than, the domain of fine-tuning corpus.

In [3]:

from huggingface_hub import notebook_login

notebook_login()


model_card = 'bert-base-uncased'

# Fine-tuning corpus

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [4]:
from datasets import load_dataset

model_card = 'bert-base-uncased'
subdomains = {
    'fake_news':load_dataset('redasers/difraud', 'fake_news'),
    'phishing':load_dataset('redasers/difraud', 'phishing'),
    'job_scams':load_dataset('redasers/difraud', 'job_scams'),
    'political_statements':load_dataset('redasers/difraud', 'political_statements'),
    'product_reviews':load_dataset('redasers/difraud', 'product_reviews'),
    'sms':load_dataset('redasers/difraud', 'sms'),
    'twitter_rumours':load_dataset('redasers/difraud', 'twitter_rumours'),
              }



In [5]:
from datasets import load_dataset

model_card = 'bert-base-uncased'
subdomains = {
    'fake_news':load_dataset('redasers/difraud', 'fake_news'),
    'phishing':load_dataset('redasers/difraud', 'phishing'),
    'job_scams':load_dataset('redasers/difraud', 'job_scams'),
    'political_statements':load_dataset('redasers/difraud', 'political_statements'),
    'product_reviews':load_dataset('redasers/difraud', 'product_reviews'),
    'sms':load_dataset('redasers/difraud', 'sms'),
    'twitter_rumours':load_dataset('redasers/difraud', 'twitter_rumours'),
              }

dpt_corpus_train = []
dpt_corpus_val = []
yy_corpus_train = []
yy_corpus_val = []

for domain_name in subdomains:
  dpt_corpus_train.extend(subdomains[domain_name]['train']['text'])
  dpt_corpus_val.extend(subdomains[domain_name]['validation']['text'])
  yy_corpus_train.extend(subdomains[domain_name]['train']['label'])
  yy_corpus_val.extend(subdomains[domain_name]['validation']['label'])




import pandas as pd


train = pd.DataFrame({"text":dpt_corpus_train, "label":yy_corpus_train }).sample(frac=1.0)
val = pd.DataFrame({"text":dpt_corpus_val,  "label":yy_corpus_val}).sample(frac=1.0)




In [6]:
import pandas as pd


train = pd.DataFrame({"text":dpt_corpus_train, "label":yy_corpus_train }).sample(frac=1.0)
val = pd.DataFrame({"text":dpt_corpus_val,  "label":yy_corpus_val}).sample(frac=1.0)


In [7]:
len(train), len(val)





(76680, 9585)

### Load model and tokenizer
Next we load the model and its corresponding tokenizer.

In [8]:
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained(model_card)
tokenizer = AutoTokenizer.from_pretrained(model_card)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [12]:
# Fit on fine-tuning corpus
selector.fit(val.text)





In [14]:
train=train.text.dropna().reset_index(drop=True)





In [20]:

# Select relevant documents from in-domain training corpus
selected_corpus = selector.transform(train.values.sparse())

AttributeError: ignored

Since we specified `keep=0.5` in the `DataSelector`, the selected corpus should be half the size of the in-domain corpus, containing the top 50% most relevant documents.

## Vocabulary Augmentation
We can extend the existing vocabulary of the model to include domain-specific terminology. This allows for the representation such terminology to be explicit learnt during domain pre-training.

In [10]:
len(selected_corpus)

76599

In [None]:
from adatation_metrics import VocabAugmentor

target_vocab_size = 31_000  # len(tokenizer) == 30_522

augmentor = VocabAugmentor(
    tokenizer=tokenizer,
    cased=False,
    target_vocab_size=target_vocab_size
)
# Obtain new domain-specific terminology based on the fine-tuning corpus
new_tokens = augmentor.get_new_tokens()


In [None]:
print(new_tokens[:20])

['cdna', 'transcriptional', 'tyrosine', 'phosphorylation', 'kda', 'homology', 'enhancer', 'assays', 'exon', 'nucleotide', 'genomic', 'encodes', 'deletion', 'polymerase', 'nf', 'cloned', 'recombinant', 'putative', 'transcripts', 'homologous']


#### Update model and tokenizer with new vocab terminologies

In [None]:
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

Embedding(31000, 768)

## Domain Pre-Training
Domain pre-training is the third step in domain adaptation — we continue training Transformer models with the same pre-training procedure on the in-domain corpus.

#### Create dataset

In [None]:
import itertools as it
from pathlib import Path
from typing import Sequence, Union, Generator

from datasets import load_dataset
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

In [None]:
datasets = load_dataset(
    'text',
    data_files={
        "train": dpt_corpus_train_data_selected,
        "val": dpt_corpus_val
    }
)

tokenized_datasets = datasets.map(
    lambda examples: tokenizer(examples['text'], truncation=True, max_length=model.config.max_position_embeddings),
    batched=True
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1038.0, style=ProgressStyle(description…

Using custom data configuration default



Downloading and preparing dataset text/default-99d850ed1b15ea72 (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/text/default-99d850ed1b15ea72/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab...


HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))



HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))

Dataset text downloaded and prepared to /root/.cache/huggingface/datasets/text/default-99d850ed1b15ea72/0.0.0/daf90a707a433ac193b369c8cc1772139bb6cca21a9c7fe83bdd16aad9b9b6ab. Subsequent calls will reuse this data.


HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='')))




#### Instantiate TrainingArguments and Trainer

In [None]:
training_args = TrainingArguments(
    output_dir="./results/domain_pre_training",
    overwrite_output_dir=True,
    max_steps=100,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    evaluation_strategy="steps",
    save_steps=50,
    save_total_limit=2,
    logging_steps=50,
    seed=42,
    # fp16=True,
    dataloader_num_workers=2,
    disable_tqdm=False
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['val'],
    data_collator=data_collator,
    tokenizer=tokenizer,  # This tokenizer has new tokens
)

In [None]:
trainer.train()

Step,Training Loss,Validation Loss,Runtime,Samples Per Second
50,2.8022,2.380539,36.2911,27.555
100,2.4943,2.328287,39.2985,25.446


TrainOutput(global_step=100, training_loss=2.648266372680664, metrics={'train_runtime': 170.8014, 'train_samples_per_second': 0.585, 'total_flos': 235678172444160, 'epoch': 0.16})

## Fine-Tuning for Specific Tasks
We can plug our domain-adapted model for any fine-tuning tasks supported by HuggingFace.

For this guide, we will compare the performance between an out-of-the-box (OOB) model performs against a domain-adapted model for Named Entity Recognitition on the BC2GM dataset, a popular biomedical benchmarking dataset.

Utility functions for NER preprocessing and evaluation are adapted from HuggingFace's [NER fine-tuning example notebook](https://github.com/huggingface/notebooks/blob/master/examples/token_classification.ipynb).

#### Preprocess raw dataset to form NER dataset

In [None]:
from typing import NamedTuple
from functools import partial
from typing_extensions import Literal

import numpy as np
from datasets import Dataset, load_dataset, load_metric


class Example(NamedTuple):
    token: str
    label: str

def load_ner_dataset(mode: Literal['train', 'val', 'test']):
    file = f"data/BC2GM_{mode}.tsv"
    examples = []
    with open(file) as f:
        token = []
        label = []
        for line in f:
            if line.strip() == "":
                examples.append(Example(token=token, label=label))
                token = []
                label = []
                continue
            t, l = line.strip().split("\t")
            token.append(t)
            label.append(l)

    res = list(zip(*[(ex.token, ex.label) for ex in examples]))
    d = {'token': res[0], 'labels': res[1]}
    return Dataset.from_dict(d)


def tokenize_and_align_labels(examples, tokenizer):
    tokenized_inputs = tokenizer(examples["token"], truncation=True, is_split_into_words=True)
    label_to_id = dict(map(reversed, enumerate(label_list)))

    labels = []
    for i, label in enumerate(examples["labels"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            # Special tokens have a word id that is None. We set the label to -100 so they are automatically
            # ignored in the loss function.
            if word_idx is None:
                label_ids.append(-100)
            # We set the label for the first token of each word.
            elif word_idx != previous_word_idx:
                label_ids.append(label_to_id[label[word_idx]])
            # For the other tokens in a word, we set the label to either the current label or -100, depending on
            # the label_all_tokens flag.
            else:
                label_ids.append(label_to_id[label[word_idx]])
            previous_word_idx = word_idx

        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs


def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=2)

    # Remove ignored index (special tokens)
    true_predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = metric.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }

In [None]:
%%capture
# Install `seqeval`
!pip install seqeval

In [None]:
label_list = ["O", "B", "I"]
metric = load_metric('seqeval')

train_dataset = load_ner_dataset('train')
val_dataset = load_ner_dataset('val')
test_dataset = load_ner_dataset('test')

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1753.0, style=ProgressStyle(description…




#### Instantiate NER models
Here we instantiate three task-specific NER models for comparison:
1. `da_model`: A domain-adapted NER model we just trained in this guide
2. `da_full_corpus_model`: The same domain-adapted NER model except that it was trained on the full in-domain training corpus
3. `oob_model`: An out-of-the-box BERT NER model (not domain-adapted)

In [None]:
from transformers import AutoModelForTokenClassification, DataCollatorForTokenClassification

best_checkpoint = './results/domain_pre_training/checkpoint-100'
da_model = AutoModelForTokenClassification.from_pretrained(best_checkpoint, num_labels=len(label_list))

da_full_corpus_model = AutoModelForTokenClassification.from_pretrained('./domain-adapted-bert', num_labels=len(label_list))
full_corpus_tokenizer = AutoTokenizer.from_pretrained('./domain-adapted-bert')

oob_tokenizer = AutoTokenizer.from_pretrained(model_card)
oob_model = AutoModelForTokenClassification.from_pretrained(model_card, num_labels=len(label_list))

Some weights of the model checkpoint at ./results/domain_pre_training/checkpoint-100 were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at ./results/do

#### Create datasets, TrainingArguments and Trainer for each model

In [None]:
from typing import Dict

from datasets import Dataset


def preprocess_datasets(tokenizer, **datasets) -> Dict[str, Dataset]:
    tokenize_ner = partial(tokenize_and_align_labels, tokenizer=tokenizer)
    return {k: ds.map(tokenize_ner, batched=True) for k, ds in datasets.items()}

######################
##### `da_model` #####
######################
da_datasets = preprocess_datasets(
    tokenizer,
    train=train_dataset,
    val=val_dataset,
    test=test_dataset
)

training_args = TrainingArguments(
    output_dir="./results/domain_adapted_fine_tuning",
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    evaluation_strategy="steps",
    save_steps=200,
    save_total_limit=2,
    logging_steps=100,
    seed=42,
    fp16=True,
    dataloader_num_workers=2,
    disable_tqdm=False
)

da_trainer = Trainer(
    model=da_model,
    args=training_args,
    train_dataset=da_datasets['train'],
    eval_dataset=da_datasets['val'],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,  # This tokenizer has new tokens
    compute_metrics=compute_metrics
)


##################################
##### `da_model_full_corpus` #####
##################################
da_full_corpus_datasets = preprocess_datasets(
    full_corpus_tokenizer,
    train=train_dataset,
    val=val_dataset,
    test=test_dataset
)

training_args = TrainingArguments(
    output_dir="./results/domain_adapted_full_corpus_fine_tuning",
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    evaluation_strategy="steps",
    save_steps=200,
    save_total_limit=2,
    logging_steps=100,
    seed=42,
    fp16=True,
    dataloader_num_workers=2,
    disable_tqdm=False
)

da_full_corpus_trainer = Trainer(
    model=da_full_corpus_model,
    args=training_args,
    train_dataset=da_full_corpus_datasets['train'],
    eval_dataset=da_full_corpus_datasets['val'],
    data_collator=DataCollatorForTokenClassification(full_corpus_tokenizer),
    tokenizer=full_corpus_tokenizer,  # This tokenizer has new tokens
    compute_metrics=compute_metrics
)


#######################
##### `oob_model` #####
#######################
oob_datasets = preprocess_datasets(
    oob_tokenizer,
    train=train_dataset,
    val=val_dataset,
    test=test_dataset
)

training_args = TrainingArguments(
    output_dir="./results/out_of_the_box_fine_tuning",
    overwrite_output_dir=True,
    num_train_epochs=2,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    evaluation_strategy="steps",
    save_steps=200,
    save_total_limit=2,
    logging_steps=100,
    seed=42,
    fp16=True,
    dataloader_num_workers=2,
    disable_tqdm=False
)

oob_model_trainer = Trainer(
    model=oob_model,
    args=training_args,
    train_dataset=oob_datasets['train'],
    eval_dataset=oob_datasets['val'],
    data_collator=DataCollatorForTokenClassification(oob_tokenizer),
    tokenizer=oob_tokenizer,  # This is the original tokenizer (without domain-specific tokens)
    compute_metrics=compute_metrics
)

HBox(children=(FloatProgress(value=0.0, max=13.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.





HBox(children=(FloatProgress(value=0.0, max=13.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=13.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6.0), HTML(value='')))




#### Train and evaluate `da_model`

In [None]:
da_trainer.train()
da_trainer.evaluate(da_datasets['test'])



Step,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
100,0.2567,0.156244,0.618155,0.588832,0.603138,0.940869,8.4314,298.765
200,0.147,0.131594,0.659094,0.731315,0.693329,0.950794,8.4631,297.646
300,0.133,0.120914,0.662715,0.782776,0.717759,0.953289,8.7201,288.871
400,0.1161,0.10887,0.715941,0.768073,0.741091,0.959788,8.5756,293.739
500,0.0764,0.113966,0.731982,0.784001,0.757099,0.960742,8.5635,294.157
600,0.0744,0.104922,0.73486,0.792228,0.762466,0.960941,8.6071,292.664
700,0.0737,0.104427,0.767751,0.768423,0.768087,0.963669,8.6568,290.986


{'epoch': 2.0,
 'eval_accuracy': 0.9627387229857107,
 'eval_f1': 0.7704200580800851,
 'eval_loss': 0.10370161384344101,
 'eval_precision': 0.7605588306549301,
 'eval_recall': 0.7805403613459307,
 'eval_runtime': 16.8883,
 'eval_samples_per_second': 298.313}

#### Train and evaluate `da_model_full_corpus`

In [None]:
da_full_corpus_trainer.train()
da_full_corpus_trainer.evaluate(da_full_corpus_datasets['test'])



Step,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
100,0.2319,0.127792,0.671435,0.829809,0.742268,0.952202,8.2469,305.45
200,0.1083,0.08628,0.817341,0.82669,0.821989,0.968876,8.0642,312.366
300,0.0896,0.08302,0.807372,0.838995,0.822879,0.969839,8.0143,314.313
400,0.0806,0.078229,0.801577,0.880763,0.839306,0.971885,8.1414,309.405
500,0.0508,0.075855,0.843227,0.864125,0.853548,0.973716,8.1725,308.23
600,0.0525,0.075362,0.845051,0.858232,0.851591,0.97355,8.0579,312.611
700,0.0474,0.073649,0.851391,0.864818,0.858052,0.974442,8.0294,313.722


{'epoch': 2.0,
 'eval_accuracy': 0.9735219505320274,
 'eval_f1': 0.8525919253132421,
 'eval_loss': 0.07559072971343994,
 'eval_precision': 0.8402066015656525,
 'eval_recall': 0.8653478513839249,
 'eval_runtime': 15.8564,
 'eval_samples_per_second': 317.726}

#### Train and evaluate `oob_model`

In [None]:
oob_model_trainer.train()
oob_model_trainer.evaluate(oob_datasets['test'])



Step,Training Loss,Validation Loss,Precision,Recall,F1,Accuracy,Runtime,Samples Per Second
100,0.2292,0.133785,0.678159,0.803118,0.735368,0.947964,8.6547,291.056
200,0.1352,0.109798,0.745311,0.825984,0.783576,0.957941,8.6607,290.855
300,0.1172,0.099117,0.782186,0.83712,0.808721,0.962326,8.6997,289.55
400,0.1013,0.095984,0.82721,0.82242,0.824808,0.965538,8.725,288.71
500,0.069,0.103978,0.788701,0.845731,0.816221,0.96144,8.6906,289.853
600,0.0641,0.092247,0.827396,0.848404,0.837768,0.967232,8.6712,290.501
700,0.0644,0.090411,0.829128,0.853749,0.841258,0.968306,8.8216,285.549


{'epoch': 2.0,
 'eval_accuracy': 0.9656225918051692,
 'eval_f1': 0.8301952580195259,
 'eval_loss': 0.09698742628097534,
 'eval_precision': 0.8164734929017214,
 'eval_recall': 0.8443861266756507,
 'eval_runtime': 17.0649,
 'eval_samples_per_second': 295.226}

#### Results
We see that out of the three models, `da_full_corpus_model` (which was domain-adapted on the entire in-domain training corpus) outperforms the `oob_model` by over 2% on the test F1 score. In fact, this `da_full_corpus_model` model is one of many domain-adapted models we trained that outperforms SOTA on BC2GM.

Also, `da_model` underperforms `oob_model`. This is to be expected, as `da_model` underwent minimal domain pre-training in this guide.

## Conclusion
In this guide, you have seen how to use `DataSelector` and `VocabAugmentor` to domain-adapt a transformers model, by performing Data Selection and Vocabulary Augmentation respectively.

You have also seen that they are compatible with all of HuggingFace products: `transformers`, `tokenizers` and `datasets`.

Finally, it is shown that a model domain-adapted on the full in-domain corpus performs better than an out-of-the-box model.