# Introduction

This notebook gives a quick example of using Transformers for NLP. It is meant to demystify these state-of-the-art models (the code below isn't _that_ different from other things you've seen earlier in the course).

It only scratches the surface. If you're interested, have a look at the [HuggingFace course](https://huggingface.co/course/chapter1/1) and the excellent documentation at https://huggingface.co/transformers for more.

<p style="text-align:center;" ><a href="https://huggingface.co/course/chapter1/1"><img src="https://huggingface.co/front/assets/course-logo.svg"></a></p>

# Setup

In [1]:
# This is a quick check of whether the notebook is currently running 
# on Google Colaboratory
if 'google.colab' in str(get_ipython()):
    print('The notebook is running on Colab. colab=True.')
    colab=True
else:
    print('The notebook is not running on Colab. colab=False.')
    colab=False

The notebook is not running on Colab. colab=False.


In [2]:
%matplotlib inline
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from pathlib import Path

In [7]:
if colab:
    !pip install -Uqq fastbook
    import fastbook
    from fastbook import *
    !pip install git+https://github.com/huggingface/transformers.git datasets
    from google.colab import drive
    drive.mount("/content/gdrive")
    DATA = Path("/content/gdrive/MyDrive/Colab Notebooks/elmed219-data")
    DATA.mkdir(exist_ok=True)
if not colab:
    DATA=Path('./data')
    DATA.mkdir(exist_ok=True)

In [8]:
from fastai.basics import *

In [9]:
import datasets

In [10]:
from datasets import load_dataset

In [11]:
# Verify that the transformers library is installed and operational
import transformers
print(transformers.pipeline('sentiment-analysis')('we love you'))

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


[{'label': 'POSITIVE', 'score': 0.9998704195022583}]


In [12]:
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification, 
                          PreTrainedModel, BertModel, BertForSequenceClassification,
                          TrainingArguments, Trainer)

from transformers.modeling_outputs import SequenceClassifierOutput

# MedWeb using Transformers

Load the data as before:

In [13]:
df = pd.read_csv('https://github.com/MMIV-ML/ELMED219-2022/raw/main/Lab2-NLP/data/medwebdata.csv')
df.head()

| | ID | Tweet | Influenza | Diarrhea | Hayfever | Cough | Headache | Fever | Runnynose | Cold | labels | is_test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1en | The cold makes my whole body weak. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | Cold | False |
| 1 | 2en | It's been a while since I've had allergy symptoms. | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | Hayfever;Runnynose | False |
| 2 | 3en | I'm so feverish and out of it because of my allergies. I'm so sleepy. | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | Hayfever;Fever;Runnynose | False |
| 3 | 4en | I took some medicine for my runny nose, but it won't stop. | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Runnynose | False |
| 4 | 5en | I had a bad case of diarrhea when I traveled to Nepal. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | sober | False |


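For convenience, we drop the `is_test` column and the original text `labels` column, and combine the eight symptom columns into a single multi-hot vector stored in a new `labels` column (the name the model's `forward` method expects for the targets):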

In [14]:
df.drop(['is_test','labels'], axis=1, inplace=True)

In [15]:
df['labels'] = df.apply(lambda x: [x[c] for c in df.columns[2:]], axis=1)

In [16]:
df.head()

| | ID | Tweet | Influenza | Diarrhea | Hayfever | Cough | Headache | Fever | Runnynose | Cold | labels |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1en | The cold makes my whole body weak. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | [0, 0, 0, 0, 0, 0, 0, 1] |
| 1 | 2en | It's been a while since I've had allergy symptoms. | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | [0, 0, 1, 0, 0, 0, 1, 0] |
| 2 | 3en | I'm so feverish and out of it because of my allergies. I'm so sleepy. | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | [0, 0, 1, 0, 0, 1, 1, 0] |
| 3 | 4en | I took some medicine for my runny nose, but it won't stop. | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | [0, 0, 0, 0, 0, 0, 1, 0] |
| 4 | 5en | I had a bad case of diarrhea when I traveled to Nepal. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | [0, 0, 0, 0, 0, 0, 0, 0] |
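The original `labels` column (dropped above) carries the same information as a semicolon-separated string such as `Hayfever;Runnynose`; row 4, where all eight columns are 0, has `sober` instead. As an illustration, here is a minimal pure-Python sketch that turns such a string into the 8-element multi-hot vector. It assumes the class order of the eight symptom columns, and `to_multi_hot` is a hypothetical helper, not part of the notebook:

```python
# Class order assumed to match the eight symptom columns of the data frame
CLASSES = ["Influenza", "Diarrhea", "Hayfever", "Cough",
           "Headache", "Fever", "Runnynose", "Cold"]

def to_multi_hot(label_str, classes=CLASSES):
    """Hypothetical helper: 'Hayfever;Runnynose' -> [0, 0, 1, 0, 0, 0, 1, 0].

    Any token that is not a class name (e.g. 'sober') contributes nothing,
    so a tweet without symptoms maps to the all-zero vector.
    """
    present = set(label_str.split(";"))
    return [1 if c in present else 0 for c in classes]

print(to_multi_hot("Hayfever;Fever;Runnynose"))  # [0, 0, 1, 0, 0, 1, 1, 0]
print(to_multi_hot("sober"))                     # [0, 0, 0, 0, 0, 0, 0, 0]
```

The first call reproduces the vector for row 2 in the table above.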


Next, set up the Transformer model. We'll use the [PubMedBERT model](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract), which Microsoft Research created by training a BERT model on 14 million PubMed abstracts.

Have a look at the blog post [Domain-specific language model pretraining for biomedical natural language processing](https://www.microsoft.com/en-us/research/blog/domain-specific-language-model-pretraining-for-biomedical-natural-language-processing/) and the accompanying paper. 

In [18]:
model_name = 'microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract'
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)

Some weights of the model checkpoint at microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSeque

We need to tokenize the tweets with the same tokenizer (and vocabulary) that PubMedBERT was pretrained with, and convert the data frame into a HuggingFace `Dataset`:

In [19]:
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
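Like other BERT models, PubMedBERT uses a WordPiece vocabulary: a word that is not in the vocabulary is split into sub-word pieces, and continuation pieces are marked with `##`. The `AutoTokenizer` handles this for us, along with lower-casing, punctuation splitting and adding `[CLS]`/`[SEP]`. The sketch below covers only the greedy longest-match-first splitting of a single word. It uses a tiny made-up vocabulary and is not the library's implementation:

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until it is in vocab
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # nothing matches: the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary, made up for illustration (not PubMedBERT's real vocabulary)
vocab = {"fever", "##ish", "run", "##ny", "nose"}
print(wordpiece("feverish", vocab))  # ['fever', '##ish']
print(wordpiece("runny", vocab))     # ['run', '##ny']
print(wordpiece("xyz", vocab))       # ['[UNK]']
```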

In [20]:
ds = Dataset.from_pandas(df, split='train').train_test_split()

In [21]:
ds

DatasetDict({
    train: Dataset({
        features: ['ID', 'Tweet', 'Influenza', 'Diarrhea', 'Hayfever', 'Cough', 'Headache', 'Fever', 'Runnynose', 'Cold', 'labels'],
        num_rows: 1920
    })
    test: Dataset({
        features: ['ID', 'Tweet', 'Influenza', 'Diarrhea', 'Hayfever', 'Cough', 'Headache', 'Fever', 'Runnynose', 'Cold', 'labels'],
        num_rows: 640
    })
})
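Called without arguments, `train_test_split()` holds out 25% of the rows, which gives the 1920/640 split of the 2560 tweets. It also shuffles without a fixed seed, so the split, and therefore the scores below, changes from run to run. `Dataset.train_test_split` accepts `test_size` and `seed` arguments to pin this down. The pure-Python sketch below shows a seeded 75/25 shuffle split. It reproduces the sizes but not the library's exact shuffling, and `seeded_split` is a hypothetical helper:

```python
import math
import random

def seeded_split(rows, test_size=0.25, seed=42):
    """Hypothetical helper: shuffle a copy of `rows` with a fixed seed, then split."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)          # same seed -> same order every run
    n_test = math.ceil(test_size * len(rows))  # round the test set size up
    return rows[n_test:], rows[:n_test]        # (train, test)

train, test = seeded_split(range(2560))
print(len(train), len(test))                   # 1920 640
```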

In [22]:
ds['train'][0]

{'ID': '351en',
 'Tweet': "I can't call in sick for a fever, so I'm taking medicine for it.",
 'Influenza': 0,
 'Diarrhea': 0,
 'Hayfever': 0,
 'Cough': 0,
 'Headache': 0,
 'Fever': 1,
 'Runnynose': 0,
 'Cold': 0,
 'labels': [0, 0, 0, 0, 0, 1, 0, 0]}

In [23]:
def tokenize_and_encode(examples):
    return tokenizer(examples["Tweet"], truncation=True)

In [24]:
cols = ds['train'].column_names
cols.remove('labels')
ds_enc = ds.map(tokenize_and_encode, batched=True, remove_columns=cols)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

In [25]:
ds_enc

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'input_ids', 'labels', 'token_type_ids'],
        num_rows: 1920
    })
    test: Dataset({
        features: ['attention_mask', 'input_ids', 'labels', 'token_type_ids'],
        num_rows: 640
    })
})
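`tokenize_and_encode` does not pad, so the encoded tweets have different lengths. Because we pass `tokenizer=tokenizer` to the `Trainer` below, it pads each batch on the fly with a `DataCollatorWithPadding`, up to the length of the longest example in that batch. The real collator also pads `token_type_ids` and returns tensors. Here is a pure-Python sketch of the idea. It assumes pad id 0 (typical for BERT vocabularies; check `tokenizer.pad_token_id`) and right-side padding, and `pad_batch` is a hypothetical helper:

```python
def pad_batch(features, pad_id=0):
    """Pad a list of {'input_ids', 'labels'} dicts to the longest example.

    Returns a dict of lists with input_ids, attention_mask (1 = real token,
    0 = padding) and labels, i.e. one rectangular batch.
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        batch["labels"].append(f["labels"])
    return batch

# Made-up token ids, only to show the shapes
batch = pad_batch([
    {"input_ids": [2, 1010, 1020, 3], "labels": [0, 0, 0, 0, 0, 1, 0, 0]},
    {"input_ids": [2, 1030, 3],       "labels": [0, 0, 0, 0, 0, 0, 0, 1]},
])
print(batch["input_ids"])       # [[2, 1010, 1020, 3], [2, 1030, 3, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

Padding per batch instead of to a global maximum keeps batches of short tweets small.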

In [26]:
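class BertForMultilabelSequenceClassification(BertForSequenceClassification):
    """Like BertForSequenceClassification, but each of the `num_labels` outputs is
    an independent yes/no decision (sigmoid + binary cross-entropy), so one tweet
    can have several labels at once."""
    def __init__(self, config):
        super().__init__(config)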

    def forward(self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None):
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict)

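        # outputs[1] is BERT's pooled [CLS] representation, one vector per text
        pooled_output = outputs[1]
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)  # shape (batch, num_labels)

        loss = None
        if labels is not None:
            # Multi-label loss: sigmoid + binary cross-entropy per label,
            # instead of the softmax cross-entropy used for single-label tasks
            loss_fct = torch.nn.BCEWithLogitsLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), 
                            labels.float().view(-1, self.num_labels))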

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions)
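The only real change from `BertForSequenceClassification` is the loss. A single-label classifier uses a softmax over the classes with cross-entropy. Here, each of the 8 logits instead goes through its own sigmoid and is scored with binary cross-entropy, averaged over all labels and examples. `torch.nn.BCEWithLogitsLoss` does this in a numerically stable way. Below is a NumPy sketch of the same computation with the default mean reduction, compared with the naive formula:

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits (sigmoid applied implicitly).

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1-y)*log(1 - sigmoid(x))].
    """
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    return np.mean(np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x))))

def bce_naive(logits, targets):
    """Direct formula, for comparison (can overflow for large |x|)."""
    p = 1 / (1 + np.exp(-np.asarray(logits, dtype=float)))
    y = np.asarray(targets, dtype=float)
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Two examples with 8 labels each; several labels may be 1 at once
logits = np.array([[2.0, -1.0, 0.5, -3.0, 0.0, 1.5, -0.5, -2.0],
                   [-1.0, 0.3, 2.5, -0.2, -4.0, 0.8, 1.2, -1.5]])
targets = np.array([[1, 0, 1, 0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 0, 1, 1, 0]])
print(np.isclose(bce_with_logits(logits, targets), bce_naive(logits, targets)))  # True
```

With all logits at 0 (every probability 0.5), the loss is log 2 ≈ 0.693 whatever the targets are.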

In [27]:
num_labels=8
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = BertForMultilabelSequenceClassification.from_pretrained(model_name, num_labels=num_labels).to(device)

Some weights of the model checkpoint at microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract were not used when initializing BertForMultilabelSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model)

We define some metrics to use when scoring on the test data:

In [28]:
from sklearn.metrics import f1_score
def accuracy_thresh(y_pred, y_true, thresh=0.5, sigmoid=True): 
    y_pred = torch.from_numpy(y_pred)
    y_true = torch.from_numpy(y_true)
    if sigmoid: 
        y_pred = y_pred.sigmoid()
    return ((y_pred>thresh)==y_true.bool()).float().mean().item()

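def f1score_thresh(y_pred, y_true, average='micro', thresh=0.5, sigmoid=True): 
    y_pred = torch.from_numpy(y_pred)
    y_true = torch.from_numpy(y_true)
    if sigmoid: 
        y_pred = y_pred.sigmoid()
    # Pass `average` on to scikit-learn. The logged results below were produced
    # with average='micro' hard-coded here, which is why their micro and macro
    # F1 columns are identical.
    return f1_score(y_true.numpy(), (y_pred > thresh).numpy(), average=average)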

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    return {'accuracy_thresh': accuracy_thresh(predictions, labels),
           'f1score_micro_thresh': f1score_thresh(predictions, labels, average='micro'),
           'f1score_macro_thresh': f1score_thresh(predictions, labels, average='macro')}
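`accuracy_thresh` scores every (example, label) decision separately. With 8 mostly-zero labels it can therefore be high even when many positive labels are missed. F1 focuses on the positive labels. Micro-F1 pools true positives, false positives and false negatives over all labels before computing F1. Macro-F1 computes F1 per label and averages, so rare labels weigh as much as common ones. The NumPy sketch below uses made-up probabilities (sigmoid already applied, threshold 0.5). It is written from the standard definitions rather than copied from scikit-learn, and treating an undefined F1 as 0 is an assumption of the sketch:

```python
import numpy as np

def accuracy_thresh_np(probs, y_true, thresh=0.5):
    """Fraction of individual (example, label) decisions that are correct."""
    return float(np.mean((probs > thresh) == y_true.astype(bool)))

def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); treated as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

def f1_thresh_np(probs, y_true, average="micro", thresh=0.5):
    pred = probs > thresh
    true = y_true.astype(bool)
    tp = np.sum(pred & true, axis=0)   # per-label counts (column sums)
    fp = np.sum(pred & ~true, axis=0)
    fn = np.sum(~pred & true, axis=0)
    if average == "micro":             # pool the counts over all labels first
        return float(f1_from_counts(tp.sum(), fp.sum(), fn.sum()))
    return float(np.mean(f1_from_counts(tp, fp, fn)))  # macro: mean per-label F1

y_true = np.array([[1, 0, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0]])
probs = np.array([[0.9, 0.2, 0.1], [0.8, 0.1, 0.3], [0.7, 0.4, 0.2], [0.3, 0.6, 0.1]])
print(accuracy_thresh_np(probs, y_true))     # 11 of 12 decisions correct
print(f1_thresh_np(probs, y_true, "micro"))  # 8/9
print(f1_thresh_np(probs, y_true, "macro"))  # (1 + 1 + 0) / 3
```

The single missed positive in the third label costs micro-F1 about 0.11 but macro-F1 a full third.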

...and then the training setup:

In [29]:
batch_size = 8

args = TrainingArguments(
    output_dir=".",
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=6,
    weight_decay=0.01
)

In [30]:
trainer = Trainer(
    model,
    args,
    train_dataset=ds_enc["train"],
    eval_dataset=ds_enc["test"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer)

Let's see how the model does on the MedWeb test split before any fine-tuning (the classification head is still randomly initialized):

In [31]:
trainer.evaluate()

    There is an imbalance between your GPUs. You may want to exclude GPU 1 which
    has less than 75% of the memory or cores of GPU 0. You can do so by setting
    the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
    environment variable.
***** Running Evaluation *****
  Num examples = 640
  Batch size = 16


{'eval_loss': 0.788185179233551,
 'eval_accuracy_thresh': 0.28535157442092896,
 'eval_f1score_micro_thresh': 0.20231087856987137,
 'eval_f1score_macro_thresh': 0.20231087856987137,
 'eval_runtime': 8.4103,
 'eval_samples_per_second': 76.098,
 'eval_steps_per_second': 4.756}

Then fine-tune it:

In [32]:
trainer.train()

***** Running training *****
  Num examples = 1920
  Num Epochs = 6
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 720


| Epoch | Training Loss | Validation Loss | Accuracy Thresh | F1score Micro Thresh | F1score Macro Thresh |
|---|---|---|---|---|---|
| 1 | No log | 0.190666 | 0.948438 | 0.775891 | 0.775891 |
| 2 | No log | 0.133730 | 0.966602 | 0.870159 | 0.870159 |
| 3 | No log | 0.110169 | 0.971680 | 0.883534 | 0.883534 |
| 4 | No log | 0.098374 | 0.972266 | 0.887658 | 0.887658 |
| 5 | 0.171800 | 0.097016 | 0.972266 | 0.886218 | 0.886218 |
| 6 | 0.171800 | 0.095269 | 0.972852 | 0.889066 | 0.889066 |


***** Running Evaluation *****
  Num examples = 640
  Batch size = 16

TrainOutput(global_step=720, training_loss=0.1411350131034851, metrics={'train_runtime': 277.6708, 'train_samples_per_second': 41.488, 'train_steps_per_second': 2.593, 'total_flos': 231566647346688.0, 'train_loss': 0.1411350131034851, 'epoch': 6.0})
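The numbers in the training log follow from the arguments. The run used two GPUs (note the `DataParallel` warning), so each optimization step processes 2 × 8 = 16 examples, and evaluation likewise runs with batch size 16. With 1920 training tweets and no gradient accumulation, that gives 120 steps per epoch and 720 steps over 6 epochs. The `Trainer` logs the training loss every `logging_steps` steps. The notebook does not set this, so the default of 500 applies, which is why the training loss column reads `No log` until epoch 5. The arithmetic:

```python
import math

n_train, per_device_batch, n_gpus, epochs = 1920, 8, 2, 6
logging_steps = 500  # TrainingArguments default, not set in the notebook

total_batch = per_device_batch * n_gpus             # 16 examples per step
steps_per_epoch = math.ceil(n_train / total_batch)  # 120
total_steps = steps_per_epoch * epochs              # 720

# Epoch (1-based) in which the first training-loss log appears
first_log_epoch = math.ceil(logging_steps / steps_per_epoch)  # 5
print(total_batch, steps_per_epoch, total_steps, first_log_epoch)  # 16 120 720 5
```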

# Another example: The Genetic Association Database (GAD)

For this example we'll use GAD, which contains gene-disease relations, in the version prepared by the BioBERT authors: https://github.com/dmis-lab/biobert. See [Becker, Kevin G., et al. "The genetic association database." Nature genetics 36.5 (2004): 431-432.](https://geneticassociationdb.nih.gov/gad.pdf) for details about the database.

Let's try to fine-tune a model to perform a new task. 

# Get a pretrained model and tokenizer

In [7]:
model_name = 'microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract'
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)

Some weights of the model checkpoint at microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSeque

In [8]:
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

In [9]:
tokenizer("""Our findings indicate that the @GENE$-112G/A polymorphism 
          does not play a substantial role in genetic predisposition to @DISEASE$ in this Japanese population.""")

{'input_ids': [2, 2280, 2606, 3275, 1760, 1680, 35, 2397, 7, 16, 12851, 1013, 18, 42, 6343, 4042, 1888, 3568, 42, 5945, 2467, 1682, 3056, 15831, 1701, 35, 2174, 7, 1682, 1805, 7910, 2806, 17, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

# Genetic Association Database (GAD)

## Download data

In [13]:
url = 'https://www.dropbox.com/s/s91q5kp6ausq9cj/REdata.zip?dl=1'

In [14]:
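import os, shutil, urllib.request  # may already be available via `from fastai.basics import *`

# Download and unpack the BioBERT relation-extraction data only once
if not os.path.isfile(DATA/'REdata.zip'):
    urllib.request.urlretrieve(url, DATA/'REdata.zip')
    shutil.unpack_archive(DATA/'REdata.zip', extract_dir=DATA)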

In [15]:
GAD = DATA/'GAD'

In [16]:
GAD.ls()

(#11) [Path('/home/alex/data-tmp/huggingface/GAD/6'),Path('/home/alex/data-tmp/huggingface/GAD/5'),Path('/home/alex/data-tmp/huggingface/GAD/models'),Path('/home/alex/data-tmp/huggingface/GAD/9'),Path('/home/alex/data-tmp/huggingface/GAD/2'),Path('/home/alex/data-tmp/huggingface/GAD/10'),Path('/home/alex/data-tmp/huggingface/GAD/4'),Path('/home/alex/data-tmp/huggingface/GAD/3'),Path('/home/alex/data-tmp/huggingface/GAD/7'),Path('/home/alex/data-tmp/huggingface/GAD/8')...]

## Explore data

In [17]:
pd.set_option('display.max_colwidth', 1000)
pd.set_option('display.html.use_mathjax', False)

In [18]:
example_fn = GAD/'5'/'train.tsv'
df = pd.read_csv(example_fn, sep='\t', header=None, names=['text', 'interaction'])

In [19]:
df.head()

| | text | interaction |
|---|---|---|
| 0 | An interaction with hypertension in the association between the @GENE\$ G460W polymorphism and @DISEASE\$ merits further testing in additional populations. | 1 |
| 1 | Our study suggests that the five SNPs within @GENE\$ gene we studied may not play a major role in the @DISEASE\$ susceptibility in the Chinese Han population. | 1 |
| 2 | Our findings suggest that the @GENE\$ polymorphism is not associated with an increased risk of squamous cell @DISEASE\$ in Korean women. | 1 |
| 3 | Our findings indicate that the @GENE\$-112G/A polymorphism does not play a substantial role in genetic predisposition to @DISEASE\$ in this Japanese population. | 0 |
| 4 | Although an increasing number of studies report an association between the @GENE\$ G1385A variant and @DISEASE\$ risk; this variant does not appear to be implicated in the development of breast cancer. | 0 |


In [20]:
df.interaction.value_counts()

1    2521
0    2276
Name: interaction, dtype: int64
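The two classes in this fold are roughly balanced. A "model" that always predicts the majority class `1` would be right about 53% of the time, and any fine-tuned model should beat that baseline:

```python
counts = {1: 2521, 0: 2276}  # from value_counts() above
total = sum(counts.values())  # 4797 sentences in GAD/5/train.tsv
majority_acc = max(counts.values()) / total
print(total, round(majority_acc, 4))  # 4797 0.5255
```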

## Prepare data

In [21]:
def read_gad_split(df):
    # Return the sentences and their 0/1 interaction labels as two plain lists
    return df['text'].tolist(), df['interaction'].tolist()

In [22]:
train_texts, train_labels = read_gad_split(df)

In [23]:
train_texts[:5]

['An interaction with hypertension in the association between the @GENE$ G460W polymorphism and @DISEASE$ merits further testing in additional populations.',
 'Our study suggests that the five SNPs within @GENE$ gene we studied may not play a major role in the @DISEASE$ susceptibility in the Chinese Han population.',
 'Our findings suggest that the @GENE$ polymorphism is not associated with an increased risk of squamous cell @DISEASE$ in Korean women.',
 'Our findings indicate that the @GENE$-112G/A polymorphism does not play a substantial role in genetic predisposition to @DISEASE$ in this Japanese population.',
 'Although an increasing number of studies report an association between the @GENE$ G1385A variant and @DISEASE$ risk; this variant does not appear to be implicated in the development of breast cancer.']

In [24]:
train_labels[:5]

[1, 1, 1, 0, 0]

In [25]:
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)

In [26]:
train_encodings = tokenizer(train_texts, padding=True)

In [27]:
val_encodings = tokenizer(val_texts, padding=True)

### Create dataset

In [28]:
import torch

In [29]:
class GADDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels
        
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item
    
    def __len__(self):
        return len(self.labels)
    

In [30]:
train_dataset = GADDataset(train_encodings, train_labels)
val_dataset = GADDataset(val_encodings, val_labels)

## Use the pre-trained transformer model

We'll fine-tune the pre-trained model to classify whether the text indicates an interaction or not. 

In [31]:
MODELS_GDB = DATA/'GAD'/'models'
MODELS_GDB.mkdir(exist_ok=True, parents=True)

In [32]:
from transformers import Trainer, TrainingArguments

In [33]:
training_args = TrainingArguments(
    output_dir=str(MODELS_GDB/'results'),
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir=str(MODELS_GDB/'logs'),
    logging_steps=10
    )
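`warmup_steps=500` deserves a second look. By default the `Trainer` uses a linear schedule. The learning rate (default `5e-5`, since `learning_rate` is not set here) rises linearly from 0 over the warm-up steps and then decays linearly to 0 at the last step. With 4797 sentences, a 20% validation split and the two-GPU setup seen in the logs (2 × 16 = 32 examples per step), one epoch is about 120 steps, which matches the 120 steps reported in the training log below. Even 3 epochs (360 steps) would end before the warm-up finishes, so the learning rate never reaches its peak. A smaller `warmup_steps`, or `warmup_ratio`, avoids this. The sketch below is written from that description of the schedule, and `linear_warmup_lr` is a hypothetical helper:

```python
import math

def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at `step` for linear warm-up followed by linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

n_train = 4797 - math.ceil(0.2 * 4797)     # 3837 after the 80/20 split
steps_per_epoch = math.ceil(n_train / 32)  # 120 (16 per device x 2 GPUs)
total = 3 * steps_per_epoch                # 360 for num_train_epochs=3

print(linear_warmup_lr(total, 5e-5, 500, total))  # about 3.6e-05: still warming up at the end
print(linear_warmup_lr(total, 5e-5, 36, total))   # 0.0 with a 10% warm-up instead
```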

In [34]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

In [35]:
trainer = Trainer(
    model=model,                         
    args=training_args,                  
    train_dataset=train_dataset,         
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)

In [36]:
trainer.train()



| Step | Training Loss |
|---|---|
| 10 | 0.7198 |
| 20 | 0.7180 |
| 30 | 0.6998 |
| 40 | 0.6982 |
| 50 | 0.6966 |
| 60 | 0.6661 |
| 70 | 0.6693 |
| 80 | 0.6666 |
| 90 | 0.6284 |
| 100 | 0.6207 |


TrainOutput(global_step=120, training_loss=0.6644821008046468, metrics={'train_runtime': 52.5067, 'train_samples_per_second': 2.285, 'total_flos': 327669619825080.0, 'epoch': 1.0})

In [37]:
pred = trainer.evaluate(eval_dataset=val_dataset)

In [38]:
pred

{'eval_loss': 0.5201655030250549,
 'eval_accuracy': 0.7479166666666667,
 'eval_f1': 0.7734082397003745,
 'eval_precision': 0.722027972027972,
 'eval_recall': 0.8326612903225806,
 'eval_runtime': 2.1733,
 'eval_samples_per_second': 441.732,
 'epoch': 1.0}
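The validation set has 960 sentences (20% of 4797, rounded up). The reported precision, recall and accuracy then correspond to 413 true positives, 159 false positives, 83 false negatives and 305 true negatives. These counts are our reconstruction from the reported ratios (413/572, 413/496 and 718/960); the notebook does not print them. The check below also shows that F1 is the harmonic mean of precision and recall, and that recall is higher than precision, meaning the model over-predicts class `1`:

```python
tp, fp, fn, tn = 413, 159, 83, 305  # reconstructed from the reported ratios
n = tp + fp + fn + tn               # 960 validation sentences

precision = tp / (tp + fp)          # 413/572
recall = tp / (tp + fn)             # 413/496
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / n

for name, ours, reported in [("precision", precision, 0.722027972027972),
                             ("recall", recall, 0.8326612903225806),
                             ("f1", f1, 0.7734082397003745),
                             ("accuracy", accuracy, 0.7479166666666667)]:
    print(f"{name:9s} {ours:.6f} (reported {reported:.6f})")
```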

# Possible next steps

You could also try the model on other tasks from the benchmark suite set up by the authors of PubMedBERT, [BLURB: Biomedical Language Understanding and Reasoning Benchmark](https://microsoft.github.io/BLURB/), or on any similar task you can think of.