# BioMed: Information Retrieval - BioMedical Information Retrieval System

---

**Group:**
- Reyes Castro, Didier Yamil (didier.reyes.castro@alumnos.upm.es)
- Rodriguez Fernández, Cristina ()

**Course:** BioMedical Informatics - 2025/26

**Institution:** Polytechnic University of Madrid (UPM)

**Date:** November 2025

---

## Goal

To develop an Information Retrieval system (specifically, a **binary text classifier**) that identifies scientific articles in the PubMed database related to a given set of abstracts within a defined research topic. Here, the topic is defined by a collection of 1,308 manuscripts reporting the polyphenol composition of various foods.

## Setup and Installation

In [None]:
# !pip install scikit-learn pandas requests transformers torch datasets numpy

In [1]:
import requests
import time

import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, EarlyStoppingCallback
from datasets import Dataset
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix, classification_report



## **Task 1:** 

Retrieve from PubMed the abstract associated with each publication in publications.xlsx (loaded below from a CSV copy, `publications.csv`).

Runtime: about 21 minutes with an NCBI API key.

In [2]:
dataset = pd.read_csv('publications.csv')
dataset.head()

Unnamed: 0,id,authors,year_of_publication,title,abbreviation,journal_name,journal_volume,journal_issue,pages,created_at,updated_at
0,1216,"Aaby K., Wrolstad R.E., Ekeberg D., Skrede G.",2007,Polyphenol composition and antioxidant activit...,AABY 2007,Journal of Agricultural and Food Chemistry,55,13.0,5156-5166,2012-12-01 22:21:08 UTC,2015-04-14 04:25:30 UTC
1,1052,"Abd El Mohsen M.M., Kuhnle G., Rechner A.R., S...",2002,Uptake and metabolism of epicatechin and its a...,ABD EL MOHSEN 2002,Free Radic Biol Med,33,12.0,1693-702,2015-04-13 21:45:29 UTC,2015-04-14 04:25:30 UTC
2,356,"Abdel-Aal E.-S.M., Hucl P.",2003,Composition and stability of anthocyanins in b...,ABDEL-AAL 2003,Journal of Agricultural and Food Chemistry,51,,2174-2180,2015-04-13 21:45:25 UTC,2015-04-14 04:25:30 UTC
3,458,"Abdel-Aal E.-S. M., Young C., Rabalski I.",2006,"Anthocyanin composition in black, blue, pink, ...",ABDEL-AAL 2006,Journal of Agricultural and Food Chemistry,54,,4696-4704,2006-04-09 12:07:36 UTC,2015-04-14 04:25:31 UTC
4,332,"Abril M., Negueruela A.I., Perez C., Juan T., ...",2005,Preliminary study of resveratrol content in Ar...,Apr-05,Food Chemistry,92,4.0,729-736,2015-04-13 21:45:25 UTC,2015-04-13 21:45:25 UTC


In [3]:
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
ESEARCH_URL = BASE_URL + "esearch.fcgi"
FETCH_URL = BASE_URL + "efetch.fcgi"

# Step 1: Search for the PMID of the article by title
def search_pmid_by_title(title, api_key=None):
    """Return the first PMID whose title matches `title`, or None if there is no match."""
    params = {
        "db": "pubmed",
        "term": f"{title}[Title]",
        "retmode": "json",
        "api_key": api_key  # requests omits parameters whose value is None
    }

    try:
        response = requests.get(ESEARCH_URL, params=params, timeout=30)
        response.raise_for_status()
        # .get() avoids a KeyError when the response lacks 'esearchresult'
        id_list = response.json().get('esearchresult', {}).get('idlist', [])

        if id_list:
            return id_list[0]

        print(f"Found 0 PMIDs for title: {title}. Skipping...")
        return None

    except requests.exceptions.RequestException as e:
        print(f"Error during request for title '{title}': {e}")
        return None

# Step 2: Fetch article abstract by PMID
def fetch_abstract_by_pmid(pmid, api_key=None):
    """Return the plain-text abstract record for `pmid`, or None if the request fails."""
    params = {
        "db": "pubmed",
        "id": pmid,
        "retmode": "text",
        "rettype": "abstract",
        "api_key": api_key
    }

    try:
        response = requests.get(FETCH_URL, params=params, timeout=30)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching abstract for PMID '{pmid}': {e}")
        return None
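
`fetch_abstract_by_pmid` returns the plain-text record, which begins with a citation line (journal, date, DOI) rather than with the abstract itself. If only the abstract body is wanted, one option is to request `retmode="xml"` and read the `AbstractText` elements. The sketch below is a hypothetical helper (`extract_abstract_text` is not part of the notebook) and is shown on a hand-written XML fragment, not on a real EFetch response.

```python
import xml.etree.ElementTree as ET

def extract_abstract_text(xml_string):
    """Join the text of all <AbstractText> elements found in a PubMed XML record."""
    root = ET.fromstring(xml_string.strip())
    parts = []
    for node in root.iter("AbstractText"):
        # itertext() also collects text nested in inline tags such as <i> or <sub>
        text = "".join(node.itertext()).strip()
        label = node.get("Label")  # structured abstracts label their sections
        parts.append(f"{label}: {text}" if label else text)
    return " ".join(parts) if parts else None

# Hand-written fragment that mimics the structure of an EFetch XML record
sample = """
<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>
  <ArticleTitle>Example title</ArticleTitle>
  <Abstract>
    <AbstractText Label="BACKGROUND">Polyphenols occur in <i>Fragaria</i> fruit.</AbstractText>
    <AbstractText Label="RESULTS">Anthocyanins were the main class.</AbstractText>
  </Abstract>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>
"""
print(extract_abstract_text(sample))
```

Feeding the classifier only this text would also keep journal names and publication dates out of the model input.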

In [5]:
import os

PUBLICATIONS_WITH_ABSTRACTS = 'publications_abstract.csv'

# Read the NCBI API key from an environment variable instead of hard-coding it
# (the variable name NCBI_API_KEY is a choice made here)
NCBI_API_KEY = os.environ.get("NCBI_API_KEY")

# Process each article in the dataset
relevant_abstracts = []
for i, article in dataset.iterrows():

    article_info = {
        'id': article['id'],
        'pmid': None,
        'title': article['title'],
        'abstract': None
    }

    pmid = search_pmid_by_title(article['title'], api_key=NCBI_API_KEY)
    
    if pmid:
        article_info['pmid'] = pmid
        article_info['abstract'] = fetch_abstract_by_pmid(pmid, api_key=NCBI_API_KEY)

    relevant_abstracts.append(article_info)

    # With an API key NCBI allows 10 requests per second (3 per second without one),
    # so use a longer delay when NCBI_API_KEY is not set
    print("Sleeping for 0.1...")
    time.sleep(0.1)

# Add relevant_abstracts to a new dataset
relevant_df = pd.DataFrame(relevant_abstracts)

# Save the updated dataset
relevant_df.to_csv(PUBLICATIONS_WITH_ABSTRACTS, index=False)

Sleeping for 0.1...
Found 0 PMIDs for title: Uptake and metabolism of epicatechin and its access to the brain after oral ingestion. Skipping...
Sleeping for 0.1...
Sleeping for 0.1...
Sleeping for 0.1...
Found 0 PMIDs for title: Preliminary study of resveratrol content in Aragon red and rose wines. Skipping...
Sleeping for 0.1...
Sleeping for 0.1...
Found 0 PMIDs for title: Enhancement of total phenolics and antioxidant properties of some tropical green leafy vegetables by steam cooking. Skipping...
Sleeping for 0.1...
Sleeping for 0.1...
Sleeping for 0.1...
Sleeping for 0.1...
Found 0 PMIDs for title: Correlation of tocopherol, tocotrienol, gamma-oryzanol and total polyphenol content in rice bran with different antioxidant capacity assays. Skipping...
Sleeping for 0.1...
Sleeping for 0.1...
Sleeping for 0.1...
Found 0 PMIDs for title: Functional attributes of soybean seeds and products, with reference to isoflavone content and antioxidant activity. Skipping...
Sleeping for 0.1...
[output truncated]
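
The fixed `time.sleep(0.1)` adds a delay on top of the request latency and is the same whether or not a key is available. A small throttle can instead enforce a minimum spacing between calls. This is a sketch only: `Throttle` and `make_throttle` are hypothetical helpers, the limits of 3 and 10 requests per second are the E-utilities limits without and with an API key, and the clock and sleep functions are injectable so the class can be tried without waiting.

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now

def make_throttle(api_key=None, **kwargs):
    # 10 requests/s with an API key, 3 requests/s without one
    return Throttle(10 if api_key else 3, **kwargs)

# Usage: call throttle.wait() immediately before every requests.get(...)
throttle = make_throttle(api_key=None)
throttle.wait()
throttle.wait()  # blocks for roughly a third of a second
```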

In [6]:
print("Total number of relevant articles:", len(relevant_df))
print("Number of relevant articles with NO abstracts:", relevant_df['abstract'].isnull().sum())
relevant_df

Total number of relevant articles: 1308
Number of relevant articles with NO abstracts: 645


Unnamed: 0,id,pmid,title,abstract
0,1216,17550269,Polyphenol composition and antioxidant activit...,1. J Agric Food Chem. 2007 Jun 27;55(13):5156-...
1,1052,,Uptake and metabolism of epicatechin and its a...,
2,356,12670152,Composition and stability of anthocyanins in b...,1. J Agric Food Chem. 2003 Apr 9;51(8):2174-80...
3,458,16787017,"Anthocyanin composition in black, blue, pink, ...",1. J Agric Food Chem. 2006 Jun 28;54(13):4696-...
4,332,,Preliminary study of resveratrol content in Ar...,
...,...,...,...,...
1303,816,,Bioactive compounds in the cereal grains befor...,
1304,497,16917808,Antioxidants in thermally treated buckwheat gr...,1. Mol Nutr Food Res. 2006 Sep;50(9):824-32. d...
1305,743,,Effects of growing site and nitrogen fertiliza...,
1306,203,12059161,"Separation, characterization and quantitation ...",1. J Agric Food Chem. 2002 Jun 19;50(13):3789-...
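
Almost half of the titles (645 of 1,308) returned no PMID, leaving 663 abstracts. One plausible cause, not verified here, is that an exact `[Title]` query is sensitive to punctuation and formatting differences between the spreadsheet and PubMed. The sketch below shows a hypothetical `normalize_title` helper that could be applied before building the search term. Which characters to strip is an assumption and would need checking against the titles that failed; the title in the example is made up.

```python
import re

def normalize_title(title):
    """Simplify a title before using it in an ESearch `[Title]` query."""
    # Replace characters that PubMed may interpret as query syntax with spaces
    cleaned = re.sub(r'[\[\]\(\)":\*]', " ", title)
    # Collapse repeated whitespace and drop a trailing full stop
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned.rstrip(".").strip()

def build_title_term(title):
    """Return the ESearch term for a normalized title."""
    return f"{normalize_title(title)}[Title]"

# Made-up title, for illustration only
print(build_title_term("Phenolic profile of apples: a (made-up) example title.  "))
```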


In [7]:
# Getting rid of articles without abstracts
relevant_df = relevant_df[relevant_df['abstract'].notnull()].reset_index(drop=True)
relevant_df

Unnamed: 0,id,pmid,title,abstract
0,1216,17550269,Polyphenol composition and antioxidant activit...,1. J Agric Food Chem. 2007 Jun 27;55(13):5156-...
1,356,12670152,Composition and stability of anthocyanins in b...,1. J Agric Food Chem. 2003 Apr 9;51(8):2174-80...
2,458,16787017,"Anthocyanin composition in black, blue, pink, ...",1. J Agric Food Chem. 2006 Jun 28;54(13):4696-...
3,89,10552788,HPLC method for the quantification of procyani...,1. J Agric Food Chem. 1999 Oct;47(10):4184-8. ...
4,1122,1659780,Urinary excretion of lignans and isoflavonoid ...,1. Am J Clin Nutr. 1991 Dec;54(6):1093-100. do...
...,...,...,...,...
658,671,34071300,Antioxidant activity and phenolic compounds in...,1. Foods. 2021 May 28;10(6):1227. doi: 10.3390...
659,212,12517117,Oxygen radical absorbing capacity of phenolics...,1. J Agric Food Chem. 2003 Jan 15;51(2):502-9....
660,392,10888490,Antioxidant activity and total phenolics in se...,1. J Agric Food Chem. 2000 Jun;48(6):2008-16. ...
661,497,16917808,Antioxidants in thermally treated buckwheat gr...,1. Mol Nutr Food Res. 2006 Sep;50(9):824-32. d...


## **Task 2:**

Use the NCBI E-utilities to search for articles whose content is not relevant to this task. The resulting dataset should be the same size as the set of relevant documents.

In [8]:
def get_articles_pmids_for_title(title, count, api_key=None):
    """Return up to `count` PMIDs of articles whose title matches `title`."""
    params = {
        "db": "pubmed",
        "term": f"{title}[Title]",
        "retmode": "json",
        "retmax": count,
        "api_key": api_key
    }

    try:
        response = requests.get(ESEARCH_URL, params=params, timeout=30)
        response.raise_for_status()
        # .get() avoids a KeyError when the response lacks 'esearchresult'
        id_list = response.json().get('esearchresult', {}).get('idlist', [])

        if not id_list:
            print("Found 0 irrelevant articles.")
        return id_list

    except requests.exceptions.RequestException as e:
        print(f"Error during request for irrelevant articles: {e}")
        return []


In [9]:
import os

IRRELEVANT_PUBLICATIONS = 'irrelevant_publications.csv'

# Read the NCBI API key from an environment variable instead of hard-coding it
# (the variable name NCBI_API_KEY is a choice made here)
NCBI_API_KEY = os.environ.get("NCBI_API_KEY")

irrelevant_pmids_list = get_articles_pmids_for_title("cancer", len(relevant_df), api_key=NCBI_API_KEY)

irrelevant_abstracts = []
for pmid in irrelevant_pmids_list:

    article_info = {
        'pmid': pmid,
        'abstract': None
    }

    article_info['abstract'] = fetch_abstract_by_pmid(pmid, api_key=NCBI_API_KEY)
    irrelevant_abstracts.append(article_info)

    # With an API key NCBI allows 10 requests per second (3 per second without one),
    # so use a longer delay when NCBI_API_KEY is not set
    print("Sleeping for 0.1...")
    time.sleep(0.1)

# Save irrelevant abstracts to a new dataset
irrelevant_df = pd.DataFrame(irrelevant_abstracts)
irrelevant_df.to_csv(IRRELEVANT_PUBLICATIONS, index=False)

Sleeping for 0.1...
[identical lines omitted]


In [10]:
irrelevant_df

Unnamed: 0,pmid,abstract
0,41174974,1. Integr Cancer Ther. 2025 Jan-Dec;24:1534735...
1,41174918,1. Cancer Med. 2025 Nov;14(21):e71256. doi: 10...
2,41174903,1. IUBMB Life. 2025 Nov;77(11):e70074. doi: 10...
3,41174888,1. Biofactors. 2025 Nov-Dec;51(6):e70050. doi:...
4,41174886,1. Aliment Pharmacol Ther. 2025 Oct 31. doi: 1...
...,...,...
658,41162668,1. Arch Gynecol Obstet. 2025 Oct 29. doi: 10.1...
659,41162660,1. Int J Colorectal Dis. 2025 Oct 30;40(1):225...
660,41162642,1. Nat Immunol. 2025 Oct 29. doi: 10.1038/s415...
661,41162641,1. Sci Rep. 2025 Oct 29;15(1):37761. doi: 10.1...
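
Nothing in the `cancer[Title]` query prevents an article about polyphenols, or one already in the relevant set, from landing among the negatives. The sketch below shows two possible safeguards: a search term with a `NOT` clause, and a filter that drops candidate PMIDs already labelled relevant. Both helper names are hypothetical, and the `polyphenol*[Title/Abstract]` exclusion is an assumption about what should count as on-topic.

```python
def build_negative_term(topic, exclude):
    """Compose an ESearch term that matches `topic` in the title but not `exclude`."""
    return f"{topic}[Title] NOT {exclude}[Title/Abstract]"

def select_negative_pmids(candidate_pmids, relevant_pmids, n):
    """Keep the first `n` unique candidates that are not known relevant PMIDs."""
    relevant = {str(p) for p in relevant_pmids}
    selected, seen = [], set()
    for pmid in map(str, candidate_pmids):
        if pmid in relevant or pmid in seen:
            continue
        seen.add(pmid)
        selected.append(pmid)
        if len(selected) == n:
            break
    return selected

print(build_negative_term("cancer", "polyphenol*"))
# Candidates include one duplicate and one PMID from the relevant set
print(select_negative_pmids(["41174974", "17550269", "41174918", "41174974"],
                            relevant_pmids=[17550269], n=2))
```

Requesting somewhat more candidates than needed (a larger `retmax`) leaves room for the filter to discard overlaps while still reaching the target size.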


## **Task 4:**

Implement the chosen retrieval system using the programming language of their choice. If the information retrieval system is based on machine learning techniques, the student must split the existing datasets (relevant and non-relevant documents) into three distinct groups (training, validation, and testing) to carry out the model training.

**CHOSEN RETRIEVAL SYSTEM:** BioBERT-based Binary Text Classifier

In [11]:
# Adding target variable 'relevance' 
relevant_df['relevance'] = 1
irrelevant_df['relevance'] = 0

# Combining relevant and irrelevant datasets and maintaining only abstract and relevance columns
features = ['abstract', 'relevance']
combined_df = pd.concat([relevant_df[features], irrelevant_df[features]], ignore_index=True)

# Remove any rows where the abstract is missing (e.g., API fetch failed)
combined_df.dropna(subset=['abstract'], inplace=True)
combined_df.reset_index(drop=True, inplace=True)

# Saving
combined_df.to_csv('combined_publications.csv', index=False)

print("Class distribution:")
print(combined_df['relevance'].value_counts())

combined_df

Class distribution:
relevance
1    663
0    662
Name: count, dtype: int64


Unnamed: 0,abstract,relevance
0,1. J Agric Food Chem. 2007 Jun 27;55(13):5156-...,1
1,1. J Agric Food Chem. 2003 Apr 9;51(8):2174-80...,1
2,1. J Agric Food Chem. 2006 Jun 28;54(13):4696-...,1
3,1. J Agric Food Chem. 1999 Oct;47(10):4184-8. ...,1
4,1. Am J Clin Nutr. 1991 Dec;54(6):1093-100. do...,1
...,...,...
1320,1. Arch Gynecol Obstet. 2025 Oct 29. doi: 10.1...,0
1321,1. Int J Colorectal Dis. 2025 Oct 30;40(1):225...,0
1322,1. Nat Immunol. 2025 Oct 29. doi: 10.1038/s415...,0
1323,1. Sci Rep. 2025 Oct 29;15(1):37761. doi: 10.1...,0


The fine-tuning procedure follows the Hugging Face guide to text classification: https://huggingface.co/docs/transformers/en/tasks/sequence_classification

- Train/validation/test split: 80%/10%/10%

In [12]:
RANDOM_STATE = 42

# First hold out 20% of the data, then split that hold-out evenly into validation and test sets
train_df, temp_df = train_test_split(combined_df,
                                     test_size=0.2,
                                     stratify=combined_df["relevance"],
                                     random_state=RANDOM_STATE)

val_df, test_df = train_test_split(temp_df,
                                   test_size=0.5,
                                   stratify=temp_df["relevance"],
                                   random_state=RANDOM_STATE)

print(f"Training size: {len(train_df)}")
print(f"Validation size: {len(val_df)}")
print(f"Test size: {len(test_df)}")

Training size: 1060
Validation size: 132
Test size: 133
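
The sizes printed above follow from the two-stage split. Assuming scikit-learn rounds the held-out fraction up to a whole number of rows, which is consistent with this output, the arithmetic for the 1,325 combined abstracts can be sketched as follows (`two_stage_split_sizes` is a hypothetical helper, not part of the notebook):

```python
import math

def two_stage_split_sizes(n_samples, holdout_fraction=0.2, test_fraction_of_holdout=0.5):
    """Reproduce the train/validation/test sizes of the two train_test_split calls."""
    n_holdout = math.ceil(holdout_fraction * n_samples)       # first split: 20% held out
    n_train = n_samples - n_holdout
    n_test = math.ceil(test_fraction_of_holdout * n_holdout)  # second split: half of the hold-out
    n_val = n_holdout - n_test
    return n_train, n_val, n_test

print(two_stage_split_sizes(1325))  # (1060, 132, 133)
```

The hold-out of 265 rows is odd, which is why the validation and test sets differ by one row.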


- Convert Pandas DataFrame to HuggingFace Dataset

In [13]:
train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)
test_dataset = Dataset.from_pandas(test_df)

- Tokenization of abstracts using BioBERT tokenizer

In [14]:
BERT_MODEL_NAME = "dmis-lab/biobert-v1.1"
tokenizer = AutoTokenizer.from_pretrained(BERT_MODEL_NAME)

def tokenize(examples):
    return tokenizer(examples["abstract"], 
                     padding="max_length", 
                     truncation=True,
                     max_length=512 # Maximum length for BERT models
                    )

train_dataset = train_dataset.map(tokenize, batched=True)
val_dataset = val_dataset.map(tokenize, batched=True)
test_dataset = test_dataset.map(tokenize, batched=True)

# Renaming the target column to 'labels' as expected by HuggingFace Trainer
train_dataset = train_dataset.rename_column("relevance", "labels")
val_dataset = val_dataset.rename_column("relevance", "labels")
test_dataset = test_dataset.rename_column("relevance", "labels")

Map: 100%|██████████| 1060/1060 [00:00<00:00, 3277.69 examples/s]
Map: 100%|██████████| 132/132 [00:00<00:00, 3592.44 examples/s]
Map: 100%|██████████| 133/133 [00:00<00:00, 3430.66 examples/s]
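
With `truncation=True` and `max_length=512`, any record longer than 512 tokens is cut, and the fetched text starts with a citation line before the abstract. A quick way to see how much is lost is to count the untruncated token lengths. The sketch below is written against a generic `encode` callable so that it runs without downloading a model; passing `lambda t: tokenizer(t, truncation=False)["input_ids"]` would be one way to apply it to the BioBERT tokenizer. `truncation_report` is a hypothetical helper and the demo texts are synthetic.

```python
def truncation_report(texts, encode, max_length=512):
    """Count how many texts exceed `max_length` tokens under the given `encode` function."""
    lengths = [len(encode(text)) for text in texts]
    too_long = sum(length > max_length for length in lengths)
    return {
        "n_texts": len(lengths),
        "n_truncated": too_long,
        "share_truncated": too_long / len(lengths) if lengths else 0.0,
        "max_tokens": max(lengths, default=0),
    }

# Stand-in tokenizer for illustration only: one token per whitespace-separated word
whitespace_encode = str.split

demo_texts = ["short abstract", "word " * 600]
print(truncation_report(demo_texts, whitespace_encode))
```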


- Loading BioBERT model for binary text classification (relevant vs irrelevant)

In [15]:
id2label = {0: "irrelevant", 1: "relevant"}
label2id = {"irrelevant": 0, "relevant": 1}

model = AutoModelForSequenceClassification.from_pretrained(BERT_MODEL_NAME, 
                                                           num_labels=2,
                                                           id2label=id2label,
                                                           label2id=label2id)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at dmis-lab/biobert-v1.1 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


- Defining evaluation metrics

In [16]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)

    accuracy = accuracy_score(labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, 
                                                               predictions, 
                                                               average="binary",
                                                               zero_division=0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1
    }

- Setting the training arguments

In [17]:
training_args = TrainingArguments(
    output_dir="./biobert_pubmed_classifier",

    # Training hyperparameters
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,

    # Optimiser settings
    weight_decay=0.01,
    
    # Evaluation settings
    # Note: recent transformers releases rename this argument to `eval_strategy`
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_steps=100,

    # Model selection    
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    greater_is_better=True,

    # Performance
    fp16=torch.cuda.is_available(),
    dataloader_num_workers=4,

    seed=RANDOM_STATE,
    push_to_hub=False,
    report_to="none"
)



- Training with the Trainer API

In [18]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

trainer.train()


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,No log,0.064497,0.977273,0.956522,1.0,0.977778
2,0.151500,0.001479,1.0,1.0,1.0,1.0
3,0.002000,0.001073,1.0,1.0,1.0,1.0



TrainOutput(global_step=201, training_loss=0.07633858400197764, metrics={'train_runtime': 531.7511, 'train_samples_per_second': 5.98, 'train_steps_per_second': 0.378, 'total_flos': 836693156044800.0, 'train_loss': 0.07633858400197764, 'epoch': 3.0})
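
The reported `global_step=201` and the `No log` entry for epoch 1 both follow from the batch size. Assuming a single device and no gradient accumulation (neither is configured in the training arguments), the arithmetic can be sketched as:

```python
import math

def optimizer_steps(n_train, batch_size, epochs):
    """Return (steps per epoch, total steps) for a single device without accumulation."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

steps_per_epoch, total_steps = optimizer_steps(n_train=1060, batch_size=16, epochs=3)
print(steps_per_epoch, total_steps)  # 67 201

# With logging_steps=100 the first training-loss log appears at step 100,
# after epoch 1 has already ended at step 67, hence "No log" in the first row.
first_log_epoch = math.ceil(100 / steps_per_epoch)
print(first_log_epoch)  # 2
```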

- Evaluating on the test set

In [19]:
predictions_output = trainer.predict(test_dataset)
predictions = np.argmax(predictions_output.predictions, axis=-1)
true_labels = predictions_output.label_ids

# Calculate all metrics
test_metrics = compute_metrics((predictions_output.predictions, true_labels))

print("\nTest Set Results:")
print(f"Accuracy:  {test_metrics['accuracy']:.4f}")
print(f"Precision: {test_metrics['precision']:.4f}")
print(f"Recall:    {test_metrics['recall']:.4f}")
print(f"F1-Score:  {test_metrics['f1']:.4f}")

# Detailed classification report
print("\nClassification Report:")
print(classification_report(
    true_labels, 
    predictions,
    target_names=['Irrelevant', 'Relevant'],
    digits=4
))

# Confusion matrix
print("\nConfusion Matrix:")
cm = confusion_matrix(true_labels, predictions)
print(cm)
print(f"\nTrue Negatives:  {cm[0][0]} (correctly identified irrelevant)")
print(f"False Positives: {cm[0][1]} (incorrectly marked relevant)")
print(f"False Negatives: {cm[1][0]} (missed relevant papers)")
print(f"True Positives:  {cm[1][1]} (correctly identified relevant)")



Test Set Results:
Accuracy:  0.9925
Precision: 1.0000
Recall:    0.9851
F1-Score:  0.9925

Classification Report:
              precision    recall  f1-score   support

  Irrelevant     0.9851    1.0000    0.9925        66
    Relevant     1.0000    0.9851    0.9925        67

    accuracy                         0.9925       133
   macro avg     0.9925    0.9925    0.9925       133
weighted avg     0.9926    0.9925    0.9925       133


Confusion Matrix:
[[66  0]
 [ 1 66]]

True Negatives:  66 (correctly identified irrelevant)
False Positives: 0 (incorrectly marked relevant)
False Negatives: 1 (missed relevant papers)
True Positives:  66 (correctly identified relevant)
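
The headline metrics can be recomputed by hand from the confusion matrix, which also confirms that `Relevant` is treated as the positive class. A minimal sketch (`binary_metrics` is a hypothetical helper):

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 with 'Relevant' as the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return accuracy, precision, recall, f1

# Values taken from the confusion matrix above: [[66, 0], [1, 66]]
accuracy, precision, recall, f1 = binary_metrics(tn=66, fp=0, fn=1, tp=66)
print(f"{accuracy:.4f} {precision:.4f} {recall:.4f} {f1:.4f}")  # 0.9925 1.0000 0.9851 0.9925
```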


- Saving the trained model

In [20]:
model_save_path = './final_biobert_classifier'
trainer.save_model(model_save_path)
tokenizer.save_pretrained(model_save_path)

('./final_biobert_classifier/tokenizer_config.json',
 './final_biobert_classifier/special_tokens_map.json',
 './final_biobert_classifier/vocab.txt',
 './final_biobert_classifier/added_tokens.json',
 './final_biobert_classifier/tokenizer.json')
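
To use the saved classifier as a retrieval system, the two logits it produces per abstract can be turned into a relevance probability and used to rank candidate articles. The sketch below covers only that post-processing step in plain Python. The identifiers and logits in the example are made-up placeholders, and `rank_by_relevance` is a hypothetical helper, not part of the notebook; index 1 is taken as the `relevant` class, matching `id2label`.

```python
import math

def relevance_probability(logits):
    """Softmax over [irrelevant, relevant] logits; returns P(relevant)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    return exps[1] / sum(exps)

def rank_by_relevance(pmids, logits_per_article, threshold=0.5):
    """Return (pmid, probability) pairs at or above `threshold`, most relevant first."""
    scored = [(pmid, relevance_probability(logits))
              for pmid, logits in zip(pmids, logits_per_article)]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Placeholder identifiers and logits, for illustration only
ranking = rank_by_relevance(["A", "B", "C"], [[-2.0, 3.0], [4.0, -4.0], [0.0, 1.0]])
print([pmid for pmid, _ in ranking])  # ['A', 'C']
```

Raising `threshold` trades recall for precision, which matters when the ranked list is reviewed by hand.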