<a href="https://colab.research.google.com/github/almond5/CAP5510_Datasets/blob/main/BioGPT_GPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%pip install transformers
%pip install sacremoses
%pip install torch
%pip install datasets



In [5]:
import torch
from transformers import AutoTokenizer, BioGptForSequenceClassification, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForSequenceClassification.from_pretrained("microsoft/biogpt", num_labels=3)


Some weights of BioGptForSequenceClassification were not initialized from the model checkpoint at microsoft/biogpt and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
from datasets import load_dataset

# Load the dataset
ds = load_dataset("qiaojin/PubMedQA", "pqa_labeled")

# Preprocess the dataset
def preprocess_function(examples):
    inputs = [f"Context: {context} Question: {question}" for question, context in zip(examples['question'], examples['context'])]
    targets = examples['final_decision']

    # Map answers to class ids; "uncertain" is accepted as an alias of "maybe"
    label_mapping = {"yes": 0, "no": 1, "uncertain": 2, "maybe": 2}

    # Tokenize to a fixed length of 512 tokens
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    # Unknown or unexpected labels fall back to class 2 ("maybe")
    model_inputs["labels"] = [label_mapping.get(target.lower(), 2) for target in targets]

    return model_inputs

# Select a small subset of the dataset for demonstration
small_ds = ds['train'].select(range(100))
tokenized_ds = small_ds.map(preprocess_function, batched=True)

train_size = 60
val_size = 20
test_size = 20

train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(
    tokenized_ds, [train_size, val_size, test_size]
)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    weight_decay=0.01,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
)

# Train the model
trainer.train()

# Evaluate the model
metrics = trainer.evaluate(eval_dataset=test_dataset)
print("Test set evaluation:", metrics)

[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33madrianhossen5[0m ([33madrianhossen5-university-of-central-florida[0m). Use [1m`wandb login --relogin`[0m to force relogin


Epoch,Training Loss,Validation Loss
1,No log,1.684911
2,No log,0.924832
3,No log,1.369139


Test set evaluation: {'eval_loss': 1.0992672443389893, 'eval_runtime': 2.5632, 'eval_samples_per_second': 7.803, 'eval_steps_per_second': 7.803, 'epoch': 3.0}
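As a reference point for the test loss above: a classifier that assigns equal probability to all three classes has a cross-entropy of ln 3, roughly 1.0986, which is very close to the reported eval_loss of 1.0993. A minimal sketch of that check follows; the variable names are illustrative and not part of the notebook.

```python
import math

NUM_LABELS = 3  # yes / no / maybe, matching num_labels=3 above

# Cross-entropy of a uniform prediction over NUM_LABELS classes.
chance_loss = math.log(NUM_LABELS)

# Value copied from the "Test set evaluation" output above.
reported_eval_loss = 1.0992672443389893

gap = reported_eval_loss - chance_loss
print(f"chance-level loss:  {chance_loss:.4f}")
print(f"reported eval loss: {reported_eval_loss:.4f} (gap {gap:+.4f})")
```

A gap this small suggests the fine-tuned head is doing little better than guessing on this split, although with only 20 test examples the estimate is noisy.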


In [7]:
# Define the function to predict using the trained model
def predict_answer(question, context):
    input_text = f"Context: {context} Question: {question}"
    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True, padding=True)
    inputs = {key: val.to(model.device) for key, val in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

    # Map the class id back to a PubMedQA label ("yes", "no", or "maybe")
    label_mapping = {0: "yes", 1: "no", 2: "maybe"}
    return label_mapping[predicted_class]

# Evaluate the Q&A performance
num_correct = 0

for example in test_dataset:
    question = example['question']
    context = example['context']
    true_answer = example['final_decision']
    predicted_answer = predict_answer(question, context)

    print(f"Question: {question}")
    print(f"Context: {context}")
    print(f"True Answer: {true_answer}")
    print(f"Predicted Answer: {predicted_answer}")

    if true_answer.lower() == predicted_answer:
        num_correct += 1

    print("="*80)

# Print accuracy
print(f"Accuracy: {num_correct / len(test_dataset)}")


Question: Can predilatation in transcatheter aortic valve implantation be omitted?
Context: {'contexts': ['The use of a balloon expandable stent valve includes balloon predilatation of the aortic stenosis before valve deployment. The aim of the study was to see whether or not balloon predilatation is necessary in transcatheter aortic valve replacement (TAVI).', 'Sixty consecutive TAVI patients were randomized to the standard procedure or to a protocol where balloon predilatation was omitted.', 'There were no significant differences between the groups regarding early hemodynamic results or complication rates.'], 'labels': ['BACKGROUND', 'METHODS', 'RESULTS'], 'meshes': ['Adult', 'Aged', 'Aged, 80 and over', 'Aortic Valve', 'Aortic Valve Stenosis', 'Balloon Valvuloplasty', 'Cardiac Catheterization', 'Dilatation', 'Female', 'Hemodynamics', 'Humans', 'Male', 'Middle Aged', 'Preoperative Care', 'Prospective Studies', 'Transcatheter Aortic Valve Replacement', 'Treatment Outcome'], 'reasoning
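The printed context above is the raw PubMedQA `context` dictionary, so the f-string used in `preprocess_function` and `predict_answer` passes the whole dict representation (including `labels` and `meshes`) to the tokenizer, and with `max_length=512` the trailing question may be truncated away. One possible alternative, sketched here with a hypothetical helper `build_input` that is not part of the notebook, joins only the abstract passages and places the question first.

```python
def build_input(question, context):
    """Build a flat prompt from a PubMedQA example (hypothetical helper).

    `context` is expected to be a dict with a 'contexts' list of passages,
    as in the printed example above. The question comes first so that
    right-side truncation drops context text rather than the question.
    """
    if isinstance(context, dict):
        passages = context.get("contexts", [])
    else:
        passages = [str(context)]
    return f"Question: {question} Context: {' '.join(passages)}"


# Toy example, not taken from the dataset.
example_context = {
    "contexts": ["Passage one.", "Passage two."],
    "labels": ["BACKGROUND", "RESULTS"],
    "meshes": ["Humans"],
}
print(build_input("Is this a toy example?", example_context))
```

If a helper like this is adopted, it would need to be used in both preprocessing and prediction so that training and inference see the same input format.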

In [16]:
import torch
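from transformers import AutoTokenizer, GPT2ForSequenceClassification, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialogRPT-updown")
model = GPT2ForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown", num_labels=3, ignore_mismatched_sizes=True)
# GPT-2 tokenizers may lack a padding token; fall back to EOS so padding="max_length" works.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id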

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at microsoft/DialogRPT-updown and are newly initialized because the shapes did not match:
- score.weight: found shape torch.Size([1, 1024]) in the checkpoint and torch.Size([3, 1024]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
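The split in the next cell is drawn at random each time it runs, so the BioGPT and GPT-2 models are not guaranteed to be trained and tested on the same 60/20/20 partition. A minimal sketch of a reproducible split using only the standard library follows; `seeded_split` and the seed value are illustrative assumptions, not part of the notebook.

```python
import random


def seeded_split(n_items, sizes, seed=42):
    """Return disjoint index lists of the given sizes, shuffled deterministically."""
    if sum(sizes) > n_items:
        raise ValueError("split sizes exceed the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    splits, start = [], 0
    for size in sizes:
        splits.append(indices[start:start + size])
        start += size
    return splits


train_idx, val_idx, test_idx = seeded_split(100, [60, 20, 20])
print(len(train_idx), len(val_idx), len(test_idx))
```

With a Hugging Face dataset, the index lists could be applied through `tokenized_ds.select(train_idx)`. `torch.utils.data.random_split` also accepts a `generator` argument, for example `torch.Generator().manual_seed(42)`, which serves the same purpose.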


In [19]:
from datasets import load_dataset

# Load the dataset
ds = load_dataset("qiaojin/PubMedQA", "pqa_labeled")

# Preprocess the dataset
def preprocess_function(examples):
    inputs = [f"Context: {context} Question: {question}" for question, context in zip(examples['question'], examples['context'])]
    targets = examples['final_decision']

    # Map answers to class ids; "uncertain" is accepted as an alias of "maybe"
    label_mapping = {"yes": 0, "no": 1, "uncertain": 2, "maybe": 2}

    # Tokenize to a fixed length of 512 tokens
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    # Unknown or unexpected labels fall back to class 2 ("maybe")
    model_inputs["labels"] = [label_mapping.get(target.lower(), 2) for target in targets]

    return model_inputs

# Select a small subset of the dataset for demonstration
small_ds = ds['train'].select(range(100))
tokenized_ds = small_ds.map(preprocess_function, batched=True)

train_size = 60
val_size = 20
test_size = 20

train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(
    tokenized_ds, [train_size, val_size, test_size]
)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    weight_decay=0.01,
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
)

# Train the model
trainer.train()

# Evaluate the model
metrics = trainer.evaluate(eval_dataset=test_dataset)
print("Test set evaluation:", metrics)




Epoch,Training Loss,Validation Loss
1,No log,0.701938
2,No log,0.809778
3,No log,1.089404


Test set evaluation: {'eval_loss': 2.6778724193573, 'eval_runtime': 2.215, 'eval_samples_per_second': 9.03, 'eval_steps_per_second': 9.03, 'epoch': 3.0}
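Validation loss for this run rises every epoch (0.70, 0.81, 1.09), whereas in the BioGPT run it was lowest at epoch 2, so the final checkpoint is not necessarily the best one in either case. A small sketch that picks the best epoch from the logged values; the function name is illustrative.

```python
def best_epoch(val_losses):
    """Return (epoch, loss) for the lowest validation loss; epochs are 1-indexed."""
    index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return index + 1, val_losses[index]


# Validation losses copied from the two training logs in this notebook.
biogpt_val = [1.684911, 0.924832, 1.369139]
gpt2_val = [0.701938, 0.809778, 1.089404]

print("BioGPT:", best_epoch(biogpt_val))
print("GPT-2: ", best_epoch(gpt2_val))
```

`TrainingArguments` offers `load_best_model_at_end=True` to restore the best checkpoint automatically; it requires the save strategy to match the evaluation strategy, so treat that as something to verify against the installed transformers version.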


In [18]:
# Define the function to predict using the trained model
def predict_answer(question, context):
    input_text = f"Context: {context} Question: {question}"
    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True, padding=True)
    inputs = {key: val.to(model.device) for key, val in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()

    # Map the class id back to a PubMedQA label ("yes", "no", or "maybe")
    label_mapping = {0: "yes", 1: "no", 2: "maybe"}
    return label_mapping[predicted_class]

# Evaluate the Q&A performance
num_correct = 0

for example in test_dataset:
    question = example['question']
    context = example['context']
    true_answer = example['final_decision']
    predicted_answer = predict_answer(question, context)

    print(f"Question: {question}")
    print(f"Context: {context}")
    print(f"True Answer: {true_answer}")
    print(f"Predicted Answer: {predicted_answer}")

    if true_answer.lower() == predicted_answer:
        num_correct += 1

    print("="*80)

# Print accuracy
print(f"Accuracy: {num_correct / len(test_dataset)}")


Question: Can you deliver accurate tidal volume by manual resuscitator?
Context: {'contexts': ['One of the problems with manual resuscitators is the difficulty in achieving accurate volume delivery. The volume delivered to the patient varies by the physical characteristics of the person and method. This study was designed to compare tidal volumes delivered by the squeezing method, physical characteristics and education and practice levels.', '114 individuals trained in basic life support and bag-valve-mask ventilation participated in this study. Individual characteristics were obtained by the observer and the education and practice level were described by the subjects. Ventilation was delivered with a manual resuscitator connected to a microspirometer and volumes were measured. Subjects completed three procedures: one-handed, two-handed and two-handed half-compression.', 'The mean (standard deviation) volumes for the one-handed method were 592.84 ml (SD 117.39), two-handed 644.24 ml (S
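Raw accuracy on a 20-example split is easier to interpret next to a majority-class baseline. The sketch below is one way to compute both from lists of gold and predicted labels; the helper names and the toy labels are illustrative and are not results from this notebook.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def majority_baseline(gold):
    """Return (label, accuracy) obtained by always predicting the most common gold label."""
    label, count = Counter(gold).most_common(1)[0]
    return label, count / len(gold)


# Toy labels for illustration only.
gold = ["yes", "yes", "no", "maybe", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]

print("accuracy:", accuracy(gold, predicted))
print("majority baseline:", majority_baseline(gold))
```

In the evaluation loop above, `gold` and `predicted` could be collected by appending `true_answer.lower()` and `predicted_answer` on each iteration.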