<a href="https://colab.research.google.com/github/alex-smith-uwec/NLP_Spring2025/blob/main/Template_Medical_Questions_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NLP Assignment: Fine-Tuning a Transformer on Medical Question Pairs

In this assignment, you will fine-tune a transformer model to classify whether pairs of medical questions are paraphrases of each other.

Once you have everything in place, and before you start training, restart the session and change the runtime type to TPU (Runtime > Change runtime type).

In [None]:
# Install necessary libraries
!pip install transformers datasets -q

In [None]:
# TODO: set the random seed to your Blugold ID (an integer)
seed = ##
# TODO: enter your name here:

## Step 1: Load the Dataset
Use the `datasets` library to load the [curaihealth medical_questions_pairs](https://huggingface.co/datasets/curaihealth/medical_questions_pairs) dataset.

In [None]:
from datasets import load_dataset
# TODO: Load the dataset

dataset = ##
dataset

## Train/Validation/Test Split
The `medical_questions_pairs` dataset only provides a single training set. You need to create your own train, validation, and test sets.

We'll split the dataset into:
- **Train:** 80%
- **Validation:** 10%
- **Test:** 10%

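Use the `train_test_split` method of a `datasets` `Dataset` twice, passing `seed` so the split is reproducible: once to hold out 20% of the data, and again to split that held-out portion in half.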

In [None]:
from datasets import DatasetDict

# Step 1: Split into train + temp (val + test)
temp_split = dataset['train'].train_test_split(test_size=0.2, seed=seed)

# Step 2: Split temp into validation + test (50/50 of temp = 10% each)
val_test_split = temp_split['test'].train_test_split(test_size=0.5, seed=seed)

# Step 3: Combine splits into a DatasetDict
split_dataset = DatasetDict({
    'train': temp_split['train'],
    'validation': val_test_split['train'],
    'test': val_test_split['test']
})

split_dataset
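As a rough check on the split above, the sketch below computes the sizes the two calls should produce. It assumes `train_test_split` rounds a fractional `test_size` up to a whole number of rows (as scikit-learn does) and gives the other side the remainder; the helper name `split_sizes` is hypothetical.

```python
from math import ceil

def split_sizes(n_samples, test_size=0.2, holdout_share=0.5):
    """Return (train, validation, test) sizes for the two-stage split.

    Assumption: each call rounds its "test" side up with ceil(),
    and the "train" side receives whatever is left.
    """
    n_temp = ceil(test_size * n_samples)   # first call: held-out temp set
    n_train = n_samples - n_temp           # first call: training rows
    n_test = ceil(holdout_share * n_temp)  # second call: its "test" half
    n_val = n_temp - n_test                # second call: its "train" half becomes validation
    return n_train, n_val, n_test
```

Comparing `split_sizes(len(dataset['train']))` with the row counts printed for `split_dataset` is a quick way to confirm the roughly 80/10/10 proportions.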

In [None]:
## TODO: find an index so that the corresponding validation question pair has label 0
idx_0= ##
split_dataset['validation'][idx_0]


In [None]:
## TODO: find an index so that the corresponding validation question pair has label 1
idx_1= ##
split_dataset['validation'][idx_1]
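One way to locate such indices is to scan the labels column for the first match. The sketch below does that on a plain Python list; `first_index_with_label` is a hypothetical helper, and passing it `split_dataset['validation']['label']` assumes that column can be iterated like a list.

```python
def first_index_with_label(labels, target):
    """Return the position of the first label equal to `target`, or None if absent."""
    # next() with a default avoids StopIteration when no label matches
    return next((i for i, y in enumerate(labels) if y == target), None)

toy_labels = [1, 1, 0, 1]  # toy data, not taken from the dataset
```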

## Step 2: Explore and Preprocess
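Examine the fields, including `question_1`, `question_2`, and `label`. Tokenize each question pair with a pretrained tokenizer so that both questions are encoded together as a single input sequence.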

In [None]:
from transformers import AutoTokenizer

## Choose a model checkpoint
checkpoint = 'microsoft/MiniLM-L12-H384-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

## Tokenization function
def tokenize_fn(example):
    return tokenizer(example['question_1'], example['question_2'], truncation=True, padding='max_length', max_length=256)

## Apply to every split in batches
tokenized = split_dataset.map(tokenize_fn, batched=True)
tokenized

In [None]:
print(tokenized['train'][0])

In [None]:
example_question_1 = "What are the signs of having frostbite?"
example_question_2 = "What exactly is the treatment for frostbite?"

tokenized_example = tokenizer(example_question_1, example_question_2, truncation=True, padding='max_length', max_length=256)

print(f"Tokens: {tokenizer.convert_ids_to_tokens(tokenized_example['input_ids'])}")
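The token list printed above follows the pair layout that BERT-style tokenizers use. The sketch below reproduces that layout by hand, assuming the MiniLM checkpoint uses `[CLS]`, `[SEP]`, and `[PAD]` special tokens with segment ids 0 and 1. It works on pre-split word lists rather than real WordPiece subwords and raises an error instead of truncating; `encode_pair` is a hypothetical helper, not a `transformers` function.

```python
def encode_pair(tokens_a, tokens_b, max_length, pad_token="[PAD]"):
    """Lay out a question pair BERT-style and pad it to max_length (no truncation)."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment ids: 0 for [CLS] + first question + first [SEP], 1 for the second question + final [SEP]
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    attention_mask = [1] * len(tokens)  # 1 = real token the model should attend to
    if len(tokens) > max_length:
        raise ValueError("pair too long; a real tokenizer would truncate it")
    n_pad = max_length - len(tokens)
    # Padding positions get the pad token, segment id 0, and attention mask 0
    tokens = tokens + [pad_token] * n_pad
    token_type_ids = token_type_ids + [0] * n_pad
    attention_mask = attention_mask + [0] * n_pad
    return {"tokens": tokens, "token_type_ids": token_type_ids, "attention_mask": attention_mask}
```

With `padding='max_length'`, every example in `tokenized` is padded to 256 positions in the same way, and the zeros in `attention_mask` tell the model to ignore the padding.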


## Step 3: Load Model
Load a model for sequence classification, configured with one output per class in the `label` field.

In [None]:
from transformers import AutoModelForSequenceClassification

# TODO: Define the model from the checkpoint with the correct number of labels
model = ##
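A sequence-classification model is the pretrained encoder plus a small head that maps one summary vector per pair to one logit per class. The NumPy sketch below mimics a BERT-style head (a tanh pooler over the `[CLS]` vector, then a linear layer) with random weights. The hidden size of 384 is read off the checkpoint name, and the exact head layout `transformers` builds for this checkpoint is an assumption. Because the head starts untrained, `transformers` typically warns that some weights were newly initialized when the model is loaded.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_labels = 384, 2  # assumed: H384 encoder, binary labels

# Stand-in for the encoder's final hidden state at the [CLS] position
h_cls = rng.standard_normal(hidden_size)

# Pooler: dense layer followed by tanh (BERT-style)
W_pool = rng.standard_normal((hidden_size, hidden_size)) * 0.02
b_pool = np.zeros(hidden_size)
pooled = np.tanh(W_pool @ h_cls + b_pool)

# Classification layer: one logit per label, randomly initialized before fine-tuning
W_cls = rng.standard_normal((num_labels, hidden_size)) * 0.02
b_cls = np.zeros(num_labels)
logits = W_cls @ pooled + b_cls

# Softmax turns logits into class probabilities (dropout omitted, as at inference)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))
```

Fine-tuning updates both the encoder and these head weights, which is why the head's random starting point matters less after training.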

## Step 4: Define Training Arguments
Use Hugging Face `TrainingArguments` to configure training.

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    logging_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    report_to="none"
)
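To see what these settings imply, the sketch below counts optimizer steps and evaluates the learning rate at a given step. It assumes the `Trainer` defaults of a linear decay schedule with no warmup and one optimizer step per batch, and that `per_device_train_batch_size` is multiplied by the number of devices (for example, TPU cores) to give the effective batch size. The helper names and the 1,000-example figure are hypothetical.

```python
from math import ceil

def total_training_steps(n_train, per_device_batch=16, num_devices=1, epochs=3):
    """Optimizer steps for the run, assuming one step per batch and no gradient accumulation."""
    effective_batch = per_device_batch * num_devices
    steps_per_epoch = ceil(n_train / effective_batch)  # the last partial batch still counts
    return steps_per_epoch * epochs

def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Learning rate under a linear warmup-then-decay schedule."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Hypothetical example: 1,000 training pairs on a single device
steps = total_training_steps(1000)  # 63 steps per epoch x 3 epochs = 189
```

Under these assumptions the learning rate starts at 2e-5 and falls linearly to zero by the final step, so spreading the same data over more devices means fewer, larger steps.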

## Step 5: Define Trainer
Set up the `Trainer` object and begin training.

In [None]:
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=1)
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc}
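Accuracy alone does not show which kind of mistake the model makes. As an optional extension (not part of the assignment as written), the sketch below computes accuracy and F1 with NumPy only, treating label 1 as the positive class; that choice and the name `compute_metrics_with_f1` are assumptions. It takes the same `(logits, labels)` pair as `compute_metrics`, so it could be passed to `Trainer` in its place.

```python
import numpy as np

def compute_metrics_with_f1(eval_pred):
    """Accuracy plus F1 for label 1, computed from raw logits."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # highest-scoring class per example
    labels = np.asarray(labels)
    tp = int(np.sum((preds == 1) & (labels == 1)))  # true positives
    fp = int(np.sum((preds == 1) & (labels == 0)))  # false positives
    fn = int(np.sum((preds == 0) & (labels == 1)))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": float(np.mean(preds == labels)), "f1": f1}
```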

In [None]:
from transformers import Trainer

## Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized['train'],
    eval_dataset=tokenized['validation'],
    processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument in recent transformers releases
    compute_metrics=compute_metrics
)

In [None]:

# Start training
trainer.train()

## Step 6: Evaluation
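Evaluate the fine-tuned model. Called with no arguments, `trainer.evaluate()` scores the validation set passed as `eval_dataset`; score the held-out test split only once, after all tuning decisions are made.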

In [None]:
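## Evaluate on the validation set (the Trainer's eval_dataset)
metrics = trainer.evaluate()
print(metrics)
## Evaluate once on the held-out test split
test_metrics = trainer.evaluate(tokenized["test"], metric_key_prefix="test")
print(test_metrics)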

In [None]:
# Save locally first
model.save_pretrained("medical-question-model")
tokenizer.save_pretrained("medical-question-model")

# Push to hub
model.push_to_hub("alex-smith/medical-question-model")
tokenizer.push_to_hub("alex-smith/medical-question-model")

## Training Accuracy
Now that training is complete, evaluate the model on the training set to report training accuracy. A training accuracy well above the validation accuracy suggests the model is overfitting.

In [None]:
# Evaluate on training data
train_metrics = trainer.evaluate(tokenized["train"])
print(f"Training Accuracy: {train_metrics['eval_accuracy']:.4f}")

## Evaluate Custom Question Pairs

Use this section to test your fine-tuned model on your own question pairs. This is useful for exploring how well the model generalizes to new examples outside the training set.

In [None]:
# TODO: replace "your-username" below with your Hugging Face username
model_id = "your-username/medical-question-model"


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F


def evaluate_question_pair(question1, question2, model_id=model_id):
    # Load model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Determine max length
    max_len = getattr(model.config, "max_position_embeddings", 512)

    # Tokenize with explicit max_length
    inputs = tokenizer(
        question1, question2,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=max_len
    )

    # Run through model
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = F.softmax(logits, dim=1)

    predicted_class = torch.argmax(probs, dim=-1).item()  # argmax over the class dimension
    confidence = probs[0][predicted_class].item()
    label_name = model.config.id2label.get(predicted_class, str(predicted_class))

    print(f"Q1: {question1}")
    print(f"Q2: {question2}")
    print(f"Predicted Label: {label_name} (Confidence: {confidence:.4f})")

    return predicted_class, confidence
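`evaluate_question_pair` turns the two logits into probabilities with a softmax and reports the largest one as the confidence. The pure-Python sketch below shows that step in isolation for a single pair. The logits are made-up numbers, and subtracting the maximum logit before exponentiating is a standard numerical-stability trick that leaves the probabilities unchanged.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # shift by the max so exp() cannot overflow
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.5, 2.0]  # made-up logits for (class 0, class 1)
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
confidence = probs[predicted_class]
```

With two classes, the confidence equals the logistic sigmoid of the gap between the logits, so a gap of 1.5 gives about 0.82.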


In [None]:
evaluate_question_pair(
    "What are the symptoms of anemia?",
    "Can being tired all the time mean I have anemia?"
)
