# Lab 6: BitFit - Fine-Tuning a BERT Model by Only Training Bias Terms
---
## Notebook 3: Inference

**Goal:** In this notebook, you will load the fine-tuned BERT model from the BitFit training process and use it to make predictions.

**You will learn to:**
-   Load a standard `transformers` model directly from a training checkpoint.
-   Perform inference on new data.


### Step 1: Reload Model from Checkpoint

Because BitFit doesn't use a separate adapter model, the fine-tuned parameters (the biases and classifier weights) are saved as part of the main model checkpoints. Therefore, we can load the model directly using `AutoModelForSequenceClassification.from_pretrained` and point it to the best checkpoint saved by the `Trainer`. We do **not** need the `peft` library for inference.
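To see why no adapter-loading step is needed, here is a minimal sketch. It uses a tiny, randomly initialized BERT built from `BertConfig` so it runs offline; this is a stand-in, not the lab's MRPC model. It simulates BitFit updates by changing one bias vector and the classifier weights, saves the model with `save_pretrained`, and reloads it with plain `AutoModelForSequenceClassification.from_pretrained`. The config sizes and the choice of which bias to modify are arbitrary assumptions made for illustration.

```python
import tempfile

import torch
from transformers import (
    AutoModelForSequenceClassification,
    BertConfig,
    BertForSequenceClassification,
)

# Tiny stand-in config (sizes are arbitrary, chosen so the sketch runs quickly offline)
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
model = BertForSequenceClassification(config)

# Simulate what BitFit training changes: a bias term and the classifier head
with torch.no_grad():
    model.bert.encoder.layer[0].attention.self.query.bias.add_(1.0)
    model.classifier.weight.mul_(2.0)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    # A BitFit checkpoint is just an ordinary full-model checkpoint...
    model.save_pretrained(checkpoint_dir)
    # ...so it reloads with the standard API; no peft adapter step is involved
    reloaded = AutoModelForSequenceClassification.from_pretrained(checkpoint_dir)

    same_bias = torch.equal(
        model.bert.encoder.layer[0].attention.self.query.bias,
        reloaded.bert.encoder.layer[0].attention.self.query.bias,
    )
    same_head = torch.equal(model.classifier.weight, reloaded.classifier.weight)

print(same_bias, same_head)
```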


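```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import json
import os

# --- Locate the checkpoint to load ---
output_dir = "./bert-bitfit-mrpc"
checkpoints = [
    os.path.join(output_dir, d)
    for d in os.listdir(output_dir)
    if d.startswith("checkpoint-")
]
# Pick the checkpoint with the highest global step (the number after "checkpoint-")
latest_checkpoint = max(checkpoints, key=lambda p: int(p.rsplit("-", 1)[-1]))

# If the Trainer tracked a best model (metric_for_best_model / load_best_model_at_end),
# its path is recorded in trainer_state.json; otherwise fall back to the latest checkpoint.
# The recorded path is relative to the directory training was run from.
with open(os.path.join(latest_checkpoint, "trainer_state.json")) as f:
    best_checkpoint = json.load(f).get("best_model_checkpoint") or latest_checkpoint
print(f"Loading model from: {best_checkpoint}")

# --- Load Fine-tuned Model and Tokenizer ---
# Tokenizer files are only in the checkpoint if the tokenizer was passed to the Trainer
inference_model = AutoModelForSequenceClassification.from_pretrained(best_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(best_checkpoint)

# Move model to GPU if available and switch to evaluation mode (disables dropout)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inference_model.to(device)
inference_model.eval()

print("✅ BitFit fine-tuned model loaded successfully!")
```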
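If you want to confirm that the checkpoint differs from the pretrained model only in bias terms and the classification head, you can compare the two models parameter by parameter. `changed_parameters` below is a hypothetical helper, not part of `transformers`. The demo uses two tiny randomly initialized models instead of the real checkpoint so it runs offline. On the real models, expect `classifier.*` to differ in any case, because the classification head is newly initialized when a base model is loaded for sequence classification.

```python
import copy

import torch
from transformers import BertConfig, BertForSequenceClassification

def changed_parameters(base_model, tuned_model, atol=0.0):
    """Hypothetical helper: names of parameters whose values differ between two models."""
    base_params = dict(base_model.named_parameters())
    changed = []
    for name, tuned_param in tuned_model.named_parameters():
        # Move the base tensor to the tuned tensor's device before comparing
        base_param = base_params[name].detach().to(tuned_param.device)
        if not torch.allclose(base_param, tuned_param.detach(), rtol=0.0, atol=atol):
            changed.append(name)
    return changed

# Offline demo: a tiny stand-in model and a copy with BitFit-style edits
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
base = BertForSequenceClassification(config)
tuned = copy.deepcopy(base)
with torch.no_grad():
    tuned.bert.encoder.layer[1].output.dense.bias.add_(0.5)
    tuned.classifier.bias.add_(0.1)

changed = changed_parameters(base, tuned)
print(changed)
# Every changed name should be a bias or part of the classifier head
print(all("bias" in n or n.startswith("classifier.") for n in changed))
```

With the real models you would call something like `changed_parameters(AutoModelForSequenceClassification.from_pretrained(BASE_MODEL), inference_model)`, where `BASE_MODEL` is a placeholder for whichever base checkpoint the training notebook used.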


### Step 2: Perform Inference

The inference process is identical to that of a fully fine-tuned model. We will use the same prediction function from the Adapter Layers lab.


```python
import torch.nn.functional as F

# MRPC label ids: 0 = not equivalent, 1 = equivalent (paraphrase)
id2label = {0: "Not a Paraphrase", 1: "Is a Paraphrase"}

def predict_paraphrase(sentence1, sentence2):
    # Encode the two sentences as a single pair: [CLS] sentence1 [SEP] sentence2 [SEP]
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", truncation=True, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # No gradients are needed at inference time
    with torch.no_grad():
        outputs = inference_model(**inputs)

    logits = outputs.logits  # shape: (1, num_labels)
    probabilities = F.softmax(logits, dim=-1).cpu().numpy()[0]
    prediction = torch.argmax(logits, dim=-1).cpu().item()

    print(f"Sentence 1: '{sentence1}'")
    print(f"Sentence 2: '{sentence2}'")
    print(f"Prediction: {id2label[prediction]}")
    print("Probabilities:")
    print(f"  - {id2label[0]}: {probabilities[0]:.4f}")
    print(f"  - {id2label[1]}: {probabilities[1]:.4f}")

# --- Test Cases ---
print("--- Test Case 1 (Should be a paraphrase) ---")
predict_paraphrase(
    "The company said the merger was subject to the approval of its shareholders.",
    "The company said the deal was subject to the approval of its shareholders."
)

print("\n--- Test Case 2 (Should NOT be a paraphrase) ---")
predict_paraphrase(
    "The cat sat on the mat.",
    "The dog played in the garden."
)
```
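When you need predictions for many pairs (for example, to score a held-out file), it is more practical to batch the inputs and return values instead of printing them. The sketch below is a hypothetical `predict_paraphrase_batch` helper, not part of the lab code. It takes the model and tokenizer as arguments and, with `padding=True`, pads each pair to the longest sequence in its batch. To keep it runnable offline, the demo uses a toy vocabulary and a tiny randomly initialized BERT, so its predictions are meaningless.

```python
import os
import tempfile

import torch
import torch.nn.functional as F
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

def predict_paraphrase_batch(model, tokenizer, pairs, device="cpu", batch_size=16):
    """Hypothetical helper: return (label_id, probabilities) for each (s1, s2) pair."""
    model.eval()
    results = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        firsts = [s1 for s1, _ in batch]
        seconds = [s2 for _, s2 in batch]
        # padding=True pads every pair to the longest one in this batch
        inputs = tokenizer(firsts, seconds, return_tensors="pt", truncation=True, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1).cpu()
        for row in probs:
            results.append((int(row.argmax()), row.tolist()))
    return results

# Toy stand-ins so the sketch runs offline (NOT the fine-tuned MRPC model)
words = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "the", "cat", "sat", "on",
         "mat", "dog", "played", "in", "garden", "."]
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w") as f:
        f.write("\n".join(words) + "\n")
    toy_tokenizer = BertTokenizer(vocab_file=vocab_path)

config = BertConfig(vocab_size=len(words), hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
toy_model = BertForSequenceClassification(config)

pairs = [
    ("The cat sat on the mat.", "The dog played in the garden."),
    ("The cat sat.", "The cat sat on the mat."),
]
results = predict_paraphrase_batch(toy_model, toy_tokenizer, pairs)
print(results)
```

In this notebook you would call `predict_paraphrase_batch(inference_model, tokenizer, pairs, device=device)` and map the returned ids through `id2label`.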


---
### Lab Conclusion

This concludes the BitFit lab. You have implemented a PEFT method manually: by setting `requires_grad` so that only the bias terms (and the classifier weights) are trained, you fine-tuned BERT without adapter modules or the `peft` library. Doing this by hand shows how little machinery a method like BitFit needs on top of `transformers`, and gives a clearer picture of what PEFT libraries automate under the hood.
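The heart of BitFit is a single `requires_grad` rule. The sketch below assumes a name-based rule (train every parameter whose name contains `"bias"`, plus the `classifier.` head); the rule used in the training notebook may differ in details such as whether LayerNorm or pooler biases are included. It runs on a tiny stand-in config, so the printed percentage is far larger than it would be for a full-size BERT, where the embedding and weight matrices dominate even more.

```python
from transformers import BertConfig, BertForSequenceClassification

def apply_bitfit(model, head_prefix="classifier."):
    """Freeze everything except bias terms and the classification head (assumed rule)."""
    for name, param in model.named_parameters():
        param.requires_grad = "bias" in name or name.startswith(head_prefix)
    return model

def count_parameters(model):
    """Return (trainable, total) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total

# Tiny stand-in model so the sketch runs offline
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = apply_bitfit(BertForSequenceClassification(config))

trainable, total = count_parameters(model)
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
print([n for n, p in model.named_parameters() if p.requires_grad][:5])
```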

**All planned PEFT labs are now complete!**
