# Lab 5: (IA)³ - Fine-Tuning a GPT-2 Model with Extreme Parameter Efficiency
---
## Notebook 3: Inference

**Goal:** In this notebook, you will load the trained (IA)³ adapter and use the fine-tuned GPT-2 model to generate positive movie reviews.

**You will learn to:**
-   Reload the base GPT-2 model.
-   Load the trained (IA)³ adapter from a checkpoint.
-   Generate text from a prompt using the fine-tuned model.


### Step 1: Reload Model and Adapter

We will load the base `gpt2` model and then apply our trained (IA)³ weights on top of it.


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
import os

# --- Load Base Model and Tokenizer ---
model_checkpoint = "gpt2"
base_model = AutoModelForCausalLM.from_pretrained(model_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

# --- Load PEFT Adapter ---
output_dir = "./gpt2-ia3-imdb"
latest_checkpoint = max(
    [os.path.join(output_dir, d) for d in os.listdir(output_dir) if d.startswith("checkpoint-")],
    key=os.path.getmtime
)
print(f"Loading adapter from: {latest_checkpoint}")

inference_model = PeftModel.from_pretrained(base_model, latest_checkpoint)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
inference_model.to(device)
inference_model.eval()

print("✅ Inference model loaded successfully!")
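
The cell above picks the most recently modified checkpoint directory. Modification times can be misleading if the output folder was copied or synced, so an alternative is to choose by training step instead. The sketch below assumes the checkpoint folders are named `checkpoint-<step>`; the helper name `latest_checkpoint_by_step` is hypothetical and not part of `transformers` or `peft`.

```python
import os
import re
import tempfile


def latest_checkpoint_by_step(output_dir):
    """Return the path of the checkpoint-<step> folder with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        path = os.path.join(output_dir, name)
        # Keep only directories whose name ends in an integer step count
        if match and os.path.isdir(path):
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Tuples compare by step first, so max() returns the highest step
    return max(candidates)[1]


# Small demonstration on a temporary directory (no real checkpoints needed)
with tempfile.TemporaryDirectory() as tmp:
    for step in (500, 1000, 90):
        os.makedirs(os.path.join(tmp, f"checkpoint-{step}"))
    chosen = os.path.basename(latest_checkpoint_by_step(tmp))
    print(chosen)
```

Comparing the steps as integers matters here: a plain string sort would rank `checkpoint-90` above `checkpoint-1000`.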


### Step 2: Perform Inference

Let's test the model with the same prompt we used in the Prefix Tuning lab to see how well this extremely lightweight method performs.


In [None]:
# Prepare the prompt
prompt = "This movie was absolutely"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

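# Generate the text
with torch.no_grad():
    outputs = inference_model.generate(
        **inputs,  # pass input_ids and attention_mask together
        max_new_tokens=50,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        num_return_sequences=3,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; this avoids a warning
    )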

# Decode and print the generated text
print("--- Prompt ---")
print(prompt)
print("\n--- Generated Reviews (with (IA)³) ---")
for i, output in enumerate(outputs):
    print(f"{i+1}: {tokenizer.decode(output, skip_special_tokens=True)}")
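
The `top_k` and `top_p` arguments in the generation call restrict which tokens can be sampled at each step. The following is a simplified, pure-Python illustration of that idea on a toy distribution; it is not the implementation used by `transformers`, and the function name and the example probabilities are made up for this sketch.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p, and renormalise."""
    # Rank tokens from most to least likely and apply the top-k cut
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Renormalise so the surviving tokens sum to 1 before the top-p cut
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Nucleus (top-p) cut: stop once the cumulative mass reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise again so the result is a valid distribution
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (hypothetical values)
toy_probs = {"great": 0.5, "terrible": 0.3, "fine": 0.15, "odd": 0.05}
filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
print(filtered)
```

With these toy values, `top_k=3` drops `"odd"`, and `top_p=0.8` then drops `"fine"`, so sampling would choose only between `"great"` and `"terrible"`. Lower values of either setting make the output more predictable; higher values make it more varied.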


---
### Lab Conclusion

This concludes the (IA)³ lab and the main series of PEFT labs. You have now implemented five distinct parameter-efficient fine-tuning methods and have seen how to configure and apply them using the Hugging Face `peft` and `transformers` libraries.

You should now have a strong practical understanding of the trade-offs between these different techniques in terms of parameter efficiency, architectural changes, and configuration complexity.
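
To give a rough sense of the parameter efficiency mentioned above, the following back-of-the-envelope sketch counts (IA)³ scaling parameters for GPT-2 small. It assumes the commonly cited dimensions of that model (12 layers, hidden size 768, feed-forward size 3072, about 124M parameters) and assumes one learned scaling vector each for the keys, the values, and the feed-forward intermediate activations, as in the original (IA)³ formulation. The count reported by `peft` for a given configuration may differ, for example because GPT-2 fuses the query, key, and value projections into a single `c_attn` module.

```python
# Assumed GPT-2 small dimensions (not taken from this lab's training run)
n_layers = 12
d_model = 768
d_ff = 4 * d_model            # feed-forward intermediate size: 3072
base_params = 124_439_808     # commonly cited parameter count for GPT-2 small

# One scaling vector each for keys, values, and FFN activations per layer
per_layer = d_model + d_model + d_ff
ia3_params = n_layers * per_layer
fraction = ia3_params / base_params

print(f"(IA)^3 scaling parameters: {ia3_params:,}")
print(f"Share of base model: {fraction:.4%}")
```

Under these assumptions the adapter holds a few tens of thousands of values, well under a tenth of a percent of the base model.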
