# 🧩 Week 09-10 · Notebook 08 · PEFT Foundations with PEFT Library

Introduce adapter-based fine-tuning techniques to upgrade multilingual maintenance assistants under strict compute budgets.

## 🎯 Learning Objectives
- Explain adapter, prefix, and prompt tuning trade-offs.
- Configure Hugging Face PEFT for bilingual maintenance FAQs.
- Measure the latency and memory delta before and after attaching adapters.
- Align updates with maintenance freeze policies.

## 🧩 Scenario
A multilingual plant uses English, Hindi, and Spanish. Leadership wants improved Hindi accuracy without doubling GPU spend.

In [None]:
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TrainingArguments, Trainer
from peft import LoraConfig, TaskType, get_peft_model
import torch
import numpy as np

torch.manual_seed(101)

## 📚 Synthetic Bilingual FAQ
Short Q/A pairs about maintenance tasks. Replace with your plant's bilingual corpus.

In [None]:
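# Placeholder row with an assumed 'question'/'answer' schema; the original list was truncated in export.
qa_pairs = Dataset.from_list([{'question': 'How do we recalibrate a torque wrench?',
                               'answer': 'PLACEHOLDER: replace with an approved answer.'}])
qa_pairs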

## 🧾 Tokenization & Data Collation
Use a small instruction-tuned base model (stub).

In [None]:
model_name = 'google/flan-t5-small'
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [None]:
def preprocess(batch):
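    # Assumed prompt template and 'question' column; the original f-string was truncated in export.
    inputs = [f"question: {q}" for q in batch['question']]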
    model_inputs = tokenizer(inputs, text_target=batch['answer'], padding='max_length', max_length=128, truncation=True)
    return model_inputs

tokenized_ds = qa_pairs.map(preprocess, batched=True)
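
Because the targets are padded to a fixed length with `padding='max_length'`, the label sequences contain padding ids that would otherwise count toward the loss. A common remedy is to replace those ids with `-100`, the value PyTorch's cross-entropy loss ignores by default. The helper below is a minimal sketch of that step on toy ids; `mask_padding` and `IGNORE_INDEX` are hypothetical names, not part of the notebook or of any library.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_padding(labels: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Replace padding ids in each label sequence with IGNORE_INDEX."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
        for seq in labels
    ]


# Toy label ids padded with 0 (T5-family tokenizers use 0 as the pad id).
toy_labels = [[71, 12, 1, 0, 0], [9, 1, 0, 0, 0]]
masked = mask_padding(toy_labels, pad_token_id=0)
print(masked)
```

Inside `preprocess`, the same idea would apply to `model_inputs['labels']` with `tokenizer.pad_token_id` as the pad id.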

## ⚙️ Configure LoRA Adapter
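Although this notebook is an overview of PEFT methods, the demo uses LoRA: it trains only a small set of low-rank matrices, and those matrices can be merged into the base weights so that inference latency stays close to the base model's.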

In [None]:
base_model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
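# Assumed hyperparameters; the original LoraConfig arguments were truncated in export.
lora_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=16, lora_dropout=0.05)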
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
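
To sanity-check the figure reported by `print_trainable_parameters()`, it can help to estimate the count by hand. For one linear layer of shape `d_in x d_out`, LoRA adds two matrices of rank `r`, giving `r * (d_in + d_out)` trainable parameters. The sketch below applies that formula under assumed shapes for `flan-t5-small` and an assumed rank of 8 on the query and value projections; verify the shapes against `base_model.config` before relying on the total.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Assumed shapes for flan-t5-small (check model.config before relying on them).
D_MODEL = 512        # hidden size
D_ATTN = 6 * 64      # num_heads * d_kv
ENC_LAYERS = 8
DEC_LAYERS = 8
R = 8                # assumed LoRA rank

# q and v projections: one attention block per encoder layer,
# two (self- and cross-attention) per decoder layer.
n_modules = 2 * (ENC_LAYERS + 2 * DEC_LAYERS)
per_module = lora_params(D_MODEL, D_ATTN, R)
total = n_modules * per_module
print(f"{n_modules} modules x {per_module} = {total} trainable LoRA parameters")
```

Doubling the rank doubles the adapter size, which is the main knob for trading accuracy against memory.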

## 🧪 Training Arguments (Demo)
For illustration, run a few steps; in production increase epochs and dataset size.

In [None]:
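# Assumed demo settings; the original arguments were truncated in export.
training_args = TrainingArguments(output_dir='peft-demo', max_steps=10, per_device_train_batch_size=2)
trainer = Trainer(model=peft_model, args=training_args, train_dataset=tokenized_ds)
# Uncomment to run fine-tuning (requires GPU/time)
# trainer.train()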

## ⏱️ Latency & Memory Comparison
Measure inference latency before and after attaching the adapter (pseudo-benchmark).

In [None]:
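import time

def benchmark_latency(model, prompt, runs=3):
    # Mean generation latency in milliseconds; the original loop body was truncated in export.
    inputs = tokenizer(prompt, return_tensors='pt')
    times = []
    with torch.inference_mode():
        for _ in range(runs):
            start = time.perf_counter()
            model.generate(**inputs, max_new_tokens=32)  # assumed generation cap
            times.append((time.perf_counter() - start) * 1000)
    return np.mean(times)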

prompt = 'How do we recalibrate a torque wrench?'
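# Simulated placeholder latencies (ms), not measurements; call benchmark_latency(model, prompt) for real numbers.
baseline_latency = np.random.uniform(110, 130)
adapter_latency = baseline_latency * 0.92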
baseline_latency, adapter_latency
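
For a comparison that is less sensitive to one-off spikes, a timing harness can discard warm-up calls and report the median instead of the mean. The sketch below is model-free so it runs anywhere; `time_callable` and `percent_delta` are hypothetical helper names, and the workload is a stand-in for a real generation call.

```python
import statistics
import time
from typing import Callable


def time_callable(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def percent_delta(baseline_ms: float, candidate_ms: float) -> float:
    """Relative change of candidate vs. baseline, in percent (negative means faster)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Stand-in workload so the sketch runs without a model.
elapsed = time_callable(lambda: sum(range(10_000)))
print(f"median: {elapsed:.3f} ms")
print(f"delta: {percent_delta(100.0, 92.0):+.1f}%")
```

With a real model, the callable would wrap a generation call such as `lambda: model.generate(**inputs)`. On a GPU, calling `torch.cuda.synchronize()` before reading the clock keeps asynchronous kernels from skewing the result.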

### 🧭 Maintenance Freeze Checklist
- Deploy adapters during the scheduled maintenance window (Friday 23:00-02:00).
- Back up the base model and adapter weights in the model registry.
- Smoke-test multilingual prompts before shift turnover.
- Document the change in the OT ticketing system.
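
The deployment window in the checklist can be enforced with a small guard in a release script. The sketch below assumes the window runs from Friday 23:00 to Saturday 02:00 in plant-local time, with the end excluded; `in_maintenance_window` is a hypothetical helper, not an existing tool.

```python
from datetime import datetime

FRIDAY, SATURDAY = 4, 5  # datetime.weekday() values


def in_maintenance_window(ts: datetime) -> bool:
    """True if ts falls in Friday 23:00 (inclusive) to Saturday 02:00 (exclusive)."""
    if ts.weekday() == FRIDAY:
        return ts.hour >= 23
    if ts.weekday() == SATURDAY:
        return ts.hour < 2
    return False


print(in_maintenance_window(datetime(2025, 1, 3, 23, 30)))  # a Friday at 23:30
print(in_maintenance_window(datetime(2025, 1, 4, 2, 0)))    # the following Saturday at 02:00
```

A release script could call this with `datetime.now()` and refuse to push adapter weights when it returns `False`.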

## 🧪 Lab Assignment
1. Expand the dataset with 50 bilingual Q/A pairs from your plant.
2. Train a prefix-tuning adapter (`PeftType.PREFIX_TUNING`, configured through `PrefixTuningConfig`) and compare its accuracy with the LoRA adapter.
3. Log GPU memory usage with and without adapters (`torch.cuda.memory_allocated`).
4. Produce an adapter release note aligned with IT governance.
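
For step 3, one possible way to log GPU memory is sketched below. It reads `torch.cuda.memory_allocated` and `torch.cuda.max_memory_allocated` when a CUDA device is present and otherwise falls back to made-up byte counts so the output format is still visible. `bytes_to_mib` and `memory_report` are hypothetical helpers.

```python
def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 ** 2)


def memory_report(label: str, allocated: int, peak: int) -> str:
    """Format one line for the lab log."""
    return (f"{label}: allocated={bytes_to_mib(allocated):.1f} MiB, "
            f"peak={bytes_to_mib(peak):.1f} MiB")


try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:  # keep the sketch runnable on machines without PyTorch
    cuda_ok = False

if cuda_ok:
    torch.cuda.reset_peak_memory_stats()
    # ... load the model and run one generation here ...
    print(memory_report("current run",
                        torch.cuda.memory_allocated(),
                        torch.cuda.max_memory_allocated()))
else:
    # Made-up byte counts, only to show the output format.
    print(memory_report("example", 300 * 1024 ** 2, 450 * 1024 ** 2))
```

Running the same block once with the base model and once with the adapter attached gives the two lines to compare in the lab report.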

## ✅ Checklist
- [ ] Adapter method selected with justification
- [ ] Bilingual dataset curated
- [ ] Latency/memory benchmark recorded
- [ ] Change-management plan approved

## 📚 References
- HuggingFace PEFT Documentation
- *Adapter Tuning for Industrial NLP* (Siemens, 2025)
- Week 05-06 prompt libraries for reference prompts