<a href="https://colab.research.google.com/github/arielzamir/qwen2.5-finetuned-legal-assistant/blob/main/legal_assistant_qwen_finetuned.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Legal Assistant - Qwen2.5 Fine-Tuned with LoRA

This notebook demonstrates how to fine-tune the **Qwen2.5-1.5B-Instruct** model using **LoRA (Low-Rank Adaptation)** on a legal dataset.  
We use the Hugging Face ecosystem with `transformers`, `trl`, `peft`, and `datasets`, along with **Weights & Biases (wandb)** for experiment tracking.  


##Install Dependencies

We install all the necessary libraries:
- **bitsandbytes** → 8-bit optimizers for efficient training  
- **transformers** → Hugging Face model APIs  
- **accelerate** → handles multi-GPU / mixed precision training  
- **peft** → lightweight fine-tuning with LoRA  
- **trl** → supervised fine-tuning (SFT) utilities  
- **datasets** → loading and processing datasets

In [None]:
!pip -q install -U bitsandbytes transformers accelerate peft trl datasets
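
Because `-U` pulls the newest release of each package, it can help to record which versions were actually installed before training. A minimal sketch using only the standard library; the helper name `installed_versions` is ours and not part of any of these packages:

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the cell above.
PACKAGES = ["bitsandbytes", "transformers", "accelerate", "peft", "trl", "datasets"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<14} {ver or 'not installed'}")
```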

##Import Libraries

We import the core libraries:  
- `datasets` → load datasets easily from Hugging Face Hub  
- `transformers` → tokenizer + base model  
- `trl` → SFTTrainer for fine-tuning  
- `peft` → LoRA configs and model wrapping  
- `huggingface_hub` → authentication for pushing models  
- `wandb` → experiment tracking  
- `torch` → PyTorch backend  

In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig, get_peft_model
from huggingface_hub import login
import wandb
import torch

##Authentication

Here we log into:
- **Hugging Face Hub** → for downloading models and pushing trained adapters  
- **Weights & Biases** → to track metrics, losses, and experiment runs  

In [None]:
login()
wandb.login()

##Dataset Preparation  

We use the **CUAD (Contract Understanding Atticus Dataset)** legal QA dataset.  
Each question-answer pair is converted into a **chat format** with roles:  
- `system` → defines assistant behavior  
- `user` → the question (legal contract query)  
- `assistant` → the answer  

`SFTTrainer` recognizes the `messages` column and applies the tokenizer's chat template to it, so the training text follows the same **chat format** that Qwen2.5-Instruct was tuned on.  

In [None]:
def convert_to_chat(example):
    ans = example.get("answers", {}).get("text", [])
    answer = ans[0].strip() if len(ans) > 0 else ""
    return {
        "messages": [
            {"role":"system","content":"You are a helpful assistant."},
            {"role":"user","content": example["question"]},
            {"role":"assistant","content": answer},
        ]
    }
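
To make the target format concrete, the sketch below renders a `messages` list in the ChatML layout (`<|im_start|>role ... <|im_end|>`) used by Qwen2.5's chat template. It is a hand-written, simplified stand-in for `tokenizer.apply_chat_template`, which also handles cases such as tool calls and default system prompts. The sample record is invented.

```python
def render_chatml(messages):
    """Render chat messages in ChatML layout (simplified; ignores tool calls)."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)


# Invented example in the shape produced by convert_to_chat.
sample = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the governing law of the agreement?"},
    {"role": "assistant", "content": "the laws of the State of New York"},
]
print(render_chatml(sample))
```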

##Load the Dataset

In [None]:
dataset = load_dataset("chenghao/cuad_qa")
dataset = dataset.map(convert_to_chat, remove_columns=dataset["train"].column_names)
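
`convert_to_chat` falls back to an empty string when a record has no answer, so such records become training examples with an empty assistant turn. Whether to keep them is a modelling choice. If you prefer to drop them, a predicate like the hypothetical `has_answer` below could be passed to `dataset.filter` after the mapping step. The sample records are invented.

```python
def has_answer(example):
    """True if the assistant turn of a converted example is non-empty."""
    assistant_turns = [m for m in example["messages"] if m["role"] == "assistant"]
    return bool(assistant_turns) and bool(assistant_turns[-1]["content"].strip())


answered = {"messages": [
    {"role": "user", "content": "Who are the parties?"},
    {"role": "assistant", "content": "Acme Corp. and Example LLC"},
]}
unanswered = {"messages": [
    {"role": "user", "content": "Is there a non-compete clause?"},
    {"role": "assistant", "content": ""},
]}

print(has_answer(answered), has_answer(unanswered))

# With the mapped dataset, this could be applied as:
# dataset = dataset.filter(has_answer)
```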

##Load Base Model & Tokenizer  

We load the **Qwen2.5-1.5B-Instruct** model and tokenizer.  
- If the tokenizer has no `pad_token`, we set it to the EOS token.  
- `torch_dtype="auto"` loads the weights in the dtype recorded in the checkpoint's config (typically bfloat16 for recent models) instead of upcasting them to float32.  
- Device mapping is set to `"auto"` so `accelerate` decides GPU/CPU placement.  

In [None]:
model_name = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    attn_implementation="sdpa",
)
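
The dtype matters mostly for memory. As a back-of-the-envelope sketch, assuming about 1.5 billion parameters (taken from the model name, not measured), the weights alone occupy roughly the following:

```python
# Approximate parameter count; an assumption based on the "1.5B" in the model name.
NUM_PARAMS = 1.5e9

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params, dtype):
    """Memory needed for the weights alone, in GiB (no activations or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:<9} ~{weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Activations, gradients of the trainable parameters, and optimizer state come on top of this, so the figure is a lower bound for training.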

##Configure LoRA  

We apply **LoRA (Low-Rank Adaptation)** for efficient fine-tuning.  
Key parameters:  
- `r=16` → rank (controls size of LoRA updates)  
- `lora_alpha=32` → scaling factor for updates  
- `target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"]` → which layers LoRA adapts (all attention and MLP projections)  
- `lora_dropout=0.05` → regularization  
- `bias="none"` → no bias terms are trained  
- `task_type="CAUSAL_LM"` → language modeling  

This keeps the base model weights **frozen** and trains only the small adapter matrices.  

In [None]:
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
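
For a linear layer with weight shape `(d_out, d_in)`, LoRA adds two small matrices of shapes `(r, d_in)` and `(d_out, r)`, which is `r * (d_in + d_out)` trainable parameters, and scales their product by `lora_alpha / r`. The sketch below applies that arithmetic to the seven target modules. The layer dimensions are assumptions about the Qwen2.5-1.5B configuration (hidden size 1536, MLP size 8960, 28 layers, 2 key/value heads of size 128); check them against `model.config` before relying on the numbers.

```python
R, LORA_ALPHA = 16, 32

# Assumed dimensions (verify against model.config).
HIDDEN, INTERMEDIATE, NUM_LAYERS = 1536, 8960, 28
KV_DIM = 2 * 128  # num_key_value_heads * head_dim

# (d_in, d_out) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r=R):
    """Trainable parameters LoRA adds to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling factor alpha/r = {LORA_ALPHA / R}")
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Under these assumptions the adapters come to roughly 18 million parameters, a little over 1% of the base model. `model.print_trainable_parameters()` reports the actual figure.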

##Training Configuration


Here we define the **training arguments** for supervised fine-tuning:  

- `output_dir="./legal-assistant"` → where to save checkpoints  
- `per_device_train_batch_size=1` → batch size per GPU  
- `gradient_accumulation_steps=8` → simulates a larger batch size  
- `packing=True` → packs multiple short samples into one sequence for efficiency  
- `num_train_epochs=2` → number of full dataset passes  
- `learning_rate=1e-4` → initial learning rate  
- `lr_scheduler_type="cosine"` → cosine decay schedule  
- `warmup_ratio=0.03` → warmup phase for stable training  
- `logging_steps=10` → log metrics every 10 steps  
- `save_strategy="epoch"` → save checkpoint every epoch  
- `fp16=True` → use mixed precision (faster + less memory)  
- `gradient_checkpointing=True` → recompute activations during the backward pass to reduce memory usage  
- `push_to_hub=True` → push final model to Hugging Face Hub  
- `hub_model_id="ArielZamir23/legal-assistant-qwen2_5-1_5b-lora"` → repo name on Hugging Face Hub  
- `hub_strategy="every_save"` → push every checkpoint  
- `report_to="wandb"` → log training metrics to Weights & Biases  
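
These settings interact. The sketch below works out the effective batch size, the number of optimizer steps, the warmup length, and the resulting learning-rate curve. `NUM_SEQUENCES` and the single-GPU setup are placeholders, since the real values depend on the packed dataset and the hardware. The ceiling-based rounding and the schedule formula (linear warmup, then cosine decay to zero) are a common formulation that we assume matches the library's scheduler.

```python
import math

PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 8
NUM_EPOCHS = 2
WARMUP_RATIO = 0.03
PEAK_LR = 1e-4
NUM_GPUS = 1           # assumption: a single Colab GPU
NUM_SEQUENCES = 4000   # placeholder: packed training sequences per epoch

# Sequences consumed per optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
steps_per_epoch = math.ceil(NUM_SEQUENCES / effective_batch)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step, peak_lr=PEAK_LR):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {total_steps} (warmup: {warmup_steps})")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```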

In [None]:
training_args = SFTConfig(
    output_dir="./legal-assistant",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    packing=True,
    num_train_epochs=2,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_steps=10,
    save_strategy="epoch",
    fp16=True,
    gradient_checkpointing=True,
    push_to_hub=True,
    hub_model_id="ArielZamir23/legal-assistant-qwen2_5-1_5b-lora",
    hub_strategy="every_save",
    report_to="wandb"
)
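
`packing=True` is the least self-explanatory option above. The idea is to concatenate tokenized samples and cut the stream into fixed-length blocks, so that little of each training sequence is padding. The sketch below illustrates that idea on toy token lists. It is not TRL's implementation, which may use a different packing strategy and keeps track of sample boundaries; `pack_sequences`, the block length, and the `EOS` id are invented for the example.

```python
EOS = 0  # hypothetical end-of-sequence token id


def pack_sequences(samples, block_len):
    """Concatenate token lists (each followed by EOS) and split into blocks of block_len.

    The final, shorter remainder is dropped, which is one common convention.
    """
    stream = []
    for tokens in samples:
        stream.extend(tokens)
        stream.append(EOS)
    full_blocks = len(stream) // block_len
    return [stream[i * block_len:(i + 1) * block_len] for i in range(full_blocks)]


toy_samples = [[5, 6, 7], [8, 9], [10, 11, 12, 13], [14]]
for block in pack_sequences(toy_samples, block_len=4):
    print(block)
```

Four samples of uneven length become three full blocks, and no position in those blocks is wasted on padding.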

##Initialize Weights & Biases (wandb)  

We initialize a new **wandb run** to track training metrics:  
- `project="legal-assistant"` → experiment project name  
- `name="qwen2.5-1.5b-lora-cuad"` → specific run name  

This lets us monitor:  
- Training loss  
- Learning rate schedule  
- GPU usage and runtime  
- Checkpoint saving  

In [None]:
wandb.init(project="legal-assistant", name="qwen2.5-1.5b-lora-cuad")
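
The same run settings can also be supplied through environment variables that the wandb client reads, which is convenient for scripted or offline runs. A minimal sketch: set these before the run is created, and treat the chosen values as examples rather than required settings.

```python
import os

# Project and run name, mirroring the arguments passed to wandb.init above.
os.environ["WANDB_PROJECT"] = "legal-assistant"
os.environ["WANDB_NAME"] = "qwen2.5-1.5b-lora-cuad"

# Optional: log locally without network access; such runs can be uploaded
# later with the `wandb sync` command.
os.environ["WANDB_MODE"] = "offline"
```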

##Start Training with SFTTrainer  

We create an `SFTTrainer` that will:  
- Use our model + tokenizer  
- Train on the prepared `cuad_qa` dataset  
- Apply the training configuration defined earlier  

The trainer handles everything automatically:  
- Forward & backward pass  
- Optimizer updates  
- Loss logging  
- Saving checkpoints  
- Pushing to Hugging Face Hub  

In [None]:
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset["train"],
    args=training_args
)

In [None]:
trainer.train()

##Inference Example (Quick Start)

Once the model is fine-tuned, we can use it for **legal question answering**.  
Below we load the model with `pipeline` from 🤗 Transformers and ask a **domain-specific question**:  

**Example Question:**  
👉 *"What is the termination clause in this contract?"*

The model answers in the style of the CUAD answer spans it was trained on. Note that the prompt contains only the question, not the text of a contract.

In [None]:
from transformers import pipeline

question = "What is the termination clause in this contract?"
generator = pipeline("text-generation", model="ArielZamir23/legal-assistant-qwen2_5-1_5b-lora")

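output = generator([{"role": "user", "content": question}], max_new_tokens=256)
# For chat-style input, "generated_text" is the full message list;
# the last entry is the assistant's reply.
print(output[0]["generated_text"][-1]["content"])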
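
Training examples began with the system prompt `"You are a helpful assistant."`, whereas the call above sends only a user turn, in which case the chat template may supply its own default system prompt. To keep inference prompts in the same shape as the training data, a small helper could rebuild the message list. `build_messages` is a hypothetical name, and `generator` refers to the pipeline created above.

```python
TRAINING_SYSTEM_PROMPT = "You are a helpful assistant."


def build_messages(question, system_prompt=TRAINING_SYSTEM_PROMPT):
    """Build a chat message list with the same roles used in the training data."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question.strip()},
    ]


messages = build_messages("What is the termination clause in this contract?")
print(messages)

# These messages could then be passed to the pipeline:
# output = generator(messages, max_new_tokens=256)
```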