# Domain-Specific Assistant: Agriculture QA via LLM Fine-Tuning

This notebook fine-tunes **TinyLlama-1.1B-Chat** on the **sowmya14/agriculture_QA** dataset using **LoRA (PEFT)** on Google Colab. It includes data preprocessing, training, evaluation (BLEU, ROUGE, perplexity), and a Gradio UI.

**Model**: [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)  
**Dataset**: [sowmya14/agriculture_QA](https://huggingface.co/datasets/sowmya14/agriculture_QA)

## 1. Install dependencies

Run this cell first (Colab: Runtime → Change runtime type → GPU).

In [1]:
!pip install -q transformers datasets peft accelerate bitsandbytes evaluate nltk gradio




## 2. Imports and configuration

In [2]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForSeq2Seq,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import re

# Config (edit for experiments)
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
DATASET_ID = "sowmya14/agriculture_QA"
MAX_SEQ_LENGTH = 512
OUTPUT_DIR = "./agriculture_assistant_lora"
USE_4BIT = True  # set False if you have enough GPU memory



## 3. Load and inspect dataset

In [3]:
ds = load_dataset(DATASET_ID)
print("Splits:", list(ds.keys()))
split = "train" if "train" in ds else list(ds.keys())[0]
d = ds[split]
print("Columns:", d.column_names)
print("Num examples:", len(d))
print("Sample row:", d[0])

Splits: ['train']
Columns: ['questions', 'answers']
Num examples: 999
Sample row: {'questions': 'asking about the control measure for aphid infestation in mustard crops', 'answers': 'suggested him to spray rogor@2ml/lit.at evening time.'}


## 4. Preprocessing: normalize and format as instruction–response

We map dataset columns to a standard `instruction` / `response` format, normalize text, and keep sequences within the model context length.

In [4]:
def normalize_text(text):
    if not text or not isinstance(text, str):
        return ""
    text = re.sub(r"\s+", " ", text).strip()
    return text

def get_qa_columns(dataset):
    cols = dataset.column_names
    q_col = None
    a_col = None
    for c in cols:
        lower = c.lower()
        if lower in ("question", "questions", "input", "query"):
            q_col = c
        if lower in ("answer", "answers", "output", "response"):
            a_col = c
    if q_col is None:
        q_col = cols[0]
    if a_col is None:
        a_col = cols[1] if len(cols) > 1 else cols[0]
    return q_col, a_col

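def format_instruction_response(example, q_col, a_col):
    q = normalize_text(example.get(q_col, ""))
    a = normalize_text(example.get(a_col, ""))
    # Leave the instruction empty when the question is empty; otherwise the prompt
    # prefix alone is always truthy and the non-empty checks below never drop the row.
    instruction = f"You are an agriculture assistant. Answer the following question.\n\nQuestion: {q}" if q else ""
    return {"instruction": instruction, "response": a}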

q_col, a_col = get_qa_columns(ds[split])
print(f"Using question column: '{q_col}', answer column: '{a_col}'")

def map_to_instruction_response(examples):
    out = {"instruction": [], "response": []}
    for i in range(len(examples[q_col])):
        ex = {k: v[i] for k, v in examples.items()}
        formatted = format_instruction_response(ex, q_col, a_col)
        if formatted["instruction"] and formatted["response"]:
            out["instruction"].append(formatted["instruction"])
            out["response"].append(formatted["response"])
    return out

ds_qa = ds[split].map(map_to_instruction_response, batched=True, remove_columns=ds[split].column_names)
ds_qa = ds_qa.filter(lambda x: len(x["instruction"]) > 0 and len(x["response"]) > 0)
print("Formatted examples:", len(ds_qa))
print("Sample:", ds_qa[0])

Using question column: 'questions', answer column: 'answers'
Formatted examples: 999
Sample: {'instruction': 'You are an agriculture assistant. Answer the following question.\n\nQuestion: asking about the control measure for aphid infestation in mustard crops', 'response': 'suggested him to spray rogor@2ml/lit.at evening time.'}
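To make the normalization and filtering rule concrete, here is a small standalone sketch. It repeats `normalize_text` from the cell above and applies it to a few hypothetical rows (invented for illustration, not taken from the dataset), keeping a row only when both question and answer are non-empty after normalization.

```python
import re

def normalize_text(text):
    """Collapse runs of whitespace; return "" for missing or non-string values."""
    if not text or not isinstance(text, str):
        return ""
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical raw rows, invented for illustration (not from sowmya14/agriculture_QA).
raw_rows = [
    {"questions": "  how to control\taphids in   mustard?\n", "answers": "spray in the\nevening"},
    {"questions": "what fertilizer for paddy", "answers": None},  # missing answer
    {"questions": 42, "answers": "   "},                          # wrong type, blank answer
]

kept = []
for row in raw_rows:
    q = normalize_text(row["questions"])
    a = normalize_text(row["answers"])
    if q and a:  # intended rule: both sides must be non-empty
        kept.append((q, a))

print(kept)
```

Only the first row survives here; the other two are dropped because one side normalizes to an empty string.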


*(Perplexity is computed in **Section 8** after training.)*
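As a reminder of what that number means, perplexity is the exponential of the mean per-token cross-entropy loss (natural log). The sketch below uses illustrative loss values only; they are not results from this notebook, and `perplexity_from_loss` is a hypothetical helper name.

```python
import math

def perplexity_from_loss(mean_nll):
    """Perplexity = exp(mean negative log-likelihood per token, natural log)."""
    return math.exp(mean_nll)

# Illustrative loss values only (not measured in this notebook).
for loss in (0.0, 1.0, 2.0):
    print(f"loss={loss:.1f} -> perplexity={perplexity_from_loss(loss):.2f}")
```

A loss of 0 corresponds to a perplexity of 1 (the model is certain of every token), and each additional unit of loss multiplies perplexity by e.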

## 5. Tokenization and train/validation split

In [5]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

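# Build the prompt by hand in the TinyLlama chat layout (the tokenizer's chat template is not used here)
def tokenize_function(examples):
    texts = []
    for inst, resp in zip(examples["instruction"], examples["response"]):
        # TinyLlama chat format: <|system|>...<|user|>...<|assistant|>...
        # End with EOS so the model learns where an answer stops.
        text = f"<|system|>\nYou are an agriculture assistant.\n<|user|>\n{inst}\n<|assistant|>\n{resp}{tokenizer.eos_token}"
        texts.append(text)
    out = tokenizer(
        texts,
        truncation=True,
        max_length=MAX_SEQ_LENGTH,
        padding="max_length",
        return_tensors=None,
    )
    # Copy input_ids as labels, but set padded positions to -100 so the loss ignores them.
    out["labels"] = [
        [tok if keep else -100 for tok, keep in zip(ids, mask)]
        for ids, mask in zip(out["input_ids"], out["attention_mask"])
    ]
    return out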

ds_split = ds_qa.train_test_split(test_size=0.1, seed=42)
train_ds = ds_split["train"].map(tokenize_function, batched=True, remove_columns=["instruction", "response"])
eval_ds = ds_split["test"].map(tokenize_function, batched=True, remove_columns=["instruction", "response"])
train_ds.set_format("torch")
eval_ds.set_format("torch")
print("Train size:", len(train_ds), "Eval size:", len(eval_ds))

Train size: 899 Eval size: 100


## 6. Load base model and apply LoRA (PEFT)

In [6]:
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

if USE_4BIT:
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=quantization_config,
        device_map="auto",
        trust_remote_code=True,
    )
else:
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",
        trust_remote_code=True,
        dtype=compute_dtype,
    )

model = prepare_model_for_kbit_training(model) if USE_4BIT else model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()


trainable params: 2,252,800 || all params: 1,102,301,184 || trainable%: 0.2044
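The trainable-parameter count printed above can be reproduced by hand. The sketch below assumes the TinyLlama-1.1B dimensions from the model's published config (hidden size 2048, 22 layers, 4 key/value heads of dimension 64); these values are not stated in this notebook, so check them against `model.config` before relying on them.

```python
# Assumed TinyLlama-1.1B dimensions (from the published config; verify with model.config).
HIDDEN = 2048    # hidden_size
N_LAYERS = 22    # num_hidden_layers
KV_DIM = 256     # num_key_value_heads (4) * head_dim (64), grouped-query attention
RANK = 8         # r in the LoraConfig above

# (in_features, out_features) of each projection listed in target_modules
targets = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def lora_params(in_features, out_features, r):
    # LoRA adds two matrices per adapted layer: A (r x in) and B (out x r)
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in targets.values())
total = per_layer * N_LAYERS
print(per_layer, total)  # 102400 2252800
```

Under these assumptions the total matches the 2,252,800 trainable parameters reported by `print_trainable_parameters()`. Doubling `r` doubles this count, which is useful to keep in mind when comparing LoRA ranks.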


## 7. Training

In [None]:
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, pad_to_multiple_of=8, return_tensors="pt")

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=2,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    warmup_ratio=0.05,
    logging_steps=25,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=data_collator,
)

trainer.train()
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
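Since the warning above points to `warmup_steps`, it helps to know roughly how many optimizer steps this configuration produces. The sketch below is plain arithmetic on the settings in this notebook (899 training examples, batch size 2, gradient accumulation 4, 2 epochs, 5% warmup) and assumes a single GPU. `schedule` is a hypothetical helper, and the exact rounding inside `Trainer` can differ between transformers versions, so treat the result as an estimate.

```python
import math

def schedule(num_examples, per_device_batch, grad_accum, epochs, warmup_ratio, num_devices=1):
    """Estimate effective batch size, optimizer steps, and warmup steps."""
    effective_batch = per_device_batch * grad_accum * num_devices
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    total_updates = updates_per_epoch * epochs
    warmup_steps = math.ceil(total_updates * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "batches_per_epoch": batches_per_epoch,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_steps": warmup_steps,
    }

# Settings from this notebook; single GPU assumed.
plan = schedule(num_examples=899, per_device_batch=2, grad_accum=4, epochs=2, warmup_ratio=0.05)
print(plan)
```

With these settings the estimate is an effective batch of 8 and about 226 optimizer steps, of which about 12 are warmup.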


## 8. Evaluation: ROUGE, BLEU, and perplexity

In [None]:
from evaluate import load as load_metric
import numpy as np
import nltk
nltk.download("punkt", quiet=True)

rouge = load_metric("rouge")
bleu = load_metric("bleu")

def generate_response(model, tokenizer, instruction, max_new_tokens=128):
    prompt = f"<|system|>\nYou are an agriculture assistant.\n<|user|>\n{instruction}\n<|assistant|>\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, pad_token_id=tokenizer.eos_token_id)
    reply = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return reply.strip()

eval_sample = min(50, len(eval_ds))
references = []
predictions = []
for i in range(eval_sample):
    ex = ds_split["test"][i]
    ref = ex["response"]
    pred = generate_response(model, tokenizer, ex["instruction"])
    references.append(ref)
    predictions.append(pred)

rouge_result = rouge.compute(predictions=predictions, references=references)
print("ROUGE:", rouge_result)

# BLEU expects list of strings and list of list of strings
refs_bleu = [[r] for r in references]
bleu_result = bleu.compute(predictions=predictions, references=refs_bleu)
print("BLEU:", bleu_result)

# Perplexity = exp(eval_loss)
eval_metrics = trainer.evaluate()
eval_loss = eval_metrics.get("eval_loss", float("nan"))
perplexity = np.exp(eval_loss) if isinstance(eval_loss, (int, float)) else float("nan")
print("Eval loss:", eval_loss, "| Perplexity:", perplexity)
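For intuition about the ROUGE-L score reported above, the following is a simplified sketch of the metric: an F1 score based on the longest common subsequence (LCS) of tokens. It assumes lowercase whitespace tokenization and no stemming, so its numbers will not exactly match the `evaluate` implementation; `lcs_length` and `rouge_l_f1` are hypothetical helper names, and the example strings are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(prediction, reference):
    """Simplified ROUGE-L F1 on lowercase whitespace tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Invented example strings: the shared subsequence is "spray rogor ... evening".
score = rouge_l_f1("spray rogor in the evening", "spray rogor at evening time")
print(round(score, 3))  # 0.6
```

Because the dataset's answers are short and terse, a model that produces long, chatty replies will score low on precision even when the key advice is present.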

## 9. Experiment table (fill with your runs)

After each training run, copy **Val loss**, **ROUGE-L**, and **BLEU** from the cell outputs above into the table, and record **Training time** and **GPU memory** for the run (the cells above do not print GPU memory, so it has to be measured separately). Document at least 2–3 experiments (e.g. different LR, epochs, or LoRA rank) for the report.

| Exp | LR | Batch | Epochs | LoRA r | Val loss | ROUGE-L | BLEU | Time (min) | GPU mem (GB) | Notes |
|-----|-----|-------|--------|--------|----------|---------|------|------------|--------------|------|
| 1   | 5e-5 | 2 (acc 4) | 2 | 8 | - | - | - | - | - | Default |
| 2   | 1e-4 | 2 (acc 4) | 2 | 8 | - | - | - | - | - | Higher LR |
| 3   | 5e-5 | 4 (acc 2) | 3 | 16 | - | - | - | - | - | Larger LoRA, more epochs |
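One possible way to fill the **Time (min)** and **GPU mem (GB)** columns is a small context manager wrapped around the training call. This is a sketch, not part of the original notebook: `track_run` is a hypothetical helper, and the peak value comes from `torch.cuda.max_memory_allocated()`, which counts memory held by PyTorch tensors and is usually lower than what `nvidia-smi` shows.

```python
import time
from contextlib import contextmanager

try:
    import torch  # already imported earlier in this notebook
except ImportError:  # only keeps the sketch runnable on a machine without PyTorch
    torch = None

@contextmanager
def track_run(stats):
    """Record wall-clock minutes and peak GPU memory (GB) for the enclosed block."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    try:
        yield stats
    finally:
        stats["time_min"] = (time.perf_counter() - start) / 60
        # None when no CUDA device is available
        stats["gpu_mem_gb"] = torch.cuda.max_memory_allocated() / 1024**3 if use_cuda else None

run_stats = {}
with track_run(run_stats):
    sum(range(100_000))  # stand-in workload; in this notebook it would be trainer.train()
print(run_stats)
```

In the training cell, the stand-in line would be replaced by `trainer.train()`, and the two values copied into the table after each run.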

## 8b. Base vs fine-tuned comparison (for report and demo)

The assignment requires comparing the **base pre-trained model** with the **fine-tuned** model. Run the cell below to get responses from both on the same questions. Use this output in your report and demo video.

In [None]:
# Compare base (no LoRA) vs fine-tuned. If OOM, set NUM_COMPARE = 2.
NUM_COMPARE = 5
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
if USE_4BIT:
    base_model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=compute_dtype,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
        ),
        device_map="auto",
        trust_remote_code=True,
    )
else:
    # Use the same `dtype` keyword as the loading cell in Section 6
    base_model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", trust_remote_code=True, dtype=compute_dtype
    )
base_model.eval()

compare_instructions = [ds_split["test"][i]["instruction"] for i in range(min(NUM_COMPARE, len(ds_split["test"])))]
compare_references = [ds_split["test"][i]["response"] for i in range(min(NUM_COMPARE, len(ds_split["test"])))]
base_responses = [generate_response(base_model, tokenizer, inst) for inst in compare_instructions]
finetuned_responses = list(predictions[:NUM_COMPARE])

del base_model
if torch.cuda.is_available():
    torch.cuda.empty_cache()

print("=" * 80)
print("BASE vs FINE-TUNED (use in report and demo video)")
print("=" * 80)
for i in range(len(compare_instructions)):
    q = compare_instructions[i].split("Question:")[-1].strip()[:80]
    print(f"\n--- Example {i+1} ---\nQuestion: {q}...")
    print(f"Reference:   {compare_references[i][:180]}...")
    print(f"Base:        {base_responses[i][:180]}...")
    print(f"Fine-tuned:  {finetuned_responses[i][:180]}...")
print("\n" + "=" * 80)

## 10. Gradio UI

In [None]:
import gradio as gr

def chat(user_input, history=None):
    if history is None:
        history = []
    instruction = f"You are an agriculture assistant. Answer the following question.\n\nQuestion: {user_input}"
    reply = generate_response(model, tokenizer, instruction)
    history.append((user_input, reply))
    return history, history

with gr.Blocks(title="Agriculture Assistant", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""## 🌾 Agriculture QA Assistant\n\n**Instructions:** Type your agriculture-related question in the box below and click **Submit** (or press Enter). The fine-tuned model will answer. Use **Clear** to start a new conversation. For your demo video, try in-domain questions (e.g. pests, crops, soil) and optionally an out-of-domain question to show the model stays on topic.\n\nModel fine-tuned on [sowmya14/agriculture_QA](https://huggingface.co/datasets/sowmya14/agriculture_QA).""")
    chatbot = gr.Chatbot(label="Chat")
    msg = gr.Textbox(placeholder="e.g. What are the best practices for soil preparation?", label="Your question")
    submit = gr.Button("Submit")
    clear = gr.Button("Clear")
    state = gr.State([])

    def submit_fn(msg, history):
        if not msg.strip():
            return history, history
        _, new_history = chat(msg, history)
        return new_history, new_history

    submit.click(submit_fn, [msg, state], [chatbot, state])
    clear.click(lambda: ([], []), None, [chatbot, state])
    msg.submit(submit_fn, [msg, state], [chatbot, state])

# In Colab the app appears below; share=True gives a public URL for your demo video.
demo.launch(share=True)