# Domain-Specific Assistant: Agriculture QA via LLM Fine-Tuning

**Project definition & domain**  
- **Purpose**: Build a domain-specific assistant that answers agriculture-related questions (crops, soil, pests, practices) accurately and in a consistent style.  
- **Domain**: Agriculture. The assistant is intended for farmers, students, and practitioners who need reliable, in-domain answers.  
- **Relevance**: Fine-tuning a general-purpose LLM on agriculture QA improves answer quality and relevance for this domain compared to using the base model as-is.

This notebook fine-tunes **TinyLlama-1.1B-Chat** on the **sowmya14/agriculture_QA** dataset using **LoRA (PEFT)**. It covers the full pipeline: data preprocessing, model training with PEFT, evaluation (BLEU, ROUGE, perplexity), base vs fine-tuned comparison, and a Gradio UI. Designed to run end-to-end on Google Colab with minimal setup.

**Model**: [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)  
**Dataset**: [sowmya14/agriculture_QA](https://huggingface.co/datasets/sowmya14/agriculture_QA)

## 1. Install dependencies

Run this cell first (Colab: Runtime → Change runtime type → GPU).

In [1]:
%pip install -q transformers datasets peft accelerate bitsandbytes evaluate nltk gradio pandas

Note: you may need to restart the kernel to use updated packages.


## 2. Imports and configuration

In [2]:
import os
import time
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForSeq2Seq,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import re

# Config (edit for experiments)
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
DATASET_ID = "sowmya14/agriculture_QA"
MAX_SEQ_LENGTH = 512
OUTPUT_DIR = "./agriculture_assistant_lora"
USE_4BIT = False  # set True for 4-bit (needs bitsandbytes>=0.46.1); False runs without it on Colab



## 3. Load and inspect dataset

In [3]:
ds = load_dataset(DATASET_ID)
print("Splits:", list(ds.keys()))
split = "train" if "train" in ds else list(ds.keys())[0]
d = ds[split]
print("Columns:", d.column_names)
print("Num examples:", len(d))
print("Sample row:", d[0])

Splits: ['train']
Columns: ['questions', 'answers']
Num examples: 999
Sample row: {'questions': 'asking about the control measure for aphid infestation in mustard crops', 'answers': 'suggested him to spray rogor@2ml/lit.at evening time.'}


## 4. Preprocessing: normalize and format as instruction–response

We map dataset columns to a standard `instruction` / `response` format, normalize text, and keep sequences within the model context length.

In [4]:
def normalize_text(text):
    if not text or not isinstance(text, str):
        return ""
    text = re.sub(r"\s+", " ", text).strip()
    return text

def get_qa_columns(dataset):
    cols = dataset.column_names
    q_col = None
    a_col = None
    for c in cols:
        lower = c.lower()
        if lower in ("question", "questions", "input", "query"):
            q_col = c
        if lower in ("answer", "answers", "output", "response"):
            a_col = c
    if q_col is None:
        q_col = cols[0]
    if a_col is None:
        a_col = cols[1] if len(cols) > 1 else cols[0]
    return q_col, a_col

def format_instruction_response(example, q_col, a_col):
    q = normalize_text(example.get(q_col, ""))
    a = normalize_text(example.get(a_col, ""))
    instruction = f"You are an agriculture assistant. Answer the following question.\n\nQuestion: {q}"
    return {"instruction": instruction, "response": a}

q_col, a_col = get_qa_columns(ds[split])
print(f"Using question column: '{q_col}', answer column: '{a_col}'")

def map_to_instruction_response(examples):
    out = {"instruction": [], "response": []}
    for i in range(len(examples[q_col])):
        ex = {k: v[i] for k, v in examples.items()}
        formatted = format_instruction_response(ex, q_col, a_col)
        if formatted["instruction"] and formatted["response"]:
            out["instruction"].append(formatted["instruction"])
            out["response"].append(formatted["response"])
    return out

ds_qa = ds[split].map(map_to_instruction_response, batched=True, remove_columns=ds[split].column_names)
ds_qa = ds_qa.filter(lambda x: len(x["instruction"]) > 0 and len(x["response"]) > 0)
print("Formatted examples:", len(ds_qa))
print("Sample:", ds_qa[0])

Using question column: 'questions', answer column: 'answers'
Formatted examples: 999
Sample: {'instruction': 'You are an agriculture assistant. Answer the following question.\n\nQuestion: asking about the control measure for aphid infestation in mustard crops', 'response': 'suggested him to spray rogor@2ml/lit.at evening time.'}


**Preprocessing documentation (rubric: dataset & preprocessing)**  
- **Normalization**: Whitespace collapsed to single spaces and stripped; non-string or empty values handled.  
- **Cleaning**: Rows with empty instruction or response are filtered out so only valid QA pairs are used.  
- **Format**: Each example is turned into a single instruction–response pair with a fixed system prompt; sequences are kept within the model context in the next step (tokenization).

*(Perplexity is computed in **Section 8** after training.)*

## 5. Tokenization and train/validation split

**Tokenization (rubric: appropriate methods)**: We use the base model's tokenizer (TinyLlama uses a **BPE/subword** tokenizer), which is appropriate for causal language models. Sequences are tokenized with truncation and padding to `MAX_SEQ_LENGTH` (512) so they fit the model's context window. Train/validation split is 90% / 10%.

In [5]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Build each training text by hand in a TinyLlama-style chat layout
# (the tokenizer's built-in chat template is not applied here)
def tokenize_function(examples):
    texts = []
    for inst, resp in zip(examples["instruction"], examples["response"]):
        # TinyLlama chat format: <|system|>...<|user|>...<|assistant|>...
        text = f"<|system|>\nYou are an agriculture assistant.\n<|user|>\n{inst}\n<|assistant|>\n{resp}"
        texts.append(text)
    out = tokenizer(
        texts,
        truncation=True,
        max_length=MAX_SEQ_LENGTH,
        padding="max_length",
        return_tensors=None,
    )
    out["labels"] = [list(x) for x in out["input_ids"]]
    return out

ds_split = ds_qa.train_test_split(test_size=0.1, seed=42)
train_ds = ds_split["train"].map(tokenize_function, batched=True, remove_columns=["instruction", "response"])
eval_ds = ds_split["test"].map(tokenize_function, batched=True, remove_columns=["instruction", "response"])
train_ds.set_format("torch")
eval_ds.set_format("torch")
print("Train size:", len(train_ds), "Eval size:", len(eval_ds))



Train size: 899 Eval size: 100
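
In the tokenization cell above, `labels` is a plain copy of `input_ids`, so the positions added by `padding="max_length"` also count toward the loss. A common alternative is to set those positions to `-100`, the label value that the causal-LM loss in `transformers` ignores. The following is a minimal sketch of that idea on plain Python lists shaped like the tokenizer output; `mask_padding_labels` is a hypothetical helper and the token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss in transformers


def mask_padding_labels(input_ids, attention_mask, ignore_index=IGNORE_INDEX):
    """Copy input_ids to labels, replacing padded positions (mask == 0) with ignore_index."""
    return [
        [tok if keep == 1 else ignore_index for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Toy batch: token id 2 stands in for the pad token (illustrative values only).
batch = {
    "input_ids": [[1, 50, 51, 2, 2], [1, 60, 61, 62, 63]],
    "attention_mask": [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]],
}
labels = mask_padding_labels(batch["input_ids"], batch["attention_mask"])
print(labels)
```

Inside `tokenize_function`, the same helper could be applied to `out["input_ids"]` and `out["attention_mask"]`. Masking padding changes the reported loss, so eval loss and perplexity from runs with and without masking are not directly comparable.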


## 6. Load base model and apply LoRA (PEFT)

In [6]:
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

if USE_4BIT:
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=quantization_config,
        device_map="auto",
        trust_remote_code=True,
    )
else:
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",
        trust_remote_code=True,
        dtype=compute_dtype,
    )

model = prepare_model_for_kbit_training(model) if USE_4BIT else model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

Loading weights: 100%|██████████| 201/201 [06:54<00:00,  2.06s/it, Materializing param=model.norm.weight]


trainable params: 2,252,800 || all params: 1,102,301,184 || trainable%: 0.2044
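
The trainable-parameter count printed above can be reproduced by hand. A LoRA adapter on a linear layer with input width `d_in` and output width `d_out` adds two matrices, `A` (`r × d_in`) and `B` (`d_out × r`), so `r · (d_in + d_out)` parameters. The sketch below assumes the TinyLlama-1.1B dimensions from its published config (hidden size 2048, 22 layers, grouped-query attention with key/value projections of width 256). Those values are not stated in this notebook and should be checked against `model.config`.

```python
R = 8                 # LoRA rank from LoraConfig
HIDDEN = 2048         # assumed hidden size of TinyLlama-1.1B
KV_DIM = 256          # assumed k/v projection width (4 KV heads x head dim 64)
NUM_LAYERS = 22       # assumed number of transformer layers
ALL_PARAMS = 1_102_301_184  # "all params" as printed by print_trainable_parameters()


def lora_params(d_in, d_out, r=R):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


per_layer = {
    "q_proj": lora_params(HIDDEN, HIDDEN),
    "k_proj": lora_params(HIDDEN, KV_DIM),
    "v_proj": lora_params(HIDDEN, KV_DIM),
    "o_proj": lora_params(HIDDEN, HIDDEN),
}
total_trainable = NUM_LAYERS * sum(per_layer.values())
trainable_pct = round(100 * total_trainable / ALL_PARAMS, 4)
print(f"trainable params: {total_trainable:,} ({trainable_pct}%)")
```

Under these assumptions the count matches the printed 2,252,800. Doubling `r` to 16 would double the adapter size, since the count is linear in `r`.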


## 7. Training

**Hyperparameter tuning (rubric)**: Learning rate (e.g. 5e-5 to 1e-4), batch size (2–4 with gradient accumulation), and epochs (1–3) can be changed in `TrainingArguments` below. Run multiple experiments and record validation loss, ROUGE-L, BLEU, training time, and GPU memory in the **experiment table** (Section 9).
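
With gradient accumulation, the batch size the optimizer sees is the per-device batch size times the number of accumulation steps (times the number of devices). The sketch below works through that arithmetic for this notebook's settings, assuming a single GPU. The per-epoch update count is an approximation, since how a trailing partial accumulation window is counted can differ between `transformers` versions.

```python
import math

train_examples = 899      # train split size printed in Section 5
per_device_batch = 2      # per_device_train_batch_size
grad_accum = 4            # gradient_accumulation_steps
num_devices = 1           # assumption: one GPU
epochs = 3                # num_train_epochs

# Examples contributing to each optimizer update.
effective_batch = per_device_batch * grad_accum * num_devices

# Forward/backward passes per epoch, then (approximate) optimizer updates.
batches_per_epoch = math.ceil(train_examples / (per_device_batch * num_devices))
approx_updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)

print("Effective batch size:", effective_batch)
print("Batches per epoch:", batches_per_epoch)
print("Approx. optimizer updates per epoch:", approx_updates_per_epoch)
print("Approx. optimizer updates in total:", approx_updates_per_epoch * epochs)
```

When comparing experiments, keeping the effective batch size fixed while changing the learning rate makes the runs easier to interpret.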

In [None]:
from transformers import DataCollatorForSeq2Seq

data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, pad_to_multiple_of=8, return_tensors="pt")

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    warmup_ratio=0.05,
    logging_steps=25,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    bf16=torch.cuda.is_bf16_supported(),
    fp16=not torch.cuda.is_bf16_supported(),
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=data_collator,
)

train_start = time.time()
trainer.train()
train_elapsed_min = (time.time() - train_start) / 60
gpu_mem_gb = round(torch.cuda.max_memory_allocated(0) / 1e9, 2) if torch.cuda.is_available() else None
print(f"Training time: {train_elapsed_min:.1f} min")
if gpu_mem_gb is not None:
    print(f"Max GPU memory: {gpu_mem_gb} GB")

trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.




## 8. Evaluation: ROUGE, BLEU, and qualitative check

In [None]:
%uv pip install rouge_score

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Using Python 3.12.6 environment at: /usr/local
Audited 1 package in 10ms
Note: you may need to restart the kernel to use updated packages.


In [None]:
from evaluate import load as load_metric
import numpy as np
import nltk
nltk.download("punkt", quiet=True)

rouge = load_metric("rouge")
bleu = load_metric("bleu")

def generate_response(model, tokenizer, instruction, max_new_tokens=128):
    prompt = f"<|system|>\nYou are an agriculture assistant.\n<|user|>\n{instruction}\n<|assistant|>\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False, pad_token_id=tokenizer.eos_token_id)
    reply = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return reply.strip()

eval_sample = min(50, len(eval_ds))
references = []
predictions = []
for i in range(eval_sample):
    ex = ds_split["test"][i]
    ref = ex["response"]
    pred = generate_response(model, tokenizer, ex["instruction"])
    references.append(ref)
    predictions.append(pred)

rouge_result = rouge.compute(predictions=predictions, references=references)
print("ROUGE:", rouge_result)

# BLEU expects list of strings and list of list of strings
refs_bleu = [[r] for r in references]
bleu_result = bleu.compute(predictions=predictions, references=refs_bleu)
print("BLEU:", bleu_result)

# Perplexity = exp(eval_loss)
eval_metrics = trainer.evaluate()
eval_loss = eval_metrics.get("eval_loss", float("nan"))
perplexity = np.exp(eval_loss) if isinstance(eval_loss, (int, float)) else float("nan")
print("Eval loss:", eval_loss, "| Perplexity:", perplexity)

ROUGE: {'rouge1': np.float64(0.13505111638604944), 'rouge2': np.float64(0.039232436785439834), 'rougeL': np.float64(0.12803137749181473), 'rougeLsum': np.float64(0.12903991500775863)}
BLEU: {'bleu': 0.0, 'precisions': [0.16981132075471697, 0.04538341158059468, 0.0050933786078098476, 0.0], 'brevity_penalty': 0.877547588776899, 'length_ratio': 0.8844672657252889, 'translation_length': 689, 'reference_length': 779}


Eval loss: 0.21341587603092194 | Perplexity: 1.237899356880534


## 9. Experiment table (auto-filled)

Run the cell below **after** Section 8 (evaluation). It fills the table from the current run. For multiple experiments, change the configuration (learning rate, epochs, LoRA `r`, etc.), re-run from Section 6 through Section 8, then run this cell again; each run of the cell appends a new row to the table.

**Performance metrics (rubric: evaluation)**  
- **ROUGE** (especially ROUGE-L): Overlap of n-grams/longest common subsequence with reference answers; higher is better.  
- **BLEU**: N-gram precision vs references; higher is better.  
- **Perplexity**: exp(eval_loss); lower means the model fits the eval set better.  
Use these together with **qualitative testing** in the Gradio UI (Section 10) and the **base vs fine-tuned comparison** (Section 8b) to assess improvement from fine-tuning.
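
To make the ROUGE-L description concrete: the score is an F-measure built from the longest common subsequence (LCS) between prediction and reference tokens. The sketch below assumes lowercase whitespace tokenization; the `rouge_score` package applies its own tokenizer, so its numbers can differ. `lcs_length` and `rouge_l_f1` are illustrative names, and the two strings are made up, loosely modelled on the dataset sample.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 from LCS-based precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = "spray rogor in the evening"
reference = "suggested him to spray rogor at evening time"
score = rouge_l_f1(prediction, reference)
print(f"ROUGE-L F1: {score:.4f}")
```

Here the LCS is "spray rogor evening" (3 tokens), giving precision 3/5 and recall 3/8. Short, terse reference answers like those in this dataset keep such overlap scores low even when a prediction is on topic.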

**How to get 2–3 experiments in the table (for the report):**  
The cell below appends **one row per run**. After the first run you have Experiment 1. To add more:

- **Experiment 2:** In **Section 7**, set `learning_rate=1e-4` in `TrainingArguments`. Then **re-run Sections 6, 7, and 8** (load model → train → eval) and **re-run the table cell below**; a second row is appended.
- **Experiment 3:** Change something else (e.g. `num_train_epochs=2` in Section 7, or `r=16` in `LoraConfig` in Section 6). Re-run **Sections 6, 7, 8**, then the **table cell** again; a third row is appended.

You can then copy the full table into your report.

In [None]:
# Auto-fill experiment table from this run (run after Section 8 evaluation)
import pandas as pd

# Current run metrics (from training and eval cells)
lr = getattr(training_args, "learning_rate", 5e-5)
batch = getattr(training_args, "per_device_train_batch_size", 2)
grad_acc = getattr(training_args, "gradient_accumulation_steps", 4)
epochs = getattr(training_args, "num_train_epochs", 3)
lora_r = getattr(lora_config, "r", 8)
val_loss = eval_loss if "eval_loss" in dir() else float("nan")
rl = rouge_result.get("rougeL") if "rouge_result" in dir() else None
rouge_l = rl.get("fmeasure", rl) if isinstance(rl, dict) else (rl if rl is not None else float("nan"))
bleu_score = bleu_result.get("bleu", float("nan")) if "bleu_result" in dir() else float("nan")
time_min = round(train_elapsed_min, 1) if "train_elapsed_min" in dir() else None
gpu_gb = gpu_mem_gb if "gpu_mem_gb" in dir() else None

# Append to experiment log (persists across runs in this session)
if "experiment_log" not in globals():
    experiment_log = []
experiment_log.append({
    "Exp": len(experiment_log) + 1,
    "LR": lr,
    "Batch": f"{batch} (acc {grad_acc})",
    "Epochs": epochs,
    "LoRA r": lora_r,
    "Val loss": round(val_loss, 4) if isinstance(val_loss, (int, float)) else "-",
    "ROUGE-L": round(rouge_l, 4) if isinstance(rouge_l, (int, float)) else "-",
    "BLEU": round(bleu_score, 4) if isinstance(bleu_score, (int, float)) else "-",
    "Time (min)": time_min if time_min is not None else "-",
    "GPU mem (GB)": gpu_gb if gpu_gb is not None else "-",
    "Notes": "Default" if len(experiment_log) == 0 else f"Run {len(experiment_log) + 1}",
})

# Pad to 3 rows with placeholders so the table always shows 3 experiments
display_log = list(experiment_log)
placeholder_notes = ["Exp 2: set LR=1e-4, re-run 6–8 + this cell", "Exp 3: set epochs=2 or r=16, re-run 6–8 + this cell"]
while len(display_log) < 3:
    i = len(display_log)
    display_log.append({
        "Exp": i + 1,
        "LR": "-", "Batch": "-", "Epochs": "-", "LoRA r": "-",
        "Val loss": "-", "ROUGE-L": "-", "BLEU": "-", "Time (min)": "-", "GPU mem (GB)": "-",
        "Notes": placeholder_notes[i - 1] if i <= len(placeholder_notes) else f"Run {i + 1}",
    })
df = pd.DataFrame(display_log)
display(df)

## 8b. Base vs fine-tuned comparison (for report and demo)

The assignment requires comparing the **base pre-trained model** with the **fine-tuned** model. Run the cell below to get responses from both on the same questions. Use this output in your report and demo video.

In [None]:
# Compare base (no LoRA) vs fine-tuned. If OOM, set NUM_COMPARE = 2.
NUM_COMPARE = 5
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
if USE_4BIT:
    base_model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=compute_dtype,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
        ),
        device_map="auto",
        trust_remote_code=True,
    )
else:
    base_model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", trust_remote_code=True, dtype=compute_dtype  # same keyword as in Section 6
    )
base_model.eval()

compare_instructions = [ds_split["test"][i]["instruction"] for i in range(min(NUM_COMPARE, len(ds_split["test"])))]
compare_references = [ds_split["test"][i]["response"] for i in range(min(NUM_COMPARE, len(ds_split["test"])))]
base_responses = [generate_response(base_model, tokenizer, inst) for inst in compare_instructions]
finetuned_responses = list(predictions[:NUM_COMPARE])

del base_model
if torch.cuda.is_available():
    torch.cuda.empty_cache()

print("=" * 80)
print("BASE vs FINE-TUNED (use in report and demo video)")
print("=" * 80)
for i in range(len(compare_instructions)):
    q = compare_instructions[i].split("Question:")[-1].strip()[:80]
    print(f"\n--- Example {i+1} ---\nQuestion: {q}...")
    print(f"Reference:   {compare_references[i][:180]}...")
    print(f"Base:        {base_responses[i][:180]}...")
    print(f"Fine-tuned:  {finetuned_responses[i][:180]}...")
print("\n" + "=" * 80)

BASE vs FINE-TUNED (use in report and demo video)

--- Example 1 ---
Question: asking about how to avail kisan credit card loan for sali crop....
Reference:   answer is given in details...
Base:        Sure, I'd be happy to help you with that.

To avail a Kisan Credit Card Loan for Sali Crop, you can follow these steps:

1. Check your eligibility: Before applying for a Kisan Cred...
Fine-tuned:  suggested to apply for kisan credit card loan for sali crop....

--- Example 2 ---
Question: asking about source of early ahu rice variety...
Reference:   transfer to vet expert...
Base:        Yes, I can provide you with information about the source of early ahu rice variety. Early ahu rice variety is a type of rice that has been cultivated in the Philippines for centuri...
Fine-tuned:  suggested to use 100 gms of urea and 100 gms of ammonium sulphate in 10 liters of water....

--- Example 3 ---
Question: asking that he has not got proper friut from his coconut plant...
Reference:   profex sup

## 10. Gradio UI

In [None]:
import gradio as gr

def chat(user_input, history=None):
    if history is None:
        history = []
    instruction = f"You are an agriculture assistant. Answer the following question.\n\nQuestion: {user_input}"
    reply = generate_response(model, tokenizer, instruction)
    history = history + [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": reply},
    ]
    return history, history

with gr.Blocks(title="Agriculture Assistant") as demo:
    gr.Markdown("""## 🌾 Agriculture QA Assistant\n\n**Instructions:** Type your agriculture-related question in the box below and click **Submit** (or press Enter). The fine-tuned model will answer. Use **Clear** to start a new conversation. For your demo video, try in-domain questions (e.g. pests, crops, soil) and optionally an out-of-domain question to show the model stays on topic.\n\nModel fine-tuned on [sowmya14/agriculture_QA](https://huggingface.co/datasets/sowmya14/agriculture_QA).""")
    chatbot = gr.Chatbot(label="Chat")
    msg = gr.Textbox(placeholder="e.g. What are the best practices for soil preparation?", label="Your question")
    submit = gr.Button("Submit")
    clear = gr.Button("Clear")
    state = gr.State([])

    def submit_fn(msg, history):
        if not msg.strip():
            return history, history
        _, new_history = chat(msg, history)
        return new_history, new_history

    submit.click(submit_fn, [msg, state], [chatbot, state])
    clear.click(lambda: ([], []), None, [chatbot, state])
    msg.submit(submit_fn, [msg, state], [chatbot, state])

# In Colab the app appears below; share=True gives a public URL for your demo video.
demo.launch(share=True, theme=gr.themes.Soft())



* Running on local URL:  http://127.0.0.1:7860
* Running on public URL: https://3218a2782f4b1fb020.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/gradio/queueing.py", line 766, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/gradio/route_utils.py", line 355, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 2163, in process_api
    data = await self.postprocess_data(block_fn, result["prediction"], state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 1940, in postprocess_data
    prediction_value = block.postprocess(prediction_value)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/gradio/components/chatbot.py", line 704, in postproce