# 🌾 SproutBot: Agriculture QA Assistant
## Complete LLM Fine-tuning Pipeline | Google Colab / Kaggle (GPU T4)
**Author:** Reine Mizero

---

## 🎯 1. Project Definition & Domain Alignment

**SproutBot** is a domain-specific conversational assistant fine-tuned to answer agriculture-related questions covering crop management, pest control, soil health, irrigation, and fertilization. It targets smallholder farmers and agricultural students who need quick, accurate, plain-language answers.

**Why this domain?**  
Agriculture employs over 1 billion people globally yet remains underserved by AI tooling. A specialized LLM can democratize agronomic knowledge that is otherwise locked behind expensive consultants or hard-to-find extension services.

**Approach:** Generative QA using **TinyLlama-1.1B** fine-tuned with **LoRA (PEFT)** on the `KisanVaani/agriculture-qa-english-only` dataset (~22.6k English agriculture QA pairs, of which the first 2,000 are used here). TinyLlama is chosen for its small footprint (~1.1B parameters, about 2.2 GB of fp16 weights), which fits comfortably on a free Colab/Kaggle T4 GPU.

**Enable GPU before running:** Kaggle → Sidebar → Accelerator → GPU T4 → Save Version → Run All; in Colab, Runtime → Change runtime type → T4 GPU.

---
## 2. Environment Setup & GPU Check

In [1]:
# GPU availability check
!nvidia-smi
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Install required libraries
!pip install transformers datasets peft accelerate bitsandbytes \
             gradio rouge-score nltk pandas numpy trl -q

import warnings
warnings.filterwarnings("ignore")

import math
import time

import numpy as np
import pandas as pd
import gradio as gr
from datasets import load_dataset
from peft import LoraConfig, PeftModel, TaskType, get_peft_model
from rouge_score import rouge_scorer
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

print("\n✅ SETUP COMPLETE!")

Fri Feb 20 22:08:18 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   41C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                      

---
## 3. Dataset Collection & Preprocessing

**Dataset:** `KisanVaani/agriculture-qa-english-only`  
- Source: Hugging Face Datasets Hub  
- Domain: Agriculture (crops, pests, soil, irrigation, fertilizers)  
- Size: ~22,600 English QA pairs (22,615 rows in the `train` split)  
- Columns: `question`, `answers`

**Preprocessing steps:**
1. **Load**: take the first 2,000 rows (`train[:2000]`) to keep training time short
2. **Clean**: drop rows whose answer is missing or shorter than 10 characters, or whose question is shorter than 5 characters
3. **Normalize**: collapse repeated whitespace; text is *not* lower-cased, because the tokenizer is case-sensitive
4. **Format**: wrap each pair in an instruction-response template  
   `### Question: {q}\n### Answer: {a}</s>`
5. **Tokenize**: TinyLlama's SentencePiece BPE tokenizer; truncate and pad every sequence to 256 tokens
6. **Split**: first 90% train / last 10% validation, in dataset order (no shuffling)
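Because the split keeps dataset order, any grouping in the source file (for example, runs of questions about the same crop) carries over unevenly into the training and validation sets. A seeded shuffle before splitting avoids that. The sketch below is a minimal, framework-free version; the function name `train_val_split` and the seed value are illustrative assumptions, not part of the original notebook.

```python
import random

def train_val_split(items, val_fraction=0.1, seed=42):
    """Shuffle indices with a fixed seed, then cut into train/validation lists."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # local RNG: global random state is untouched
    n_train = int(len(items) * (1 - val_fraction))  # same rounding as int(len * 0.9)
    train = [items[i] for i in indices[:n_train]]
    val = [items[i] for i in indices[n_train:]]
    return train, val

# Usage with 1,868 filtered samples, the count reported in Section 3.2
train, val = train_val_split(list(range(1868)))
print(len(train), len(val))  # 1681 187
```

Applied to the `formatted` list before tokenization, this keeps the 1,681 / 187 sizes while mixing examples from the whole slice into both sets.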

In [2]:
# ── 3.1 Load dataset ──────────────────────────────────────────
print("Loading agriculture dataset...")
raw = load_dataset("KisanVaani/agriculture-qa-english-only", split="train[:2000]")
print(f"Raw samples: {len(raw)}")
print("Sample:", raw[0])
print("Columns:", raw.column_names)



Loading agriculture dataset...



Generating train split:   0%|          | 0/22615 [00:00<?, ? examples/s]

Raw samples: 2000
Sample: {'question': 'why is crop rotation important in farming?', 'answers': 'This helps to prevent soil erosion and depletion, and can also help to control pests and diseases'}
Columns: ['question', 'answers']


In [3]:
# ── 3.2 Preprocessing: clean, format, tokenize, split ─────────
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
MAX_LEN = 256

# Correct column names: 'question' and 'answers' (with s)
col_q = "question"
col_a = "answers"

# Build formatted texts
formatted = []
for ex in raw:
    q = " ".join(str(ex[col_q]).split())
    a = " ".join(str(ex[col_a]).split())
    if len(a) >= 10 and len(q) >= 5:
        formatted.append(f"### Question: {q}\n### Answer: {a}</s>")

print(f"Valid samples after filtering: {len(formatted)}")

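# Tokenize: truncate and pad every example to MAX_LEN tokens
input_ids_list, attention_mask_list, labels_list = [], [], []
for text in formatted:
    enc = tokenizer(text, truncation=True, max_length=MAX_LEN, padding="max_length")
    input_ids_list.append(enc["input_ids"])
    attention_mask_list.append(enc["attention_mask"])
    # Note: DataCollatorForLanguageModeling(mlm=False), used in Section 4, rebuilds
    # labels from input_ids and sets every pad_token_id position to -100. Because
    # pad_token == eos_token here, the closing </s> is masked out of the loss too.
    labels_list.append(enc["input_ids"].copy())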

print(f"Tokenized {len(input_ids_list)} samples")

# Plain PyTorch Dataset
from torch.utils.data import Dataset as TorchDataset

class AgriDataset(TorchDataset):
    def __init__(self, input_ids, attention_mask, labels):
        self.input_ids      = [torch.tensor(x) for x in input_ids]
        self.attention_mask = [torch.tensor(x) for x in attention_mask]
        self.labels         = [torch.tensor(x) for x in labels]
    def __len__(self):
        return len(self.input_ids)
    def __getitem__(self, idx):
        return {
            "input_ids":      self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
            "labels":         self.labels[idx]
        }

split_idx = int(len(formatted) * 0.9)
train_ds = AgriDataset(input_ids_list[:split_idx],
                       attention_mask_list[:split_idx],
                       labels_list[:split_idx])
val_ds   = AgriDataset(input_ids_list[split_idx:],
                       attention_mask_list[split_idx:],
                       labels_list[split_idx:])

print(f"✅ train_ds: {len(train_ds)} | val_ds: {len(val_ds)}")


Valid samples after filtering: 1868
Tokenized 1868 samples
✅ train_ds: 1681 | val_ds: 187
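Because `tokenizer.pad_token` is set to `eos_token`, `DataCollatorForLanguageModeling(mlm=False)` turns every EOS id in the labels into -100, including the `</s>` that ends each answer, so the model never receives a training signal to stop; this may help explain the run-on answers in Section 5.4. One alternative, sketched below, masks by `attention_mask` instead of by token id. It assumes labels are built per example and passed through `transformers.default_data_collator` (which stacks them unchanged) rather than the language-modelling collator. `build_labels` and `prompt_len` are hypothetical names; `prompt_len` would be the token count of the `### Question: ...\n### Answer:` prefix, which can differ by a token from its position inside the full string because of SentencePiece merges.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def build_labels(input_ids, attention_mask, prompt_len=0):
    """Return causal-LM labels that ignore padding and, optionally, the prompt.

    Masking follows attention_mask, not token ids, so a real end-of-sequence
    token is kept even when it shares its id with the pad token.
    """
    labels = []
    for position, (token_id, is_real) in enumerate(zip(input_ids, attention_mask)):
        if is_real == 0 or position < prompt_len:
            labels.append(IGNORE_INDEX)
        else:
            labels.append(token_id)
    return labels

# Toy example with made-up ids: BOS=1, EOS=PAD=2, two padding positions at the end
ids  = [1, 835, 894, 29901, 2, 2, 2]
mask = [1, 1,   1,   1,     1, 0, 0]
print(build_labels(ids, mask))                # [1, 835, 894, 29901, 2, -100, -100]
print(build_labels(ids, mask, prompt_len=2))  # [-100, -100, 894, 29901, 2, -100, -100]
```

In the notebook this would replace the plain `enc["input_ids"].copy()`, together with `data_collator=default_data_collator` in the `Trainer`. It changes the training objective, so the losses reported in Section 4 would no longer apply.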


---
## 4. Model Fine-tuning with LoRA (PEFT)

**Base model:** `TinyLlama/TinyLlama-1.1B-Chat-v1.0`  
**PEFT method:** LoRA: only 0.1% (r=8) to 0.2% (r=16) of the parameters are trainable, which makes training feasible on a single T4 GPU.
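The trainable-parameter counts printed in Sections 4.3 and 4.4 can be checked by hand. LoRA adds two matrices per adapted projection, A (r × d_in) and B (d_out × r). The sketch assumes TinyLlama's published configuration: hidden size 2048, 22 decoder layers, and grouped-query attention with 4 key/value heads of dimension 64, so `v_proj` maps 2048 → 256.

```python
HIDDEN_SIZE = 2048           # assumed TinyLlama-1.1B hidden size
NUM_LAYERS = 22              # assumed number of decoder layers
KV_DIM = 4 * 64              # assumed 4 KV heads x head_dim 64 -> v_proj output width
BASE_PARAMS = 1_100_000_000  # ~1100.0M, as printed in Section 4.1

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r

def qv_lora_params(r: int) -> int:
    """Total LoRA parameters when only q_proj and v_proj are adapted in every layer."""
    q = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, r)  # q_proj: 2048 -> 2048
    v = lora_params(HIDDEN_SIZE, KV_DIM, r)       # v_proj: 2048 -> 256
    return NUM_LAYERS * (q + v)

for r in (8, 16):
    n = qv_lora_params(r)
    print(f"r={r}: {n:,} params ({100 * n / BASE_PARAMS:.2f}% of the base model)")
# r=8: 1,126,400 params (0.10% of the base model)
# r=16: 2,252,800 params (0.20% of the base model)
```

These match the 1.13M and 2.25M figures printed by `run_experiment`, and show that the adapter size grows linearly with r.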

### Experiment Plan
We run **two experiments** with different hyperparameters and compare results:

| Experiment | LR | Batch / device | Grad Acc | Epochs | LoRA r | LoRA α |
|---|---|---|---|---|---|---|
| Exp-1 (baseline LoRA) | 2e-4 | 4 | 4 | 1 | 8 | 16 |
| Exp-2 (tuned) | 5e-5 | 4 | 4 | 2 | 16 | 32 |
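With these settings each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps` = 16 examples. The arithmetic below estimates optimizer steps per epoch for the 1,681 training samples, assuming a single training process with no data parallelism; the rounding of the last partial accumulation window may differ by one step between `transformers` versions.

```python
import math

TRAIN_SAMPLES = 1681  # size of train_ds reported in Section 3.2
PER_DEVICE_BATCH = 4
GRAD_ACC = 4

effective_batch = PER_DEVICE_BATCH * GRAD_ACC               # examples per optimizer step
micro_batches = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)  # forward/backward passes per epoch
steps_per_epoch = micro_batches // GRAD_ACC                  # full accumulation windows

print(effective_batch, micro_batches, steps_per_epoch)  # 16 421 105
for epochs in (1, 2):
    print(f"{epochs} epoch(s): ~{epochs * steps_per_epoch} optimizer steps")
```

With `warmup_steps=10`, warm-up therefore covers roughly the first 10% of Exp-1 and 5% of Exp-2.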

In [4]:
# ── 4.1 Load Base Model ───────────────────────────────────────
print(f"Loading base model: {MODEL_ID}")
t0 = time.time()

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto"
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

total_params = sum(p.numel() for p in base_model.parameters())
print(f"Total parameters: {total_params/1e6:.1f}M")
print(f"Model loaded in {time.time()-t0:.1f}s")
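# memory_allocated() reports the current device (GPU 0) only; on a 2×T4 Kaggle
# session, device_map="auto" may split the weights across both GPUs
print(f"GPU memory used: {torch.cuda.memory_allocated()/1e9:.2f} GB")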

Loading base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0


`torch_dtype` is deprecated! Use `dtype` instead!



Total parameters: 1100.0M
Model loaded in 7.7s
GPU memory used: 1.01 GB


In [5]:
# ── 4.2 Training Helper ───────────────────────────────────────
def run_experiment(exp_name, lr, epochs, lora_r, lora_alpha, output_dir):
    print(f"\n{'='*60}")
    print(f"Running {exp_name}: lr={lr}, epochs={epochs}, r={lora_r}, alpha={lora_alpha}")
    print(f"{'='*60}")

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    model.config.use_cache = False

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=lora_r,
        lora_alpha=lora_alpha,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
        bias="none"
    )
    model = get_peft_model(model, lora_config)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable params: {trainable/1e6:.2f}M / {total_params/1e6:.1f}M ({100*trainable/total_params:.2f}%)")

    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=lr,
        lr_scheduler_type="cosine",
        warmup_steps=10,
        fp16=True,
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        logging_steps=20,
        report_to="none",
        dataloader_num_workers=0,
        remove_unused_columns=False
    )

    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
        data_collator=data_collator,
    )

    t_start = time.time()
    trainer.train()
    t_elapsed = time.time() - t_start

    eval_result = trainer.evaluate()
    eval_loss  = eval_result["eval_loss"]
    perplexity = math.exp(eval_loss)

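    print(f"\n✅ {exp_name} done in {t_elapsed/60:.1f} min")
    print(f"   Eval loss: {eval_loss:.4f} | Perplexity: {perplexity:.2f}")
    # max_memory_allocated() is a session-wide peak for the current device; it is
    # not reset between experiments, so later runs inherit earlier peaks
    print(f"   GPU memory peak: {torch.cuda.max_memory_allocated()/1e9:.2f} GB")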

    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)

    return {
        "exp": exp_name, "lr": lr, "epochs": epochs,
        "lora_r": lora_r, "lora_alpha": lora_alpha,
        "eval_loss": round(eval_loss, 4),
        "perplexity": round(perplexity, 2),
        "train_time_min": round(t_elapsed/60, 1),
        "gpu_mem_gb": round(torch.cuda.max_memory_allocated()/1e9, 2),
        "output_dir": output_dir
    }

print("✅ run_experiment defined.")

✅ run_experiment defined.


In [6]:
print(f"train_ds length: {len(train_ds)}")
print(f"val_ds length:   {len(val_ds)}")
print(f"Sample keys:     {train_ds[0].keys() if len(train_ds) > 0 else 'EMPTY'}")

train_ds length: 1681
val_ds length:   187
Sample keys:     dict_keys(['input_ids', 'attention_mask', 'labels'])


In [7]:
print("Actual columns:", raw.column_names)
print("First row:", raw[0])

Actual columns: ['question', 'answers']
First row: {'question': 'why is crop rotation important in farming?', 'answers': 'This helps to prevent soil erosion and depletion, and can also help to control pests and diseases'}


In [8]:
# ── 4.3 Experiment 1 ──────────────────────────────────────────
exp1 = run_experiment(
    exp_name="Exp-1 (lr=2e-4, r=8, 1 epoch)",
    lr=2e-4, epochs=1, lora_r=8, lora_alpha=16,
    output_dir="./sproutbot-exp1"
)


Running Exp-1 (lr=2e-4, r=8, 1 epoch): lr=0.0002, epochs=1, r=8, alpha=16



Trainable params: 1.13M / 1100.0M (0.10%)


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.506094 | 1.554048 |



✅ Exp-1 (lr=2e-4, r=8, 1 epoch) done in 2.3 min
   Eval loss: 1.5540 | Perplexity: 4.73
   GPU memory peak: 3.25 GB


In [9]:
# ── 4.4 Experiment 2 ──────────────────────────────────────────
exp2 = run_experiment(
    exp_name="Exp-2 (lr=5e-5, r=16, 2 epochs)",
    lr=5e-5, epochs=2, lora_r=16, lora_alpha=32,
    output_dir="./sproutbot-exp2"
)


Running Exp-2 (lr=5e-5, r=16, 2 epochs): lr=5e-05, epochs=2, r=16, alpha=32



Trainable params: 2.25M / 1100.0M (0.20%)


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 2.145249 | 1.804806 |
| 2 | 1.645467 | 1.670819 |



✅ Exp-2 (lr=5e-5, r=16, 2 epochs) done in 6.3 min
   Eval loss: 1.6708 | Perplexity: 5.32
   GPU memory peak: 7.71 GB


In [10]:
# ── 4.5 Experiment Comparison Table ───────────────────────────
results_df = pd.DataFrame([exp1, exp2])

print("\n📊 HYPERPARAMETER EXPERIMENT TABLE")
print("=" * 80)
print(results_df.to_string(index=False))

# Pick the best run (lowest validation perplexity; ties go to Exp-1)
best_exp = min([exp1, exp2], key=lambda r: r["perplexity"])
BEST_DIR = best_exp["output_dir"]
print(f"\n🏆 Best model: {best_exp['exp']} (perplexity={best_exp['perplexity']})")


📊 HYPERPARAMETER EXPERIMENT TABLE
                            exp      lr  epochs  lora_r  lora_alpha  eval_loss  perplexity  train_time_min  gpu_mem_gb       output_dir
  Exp-1 (lr=2e-4, r=8, 1 epoch) 0.00020       1       8          16     1.5540        4.73             2.3        3.25 ./sproutbot-exp1
Exp-2 (lr=5e-5, r=16, 2 epochs) 0.00005       2      16          32     1.6708        5.32             6.3        7.71 ./sproutbot-exp2

🏆 Best model: Exp-1 (lr=2e-4, r=8, 1 epoch) (perplexity=4.73)


---
## 5. Evaluation

We evaluate using:
- **Perplexity**: exp(eval_loss) on the validation set, already computed during training
- **ROUGE-L**: on the first 30 validation samples
- **Qualitative comparison**: base model vs. fine-tuned model on hand-written agriculture questions
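Perplexity is the exponential of the mean cross-entropy loss in nats, which is what `run_experiment` computes with `math.exp(eval_loss)`. A quick check against the two reported losses is below. Note that `eval_loss` averages only over positions whose label is not -100: with the collator used here, that excludes padding and the end-of-answer `</s>` but includes the question tokens, so the figure describes the whole formatted text rather than the answers alone.

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_cross_entropy)

# eval_loss values reported for the two experiments in Section 4.5
for name, loss in [("Exp-1", 1.5540), ("Exp-2", 1.6708)]:
    print(f"{name}: eval_loss={loss:.4f} -> perplexity={perplexity(loss):.2f}")
# Exp-1: eval_loss=1.5540 -> perplexity=4.73
# Exp-2: eval_loss=1.6708 -> perplexity=5.32
```

A perplexity of 4.73 means that, on average, the model is about as uncertain at each position as a uniform choice among ~4.7 tokens.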

In [11]:
# ── 5.1 Load best fine-tuned model for inference ──────────────
print(f"Loading best model from {BEST_DIR}...")

ft_base = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
ft_model = PeftModel.from_pretrained(ft_base, BEST_DIR)
ft_model.eval()

# Also load unmodified base for comparison
base_only = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
base_only.eval()

print("✅ Models loaded.")

Loading best model from ./sproutbot-exp1...



✅ Models loaded.


In [12]:
# ── 5.2 Generation Helper ─────────────────────────────────────
def generate_answer(model, question, max_new_tokens=100):
    prompt = f"### Question: {question}\n### Answer:"
    inputs = tokenizer(
        prompt, return_tensors="pt", truncation=True, max_length=200
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
            repetition_penalty=1.1
        )
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return decoded.split("### Answer:")[-1].strip()

print("✅ Generation function ready.")

✅ Generation function ready.
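`generate_answer` decodes the prompt together with the continuation and keeps the text after the *last* `### Answer:`. If the model goes on to invent a second `### Question: ... ### Answer:` turn, the function returns that invented answer instead of the first one. A more defensive parser, sketched below as a hypothetical helper `extract_answer`, keeps the text after the *first* marker and cuts at the next `###`. An alternative with the same intent is to decode only the new tokens, `output_ids[0][inputs["input_ids"].shape[1]:]`, and then apply the same cut.

```python
def extract_answer(generated: str, marker: str = "### Answer:", stop: str = "###") -> str:
    """Return the first answer in a generation, dropping any invented follow-up turns."""
    # Text after the FIRST answer marker; if the marker is absent (e.g. only the
    # new tokens were decoded), treat the whole string as the answer.
    _, found, tail = generated.partition(marker)
    answer = tail if found else generated
    # Stop at the next section marker, where a new turn would begin.
    return answer.split(stop, 1)[0].strip()

sample = (
    "### Question: Why rotate crops?\n"
    "### Answer: It limits pests and soil depletion.\n"
    "### Question: Anything else?\n"
    "### Answer: Not really."
)
print(extract_answer(sample))                   # It limits pests and soil depletion.
print(sample.split("### Answer:")[-1].strip())  # Not really.  (what the [-1] split returns)
```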


In [13]:
# ── 5.3 ROUGE-L Evaluation (Fine-tuned vs Base) ───────────────
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

n_eval = min(30, len(val_ds))

# Re-decode val samples to get question/answer text
val_raw_samples = []
for i in range(n_eval):
    text = tokenizer.decode(val_ds[i]["input_ids"], skip_special_tokens=True)
    parts = text.split("### Answer:")
    q = parts[0].replace("### Question:", "").strip()
    a = parts[1].strip() if len(parts) > 1 else ""
    val_raw_samples.append({"question": q, "answer": a})

ft_scores, base_scores = [], []
for sample in val_raw_samples:
    q, ref = sample["question"], sample["answer"]
    if not ref:
        continue
    ft_ans   = generate_answer(ft_model, q)
    base_ans = generate_answer(base_only, q)
    ft_scores.append(scorer.score(ref, ft_ans)["rougeL"].fmeasure)
    base_scores.append(scorer.score(ref, base_ans)["rougeL"].fmeasure)

ft_rouge   = np.mean(ft_scores)
base_rouge = np.mean(base_scores)
improvement = (ft_rouge - base_rouge) / (base_rouge + 1e-9) * 100

print(f"\n📊 EVALUATION RESULTS ({n_eval} samples)")
print("=" * 50)
print(f"Base model   ROUGE-L: {base_rouge:.4f}")
print(f"Fine-tuned   ROUGE-L: {ft_rouge:.4f}")
print(f"Improvement:          {improvement:+.1f}%")
print(f"\nBest Perplexity (fine-tuned): {best_exp['perplexity']:.2f}")

# Metrics Table
metrics_df = pd.DataFrame({
    "Metric":      ["ROUGE-L", "ROUGE-L", "Perplexity"],
    "Model":       ["Base (no fine-tuning)", "Fine-tuned SproutBot", "Fine-tuned SproutBot"],
    "Score":       [f"{base_rouge:.4f}", f"{ft_rouge:.4f}", f"{best_exp['perplexity']:.2f}"],
    "Notes":       ["Pre-trained TinyLlama", "LoRA fine-tuned", "Lower = better"]
})
print("\n")
print(metrics_df.to_string(index=False))


📊 EVALUATION RESULTS (30 samples)
Base model   ROUGE-L: 0.1239
Fine-tuned   ROUGE-L: 0.1130
Improvement:          -8.8%

Best Perplexity (fine-tuned): 4.73


    Metric                 Model  Score                 Notes
   ROUGE-L Base (no fine-tuning) 0.1239 Pre-trained TinyLlama
   ROUGE-L  Fine-tuned SproutBot 0.1130       LoRA fine-tuned
Perplexity  Fine-tuned SproutBot   4.73        Lower = better
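ROUGE-L scores the longest common subsequence (LCS) of words shared by a reference and a generated answer, reported here as the F-measure of LCS precision and recall. The sketch below reimplements that definition in plain Python to make the metric concrete; it lowercases and keeps alphanumeric runs but, unlike the `use_stemmer=True` scorer above, applies no stemming, so its numbers can differ slightly from `rouge_score`. Two caveats apply to the table above: a correct answer worded differently from the single reference scores low, and both models are sampled (`do_sample=True`, `temperature=0.7`) without a fixed seed, so a gap measured on 30 samples may shift between runs. Calling `transformers.set_seed` before the loop, or using greedy decoding for evaluation, would make the comparison repeatable.

```python
import re

def _tokens(text: str) -> list:
    # Lowercase and keep runs of letters/digits, similar to rouge_score's default tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())

def lcs_length(a: list, b: list) -> int:
    # Dynamic programme over prefixes, keeping one row of the table at a time
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l_f1(reference: str, candidate: str) -> float:
    ref, cand = _tokens(reference), _tokens(candidate)
    lcs = lcs_length(ref, cand) if ref and cand else 0
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l_f1("The cat sat on the mat", "the cat on the mat"), 4))  # 0.9091
```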


In [14]:
# ── 5.4 Qualitative Comparison: Base vs Fine-tuned ────────────
test_questions = [
    "How do I treat black spot disease on my tomato plants?",
    "What is the best fertilizer for wheat cultivation?",
    "How often should I irrigate rice paddies during the growing season?",
    "What causes yellowing leaves in corn?",
]

# Out-of-domain question (model should respond appropriately / generically)
ood_questions = ["What is the capital of France?"]

print("\n" + "="*70)
print("QUALITATIVE COMPARISON: BASE vs FINE-TUNED")
print("="*70)

for q in test_questions + ood_questions:
    tag = "[IN-DOMAIN]" if q in test_questions else "[OUT-OF-DOMAIN]"
    print(f"\n{tag}\nQ: {q}")
    print(f"  BASE   : {generate_answer(base_only, q, max_new_tokens=80)}")
    print(f"  SPROUT : {generate_answer(ft_model,  q, max_new_tokens=80)}")


QUALITATIVE COMPARISON: BASE vs FINE-TUNED

[IN-DOMAIN]
Q: How do I treat black spot disease on my tomato plants?
  BASE   : Black spot is a fungal disease that affects the leaves and stems of tomato plants. It is caused by the fungus Sporothrix schenkenbergii, which grows in dark, moist environments. To treat black spot, you need to apply fungicide to your plants. Here are some common fungicides used for black spot control:

1. S
  SPROUT : Use fungicides that are labeled for use in tomatoes. These include neem oil, insecticidal soap, and copper-based fungicides. Follow the label instructions carefully when applying these products to ensure they are applied correctly and only on the affected leaves.

Best practices for managing black spot disease on tomato plants include:

1. Cleaning tools

[IN-DOMAIN]
Q: What is the best fertilizer for wheat cultivation?
  BASE   : The best fertilizer for wheat cultivation is a balanced combination of NPK (nitrogen, phosphorus, and potassium) ferti

---
## 6. Gradio Chat UI

SproutBot is deployed with a Gradio interface. Users type a question and receive an agriculture-specific answer. The UI includes:
- Chat history display
- Pre-loaded example questions
- Clear / submit buttons
- Public share link via `share=True` (works in both Colab and Kaggle; the `gradio.live` link expires after one week)

In [15]:
# ── 6. Gradio Chat Interface ──────────────────────────────────
def sproutbot_respond(message, history):
    """Gradio ChatInterface callback."""
    if not message.strip():
        return "Please enter a question about farming or agriculture."
    answer = generate_answer(ft_model, message, max_new_tokens=150)
    if not answer:
        answer = "I'm not sure about that. Please consult a local agriculture extension officer."
    return answer

EXAMPLES = [
    "What is the best fertilizer for wheat?",
    "How do I control aphids on my crops?",
    "What soil pH is ideal for growing maize?",
    "When should I plant rice in a tropical climate?",
    "How do I improve poor soil quality?",
]

demo = gr.ChatInterface(
    fn=sproutbot_respond,
    title="🌾 SproutBot: Agriculture QA Assistant",
    description=(
        "Ask SproutBot any question about **crops, pests, soil, irrigation, or fertilizers**. "
        "Fine-tuned with LoRA on TinyLlama-1.1B using ~1,700 agriculture QA pairs."
    ),
    examples=EXAMPLES,
    theme=gr.themes.Soft(),
)

print("🚀 Launching SproutBot...")
demo.launch(share=True, server_name="0.0.0.0", server_port=7860)

🚀 Launching SproutBot...
* Running on local URL:  http://0.0.0.0:7860
* Running on public URL: https://ae44b8130bcbb4ccd5.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [16]:
# ── 7. Package outputs ────────────────────────────────────────
import shutil

# Zip the best run's directory: the LoRA adapter, tokenizer files, and the
# Trainer's per-epoch checkpoint-* folders (they share the same output_dir)
shutil.make_archive("sproutbot_submission", "zip", BEST_DIR)
print("📦 sproutbot_submission.zip created: download it from the Kaggle output sidebar.")
print("\n🎉 COMPLETE! SproutBot is ready for submission.")

📦 sproutbot_submission.zip created: download it from the Kaggle output sidebar.

🎉 COMPLETE! SproutBot is ready for submission.
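`shutil.make_archive` zips everything under `BEST_DIR`, which also holds the Trainer's `checkpoint-*` folders (optimizer and scheduler state), because `run_experiment` uses the same directory for `output_dir` and `save_pretrained`. If only the adapter and tokenizer files are needed for submission, a filtered archive is smaller. The helper below, `zip_without_checkpoints`, is a hypothetical sketch using only the standard library.

```python
import os
import zipfile

def zip_without_checkpoints(src_dir: str, zip_path: str) -> list:
    """Zip src_dir, skipping Trainer checkpoint-* subfolders; return archived names.

    zip_path should live outside src_dir so the archive does not include itself.
    """
    archived = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # Pruning dirs in place stops os.walk from descending into checkpoints
            dirs[:] = [d for d in dirs if not d.startswith("checkpoint-")]
            for name in files:
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)
                archived.append(arcname)
    return archived

# Usage in the notebook (names assumed): zip_without_checkpoints(BEST_DIR, "sproutbot_adapter.zip")
```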
