# AgricGPT - Agricultural Domain Instruction Tuning with QLoRA

Fine-tunes **Microsoft Phi-2** on **AI4Agr/CROP-dataset** with:
- QLoRA (4-bit quantization)
- Train/Validation split with loss tracking
- Perplexity evaluation
- Model card and metrics pushed to HuggingFace

## 1. Install Dependencies

In [None]:
%pip install -q torch transformers datasets peft bitsandbytes accelerate huggingface_hub

## 2. Login to Hugging Face

In [None]:
from huggingface_hub import login
login()

## 3. Configuration

In [None]:
import torch
import math
import json

# Model
MODEL_NAME = "microsoft/phi-2"
OUTPUT_DIR = "./agri_model_results"

# Hugging Face Hub
HF_MODEL_NAME = "agricgpt-phi2"
PUSH_TO_HUB = True
SAVE_STEPS = 100

# Dataset
DATASET_SIZE = 5000
VALIDATION_SPLIT = 0.1
MAX_SEQ_LENGTH = 512

# LoRA
LORA_R = 16
LORA_ALPHA = 32
LORA_DROPOUT = 0.05
TARGET_MODULES = ["fc1", "fc2", "q_proj", "k_proj", "v_proj", "dense"]

# Training
NUM_EPOCHS = 3
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 2e-4
LOGGING_STEPS = 10
EVAL_STEPS = 50

assert torch.cuda.is_available(), "GPU required!"
print(f"GPU: {torch.cuda.get_device_name(0)}")
torch.manual_seed(42)
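
The constants above interact, and the arithmetic is easy to check by hand. The sketch below copies the values from this cell and works out the effective batch size and the approximate number of optimizer steps. It also checks that `SAVE_STEPS` is a round multiple of `EVAL_STEPS`, which `load_best_model_at_end=True` (used in section 9) requires. It assumes a single GPU and a dataset with at least `DATASET_SIZE` rows; the Trainer's exact rounding of steps can vary by library version, so treat the step counts as estimates.

```python
# Values copied from the configuration cell
DATASET_SIZE = 5000
VALIDATION_SPLIT = 0.1
NUM_EPOCHS = 3
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
SAVE_STEPS = 100
EVAL_STEPS = 50

# Assumption: single GPU and at least DATASET_SIZE rows in the dataset
eval_samples = round(DATASET_SIZE * VALIDATION_SPLIT)
train_samples = DATASET_SIZE - eval_samples

# Samples that contribute to one optimizer update
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Approximate counts; the Trainer may round slightly differently
steps_per_epoch = train_samples // effective_batch
total_steps = steps_per_epoch * NUM_EPOCHS

# Checkpoints must coincide with evaluations so the best one can be reloaded
save_aligned_with_eval = SAVE_STEPS % EVAL_STEPS == 0

print(f"Train/eval samples: {train_samples}/{eval_samples}")
print(f"Effective batch size: {effective_batch}")
print(f"About {steps_per_epoch} steps per epoch, {total_steps} in total")
print(f"Save steps aligned with eval steps: {save_aligned_with_eval}")
```

With these settings an evaluation runs roughly every 50 updates and a checkpoint is written at every second evaluation.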

## 4. Load Model

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map={"":0}
)
model.config.use_cache = False
print(f"Loaded: {MODEL_NAME}")

## 5. Test Questions (20 for Before/After Comparison)

In [None]:
TEST_QUESTIONS = [
    "What is crop rotation and why is it important?",
    "How do I know when my maize is ready for harvest?",
    "What are cover crops and how do they help?",
    "How should I prepare soil before planting?",
    "What is the best time to plant tomatoes?",
    "How can I control aphids naturally without chemicals?",
    "What causes leaf blight in potatoes?",
    "How do I prevent fungal diseases in my crops?",
    "What are the signs of pest infestation in stored grains?",
    "How can I manage weeds organically?",
    "How can I improve soil fertility naturally?",
    "What is the difference between organic and inorganic fertilizers?",
    "How do I test my soil pH?",
    "What nutrients do plants need most?",
    "How can I prevent soil erosion on my farm?",
    "What is drip irrigation and what are its benefits?",
    "How much water do vegetable crops need?",
    "How can I conserve water on my farm?",
    "What are the benefits of organic farming?",
    "How do I start a small vegetable garden?"
]
print(f"Defined {len(TEST_QUESTIONS)} test questions")

## 6. Base Model Evaluation (BEFORE Training)

In [None]:
from transformers import GenerationConfig, pipeline

base_pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
gen_config = GenerationConfig(
    max_new_tokens=150, do_sample=True, temperature=0.7,
    top_p=0.9, repetition_penalty=1.2,
    eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.eos_token_id
)

def generate_response(pipe, question):
    prompt = f"### Instruction:\n{question}\n\n### Response:\n"
    result = pipe(prompt, generation_config=gen_config)
    response = result[0]['generated_text'].split("### Response:\n")[-1]
    return response.split("### Instruction:")[0].strip()[:500]

print("=" * 60)
print("BASE MODEL RESPONSES (before training)")
print("=" * 60)

all_base_responses = []
for i, q in enumerate(TEST_QUESTIONS, 1):
    torch.manual_seed(42)
    response = generate_response(base_pipe, q)
    all_base_responses.append(response)
    if i <= 3:
        print(f"\nQ{i}: {q}")
        print(f"A: {response[:200]}...")

print(f"\nCollected {len(all_base_responses)} base responses")
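
The string handling inside `generate_response` can be tried without loading a model. The sketch below reproduces the prompt template and the two-stage split used above. The helper names `build_prompt` and `extract_response` and the raw output string are hypothetical, introduced only for illustration.

```python
PROMPT_TEMPLATE = "### Instruction:\n{question}\n\n### Response:\n"

def build_prompt(question: str) -> str:
    """Wrap a question in the instruction/response template."""
    return PROMPT_TEMPLATE.format(question=question)

def extract_response(generated_text: str, max_chars: int = 500) -> str:
    """Recover the answer from the pipeline's full generated text."""
    # Keep only what follows the last response marker
    response = generated_text.split("### Response:\n")[-1]
    # Cut off any follow-up instruction the model generated on its own
    return response.split("### Instruction:")[0].strip()[:max_chars]

# Hypothetical raw output, made up for illustration
raw = (
    build_prompt("What is crop rotation?")
    + "Growing different crops in sequence.\n\n### Instruction:\nWhat is mulch?"
)
answer = extract_response(raw)
print(answer)
```

Because the split keeps the text after the last `### Response:` marker, an output in which the model repeats the marker is reduced to its final answer segment.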

## 7. Prepare Dataset with Train/Validation Split

In [None]:
from datasets import load_dataset

dataset = load_dataset("AI4Agr/CROP-dataset", data_files="**/*_en/**/*.json", split="train")
if DATASET_SIZE:
    dataset = dataset.select(range(min(DATASET_SIZE, len(dataset))))

def format_instruction(sample):
    return {"text": f"### Instruction:\n{sample['instruction']}\n\n### Response:\n{sample['output']}{tokenizer.eos_token}"}

dataset = dataset.map(format_instruction)
dataset = dataset.train_test_split(test_size=VALIDATION_SPLIT, seed=42)
train_dataset, eval_dataset = dataset["train"], dataset["test"]

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=MAX_SEQ_LENGTH, padding="max_length")

tokenized_train = train_dataset.map(tokenize_function, batched=True, remove_columns=train_dataset.column_names)
tokenized_eval = eval_dataset.map(tokenize_function, batched=True, remove_columns=eval_dataset.column_names)

print(f"Train: {len(tokenized_train)}, Eval: {len(tokenized_eval)}")

## 8. Configure LoRA

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    r=LORA_R, lora_alpha=LORA_ALPHA, target_modules=TARGET_MODULES,
    lora_dropout=LORA_DROPOUT, bias="none", task_type="CAUSAL_LM"
)
model = get_peft_model(model, peft_config)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100*trainable/total:.2f}%)")
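
The trainable count printed above can be sanity-checked by hand: a LoRA adapter of rank `r` on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below applies that formula to the six target modules. The Phi-2 dimensions (hidden size 2560, MLP size 10240, 32 blocks) are assumptions written into the code; confirm them against `model.config` before relying on the result.

```python
LORA_R = 16
LORA_ALPHA = 32

# Assumed Phi-2 dimensions; verify against model.config
HIDDEN = 2560
INTERMEDIATE = 10240
NUM_LAYERS = 32

# (in_features, out_features) of each targeted linear layer in one block
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "dense": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, INTERMEDIATE),
    "fc2": (INTERMEDIATE, HIDDEN),
}

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights in one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

per_block = sum(lora_params(i, o, LORA_R) for i, o in target_shapes.values())
total_lora = per_block * NUM_LAYERS

# The adapter update is multiplied by alpha / r before being added
scaling = LORA_ALPHA / LORA_R

print(f"{total_lora:,} adapter parameters, scaling = {scaling}")
```

If the printed percentage looks larger than this estimate suggests, one possible cause is that 4-bit weights are stored packed, so `p.numel()` may undercount the quantized layers in `total`.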

## 9. Training with Validation Loss Tracking

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    learning_rate=LEARNING_RATE,
    logging_steps=LOGGING_STEPS,
    fp16=True,
    optim="paged_adamw_32bit",
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    eval_strategy="steps",
    eval_steps=EVAL_STEPS,
    save_strategy="steps",
    save_steps=SAVE_STEPS,
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    push_to_hub=PUSH_TO_HUB,
    hub_model_id=HF_MODEL_NAME if PUSH_TO_HUB else None,
    hub_strategy="every_save",
    report_to="none",
    seed=42
)

trainer = Trainer(
    model=model,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    args=training_args,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

print("Starting training...")
trainer.train()
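
One detail of this setup deserves a closer look. With `mlm=False`, `DataCollatorForLanguageModeling` copies `input_ids` into `labels` and sets every position equal to the pad token id to `-100`, which the loss ignores. Since this notebook sets `pad_token = eos_token`, the genuine end-of-sequence token is masked along with the padding. The pure-Python sketch below emulates that rule on toy ids; `make_labels` is a hypothetical stand-in for the collator and the token ids are illustrative.

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss skips

def make_labels(input_ids, pad_token_id):
    """Emulate the causal-LM collator rule: copy inputs, mask padding."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

EOS = 50256  # illustrative id standing in for tokenizer.eos_token_id
sequence = [101, 102, 103, EOS]   # toy tokens ending in a real EOS
padded = sequence + [EOS] * 3     # padding reuses EOS, as in this notebook

labels = make_labels(padded, pad_token_id=EOS)
print(labels)

# For comparison: a distinct pad id keeps the real EOS as a training target
PAD = 0  # hypothetical dedicated pad id
labels_with_pad = make_labels(sequence + [PAD] * 3, pad_token_id=PAD)
print(labels_with_pad)
```

A model trained this way receives no loss signal for emitting EOS and may be less reliable at stopping on its own. A dedicated pad token, or a collator that masks by attention mask instead of token id, are two common ways around this.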

## 10. Perplexity Calculation

In [None]:
def calculate_perplexity(model, tokenizer, texts, max_length=512):
    """Token-weighted perplexity of `model` over `texts`."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
            inputs = {k: v.to(model.device) for k, v in inputs.items()}
            # The loss is averaged over the shifted targets, so n tokens give n - 1 predictions
            n_predicted = inputs["input_ids"].size(1) - 1
            if n_predicted < 1:
                continue  # nothing to predict for empty or single-token texts
            outputs = model(**inputs, labels=inputs["input_ids"])
            total_loss += outputs.loss.item() * n_predicted
            total_tokens += n_predicted
    return math.exp(total_loss / total_tokens)

eval_texts = [s["text"] for s in eval_dataset]
perplexity = calculate_perplexity(model, tokenizer, eval_texts[:100])

print(f"\n{'='*50}")
print(f"PERPLEXITY: {perplexity:.2f}")
print(f"{'='*50}")
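
Perplexity is the exponential of the average negative log-likelihood per predicted token, so each text's mean loss must be weighted by its token count before averaging. The sketch below shows that aggregation on toy numbers, with no model involved. The function name `perplexity_from_nll` and the input values are hypothetical.

```python
import math

def perplexity_from_nll(sequences):
    """sequences: list of (mean_nll, n_predicted_tokens) pairs."""
    # Undo each per-text mean so long texts count proportionally more
    total_nll = sum(mean_nll * n for mean_nll, n in sequences)
    total_tokens = sum(n for _, n in sequences)
    return math.exp(total_nll / total_tokens)

# Toy values, made up for illustration
toy = [(2.0, 10), (1.0, 30)]
ppl = perplexity_from_nll(toy)
print(f"Token-weighted perplexity: {ppl:.4f}")

# Sanity check: guessing uniformly among 8 tokens gives perplexity 8
uniform_ppl = perplexity_from_nll([(math.log(8), 5)])
print(f"Uniform over 8 tokens: {uniform_ppl:.4f}")
```

The uniform case gives a useful reading of the number: a perplexity of `k` means the model is, on average, as uncertain as if it were choosing among `k` equally likely tokens.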

## 11. Fine-Tuned Model Evaluation (AFTER Training)

In [None]:
from transformers import logging
logging.set_verbosity(logging.CRITICAL)
model.eval()

ft_pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

print("=" * 60)
print("FINE-TUNED MODEL RESPONSES (after training)")
print("=" * 60)

all_ft_responses = []
for i, q in enumerate(TEST_QUESTIONS, 1):
    torch.manual_seed(42)
    response = generate_response(ft_pipe, q)
    all_ft_responses.append(response)
    if i <= 3:
        print(f"\nQ{i}: {q}")
        print(f"A: {response[:200]}...")

print(f"\nCollected {len(all_ft_responses)} fine-tuned responses")

## 12. Side-by-Side Comparison

In [None]:
print("=" * 80)
print("BEFORE vs AFTER COMPARISON")
print("=" * 80)

for i, (q, base_r, ft_r) in enumerate(zip(TEST_QUESTIONS, all_base_responses, all_ft_responses), 1):
    print(f"\n{'─'*80}")
    print(f"Q{i}: {q}")
    print(f"\n📌 BEFORE: {base_r[:150]}..." if len(base_r) > 150 else f"\n📌 BEFORE: {base_r}")
    print(f"\n✅ AFTER: {ft_r[:150]}..." if len(ft_r) > 150 else f"\n✅ AFTER: {ft_r}")

## 13. Evaluation Summary

In [None]:
history = trainer.state.log_history
train_losses = [(h['step'], h['loss']) for h in history if 'loss' in h and 'eval_loss' not in h]
eval_losses = [(h['step'], h['eval_loss']) for h in history if 'eval_loss' in h]

print("\n" + "=" * 60)
print("EVALUATION SUMMARY")
print("=" * 60)
print(f"\n📊 Dataset: {len(tokenized_train)} train, {len(tokenized_eval)} eval")
print(f"\n📉 Training Loss: {train_losses[0][1]:.4f} → {train_losses[-1][1]:.4f}")
print(f"📈 Validation Loss: {eval_losses[0][1]:.4f} → {min(e[1] for e in eval_losses):.4f} (best)")
print(f"🎯 Perplexity: {perplexity:.2f}")

# Save results
results = {
    "perplexity": perplexity,
    "train_samples": len(tokenized_train),
    "eval_samples": len(tokenized_eval),
    "final_train_loss": train_losses[-1][1] if train_losses else None,
    "best_eval_loss": min(e[1] for e in eval_losses) if eval_losses else None,
    "test_questions": TEST_QUESTIONS,
    "base_responses": all_base_responses,
    "finetuned_responses": all_ft_responses
}

import os
os.makedirs(OUTPUT_DIR, exist_ok=True)
with open(f"{OUTPUT_DIR}/evaluation_results.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"\nSaved to {OUTPUT_DIR}/evaluation_results.json")
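
The filtering of `trainer.state.log_history` relies on the shape of its entries: training logs carry a `loss` key, evaluation logs carry `eval_loss`, and the final summary entry carries `train_loss`. The sketch below shows the same filtering on a hand-written list. Every value in it is made up for illustration and is not a result from this notebook.

```python
# Hypothetical entries shaped like trainer.state.log_history; values are made up
history = [
    {"step": 10, "loss": 2.0, "learning_rate": 2e-4},
    {"step": 50, "eval_loss": 1.8},
    {"step": 60, "loss": 1.5, "learning_rate": 1e-4},
    {"step": 100, "eval_loss": 1.6},
    {"step": 100, "train_loss": 1.7, "train_runtime": 1.0},
]

# Training logs are the only entries with a plain "loss" key
train_losses = [(h["step"], h["loss"]) for h in history if "loss" in h]
# Evaluation logs are the only entries with "eval_loss"
eval_losses = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

# Lowest validation loss and the step at which it occurred
best_step, best_eval = min(eval_losses, key=lambda e: e[1])

print(f"Train loss points: {train_losses}")
print(f"Best eval loss {best_eval} at step {best_step}")
```

Tracking the step of the best evaluation alongside its value makes it easier to see which saved checkpoint `load_best_model_at_end` restored.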

## 14. Create Model Card & Push Metrics to HuggingFace

In [None]:
if PUSH_TO_HUB:
    from huggingface_hub import HfApi
    
    # Log metrics
    metrics = {
        "eval_loss": results["best_eval_loss"],
        "perplexity": round(perplexity, 2),
        "train_samples": results["train_samples"],
        "eval_samples": results["eval_samples"]
    }
    trainer.log_metrics("eval", metrics)
    trainer.save_metrics("eval", metrics)
    
    # Create model card
    model_card = f"""---
language:
- en
license: apache-2.0
tags:
- agriculture
- phi-2
- qlora
- instruction-tuning
datasets:
- AI4Agr/CROP-dataset
base_model: microsoft/phi-2
model-index:
- name: AgricGPT-Phi2
  results:
  - task:
      type: text-generation
      name: Agricultural Q&A
    metrics:
    - type: perplexity
      value: {perplexity:.2f}
    - type: loss
      value: {results['best_eval_loss']:.4f}
---

# AgricGPT - Agricultural Domain Language Model

Fine-tuned **Microsoft Phi-2** for agricultural Q&A using QLoRA.

## Evaluation Results

| Metric | Value |
|--------|-------|
| **Perplexity** | {perplexity:.2f} |
| **Validation Loss** | {results['best_eval_loss']:.4f} |
| **Training Samples** | {results['train_samples']:,} |
| **Validation Samples** | {results['eval_samples']:,} |

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "{HfApi().whoami()['name']}/{HF_MODEL_NAME}")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

prompt = "### Instruction:\\nWhat is crop rotation?\\n\\n### Response:\\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```

## Example

**Q: {TEST_QUESTIONS[0]}**
> {all_ft_responses[0][:300]}...
"""
    
    with open(f"{OUTPUT_DIR}/README.md", "w") as f:
        f.write(model_card)
    
    # Push everything
    print("Pushing model with metrics to HuggingFace...")
    trainer.push_to_hub()
    
    api = HfApi()
    username = api.whoami()['name']
    api.upload_file(
        path_or_fileobj=f"{OUTPUT_DIR}/README.md",
        path_in_repo="README.md",
        repo_id=f"{username}/{HF_MODEL_NAME}",
        repo_type="model"
    )
    
    print(f"\n✅ Done! View at: https://huggingface.co/{username}/{HF_MODEL_NAME}")
    print("Metrics are now visible on your model card!")