In [4]:
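import torch
# transformers.AdamW is deprecated; the library's own warning recommends torch.optim.AdamW instead.
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from torch.utils.data import Dataset, DataLoader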
from sklearn.model_selection import train_test_split
import numpy as np

# Define the model name
MODEL_NAME = "recobo/agriculture-bert-uncased"

# Define our agricultural question-answering dataset
agriculture_qa = [
    {
        "question": "What is crop rotation?",
        "answers": [
            "Crop rotation is the practice of planting different crops sequentially on the same plot of land to improve soil health, optimize nutrients, and combat pest and weed pressure.",
            "Crop rotation is a method of watering crops by rotating sprinklers.",
            "Crop rotation refers to the rotation of farm equipment to different fields."
        ],
        "correct_index": 0
    },
    {
        "question": "What is the importance of soil pH in agriculture?",
        "answers": [
            "Soil pH is only important for decorative gardens, not for agriculture.",
            "Soil pH affects nutrient availability, microbial activity, and overall plant health, making it crucial for optimal crop growth and yield.",
            "Soil pH is a measure of how much water is in the soil."
        ],
        "correct_index": 1
    },
    {
        "question": "What is sustainable agriculture?",
        "answers": [
            "Sustainable agriculture focuses on producing the highest yields possible, regardless of environmental impact.",
            "Sustainable agriculture is a farming approach that focuses solely on organic production methods.",
            "Sustainable agriculture is an integrated system of plant and animal production practices that will satisfy human food needs, enhance environmental quality, and make the most efficient use of non-renewable resources."
        ],
        "correct_index": 2
    },
    {
        "question": "What is precision agriculture?",
        "answers": [
            "Precision agriculture is a farming management concept using digital techniques to monitor and optimize agricultural production processes.",
            "Precision agriculture refers to the precise measurement of crop yields after harvest.",
            "Precision agriculture is a method of hand-picking crops to ensure quality."
        ],
        "correct_index": 0
    },
    {
        "question": "What is the role of cover crops in agriculture?",
        "answers": [
            "Cover crops are purely decorative and serve no practical purpose in agriculture.",
            "Cover crops are plants seeded in the off-season to protect and enrich the soil, prevent erosion, improve water retention, and enhance biodiversity.",
            "Cover crops are tarps used to protect harvested crops from rain."
        ],
        "correct_index": 1
    },
    {
        "question": "What is organic farming?",
        "answers": [
            "Organic farming avoids the use of synthetic pesticides and fertilizers and relies on natural processes to manage soil and crops.",
            "Organic farming involves the use of genetically modified organisms to increase yields.",
            "Organic farming is a method that relies on heavy machinery to reduce labor."
        ],
        "correct_index": 0
    },
    {
        "question": "What is the purpose of irrigation in agriculture?",
        "answers": [
            "Irrigation is used to apply water to crops to support growth during dry periods.",
            "Irrigation is a method of harvesting crops without manual labor.",
            "Irrigation is a system for delivering nutrients directly to plant roots."
        ],
        "correct_index": 0
    },
    {
        "question": "What is agroforestry?",
        "answers": [
            "Agroforestry is the cultivation of trees and shrubs along with crops or livestock to enhance biodiversity and sustainability.",
            "Agroforestry is the practice of planting only fruit trees in large orchards.",
            "Agroforestry is the removal of trees to increase farmland."
        ],
        "correct_index": 0
    },
    {
        "question": "Why is pollination important in agriculture?",
        "answers": [
            "Pollination is important because it helps plants reproduce by transferring pollen from one flower to another, leading to the production of fruits and seeds.",
            "Pollination is used to protect crops from pests.",
            "Pollination only affects flowering plants, not crops."
        ],
        "correct_index": 0
    },
    {
        "question": "What is the purpose of composting in farming?",
        "answers": [
            "Composting helps recycle organic waste into nutrient-rich soil amendments, improving soil structure and fertility.",
            "Composting is used to store extra crops for the winter.",
            "Composting is a method of reducing water usage in farming."
        ],
        "correct_index": 0
    },
    {
        "question": "What is monoculture farming?",
        "answers": [
            "Monoculture farming involves growing only one type of crop over a large area, which can deplete soil nutrients and increase pest vulnerability.",
            "Monoculture farming is the practice of growing multiple crops in the same field at the same time.",
            "Monoculture farming refers to using animals to farm crops."
        ],
        "correct_index": 0
    },
    {
        "question": "What is integrated pest management (IPM)?",
        "answers": [
            "Integrated pest management (IPM) is an approach to pest control that combines biological, cultural, and chemical practices to minimize pest damage while reducing environmental impact.",
            "IPM is a method that uses only chemical pesticides to control pests.",
            "IPM refers to the use of genetic modification to eliminate pests."
        ],
        "correct_index": 0
    },
    {
        "question": "What is permaculture?",
        "answers": [
            "Permaculture is a sustainable design system that seeks to mimic natural ecosystems to produce food and maintain ecological balance.",
            "Permaculture is a method of producing only organic food.",
            "Permaculture is the exclusive cultivation of herbs and spices."
        ],
        "correct_index": 0
    },
    {
        "question": "What are genetically modified organisms (GMOs)?",
        "answers": [
            "Genetically modified organisms (GMOs) are crops that have been altered through biotechnology to exhibit desired traits, such as pest resistance or higher yields.",
            "GMOs are plants that are grown using only organic farming methods.",
            "GMOs are plants that naturally mutate over time."
        ],
        "correct_index": 0
    },
    {
        "question": "Why is biodiversity important in agriculture?",
        "answers": [
            "Biodiversity helps create resilient ecosystems by providing a variety of plants and animals that contribute to pest control, pollination, and soil fertility.",
            "Biodiversity leads to fewer crop varieties, which simplifies farming.",
            "Biodiversity is not important for agriculture, as only one crop type is needed."
        ],
        "correct_index": 0
    },
    {
        "question": "What is no-till farming?",
        "answers": [
            "No-till farming involves growing crops without disturbing the soil through plowing, which helps preserve soil structure and reduce erosion.",
            "No-till farming refers to using machinery to plant crops without human labor.",
            "No-till farming is the method of planting crops in water rather than soil."
        ],
        "correct_index": 0
    },
    {
        "question": "What is the role of nitrogen in agriculture?",
        "answers": [
            "Nitrogen is an essential nutrient for plant growth, playing a key role in photosynthesis and the production of chlorophyll.",
            "Nitrogen is only needed for specific crops like rice and corn.",
            "Nitrogen is a harmful gas that needs to be removed from the soil."
        ],
        "correct_index": 0
    },
    {
        "question": "What is contour farming?",
        "answers": [
            "Contour farming involves planting crops along the contours of a slope to reduce soil erosion and conserve water.",
            "Contour farming is the method of growing crops in perfect circular patterns.",
            "Contour farming refers to planting crops based on weather patterns."
        ],
        "correct_index": 0
    },
    {
        "question": "What is crop diversification?",
        "answers": [
            "Crop diversification is the practice of growing a variety of crops in the same area to improve soil health and reduce the risk of crop failure.",
            "Crop diversification means alternating crops every year.",
            "Crop diversification refers to mixing crops with livestock."
        ],
        "correct_index": 0
    },
    {
        "question": "What are cover crops used for?",
        "answers": [
            "Cover crops are grown to protect soil from erosion, improve soil fertility, and suppress weeds during off-seasons.",
            "Cover crops are used to cover harvested crops to protect them from rain.",
            "Cover crops are grown to attract pollinators."
        ],
        "correct_index": 0
    },
    {
        "question": "What is hydroponics?",
        "answers": [
            "Hydroponics is a method of growing plants without soil, using nutrient-rich water solutions instead.",
            "Hydroponics is the practice of growing crops in sand or gravel.",
            "Hydroponics refers to farming near bodies of water."
        ],
        "correct_index": 0
    },

]
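
The dataset class in the next cell assumes every entry has exactly three answers and a `correct_index` that points at one of them. A minimal sketch of a sanity check for that assumption follows; `validate_qa_items` and the `sample` list are hypothetical names introduced here for illustration, not part of the original notebook.

```python
def validate_qa_items(qa_list, answers_per_question=3):
    """Return a list of human-readable problems found in qa_list (empty if none)."""
    problems = []
    for i, item in enumerate(qa_list):
        missing = {"question", "answers", "correct_index"} - set(item)
        if missing:
            problems.append(f"item {i}: missing keys {sorted(missing)}")
            continue
        if len(item["answers"]) != answers_per_question:
            problems.append(
                f"item {i}: expected {answers_per_question} answers, got {len(item['answers'])}"
            )
        if not 0 <= item["correct_index"] < len(item["answers"]):
            problems.append(f"item {i}: correct_index {item['correct_index']} out of range")
    return problems


# Hypothetical sample: one well-formed entry and one deliberately broken entry.
sample = [
    {"question": "What is hydroponics?", "answers": ["a", "b", "c"], "correct_index": 0},
    {"question": "Broken item", "answers": ["a", "b"], "correct_index": 2},
]
problems = validate_qa_items(sample)
for p in problems:
    print(p)
```

Running the same check on `agriculture_qa` before training would catch a malformed entry early, rather than as an `IndexError` inside the data loader.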



In [5]:
class AgricultureQADataset(Dataset):
    def __init__(self, qa_list, tokenizer, max_length=512):
        self.qa_list = qa_list
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.qa_list) * 3  # 3 answers per question

    def __getitem__(self, idx):
        qa_item = self.qa_list[idx // 3]
        answer_idx = idx % 3

        question = qa_item['question']
        answer = qa_item['answers'][answer_idx]
        label = 1 if answer_idx == qa_item['correct_index'] else 0

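        # Call the tokenizer directly (the current API; encode_plus is the older entry point).
        encoding = self.tokenizer(
            question,
            answer,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            # Keep segment IDs so training sees the same inputs that select_answer passes at inference.
            'token_type_ids': encoding['token_type_ids'].flatten(),
            'label': torch.tensor(label, dtype=torch.long)
        }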
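
The class above flattens each question into three (question, answer) pairs and labels a pair 1 only when the answer is the correct one. The index arithmetic can be checked without a tokenizer. This is a minimal sketch, where `pair_for_index` and the `toy` data are hypothetical names used only for illustration.

```python
def pair_for_index(qa_list, idx, answers_per_question=3):
    """Map a flat dataset index to (question, answer, label) without tokenizing."""
    qa_item = qa_list[idx // answers_per_question]   # which question
    answer_idx = idx % answers_per_question          # which candidate answer
    label = 1 if answer_idx == qa_item["correct_index"] else 0
    return qa_item["question"], qa_item["answers"][answer_idx], label


# Toy data with placeholder strings, not taken from agriculture_qa.
toy = [
    {"question": "Q0", "answers": ["a0", "a1", "a2"], "correct_index": 0},
    {"question": "Q1", "answers": ["b0", "b1", "b2"], "correct_index": 1},
]
pairs = [pair_for_index(toy, idx) for idx in range(len(toy) * 3)]
for idx, pair in enumerate(pairs):
    print(idx, pair)
```

Each question contributes exactly one positive and two negative pairs, so the resulting binary classification task has a 1:2 class ratio.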


In [6]:
def finetune_model(qa_list, epochs=5, batch_size=8):
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

    train_data, val_data = train_test_split(qa_list, test_size=0.2, random_state=42)

    train_dataset = AgricultureQADataset(train_data, tokenizer)
    val_dataset = AgricultureQADataset(val_data, tokenizer)

    train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_dataloader = DataLoader(val_dataset, batch_size=batch_size)

    optimizer = AdamW(model.parameters(), lr=2e-5)

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)

    for epoch in range(epochs):
        model.train()
        for batch in train_dataloader:
            optimizer.zero_grad()
            inputs = {k: v.to(device) for k, v in batch.items() if k != 'label'}
            labels = batch['label'].to(device)
            outputs = model(**inputs, labels=labels)
            loss = outputs.loss
            loss.backward()
            optimizer.step()

        model.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for batch in val_dataloader:
                inputs = {k: v.to(device) for k, v in batch.items() if k != 'label'}
                labels = batch['label'].to(device)
                outputs = model(**inputs)
                _, predicted = torch.max(outputs.logits, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        print(f"Epoch {epoch+1}/{epochs}")
        print(f"Validation Accuracy: {100 * correct / total:.2f}%")

    return model, tokenizer
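
The validation accuracy printed inside `finetune_model` is computed per (question, answer) pair, not per question. Because two of every three pairs are negatives, a model that always predicts label 0 already scores two thirds. The sketch below computes that baseline; `majority_baseline_accuracy` is a hypothetical helper, and the figure of 5 validation questions assumes scikit-learn rounds `0.2 * 21` up when sizing the test split.

```python
def majority_baseline_accuracy(num_questions, answers_per_question=3):
    """Pairwise accuracy of always predicting label 0 (one correct answer per question)."""
    total_pairs = num_questions * answers_per_question
    negatives = total_pairs - num_questions
    return negatives / total_pairs


# Assumed validation size: 5 questions, i.e. 15 (question, answer) pairs.
baseline = majority_baseline_accuracy(5)
print(f"Majority-class baseline: {100 * baseline:.2f}%")
```

A pairwise accuracy at this level therefore does not show that the model has learned anything yet; question-level accuracy (does the top-scored answer match `correct_index`?) is the more informative number.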


In [7]:
def select_answer(model, tokenizer, qa_item):
    question = qa_item["question"]
    candidate_answers = qa_item["answers"]

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    model.eval()

    scores = []
    for answer in candidate_answers:
        # Call the tokenizer directly (the current API; encode_plus is the older entry point).
        inputs = tokenizer(
            question,
            answer,
            add_special_tokens=True,
            max_length=512,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        inputs = {k: v.to(device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = model(**inputs)
            logits = outputs.logits
            score = torch.softmax(logits, dim=1)[0][1].item()  # Probability of being the correct answer
        scores.append(score)

    # Cast to plain Python types so callers do not receive numpy scalars.
    best_answer_idx = int(np.argmax(scores))
    is_correct = best_answer_idx == qa_item["correct_index"]
    return candidate_answers[best_answer_idx], scores[best_answer_idx], is_correct
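
`select_answer` scores each candidate independently: it takes the two-class logits for one (question, answer) pair, applies softmax, keeps the probability of class 1, and then picks the candidate with the highest such probability. Here is a minimal sketch of that scoring step in plain Python, using made-up logits rather than real model output; `softmax` and `pick_best` are hypothetical helpers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_best(candidate_logits):
    """candidate_logits holds one [logit_class0, logit_class1] pair per candidate answer."""
    scores = [softmax(pair)[1] for pair in candidate_logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Illustrative logits for three candidate answers (invented values).
best, scores = pick_best([[-1.0, 2.0], [1.5, -0.5], [0.2, 0.1]])
print(best, [round(s, 3) for s in scores])
```

Because each softmax runs over the two classes of a single pair, the per-candidate scores do not sum to 1 across candidates. The reported "confidence" is the model's probability that the chosen pair is correct, not a distribution over the three answers.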

In [8]:
def evaluate_agriculture_qa(model, tokenizer, qa_list):
    correct = 0
    total = len(qa_list)

    for qa_item in qa_list:
        best_answer, confidence, is_correct = select_answer(model, tokenizer, qa_item)

        print(f"Question: {qa_item['question']}")
        print(f"Model's answer: {best_answer}")
        print(f"Confidence: {confidence:.4f}")
        print(f"Correct: {'Yes' if is_correct else 'No'}")
        print("--------------------")

        if is_correct:
            correct += 1

    accuracy = correct / total
    print(f"Overall accuracy: {accuracy:.2f}")

    return accuracy

# Fine-tune the model
finetuned_model, tokenizer = finetune_model(agriculture_qa)

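# Evaluate on agriculture-specific questions.
# Note: agriculture_qa includes the questions the model was trained on,
# so this accuracy is optimistic and is not a held-out estimate.
accuracy = evaluate_agriculture_qa(finetuned_model, tokenizer, agriculture_qa)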




Some weights of BertForSequenceClassification were not initialized from the model checkpoint at recobo/agriculture-bert-uncased and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/5
Validation Accuracy: 66.67%
Epoch 2/5
Validation Accuracy: 66.67%
Epoch 3/5
Validation Accuracy: 73.33%
Epoch 4/5
Validation Accuracy: 93.33%
Epoch 5/5
Validation Accuracy: 100.00%
Question: What is crop rotation?
Model's answer: Crop rotation is the practice of planting different crops sequentially on the same plot of land to improve soil health, optimize nutrients, and combat pest and weed pressure.
Confidence: 0.9284
Correct: Yes
--------------------
Question: What is the importance of soil pH in agriculture?
Model's answer: Soil pH affects nutrient availability, microbial activity, and overall plant health, making it crucial for optimal crop growth and yield.
Confidence: 0.8695
Correct: Yes
--------------------
Question: What is sustainable agriculture?
Model's answer: Sustainable agriculture is an integrated system of plant and animal production practices that will satisfy human food needs, enhance environmental quality, and make the most efficient use of non-renewable r