1. Problem Understanding

Farmers need quick, reliable, context-aware agricultural advice.
Queries may arrive in many languages (English, Hindi, and regional languages).


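2. Language Detection and Translation
Detect the language of the farmer's query.
Translate it to English if needed, so that a single English dataset and model can serve every language, then translate the answer back into the farmer's language.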
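`langdetect`, used below, is probabilistic. It can return different codes for the same text across runs unless `DetectorFactory.seed` is fixed, and it often misclassifies short queries. A cheap cross-check, which is not part of the original pipeline, is to look at which Unicode script the query is written in. The sketch below relies on a hand-written `SCRIPT_TO_LANGS` table. That table is hypothetical and covers only the scripts that appear in this notebook. Devanagari cannot distinguish Hindi from Marathi, so for that script the function returns both candidates.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from Unicode script name to candidate language codes.
# Devanagari is shared by Hindi and Marathi, so it maps to more than one code.
SCRIPT_TO_LANGS = {
    "DEVANAGARI": ["hi", "mr"],
    "TAMIL": ["ta"],
    "GURMUKHI": ["pa"],
    "BENGALI": ["bn"],
    "TELUGU": ["te"],
    "LATIN": ["en", "fr"],
}

def dominant_script(text):
    """Return the most frequent script name among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():  # skip digits, punctuation, spaces and vowel-sign marks
            continue
        # unicodedata.name gives e.g. 'DEVANAGARI LETTER DHA'; the first word is the script
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

def candidate_languages(text):
    """Return candidate language codes for text based on its dominant script."""
    return SCRIPT_TO_LANGS.get(dominant_script(text), [])
```

When `candidate_languages` returns exactly one code, that code could be passed as `src=` to the translator. Otherwise, the `langdetect` result is still needed.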


3. Query Handling (Two Paths)

For FAQs:
 → Directly query a Large Language Model (LLM), such as GPT-3.5 or an agriculture fine-tuned model, to answer common questions.

For Complex/Research Questions:
 → Use RAG (Retrieval-Augmented Generation):
 search agricultural research papers, retrieve the relevant sections, then pass them to the LLM as context to generate the answer.
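The two paths can be sketched as a router. The router answers from the FAQ set when a stored question is similar enough to the query, and falls back to the RAG path otherwise. This is a minimal illustration that rests on three assumptions:

- Bag-of-words cosine similarity stands in for TF-IDF or embeddings.
- The 0.5 threshold is arbitrary.
- `rag_answer` is a hypothetical callable that stands in for the retrieve-then-generate step.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; a stand-in for TF-IDF or embedding vectors."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def route_query(query, faq_pairs, rag_answer, threshold=0.5):
    """Answer from the FAQ list if a close enough question exists, else use the RAG path.

    faq_pairs: list of (question, answer) tuples.
    rag_answer: hypothetical callable taking the query and returning an answer.
    Returns (path_name, answer).
    """
    q_vec = bag_of_words(query)
    best_score, best_answer = 0.0, None
    for question, answer in faq_pairs:
        score = cosine(q_vec, bag_of_words(question))
        if score > best_score:
            best_score, best_answer = score, answer
    if best_answer is not None and best_score >= threshold:
        return "faq", best_answer
    return "rag", rag_answer(query)
```

Because the threshold is the only switch between the two paths, it should be tuned on real farmer queries rather than fixed in advance.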

# PART 1

# Using NLP

In [13]:
# Step 1: Install required libraries
!pip install googletrans==4.0.0-rc1 langdetect

# Step 2: Import libraries
import pandas as pd
from googletrans import Translator
from langdetect import detect, DetectorFactory
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DetectorFactory.seed = 0  # langdetect is non-deterministic unless the seed is fixed

# Step 3: Load Dataset
print("\n[INFO] Loading dataset...")
dataset = pd.read_csv("/kaggle/input/agronomic-question-and-answer-dataset/AgroQA Dataset.csv")
print("[INFO] Dataset loaded successfully!")

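# Step 4: Basic Text Cleaning
print("\n[INFO] Cleaning dataset (lowercasing)...")
# Drop rows with a missing question or answer (TF-IDF cannot vectorize NaN) and renumber rows
dataset = dataset.dropna(subset=['Question', 'Answer']).reset_index(drop=True)
dataset['Question'] = dataset['Question'].str.lower()
dataset['Answer'] = dataset['Answer'].str.lower()
print("[INFO] Cleaning done!")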

# Step 5: Create TF-IDF Vectorizer
print("\n[INFO] Creating TF-IDF vectors...")
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(dataset['Question'])
print("[INFO] TF-IDF vectors created!")

# Step 6: Helper Functions
translator = Translator()

def detect_and_translate_to_english(query):
    """Detect language and translate query to English if needed"""
    print("\n[STEP] Detecting language of user query...")
    detected_language = detect(query)
    print(f"[INFO] Detected Language: {detected_language}")
    if detected_language != 'en':
        print("[STEP] Converting query to English...")
        translated_query = translator.translate(query, src=detected_language, dest='en').text
        print(f"[INFO] Translated Query: {translated_query}")
        return translated_query, detected_language
    else:
        print("[INFO] Query is already in English.")
        return query, 'en'

def translate_answer(answer, target_lang):
    """Translate the answer back to user's original language"""
    if target_lang != 'en':
        print("\n[STEP] Translating answer back to user's original language...")
        translated_answer = translator.translate(answer, dest=target_lang).text
        print(f"[INFO] Translated Answer: {translated_answer}")
        return translated_answer
    else:
        print("\n[INFO] Answer already in English. No translation needed.")
        return answer

def get_best_answer(query):
    """Find the best matching answer from dataset"""
    print("\n[STEP] Fetching best answer from dataset...")
    query_vector = vectorizer.transform([query.lower()])
    cosine_sim = cosine_similarity(query_vector, X)
    index = cosine_sim.argmax()
    best_answer = dataset['Answer'].iloc[index]  # positional lookup of the best row
    print("[INFO] Best matching answer fetched.")
    return best_answer

# Step 7: Main Function
def qa_system(user_query):
    print("\n========================================")
    print("[NEW QUERY RECEIVED]")
    print("========================================")
    
    # Step 7.1: Detect and Translate User Query to English
    english_query, user_language = detect_and_translate_to_english(user_query)
    
    # Step 7.2: Find best matching answer
    answer_in_english = get_best_answer(english_query)
    print(f"[INFO] Answer from Dataset (in English): {answer_in_english}")
    
    # Step 7.3: Translate Answer back to user's original language
    final_answer = translate_answer(answer_in_english, user_language)
    
    print("\n[FINAL OUTPUT]")
    print("========================================")
    return final_answer

# Step 8: Example Usage
user_query = "धान के पौधों में पानी की सही मात्रा कैसे सुनिश्चित करें?"
final_response = qa_system(user_query)
print(final_response)


[INFO] Loading dataset...
[INFO] Dataset loaded successfully!

[INFO] Cleaning dataset (lowercasing)...
[INFO] Cleaning done!

[INFO] Creating TF-IDF vectors...
[INFO] TF-IDF vectors created!

[NEW QUERY RECEIVED]

[STEP] Detecting language of user query...
[INFO] Detected Language: hi
[STEP] Converting query to English...
[INFO] Translated Query: How to ensure the correct amount of water in paddy plants?

[STEP] Fetching best answer from dataset...
[INFO] Best matching answer fetched.
[INFO] Answer from Dataset (in English): 1 by 1 meters is the suitable spacing

[STEP] Translating answer back to user's original language...
[INFO] Translated Answer: 1 से 1 मीटर उपयुक्त रिक्ति है

[FINAL OUTPUT]
1 से 1 मीटर उपयुक्त रिक्ति है
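`get_best_answer` always returns the row with the highest cosine similarity, even when that similarity is low or zero. For that reason it helps to inspect the score as well as the answer.

To make the scoring concrete, here is a from-scratch version of the TF-IDF weighting that `TfidfVectorizer` uses by default:

- raw term counts;
- smoothed IDF, `ln((1 + n) / (1 + df)) + 1`;
- L2-normalized rows;
- the default token pattern.

It is an illustrative sketch. It omits the `stop_words='english'` filtering used in the cell above, and `best_match` is a hypothetical helper name.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # TfidfVectorizer's default token pattern

def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as with smooth_idf=True."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(TOKEN.findall(doc.lower())))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(text, idf):
    """Raw term counts times IDF, then L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in TOKEN.findall(text.lower()) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def best_match(query, docs):
    """Return (index, cosine score) of the most similar document.

    Rows are unit vectors, so the cosine similarity is just a dot product.
    Like argmax, this returns index 0 even when every score is zero.
    """
    idf = fit_idf(docs)
    q = tfidf_vector(query, idf)
    scores = []
    for doc in docs:
        d = tfidf_vector(doc, idf)
        scores.append(sum((q[t] * d.get(t, 0.0) for t in q), 0.0))
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]
```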


### Examples in multiple languages

In [14]:
# Example in Tamil
user_query = "எப்படி நெல் பயிர்களில் புழுக்கள் கட்டுப்படுத்த வேண்டும்?"
final_response = qa_system(user_query)
print(final_response)


[NEW QUERY RECEIVED]

[STEP] Detecting language of user query...
[INFO] Detected Language: ta
[STEP] Converting query to English...
[INFO] Translated Query: How to control worms in paddy crops?

[STEP] Fetching best answer from dataset...
[INFO] Best matching answer fetched.
[INFO] Answer from Dataset (in English): larvae of some moths

[STEP] Translating answer back to user's original language...
[INFO] Translated Answer: சில அந்துப்பூச்சிகளின் லார்வாக்கள்

[FINAL OUTPUT]
சில அந்துப்பூச்சிகளின் லார்வாக்கள்


In [15]:
# Example in Marathi
user_query = "तांदळाच्या पिकांमध्ये पाणी योग्य प्रमाणात कसे द्यावे?"
final_response = qa_system(user_query)
print(final_response)


[NEW QUERY RECEIVED]

[STEP] Detecting language of user query...
[INFO] Detected Language: mr
[STEP] Converting query to English...
[INFO] Translated Query: How to give water properly in rice crops?

[STEP] Fetching best answer from dataset...
[INFO] Best matching answer fetched.
[INFO] Answer from Dataset (in English): usually in the evenings when the evaporation is low due to heat

[STEP] Translating answer back to user's original language...
[INFO] Translated Answer: सहसा संध्याकाळी जेव्हा उष्णतेमुळे बाष्पीभवन कमी होते

[FINAL OUTPUT]
सहसा संध्याकाळी जेव्हा उष्णतेमुळे बाष्पीभवन कमी होते


In [16]:
# Example in Punjabi
user_query = "ਮੈਨੂੰ ਆਪਣੇ ਚੌਲਾਂ ਦੀਆਂ ਫ਼ਸਲਾਂ ਵਿੱਚ ਕੀੜਿਆਂ ਨੂੰ ਕਿਵੇਂ ਨਿਯੰਤਰਤ ਕਰਨਾ ਚਾਹੀਦਾ ਹੈ?"
final_response = qa_system(user_query)
print(final_response)


[NEW QUERY RECEIVED]

[STEP] Detecting language of user query...
[INFO] Detected Language: pa
[STEP] Converting query to English...
[INFO] Translated Query: How should I control pests in my rice crops?

[STEP] Fetching best answer from dataset...
[INFO] Best matching answer fetched.
[INFO] Answer from Dataset (in English): use resistant varieties and increase on water availability for crop vigor

[STEP] Translating answer back to user's original language...
[INFO] Translated Answer: ਰੋਧਕ ਕਿਸਮਾਂ ਦੀ ਵਰਤੋਂ ਕਰੋ ਅਤੇ ਫਸਲਾਂ ਦੇ ਜੋਸ਼ ਲਈ ਪਾਣੀ ਦੀ ਉਪਲਬਧਤਾ 'ਤੇ ਵਾਧਾ ਕਰੋ

[FINAL OUTPUT]
ਰੋਧਕ ਕਿਸਮਾਂ ਦੀ ਵਰਤੋਂ ਕਰੋ ਅਤੇ ਫਸਲਾਂ ਦੇ ਜੋਸ਼ ਲਈ ਪਾਣੀ ਦੀ ਉਪਲਬਧਤਾ 'ਤੇ ਵਾਧਾ ਕਰੋ


### Multiple queries in NLP

In [17]:
queries = [
    "मुझे अपने टमाटर के पौधों के लिए उर्वरक की आवश्यकता है।",  # Hindi
    "माझ्या गव्हाच्या पिकाला पाणी कधी द्यावे?",                # Marathi
    "எப்படி எனது நெல் பயிர்களை பூச்சி இருந்து பாதுகாப்பது?",  # Tamil
    "আমি আমার ধানের ফসলের পোকামাকড় কিভাবে নিয়ন্ত্রণ করব?",   # Bengali
    "ਮੈਨੂੰ ਆਪਣੇ ਚੌਲਾਂ ਦੀਆਂ ਫ਼ਸਲਾਂ ਵਿੱਚ ਕੀੜਿਆਂ ਨੂੰ ਕਿਵੇਂ ਨਿਯੰਤਰਤ ਕਰਨਾ ਚਾਹੀਦਾ ਹੈ?"  # Punjabi
]

In [18]:
# Quiet variant of get_best_answer (same TF-IDF lookup, no log lines) used by the loop below
def get_answer(query):
    query_vector = vectorizer.transform([query.lower()])
    index = cosine_similarity(query_vector, X).argmax()
    return dataset['Answer'].iloc[index]

for query in queries:
    print("="*50)
    print(f"Original Query: {query}")
    
    # Detect language
    detected_language = detect(query)
    print(f"Detected Language: {detected_language}")
    
    # Translate to English if needed (reuses the Translator created in Step 6)
    if detected_language != 'en':
        translated_query = translator.translate(query, src=detected_language, dest='en').text
        print(f"Translating to English: {translated_query}")
    else:
        translated_query = query
        print("Query is already in English.")
    
    # Get answer from dataset
    answer = get_answer(translated_query)
    print(f"Answer from dataset (in English): {answer}")
    
    # Translate answer back to original language
    if detected_language != 'en':
        translated_answer = translator.translate(answer, dest=detected_language).text
        print(f"Translated Answer (in user's language): {translated_answer}")
    else:
        print(f"Answer (already in English): {answer}")

Original Query: मुझे अपने टमाटर के पौधों के लिए उर्वरक की आवश्यकता है।
Detected Language: hi
Translating to English: I need fertilizer for my tomato plants.
Answer from dataset (in English): not always but if the soil is limited it is advised to apply
Translated Answer (in user's language): हमेशा नहीं, लेकिन अगर मिट्टी सीमित है तो इसे लागू करने की सलाह दी जाती है
Original Query: माझ्या गव्हाच्या पिकाला पाणी कधी द्यावे?
Detected Language: mr
Translating to English: When to give water to my wheat crop?
Answer from dataset (in English): in the roots and stems
Translated Answer (in user's language): मुळे आणि देठांमध्ये
Original Query: எப்படி எனது நெல் பயிர்களை பூச்சி இருந்து பாதுகாப்பது?
Detected Language: ta
Translating to English: How to protect my paddy crops from insect?
Answer from dataset (in English): by fencing the garden
Translated Answer (in user's language): தோட்டத்தை ஃபென்சிங் செய்வதன் மூலம்
Original Query: আমি আমার ধানের ফসলের পোকামাকড় কিভাবে নিয়ন্ত্রণ করব?
Detected Language
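The loop above scores each query separately and keeps only the single best match. `vectorizer.transform` accepts a list of texts, so `cosine_similarity(vectorizer.transform(english_queries), X)` returns one row of scores per query. Each row can then be reduced to its few best candidates instead of a single `argmax`.

The minimal sketch below assumes that `english_queries` (a hypothetical name) is the list of already-translated queries. Any iterable of numbers, such as a NumPy row, can be passed in.

```python
import heapq

def top_k(scores, k=3, min_score=0.0):
    """Return up to k (index, score) pairs with the highest scores, best first.

    Scores <= min_score are skipped, so an all-zero row yields no candidates
    instead of an arbitrary first row.
    """
    candidates = ((i, s) for i, s in enumerate(scores) if s > min_score)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

def top_k_per_query(score_matrix, k=3, min_score=0.0):
    """Apply top_k to every row of a (queries x dataset questions) score matrix."""
    return [top_k(row, k, min_score) for row in score_matrix]
```

The returned indices can be mapped back to answers with `dataset['Answer'].iloc[i]`.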

# CHATBOT using NLP

In [19]:
# Install necessary libraries
!pip install googletrans==4.0.0-rc1 langdetect
import pandas as pd
from googletrans import Translator
from langdetect import detect
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the AgroQA dataset (CSV)
dataset = pd.read_csv("/kaggle/input/agronomic-question-and-answer-dataset/AgroQA Dataset.csv")

# Check the first few rows
print(dataset.head())

# Basic text cleaning: lowercase questions and answers
dataset['Question'] = dataset['Question'].str.lower()
dataset['Answer'] = dataset['Answer'].str.lower()

# Initialize TF-IDF Vectorizer
vectorizer = TfidfVectorizer(stop_words='english')

# Fit and transform the questions into a matrix
X = vectorizer.fit_transform(dataset['Question'])

# Function to detect language and translate to English if needed
def detect_and_translate(query):
    detected_language = detect(query)
    print(f"Detected Language: {detected_language}")

    if detected_language != 'en':
        print(f"Converting to English: {query}")
        translator = Translator()
        translated_query = translator.translate(query, src=detected_language, dest='en').text
        print(f"Translated to English: {translated_query}")
        return translated_query, detected_language
    else:
        print("Query is already in English.")
        return query, detected_language

# Function to get the best matching answer from the dataset
def get_answer(query):
    query_vector = vectorizer.transform([query.lower()])
    cosine_sim = cosine_similarity(query_vector, X)
    index = cosine_sim.argmax()
    return dataset['Answer'].iloc[index]

# Function to translate the answer back to the user's language (no-op for English)
def translate_back(answer, target_lang):
    if target_lang == 'en':
        return answer
    translator = Translator()
    translated_answer = translator.translate(answer, dest=target_lang).text
    return translated_answer

# Main loop for chatbot interaction
while True:
    user_query = input("Enter your question (or type 'bye' to exit): ").strip()
    
    if not user_query:
        continue  # langdetect cannot detect the language of an empty string
    if user_query.lower() == 'bye':
        print("Goodbye!")
        break

    # Step 1: Detect language and translate to English
    translated_query, user_lang = detect_and_translate(user_query)
    
    # Step 2: Get the answer from the dataset
    print("Getting answer from dataset...")
    answer = get_answer(translated_query)
    print(f"Answer in English: {answer}")
    
    # Step 3: Translate the answer back to the user's language
    print(f"Translating answer back to {user_lang}...")
    final_answer = translate_back(answer, user_lang)
    
    # Step 4: Display the answer in the user's language
    print(f"Answer in {user_lang}: {final_answer}")

      Crop                                           Question  \
0    maize  Apart from hand weeding, what other method use...   
1    beans  Apart from insecticide, what other method used...   
2    maize  Apart from sun drying which other method used ...   
3  cassava  Apart from sun drying, what other method can I...   
4    beans           As a farmer when should I harvest beans.   

                                              Answer  
0                    Machinery weeders are available  
1  Use resistant verities and increase on water a...  
2    Use tarpaulins or cemented floor free from dust  
3                                       Solar driers  
4  When the beans pods are yellowish green or dry...  


Enter your question (or type 'bye' to exit):  As a farmer when should i harvest beans?


Detected Language: en
Query is already in English.
Getting answer from dataset...
Answer in English: when the beans pods are yellowish green or dry brown
Translating answer back to en...
Answer in en: when the beans pods are yellowish green or dry brown


Enter your question (or type 'bye' to exit):  when shoud i harvest beans?


Detected Language: en
Query is already in English.
Getting answer from dataset...
Answer in English: pull them out of ground with hands
Translating answer back to en...
Answer in en: pull them out of ground with hands


Enter your question (or type 'bye' to exit):  bye


Goodbye!
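The chatbot loop mixes keyboard I/O with the question-answering logic, which makes it hard to test. The refactor below is a sketch, not the notebook's code: it drives the loop from any iterable of inputs. `answer_fn` is a hypothetical callable wrapping `detect_and_translate`, `get_answer`, and `translate_back`. The loop also skips blank input, on which `langdetect` raises an exception.

```python
def run_chat(answer_fn, inputs, exit_words=("bye", "exit")):
    """Drive a question/answer loop from any iterable of user inputs.

    answer_fn: hypothetical callable mapping a question string to an answer string.
    Returns the transcript as a list of (question, answer) pairs.
    """
    transcript = []
    for raw in inputs:
        query = raw.strip()
        if not query:
            continue  # blank line: nothing to detect or answer
        if query.lower() in exit_words:
            break  # stop at the first exit word
        transcript.append((query, answer_fn(query)))
    return transcript
```

To run it interactively, `run_chat(answer_fn, iter(lambda: input("Enter your question: "), None))` keeps reading lines until an exit word is typed. Keeping I/O at the edge lets the same loop be tested with a plain list of strings.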


# PART 2

# Using LLMs

# **Chatbot Using LLMs**

# T5 Small

In [20]:
from langdetect import detect
from transformers import pipeline, MarianMTModel, MarianTokenizer

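# Load a pre-trained text-to-text model (t5-small). t5-small is not instruction-tuned, so a bare question
# with no task prefix can produce off-task output (the run below returns a German translation).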
generator = pipeline('text2text-generation', model='t5-small')

# Load MarianMT translation models (English <-> French only; other languages need their own
# Helsinki-NLP/opus-mt-* checkpoints added to these dictionaries)
translation_models = {
    ('en', 'fr'): MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-en-fr'),
    ('fr', 'en'): MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-fr-en'),
}

# Load the tokenizer that matches each translation model
translation_tokenizers = {
    ('en', 'fr'): MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-en-fr'),
    ('fr', 'en'): MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-fr-en'),
}

# Function to translate text with the model for the (src_lang, tgt_lang) pair
def translate_text(text, src_lang, tgt_lang):
    pair = (src_lang, tgt_lang)
    if pair not in translation_models:
        raise ValueError(f"No translation model loaded for {src_lang} -> {tgt_lang}")
    model, tokenizer = translation_models[pair], translation_tokenizers[pair]
    translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
    return tokenizer.decode(translated[0], skip_special_tokens=True)

# Function to detect the language of the query and translate to English if needed
def translate_to_english(query):
    detected_language = detect(query)
    print(f"Detected Language: {detected_language}")
    
    if detected_language != 'en':
        print(f"Translating query to English...")
        return translate_text(query, detected_language, 'en')
    else:
        return query

# Function to translate the answer back to the original language
def translate_back_to_original(query, original_language):
    print(f"Translating back to {original_language}...")
    if original_language != 'en':
        return translate_text(query, 'en', original_language)
    else:
        return query

# Function to generate the answer using T5 or another model
def generate_answer(query):
    response = generator(query, max_length=50, num_return_sequences=1)
    return response[0]['generated_text']

# Main function to handle the conversation flow
def chatbot():
    while True:
        # Taking live input from the user
        query = input("Enter your query (or type 'exit' to stop): ")
        
        if query.lower() == 'exit':
            print("Exiting chatbot. Goodbye!")
            break

        print(f"\nUser query: {query}")
        
        # Step 1: Detect language and translate to English if necessary
        original_language = detect(query)
        print(f"Original query: {query}")
        print(f"Detected language: {original_language}")
        
        translated_query = translate_to_english(query)
        print(f"Translated to English: {translated_query}")
        
        # Step 2: Generate response using Hugging Face model (T5)
        response = generate_answer(translated_query)
        print(f"Generated response in English: {response}")
        
        # Step 3: Translate the response back to the original language if needed
        final_response = translate_back_to_original(response, original_language)
        print(f"Final response in original language: {final_response}\n")
        
# Run the chatbot
chatbot()

Device set to use cuda:0


Enter your query (or type 'exit' to stop):  Comment gérer les ravageurs dans les cultures?



User query: Comment gérer les ravageurs dans les cultures?
Original query: Comment gérer les ravageurs dans les cultures?
Detected language: fr
Detected Language: fr
Translating query to English...
Translated to English: How to manage pests in crops?
Generated response in English: Wie kann man die Schädlinge in den Pflanzen bekämpfen?
Translating back to fr...
Final response in original language: L'homme de Wie Kann meurt Schädlinge dans le den Pflanzen bekämpfen ?



Enter your query (or type 'exit' to stop):  exit


Exiting chatbot. Goodbye!
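The cell above loads only the English/French pair, so a Hindi or Tamil query has no model to translate it. The `Helsinki-NLP/opus-mt-{src}-{tgt}` names follow the pattern of the two checkpoints used above. A pair with no English side can pivot through English by applying two models in turn.

The sketch below only builds the names. It does not check that each checkpoint is actually published on the Hugging Face Hub, and not every pair is, so treat the output as candidates to verify.

```python
def marian_route(src, tgt, pivot="en"):
    """Return candidate Helsinki-NLP/opus-mt checkpoint names for translating src -> tgt.

    Pairs that include the pivot language need one model; any other pair is routed
    through the pivot (src -> pivot -> tgt), i.e. two models applied in order.
    The names are NOT checked against the Hugging Face Hub.
    """
    if src == tgt:
        return []  # nothing to translate
    if pivot in (src, tgt):
        return [f"Helsinki-NLP/opus-mt-{src}-{tgt}"]
    return [
        f"Helsinki-NLP/opus-mt-{src}-{pivot}",
        f"Helsinki-NLP/opus-mt-{pivot}-{tgt}",
    ]
```

Each returned name can be loaded with `MarianMTModel.from_pretrained` and `MarianTokenizer.from_pretrained`, as in the cell above, and the models applied in order. Pivoting doubles the translation cost and can compound errors, so a direct pair is preferable when one exists.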


# DistilBERT

In [22]:
from googletrans import Translator
from langdetect import detect
from transformers import pipeline

# Initialize the models
translator = Translator()

# Load a pre-trained Question Answering model and tokenizer from Hugging Face
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased")

# Function to detect language and translate to English if needed
def detect_and_translate(query):
    detected_language = detect(query)
    print(f"Detected language: {detected_language}")
    
    # If the query is not in English, translate to English
    if detected_language != 'en':
        print("Translating to English...")
        translated_query = translator.translate(query, src=detected_language, dest='en').text
        print(f"Translated query to English: {translated_query}")
    else:
        translated_query = query
        print(f"Query is already in English: {translated_query}")
    
    return translated_query, detected_language

# Function to generate answer using the Question-Answering model
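def generate_answer(query):
    # Placeholder context: an extractive QA model can only return a span of this sentence,
    # so every answer comes from it regardless of the question.
    context = "Crop disease management includes using organic methods, biological control, and chemical pesticides."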
    result = qa_pipeline(question=query, context=context)
    return result['answer']

# Function to translate the answer back to the original language if needed
def translate_back(answer, target_lang):
    if target_lang != 'en':
        print(f"Translating answer back to {target_lang}...")
        translated_answer = translator.translate(answer, src='en', dest=target_lang).text
        return translated_answer
    else:
        return answer

# Main function to process queries
def process_query(query):
    # Detect language and translate to English if needed
    translated_query, detected_language = detect_and_translate(query)
    
    # Generate answer in English
    print("Generating answer from model...")
    generated_answer = generate_answer(translated_query)
    print(f"Generated answer in English: {generated_answer}")
    
    # Translate answer back to the original language if needed
    final_answer = translate_back(generated_answer, detected_language)
    
    print(f"Final answer in original language: {final_answer}")
    return final_answer

# Example of how to use the chatbot
query = "Comment gérer les ravageurs dans les cultures?"
final_response = process_query(query)
print(f"Final Response: {final_response}")

Device set to use cuda:0


Detected language: fr
Translating to English...
Translated query to English: How to manage pests in crops?
Generating answer from model...
Generated answer in English: using organic methods, biological control, and chemical pesticides
Translating answer back to fr...
Final answer in original language: en utilisant des méthodes organiques, un contrôle biologique et des pesticides chimiques
Final Response: en utilisant des méthodes organiques, un contrôle biologique et des pesticides chimiques
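An extractive QA model such as DistilBERT-SQuAD does not generate free text. Instead, it scores every context token as a possible answer start and a possible answer end, then returns the best-scoring span. That is why every answer above is a phrase taken from the one-sentence `context`.

The sketch below shows the core span search on raw logits. The Hugging Face pipeline does a more involved version (for example, it normalizes the scores and exposes a `max_answer_len` argument), so treat this as an illustration of the idea rather than the pipeline's implementation.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Pick (start, end) maximizing start_logits[start] + end_logits[end].

    Only spans with start <= end < start + max_answer_len are considered,
    so the answer is always a contiguous, bounded slice of the context tokens.
    Returns ((start, end), score).
    """
    best, best_score = (0, 0), float("-inf")
    n = len(start_logits)
    for s in range(n):
        for e in range(s, min(n, s + max_answer_len)):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score
```

For example, with tokens `["crop", "disease", "management", "includes", "organic", "methods"]`, a high start logit on `"organic"` and a high end logit on `"methods"` yield the span `"organic methods"`.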


In [23]:
from googletrans import Translator
from langdetect import detect
from transformers import pipeline

# Initialize the models
translator = Translator()

# Load a pre-trained Question Answering model and tokenizer from Hugging Face
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased")

# Function to detect language and translate to English if needed
def detect_and_translate(query):
    detected_language = detect(query)
    print(f"Detected language: {detected_language}")
    
    # If the query is not in English, translate to English
    if detected_language != 'en':
        print("Translating to English...")
        translated_query = translator.translate(query, src=detected_language, dest='en').text
        print(f"Translated query to English: {translated_query}")
    else:
        translated_query = query
        print(f"Query is already in English: {translated_query}")
    
    return translated_query, detected_language

# Function to generate an answer using the Question-Answering model
def generate_answer(query):
    # Placeholder context: every answer is a span of this one sentence (replace with domain-specific text)
    context = "Crop disease management includes using organic methods, biological control, and chemical pesticides."
    result = qa_pipeline(question=query, context=context)
    return result['answer']

# Function to translate the answer back to the original language if needed
def translate_back(answer, target_lang):
    if target_lang != 'en':
        print(f"Translating answer back to {target_lang}...")
        translated_answer = translator.translate(answer, src='en', dest=target_lang).text
        return translated_answer
    else:
        return answer

# Main function to process queries
def process_query(query):
    # Detect language and translate to English if needed
    translated_query, detected_language = detect_and_translate(query)
    
    # Generate answer in English
    print("Generating answer from model...")
    generated_answer = generate_answer(translated_query)
    print(f"Generated answer in English: {generated_answer}")
    
    # Translate answer back to the original language if needed
    final_answer = translate_back(generated_answer, detected_language)
    
    print(f"Final answer in original language: {final_answer}")
    return final_answer

# Function to simulate chatbot interaction
def chatbot():
    print("Chatbot is ready. Enter 'exit' to stop.")
    
    while True:
        # Taking input from user
        query = input("Enter your query: ")
        
        if query.lower() == 'exit':
            print("Exiting chatbot.")
            break
        
        # Process the query and get the answer
        final_response = process_query(query)
        print(f"Chatbot's Response: {final_response}\n")

# Start chatbot
chatbot()

Device set to use cuda:0


Chatbot is ready. Enter 'exit' to stop.


Enter your query:  How to improve soil health?


Detected language: en
Query is already in English: How to improve soil health?
Generating answer from model...
Generated answer in English: using organic methods, biological control, and chemical pesticides
Final answer in original language: using organic methods, biological control, and chemical pesticides
Chatbot's Response: using organic methods, biological control, and chemical pesticides



Enter your query:  Comment gérer les ravageurs dans les cultures?


Detected language: fr
Translating to English...
Translated query to English: How to manage pests in crops?
Generating answer from model...
Generated answer in English: using organic methods, biological control, and chemical pesticides
Translating answer back to fr...
Final answer in original language: en utilisant des méthodes organiques, un contrôle biologique et des pesticides chimiques
Chatbot's Response: en utilisant des méthodes organiques, un contrôle biologique et des pesticides chimiques



Enter your query:  పంటలలో సూక్ష్మజీవుల నియంత్రణ ఎలా చేయాలి?


Detected language: te
Translating to English...
Translated query to English: How to control microorganisms in crops?
Generating answer from model...
Generated answer in English: biological control, and chemical pesticides
Translating answer back to te...
Final answer in original language: జీవసంబంధమైన పురుగుమందులు
Chatbot's Response: జీవసంబంధమైన పురుగుమందులు



Enter your query:  ਕਿਸਾਨਾਂ ਨੂੰ ਕਿਸ ਤਰ੍ਹਾਂ ਦੇਖਭਾਲ ਅਤੇ ਰੋਗ ਪ੍ਰਬੰਧਨ ਦੀ ਲੋੜ ਹੈ?


Detected language: pa
Translating to English...
Translated query to English: How farmers need care and disease management?
Generating answer from model...
Generated answer in English: using organic methods, biological control, and chemical pesticides
Translating answer back to pa...
Final answer in original language: ਜੈਵਿਕ ਵਿਧੀਆਂ, ਜੀਵ-ਵਿਗਿਆਨਕ ਨਿਯੰਤਰਣ, ਅਤੇ ਰਸਾਇਣਕ ਕੀਟਨਾਸ਼ਕਾਂ ਦੀ ਵਰਤੋਂ ਕਰਨਾ
Chatbot's Response: ਜੈਵਿਕ ਵਿਧੀਆਂ, ਜੀਵ-ਵਿਗਿਆਨਕ ਨਿਯੰਤਰਣ, ਅਤੇ ਰਸਾਇਣਕ ਕੀਟਨਾਸ਼ਕਾਂ ਦੀ ਵਰਤੋਂ ਕਰਨਾ



Enter your query:  फसलों में कीटों का नियंत्रण कैसे करें?


Detected language: hi
Translating to English...
Translated query to English: How to control pests in crops?
Generating answer from model...
Generated answer in English: using organic methods, biological control, and chemical pesticides
Translating answer back to hi...
Final answer in original language: कार्बनिक तरीकों, जैविक नियंत्रण और रासायनिक कीटनाशकों का उपयोग करना
Chatbot's Response: कार्बनिक तरीकों, जैविक नियंत्रण और रासायनिक कीटनाशकों का उपयोग करना



Enter your query:  exit


Exiting chatbot.
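Because the context never changes, the model answers `How to improve soil health?` with the pest-management phrase. The question-answering pipeline returns a dict with `score`, `start`, `end`, and `answer`, so one simple guard is to refuse low-confidence answers.

A sketch follows. It assumes a threshold of 0.3, which is arbitrary and should be tuned on real queries, and a hypothetical fallback message. The fallback message can itself be passed through `translate_back`.

```python
def answer_or_fallback(result, threshold=0.3,
                       fallback="Sorry, I could not find a reliable answer."):
    """Return result['answer'] if its confidence score clears the threshold, else a fallback.

    result: a dict shaped like the question-answering pipeline output
    ({'score', 'start', 'end', 'answer'}). The threshold is an assumption to tune.
    """
    if result.get("score", 0.0) >= threshold:
        return result["answer"]
    return fallback
```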


# Next step: use LLMs and fine-tune them

## Using a BERT-SQuAD model to answer multilingual FAQs about crop diseases

In [2]:
%pip install transformers googletrans==4.0.0-rc1 langdetect

Note: you may need to restart the kernel to use updated packages.


In [5]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
from langdetect import detect
from googletrans import Translator

# Step 1: Load your dataset
df = pd.read_csv("/kaggle/input/agronomic-question-and-answer-dataset/AgroQA Dataset.csv")  

# Step 2: Build one context string from every Q&A pair (the whole dataset; no top-N filtering yet)
context = "".join(
    f"Q: {q}\nA: {a}\n\n" for q, a in zip(df['Question'], df['Answer'])
)

# Step 3: Load a general-purpose QA model (BERT-SQuAD)
model_name = "deepset/bert-base-cased-squad2"  # You can try agriculturally fine-tuned if available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

# Step 4: Multilingual support
translator = Translator()

def detect_and_translate(text):
    lang = detect(text)
    if lang != 'en':
        translated = translator.translate(text, src=lang, dest='en').text
    else:
        translated = text
    return translated, lang

def translate_back(text, lang):
    if lang != 'en':
        return translator.translate(text, src='en', dest=lang).text
    return text

# Step 5: Get answer from LLM
def get_llm_answer(user_query):
    translated_query, original_lang = detect_and_translate(user_query)
    
    # QA model prediction
    result = qa_pipeline(question=translated_query, context=context)
    answer = result['answer']
    
    # Translate back
    final_answer = translate_back(answer, original_lang)
    
    print(f"\nQuery: {user_query}")
    print(f"Answer: {final_answer}")

# Example loop
while True:
    query = input("\nAsk a crop disease question (or type 'exit'): ")
    if query.lower() == 'exit':
        break
    get_llm_answer(query)

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0



Ask a crop disease question (or type 'exit'):  Apart from insecticide, what other method used to control bean weevils?



Query: Apart from insecticide, what other method used to control bean weevils?
Answer: pyrethroids



Ask a crop disease question (or type 'exit'):  As a farmer, how can I know the qualities of a good fertilizer?



Query: As a farmer, how can I know the qualities of a good fertilizer?
Answer: Nitrogen fertiliser



Ask a crop disease question (or type 'exit'):  As a farmer, how can I know the qualities of a good fertilizer?



Query: As a farmer, how can I know the qualities of a good fertilizer?
Answer: Nitrogen fertiliser



Ask a crop disease question (or type 'exit'):  Can cassava seeds germinate if planted?



Query: Can cassava seeds germinate if planted?
Answer: vigorous germination



Ask a crop disease question (or type 'exit'):  exit
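Several answers above are spans that do not answer the question. For example, "pyrethroids" is itself a class of insecticides. An extractive QA model returns *some* span unless told otherwise, so a confidence gate can help.

The sketch below is minimal. It assumes the dict that a transformers question-answering pipeline returns (`score`, `start`, `end`, `answer`). `gate_answer`, `FALLBACK` and the 0.2 cut-off are hypothetical and would need tuning on held-out AgroQA questions. The pipeline's `handle_impossible_answer=True` option lets SQuAD 2.0 models such as `deepset/bert-base-cased-squad2` return an empty answer, and that pairs naturally with this check.

```python
FALLBACK = "No confident answer found in the available context."


def gate_answer(result, min_score=0.2, fallback=FALLBACK):
    """Return the extracted span only if the QA model is confident enough.

    `result` is a question-answering pipeline output dict; min_score=0.2 is a
    hypothetical cut-off, not a tuned value.
    """
    answer = str(result.get("answer", "")).strip()
    # Empty or punctuation-only spans (e.g. ")") count as "no answer".
    if not any(ch.isalnum() for ch in answer):
        return fallback
    if result.get("score", 0.0) < min_score:
        return fallback
    return answer


# Made-up pipeline outputs for illustration (start/end omitted).
for r in [
    {"score": 0.81, "answer": "some extracted span"},
    {"score": 0.04, "answer": "low-confidence span"},
    {"score": 0.50, "answer": ")"},
]:
    print(gate_answer(r))
```

The first made-up result passes the gate. The other two fall back: one is below the cut-off, and the other is a punctuation-only span.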


## FINE-TUNE: retrieval-augmented answering over research articles (no model weights are updated)
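The cell below ranks scraped passages with `util.semantic_search`. By default, that function scores each passage by cosine similarity to the query embedding and keeps the `top_k` best.

Here is a toy sketch of that ranking step in plain Python. It uses hand-made 3-dimensional vectors in place of the 384-dimensional `all-MiniLM-L6-v2` embeddings. `cosine` and `top_k_passages` are hypothetical helpers, not part of sentence-transformers.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_passages(query_vec, passage_vecs, passages, top_k=3):
    """Return the top_k (score, passage) pairs, highest cosine similarity first."""
    scored = [(cosine(query_vec, vec), text) for vec, text in zip(passage_vecs, passages)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy 3-d "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
texts = ["weevil storage", "soil manure", "cassava harvest"]
vecs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
for score, text in top_k_passages([0.9, 0.1, 0.0], vecs, texts, top_k=2):
    print(f"{score:.3f}  {text}")
```

The notebook's `retrieve_relevant_passages` discards the score. Keeping it would make it possible to drop passages below a similarity floor, instead of always passing three passages to the QA model.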

In [6]:
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline, AutoTokenizer, AutoModelForQuestionAnswering
import torch

# 1. Load the same extractive QA model as in the previous cell (BERT-base fine-tuned on SQuAD 2.0)
model_name = "deepset/bert-base-cased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

# 2. Embedder for semantic similarity
embedder = SentenceTransformer('all-MiniLM-L6-v2')  # Lightweight and fast

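# 3. Web scraper: fetch a page and keep only the text of its <p> tags
def extract_text_from_url(url, timeout=15):
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # skip 403/404 pages instead of indexing their error text
    except requests.RequestException:
        return ""  # network error, timeout or HTTP error: contribute no passages
    soup = BeautifulSoup(response.content, "html.parser")
    paragraphs = soup.find_all("p")
    return "\n".join(p.get_text() for p in paragraphs)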

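# 4. Split text into passages of roughly max_length characters (for retrieval)
def split_text(text, max_length=200):
    # Naive sentence split on '.'; drop empty fragments and surrounding whitespace
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    passages = []
    current = ""
    for sentence in sentences:
        if len(current) + len(sentence) < max_length:
            current += sentence + ". "
        else:
            if current:  # a sentence longer than max_length must not emit an empty passage
                passages.append(current.strip())
            current = sentence + ". "
    if current:
        passages.append(current.strip())
    return passages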

# 5. Retrieve top-k relevant passages
def retrieve_relevant_passages(query, passages, top_k=3):
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    passage_embeddings = embedder.encode(passages, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, passage_embeddings, top_k=top_k)[0]
    return [passages[hit['corpus_id']] for hit in hits]

# 6. RAG-style answering: retrieve supporting passages, then extract an answer span
def rag_answer(user_query, urls):
    # Crawl and collect all passages (every URL is re-scraped and re-embedded on each call)
    all_passages = []
    for url in urls:
        text = extract_text_from_url(url)
        passages = split_text(text)
        all_passages.extend(passages)
    if not all_passages:  # every fetch failed or returned no <p> text
        return "No text could be retrieved from the source URLs."

    # Retrieve top relevant contexts
    relevant_passages = retrieve_relevant_passages(user_query, all_passages, top_k=3)
    context = "\n".join(relevant_passages)

    # Answer using QA pipeline
    result = qa_pipeline(question=user_query, context=context)
    return result["answer"]

# 7. Source URLs to crawl (paywalled or JavaScript-rendered pages may yield little or no <p> text)
urls = [
    "https://link.springer.com/article/10.1007/s10311-008-0147-0",
    "https://www.sciencedirect.com/science/article/pii/S2095311919626894",
    "https://www.indianjournals.com/ijor.aspx?target=ijor:ar&volume=36&issue=1&article=004",
    "https://www.indianjournals.com/ijor.aspx?target=ijor:ar&volume=35&issue=3&article=004",
    "https://link.springer.com/book/10.1007/978-981-10-4325-3",
    "https://books.google.co.in/books?hl=en&lr=&id=RhT8DwAAQBAJ&oi=fnd&pg=PA137&dq=Pest+Management+in+Organic+Farming&ots=a47-LNJcOR&sig=GiyMWVZ9NTDGKej4BA2iIfvQt68&redir_esc=y#v=onepage&q=Pest%20Management%20in%20Organic%20Farming&f=false",
    "https://www.annualreviews.org/content/journals/10.1146/annurev.ento.52.110405.091337",
    "https://books.google.co.in/books?hl=en&lr=&id=3cFKDwAAQBAJ&oi=fnd&pg=PR3&dq=Pest+Management+in+Organic+Farming&ots=LsHEvtOeMG&sig=8_n7DdPf07rg4jo-xwXcOzxg1OM&redir_esc=y#v=onepage&q=Pest%20Management%20in%20Organic%20Farming&f=false",
    "https://www.sciencedirect.com/science/article/abs/pii/B9780323991452000033",
    "https://www.taylorfrancis.com/chapters/edit/10.1201/9781351114578-13/disease-pest-management-organic-farming-case-applied-agroecology-finckh-junge-schmidt-weedon-universit%C3%A4t-kassel-germany",
    "https://www.mdpi.com/2073-4395/8/4/48",
    "https://www.frontiersin.org/journals/agronomy/articles/10.3389/fagro.2021.680456/full",
    "https://www.mdpi.com/2071-1050/12/12/4859",
    "https://books.google.co.in/books?hl=en&lr=&id=KQkwCwAAQBAJ&oi=fnd&pg=PA106&dq=Soil+Health+and+Fertilization+Techniques&ots=zHM3br4FAR&sig=KZyHFhwV8kuqU0HMCD52qmcLWYg&redir_esc=y#v=onepage&q=Soil%20Health%20and%20Fertilization%20Techniques&f=false",
    "https://www.cambridge.org/core/journals/agricultural-and-resource-economics-review/article/factors-affecting-the-adoption-of-sustainable-agricultural-practices/E8666B6C3040F0859C2EEFE9EB5C6E05",
    "https://www.nature.com/articles/s41893-020-00617-y",
    "https://www.sciencedirect.com/science/article/abs/pii/S0048969720323482",
    "https://link.springer.com/article/10.1007/s11356-022-23635-z",
    "https://archives.joe.org/joe/1996december/a1.php",
    "https://www.sciencedirect.com/science/article/abs/pii/S0301479716308908",
    "https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=2ca80dfb4d19709246e14e00ed2e308162f76c67",
    "https://www.sciencedirect.com/science/article/abs/pii/B9780444636614000025",
    "https://www.mdpi.com/2073-4395/12/12/3008",
    "https://www.mdpi.com/2223-7747/8/2/34",
    "https://academic.oup.com/plphys/article-abstract/160/4/1686/6109554",
    "https://www.mdpi.com/2072-4292/14/9/1990",
    "https://www.sciencedirect.com/science/article/abs/pii/S0168169919306842",
    "https://www.nature.com/articles/s41598-021-97221-7",
    "https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2019.00621/full",
    "https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=7c68a32212c1f86f535f4c1658ff68399d0a9ddd"
]

# Test the system
while True:
    query = input("\nEnter your crop disease question (or type 'exit'): ")
    if query.lower() == 'exit':
        break
    answer = rag_answer(query, urls)
    print(f"Answer: {answer}")

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0



Enter your crop disease question (or type 'exit'):  As a farmer, how can I know the qualities of a good fertilizer?


Answer: when applied as per the need of the field crops



Enter your crop disease question (or type 'exit'):  Apart from insecticide, what other method used to control bean weevils?


Answer: botanical insecticides



Enter your crop disease question (or type 'exit'):  Can cassava seeds germinate if planted?


Answer: Not a viable option of nutrient diversification after seed formation



Enter your crop disease question (or type 'exit'):  exit
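`split_text` cuts on every '.', so decimals such as "2.5 t/ha" and abbreviations are split mid-sentence. An answer that straddles two passages can also be lost to retrieval.

The sketch below packs sentences with a one-sentence overlap between neighbouring passages. It is minimal, and several parts are assumptions rather than the notebook's method: `split_with_overlap`, the boundary rule (sentence-ending punctuation followed by whitespace) and the 200-character default.

```python
import re


def split_with_overlap(text, max_chars=200, overlap_sentences=1):
    """Pack sentences into passages of about max_chars characters.

    The last `overlap_sentences` sentence(s) of each passage are repeated at
    the start of the next one, so content near a boundary appears whole in at
    least one passage. A single over-long sentence still becomes its own passage.
    """
    # Split only where . ! or ? is followed by whitespace, so "2.5" stays intact.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    passages, current = [], []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            passages.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        passages.append(" ".join(current))
    return passages


for p in split_with_overlap("Alpha one. Beta two. Gamma three. Delta four.", max_chars=22):
    print(p)
```

Overlap makes the index somewhat larger, because boundary sentences are stored twice. That trade is usually acceptable for a corpus of this size.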


### Fine-tune for crop and pest: same pipeline, shorter URL list focused on pest management
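The cell below reruns the same pipeline on a shorter URL list. Eyeballing a handful of answers is weak evidence of whether such a change helps. One option is SQuAD-style token F1 against the reference answers in the AgroQA dataset loaded in Part 1.

The sketch below is minimal. `normalize`, `token_f1` and `mean_f1` are hypothetical helpers, and the normalization only approximates the official SQuAD evaluation script.

```python
import re
from collections import Counter


def normalize(text):
    """Lower-case, replace punctuation with spaces and drop English articles."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in {"a", "an", "the"}]


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)  # both empty counts as a match
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_f1(pairs):
    """Average token F1 over (prediction, reference) pairs; 0.0 for no pairs."""
    return sum(token_f1(p, r) for p, r in pairs) / len(pairs) if pairs else 0.0


print(token_f1("the red beans", "red kidney beans"))
```

As an illustration, you could build `pairs = [(rag_answer(q, urls), a) for q, a in zip(dataset['Question'][:20], dataset['Answer'][:20])]` and compare `mean_f1(pairs)` for each URL list. The sample size of 20 is arbitrary. Part 1 lower-cases `dataset['Answer']`, and `normalize` lower-cases predictions too, so case differences do not affect the score.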

In [7]:
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline, AutoTokenizer, AutoModelForQuestionAnswering
import torch

# 1. Load the same extractive QA model as in the previous cells (BERT-base fine-tuned on SQuAD 2.0)
model_name = "deepset/bert-base-cased-squad2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)

# 2. Embedder for semantic similarity
embedder = SentenceTransformer('all-MiniLM-L6-v2')  # Lightweight and fast

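# 3. Web scraper: fetch a page and keep only the text of its <p> tags
def extract_text_from_url(url, timeout=15):
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # skip 403/404 pages instead of indexing their error text
    except requests.RequestException:
        return ""  # network error, timeout or HTTP error: contribute no passages
    soup = BeautifulSoup(response.content, "html.parser")
    paragraphs = soup.find_all("p")
    return "\n".join(p.get_text() for p in paragraphs)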

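# 4. Split text into passages of roughly max_length characters (for retrieval)
def split_text(text, max_length=200):
    # Naive sentence split on '.'; drop empty fragments and surrounding whitespace
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    passages = []
    current = ""
    for sentence in sentences:
        if len(current) + len(sentence) < max_length:
            current += sentence + ". "
        else:
            if current:  # a sentence longer than max_length must not emit an empty passage
                passages.append(current.strip())
            current = sentence + ". "
    if current:
        passages.append(current.strip())
    return passages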

# 5. Retrieve top-k relevant passages
def retrieve_relevant_passages(query, passages, top_k=3):
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    passage_embeddings = embedder.encode(passages, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, passage_embeddings, top_k=top_k)[0]
    return [passages[hit['corpus_id']] for hit in hits]

# 6. RAG-style answering: retrieve supporting passages, then extract an answer span
def rag_answer(user_query, urls):
    # Crawl and collect all passages (every URL is re-scraped and re-embedded on each call)
    all_passages = []
    for url in urls:
        text = extract_text_from_url(url)
        passages = split_text(text)
        all_passages.extend(passages)
    if not all_passages:  # every fetch failed or returned no <p> text
        return "No text could be retrieved from the source URLs."

    # Retrieve top relevant contexts
    relevant_passages = retrieve_relevant_passages(user_query, all_passages, top_k=3)
    context = "\n".join(relevant_passages)

    # Answer using QA pipeline
    result = qa_pipeline(question=user_query, context=context)
    return result["answer"]

# 7. Source URLs to crawl (paywalled or JavaScript-rendered pages may yield little or no <p> text)
urls = [
    "https://link.springer.com/article/10.1007/s10311-008-0147-0",
    "https://www.sciencedirect.com/science/article/pii/S2095311919626894",
    "https://www.indianjournals.com/ijor.aspx?target=ijor:ar&volume=36&issue=1&article=004",
    "https://www.indianjournals.com/ijor.aspx?target=ijor:ar&volume=35&issue=3&article=004",
    "https://link.springer.com/book/10.1007/978-981-10-4325-3",
    "https://books.google.co.in/books?hl=en&lr=&id=RhT8DwAAQBAJ&oi=fnd&pg=PA137&dq=Pest+Management+in+Organic+Farming&ots=a47-LNJcOR&sig=GiyMWVZ9NTDGKej4BA2iIfvQt68&redir_esc=y#v=onepage&q=Pest%20Management%20in%20Organic%20Farming&f=false",
    "https://www.annualreviews.org/content/journals/10.1146/annurev.ento.52.110405.091337",
    "https://books.google.co.in/books?hl=en&lr=&id=3cFKDwAAQBAJ&oi=fnd&pg=PR3&dq=Pest+Management+in+Organic+Farming&ots=LsHEvtOeMG&sig=8_n7DdPf07rg4jo-xwXcOzxg1OM&redir_esc=y#v=onepage&q=Pest%20Management%20in%20Organic%20Farming&f=false",
    "https://www.sciencedirect.com/science/article/abs/pii/B9780323991452000033",
    "https://www.taylorfrancis.com/chapters/edit/10.1201/9781351114578-13/disease-pest-management-organic-farming-case-applied-agroecology-finckh-junge-schmidt-weedon-universit%C3%A4t-kassel-germany",
]

# Test the system
while True:
    query = input("\nEnter your crop disease question (or type 'exit'): ")
    if query.lower() == 'exit':
        break
    answer = rag_answer(query, urls)
    print(f"Answer: {answer}")

Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0



Enter your crop disease question (or type 'exit'):  Can cassava seeds germinate if planted?


Answer: germination



Enter your crop disease question (or type 'exit'):  Apart from insecticide, what other method used to control bean weevils?


Answer: Biological control of plant-parasitic nematodes.



Enter your crop disease question (or type 'exit'):  As a farmer, how can I know the qualities of a good fertilizer?


Answer: in sufficient quantity for ever growing population



Enter your crop disease question (or type 'exit'):  As a farmer, how can I know the qualities of a good fertilizer?


Answer: in sufficient quantity for ever growing population



Enter your crop disease question (or type 'exit'):  Give me 10 names of cassava?


Answer: )



Enter your crop disease question (or type 'exit'):  How can I add manure into the Soil?


Answer: dairy manure or ammonium nitrate applications



Enter your crop disease question (or type 'exit'):  How can I avoid red or orange color on my bean leaves?


Answer: incorporation of organic matter in the substrate



Enter your crop disease question (or type 'exit'):  For how long cassava is ready to be harvested?


Answer: stage of development and N form



Enter your crop disease question (or type 'exit'):  Can't spraying chemicals causes environmental pollution


Answer: complex chronic effects such as change in endocrine functions and immune systems.



Enter your crop disease question (or type 'exit'):  exit
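`rag_answer` scrapes, splits and embeds every URL again for each question. That is slow, and the corpus can differ from one question to the next if a site responds differently.

The sketch below builds the index once and reuses it. It is minimal, and `PassageIndex` is hypothetical. Its `embed` argument stands in for something like `lambda texts: embedder.encode(texts).tolist()`. The toy bag-of-words `toy_embed` exists only so the example runs without downloading a model.

```python
import math

VOCAB = ["weevil", "storage", "manure", "soil", "cassava", "harvest"]


def toy_embed(texts):
    """Bag-of-words counts over a tiny fixed vocabulary (stand-in for a real encoder)."""
    return [[text.lower().count(word) for word in VOCAB] for text in texts]


class PassageIndex:
    """Embed passages once; answer many queries against the cached vectors."""

    def __init__(self, passages, embed):
        self.passages = list(passages)
        self.embed = embed
        # The expensive step runs once here instead of once per question.
        self.vectors = embed(self.passages) if self.passages else []

    @staticmethod
    def _cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def search(self, query, top_k=3):
        """Return up to top_k passages ranked by cosine similarity to the query."""
        if not self.passages:
            return []
        query_vec = self.embed([query])[0]
        scores = [self._cosine(query_vec, vec) for vec in self.vectors]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [self.passages[i] for i in ranked[:top_k]]


index = PassageIndex(
    ["Weevil damage in storage.", "Add manure to soil.", "Cassava harvest timing."],
    toy_embed,
)
print(index.search("weevil storage", top_k=1))  # → ['Weevil damage in storage.']
```

In the notebook, this would mean collecting the passages from all URLs once before the input loop, wrapping them in the index, and calling `index.search(query)` inside the loop. Only the query is then embedded per question.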
