Build a simple AI chatbot that helps farmers diagnose crop issues using two intelligence levels:
- Level 1: Answer questions using a knowledge base (like a smart FAQ)
- Level 2: Analyze sensor data to find exact problems (data-driven diagnosis)

The Problem
A farmer asks: “Why is my lettuce yellowing?”
- Level 1 (no sensors): Give general possible causes
- Level 2 (with sensors): Check the actual temperature, EC, and pH data and identify the exact cause

AURA CHATBOT 

1. SIMPLE CHAT UI

- Text input box
- Message display
- "Connect Sensors" button

2. BACKEND API

- Route to Level 1 or Level 2
- Search knowledge base
- Check sensor data
- Generate response
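
The routing step listed above can be sketched in a few lines. This is only an illustration under one assumption: the backend treats the presence of a `farm_id` as the signal that sensors are connected. The names `choose_level` and `route_request` are hypothetical and do not appear elsewhere in this notebook.

```python
from typing import Optional


def choose_level(farm_id: Optional[str]) -> str:
    """Return "level_2" when a farm (and its sensors) is connected, else "level_1"."""
    return "level_2" if farm_id else "level_1"


def route_request(question: str, farm_id: Optional[str] = None) -> dict:
    """Build the payload a backend handler could pass to the matching chain."""
    level = choose_level(farm_id)
    payload = {"question": question}
    if level == "level_2":
        # Only Level 2 needs the farm id, because it looks up sensor data.
        payload["farm_id"] = farm_id
    return {"level": level, "payload": payload}


print(route_request("Why is my lettuce yellowing?"))
print(route_request("Why is my lettuce yellowing?", farm_id="farm_102"))
```

A handler could then call the Level 1 or Level 2 chain with `payload`, depending on `level`.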


In [54]:
import os, sys, json
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

load_dotenv(".env.local")

True

In [10]:
embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"
vector_store = Chroma(
    embedding_function=HuggingFaceEmbeddings(model_name=embedding_model_name),
)

In [12]:
GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")

if GROQ_API_KEY == "":
    print("API Key not registered")
else:
    print("API key found!")

API key found!


In [None]:
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    api_key=GROQ_API_KEY, # type: ignore
)

In [14]:
llm.invoke("hi")

AIMessage(content='How can I assist you today?', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 36, 'total_tokens': 44, 'completion_time': 0.00657203, 'completion_tokens_details': None, 'prompt_time': 0.001681107, 'prompt_tokens_details': None, 'queue_time': 0.055284823, 'total_time': 0.008253137}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_1151d4f23c', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b876d-631d-7ed3-92e2-9b343cd715d3-0', tool_calls=[], invalid_tool_calls=[], usage_metadata={'input_tokens': 36, 'output_tokens': 8, 'total_tokens': 44})

In [15]:
agri_knowledgebase_df = pd.read_json("./agricultural_knowledge_base.json")
agri_knowledgebase_df.head()

| | id | category | question | answer | tags | optimal_conditions | related_sensor_checks | related_info |
|---|---|---|---|---|---|---|---|---|
| 0 | kb_001 | lettuce_growth_issues | Why is my lettuce growing slowly? | Lettuce growth can slow due to several factors... | [lettuce, growth, speed, troubleshooting, slow... | {'temperature': '18-22°C', 'ec': '1.2-1.6 mS/c... | [temperature, ec, ph, ppfd] | |
| 1 | kb_002 | lettuce_yellowing | What causes lettuce leaves to turn yellow? | Yellowing in lettuce (chlorosis) typically ind... | [lettuce, yellowing, chlorosis, nutrients, def... | {'ec': '1.2-1.6 mS/cm', 'ph': '5.5-6.5'} | [ec, ph] | |
| 2 | kb_003 | lettuce_tipburn | What is tip burn in lettuce and how do I preve... | Tip burn is when leaf edges turn brown and cri... | [lettuce, tipburn, calcium, humidity, quality] | {'humidity': '60-70%', 'ec': '1.2-1.6 mS/cm', ... | [humidity, ec, temperature] | |
| 3 | kb_004 | temperature_management | What happens if my grow room temperature gets ... | High temperatures (>25°C for lettuce) cause mu... | [temperature, heat, stress, lettuce, climate] | {'temperature': '18-22°C', 'max_temperature': ... | [temperature, humidity] | |
| 4 | kb_005 | temperature_management | What happens if temperature drops too low in m... | Low temperatures (<15°C for lettuce) significa... | [temperature, cold, frost, heater, climate] | {'temperature': '18-22°C', 'min_temperature': ... | [temperature] | |


> Note: the knowledge base already answers these questions, but it may be worth adding the tags and optimal conditions to the indexed text as well.

In [None]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=200,
)

In [21]:
agri_knowledgebase_df["combined_text"] = agri_knowledgebase_df.apply(
    lambda row: f"{row['question']}\n\n{row['answer']}", axis=1
)
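
The cell above indexes only the question and the answer. A possible variant that also folds the tags and optimal conditions into the text is sketched below. `build_enriched_text` is a hypothetical helper, and the example row is an illustrative, shortened stand-in for `kb_002` (the answer text is a placeholder and the tag list is a subset), not real data from the file.

```python
def build_enriched_text(row: dict) -> str:
    """Combine question, answer, tags, and optimal conditions into one string."""
    parts = [row["question"], row["answer"]]

    tags = row.get("tags")
    if isinstance(tags, list) and tags:
        parts.append("Tags: " + ", ".join(tags))

    conditions = row.get("optimal_conditions")
    if isinstance(conditions, dict) and conditions:
        formatted = "; ".join(f"{name}: {value}" for name, value in conditions.items())
        parts.append("Optimal conditions: " + formatted)

    return "\n\n".join(parts)


# Illustrative row only: placeholder answer text, subset of tags.
example_row = {
    "question": "What causes lettuce leaves to turn yellow?",
    "answer": "(answer text)",
    "tags": ["lettuce", "yellowing", "chlorosis"],
    "optimal_conditions": {"ec": "1.2-1.6 mS/cm", "ph": "5.5-6.5"},
}
print(build_enriched_text(example_row))
```

With the DataFrame, this could be applied as `agri_knowledgebase_df.apply(lambda row: build_enriched_text(row.to_dict()), axis=1)`. Whether the extra text improves retrieval would need to be checked against real queries.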

In [24]:
agri_knowledgebase_df.columns

Index(['id', 'category', 'question', 'answer', 'tags', 'optimal_conditions',
       'related_sensor_checks', 'related_info', 'combined_text'],
      dtype='object')

In [29]:
documents = []

for _, row in agri_knowledgebase_df.iterrows():
    metadata = {
        "id": row["id"],
        "category": row["category"],
        "tags": (
            ", ".join(row["tags"]) if isinstance(row["tags"], list) else row["tags"]
        ),
        "related_sensor_checks": (
            ", ".join(row["related_sensor_checks"])
            if isinstance(row.get("related_sensor_checks"), list)
            else None
        ),
    }

    documents.append(
        Document(
            page_content=row["combined_text"],
            metadata=metadata,
        )
    )

In [30]:
documents[0]

Document(metadata={'id': 'kb_001', 'category': 'lettuce_growth_issues', 'tags': 'lettuce, growth, speed, troubleshooting, slow growth', 'related_sensor_checks': 'temperature, ec, ph, ppfd'}, page_content='Why is my lettuce growing slowly?\n\nLettuce growth can slow due to several factors: 1) Temperature outside optimal range (18-22°C) - lettuce metabolism significantly reduces below 15°C or above 25°C. 2) Insufficient light (PPFD below 200 µmol/m²/s) limits photosynthesis. 3) Low EC (below 1.2 mS/cm) means insufficient nutrients for growth. 4) pH imbalance (outside 5.5-6.5 range) affects nutrient uptake. 5) Root zone issues like low oxygen or disease. 6) Plant spacing too dense causing competition.')

In [31]:
vector_store.add_documents(documents)

['95a0bd87-f35b-4ac3-a627-1dada25982f8',
 '03d73494-fe3d-4bb7-85ad-275f46b5635b',
 '3458b6df-4fed-42bf-a06e-a3bfbbd5ef0f',
 'b60c3dc7-5130-4624-809a-6e805827e5da',
 '5054f56e-165b-4b67-a106-0910bb54bd2f',
 '3c41ad5b-e320-41ea-b21b-e40a9a35f932',
 'd71337dc-f3d8-4623-a73a-a4e04121a130',
 '8fba0d21-c780-4c7c-bbae-303ee89c804c',
 'a4640195-caa7-47f9-b95b-59600c2dbf0e',
 'af7e8d1e-e3d4-485e-807a-2c685b5c634d',
 '54984607-8902-4ca1-948a-2495cb672fee',
 'd117e93e-e876-4cae-944c-7c7fe6c94dcb',
 '1f792b0b-7f1f-4d7e-9ca9-ea0ac9617b59',
 '3ca78122-8a72-4679-926b-3116aeeec00d',
 'd80b70b5-f372-474a-9c85-6c3ecdf8ef25',
 'a65c399b-c225-480d-aa85-30cb4f7b29d6',
 'b19892f1-ebda-4675-b0ec-086b3e2d9216',
 '2b3891a2-7df5-432a-9366-713fd76d60c2',
 '4d9d5849-5932-4b6a-be0b-cb37d02267a3',
 '5ad129ae-68f2-464c-8cd0-919263f8fe18',
 '1ab11b78-9226-4f17-94e9-eb36cba0f71a',
 'fb9289ca-4d9e-4f7b-b651-18c45e40cb34',
 'ac8ce304-9523-449d-b453-b90077bea20b',
 '6673763e-192a-43ae-86e6-3dd3b323927e',
 'c8bab6ba-504c-

In [32]:
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 5,
    },
)

In [76]:
from typing import Dict, Any
import json


class FarmContextBuilder:
    def __init__(self, data_path: str) -> None:
        with open(data_path, "r") as fd:
            self.FARM_DATA: Dict[str, Any] = json.load(fd)

    def get_farm(self, farm_id: str) -> Dict[str, Any]:
        farms = self.FARM_DATA.get("farms", [])

        farm = next((f for f in farms if f.get("farmId") == farm_id), None)

        if not farm:
            raise ValueError(f"Farm with id '{farm_id}' not found")

        return farm

    def build_context(self, farm_id: str) -> str:
        farm = self.get_farm(farm_id)
        return self._format_farm_data(farm)

    def _format_farm_data(self, farm: dict) -> str:
        sensors = farm.get("sensors", {})

        return f"""
Farm ID: {farm.get('farmId')}
Crop: {farm.get('currentCrop')} ({farm.get('variety')})
Growth Stage: {farm.get('growthStage')}
Days After Transplant: {farm.get('daysAfterTransplant')}

Sensor Readings:
- Temperature: {sensors.get('temperature')} °C
- Humidity: {sensors.get('humidity')} %
- EC: {sensors.get('ec')} mS/cm
- pH: {sensors.get('ph')}
- Light (PPFD): {sensors.get('ppfd')}
- CO₂: {sensors.get('co2')} ppm
- Water Temp: {sensors.get('waterTemp')} °C

Notes:
{farm.get('notes')}
""".strip()

In [77]:
farm_ctx_builder  = FarmContextBuilder("./sample_farm_data.json")

In [78]:
farm_ctx_builder.build_context("farm_101")

'Farm ID: farm_101\nCrop: lettuce (Lollo Bionda)\nGrowth Stage: week_3\nDays After Transplant: 18\n\nSensor Readings:\n- Temperature: 20.5 °C\n- Humidity: 65 %\n- EC: 1.4 mS/cm\n- pH: 6.0\n- Light (PPFD): 250\n- CO₂: 850 ppm\n- Water Temp: 21.0 °C\n\nNotes:\nAll parameters optimal. Growth rate 20% above benchmark.'

In [79]:
class DiagnosticRuleEngine:
    def __init__(self, rules_path: str):
        with open(rules_path, "r") as f:
            self.rules = json.load(f)["diagnostic_rules"]

    def _condition_met(self, condition, value, threshold):
        if condition == "below_min":
            return value < threshold
        if condition == "above_max":
            return value > threshold
        if condition == "outside_range":
            return value < threshold["min"] or value > threshold["max"]
        return False

    def evaluate(self, crop: str, symptom: str, sensors: dict) -> list[dict]:
        crop_rules = self.rules.get(crop, {})
        issue_rules = crop_rules.get("issues", {}).get(symptom, {})
        checks = issue_rules.get("diagnostic_checks", [])

        findings = []

        for check in checks:
            param = check["parameter"]
            value = sensors.get(param)

            if value is None:
                continue

            condition = check["condition"]
            threshold = check["threshold"]

            if self._condition_met(condition, value, threshold):
                findings.append(
                    {
                        "parameter": param,
                        "value": value,
                        "severity": check["severity"],
                        "cause": check["cause"],
                        "explanation": check["explanation"],
                        "impact": check["impact"],
                        "action": check["action"],
                        "recovery_time": check["recovery_time"],
                    }
                )

        severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
        findings.sort(key=lambda x: severity_order.get(x["severity"], 99))

        return findings[:3]
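
The engine above implies a particular shape for `diagnostic_rules.json`. The sketch below shows one such shape, inferred from the keys the code reads, together with a standalone copy of the condition check. The rule entries, thresholds, and cause strings are hypothetical placeholders and the real file may differ. The real engine also reads `explanation`, `impact`, `action`, and `recovery_time` from every check, which are omitted here for brevity.

```python
# Hypothetical rules; the structure is inferred from DiagnosticRuleEngine.evaluate.
EXAMPLE_RULES = {
    "diagnostic_rules": {
        "lettuce": {
            "issues": {
                "yellowing": {
                    "diagnostic_checks": [
                        {
                            "parameter": "ec",
                            "condition": "below_min",
                            "threshold": 1.2,
                            "severity": "high",
                            "cause": "Low nutrient concentration",
                        },
                        {
                            "parameter": "ph",
                            "condition": "outside_range",
                            "threshold": {"min": 5.5, "max": 6.5},
                            "severity": "medium",
                            "cause": "pH outside the uptake range",
                        },
                    ]
                }
            }
        }
    }
}


def condition_met(condition: str, value: float, threshold) -> bool:
    """Standalone copy of the comparison logic in _condition_met."""
    if condition == "below_min":
        return value < threshold
    if condition == "above_max":
        return value > threshold
    if condition == "outside_range":
        return value < threshold["min"] or value > threshold["max"]
    return False


def triggered_causes(rules: dict, crop: str, symptom: str, sensors: dict) -> list:
    """Return the causes of all checks whose condition holds for the sensor values."""
    issue = rules["diagnostic_rules"].get(crop, {}).get("issues", {}).get(symptom, {})
    causes = []
    for check in issue.get("diagnostic_checks", []):
        value = sensors.get(check["parameter"])
        if value is not None and condition_met(check["condition"], value, check["threshold"]):
            causes.append(check["cause"])
    return causes


# Illustrative sensor values, not taken from sample_farm_data.json.
print(triggered_causes(EXAMPLE_RULES, "lettuce", "yellowing", {"ec": 0.8, "ph": 6.0}))
```

Note that a scalar threshold is expected for `below_min` and `above_max`, while `outside_range` expects a dict with `min` and `max` keys.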

In [80]:
def format_rule_diagnosis(findings: list[dict]) -> str:
    if not findings:
        return "No critical issues detected by diagnostic rules."

    text = "RULE-BASED DIAGNOSIS:\n"

    for idx, f in enumerate(findings, 1):
        text += f"""
{idx}. Cause: {f['cause']}
   - Severity: {f['severity']}
   - Sensor: {f['parameter']} = {f['value']}
   - Explanation: {f['explanation']}
   - Impact: {f['impact']}
   - Recommended Action: {f['action']}
   - Recovery Time: {f['recovery_time']}
"""
    return text.strip()

In [81]:
rule_engine = DiagnosticRuleEngine("./diagnostic_rules.json")

In [90]:
BASE_SYMPTOMS = [
    "yellowing",
    "slow_growth",
    "tipburn",
    "elongation",
]

SYMPTOM_DETECTION_PROMPT = """
You are a classifier.

Your task is to map the farmer's question to ONE canonical symptom.

Allowed symptoms:
- yellowing
- slow_growth
- tipburn
- elongation

Rules:
- Return ONLY the symptom name.
- Do NOT explain.
- Do NOT add extra text.
- If unsure, return slow_growth.

Farmer question:
{question}
"""

symptom_prompt = PromptTemplate(
    input_variables=["question"],
    template=SYMPTOM_DETECTION_PROMPT,
)

symptom_chain = {"question": RunnablePassthrough()} | symptom_prompt | llm


def detect_symptom_llm(question: str) -> str:
    # The chain already wraps its input under "question", so pass the raw string.
    response = symptom_chain.invoke(question)
    symptom = response.content.strip().lower() # type: ignore

    if symptom not in BASE_SYMPTOMS:
        return "slow_growth"  # safe fallback

    return symptom

In [89]:
detect_symptom_llm("Why is my lettuce growing slowly?")

'slow_growth'
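
The exact-match check in `detect_symptom_llm` falls back to `slow_growth` whenever the model adds punctuation, capitals, or a space instead of an underscore. A more forgiving normalizer might look like the sketch below. `normalize_symptom` is a hypothetical helper, and the cleanup rules are an assumption about what kinds of noise the model produces.

```python
import re

BASE_SYMPTOMS = ["yellowing", "slow_growth", "tipburn", "elongation"]


def normalize_symptom(raw: str, default: str = "slow_growth") -> str:
    """Map a raw model reply to a canonical symptom, or fall back to default."""
    cleaned = raw.strip().lower()
    # Treat runs of whitespace or hyphens as a single underscore.
    cleaned = re.sub(r"[\s\-]+", "_", cleaned)
    # Drop anything that is not a lowercase letter or underscore.
    cleaned = re.sub(r"[^a-z_]", "", cleaned).strip("_")
    return cleaned if cleaned in BASE_SYMPTOMS else default


for reply in ["Yellowing.", "Slow growth", "**tipburn**", "not sure"]:
    print(repr(reply), "->", normalize_symptom(reply))
```

Replies that contain more than the symptom name still fall back to the default, which matches the safe-fallback behaviour of the original function.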

In [None]:
LEVEL_1_PROMPT_TEMPLATE = """
You are an expert agronomist AI.

You are answering a farmer's question using ONLY the agricultural knowledge base.
No live sensor data is available.

-------------------------------
KNOWLEDGE BASE CONTEXT:
{kb_context}

-------------------------------
FARMER QUESTION:
{question}

-------------------------------
INSTRUCTIONS:
- Provide general possible causes.
- Explain in simple, farmer-friendly language.
- Do NOT assume exact sensor values.
- Mention common factors like temperature, nutrients, light, and pH.
- Keep the answer practical and concise.

-------------------------------
RESPONSE FORMAT:
- Possible causes
- What the farmer should check
"""

LEVEL_2_PROMPT_TEMPLATE = """
You are an expert agronomist AI assisting farmers with crop diagnostics.

You are given THREE sources of information:

1. AGRICULTURAL KNOWLEDGE BASE  
This contains general best practices, common problems, and explanations.

2. LIVE FARM SENSOR DATA  
This contains the actual, real-time conditions of the farmer's crop.
This data is authoritative and MUST be prioritized over general knowledge.

3. RULE-BASED DIAGNOSTIC OUTPUT  
This contains deterministic diagnoses derived from agronomy rules.
These rules represent expert-level ground truth and MUST be trusted.

--------------------------------
KNOWLEDGE BASE CONTEXT:
{kb_context}

--------------------------------
FARM SENSOR CONTEXT:
{farm_context}

--------------------------------
RULE-BASED DIAGNOSIS:
{rule_context}

--------------------------------
FARMER QUESTION:
{question}

--------------------------------
INSTRUCTIONS:
- First, review the rule-based diagnosis.
- If rules identify one or more issues, use them as the primary diagnosis.
- Support the diagnosis using exact sensor values.
- Use the knowledge base only for explanation and best practices.
- Do NOT invent sensor values or causes not supported by data.
- Be clear, practical, and farmer-friendly.
- Suggest concrete corrective actions with expected recovery time.

--------------------------------
RESPONSE FORMAT:
- Diagnosis (2-3 sentences)
- Key evidence (rule + sensor values)
- Recommended action(s)
"""


In [95]:
level_1_prompt = PromptTemplate(
    input_variables=["kb_context", "question"],
    template=LEVEL_1_PROMPT_TEMPLATE,
)

level_2_prompt = PromptTemplate(
    input_variables=["kb_context", "farm_context", "rule_context", "question"],
    template=LEVEL_2_PROMPT_TEMPLATE,
)

In [65]:
def retrieve_kb_level_1(inputs):
    docs = retriever.invoke(inputs["question"])
    return "\n\n".join(doc.page_content for doc in docs)


def retrieve_kb_level_2(inputs):
    docs = retriever.invoke(inputs["question"])
    return "\n\n".join(doc.page_content for doc in docs)

In [66]:
def build_farm_context(inputs):
    farm_id = inputs.get("farm_id")
    if farm_id:
        return farm_ctx_builder.build_context(farm_id)
    return "No farm sensor data connected."

In [92]:
def build_rule_context(inputs):
    farm_id = inputs.get("farm_id")
    question = inputs.get("question")

    if not farm_id:
        return "No rule-based diagnosis available."

    farm = farm_ctx_builder.get_farm(farm_id)

    detected_symptom = detect_symptom_llm(question)

    rule_findings = rule_engine.evaluate(
        crop=farm["currentCrop"],
        symptom=detected_symptom,
        sensors=farm["sensors"],
    )

    return format_rule_diagnosis(rule_findings)

In [96]:
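# Each chain receives a dict such as {"question": ..., "farm_id": ...}.
# Extract the question string explicitly: RunnablePassthrough() would
# forward the whole input dict into the {question} slot of the prompt.
level_1_chain = (
    {
        "kb_context": RunnableLambda(retrieve_kb_level_1),
        "question": RunnableLambda(lambda inputs: inputs["question"]),
    }
    | level_1_prompt
    | llm
)


level_2_chain = (
    {
        "kb_context": RunnableLambda(retrieve_kb_level_2),
        "farm_context": RunnableLambda(build_farm_context),
        "rule_context": RunnableLambda(build_rule_context),
        "question": RunnableLambda(lambda inputs: inputs["question"]),
    }
    | level_2_prompt
    | llm
)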

In [69]:
response = level_1_chain.invoke(
    {
        "question": "Why is my lettuce yellowing?",
    }
)

response.content

"**Possible causes:**\nYour lettuce is yellowing due to one or more of the following:\n\n1. **Nitrogen deficiency**: Lettuce needs nitrogen to stay healthy and green.\n2. **Iron deficiency**: Iron is another essential nutrient that helps lettuce grow.\n3. **Overwatering**: Too much water can cause roots to rot, making it hard for lettuce to absorb nutrients.\n4. **Temperature fluctuations**: Lettuce prefers temperatures between 18-22°C.\n5. **Light stress**: Too much or too little light can cause stress, leading to yellowing.\n6. **Natural aging**: Lower leaves can naturally turn yellow as the plant matures.\n\n**What to check:**\n- Check the nutrient levels in your solution, especially nitrogen and iron.\n- Ensure consistent watering and avoid overwatering.\n- Verify the temperature range in your grow room.\n- Adjust lighting to prevent stress.\n- Inspect the plant's overall health for signs of aging or disease."

In [97]:
response = level_2_chain.invoke(
    {
        "question": "Why is my lettuce yellowing?",
        "farm_id": "farm_102",
    }
)

response.content

"**Diagnosis**\n\nYour lettuce is yellowing due to a nitrogen deficiency. This is confirmed by the rule-based diagnosis and supported by the sensor values showing an EC of 0.8 mS/cm, which is below the optimal range of 1.0 mS/cm.\n\n**Key Evidence**\n\n* Rule: Nitrogen deficiency (Primary)\n* Sensor values: EC = 0.8 mS/cm\n* EC has been dropping over the past 5 days, indicating a possible issue with nutrient uptake.\n\n**Recommended Action**\n\nIncrease the EC to 1.4 mS/cm immediately using a balanced fertilizer that contains adequate nitrogen. Verify the nitrogen content in the fertilizer mix and ensure proper A+B ratio mixing.\n\n**Expected Recovery Time**\n\n3-5 days - new growth should be green, old yellow leaves won't recover.\n\n**Additional Notes**\n\n* The temperature and humidity levels are within the optimal range for lettuce growth.\n* The pH level is slightly above the optimal range, but it's not causing the yellowing issue.\n* The light and CO₂ levels are sufficient for le