# GEN - AI Assignment

**Group No:** 1  
**Members:** Anshul Yadav, Shruti Bajpayee, Saransh

---

## Methodology

### 1. **LLM-Based FAQ System for Crop Diseases**
A domain-specific large language model fine-tuned on agriculture-specific datasets was employed (this notebook loads `cropinailab/aksara_v1` as a causal language model).
This model is designed to:

* Respond to common questions concerning crop-related diseases.
* Offer guidance informed by real-world agricultural scenarios.
* Improve the availability of expert agricultural knowledge for both farmers and agronomists.

---

### 2. **Prompt Engineering for Agricultural Simulations**

Prompt engineering techniques were developed and applied to produce task-specific agricultural outputs, including:

* **Virtual field reports** featuring analysis of pest presence and crop health.
* **Predictions of pest outbreaks** derived from seasonal patterns and weather data.
* **Optimized irrigation recommendations** customized according to crop requirements, climatic conditions, and soil characteristics.
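
As a minimal sketch of how these three task types could be phrased as prompts, the snippet below builds one instruction string per task. The template wording, the `TASK_TEMPLATES` dictionary, and the `build_task_prompt` helper are illustrative assumptions, not the prompts used in this notebook.

```python
# Hypothetical task templates; the wording is illustrative only.
TASK_TEMPLATES = {
    "field_report": (
        "Write a virtual field report for a {crop} field. "
        "Observations: {observations}. Cover pest presence and crop health."
    ),
    "pest_forecast": (
        "Estimate the risk of pest outbreaks for {crop} during {season}. "
        "Recent weather: {weather}."
    ),
    "irrigation": (
        "Recommend an irrigation schedule for {crop}. "
        "Soil: {soil}. Climate: {climate}."
    ),
}


def build_task_prompt(task: str, **fields: str) -> str:
    """Fill the template for `task`; raise a clear error for unknown tasks or missing fields."""
    if task not in TASK_TEMPLATES:
        raise ValueError(f"Unknown task: {task!r}")
    try:
        return TASK_TEMPLATES[task].format(**fields)
    except KeyError as missing:
        raise ValueError(f"Missing field for task {task!r}: {missing}") from None


print(build_task_prompt("irrigation", crop="beans", soil="clay loam", climate="humid"))
```

Keeping the templates in one dictionary makes it easy to compare wordings for the same task without touching the generation code.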


---


In [None]:
!pip install transformers datasets accelerate bitsandbytes torch
!pip install -U bitsandbytes

Collecting transformers
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting accelerate
  Downloading accelerate-1.6.0-py3-none-any.whl.metadata (19 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.42.0-py3-none-any.whl.metadata (9.9 kB)
Collecting huggingface-hub<1.0,>=0.30.0 (from transformers)
  Downloading huggingface_hub-0.31.2-py3-none-any.whl.metadata (13 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers)
  Downloading tokenizers-0.21.1-cp39-abi3-macosx_10_12_x86_64.whl.metadata (6.8 kB)
Collecting safetensors>=0.4.3 (from transformers)
  Downloading safetensors-0.5.3-cp38-abi3-macosx_10_12_x86_64.whl.metadata (3.8 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-20.0.0.tar.gz (1.1 MB)
  ... (pyarrow source build log truncated)

Collecting bitsandbytes
  Using cached bitsandbytes-0.42.0-py3-none-any.whl.metadata (9.9 kB)
Using cached bitsandbytes-0.42.0-py3-none-any.whl (105.0 MB)
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.42.0


In [None]:
from IPython.display import display, Markdown
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from google.colab import userdata
from huggingface_hub import login

Log in to Hugging Face to access the gated model.

In [None]:
from huggingface_hub import login
hf_token = "HF-TOKEN"  # placeholder token; login() returns None, so keep the token in its own variable
login(token=hf_token)

In [None]:
model_id = "cropinailab/aksara_v1"
print(f"Attempting to load model: {model_id} WITH 4-bit quantization.")

Attempting to load model: cropinailab/aksara_v1 WITH 4-bit quantization.


Quantization, Model and Tokenizer loading

In [None]:
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
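
For a rough sense of why 4-bit loading matters, the sketch below estimates the memory needed for the weights alone at different precisions. The 7-billion parameter count is an assumption chosen for illustration (the size of `cropinailab/aksara_v1` is not stated in this notebook), and the estimate ignores quantization metadata, activations, and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


ASSUMED_PARAMS = 7e9  # assumption: a 7B-parameter model, not a confirmed figure

for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(ASSUMED_PARAMS, bits):.2f} GiB")
```

Under this assumption, 4-bit weights need about a quarter of the memory of 16-bit weights, which is what makes loading on a single consumer or Colab GPU plausible.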

In [None]:
try:
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quantization_config,
        device_map="auto",
        trust_remote_code=True,
        token=hf_token
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)
    print(f"Model '{model_id}' and tokenizer loaded successfully (4-bit quantized).")

    if tokenizer.pad_token is None:
        if tokenizer.eos_token:
            tokenizer.pad_token = tokenizer.eos_token
            model.config.pad_token_id = tokenizer.eos_token_id
            print("Set pad_token to eos_token.")
        else:
            print("Warning: EOS token not found. Adding a new PAD token.")
            tokenizer.add_special_tokens({'pad_token': '[PAD]'})
            model.resize_token_embeddings(len(tokenizer))
            model.config.pad_token_id = tokenizer.pad_token_id
            print("Added a new pad_token '[PAD]'.")

except Exception as e:
    print(f"Error loading model or tokenizer: {e}")
    model = None
    tokenizer = None

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Error loading model or tokenizer: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/cropinailab/aksara_v1.
403 Client Error. (Request ID: Root=1-682326ba-1e653a665974f28b3a0b81b9;a97a5e82-7894-4c49-b6f0-2c28b5194f53)

Cannot access gated repo for url https://huggingface.co/cropinailab/aksara_v1/resolve/main/config.json.
Access to model cropinailab/aksara_v1 is restricted and you are not in the authorized list. Visit https://huggingface.co/cropinailab/aksara_v1 to ask for access.
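
The output above shows two separate problems: no `HF_TOKEN` secret was found, and the account is not on the authorized list for the gated repository. A minimal sketch for failing early on the first problem is shown below. The `resolve_hf_token` helper and its use of an `HF_TOKEN` environment variable are assumptions for illustration; access to the gated repository still has to be requested on the model page.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a non-empty token from HF_TOKEN, or raise a clear error before any download starts."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    # "HF-TOKEN" is the placeholder string used earlier in this notebook.
    if not token or token == "HF-TOKEN":
        raise RuntimeError("HF_TOKEN is not set; create a token and export it before loading the model.")
    return token


# Demonstration with an explicit mapping so the example runs without a real token.
print(resolve_hf_token({"HF_TOKEN": "example-token"}))
```

In the notebook, the returned value would be passed to `login(token=...)` and to the `token=` argument of `from_pretrained`.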


Prompt engineering and response generation function

In [None]:
def get_prompt(farmer_query: str) -> str:
    """Build the full prompt: assistant instructions followed by the farmer's query."""
    # The "Farmer's query:" / "AgroFarmAI:" framing is an assumed layout, not a documented chat format for this model.
    template = """You are AgroFarmAI, a virtual assistant created to provide farmers with practical and agriculture-specific support.
You can understand queries in both English and Hinglish (a mix of English and Hindi), though English is preferred for clarity.
Your role is to assist farmers by generating field reports, forecasting pest risks, recommending smart irrigation strategies, and responding to a wide range of agriculture-related questions in a friendly and conversational manner.

Farmer's query: {query}
AgroFarmAI:"""
    return template.format(query=farmer_query)

def generate_response(query: str, max_length: int = 800, temperature: float = 0.6, top_p: float = 0.9):

    if not model or not tokenizer:
        return "Error: Model or tokenizer not loaded."
    prompt = get_prompt(query)

    eos_token_id_param = tokenizer.eos_token_id
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)

    try:
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_length,
                eos_token_id=eos_token_id_param,
                pad_token_id=tokenizer.pad_token_id,
                do_sample=True,
                temperature=temperature,
                top_p=top_p,
            )
        response_start_index = inputs.input_ids.shape[1]
        response_text = tokenizer.decode(outputs[0][response_start_index:], skip_special_tokens=True)
        return response_text.strip()

    except Exception as e:
        print(f"Error during model generation: {e}")
        if "CUDA out of memory" in str(e):
             print("CUDA out of memory during generation.")
             return "Error: Ran out of GPU memory during generation."
        return f"Error during generation: {e}"
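
To illustrate what the `temperature` and `top_p` arguments do, here is a simplified, pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a toy four-token distribution. The helper names and the logit values are made up for illustration, and this is not the `transformers` implementation, which operates on tensors and differs in detail.

```python
import math
from typing import Dict, List


def apply_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax over logits divided by the temperature (lower values sharpen the distribution)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for four candidate tokens
for temperature in (1.0, 0.6):
    candidates = top_p_filter(apply_temperature(toy_logits, temperature), top_p=0.9)
    print(f"temperature={temperature}: {len(candidates)} candidate tokens remain")
```

With these toy logits, lowering the temperature from 1.0 to 0.6 concentrates probability on the top token, so fewer candidates survive the same `top_p=0.9` cut.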


Testing the workflow of the model in different prompt input scenarios.

In [None]:
if model and tokenizer:
    print("\n--- Running Sample Test Queries ---")

    query1 = "My tomato plants have leaves with brown spots and yellow rings around them. What could this be and what should I do?"
    response1 = generate_response(query1)
    print(f"\nQuery 1 (Pest ID - English): {query1}")
    print("AgroFarmAI 1:")
    display(Markdown(response1))
    print("-" * 30)

    query2 = "The soil around my bean plants is very watery. How should I change my irrigation?"
    response2 = generate_response(query2)
    print(f"\nQuery 2 (Irrigation - English): {query2}")
    print("AgroFarmAI 2:")
    display(Markdown(response2))
    print("-" * 30)

    query3 = "Mere tamatar ke patton par chhote chhote holes hain aur green insects hain jo unhe kha rahe hain. Yeh kya hain?"
    response3 = generate_response(query3)
    print(f"\nQuery 3 (Pest ID - Hinglish): {query3}")
    print("AgroFarmAI 3:")
    display(Markdown(response3))
    print("-" * 30)

    query4 = "What steps should I take to control aphid infestation on my pepper plants over the next month? Please provide a detailed plan."
    response4 = generate_response(query4)
    print(f"\nQuery 4 (Plan Request): {query4}")
    print("AgroFarmAI 4:")
    display(Markdown(response4))
    print("-" * 30)

    query5 = "How much is wheat selling for in Punjab right now?"
    response5 = generate_response(query5)
    print(f"\nQuery 5 (Out of Scope): {query5}")
    print("AgroFarmAI 5:")
    display(Markdown(response5))
    print("-" * 30)

    query6 = "What is the prediction of the recession in the next 5 years?"
    response6 = generate_response(query6)
    print(f"\nQuery 6 (Totally Out of Scope): {query6}")
    print("AgroFarmAI 6:")
    display(Markdown(response6))
    print("-" * 30)

else:
    print("Skipping tests as model or tokenizer failed to load.")


Skipping tests as model or tokenizer failed to load.
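
The test blocks above share the same structure, so they could also be driven from a list. The sketch below shows that pattern with a stand-in `fake_generate` function so it runs without the model. `TEST_QUERIES`, `run_test_queries`, and `fake_generate` are hypothetical names; in the notebook, `generate_response` would be passed instead.

```python
from typing import Callable, List, Tuple

# Two of the notebook's queries, paired with their labels.
TEST_QUERIES: List[Tuple[str, str]] = [
    ("Pest ID - English",
     "My tomato plants have leaves with brown spots and yellow rings around them. "
     "What could this be and what should I do?"),
    ("Out of Scope", "How much is wheat selling for in Punjab right now?"),
]


def run_test_queries(queries: List[Tuple[str, str]],
                     generate_fn: Callable[[str], str]) -> List[str]:
    """Run every labelled query through generate_fn, print the exchange, and collect the responses."""
    responses = []
    for number, (label, query) in enumerate(queries, start=1):
        response = generate_fn(query)
        print(f"\nQuery {number} ({label}): {query}")
        print(f"AgroFarmAI {number}:")
        print(response)
        print("-" * 30)
        responses.append(response)
    return responses


def fake_generate(query: str) -> str:
    """Stand-in for generate_response so the sketch runs without a loaded model."""
    return f"[placeholder response to a {len(query)}-character query]"


results = run_test_queries(TEST_QUERIES, fake_generate)
```

Numbering the queries with `enumerate` avoids the copy-and-paste slips that hand-numbered variables invite.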


**UI**

In [None]:
import sys

print("\n--- Starting Interactive Mode ---")
print("Type 'quit' to exit.")
print("-" * 20)

while True:
    try:
        sys.stdout.flush()
        user_q = input("USER: ")

        if user_q.lower() == 'quit':
            break
        if not user_q.strip():
            continue

        if model and tokenizer:
            print("AgroFarmAI:")
            ai_r = generate_response(user_q)
            print(ai_r)

        else:
            print("AgroFarmAI: Model not available.")

        print("-" * 20) #end

    except KeyboardInterrupt:
        print("\nKeyboard interrupt detected. Exiting interactive mode.")
        break
    except Exception as e:
        print(f"\nAn error occurred: {e}")
        print("Restarting prompt...")
        print("-" * 20)


print("AgroFarmAI session ended.")


--- Starting Interactive Mode ---
Type 'quit' to exit.
--------------------
