# AI PlantDocBot: Intelligent Plant Disease Diagnosis

Project notebook for Day 1 & Day 2: environment setup, dataset download, data mapping, and sample image display.


## Objective
Develop an AI-powered chatbot that diagnoses plant diseases from uploaded leaf images or symptom text. This notebook contains the Day 1 (environment and dataset) and Day 2 (data mapping and visualization) code blocks.


## Day 1 — Environment & Dataset Download
Create the project folders and download the datasets. The `!git clone` lines are shell commands, so they need a notebook environment such as Colab.


In [None]:
# Day 1 — Environment setup, folders and dataset download
import os
from pathlib import Path

# Project root (change if needed)
base = "/content/PlantDocBot"                # Colab/Notebook friendly default
Path(base).mkdir(parents=True, exist_ok=True)
os.makedirs(os.path.join(base, "data", "plantvillage"), exist_ok=True)
os.makedirs(os.path.join(base, "data", "plantdoc"), exist_ok=True)
os.makedirs(os.path.join(base, "data", "text_corpus"), exist_ok=True)

print("Folders created under", base)

# Download the datasets via git clone.
# The leading "!" runs a shell command and {base} is expanded by IPython,
# so these lines only work in a notebook cell that supports shell commands.
!git clone https://github.com/spMohanty/plantvillage-Dataset.git "{base}/data/plantvillage"
!git clone https://github.com/pratikkayal/PlantDoc-Dataset.git "{base}/data/plantdoc"

# Verify Dataset directories and list top-level content
for sub in ["plantvillage", "plantdoc"]:
    path = os.path.join(base, "data", sub)
    print("\nContents of", sub, ":")
    try:
        print(os.listdir(path)[:20])
    except FileNotFoundError:
        print("Directory not found:", path)
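
Re-running the download cell after a successful clone makes `git clone` fail, because the target directory is no longer empty, and the `!` syntax is unavailable outside a notebook. The following is a minimal sketch of a plain-Python alternative, assuming `git` is on the PATH. `clone_if_missing` and its `runner` parameter are hypothetical names introduced here for illustration, not part of the project.

```python
import subprocess
from pathlib import Path


def clone_if_missing(url, dest, runner=subprocess.run):
    """Clone `url` into `dest` unless `dest` already contains files.

    Returns True when a clone was attempted, False when it was skipped.
    `runner` is a hypothetical hook so the git call can be replaced in tests.
    """
    dest = Path(dest)
    if dest.is_dir() and any(dest.iterdir()):
        print("Skipping clone, directory is not empty:", dest)
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # --depth 1 fetches only the latest commit, which keeps the download smaller
    runner(["git", "clone", "--depth", "1", url, str(dest)], check=True)
    return True
```

It could be called as `clone_if_missing("https://github.com/pratikkayal/PlantDoc-Dataset.git", os.path.join(base, "data", "plantdoc"))`. The shallow clone is a design choice of this sketch; drop `--depth 1` if the full history is needed.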


## Day 1 — Quick scan to find image root
Identify image directories and set `img_root` for mapping.


In [None]:
import os

pv_base = os.path.join(base, "data", "plantvillage")
img_exts = ('.jpg', '.jpeg', '.png', '.bmp')
found_dirs = []

for root_dir, dirs, files in os.walk(pv_base):
    count = sum(1 for f in files if f.lower().endswith(img_exts))
    if count > 0:
        found_dirs.append((root_dir, count))

if not found_dirs:
    print("No image files found inside PlantVillage folder.")
else:
    print("Found image directories. Sample list (first 10):")
    for d, c in found_dirs[:10]:
        print(" ", d, "-", c, "images")
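    # Use pv_base as the image root. This assumes the class folders sit directly
    # under pv_base; if the list above shows them nested deeper, point img_root
    # at the folder that directly contains the class folders instead.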
    img_root = pv_base
    print("\nUsing image root:", img_root)
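
If the printed list shows image folders nested below `pv_base` (for example one subtree per image variant), it can help to group them by parent directory before choosing `img_root`. The sketch below does that for the `(directory, count)` pairs collected in `found_dirs`. `candidate_roots` is a hypothetical helper, and ranking by total image count is an assumption of this sketch, not a rule from the datasets.

```python
import os
from collections import defaultdict


def candidate_roots(found_dirs):
    """Group (directory, image_count) pairs by their parent directory.

    Returns a list of (parent, n_image_dirs, n_images) tuples, sorted with
    the largest image total first and ties broken by path.
    """
    totals = defaultdict(lambda: [0, 0])
    for directory, count in found_dirs:
        parent = os.path.dirname(os.path.normpath(directory))
        totals[parent][0] += 1       # number of image folders under this parent
        totals[parent][1] += count   # number of images under this parent
    rows = [(parent, n_dirs, n_imgs) for parent, (n_dirs, n_imgs) in totals.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0]))
```

A parent with many image folders is a plausible `img_root`; inspect the result with `candidate_roots(found_dirs)[:5]` and set `img_root` by hand rather than trusting the ranking blindly.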


## Day 2 — Build CSV mapping (image_path -> label)
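This cell collects every image under `img_root`, takes the first folder below `img_root` as the label, and saves the mapping to `image_data.csv`. The labels are only meaningful if `img_root` is the directory that directly contains the class folders.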


In [None]:
# Day 2 — Build CSV mapping image_path -> label
import os
import pandas as pd

img_exts = ('.jpg', '.jpeg', '.png', '.bmp')

records = []

if 'img_root' not in globals():
    raise RuntimeError("img_root is not defined. Please run the dataset discovery cell first.")

for root_dir, dirs, files in os.walk(img_root):
    for f in files:
        if f.lower().endswith(img_exts):
            path = os.path.join(root_dir, f)
            # infer label: first folder after img_root
            rel = os.path.relpath(path, img_root)        # e.g. "Apple___Black_rot/IMG_1.jpg"
            label = rel.split(os.sep)[0]                 # "Apple___Black_rot"
            records.append({"image_path": path, "label": label})

# create DataFrame and save once (outside loops)
df = pd.DataFrame(records)
print("Total images found:", len(df))
print("Sample rows:")
print(df.head())

out_csv = os.path.join(base, "data", "image_data.csv")
os.makedirs(os.path.dirname(out_csv), exist_ok=True)
df.to_csv(out_csv, index=False)
print("Saved mapping to", out_csv)
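
Taking the first folder below `img_root` as the label gives every image the same label when `img_root` sits one or more levels above the class folders. A sketch of an alternative that labels each image by its immediate parent folder and counts images per label is shown below. `build_records`, `label_counts`, and `IMG_EXTS` are hypothetical names, and the parent-folder rule is an assumption that only holds when images sit directly inside their class folder.

```python
from collections import Counter
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def build_records(root):
    """Return one {"image_path", "label"} dict per image found under `root`.

    The label is the name of the folder that directly contains the image.
    """
    root = Path(root)
    return [
        {"image_path": str(p), "label": p.parent.name}
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() in IMG_EXTS
    ]


def label_counts(records):
    """Count how many images each label has, to spot empty or tiny classes."""
    return Counter(r["label"] for r in records)
```

The returned list can be passed to `pd.DataFrame(...)` exactly like `records` in the cell above, and `label_counts(records).most_common(5)` gives a quick check that more than one label was found.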


## Day 2 — Display a random RGB image
Pick a random image, convert it to RGB if it is stored in another mode, and display it with matplotlib so the colors render correctly.


In [None]:
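import os
import random
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# Same extensions as the earlier cells, repeated so this cell does not depend on them
img_exts = ('.jpg', '.jpeg', '.png', '.bmp')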

if 'img_root' not in globals():
    raise RuntimeError("img_root not defined. Run the earlier discovery cell.")

# collect all image files
all_files = []
for root_dir, dirs, files in os.walk(img_root):
    for f in files:
        if f.lower().endswith(img_exts):
            all_files.append(os.path.join(root_dir, f))

if not all_files:
    print("No images found under img_root.")
else:
    sample_file = random.choice(all_files)
    print("Displaying color image:", sample_file)
    img = Image.open(sample_file)
    print("Original image mode:", img.mode)
    if img.mode != 'RGB':
        img = img.convert('RGB')
    plt.figure(figsize=(6,6))
    plt.imshow(np.asarray(img))
    plt.axis('off')
    plt.show()
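
`random.choice` returns a different image on every run, which makes previews hard to compare. A sketch of a seeded variant that picks a fixed number of files per class folder follows. `sample_per_label` is a hypothetical helper, and it assumes the label is the name of the folder that directly contains each file.

```python
import os
import random
from collections import defaultdict


def sample_per_label(paths, per_label=1, seed=0):
    """Pick up to `per_label` files from each class folder, reproducibly.

    Returns a dict mapping label -> list of chosen paths.
    """
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    by_label = defaultdict(list)
    for p in paths:
        by_label[os.path.basename(os.path.dirname(p))].append(p)
    picks = {}
    for label in sorted(by_label):
        files = sorted(by_label[label])  # sort so the result does not depend on walk order
        picks[label] = rng.sample(files, min(per_label, len(files)))
    return picks
```

Calling `sample_per_label(all_files, per_label=1, seed=42)` yields one path per class, and each path can be opened and shown the same way as `sample_file` above.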


### Notes
- Do not commit large dataset files to the GitHub repo. Keep `data/` in `.gitignore` and provide cloning instructions in the README.
- The `!git clone` lines only run in a notebook environment with shell support, such as Colab. Elsewhere, run the same `git clone` commands in a terminal with the target paths written out.
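
The `.gitignore` note can be applied from Python as well. The sketch below appends `data/` to a repository's `.gitignore` only when the entry is missing. `ensure_gitignored` is a hypothetical helper, and the repository path passed to it is an assumption about where the project is checked out.

```python
from pathlib import Path


def ensure_gitignored(repo_dir, entry="data/"):
    """Add `entry` to <repo_dir>/.gitignore unless it is already listed.

    Returns True when the file was changed, False when nothing was needed.
    """
    gitignore = Path(repo_dir) / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry in (line.strip() for line in lines):
        return False
    lines.append(entry)
    gitignore.write_text("\n".join(lines) + "\n")
    return True
```

Running `ensure_gitignored(".")` from the repository root is idempotent, so it is safe to keep in a setup cell.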
