<a href="https://colab.research.google.com/github/Suman-bot8927/AI-PlantDoc-Bot-Intelligent-Plant-Disease-Diagnosis/blob/main/Plant_ChatBot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



▶ **Day 1 Goals**

Environment Setup: Configure the workspace and dependencies.

Data Acquisition: Download the PlantVillage (Classification) and PlantDoc (Object Detection/Noise) datasets.

Data Verification: Validate directory structures and file integrity.

Exploratory Data Analysis (EDA): Analyze class distributions and visualize sample data.



In [None]:

# Imports and folder creation
import os
# Create the project folder structure
base = "/content/PlantDocBot"
os.makedirs(os.path.join(base,"data","plantvillage"),exist_ok=True)
os.makedirs(os.path.join(base,"data","plantdoc"),exist_ok=True)
os.makedirs(os.path.join(base,"data","text_corpus"),exist_ok=True)

print("Folders created under",base)

Folders created under /content/PlantDocBot


In [2]:

# Download both datasets via git clone
!git clone https://github.com/spMohanty/plantvillage-Dataset.git "{base}/data/plantvillage"
!git clone https://github.com/pratikkayal/PlantDoc-Dataset.git "{base}/data/plantdoc"

Cloning into '/content/PlantDocBot/data/plantvillage'...
remote: Enumerating objects: 163235, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 163235 (delta 2), reused 1 (delta 0), pack-reused 163229 (from 1)
Receiving objects: 100% (163235/163235), 2.00 GiB | 30.71 MiB/s, done.
Resolving deltas: 100% (101/101), done.
Updating files: 100% (182401/182401), done.
Cloning into '/content/PlantDocBot/data/plantdoc'...
remote: Enumerating objects: 2670, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 2670 (delta 22), reused 22 (delta 22), pack-reused 2635 (from 1)
Receiving objects: 100% (2670/2670), 932.92 MiB | 36.51 MiB/s, done.
Resolving deltas: 100% (24/24), done.
Updating files: 100% (2581/2581), done.


In [3]:

# Verify Dataset directories and list top-level content
for sub in ["plantvillage","plantdoc"]:
  path = os.path.join(base,"data",sub)
  print("\nContents of", sub, ":")
  print(os.listdir(path)[:20])


Contents of plantvillage :
['run_all.sh', 'generate_data_segmented-50-50.sh', 'create_data_distribution.py', 'generate_data_grayscale-80-20.sh', 'slurm-476492.out', 'slurm-476490.out', 'slurm-476484.out', 'generate_data_for_SVM.py', 'generate_data_color-80-20.sh', '_generate_data.sh', 'slurm-476487.out', 'slurm-476486.out', 'slurm-476489.out', 'slurm-476481.out', 'slurm-476493.out', 'generated_for_paper', 'raw', 'slurm-476483.out', 'generate_data_color-20-80.sh', 'leaf_grouping']

Contents of plantdoc :
['LICENSE.txt', '.git', 'test', 'PlantDoc_Examples.png', 'train', 'README.md']
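
The listing above confirms the directory structure but says nothing about file integrity. Below is a minimal sketch of a cheap integrity pass: it flags image files that are empty or do not start with the header bytes expected for their extension. It is only a header check, not a full decode, so it will not catch every corrupt file. The names `SIGNATURES` and `find_suspect_images` are hypothetical and not part of the notebook.

```python
import os

# Leading "magic bytes" for the extensions the notebook treats as images.
SIGNATURES = {
    ".jpg": b"\xff\xd8",   # JPEG start-of-image marker
    ".jpeg": b"\xff\xd8",
    ".bmp": b"BM",         # BMP file header
}

def find_suspect_images(root):
    """Return paths under root that are empty or lack the expected header bytes."""
    suspects = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            magic = SIGNATURES.get(ext)
            if magic is None:
                continue  # not one of the tracked image types
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                head = fh.read(len(magic))
            if head != magic:  # an empty file yields b"" and is flagged too
                suspects.append(path)
    return suspects

# Example call in the notebook (assumes `base` from the first cell):
# bad = find_suspect_images(os.path.join(base, "data", "plantvillage"))
# print(len(bad), "suspect files")
```

For a stricter check, each file could instead be opened with Pillow, at the cost of a much slower pass over a dataset of this size.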


In [5]:
# Search for image directories inside PlantVillage
import os
pv_base = os.path.join(base, "data", "plantvillage")
img_exts = ('.jpg', '.jpeg', '.bmp')

# Collect every directory that directly contains at least one image file
found_dirs = []
for root_dir, dirs, files in os.walk(pv_base):
  count = sum(1 for f in files if f.lower().endswith(img_exts))
  if count > 0:
    found_dirs.append((root_dir, count))
if not found_dirs:
  print("No image files found inside PlantVillage folder.")
else:
  print("Found image directories. Sample list(first 10):")
  for d, c in found_dirs[:10]:
    print(" ", d, "-", c, "images")
  # Choose the first directory found as img_root.
  # Note: this is a single class folder, not the parent that holds all classes.
  img_root = found_dirs[0][0]
  print("\nUsing image root:", img_root)

Found image directories. Sample list(first 10):
  /content/PlantDocBot/data/plantvillage/raw/color/Grape___Black_rot - 1180 images
  /content/PlantDocBot/data/plantvillage/raw/color/Apple___Black_rot - 621 images
  /content/PlantDocBot/data/plantvillage/raw/color/Tomato___Spider_mites Two-spotted_spider_mite - 1676 images
  /content/PlantDocBot/data/plantvillage/raw/color/Corn_(maize)___Common_rust_ - 1192 images
  /content/PlantDocBot/data/plantvillage/raw/color/Tomato___Septoria_leaf_spot - 1771 images
  /content/PlantDocBot/data/plantvillage/raw/color/Pepper,_bell___Bacterial_spot - 997 images
  /content/PlantDocBot/data/plantvillage/raw/color/Strawberry___healthy - 456 images
  /content/PlantDocBot/data/plantvillage/raw/color/Tomato___Tomato_mosaic_virus - 373 images
  /content/PlantDocBot/data/plantvillage/raw/color/Orange___Haunglongbing_(Citrus_greening) - 5507 images
  /content/PlantDocBot/data/plantvillage/raw/color/Potato___Late_blight - 1000 images

Using image root: /conten
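
The Day 1 goals include analyzing class distributions, and the per-folder counts printed above are the raw material for that. Below is a minimal sketch that gathers them into one table, assuming the class folders sit directly under a single parent directory (as the `raw/color` paths in the listing suggest). `class_distribution`, `IMG_EXTS`, and `classes_root` are illustrative names, not part of the notebook.

```python
import os
from collections import Counter

IMG_EXTS = (".jpg", ".jpeg", ".bmp")  # same extensions as img_exts in the notebook

def class_distribution(classes_root):
    """Count image files in each immediate subfolder of classes_root."""
    counts = Counter()
    for entry in sorted(os.listdir(classes_root)):
        class_dir = os.path.join(classes_root, entry)
        if not os.path.isdir(class_dir):
            continue  # skip loose files next to the class folders
        counts[entry] = sum(
            1 for f in os.listdir(class_dir) if f.lower().endswith(IMG_EXTS)
        )
    return counts

# Example call in the notebook (assumes `pv_base` from the cell above):
# dist = class_distribution(os.path.join(pv_base, "raw", "color"))
# for label, n in dist.most_common(5):
#     print(label, n)
```

Sorting by count with `most_common` makes class imbalance easy to spot before any training split is made.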

In [6]:
import matplotlib.pyplot as plt
from PIL import Image
import random
import os
import numpy as np

# Use img_root and img_exts from the previous cell
if 'img_root' in globals():
  # Pick one random image from the first folder that contains any
  sample_file = None
  for root_dir, dirs, files in os.walk(img_root):
    img_files = [f for f in files if f.lower().endswith(img_exts)]
    if img_files:
      sample_file = os.path.join(root_dir, random.choice(img_files))
      break

  if sample_file:
    print("Displaying color image:", sample_file)
    img = Image.open(sample_file)

    # Check mode
    print("Original image mode:", img.mode)

    # Convert to true RGB if not already
    if img.mode != 'RGB':
      img = img.convert('RGB')
    # Use NumPy + matplotlib to ensure correct color display
    plt.figure(figsize=(6, 6))
    plt.imshow(np.asarray(img))
    plt.axis('off')
    plt.show()
  else:
    print("No images found under img_root.")
else:
  print("img_root not defined - previous detection failed.")

**Day 2 Goals**

Robust Image Visualization: Ensure images are correctly loaded in RGB format.

Dataset Mapping: Create a structured CSV file mapping every image path to its label (disease class). This is crucial for training custom models later.

In [7]:
# Build CSV mapping: image path -> label
import pandas as pd
records = []
if 'img_root' in globals():
  for root_dir, dirs, files in os.walk(img_root):
    for f in files:
      if f.lower().endswith(img_exts):
        path = os.path.join(root_dir, f)
        # Infer label: first path component relative to img_root.
        # This is a class folder only when img_root is the parent of the class
        # folders; if img_root is itself a class folder, it is the file name.
        rel = os.path.relpath(path, img_root)
        label = rel.split(os.sep)[0]
        records.append({"image_path": path, "label": label})
df = pd.DataFrame(records)
print("Total images found:", len(df))
print("Sample rows:")
print(df.head())
out_csv = os.path.join(base, "data", "image_data.csv")
df.to_csv(out_csv, index=False)
print("Saved mapping to", out_csv)

Total images found: 1180
Sample rows:
                                          image_path  \
0  /content/PlantDocBot/data/plantvillage/raw/col...   
1  /content/PlantDocBot/data/plantvillage/raw/col...   
2  /content/PlantDocBot/data/plantvillage/raw/col...   
3  /content/PlantDocBot/data/plantvillage/raw/col...   
4  /content/PlantDocBot/data/plantvillage/raw/col...   

                                               label  
0  36be3b48-82d7-4a96-8ef1-87355cfa2c59___FAM_B.R...  
1  a42262f5-5068-4640-85d3-a285fe0f5004___FAM_B.R...  
2  2b89af9b-90fc-406a-90af-7fbd25566267___FAM_B.R...  
3  ab0d0af3-5f10-4eb9-a043-9c7a3a696075___FAM_B.R...  
4  40f8f9da-fb45-4cde-b969-3ef1fdf8e713___FAM_B.R...  
Saved mapping to /content/PlantDocBot/data/image_data.csv
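
In the output above, the label column holds file names rather than disease classes, and the total of 1180 matches the single `Grape___Black_rot` folder in the earlier listing. This suggests `img_root` points at one class folder, so the first path component after it is the file itself. Below is a minimal sketch of one way around this: take the label from the folder that directly contains each image, and walk the parent directory that holds all class folders. `build_records` and `IMG_EXTS` are hypothetical names, and the choice of parent directory is an assumption based on the `raw/color` paths shown earlier.

```python
import os

IMG_EXTS = (".jpg", ".jpeg", ".bmp")  # same extensions as img_exts in the notebook

def build_records(classes_root):
    """Map each image under classes_root to the name of its containing folder."""
    records = []
    for dirpath, _dirs, files in os.walk(classes_root):
        # normpath guards against a trailing separator giving an empty basename
        label = os.path.basename(os.path.normpath(dirpath))
        for name in sorted(files):
            if name.lower().endswith(IMG_EXTS):
                records.append({
                    "image_path": os.path.join(dirpath, name),
                    "label": label,
                })
    return records

# Example call in the notebook (assumes the parent of img_root holds all classes):
# import pandas as pd
# df = pd.DataFrame(build_records(os.path.dirname(img_root)))
# print(df["label"].nunique(), "classes,", len(df), "images")
```

Because the label comes from the containing folder, the function also yields a sensible label when it is pointed at a single class folder.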
