# DATA ANALYTICS CAPSTONE PROJECT

## Integrating Multi-Source Data and Image Identification for Automated Crop Disease Diagnosis using Deep Learning
### Objectives
* Develop a deep learning model using multi-source data for accurate and efficient crop disease diagnosis. 
* Investigate the effectiveness of different NoSQL databases for storing agricultural data. 
* Explore different deep-learning paradigms to enhance the performance of crop disease diagnosis models.

### ABOUT DATA
The data consists of mango leaf images from the MangoLeafBD dataset, organised into eight classes: seven disease classes ("Anthracnose", "Bacterial Canker", "Cutting Weevil", "Die Back", "Gall Midge", "Powdery Mildew", "Sooty Mould") and one "Healthy" class.
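
A classifier usually needs these class names as integer labels. The following is a minimal sketch of one way to set that up; `CLASS_NAMES`, `label_to_index`, and `index_to_label` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
# Class names as listed above; "Healthy" is the only non-disease class.
CLASS_NAMES = [
    "Anthracnose", "Bacterial Canker", "Cutting Weevil", "Die Back",
    "Gall Midge", "Healthy", "Powdery Mildew", "Sooty Mould",
]

# Hypothetical helpers: map each class name to an integer label and back.
label_to_index = {name: idx for idx, name in enumerate(CLASS_NAMES)}
index_to_label = {idx: name for name, idx in label_to_index.items()}

print(label_to_index)
```

Keeping the list in a fixed order makes the mapping reproducible across runs, which matters when predictions are decoded back to class names.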

## (A.) RELEVANT LIBRARIES FOR DATA LOADING AND EXPLORATION

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.model_selection import train_test_split
import random
import cv2

## (B.) LOADING IMAGE DATA FROM DATA DIRECTORY

In [3]:
data_dir = r"C:\Users\HP\Desktop\DATA ANALYTICS CAPSTONE PROJECT\DATA-ANALYTICS-CAPSTONE-PROJECT\MangoLeafBD Dataset"
subfolder_names = ["Anthracnose", "Bacterial Canker", "Cutting Weevil", "Die Back", "Gall Midge", "Healthy",
                  "Powdery Mildew", "Sooty Mould"]

image_data = []

for subfolder_name in subfolder_names:
    folder_path = os.path.join(data_dir, subfolder_name)
    image_names = os.listdir(folder_path)
    image_count = len(image_names)
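    image_info = {"Category": subfolder_name, "Image Count": image_count, "Dimensions": []}

    # Re-seed for every folder so the 5-image sample is reproducible
    random.seed(42)
    selected_images = random.sample(image_names, min(5, image_count))

    for i, image_name in enumerate(selected_images, 1):
        image_path = os.path.join(folder_path, image_name)
        image = cv2.imread(image_path)
        if image is None:
            # cv2.imread returns None for unreadable or non-image files
            print(f"Image {i}: could not read {image_name}")
            continue
        height, width, channels = image.shape
        image_info["Dimensions"].append((height, width, channels))
        print(f"Image {i}: Height={height}, Width={width}, Channels={channels}")

    image_data.append(image_info)
    print()

# One row per category: image count plus the sampled (height, width, channels) tuples
df_image_info = pd.DataFrame(image_data)

display(df_image_info)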

Image 1: Height=320, Width=240, Channels=3
Image 2: Height=240, Width=240, Channels=3
Image 3: Height=320, Width=240, Channels=3
Image 4: Height=240, Width=320, Channels=3
Image 5: Height=240, Width=240, Channels=3

Image 1: Height=320, Width=240, Channels=3
Image 2: Height=320, Width=240, Channels=3
Image 3: Height=240, Width=320, Channels=3
Image 4: Height=320, Width=240, Channels=3
Image 5: Height=240, Width=320, Channels=3

Image 1: Height=240, Width=240, Channels=3
Image 2: Height=240, Width=240, Channels=3
Image 3: Height=240, Width=240, Channels=3
Image 4: Height=240, Width=240, Channels=3
Image 5: Height=240, Width=240, Channels=3

Image 1: Height=240, Width=240, Channels=3
Image 2: Height=240, Width=240, Channels=3
Image 3: Height=240, Width=240, Channels=3
Image 4: Height=240, Width=240, Channels=3
Image 5: Height=240, Width=240, Channels=3

Image 1: Height=240, Width=320, Channels=3
Image 2: Height=320, Width=240, Channels=3
Image 3: Height=240, Width=320, Channels=3
Image 4

|   | Category | Image Count | Dimensions |
|---|---|---|---|
| 0 | Anthracnose | 500 | [(320, 240, 3), (240, 240, 3), (320, 240, 3), ... |
| 1 | Bacterial Canker | 500 | [(320, 240, 3), (320, 240, 3), (240, 320, 3), ... |
| 2 | Cutting Weevil | 500 | [(240, 240, 3), (240, 240, 3), (240, 240, 3), ... |
| 3 | Die Back | 500 | [(240, 240, 3), (240, 240, 3), (240, 240, 3), ... |
| 4 | Gall Midge | 500 | [(240, 320, 3), (320, 240, 3), (240, 320, 3), ... |
| 5 | Healthy | 500 | [(240, 320, 3), (240, 320, 3), (240, 320, 3), ... |
| 6 | Powdery Mildew | 500 | [(320, 240, 3), (240, 240, 3), (240, 240, 3), ... |
| 7 | Sooty Mould | 500 | [(320, 240, 3), (240, 320, 3), (320, 240, 3), ... |
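
The sampled images do not all share one size (240x240, 320x240, and 240x320 all appear), so they would likely need resizing to a common shape before being batched for a model. As a rough sketch, the sampled shapes can be tallied as shown below. `tally_shapes` is a hypothetical helper, not part of the original notebook, and the `sampled` dictionary holds only the two categories whose five shapes are fully visible in the printed output.

```python
from collections import Counter

def tally_shapes(dimensions_by_category):
    """Count how often each (height, width, channels) tuple occurs."""
    counts = Counter()
    for shapes in dimensions_by_category.values():
        counts.update(shapes)
    return counts

# Sampled shapes for two categories, copied from the printed output above.
sampled = {
    "Anthracnose": [(320, 240, 3), (240, 240, 3), (320, 240, 3),
                    (240, 320, 3), (240, 240, 3)],
    "Cutting Weevil": [(240, 240, 3)] * 5,
}

shape_counts = tally_shapes(sampled)
for shape, count in shape_counts.most_common():
    print(shape, count)
```

In the notebook itself, the input dictionary could be built from the summary table with `dict(zip(df_image_info["Category"], df_image_info["Dimensions"]))`.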


In [None]:
# Display up to the first 20 images in a 4 x 5 grid
def display_images(image_paths):
    plt.figure(figsize=(12, 8))
    for i, image_path in enumerate(image_paths[:20]):
        image = Image.open(image_path)
        plt.subplot(4, 5, i + 1)
        plt.imshow(image)
        plt.title(f"Image {i+1}")
        plt.axis("off")
    plt.tight_layout()
    plt.show()

# Assumes df_anthracnose (one row per image, with an "image_path" column)
# was created in a cell not shown here
display_images(df_anthracnose["image_path"])

# Count the images in the Anthracnose folder
num_images = len(df_anthracnose)
print(f"Number of images in 'Anthracnose' folder: {num_images}")
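
The cell above relies on `df_anthracnose`, which is not defined in the cells shown. One plausible way to collect the image paths it needs is sketched below. `list_image_paths` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension list is an assumption about the dataset's file types.

```python
import os

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def list_image_paths(folder_path):
    """Return sorted full paths of the image files directly inside folder_path."""
    return [
        os.path.join(folder_path, name)
        for name in sorted(os.listdir(folder_path))
        if name.lower().endswith(IMAGE_EXTENSIONS)
    ]

# Possible use in the notebook (assumes data_dir and pandas from earlier cells):
# paths = list_image_paths(os.path.join(data_dir, "Anthracnose"))
# df_anthracnose = pd.DataFrame({"image_path": paths})
```

Sorting the file names keeps the row order stable between runs, and filtering by extension skips stray files such as `Thumbs.db` that `Image.open` could not read.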