# DS510 Team Project
DS510 Artificial Intelligence for Data Science \
Term: Summer 2025 \
Team: Team XX \
Authors: Hiromi Cota, David Hiltzman, Joseph Tran \
Emails: cotahiromi@cityuniversity.edu, hiltzmandavid@cityuniversity.edu, trantung@cityuniversity.edu \

## Task
First, find an area where an AI algorithm can be applied (e.g., weather prediction). Once the project's goal is set, develop the models and test them on different datasets. Many datasets are publicly available; find one whose data suits your project. Finding suitable public data is a crucial step in completing the project properly. You are encouraged to browse Kaggle for available datasets to get ideas for the team project topic. One team member should send the project topic to the instructor for confirmation before the team starts on the project and the project proposal.

In [None]:
# =======================
# 1. Imports
# =======================
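import kagglehub
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
# ImageDataGenerator still works but is marked deprecated in recent TensorFlow
# releases; tf.keras.utils.image_dataset_from_directory is the suggested replacement.
# Its validation_split option also replaces a separate train_test_split step.
from tensorflow.keras.preprocessing.image import ImageDataGenerator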

In [None]:
# =======================
# 2. Download Dataset
# =======================
PATH = kagglehub.dataset_download("abdallahalidev/plantvillage-dataset")
print("Path to dataset files:", PATH)

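# flow_from_directory treats every immediate subfolder of extract_dir as one
# class, so extract_dir must be the folder that holds one subfolder per
# crop/disease. If the download nests the images deeper, point extract_dir at
# that inner folder; otherwise the wrapper folders are mistaken for classes.
extract_dir = PATH  # or e.g. os.path.join(PATH, "PlantVillage") if nested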
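
Because `flow_from_directory` treats each immediate subfolder of `extract_dir` as a class, it helps to look at the downloaded tree before choosing that path. The sketch below is a hypothetical helper (`list_dirs`, not part of kagglehub or Keras) that lists folders down to a fixed depth, with how many subfolders and files each one holds directly; the folder that shows one subfolder per class is the one to use.

```python
import os


def list_dirs(root, max_depth=2):
    """Return (relative_path, n_subfolders, n_files) for folders under root.

    Hypothetical helper: walks at most `max_depth` levels below `root`.
    """
    rows = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        dirnames.sort()  # deterministic traversal order
        rows.append((rel, len(dirnames), len(filenames)))
        if depth >= max_depth:
            dirnames[:] = []  # stop os.walk from descending further
    return rows
```

In the notebook, `for row in list_dirs(PATH): print(row)` after the download cell would show the layout.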

In [None]:
# =======================
# 3. Parameters
# =======================
IMG_SIZE = (224, 224)
BATCH_SIZE = 32
VAL_SPLIT = 0.2
EPOCHS = 10
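
With `validation_split=VAL_SPLIT`, the Keras directory iterator splits each class folder separately (as far as we know from its source, by slicing the sorted file list at `int(VAL_SPLIT * n)`), so the split stays the same for the training and validation generators as long as both use the same `validation_split`. The sketch below, using a hypothetical `split_counts` helper and made-up class sizes, estimates how many images land in each subset and how many batches make up one epoch for a given `BATCH_SIZE`.

```python
import math


def split_counts(class_sizes, val_split=0.2, batch_size=32):
    """Estimate per-subset image counts and batches per epoch.

    Assumes the validation subset takes int(val_split * n) files per class
    and the training subset takes the rest.
    """
    n_val = sum(int(val_split * n) for n in class_sizes)
    n_train = sum(class_sizes) - n_val
    return {
        "train_images": n_train,
        "val_images": n_val,
        "train_steps": math.ceil(n_train / batch_size),  # last batch may be partial
        "val_steps": math.ceil(n_val / batch_size),
    }


# Illustrative class sizes only, not the real PlantVillage counts.
print(split_counts([1000, 250, 37], val_split=0.2, batch_size=32))
# {'train_images': 1030, 'val_images': 257, 'train_steps': 33, 'val_steps': 9}
```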

In [None]:
# =======================
# 4. Data Generators
# =======================
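# MobileNetV2's ImageNet weights expect pixels in [-1, 1], which is what
# mobilenet_v2.preprocess_input produces (rescale=1.0/255 would give [0, 1]).
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input

# Training generator: preprocessing plus random augmentation.
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess,
    validation_split=VAL_SPLIT,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

# Validation generator: same preprocessing and split, but no augmentation,
# so validation metrics are measured on unmodified images.
val_datagen = ImageDataGenerator(
    preprocessing_function=preprocess,
    validation_split=VAL_SPLIT
)

train_gen = train_datagen.flow_from_directory(
    extract_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='training'
)

val_gen = val_datagen.flow_from_directory(
    extract_dir,
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='validation',
    shuffle=False  # fixed order makes later per-image evaluation easier
)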
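
The pixel scaling used for training has to match what the pretrained backbone expects and what any inference code applies later. `rescale=1.0/255` maps 8-bit pixels to [0, 1], while `tf.keras.applications.mobilenet_v2.preprocess_input` computes `x / 127.5 - 1`, mapping them to [-1, 1], the range MobileNetV2's ImageNet weights were trained with. The NumPy sketch below re-implements both formulas (rather than calling Keras) to show the two ranges side by side.

```python
import numpy as np

# A few 8-bit pixel values, including the extremes 0 and 255.
pixels = np.array([0, 51, 128, 255], dtype=np.float32)

unit_scaled = pixels * (1.0 / 255)       # what rescale=1.0/255 does
mobilenet_scaled = pixels / 127.5 - 1.0  # the formula preprocess_input applies for MobileNetV2

print("rescale=1/255:    ", unit_scaled.min(), "to", unit_scaled.max())
print("preprocess_input: ", mobilenet_scaled.min(), "to", mobilenet_scaled.max())
```

Whichever scaling the generators use, the same one must be applied to images at prediction time.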

In [None]:
# =======================
# 5. Save Class Labels
# =======================
labels = list(train_gen.class_indices.keys())
pd.Series(labels).to_csv("labels.txt", index=False, header=False)
print(f"Saved {len(labels)} labels to labels.txt")
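
At inference time, `labels.txt` turns the model's output back into a class name: row *i* of the file is the class whose index in `train_gen.class_indices` is *i*, so the arg-max of the softmax vector indexes straight into the list. A minimal sketch with hypothetical helper names; it reads the file with the `csv` module because `to_csv` quotes any name that contains a comma.

```python
import csv


def load_labels(path="labels.txt"):
    """Read labels written by pd.Series(labels).to_csv(..., header=False).

    csv.reader undoes pandas' quoting of names that contain commas.
    Row i holds the class with index i.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0] for row in csv.reader(f) if row]


def top_label(probs, labels):
    """Return (label, probability) for the highest-scoring class in probs."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]
```

After training, something like `top_label(model.predict(batch)[0], load_labels())` would map one image's prediction to its class name, provided `batch` is preprocessed exactly like the training data.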

In [None]:
# =======================
# 6. Build Model (MobileNetV2)
# =======================
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,),
    include_top=False,
    weights='imagenet'
)
base_model.trainable = False  # Freeze base layers

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(len(labels), activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()
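
The classification head on top of the frozen MobileNetV2 base is small: `GlobalAveragePooling2D` averages each feature channel over the spatial grid (for a 224×224 input, MobileNetV2's final feature map is 7×7×1280, so this yields a 1280-dimensional vector), and the `Dense(..., activation='softmax')` layer turns that vector into class probabilities scored with categorical cross-entropy. The NumPy sketch below re-implements that forward pass and loss with random weights and a hypothetical `NUM_CLASSES = 5`; it illustrates the math, not Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 5  # hypothetical; the notebook uses len(labels)

# Stand-in for the frozen base's output: a batch of 2 feature maps, 7x7x1280.
features = rng.standard_normal((2, 7, 7, 1280)).astype(np.float32)

# GlobalAveragePooling2D: mean over the two spatial axes -> (batch, channels).
pooled = features.mean(axis=(1, 2))

# Dense layer with random weights (the only trainable part while the base is frozen).
W = rng.standard_normal((1280, NUM_CLASSES)).astype(np.float32) * 0.01
b = np.zeros(NUM_CLASSES, dtype=np.float32)
logits = pooled @ W + b

# Softmax, shifted by the row max for numerical stability.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# One-hot targets, as produced by class_mode='categorical'.
y_true = np.eye(NUM_CLASSES, dtype=np.float32)[[1, 3]]

# Categorical cross-entropy averaged over the batch; the epsilon avoids log(0).
loss = -np.mean(np.sum(y_true * np.log(probs + 1e-7), axis=1))

print("pooled:", pooled.shape, "probs:", probs.shape, "loss:", float(loss))
# Trainable head parameters: 1280 * NUM_CLASSES weights + NUM_CLASSES biases.
print("head params:", W.size + b.size)
```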

In [None]:
# =======================
# 7. Train Model
# =======================
history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=EPOCHS
)
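
`model.fit` returns a `History` object whose `history` attribute is a dict mapping each metric name (`'loss'`, `'accuracy'`, `'val_loss'`, `'val_accuracy'` here) to a list with one value per epoch. A small hypothetical helper like the one below reports the epoch with the best validation score, which complements the accuracy/loss curves for spotting where validation stops improving.

```python
def best_epoch(history_dict, monitor="val_loss", mode="min"):
    """Return (1-based epoch, value) of the best `monitor` value.

    Use mode="min" for losses and mode="max" for accuracies.
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    values = history_dict[monitor]
    pick = min if mode == "min" else max
    idx = pick(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]
```

In the notebook this would be called as `best_epoch(history.history)` or `best_epoch(history.history, "val_accuracy", "max")`.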

In [None]:
# =======================
# 8. Save Model
# =======================
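model.save("plant_disease_model.h5")  # legacy HDF5 format; newer Keras versions prefer a ".keras" file
print("Model saved to plant_disease_model.h5 (class labels are in labels.txt)")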
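
Using the saved model on a new leaf photo requires the same input pipeline as training: an RGB image resized to `IMG_SIZE` (`flow_from_directory` resizes with `'nearest'` interpolation by default) and scaled the same way as the generator that fed `model.fit`. The sketch below, a hypothetical `load_for_inference` helper built on PIL and NumPy, prepares one image as a batch of one; `mode` is deliberately required so it has to be set to match the generators' scaling.

```python
import numpy as np
from PIL import Image

# Pillow >= 9.1 exposes resampling filters under Image.Resampling.
NEAREST = getattr(Image, "Resampling", Image).NEAREST


def load_for_inference(path, mode, img_size=(224, 224)):
    """Load one image as a (1, H, W, 3) float32 batch.

    mode="unit":      pixels / 255       (matches rescale=1.0/255)
    mode="mobilenet": pixels / 127.5 - 1 (matches mobilenet_v2.preprocess_input)
    """
    img = Image.open(path).convert("RGB")
    # PIL's resize takes (width, height); img_size is (height, width) like IMG_SIZE.
    img = img.resize((img_size[1], img_size[0]), resample=NEAREST)
    x = np.asarray(img, dtype=np.float32)
    if mode == "unit":
        x = x / 255.0
    elif mode == "mobilenet":
        x = x / 127.5 - 1.0
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return x[np.newaxis, ...]  # add the batch dimension
```

The result can go straight into `tf.keras.models.load_model("plant_disease_model.h5").predict(...)`.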

In [None]:
# =======================
# 9. Plot Accuracy/Loss
# =======================
plt.figure(figsize=(10, 4))

# Accuracy per epoch
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label="Train Acc")
plt.plot(history.history['val_accuracy'], label="Val Acc")
plt.title("Accuracy")
plt.xlabel("Epoch")
plt.legend()

# Loss per epoch
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label="Train Loss")
plt.plot(history.history['val_loss'], label="Val Loss")
plt.title("Loss")
plt.xlabel("Epoch")
plt.legend()

plt.tight_layout()
plt.show()