# ICE 3: Computer Vision

Identifying the health of plants is a critical component of maximizing yield in agriculture. Ailments often first appear in the leaves of a small number of plants, so identifying them early could help prevent their spread. In this assignment, we will train a model to predict the health of a plant based on its leaves.

### Goal:
Using the dataset at the following link, train a model to detect specific types of disease (based on leaf imagery) for one (or more) of the available fruits.

Dataset link: https://data.mendeley.com/datasets/tywbtsjrjv/1

Tasks:
1. Train a CNN from the ground up (i.e., no pretrained models) to predict the health status category for one or more plants.
2. Leverage an appropriate pretrained model to predict the health status category for one or more plants. Feel free to use any model of your choice (e.g., ResNet-50 with ImageNet weights).

In [1]:
#importing necessary libraries
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import tensorflow as tf

In [2]:
#unzipping file
!unzip -qq strawberry.zip
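`!unzip` is an IPython/Colab shell escape, so the cell above only runs inside a notebook. Outside one, Python's standard-library `zipfile` module can perform the same extraction. A minimal sketch, assuming the archive is named `strawberry.zip` as above; `extract_archive` is a hypothetical helper name:

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the destination if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest)
    return names


# Usage (hypothetical): extract into the current directory, like `!unzip -qq strawberry.zip`
# extract_archive("strawberry.zip")
```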

In [4]:
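import os
import random
import shutil
from random import shuffle

# Seed the global RNG that shuffle() draws from so the split is reproducible (42 is arbitrary)
random.seed(42)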

# Define your base directory where the images are stored
base_dir = '/content/strawberry'

# Define the target directories for train, test, and validation sets
train_dir = '/content/strawberry/train'
test_dir = '/content/strawberry/test'
val_dir = '/content/strawberry/val'

# Ensure target directories exist
for d in [train_dir, test_dir, val_dir]:
    os.makedirs(d, exist_ok=True)

# List all files in the base directory
all_files = os.listdir(base_dir)

# Classify files based on a pattern in their names
healthy_files = [f for f in all_files if 'healthy' in f]
leaf_scorch_files = [f for f in all_files if 'leaf_scorch' in f]

# Define a function to handle the shuffling, splitting, and organizing
def organize_files(files, category):
    shuffle(files)

    # Split indices: 70% train, 15% validation; int() truncation leaves the remainder (~15%) for test
    train_end = int(len(files) * 0.7)
    val_end = train_end + int(len(files) * 0.15)

    # Split the files
    train_files = files[:train_end]
    val_files = files[train_end:val_end]
    test_files = files[val_end:]

    # Copy files into target_dir/<category>/, the one-folder-per-class layout image_dataset_from_directory expects
    def copy_files(files, target_dir):
        os.makedirs(os.path.join(target_dir, category), exist_ok=True)
        for file in files:
            shutil.copy(os.path.join(base_dir, file),
                        os.path.join(target_dir, category, file))

    # Copy files to their respective directories
    copy_files(train_files, train_dir)
    copy_files(val_files, val_dir)
    copy_files(test_files, test_dir)

# Organize healthy and leaf_scorch files
organize_files(healthy_files, 'healthy')
organize_files(leaf_scorch_files, 'leaf_scorch')

print("Files have been organized into train, test, and validation sets.")


Files have been organized into train, test, and validation sets.
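The 70/15/15 index arithmetic used by `organize_files` can be checked in isolation, without touching the file system. The sketch below uses its own seeded `random.Random` instance so the split is reproducible without relying on the global generator; `split_files`, the seed value 42, and the file names are illustrative choices, not part of the original notebook:

```python
import random


def split_files(files, train_frac=0.7, val_frac=0.15, seed=42):
    """Return (train, val, test) lists using the same index arithmetic as organize_files."""
    shuffled = list(files)                 # copy so the caller's list is not reordered in place
    random.Random(seed).shuffle(shuffled)  # local generator: same seed gives the same split
    train_end = int(len(shuffled) * train_frac)
    val_end = train_end + int(len(shuffled) * val_frac)
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


# Hypothetical file names, for illustration only
names = [f"healthy_{i}.jpg" for i in range(10)]
train_part, val_part, test_part = split_files(names)
print(len(train_part), len(val_part), len(test_part))  # 7 1 2
```

Because both `int()` calls truncate, any leftover files end up in the test set, which is consistent with the test set (317 files) being one file larger than the validation set (316 files) in the dataset output further down.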


In [5]:
# CNN Architecture
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(256, 256, 3))
# Scale raw pixel values from [0, 255] to [0, 1]
x = layers.Rescaling(1./255)(inputs)

x = layers.Conv2D(filters=32, kernel_size=3, activation='relu')(x)
x = layers.MaxPooling2D(pool_size=2)(x)

x = layers.Conv2D(filters=64, kernel_size=3, activation='relu')(x)
x = layers.MaxPooling2D(pool_size=2)(x)

x = layers.Conv2D(filters=128, kernel_size=3, activation='relu')(x)
x = layers.MaxPooling2D(pool_size=2)(x)

x = layers.Conv2D(filters=256, kernel_size=3, activation='relu')(x)
x = layers.MaxPooling2D(pool_size=2)(x)

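x = layers.Flatten()(x)
# A single sigmoid unit outputs the probability of class 1 (leaf_scorch, the second class alphabetically)
outputs = layers.Dense(1, activation='sigmoid')(x)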

model = keras.Model(inputs=inputs, outputs=outputs)
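Each unpadded 3 × 3 convolution trims two pixels from each spatial dimension and each 2 × 2 max-pool halves it (rounding down), so the feature map shrinks from 256 × 256 to 14 × 14 × 256 before `Flatten`. A pure-Python trace of those shapes and the parameter counts; the function name is hypothetical, and `model.summary()` should report the same total:

```python
def conv_stack_summary(size=256, channels=3, filters=(32, 64, 128, 256), kernel=3, pool=2):
    """Trace spatial size and parameter counts through Conv2D('valid') + MaxPooling2D blocks."""
    total_params = 0
    for f in filters:
        conv_params = kernel * kernel * channels * f + f  # kernel weights + one bias per filter
        total_params += conv_params
        size = size - kernel + 1  # 'valid' padding, stride 1
        size = size // pool       # 2x2 pooling floors odd sizes
        channels = f
        print(f"Conv2D({f}) + MaxPool -> {size}x{size}x{f}, conv params={conv_params}")
    flat = size * size * channels
    dense_params = flat + 1       # Dense(1): one weight per flattened input + bias
    total_params += dense_params
    print(f"Flatten -> {flat}, Dense(1) params={dense_params}, total={total_params}")
    return flat, total_params


conv_stack_summary()
# Final line printed: Flatten -> 50176, Dense(1) params=50177, total=438593
```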

In [6]:
#model compilation
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
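With a single sigmoid output unit, `binary_crossentropy` scores each image by the negative log-probability the model assigns to its true label, -[y·log(p) + (1 − y)·log(1 − p)], averaged over the batch. A standard-library sketch; the clipping constant 1e-7 is an assumption about Keras's default epsilon:

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and sigmoid outputs in [0, 1]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(round(binary_crossentropy([1, 0], [0.9, 0.2]), 4))  # 0.1643
```

A confident correct prediction costs almost nothing, while a confident wrong one is penalized heavily: `binary_crossentropy([1], [0.0])` is about 16.1 because of the clipping.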

In [7]:
from tensorflow.keras.utils import image_dataset_from_directory

#create train dataset
train_dataset = image_dataset_from_directory(
    '/content/strawberry/train',
    image_size = (256,256),
    batch_size=32
)

#create validation dataset
validation_dataset = image_dataset_from_directory(
    '/content/strawberry/val',
    image_size = (256,256),
    batch_size=32
)

#create test dataset
test_dataset = image_dataset_from_directory(
    '/content/strawberry/test',
    image_size = (256,256),
    batch_size=32
)

Found 1476 files belonging to 2 classes.
Found 316 files belonging to 2 classes.
Found 317 files belonging to 2 classes.
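`image_dataset_from_directory` infers one label per class subdirectory, assigning integer indices in alphanumeric order of the subdirectory names. Here `healthy` should therefore map to 0 and `leaf_scorch` to 1, so a sigmoid output near 1 means leaf scorch. The sketch below reproduces that mapping with the standard library and computes how many batches of 32 make up one epoch; both helper names are hypothetical:

```python
import math
import os


def inferred_class_indices(directory):
    """Map each class subdirectory to the integer label Keras should infer for it.

    Assumes the default behaviour of sorting subdirectory names alphanumerically.
    """
    class_names = sorted(
        entry for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))  # ignore stray files
    )
    return {name: index for index, name in enumerate(class_names)}


def batches_per_epoch(num_files, batch_size=32):
    """Batches in one pass over the data; the last batch may be smaller than batch_size."""
    return math.ceil(num_files / batch_size)


# Usage in the notebook (expected to agree with train_dataset.class_names):
# inferred_class_indices('/content/strawberry/train')  # {'healthy': 0, 'leaf_scorch': 1}
# batches_per_epoch(1476)                              # 47
```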


In [8]:
#training CNN from scratch
history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=10  # no batch_size: the dataset already yields batches of 32
)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
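The log above shows only the epoch counters, but `model.fit` returns a `History` object whose `history` dict holds one list per metric (here `loss`, `accuracy`, `val_loss`, and `val_accuracy`, given `metrics=['accuracy']` and `validation_data`). A small sketch for picking the epoch with the lowest validation loss; `summarize_history` is a hypothetical helper:

```python
def summarize_history(history_dict):
    """Return (best_epoch, best_val_loss, val_accuracy_at_that_epoch) from History.history.

    Epochs are reported 1-based to match the "Epoch k/10" log lines.
    """
    val_losses = history_dict["val_loss"]
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1, val_losses[best_index], history_dict["val_accuracy"][best_index]


# Usage in the notebook, right after model.fit(...):
# best_epoch, best_val_loss, best_val_acc = summarize_history(history.history)
```

If the best epoch falls well before the last one, the later epochs are likely overfitting, and fewer epochs (or early stopping) would be worth trying.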


In [9]:
#looking at test accuracy
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test Accuracy: {test_acc}")

Test Accuracy: 1.0
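Accuracy is a single aggregate; a 2 × 2 confusion matrix shows which class any errors fall in, which matters when the classes differ in size and when comparing this model with the pretrained one below. A sketch assuming label 1 is `leaf_scorch` (the alphabetical order `image_dataset_from_directory` uses) and a 0.5 threshold with a strict `>` comparison, which we believe matches Keras's binary accuracy. `confusion_counts` and `per_class_recall` are hypothetical helpers, and the commented loop shows one way to gather aligned labels and probabilities from `test_dataset`:

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Tally a 2x2 confusion matrix, treating label 1 (assumed leaf_scorch) as positive."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p > threshold else 0  # strict '>' on the sigmoid output
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def per_class_recall(counts):
    """Fraction of each class predicted correctly; None if the class has no examples."""
    positives = counts["tp"] + counts["fn"]
    negatives = counts["tn"] + counts["fp"]
    return {
        "leaf_scorch": counts["tp"] / positives if positives else None,
        "healthy": counts["tn"] / negatives if negatives else None,
    }


# Labels and probabilities are taken from the same batch, which matters because
# image_dataset_from_directory shuffles by default:
# y_true, y_prob = [], []
# for images, labels in test_dataset:
#     y_true.extend(labels.numpy().tolist())
#     y_prob.extend(model.predict_on_batch(images).ravel().tolist())
# print(confusion_counts(y_true, y_prob))
```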


# Pretrained Model (ResNet-50)

In [10]:
# Load ResNet-50 with ImageNet weights, dropping its ImageNet classification head (include_top=False)
from tensorflow.keras.applications import ResNet50

base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(256, 256, 3))
base_model.trainable = False  # Freeze the base model

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
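Keras's application models each expect their own input preprocessing. For ResNet50 that is `tf.keras.applications.resnet50.preprocess_input`, which in its default "caffe" mode converts RGB to BGR and subtracts per-channel ImageNet means, without scaling to [0, 1]; `image_dataset_from_directory` itself yields raw RGB values in [0, 255]. A per-pixel sketch of that transformation, with the mean values stated as an assumption (`caffe_preprocess_pixel` is a hypothetical name; the real function operates on whole tensors):

```python
# ImageNet per-channel means, in BGR order, for the "caffe" preprocessing mode (assumed values)
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)


def caffe_preprocess_pixel(rgb):
    """Reorder one RGB pixel in [0, 255] to BGR and subtract the ImageNet channel means."""
    r, g, b = rgb
    return tuple(value - mean for value, mean in zip((b, g, r), IMAGENET_BGR_MEAN))


print(caffe_preprocess_pixel((0, 0, 0)))  # (-103.939, -116.779, -123.68)
```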


In [11]:
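from tensorflow.keras import layers, models
from tensorflow.keras.applications.resnet50 import preprocess_input

# Add a new classification head on top of the frozen base.
# ResNet50 expects inputs passed through its own preprocess_input (RGB -> BGR,
# ImageNet mean subtraction) rather than raw [0, 255] pixels.
inputs = layers.Input(shape=(256, 256, 3))
x = preprocess_input(inputs)
x = base_model(x, training=False)  # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation='relu')(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
model = models.Model(inputs, outputs)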
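With `base_model.trainable = False`, only the new head's weights are updated during training. `GlobalAveragePooling2D` has no weights and collapses ResNet-50's final feature map (8 × 8 × 2048 for 256 × 256 inputs, given the network's overall stride of 32) to 2048 values, so the trainable count comes entirely from the two `Dense` layers. A quick check of that arithmetic; `dense_params` is a hypothetical helper:

```python
def dense_params(inputs, units):
    """Weights (inputs x units) plus one bias per unit for a Dense layer."""
    return inputs * units + units


backbone_channels = 2048  # channels in ResNet-50's final feature map
# GlobalAveragePooling2D has no weights, so only the two Dense layers count
head_params = dense_params(backbone_channels, 1024) + dense_params(1024, 1)
print(head_params)  # 2099201
```

`model.summary()` should report this figure as the trainable parameters, with ResNet-50's own weights listed as non-trainable.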

In [12]:
#compiling model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [13]:
# Train the new head; the ResNet-50 base stays frozen
history = model.fit(
    train_dataset,
    validation_data=validation_dataset,
    epochs=10  # adjust as needed; no batch_size, since the dataset already yields batches of 32
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
#assessing test accuracy for pretrained model
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test Accuracy: {test_acc}")

Test Accuracy: 1.0
