# Data Augmentation using Generative Adversarial Networks

We use a GAN to generate fake images of diseased plants. The GAN architecture used here is CycleGAN, which takes healthy (dataset A) and diseased (dataset B) plant images and learns to translate each domain into the other: A -> B and B -> A.
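CycleGAN keeps the two translations consistent with a cycle-consistency loss: an image sent to the other domain and back should come out unchanged. The sketch below shows only that term, using NumPy arrays and toy stand-in "generators" (`g_ab` and `g_ba` are hypothetical names, not LeafGAN code). The real objective also has adversarial losses, and both generators are neural networks.

```python
import numpy as np

def cycle_consistency_loss(real_a, real_b, g_ab, g_ba):
    """L1 cycle-consistency term from CycleGAN.

    g_ab translates domain A (healthy) to domain B (diseased); g_ba goes back.
    After a round trip each image should come back unchanged.
    """
    rec_a = g_ba(g_ab(real_a))  # A -> B -> A
    rec_b = g_ab(g_ba(real_b))  # B -> A -> B
    return np.mean(np.abs(rec_a - real_a)) + np.mean(np.abs(rec_b - real_b))

# Toy stand-in "generators" (hypothetical): a brightness shift and its exact inverse
def g_ab(x):
    return x + 0.1

def g_ba(x):
    return x - 0.1

rng = np.random.default_rng(0)
a = rng.random((2, 8, 8, 3))  # random batch standing in for healthy images
b = rng.random((2, 8, 8, 3))  # random batch standing in for diseased images
print(cycle_consistency_loss(a, b, g_ab, g_ba))  # close to 0: the pair is invertible
```

If `g_ba` were replaced by the identity, each round trip would be off by 0.1 per pixel and the loss would be about 0.2. During training, this penalty pushes the two generators towards being inverses of each other.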

LeafGAN builds on CycleGAN and tries to address some of its issues. One of CycleGAN's major drawbacks is its difficulty with the background of plant images. The images in the dataset we used in this project have very consistent backgrounds, since they were taken in a lab environment. As a result, CycleGAN is not very effective at creating varied backgrounds for the generated images.

The solution to this problem is to use only the relevant part of the image, the leaf, by means of an image segmentation network that classifies image regions into 3 categories:
1. Full Leaf
2. Partial Leaf
3. Non-Leaf

The main point to note here is the partial leaf category, which lets the model extract just the leaf region of the plant and discard the other parts. This has a considerable effect on the output compared with plain CycleGAN.
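The "keep the leaf, discard the rest" idea can be sketched as masking an image with a thresholded leaf-score map. This is a minimal illustration under stated assumptions: it assumes the segmentation step yields a per-pixel leaf score in [0, 1], and `extract_leaf_region` and the 0.5 threshold are hypothetical choices, not LeafGAN's actual code or values.

```python
import numpy as np

def extract_leaf_region(image, leaf_scores, threshold=0.5):
    """Keep pixels whose leaf score reaches `threshold`; zero out everything else.

    image:       H x W x C array
    leaf_scores: H x W array of per-pixel leaf scores in [0, 1] (assumed output
                 of the segmentation step)
    threshold:   illustrative cut-off, not a value taken from LeafGAN
    """
    mask = (leaf_scores >= threshold).astype(image.dtype)  # 1 = leaf, 0 = background
    return image * mask[..., np.newaxis], mask

# Toy example: a 4x4 white image whose centre 2x2 block is scored as leaf
image = np.ones((4, 4, 3))
leaf_scores = np.zeros((4, 4))
leaf_scores[1:3, 1:3] = 0.9
leaf_only, mask = extract_leaf_region(image, leaf_scores)
print(int(mask.sum()))  # 4 leaf pixels kept
```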

More details are in the LeafGAN paper, available [here](https://arxiv.org/abs/2002.10100).

The output of this model is a set of fake diseased plant images (and fake healthy ones as well).
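To use the generated images as augmentation, they have to be moved into the classifier's class folders. A minimal sketch, assuming LeafGAN names its outputs like the pytorch-CycleGAN-and-pix2pix test script it appears to build on (`<image>_fake_B.png` for A -> B, i.e. healthy -> diseased). That naming, the helper name `copy_fake_images` and the example paths are all assumptions.

```python
import shutil
from pathlib import Path

def copy_fake_images(results_dir, target_dir, suffix="_fake_B.png"):
    """Copy every generated image under results_dir whose name ends with `suffix`.

    '_fake_B.png' (A -> B, i.e. healthy -> diseased) is the file naming used by the
    pytorch-CycleGAN-and-pix2pix test script; it is an assumption for LeafGAN.
    Returns the number of files copied.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(Path(results_dir).rglob(f"*{suffix}")):
        shutil.copy(path, target / path.name)
        copied += 1
    return copied

# Hypothetical paths: add synthetic diseased images to the training split only
# copy_fake_images("/content/results/Test_LeafGAN", "/content/Train/Infected")
```

Copying synthetic images into the training split only, and not the test split, keeps the reported test accuracy a measure of performance on real photos.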

# Failures
The aim of this project was to generate fake images of diseased plants that could be used to train image classification models.

Our main failure in this project was that we bit off more than we could chew. We took a large dataset covering many plant species and diseases, which resulted in a large number of classes when the generated images were used with the image classifier.

For a small-scope project like this, we could have picked a single disease of a single species, which would have reduced the number of classes the classifier had to work with.

As is the case with every software project, our plans changed: we reduced the scope of the project to classifying images only as healthy or diseased (infected).
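The reduced task needs only two classes, so the many species/disease folders have to be collapsed into `Healthy` and `Infected`. A minimal sketch, assuming (hypothetically) PlantVillage-style folder names in which healthy classes contain the word `healthy`, e.g. `Tomato___healthy`. `to_binary_label` and `regroup_into_binary` are illustrative helper names, not project code.

```python
import shutil
from pathlib import Path

def to_binary_label(class_folder):
    """Map a species/disease folder name to 'Healthy' or 'Infected'.

    Assumes healthy classes contain the word 'healthy' in their folder name
    (PlantVillage-style names such as 'Tomato___healthy'); adjust for your dataset.
    """
    return "Healthy" if "healthy" in class_folder.lower() else "Infected"

def regroup_into_binary(src_root, dst_root):
    """Copy every image from src_root/<class>/ into dst_root/Healthy/ or dst_root/Infected/."""
    counts = {"Healthy": 0, "Infected": 0}
    for class_dir in sorted(p for p in Path(src_root).iterdir() if p.is_dir()):
        label = to_binary_label(class_dir.name)
        out_dir = Path(dst_root) / label
        out_dir.mkdir(parents=True, exist_ok=True)
        for img in sorted(class_dir.iterdir()):
            if img.is_file():
                # Prefix with the original class name so identical file names don't overwrite each other
                shutil.copy(img, out_dir / f"{class_dir.name}_{img.name}")
                counts[label] += 1
    return counts

# Hypothetical usage: regroup_into_binary('/content/RawDataset', '/content/Dataset')
```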

# Difficulties
One of the major difficulties we faced was hardware. As students without access to high-end GPUs to train these beasts of models, we had to rely on Google Colab, which disconnects after every 6 hours on the free tier.

```python
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
```

```python
!pip install -r /content/drive/MyDrive/AIES/LeafGAN/requirements.txt
```

```python
!unzip /content/drive/MyDrive/AIES/data/images.zip -d /content/Test
!unzip /content/drive/MyDrive/AIES/data/train_mask.zip -d /content/Test
```

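```python
# Copy the healthy (trainA) and diseased (trainB) images into a working folder for mask generation
!mkdir -p /content/m_dir
!cp -r /content/Test/trainA /content/m_dir
!cp -r /content/Test/trainB /content/m_dir
```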

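```python
# Download the pretrained LFLSeg (ResNet-101) weights used to generate the leaf masks.
# dl=1 asks Dropbox for the file itself; a dl=0 link can return the HTML preview page instead.
!wget -O "LFLSeg_resnet101.pth" "https://www.dropbox.com/scl/fi/h0t2dq5rtogxvp9ufglkj/LFLSeg_resnet101.pth?rlkey=noxfamgq5387y2hbvhjirrf7j&dl=1"
```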
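If the download goes wrong (for example, a web page saved in place of the weights), `prepare_mask.py` will fail only later when it tries to load them. A quick heuristic check, assuming the checkpoint was written by PyTorch 1.6 or newer, whose `torch.save` produces zip archives by default. `looks_like_torch_checkpoint` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

def looks_like_torch_checkpoint(path):
    """Heuristic: True if `path` looks like a zip-format PyTorch checkpoint.

    torch.save has written zip archives by default since PyTorch 1.6. A False result
    (missing or empty file, HTML page, legacy format) means the file is worth inspecting.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with open(p, "rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return False  # a web page was saved instead of the weights
    return zipfile.is_zipfile(str(p))

print(looks_like_torch_checkpoint("LFLSeg_resnet101.pth"))
```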

```python
!python /content/drive/MyDrive/AIES/LeafGAN/prepare_mask.py --source /content/m_dir \
        -p /content/LFLSeg_resnet101.pth -i 256
```

```python
!zip -0 -r /content/train_mask.zip /content/m_dir/
```

```python
!unzip /content/drive/MyDrive/AIES/data/Test_LeafGAN.zip -d /content/drive/MyDrive/AIES/data/
```

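```python
# Resume LeafGAN training; checkpoints go to Google Drive so they survive Colab disconnects
!python /content/drive/MyDrive/AIES/LeafGAN/train.py --dataroot /content/Test \
        --name Test_LeafGAN --model leaf_gan --dataset_mode unaligned_masked  \
        --checkpoints_dir /content/drive/MyDrive/AIES/data/ --continue_train  \
        --epoch_count 100 --save_epoch_freq 1 --batch_size 6 --num_threads 64
# To load the checkpoint saved at a specific iteration instead, append: --load_iter 16000
```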
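Each Colab disconnect means relaunching the training cell. In the pytorch-CycleGAN-and-pix2pix code base, whose options this command mirrors, `--continue_train` reloads the saved weights while `--epoch_count` only sets the number the epoch counter starts from, so the hardcoded 100 above has to be updated by hand. A minimal sketch for deriving it from the checkpoint folder, assuming epoch checkpoints are named `<epoch>_net_<model>.pth` as in that code base. The naming and the helper name `latest_saved_epoch` are assumptions for LeafGAN.

```python
import re
from pathlib import Path

# '<epoch>_net_<model>.pth', e.g. '42_net_G_A.pth' (pytorch-CycleGAN-and-pix2pix naming; assumed for LeafGAN)
EPOCH_CKPT = re.compile(r"^(\d+)_net_.+\.pth$")

def latest_saved_epoch(checkpoint_dir):
    """Return the highest epoch number among saved checkpoints, or 0 if there are none."""
    epochs = []
    for path in Path(checkpoint_dir).glob("*.pth"):
        match = EPOCH_CKPT.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs, default=0)

# Hypothetical usage before relaunching train.py:
# last = latest_saved_epoch("/content/drive/MyDrive/AIES/data/Test_LeafGAN")
# print(f"--continue_train --epoch_count {last + 1}")
```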

# Image Classification using CNN
We classify the images as healthy or diseased (infected) and evaluate the classifier at the end.

```python
from google.colab import drive
drive.mount('/content/drive')
```

```python
!unzip /content/drive/MyDrive/AIES/data/Dataset.zip
```

```python
import os
import shutil
from sklearn.model_selection import train_test_split

# Define paths to your dataset folders
data_dir = '/content/Dataset'  # Path to the main dataset directory (one subfolder per class)
train_dir = '/content/Train'   # Path to store the training data
test_dir = '/content/Test'     # Path to store the testing data (also used by the LeafGAN cells: run in a fresh session)

# Create train and test directories if they don't exist
os.makedirs(train_dir, exist_ok=True)
os.makedirs(test_dir, exist_ok=True)

# List only the class subfolders (skips stray files such as .DS_Store)
classes = sorted(d for d in os.listdir(data_dir)
                 if os.path.isdir(os.path.join(data_dir, d)))

# Split each class 80/20 into train and test
for class_folder in classes:
    class_path = os.path.join(data_dir, class_folder)
    train_class_dir = os.path.join(train_dir, class_folder)
    test_class_dir = os.path.join(test_dir, class_folder)

    os.makedirs(train_class_dir, exist_ok=True)
    os.makedirs(test_class_dir, exist_ok=True)

    # Sort so the split is reproducible regardless of filesystem listing order
    images = sorted(os.listdir(class_path))

    train_images, test_images = train_test_split(images, test_size=0.2, random_state=42)

    for img in train_images:
        shutil.copy(os.path.join(class_path, img), os.path.join(train_class_dir, img))

    for img in test_images:
        shutil.copy(os.path.join(class_path, img), os.path.join(test_class_dir, img))
```
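Because `flow_from_directory` treats every subfolder as a class, it is worth confirming that each class really ended up roughly 80/20 and that no unexpected folders (for example, GAN folders such as `trainA` left in `/content/Test`) are present. A minimal sketch; `split_summary` is a hypothetical helper name.

```python
from pathlib import Path

def split_summary(train_dir, test_dir):
    """Return {class_name: (n_train, n_test, test_fraction)} for every class folder
    found in either directory (both directories must exist)."""
    def count_files(folder):
        return sum(1 for f in folder.iterdir() if f.is_file()) if folder.is_dir() else 0

    names = {p.name for root in (train_dir, test_dir)
             for p in Path(root).iterdir() if p.is_dir()}
    summary = {}
    for name in sorted(names):
        n_train = count_files(Path(train_dir) / name)
        n_test = count_files(Path(test_dir) / name)
        total = n_train + n_test
        summary[name] = (n_train, n_test, n_test / total if total else 0.0)
    return summary

# Example usage in Colab:
# for name, (n_train, n_test, frac) in split_summary('/content/Train', '/content/Test').items():
#     print(f"{name}: {n_train} train, {n_test} test ({frac:.0%} test)")
```

A class with a test fraction of 1.0 exists only in the test folder and would show up as an extra class at evaluation time.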


```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Set up data generators (rescale pixel values from [0, 255] to [0, 1])
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    '/content/Train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary'
)
# Labels follow the alphanumeric order of the class folders; the sigmoid output
# is the probability of the class mapped to 1
print(train_generator.class_indices)

# Build the CNN model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(2, 2))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model; steps_per_epoch should be an integer, and len() gives the
# number of batches per epoch (the last one may be partial)
history = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=20,
)

# Evaluate the model on the held-out test split
test_generator = test_datagen.flow_from_directory(
    '/content/Test',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary'
)

eval_result = model.evaluate(test_generator)
print("Test accuracy:", eval_result[1])
```
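To see where the model's capacity goes, the feature-map size and parameter count of each layer can be worked out by hand. This sketch assumes Keras defaults for the layers used above ('valid' padding and stride 1 for `Conv2D`; pool size and stride 2 for `MaxPooling2D`).

```python
def conv_valid(size, kernel=3):
    # 'valid' padding, stride 1: the output shrinks by kernel - 1
    return size - kernel + 1

def pool(size, p=2):
    # 2x2 max pooling with stride 2 halves the size (rounding down)
    return size // p

size, channels, params = 150, 3, 0
for filters in (32, 64, 128):
    params += (3 * 3 * channels + 1) * filters  # 3x3 kernel weights + one bias per filter
    size = pool(conv_valid(size))
    channels = filters

flat = size * size * channels  # length of the Flatten output
params += (flat + 1) * 512     # Dense(512)
params += (512 + 1) * 1        # Dense(1) sigmoid output
print(size, flat, params)      # 17 36992 19034177
```

The three convolutions contribute fewer than 100,000 parameters; the `Dense(512)` layer on the flattened 17 x 17 x 128 feature map holds about 18.9 million of the roughly 19 million total, and `model.summary()` should report the same total. Replacing `Flatten` with `GlobalAveragePooling2D` is a common way to shrink that layer, at the cost of discarding spatial layout.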

```python
# Save to the same Drive path that the prediction cell loads the model from
model.save('/content/drive/MyDrive/AIES/data/model.h5')
```

```python
from IPython.display import Image, display
import ipywidgets as widgets
from io import BytesIO
from PIL import Image as PILImage
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("/content/drive/MyDrive/AIES/data/model.h5")

# Preprocess an uploaded image the same way as the training data
def preprocess_image(image_bytes):
    img = PILImage.open(BytesIO(image_bytes)).convert('RGB')  # drop alpha / expand grayscale to 3 channels
    img = img.resize((150, 150))
    img = np.array(img, dtype=np.float32) / 255.0
    img = np.expand_dims(img, axis=0)  # add batch dimension: (1, 150, 150, 3)
    return img

# ipywidgets 7 stores uploads as {filename: {'content': ...}},
# ipywidgets 8 as a tuple of {'name': ..., 'content': ...}; accept both
def iter_uploads(value):
    files = value.values() if isinstance(value, dict) else value
    for file_info in files:
        yield bytes(file_info['content'])

# Handle file uploads and make predictions
def on_upload_change(change):
    for content in iter_uploads(change['new']):
        img = preprocess_image(content)
        probability = float(model.predict(img)[0][0])
        # Assumes 'Infected' is the class mapped to 1 (check train_generator.class_indices)
        result = "Infected" if probability >= 0.5 else "Healthy"

        # Display uploaded image and prediction
        display(Image(data=content, width=150, height=150))
        print(f"Prediction: {result} (p = {probability:.2f})")

# Create a file upload widget that accepts one or more images
file_upload = widgets.FileUpload(accept='image/*', multiple=True)

# Call the handler only when the widget's value changes
file_upload.observe(on_upload_change, names='value')

# Display the upload widget
display(file_upload)
```
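The upload cell hardcodes "Infected" for outputs of 0.5 or more, which is only correct if the infected folder sorts after the healthy one, because `flow_from_directory` numbers classes in alphanumeric order of the folder names. A minimal sketch that derives the label from a `class_indices`-style mapping instead. The helper name is hypothetical, and the folder names `Healthy`/`Infected` are assumptions; the mapping could be saved next to the model at training time.

```python
def label_from_probability(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to a class name.

    class_indices: a mapping like Keras' train_generator.class_indices,
                   e.g. {'Healthy': 0, 'Infected': 1}.
    With class_mode='binary', the sigmoid estimates the probability of the class mapped to 1.
    """
    index_to_name = {index: name for name, index in class_indices.items()}
    return index_to_name[1] if probability >= threshold else index_to_name[0]

print(label_from_probability(0.83, {'Healthy': 0, 'Infected': 1}))  # Infected
```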
