 **Vine Leaf Sickness Identification Model**





In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
import datetime
import os
import os.path
import random

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # or any {'0', '1', '2'}

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
from matplotlib import pyplot


We import TensorFlow and the other packages that will be used later on.

In [None]:
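# Image side length (pixels) and number of images per batch.
IMAGE_SIZE = 200
BATCH_SIZE = 32
# Placeholders; the dataset paths are passed directly to flow_from_directory below.
TEST_PATH = ""
VAL_PATH = ""
TRAIN_PATH = ""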

We set up some variables for the next steps.

In [None]:
# Import from tensorflow.keras to stay consistent with the other imports.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = (1./255),
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True,
                                   rotation_range=90, 
                                   brightness_range=[0.2,1.0]
                                   )


test_datagen = ImageDataGenerator(rescale = 1./255)

We use ImageDataGenerator to create data generators that we feed our dataset through. For training, this lets us produce many variations of each image through data augmentation: the images are modified in random ways with rotation, shear, zoom, horizontal flips and brightness changes. Pixel values are also rescaled to the [0, 1] range, which generally makes training better behaved. The test generator only applies the rescaling.
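To make two of these transformations concrete, here is a minimal pure-Python sketch of a random horizontal flip followed by rescaling on a tiny grayscale image. It is independent of Keras and only illustrates the idea; the helper names `rescale` and `random_horizontal_flip` and the sample image are made up for this example.

```python
import random


def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by `factor`, mapping 0..255 to 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def random_horizontal_flip(image, rng):
    """Mirror the image left-to-right with probability 0.5."""
    if rng.random() < 0.5:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


rng = random.Random(0)                    # seeded so the run is repeatable
tiny_image = [[0, 128, 255], [255, 128, 0]]  # hypothetical 2x3 grayscale image

augmented = rescale(random_horizontal_flip(tiny_image, rng))
print(augmented)
```

Each call with a different random state can return a different variant of the same image, which is how augmentation increases the variety seen during training.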

In [None]:
training_set = train_datagen.flow_from_directory(r"/content/gdrive/MyDrive/VineLeafDataSet/Dataset/Training",
                                                 target_size = (IMAGE_SIZE, IMAGE_SIZE),
                                                 batch_size = BATCH_SIZE,
                                                 class_mode = 'binary')

Found 2101 images belonging to 2 classes.


In [None]:
test_set = test_datagen.flow_from_directory(r"/content/gdrive/MyDrive/VineLeafDataSet/Dataset/Test",
                                            target_size = (IMAGE_SIZE, IMAGE_SIZE),
                                            batch_size = BATCH_SIZE,
                                            class_mode = 'binary')

Found 660 images belonging to 2 classes.


In [None]:
validation_set = test_datagen.flow_from_directory(r"/content/gdrive/MyDrive/VineLeafDataSet/Dataset/Validation",
                                                 target_size = (IMAGE_SIZE, IMAGE_SIZE),
                                                 batch_size = BATCH_SIZE,
                                                 class_mode = 'binary')

Found 689 images belonging to 2 classes.


We then load our data from the directories and resize all the images to the same size (200 x 200 pixels).
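As a quick sanity check on these generators, the sketch below computes how many batches each one yields per pass and how folder names map to label indices. The image counts come from the outputs above. The folder names are an assumption: they are taken from the Test directory used later in this notebook, and we assume the Training directory uses the same ones.

```python
import math

BATCH_SIZE = 32
image_counts = {"Training": 2101, "Test": 660, "Validation": 689}

# A directory iterator yields ceil(n / batch_size) batches per pass;
# the last batch is smaller when n is not a multiple of the batch size.
batches_per_epoch = {
    name: math.ceil(count / BATCH_SIZE) for name, count in image_counts.items()
}

# flow_from_directory assigns label indices in alphanumeric order of the
# subfolder names (assumed here to be "Malade" and "Saine").
class_names = ["Saine", "Malade"]
class_indices = {name: index for index, name in enumerate(sorted(class_names))}

print(batches_per_epoch)  # {'Training': 66, 'Test': 21, 'Validation': 22}
print(class_indices)      # {'Malade': 0, 'Saine': 1}
```

In the notebook itself, `training_set.class_indices` gives the actual mapping and is worth printing before interpreting any prediction.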

In [None]:
def build_model():
    inputs = keras.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
    #x = layers.experimental.preprocessing.Rescaling(1. / 255)(inputs)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.MaxPool2D()(x)

    x = layers.SeparableConv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.SeparableConv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool2D()(x)
    x = layers.SeparableConv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.SeparableConv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool2D()(x)

    x = layers.SeparableConv2D(128, 3, activation="relu", padding="same")(x)
    x = layers.SeparableConv2D(128, 3, activation="relu", padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool2D()(x)
    x = layers.Dropout(0.2)(x)

    x = layers.SeparableConv2D(256, 3, activation="relu", padding="same")(x)
    x = layers.SeparableConv2D(256, 3, activation="relu", padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPool2D()(x)
    x = layers.Dropout(0.2)(x)

    x = layers.Flatten()(x)

    x = layers.Dense(512, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.7)(x)

    x = layers.Dense(128, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)

    x = layers.Dense(64, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)

    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs=inputs, outputs=outputs)
    return model


model = build_model()

We build our model from five convolutional blocks. The first block uses two standard convolutions followed by max pooling. The next four blocks each use two separable convolutions followed by batch normalization and max pooling, with dropout added after the last two blocks.
Dropout, and to a lesser extent batch normalization, help limit the overfitting of our model to the dataset.

We then flatten the output of the convolutional blocks and add three Dense layers, each followed by a batch normalization layer and a dropout layer.

Our output layer is a Dense layer with a single unit and a sigmoid activation, which fits the binary label mode we set.
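To see what the Flatten layer receives, we can trace the spatial size through the five pooling layers by hand. This is a small sketch that assumes the Keras defaults for MaxPool2D (pool size 2, stride 2, no padding) and relies on `padding="same"` convolutions leaving the spatial size unchanged.

```python
IMAGE_SIZE = 200


def max_pool_output(size, pool=2):
    """Spatial size after 2x2 max pooling with stride 2 and no padding."""
    return size // pool


size = IMAGE_SIZE
sizes = [size]
for _ in range(5):  # the model has five MaxPool2D layers
    size = max_pool_output(size)
    sizes.append(size)

flatten_units = size * size * 256          # the last block has 256 channels
dense_params = flatten_units * 512 + 512   # weights + biases of Dense(512)

print(sizes)          # [200, 100, 50, 25, 12, 6]
print(flatten_units)  # 9216
print(dense_params)   # 4719104
```

Most of the trainable parameters therefore sit in the first Dense layer, which is one reason it carries the strongest dropout.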


In [None]:
METRICS = [
    tf.keras.metrics.BinaryAccuracy(),
    tf.keras.metrics.Precision(name="precision"),
    tf.keras.metrics.Recall(name="recall"),
    tf.keras.metrics.AUC(name="auc"),
]

We set up the metrics we will use to evaluate our model. Binary accuracy and AUC are the ones we care about most.
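For reference, accuracy, precision and recall can all be derived from the four confusion-matrix counts. The sketch below uses made-up counts purely for illustration; `binary_metrics` is a hypothetical helper, not part of Keras.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted or labeled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative counts only, not results from this notebook.
accuracy, precision, recall = binary_metrics(tp=40, fp=10, tn=30, fn=20)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Precision answers "of the images predicted positive, how many were right", while recall answers "of the truly positive images, how many were found".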

In [None]:
checkpoint = tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True)

early_stopping = tf.keras.callbacks.EarlyStopping(
    patience=10, restore_best_weights=True
)

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

We create the callbacks used during training:
- a checkpoint that saves the model that performed best on the validation set (lowest validation loss, which is the default monitored quantity)

- early stopping, which stops training once the validation loss has not improved for 10 epochs, a sign that the model is starting to overfit, and restores the best weights

- a TensorBoard callback that saves the training and validation logs
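The patience mechanism behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras implementation; `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is None when training runs through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Made-up validation losses: the best value is reached at epoch 3.
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.66]
print(early_stopping_epoch(losses, patience=3))  # (6, 3)
```

With `restore_best_weights=True`, the model ends up with the weights from the best epoch rather than those from the epoch where training stopped.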

In [None]:
initial_learning_rate = 0.015
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps=100000, decay_rate=0.95, staircase=True
)

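We define a learning rate schedule for our optimizer: starting from 0.015, the learning rate is multiplied by 0.95 every 100,000 training steps. The aim is to get better results and avoid plateaus.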
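The schedule can be reproduced by hand to check what learning rate is actually used during the run. The sketch below follows the documented formula of ExponentialDecay, `initial_learning_rate * decay_rate ** (step / decay_steps)`, with integer division when `staircase=True`. The figure of 66 steps per epoch is derived from 2101 training images and a batch size of 32.

```python
def exponential_decay(step, initial_lr=0.015, decay_steps=100000,
                      decay_rate=0.95, staircase=True):
    """Learning rate at a given optimizer step."""
    if staircase:
        exponent = step // decay_steps  # decays in discrete jumps
    else:
        exponent = step / decay_steps   # decays continuously
    return initial_lr * decay_rate ** exponent


steps_per_epoch = 66                 # ceil(2101 / 32)
total_steps = steps_per_epoch * 100  # upper bound for a 100-epoch run

print(total_steps)                     # 6600
print(exponential_decay(0))            # 0.015
print(exponential_decay(total_steps))  # 0.015
print(exponential_decay(100000))
```

Since even a full 100-epoch run stays well below 100,000 steps, the staircase schedule never reaches its first decay here and the learning rate stays at 0.015. A smaller `decay_steps` would be needed for the decay to take effect.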



In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss="binary_crossentropy",
    metrics=METRICS,
)

We then compile our model with Adam as the optimizer, passing it the learning rate schedule. The loss is binary_crossentropy, since our labels are binary, and we add the metrics defined earlier.
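As a reminder of what this loss measures, here is a minimal pure-Python version of binary cross-entropy averaged over a batch. The clipping constant `eps` is an assumption added to avoid taking the logarithm of 0; the function is an illustration, not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over the batch."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities.
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: low loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: high loss
```

The loss grows quickly when the model is confidently wrong, which is why a sigmoid output stuck near 0 or 1 on the wrong side is heavily penalized.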


In [None]:
model.fit(training_set, validation_data=validation_set, epochs=100, verbose=1, callbacks=[tensorboard_callback, checkpoint, early_stopping])

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100


<tensorflow.python.keras.callbacks.History at 0x7fe3508bae10>

We then train our model on the training set, validating on the validation set after each epoch. We allow up to 100 epochs since early stopping ends training sooner anyway; here it stopped after epoch 24.

In [None]:
model.load_weights("/content/best_model.h5")
model.evaluate(test_set)



[0.643674373626709,
 0.6772727370262146,
 0.7747747898101807,
 0.5134328603744507,
 0.7824156284332275]

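We load the weights of the model that performed best and evaluate it on our test dataset. The returned list contains the loss followed by the metrics in the order they were defined.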
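To make this output easier to read, the values can be paired with their metric names. The sketch below copies the numbers from the output above; `binary_accuracy` is the default name Keras gives to BinaryAccuracy, and the other names are those set in METRICS. The F1 score is not computed in this notebook and is derived here only as an extra summary of precision and recall.

```python
metric_names = ["loss", "binary_accuracy", "precision", "recall", "auc"]
values = [
    0.643674373626709,
    0.6772727370262146,
    0.7747747898101807,
    0.5134328603744507,
    0.7824156284332275,
]

results = dict(zip(metric_names, values))

# Harmonic mean of precision and recall (derived, not a notebook output).
f1 = (2 * results["precision"] * results["recall"]
      / (results["precision"] + results["recall"]))

for name, value in results.items():
    print(f"{name}: {value:.3f}")
print(f"f1: {f1:.3f}")
```

Read this way, the model is right on about 68% of the test images, and its recall of about 0.51 means it finds only around half of the images of the positive class.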


In [None]:
def get_random_leaf_picture():
    """Pick a random test image, print its true label and return its path."""
    test_dir = "/content/gdrive/MyDrive/VineLeafDataSet/Dataset/Test"
    # Folder names are French: "Malade" = sick, "Saine" = healthy.
    label_dir, label_text = random.choice([("Malade", "sick"), ("Saine", "normal")])
    folder = os.path.join(test_dir, label_dir)
    rand_file = os.path.join(folder, random.choice(os.listdir(folder)))
    print("this Leaf is " + label_text)
    return rand_file

model.load_weights("best_model.h5")

for i in range(10):
    file_name = get_random_leaf_picture()
    img = keras.preprocessing.image.load_img(
        file_name, target_size=(IMAGE_SIZE, IMAGE_SIZE)
    )
    img_array = keras.preprocessing.image.img_to_array(img)
    img_array = img_array / 255.0  # same rescaling as the data generators
    img_array = tf.expand_dims(img_array, 0)  # Create batch axis

    predictions = model.predict(img_array)
    # With class_mode='binary', the sigmoid output is the probability of class
    # index 1. flow_from_directory indexes folders alphanumerically, so we
    # assume Malade = 0 and Saine = 1 (check training_set.class_indices).
    score = float(predictions[0][0])
    print(
        "This image is %.2f percent Normal and %.2f percent Sick."
        % (100 * score, 100 * (1 - score))
    )

this Leaf is sick
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is normal
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is normal
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is normal
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is normal
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is normal
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is sick
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is sick
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is normal
This image is 0.00 percent Normal and 100.00 percent Sick.
this Leaf is normal
This image is 69.17 percent Normal and 30.83 percent Sick.


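We wrote a simple function that picks 10 random images from the test dataset and runs our model on them, displaying the Sick and Normal scores for each leaf. The recorded output above is almost constant at 100.00 percent Sick regardless of the true label. When predictions look like this, two things are worth checking: the images must receive the same 1/255 rescaling as the training data, and the sigmoid output is the probability of class index 1, which flow_from_directory assigns in alphanumeric folder order (so Saine, if the training folders are named Malade and Saine).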