# Ready, Steady, Go AI (*Exercises*)

This notebook is a supplement to the paper **Ready, Steady, Go AI: A Practical Tutorial on Explainable Artificial Intelligence and Its Applications in Phenomics Image Analysis** (submitted to *Patterns, 2021*) by Farid Nakhle and Antoine Harfouche.

Read the accompanying paper [here](https://doi.org).

# Table of Contents


* **1. Introduction**
* **2. Exercise I: Splitting Data**
* **3. Exercise II: Cropping Leaf Images**
* **4. Exercise III: Segmenting Leaf Images**
* **5. Exercise IV: Descriptive Data Analysis**
* **6. Exercise V: Balancing the Dataset**
* **7. Exercise VI: Classification Using DenseNet-161 Pretrained DCNN algorithm**
* **8. Exercise VII: Generating Confusion Matrix**
* **9. Exercise VIII: Generating Explanations With LIME**

# 1. Introduction


Before attempting the exercises in this notebook, visit our GitHub repository and open and run all the notebooks provided by the tutorial.

Here, the solution for each exercise can be found in a hidden code cell at its end.

Users should try to solve the exercises with the help of the notebooks provided by the tutorial before looking at the solution.

As a reminder, we are working with the PlantVillage dataset, originally obtained from [here](http://dx.doi.org/10.17632/tywbtsjrjv.1).
For the following exercises, we will be working with a subset of PlantVillage containing the tomato classes only. We have made the subset available [here](http://faridnakhle.com/pv/tomato-original.zip). 

**It is important to note that Colab deletes all unsaved data once the instance is recycled. Therefore, remember to download your results once you run the code.**
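
One way to act on this reminder is to bundle an output folder into a single archive before the session ends. The sketch below uses only the Python standard library; the helper name `archive_results` and the example paths in the comments are illustrative assumptions, not part of the tutorial.

```python
import shutil
from pathlib import Path


def archive_results(results_dir, archive_stem):
    """Zip results_dir into <archive_stem>.zip and return the archive path.

    Both arguments are hypothetical examples; point them at your own folders.
    """
    results_dir = Path(results_dir)
    if not results_dir.is_dir():
        raise FileNotFoundError(f"No such folder: {results_dir}")
    # make_archive appends the ".zip" suffix itself and returns the final path
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(results_dir))


# Example (inside Colab, after an exercise has produced output):
# zip_path = archive_results("/content/dataset/split", "/content/split_results")
# from google.colab import files  # available only in a Colab runtime
# files.download(zip_path)
```

Archiving first keeps the download to a single file, which is usually more convenient than fetching thousands of images one by one.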

# 2. Exercise I: Splitting Data


**A.** Complete the code that downloads the PlantVillage tomato leaves dataset using the link provided in the introduction. The dataset must be saved and then extracted to /content/dataset/original/.

**B.** Complete the code that randomly splits the dataset into training, validation, and testing sets. Use the following split ratio: training: 80%, validation: 10%, testing: 10%. The split dataset must be saved under /content/dataset/split/.

In [None]:
import requests
import os
import zipfile

## Write code that uses HTTP requests to download the dataset zip file.
## Use the zipfile library to extract the dataset to /content/dataset/original


# Use pip to install split-folders

# Replace "?" with your answer. NB: splitfolders expects the ratio parameter in the following format: for 50%, type .5; for 60%, .6; etc.
!splitfolders --output ? --seed 1337 --ratio ? ? ?  -- "/content/dataset/original"

# Solution

In [None]:
# -f keeps the cleanup quiet when the folders do not exist yet
!rm -rf /content/dataset/original/
!rm -rf /content/dataset/split/

import requests
import os
import zipfile

dataset_url = "http://faridnakhle.com/pv/tomato-original.zip"
save_data_to = "/content/dataset/original/"
dataset_file_name = "dataset.zip"

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

r = requests.get(dataset_url, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})

print("Downloading dataset...")  

with open(save_data_to + dataset_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract downloaded zip dataset file
print("Dataset downloaded")  
print("Extracting files...")  
with zipfile.ZipFile(save_data_to + dataset_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)

## Delete the zip file as we no longer need it
os.remove(save_data_to + dataset_file_name)
print("All done!")  

## SPLIT 
!pip install split-folders tqdm
!splitfolders --output "/content/dataset/split/" --seed 1337 --ratio .8 .1 .1 -- "/content/dataset/original"
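
After the split finishes, it can be reassuring to check that the resulting folders roughly follow the requested 80/10/10 ratio. The sketch below is one possible check, not part of the tutorial; the helper names `count_images_per_split` and `split_fractions` are hypothetical, and it assumes the layout `split_root/<split>/<class>/<image>`.

```python
import os


def count_images_per_split(split_root):
    """Return {split_name: number_of_files} for each sub-folder of split_root."""
    counts = {}
    for split_name in sorted(os.listdir(split_root)):
        split_path = os.path.join(split_root, split_name)
        if not os.path.isdir(split_path):
            continue
        # Sum the files found in every class folder below this split
        counts[split_name] = sum(len(files) for _, _, files in os.walk(split_path))
    return counts


def split_fractions(counts):
    """Convert absolute counts into fractions of the whole dataset."""
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


# Runs only when the split folder from this exercise is present
if os.path.isdir("/content/dataset/split/"):
    counts = count_images_per_split("/content/dataset/split/")
    print(counts)
    print(split_fractions(counts))
```

Because the ratio is applied class by class, the overall fractions should be close to, but not necessarily exactly, .8, .1, and .1.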

# 3. Exercise II: Cropping Leaf Images

Before you start, make sure to run the "Install and import prerequisites", "Download pretrained model", and "Define prerequisite functions" code cells.
Once you have run all of them, pass the path of the folder containing the split data from Exercise I (training, validation, and testing) to the crop function below. Running it will trigger YOLO to crop the images using a pretrained model.
After that, you should be able to run the "Preview a Cropped Image" cell and see a sample image from the results.
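
To make the cropping step concrete, the sketch below shows how a bounding box in (xmin, ymin, xmax, ymax) form selects a region of an image stored as a NumPy array. It is an illustration under stated assumptions rather than the tutorial's own function: the name `crop_box` and its `margin` parameter are hypothetical, and the clamping to the image borders is an addition that avoids negative indices when a padded box touches an edge.

```python
import numpy as np


def crop_box(img, xyxy, margin=0):
    """Crop img (H x W x C array) to the box (xmin, ymin, xmax, ymax).

    margin is a hypothetical padding in pixels; coordinates are clamped to the
    image so that a box touching the border never yields a negative index.
    """
    height, width = img.shape[:2]
    xmin, ymin, xmax, ymax = (int(v) for v in xyxy)
    xmin = max(xmin - margin, 0)
    ymin = max(ymin - margin, 0)
    xmax = min(xmax + margin, width)
    ymax = min(ymax + margin, height)
    # NumPy images are indexed [row, column], i.e. [y, x]
    return img[ymin:ymax, xmin:xmax]


# A blank 100 x 200 image stands in for a leaf photograph
demo = np.zeros((100, 200, 3), dtype=np.uint8)
print(crop_box(demo, (10, 20, 60, 90)).shape)
print(crop_box(demo, (0, 0, 50, 50), margin=5).shape)
```

Note the index order: the row (y) range comes first and the column (x) range second, which is easy to get backwards when boxes are given as x, y pairs.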

In [None]:
#@title Install and import prerequisites
!git clone https://github.com/ultralytics/yolov3
%cd yolov3
%pip install -qr requirements.txt 
import torch
from IPython.display import Image


In [None]:
#@title Download pretrained model
import os
import requests
import zipfile

model_URL = "http://faridnakhle.com/pv/models/YOLOv3.zip"
save_data_to = "/content/models/"
model_file_name = "yolo.zip"

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

print("Downloading model...")  

r = requests.get(model_URL, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})
with open(save_data_to + model_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract the downloaded model zip file
print("Model downloaded")  
print("Extracting files...")

with zipfile.ZipFile(save_data_to + model_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)
print("All done!")  

In [None]:
#@title Define prerequisite functions
import os
import cv2
import random
import numpy as np

# function for cropping each detection and saving as new image
def crop_objects(img, data, path):
    boxes, scores, classes, num_objects = data
    #create dictionary to hold count of objects for image name
    for i in range(len(num_objects)):
        # get count of class for part of image name
        class_index = int(classes[i])
        # get box coords
        xmin, ymin, xmax, ymax = boxes[i]
        # crop detection from image
        cropped_img = img[int(ymin)-5:int(ymax)+5, int(xmin)-5:int(xmax)+5]
        # construct image name and join it to path for saving crop properly
        img_name =  'cropped_img.png'
        img_path = os.path.join(path, img_name )
        # save image
        cv2.imwrite(img_path, cropped_img)

def crop_object(img, coords, img_path):
    # get box coords
    xmin = int(coords[0])
    ymin = int(coords[1])
    xmax = int(coords[2])
    ymax = int(coords[3])
    # crop detection from image
    cropped_img = img[ymin:ymax, xmin:xmax]
    # save image
    cv2.imwrite(img_path, cropped_img)

def plot_grid(img, line_color=(0, 255, 0), thickness=1, type_=cv2.LINE_AA, pxstep=20, pystep=20):
    x = pystep
    y = pxstep

    while x < img.shape[1]:
        cv2.line(img, (x, 0), (x, img.shape[0]), color=line_color, lineType=type_, thickness=thickness)
        x += pystep

    while y < img.shape[0]:
        cv2.line(img, (0, y), (img.shape[1], y), color=line_color, lineType=type_, thickness=thickness)
        y += pxstep

def plot_borders(img, line_color=(0, 255, 0), thickness=1):
    cv2.rectangle(img,(0 ,0),(img.shape[1]-thickness,img.shape[0]-thickness), line_color, thickness)

def myround(x, base=5):
    return base * round(x/base)

def plot_overlay(x, img, color, alpha, pxstep=20, pystep=20):
    overlay = img.copy()
    x0, x1, x2, x3 = int(x[0]), int(x[1]), int(x[2]), int(x[3])

    x0 = myround(x0,pystep)
    x1 = myround(x1,pxstep)
    x2 = myround(x2,pystep)
    x3 = myround(x3,pxstep)

    c1, c2 = (x0, x1), (x2, x3)
    cv2.rectangle(overlay, c1, c2, color, -1)
    # apply the overlay
    cv2.addWeighted(overlay, alpha, img, 1 - alpha, 0, img)

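# Use %cd rather than !cd: a "!" command runs in a subshell, so its directory change does not persist
%cd /content/yolov3/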
import argparse
import time
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, non_max_suppression, apply_classifier, scale_coords, xyxy2xywh, \
    strip_optimizer, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, load_classifier, time_synchronized

import glob

def crop(dataset_dir='', model_path='/content/yolov3/runs/train/exp/best.pt'):
    save_txt, imgsz = False, 224
    weights = model_path
    projectP = 'runs/detect'
    projectNameP = 'exp'
    save_img = True
    view_img = True

    save_dir = Path(increment_path(Path(projectP) / projectNameP, False))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    #loop over train, val, and test set
    trainTestVarDirs = glob.glob(dataset_dir + "*")
    for setDir in trainTestVarDirs:
      splitDir = os.path.basename(setDir)
      setClasses = glob.glob(setDir + "/*")
      for setClass in setClasses:
        # Directories
        classDir = os.path.basename(setClass)
        finalSaveDir = os.path.join(save_dir, splitDir, classDir)
        Path(finalSaveDir).mkdir(parents=True, exist_ok=True)
        source = setClass
        

        # Initialize
        set_logging()
        device = select_device('0')
        half = device.type != 'cpu'  # half precision only supported on CUDA

        # Load model
        model = attempt_load(weights, map_location=device)  # load FP32 model
        imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
        
        #introducing grid size
        gs = model.stride.max()
        #end

        if half:
            model.half()  # to FP16

        # Second-stage classifier
        classify = False
        if classify:
            modelc = load_classifier(name='resnet101', n=2)  # initialize
            modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model']).to(device).eval()

        # Set Dataloader
        vid_path, vid_writer = None, None
        
        dataset = LoadImages(source, img_size=imgsz)

        # Get names and colors
        names = model.module.names if hasattr(model, 'module') else model.names
        colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

        colors = [[217, 175, 78]]

        # Run inference
        t0 = time.time()
        img = torch.zeros((1, 3, imgsz, imgsz), device=device)  # init img
        _ = model(img.half() if half else img) if device.type != 'cpu' else None  # run once
        for path, img, im0s, vid_cap in dataset:
            img = torch.from_numpy(img).to(device)
            img = img.half() if half else img.float()  # uint8 to fp16/32
            img /= 255.0  # 0 - 255 to 0.0 - 1.0
            if img.ndimension() == 3:
                img = img.unsqueeze(0)

            # Inference
            t1 = time_synchronized()
            pred = model(img, augment=True)[0]

            # Apply NMS
            final_pred = non_max_suppression(pred, 0.15, 0.3, classes=0, agnostic=True)
            pred = non_max_suppression(pred, 0.00005, 1, classes=0, agnostic=True)
            t2 = time_synchronized()

            # Apply Classifier
            if classify:
                pred = apply_classifier(pred, modelc, img, im0s)

            # Process detections
            for i, det in enumerate(pred):  # detections per image
                
                p, s, im0 = Path(path), '', im0s.copy()

                imoriginal = im0.copy()
                #plot grid
                numofsquares = int(imgsz/int(gs))
                rowstep = int(im0.shape[0]/numofsquares)
                colstep = int(im0.shape[1]/numofsquares)
                plot_borders(im0, line_color=(0,0,0), thickness=2)
                gridim_solo = im0.copy()
                plot_grid(gridim_solo, pxstep=rowstep, pystep=colstep, line_color=(0,0,0), thickness=2)
                #end plot grid
     
                save_path = str(finalSaveDir + "/" + p.name)
                s += '%gx%g ' % img.shape[2:]  # print string
                gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
                if len(det):
                    # Rescale boxes from img_size to im0 size
                    det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                    # Print results
                    for c in det[:, -1].unique():
                        n = (det[:, -1] == c).sum()  # detections per class
                        s += '%g %ss, ' % (n, names[int(c)])  # add to string
                    
                    # Write results
                    for *xyxy, conf, cls in reversed(det):
                        if save_img or view_img:  # Add bbox to image
                            label = ''#'%s %.2f' % (names[int(cls)], conf)
                            plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=1)
                            
                

                # Print time (inference + NMS)
                print('%sDone. (%.3fs)' % (s, t2 - t1))

                # Save results (image with detections)
                if save_img:
                    cv2.imwrite(save_path + "_original.jpg", imoriginal)
                    cv2.imwrite(save_path, im0)
                    cv2.imwrite(save_path + "_grid.jpg", gridim_solo)
                        


            # SAVE FINAL CROPPED IMAGES
            # Process detections
            for i, det in enumerate(final_pred):  # detections per image
                
                p, s, im0 = Path(path), '', im0s
                im2 = im0.copy() #to use with grid/map
                #background
                numofsquares = int(imgsz/int(gs))
                rowstep = int(im0.shape[0]/numofsquares)
                colstep = int(im0.shape[1]/numofsquares)
                plot_overlay([0,0, im2.shape[1], im2.shape[0]], im2, color=(255, 255, 255), alpha=0.7, pxstep=rowstep, pystep=colstep)
                
                #borders
                plot_borders(im2, line_color=(0,0,0), thickness=2)
                plot_borders(im0, line_color=(0,0,0), thickness=2)

                save_path = str(finalSaveDir + "/" +  p.name)
                s += '%gx%g ' % img.shape[2:]  # print string
                gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
                if len(det):
                    # Rescale boxes from img_size to im0 size
                    det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                    # Print results
                    for c in det[:, -1].unique():
                        n = (det[:, -1] == c).sum()  # detections per class
                        s += '%g %ss, ' % (n, names[int(c)])  # add to string

                    #FUNCTION custom crop
                    CROP = True
                    if CROP:
                        fidx = 0
                        for *xyxy, conf, cls in reversed(det):
                            if save_img or view_img:
                                fidx = fidx + 1
                                crop_object(im0, xyxy, str(finalSaveDir + "/" +  (p.stem + "_cropped_" + str(fidx) + p.suffix)))
                    #END
                    
                    # Write results
                    for *xyxy, conf, cls in reversed(det):
                        if save_img or view_img:  # Add bbox to image
                            label = ''#'%s %.2f' % (names[int(cls)], conf)
                            plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=2)
                            plot_overlay(xyxy, im2, color=colors[int(cls)], alpha=0.7, pxstep=rowstep, pystep=colstep)
                else:
                    cv2.imwrite(save_path + "_not_cropped.jpg", im0)


                gridim = im2.copy()
                plot_grid(gridim, pxstep=rowstep, pystep=colstep, line_color=(0,0,0), thickness=2)
                
                # Print time (inference + NMS)
                print('%sDone. (%.3fs)' % (s, t2 - t1))

                # Save results (image with detections)
                if save_img:
                    cv2.imwrite(save_path + "_map.jpg", gridim)
                    cv2.imwrite(save_path + "_final.jpg", im0)

        if save_txt or save_img:
            s = f"\n{len(list(Path(finalSaveDir).glob('labels/*.txt')))} labels saved to {finalSaveDir + '/' + 'labels'}" if save_txt else ''
            print(f"Results saved to {finalSaveDir}{s}")

        print('Done. (%.3fs)' % (time.time() - t0))

In [None]:
# Read the "Define prerequisite functions" cell (double-click it to view the code).
# Call the crop function with its corresponding parameters. The pretrained model is located at /content/models/weights/RSGAI_YOLOv3.pt.

# Solution

In [None]:
crop(dataset_dir='/content/dataset/split/', model_path='/content/models/weights/RSGAI_YOLOv3.pt')

# Preview a Cropped Image

In [None]:
#@title Generate preview

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import glob
lastExp = max(glob.glob(os.path.join('/content/yolov3/runs/detect','*/' )), key=os.path.getmtime)

imgPath = lastExp + 'test/Tomato___Leaf_Mold/image (114).JPG'
originalImg = mpimg.imread(imgPath + "_original.jpg")
boundingBoxesImg = mpimg.imread(imgPath)
croppedImg = mpimg.imread(imgPath.replace(".JPG", "_cropped_1.JPG"))
gridImg = mpimg.imread(imgPath + "_grid.jpg")
mapImg = mpimg.imread(imgPath + "_map.jpg")
finaldetectImg = mpimg.imread(imgPath + "_final.jpg")

print("Original Image:")
plt.axis('off')
plt.imshow(originalImg)
plt.show()

print("Grid:")
plt.axis('off')
plt.imshow(gridImg)
plt.show()

print("Bounding Boxes:")
plt.axis('off')
plt.imshow(boundingBoxesImg)
plt.show()

print("Probability Map:")
plt.axis('off')
plt.imshow(mapImg)
plt.show()

print("Final Detection:")
plt.axis('off')
plt.imshow(finaldetectImg)
plt.show()


print("Cropped Image:")
plt.axis('off')
plt.imshow(croppedImg)
plt.show()

# 4. Exercise III: Segmenting Leaf Images

Before you start, make sure to run the "Install and import prerequisites", "Download pretrained model", and "Define prerequisite functions" code cells.
Once you have run all of them, pass the path of the folder containing the cropped tomato leaves to the segment function below. The cropped version of the dataset is located under /content/dataset/tomato-cropped/.
Running the segment function will trigger SegNet to segment the images using a pretrained model.
After that, you should be able to run the "Preview a Segmented Image" cell and see a sample image from the results.
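
The segment function removes the background by comparing each pixel of the predicted segmentation image against a threshold and blacking out the matching pixels of the original. The sketch below shows that masking idea on small NumPy arrays; the name `apply_background_mask` is hypothetical, and which channel values mark the background depends on the colors the segmentation library assigns, so treat the threshold as an assumption carried over from the notebook's code.

```python
import numpy as np


def apply_background_mask(img, seg, threshold=50):
    """Return a copy of img with background pixels set to black.

    A pixel counts as background when channel 0 of the segmentation image
    exceeds threshold, mirroring the rule used in the segment function.
    """
    if img.shape[:2] != seg.shape[:2]:
        raise ValueError("image and mask must have the same height and width")
    out = img.copy()
    background = seg[:, :, 0] > threshold  # boolean H x W mask
    out[background] = 0
    return out


# Tiny 2 x 2 example: only the top-left pixel is marked as background
img = np.full((2, 2, 3), 200, dtype=np.uint8)
seg = np.zeros((2, 2, 3), dtype=np.uint8)
seg[0, 0, 0] = 255
print(apply_background_mask(img, seg)[:, :, 0])
```

Boolean indexing applies the rule to the whole image at once, which is typically much faster than visiting pixels one at a time in Python.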

In [None]:
#@title Install and import prerequisites
!git clone https://github.com/divamgupta/image-segmentation-keras
%cd image-segmentation-keras
from keras_segmentation.models.segnet import segnet
print("Keras and SegNet are loaded")

In [None]:
#@title Download pretrained model
##########################
### DOWNLOAD THE MODEL ###
##########################
import requests
import os
import zipfile
## FEEL FREE TO CHANGE THESE PARAMETERS
model_URL = "http://faridnakhle.com/pv/models/SegNet.zip"
save_data_to = "/content/models/"
model_file_name = "segnet.zip"
#######################################

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

print("Downloading model...")  

r = requests.get(model_URL, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})
with open(save_data_to + model_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract the downloaded model zip file
print("Model downloaded")  
print("Extracting files...")

with zipfile.ZipFile(save_data_to + model_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)
print("Done!")


## CROPPED DATASET

## FEEL FREE TO CHANGE THESE PARAMETERS
dataset_url = "http://faridnakhle.com/pv/tomato-split-cropped.zip"
save_data_to = "/content/dataset/tomato-cropped/"
dataset_file_name = "tomato-cropped.zip"
#######################################

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

r = requests.get(dataset_url, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})

print("Downloading dataset...")  

with open(save_data_to + dataset_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract downloaded zip dataset file
print("Dataset downloaded")  
print("Extracting files...")  
with zipfile.ZipFile(save_data_to + dataset_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)

## Delete the zip file as we no longer need it
os.remove(save_data_to + dataset_file_name)
print("All done!")  

In [None]:
#@title Define prerequisite functions

import cv2
import numpy as np
from keras_segmentation.models.segnet import segnet
import glob
import os
from tqdm import tqdm
import six

def segment(inptDir = ""):

  modelName = "/content/models/RSGAI_SegNet.hdf5"
  model = segnet(n_classes=50 ,  input_height=320, input_width=640)
  model.load_weights(modelName)

  outputDir = "/content/dataset/tomato-cropped-segmented/"

  inptDirGlob = glob.glob(inptDir + "*")
  for setDir in inptDirGlob:

    splitDir = os.path.basename(setDir)
    setClasses = glob.glob(setDir + "/*")

    for setClass in setClasses:

      classDir = os.path.basename(setClass)
      inptFolder = os.path.join(inptDir, splitDir, classDir)
      outputFolder = os.path.join(outputDir, splitDir, classDir)

      if not os.path.exists(outputFolder):
          os.makedirs(outputFolder)

      inps = glob.glob(os.path.join(inptFolder, "*.jpg")) + glob.glob(
          os.path.join(inptFolder, "*.png")) + \
          glob.glob(os.path.join(inptFolder, "*.jpeg"))+ \
          glob.glob(os.path.join(inptFolder, "*.JPG"))
      inps = sorted(inps)

      if len(inps) > 0:

        all_prs = []

        for i, inp in enumerate(tqdm(inps)):
            if outputFolder is None:
                out_fname = None
            else:
                if isinstance(inp, six.string_types):
                    out_fname = os.path.join(outputFolder, os.path.basename(inp))
                else:
                    out_fname = os.path.join(outputFolder, str(i) + ".jpg")

            pr = model.predict_segmentation(
                inp=inp,
                out_fname=out_fname
            )

            img = cv2.imread(inp)
            seg = cv2.imread(out_fname)

            # Black out every pixel whose first channel in the segmentation
            # output exceeds 50 (vectorized form of the per-pixel loop; the
            # saved segmentation and the input image must have the same size)
            img[seg[:, :, 0] > 50] = [0, 0, 0]
            all_prs.append(pr)
            cv2.imwrite(out_fname, img)
  print("Segmented images are saved in:") 
  print(outputDir)

In [None]:
# Read the "Define prerequisite functions" cell (double-click it to view the code).
# Call the segment function with its corresponding parameters.

# Solution

In [None]:
segment(inptDir="/content/dataset/tomato-cropped/")

# Preview a Segmented Image

In [None]:
#@title Generate Preview
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
imgPath = '/content/dataset/tomato-cropped/test/Tomato___Septoria_leaf_spot/image (1020)_cropped_1.JPG'
segmentedPath = '/content/dataset/tomato-cropped-segmented/test/Tomato___Septoria_leaf_spot/image (1020)_cropped_1.JPG'

originalImg = mpimg.imread(imgPath)
segmentedImage = mpimg.imread(segmentedPath)

print("Original Image:")
plt.axis('off')
plt.imshow(originalImg)
plt.show()

print("Segmented Image:")
plt.axis('off')
plt.imshow(segmentedImage)
plt.show()

# 5. Exercise IV: Descriptive Data Analysis

Before you start, make sure to run the "Install and import prerequisites" code cell.

Next, loop over the classes in the segmented training data folder located under /content/dataset/tomato-segmented/, then count the images in each. You can use matplotlib to generate the plot.

Based on the above, you can decide whether or not there is a need for data balancing.

In [None]:
#@title Install and import prerequisites

import requests
import os
import zipfile

## FEEL FREE TO CHANGE THESE PARAMETERS
dataset_url = "http://faridnakhle.com/pv/tomato-split-cropped-segmented.zip"
save_data_to = "/content/dataset/tomato-segmented/"
dataset_file_name = "tomato-segmented.zip"
#######################################

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

r = requests.get(dataset_url, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})

print("Downloading dataset...")  

with open(save_data_to + dataset_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract downloaded zip dataset file
print("Dataset downloaded")  
print("Extracting files...")  
with zipfile.ZipFile(save_data_to + dataset_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)

## Delete the zip file as we no longer need it
os.remove(save_data_to + dataset_file_name)
print("All done!")  


In [None]:
### WRITE YOUR CODE HERE ###

# Solution

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import shutil
import cv2
import matplotlib.pyplot as plt
import seaborn as sns

train_dir = '/content/dataset/tomato-segmented/train/'
train_classes = [path for path in os.listdir(train_dir)]
train_imgs = dict([(ID, os.listdir(os.path.join(train_dir, ID))) for ID in train_classes])
train_classes_count = []
for trainClass in train_classes:
  train_classes_count.append(len(train_imgs[trainClass]))
  
plt.figure(figsize=(15, 10))
g = sns.barplot(x=train_classes, y=train_classes_count)
g.set_xticklabels(labels=train_classes, rotation=30, ha='right')
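
Besides reading the bar plot, a single number can help decide whether balancing is needed. The sketch below computes the ratio between the largest and smallest class; the function name `imbalance_summary` is hypothetical and the counts in the example are invented for illustration, not taken from PlantVillage.

```python
def imbalance_summary(class_counts):
    """Summarize how uneven a {class_name: image_count} mapping is."""
    if not class_counts:
        raise ValueError("class_counts is empty")
    largest = max(class_counts, key=class_counts.get)
    smallest = min(class_counts, key=class_counts.get)
    smallest_count = class_counts[smallest]
    # A ratio of 1.0 means perfectly balanced; larger values mean more skew
    ratio = class_counts[largest] / smallest_count if smallest_count else float("inf")
    return {"largest": largest, "smallest": smallest, "imbalance_ratio": ratio}


# Invented counts, for illustration only (not the PlantVillage numbers)
example_counts = {"class_a": 400, "class_b": 1000, "class_c": 2000}
print(imbalance_summary(example_counts))
```

With the variables from the solution cell, the input could be built as `dict(zip(train_classes, train_classes_count))`.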

# 6. Exercise V: Balancing the Dataset

Based on the results of the descriptive data analysis (Exercise IV), it can be seen that the classes are imbalanced. In this exercise, you are required to:

**A.** Use Augmentor to oversample any class containing fewer than 1500 images, except for the healthy class.

**B.** Use DCGAN to synthesize images for the healthy class. You need to point the synthesizing function to the path of the pretrained model, which is saved at /content/models/RSGAI_DCGAN.pth.

**C.** Use KNN to undersample the yellow leaf curl virus class.

Balanced classes should end up with 1500 images each.

**NB:** After data balancing, generate the data distribution plot again to analyze the new distribution of classes.

Make sure to run the "Install and import prerequisites", "Download pretrained models", and "Define prerequisite functions" cells first.
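
Before touching any images, it may help to work out how many each class has to gain or lose to reach the 1500-image target. The sketch below is a planning aid only: the function name `balancing_plan`, the method labels, and the counts in the example are hypothetical, and the rule for the healthy class follows the suffix check used in the Augmentor solution.

```python
def balancing_plan(class_counts, target=1500, synthesize_suffixes=("healthy",)):
    """Map each class to (method, number_of_images) needed to reach target.

    Classes below target are oversampled: by synthesis when the class name
    ends with one of synthesize_suffixes, otherwise by augmentation.
    Classes above target are undersampled.
    """
    plan = {}
    for name, count in class_counts.items():
        difference = target - count
        if difference > 0:
            method = "synthesize" if name.endswith(synthesize_suffixes) else "augment"
            plan[name] = (method, difference)
        elif difference < 0:
            plan[name] = ("undersample", -difference)
        else:
            plan[name] = ("keep", 0)
    return plan


# Invented counts, for illustration only
example_counts = {"Tomato___healthy": 1200, "Tomato___disease_x": 800, "Tomato___disease_y": 4000}
for class_name, (method, amount) in balancing_plan(example_counts).items():
    print(f"{class_name}: {method} {amount}")
```

In this exercise, "augment" would correspond to Augmentor, "synthesize" to DCGAN, and "undersample" to the KNN-based reduction.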

In [None]:
#@title Install and import prerequisites

!pip install Augmentor
import Augmentor
import os

def makedir(path):
    '''
    if path does not exist in the file system, create it
    '''
    if not os.path.exists(path):
        os.makedirs(path)

datasets_root_dir = '/content/dataset/tomato-segmented/'
dir = datasets_root_dir + 'train/'
target_dir = dir #same directory as input
makedir(target_dir)

folders = [os.path.join(dir, folder) for folder in next(os.walk(dir))[1]]
target_folders = [os.path.join(target_dir, folder) for folder in next(os.walk(dir))[1]]


In [None]:
#@title Download pretrained models
import os
import requests
import zipfile

##########################
### DOWNLOAD THE MODEL ###
##########################

## FEEL FREE TO CHANGE THESE PARAMETERS
model_URL = "http://faridnakhle.com/pv/models/RSGAI_DCGAN.zip"
save_data_to = "/content/models/"
model_file_name = "dcgan.zip"
#######################################

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

print("Downloading model...")  

r = requests.get(model_URL, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})
with open(save_data_to + model_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract the downloaded model zip file
print("Model downloaded")  
print("Extracting files...")

with zipfile.ZipFile(save_data_to + model_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)
print("All done!")  

In [None]:
#@title Define prerequisite functions
import argparse
import os
import numpy as np
import math

import torchvision.transforms as transforms
from torchvision.utils import save_image

from torch.utils.data import DataLoader
from torchvision import datasets
from torch.autograd import Variable

import torch.nn as nn
import torch.nn.functional as F
import torch
## YOU CAN CHANGE THESE VARIABLES    
n_epochs = 300
batch_size = 50
lr = 0.0002
b1 = 0.7 #adam: decay of first order momentum of gradient
b2 = 0.999 #adam: decay of second order momentum of gradient
n_cpu = 1
latent_dim = 100 #dimensionality of the latent space
img_size = 224
channels = 3 #R, G, and B
sample_interval = 400 #interval between image sampling
######################################################

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()

        self.init_size = img_size // 4
        self.l1 = nn.Sequential(nn.Linear(latent_dim, 128 * self.init_size ** 2))

        self.conv_blocks = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, 3, stride=1, padding=1),
            nn.BatchNorm2d(128, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, stride=1, padding=1),
            nn.BatchNorm2d(64, 0.8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, channels, 3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        out = self.l1(z)
        out = out.view(out.shape[0], 128, self.init_size, self.init_size)
        img = self.conv_blocks(out)
        return img

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        def discriminator_block(in_filters, out_filters, bn=True):
            block = [nn.Conv2d(in_filters, out_filters, 3, 2, 1), nn.LeakyReLU(0.2, inplace=True), nn.Dropout2d(0.25)]
            if bn:
                block.append(nn.BatchNorm2d(out_filters, 0.8))
            return block

        self.model = nn.Sequential(
            *discriminator_block(channels, 16, bn=False),
            *discriminator_block(16, 32),
            *discriminator_block(32, 64),
            *discriminator_block(64, 128),
        )

        # The height and width of downsampled image
        ds_size = img_size // 2 ** 4
        self.adv_layer = nn.Sequential(nn.Linear(128 * ds_size ** 2, 1), nn.Sigmoid())

    def forward(self, img):
        out = self.model(img)
        out = out.view(out.shape[0], -1)
        validity = self.adv_layer(out)

        return validity

def GenerateImages(modelPath, outPutFolder, IMGS2GENERATE):
   
    if not os.path.exists(outPutFolder):
        os.makedirs(outPutFolder)

    ## YOU CAN CHANGE THESE VARIABLES    
    n_epochs = 1
    batch_size = 50
    lr = 0.0002
    b1 = 0.7 #adam: decay of first order momentum of gradient
    b2 = 0.999 #adam: decay of second order momentum of gradient
    n_cpu = 1
    latent_dim = 100 #dimensionality of the latent space
    img_size = 224
    channels = 3 #R, G, and B
    sample_interval = 400 #interval between image sampling
    ######################################################

    def weights_init_normal(m):
        classname = m.__class__.__name__
        if classname.find("Conv") != -1:
            torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
        elif classname.find("BatchNorm2d") != -1:
            torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
            torch.nn.init.constant_(m.bias.data, 0.0)

    cuda = True if torch.cuda.is_available() else False

    load_from_checkpoint = True

    # Loss function
    adversarial_loss = torch.nn.BCELoss()

    # Initialize generator and discriminator
    generator = Generator()
    discriminator = Discriminator()

    if cuda:
        generator.cuda()
        discriminator.cuda()
        adversarial_loss.cuda()

    # Initialize weights
    generator.apply(weights_init_normal)
    discriminator.apply(weights_init_normal)

    # Optimizers
    optimizer_G = torch.optim.Adam(generator.parameters(), lr=lr, betas=(b1, b2))
    optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(b1, b2))

    Tensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor

    # ----------
    #  Load from Checkpoint
    # ----------

    if (load_from_checkpoint):
        checkpointName = modelPath
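        # map_location lets a checkpoint saved on a GPU load on a CPU-only runtime
        checkpoint = torch.load(checkpointName, map_location="cuda" if cuda else "cpu")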
        generator.load_state_dict(checkpoint['G_state_dict'])
        discriminator.load_state_dict(checkpoint['D_state_dict'])
        optimizer_G.load_state_dict(checkpoint['G_optimizer'])
        optimizer_D.load_state_dict(checkpoint['D_optimizer'])
        print("Loaded CheckPoint: " + checkpointName)
        if cuda:
            generator.cuda()
            discriminator.cuda()

    # ----------
    #  Generating images
    # ----------

    for i in range (0, IMGS2GENERATE):
        z = Variable(Tensor(np.random.normal(0, 1, (1, latent_dim))))
        # Generate a batch of images
        gen_imgs = generator(z)
        save_image(gen_imgs.data, outPutFolder + "/DCGAN_%d.png" % (i + 1), nrow=0, normalize=True)

# A. Augmentor

In [None]:
## Create an Augmentor Pipeline ##
## Do not augment the healthy class ##

# Solution

In [None]:
requiredNbrOfImages = 1500

for i in range(len(folders)):
    if not folders[i].endswith("healthy"):
        path, dirs, files = next(os.walk(folders[i]))
        nbrOfImages = len(files)
        nbrOfImagesNeeded = requiredNbrOfImages - nbrOfImages
          
        if nbrOfImagesNeeded > 0:
            tfd = target_folders[i]
            print ("saving in " + tfd)
            p = Augmentor.Pipeline(source_directory=folders[i], output_directory=tfd)
            p.rotate(probability=1, max_left_rotation=15, max_right_rotation=15)
            p.flip_left_right(probability=0.5)
            p.skew(probability=1, magnitude=0.2)
            p.flip_left_right(probability=0.5)
            p.shear(probability=1, max_shear_left=10, max_shear_right=10)
            p.flip_left_right(probability=0.5)
            p.sample(nbrOfImagesNeeded)
print("Dataset Augmented!")
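
The solution above only tops up classes that fall short of the target count and skips the healthy class. The arithmetic behind `nbrOfImagesNeeded` can be sketched as follows. This is a minimal illustration: the class counts and the helper name `images_needed` are hypothetical, and in the notebook the real counts come from walking each class folder.

```python
# Hypothetical per-class image counts; in the notebook these come from os.walk().
class_counts = {
    "Tomato___Early_blight": 800,
    "Tomato___Late_blight": 1527,
    "Tomato___healthy": 1272,
}
required_nbr_of_images = 1500


def images_needed(counts, target, skip_suffix="healthy"):
    """Return how many augmented images each class needs to reach the target."""
    needed = {}
    for name, count in counts.items():
        if name.endswith(skip_suffix):
            continue  # the healthy class is not augmented in this exercise
        shortfall = target - count
        if shortfall > 0:
            needed[name] = shortfall  # only classes below the target are topped up
    return needed


needed = images_needed(class_counts, required_nbr_of_images)
print(needed)
```

Classes that already meet or exceed the target are left out of the result, which mirrors the `if nbrOfImagesNeeded > 0` check in the solution.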

# B. DCGAN

In [None]:
## Use the GenerateImages() function. Hint: IMGS2GENERATE variable should be 228.

# Solution

In [None]:
GenerateImages('/content/models/RSGAI_DCGAN.pth', '/content/dataset/dcgan/', IMGS2GENERATE = 228)
print("Data Generated")

# Preview Some Generated Images


In [None]:
#@title Preview some synthetic images

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
imgPath = '/content/dataset/dcgan/DCGAN_'
imageOne = mpimg.imread(imgPath + "1.png")
imageTen = mpimg.imread(imgPath + "10.png")

plt.axis('off')
plt.imshow(imageOne)
plt.show()

plt.axis('off')
plt.imshow(imageTen)
plt.show()

# C. KNN

In [None]:
#@title Load prerequisites

from sklearn.neighbors import NearestNeighbors
from glob import glob

import numpy as np
import scipy.sparse as sp
from keras.applications import VGG19
from keras.applications.vgg19 import preprocess_input
from keras.models import Model
from keras.preprocessing import image
import os
deleteImages = True
def SaveFile(arr, filename):
    with open(filename, 'w') as filehandle:
        for listitem in arr:
            filehandle.write(str(listitem) + "\n")


def vectorize_all(files, model, px=224, n_dims=512, batch_size=512):
    min_idx = 0
    max_idx = min_idx + batch_size
    total_max = len(files)
    if (max_idx > total_max):
        max_idx = total_max
    
    preds = sp.lil_matrix((len(files), n_dims))

    print("Total: {}".format(len(files)))
    while min_idx < total_max - 1:
        print(min_idx)
        X = np.zeros(((max_idx - min_idx), px, px, 3))
        # For each file in batch, 
        # load as row into X
        i = 0
        for i in range(min_idx, max_idx):
            file = files[i]
            try:
                img = image.load_img(file, target_size=(px, px))
                img_array = image.img_to_array(img)
                X[i - min_idx, :, :, :] = img_array
            except Exception as e:
                print(e)
        max_idx = i
        X = preprocess_input(X)
        these_preds = model.predict(X)
        shp = ((max_idx - min_idx) + 1, n_dims)
        preds[min_idx:max_idx + 1, :] = these_preds.reshape(shp)
        min_idx = max_idx
        max_idx = np.min((max_idx + batch_size, total_max))
    return preds

def vectorizeOne(path, model):
    img = image.load_img(path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    pred = model.predict(x)
    return pred.ravel()

def findSimilar(vec, knn, filenames, n_neighbors=6):
    if n_neighbors >= len(filenames):
        print("Error. number of neighbours should be less than the number of images.")
    else:
        n_neighbors = n_neighbors + 1
        dist, indices = knn.kneighbors(vec.reshape(1, -1), n_neighbors=n_neighbors)
        dist, indices = dist.flatten(), indices.flatten()
        similarList = [(filenames[indices[i]], dist[i]) for i in range(len(indices))]
        del similarList[0]
        #similarImages.sort(reverse=True, key=lambda tup: tup[1])
        return similarList

In [None]:
## REPLACE ? WITH THE NAME OF THE CLASS TO BE DOWNSAMPLED
img_dir = "/content/dataset/tomato-segmented/train/?/*"
targetLimit = ? ## REPLACE ? WITH THE DESIRED NUMBER OF IMAGES

files = glob(img_dir)

nbrOfImages2Delete = len(files) - targetLimit

if (nbrOfImages2Delete > 0):

    imgToSearchFor = files[0]
    base_model = VGG19(weights='imagenet')
    model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc1').output)
    vecs = vectorize_all(files, model, n_dims=4096)

    ######################
    ### YOUR CODE HERE ###
    ## Create a variable named knn that holds a NearestNeighbors model using "cosine" as the metric and "brute" as the algorithm.
    ## Fit the KNN model on the vecs variable.
    ###############################
    
    vec = vectorizeOne(imgToSearchFor, model)
    similarImages = findSimilar(vec, knn, files, nbrOfImages2Delete)
    print(similarImages)
    SaveFile(similarImages, "deletedImages.txt")

    if deleteImages:
        for i in range(0, len(similarImages)):
            if os.path.exists(similarImages[i][0]):
                os.remove(similarImages[i][0])
    print("Balancing done. A list of deleted images can be found in deletedImages.txt")
else:
    print("nothing to delete")

# Solution

In [None]:
img_dir = "/content/dataset/tomato-segmented/train/Tomato___Tomato_Yellow_Leaf_Curl_Virus/*"
targetLimit = 1500
deleteImages = True

files = glob(img_dir)
nbrOfImages2Delete = len(files) - targetLimit

if (nbrOfImages2Delete > 0):

    imgToSearchFor = files[0]

    base_model = VGG19(weights='imagenet')
    model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc1').output)
    vecs = vectorize_all(files, model, n_dims=4096)

    knn = NearestNeighbors(metric='cosine', algorithm='brute')
    knn.fit(vecs)

    vec = vectorizeOne(imgToSearchFor, model)
    similarImages = findSimilar(vec, knn, files, nbrOfImages2Delete)
    SaveFile(similarImages, "deletedImages.txt")

    if deleteImages:
        for i in range(0, len(similarImages)):
            if os.path.exists(similarImages[i][0]):
                os.remove(similarImages[i][0])
    print("Balancing done. A list of deleted images can be found in deletedImages.txt")
else:
    print("nothing to delete")
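
A brute-force cosine search compares the query vector against every stored vector and ranks them by cosine distance (1 minus cosine similarity). The sketch below re-implements that idea in plain NumPy on made-up 2-D vectors. It is an illustration only, not scikit-learn's implementation, and `cosine_neighbors` is a hypothetical helper name.

```python
import numpy as np


def cosine_neighbors(query, vectors, k):
    """Return indices and cosine distances of the k vectors closest to query."""
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # Cosine similarity of the query with every row, then distance = 1 - similarity
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    dists = 1.0 - sims
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


# Made-up feature vectors standing in for the VGG19 'fc1' embeddings
feature_vectors = np.array([
    [1.0, 0.0],   # image 0 (used as the query)
    [0.9, 0.1],   # image 1: almost the same direction
    [0.0, 1.0],   # image 2: orthogonal
    [-1.0, 0.0],  # image 3: opposite direction
])
indices, distances = cosine_neighbors(feature_vectors[0], feature_vectors, k=3)
print(indices, distances)
```

The query image is always its own nearest neighbour at distance 0, which is why `findSimilar()` asks for one extra neighbour and then drops the first entry of the list.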

# Display Final Data Distribution

In [None]:
#@title Generate Data Distribution

import requests
import os
import zipfile

## FEEL FREE TO CHANGE THESE PARAMETERS
dataset_url = "http://faridnakhle.com/pv/tomato-split-cropped-segmented-balanced.zip"
save_data_to = "/content/dataset/tomato-dataset-final/"
dataset_file_name = "tomato-split-cropped-segmented-balanced.zip"
#######################################

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

r = requests.get(dataset_url, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})

print("Downloading dataset...")  

with open(save_data_to + dataset_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract downloaded zip dataset file
print("Dataset downloaded")  
print("Extracting files...")  
with zipfile.ZipFile(save_data_to + dataset_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)

## Delete the zip file as we no longer need it
os.remove(save_data_to + dataset_file_name)
print("All done!")  

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import shutil
import cv2
import matplotlib.pyplot as plt
import seaborn as sns

train_dir = '/content/dataset/tomato-dataset-final/train/'
train_classes = [path for path in os.listdir(train_dir)]
train_imgs = dict([(ID, os.listdir(os.path.join(train_dir, ID))) for ID in train_classes])
train_classes_count = []
for trainClass in train_classes:
  train_classes_count.append(len(train_imgs[trainClass]))

plt.figure(figsize=(15, 10))
g = sns.barplot(x=train_classes, y=train_classes_count)
g.set_xticklabels(labels=train_classes, rotation=30, ha='right')


# 7. Exercise VI: Classification Using DenseNet-161 Pretrained DCNN algorithm

In this exercise, you are required to load a pretrained DCNN model and test it with the testing dataset located under /content/dataset/tomato-dataset-final/test/.
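
Test accuracy is the share of test images whose highest-scoring class matches the true label. As a minimal sketch of that computation, the example below uses NumPy with made-up log-probabilities and labels. The notebook itself does the same with PyTorch tensors in `get_num_correct()`.

```python
import numpy as np

# Hypothetical log-probabilities for 4 images and 3 classes (rows = images)
log_probs = np.log(np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]))
labels = np.array([0, 1, 2, 1])  # hypothetical ground-truth class indices

predicted = log_probs.argmax(axis=1)          # most likely class per image
num_correct = int((predicted == labels).sum())
accuracy = 100.0 * num_correct / len(labels)  # percentage of correct predictions
print("total correct:", num_correct)
print("accuracy:", accuracy)
```

Because the logarithm is monotonic, taking the argmax of log-probabilities gives the same class as taking the argmax of probabilities.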

In [None]:
#@title Load prerequisites and define needed functions

import argparse
import os
import time

import matplotlib.pyplot as plt

import torch
import numpy as np
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models
from PIL import Image
from collections import OrderedDict
import json
import requests
import zipfile
!pip install lime
from lime import lime_image

## YOU CAN CHANGE THESE VARIABLES    
EPOCHS = 100
BATCH_SIZE = 20
LEARNING_RATE = 0.0001
data_dir = '/content/dataset/tomato-dataset-final/'
save_checkpoints = True
save_model_to = '/content/output/'
!mkdir /content/output/
IMG_SIZE = 220
NUM_WORKERS = 1
using_gpu = torch.cuda.is_available()
print_every = 300
ARCH = 'densenet161'
######################################################

def data_loader(root, batch_size=256, workers=1, pin_memory=True):
    traindir = os.path.join(root, 'train')
    valdir = os.path.join(root, 'val')
    testdir = os.path.join(root, 'test')
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    train_dataset = datasets.ImageFolder(
        traindir,
        transforms.Compose([
            transforms.Resize(size=(IMG_SIZE, IMG_SIZE)),
            transforms.ToTensor(),
            normalize
        ])
    )
    val_dataset = datasets.ImageFolder(
        valdir,
        transforms.Compose([
            transforms.Resize(size=(IMG_SIZE, IMG_SIZE)),
            transforms.ToTensor(),
            normalize
        ])
    )
    test_dataset = datasets.ImageFolder(
        testdir,
        transforms.Compose([
            transforms.Resize(size=(IMG_SIZE, IMG_SIZE)),
            transforms.ToTensor(),
            normalize
        ])
    )

    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=workers,
        pin_memory=pin_memory,
        sampler=None
    )
    val_loader = torch.utils.data.DataLoader(
        val_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=workers,
        pin_memory=pin_memory
    )
    test_loader = torch.utils.data.DataLoader(
        test_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=workers,
        pin_memory=pin_memory
    )
    return train_loader, val_loader, test_loader, train_dataset, val_dataset, test_dataset

# Data loading
train_loader, val_loader, test_loader, train_dataset, val_dataset, test_dataset = data_loader(data_dir, BATCH_SIZE, NUM_WORKERS, False)
print("Training Set: " + str(len(train_loader.dataset)))
print("Validation Set: " + str(len(val_loader.dataset)))
print("Testing Set: " + str(len(test_loader.dataset)))


##########################
### DOWNLOAD THE MODEL ###
##########################

## FEEL FREE TO CHANGE THESE PARAMETERS
model_URL = "http://faridnakhle.com/pv/models/RSGAI_DenseNet.zip"
save_data_to = "/content/models/"
model_file_name = "densenet.zip"
#######################################

if not os.path.exists(save_data_to):
    os.makedirs(save_data_to)

print("Downloading model...")  

r = requests.get(model_URL, stream = True, headers={"User-Agent": "Ready, Steady, Go AI"})
with open(save_data_to + model_file_name, "wb") as file: 
    for block in r.iter_content(chunk_size = 1024):
         if block: 
             file.write(block)

## Extract downloaded zip model file
print("Model downloaded")  
print("Extracting files...")

with zipfile.ZipFile(save_data_to + model_file_name, 'r') as zip_dataset:
    zip_dataset.extractall(save_data_to)
print("All done!")  




# Hidden layer sizes for the new classifier head
hidden_layers = [10240, 1024]
def make_model(structure, hidden_layers, lr, preTrained):
    if structure=="densenet161":
        model = models.densenet161(pretrained=preTrained)
        input_size = 2208
    else:
        model = models.vgg16(pretrained=preTrained)
        input_size = 25088
    output_size = 102
    # Freeze the backbone parameters so we don't backprop through them
    for param in model.parameters():
        param.requires_grad = False

    classifier = nn.Sequential(OrderedDict([
                              ('dropout',nn.Dropout(0.5)),
                              ('fc1', nn.Linear(input_size, hidden_layers[0])),
                              ('relu1', nn.ReLU()),
                              ('fc2', nn.Linear(hidden_layers[0], hidden_layers[1])),
                              ('relu2', nn.ReLU()),
                              ('fc3', nn.Linear(hidden_layers[1], output_size)),
                              ('output', nn.LogSoftmax(dim=1))
                              ]))

    model.classifier = classifier
    return model

model = make_model(ARCH, hidden_layers, LEARNING_RATE, True)
# define loss and optimizer
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=LEARNING_RATE)

def cal_accuracy(model, dataloader):
    validation_loss = 0
    accuracy = 0
    # Fall back to the CPU when no GPU is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        with torch.no_grad():
            outputs = model(inputs)
            # Accumulate the loss over batches so the division below yields a mean
            validation_loss += criterion(outputs, labels).item()
            ps = torch.exp(outputs)
            equality = (labels == ps.max(1)[1])
            accuracy += equality.float().mean().item()

    validation_loss = validation_loss / len(dataloader)
    accuracy = accuracy / len(dataloader)

    return validation_loss, accuracy



RESUME = True
RESUME_PATH ='/content/models/RSGAI_DenseNet.pth'
def loading_checkpoint(path):
    # Loading the parameters
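    # map_location lets a checkpoint saved on a GPU load on a CPU-only runtime
    state = torch.load(path, map_location="cuda" if torch.cuda.is_available() else "cpu")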
    LEARNING_RATE = state['learning_rate']
    structure = state['structure']
    hidden_layers = state['hidden_layers']
    epochs = state['epochs']
    
    # Building the model from checkpoints
    model = make_model(structure, hidden_layers, LEARNING_RATE, False) # weights come from the checkpoint, so ImageNet weights are not downloaded
    model.class_to_idx = state['class_to_idx']
    model.load_state_dict(state['state_dict'])
    model.eval()
    return model

if RESUME:
  print(RESUME_PATH)
  if os.path.isfile(RESUME_PATH):
      model = loading_checkpoint(RESUME_PATH)
      print("=> loaded checkpoint '{}'".format(RESUME_PATH))
  else:
    print("Invalid model Path")


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
@torch.no_grad()
def get_all_preds(model, dataloader):
        
        all_preds = torch.tensor([])
        all_preds = all_preds.to(device)
        all_labels = torch.tensor([])
        all_labels = all_labels.to(device)

        for data, target in dataloader:
            input = data.to(device)
            target = target.to(device)

            with torch.no_grad():
                output = model(input)

            all_preds = torch.cat(
                (all_preds, output)
                ,dim=0
            )
            all_labels = torch.cat(
                (all_labels, target)
                ,dim=0
            )

        return all_preds, all_labels
    
def get_num_correct(preds, labels):
        return preds.argmax(dim=1).eq(labels).sum().item()

In [None]:
with torch.no_grad():
    ## TEST THE MODEL ACCURACY USING THE TEST SET##

# Solution

In [None]:
with torch.no_grad():
    model.eval()
    test_preds, test_labels = get_all_preds(model,test_loader)
    preds_correct = get_num_correct(test_preds, test_labels)
    print('total correct:', preds_correct)
    print('accuracy:')
    print(((preds_correct / (len(test_loader.dataset))) * 100))

# 8. Exercise VII: Generating Confusion Matrix

In this exercise, you will plot the confusion matrix to visualize the prediction performance for each class.
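
A confusion matrix counts, for each true class (row), how often each class was predicted (column). Dividing every row by its total turns the counts into per-class percentages, which is what the normalized plot shows. The sketch below builds a small matrix by hand with NumPy. The labels are made up, and the exercise itself uses scikit-learn's `confusion_matrix` function instead.

```python
import numpy as np

# Made-up true and predicted class indices for 10 test images and 3 classes
true_labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
pred_labels = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0, 2])
num_classes = 3

# Rows are true classes, columns are predicted classes
cm = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(cm, (true_labels, pred_labels), 1)

# Row-normalize so that each true class sums to 100%
cm_percent = cm / cm.sum(axis=1, keepdims=True) * 100
print(cm)
print(np.round(cm_percent, 2))
```

The diagonal of the normalized matrix is the per-class recall, so a weak class stands out even when it contains few images.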

In [None]:
#@title Defining required functions

def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        # Convert the row fractions to percentages
        cm = cm.astype('float') * 100

    mycm = plt.imshow(cm, interpolation='nearest', cmap=cmap)
    mycm.set_clim([0,100])
    cbar = plt.colorbar(mycm, shrink=0.82, ticks=list(range(0, 120, 20)))
    cbar.ax.set_yticklabels(['0', '20', '40', '60', '80', '100'])  # vertically oriented colorbar

    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45,  ha="right")
    
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt) + ("%" if normalize else ""), horizontalalignment="center", color="white" if cm[i, j] > thresh else "black")
        

    plt.rcParams['font.family'] = "sans-serif"
    plt.rcParams['font.sans-serif'] = "Arial"
    plt.rcParams.update({'font.size': 12})
    plt.ylabel('True class', fontsize=17, fontweight='bold')
    plt.xlabel('Predicted class', fontsize=17, fontweight='bold')

import itertools

cmt = torch.zeros(10, 10, dtype=torch.int32) #10 is the number of classes

stacked = torch.stack(
    (
        test_labels
        ,test_preds.argmax(dim=1)
    )
    ,dim=1
)

for p in stacked:
    tl, pl = p.tolist()
    tl = int(tl)
    pl = int(pl)
    cmt[tl, pl] = cmt[tl, pl] + 1

#Plot CM
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

In [None]:
## TO CALCULATE THE CONFUSION MATRIX, USE THE CONFUSION_MATRIX FUNCTION
## PLOT THE MATRIX USING THE plot_confusion_matrix() FUNCTION DEFINED IN "Defining required functions" cell above.

# Solution

In [None]:
cm = confusion_matrix(test_labels.cpu(), test_preds.argmax(dim=1).cpu())
print(cm)

plt.figure(figsize=(12, 12))
plot_confusion_matrix(cm, test_dataset.classes, True, 'Confusion matrix', cmap=plt.cm.Blues)
plt.savefig(save_model_to + 'confusionMatrix.eps', format='eps', bbox_inches='tight')
plt.show()

# 9. Exercise VIII: Generating Explanations With LIME

In this exercise, you are required to use LIME to generate explanations for the classification of the image located under '/content/dataset/tomato-dataset-final/test/Tomato___Late_blight/image (1076)_cropped_1.JPG'.
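
LIME explains one prediction by switching superpixels on and off, recording how the classifier's score changes, and fitting a simple weighted linear model to those observations. The toy sketch below illustrates that idea with NumPy only. The `black_box` function, the number of superpixels, and the kernel width are all assumptions made for illustration, and the distance measure is a simplification of what the `lime` package uses.

```python
import numpy as np

rng = np.random.default_rng(0)
num_superpixels = 5
num_samples = 200


def black_box(masks):
    """Hypothetical classifier: the class score depends on superpixels 2 and 0 only."""
    return 0.1 + 0.6 * masks[:, 2] + 0.2 * masks[:, 0]


# 1. Perturbations: each row switches superpixels on (1) or off (0)
masks = rng.integers(0, 2, size=(num_samples, num_superpixels)).astype(float)
scores = black_box(masks)

# 2. Weight each perturbation by its closeness to the unperturbed image
distance = 1.0 - masks.mean(axis=1)            # fraction of superpixels hidden
weights = np.exp(-(distance ** 2) / 0.25 ** 2)  # assumed kernel width of 0.25

# 3. Fit a weighted linear model: scores ~ intercept + masks @ coefficients
design = np.hstack([np.ones((num_samples, 1)), masks])
sqrt_w = np.sqrt(weights)[:, None]
solution, *_ = np.linalg.lstsq(design * sqrt_w, scores[:, None] * sqrt_w, rcond=None)
coefficients = solution[1:, 0]

# The superpixel with the largest coefficient contributes most to the score
most_important = int(np.argmax(np.abs(coefficients)))
print(most_important, np.round(coefficients, 3))
```

In the exercise, `explain_instance()` plays the role of steps 1 to 3 with the real DenseNet-161 model as the black box, and `get_image_and_mask()` highlights the superpixels with the largest coefficients.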

In [None]:
#@title Define Prerequisite Functions
import cv2
from lime import lime_image
from skimage import io
from skimage import img_as_ubyte
from skimage.segmentation import mark_boundaries

Pretrainedmodel = model
def get_PCNN_image(path):
  image = cv2.imread(path)
  image = cv2.resize(image, (226,226))
  return image

PerturbationImgs = []

def batch_predictPDCNN(images):
    Pretrainedmodel.eval()
    batch = torch.stack(tuple(preprocess_transform(i) for i in images), dim=0)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    Pretrainedmodel.to(device)
    batch = batch.to(device)
    logits = Pretrainedmodel(batch)
    probs = F.softmax(logits, dim=1)

    for image in images:
      PerturbationImgs.append(image)

    return probs.detach().cpu().numpy()

def get_preprocess_transform():
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])     
    transf = transforms.Compose([
        transforms.ToTensor(),
        normalize
    ])    

    return transf    

preprocess_transform = get_preprocess_transform()

explainerPCNN = lime_image.LimeImageExplainer()


In [None]:
## USE LIME TO GENERATE SUPERPIXELS ##
## THEN GENERATE PERTURBATIONS ##
## FINALLY, GENERATE THE EXPLANATION FOR image (1076) of the tomato_late_blight test set.

# Solution

In [None]:
image2explain = '/content/dataset/tomato-dataset-final/test/Tomato___Late_blight/image (1076)_cropped_1.JPG'
PCNNimg = get_PCNN_image(image2explain)
PCNNimg = cv2.cvtColor(PCNNimg, cv2.COLOR_BGR2RGB)
explanationPCNN = explainerPCNN.explain_instance(PCNNimg, 
                                         batch_predictPDCNN, top_labels=5, hide_color=0, num_samples=5000)

tempPCNN, maskPCNN = explanationPCNN.get_image_and_mask(explanationPCNN.top_labels[0], positive_only=True, num_features=1, hide_rest=False)

###############################
## SUPER PIXEL PERTURBATIONS ##
###############################
from matplotlib import gridspec
## VISUALIZE SOME PERTURBATIONS
# create a figure
fig = plt.figure()
# set the figure height in inches
fig.set_figheight(5)
# set the figure width in inches
fig.set_figwidth(15)

# create grid for different subplots
spec = gridspec.GridSpec(ncols=5, nrows=2, wspace=0.1, hspace=0.1)

print("PERTURBATIONS:")
i=0
for perturbationImg in PerturbationImgs:
    p = fig.add_subplot(spec[i])
    p.axis('off')
    p.imshow(perturbationImg)
    i = i + 1
    if i > 9:
      break



In [None]:
#######################
## SHOW  EXPLANATION ##
#######################
print("FINAL EXPLANATION:")
tempCNNP, maskCNNP = explanationPCNN.get_image_and_mask(explanationPCNN.top_labels[0], positive_only=False, num_features=1, hide_rest=False)
fig, (ax1) = plt.subplots(1, 1, figsize=(5,5))
ax1.bbox_inches='tight'
ax1.pad_inches = 0
ax1.axis('off')
plt.subplots_adjust(wspace=0, hspace=0)
plt.imshow(mark_boundaries(tempCNNP, maskCNNP))