# **Yolo_v2**

<font size = 4> Yolo v2 is an object detection model which detects and classifies objects in images. This is based on the second version of the original Yolo implementation which was published by [Redmon and Farhadi](https://ieeexplore.ieee.org/document/8100173).

---

<font size = 4>*Disclaimer*:

<font size = 4>This notebook is based on the following paper: **YOLO9000: Better, Faster, Stronger**, Proceedings of the IEEE conference on computer vision and pattern recognition, 7263-7271, 2017, Joseph Redmon and Ali Farhadi, [https://ieeexplore.ieee.org/document/8100173](https://ieeexplore.ieee.org/document/8100173)

<font size = 4>The source code for this notebook is adapted for keras and can be found in: [https://github.com/experiencor/keras-yolo2](https://github.com/experiencor/keras-yolo2)

<font size = 4>The dataset used currently can be downloaded from [here](https://public.roboflow.ai/object-detection/bccd)


<font size = 4>**Please also cite this original paper when using or developing this notebook.**

# **How to use this notebook?**

---

<font size = 4>Videos describing how to use ZeroCostDL4Mic notebooks are available on YouTube:
  - [**Video 1**](https://www.youtube.com/watch?v=GzD2gamVNHI&feature=youtu.be): Full run through of the workflow to obtain the notebooks and the provided test datasets as well as a common use of the notebook
  - [**Video 2**](https://www.youtube.com/watch?v=PUuQfP5SsqM&feature=youtu.be): Detailed description of the different sections of the notebook


---
###**Structure of a notebook**

<font size = 4>The notebook contains two types of cell:  

<font size = 4>**Text cells** provide information and can be modified by double-clicking the cell. You are currently reading a text cell. You can create a new text cell by clicking `+ Text`.

<font size = 4>**Code cells** contain code, which can be modified by selecting the cell. To execute the cell, move your cursor over the `[ ]`-mark on the left side of the cell (a play button appears) and click it. Once execution is done, the animation of the play button stops. You can create a new code cell by clicking `+ Code`.

---
###**Table of contents, Code snippets** and **Files**

<font size = 4>On the top left side of the notebook you will find three tabs. From top to bottom, these are:

<font size = 4>*Table of contents* = contains the structure of the notebook. Click an entry to move quickly between sections.

<font size = 4>*Code snippets* = contains examples of how to code certain tasks. You can ignore this tab when using this notebook.

<font size = 4>*Files* = contains all available files. After mounting your Google Drive (see section 1.2) you will find your files and folders here.

<font size = 4>**Remember that all uploaded files are purged after changing the runtime.** All files saved in Google Drive will remain. You do not need to use the Mount Drive-button; your Google Drive is connected in section 1.2.

<font size = 4>**Note:** The "sample data" in "Files" contains default files. Do not upload anything in here!

---
###**Making changes to the notebook**

<font size = 4>**You can make a copy** of the notebook and save it to your Google Drive. To do this, click **File -> Save a copy in Drive**.

<font size = 4>To **edit a cell**, double-click on the text. This will show you either the source code (in code cells) or the source text (in text cells).
You can use the `#`-mark in code cells to comment out parts of the code. This allows you to keep the original code piece in the cell as a comment.

#**0. Before getting started**
---
<font size = 4> Preparing the dataset carefully is essential to make this Yolo_v2 notebook work. The model requires a set of input images (currently .jpg) and, as targets, a set of annotation files in Pascal VOC format. Each annotation file should have exactly the same name as its input image, except with an .xml instead of the .jpg extension. The annotation files contain the class labels and all bounding boxes for the objects in each image of your dataset. Many datasets offer the option of saving annotations in this format, and software for hand annotation will often save annotations in this format automatically.
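
<font size = 4> To make the annotation format more concrete, here is a minimal sketch of reading the class labels and bounding boxes from a Pascal VOC file with Python's standard library. The XML snippet, the class name `cell` and the helper `read_voc_boxes` are made up for illustration and are not part of this notebook's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal Pascal VOC annotation for an image called img_1.jpg.
# The class name and the coordinates are invented for this example.
VOC_EXAMPLE = """
<annotation>
  <filename>img_1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cell</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""

def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; some tools write them as floats
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes

print(read_voc_boxes(VOC_EXAMPLE))
```

<font size = 4> Each `object` element holds one bounding box, so an image with several objects simply contains several `object` blocks.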

<font size=4> If you want to assemble your own dataset, we recommend using the open-source annotation tool [makesense.ai](https://www.makesense.ai/).

<font size = 4>**We strongly recommend that you generate extra image/annotation pairs. These can be used to assess the quality of your trained model (Quality control dataset)**. The quality control assessment can be done directly in this notebook.

<font size = 4> **Additionally, the corresponding input and output files need to have the same name**.

<font size = 4> Please note that you currently can **only use .png or .jpg files!**


<font size = 4>Here's a common data structure that can work:
*   Experiment A
    - **Training dataset**
      - Input images (Training_source)
        - img_1.jpg, img_2.jpg, ...
      - Annotation files (Training_source_annotations)
        - img_1.xml, img_2.xml, ...
    - **Quality control dataset**
      - Input images
        - img_1.jpg, img_2.jpg
      - Annotation files
        - img_1.xml, img_2.xml
    - **Data to be predicted**
    - **Results**
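
<font size = 4>Because every image needs an annotation file with the same base name, it can help to check the pairing before training. The sketch below is one possible way to do this; the helper `find_unpaired` is hypothetical and not part of this notebook, and matching file extensions case-insensitively is an assumption.

```python
import os

def find_unpaired(image_dir, annotation_dir, image_exts=(".jpg", ".png")):
    """Return (images without annotation, annotations without image).

    Both results are sorted lists of base names (file names without extension).
    """
    images = {os.path.splitext(f)[0] for f in os.listdir(image_dir)
              if f.lower().endswith(image_exts)}
    annotations = {os.path.splitext(f)[0] for f in os.listdir(annotation_dir)
                   if f.lower().endswith(".xml")}
    return sorted(images - annotations), sorted(annotations - images)
```

<font size = 4>Calling `find_unpaired` with your own Training_source and Training_source_annotations folders should return two empty lists if the dataset is paired correctly.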

---
<font size = 4>**Important note**

<font size = 4>- If you wish to **Train a network from scratch** using your own dataset (and we encourage everyone to do that), you will need to run **sections 1 - 4**, then use **section 5** to assess the quality of your model and **section 6** to run predictions using the model that you trained.

<font size = 4>- If you wish to **Evaluate your model** using a model previously generated and saved on your Google Drive, you will only need to run **sections 1 and 2** to set up the notebook, then use **section 5** to assess the quality of your model.

<font size = 4>- If you only wish to **run predictions** using a model previously generated and saved on your Google Drive, you will only need to run **sections 1 and 2** to set up the notebook, then use **section 6** to run the predictions on the desired model.

---

# **1. Initialise the Colab session**
---







## **1.1. Check for GPU access**
---

<font size = 4>By default, the session should be using Python 3 and GPU acceleration, but you can check that these are set properly as follows:

<font size = 4>Go to **Runtime -> Change the Runtime type**

<font size = 4>**Runtime type: Python 3** *(Python 3 is the programming language in which this program is written)*

<font size = 4>**Accelerator: GPU** *(Graphics processing unit)*


In [None]:
#@markdown ##Run this cell to check if you have GPU access
%tensorflow_version 1.x

import tensorflow as tf
if tf.test.gpu_device_name()=='':
  print('You do not have GPU access.') 
  print('Did you change your runtime ?') 
  print('If the runtime setting is correct then Google did not allocate a GPU for your session')
  print('Expect slow performance. To access GPU try reconnecting later')

else:
  print('You have GPU access')
  !nvidia-smi


## **1.2. Mount your Google Drive**
---
<font size = 4> To use this notebook on the data present in your Google Drive, you need to mount your Google Drive to this notebook.

<font size = 4> Play the cell below to mount your Google Drive and follow the link. In the new browser window, select your drive and select 'Allow', copy the code, paste into the cell and press enter. This will give Colab access to the data on the drive. 

<font size = 4> Once this is done, your data are available in the **Files** tab on the top left of the notebook.

In [None]:
#@markdown ##Run this cell to connect your Google Drive to Colab

#@markdown * Click on the URL. 

#@markdown * Sign in to your Google Account. 

#@markdown * Copy the authorization code. 

#@markdown * Enter the authorization code. 

#@markdown * Click on the "Files" tab on the left. Refresh the view. Your Google Drive folder should now be available here as "gdrive". 

#mounts user's Google Drive to Google Colab.

from google.colab import drive
drive.mount('/content/gdrive')

# **2. Install Yolo_v2 and Dependencies**
---


In [None]:
#@markdown ##Install Network and Dependencies
%tensorflow_version 1.x
!pip install pascal-voc-writer
from pascal_voc_writer import Writer
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import
import csv
import random
import pprint
import sys
import time
import numpy as np
from optparse import OptionParser
import pickle
import math
import cv2
import copy
from matplotlib import pyplot as plt
import matplotlib.patches as patches
import tensorflow as tf
import pandas as pd
import os
import shutil
from skimage import io
from sklearn.metrics import average_precision_score

from keras.models import Model
from keras.layers import Flatten, Dense, Input, Conv2D, MaxPooling2D, Dropout, Reshape, Activation, BatchNormalization, Lambda
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.merge import concatenate
from keras.applications.mobilenet import MobileNet
from keras.applications import InceptionV3
from keras.applications.vgg16 import VGG16
from keras.applications.resnet50 import ResNet50

from keras import backend as K
from keras.optimizers import Adam, SGD, RMSprop
from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D, TimeDistributed
from keras.engine.topology import get_source_inputs
from keras.utils import layer_utils
from keras.utils.data_utils import get_file
from keras.objectives import categorical_crossentropy
from keras.models import Model
from keras.utils import generic_utils
from keras.engine import Layer, InputSpec
from keras import initializers, regularizers
from keras.utils import Sequence
import xml.etree.ElementTree as ET
from collections import OrderedDict, Counter
import json
import imageio
import imgaug as ia
from imgaug import augmenters as iaa
from tqdm import tqdm
from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove
ia.seed(1)
# imgaug uses matplotlib backend for displaying images
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage
import re
import glob

if os.path.exists('/content/gdrive/My Drive/keras-yolo2'):
  shutil.rmtree('/content/gdrive/My Drive/keras-yolo2')

!git clone https://github.com/experiencor/keras-yolo2.git
shutil.move('/content/keras-yolo2','/content/gdrive/My Drive/keras-yolo2')

os.chdir('/content/gdrive/My Drive/keras-yolo2')
from backend import BaseFeatureExtractor, FullYoloFeature
from preprocessing import parse_annotation, BatchGenerator

print("Dependencies installed and imported.")

def plt_rectangle(plt,label,x1,y1,x2,y2,fontsize=10):
    '''
    Draw a labelled rectangle on the current matplotlib figure.

    == Input ==
    
    plt      : matplotlib.pyplot object
    label    : string containing the object class name
    x1       : top left corner x coordinate
    y1       : top left corner y coordinate
    x2       : bottom right corner x coordinate
    y2       : bottom right corner y coordinate
    fontsize : font size of the label text
    '''
    linewidth = 1
    color = "yellow"
    plt.text(x1,y1,label,fontsize=fontsize,backgroundcolor="magenta")
    plt.plot([x1,x1],[y1,y2], linewidth=linewidth,color=color)
    plt.plot([x2,x2],[y1,y2], linewidth=linewidth,color=color)
    plt.plot([x1,x2],[y1,y1], linewidth=linewidth,color=color)
    plt.plot([x1,x2],[y2,y2], linewidth=linewidth,color=color)

def extract_single_xml_file(tree,object_count=True):
    Nobj = 0
    row  = OrderedDict()
    for elems in tree.iter():

        if elems.tag == "size":
            for elem in elems:
                row[elem.tag] = int(elem.text)
        if elems.tag == "object":
            for elem in elems:
                if elem.tag == "name":
                    row["bbx_{}_{}".format(Nobj,elem.tag)] = str(elem.text)              
                if elem.tag == "bndbox":
                    for k in elem:
                        row["bbx_{}_{}".format(Nobj,k.tag)] = float(k.text)
                    Nobj += 1
    if object_count:
      row["Nobj"] = Nobj
    return row

def count_objects(tree):
  Nobj=0
  for elems in tree.iter():
    if elems.tag == "object":
      for elem in elems:
        if elem.tag == "bndbox":
          Nobj += 1
  return(Nobj)

#DetectionMap

def intersect_area(box_a, box_b):
    """
    Compute the pairwise intersection areas between two sets of rectangular bounding boxes
    Bounding boxes use corner notation : [x1, y1, x2, y2]
    Args:
      box_a: (np.array) bounding boxes, Shape: [A,4].
      box_b: (np.array) bounding boxes, Shape: [B,4].
    Return:
      np.array intersection area, Shape: [A,B].
    """
    resized_A = box_a[:, np.newaxis, :]
    resized_B = box_b[np.newaxis, :, :]
    max_xy = np.minimum(resized_A[:, :, 2:], resized_B[:, :, 2:])
    min_xy = np.maximum(resized_A[:, :, :2], resized_B[:, :, :2])

    diff_xy = (max_xy - min_xy)
    inter = np.clip(diff_xy, a_min=0, a_max=np.max(diff_xy))
    return inter[:, :, 0] * inter[:, :, 1]

    
def jaccard(box_a, box_b):
    """
    Compute the jaccard overlap of two sets of boxes.  The jaccard overlap
    is simply the intersection over union of two boxes.  Here we operate on
    ground truth boxes and default boxes.
    E.g.:
        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
    Args:
        box_a: (np.array) Predicted bounding boxes,    Shape: [n_pred, 4]
        box_b: (np.array) Ground Truth bounding boxes, Shape: [n_gt, 4]
    Return:
        jaccard overlap: (np.array) Shape: [n_pred, n_gt]
    """
    inter = intersect_area(box_a, box_b)
    area_a = ((box_a[:, 2] - box_a[:, 0]) * (box_a[:, 3] - box_a[:, 1]))
    area_b = ((box_b[:, 2] - box_b[:, 0]) * (box_b[:, 3] - box_b[:, 1]))
    area_a = area_a[:, np.newaxis]
    area_b = area_b[np.newaxis, :]
    union = area_a + area_b - inter
    return inter / union



"""
    Simple accumulator class that keeps track of True positive, False positive and False negative
    to compute precision and recall of a certain class
"""


class APAccumulator:
    def __init__(self):
        self.TP, self.FP, self.FN = 0, 0, 0

    def inc_good_prediction(self, value=1):
        self.TP += value

    def inc_bad_prediction(self, value=1):
        self.FP += value

    def inc_not_predicted(self, value=1):
        self.FN += value

    @property
    def precision(self):
        total_predicted = self.TP + self.FP
        if total_predicted == 0:
            total_gt = self.TP + self.FN
            if total_gt == 0:
                return 1.
            else:
                return 0.
        return float(self.TP) / total_predicted

    @property
    def recall(self):
        total_gt = self.TP + self.FN
        if total_gt == 0:
            return 1.
        return float(self.TP) / total_gt

    def __str__(self):
        # Use a name other than `str` to avoid shadowing the built-in type
        text = ""
        text += "True positives : {}\n".format(self.TP)
        text += "False positives : {}\n".format(self.FP)
        text += "False Negatives : {}\n".format(self.FN)
        text += "Precision : {}\n".format(self.precision)
        text += "Recall : {}\n".format(self.recall)
        return text


DEBUG = False


class DetectionMAP:
    def __init__(self, n_class, pr_samples=11, overlap_threshold=0.8):
        """
        Running computation of average precision of n_class in a bounding box + classification task
        :param n_class:             quantity of class
        :param pr_samples:          quantification of threshold for pr curve
        :param overlap_threshold:   minimum overlap threshold
        """
        self.n_class = n_class
        self.overlap_threshold = overlap_threshold
        self.pr_scale = np.linspace(0, 1, pr_samples)
        self.total_accumulators = []
        self.reset_accumulators()

    def reset_accumulators(self):
        """
        Reset the accumulators state
        TODO this is hard to follow... should use a better data structure
        total_accumulators : list of list of accumulators at each pr_scale for each class
        :return:
        """
        self.total_accumulators = []
        for i in range(len(self.pr_scale)):
            class_accumulators = []
            for j in range(self.n_class):
                class_accumulators.append(APAccumulator())
            self.total_accumulators.append(class_accumulators)

    def evaluate(self, pred_bb, pred_classes, pred_conf, gt_bb, gt_classes):
        """
        Update the accumulator for the running mAP evaluation.
        For example, this can be called once for each image
        :param pred_bb: (np.array)      Predicted Bounding Boxes [x1, y1, x2, y2] :     Shape [n_pred, 4]
        :param pred_classes: (np.array) Predicted Classes :                             Shape [n_pred]
        :param pred_conf: (np.array)    Predicted Confidences [0.-1.] :                 Shape [n_pred]
        :param gt_bb: (np.array)        Ground Truth Bounding Boxes [x1, y1, x2, y2] :  Shape [n_gt, 4]
        :param gt_classes: (np.array)   Ground Truth Classes :                          Shape [n_gt]
        :return:
        """

        if pred_bb.ndim == 1:
            pred_bb = np.repeat(pred_bb[:, np.newaxis], 4, axis=1)
        IoUmask = None
        if len(pred_bb) > 0:
            IoUmask = self.compute_IoU_mask(pred_bb, gt_bb, self.overlap_threshold)
        for accumulators, r in zip(self.total_accumulators, self.pr_scale):
            if DEBUG:
                print("Evaluate pr_scale {}".format(r))
            self.evaluate_(IoUmask, accumulators, pred_classes, pred_conf, gt_classes, r)

    @staticmethod
    def evaluate_(IoUmask, accumulators, pred_classes, pred_conf, gt_classes, confidence_threshold):
        # The np.int alias is deprecated and removed in recent NumPy; the built-in int is equivalent
        pred_classes = pred_classes.astype(int)
        gt_classes = gt_classes.astype(int)

        for i, acc in enumerate(accumulators):
            gt_number = np.sum(gt_classes == i)
            pred_mask = np.logical_and(pred_classes == i, pred_conf >= confidence_threshold)
            pred_number = np.sum(pred_mask)
            if pred_number == 0:
                acc.inc_not_predicted(gt_number)
                continue

            IoU1 = IoUmask[pred_mask, :]
            mask = IoU1[:, gt_classes == i]

            tp = DetectionMAP.compute_true_positive(mask)
            fp = pred_number - tp
            fn = gt_number - tp
            acc.inc_good_prediction(tp)
            acc.inc_not_predicted(fn)
            acc.inc_bad_prediction(fp)

    @staticmethod
    def compute_IoU_mask(prediction, gt, overlap_threshold):
        IoU = jaccard(prediction, gt)
        # for each prediction select gt with the largest IoU and ignore the others
        for i in range(len(prediction)):
            maxj = IoU[i, :].argmax()
            IoU[i, :maxj] = 0
            IoU[i, (maxj + 1):] = 0
        # make a mask of all "matched" predictions vs gt
        return IoU >= overlap_threshold

    @staticmethod
    def compute_true_positive(mask):
        # sum all gt with prediction of its class
        return np.sum(mask.any(axis=0))

    def compute_ap(self, precisions, recalls):
        """
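        Compute the average precision of one class from its precision-recall samples
        :param precisions: list of precision values, one per confidence threshold in pr_scale
        :param recalls:    list of recall values, one per confidence threshold in pr_scale
        :return: average precision (area under the step-wise precision-recall curve)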
        """
        previous_recall = 0
        average_precision = 0
        for precision, recall in zip(precisions[::-1], recalls[::-1]):
            average_precision += precision * (recall - previous_recall)
            previous_recall = recall
        return average_precision

    def compute_precision_recall_(self, class_index, interpolated=True):
        precisions = []
        recalls = []
        for acc in self.total_accumulators:
            precisions.append(acc[class_index].precision)
            recalls.append(acc[class_index].recall)

        if interpolated:
            interpolated_precision = []
            for precision in precisions:
                last_max = 0
                if interpolated_precision:
                    last_max = max(interpolated_precision)
                interpolated_precision.append(max(precision, last_max))
            precisions = interpolated_precision
        return precisions, recalls

    def plot_pr(self, ax, class_name, precisions, recalls, average_precision):
        ax.step(recalls, precisions, color='b', alpha=0.2,
                where='post')
        ax.fill_between(recalls, precisions, step='post', alpha=0.2,
                        color='b')
        ax.set_ylim([0.0, 1.05])
        ax.set_xlim([0.0, 1.0])
        ax.set_xlabel('Recall')
        ax.set_ylabel('Precision')
        ax.set_title('{0:} : AUC={1:0.2f}'.format(class_name, average_precision))

    def plot(self, interpolated=True, class_names=None):
        """
        Plot the precision-recall curve of each class
        :param interpolated: will compute the interpolated curve
        :return:
        """
        grid = int(math.ceil(math.sqrt(self.n_class)))
        # squeeze=False keeps `axes` a 2D array even for a 1x1 grid (single class)
        fig, axes = plt.subplots(nrows=grid, ncols=grid, squeeze=False)
        mean_average_precision = []
        # TODO: data structure not optimal for this operation...
        for i, ax in enumerate(axes.flat):
            if i > self.n_class - 1:
                break
            precisions, recalls = self.compute_precision_recall_(i, interpolated)
            average_precision = self.compute_ap(precisions, recalls)
            class_name = class_names[i] if class_names else "Class {}".format(i)
            self.plot_pr(ax, class_name, precisions, recalls, average_precision)
            mean_average_precision.append(average_precision)

        plt.suptitle("Mean average precision : {:0.2f}".format(sum(mean_average_precision)/len(mean_average_precision)))
        fig.tight_layout(pad=2.0)


def show_frame(pred_bb, pred_classes, pred_conf, gt_bb, gt_classes, class_dict, background=np.zeros((512, 512, 3)), show_confidence=True):
    """
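    Plot the predicted and ground-truth bounding boxes on top of a background image.
    Predicted boxes are scaled by the image width and height (normalised coordinates expected);
    ground-truth boxes are drawn as given (pixel coordinates expected).
    :param pred_bb: (np.array)      Predicted Bounding Boxes [x1, y1, x2, y2] :     Shape [n_pred, 4]
    :param pred_classes: (np.array) Predicted Classes :                             Shape [n_pred]
    :param pred_conf: (np.array)    Predicted Confidences [0.-1.] :                 Shape [n_pred]
    :param gt_bb: (np.array)        Ground Truth Bounding Boxes [x1, y1, x2, y2] :  Shape [n_gt, 4]
    :param gt_classes: (np.array)   Ground Truth Classes :                          Shape [n_gt]
    :param class_dict: (dictionary) Key value pairs of classes, e.g. {0:'dog',1:'cat',2:'horse'}
    :param background: (np.array)   Image shown behind the boxes
    :param show_confidence: (bool)  If True, the confidence sets the transparency of predicted boxes
    :return: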
    """
    n_pred = pred_bb.shape[0]
    n_gt = gt_bb.shape[0]
    n_class = int(np.max(np.append(pred_classes, gt_classes)) + 1)
    #print(n_class)
    if len(background.shape) < 3:
      h, w = background.shape
    else:
      h, w, c = background.shape

    ax = plt.subplot(111)
    ax.imshow(background)
    cmap = plt.cm.get_cmap('hsv')

    confidence_alpha = pred_conf.copy()
    if not show_confidence:
        confidence_alpha.fill(1)

    for i in range(n_pred):
        x1 = pred_bb[i, 0] * w
        y1 = pred_bb[i, 1] * h
        x2 = pred_bb[i, 2] * w
        y2 = pred_bb[i, 3] * h
        rect_w = x2 - x1
        rect_h = y2 - y1
        #print(x1, y1)
        ax.add_patch(patches.Rectangle((x1, y1), rect_w, rect_h,
                                       fill=False,
                                       edgecolor=cmap(float(pred_classes[i]) / n_class),
                                       linestyle='dashdot',
                                       alpha=confidence_alpha[i]))

    for i in range(n_gt):
        x1 = gt_bb[i, 0]# * w
        y1 = gt_bb[i, 1]# * h
        x2 = gt_bb[i, 2]# * w
        y2 = gt_bb[i, 3]# * h
        rect_w = x2 - x1
        rect_h = y2 - y1
        ax.add_patch(patches.Rectangle((x1, y1), rect_w, rect_h,
                                       fill=False,
                                       edgecolor=cmap(float(gt_classes[i]) / n_class)))

    legend_handles = []

    for i in range(n_class):
        legend_handles.append(patches.Patch(color=cmap(float(i) / n_class), label=class_dict[i]))
    
    ax.legend(handles=legend_handles)
    plt.show()

class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, c = None, classes = None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        
        self.c     = c
        self.classes = classes

        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
        
        return self.label
    
    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
            
        return self.score

class WeightReader:
    def __init__(self, weight_file):
        self.offset = 4
        self.all_weights = np.fromfile(weight_file, dtype='float32')
        
    def read_bytes(self, size):
        self.offset = self.offset + size
        return self.all_weights[self.offset-size:self.offset]
    
    def reset(self):
        self.offset = 4

def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])  
    
    intersect = intersect_w * intersect_h

    w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
    w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
    
    union = w1*h1 + w2*h2 - intersect
    
    return float(intersect) / union

def draw_boxes(image, boxes, labels):
    image_h, image_w, _ = image.shape
    #Changes in box color added by LvC
    # class_colours = []
    # for c in range(len(labels)):
    #     colour = np.random.randint(low=0,high=255,size=3).tolist()
    #     class_colours.append(tuple(colour))
    for box in boxes:
        xmin = int(box.xmin*image_w)
        ymin = int(box.ymin*image_h)
        xmax = int(box.xmax*image_w)
        ymax = int(box.ymax*image_h)
        if box.get_label() == 0:
          cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (255,0,0), 3)
        elif box.get_label() == 1:
          cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (0,255,0), 3)
        else:#WBC
          cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (0,0,255), 3)
        #cv2.rectangle(image, (xmin,ymin), (xmax,ymax), class_colours[box.get_label()], 3)
        cv2.putText(image, 
                    labels[box.get_label()] + ' ' + str(round(box.get_score(),3)), 
                    (xmin, ymin - 13), 
                    cv2.FONT_HERSHEY_SIMPLEX, 
                    1e-3 * image_h, 
                    (0,0,0), 2)
    #print(box.get_label())    
    return image          

#Function added by LvC
def save_boxes(image, boxes, labels):
    # Flatten all boxes of one image into a single CSV row:
    # xmin, ymin, xmax, ymax, score, label, repeated for each box
    row = []
    for box in boxes:
        row.extend([box.xmin, box.ymin, box.xmax, box.ymax,
                    box.get_score(), box.get_label()])
    # Mode 'a' creates the file if it does not exist and appends otherwise
    with open('/content/mycsv.csv', 'a', newline='') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',')
        csvwriter.writerow(row)

        
def decode_netout(netout, anchors, nb_class, obj_threshold=0.5, nms_threshold=0.5):
    grid_h, grid_w, nb_box = netout.shape[:3]

    boxes = []
    
    # decode the output by the network
    netout[..., 4]  = _sigmoid(netout[..., 4])
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * _softmax(netout[..., 5:])
    netout[..., 5:] *= netout[..., 5:] > obj_threshold
    
    for row in range(grid_h):
        for col in range(grid_w):
            for b in range(nb_box):
                # elements from index 5 onwards are the class scores (index 4 is the confidence)
                classes = netout[row,col,b,5:]
                
                if np.sum(classes) > 0:
                    # first 4 elements are x, y, w, and h
                    x, y, w, h = netout[row,col,b,:4]

                    x = (col + _sigmoid(x)) / grid_w # center position, unit: image width
                    y = (row + _sigmoid(y)) / grid_h # center position, unit: image height
                    w = anchors[2 * b + 0] * np.exp(w) / grid_w # unit: image width
                    h = anchors[2 * b + 1] * np.exp(h) / grid_h # unit: image height
                    confidence = netout[row,col,b,4]
                    
                    box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, confidence, classes)
                    
                    boxes.append(box)

    # suppress non-maximal boxes
    for c in range(nb_class):
        sorted_indices = list(reversed(np.argsort([box.classes[c] for box in boxes])))

        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            
            if boxes[index_i].classes[c] == 0: 
                continue
            else:
                for j in range(i+1, len(sorted_indices)):
                    index_j = sorted_indices[j]
                    
                    if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_threshold:
                        boxes[index_j].classes[c] = 0
                        
    # remove the boxes whose score does not exceed obj_threshold
    boxes = [box for box in boxes if box.get_score() > obj_threshold]
    
    return boxes

def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)

with open("/content/gdrive/My Drive/keras-yolo2/frontend.py", "r") as check:
  lineReader = check.readlines()
  reduce_lr = False
  for line in lineReader:
    if "reduce_lr" in line:
      reduce_lr = True
      break

# Only patch frontend.py if it does not already define the reduce_lr callback
if not reduce_lr:
  #replace("/content/gdrive/My Drive/keras-yolo2/frontend.py","period=1)","period=1)\n        csv_logger=CSVLogger('/content/training_evaluation.csv')")
  replace("/content/gdrive/My Drive/keras-yolo2/frontend.py","period=1)","period=1)\n        reduce_lr=ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, verbose=1)")
replace("/content/gdrive/My Drive/keras-yolo2/frontend.py","import EarlyStopping","import ReduceLROnPlateau, EarlyStopping")
replace("/content/gdrive/My Drive/keras-yolo2/frontend.py", "[early_stop, checkpoint, tensorboard]","[checkpoint, reduce_lr]")

from frontend import YOLO

def train(config_path, model_path, percentage_validation):
    #config_path = args.conf

    with open(config_path) as config_buffer:    
        config = json.loads(config_buffer.read())

    ###############################
    #   Parse the annotations 
    ###############################

    # parse annotations of the training set
    train_imgs, train_labels = parse_annotation(config['train']['train_annot_folder'], 
                                                config['train']['train_image_folder'], 
                                                config['model']['labels'])

    # parse annotations of the validation set, if any, otherwise split the training set
    if os.path.exists(config['valid']['valid_annot_folder']):
        valid_imgs, valid_labels = parse_annotation(config['valid']['valid_annot_folder'], 
                                                    config['valid']['valid_image_folder'], 
                                                    config['model']['labels'])
    else:
        train_valid_split = int((1-percentage_validation/100.)*len(train_imgs))
        np.random.shuffle(train_imgs)

        valid_imgs = train_imgs[train_valid_split:]
        train_imgs = train_imgs[:train_valid_split]

    if len(config['model']['labels']) > 0:
        overlap_labels = set(config['model']['labels']).intersection(set(train_labels.keys()))

        print('Seen labels:\t', train_labels)
        print('Given labels:\t', config['model']['labels'])
        print('Overlap labels:\t', overlap_labels)           

        if len(overlap_labels) < len(config['model']['labels']):
            print('Some labels have no annotations! Please revise the list of labels in the config.json file!')
            return
    else:
        print('No labels are provided. Train on all seen labels.')
        config['model']['labels'] = list(train_labels.keys())
        
    ###############################
    #   Construct the model 
    ###############################

    yolo = YOLO(backend             = config['model']['backend'],
                input_size          = config['model']['input_size'], 
                labels              = config['model']['labels'], 
                max_box_per_image   = config['model']['max_box_per_image'],
                anchors             = config['model']['anchors'])

    ###############################
    #   Load the pretrained weights (if any) 
    ###############################    

    if os.path.exists(config['train']['pretrained_weights']):
        print("Loading pre-trained weights in", config['train']['pretrained_weights'])
        yolo.load_weights(config['train']['pretrained_weights'])

    ###############################
    #   Start the training process 
    ###############################

    yolo.train(train_imgs         = train_imgs,
               valid_imgs         = valid_imgs,
               train_times        = config['train']['train_times'],
               valid_times        = config['valid']['valid_times'],
               nb_epochs          = config['train']['nb_epochs'], 
               learning_rate      = config['train']['learning_rate'], 
               batch_size         = config['train']['batch_size'],
               warmup_epochs      = config['train']['warmup_epochs'],
               object_scale       = config['train']['object_scale'],
               no_object_scale    = config['train']['no_object_scale'],
               coord_scale        = config['train']['coord_scale'],
               class_scale        = config['train']['class_scale'],
               saved_weights_name = config['train']['saved_weights_name'],
               debug              = config['train']['debug'])

    # Save the training history to training_evaluation.csv (overwrites an existing file if needed).
    lossDataCSVpath = os.path.join(model_path,'Quality Control/training_evaluation.csv')
    with open(lossDataCSVpath, 'w') as f:
      writer = csv.writer(f)
      writer.writerow(['loss','val_loss', 'learning rate'])
      for i in range(len(yolo.model.history.history['loss'])):
        writer.writerow([yolo.model.history.history['loss'][i], yolo.model.history.history['val_loss'][i], yolo.model.history.history['lr'][i]])

    yolo.model.save(model_path+'/last_weights.h5')

def predict(config, weights_path, image_path):

    with open(config) as config_buffer:    
        config = json.load(config_buffer)

    ###############################
    #   Make the model 
    ###############################

    yolo = YOLO(backend             = config['model']['backend'],
                input_size          = config['model']['input_size'], 
                labels              = config['model']['labels'], 
                max_box_per_image   = config['model']['max_box_per_image'],
                anchors             = config['model']['anchors'])

    ###############################
    #   Load trained weights
    ###############################    

    yolo.load_weights(weights_path)

    ###############################
    #   Predict bounding boxes 
    ###############################

    if image_path[-4:] == '.mp4':
        video_out = image_path[:-4] + '_detected' + image_path[-4:]
        video_reader = cv2.VideoCapture(image_path)

        nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
        frame_h = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
        frame_w = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))

        video_writer = cv2.VideoWriter(video_out,
                               cv2.VideoWriter_fourcc(*'MPEG'), 
                               50.0, 
                               (frame_w, frame_h))

        for i in tqdm(range(nb_frames)):
            _, image = video_reader.read()
            
            boxes = yolo.predict(image)
            image = draw_boxes(image, boxes, config['model']['labels'])

            video_writer.write(np.uint8(image))

        video_reader.release()
        video_writer.release()  
    else:
        image = cv2.imread(image_path)
        boxes = yolo.predict(image)
        image = draw_boxes(image, boxes, config['model']['labels'])
        save_boxes(image,boxes,config['model']['labels'])#added by LvC
        print(len(boxes), 'boxes are found')
        #print(image)
        cv2.imwrite(image_path[:-4] + '_detected' + image_path[-4:], image)


# function to convert BoundingBoxesOnImage object into DataFrame
def bbs_obj_to_df(bbs_object):
#     convert BoundingBoxesOnImage object into array
    bbs_array = bbs_object.to_xyxy_array()
#     convert array into a DataFrame ['xmin', 'ymin', 'xmax', 'ymax'] columns
    df_bbs = pd.DataFrame(bbs_array, columns=['xmin', 'ymin', 'xmax', 'ymax'])
    return df_bbs

# Function that will extract column data for our CSV file
def xml_to_csv(path):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df

# **3. Select your paths and parameters**

---

<font size = 4>The code below allows the user to enter the paths to where the training data is and to define the training parameters.

<font size = 4>After playing, the cell will display some quantitative metrics of your dataset, including a count of objects per image and the number of instances per class.
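
<font size = 4>As a rough illustration of these two metrics, the sketch below counts objects per image and instances per class from PASCAL VOC-style annotations. It is not the notebook's own implementation: the function name `count_objects` and the class names `cell_a` and `cell_b` are made up for the example, and the annotations are given as strings rather than read from your annotation folder.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_objects(xml_documents):
    """Count objects per image and instances per class in PASCAL VOC annotations."""
    objects_per_image = []
    instances_per_class = Counter()
    for xml_text in xml_documents:
        root = ET.fromstring(xml_text)
        # every <object> element describes one bounding box; <name> holds its class
        names = [obj.find("name").text for obj in root.findall("object")]
        objects_per_image.append(len(names))
        instances_per_class.update(names)
    return objects_per_image, instances_per_class

# Two made-up annotation files with hypothetical class names
example_annotations = [
    "<annotation><object><name>cell_a</name></object>"
    "<object><name>cell_b</name></object></annotation>",
    "<annotation><object><name>cell_a</name></object></annotation>",
]
per_image, per_class = count_objects(example_annotations)
print(per_image)        # [2, 1]
print(dict(per_class))  # {'cell_a': 2, 'cell_b': 1}
```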


# **3.1. Parameters and paths**
---

<font size = 4>**`Training_Source`, `Training_Source_annotations`:** These are the paths to the folders containing your training images and their annotation files, respectively. To find the path of a folder, go to your Files on the left of the notebook, navigate to the folder, right-click on it, select **Copy path** and paste the path into the corresponding box below.

<font size = 4>**`model_name`:** Use only my_model -style, not my-model (Use "_" not "-"). Do not use spaces in the name. Avoid using the name of an existing model (saved in the same folder) as it will be overwritten.
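
<font size = 4>The naming rule above can be checked before training starts. The helper below is only a sketch: `is_valid_model_name` is a hypothetical function, not part of the notebook, and it assumes that letters, digits and underscores are the only characters you want to allow.

```python
import re

def is_valid_model_name(name):
    """Return True if `name` contains only letters, digits and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None

print(is_valid_model_name("my_model"))  # True
print(is_valid_model_name("my-model"))  # False (contains "-")
print(is_valid_model_name("my model"))  # False (contains a space)
```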

<font size = 4>**`model_path`**: Enter the path where your model will be saved once trained (for instance your result folder).

<font size = 5>**Training Parameters**

<font size = 4>**`number_of_epochs`:** Number of epochs the network is trained for. **Default value: 10**

<font size = 4>**Note that Yolov2 uses 3 Warm-up epochs which improves the model's performance. This means the network will train for number_of_epochs + 3 epochs.**
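
<font size = 4>The arithmetic behind this note is sketched below. The function name `total_training_epochs` is an assumption made for this example. The warm-up value of 3 is the one this notebook writes to `config.json` for a new model, while 0 is written when training continues from a pretrained model.

```python
def total_training_epochs(number_of_epochs, warmup_epochs=3):
    """Epochs actually run: the requested epochs plus the warm-up epochs."""
    return number_of_epochs + warmup_epochs

print(total_training_epochs(10))                   # 13 (new model, 3 warm-up epochs)
print(total_training_epochs(10, warmup_epochs=0))  # 10 (continuing from a pretrained model)
```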

<font size = 4>**`backend`:** There are different backends which are available to be trained for Yolo. These are usually slightly different model architectures, with pretrained weights. Take a look at the available backends and research which one will be best suited for your dataset.

<font size = 5>**Advanced Parameters - experienced users only**

<font size =4>**`batch_size:`** This parameter defines the number of images seen in each training step. Reducing or increasing the **batch size** may slow down or speed up your training, respectively, and can influence network performance. **Default value: 16**

<font size=4>**`false_negative_penalty`**: Penalty for predicting 'no object' where an object is present (missed detection). **Default value: 5.0**

<font size=4>**`false_positive_penalty`**: Penalty for predicting an object where none is present (false detection). **Default value: 1.0**

<font size=4>**`position_size_penalty`**: Penalty for inaccurate position or size of the bounding boxes. **Default value: 1.0**

<font size=4>**`false_class_penalty`**: Penalty for misclassifying the object in a bounding box. **Default value: 1.0**
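
<font size = 4>In this notebook the four penalties end up in the `train` section of `config.json` under the keys `object_scale`, `no_object_scale`, `coord_scale` and `class_scale`. The sketch below shows that mapping on an in-memory dictionary. The helper `apply_penalties` is hypothetical and works on a copy, whereas the notebook itself edits the file with `sed`.

```python
import json

# Notebook parameter -> key in config['train']
PENALTY_TO_CONFIG_KEY = {
    "false_negative_penalty": "object_scale",
    "false_positive_penalty": "no_object_scale",
    "position_size_penalty": "coord_scale",
    "false_class_penalty": "class_scale",
}

def apply_penalties(config, penalties):
    """Return a copy of `config` with the given penalties written to config['train']."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    for name, value in penalties.items():
        updated["train"][PENALTY_TO_CONFIG_KEY[name]] = float(value)
    return updated

config = {"train": {"object_scale": 5.0, "no_object_scale": 1.0,
                    "coord_scale": 1.0, "class_scale": 1.0}}
new_config = apply_penalties(config, {"false_positive_penalty": 2.0})
print(new_config["train"]["no_object_scale"])  # 2.0
print(config["train"]["no_object_scale"])      # 1.0 (original is unchanged)
```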

<font size = 4>**`percentage_validation`:** Percentage of your training dataset that is set aside to validate the network during training. The split is currently performed automatically. **Default value: 20**
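
<font size = 4>A minimal sketch of such a split is shown below. It uses the same index calculation as the `train` function of this notebook, which applies it when no separate validation folder exists. The function name `split_train_valid` and the `seed` argument are assumptions added for the example.

```python
import random

def split_train_valid(items, percentage_validation, seed=None):
    """Shuffle `items` and hold out `percentage_validation` percent for validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    split = int((1 - percentage_validation / 100.0) * len(items))
    return items[:split], items[split:]

train_items, valid_items = split_train_valid(range(10), 20, seed=0)
print(len(train_items), len(valid_items))  # 8 2
```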

In [None]:
class bcolors:
  WARNING = '\033[31m'

#@markdown ###Path to training images:

Training_Source = "" #@param {type:"string"}

# Ground truth images
Training_Source_annotations = "" #@param {type:"string"}

# model name and path
#@markdown ###Name of the model and path to model folder:
model_name = "" #@param {type:"string"}
model_path = "" #@param {type:"string"}

# backend
#@markdown ###Choose a backend
#os.chdir(model_path+'/keras-yolo2')
backend = "Full Yolo" #@param ["Select Model","Full Yolo","Inception3","SqueezeNet","MobileNet","Tiny Yolo"]
os.chdir('/content/gdrive/My Drive/keras-yolo2')
if backend == "Full Yolo":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/full_yolo_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/full_yolo_backend.h5
elif backend == "Inception3":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/inception_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/inception_backend.h5
elif backend == "MobileNet":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/mobilenet_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/mobilenet_backend.h5
elif backend == "SqueezeNet":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/squeezenet_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/squeezenet_backend.h5
elif backend == "Tiny Yolo":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/tiny_yolo_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/tiny_yolo_backend.h5

#os.chdir('/content/drive/My Drive/Zero-Cost Deep-Learning to Enhance Microscopy/Various dataset/Detection_Dataset_2/BCCD.v2.voc')
#if not os.path.exists(model_path+'/full_raccoon.h5'):
 # !wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1NWbrpMGLc84ow-4gXn2mloFocFGU595s' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1NWbrpMGLc84ow-4gXn2mloFocFGU595s" -O full_yolo_raccoon.h5 && rm -rf /tmp/cookies.txt

full_model_path = os.path.join(model_path,model_name)
if os.path.exists(full_model_path):
  print('Existing model path will be overwritten')
  shutil.rmtree(full_model_path)
os.mkdir(full_model_path)

full_model_file_path = full_model_path+'/best_weights.h5'
os.chdir('/content/gdrive/My Drive/keras-yolo2/')

#Change backend name
!sed -i 's@\"backend\":.*,@\"backend\":              \"$backend\",@g' config.json

#Change the name of the training folder
!sed -i 's@\"train_image_folder\":.*,@\"train_image_folder\":   \"$Training_Source/\",@g' config.json

#Change annotation folder
!sed -i 's@\"train_annot_folder\":.*,@\"train_annot_folder\":   \"$Training_Source_annotations/\",@g' config.json

#Change the name of the saved model
!sed -i 's@\"saved_weights_name\":.*,@\"saved_weights_name\":   \"$full_model_file_path\",@g' config.json

#Change warmup epochs for untrained model
!sed -i 's@\"warmup_epochs\":.*,@\"warmup_epochs\":        3,@g' config.json

#When defining a new model we should reset the pretrained model parameter
!sed -i 's@\"pretrained_weights\":.*,@\"pretrained_weights\":   \"No_pretrained_weights\",@g' config.json

# other parameters for training.
#@markdown ###Training Parameters
#@markdown Number of epochs:

number_of_epochs =  10#@param {type:"number"}

!sed -i 's@\"nb_epochs\":.*,@\"nb_epochs\":            $number_of_epochs,@g' config.json

#@markdown ###Advanced Parameters

Use_Default_Advanced_Parameters = False #@param {type:"boolean"}
#@markdown ###If not, please input:

batch_size =  16#@param {type:"number"}
learning_rate = 1e-4 #@param{type:"number"}
false_negative_penalty = 5.0 #@param{type:"number"}
false_positive_penalty = 1.0 #@param{type:"number"}
position_size_penalty = 1.0 #@param{type:"number"}
false_class_penalty = 1.0 #@param{type:"number"}
percentage_validation = 20 #@param{type:"number"}

if (Use_Default_Advanced_Parameters): 
  print("Default advanced parameters enabled")
  batch_size = 8
  learning_rate = 1e-4
  false_negative_penalty = 5.0
  false_positive_penalty = 1.0
  position_size_penalty = 1.0
  false_class_penalty = 1.0

!sed -i 's@\"batch_size\":.*,@\"batch_size\":           $batch_size,@g' config.json

!sed -i 's@\"learning_rate\":.*,@\"learning_rate\":        $learning_rate,@g' config.json

!sed -i 's@\"object_scale":.*,@\"object_scale\":         $false_negative_penalty,@g' config.json

!sed -i 's@\"no_object_scale":.*,@\"no_object_scale\":      $false_positive_penalty,@g' config.json

!sed -i 's@\"coord_scale\":.*,@\"coord_scale\":          $position_size_penalty,@g' config.json

!sed -i 's@\"class_scale\":.*,@\"class_scale\":          $false_class_penalty,@g' config.json

df_anno = []
dir_anno = Training_Source_annotations
for fnm in os.listdir(dir_anno):  
    if not fnm.startswith('.'): ## do not include hidden folders/files
        tree = ET.parse(os.path.join(dir_anno,fnm))
        row = extract_single_xml_file(tree)
        row["fileID"] = os.path.splitext(fnm)[0]
        df_anno.append(row)
df_anno = pd.DataFrame(df_anno)

maxNobj = np.max(df_anno["Nobj"])

#Write the annotations to a csv file
df_anno.to_csv(model_path+'/annot.csv', index=False)#header=False, sep=',')

#Show how many objects there are in the images
plt.figure()
plt.subplot(2,1,1)
plt.hist(df_anno["Nobj"].values,bins=50)
plt.title("max N of objects per image={}".format(maxNobj))
plt.show()

#Show the classes and how many there are of each in the dataset
from collections import Counter
class_obj = []
for ibbx in range(maxNobj):
    class_obj.extend(df_anno["bbx_{}_name".format(ibbx)].values)
class_obj = np.array(class_obj)

count             = Counter(class_obj[class_obj != 'nan'])
print(count)
class_nm          = list(count.keys())
class_labels = json.dumps(class_nm)
class_count       = list(count.values())
asort_class_count = np.argsort(class_count)

class_nm          = np.array(class_nm)[asort_class_count]
class_count       = np.array(class_count)[asort_class_count]

!sed -i 's@\"labels\":.*@\"labels\":               $class_labels@g' config.json
xs = range(len(class_count))

plt.subplot(2,1,2)
plt.barh(xs,class_count)
plt.yticks(xs,class_nm)
plt.title("The number of objects per class: {} objects in total".format(sum(count.values())))
plt.show()


#Generate anchors for the bounding boxes
import subprocess as sp
os.chdir('/content/gdrive/My Drive/keras-yolo2')
output = sp.getoutput('python ./gen_anchors.py -c ./config.json')

anchors_1 = output.find("[")
anchors_2 = output.find("]")

config_anchors = output[anchors_1:anchors_2+1]
!sed -i 's@\"anchors\":.*,@\"anchors\":              $config_anchors,@g' config.json
#here we check that no model with the same name already exist, if so delete
#if os.path.exists(model_path+'/'+model_name):
 # shutil.rmtree(model_path+'/'+model_name)

Use_pretrained_model = False

In [None]:
#@markdown ###Play this cell to visualise some example images from your dataset to make sure annotations and images are properly matched.
import imageio
  
size = 3    
ind_random = np.random.randint(0,df_anno.shape[0],size=size)
img_dir=Training_Source

file_suffix = os.path.splitext(os.listdir(Training_Source)[0])[1]
for irow in ind_random:
    row  = df_anno.iloc[irow,:]
    path = os.path.join(img_dir, row["fileID"] + file_suffix)
    # read in image
    img  = imageio.imread(path)

    plt.figure(figsize=(12,12))
    plt.imshow(img) # plot image
    plt.title("Nobj={}, height={}, width={}".format(row["Nobj"],row["height"],row["width"]))
    # for each object in the image, plot the bounding box
    for iplot in range(row["Nobj"]):
        plt_rectangle(plt,
                      label = row["bbx_{}_name".format(iplot)],
                      x1=row["bbx_{}_xmin".format(iplot)],
                      y1=row["bbx_{}_ymin".format(iplot)],
                      x2=row["bbx_{}_xmax".format(iplot)],
                      y2=row["bbx_{}_ymax".format(iplot)])
    plt.show() ## show the plot

##**3.2. Data Augmentation**

---

<font size = 4> Data augmentation can improve training by increasing the variability of the dataset. This is useful if the available dataset is small since, without augmentation, a network could quickly memorise every example in the dataset (overfitting). Augmentation is not required for training, and it can be switched off if the dataset is large.
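
<font size = 4>When an image is augmented, its bounding boxes have to be transformed in the same way. The sketch below illustrates this for a horizontal flip in plain Python. It is not the code used by this notebook, which relies on the imgaug library. The function `flip_horizontal` is hypothetical, and the image is represented as a list of pixel rows to keep the example free of dependencies.

```python
def flip_horizontal(image, boxes):
    """Flip an image left-right and mirror its bounding boxes.

    image: list of rows, each row a list of pixel values
    boxes: list of (xmin, ymin, xmax, ymax) tuples in pixel coordinates
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # x coordinates are mirrored around the image width; y coordinates are unchanged
    flipped_boxes = [(width - xmax, ymin, width - xmin, ymax)
                     for (xmin, ymin, xmax, ymax) in boxes]
    return flipped_image, flipped_boxes

image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
flipped_image, flipped_boxes = flip_horizontal(image, [(0, 0, 1, 2)])
print(flipped_image)  # [[3, 2, 1, 0], [7, 6, 5, 4]]
print(flipped_boxes)  # [(3, 0, 4, 2)]
```

<font size = 4>A box touching the left edge ends up touching the right edge, and its width and height stay the same.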

In [None]:
#@markdown ##**Augmentation Options**

def image_aug(df, images_path, aug_images_path, image_prefix, augmentor):
    # create data frame which we're going to populate with augmented image info
    aug_bbs_xy = pd.DataFrame(columns=
                              ['filename','width','height','class', 'xmin', 'ymin', 'xmax', 'ymax']
                             )
    grouped = df.groupby('filename')
    
    for filename in df['filename'].unique():
    #   get separate data frame grouped by file name
        group_df = grouped.get_group(filename)
        group_df = group_df.reset_index()
        group_df = group_df.drop(['index'], axis=1)   
    #   read the image
        image = imageio.imread(images_path+filename)
    #   get bounding boxes coordinates and write into array        
        bb_array = group_df.drop(['filename', 'width', 'height', 'class'], axis=1).values
    #   pass the array of bounding boxes coordinates to the imgaug library
        bbs = BoundingBoxesOnImage.from_xyxy_array(bb_array, shape=image.shape)
    #   apply augmentation on image and on the bounding boxes
        image_aug, bbs_aug = augmentor(image=image, bounding_boxes=bbs)
    #   disregard bounding boxes which have fallen out of image pane    
        bbs_aug = bbs_aug.remove_out_of_image()
    #   clip bounding boxes which are partially outside of image pane
        bbs_aug = bbs_aug.clip_out_of_image()
        
    #   don't perform any actions with the image if there are no bounding boxes left in it    
        if len(bbs_aug.bounding_boxes) == 0:
            pass
        
    #   otherwise continue
        else:
        #   write augmented image to a file
            imageio.imwrite(aug_images_path+image_prefix+filename, image_aug)  
        #   create a data frame with augmented values of image width and height
            info_df = group_df.drop(['xmin', 'ymin', 'xmax', 'ymax'], axis=1)    
            for index, _ in info_df.iterrows():
                info_df.at[index, 'width'] = image_aug.shape[1]
                info_df.at[index, 'height'] = image_aug.shape[0]
        #   rename filenames by adding the predifined prefix
            info_df['filename'] = info_df['filename'].apply(lambda x: image_prefix+x)
        #   create a data frame with augmented bounding boxes coordinates using the function we created earlier
            bbs_df = bbs_obj_to_df(bbs_aug)
        #   concat all new augmented info into new data frame
            aug_df = pd.concat([info_df, bbs_df], axis=1)
        #   append rows to aug_bbs_xy data frame
            aug_bbs_xy = pd.concat([aug_bbs_xy, aug_df])            
    
    # return dataframe with updated images and bounding boxes annotations 
    aug_bbs_xy = aug_bbs_xy.reset_index()
    aug_bbs_xy = aug_bbs_xy.drop(['index'], axis=1)
    return aug_bbs_xy

Use_Data_augmentation = True #@param {type:"boolean"}
Use_Default_Augmentation_Parameters = False #@param {type:"boolean"}

multiply_dataset_by = 3 #@param [2,3]
#@markdown ###If you are not using the default settings, please provide the values below:

#@markdown ###**Image shift, zoom, shear and flip (%)**

#horizontal_shift =  50 #@param {type:"slider", min:0, max:100, step:1}
#vertical_shift =  25 #@param {type:"slider", min:0, max:100, step:1}
#zoom_range =  10 #@param {type:"slider", min:0, max:100, step:1}
horizontal_flip = True #@param {type:"boolean"}
vertical_flip = True #@param {type:"boolean"}

#@markdown ###**Rotate image within angle range (degrees):**
rotate_images = True #@param {type:"boolean"}

#{type:"slider", min:0, max:90, step:1}

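if (Use_Default_Augmentation_Parameters):
  #horizontal_shift =  10
  #vertical_shift =  10
  #zoom_range =  10
  horizontal_flip = True
  vertical_flip = True

# rotation_range must be defined whether or not the default parameters are used,
# because the augmenters below always reference it
if rotate_images:
  rotation_range = 90
else:
  rotation_range = 0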

if (Use_Data_augmentation):
  print('Data Augmentation enabled')
  # load images as NumPy arrays and append them to images list
  if os.path.exists(Training_Source+'/.ipynb_checkpoints'):
    shutil.rmtree(Training_Source+'/.ipynb_checkpoints')
  
  images = []
  for index, file in enumerate(glob.glob(Training_Source+'/*'+file_suffix)):
      images.append(imageio.imread(file))
      
  # how many images we have
  print('Augmenting {} images'.format(len(images)))

  # apply xml_to_csv() function to convert all XML files in images/ folder into labels.csv
  labels_df = xml_to_csv(Training_Source_annotations)
  labels_df.to_csv(('/content/original_labels.csv'), index=None)
  #print('Successfully converted xml to csv.')
  #labels_df

  # This setup of augmentation parameters will pick two of the three augmenters listed below and apply them
  if horizontal_flip==True:
    h=1
  else:
    h=0
  if vertical_flip==True:
    v=1
  else:
    v=0
  aug = iaa.SomeOf(2, [    
      #iaa.Affine(scale=(1-zoom_range/100., 1+zoom_range/100.)),
      iaa.Affine(rotate=rotation_range, fit_output=True),
      #iaa.Affine(translate_percent={"x": (-horizontal_shift/100., horizontal_shift/100.), "y": (-vertical_shift/100., vertical_shift/100.)}),
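      iaa.Fliplr(h),  # horizontal (left-right) flip, applied with probability h
      iaa.Flipud(v)   # vertical (up-down) flip, applied with probability v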
      #iaa.Multiply((0.5, 1.5)),
      #iaa.GaussianBlur(sigma=(1.0, 3.0)),
      #iaa.AdditiveGaussianNoise(scale=(0.03*255, 0.05*255))
  ])
  aug_2 = iaa.Affine(rotate=rotation_range, fit_output=True)
  #aug_3 = iaa.Affine(rotate=(0,rotation_range))
  #augmented_training_source = os.path.dirname(Training_Source)+'/'+os.path.basename(Training_Source)+'_with_augmentation'
  #augmented_images_df = image_aug(labels_df, Training_Source+'/', Training_Source+'/aug/', 'aug_', aug)

  #Here we create a folder that will hold the original image dataset and the augmented image dataset
  augmented_training_source = os.path.dirname(Training_Source)+'/'+os.path.basename(Training_Source)+'_augmentation'
  if os.path.exists(augmented_training_source):
    shutil.rmtree(augmented_training_source)
  os.mkdir(augmented_training_source)

  #Here we create a folder that will hold the original image annotation dataset and the augmented image annotation dataset (the bounding boxes).
  augmented_training_source_annotation = os.path.dirname(Training_Source_annotations)+'/'+os.path.basename(Training_Source_annotations)+'_augmentation'
  if os.path.exists(augmented_training_source_annotation):
    shutil.rmtree(augmented_training_source_annotation)
  os.mkdir(augmented_training_source_annotation)

  #Create the augmentation
  augmented_images_df = image_aug(labels_df, Training_Source+'/', augmented_training_source+'/', 'aug_', aug)
  
  # Concat labels_df and augmented_images_df together and save in a new combined_labels.csv file
  all_labels_df = pd.concat([labels_df, augmented_images_df])
  all_labels_df.to_csv('/content/combined_labels.csv', index=False)

  #Here we convert the new bounding boxes for the augmented images to PASCAL VOC .xml format
  def convert_to_xml(df,source,target_folder):
    grouped = df.groupby('filename')
    for file in os.listdir(source):
      group_df = grouped.get_group(file)
      group_df = group_df.reset_index()
      group_df = group_df.drop(['index'], axis=1)
      #group_df = group_df.dropna(axis=0)
      # use the first row for the image size, so images with a single bounding box also work
      writer = Writer(source+'/'+file,group_df.iloc[0]['width'],group_df.iloc[0]['height'])
      for i, row in group_df.iterrows():
        #if not row['xmin'] == '' or row['ymin'] == '':
        writer.addObject(row['class'],round(row['xmin']),round(row['ymin']),round(row['xmax']),round(row['ymax']))
      # save once, after all objects of this image have been added
      writer.save(target_folder+'/'+os.path.splitext(file)[0]+'.xml')

  convert_to_xml(all_labels_df,augmented_training_source,augmented_training_source_annotation)
  
  #Second round of augmentation
  if multiply_dataset_by > 2:
    aug_labels_df_2 = xml_to_csv(augmented_training_source_annotation)
    augmented_images_2_df = image_aug(aug_labels_df_2, augmented_training_source+'/', augmented_training_source+'/', 'aug_2_', aug_2)
    all_aug_labels_df = pd.concat([augmented_images_df, augmented_images_2_df])
  #all_labels_df.to_csv('/content/all_labels_aug.csv', index=False)
  
    for file in os.listdir(augmented_training_source_annotation):
      os.remove(os.path.join(augmented_training_source_annotation,file))
  
    convert_to_xml(all_aug_labels_df,augmented_training_source,augmented_training_source_annotation)

  # if multiply_dataset_by > 3:
  #   aug_labels_df_3 = xml_to_csv(augmented_training_source_annotation)
  #   augmented_images_3_df = image_aug(aug_labels_df_3, augmented_training_source+'/', augmented_training_source+'/', 'aug_3_', aug_2)
  #   all_aug_labels_df = pd.concat([all_aug_labels_df, augmented_images_3_df])


  
  for file in os.listdir(Training_Source):

    shutil.copyfile(Training_Source+'/'+file,augmented_training_source+'/'+file)
    shutil.copyfile(Training_Source_annotations+'/'+os.path.splitext(file)[0]+'.xml',augmented_training_source_annotation+'/'+os.path.splitext(file)[0]+'.xml')
  # display new dataframe
  #augmented_images_df
  
  os.chdir('/content/gdrive/My Drive/keras-yolo2')
  #Change the name of the training folder
  !sed -i 's@\"train_image_folder\":.*,@\"train_image_folder\":   \"$augmented_training_source/\",@g' config.json

  #Change annotation folder
  !sed -i 's@\"train_annot_folder\":.*,@\"train_annot_folder\":   \"$augmented_training_source_annotation/\",@g' config.json


else:
  print('No augmentation will be used')


df_anno = []
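# fall back to the original annotations if augmentation is disabled
dir_anno = augmented_training_source_annotation if Use_Data_augmentation else Training_Source_annotations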
for fnm in os.listdir(dir_anno):  
    if not fnm.startswith('.'): ## do not include hidden folders/files
        tree = ET.parse(os.path.join(dir_anno,fnm))
        row = extract_single_xml_file(tree)
        row["fileID"] = os.path.splitext(fnm)[0]
        df_anno.append(row)
df_anno = pd.DataFrame(df_anno)

maxNobj = np.max(df_anno["Nobj"])

#Write the annotations to a csv file
#df_anno.to_csv(model_path+'/annot.csv', index=False)#header=False, sep=',')

#Show how many objects there are in the images
plt.figure()
plt.subplot(2,1,1)
plt.hist(df_anno["Nobj"].values,bins=50)
plt.title("max N of objects per image={}".format(maxNobj))
plt.show()

#Show the classes and how many there are of each in the dataset
from collections import Counter
class_obj = []
for ibbx in range(maxNobj):
    class_obj.extend(df_anno["bbx_{}_name".format(ibbx)].values)
class_obj = np.array(class_obj)

count             = Counter(class_obj[class_obj != 'nan'])
print(count)
class_nm          = list(count.keys())
class_labels = json.dumps(class_nm)
class_count       = list(count.values())
asort_class_count = np.argsort(class_count)

class_nm          = np.array(class_nm)[asort_class_count]
class_count       = np.array(class_count)[asort_class_count]

xs = range(len(class_count))

plt.subplot(2,1,2)
plt.barh(xs,class_count)
plt.yticks(xs,class_nm)
plt.title("The number of objects per class: {} objects in total".format(sum(count.values())))
plt.show()

In [None]:
#@markdown ###Play this cell to visualise some example images from your **augmented** dataset to make sure annotations and images are properly matched.
df_anno_aug = []
dir_anno_aug = augmented_training_source_annotation
for fnm in os.listdir(dir_anno_aug):  
    if not fnm.startswith('.'): ## do not include hidden folders/files
        tree = ET.parse(os.path.join(dir_anno_aug,fnm))
        row = extract_single_xml_file(tree)
        row["fileID"] = os.path.splitext(fnm)[0]
        df_anno_aug.append(row)
df_anno_aug = pd.DataFrame(df_anno_aug)

size = 3    
ind_random = np.random.randint(0,df_anno_aug.shape[0],size=size)
img_dir=augmented_training_source

file_suffix = os.path.splitext(os.listdir(augmented_training_source)[0])[1]
for irow in ind_random:
    row  = df_anno_aug.iloc[irow,:]
    path = os.path.join(img_dir, row["fileID"] + file_suffix)
    # read in image
    img  = imageio.imread(path)

    plt.figure(figsize=(12,12))
    plt.imshow(img) # plot image
    plt.title("Nobj={}, height={}, width={}".format(row["Nobj"],row["height"],row["width"]))
    # for each object in the image, plot the bounding box
    for iplot in range(row["Nobj"]):
        plt_rectangle(plt,
                      label = row["bbx_{}_name".format(iplot)],
                      x1=row["bbx_{}_xmin".format(iplot)],
                      y1=row["bbx_{}_ymin".format(iplot)],
                      x2=row["bbx_{}_xmax".format(iplot)],
                      y2=row["bbx_{}_ymax".format(iplot)])
    plt.show() ## show the plot

# **4. Train the network**
---

In [None]:
# @markdown ##Loading weights from a pretrained network

# Training_Source = "" #@param{type:"string"}
# Training_Source_annotation = "" #@param{type:"string"}
# Check if the right files exist

Use_pretrained_model = False #@param {type:"boolean"}

Weights_choice = "best" #@param ["last", "best"]

pretrained_model_path = "" #@param{type:"string"}
h5_file_path = pretrained_model_path+'/'+Weights_choice+'_weights.h5'

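if Use_pretrained_model and not os.path.exists(h5_file_path):
  print('WARNING pretrained model does not exist')
  Use_pretrained_model = False

os.chdir('/content/gdrive/My Drive/keras-yolo2')
# only point config.json at the pretrained weights if they are actually going to be used
if Use_pretrained_model:
  !sed -i 's@\"pretrained_weights\":.*,@\"pretrained_weights\":   \"$h5_file_path\",@g' config.json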

if Use_pretrained_model == True:
  with open(os.path.join(pretrained_model_path, 'Quality Control', 'training_evaluation.csv'),'r') as csvfile:
    csvRead = pd.read_csv(csvfile, sep=',')
    if "learning rate" in csvRead.columns: #Here we check that the learning rate column exists (compatibility with models trained in ZeroCostDL4Mic below 1.4):
      print("pretrained network learning rate found")
      #find the last learning rate
      lastLearningRate = csvRead["learning rate"].iloc[-1]
      #Find the learning rate corresponding to the lowest validation loss
      min_val_loss = csvRead[csvRead['val_loss'] == min(csvRead['val_loss'])]
      #print(min_val_loss)
      bestLearningRate = min_val_loss['learning rate'].iloc[-1]

      if Weights_choice == "last":
        print('Last learning rate: '+str(lastLearningRate))
        learning_rate = lastLearningRate

      if Weights_choice == "best":
        print('Learning rate of best validation loss: '+str(bestLearningRate))
        learning_rate = bestLearningRate

    else: # if the column does not exist, the initial learning rate is used instead
      print(bcolors.WARNING+'WARNING: The learning rate cannot be identified from the pretrained network. Default learning rate of '+str(learning_rate)+' will be used instead' + W)
  
  !sed -i 's@\"warmup_epochs\":.*,@\"warmup_epochs\":        0,@g' config.json
  !sed -i 's@\"learning_rate\":.*,@\"learning_rate\":        $learning_rate,@g' config.json

# with open(os.path.join(pretrained_model_path, 'Quality Control', 'lr.csv'),'r') as csvfile:
#         csvRead = pd.read_csv(csvfile, sep=',')
#         #print(csvRead)
    
#         if "learning rate" in csvRead.columns: #Here we check that the learning rate column exist (compatibility with model trained un ZeroCostDL4Mic bellow 1.4)
#           print("pretrained network learning rate found")
#           #find the last learning rate
#           lastLearningRate = csvRead["learning rate"].iloc[-1]
#           #Find the learning rate corresponding to the lowest validation loss
#           min_val_loss = csvRead[csvRead['val_loss'] == min(csvRead['val_loss'])]
#           #print(min_val_loss)
#           bestLearningRate = min_val_loss['learning rate'].iloc[-1]

#           if Weights_choice == "last":
#             print('Last learning rate: '+str(lastLearningRate))

#           if Weights_choice == "best":
#             print('Learning rate of best validation loss: '+str(bestLearningRate))

#         if not "learning rate" in csvRead.columns: #if the column does not exist, then initial learning rate is used instead
#           bestLearningRate = initial_learning_rate
#           lastLearningRate = initial_learning_rate
#           print(bcolors.WARNING+'WARNING: The learning rate cannot be identified from the pretrained network. Default learning rate of '+str(bestLearningRate)+' will be used instead' + W)

## **4.1. Train the network**
---
<font size = 4>When playing the cell below you should see updates after each epoch (round). Network training can take some time.

<font size = 4>* **CRITICAL NOTE:** Google Colab has a time limit for processing (to prevent using GPU power for datamining). Training time must be less than 12 hours! If training takes longer than 12 hours, please decrease the number of epochs or number of patches.
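
<font size = 4>As a rough way to check the time budget before a long run, the sketch below estimates how many epochs fit into the 12-hour limit from the measured duration of a single epoch. The function name `max_epochs_within_limit` and the `safety_margin` parameter are illustrative assumptions and not part of this notebook.

```python
import math

def max_epochs_within_limit(seconds_per_epoch, limit_hours=12.0, safety_margin=0.9):
    """Return the largest number of epochs expected to fit in the time limit.

    Hypothetical helper: `safety_margin` reserves part of the budget for
    set-up, validation and saving. It is an assumption, not a Colab rule.
    """
    if seconds_per_epoch <= 0:
        raise ValueError("seconds_per_epoch must be positive")
    budget_seconds = limit_hours * 3600 * safety_margin
    return math.floor(budget_seconds / seconds_per_epoch)

# Example: time one epoch first, then check how many epochs are realistic
print(max_epochs_within_limit(seconds_per_epoch=240))
```

<font size = 4>If the number of epochs you planned is larger than the returned value, consider reducing it before starting the training.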

In [None]:
import time
import csv
#from frontend import YOLO

if os.path.exists(full_model_path+"/Quality Control"):
  shutil.rmtree(full_model_path+"/Quality Control")
os.makedirs(full_model_path+"/Quality Control")

start = time.time()

#@markdown ##Start Training

#os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
#os.environ["CUDA_VISIBLE_DEVICES"]="0"
# Start Training

os.chdir('/content/gdrive/My Drive/keras-yolo2')
train('config.json', full_model_path, percentage_validation)

#shutil.move('/content/training_evaluation.csv',model_path+'/Quality Control/training_evaluation.csv')
#!python ./train.py -c ./config.json
#!python /content/drive/My\ Drive/Zero-Cost\ Deep-Learning\ to\ Enhance\ Microscopy/Various\ dataset/Detection_Dataset_2/BCCD.v1.voc/keras_yolo2/train.py -c /content/drive/My\ Drive/Zero-Cost\ Deep-Learning\ to\ Enhance\ Microscopy/Various\ dataset/Detection_Dataset_2/BCCD.v1.voc/keras_yolo2/config.json
#Insert the code necessary to initiate training of your model

## **4.3. Download your model(s) from Google Drive**


---
<font size = 4>Once training is complete, the trained model is automatically saved on your Google Drive, in the **model_path** folder that was selected in Section 3. It is, however, wise to download the folder, as its contents can be erased by the next training run if the same folder is used.
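
<font size = 4>One possible way to keep a copy is to zip the model folder before downloading it. The sketch below uses `shutil.make_archive` from the Python standard library. The helper name `archive_model_folder` is hypothetical, and both arguments are assumed to be existing folders, with the archive folder located outside the model folder.

```python
import os
import shutil

def archive_model_folder(model_folder, archive_dir):
    """Zip `model_folder` into `archive_dir` and return the archive path.

    Hypothetical helper: `archive_dir` should not be inside `model_folder`,
    otherwise the archive could end up containing itself.
    """
    folder_name = os.path.basename(os.path.normpath(model_folder))
    base_name = os.path.join(archive_dir, folder_name)
    # make_archive appends the '.zip' extension itself
    return shutil.make_archive(base_name, "zip", root_dir=model_folder)

# Possible usage in this notebook (paths are assumptions):
# archive_model_folder(full_model_path, '/content')
```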

# **5. Evaluate your model**
---

<font size = 4>This section allows the user to perform important quality checks on the validity and generalisability of the trained model. 

<font size = 4>**We highly recommend performing quality control on all newly trained models.**



In [None]:
# model name and path
#@markdown ###Do you want to assess the model you just trained?
Use_the_current_trained_model = True #@param {type:"boolean"}

#@markdown ###If not, please provide the name of the model folder:

QC_model_folder = "" #@param {type:"string"}

if (Use_the_current_trained_model): 
  QC_model_folder = full_model_path

#print(os.path.join(model_path, model_name))

if os.path.exists(QC_model_folder):
  print("The "+os.path.basename(QC_model_folder)+" model will be evaluated")
else:
  W  = '\033[0m'  # white (normal)
  R  = '\033[31m' # red
  print(R+'!! WARNING: The chosen model does not exist !!'+W)
  print('Please make sure you provide a valid model path before proceeding further.')


#@markdown ###Which backend is the model using?
backend = "Full Yolo" #@param ["Select Model","Full Yolo","Inception3","SqueezeNet","MobileNet","Tiny Yolo"]
os.chdir('/content/gdrive/My Drive/keras-yolo2')
if backend == "Full Yolo":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/full_yolo_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/full_yolo_backend.h5
elif backend == "Inception3":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/inception_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/inception_backend.h5
elif backend == "MobileNet":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/mobilenet_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/mobilenet_backend.h5
elif backend == "SqueezeNet":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/squeezenet_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/squeezenet_backend.h5
elif backend == "Tiny Yolo":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/tiny_yolo_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/tiny_yolo_backend.h5


## **5.1. Inspection of the loss function**
---

<font size = 4>First, it is good practice to evaluate the training progress by comparing the training loss with the validation loss. The latter is a metric which shows how well the network performs on a subset of unseen data which is set aside from the training dataset. For more information on this, see for example [this review](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6381354/) by Nichols *et al.*

<font size = 4>**Training loss** describes an error value after each epoch for the difference between the model's prediction and its ground-truth target.

<font size = 4>**Validation loss** describes the same error value, computed between the model's prediction on a validation image and its target.

<font size = 4>During training both values should decrease before reaching a minimal value which does not decrease further even after more training. Comparing the development of the validation loss with the training loss can give insights into the model's performance.

<font size = 4>Decreasing **Training loss** and **Validation loss** indicates that training is still necessary and increasing the `number_of_epochs` is recommended. Note that the curves can look flat towards the right side, just because of the y-axis scaling. The network has reached convergence once the curves flatten out. After this point no further training is required. If the **Validation loss** suddenly increases again and the **Training loss** simultaneously goes towards zero, it means that the network is overfitting to the training data. In other words, the network is remembering the exact patterns from the training data and no longer generalizes well to unseen data. In this case the training dataset has to be increased.
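
<font size = 4>The reasoning above can be turned into a simple numerical check, shown here as a minimal sketch. The helper `summarise_loss_curves` and its `patience` threshold are illustrative assumptions. The helper takes two plain lists of per-epoch losses and uses zero-based epoch numbers.

```python
def summarise_loss_curves(train_loss, val_loss, patience=5):
    """Report the epoch of lowest validation loss and a simple overfitting hint.

    Hypothetical helper: `patience` (epochs without a new validation minimum)
    is an illustrative threshold, not a setting of this notebook.
    """
    if len(train_loss) != len(val_loss) or not val_loss:
        raise ValueError("loss lists must be non-empty and of equal length")
    # Zero-based index of the epoch with the lowest validation loss
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best_epoch
    # Training loss still falling after the validation minimum hints at overfitting
    train_still_decreasing = train_loss[-1] < train_loss[best_epoch]
    return {
        "best_epoch": best_epoch,
        "epochs_since_best": epochs_since_best,
        "possible_overfitting": epochs_since_best >= patience and train_still_decreasing,
    }

# Made-up example values: validation loss rises while training loss keeps falling
train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
val = [1.1, 0.9, 0.8, 0.85, 0.9, 0.95, 1.0, 1.1]
print(summarise_loss_curves(train, val))
```

<font size = 4>Such a check is only a hint. The plotted curves should still be inspected, because noisy validation losses can trigger it by chance.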

In [None]:
#@markdown ##Play the cell to show a plot of training errors vs. epoch number
import csv
from matplotlib import pyplot as plt

lossDataFromCSV = []
vallossDataFromCSV = []

with open(QC_model_folder+'/Quality Control/training_evaluation.csv','r') as csvfile:
    csvRead = csv.reader(csvfile, delimiter=',')
    next(csvRead)
    for row in csvRead:
        lossDataFromCSV.append(float(row[0]))
        vallossDataFromCSV.append(float(row[1]))

epochNumber = range(len(lossDataFromCSV))
plt.figure(figsize=(15,10))

plt.subplot(2,1,1)
plt.plot(epochNumber,lossDataFromCSV, label='Training loss')
plt.plot(epochNumber,vallossDataFromCSV, label='Validation loss')
plt.title('Training loss and validation loss vs. epoch number (linear scale)')
plt.ylabel('Loss')
plt.xlabel('Epoch number')
plt.legend()

plt.subplot(2,1,2)
plt.semilogy(epochNumber,lossDataFromCSV, label='Training loss')
plt.semilogy(epochNumber,vallossDataFromCSV, label='Validation loss')
plt.title('Training loss and validation loss vs. epoch number (log scale)')
plt.ylabel('Loss')
plt.xlabel('Epoch number')
plt.legend()
# Save the plot next to training_evaluation.csv in the model's Quality Control folder
plt.savefig(QC_model_folder+'/Quality Control/lossCurvePlots.png')
plt.show()



## **5.2. Error mapping and quality metrics estimation**
---

<font size = 4>This section will display an overlay of the ground-truth boxes (solid lines) and predicted boxes (dashed lines) on the input images and calculate the recall and precision of the predictions. The "Source_QC_folder" should contain the images (e.g. as .jpg) and the "Annotations_QC_folder" the corresponding annotations (.xml files).
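
<font size = 4>To illustrate how recall and precision can be derived from box overlaps, the following minimal sketch matches predicted boxes to ground-truth boxes by intersection over union (IoU). It is not the `DetectionMAP` implementation used in this notebook: class labels and confidence ordering are ignored, the function names are hypothetical, and boxes are assumed to be given as `[xmin, ymin, xmax, ymax]`.

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predicted, ground_truth, iou_threshold=0.3):
    """Greedy one-to-one matching of predicted boxes to ground-truth boxes."""
    matched_gt = set()
    true_positives = 0
    for pred in predicted:
        # Find the best still-unmatched ground-truth box for this prediction
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched_gt:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched_gt.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Made-up boxes: one of two predictions matches one of three annotations
print(precision_recall([[0, 0, 10, 10], [100, 100, 110, 110]],
                       [[0, 0, 10, 10], [50, 50, 60, 60], [200, 200, 210, 210]]))
```

<font size = 4>Precision is the fraction of predicted boxes that match an annotation, and recall is the fraction of annotations that were found.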

In [None]:
#@markdown ##Choose the folders that contain your Quality Control dataset

Source_QC_folder = "" #@param{type:"string"}
Annotations_QC_folder = "" #@param{type:"string"}

file_suffix = os.path.splitext(os.listdir(Source_QC_folder)[0])[1]

# Create a quality control/Prediction Folder
if os.path.exists(QC_model_folder+"/Quality Control/Prediction"):
  shutil.rmtree(QC_model_folder+"/Quality Control/Prediction")

os.makedirs(QC_model_folder+"/Quality Control/Prediction")

#Delete old csv with box predictions if one exists

if os.path.exists('/content/mycsv.csv'):
  os.remove('/content/mycsv.csv')
if os.path.exists(Source_QC_folder+'/.ipynb_checkpoints'):
  shutil.rmtree(Source_QC_folder+'/.ipynb_checkpoints')

os.chdir('/content/gdrive/My Drive/keras-yolo2')
for img in os.listdir(Source_QC_folder):
  full_image_path = Source_QC_folder+'/'+img
  predict('config.json',QC_model_folder+'/best_weights.h5',full_image_path)


for img in os.listdir(Source_QC_folder):
  if img.endswith('detected'+file_suffix):
    shutil.move(Source_QC_folder+'/'+img,QC_model_folder+"/Quality Control/Prediction/"+img)

### Get the coordinates of the predicted boxes, ###
### box classes and confidence scores           ###

# from the csv containing the predicted boxes
with open('/content/mycsv.csv','r', newline='') as csvfile:
  csv_reader = csv.reader(csvfile)
  pred_boxes = []
  pred_classes = []
  pred_conf = []
  for row in csv_reader:
    image_boxes = []
    box_classes = []
    box_conf = []
    for i in range(0,len(row),6):
      image_boxes.append(list(map(float,row[i:i+4])))
      box_classes.append(int(row[i+5]))
      box_conf.append(float(row[i+4]))
    pred_boxes.append(image_boxes)   # The rows of this list contain the coordinates for all boxes per image
    pred_classes.append(box_classes) # The rows of this list contain the predicted classes for each box in the pred_boxes
    pred_conf.append(box_conf)       # The rows of this list contain the confidence scores for each predicted box in pred_boxes

shutil.move('/content/mycsv.csv',QC_model_folder+"/Quality Control/Prediction/predicted_boxes.csv")

#### Get the coordinates of the GT boxes ###

df_anno_QC_gt = []
#dir_anno = Training_Source_annotations
for fnm in os.listdir(Annotations_QC_folder):  
    if not fnm.startswith('.'): ## do not include hidden folders/files
        tree = ET.parse(os.path.join(Annotations_QC_folder,fnm))
        row = extract_single_xml_file(tree)
        row["fileID"] = os.path.splitext(fnm)[0]
        df_anno_QC_gt.append(row)
df_anno_QC_gt = pd.DataFrame(df_anno_QC_gt)

maxNobj = np.max(df_anno_QC_gt["Nobj"])

config_path = '/content/gdrive/My Drive/keras-yolo2/config.json'
class_dict = {}

with open(config_path) as config_buffer:
  config = json.load(config_buffer)
  for i in config["model"]["labels"]:
    class_dict[i] = int(config["model"]["labels"].index(i))

reverse_class_dict = {value : key for (key, value) in class_dict.items()}

df_anno_QC_gt = df_anno_QC_gt.replace(class_dict)

gt_boxes = []
gt_labels = []
gt_label_names = []
for j in range(0,df_anno_QC_gt.shape[0]):
  row = df_anno_QC_gt.iloc[j]
  width = int(row["width"])
  height = int(row["height"])
  gt_box = []
  gt_label = []
  gt_label_name = []
  for i in range(row["Nobj"]):
    label = int(float(row["bbx_{}_name".format(i)]))  # class index (names were replaced via class_dict)
    label_name = reverse_class_dict[label]            # map the index back to the class name
    x1=row["bbx_{}_xmin".format(i)]
    y1=row["bbx_{}_ymin".format(i)]
    x2=row["bbx_{}_xmax".format(i)]
    y2=row["bbx_{}_ymax".format(i)]
    #gt_box.append([x1/width,y1/height,x2/width,y2/height])
    gt_box.append([x1,y1,x2,y2])

    gt_label.append(label)
    gt_label_name.append(label_name)
  gt_boxes.append(gt_box)
  gt_labels.append(gt_label)
  gt_label_names.append(gt_label_name)

#The essential outputs from this are gt_boxes and gt_labels.
#Each entry contains all bounding boxes and classes for one ground-truth image.

#Here we create the detection map for the first prediction
#Prediction

pred_box_1 = np.array(pred_boxes[0])
pred_box_2 = np.array(pred_boxes[1])
#pred_box_3 = np.array(pred_boxes[2])

pred_class_1 = np.array(pred_classes[0])
pred_class_2 = np.array(pred_classes[1])
#pred_class_3 = np.array(pred_classes[2])

pred_conf_1 = np.array(pred_conf[0])
pred_conf_2 = np.array(pred_conf[1])
#pred_conf_3 = np.array(pred_conf[2])
                      
#print(pred_box_1)

#print(pred_conf_1)

# #GT
#print(gt_box_1[0])
gt_box_1 = np.array(gt_boxes[0])
gt_box_2 = np.array(gt_boxes[1])
#gt_box_3 = np.array(gt_boxes[2])
#print(gt_box_1)

gt_class_1 = np.array(gt_labels[0])
gt_class_2 = np.array(gt_labels[1])
#gt_class_3 = np.array(gt_labels[2])

frames = [(pred_box_1, pred_class_1, pred_conf_1, gt_box_1, gt_class_1)]#,
          #(pred_box_2, pred_class_2, pred_conf_2, gt_box_2, gt_class_2)]#,
          #(pred_box_3, pred_class_3, pred_conf_3, gt_box_3, gt_class_3)]

n_class = len(class_dict) # number of classes listed in config.json

mAP = DetectionMAP(n_class, overlap_threshold=0.3)
plt.figure(figsize=(15,5))
for i, frame in enumerate(frames):
  #print(i)
  img = np.array(io.imread(os.path.join(Source_QC_folder,os.path.splitext(os.listdir(Annotations_QC_folder)[i])[0]+file_suffix)))
  #print(os.listdir(Source_QC_folder)[i])
  #print("Evaluate frame {}".format(i))
  show_frame(*frame, reverse_class_dict, background = img)
  
  mAP.evaluate(*frame)

  mAP.plot()


plt.show()

In [None]:
#@markdown ##Inspect example output from QC
import random
from matplotlib.pyplot import imread
# This will display a randomly chosen dataset input and predicted output
random_choice = random.choice(os.listdir(Source_QC_folder))
file_suffix = os.path.splitext(random_choice)[1]

x = imread(Source_QC_folder+"/"+random_choice)

#os.chdir(Result_folder)
y = imread(QC_model_folder+"/Quality Control/Prediction/"+os.path.splitext(random_choice)[0]+'_detected'+file_suffix)
#y = imread(os.path.dirname(QC_model_path)+"/Quality Control/Prediction/"+os.path.splitext(random_choice)[0]+file_suffix)


plt.figure(figsize=(30,15))

plt.subplot(1,3,1)
plt.axis('off')
plt.imshow(x, interpolation='nearest')
plt.title('Input')

plt.subplot(1,3,2)
plt.axis('off')
plt.imshow(y, interpolation='nearest')
plt.title('Predicted output');

df_anno_QC_gt = []
#dir_anno = Training_Source_annotations
for fnm in os.listdir(Annotations_QC_folder):  
    if not fnm.startswith('.'): ## do not include hidden folders/files
        tree = ET.parse(os.path.join(Annotations_QC_folder,fnm))
        row = extract_single_xml_file(tree)
        row["fileID"] = os.path.splitext(fnm)[0]
        df_anno_QC_gt.append(row)
df_anno_QC_gt = pd.DataFrame(df_anno_QC_gt)

maxNobj = np.max(df_anno_QC_gt["Nobj"])

#Write the annotations to a csv file

import imageio
for i in range(0,df_anno_QC_gt.shape[0]):
  #print(df_anno_QC_gt[i]["fileID"])
  if df_anno_QC_gt.iloc[i]["fileID"]+file_suffix == random_choice:
    row = df_anno_QC_gt.iloc[i]

img  = imageio.imread(Source_QC_folder+'/'+random_choice)
      #row = df_anno_QC_gt.iloc[i,:]
    #plt.figure(figsize=(12,12))
plt.subplot(1,3,3)
plt.axis('off')
plt.imshow(img) # plot image
plt.title('Ground Truth annotations')
    # for each object in the image, plot the bounding box
for iplot in range(row["Nobj"]):
    plt_rectangle(plt,
                  label = row["bbx_{}_name".format(iplot)],
                  x1=row["bbx_{}_xmin".format(iplot)],
                  y1=row["bbx_{}_ymin".format(iplot)],
                  x2=row["bbx_{}_xmax".format(iplot)],
                  y2=row["bbx_{}_ymax".format(iplot)])#,
                  #fontsize=8)
plt.show() ## show the plot


# **6. Using the trained model**

---

<font size = 4>In this section the unseen data is processed using the trained model (from section 4). First, your unseen images are uploaded and prepared for prediction. After that, your trained model from section 4 is applied to them and the predictions are saved to your Google Drive.

## **6.1. Generate prediction(s) from unseen dataset**
---
<font size = 4>The current trained model (from section 4) can now be used to process images. If you want to use an older model, untick the **Use_the_current_trained_model** box and enter the path of the model folder to use. Predicted output images are saved in your **Result_folder** folder with `_detected` appended to the original file name.

<font size = 4>**`Data_folder`:** This folder should contain the images that you want to use your trained network on for processing.

<font size = 4>**`Result_folder`:** This folder will contain the predicted output images.
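
<font size = 4>The prediction cells in this section look for output files named after the input image, with `_detected` inserted before the file extension. As a small sketch under that assumption, the hypothetical helpers below build the expected output name and list inputs that have no prediction in the result folder yet.

```python
import os

def detected_name(image_name):
    """Return the output file name assumed for a given input image name."""
    stem, suffix = os.path.splitext(image_name)
    return stem + "_detected" + suffix

def missing_predictions(data_folder, result_folder):
    """List expected output names that are not present in `result_folder`."""
    expected = {
        detected_name(name)
        for name in os.listdir(data_folder)
        if not name.startswith(".")  # skip hidden files and folders
    }
    return sorted(expected - set(os.listdir(result_folder)))

# Possible usage after running the prediction cell:
# print(missing_predictions(Data_folder, Result_folder))
```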

In [None]:
#@markdown ### Provide the path to your dataset and to the folder where the predictions are saved, then play the cell to predict outputs from your unseen images.

Data_folder = "" #@param {type:"string"}
Result_folder = "" #@param {type:"string"}
file_suffix = os.path.splitext(os.listdir(Data_folder)[0])[1]

# model name and path
#@markdown ###Do you want to use the current trained model?
Use_the_current_trained_model = True #@param {type:"boolean"}

#@markdown ###If not, provide the name of the model and path to model folder:

Prediction_model_path = "" #@param {type:"string"}

#@markdown ###Which backend is the model using?
backend = "Full Yolo" #@param ["Select Model","Full Yolo","Inception3","SqueezeNet","MobileNet","Tiny Yolo"]
os.chdir('/content/gdrive/My Drive/keras-yolo2')
if backend == "Full Yolo":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/full_yolo_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/full_yolo_backend.h5
elif backend == "Inception3":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/inception_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/inception_backend.h5
elif backend == "MobileNet":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/mobilenet_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/mobilenet_backend.h5
elif backend == "SqueezeNet":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/squeezenet_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/squeezenet_backend.h5
elif backend == "Tiny Yolo":
  if not os.path.exists('/content/gdrive/My Drive/keras-yolo2/tiny_yolo_backend.h5'):
    !wget https://github.com/rodrigo2019/keras_yolo2/releases/download/pre-trained-weights/tiny_yolo_backend.h5
if (Use_the_current_trained_model): 
  print("Using current trained network")
  Prediction_model_path = full_model_path

if os.path.exists(Prediction_model_path+'/best_weights.h5'):
  print("The "+os.path.basename(Prediction_model_path)+" network will be used.")
else:
  W  = '\033[0m'  # white (normal)
  R  = '\033[31m' # red
  print(R+'!! WARNING: The chosen model does not exist !!'+W)
  print('Please make sure you provide a valid model path and model name before proceeding further.')

# Predictions are generated and saved when the next cell is run
print("Images will be saved into folder:", Result_folder)

In [None]:
#@markdown ##Run Prediction

#New_full_Prediction_model_path = ""
os.chdir('/content/gdrive/My Drive/keras-yolo2')

if os.path.exists(Data_folder+'/.ipynb_checkpoints'):
  shutil.rmtree(Data_folder+'/.ipynb_checkpoints')
for img in os.listdir(Data_folder):
  full_image_path = Data_folder+'/'+img
  predict('config.json',Prediction_model_path+'/best_weights.h5',full_image_path)

for img in os.listdir(Data_folder):
  if img.endswith('detected'+file_suffix):
    shutil.move(Data_folder+'/'+img,Result_folder+'/'+img)

## **6.2. Inspect the predicted output**
---



In [None]:
# @markdown ##Run this cell to display a randomly chosen input and its corresponding predicted output.
import random
from matplotlib.pyplot import imread
# This will display a randomly chosen dataset input and predicted output
random_choice = random.choice(os.listdir(Data_folder))

x = imread(Data_folder+"/"+random_choice)

os.chdir(Result_folder)
y = imread(Result_folder+"/"+os.path.splitext(random_choice)[0]+'_detected'+file_suffix)

plt.figure(figsize=(16,8))

plt.subplot(1,2,1)
plt.axis('off')
plt.imshow(x, interpolation='nearest')
plt.title('Input')

plt.subplot(1,2,2)
plt.axis('off')
plt.imshow(y, interpolation='nearest')
plt.title('Predicted output');


## **6.3. Download your predictions**
---

<font size = 4>**Store your data** and ALL its results elsewhere by downloading them from Google Drive, then clean the original folder tree (datasets, results, trained model etc.) if you plan to train or use new networks. Please note that the notebook will otherwise **OVERWRITE** all files which have the same name.


# **Thank you for using Yolo_v2!**