Forked from the [Object Detection with YOLO blog series](https://fairyonice.github.io/tag/object-detection-using-yolov2-on-pascal-voc2012-series.html)

The notebooks were modified only as needed; the vast majority of the content comes from the fairyonice.github repository.

My changes covering all notebooks were:
- Use Kaggle Wheat Detection data
- Migrate to TF 2.x
- Modify the Data Generator and Loss to remove a tensor error
- New notebook using albumentations for image & box augmentation
- New Kaggle submission notebook





This is the sixth blog post of the [Object Detection with YOLO blog series](https://fairyonice.github.io/tag/object-detection-using-yolov2-on-pascal-voc2012-series.html). This post performs inference using the model trained in [Part 5 Object Detection with Yolo using VOC 2012 data - training](https://fairyonice.github.io/Part_5_Object_Detection_with_Yolo_using_VOC_2012_data_training.html).
The original post uses PASCAL VOC2012 data; this fork uses the Kaggle Global Wheat Detection data instead.
This blog assumes that the readers have read the previous blog posts - [Part 1](https://fairyonice.github.io/Part_1_Object_Detection_with_Yolo_for_VOC_2014_data_anchor_box_clustering.html), [Part 2](https://fairyonice.github.io/Part%202_Object_Detection_with_Yolo_using_VOC_2014_data_input_and_output_encoding.html), [Part 3](https://fairyonice.github.io/Part_3_Object_Detection_with_Yolo_using_VOC_2012_data_model.html), [Part 4](https://fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html), [Part 5](https://fairyonice.github.io/Part_5_Object_Detection_with_Yolo_using_VOC_2012_data_training.html).

## Andrew Ng's YOLO lecture
- [Neural Networks - Bounding Box Predictions](https://www.youtube.com/watch?v=gKreZOUi-O0&t=0s&index=7&list=PL_IHmaMAvkVxdDOBRg2CbcJBq9SY7ZUvs)
- [C4W3L06 Intersection Over Union](https://www.youtube.com/watch?v=ANIzQ5G-XPE&t=7s)
- [C4W3L07 Nonmax Suppression](https://www.youtube.com/watch?v=VAo84c1hQX8&t=192s)
- [C4W3L08 Anchor Boxes](https://www.youtube.com/watch?v=RTlwl2bv0Tg&t=28s)
- [C4W3L09 YOLO Algorithm](https://www.youtube.com/watch?v=9s_FpMpdYW8&t=34s)

## Reference
- [You Only Look Once: Unified, Real-Time Object Detection](https://arxiv.org/pdf/1506.02640.pdf)
- [YOLO9000: Better, Faster, Stronger](https://arxiv.org/pdf/1612.08242.pdf)
- [experiencor/keras-yolo2](https://github.com/experiencor/keras-yolo2)

## Reference in blog
- [Part 1 Object Detection using YOLOv2 on Pascal VOC2012 - anchor box clustering](https://fairyonice.github.io/Part_1_Object_Detection_with_Yolo_for_VOC_2014_data_anchor_box_clustering.html)
- [Part 2 Object Detection using YOLOv2 on Pascal VOC2012 - input and output encoding](https://fairyonice.github.io/Part%202_Object_Detection_with_Yolo_using_VOC_2014_data_input_and_output_encoding.html)
- [Part 3 Object Detection using YOLOv2 on Pascal VOC2012 - model](https://fairyonice.github.io/Part_3_Object_Detection_with_Yolo_using_VOC_2012_data_model.html)
- [Part 4 Object Detection using YOLOv2 on Pascal VOC2012 - loss](https://fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html)
- [Part 5 Object Detection using YOLOv2 on Pascal VOC2012 - training](https://fairyonice.github.io/Part_5_Object_Detection_with_Yolo_using_VOC_2012_data_training.html)
- [Part 6 Object Detection using YOLOv2 on Pascal VOC 2012 data - inference on image](https://fairyonice.github.io/Part_6_Object_Detection_with_Yolo_using_VOC_2012_data_inference_image.html)
- [Part 7 Object Detection using YOLOv2 on Pascal VOC 2012 data - inference on video](https://fairyonice.github.io/Part_7_Object_Detection_with_Yolo_using_VOC_2012_data_inference_video.html)

## fairyonice GitHub repository 
This repository contains all the IPython notebooks in this blog series and the functions (see backend.py).
- [FairyOnIce/ObjectDetectionYolo](https://github.com/FairyOnIce/ObjectDetectionYolo)

In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import os, sys
print(sys.version)
%matplotlib inline

import pandas as pd
print("Pandas: ", pd.__version__)


3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) 
[GCC 7.3.0]
Pandas:  1.1.1


In [2]:
import numpy as np
import cv2
import copy

class ImageReader(object):
    def __init__(self, IMAGE_H, IMAGE_W, norm=None):
        '''
        IMAGE_H : the height of the rescaled image, e.g., 416
        IMAGE_W : the width of the rescaled image, e.g., 416
        '''
        self.IMAGE_H = IMAGE_H
        self.IMAGE_W = IMAGE_W
        self.norm = norm

    def encode_core(self, image, reorder_rgb=True):
        # resize the image to standard size; cv2.resize expects dsize as (width, height)
        image = cv2.resize(image, (self.IMAGE_W, self.IMAGE_H))
        if reorder_rgb:
            image = image[:, :, ::-1]
        if self.norm is not None:
            image = self.norm(image)
        return(image)

    def fit(self, train_instance):

        if not isinstance(train_instance, dict):
            train_instance = {'filename': train_instance}

        image_name = train_instance['filename']
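        image = cv2.imread(image_name)
        if image is None:
            # cv2.imread returns None (it does not raise) when the file is missing or unreadable
            raise ValueError('Cannot read image: {}'.format(image_name))
        h, w, c = image.shape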

        image = self.encode_core(image, reorder_rgb=True)

        if "object" in train_instance.keys():

            all_objs = copy.deepcopy(train_instance['object'])

            # fix object's position and size
            for obj in all_objs:
                for attr in ['xmin', 'xmax']:
                    obj[attr] = int(obj[attr] * float(self.IMAGE_W) / w)
                    obj[attr] = max(min(obj[attr], self.IMAGE_W), 0)

                for attr in ['ymin', 'ymax']:
                    obj[attr] = int(obj[attr] * float(self.IMAGE_H) / h)
                    obj[attr] = max(min(obj[attr], self.IMAGE_H), 0)
        else:
            return image
        
        return image, all_objs


class ImageReaderAlbumentations(ImageReader):
    def __init__(self, IMAGE_H, IMAGE_W, augment, norm=None):
        super().__init__(IMAGE_H, IMAGE_W, norm)

        self.augment = augment

    def encode_core(self, image, reorder_rgb=True):
        # resize the image to standard size; cv2.resize expects dsize as (width, height)
        image = cv2.resize(image, (self.IMAGE_W, self.IMAGE_H))
        if reorder_rgb:
            image = image[:, :, ::-1]
        return(image)

    def augment_image(self, image, bboxes):
        bbox_labels = np.ones(len(bboxes))

        aug_result = self.augment(image=image,
                                  bboxes=bboxes,
                                  labels=bbox_labels)

        return np.array(aug_result['image']), aug_result['bboxes']

    def fit(self, file_id):

        image_path = os.path.join(TEST_DATA_PATH, file_id)
        image = cv2.imread(image_path)

        # check for a failed read before touching image.shape
        if image is None:
            raise ValueError('ERROR ImageReaderAlbumentations cannot read image: ' + image_path)
        h, w, c = image.shape

        image = self.encode_core(image, reorder_rgb=True)

        image, _ = self.augment_image(image, [])
        if self.norm is not None:
            image = self.norm(image)
        return image
    
class BestAnchorBoxFinder(object):
    def __init__(self, ANCHORS):
        '''
        ANCHORS: a np.array of even number length e.g.

        _ANCHORS = [4,2, ##  width=4, height=2,  flat large anchor box
                    2,4, ##  width=2, height=4,  tall large anchor box
                    1,1] ##  width=1, height=1,  small anchor box
        '''
        self.anchors = [BoundBox(0, 0, ANCHORS[2*i], ANCHORS[2*i+1])
                        for i in range(int(len(ANCHORS)//2))]

    def _interval_overlap(self, interval_a, interval_b):
        x1, x2 = interval_a
        x3, x4 = interval_b
        if x3 < x1:
            if x4 < x1:
                return 0
            else:
                return min(x2, x4) - x1
        else:
            if x2 < x3:
                return 0
            else:
                return min(x2, x4) - x3

    def bbox_iou(self, box1, box2):
        intersect_w = self._interval_overlap([box1.xmin, box1.xmax],
                                             [box2.xmin, box2.xmax])
        intersect_h = self._interval_overlap([box1.ymin, box1.ymax],
                                             [box2.ymin, box2.ymax])

        intersect = intersect_w * intersect_h

        w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
        w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin

        union = w1*h1 + w2*h2 - intersect

        return float(intersect) / union

    def find(self, center_w, center_h):
        # find the anchor that best predicts this box
        best_anchor = -1
        max_iou = -1
        # each Anchor box is specialized to have a certain shape.
        # e.g., flat large rectangle, or small square
        shifted_box = BoundBox(0, 0, center_w, center_h)
        #  For given object, find the best anchor box!
        for i in range(len(self.anchors)):  # run through each anchor box
            anchor = self.anchors[i]
            iou = self.bbox_iou(shifted_box, anchor)
            if max_iou < iou:
                best_anchor = i
                max_iou = iou
        return(best_anchor, max_iou)


class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, confidence=None, classes=None):
        self.xmin, self.ymin = xmin, ymin
        self.xmax, self.ymax = xmax, ymax
        # the attributes below are used during inference
        # confidence (probability that an object exists)
        self.confidence = confidence
        # class probabilities [c1, c2, .. cNclass]
        self.set_class(classes)

    def set_class(self, classes):
        self.classes = classes
        self.label = np.argmax(self.classes)

    def get_label(self):
        return(self.label)

    def get_score(self):
        return(self.classes[self.label])
    
# ========================================================================== ##
# Part 6 Object Detection with Yolo using VOC 2012 data - inference on image
# ========================================================================== ##


class OutputRescaler(object):
    def __init__(self, ANCHORS):
        self.ANCHORS = ANCHORS

    def _sigmoid(self, x):
        return 1. / (1. + np.exp(-x))

    def _softmax(self, x, axis=-1, t=-100.):
        x = x - np.max(x)

        if np.min(x) < t:
            x = x/np.min(x)*t

        e_x = np.exp(x)
        return e_x / e_x.sum(axis, keepdims=True)

    def get_shifting_matrix(self, netout):

        GRID_H, GRID_W, BOX = netout.shape[:3]
        no = netout[..., 0]

        ANCHORSw = self.ANCHORS[::2]
        ANCHORSh = self.ANCHORS[1::2]

        mat_GRID_W = np.zeros_like(no)
        for igrid_w in range(GRID_W):
            mat_GRID_W[:, igrid_w, :] = igrid_w

        mat_GRID_H = np.zeros_like(no)
        for igrid_h in range(GRID_H):
            mat_GRID_H[igrid_h, :, :] = igrid_h

        mat_ANCHOR_W = np.zeros_like(no)
        for ianchor in range(BOX):
            mat_ANCHOR_W[:, :, ianchor] = ANCHORSw[ianchor]

        mat_ANCHOR_H = np.zeros_like(no)
        for ianchor in range(BOX):
            mat_ANCHOR_H[:, :, ianchor] = ANCHORSh[ianchor]
        return(mat_GRID_W, mat_GRID_H, mat_ANCHOR_W, mat_ANCHOR_H)

    def fit(self, netout):
        GRID_H, GRID_W, BOX = netout.shape[:3]

        (mat_GRID_W,
         mat_GRID_H,
         mat_ANCHOR_W,
         mat_ANCHOR_H) = self.get_shifting_matrix(netout)

        # bounding box parameters
        netout[..., 0] = (self._sigmoid(netout[..., 0]) + mat_GRID_W)/GRID_W  # x      unit: range between 0 and 1
        netout[..., 1] = (self._sigmoid(netout[..., 1]) + mat_GRID_H)/GRID_H  # y      unit: range between 0 and 1
        netout[..., 2] = (np.exp(netout[..., 2]) * mat_ANCHOR_W)/GRID_W      # width  unit: range between 0 and 1
        netout[..., 3] = (np.exp(netout[..., 3]) * mat_ANCHOR_H)/GRID_H      # height unit: range between 0 and 1
        # rescale the confidence to range 0 and 1
        netout[..., 4] = self._sigmoid(netout[..., 4])
        expand_conf = np.expand_dims(netout[..., 4], -1)  # (N grid h , N grid w, N anchor , 1)
        # rescale the class probability to range between 0 and 1
        # Pr(object class = k) = Pr(object exists) * Pr(object class = k |object exists)
        #                      = Conf * P^c
        netout[..., 5:] = expand_conf * self._softmax(netout[..., 5:])
        # filtering by obj_threshold is done later, in find_high_class_probability_bbox

        return(netout)


def find_high_class_probability_bbox(netout_scale, obj_threshold):
    GRID_H, GRID_W, BOX = netout_scale.shape[:3]

    boxes = []
    for row in range(GRID_H):
        for col in range(GRID_W):
            for b in range(BOX):
                # elements from index 5 onwards are the class probabilities (already scaled by confidence)
                classes = netout_scale[row, col, b, 5:]

                if np.sum(classes) > 0:
                    # first 4 elements are x, y, w, and h
                    x, y, w, h = netout_scale[row, col, b, :4]
                    confidence = netout_scale[row, col, b, 4]
                    box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, confidence,
                                   classes)
                    if box.get_score() > obj_threshold:
                        boxes.append(box)
    return(boxes)

import cv2, copy
import seaborn as sns

def adjust_minmax(c, _max):
    if c < 0:
        c = 0
    if c > _max:
        c = _max
    return int(c)

def draw_boxes(_image, boxes, labels, obj_baseline=0.05, verbose=False):

    image = copy.deepcopy(_image)
    image_h, image_w, _ = image.shape
    score_rescaled = np.array([box.get_score() for box in boxes])
    score_rescaled /= obj_baseline
    color_rect, color_text = sns.color_palette("husl", 2)
    for sr, box in zip(score_rescaled, boxes):
        xmin = adjust_minmax(int(box.xmin*image_w), image_w)
        ymin = adjust_minmax(int(box.ymin*image_h), image_h)
        xmax = adjust_minmax(int(box.xmax*image_w), image_w)
        ymax = adjust_minmax(int(box.ymax*image_h), image_h)

        text = "{:10} {:4.3f}".format(labels[box.label], box.get_score())
        if verbose:
            print("{} xmin={:4.0f},ymin={:4.0f},xmax={:4.0f},ymax={:4.0f}".format(text, xmin, ymin, xmax, ymax))
        #cv2.rectangle(image,
                      #pt1=(int(xmin), int(ymin)),
                      #pt2=(int(xmax), int(ymax)),
                      #color=color_rect,
                      #thickness=sr)
        cv2.rectangle(image,
                      (int(xmin), int(ymin)),
                      (int(xmax), int(ymax)),
                      color_rect,
                      int(sr))
        
        cv2.putText(img=image,
                    text=text,
                    org=(xmin + 13, ymin + 13),
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                    fontScale=1e-3 * image_h,
                    color=color_text,
                    thickness=1)

    return image


def nonmax_suppression(boxes, iou_threshold, obj_threshold):
    bestAnchorBoxFinder = BestAnchorBoxFinder([])

    CLASS = len(boxes[0].classes)
    index_boxes = []
    # suppress non-maximal boxes
    for c in range(CLASS):
        # extract class probabilities of the c^th class from multiple bbox
        class_probability_from_bbxs = [box.classes[c] for box in boxes]

        # sorted_indices[i] contains the i^th largest class probabilities
        sorted_indices = list(reversed(np.argsort(class_probability_from_bbxs)))

        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]

            # if class probability is zero then ignore
            if boxes[index_i].classes[c] == 0:
                continue
            else:
                index_boxes.append(index_i)
                for j in range(i+1, len(sorted_indices)):
                    index_j = sorted_indices[j]

                    # check if the selected i^th bounding box has high IOU with any of the remaining bbox
                    # if so, the remaining bbox' class probabilities are set to 0.
                    bbox_iou = bestAnchorBoxFinder.bbox_iou(boxes[index_i], boxes[index_j])
                    if bbox_iou >= iou_threshold:
                        classes = boxes[index_j].classes
                        classes[c] = 0
                        boxes[index_j].set_class(classes)

    newboxes = [boxes[i] for i in index_boxes if boxes[i].get_score() > obj_threshold]

    return newboxes
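
The overlap test and suppression loop above can be illustrated with a minimal, standalone sketch. It assumes a single class and represents each box as a plain `(xmin, ymin, xmax, ymax)` tuple instead of a `BoundBox`; the names `interval_overlap`, `iou` and `greedy_nms` are hypothetical and are not part of backend.py.

```python
def interval_overlap(a, b):
    """Length of the overlap between two 1-D intervals (start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def iou(box1, box2):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = interval_overlap((box1[0], box1[2]), (box2[0], box2[2]))
    inter_h = interval_overlap((box1[1], box1[3]), (box2[1], box2[3]))
    intersect = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersect
    return intersect / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first (single class)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = set()
    keep = []
    for pos, i in enumerate(order):
        if i in suppressed:
            continue
        keep.append(i)
        # any lower-scored box that overlaps the kept box too much is dropped
        for j in order[pos + 1:]:
            if j not in suppressed and iou(boxes[i], boxes[j]) >= iou_threshold:
                suppressed.add(j)
    return keep


# illustrative values only: two heavily overlapping boxes and one separate box
example_boxes = [(0.10, 0.10, 0.30, 0.30),
                 (0.12, 0.12, 0.32, 0.32),
                 (0.60, 0.60, 0.80, 0.80)]
example_scores = [0.9, 0.8, 0.7]
kept = greedy_nms(example_boxes, example_scores, iou_threshold=0.2)
print(kept)
```

With these illustrative values the second box overlaps the first well above the 0.2 threshold, so only the first and third boxes are kept.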


In [3]:
LABELS = ['wheat']

ANCHORS = np.array([0.06960639, 0.06130531,
                    0.11246752, 0.10739992])

BATCH_SIZE        = 1  # 32 200
BOX               = int(len(ANCHORS)/2)
TRUE_BOX_BUFFER   = 130
IMAGE_H, IMAGE_W  = 1024, 1024
GRID_H,  GRID_W   = 32, 32

ORIG_IMAGE_SIZE = 1024
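
The grid and anchor settings feed directly into the decoding done by `OutputRescaler.fit`. As a rough sketch of that arithmetic for a single anchor in a single grid cell (class probabilities omitted), the following uses only the standard library. `decode_cell` is a hypothetical helper, and the anchor values in the example are made up for easy arithmetic; they are not the wheat anchors defined above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, row, col, anchor_w, anchor_h, grid_h, grid_w):
    """Convert raw outputs (tx, ty, tw, th, tc) of one anchor in one cell
    into a box centre, size and confidence, all in the range 0..1."""
    tx, ty, tw, th, tc = raw
    x = (sigmoid(tx) + col) / grid_w          # centre x, offset by the cell column
    y = (sigmoid(ty) + row) / grid_h          # centre y, offset by the cell row
    w = math.exp(tw) * anchor_w / grid_w      # width scales the anchor width
    h = math.exp(th) * anchor_h / grid_h      # height scales the anchor height
    conf = sigmoid(tc)                        # objectness confidence
    return x, y, w, h, conf


# all-zero raw outputs place the box at the centre of cell (row=16, col=8)
decoded = decode_cell((0, 0, 0, 0, 0), row=16, col=8,
                      anchor_w=2.0, anchor_h=4.0, grid_h=32, grid_w=32)
print(decoded)
```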


In [4]:

# Home
#ROOT_PATH = "/Users/john/Documents/Python-Working/Kaggle-global-wheat-detection/"  ###### CHANGE FOR SPECIFIC ENVIRONMENT
#MODEL_PATH = "/Users/john/Documents/Python-Working/2-Kaggle-Wheat/ObjectDetectionYolo-JOHN/"  ###### CHANGE FOR SPECIFIC ENVIRONMENT

# Kaggle
ROOT_PATH = "../input/global-wheat-detection/"  ###### CHANGE FOR SPECIFIC ENVIRONMENT
MODEL_PATH = "../input/wheatmodel/"  ###### CHANGE FOR SPECIFIC ENVIRONMENT

SUBMISSION_PATH="submission.csv"

TEST_DATA_PATH = os.path.join(ROOT_PATH, "test/")

MODEL_NAME = "model-wheat-yolo-1024-A2-aug2.h5"
MODEL_PATH = os.path.join(MODEL_PATH, MODEL_NAME)

METHOD_TEST = False

In [5]:
def load_file_names_Util(file_path,
                         image_ext,
                         full_file_path=True):
    file_list = []
    file_names = os.listdir(file_path)
    for i, fn in enumerate(file_names):
        if fn.endswith(image_ext):
            if full_file_path:
                file_list.append(fn)
            else:
                head, tail = os.path.split(fn)
                file_list.append(tail[:-4])
    return file_list

test_image_ids = load_file_names_Util(TEST_DATA_PATH, ".jpg")

if METHOD_TEST:
    print(test_image_ids[:10])

In [6]:
def custom_loss(y_true, y_pred, true_boxes):  # We are only using inference, so dummy loss....
    return 1.0

print("loading... ", MODEL_PATH)
model = tf.keras.models.load_model(MODEL_PATH, custom_objects={'tf' : tf, 'custom_loss': custom_loss})


loading...  ../input/wheatmodel/model-wheat-yolo-1024-A2-aug2.h5




In [7]:
import albumentations as albu

val_augmentations = albu.Compose([
                                   albu.CLAHE(p=1),
                                   albu.ToGray(p=1),
                                  ], 
                                  bbox_params={'format': 'pascal_voc', 'label_fields': ['labels']})


def normalize(image):
    return image / 255.

imageReader = ImageReaderAlbumentations(IMAGE_H, IMAGE_W, val_augmentations, norm=normalize)

In [8]:
def format_prediction_string(boxes):
    pred_strings = []
    
    for box in boxes:
        xmin = adjust_minmax(int(box.xmin*IMAGE_W), IMAGE_W)
        ymin = adjust_minmax(int(box.ymin*IMAGE_H), IMAGE_H)
        xmax = adjust_minmax(int(box.xmax*IMAGE_W), IMAGE_W)
        ymax = adjust_minmax(int(box.ymax*IMAGE_H), IMAGE_H)
        #print(xmin, ymin, xmax, ymax, box.confidence)
        
        pred_strings.append("{0:.4f} {1} {2} {3} {4}".format(box.confidence, xmin, ymin, xmax, ymax))

    return " ".join(pred_strings)
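
To see what `format_prediction_string` produces without loading the model, here is a small standalone sketch of the same steps: scale normalized corners to pixels, truncate to int, clamp to the image, and join everything with spaces. Boxes are plain `(confidence, xmin, ymin, xmax, ymax)` tuples here, and `clamp` and `prediction_string` are hypothetical names used only for illustration.

```python
def clamp(value, upper):
    """Clamp an integer pixel coordinate to the range [0, upper]."""
    return int(min(max(value, 0), upper))


def prediction_string(boxes, image_w, image_h):
    """boxes: iterable of (confidence, xmin, ymin, xmax, ymax), corners in 0..1."""
    parts = []
    for conf, xmin, ymin, xmax, ymax in boxes:
        parts.append("{:.4f} {} {} {} {}".format(
            conf,
            clamp(int(xmin * image_w), image_w),
            clamp(int(ymin * image_h), image_h),
            clamp(int(xmax * image_w), image_w),
            clamp(int(ymax * image_h), image_h)))
    return " ".join(parts)


# the second box deliberately spills outside the image to show the clamping
sample = [(0.9, 0.5, 0.25, 0.75, 0.5),
          (0.85, -0.1, 0.9, 0.2, 1.2)]
pred = prediction_string(sample, image_w=1024, image_h=1024)
print(pred)
```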

In [9]:
if METHOD_TEST:
    actual_test_image_ids = ['2fd875eaa.jpg', '53f253011.jpg', 'f5a1f0358.jpg', 'cc3532ff6.jpg']
    #actual_test_image_ids = ['1john2.jpg', '2fd875eaa.jpg', '796707dd7.jpg','aac893a91.jpg']
    #actual_test_image_ids = ["1john2.jpg", "1jphoto.jpg"]
else:
    actual_test_image_ids = test_image_ids
    

dummy_y = np.zeros((1,GRID_H, GRID_W, BOX, 4+1+len(LABELS)), dtype=np.float32)
dummy_box = np.zeros((1,1,1,1,TRUE_BOX_BUFFER,4), dtype=np.float32)

figsize = (15,15)
test_results = []
import gc  # required for the periodic gc.collect() call inside the loop
gc_count = 0
outputRescaler = OutputRescaler(ANCHORS=ANCHORS)

for i in range(len(actual_test_image_ids)):
    gc_count = gc_count + 1
    if (gc_count % 200) == 0:
        gc.collect()

    final_boxes = []
    image_id = actual_test_image_ids[i]
    image = imageReader.fit(image_id)
    image = image[np.newaxis, ...]
    #print(image.shape)
    
    y_pred = model.predict([image, dummy_y, dummy_box])[0]
    #print(y_pred.shape)
    
    y_pred_scale = outputRescaler.fit(y_pred)

    obj_threshold = 0.4 #0.4
    iou_threshold = 0.2 #0.2
    boxes = find_high_class_probability_bbox(y_pred_scale, obj_threshold)
    if len(boxes) <= 15:
        obj_threshold = 0.6 #0.25
        iou_threshold = 0.2 #0.01
        boxes = find_high_class_probability_bbox(y_pred_scale, obj_threshold)

    if len(boxes) > 0:
        final_boxes = nonmax_suppression(boxes,
                                         iou_threshold=iou_threshold,
                                         obj_threshold=obj_threshold)
            
        result = [image_id, format_prediction_string(final_boxes)]
        #
    else:
        result = [image_id, "0.0 0 0 0 0"]
    
    
    test_results.append(result)

    if METHOD_TEST:
        ima = draw_boxes(image[0], final_boxes, LABELS,verbose=True)
        plt.figure(figsize=figsize)
        plt.title("("+str(len(final_boxes))+")")
        plt.imshow(ima); 
        plt.show()



In [10]:
test_results_df = pd.DataFrame(test_results, columns =['image_id','PredictionString'])
test_results_df.head()

   image_id       PredictionString
0  348a992bb.jpg  0.8976 599 448 719 537 0.8814 737 221 873 308 ...
1  796707dd7.jpg  0.8630 893 328 1001 418 0.8537 508 792 594 887...
2  aac893a91.jpg  0.8782 691 382 818 562 0.8743 558 534 684 716 ...
3  f5a1f0358.jpg  0.9123 944 427 1024 620 0.8909 690 200 803 292...
4  cb8d261a3.jpg  0.8539 439 462 559 571 0.8499 167 899 253 988 ...


In [11]:
test_results_df.to_csv(SUBMISSION_PATH, index=False)
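
Before uploading, it can be useful to check that every row of the written file is well formed. The sketch below uses only the standard library and assumes the layout produced by `format_prediction_string`, i.e. five space-separated values per box. `check_prediction_rows` is a hypothetical helper, and the CSV text in the example is made up rather than taken from a real submission.

```python
import csv
import io


def check_prediction_rows(csv_text):
    """Return {image_id: number_of_boxes}; raise if a row is malformed."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tokens = row["PredictionString"].split()
        if len(tokens) % 5 != 0:
            raise ValueError("Row for {} has {} values, not a multiple of 5"
                             .format(row["image_id"], len(tokens)))
        counts[row["image_id"]] = len(tokens) // 5
    return counts


# made-up CSV content in the same two-column layout as submission.csv
sample_csv = ("image_id,PredictionString\n"
              "a.jpg,0.9000 1 2 3 4 0.8000 5 6 7 8\n"
              "b.jpg,0.0 0 0 0 0\n")
box_counts = check_prediction_rows(sample_csv)
print(box_counts)
```

For the real file, the same function could be called with the contents of `SUBMISSION_PATH` read as text.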


In [12]:
if METHOD_TEST:
    print(test_results_df)