# YOLO in TensorFlow: Inference Tutorial

This Google Colaboratory notebook demonstrates inference with the TensorFlow Model Garden implementation of YOLO (the model cell below loads a YOLOv4 configuration by default) on a video stream from your webcam.

First, clone the GitHub repo and import the necessary libraries.

In [1]:
%tensorflow_version 2.x
import pathlib
import os
import time
import urllib.request

# Clone the tensorflow models repository if it doesn't already exist
if "TensorFlowModelGardeners" in pathlib.Path.cwd().parts:
  while "TensorFlowModelGardeners" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('TensorFlowModelGardeners').exists():
  !git clone --depth 1 https://github.com/PurdueCAM2Project/TensorFlowModelGardeners
os.chdir('TensorFlowModelGardeners')
!git pull
!pip install -r yolo/requirements.txt

from google.colab import output
from IPython.display import JSON, HTML

import cv2
import tensorflow as tf
import numpy as np

from official.core import train_utils
from yolo import run as yolo_run
from yolo.utils.demos import utils as demo_utils

Already up to date.

!--PREPPING GPU--! 
1 Physical GPUs, 1 Logical GPUs


After cloning the repo, build the model and load the Darknet (paper implementation) pretrained weights. This may take time since Colab needs to download the weights.

A prediction function for the model is also created. The [`predict`](https://www.tensorflow.org/api_docs/python/tf/keras/Model?hl=en#predict) function is designed for batched inputs (many images processed at the same time) and is slower for single images, which is how it is used in this tutorial. It is used here for the sake of example.
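
As a rough illustration of that difference, the sketch below runs one image through a tiny stand-in Keras model, once with `predict` and once by calling the model directly, which the Keras documentation suggests for small inputs. The `toy_model` and the random image are hypothetical placeholders, not the YOLO model from this notebook.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the YOLO model: a tiny convolutional network.
toy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Conv2D(4, 3, padding="same"),
    tf.keras.layers.GlobalAveragePooling2D(),
])

# A "batch" that contains a single random image.
single_image = np.random.rand(1, 8, 8, 3).astype("float32")

# Batch-oriented path: returns NumPy arrays.
via_predict = toy_model.predict(single_image, verbose=0)

# Direct call: returns a tensor, converted here for comparison.
via_call = toy_model(single_image, training=False).numpy()

print(via_predict.shape, np.allclose(via_predict, via_call, atol=1e-5))
```

With the YOLO model, a direct call would presumably return tensors rather than NumPy arrays, so the downstream code would need adjusting; that is an assumption and has not been checked against this repository.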

In [2]:
# Try out yolov3.yaml and yolov4-tiny-eval.yaml as well for more fun
task, model = yolo_run.load_model('yolo_custom', ['yolo/configs/experiments/yolov4-eval.yaml'])
model.make_predict_function()

INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK
Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: Tesla T4, compute capability 7.5


Instructions for updating:
Use tf.keras.mixed_precision.LossScaleOptimizer instead. LossScaleOptimizer now has all the functionality of DynamicLossScale


{'num_classes': 80, '_input_size': None, 'min_level': 3, 'max_level': 5, 'boxes_per_scale': 3, 'base': {'type': None}, 'dilate': False, 'filter': {'iou_thresh': 0.2, 'nms_thresh': 0.9, 'ignore_thresh': 0.7, 'loss_type': 'ciou', 'max_boxes': 200, 'anchor_generation_scale': 416, 'use_nms': False, 'iou_normalizer': 0.07, 'cls_normalizer': 1.0, 'obj_normalizer': 1.0}, 'norm_activation': {'activation': 'mish', 'use_sync_bn': False, 'norm_momentum': 0.99, 'norm_epsilon': 0.001}, 'decoder_activation': 'leaky', '_boxes': ['[12.0, 16.0]', '[19.0, 36.0]', '[40.0, 28.0]', '[36.0, 75.0]', '[76.0, 55.0]', '[72.0, 146.0]', '[142.0, 110.0]', '[192.0, 243.0]', '[459.0, 401.0]']}
InputSpec(shape=(None, None, None, 3), ndim=4)
<tensorflow.python.keras.regularizers.L2 object at 0x7f7e7bb37470>
DarkNet(model_id='cspdarknet53')
Model: "cspdarknet53"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #

<tensorflow.python.eager.def_function.Function at 0x7f7e7bb301d0>

Below is a function that infers bounding boxes from an image using the YOLO model. The frontend passes each webcam image to the backend function in Colab as a URI. When the backend function receives the image, it decodes it into a NumPy array of integers (0 to 255) with 3 dimensions (height, width, color channels). Next, it uses TensorFlow to normalize the pixels and resize the image. After that, it uses the model to predict the bounding boxes for any objects that appear in the image. Finally, the bounding boxes are converted to a JSON object so they can be returned and shown on the frontend.
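
The frontend code is not shown at this point, so the following is only a sketch that assumes the image arrives as a base64 `data:` URI (for example, from a canvas `toDataURL` call). It shows that `urllib.request.urlopen` can read such a URI back into raw bytes. The helper `read_uri_bytes` and the sample payload are hypothetical.

```python
import base64
import urllib.request


def read_uri_bytes(uri: str) -> bytes:
    """Return the raw bytes behind a URI (hypothetical helper)."""
    with urllib.request.urlopen(uri) as response:
        return response.read()


# Sample bytes standing in for a JPEG frame; this is not a valid image.
payload = b"\xff\xd8\xff\xe0 sample bytes, not a real JPEG"

# Assumption: the browser sends a base64-encoded data URI.
uri = "data:image/jpeg;base64," + base64.b64encode(payload).decode("ascii")

decoded = read_uri_bytes(uri)
print(decoded == payload)
```

In the notebook, bytes obtained this way are wrapped with `np.frombuffer` and passed to `cv2.imdecode` to get the pixel array.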

In [3]:
# An OpenCV flag that decodes the input as a 3-channel color image (OpenCV
# stores the channels in BGR order). This flag has a different name in
# OpenCV 2 and OpenCV 3, so a condition is needed to work with both versions.
COLOR_IMAGE = cv2.IMREAD_COLOR if int(cv2.__version__.split('.', 1)[0]) >= 3 \
  else cv2.CV_LOAD_IMAGE_COLOR

if model.backbone.model_id == 'cspdarknet53':
  MODEL_INPUT_RESOLUTION = (608, 608)
else:
  MODEL_INPUT_RESOLUTION = (416, 416)
DEMO_SCREEN_RESOLUTION = (500, 375)

def yolo_infer(uri: str) -> JSON:
  try:
    # Decode the URI to an image
    with urllib.request.urlopen(uri) as response:
      data = response.read()
    img_buf = np.frombuffer(data, np.uint8)
    img = cv2.imdecode(img_buf, COLOR_IMAGE)
    
    # Rescale the image for use with the model
    mat = tf.cast(img, tf.float16)
    mat /= 255
    mat = tf.expand_dims(mat, axis = 0)
    mat = tf.image.resize(mat, MODEL_INPUT_RESOLUTION)

    # Run the inference and time it
    start = time.time()
    pred = model.predict(mat)
    elapsed = time.time() - start

    # Rescale bounding boxes to the resolution of the demo screen
    bboxes, classes = demo_utils.int_scale_boxes(pred['bbox'], pred['classes'],
                                                 *DEMO_SCREEN_RESOLUTION)

    # Convert the format of the bounding boxes to match the format in the
    # HTML document shown below: [x1, x2, y1, y2, c, p]
    bboxes = bboxes[0].numpy()
    classes = classes[0].numpy()
    confidences = pred['confidence'][0]
    num_dets = pred['num_dets'][0]
    boxes = []
    # zip stops after num_dets entries, so only the first num_dets detections
    # are kept
    for _, bbox, class_id, confidence in zip(range(num_dets), bboxes, classes,
                                             confidences):
      boxes.append(list(bbox) + [class_id, confidence])
    # Overwrite the previous status line with the latency and the detections
    print('\r', elapsed, boxes, end='')

    # Return control to the client side (JavaScript in the HTML document)
    return JSON(boxes)
  except Exception as e:
    import traceback
    traceback.print_exc()

output.register_callback('yolo_infer', yolo_infer)
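
The conversion loop in `yolo_infer` can be sketched without TensorFlow. This is not the implementation of `demo_utils.int_scale_boxes`. It is a hypothetical helper, `to_frontend_boxes`, that assumes the model emits normalized `[ymin, xmin, ymax, xmax]` boxes and produces the `[x1, x2, y1, y2, c, p]` layout using plain Python numbers.

```python
import json


def to_frontend_boxes(norm_boxes, class_ids, confidences, num_dets,
                      width, height):
    """Scale normalized boxes to pixels in [x1, x2, y1, y2, c, p] order.

    Assumption: each input box is (ymin, xmin, ymax, xmax) in the 0..1 range.
    """
    boxes = []
    # range(num_dets) caps the loop, so entries past num_dets are ignored.
    for _, (ymin, xmin, ymax, xmax), class_id, conf in zip(
            range(num_dets), norm_boxes, class_ids, confidences):
        boxes.append([
            round(xmin * width), round(xmax * width),
            round(ymin * height), round(ymax * height),
            int(class_id), float(conf),
        ])
    return boxes


# One real detection followed by one padding entry that should be skipped.
result = to_frontend_boxes(
    norm_boxes=[(0.2, 0.1, 0.8, 0.9), (0.0, 0.0, 0.0, 0.0)],
    class_ids=[0, 0],
    confidences=[0.79, 0.0],
    num_dets=1,
    width=500,
    height=375,
)
print(json.dumps(result))
```

Casting to built-in `int` and `float` keeps the result serializable by the standard `json` module, which matters when the values start out as NumPy scalars.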

Below is a frontend interface that accesses the webcam, streams the images to the backend, and displays the bounding boxes to the user. The stream uses JPEG compression to speed up the transfer of images to Colab. The `yolo_infer` function defined earlier is called on each compressed JPEG image, and the resulting bounding boxes are drawn on the webcam image when they are received back from the backend.

In [4]:
HTML(filename='utils/demos/colab_templates/inference.html')

 0.08053755760192871 [[20, 487, 70, 382, 0, 0.7905]]