# YOLO in TensorFlow: Inference Tutorial

This Google Colaboratory notebook demonstrates inference with the TensorFlow Model Garden implementation of YOLOv3 on a video stream from your webcam.

First, clone the GitHub repo and import the necessary libraries.

In [None]:
%tensorflow_version 2.x

from google.colab import output
from IPython.display import JSON

import cv2
import tensorflow as tf
import numpy as np

import os
import pathlib
import urllib.request

# Clone the tensorflow models repository if it doesn't already exist
if "TensorFlowModelGardeners" in pathlib.Path.cwd().parts:
  while "TensorFlowModelGardeners" in pathlib.Path.cwd().parts:
    os.chdir('..')
elif not pathlib.Path('TensorFlowModelGardeners').exists():
  !git clone --depth 1 --branch v3-new-api https://github.com/PurdueCAM2Project/TensorFlowModelGardeners
os.chdir('TensorFlowModelGardeners')
!git pull

from yolo import Yolov3
from yolo.utils.testing_utils import int_scale_boxes

After cloning the repo, build the model and load the Darknet (paper implementation) pretrained weights. This may take time since Colab needs to download the weights.

A prediction function for the model is also created. The [`predict`](https://www.tensorflow.org/api_docs/python/tf/keras/Model?hl=en#predict) function is faster for batched inputs (many images processed at the same time) than for single images, which is what this tutorial feeds it. It is used here only for the sake of example.

In [None]:
model = Yolov3(type="regular", classes=80)
model.load_weights_from_dn(dn2tf_backbone = True, dn2tf_head = True)
model.set_prediction_filter()
model.make_predict_function()
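
To make the batching remark concrete, here is a minimal NumPy sketch of the difference between a single-image input and a batched one. The 416x416 frame size and the helper names are illustrative assumptions, not values taken from the repository; the real model reports its expected size through `model.input_image_size`.

```python
import numpy as np

# Hypothetical frame size, chosen only for illustration.
HEIGHT, WIDTH = 416, 416

def as_single_batch(frame: np.ndarray) -> np.ndarray:
    """Add a leading batch axis: (H, W, 3) -> (1, H, W, 3)."""
    return frame[np.newaxis, ...]

def as_batch(frames: list) -> np.ndarray:
    """Stack equally sized frames: N x (H, W, 3) -> (N, H, W, 3)."""
    return np.stack(frames, axis=0)

frames = [np.zeros((HEIGHT, WIDTH, 3), dtype=np.float32) for _ in range(4)]

single = as_single_batch(frames[0])  # one predict call per frame
batch = as_batch(frames)             # one predict call for all four frames

print(single.shape)  # (1, 416, 416, 3)
print(batch.shape)   # (4, 416, 416, 3)
```

A webcam stream produces one frame at a time, so this tutorial always works with the `(1, H, W, 3)` case.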

Below is a function that infers bounding boxes from an image supplied as a URI, using the YOLO model. The frontend passes each webcam image to the backend function in Colab as a URI. When the backend function receives the image, it decodes it into a 3-dimensional NumPy array of integers (0 to 255) with shape (height, width, color channels); note that OpenCV orders the channels as BGR. Next, it uses TensorFlow to normalize the pixels and resize the image. After that, it uses the model to predict bounding boxes for any objects that appear in the image. Finally, the bounding boxes are converted to a JSON object so they can be returned to the frontend and shown there.
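
The URI in question is a `data:` URI produced by the canvas in the frontend, so the image bytes travel inside the string itself. As a small self-contained sketch (the helper names `make_data_uri` and `read_data_uri` and the sample bytes are made up for illustration), the round trip looks roughly like this using only the standard library:

```python
import base64
import urllib.request

def make_data_uri(payload: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw bytes as a base64 data URI, similar to canvas.toDataURL()."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def read_data_uri(uri: str) -> bytes:
    """Recover the raw bytes; urlopen understands the data: scheme."""
    with urllib.request.urlopen(uri) as response:
        return response.read()

# Placeholder bytes standing in for a JPEG-compressed frame.
payload = b"\xff\xd8\xff\xe0 sample bytes, not a real JPEG"
uri = make_data_uri(payload)

print(uri[:23])                       # data:image/jpeg;base64,
print(read_data_uri(uri) == payload)  # True
```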

In [8]:
# An OpenCV flag that decodes the input as a 3-channel color image (OpenCV
# uses BGR channel order). The flag has a different name in OpenCV 2 than in
# OpenCV 3 and later, so a condition is needed to work with both.
COLOR_IMAGE = cv2.IMREAD_COLOR if int(cv2.__version__.split('.', 1)[0]) >= 3 \
  else cv2.CV_LOAD_IMAGE_COLOR

def yolo_infer(uri: str) -> JSON:
  # Decode the image
  with urllib.request.urlopen(uri) as response:
    data = response.read()
  img_buf = np.frombuffer(data, np.uint8)
  img = cv2.imdecode(img_buf, COLOR_IMAGE)
  
  # Preprocess the image
  mat = tf.cast(img, tf.float16)
  mat /= 255
  mat = tf.expand_dims(mat, axis = 0)
  mat = tf.image.resize(mat, model.input_image_size)

  # Run the inference
  pred = model.predict(mat)

  # Rescale bounding boxes to the 500x375 canvas used by the frontend
  out, classes = int_scale_boxes(pred[0], pred[1], 500, 375)

  # Convert the format of the bounding boxes to match the format in the
  # HTML document shown below: [x1, x2, y1, y2, c, p]
  out = out[0].numpy()
  classes = classes[0].numpy()
  boxes = []
  for i in range(out.shape[0]):
    if out[i][3] == 0:
      break
    # Convert NumPy scalars to plain Python numbers so the list is
    # JSON-serializable.
    boxes.append(out[i].tolist() + [int(classes[i]), float(pred[2][0][i])])

  # Return control to the client side (JavaScript in the HTML document)
  return JSON(boxes)

output.register_callback('yolo_infer', yolo_infer)
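
The conversion loop in `yolo_infer` can be tried out without the model. The sketch below assumes, as the loop does, that the box array is zero-padded after the last detection; the sample values and the `boxes_to_rows` helper are invented for illustration. Plain `int` and `float` values are used so the result serializes with the standard `json` module.

```python
import json

def boxes_to_rows(boxes, classes, scores):
    """Build [x1, x2, y1, y2, c, p] rows, stopping at the zero padding."""
    rows = []
    for box, cls, score in zip(boxes, classes, scores):
        if box[3] == 0:  # a zero in the last coordinate marks padding
            break
        rows.append([int(v) for v in box] + [int(cls), float(score)])
    return rows

# Invented sample data: two detections followed by one padded row.
boxes = [[10, 120, 20, 200], [300, 450, 50, 330], [0, 0, 0, 0]]
classes = [0, 16, 0]
scores = [0.93, 0.58, 0.0]

rows = boxes_to_rows(boxes, classes, scores)
print(json.dumps(rows))
# [[10, 120, 20, 200, 0, 0.93], [300, 450, 50, 330, 16, 0.58]]
```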

Below is a frontend interface that accesses the webcam, streams the images to the backend, and displays the bounding boxes to the user. The stream uses JPEG compression to speed up the transfer of images to Colab. The `yolo_infer` function registered earlier is called on each compressed JPEG image, and the resulting bounding boxes are drawn on the webcam images once they are received from the backend.

In [None]:
%%html
<!--
  Sources:
    https://www.kirupa.com/html5/accessing_your_webcam_in_html5.htm
    https://developer.mozilla.org/en-US/docs/Web/Guide/Audio_and_video_manipulation -->

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
#container {
	margin: 0px auto;
	width: 500px;
	height: 375px;
	border: 10px #333 solid;
}
#videoElement {
	width: 500px;
	height: 375px;
	background-color: #666;
}
#my-canvas {
	background-color: #666;
}
</style>
</head>
 
<body>
<div id="container">
  <canvas id="my-canvas" width="500" height="375"></canvas>
	<video autoplay="true" id="videoElement" style="visibility:hidden"></video>
</div>

<button id="toggleWebcam">Start Webcam</button>

<script src="https://code.jquery.com/jquery-3.5.1.min.js" integrity="sha256-9/aliU8dGd2tb6OSsuzixeV4y/faTqgFtohetphbbj0=" crossorigin="anonymous"></script>
<script>
  // Load COCO class names from the repo and give the classes seemingly random, but distinct, colors
  var classes;
  var colors = [];
  $.ajax({
    url: 'https://raw.githubusercontent.com/PurdueCAM2Project/TensorFlowModelGardeners/master/yolo/dataloaders/dataset_specs/coco.names',
    success: function(data) {
      classes = data.split('\n');
      for (var i = 0; i < classes.length; i++) {
        // Zero-pad to six hex digits so small values are still valid CSS colors
        colors.push("#" + Math.round(0x1000000 * (i / classes.length)).toString(16).padStart(6, "0"));
      }
      for (var i = colors.length - 1; i >= 0; i--) {
        // Fisher-Yates shuffle step: swap entry i with a random earlier entry
        var j = Math.floor(Math.random() * (i + 1));
        var x = colors[i];
        colors[i] = colors[j];
        colors[j] = x;
      }
    }
  });

  var video = document.querySelector("#videoElement");
  var toggleWebcamButton = document.querySelector("#toggleWebcam");
  var camOn = false;

  function startWebcam(e) {
    camOn = true;
    if (navigator.mediaDevices.getUserMedia) {
      navigator.mediaDevices.getUserMedia({ video: true })
        .then(function (stream) {
          video.srcObject = stream;
        })
        .catch(function (error) {
          console.log("Something went wrong!", error);
        });
    }
  }

  function stopWebcam(e) {
    var stream = video.srcObject;
    var tracks = stream.getTracks();

    for (var i = 0; i < tracks.length; i++) {
      var track = tracks[i];
      track.stop();
    }

    video.srcObject = null;
    camOn = false;
  }

  function toggleWebcam(e) {
    if (camOn) {
      stopWebcam(e);
      toggleWebcamButton.innerText = "Start Webcam";
    } else {
      startWebcam(e);
      toggleWebcamButton.innerText = "Stop Webcam";
      processor.doLoad();
    }
  }

  toggleWebcamButton.addEventListener("click", toggleWebcam);

  var processor = {  
    timerCallback: async function() {  
      if (!camOn) {  
        return;  
      }  
      await this.computeFrame();  
      var self = this;  
      setTimeout(function () {  
        self.timerCallback();  
      }, 0);
    },

    doLoad: function() {
      this.video = document.getElementById("videoElement");
      this.c1 = document.getElementById("my-canvas");
      this.ctx1 = this.c1.getContext("2d");
      this.lastFrameBoundingBoxes = [];

      this.width = 500;  
      this.height = 375;  
      this.timerCallback();
    },  

    computeFrame: async function() {
      this.ctx1.drawImage(this.video, 0, 0, this.width, this.height);
      var frame = this.ctx1.getImageData(0, 0, this.width, this.height);

      this.ctx1.putImageData(frame, 0, 0);
      var url = this.c1.toDataURL('image/jpeg', 0.8);
      this.drawBoundingBoxes(this.lastFrameBoundingBoxes);
      var result = await google.colab.kernel.invokeFunction('yolo_infer', [url], {});
      this.lastFrameBoundingBoxes = result.data['application/json'];
      return;
    },

    drawBoundingBoxes: function(boxes) {
      for (var i = 0; i < boxes.length; i++) {
        // Each box is [x1, x2, y1, y2, class index, confidence]
        const [x1, x2, y1, y2, c, p] = boxes[i];
        this.ctx1.beginPath();
        this.ctx1.lineWidth = 2;
        this.ctx1.strokeStyle = colors[c];
        this.ctx1.rect(x1, y1, x2-x1, y2-y1);
        this.ctx1.stroke();
        this.ctx1.font = "18px Monospace";
        this.ctx1.fillStyle = colors[c];
        this.ctx1.fillText(classes[c] + ", " + p.toFixed(2), x1, y1 - 3);
      }
    }
  };
</script>
</body>
</html>
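
For readers more at home in Python, the idea behind the per-class colors in the script above can be sketched as follows. It spreads the class indices evenly over the 24-bit RGB range, formats each value as a zero-padded six-digit hex color, and shuffles the list so neighboring classes do not end up with similar colors. The function name and the fixed seed are illustrative choices, not part of the notebook.

```python
import random

def class_colors(num_classes: int, seed: int = 0) -> list:
    """Return one distinct '#rrggbb' color string per class."""
    colors = [
        # Zero-pad to six hex digits so small values stay valid CSS colors.
        "#{:06x}".format(round(0x1000000 * i / num_classes))
        for i in range(num_classes)
    ]
    # Shuffle with a seeded generator so the result is reproducible.
    random.Random(seed).shuffle(colors)
    return colors

colors = class_colors(80)
print(len(colors), len(set(colors)))  # 80 80
```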