# Recognize and Count Objects in a Video

An object detection classifier can be used to identify and locate objects in a static image. When working with video, you can apply the same static detection to individual frames. In this Jupyter Notebook, we'll use [IBM Maximo Visual Inspection](https://www.ibm.com/products/ibm-maximo-visual-inspection) for object detection and the [OpenCV Python API](https://opencv.org/) to process the video.

Before running this notebook, you will need to train and deploy an object detection model. Maximo Visual Inspection (formerly PowerAI Vision) offers auto-labeling, which can enhance your dataset and improve accuracy when using video input. After you train and deploy your model, set the `MAXIMO_VISUAL_INSPECTION_API_URL` constant below so that the notebook uses your model for inference.

Extracting frames and locating objects is easy with OpenCV and IBM Maximo Visual Inspection (formerly known as PowerAI Vision). The challenge is keeping track of objects if you want to count them. As an object moves, you need to determine whether you have already counted it. In this notebook, we'll use the OpenCV Tracking API to follow cars down the road while we run Maximo Visual Inspection object detection on a sample of the frames. With tracking, we can avoid double counting without writing a lot of code.


| First Detected... | Followed Down the Road |
| :---: | :---: |
| ![detected](https://raw.githubusercontent.com/IBM/powerai-counting-cars/master/doc/source/images/output-frame_00011.jpg) | ![tracked](https://raw.githubusercontent.com/IBM/powerai-counting-cars/master/doc/source/images/output-frame_00128.jpg) |


## First, set up some parameters

### Required setup!

Set the API endpoint of the Maximo Visual Inspection model that you trained and deployed here.

In [None]:
# Set this URL to the API endpoint of your deployed model.
MAXIMO_VISUAL_INSPECTION_API_URL = "https://ny1.ptopenlab.com/powerai-vision-ny/api/dlapis/your-guid-here"


### Optional configuration

Here you can customize some settings to tune your results.

> NOTE: The notebook uses sampling and cached results to speed things up for iterative development. If you change the video, you will need to run with `CLEAN = True` to delete and regenerate your cached frames and inference results!

In [None]:
CLEAN = False  # USE WITH CARE! Wipe out saved files when this is true (else reuse for speed)
input_video_url = "https://raw.githubusercontent.com/IBM/powerai-counting-cars/master/data/test_video.mp4"  # The input video
START_LINE = 0  # If start line is > 0, cars won't be added until below the line (try 200)
FRAMES_DIR = "frames"  # Output dir to hold/cache the original frames
OUTPUT_DIR = "output"  # Output dir to hold the annotated frames
SAMPLING = 10  # Classify every n frames (use tracking in between)
CONFIDENCE = 0.80  # Confidence threshold to filter iffy objects

# OpenCV colors are (B, G, R) tuples -- RGB in reverse
WHITE = (255, 255, 255)
YELLOW = (66, 244, 238)
GREEN = (80, 220, 60)
LIGHT_CYAN = (255, 255, 224)
DARK_BLUE = (139, 0, 0)
GRAY = (128, 128, 128)

## Install Python Requirements

Install Python packages from pypi.org.
We're pinning versions here to what was last tested.
You might want to comment these out after you have the packages, or update the versions if you
want to try the latest.

In [None]:
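# Install only the contrib package: it includes the main OpenCV modules plus the
# tracking API (cv2.TrackerKCF_create), and it must not be installed alongside opencv-python.
!pip install opencv-contrib-python==3.4.4.19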
!pip install requests==2.21.0
!pip install pandas==0.23.4
!pip install urllib3==1.24.1

In [None]:
import json
import glob
import math
import os
import shutil

import cv2
from IPython.display import clear_output, Image, display
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
print("Warning: Certificates not verified!")

%matplotlib notebook


## Download the video
This will download a small example video.


In [None]:
# -nc (no-clobber) skips the download if the file already exists
!wget -nc {input_video_url}
input_video = input_video_url.split('/')[-1]

## Create or clean the directories
Caching the frames and output directories allows the processing to continue where it left off. This is particularly useful when using a shared system with deployment time limits. This also allows you to quickly `Run all` when tweaking Python code that does not affect the inference.

If you change the input video or just want a fresh start, set `CLEAN = True` or change the directory names.

In [None]:
if CLEAN:
    if os.path.isdir(FRAMES_DIR):
        shutil.rmtree(FRAMES_DIR)
    if os.path.isdir(OUTPUT_DIR):
        shutil.rmtree(OUTPUT_DIR)

if not os.path.isdir(FRAMES_DIR):
    os.mkdir(FRAMES_DIR)
if not os.path.isdir(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)
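The same create-or-clean logic can be written with `os.makedirs(..., exist_ok=True)` and `shutil.rmtree(..., ignore_errors=True)`, which avoids the separate `isdir` checks. This is a minimal sketch: `reset_dirs` is a hypothetical helper name, not part of the original notebook, and the usage below works in a throwaway temporary directory rather than in `frames` and `output`.

```python
import os
import shutil
import tempfile


def reset_dirs(dirs, clean=False):
    """Make sure each directory exists; with clean=True, delete it first (hypothetical helper)."""
    for d in dirs:
        if clean:
            # ignore_errors=True turns this into a no-op when the directory is missing
            shutil.rmtree(d, ignore_errors=True)
        # exist_ok=True leaves an existing (cached) directory and its files alone
        os.makedirs(d, exist_ok=True)


# Usage in a temporary directory
frames = os.path.join(tempfile.mkdtemp(), "frames")
reset_dirs([frames])
open(os.path.join(frames, "frame_00001.jpg"), "w").close()
reset_dirs([frames])              # cached file is kept
print(os.listdir(frames))         # ['frame_00001.jpg']
reset_dirs([frames], clean=True)  # cache is wiped
print(os.listdir(frames))         # []
```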

## Parse and explode the video file into JPEGs
Each frame is saved as an individual JPEG file for later use.

In [None]:
if os.path.isfile(input_video):
    video_capture = cv2.VideoCapture(input_video)
else:
    raise Exception("File %s doesn't exist!" % input_video)

total_frames = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
print("Frame count estimate is %d" % total_frames)

num = 0
while video_capture.get(cv2.CAP_PROP_POS_FRAMES) < video_capture.get(cv2.CAP_PROP_FRAME_COUNT):
    success, image = video_capture.read()
    if success:
        num = int(video_capture.get(cv2.CAP_PROP_POS_FRAMES))
        print("Writing frame {num} of {total_frames}".format(
            num=num, total_frames=total_frames), end="\r")
        cv2.imwrite('{frames_dir}/frame_{num:05d}.jpg'.format(
            frames_dir=FRAMES_DIR, num=num), image)
    else:
        # TODO: If this happens, we need to add retry code
        raise Exception('Error reading frame_{num:05d}.jpg'.format(
            num=num + 1))

print("\nWrote {num} frames".format(num=num))

FRAME_FPS = int(video_capture.get(cv2.CAP_PROP_FPS))
FRAME_WIDTH = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
FRAME_HEIGHT = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
video_capture.release()  # All frames and properties have been read
ROI_YMAX = int(round(FRAME_HEIGHT * 0.75))  # Bottom quarter = finish line

print("Frame Dimensions: %sx%s" % (FRAME_WIDTH, FRAME_HEIGHT))
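The frame files are named with a zero-padded counter (`frame_{num:05d}.jpg`). That padding is what lets the later cells process frames in order with a plain `sorted(glob.glob(...))`: lexicographic order matches numeric order only when every number has the same width. The short sketch below, with a hypothetical `frame_name` helper, shows the difference. Five digits cover up to 99,999 frames, roughly 55 minutes at 30 fps.

```python
def frame_name(num, width=5):
    """Build a frame file name with a zero-padded counter (hypothetical helper)."""
    return "frame_{num:0{width}d}.jpg".format(num=num, width=width)


names = [frame_name(n) for n in (1, 2, 10, 100)]
print(names)                   # ['frame_00001.jpg', 'frame_00002.jpg', 'frame_00010.jpg', 'frame_00100.jpg']
print(sorted(names) == names)  # True: string order matches frame order

# Without padding, string sorting puts frame 2 after frame 100
unpadded = ["frame_{}.jpg".format(n) for n in (1, 2, 10, 100)]
print(sorted(unpadded))        # ['frame_1.jpg', 'frame_10.jpg', 'frame_100.jpg', 'frame_2.jpg']
```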

## Maximo Visual Inspection inference wrapper
Define a helper/wrapper to call Maximo Visual Inspection and return the inference result.

In [None]:
s = requests.Session()


def detect_objects(filename):
    """POST an image file to the deployed model; return (HTTP status code, parsed JSON)."""
    with open(filename, 'rb') as f:
        # WARNING! verify=False is here to allow an untrusted cert!
        r = s.post(MAXIMO_VISUAL_INSPECTION_API_URL,
                   files={'files': (filename, f)},
                   verify=False)

    return r.status_code, json.loads(r.text)


## Test the API on a single frame
Let's look at the result of a single inference operation from the Maximo Visual Inspection object detection API. We get a standard HTTP return code and a JSON response that includes the image URL and, for each detected object, its confidence and bounding-box coordinates.

In [None]:
rc, jsonresp = detect_objects('frames/frame_00100.jpg')

print("rc = %d" % rc)
print("jsonresp: %s" % jsonresp)
if 'classified' in jsonresp:
    print("Got back %d objects" % len(jsonresp['classified']))
print(json.dumps(jsonresp, indent=2))
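The response can be reduced to just the detections worth keeping. This sketch runs on a hand-written, hypothetical response rather than real model output: the `classified` and `result` keys come from the cells in this notebook, while the `label` and `confidence` key names are assumptions to check against the JSON printed above. `confident_objects` is a hypothetical helper that applies a threshold like `CONFIDENCE` and computes each box's center.

```python
# Hypothetical response for illustration only; not real model output.
# The 'label' and 'confidence' key names are assumptions.
sample_response = {
    "result": "success",
    "classified": [
        {"label": "car", "confidence": 0.97,
         "xmin": 100, "ymin": 200, "xmax": 180, "ymax": 260},
        {"label": "car", "confidence": 0.55,
         "xmin": 400, "ymin": 150, "xmax": 450, "ymax": 190},
    ],
}


def confident_objects(response, threshold):
    """Return (label, confidence, center) for each object at or above threshold."""
    if response.get("result") != "success":
        return []
    found = []
    for obj in response.get("classified", []):
        if obj["confidence"] < threshold:
            continue  # drop low-confidence detections
        # Integer center via floor division
        center = ((obj["xmin"] + obj["xmax"]) // 2, (obj["ymin"] + obj["ymax"]) // 2)
        found.append((obj.get("label"), obj["confidence"], center))
    return found


print(confident_objects(sample_response, 0.80))  # [('car', 0.97, (140, 230))]
```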


## Get object detection results for sampled frames
Since we've stored all video frames on disk (for easy reference), we can iterate over those files
and make queries as appropriate to Maximo Visual Inspection's API. We'll store the results in a
`tracking_results` dictionary, organized by file name. Since we are tracking objects from frame
to frame, we can use sampling to decide how often to check for new objects.

We're also caching the results so that you can change later code and rerun the notebook
without repeating the same inference.

In [None]:
# Serialize requests, storing them in a "tracking_results" dict

try:
    with open('frames/frame-data-newmodel.json') as existing_results:
        tracking_results = json.load(existing_results)
except Exception:
    # Any fail to read existing results means we start over
    tracking_results = {}

print("Sampling every %sth frame" % SAMPLING)
i = 0
cache_used = 0
sampled = 0
for filename in sorted(glob.glob('frames/frame_*.jpg')):
    i += 1

    if not i % SAMPLING == 0:  # Sample every Nth
        continue

    existing_result = tracking_results.get(filename)
    if existing_result and existing_result['result'] == 'success':
        cache_used += 1
    else:
        rc, results = detect_objects(filename)
        if rc != 200 or results['result'] != 'success':
            print("ERROR rc=%d for %s" % (rc, filename))
            print("ERROR result=%s" % results)
        else:
            sampled += 1
            # Save frequently to cache partial results
            tracking_results[filename] = results
            with open('frames/frame-data-newmodel.json', 'w') as fp:
                json.dump(tracking_results, fp)

    print("Processed file {num} of {total_frames} (used cache {cache_used} times)".format(
        num=i, total_frames=total_frames, cache_used=cache_used), end="\r")

# Finally, write all our results
with open('frames/frame-data-newmodel.json', 'w') as fp:
    json.dump(tracking_results, fp)

print("\nDone")
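One weakness of rewriting the cache with `json.dump` after each frame is that an interrupted run can leave a truncated JSON file, which the loader above then silently discards. A minimal sketch of a safer pattern, assuming a local filesystem: write to a temporary file in the same directory and swap it in with `os.replace`. The `load_cache`, `save_cache`, and `sampled_frames` names are hypothetical; `sampled_frames` reproduces the notebook's 1-based `i % SAMPLING == 0` selection.

```python
import json
import os
import tempfile


def load_cache(path):
    """Return cached results, or an empty dict if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        # ValueError also covers json.JSONDecodeError, e.g. a truncated file
        return {}


def save_cache(path, results):
    """Write results to a temporary file next to path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(results, f)
    # os.replace is atomic on POSIX, so readers see either the old or the new file
    os.replace(tmp_path, path)


def sampled_frames(filenames, sampling):
    """Yield every Nth file in sorted order, using 1-based counting like the cell above."""
    for i, name in enumerate(sorted(filenames), start=1):
        if i % sampling == 0:
            yield name


names = ["frame_%05d.jpg" % n for n in range(1, 31)]
print(list(sampled_frames(names, 10)))
# ['frame_00010.jpg', 'frame_00020.jpg', 'frame_00030.jpg']
```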


## Define helper functions for tracking and drawing labels
Refer to the [OpenCV docs](https://docs.opencv.org/3.4.1/).

In [None]:
def label_object(color, textcolor, fontface, image, car, textsize, thickness, xmax, xmid, xmin, ymax, ymid, ymin):
    cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, thickness)
    pos = (xmid - textsize[0]//2, ymid + textsize[1]//2)
    cv2.putText(image, car, pos, fontface, 1, textcolor, thickness, cv2.LINE_AA)


def update_trackers(image, counters):
    left_lane = counters['left_lane']
    right_lane = counters['right_lane']
    boxes = []
    color = (80, 220, 60)
    fontface = cv2.FONT_HERSHEY_SIMPLEX
    fontscale = 1
    thickness = 1

    # Iterate over a copy: deleting from trackers while enumerating it
    # would skip the tracker that follows each deleted one
    for pair in list(trackers):
        tracker, car = pair
        textsize, _baseline = cv2.getTextSize(
            car, fontface, fontscale, thickness)
        success, bbox = tracker.update(image)

        if not success:
            counters['lost_trackers'] += 1
            trackers.remove(pair)
            continue

        boxes.append(bbox)  # Return updated box list

        xmin = int(bbox[0])
        ymin = int(bbox[1])
        xmax = int(bbox[0] + bbox[2])
        ymax = int(bbox[1] + bbox[3])
        xmid = int(round((xmin+xmax)/2))
        ymid = int(round((ymin+ymax)/2))

        if ymid >= ROI_YMAX:
            label_object(WHITE, WHITE, fontface, image, car, textsize, 1, xmax, xmid, xmin, ymax, ymid, ymin)
            # Count left-lane, right-lane as cars ymid crosses finish line
            if xmid < 630:
                left_lane += 1
            else:
                right_lane += 1
            # Stop tracking cars when they hit finish line
            trackers.remove(pair)
        else:
            # Rectangle and number on the cars we are tracking
            label_object(color, YELLOW, fontface, image, car, textsize, 4, xmax, xmid, xmin, ymax, ymid, ymin)

    # Add finish line overlay/line
    overlay = image.copy()

    # Shade region of interest (ROI). We're really just using the top line.
    cv2.rectangle(overlay,
                  (0, ROI_YMAX),
                  (FRAME_WIDTH, FRAME_HEIGHT), DARK_BLUE, cv2.FILLED)
    cv2.addWeighted(overlay, 0.6, image, 0.4, 0, image)

    # Draw start line, if > 0
    if START_LINE > 0:
        cv2.line(image, (0, START_LINE), (FRAME_WIDTH, START_LINE), GRAY, 4, cv2.LINE_AA)
    # Draw finish line with lane hash marks
    cv2.line(image, (0, ROI_YMAX), (FRAME_WIDTH, ROI_YMAX), LIGHT_CYAN, 4, cv2.LINE_AA)
    cv2.line(image, (350, ROI_YMAX - 20), (350, ROI_YMAX + 20), LIGHT_CYAN, 4, cv2.LINE_AA)
    cv2.line(image, (630, ROI_YMAX - 20), (630, ROI_YMAX + 20), LIGHT_CYAN, 4, cv2.LINE_AA)
    cv2.line(image, (950, ROI_YMAX - 20), (950, ROI_YMAX + 20), LIGHT_CYAN, 4, cv2.LINE_AA)

    # Add lane counter
    cv2.putText(image, "Lane counter:", (30, ROI_YMAX + 80), fontface, 1.5, LIGHT_CYAN, 4, cv2.LINE_AA)
    cv2.putText(image, str(left_lane), (480, ROI_YMAX + 80), fontface, 1.5, LIGHT_CYAN, 4, cv2.LINE_AA)
    cv2.putText(image, str(right_lane), (800, ROI_YMAX + 80), fontface, 1.5, LIGHT_CYAN, 4, cv2.LINE_AA)
    seconds = counters['frames'] / FRAME_FPS
    cv2.putText(image, "Cars/second:", (35, ROI_YMAX + 110), fontface, 0.5, LIGHT_CYAN, 1, cv2.LINE_AA)
    cv2.putText(image, '{0:.2f}'.format(left_lane / seconds), (480, ROI_YMAX + 110), fontface, 0.5, LIGHT_CYAN, 1, cv2.LINE_AA)
    cv2.putText(image, '{0:.2f}'.format(right_lane / seconds), (800, ROI_YMAX + 110), fontface, 0.5, LIGHT_CYAN, 1, cv2.LINE_AA)

    counters['left_lane'] = left_lane
    counters['right_lane'] = right_lane
    return boxes, counters
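The counting rule inside `update_trackers` is easier to see without the drawing code. This sketch keeps only that logic: convert each OpenCV `(x, y, width, height)` box to a center point the same way the notebook does, count it in the left or right lane once the center reaches the finish line, and stop tracking it. The helper names are hypothetical, and the boxes, frame size, and frame rate in the usage are made-up inputs; the lane split at x = 630 is the value hard-coded above for the sample video.

```python
def bbox_center(bbox):
    """Integer center of an OpenCV (x, y, width, height) box, computed as the notebook does."""
    xmin, ymin = int(bbox[0]), int(bbox[1])
    xmax, ymax = int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3])
    return int(round((xmin + xmax) / 2)), int(round((ymin + ymax) / 2))


def update_counts(boxes, counts, finish_y, lane_split_x):
    """Count boxes whose center reached finish_y by lane; return the boxes still in play."""
    remaining = []
    for bbox in boxes:
        xmid, ymid = bbox_center(bbox)
        if ymid >= finish_y:
            lane = "left_lane" if xmid < lane_split_x else "right_lane"
            counts[lane] += 1  # counted once, then no longer tracked
        else:
            remaining.append(bbox)
    return remaining


def rates(counts, frames, fps):
    """Cars per second in each lane over the elapsed video time."""
    seconds = frames / fps
    return counts["left_lane"] / seconds, counts["right_lane"] / seconds


# Made-up boxes for one frame of a 720-pixel-high video (finish line at 0.75 * 720 = 540)
counts = {"left_lane": 0, "right_lane": 0}
demo_boxes = [(500, 520, 80, 60), (700, 530, 100, 40), (300, 100, 80, 60)]
demo_boxes = update_counts(demo_boxes, counts, finish_y=540, lane_split_x=630)
print(counts)                            # {'left_lane': 1, 'right_lane': 1}
print(demo_boxes)                        # [(300, 100, 80, 60)]
print(rates(counts, frames=60, fps=30))  # (0.5, 0.5)
```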


In [None]:
def not_tracked(objects, boxes):
    """Return the detected objects that do not overlap any currently tracked box."""
    if not objects:
        return []  # No new classified objects to search for
    if not boxes:
        return objects  # No existing boxes, return all objects

    new_objects = []
    for obj in objects:
        ymin = obj["ymin"]
        ymax = obj["ymax"]
        ymid = int(round((ymin+ymax)/2))
        xmin = obj["xmin"]
        xmax = obj["xmax"]
        xmid = int(round((xmin+xmax)/2))
        # Centers closer than the average of width and height count as the same object
        box_range = ((xmax - xmin) + (ymax - ymin)) / 2
        for bbox in boxes:
            bxmin = int(bbox[0])
            bymin = int(bbox[1])
            bxmax = int(bbox[0] + bbox[2])
            bymax = int(bbox[1] + bbox[3])
            bxmid = int((bxmin + bxmax) / 2)
            bymid = int((bymin + bymax) / 2)
            if math.sqrt((xmid - bxmid)**2 + (ymid - bymid)**2) < box_range:
                # found existing, so break (do not add to new_objects)
                break
        else:
            new_objects.append(obj)

    return new_objects
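`not_tracked` treats a detection as already tracked when its center lies closer to a tracked box's center than the average of the detection's width and height. Here is the same test as a standalone sketch on plain tuples, with a hypothetical `is_already_tracked` name; it uses floating-point centers, so results can differ from the notebook's integer rounding by a pixel at the boundary.

```python
import math


def center(xmin, ymin, xmax, ymax):
    """Center point of a box given by its corners."""
    return (xmin + xmax) / 2, (ymin + ymax) / 2


def is_already_tracked(det, tracked_boxes):
    """True if a detection lies near any tracked box (same rule as not_tracked).

    det: dict with xmin/ymin/xmax/ymax keys, as in the detection results.
    tracked_boxes: (x, y, width, height) tuples, as returned by OpenCV trackers.
    """
    cx, cy = center(det["xmin"], det["ymin"], det["xmax"], det["ymax"])
    # Matching radius: the average of the detection's width and height
    box_range = ((det["xmax"] - det["xmin"]) + (det["ymax"] - det["ymin"])) / 2
    for x, y, w, h in tracked_boxes:
        bx, by = center(x, y, x + w, y + h)
        if math.hypot(cx - bx, cy - by) < box_range:
            return True
    return False


det = {"xmin": 100, "ymin": 100, "xmax": 160, "ymax": 140}
print(is_already_tracked(det, [(110, 105, 60, 40)]))  # True: centers about 11 px apart
print(is_already_tracked(det, [(400, 400, 60, 40)]))  # False: far outside the 50 px radius
```

Because the radius scales with the box, large nearby cars and small distant cars each get a proportionate matching distance.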


In [None]:
def in_range(obj):
    ymin = obj['ymin']
    ymax = obj['ymax']
    if ymin < START_LINE or ymax > ROI_YMAX:
        # Don't add new trackers before start or after finish.
        # Start line can help avoid overlaps and tracker loss.
        # Finish line protection avoids counting the car twice.
        return False
    return True
    
def add_new_object(obj, image, cars):
    car = str(cars)
    xmin = obj['xmin']
    xmax = obj['xmax']
    ymin = obj['ymin']
    ymax = obj['ymax']
    xmid = int(round((xmin+xmax)/2))
    ymid = int(round((ymin+ymax)/2))
    fontface = cv2.FONT_HERSHEY_SIMPLEX
    fontscale = 1
    thickness = 1
    textsize, _baseline = cv2.getTextSize(
        car, fontface, fontscale, thickness)

    # init tracker
    tracker = cv2.TrackerKCF_create()  # Note: Try comparing KCF with MIL
    # OpenCV 3.x returns a bool here; newer 4.x releases return None instead
    success = tracker.init(image, (xmin, ymin, xmax-xmin, ymax-ymin))
    if success or success is None:
        trackers.append((tracker, car))

    label_object(GREEN, YELLOW, fontface, image, car, textsize, 4, xmax, xmid, xmin, ymax, ymid, ymin)
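Before a tracker is created, the detection's corner coordinates are converted to the `(x, y, width, height)` tuple that `tracker.init` expects, and `in_range` rejects boxes that start above `START_LINE` or end below the finish line. This sketch restates both steps as pure functions and adds a `CONFIDENCE`-style threshold check; `to_tracker_box` and `should_start_tracking` are hypothetical names, and the `confidence` key is an assumption about the response format.

```python
def to_tracker_box(obj):
    """Convert corner coordinates to the (x, y, width, height) tuple trackers take."""
    return (obj["xmin"], obj["ymin"],
            obj["xmax"] - obj["xmin"], obj["ymax"] - obj["ymin"])


def should_start_tracking(obj, start_line, finish_line, min_confidence):
    """Same gate as in_range(), plus a confidence threshold.

    Image y grows downward, so the box must start at or below start_line and
    end at or above finish_line. The 'confidence' key is an assumption; objects
    without it are kept.
    """
    if obj.get("confidence", 1.0) < min_confidence:
        return False
    return obj["ymin"] >= start_line and obj["ymax"] <= finish_line


demo_obj = {"xmin": 100, "ymin": 250, "xmax": 180, "ymax": 310, "confidence": 0.93}
print(to_tracker_box(demo_obj))                         # (100, 250, 80, 60)
print(should_start_tracking(demo_obj, 200, 540, 0.80))  # True
print(should_start_tracking(demo_obj, 300, 540, 0.80))  # False: top edge is above the start line
```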


## Inference, tracking, and annotation
Loop through the saved frames and:
1. Update the trackers to follow already detected objects from frame to frame.
1. Look for new objects if we ran inference on this frame.
    * Check for overlap with tracked objects.
    * If no overlap, assign a sequence number and start tracking.
1. Write an annotated image with tracked objects highlighted and numbered.
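To see why this loop avoids double counting, here is a toy simulation with no video or model. It is a sketch under strong assumptions: two hypothetical cars move straight down at a known speed, a fake detector reports their exact centers on every 5th frame, and each "tracker" simply moves its point at that same speed instead of running OpenCV. Counting every detection (`naive`) counts the same car many times, while counting only detections that are not near an existing track (`counted`) finds each car once. The `SIM_*` constants are made up and deliberately named so they do not clash with the notebook's settings.

```python
import math

SIM_FINISH_Y = 540  # hypothetical finish line: 0.75 * a 720-pixel frame height
SIM_SAMPLING = 5    # run the fake detector on every 5th frame
SIM_SPEED = 12      # hypothetical pixels per frame, straight down the image

# Hypothetical starting centers (x, y) for two cars
SIM_START = {"a": (300, 0), "b": (800, 60)}


def true_positions(frame):
    """Exact car centers on a given frame (stands in for running the model)."""
    return [(x, y + SIM_SPEED * frame) for x, y in SIM_START.values()]


def simulate(frames, match_radius=40):
    """Return (naive, counted) totals after processing frames 1..frames."""
    naive = 0    # every detection counted, no tracking
    counted = 0  # only detections not near an existing track
    tracks = []  # [x, y] centers; stand-in for OpenCV trackers
    for frame in range(1, frames + 1):
        # 1. "Update trackers": move each track by the known speed
        for track in tracks:
            track[1] += SIM_SPEED
        # Stop tracking anything that has reached the finish line
        tracks = [t for t in tracks if t[1] < SIM_FINISH_Y]
        # 2. Look for new objects only on sampled frames
        if frame % SIM_SAMPLING != 0:
            continue
        for x, y in true_positions(frame):
            if y >= SIM_FINISH_Y:
                continue  # like in_range(): ignore objects past the finish line
            naive += 1
            if all(math.hypot(x - tx, y - ty) >= match_radius for tx, ty in tracks):
                counted += 1
                tracks.append([x, y])
    return naive, counted


print(simulate(60))  # (15, 2)
```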

In [None]:
cars = 0
trackers = []
counters = {
    'left_lane':  0,
    'right_lane':  0,
    'lost_trackers': 0,
    'frames': 0,
}

with open('frames/frame-data-newmodel.json') as existing_results:
    tracking_results = json.load(existing_results)

for filename in sorted(glob.glob('frames/frame_*.jpg')):
    counters['frames'] += 1
    img = cv2.imread(filename)
    boxes, counters = update_trackers(img, counters)

    if filename in tracking_results and 'classified' in tracking_results[filename]:
        jsonresp = tracking_results[filename]
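        for obj in not_tracked(jsonresp['classified'], boxes):
            # Apply the CONFIDENCE threshold (assumes each object has a 'confidence' key)
            if obj.get('confidence', 1.0) >= CONFIDENCE and in_range(obj):
                cars += 1
                add_new_object(obj, img, cars)  # Label and start tracking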

    # Draw the running total of cars in the image in the upper-left corner
    cv2.putText(img, 'Cars detected: ' + str(cars), (30, 60),
                cv2.FONT_HERSHEY_SIMPLEX, 1.5, DARK_BLUE, 4, cv2.LINE_AA)
    # Add note with count of trackers lost
    cv2.putText(img, 'Cars lost: ' + str(counters['lost_trackers']), (35, 85),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, DARK_BLUE, 1, cv2.LINE_AA)

    cv2.imwrite(os.path.join(OUTPUT_DIR, "output-" + os.path.basename(filename)), img)
    print("Processed file {num} of {total_frames}".format(
        num=counters['frames'], total_frames=total_frames), end="\r")

print("\nDone")


## Play the annotated frames in the notebook

This code plays the annotated frames one after another to preview the new video.
Playback in the notebook is usually slow; shrinking each frame to half size helps somewhat.
Refer to the following section to build a real, full-speed video.

In [None]:
for filename in sorted(glob.glob(os.path.join(os.path.abspath(OUTPUT_DIR),
                                              'output-frame_*.jpg'))):
    frame = cv2.imread(filename)
    clear_output(wait=True)
    rows, columns, _channels = frame.shape
    frame = cv2.resize(frame, (int(columns/2), int(rows/2)))  # shrink it
    _ret, jpg = cv2.imencode('.jpg', frame)
    display(Image(data=jpg))

print("\nDone")


## Create a video from the annotated frames

This command requires `ffmpeg`. It will combine the annotated
frames to build an MP4 video which you can play at full speed
(the notebook playback above was most likely slow).

Uncomment the command to try running it from this notebook, or
copy the output files to a system with `ffmpeg` and run the
command there.

> NOTE: The command below requires libx264 to encode the video
> stream in the H.264/MPEG-4 AVC compression format. Check that
> `ffmpeg` was configured and built with `--enable-libx264`
> (`ffmpeg 2>&1 | grep libx264`).
> If it was not, remove the `-vcodec libx264` option from the
> following command.

In [1]:
# !ffmpeg -y -r 60 -f image2 -i output/output-frame_%05d.jpg -vcodec libx264 -crf 25  -pix_fmt yuvj420p annotated_video.mp4
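If you prefer to build the command from Python, this sketch assembles the same kind of `ffmpeg` call as an argument list. It differs from the command above in two assumptions worth noting: it passes the frame rate with the image2 `-framerate` input option, so you can use the source rate (for example `FRAME_FPS`) instead of a fixed 60, and it uses `-pix_fmt yuv420p`, which most players support, instead of `yuvj420p`. `ffmpeg_command` and `encode` are hypothetical helper names.

```python
import shlex
import shutil
import subprocess


def ffmpeg_command(pattern, output, fps, use_x264=True):
    """Build an ffmpeg argument list that encodes numbered JPEG frames into a video."""
    # -framerate is an input option, so it must come before -i
    cmd = ["ffmpeg", "-y", "-framerate", str(fps), "-f", "image2", "-i", pattern]
    if use_x264:
        # Needs an ffmpeg build configured with --enable-libx264
        cmd += ["-vcodec", "libx264", "-crf", "25"]
    cmd += ["-pix_fmt", "yuv420p", output]
    return cmd


def encode(cmd):
    """Run the command if ffmpeg is on PATH; return True when it ran successfully."""
    if shutil.which(cmd[0]) is None:
        print("ffmpeg not found; copy the frames to a system that has it")
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True


cmd = ffmpeg_command("output/output-frame_%05d.jpg", "annotated_video.mp4", fps=30)
print(shlex.join(cmd))  # shlex.join requires Python 3.8+
# encode(cmd)  # uncomment to run
```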


<p><font size=-1 color=gray>
&copy; Copyright 2019 IBM Corp. All Rights Reserved.
<p>
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file
except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the
License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the specific language governing permissions and
limitations under the License.
</font></p>