# Demonstrating PowerAI Vision for Sports Advertising

### Overview
This notebook demonstrates how to leverage a model built with PowerAI Vision to track logos at a sporting event and create a video showing both the logos detected within the video as well as keep track of the total time a given logo has been on the screen. PowerAI Vision can do object detection with rectangular boxes (bounding boxes for the objects detected) as well as polygons to detect the edges around the object. In this lab we will use boxes around the advertising logos and we will use a polygon to trace the outside of the goal keeper and the referee. 

This example is a soccer game where we will track the following logos:

- fly erirates
- toyota
- kirin
- continental

as well as the ball.

You can modify the example to use any video you like that has an object detection model deployed in PowerAI Vision. Comments throughout the notebook will help you modify it to your needs.

Special thanks to Michael Hollinger for the initial code and his heavy lifting to create this demo.

In [1]:
# Cell 1

# Import the modules we are going to use throughtout the code example.
import cv2
import glob
import re
import os
import math

from pathlib import Path
from collections import defaultdict
import signal
import requests
import json
import concurrent.futures
import numpy as np

In [2]:
#Cell 2

# Check what version of cv2 we have installed - it should be 3.x 
cv2.__version__

'4.0.0'

#### This next section contains all of the variables that you will need to change to customize this example for the model you trained and deployed in PowerAI Vision.

The first section is just a set of boolean variables to determine if you want to perform all the functions in this notebook or just a subset of them (break the input video into separate frame files, render the images with times and labels, print out additional debug messages and clean up afterwards).

The second section asks for the input video that you want to perform the object detection on and the name of an output video file you want to have rendered.

The next section is the labels that you are using for object detection. These are the labels that you created in your PowerAI Vision training run. We also assign a specific color to each label for what will appear on the screen.

Finally, we need the PowerAI Vision REST API endpoint that you have deployed the model to and the API ID that was assigned by PowerAI Vision when you deployed the model.

In [3]:
# Cell 3

EXPLODE_FRAMES=True            # If True then we will extract the video into jpeg frames
DO_RENDERING=True              # If True then we will redraw the images with the labels and times in the image
DEBUG=True                     # Print out some additional debug code
CLEANUP_ORIGINAL_FRAMES=True   # After the video is recreated, do you want the original frames deleted
CLEANUP_INFERRED_FRAMES=True   # After the video is recreated, do you want the labeled frames deleted


videoclip = "Japan_vs_Australia_Test.mp4"     # This is the video file you want to use for input 
outputclip = "OUTPUT_Japan_vs_Australia_Test.mp4"    # This is the name of the output file with objects detected


# Number of advertisers you are labeling
num_ads=5

# Set the advertisers you are labelling and their associated colors
colortable={
            #advertisers
            'fly emirates': (33,26,215),
            'toyota': (254,0,0),
            'kirin':  (0,209,254),
            'continental':(6,122,10),
            'ball':(63,0,102)
           }

# define the logos you want to fill with polygons
polyfill = {'ball'}

# Values for PowerAI REST API
pai_api = 'api/dlapis/4807904b-cf00-442c-a716-9f24907a6bee'
powerai_baseurl='https://10.31.204.151/powerai-vision/'


#### In this next section we take the input video and break it down into individual frames and store each frame as a jpeg file. We will then use those files as input to the PowerAI Vision rest API to perform object detection on each frame.

We are using cv2 to get individual frames and then we write those frames out to a jpeg file into format frameN.jpg

In [4]:
# Cell 4

def signal_handler(signal, frame):
    # KeyboardInterrupt detected, exiting
    global is_interrupted
    is_interrupted = True
    

# First let's turn the video into a set of JPEG files
if EXPLODE_FRAMES:
    videofile = Path(videoclip)
    if videofile.is_file():
        cap = cv2.VideoCapture(videoclip)
        while not cap.isOpened():
            cap = cv2.VideoCapture(videoclip)
            cv2.waitKey(1000)
            print ("Wait for the header")
        print ("File loaded!")
    else:
        print ("File doesn't exist!")

    
    signal.signal(signal.SIGINT, signal_handler)
    is_interrupted = False
    # get the video size for later rendering and screen placement of the advertising times
    vid_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))-60
    vid_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    FPS = int(cap.get(cv2.CAP_PROP_FPS))
    if DEBUG:
        print("Video is %d x %d" % (vid_width,vid_height))
        print("Video frame rate is %d" % FPS)
        print("Frame count estimate is %d" % cap.get(cv2.CAP_PROP_FRAME_COUNT))  # also print the number of frames in the video
    
    pos_frame = cap.get(cv2.CAP_PROP_POS_FRAMES)
    fail_count = 0;
    while True:
        flag, frame = cap.read()
        frame = frame[0:vid_height,:]
                    
        if flag:
            pos_frame = cap.get(cv2.CAP_PROP_POS_FRAMES)
            name = "frame%d.jpg"%pos_frame
            cv2.imwrite(name, frame)

            #let's only print out every 20 frames to keep the output down for long videos
            if DEBUG:
                if pos_frame % 20 == 0:
                    print (str(pos_frame)+" frames processed")
        else:
            # The next frame is not ready, so we try to read it again
            print ("frame is not ready, retrying at frame %d" % pos_frame)
            cap.set(cv2.CAP_PROP_POS_FRAMES, pos_frame-1)
            fail_count += 1
            if fail_count == 5:
                print("Too many failures, exiting at frame %d" % pos_frame)
                break
 
        if cap.get(cv2.CAP_PROP_POS_FRAMES) == cap.get(cv2.CAP_PROP_FRAME_COUNT):
            # If the number of captured frames is equal to the total number of frames, we stop
            print ("Last frame complete at %d" % pos_frame)
            break
        
# otherwise we already have the JPEG files so skip the exploding
else:
    print("Skipping frame explode...")

File loaded!
Video is 854 x 420
Video frame rate is 30
Frame count estimate is 976
20.0 frames processed
40.0 frames processed
60.0 frames processed
80.0 frames processed
100.0 frames processed
120.0 frames processed
140.0 frames processed
160.0 frames processed
180.0 frames processed
200.0 frames processed
220.0 frames processed
240.0 frames processed
260.0 frames processed
280.0 frames processed
300.0 frames processed
320.0 frames processed
340.0 frames processed
360.0 frames processed
380.0 frames processed
400.0 frames processed
420.0 frames processed
440.0 frames processed
460.0 frames processed
480.0 frames processed
500.0 frames processed
520.0 frames processed
540.0 frames processed
560.0 frames processed
580.0 frames processed
600.0 frames processed
620.0 frames processed
640.0 frames processed
660.0 frames processed
680.0 frames processed
700.0 frames processed
720.0 frames processed
740.0 frames processed
760.0 frames processed
780.0 frames processed
800.0 frames processed
8

#### This next section is actually calling the deployed model to perform the object detection and return the caller the JSON output of the object detection.

In [5]:
# Cell 5

#define the code to do an inference with IBM PowerAI Vision
def inferImage(img, api_id):
    endpoint=powerai_baseurl+api_id
    f = open(img, 'rb')
    myfiles={'files': (img, f)}
    rc = 0
    retry_count = 0
    while (rc != 200) and (retry_count < 5):
        if retry_count != 0:
            print ("retrying upload for %s, attempt %d" % (img, retry_count))
        r = requests.post(endpoint, files=myfiles, verify=False)
        rc = r.status_code
        retry_count = retry_count + 1
    f.close()
    resp_value=json.loads(r.text)
    # print( resp_value )
    return rc, resp_value['classified']

In [6]:
# Cell 6

# infer from just one frame to make sure we get what we expect - note that if the first frame of the test video
# does not have any objects in it, you will see "Got back 0 objects". Otherwise you will see the objects detected
requests.packages.urllib3.disable_warnings()
rc, objresp = inferImage('frame1.jpg',pai_api)
if rc == 200:
    print (json.dumps(objresp, indent=2))
    print ( "Got back %d objects" % len(objresp))

[]
Got back 0 objects


In [7]:
# Cell 7

def drawText(image, text, x, y, size, color, font):
    cv2.putText(image, text, (x, y), font, size, color,2,cv2.LINE_AA)

#### The next section takes the items that were detected and performs two things 

1) update_metrics will increment the counters for the various labels so that we can know how long each logo was in the video. It will also draw a grey box over top of the video image and print out the total time each logo has been on the screen up to this point.

2) draw_label will put the coloured dot in the midpoint of the box where the object was detected in the inference stage and also write the label of the logo next to that dot.

3) if we find one of the objects that you want to fill in with segmentation then we will fill in that using cv2 to fill in the polygon.

In [8]:
# Cell 8
# Define the code that will put the labels onto of the images and add the total count times
ads_drawn = defaultdict(int)

# This is the location on the screen where the ad times will go - if you want to move it to the right increase the AD_START_X
AD_START_X = 25
AD_START_Y = vid_height - (num_ads * 25 + 50) # the height will depend on the number of ads we are tracking

AD_BOX_COLOR=(180,160,160)  # Make the ad timing box grey
COLOR_WHITE=(255,255,255)   # Make the text of the labels and the title white

# This is flushed after each frame is drawn, but we use it to keep track of what's on-screen at a given moment
things_drawn = defaultdict(int)

# This is the code that will count up the ad times on the screen
def update_metrics(image):
    
    # Count that this frame had an object of interest, regardless of the number of those items...
    for thing, count in things_drawn.items():
        # if DEBUG:
            # print("Found %d counts of %s" % (count, thing))  # even in debug mode this can print out too much so commented out
        ads_drawn[thing]+=1
    
    #flush the things-drawn
    things_drawn.clear()
    
    
    # Make an overlay with the image shaded the way we want it...
    overlay = image.copy()

    # Shade Ad Box
    cv2.rectangle(overlay,
                  (AD_START_X-15, AD_START_Y-25),
                  (AD_START_X+355, AD_START_Y+(num_ads*25)+30), AD_BOX_COLOR, cv2.FILLED)
    cv2.addWeighted(overlay, 0.7, image, 0.3, 0, image)
    
    #draw all labels in the ad table
    cursor_x = AD_START_X
    cursor_y = AD_START_Y
    #draw heading
    drawText(image=image,
             text="Advertiser Screen Time (sec)",
             x=cursor_x,
             y=cursor_y,
             size=0.7,
             color=COLOR_WHITE,
             font=cv2.FONT_HERSHEY_SIMPLEX)
    #draw ad labels
    for ad_label in sorted(ads_drawn, key=ads_drawn.get, reverse=True):
        if not ad_label in polyfill:
            screen_time = ads_drawn[ad_label]
            cursor_y += 25
            FONT_SCALE=0.7
            col = colortable[ad_label] if ad_label in colortable else (255,255,255)
            screen_time = screen_time/(FPS*1.0)
        
        
            drawText(image=image,
                     text=ad_label,
                     x=cursor_x,
                     y=cursor_y,
                     size=0.6,
                     color=col,
                     font = cv2.FONT_HERSHEY_SIMPLEX)
            drawText(image=image,
                     text=":",
                     x=cursor_x + 130,
                     y=cursor_y,
                     size=0.6,
                     color=COLOR_WHITE,
                     font = cv2.FONT_HERSHEY_SIMPLEX)
            drawText(image=image,
                     text="{0:.02f}".format(screen_time), #render fractions of a second
                     x=cursor_x + 140,
                     y=cursor_y,
                     size=0.6,
                     color=COLOR_WHITE,
                     font = cv2.FONT_HERSHEY_SIMPLEX)
    
    return

# in order to avoid the jitter of the ojbects detected points in the final video we will round to the nearest 15 pixels
def myround(x, base=15):
    return int(base * round(float(x)/base))

# this will draw a colored dot and the name of the label on top of the object we are detecting
def draw_label(resp, image):
    name = resp.get("label", "")
    col = colortable[name]
    font = cv2.FONT_HERSHEY_SIMPLEX
    fontface = cv2.FONT_HERSHEY_SIMPLEX
    fontscale = 1
    thickness = 1
    fontcolor = (255,255,255)
    textSize, baseline = cv2.getTextSize(name, fontface,
                                fontscale, thickness);
    
    # things_drawn keeps track of how many times we have seen this ad (indexed by ad name)
    things_drawn[name]+=1

    xmin = resp.get("xmin", "")
    xmax = resp.get("xmax", "")
    ymin = resp.get("ymin", "")
    ymax = resp.get("ymax", "")
    xmid = int(myround((xmin+xmax)/2))
    ymid = int(myround((ymin+ymax)/2))
    
    # this next line will draw the polygon around the object
    #   cv2.polylines(image, [np.int32(obj['polygons'][0])], True, color=col, thickness=2, lineType=8, shift=0)
    # but for our purposes if it's not the referee or the keeper, let's just put a dot on the advertiser and the name
    cv2.circle(image, (xmid, ymid), 8, color=col, thickness=-1, lineType=8, shift=0)
    cv2.putText(image, name, (xmid+5, ymid+(textSize[1])), font, 0.9,(255,255,255),2,cv2.LINE_AA)
    
    if name in polyfill:
        overlay = image.copy()
        cv2.fillPoly(overlay, [np.int32(obj['polygons'][0])], col)
        cv2.addWeighted(overlay, .5, image, .5, 0, image)
    
    cv2.putText(image, name, (xmid+5, ymid+(textSize[1])), font, 0.9,(255,255,255),2,cv2.LINE_AA)
    

### Call the PowerAI Inferencing engine for each frame

Now for each frame we call the PowerAI Vision REST API with the jpg image of that frame for it to give us back the objects found within the frame. Note that the first part of the code just steps through each frame one by one to call the inference API. But for a long video this can be slow, so the next section of the code creates N threads, each calling the inference engine for 1/Nth of the frames. You can adjust the number of threads based on what client you are running on.

In [9]:
# Cell 9 

#Now for each frame call PowerAI Vision to perform the inferencing
tracking_results = {}

# This section does the inference serialized. The inference REST endpoing in PowerAI Vision is called
# and the results are stored in a "tracking_results" dict for later processing
# It works fine but it's slow for processing large videos so the next section uses a set of threads to make more calls simultaneously
#
# i = 0;
# for name in glob.glob('frame*.jpg'):
#    rc, tracking_results[name] = inferImage(name,pai_api)
#    
#    i +=1
#    print(".", end = " ")
#    # let's only print every 20 frames to keep the output down for long videos
#    if i % 20 == 0:
#        print ("%d frames processed" % i)
#        
# print("complete %d frames" % i)

# Serial is too slow so let's run the inferencing across a set of threads (each thread doing a 1/N of the images)
import threading
import math

num_threads = 6
# The worker thread will check to see what thread they are assigned to and pick 1/num_threads of the images based on that
# Just for debugging we will print out the frame number that just finished an inference so we can make sure they are all moving
def worker(num):
    start_frame = int(math.ceil(pos_frame/num_threads))*num + 1;
    end_frame = int(math.ceil(pos_frame/num_threads))*(num+1) + 1;
    for i in range(start_frame, end_frame):
            name = "frame" + str(i) + ".jpg"
            # check if the file exists and if so call the inference engine
            if os. path. isfile(name):
                rc, tracking_results[name] = inferImage(name,pai_api)
                if DEBUG:
                    print("%d, " % i, end = " ")        
        
    


#Now spin up a bunch threads
threads = []
print("Calling PowerAI Inference in %d threads...(wait for all threads to complete before executing next cell)" % num_threads)
for i in range(num_threads):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

# wait for all the threads to complete before going on to the next cell
for thread in threads:
    thread.join()
    
print("All threads complete")
    


Calling PowerAI Inference in 6 threads...(wait for all threads to complete before executing next cell)
327,  1,  490,  653,  164,  816,  328,  2,  491,  654,  165,  817,  329,  3,  492,  655,  166,  330,  818,  4,  493,  656,  331,  167,  819,  5,  494,  657,  332,  168,  820,  6,  495,  658,  333,  169,  821,  7,  496,  334,  659,  170,  822,  8,  497,  335,  660,  823,  9,  336,  498,  661,  824,  171,  499,  825,  662,  10,  337,  172,  500,  826,  663,  11,  338,  173,  501,  827,  12,  339,  664,  174,  502,  828,  13,  340,  665,  175,  503,  829,  341,  14,  666,  176,  504,  830,  342,  15,  667,  505,  177,  831,  343,  16,  668,  506,  178,  832,  344,  17,  669,  507,  179,  833,  345,  18,  508,  670,  180,  834,  346,  19,  509,  671,  181,  835,  347,  510,  20,  672,  182,  836,  511,  348,  21,  673,  183,  837,  512,  349,  22,  674,  184,  838,  513,  350,  23,  675,  185,  839,  514,  351,  24,  186,  840,  515,  25,  187,  352,  676,  841,  516,  26,  188,  353,  67

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "<ipython-input-9-373191929aa1>", line 36, in worker
    rc, tracking_results[name] = inferImage(name,pai_api)
  File "<ipython-input-5-afc160849446>", line 19, in inferImage
    return rc, resp_value['classified']
KeyError: 'classified'



413,  578,  88,  901,  738,  414,  579,  89,  902,  739,  415,  580,  903,  90,  740,  416,  581,  904,  91,  741,  417,  905,  582,  92,  742,  418,  906,  583,  93,  743,  419,  907,  584,  94,  744,  908,  420,  585,  95,  745,  421,  909,  586,  96,  746,  422,  910,  587,  97,  747,  911,  423,  588,  98,  424,  912,  589,  99,  425,  913,  590,  748,  100,  914,  426,  591,  749,  101,  915,  427,  592,  750,  102,  916,  428,  593,  751,  103,  917,  429,  594,  752,  918,  104,  430,  595,  753,  919,  105,  431,  596,  920,  754,  106,  432,  921,  597,  755,  433,  922,  598,  756,  107,  434,  923,  599,  757,  108,  435,  924,  600,  758,  109,  925,  436,  601,  759,  110,  926,  437,  602,  760,  111,  927,  438,  603,  761,  928,  112,  439,  604,  929,  762,  113,  440,  605,  930,  763,  114,  441,  606,  931,  764,  115,  442,  932,  607,  765,  116,  443,  933,  608,  766,  117,  934,  444,  609,  767,  118,  935,  445,  768,  610,  936,  119,  446,  769,  611,  937,

### Redraw the frames with the objects labeled and the time for each log displayed

Now that all the code above is set up, go through each frame (in frame order) and call the function to update the metrics and draw the labels that were detected in the above inferencing stage.

In [11]:
# Cell 10

from shutil import copyfile
import traceback

# now we want to take the original frames and draw the inferences on top of them and then write out the new frame
for i in range(1, len(glob.glob('frame*.jpg'))): 
    keyframe = "frame" + str(i) + ".jpg"
    name = "frame" + str(i) + ".jpg"
   
    print(".", end = " ")
    # let's only print every 20 frames to keep the output down for long videos
    if i % 20 == 0:
        print ("%d frames redrawn" % i)
        
    if DO_RENDERING:
        img = cv2.imread(name)
    try:
        jsonresp = tracking_results[keyframe]
        if DO_RENDERING:
            # first update the metrics for the image to add any new time for the ads
            # and then draw the metric overlay onto the image
            update_metrics(img)
            
            output = img.copy()
            # Now also label all the objects we found in the frame
            for obj in jsonresp:
                draw_label(obj, output)
           
            # Write the new image to the output file which is the original image with 
            # the detected objects labeled and the sum of the ad time on the screen
            cv2.imwrite("output-"+name, output)
            
    except AssertionError as e:
        print ("Missing %s data" % name)
        traceback.print_exc()
        copyfile(name, "output-"+name)
        
print("complete %d frames redrawn" % i)

. . . . . . . . . . . . . . . . . . . . 20 frames redrawn
. . . . . . . . . . . . . . . . . . . . 40 frames redrawn
. . . . . . . . . . . . . . . . . . . . 60 frames redrawn
. . . . . . . . . . . . . . . . . . . . 80 frames redrawn
. . . . . . . . . . . . . . . . . . . . 100 frames redrawn
. . . . . . . . . . . . . . . . . . . . 120 frames redrawn
. . . . . . . . . . . . . . . . . . . . 140 frames redrawn
. . . . . . . . . . . . . . . . . . . . 160 frames redrawn
. . . . . . . . . . . . . . . . . . . . 180 frames redrawn
. . . . . . . . . . . . . . . . . . . . 200 frames redrawn
. . . . . . . . . . . . . . . . . . . . 220 frames redrawn
. . . . . . . . . . . . . . . . . . . . 240 frames redrawn
. . . . . 

KeyError: 'frame245.jpg'

#### Put the individual frames that have been redrawn back into a video

Now that we have all the redrawn frames in individual jpeg images, we string them back together into a video. We are using FFMPEG to do this and we pass in the frame rate and the image files and the image size (that we got from the original video). You will need to update the path below to point to where you installed ffmpeg.

In [12]:
# Cell 11

# Now string all the jpeg files back into a video using ffmpeg
from subprocess import call

cmd_string = "/usr/bin/ffmpeg -y -r " + str(FPS) + " -f image2 -s " + str(vid_width) + "x" + str(vid_height) + " -i ./output-frame%d.jpg  -pix_fmt yuv420p " + str(outputclip)
print("Processing ... ", end = " ")
call(cmd_string,shell=True)

print("done")

Processing ...  done


#### Cleanup

The last cell will delete all the individual frames if you have told it to. If you are debugging your video you may want to keep the input frames around and test your model with individual frames to see what confidence comes back and what objects are detected.

In [13]:
# Cell 12

# Clean up all the intermediate files if the variables are set to do so
if CLEANUP_ORIGINAL_FRAMES:
    print("removing frameX.jpg")
    for f1 in glob.glob("frame*.jpg"):
        os.remove(f1)
    
if CLEANUP_INFERRED_FRAMES:
    print("removing output-frameX.jpg")
    for f2 in glob.glob("output-frame*.jpg"):
        os.remove(f2)

removing frameX.jpg
removing output-frameX.jpg


#### Test new video file

Run the new video file you just created

In [14]:
# Cell 13

import io
import base64
from IPython.display import HTML

HTML("""
<video width="640" height="480" controls>
  <source src="OUTPUT_Japan_vs_Australia_Test.mp4" type="video/mp4">
</video>
""")