## Audit and Improve Video Annotation Quality Using Amazon SageMaker Ground Truth

This notebook walks through how to evaluate the quality of video annotations received from SageMaker Ground Truth annotators using several metrics.

Note: The core functionality of this notebook works with the standard Conda Python3/Data Science kernel. An optional section uses a PyTorch model to generate image embeddings; to run that section, use a Conda PyTorch Python3 kernel.

Let's start by importing the required libraries and initializing the session and other variables used in this notebook. By default, the notebook uses the default Amazon S3 bucket in the AWS Region where you run this notebook. If you want to use a different S3 bucket, make sure it is in the same AWS Region and specify its name in `bucket`.

In [None]:
%matplotlib inline
import json
import os
import sys
import subprocess
import datetime
from glob import glob

import boto3
import numpy as np
import matplotlib.pyplot as plt
import sagemaker as sm
from matplotlib import patches
from PIL import Image
from scipy.spatial import distance
from tqdm import tqdm

# local helper module with the plotting and metric functions used below
from plotting_funcs import *

## Prerequisites

You will create some of the resources you need to launch a Ground Truth audit labeling job in this notebook. You must create the following resources before executing this notebook:

* A work team. A work team is a group of workers that complete labeling tasks. If you want to preview the worker UI and execute the labeling task, you need to create a private work team, add yourself as a worker to this team, and provide the work team ARN below. This [GIF](images/create-workteam-loop.gif) demonstrates how to quickly create a private work team on the Amazon SageMaker console. If you do not want to use a private or vendor work team ARN, set `private_work_team` to `False` to use the Amazon Mechanical Turk workforce. To learn more about private, vendor, and Amazon Mechanical Turk workforces, see [Create and Manage Workforces](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management.html).

In [None]:
private_work_team = True  # Set to False to use the Amazon Mechanical Turk workforce

if private_work_team:
    WORKTEAM_ARN = '<<ADD WORK TEAM ARN HERE>>'
else:
    region = boto3.session.Session().region_name
    WORKTEAM_ARN = f'arn:aws:sagemaker:{region}:394669845002:workteam/public-crowd/default'
print(f'This notebook will use the work team ARN: {WORKTEAM_ARN}')

In [None]:
# Make sure the work team ARN is populated if a private work team is chosen
assert WORKTEAM_ARN != '<<ADD WORK TEAM ARN HERE>>', 'Set WORKTEAM_ARN to your private work team ARN.'

* The IAM execution role you used to create this notebook instance must have the following permissions:
    * If you do not require granular permissions for your use case, you can attach [AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess) to your IAM user or role. If you are running this example in a SageMaker notebook instance, this is the IAM execution role used to create your notebook instance. If you need granular permissions, see [Assign IAM Permissions to Use Ground Truth](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-security-permission.html#sms-security-permissions-get-started) for a granular policy to use Ground Truth.
    * AWS managed policy [AmazonSageMakerGroundTruthExecution](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonSageMakerGroundTruthExecution). Run the following code-block to see your IAM execution role name. This [GIF](images/add-policy-loop.gif) demonstrates how to attach this policy to an IAM role in the IAM console. You can also find instructions in the IAM User Guide: [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).
    * When you create your role, you specify Amazon S3 permissions. Make sure that your IAM role has access to the S3 bucket that you plan to use in this example. If you do not specify an S3 bucket in this notebook, the default bucket in the AWS Region where you are running this notebook instance is used. If you do not require granular permissions, you can attach [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) to your role.

In [None]:
role = sm.get_execution_role()
role_name = role.split('/')[-1]
print('IMPORTANT: Make sure this execution role has the AWS managed policy AmazonSageMakerGroundTruthExecution attached.')
print('********************************************************************************')
print('The IAM execution role name:', role_name)
print('The IAM execution role ARN:', role)
print('********************************************************************************')

In [None]:
sagemaker_cl = boto3.client('sagemaker')
# Make sure the bucket is in the same region as this notebook.
bucket = '<< YOUR S3 BUCKET NAME >>'

sm_session = sm.Session()
s3 = boto3.client('s3')

if bucket == '<< YOUR S3 BUCKET NAME >>':
    bucket = sm_session.default_bucket()
region = boto3.session.Session().region_name
bucket_region = s3.head_bucket(Bucket=bucket)['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']
assert bucket_region == region, f'Your S3 bucket {bucket} and this notebook need to be in the same region.'
print(f'IMPORTANT: make sure the role {role_name} has access to read and write to this bucket.')
print('********************************************************************************************************')
print(f'This notebook will use the following S3 bucket: {bucket}')
print('********************************************************************************************************')

## Download Data

We will use a dataset from the Multiple Object Tracking Challenge (MOTChallenge), a commonly used benchmark for multi-object tracking. First we download the data, which can take 5 to 10 minutes depending on your connection speed. Then we unzip it and upload it to `bucket` in Amazon S3.

Disclosure regarding the Multiple Object Tracking Benchmark:

Multiple Object Tracking Benchmark was created by Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, and Laura Leal-Taixe. We have not modified the images or the accompanying annotations. You can obtain the images and the annotations [here](https://motchallenge.net/data/MOT20/). The images and annotations are licensed by the authors under [Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License](https://creativecommons.org/licenses/by-nc-sa/3.0/). The following paper describes Multiple Object Tracking Benchmark in depth, from data collection and annotation to detailed statistics about the data and evaluation of models trained on it.

MOT20: A benchmark for multi object tracking in crowded scenes.
Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixe [arXiv:2003.09003](https://arxiv.org/abs/2003.09003)


In [None]:
# Download the data; this takes about 5 minutes
!wget https://motchallenge.net/data/MOT20.zip -O /tmp/MOT20.zip


In [None]:
# unzip our data 
!unzip -q /tmp/MOT20.zip -d MOT20
!rm /tmp/MOT20.zip

In [None]:
# Upload the training data to S3; this takes a couple of minutes
!aws s3 cp --recursive MOT20/MOT20/train s3://{bucket}/MOT20/train --quiet

## View Images and Labels
The scene shows pedestrians walking through a train station.
Let's load the image paths and display the first image.

In [None]:
img_paths = glob('MOT20/MOT20/train/MOT20-01/img1/*.jpg')
img_paths.sort()

imgs = []
for imgp in img_paths:
    img = Image.open(imgp)
    imgs.append(img)
    
# display the first image (after the loop, `img` would be the last one)
imgs[0]

## Load Labels
The MOT20 dataset provides the labels for each scene in a single text file. Let's load the labels and organize them into a frame-level dictionary so we can easily plot them.

In [None]:
# grab our labels
labels = []
with open('MOT20/MOT20/train/MOT20-01/gt/gt.txt', 'r') as f:
    for line in f:
        labels.append(line.replace('\n','').split(','))

lab_dict = {}

for i in range(1,len(img_paths)+1):
    lab_dict[i] = []
    
for lab in labels:
    lab_dict[int(lab[0])].append(lab)
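Each row of `gt.txt` follows the MOT20 ground-truth column order: frame, ID, box left, box top, box width, box height, a flag marking whether the entry is considered, class, and visibility. The code above indexes these columns by position. As an illustrative sketch, the hypothetical helpers `parse_mot_gt_line` and `pedestrian_boxes` below give the columns names so that filters such as "pedestrians only" (class 1) are easier to read; the two sample rows are made up.

```python
# Column order of a MOT20 gt.txt row
MOT_GT_FIELDS = ["frame", "id", "left", "top", "width", "height", "consider", "class", "visibility"]


def parse_mot_gt_line(line):
    """Hypothetical helper: parse one comma-separated gt.txt row into a dict keyed by field name."""
    values = [float(v) for v in line.strip().split(",")]
    row = dict(zip(MOT_GT_FIELDS, values))
    # frame number, object ID, consider flag, and class are integers in the file
    for key in ("frame", "id", "consider", "class"):
        row[key] = int(row[key])
    return row


def pedestrian_boxes(rows, frame):
    """Hypothetical helper: rows for one frame that are considered and have class 1 (pedestrian)."""
    return [r for r in rows if r["frame"] == frame and r["consider"] == 1 and r["class"] == 1]


sample_lines = ["1,1,199,813,140,268,1,1,0.83", "1,2,10,20,30,40,0,8,0.5"]
rows = [parse_mot_gt_line(line) for line in sample_lines]
print(pedestrian_boxes(rows, frame=1))
```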

## View MOT20 Annotations

Now let's look at what the existing MOT20 annotations look like.

The labels include both bounding box coordinates and a unique ID for each object being tracked (in this case, each pedestrian). By plotting two frames below, we can see how the pedestrians persist across frames. Because the video has a high frame rate (the MOT20 videos are shot at 25 frames per second), we can look at frame 1 and then frame 31 to see the same scene about 1.2 seconds later. To view different labeled frames in the scene, adjust the start index, end index, and step values.


In [None]:
start_index = 1
end_index = 32
step = 30

for j in range(start_index, end_index, step): 

    # Create figure and axes
    fig,ax = plt.subplots(1, figsize=(24,12))
    ax.set_title(f'Frame {j}', fontdict={'fontsize':20})

    # Display the image
    ax.imshow(imgs[j - 1])  # imgs is 0-indexed, while MOT20 frame numbers start at 1

    for i,annot in enumerate(lab_dict[j]): 
        annot = np.array(annot, dtype=np.float32)
        
        # display the box if the entry is marked as considered (column 6) and its class (column 7) is pedestrian (1)
        if annot[6] == 1 and annot[7] == 1:
            rect = patches.Rectangle((annot[2], annot[3]), annot[4], annot[5], linewidth=1, edgecolor='r', facecolor='none') 
            ax.add_patch(rect)
            plt.text(annot[2], annot[3]-10, f"pedestrian {int(annot[1])}", bbox=dict(facecolor='white', alpha=0.5)) 


## Evaluate Our Labels

For demonstration purposes, we've labeled three pedestrians in one of the videos and inserted a few labeling anomalies into the annotations. While human labelers tend to be very accurate, they can make occasional mistakes. Identifying these mistakes and then sending directed recommendations for which frames and objects to fix makes the label auditing process more efficient. If a labeler only has to focus on a few frames instead of a deep review of the entire scene, it can drastically improve speed and reduce cost.

We have provided a JSON file containing intentionally flawed labels. For a typical Ground Truth video frame labeling job, you would find this file in Amazon S3 in the output location you specified when creating your labeling job. This label file is organized as a sequential list of labels, with each entry in the list consisting of the labels for one frame. Let's look at the labels for the first frame, where we can see the annotator has identified two pedestrians.

For more information about Ground Truth's output data format, see the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html).

In [None]:
# load the labels
lab_path = 'SeqLabel.json'
with open(lab_path, 'r') as f:
    flawed_labels = json.load(f) 
    
img_paths = glob('MOT20/MOT20/train/MOT20-01/img1/*.jpg')
img_paths.sort()

# Because the scene is 432 frames we chose to label every other frame
# Let's grab every other image to match our labels
imgs = []
for imgp in img_paths[::2]:
    img = Image.open(imgp)
    imgs.append(img)

flawed_labels['tracking-annotations'][0]
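Because we kept every other image, an index into `imgs` and `flawed_labels['tracking-annotations']` is not the same as an MOT20 frame number. Here is a small sketch of the mapping, assuming the image files are named sequentially from `000001.jpg` (so sorting them gives frame order) and using the 25 fps frame rate of MOT20. The helper names are hypothetical.

```python
FPS = 25          # MOT20 frame rate
LABEL_STRIDE = 2  # we labeled every other frame (img_paths[::2])


def labeled_index_to_mot_frame(k):
    """Hypothetical helper: map a 0-based index into our every-other-frame labels to the 1-based MOT20 frame number."""
    return LABEL_STRIDE * k + 1


def mot_frame_to_seconds(frame):
    """Hypothetical helper: time of a 1-based MOT20 frame relative to the first frame."""
    return (frame - 1) / FPS


frame = labeled_index_to_mot_frame(16)
print(frame, mot_frame_to_seconds(frame))
```

For example, labeled frame 16, where our third pedestrian enters, corresponds to MOT20 frame 33, about 1.28 seconds into the sequence.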

## View Our Annotations

We annotated 3 pedestrians, one of which enters the scene at frame 16. Let's view the scene starting at frame 16 to see all of our labeled pedestrians.

In [None]:
# let's view our tracking labels 
start_index = 16
end_index = 20
step = 3

for j in range(start_index, end_index, step): 

    # Create figure and axes
    fig,ax = plt.subplots(1, figsize=(24,12))
    ax.set_title(f'Frame {j}')

    # Display the image
    ax.imshow(np.array(imgs[j]))

    for i,annot in enumerate(flawed_labels['tracking-annotations'][j]['annotations']):
        rect = patches.Rectangle((annot['left'], annot['top']), annot['width'], annot['height'], linewidth=1, edgecolor='r', facecolor='none') 
        ax.add_patch(rect)
        plt.text(annot['left']-5, annot['top']-10, f"{annot['object-name']}", bbox=dict(facecolor='white', alpha=0.5)) 

## Analyze Our Tracking Data

Let's put our tracking data into a form that's easier to analyze.

The following function turns our tracking output into a dataframe. We can use this dataframe to plot values and compute metrics that help us understand how the object labels move through our frames.

In [None]:
# generate dataframes
label_frame = create_annot_frame(flawed_labels['tracking-annotations'])
label_frame.head()
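`create_annot_frame` is imported from the `plotting_funcs` helper module, and its implementation isn't shown here. As a rough illustration only, a hypothetical equivalent, `tracking_to_frame`, could flatten the tracking output into one row per (frame, object) with the columns this notebook relies on (`frameid`, `obj`, `left`, `top`, `width`, `height`). It assumes `frameid` is the position of the frame in the `tracking-annotations` list; the sample input is made up.

```python
import pandas as pd

COLUMNS = ["frameid", "obj", "left", "top", "width", "height"]


def tracking_to_frame(tracking_annotations):
    """Hypothetical helper: flatten a Ground Truth 'tracking-annotations' list into a DataFrame."""
    rows = []
    for frame_id, frame in enumerate(tracking_annotations):
        for annot in frame["annotations"]:
            rows.append({
                "frameid": frame_id,
                "obj": annot["object-name"],
                "left": annot["left"],
                "top": annot["top"],
                "width": annot["width"],
                "height": annot["height"],
            })
    return pd.DataFrame(rows, columns=COLUMNS)


sample = [
    {"annotations": [{"object-name": "Pedestrian:1", "left": 10, "top": 20, "width": 30, "height": 60}]},
    {"annotations": [{"object-name": "Pedestrian:1", "left": 12, "top": 21, "width": 30, "height": 61}]},
]
df = tracking_to_frame(sample)
print(df)
```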

## View Label Progression Plots

Let's start with some simple plots. The below plots illustrate how the coordinates of a given object progress through the frames of your video. Each bounding box has a left and top coordinate, representing the top-left point of the bounding box. We additionally have height and width values that let us determine the other 3 points of the box.

In the below plots, the blue lines represent the progression of our 4 values (top coordinate, left coordinate, width, and height) through the video frames and the orange lines represent a rolling average of these values. Since a video is a sequence of frames, if we have a video that has 5 frames per second or more, the objects within the video, and therefore the bounding boxes drawn around them, should have some amount of overlap between frames. In our video we have pedestrians walking at a normal pace so our plots should show a relatively smooth progression.

We can also plot the deviation between the rolling average and the actual values of bounding box coordinates. We will likely want to look at frames where the actual value deviates substantially from the rolling average.

In [None]:
# plot out progression of different metrics 

plot_timeseries(label_frame, obj='Pedestrian:1', roll_len=5)
plot_deviations(label_frame, obj='Pedestrian:1', roll_len=5)
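`plot_timeseries` and `plot_deviations` also come from `plotting_funcs`. Here is a minimal sketch of the underlying computation, assuming a trailing rolling mean (the window ends at the current frame, which is what pandas `rolling` does by default). The helper name `rolling_deviation` and the sample `left` values are made up.

```python
import pandas as pd


def rolling_deviation(values, roll_len=5):
    """Hypothetical helper: return the trailing rolling mean and each value's absolute deviation from it."""
    series = pd.Series(values, dtype=float)
    rolling_mean = series.rolling(roll_len).mean()  # NaN for the first roll_len - 1 frames
    return rolling_mean, (series - rolling_mean).abs()


# left coordinate of a box moving steadily to the right, with one misplaced box at index 5
left = [100, 102, 104, 106, 108, 140, 112, 114, 116]
mean, dev = rolling_deviation(left, roll_len=5)
print(int(dev.idxmax()))
```

Here the misplaced box at index 5 has the largest deviation (28 pixels, compared with 4 for the steadily moving box just before it), so it is the frame we would want to review.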

## Plot Box Sizes

Let's combine the width and height values to look at how the size of the bounding box for a given object progresses through the scene. For Pedestrian 2 we intentionally reduced the size of the box on frame 25 and restored it on frame 26. We can see this reflected in our size progression plots.


In [None]:
def plot_size_prog(annot_frame, obj='P:0', roll_len = 5):
    fig, ax = plt.subplots(nrows=1,ncols=1, figsize=(17,10))
    ann_subframe = annot_frame[annot_frame.obj==obj]
    ann_subframe.index = list(np.arange(len(ann_subframe)))
    size_vec = ann_subframe['height']*ann_subframe['width']
    ax.plot(size_vec)
    ax.plot(size_vec.rolling(roll_len).mean())
    ax.title.set_text(f'{obj} Size progression')
    ax.set_xlabel('Frame Number')
    ax.set_ylabel('Box size')
    
plot_size_prog(label_frame, obj='Pedestrian:1')
plot_size_prog(label_frame, obj='Pedestrian:2')


## View Box Size Differential

Let's now look at how the size of the box changes from frame to frame by plotting the size differential: each box's size minus its size in the following frame. This shows the magnitude of each change. Dividing each differential by the earlier box's size normalizes it, expressing the change as a fraction of the box's original size. This makes it easier to set a threshold beyond which we flag a frame as potentially problematic for this object's bounding box. The plots below show both the absolute and the normalized size differential, with horizontal lines marking a 20% change in box size from one frame to the next.
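To make the sign convention concrete, here is a worked example on made-up box areas. With box sizes `s_t`, the normalized differential for step `t` is `(s_t - s_{t+1}) / s_t`, so positive values mean the box shrank and negative values mean it grew. The sizes and the 0.2 threshold are illustrative.

```python
import numpy as np

sizes = np.array([4000.0, 4100.0, 2000.0, 4050.0, 4000.0])  # made-up box areas (height * width)
size_diff = sizes[:-1] - sizes[1:]        # s_t - s_{t+1}, same convention as plot_size_diff
norm_size_diff = size_diff / sizes[:-1]   # change as a fraction of the earlier box's size
flagged = np.where(np.abs(norm_size_diff) > 0.2)[0]
print(norm_size_diff.round(3), flagged)
```

Because a single undersized box is compared with both of its neighbors, one mislabeled frame produces two flagged steps (indices 1 and 2 here). The two steps are also not symmetric: the shrink is about +0.512, while the recovery is -1.025, because each change is divided by the earlier box's size.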


In [None]:
# look at rolling size differential, try changing the object

def plot_size_diff(lab_frame, obj='Pedestrian:1', hline=.5):
    ann_subframe = lab_frame[lab_frame.obj==obj]
    size_vec = ann_subframe['height']*ann_subframe['width']
    size_diff = np.array(size_vec[:-1])- np.array(size_vec[1:])
    norm_size_diff = size_diff/np.array(size_vec[:-1])
    fig, ax = plt.subplots(ncols=1, nrows=2, figsize=(24,16))
    ax[0].plot(size_diff)
    ax[0].set_title('Absolute size differential')
    ax[1].plot(norm_size_diff)
    ax[1].set_title('Normalized size differential')
    ax[1].hlines(-hline,0,len(size_diff))
    ax[1].hlines(hline,0,len(size_diff))

plot_size_diff(label_frame, obj='Pedestrian:2', hline=.2)

Once the size differential is normalized, we can use a threshold to pick out frames to flag for review. The plot above shows that with a threshold of 20% change from the previous box size, a few frames exceed it.


In [None]:

def find_prob_frames(lab_frame, obj='Pedestrian:2', thresh = .25):
    ann_subframe = lab_frame[lab_frame.obj==obj]
    size_vec = ann_subframe['height']*ann_subframe['width']
    size_diff = np.array(size_vec[:-1])- np.array(size_vec[1:])
    norm_size_diff = size_diff/np.array(size_vec[:-1])
    problem_frames = np.where(np.abs(norm_size_diff)>thresh)[0]
    worst_frame = np.argmax(np.abs(norm_size_diff))
    return problem_frames, worst_frame

obj = 'Pedestrian:2'
problem_frames, worst_frame = find_prob_frames(label_frame, obj=obj,  thresh = .2)
print(f'Worst frame for {obj} is: {worst_frame}')
print('Problem frames for', obj, ':',problem_frames.tolist())

## View the frames with the largest size differential

Now that we have the indices for the frames with the largest size differential, we can view them in sequence. In the frames below, we can see that for Pedestrian 2 we were able to identify frames where our labeler made a mistake. Frames 23 and 25 were flagged: frame 23 because there was a large difference between it and the subsequent frame, frame 24, and frame 25 for a similar reason, since the mistake was corrected on frame 26.

In [None]:
start_index = worst_frame-2

# let's view our tracking labels 
for j in range(start_index, start_index+3): 
    
    # Create figure and axes
    fig,ax = plt.subplots(1, figsize=(24,12))
    ax.set_title(f'Frame {j}')

    # Display the image
    ax.imshow(imgs[j])

    for i,annot in enumerate(flawed_labels['tracking-annotations'][j]['annotations']):
        rect = patches.Rectangle((annot['left'], annot['top']), annot['width'], annot['height'], linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        plt.text(annot['left']-5, annot['top']-10, f"{annot['object-name']}", bbox=dict(facecolor='white', alpha=0.5))
    
    plt.show()

## Rolling IoU

IoU, or Intersection over Union, is a commonly used evaluation metric for object detection. It is calculated by dividing the area of overlap between two bounding boxes by the area of their union. While it is typically used to evaluate the accuracy of a predicted box against a ground truth box, we can also use it to measure how much an object's bounding box in one frame of a video overlaps with its box in the next frame.
 
Since objects move between frames, we would not expect an object's bounding box to overlap 100% with its box in the next frame. However, because only a fraction of a second elapses between frames, there is often only a small change from one frame to the next, so for higher-fps video we would expect a substantial amount of overlap. The MOT20 videos are all shot at 25 fps, so they qualify; even though we labeled every other frame, consecutive labeled frames are only 0.08 seconds apart. Operating under this assumption, we can use IoU to identify outlier frames where an object's bounding box changes substantially from one frame to the next.
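The IoU calculation itself is done by `bb_int_over_union` from `plotting_funcs`. Here is a sketch of the standard formula, assuming boxes given as `[x1, y1, x2, y2]` corners (the same layout `calc_frame_int_over_union` builds from left, top, width, and height) and continuous coordinates. Some implementations add 1 to widths and heights for inclusive pixel coordinates, which changes results slightly. The helper names are hypothetical.

```python
def ltwh_to_corners(left, top, width, height):
    """Hypothetical helper: convert a left/top/width/height box to [x1, y1, x2, y2]."""
    return [left, top, left + width, top + height]


def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    # corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)  # zero if the boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# a 10x20 box shifted sideways by half its width
shifted = iou(ltwh_to_corners(0, 0, 10, 20), ltwh_to_corners(5, 0, 10, 20))
print(shifted)
```

Shifting a box sideways by half its width already drops the IoU to 1/3, well below a threshold of 0.5.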


In [None]:
# calculate rolling intersection over union

def calc_frame_int_over_union(annot_frame, obj, i):
    annot_frame = annot_frame[annot_frame.obj==obj]
    annot_frame.index = list(np.arange(len(annot_frame)))
    boxA = [annot_frame.left[i], annot_frame.top[i], annot_frame.left[i] + annot_frame.width[i], annot_frame.top[i] + annot_frame.height[i]]
    boxB = [annot_frame.left[i+1], annot_frame.top[i+1], annot_frame.left[i+1] + annot_frame.width[i+1], annot_frame.top[i+1] + annot_frame.height[i+1]]
    return bb_int_over_union(boxA, boxB)

# create list of objects
objs = list(np.unique(label_frame.obj))

# iterate through our objects to get rolling IoU values for each
iou_dict = {}
for obj in objs:
    ious = []
    for i in range(len(label_frame[label_frame.obj==obj])-1):
        iou = calc_frame_int_over_union(label_frame, obj, i)
        ious.append(iou)
    iou_dict[obj] = ious
    
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(24,8), sharey=True)
for a, obj in zip(ax, objs):
    a.set_title(f'Rolling IoU {obj}')
    a.set_xlabel('frames')
    a.set_ylabel('IoU')
    a.plot(iou_dict[obj])

## Identify low overlap frames

Now that we have calculated the IoU between consecutive frames for each object, we can identify frames that fall below an IoU threshold we set. Let's say we want to flag frames where an object's bounding box has an IoU below 0.5 with its box in the next frame.

In [None]:
# ID problem indices
iou_thresh = 0.5

# use np.where to identify frames below our threshold
inds = np.where(np.array(iou_dict[objs[0]]) < iou_thresh)[0]
worst_ind = np.argmin(np.array(iou_dict[objs[0]]))

print(objs[0], 'frames below IoU threshold:', inds.tolist())
print(objs[0], 'worst frame:', worst_ind)

## Visualize low overlap frames

Now that we have identified our low overlap frames, let's view them. We can see for Pedestrian:1, there is an issue on frame 27. The annotator made a mistake and the bounding box for Pedestrian:1 is shifted to the left and clearly not in the correct position. Thankfully our IoU metric was able to identify this!

In [None]:
start_index = worst_ind-1

# let's view our tracking labels 
for j in range(start_index, start_index+3): 

    # Create figure and axes
    fig,ax = plt.subplots(1, figsize=(24,12))
    ax.set_title(f'Frame {j}')

    # Display the image
    ax.imshow(imgs[j])

    for i,annot in enumerate(flawed_labels['tracking-annotations'][j]['annotations']):
        rect = patches.Rectangle((annot['left'], annot['top']), annot['width'], annot['height'], linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        plt.text(annot['left']-5, annot['top']-10, f"{annot['object-name']}", bbox=dict(facecolor='white', alpha=0.5))
    plt.show()

## Embedding Comparison (Optional)

The two methods above are simple and rely on the reasonable assumption that objects in high-FPS video won't move much from frame to frame. They can be considered more classical methods of comparison. Can we improve upon them? Let's try something more experimental: a deep learning approach that generates embeddings for our bounding box crops with an image classification model like ResNet and compares these embeddings across frames to identify outliers.

Convolutional neural network image classifiers typically end with a fully connected layer whose outputs are converted into class probabilities, for example with a softmax. If we remove this final layer, the network's output is instead an image embedding: the network's internal representation of the image. If we isolate our objects by cropping the images, we can compare the representations of these objects across frames to see whether any crops stand out as outliers.

Let's start by loading a model from PyTorch Hub. We use a ResNet18 model trained on ImageNet. Because ImageNet is a very large and varied dataset, a network trained on it learns general-purpose visual features. While a network fine-tuned on pedestrians would likely perform better, a network trained on a dataset as large as ImageNet should capture enough information to indicate whether two crops are similar.

Note: As mentioned at the beginning of the notebook, if you wish to run this section you'll need to use a PyTorch kernel.

In [None]:
import torch
import torch.nn as nn
from scipy.spatial import distance

# download our model from PyTorch Hub
model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True)

# to get embeddings instead of class scores, remove the final (fully connected) layer of the network
modules = list(model.children())[:-1]
model = nn.Sequential(*modules)
model.eval()

## Generate Embeddings

Let's use our headless model to generate image embeddings for our object crops. The below code iterates through our images, generates crops of our labeled objects, resizes them to 224x224x3 to work with our headless model, and then predicts the image crop embedding.

In [None]:
img_crops = {}
img_embeds = {}

# ImageNet channel statistics used to normalize inputs for the pretrained ResNet
imagenet_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
imagenet_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

for j, img in tqdm(enumerate(imgs[:32])):
    img_arr = np.array(img)
    img_embeds[j] = {}
    img_crops[j] = {}
    for annot in flawed_labels['tracking-annotations'][j]['annotations']:
        top, left = int(annot['top']), int(annot['left'])
        height, width = int(annot['height']), int(annot['width'])

        # crop our image using our annotation coordinates
        crop = img_arr[top:top + height, left:left + width, :]

        # resize image crops to work with our model, which takes 224x224x3 inputs
        new_crop = np.array(Image.fromarray(crop).resize((224, 224)))
        img_crops[j][annot['object-name']] = new_crop

        # scale pixels to [0, 1] and normalize each channel with the ImageNet statistics
        model_input = (new_crop.astype(np.float32) / 255.0 - imagenet_mean) / imagenet_std

        # move channels first and add a batch axis: (1, color channels, height, width);
        # np.reshape would scramble the pixels here, so use np.transpose
        model_input = np.transpose(model_input, (2, 0, 1))[np.newaxis]
        torch_arr = torch.tensor(model_input, dtype=torch.float)

        # return image crop embedding from headless model
        with torch.no_grad():
            embed = model(torch_arr)

        img_embeds[j][annot['object-name']] = embed.squeeze()
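Converting a crop to the layout PyTorch expects is easy to get wrong. `np.array` on a PIL image gives a (height, width, channels) array, while the ResNet takes (batch, channels, height, width). The sketch below uses a made-up 2x2 RGB image to show that `np.transpose` moves the channel axis, while `np.reshape` only reinterprets the same memory in a new shape and mixes channels with pixels. The helper `to_model_input` is hypothetical; the mean and standard deviation are the usual ImageNet normalization values for torchvision's pretrained models.

```python
import numpy as np

# a tiny 2x2 RGB "image" in (height, width, channels) layout
hwc = np.arange(2 * 2 * 3).reshape(2, 2, 3)

wrong = hwc.reshape(1, 3, 2, 2)                   # same memory, new shape: channels and pixels get mixed
right = np.transpose(hwc, (2, 0, 1))[np.newaxis]  # channel axis moved first, batch axis added

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_model_input(hwc_uint8):
    """Hypothetical helper: uint8 (H, W, 3) crop -> normalized float32 (1, 3, H, W) array."""
    x = hwc_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the last (channel) axis
    return np.transpose(x, (2, 0, 1))[np.newaxis]


print(right[0, 0])  # red value of each pixel
print(wrong[0, 0])  # first four raw values, regardless of channel
```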


## View Our Image Crops

To generate our image crops, we are using the dimensions of our bounding box labels and then resizing the cropped images. Let's take a look at a few of them in sequence.

In [None]:
def plot_crops(obj = 'Pedestrian:1', start=0):
    fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(20,12))
    for i,a in enumerate(ax):
        a.imshow(img_crops[i+start][obj])
        a.set_title(f'Frame {i+start}')

plot_crops(start=1)

## Compute Distance

Now that we have our image embeddings, we need to compare them! Let's compute the distance between sequential embeddings for a given object.

In [None]:
def compute_dist(img_embeds, dist_func=distance.euclidean, obj='Pedestrian:1'):
    dists = []
    for i in img_embeds:
        if (i>0)&(obj in list(img_embeds[i].keys())):
            if (obj in list(img_embeds[i-1].keys())):
                dist = dist_func(img_embeds[i-1][obj],img_embeds[i][obj]) # distance  between frame at t0 and t1
                dists.append(dist)
    return dists

dists = compute_dist(img_embeds, obj='Pedestrian:1')
    
# look for distances that are 1 standard deviation greater than the mean distance
prob_frames = np.where(dists>(np.mean(dists)+np.std(dists)))[0]
print(prob_frames)
print('The frame with the greatest distance is frame:', np.argmax(dists))
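The same idea in a self-contained form: a minimal sketch on synthetic 512-dimensional vectors (the size of ResNet18's pooled features), where one embedding is replaced by an unrelated vector to mimic a crop that no longer shows the same pedestrian. The helper names are hypothetical, and Euclidean distance via `np.linalg.norm` stands in for `scipy.spatial.distance.euclidean`; `distance.cosine` is a common alternative.

```python
import numpy as np


def consecutive_distances(embeddings):
    """Hypothetical helper: Euclidean distance between each embedding and the next one."""
    return np.array([np.linalg.norm(a - b) for a, b in zip(embeddings[:-1], embeddings[1:])])


def outlier_indices(dists, k=1.0):
    """Hypothetical helper: indices whose distance exceeds the mean by more than k standard deviations."""
    return np.where(dists > dists.mean() + k * dists.std())[0]


rng = np.random.default_rng(0)
base = rng.normal(size=512)
embeds = [base + 0.01 * rng.normal(size=512) for _ in range(10)]  # same object, small changes
embeds[6] = rng.normal(size=512)  # simulate one crop that no longer shows the same pedestrian

d = consecutive_distances(embeds)
print(outlier_indices(d, k=1.0))
```

A single bad crop raises the distance to both of its neighbors, so expect flagged indices to come in pairs (5 and 6 here).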

## View Outlier Frames

Let's look at the crops for our outlier frames. We can see we were able to catch the issue on frame 27 where the bounding box was off-center. 

While this method is fun to experiment with, it is substantially more computationally expensive than the simpler methods above and is not guaranteed to improve accuracy. A generic model like this will inevitably produce some false positives. Feel free to try a model fine-tuned on pedestrians, which would likely yield better results!

In [None]:
plot_crops(start=np.argmax(dists))

## Combining the Metrics

Now that we have explored several methods for identifying anomalous and potentially problematic frames, let's combine them and identify all of those outlier frames. While we might have a few false positives, these tend to be areas with a lot of action that we might want our annotators to review regardless.

In [None]:
def get_problem_frames(lab_frame, size_thresh=.25, iou_thresh=.4, embed=False, imgs=None, verbose=False):
    """
    Function for identifying outlier frames using sequential size differential, rolling IoU, and optionally image crop embedding comparison
    """
    if embed:
        model = torch.hub.load('pytorch/vision:v0.6.0', 'resnet18', pretrained=True) 
        model.eval()
        modules=list(model.children())[:-1]
        model=nn.Sequential(*modules)
        
    frame_res = {}
    for obj in list(np.unique(lab_frame.obj)): 
        frame_res[obj] = {} 
        # get index for 
        lframe_len = max(lab_frame['frameid'])
        ann_subframe = lab_frame[lab_frame.obj==obj]
        fframe = min(ann_subframe['frameid'])
        lframe = max(ann_subframe['frameid'])
        size_vec = np.zeros(lframe_len+1)
        size_vec[fframe:lframe+1] = ann_subframe['height']*ann_subframe['width']
        size_diff = np.array(size_vec[:-1])- np.array(size_vec[1:])
        norm_size_diff = size_diff/np.array(size_vec[:-1])
        norm_size_diff[np.where(np.isnan(norm_size_diff))[0]] = 0
        norm_size_diff[np.where(np.isinf(norm_size_diff))[0]] = 0
        frame_res[obj]['size_diff'] = [int(x) for x in size_diff]
        frame_res[obj]['norm_size_diff'] = [int(x) for x in norm_size_diff]
        try:
            problem_frames = [int(x) for x in np.where(np.abs(norm_size_diff)>size_thresh)[0]]
            if verbose:
                worst_frame = np.argmax(np.abs(norm_size_diff)) 
                print('Worst frame for',obj,'in',frame, 'is: ',worst_frame)
        except:
            problem_frames = []
        frame_res[obj]['size_problem_frames'] = problem_frames
        
        ious = []
        for i in range(len(lab_frame[lab_frame.obj==obj])-1):
            iou = calc_frame_int_over_union(lab_frame, obj, i)
            ious.append(iou)
        frame_res[obj]['iou'] = ious
        inds = [int(x) for x in np.where(np.array(ious)<iou_thresh)[0]]
        frame_res[obj]['iou_problem_frames'] = inds
        
        if embed:
                
            img_crops = {}
            img_embeds = {}

            for j,img in tqdm(enumerate(imgs)):
                img_arr = np.array(img)
                img_embeds[j] = {}
                img_crops[j] = {}
                # need to change this to use dataframe 
                for i,annot in enumerate(tlabels['tracking-annotations'][j]['annotations']):
                    try:
                        crop = img_arr[annot['top']:(annot['top']+annot['height']),annot['left']:(annot['left']+annot['width']),:]                    
                        new_crop = np.array(Image.fromarray(crop).resize((224,224)))
                        img_crops[j][annot['object-name']] = new_crop
                        new_crop = np.reshape(new_crop, (1,224,224,3))
                        new_crop = np.reshape(new_crop, (1,3,224,224))
                        torch_arr = torch.tensor(new_crop, dtype=torch.float)
                        with torch.no_grad():
                            emb = model(torch_arr)
                        img_embeds[j][annot['object-name']] = emb.squeeze()
                    except:
                        pass
                    
            dists = compute_dist(img_embeds, obj=obj)

            # look for distances that are 2+ standard deviations greater than the mean distance
            prob_frames = np.where(dists>(np.mean(dists)+np.std(dists)*2))[0]
            frame_res[obj]['embed_prob_frames'] = prob_frames
        
    return frame_res
    
# To add the embedding comparison, set embed=True (this needs the `model` from the optional PyTorch section)
num_images_to_validate = 32
frame_res = get_problem_frames(label_frame, size_thresh=.25, iou_thresh=.5, embed=False, imgs=imgs[:num_images_to_validate])
        
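all_prob_frames = []
for obj in frame_res:
    # combine the frames flagged by the size, IoU, and (if enabled) embedding checks
    prob_frames = list(frame_res[obj]['size_problem_frames'])
    prob_frames.extend(list(frame_res[obj]['iou_problem_frames']))
    prob_frames.extend(list(frame_res[obj].get('embed_prob_frames', [])))
    all_prob_frames.extend(prob_frames)
# sorted list of unique frame indices to highlight in the audit job
prob_frame_dict = [int(x) for x in np.unique(all_prob_frames)]

prob_frame_dict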
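The two checks in the cell above can be illustrated in plain Python. The following is a minimal sketch, not the notebook's `calc_frame_int_over_union` or `compute_dist` helpers (those are defined outside this section). It assumes each box is a dict with the Ground Truth `left`, `top`, `width`, and `height` pixel fields used in the embedding code, computes IoU between one object's boxes in consecutive frames, and applies the same "mean plus two standard deviations" rule the cell uses for embedding distances. `box_iou`, `low_iou_frames`, and `outlier_indices` are hypothetical names.

```python
import statistics


def box_iou(a, b):
    """IoU of two boxes given as dicts with left, top, width, height in pixels."""
    # Corners of the intersection rectangle
    x1 = max(a["left"], b["left"])
    y1 = max(a["top"], b["top"])
    x2 = min(a["left"] + a["width"], b["left"] + b["width"])
    y2 = min(a["top"] + a["height"], b["top"] + b["height"])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def low_iou_frames(boxes, iou_thresh=0.5):
    """Indices i where one object's boxes in frames i and i+1 overlap less than iou_thresh."""
    ious = [box_iou(boxes[i], boxes[i + 1]) for i in range(len(boxes) - 1)]
    return [i for i, v in enumerate(ious) if v < iou_thresh]


def outlier_indices(values, num_std=2.0):
    """Indices whose value is greater than mean + num_std * population std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if v > mean + num_std * std]


# One object tracked across three frames; the box jumps between frames 1 and 2.
track = [
    {"left": 10, "top": 10, "width": 20, "height": 40},
    {"left": 12, "top": 10, "width": 20, "height": 40},
    {"left": 80, "top": 60, "width": 20, "height": 40},
]
print(low_iou_frames(track))             # → [1]
print(outlier_indices([1] * 9 + [10]))   # → [9]
```

`statistics.pstdev` is the population standard deviation, which matches NumPy's default `np.std`. With only a handful of values, the two-standard-deviation rule may flag nothing at all, because a single outlier inflates the standard deviation it is compared against.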

## Command Line Interface

For use outside of a notebook, you can run the quality check from the command line with `quality_metrics_cli.py`.

In [None]:
# Usage for the CLI is like this

# !{sys.executable} quality_metrics_cli.py run-quality-check --bucket mybucket \
# --lab_path job_results/bag-track-mot20-test-tracking/annotations/consolidated-annotation/output/0/SeqLabel.json \
# --save_path example_quality_output/bag-track-mot20-test-tracking.json

# To print the help text
!{sys.executable} quality_metrics_cli.py run-quality-check --help

## Launch a Directed Audit Job

Let's take a look at how we would create a Ground Truth [video frame tracking adjustment job](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-object-tracking.html). Ground Truth provides a worker UI and infrastructure to streamline the process of creating this type of labeling job. All we have to do is specify the worker instructions, labels, and input data.


Now that we've identified our problematic annotations, we can launch a new audit labeling job. We can do this in the SageMaker console; however, when we want to launch jobs in a more automated fashion, the boto3 API is very helpful.

When creating a new labeling job, we first need to create a label category configuration file so Ground Truth knows which labels to display to our workers. This file also holds the labeling instructions, where we can list the outlier frames identified above to give our workers directed instructions. This way they can spend less time reviewing the entire scene and focus more on potential problems.

In [None]:
# create label categories 

os.makedirs('tracking_manifests', exist_ok=True)

labelcats = {
    "document-version": "2020-08-15",
    "auditLabelAttributeName": "Person",
    "labels": [
        {
            "label": "Parcel",
            "attributes": [
                {
                    "name": "color",
                    "type": "string",
                    "enum": [
                        "Bag",
                        "Jacket",
                        "Backpack"
                    ]
                }
            ]
        },
        {
            "label": "Pedestrian",
        },
        {
            "label": "Other",
        },
    ],
    "instructions": {
        "shortInstruction": f"Please draw boxes around pedestrians, with a specific focus on the following frames {prob_frame_dict}",
        "fullInstruction": f"Please draw boxes around pedestrians, with a specific focus on the following frames {prob_frame_dict}"
    }
}

filename = 'tracking_manifests/label_categories.json'
with open(filename,'w') as f:
    json.dump(labelcats,f)

s3.upload_file(Filename=filename, Bucket=bucket, Key='tracking_manifests/label_categories.json')

LABEL_CATEGORIES_S3_URI = f's3://{bucket}/tracking_manifests/label_categories.json'
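The instructions in the cell above interpolate the raw Python list of flagged frames (for example `[3, 4, 5, 9]`). As an optional refinement, the sketch below collapses consecutive indices into ranges so the instruction text stays short when many frames are flagged. `frames_to_ranges` and `build_label_categories` are hypothetical helpers; the config keys are copied from the cell above rather than from a full schema, and the `Parcel` attributes are omitted for brevity. The flagged values are zero-based positions from the quality checks, so it may be worth confirming they line up with the frame numbers workers see in the UI.

```python
import json


def frames_to_ranges(frames):
    """Collapse frame indices into a compact string, e.g. [9, 3, 4, 5] -> '3-5, 9'."""
    parts = []
    start = prev = None
    for f in sorted(set(frames)):
        if start is not None and f == prev + 1:
            prev = f  # extend the current run
            continue
        if start is not None:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = f  # begin a new run
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


def build_label_categories(labels, audit_attribute, problem_frames):
    """Label category config (keys as in the cell above) whose instructions name the flagged frames."""
    focus = frames_to_ranges(problem_frames) or "none flagged"
    text = f"Please draw boxes around pedestrians, with a specific focus on the following frames: {focus}"
    return {
        "document-version": "2020-08-15",
        "auditLabelAttributeName": audit_attribute,
        "labels": [{"label": name} for name in labels],
        "instructions": {"shortInstruction": text, "fullInstruction": text},
    }


cfg = build_label_categories(["Parcel", "Pedestrian", "Other"], "Person", [9, 3, 4, 5])
print(cfg["instructions"]["shortInstruction"])
print(json.dumps(cfg["labels"]))
```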

## Generate Manifests

SageMaker Ground Truth operates using [manifests](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-input-data-input-manifest.html). For a modality like image classification, each image corresponds to a single manifest entry, so the manifest directly contains the paths of all images to be labeled. For videos, because each video has multiple frames and we can have [multiple videos in a single manifest](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-manual-data-setup.html), each manifest entry instead points to a JSON sequence file for one video, and that sequence file lists the paths to the video's frames in S3.
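To make the two-level layout concrete, here is a small sketch. It assumes, as the sequence files built later in this notebook imply, that each frame's S3 location is the sequence file's `prefix` followed by the frame's `frame` file name; `frame_uris` and the `my-bucket` paths are hypothetical.

```python
import json


def frame_uris(sequence):
    """Resolve the S3 URI of every frame listed in a sequence-file dict, in frame-no order."""
    prefix = sequence["prefix"]
    if not prefix.endswith("/"):
        prefix += "/"
    frames = sorted(sequence["frames"], key=lambda fr: fr["frame-no"])
    return [prefix + fr["frame"] for fr in frames]


# One manifest line per video: it points at that video's sequence file.
manifest_line = json.dumps({"source-ref": "s3://my-bucket/tracking_manifests/MOT20-01_seq.json"})

# The sequence file lists the frames that belong to the video.
sequence = {
    "version": "2020-07-01",
    "seq-no": 1,
    "prefix": "s3://my-bucket/MOT20/train/MOT20-01/img1/",
    "number-of-frames": 2,
    "frames": [
        {"frame-no": 1, "frame": "000001.jpg", "unix-timestamp": 1600000000},
        {"frame-no": 2, "frame": "000003.jpg", "unix-timestamp": 1600000000},
    ],
}
print(manifest_line)
print(frame_uris(sequence))
# → ['s3://my-bucket/MOT20/train/MOT20-01/img1/000001.jpg', 's3://my-bucket/MOT20/train/MOT20-01/img1/000003.jpg']
```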

In this case our video frames are already stored as individual image files, so we can simply collect their file paths. If your data is in the form of video files, you can use the Ground Truth console to split videos into video frames. To learn more, see [Automated Video Frame Input Data Setup](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-video-automated-data-setup.html). Other tools like [ffmpeg](https://ffmpeg.org/) can also split video files into individual image frames. The following cell stores the frame file paths in a dictionary keyed by video folder.
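If you do start from video files, a sketch of the ffmpeg route follows. `ffmpeg_extract_cmd` and `extract_frames` are hypothetical helpers, the `fps` value is an arbitrary example, and ffmpeg must be installed separately. The output pattern `%06d.jpg` produces zero-padded, numbered file names, so a plain lexicographic sort such as `files.sort()` keeps the frames in temporal order.

```python
import os
import shutil
import subprocess


def ffmpeg_extract_cmd(video_path, out_dir, fps=5):
    """Build an ffmpeg command that writes frames as zero-padded, numbered JPEGs."""
    return [
        "ffmpeg",
        "-i", video_path,                    # input video file
        "-vf", f"fps={fps}",                 # resample to `fps` frames per second
        os.path.join(out_dir, "%06d.jpg"),   # output file name pattern
    ]


def extract_frames(video_path, out_dir, fps=5):
    """Run ffmpeg (which must be on PATH) and return the command that was executed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    os.makedirs(out_dir, exist_ok=True)
    cmd = ffmpeg_extract_cmd(video_path, out_dir, fps)
    subprocess.run(cmd, check=True)
    return cmd


print(" ".join(ffmpeg_extract_cmd("MOT20-01.mp4", "frames/MOT20-01/img1", fps=2)))
```

Zero padding matters because an unpadded pattern would sort `10.jpg` before `2.jpg`.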


In [None]:
# get the MOT20 training sequence folders (one folder per video)
vids = glob('MOT20/MOT20/train/*')
vids.sort()

# each video folder stores its frames as individual JPEG files under img1/
vid_dict = {}
for vid in vids:
    files = glob(f"{vid}/img1/*jpg")
    files.sort()
    files = files[:300:2]  # from the first 300 frames, keep every other frame
    fileset = []
    for fil in files:
        # drop the first five path components (for this layout, only the frame file name remains)
        fileset.append('/'.join(fil.split('/')[5:]))
    vid_dict[vid] = fileset

Now that we have our image paths, we iterate through each video's frames and create one sequence file entry per frame.

In [None]:
import time

# generate one sequence file per video
all_vids = {}
for vid in vid_dict:
    frames = []
    for i, v in enumerate(vid_dict[vid]):
        frame = {
          "frame-no": i+1,
          "frame": v.split('/')[-1],
          "unix-timestamp": int(time.time())
        }
        frames.append(frame)
    all_vids[vid] = {
      "version": "2020-07-01",
      "seq-no": np.random.randint(1,1000),
      "prefix": f"s3://{bucket}/{'/'.join(vid.split('/')[1:])}/img1/", 
      "number-of-frames": len(vid_dict[vid]),
      "frames": frames
    }
    
# save sequences
for vid in all_vids:
    with open(f"tracking_manifests/{vid.split('/')[-1]}_seq.json", 'w') as f:
        json.dump(all_vids[vid],f)
        
!cp SeqLabel.json tracking_manifests/SeqLabel.json              
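Before uploading, it can help to sanity-check each generated sequence file. The sketch below is a hypothetical helper, `validate_sequence`. Its rules (contiguous `frame-no` values starting at 1, `number-of-frames` equal to the number of listed frames, unique file names, an `s3://` prefix) reflect how the cell above builds sequences and are assumptions, not a documented Ground Truth schema check.

```python
def validate_sequence(seq):
    """Return a list of problems found in a sequence-file dict; an empty list means it looks consistent."""
    problems = []
    frames = seq.get("frames", [])
    # the declared frame count should match the frames actually listed
    if seq.get("number-of-frames") != len(frames):
        problems.append(
            f"number-of-frames is {seq.get('number-of-frames')} but {len(frames)} frames are listed"
        )
    # this notebook numbers frames 1, 2, ..., N
    numbers = [fr.get("frame-no") for fr in frames]
    if numbers != list(range(1, len(frames) + 1)):
        problems.append("frame-no values are not 1, 2, ..., N in order")
    # two entries pointing at the same image usually indicate a path bug
    names = [fr.get("frame") for fr in frames]
    if len(set(names)) != len(names):
        problems.append("duplicate frame file names")
    if not str(seq.get("prefix", "")).startswith("s3://"):
        problems.append("prefix is not an s3:// URI")
    return problems


example = {
    "prefix": "s3://my-bucket/MOT20/train/MOT20-01/img1/",
    "number-of-frames": 2,
    "frames": [{"frame-no": 1, "frame": "000001.jpg"}, {"frame-no": 2, "frame": "000003.jpg"}],
}
print(validate_sequence(example))  # → []
```

In the notebook, `{vid: validate_sequence(seq) for vid, seq in all_vids.items()}` should map every video to an empty list.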

Once we have our sequence files, we can create our manifest files. If we were creating a new job with no existing labels, we could simply pass in a path to our sequence file. Since we already have labels and want to launch an adjustment job instead, we need to point to the location of those labels in S3 and provide metadata for those labels in our manifest.

In [None]:
# create manifest 
manifest_dict = {} 
for vid in all_vids:
    source_ref = f"s3://{bucket}/tracking_manifests/{vid.split('/')[-1]}_seq.json"
    annot_labels = f"s3://{bucket}/tracking_manifests/SeqLabel.json"

    manifest = {
        "source-ref": source_ref,
        'Person': annot_labels, 
        "Person-metadata":{"class-map": {"1": "Pedestrian"}, 
                         "human-annotated": "yes", 
                         "creation-date": "2020-05-25T12:53:54+0000", 
                         "type": "groundtruth/video-object-tracking"}
    }
    manifest_dict[vid] = manifest
    
# save videos as individual jobs
for vid in all_vids:
    with open(f"tracking_manifests/{vid.split('/')[-1]}.manifest", 'w') as f:
        json.dump(manifest_dict[vid],f)
        
print('Example manifest: ', manifest)
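The cell above writes one manifest (and therefore one job) per video. Because an input manifest is a JSON Lines file with one JSON object per line, several videos could also go into a single adjustment job. A minimal sketch, reusing the keys from the cell above; `adjustment_manifest_line`, `write_manifest`, the `my-bucket` bucket, and the per-video `labels/.../SeqLabel.json` paths are hypothetical placeholders.

```python
import json
import os
import tempfile


def adjustment_manifest_line(seq_uri, labels_uri, attribute="Person", class_map=None):
    """One manifest entry pairing a video's sequence file with its existing tracking labels."""
    return {
        "source-ref": seq_uri,
        attribute: labels_uri,
        f"{attribute}-metadata": {
            "class-map": class_map or {"1": "Pedestrian"},
            "human-annotated": "yes",
            "creation-date": "2020-05-25T12:53:54+0000",
            "type": "groundtruth/video-object-tracking",
        },
    }


def write_manifest(path, entries):
    """Write entries as JSON Lines: one compact JSON object per line."""
    with open(path, "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


entries = [
    adjustment_manifest_line(
        f"s3://my-bucket/tracking_manifests/{name}_seq.json",
        f"s3://my-bucket/labels/{name}/SeqLabel.json",  # hypothetical per-video label location
    )
    for name in ["MOT20-01", "MOT20-02"]
]
manifest_path = os.path.join(tempfile.mkdtemp(), "all_videos.manifest")
write_manifest(manifest_path, entries)
```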

In [None]:
# send data to s3
!aws s3 cp --recursive tracking_manifests s3://{bucket}/tracking_manifests/

## Launch Jobs (Optional)

Now that we have created our manifests, we are ready to launch our adjustment labeling job. We can use the following template to launch labeling jobs with [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html). To access the labeling job as a worker, make sure you followed the steps above to create a private work team.
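The job request in the cell below builds its Lambda and worker UI ARNs from the region, a per-region account ID, and the task name, and an unsupported region surfaces only as a bare `KeyError`. A small sketch of the same ARN patterns with a clearer early failure; `task_arns` is a hypothetical helper and `REGION_ACCOUNTS` copies just two entries from the notebook's `arn_region_map` (pass `accounts=arn_region_map` to use the full map).

```python
# Subset of the notebook's arn_region_map (region -> account hosting the Ground Truth Lambdas)
REGION_ACCOUNTS = {"us-east-1": "432418664414", "us-west-2": "081040173940"}


def task_arns(region, task="AdjustmentVideoObjectTracking", accounts=REGION_ACCOUNTS):
    """Build the Lambda and UI ARNs used in human_task_config, failing early for unknown regions."""
    if region not in accounts:
        raise ValueError(f"No Ground Truth Lambda account listed for region {region!r}")
    account = accounts[region]
    return {
        "PreHumanTaskLambdaArn": f"arn:aws:lambda:{region}:{account}:function:PRE-{task}",
        "AnnotationConsolidationLambdaArn": f"arn:aws:lambda:{region}:{account}:function:ACS-{task}",
        # same UI ARN pattern as the cell below (account 394669845002)
        "HumanTaskUiArn": f"arn:aws:sagemaker:{region}:394669845002:human-task-ui/VideoObjectTracking",
    }


arns = task_arns("us-east-1")
print(arns["PreHumanTaskLambdaArn"])
# → arn:aws:lambda:us-east-1:432418664414:function:PRE-AdjustmentVideoObjectTracking
```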


In [None]:
import time

# generate jobs

job_names = []
outputs = []

arn_region_map = {'us-west-2': '081040173940',
                  'us-east-1': '432418664414',
                  'us-east-2': '266458841044',
                  'eu-west-1': '568282634449',
                  'eu-west-2': '487402164563',
                  'ap-northeast-1': '477331159723',
                  'ap-northeast-2': '845288260483',
                  'ca-central-1': '918755190332',
                  'eu-central-1': '203001061592',
                  'ap-south-1': '565803892007',
                  'ap-southeast-1': '377565633583',
                  'ap-southeast-2': '454466003867'
                 }

region_account = arn_region_map[region]

LABELING_JOB_NAME = f"mot20-test-tracking-adjust-{int(time.time())}"
task = 'AdjustmentVideoObjectTracking'
job_names.append(LABELING_JOB_NAME)
INPUT_MANIFEST_S3_URI = f's3://{bucket}/tracking_manifests/MOT20-01.manifest'

human_task_config = {
    "PreHumanTaskLambdaArn": f"arn:aws:lambda:{region}:{region_account}:function:PRE-{task}",
    "MaxConcurrentTaskCount": 200, # Maximum of 200 objects will be available to the workteam at any time
    "NumberOfHumanWorkersPerDataObject": 1, # We will obtain and consolidate 1 human annotation for each video sequence.
    "TaskAvailabilityLifetimeInSeconds": 864000, # Your workteam has 10 days (864,000 seconds) to complete all pending tasks.
    "TaskDescription": f"Please draw boxes around pedestrians, with a specific focus on the following frames {prob_frame_dict}",
    # If using public workforce, specify "PublicWorkforceTaskPrice"
    "WorkteamArn": WORKTEAM_ARN,
    "AnnotationConsolidationConfig": {
      "AnnotationConsolidationLambdaArn": f"arn:aws:lambda:{region}:{region_account}:function:ACS-{task}"
    },
    "TaskKeywords": [
      "Video Object Tracking",
      "Labeling"
    ],
    "TaskTimeLimitInSeconds": 7200,
    "TaskTitle": LABELING_JOB_NAME,
    "UiConfig": {
      "HumanTaskUiArn": f'arn:aws:sagemaker:{region}:394669845002:human-task-ui/VideoObjectTracking'
    }
}

# If you are using the Amazon Mechanical Turk workforce, specify the amount you want to pay a
# worker to label a data object. See https://aws.amazon.com/sagemaker/groundtruth/pricing/ for recommendations.
# WorkteamArn is already set in human_task_config above.
if not private_work_team:
    human_task_config["PublicWorkforceTaskPrice"] = {
        "AmountInUsd": {
           "Dollars": 0,
           "Cents": 3,
           "TenthFractionsOfACent": 6,
        }
    }

createLabelingJob_request = {
  "LabelingJobName": LABELING_JOB_NAME,
  "HumanTaskConfig": human_task_config,
  "InputConfig": {
    "DataAttributes": {
      "ContentClassifiers": [
        "FreeOfPersonallyIdentifiableInformation",
        "FreeOfAdultContent"
      ]
    },
    "DataSource": {
      "S3DataSource": {
        "ManifestS3Uri": INPUT_MANIFEST_S3_URI
      }
    }
  },
  "LabelAttributeName": "Person-ref",
  "LabelCategoryConfigS3Uri": LABEL_CATEGORIES_S3_URI,
  "OutputConfig": {
    "S3OutputPath": f"s3://{bucket}/gt_job_results"
  },
  "RoleArn": role,
  "StoppingConditions": {
    "MaxPercentageOfInputDatasetLabeled": 100
  }
}
print(createLabelingJob_request)
out = sagemaker_cl.create_labeling_job(**createLabelingJob_request)
outputs.append(out)
print(out)

## Conclusion

In this notebook, we introduced how to measure the quality of annotations using statistical analysis and quality metrics such as IoU, rolling IoU, and embedding comparisons. In addition, we walked through how to flag frames that may not be labeled properly using these quality metrics and send those frames for verification or audit jobs using SageMaker Ground Truth.

Using this approach, quality checks can be performed on the annotations in an automated manner at scale, which reduces the number of frames humans need to verify or audit. Please try the notebook with your own data and add your own quality metrics for different task types supported by SageMaker Ground Truth. With this process in place, you can generate high-quality datasets for a wide range of business use cases in a cost-effective manner without compromising the quality of annotations.

## Cleanup

Use the following command to stop the labeling job.

In [None]:
# cleanup
sagemaker_cl.stop_labeling_job(LabelingJobName=LABELING_JOB_NAME)  # boto3 clients accept keyword arguments only
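Stopping a job is not instantaneous: its `LabelingJobStatus` can pass through `Stopping` before reaching `Stopped`. If you want to confirm the job has actually stopped, for example before cleaning up other resources, a polling loop like the following may help. `stop_and_wait` is a hypothetical helper; it only calls `describe_labeling_job` and `stop_labeling_job` with the `LabelingJobName` keyword, and the poll interval and timeout are arbitrary defaults.

```python
import time

# Statuses after which a labeling job no longer changes
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def stop_and_wait(client, job_name, poll_seconds=30, timeout_seconds=1800):
    """Stop a labeling job if it is still running, then poll until it reaches a terminal status."""
    status = client.describe_labeling_job(LabelingJobName=job_name)["LabelingJobStatus"]
    # only request a stop if the job is neither finished nor already stopping
    if status not in TERMINAL_STATUSES and status != "Stopping":
        client.stop_labeling_job(LabelingJobName=job_name)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.describe_labeling_job(LabelingJobName=job_name)["LabelingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(f"{job_name} is still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

In this notebook, the call would be `stop_and_wait(sagemaker_cl, LABELING_JOB_NAME)`. Passing the client in as an argument also makes the loop easy to exercise with a stand-in object.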