<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Labelled-Data" data-toc-modified-id="Labelled-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Labelled Data</a></span></li><li><span><a href="#Building-Up-the-Query" data-toc-modified-id="Building-Up-the-Query-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Building Up the Query</a></span><ul class="toc-item"><li><span><a href="#Shots-with-faces" data-toc-modified-id="Shots-with-faces-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Shots with faces</a></span></li><li><span><a href="#Shots-with-face-on-alternate-side-of-the-screen" data-toc-modified-id="Shots-with-face-on-alternate-side-of-the-screen-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Shots with face on alternate side of the screen</a></span></li><li><span><a href="#Face-Probability-Threshold" data-toc-modified-id="Face-Probability-Threshold-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Face Probability Threshold</a></span></li><li><span><a href="#3-Shot-Sequence" data-toc-modified-id="3-Shot-Sequence-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>3-Shot Sequence</a></span></li><li><span><a href="#Identity-Labels" data-toc-modified-id="Identity-Labels-3.5"><span class="toc-item-num">3.5&nbsp;&nbsp;</span>Identity Labels</a></span></li><li><span><a href="#Identity-Labels-with-Spatial-Constaint" data-toc-modified-id="Identity-Labels-with-Spatial-Constaint-3.6"><span class="toc-item-num">3.6&nbsp;&nbsp;</span>Identity Labels with Spatial Constaint</a></span></li><li><span><a href="#Without-Identity-Labels" data-toc-modified-id="Without-Identity-Labels-3.7"><span class="toc-item-num">3.7&nbsp;&nbsp;</span>Without Identity Labels</a></span></li></ul></li><li><span><a href="#Self-contained-Functions-for-Queries-in-this-Document" 
data-toc-modified-id="Self-contained-Functions-for-Queries-in-this-Document-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Self-contained Functions for Queries in this Document</a></span><ul class="toc-item"><li><span><a href="#Shots-with-faces" data-toc-modified-id="Shots-with-faces-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Shots with faces</a></span></li><li><span><a href="#Shot-sequences-with-faces-in-alternating-regions" data-toc-modified-id="Shot-sequences-with-faces-in-alternating-regions-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Shot sequences with faces in alternating regions</a></span></li><li><span><a href="#Shot/Reverse-shot-with-faces-above-certain-probability" data-toc-modified-id="Shot/Reverse-shot-with-faces-above-certain-probability-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Shot/Reverse shot with faces above certain probability</a></span></li><li><span><a href="#Shot/Reverse-shot-sequence-with-at-least-3-shots" data-toc-modified-id="Shot/Reverse-shot-sequence-with-at-least-3-shots-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Shot/Reverse shot sequence with at least 3 shots</a></span></li><li><span><a href="#Shot/Reverse-shot-sequence-with-consistent-identities" data-toc-modified-id="Shot/Reverse-shot-sequence-with-consistent-identities-4.5"><span class="toc-item-num">4.5&nbsp;&nbsp;</span>Shot/Reverse shot sequence with consistent identities</a></span></li><li><span><a href="#Shot/Reverse-shot-sequence-with-consistent-identities-in-alternating-regions" data-toc-modified-id="Shot/Reverse-shot-sequence-with-consistent-identities-in-alternating-regions-4.6"><span class="toc-item-num">4.6&nbsp;&nbsp;</span>Shot/Reverse shot sequence with consistent identities in alternating regions</a></span></li><li><span><a href="#Shot/Reverse-shot-sequence-with-consistent-face-bounding-boxes" data-toc-modified-id="Shot/Reverse-shot-sequence-with-consistent-face-bounding-boxes-4.7"><span 
class="toc-item-num">4.7&nbsp;&nbsp;</span>Shot/Reverse shot sequence with consistent face bounding boxes</a></span></li></ul></li><li><span><a href="#Scratchpad" data-toc-modified-id="Scratchpad-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Scratchpad</a></span></li></ul></div>

# Introduction

This notebook describes how we gradually build up an increasingly complex Rekall query for shot/reverse shot sequences.

Section 4 provides self-contained functions for the queries in this notebook. Sections 1-3 must be run in order.

The hidden cell below sets up the imports and helper functions.

In [9]:
from esper.prelude import *
from query.models import *
from rekall.video_interval_collection import VideoIntervalCollection
from rekall.interval_list import Interval, IntervalList
from rekall.parsers import in_array, bbox_payload_parser, merge_dict_parsers, dict_payload_parser
from rekall.payload_predicates import payload_satisfies
from rekall.list_predicates import length_at_most
from rekall.logical_predicates import and_pred, or_pred
from rekall.spatial_predicates import scene_graph, make_region, _region_contains_bbox
from rekall.temporal_predicates import before, after, overlaps, equal
from rekall.merge_ops import payload_second, payload_plus, payload_first
from rekall.bbox_predicates import height_at_least
from esper.rekall import intrvllists_to_result, add_intrvllists_to_result, intrvllists_to_result_with_objects, bbox_to_result_object, intrvllists_to_result_bbox

CINEMATIC_SHOTS_LABELLER = 64
MAX_FRAME = 72000
VIDEO_ID = 216

# Wrap the payload in a list.
def wrap_list(intvl):
    intvl.payload = [intvl.payload]
    return intvl

# Keep the shots that overlap with face_frames. Each shot's payload becomes a list with one element per
# sampled frame; each element is the list of faces in that frame.
def get_shots_with_face(shots, face_frames):
    return shots.merge(
    face_frames, predicate=overlaps(), payload_merge_op=payload_second
    ).map(wrap_list).coalesce(payload_merge_op=payload_plus)

# Returns precision, recall, precision_per_item, recall_per_item
def compute_statistics(query_intrvllists, ground_truth_intrvllists):
    total_query_time = 0
    total_query_segments = 0
    total_ground_truth_time = 0
    total_ground_truth_segments = 0
    
    for video in query_intrvllists:
        total_query_time += query_intrvllists[video].coalesce().get_total_time()
        total_query_segments += query_intrvllists[video].size()
    for video in ground_truth_intrvllists:
        total_ground_truth_time += ground_truth_intrvllists[video].coalesce().get_total_time()
        total_ground_truth_segments += ground_truth_intrvllists[video].size()
        
    total_overlap_time = 0
    overlapping_query_segments = 0
    overlapping_ground_truth_segments = 0
    
    for video in query_intrvllists:
        if video in ground_truth_intrvllists:
            query_list = query_intrvllists[video]
            gt_list = ground_truth_intrvllists[video]
            
            total_overlap_time += query_list.overlaps(gt_list).coalesce().get_total_time()
            overlapping_query_segments += query_list.filter_against(gt_list, predicate=overlaps()).size()
            overlapping_ground_truth_segments += gt_list.filter_against(query_list, predicate=overlaps()).size()
    
    if total_query_time == 0:
        precision = 1.0
        precision_per_item = 1.0
    else:
        precision = total_overlap_time / total_query_time
        precision_per_item = overlapping_query_segments / total_query_segments
    
    if total_ground_truth_time == 0:
        recall = 1.0
        recall_per_item = 1.0
    else:
        recall = total_overlap_time / total_ground_truth_time
        recall_per_item = overlapping_ground_truth_segments / total_ground_truth_segments
    
    return precision, recall, precision_per_item, recall_per_item

def print_statistics(query_intrvllists, ground_truth_intrvllists):
    precision, recall, precision_per_item, recall_per_item = compute_statistics(
        query_intrvllists, ground_truth_intrvllists)

    print("Precision: ", precision)
    print("Recall: ", recall)
    print("Precision Per Item: ", precision_per_item)
    print("Recall Per Item: ", recall_per_item)

# Labelled Data

I have manually labelled all shot/reverse shot sequences in the first 50min (~72000 frames) of Godfather Part III and will be using this as groundtruth for validation. The hidden cell below reads in my labels and visualizes them.

In [10]:
data = [
    (8757,9049),
    (12750,13463),
    (13683,14227),
    (21357,22236),
    (22294,22758),
    (23147,25854),
    (26007,26942),
    (27620,28172),
    (28382,28623),
    (28785,29036),
    (29904,31014),
    (33936,35339),
    (35421,36248),
    (39388,40062),
    (41675,42689),
    (51246,52118),
    (53117,54776), # One side is a long shot and face is too small to be detected.
    (54895,55762),
    (56819,59963),
    (60253,61875),
    (66533,67846),
    (68729,69040),
    (69421,70153),
    (70285,71102)]
intrvllist = IntervalList([Interval(start, end, payload=None) for (start,end) in data])
shot_reverse_shot_labelled = {VIDEO_ID: intrvllist}

def display_labelled_interval(indices):
    return esper_widget(intrvllists_to_result_with_objects({VIDEO_ID: IntervalList([Interval(data[i][0], data[i][1], None) for i in indices])}, payload_to_objs=lambda p,v:[]))
    
esper_widget(intrvllists_to_result(shot_reverse_shot_labelled), crop_bboxes=False, show_middle_frame=False)


# Building Up the Query

We start with a simple query that has high recall, then gradually refine it by adding complexity to weed out the false positives.

## Shots with faces

We start by finding all shots where at least one sampled frame has a face. The payload for each shot is a list with one element per sampled frame, and each element is the list of faces in that frame.

In [None]:
shots = VideoIntervalCollection.from_django_qs(
    Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER, max_frame__lte=MAX_FRAME),
    with_payload=lambda obj:[]
)
# For each frame, payload is a list of faces
face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
        min_frame=F('frame__number'),
        max_frame=F('frame__number'),
        video_id=F('frame__video_id')).filter(video_id=VIDEO_ID, frame__number__lte=MAX_FRAME),
        with_payload=in_array(
            bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus)

shots_with_faces = get_shots_with_face(shots, face_frames)
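
To make the payload shape concrete, here is a minimal plain-Python sketch that does not use Rekall. The function `group_faces_by_shot` and the toy data are hypothetical, and treating shot boundaries as inclusive is an assumption. It only mimics the shape of the result: each surviving shot carries one list per sampled frame, and each of those lists holds the faces in that frame.

```python
def group_faces_by_shot(shots, faces_by_frame):
    """Attach to each shot the per-frame face lists that fall inside it.

    shots: list of (start_frame, end_frame) tuples (assumed inclusive).
    faces_by_frame: dict mapping a sampled frame number to its list of faces.
    Shots that contain no sampled frame with a face are dropped.
    """
    result = []
    for start, end in shots:
        payload = [faces for frame, faces in sorted(faces_by_frame.items())
                   if start <= frame <= end]
        if payload:
            result.append((start, end, payload))
    return result

# Hypothetical toy data: three shots, faces detected in frames 12, 24 and 240.
toy_shots = [(0, 99), (100, 199), (200, 299)]
toy_faces = {
    12: [{'x1': 0.1, 'y1': 0.2, 'x2': 0.4, 'y2': 0.8}],
    24: [{'x1': 0.1, 'y1': 0.2, 'x2': 0.4, 'y2': 0.8},
         {'x1': 0.6, 'y1': 0.3, 'x2': 0.8, 'y2': 0.7}],
    240: [{'x1': 0.5, 'y1': 0.1, 'x2': 0.9, 'y2': 0.9}],
}
grouped = group_faces_by_shot(toy_shots, toy_faces)
for start, end, payload in grouped:
    print(start, end, [len(faces) for faces in payload])
```

In this toy example the middle shot has no face frames, so it is dropped, just as shots without any overlapping face frame do not appear in `shots_with_faces`.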

We can now visualize the result by rendering both the groundtruth shots (red) and our shots with faces (black).

In [None]:
result = intrvllists_to_result(shots_with_faces.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
esper_widget(result, crop_bboxes=False, show_middle_frame=False)

We can also look at our current recall and precision. We can see that it has high recall and low precision.

In [None]:
print_statistics(shots_with_faces.get_allintervals(), shot_reverse_shot_labelled)
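
As a rough illustration of the time-based precision and recall printed above, the following plain-Python sketch works on `(start, end)` frame tuples rather than Rekall interval lists. All names here are hypothetical, and measuring an interval's length as `end - start` is a simplifying assumption. Precision is the overlapping time divided by the total query time; recall is the overlapping time divided by the total groundtruth time.

```python
def merge_intervals(intervals):
    """Coalesce overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def total_time(intervals):
    """Total length covered by the intervals, counting overlaps once."""
    return sum(end - start for start, end in merge_intervals(intervals))

def overlap_time(query, truth):
    """Total length of the intersection between two interval sets."""
    total = 0
    for qs, qe in merge_intervals(query):
        for ts, te in merge_intervals(truth):
            total += max(0, min(qe, te) - max(qs, ts))
    return total

def time_precision_recall(query, truth):
    overlap = overlap_time(query, truth)
    q_time, t_time = total_time(query), total_time(truth)
    # Same convention as compute_statistics: an empty side scores 1.0.
    precision = overlap / q_time if q_time else 1.0
    recall = overlap / t_time if t_time else 1.0
    return precision, recall

# Toy example: 200 frames returned, 150 frames labelled, 100 frames in common.
query = [(0, 100), (150, 250)]
truth = [(50, 200)]
precision, recall = time_precision_recall(query, truth)
print(precision, recall)
```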

The recall is not 100% because some shots in groundtruth are long/extreme long shots and no faces were detected. Run the cell below to see an example.

In [None]:
display_labelled_interval([16])

## Shots with face on alternate side of the screen

In shot/reverse shot sequences, adjacent shots usually have faces on different sides of the screen. Many of the shots in shots_with_faces are crowd shots with many small faces (run the cell below for an example). We want to focus on shots with a small number of larger faces that appear in alternating regions of the screen. We therefore build two sets of shots: one with faces on the left and one with faces on the right. A few parameters define the left and right regions, the minimum face height, and the maximum number of faces.

In [None]:
esper_widget(intrvllists_to_result_with_objects(shots_with_faces.filter(lambda intvl: intvl.start==11388).get_allintervals(), payload_to_objs=lambda p,v:[]))

In [None]:
RIGHT_HALF_MIN_X=0.45
LEFT_HALF_MAX_X=0.55
MAX_FACES_ON_SCREEN=2
MIN_FACE_HEIGHT=0.4

In [None]:
right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)
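
The per-frame test can be sketched in plain Python as follows. This is not the Rekall implementation: `bbox_in_region` and `frame_matches` are hypothetical helpers, and the sketch assumes that a face counts as being "in" a region only when its whole bounding box lies inside it. Coordinates are fractions of the frame width and height, as in the parameters above.

```python
def bbox_in_region(bbox, region):
    """True if the bounding box lies entirely inside the region (assumed semantics)."""
    return (region['x1'] <= bbox['x1'] and bbox['x2'] <= region['x2'] and
            region['y1'] <= bbox['y1'] and bbox['y2'] <= region['y2'])

def frame_matches(faces, region, min_height, max_faces):
    """A frame matches if it has at most max_faces faces and at least one face
    that is tall enough and lies inside the region."""
    if len(faces) > max_faces:
        return False
    return any(face['y2'] - face['y1'] >= min_height and bbox_in_region(face, region)
               for face in faces)

# Regions built from RIGHT_HALF_MIN_X=0.45 and LEFT_HALF_MAX_X=0.55.
right_region = {'x1': 0.45, 'y1': 0.0, 'x2': 1.0, 'y2': 1.0}
left_region = {'x1': 0.0, 'y1': 0.0, 'x2': 0.55, 'y2': 1.0}

# Hypothetical frames: one close-up on the right, and a crowd of five small faces.
close_up_right = [{'x1': 0.55, 'y1': 0.2, 'x2': 0.85, 'y2': 0.8}]
crowd = [{'x1': 0.1 * i, 'y1': 0.4, 'x2': 0.1 * i + 0.05, 'y2': 0.5} for i in range(5)]

on_right = frame_matches(close_up_right, right_region, min_height=0.4, max_faces=2)
on_left = frame_matches(close_up_right, left_region, min_height=0.4, max_faces=2)
crowd_ok = frame_matches(crowd, right_region, min_height=0.4, max_faces=2)
print(on_right, on_left, crowd_ok)
```

Note that with these parameters the two regions overlap in the band between x = 0.45 and x = 0.55, so the regions are slightly wider than strict halves of the screen.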

We now find sequences of two shots where the face positions alternate between the left and right regions.

In [None]:
shot_reverse_shot = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=or_pred(before(max_dist=1), after(max_dist=1), arity=2)
    ).coalesce()
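
The pairing step can be pictured with plain `(start, end)` tuples. The sketch below is hypothetical and assumes that `before(max_dist=1)` accepts a pair when the first interval ends at most one frame before the second begins. Each adjacent right/left pair, in either order, becomes one span covering both shots, and overlapping spans are then coalesced into longer sequences.

```python
def is_before(a, b, max_dist=1):
    """True if a ends at or before the start of b, with a gap of at most max_dist."""
    return 0 <= b[0] - a[1] <= max_dist

def adjacent_pairs(right_shots, left_shots, max_dist=1):
    """Span every right/left pair of shots that is adjacent in either order."""
    spans = []
    for r in right_shots:
        for l in left_shots:
            if is_before(r, l, max_dist) or is_before(l, r, max_dist):
                spans.append((min(r[0], l[0]), max(r[1], l[1])))
    return sorted(spans)

def coalesce(spans):
    """Merge overlapping or touching spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Toy shots: right, left, right are consecutive; the last left shot is isolated.
right_shots = [(0, 50), (101, 150)]
left_shots = [(51, 100), (300, 350)]
pairs = adjacent_pairs(right_shots, left_shots)
sequences = coalesce(pairs)
print(pairs, sequences)
```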

We can visualize the result and see the evaluation numbers.

In [None]:
result = intrvllists_to_result(shot_reverse_shot.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
print_statistics(shot_reverse_shot.get_allintervals(), shot_reverse_shot_labelled)
esper_widget(result, crop_bboxes=False, show_middle_frame=False)

Recall is poor because in some shot/reverse shot sequences the faces are shorter than 40% of the frame height, there are sometimes more than 2 faces, and a face can appear in the middle of the screen (run the cell below for false negative examples).

In [None]:
display_labelled_interval([6,14,15])

We can relax the parameters to improve recall.

In [None]:
RIGHT_HALF_MIN_X=0.33
LEFT_HALF_MAX_X=0.66
MAX_FACES_ON_SCREEN=4
MIN_FACE_HEIGHT=0.2

In [None]:
right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)
shot_reverse_shot = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=or_pred(before(max_dist=1), after(max_dist=1), arity=2)
    ).coalesce()
result = intrvllists_to_result(shot_reverse_shot.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
print_statistics(shot_reverse_shot.get_allintervals(), shot_reverse_shot_labelled)
esper_widget(result, crop_bboxes=False, show_middle_frame=False)

## Face Probability Threshold

Sometimes a non-face gets marked as a face (run the cell below for an example), so we introduce a minimum threshold on the face detection probability.

In [None]:
import esper.stdlib
esper_widget(esper.stdlib.qs_to_result(Face.objects.filter(frame__video__id=VIDEO_ID, frame__number=68664)))

In [None]:
MIN_FACE_PROBABILITY=0.99

In [None]:
face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
        min_frame=F('frame__number'),
        max_frame=F('frame__number'),
        video_id=F('frame__video_id')).filter(
            video_id=VIDEO_ID, frame__number__lte=MAX_FRAME, probability__gte=MIN_FACE_PROBABILITY),
        with_payload=in_array(
            bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus)
right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)
shot_reverse_shot = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=or_pred(before(max_dist=1), after(max_dist=1), arity=2)
    ).coalesce()
result = intrvllists_to_result(shot_reverse_shot.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
print_statistics(shot_reverse_shot.get_allintervals(), shot_reverse_shot_labelled)
esper_widget(result, crop_bboxes=False, show_middle_frame=False)

## 3-Shot Sequence

So far the intervals we get only need to contain two shots, so any cut separating two shots with faces will be included (run cell below for a false positive).

In [None]:
esper_widget(intrvllists_to_result_with_objects(shot_reverse_shot.filter(lambda intvl: intvl.start==11388).get_allintervals(), payload_to_objs=lambda p,v:[]))

We can instead look for sequences that are at least 3 shots long, either with left-right-left or right-left-right pattern, and take the union of the two sets of sequences.

In [None]:
shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    )

shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    )

shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()
result = intrvllists_to_result(shot_reverse_shot.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
print_statistics(shot_reverse_shot.get_allintervals(), shot_reverse_shot_labelled)
esper_widget(result, crop_bboxes=False, show_middle_frame=False)
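
The effect of requiring three shots can be seen in a small plain-Python sketch. The helper `chain_three` and the toy shots are hypothetical, and adjacency again assumes that `before(max_dist=1)` allows a gap of at most one frame. A right-left-right or left-right-left chain is kept, while an isolated two-shot pair no longer qualifies.

```python
def chain_three(first, second, third, max_dist=1):
    """Return spans covering three consecutive shots, taken in order from
    the lists first, second and third."""
    def is_before(a, b):
        return 0 <= b[0] - a[1] <= max_dist

    spans = []
    for a in first:
        for b in second:
            if not is_before(a, b):
                continue
            for c in third:
                if is_before(b, c):
                    spans.append((a[0], c[1]))
    return spans

# Toy shots: frames 0-150 form right-left-right; frames 400-500 are only right-left.
right = [(0, 50), (101, 150), (400, 450)]
left = [(51, 100), (451, 500)]

rlr = chain_three(right, left, right)
lrl = chain_three(left, right, left)
candidates = sorted(set(rlr) | set(lrl))
print(candidates)
```

The two-shot pair at frames 400-500 would have been returned by the earlier two-shot query, but it is dropped here because no third alternating shot follows it.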

## Identity Labels

Some of the inaccuracies come from stitching together extra shots of different characters who happen to be at the right position on screen (run the cell below for an example). Now that most faces in Godfather III are labelled, we can add identity constraints to our 3-shot sequences.

In [None]:
esper_widget(intrvllists_to_result_with_objects(shot_reverse_shot.filter(lambda intvl: intvl.start==12135).get_allintervals(), payload_to_objs=lambda p,v:[]))

In [None]:
face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
        min_frame=F('frame__number'),
        max_frame=F('frame__number'),
        video_id=F('frame__video_id')).filter(
            video_id=VIDEO_ID, frame__number__lte=MAX_FRAME, probability__gte=MIN_FACE_PROBABILITY),
        with_payload=in_array(
            merge_dict_parsers([
                bbox_payload_parser(VideoIntervalCollection.django_accessor),
                dict_payload_parser(VideoIntervalCollection.django_accessor, {"face_id": "id"})
            ]))
    ).coalesce(payload_merge_op=payload_plus)
right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)

def share_face(int1, int2):
    def get_identities_for_face_ids(ids):
        return {face_identity.identity_id for face_identity in FaceIdentity.objects.filter(face_id__in=ids)}
    def has_common_face(ids1, ids2):
        identities1 = get_identities_for_face_ids(ids1)
        identities2 = get_identities_for_face_ids(ids2)
        return len(identities1.intersection(identities2)) > 0
    faces1 = {face['face_id'] for faces in int1.payload for face in faces}
    faces2 = {face['face_id'] for faces in int2.payload for face in faces}
    return has_common_face(faces1, faces2)

shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_right,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_left,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()
result = intrvllists_to_result(shot_reverse_shot.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
print_statistics(shot_reverse_shot.get_allintervals(), shot_reverse_shot_labelled)
esper_widget(result, crop_bboxes=False, show_middle_frame=False)
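
Because the first merge keeps the payload of the first shot (`payload_first`), `share_face` ends up comparing the first and third shots of each candidate sequence. The core of that check is a set intersection, which the hypothetical sketch below reproduces without a database: `identity_of_face` stands in for the `FaceIdentity` lookup, and all ids are made up.

```python
def share_identity(face_ids_a, face_ids_b, identity_of_face):
    """True if some labelled identity appears among both groups of faces.

    identity_of_face: dict mapping face id -> identity id (hypothetical stand-in
    for the FaceIdentity table); unlabelled faces are simply missing from it.
    """
    identities_a = {identity_of_face[f] for f in face_ids_a if f in identity_of_face}
    identities_b = {identity_of_face[f] for f in face_ids_b if f in identity_of_face}
    return bool(identities_a & identities_b)

# Made-up labels: faces 1 and 3 are the same person (identity 10).
identity_of_face = {1: 10, 2: 20, 3: 10, 4: 30}

first_shot_faces = {1}
third_shot_same_person = {3}
third_shot_other_person = {4}
third_shot_unlabelled = {99}

same = share_identity(first_shot_faces, third_shot_same_person, identity_of_face)
other = share_identity(first_shot_faces, third_shot_other_person, identity_of_face)
unlabelled = share_identity(first_shot_faces, third_shot_unlabelled, identity_of_face)
print(same, other, unlabelled)
```

The sketch looks identities up in a dictionary built ahead of time, whereas `share_face` queries `FaceIdentity` each time the predicate is evaluated. Prefetching the mapping once might be worth trying if the query feels slow, though that has not been measured here.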

## Identity Labels with Spatial Constaint

We can further restrict identity matching to the faces that lie in the desired left or right region.

In [None]:
def filter_faces_to_region(region):
    def fn(intvl):
        intvl.payload = [face for face in intvl.payload if _region_contains_bbox(region, face)]
        return intvl
    return fn

faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    ).map(filter_faces_to_region(right_half))

faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    ).map(filter_faces_to_region(left_half))

shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)

shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_right,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_left,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()
result = intrvllists_to_result(shot_reverse_shot.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
print_statistics(shot_reverse_shot.get_allintervals(), shot_reverse_shot_labelled)
esper_widget(result, crop_bboxes=False, show_middle_frame=False)

## Without Identity Labels

If we do not have good identity labels, we can instead use the position of the largest (tallest) face in each sampled frame as a proxy. If that position is stable throughout the shot, the face likely belongs to the same person. The person can sometimes move during the shot (run the cell below for an example), so we allow some movement by setting a threshold on the maximum distance the face's center point may move between consecutive sampled frames.

In [None]:
display_labelled_interval([5])

In [None]:
MAX_FACE_MOVEMENT=0.15

In [None]:
def find_highest_box(boxes):
    # Return the tallest box (the first one on ties), or None if there are no boxes.
    def get_height(box):
        return box['y2'] - box['y1']
    if len(boxes) == 0:
        return None
    return max(boxes, key=get_height)

def take_highest_in_frame(intvl):
    result = []
    for faces_in_frame in intvl.payload:
        largest = find_highest_box(faces_in_frame)
        if largest is not None:
            result.append(largest)
    intvl.payload = result
    return intvl

def movement_less_than(dist):
    def get_center(box):
        return ((box['x1'] + box['x2']) / 2, (box['y1']+box['y2']) / 2)
    def get_distance(pt1, pt2):
        return np.sqrt((pt1[0]-pt2[0])**2+(pt1[1]-pt2[1])**2)
    def check(boxes):
        for b1, b2 in zip(boxes, boxes[1:]):
            if get_distance(get_center(b1), get_center(b2)) > dist:
                return False
        return True
    return check

shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right).map(take_highest_in_frame).filter(
    payload_satisfies(movement_less_than(MAX_FACE_MOVEMENT)))
shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left).map(take_highest_in_frame).filter(
    payload_satisfies(movement_less_than(MAX_FACE_MOVEMENT)))

shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    )

shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    )

shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()
result = intrvllists_to_result(shot_reverse_shot.get_allintervals(), color='black')
add_intrvllists_to_result(result, shot_reverse_shot_labelled, color='red')
print_statistics(shot_reverse_shot.get_allintervals(), shot_reverse_shot_labelled)
esper_widget(result, crop_bboxes=False, show_middle_frame=False)
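
A small worked example shows what the movement threshold does and does not rule out. The helpers below are a hypothetical restatement of `movement_less_than` using only the standard library, and the boxes are made up. Because only consecutive sampled frames are compared, a slow drift passes the check even when the total displacement over the shot exceeds the threshold, while a single large jump fails it.

```python
import math

MAX_FACE_MOVEMENT = 0.15

def box_at(cx, cy, width=0.2, height=0.4):
    """Build a bounding box dict centred at (cx, cy)."""
    return {'x1': cx - width / 2, 'x2': cx + width / 2,
            'y1': cy - height / 2, 'y2': cy + height / 2}

def center(box):
    return ((box['x1'] + box['x2']) / 2, (box['y1'] + box['y2']) / 2)

def max_step(boxes):
    """Largest center-to-center distance between consecutive boxes."""
    steps = [math.hypot(center(a)[0] - center(b)[0], center(a)[1] - center(b)[1])
             for a, b in zip(boxes, boxes[1:])]
    return max(steps, default=0.0)

# Slow drift: each step is about 0.1, total displacement about 0.3.
drift = [box_at(0.3, 0.5), box_at(0.4, 0.5), box_at(0.5, 0.5), box_at(0.6, 0.5)]
# Sudden jump: one step of about 0.3.
jump = [box_at(0.3, 0.5), box_at(0.6, 0.5)]

drift_ok = max_step(drift) <= MAX_FACE_MOVEMENT
jump_ok = max_step(jump) <= MAX_FACE_MOVEMENT
print(drift_ok, jump_ok)
```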

# Self-contained Functions for Queries in this Document

Parameters for each query are listed at the top of the definition in ALL_CAPS variables.

## Shots with faces

In [12]:
def shots_with_faces():
    VIDEO_ID=216
    CINEMATIC_SHOTS_LABELLER=64
    
    from rekall.video_interval_collection import VideoIntervalCollection
    from rekall.parsers import in_array, bbox_payload_parser
    from rekall.merge_ops import payload_plus, payload_second
    from rekall.temporal_predicates import overlaps
    from esper.rekall import intrvllists_to_result_with_objects
    # Keep the shots that overlap with face_frames. Each shot's payload becomes a list with one element per
    # sampled frame; each element is the list of faces in that frame.
    def get_shots_with_face(shots, face_frames):
        # Wrap the payload in a list.
        def wrap_list(intvl):
            intvl.payload = [intvl.payload]
            return intvl
        return shots.merge(
            face_frames, predicate=overlaps(), payload_merge_op=payload_second
        ).map(wrap_list).coalesce(payload_merge_op=payload_plus)
        
    shots = VideoIntervalCollection.from_django_qs(
        Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER),
        with_payload=lambda obj:[]
    )
    # For each frame, payload is a list of faces
    face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
        min_frame=F('frame__number'),
        max_frame=F('frame__number'),
        video_id=F('frame__video_id')).filter(video_id=VIDEO_ID),
        with_payload=in_array(
            bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus)

    shots_with_faces = get_shots_with_face(shots, face_frames)
    return intrvllists_to_result_with_objects(shots_with_faces.get_allintervals(), payload_to_objs=lambda p,v:[])

esper_widget(shots_with_faces())

Precision:  0.3602783388033325
Recall:  0.9536398947500313
Precision Per Item:  0.39036144578313253
Recall Per Item:  1.0



## Shot sequences with faces in alternating regions

In [13]:
def shots_with_faces_in_alternating_regions():
    VIDEO_ID=216
    CINEMATIC_SHOTS_LABELLER=64
    RIGHT_HALF_MIN_X=0.33
    LEFT_HALF_MAX_X=0.66
    MAX_FACES_ON_SCREEN=4
    MIN_FACE_HEIGHT=0.2
    
    from rekall.video_interval_collection import VideoIntervalCollection
    from rekall.bbox_predicates import height_at_least
    from rekall.parsers import in_array, bbox_payload_parser
    from rekall.merge_ops import payload_plus, payload_second
    from rekall.temporal_predicates import overlaps, before, after
    from rekall.spatial_predicates import make_region, scene_graph
    from rekall.list_predicates import length_at_most
    from rekall.logical_predicates import and_pred, or_pred
    from rekall.payload_predicates import payload_satisfies
    from esper.rekall import intrvllists_to_result_with_objects
    # Keep the shots that overlap with face_frames. Each shot's payload becomes a list with one element per
    # sampled frame; each element is the list of faces in that frame.
    def get_shots_with_face(shots, face_frames):
        # Wrap the payload in a list.
        def wrap_list(intvl):
            intvl.payload = [intvl.payload]
            return intvl
        return shots.merge(
            face_frames, predicate=overlaps(), payload_merge_op=payload_second
        ).map(wrap_list).coalesce(payload_merge_op=payload_plus)
    
    shots = VideoIntervalCollection.from_django_qs(
        Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER),
        with_payload=lambda obj:[]
    )
    # For each frame, payload is a list of faces
    face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
        min_frame=F('frame__number'),
        max_frame=F('frame__number'),
        video_id=F('frame__video_id')).filter(video_id=VIDEO_ID),
        with_payload=in_array(
            bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus)
    
    right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
    left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
    graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
    faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

    faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
    shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
    shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)
    shot_reverse_shot = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=or_pred(before(max_dist=1), after(max_dist=1), arity=2)
    ).coalesce()
    return intrvllists_to_result_with_objects(shot_reverse_shot.get_allintervals(), payload_to_objs=lambda p,v:[])

esper_widget(shots_with_faces_in_alternating_regions())

Precision:  0.3723450218619843
Recall:  0.8998454663158334
Precision Per Item:  0.2830188679245283
Recall Per Item:  1.0


VGridWidget(jsglobals={'schema': [['Identity', ['id', 'name']], ['Genre', ['id', 'name']], ['Video', ['id', 'p…
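
The notebook does not show how the four numbers above are computed. As a rough illustration only, the sketch below assumes that `Precision` and `Recall` count frames, and that the per-item variants count whole intervals that overlap at least one interval on the other side. The helper names (`frames`, `precision_recall`, `per_item`) are hypothetical and are not part of rekall or esper.

```python
def frames(intervals):
    """Expand inclusive (start, end) frame intervals into a set of frame numbers."""
    return {f for start, end in intervals for f in range(start, end + 1)}

def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one frame."""
    return a[0] <= b[1] and b[0] <= a[1]

def precision_recall(predicted, truth):
    """Frame-level precision and recall (assumed definition)."""
    pred_frames, true_frames = frames(predicted), frames(truth)
    hit = len(pred_frames & true_frames)
    precision = hit / len(pred_frames) if pred_frames else 0.0
    recall = hit / len(true_frames) if true_frames else 0.0
    return precision, recall

def per_item(predicted, truth):
    """Interval-level precision and recall (assumed definition)."""
    p_hits = sum(any(overlaps(p, t) for t in truth) for p in predicted)
    t_hits = sum(any(overlaps(t, p) for p in predicted) for t in truth)
    precision = p_hits / len(predicted) if predicted else 0.0
    recall = t_hits / len(truth) if truth else 0.0
    return precision, recall

predicted = [(0, 9), (20, 29)]   # two predicted sequences
truth = [(5, 14)]                # one labelled sequence
frame_scores = precision_recall(predicted, truth)
item_scores = per_item(predicted, truth)
```

Under these assumptions, a per-item recall of 1.0 alongside a lower frame-level recall would mean every labelled sequence is touched by some result, but not all of its frames are covered.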

## Shot/Reverse shot with faces above certain probability

In [14]:
def shot_reverse_shot_with_probable_faces():
    VIDEO_ID=216
    CINEMATIC_SHOTS_LABELLER=64
    RIGHT_HALF_MIN_X=0.33
    LEFT_HALF_MAX_X=0.66
    MAX_FACES_ON_SCREEN=4
    MIN_FACE_HEIGHT=0.2
    MIN_FACE_PROBABILITY=0.99
    
    from rekall.video_interval_collection import VideoIntervalCollection
    from rekall.bbox_predicates import height_at_least
    from rekall.parsers import in_array, bbox_payload_parser
    from rekall.merge_ops import payload_plus, payload_second
    from rekall.temporal_predicates import overlaps, before, after
    from rekall.spatial_predicates import make_region, scene_graph
    from rekall.list_predicates import length_at_most
    from rekall.logical_predicates import and_pred, or_pred
    from rekall.payload_predicates import payload_satisfies
    from esper.rekall import intrvllists_to_result_with_objects
    # Keep the shots that overlap with face_frames. Each kept shot's payload is a list with one element per
    # overlapping frame; each element is the list of faces in that frame.
    def get_shots_with_face(shots, face_frames):
        # Wrap the payload in a list.
        def wrap_list(intvl):
            intvl.payload = [intvl.payload]
            return intvl
        return shots.merge(
            face_frames, predicate=overlaps(), payload_merge_op=payload_second
        ).map(wrap_list).coalesce(payload_merge_op=payload_plus)
        
    shots = VideoIntervalCollection.from_django_qs(
        Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER),
        with_payload=lambda obj:[]
    )
    # For each frame, payload is a list of faces
    face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
            min_frame=F('frame__number'),
            max_frame=F('frame__number'),
            video_id=F('frame__video_id')
        ).filter(video_id=VIDEO_ID, probability__gte=MIN_FACE_PROBABILITY),
        with_payload=in_array(
            bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus)
    
    right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
    left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
    graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
    faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

    faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
    shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
    shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)
    shot_reverse_shot = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=or_pred(before(max_dist=1), after(max_dist=1), arity=2)
    ).coalesce()
    return intrvllists_to_result_with_objects(shot_reverse_shot.get_allintervals(), payload_to_objs=lambda p,v:[])

esper_widget(shot_reverse_shot_with_probable_faces())

Precision:  0.4629832591924011
Recall:  0.8998037004552479
Precision Per Item:  0.3333333333333333
Recall Per Item:  1.0


VGridWidget(jsglobals={'schema': [['Identity', ['id', 'name']], ['Genre', ['id', 'name']], ['Video', ['id', 'p…
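
The only change from the previous query is the `probability__gte=MIN_FACE_PROBABILITY` filter on the face detections. The sketch below imitates that step with plain dicts instead of the Django `Face` model: it drops low-confidence detections and then groups the rest by frame, roughly what the `filter` plus `coalesce(payload_merge_op=payload_plus)` pair does. The function name `faces_per_frame` and the dict layout are assumptions made for illustration.

```python
from collections import defaultdict

MIN_FACE_PROBABILITY = 0.99

def faces_per_frame(faces, min_probability=MIN_FACE_PROBABILITY):
    """Group detections by frame number, dropping low-confidence ones."""
    by_frame = defaultdict(list)
    for face in faces:
        if face["probability"] >= min_probability:
            by_frame[face["frame"]].append(face)
    return dict(by_frame)

detections = [
    {"frame": 1, "probability": 0.999},
    {"frame": 1, "probability": 0.60},   # dropped: below the threshold
    {"frame": 2, "probability": 0.95},   # dropped: frame 2 ends up with no faces
]
kept = faces_per_frame(detections)
```

Frames whose detections all fall below the threshold disappear entirely, so they can no longer make a shot qualify.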

## Shot/Reverse shot sequence with at least 3 shots

In [16]:
def shot_reverse_shot_three_shots():
    VIDEO_ID=216
    CINEMATIC_SHOTS_LABELLER=64
    RIGHT_HALF_MIN_X=0.33
    LEFT_HALF_MAX_X=0.66
    MAX_FACES_ON_SCREEN=4
    MIN_FACE_HEIGHT=0.2
    MIN_FACE_PROBABILITY=0.99
    
    from rekall.video_interval_collection import VideoIntervalCollection
    from rekall.bbox_predicates import height_at_least
    from rekall.parsers import in_array, bbox_payload_parser
    from rekall.merge_ops import payload_plus, payload_second
    from rekall.temporal_predicates import overlaps, before, after
    from rekall.spatial_predicates import make_region, scene_graph
    from rekall.list_predicates import length_at_most
    from rekall.logical_predicates import and_pred, or_pred
    from rekall.payload_predicates import payload_satisfies
    from esper.rekall import intrvllists_to_result_with_objects
    # Keep the shots that overlap with face_frames. Each kept shot's payload is a list with one element per
    # overlapping frame; each element is the list of faces in that frame.
    def get_shots_with_face(shots, face_frames):
        # Wrap the payload in a list.
        def wrap_list(intvl):
            intvl.payload = [intvl.payload]
            return intvl
        return shots.merge(
            face_frames, predicate=overlaps(), payload_merge_op=payload_second
        ).map(wrap_list).coalesce(payload_merge_op=payload_plus)
        
    shots = VideoIntervalCollection.from_django_qs(
        Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER),
        with_payload=lambda obj:[]
    )
    # For each frame, payload is a list of faces
    face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
            min_frame=F('frame__number'),
            max_frame=F('frame__number'),
            video_id=F('frame__video_id')
        ).filter(video_id=VIDEO_ID, probability__gte=MIN_FACE_PROBABILITY),
        with_payload=in_array(
            bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus)
    
    right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
    left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
    graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
    faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

    faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
    shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
    shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)
    shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    )

    shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    )

    shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()
    return intrvllists_to_result_with_objects(shot_reverse_shot.get_allintervals(), payload_to_objs=lambda p,v:[])

esper_widget(shot_reverse_shot_three_shots())

VGridWidget(jsglobals={'schema': [['Identity', ['id', 'name']], ['Genre', ['id', 'name']], ['Video', ['id', 'p…
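
The two chained `merge` calls are what enforce the three-shot pattern (right, left, right or left, right, left). A simplified model with `(start, end)` tuples may make this easier to follow. It assumes that `before(max_dist=1)` accepts a pair when the second interval starts at most one frame after the first ends, and that `merge` yields the span covering both intervals; the real rekall operators may differ in detail, and the functions below are stand-ins, not the library API.

```python
def before(a, b, max_dist=1):
    """True if interval b starts within max_dist frames after interval a ends."""
    gap = b[0] - a[1]
    return 0 <= gap <= max_dist

def merge(left, right, max_dist=1):
    """Pair up intervals satisfying `before` and return the span of each pair."""
    return [
        (min(a[0], b[0]), max(a[1], b[1]))
        for a in left
        for b in right
        if before(a, b, max_dist)
    ]

def three_shot_sequences(first, second):
    """Sequences of the form first -> second -> first."""
    return merge(merge(first, second), first)

right_shots = [(0, 10), (21, 30)]
left_shots = [(11, 20), (31, 40)]

right_left_right = three_shot_sequences(right_shots, left_shots)
left_right_left = three_shot_sequences(left_shots, right_shots)
```

In this toy input the two results overlap, which is why the query ends with `set_union(...).coalesce()` to fuse them into one longer sequence.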

## Shot/Reverse shot sequence with consistent identities

In [18]:
def shot_reverse_shot_consistent_identities():
    VIDEO_ID=216
    CINEMATIC_SHOTS_LABELLER=64
    RIGHT_HALF_MIN_X=0.33
    LEFT_HALF_MAX_X=0.66
    MAX_FACES_ON_SCREEN=4
    MIN_FACE_HEIGHT=0.2
    MIN_FACE_PROBABILITY=0.99
    
    from rekall.video_interval_collection import VideoIntervalCollection
    from rekall.bbox_predicates import height_at_least
    from rekall.parsers import in_array, bbox_payload_parser, merge_dict_parsers, dict_payload_parser
    from rekall.merge_ops import payload_plus, payload_second, payload_first
    from rekall.temporal_predicates import overlaps, before, after
    from rekall.spatial_predicates import make_region, scene_graph
    from rekall.list_predicates import length_at_most
    from rekall.logical_predicates import and_pred, or_pred
    from rekall.payload_predicates import payload_satisfies
    from esper.rekall import intrvllists_to_result_with_objects
    # Keep the shots that overlap with face_frames. Each kept shot's payload is a list with one element per
    # overlapping frame; each element is the list of faces in that frame.
    def get_shots_with_face(shots, face_frames):
        # Wrap the payload in a list.
        def wrap_list(intvl):
            intvl.payload = [intvl.payload]
            return intvl
        return shots.merge(
            face_frames, predicate=overlaps(), payload_merge_op=payload_second
        ).map(wrap_list).coalesce(payload_merge_op=payload_plus)
    # Check whether the faces in the two payloads share at least one identity label.
    def share_face(int1, int2):
        def get_identities_for_face_ids(ids):
            return {face_identity.identity_id for face_identity in FaceIdentity.objects.filter(face_id__in=ids)}
        def has_common_face(ids1, ids2):
            identities1 = get_identities_for_face_ids(ids1)
            identities2 = get_identities_for_face_ids(ids2)
            return len(identities1.intersection(identities2)) > 0
        faces1 = {face['face_id'] for faces in int1.payload for face in faces}
        faces2 = {face['face_id'] for faces in int2.payload for face in faces}
        return has_common_face(faces1, faces2)

    shots = VideoIntervalCollection.from_django_qs(
        Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER),
        with_payload=lambda obj:[]
    )
    # For each frame, payload is a list of faces
    face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
            min_frame=F('frame__number'),
            max_frame=F('frame__number'),
            video_id=F('frame__video_id')
        ).filter(video_id=VIDEO_ID, probability__gte=MIN_FACE_PROBABILITY),
        with_payload=in_array(
            merge_dict_parsers([
                bbox_payload_parser(VideoIntervalCollection.django_accessor),
                dict_payload_parser(VideoIntervalCollection.django_accessor, {"face_id": "id"})
            ]))
    ).coalesce(payload_merge_op=payload_plus)
    
    right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
    left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
    graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
    faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    )

    faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    )
    shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
    shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)

    shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_right,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

    shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_left,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

    shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()    
    return intrvllists_to_result_with_objects(shot_reverse_shot.get_allintervals(), payload_to_objs=lambda p,v:[])

esper_widget(shot_reverse_shot_consistent_identities())

VGridWidget(jsglobals={'schema': [['Identity', ['id', 'name']], ['Genre', ['id', 'name']], ['Video', ['id', 'p…
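
The `share_face` predicate compares identity labels, not raw face ids: two different detections count as a match when they map to the same identity. The sketch below reproduces that logic without a database, using a plain dict in place of the `FaceIdentity` table. The mapping `face_to_identity` and the function names are hypothetical.

```python
def identities_for(face_ids, face_to_identity):
    """Look up the identity of each face id, skipping unlabelled faces."""
    return {face_to_identity[f] for f in face_ids if f in face_to_identity}

def share_identity(payload1, payload2, face_to_identity):
    """True if the two payloads (lists of per-frame face lists) share an identity."""
    ids1 = {face["face_id"] for frame in payload1 for face in frame}
    ids2 = {face["face_id"] for frame in payload2 for face in frame}
    common = identities_for(ids1, face_to_identity) & identities_for(ids2, face_to_identity)
    return len(common) > 0

# Faces 1 and 3 are the same person seen in different shots; face 2 is someone else.
face_to_identity = {1: "A", 2: "B", 3: "A"}
first_shot = [[{"face_id": 1}]]
reverse_shot = [[{"face_id": 2}]]
third_shot = [[{"face_id": 3}]]

same_person_returns = share_identity(first_shot, third_shot, face_to_identity)
different_person = share_identity(first_shot, reverse_shot, face_to_identity)
```

In the query, `payload_merge_op=payload_first` on the first merge keeps the first shot's faces as the payload, so the predicate on the second merge compares the first shot with the third.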

## Shot/Reverse shot sequence with consistent identities in alternating regions

In [20]:
def shot_reverse_shot_consistent_identities_in_alternating_regions():
    VIDEO_ID=216
    CINEMATIC_SHOTS_LABELLER=64
    RIGHT_HALF_MIN_X=0.33
    LEFT_HALF_MAX_X=0.66
    MAX_FACES_ON_SCREEN=4
    MIN_FACE_HEIGHT=0.2
    MIN_FACE_PROBABILITY=0.99
    
    from rekall.video_interval_collection import VideoIntervalCollection
    from rekall.bbox_predicates import height_at_least
    from rekall.parsers import in_array, bbox_payload_parser, merge_dict_parsers, dict_payload_parser
    from rekall.merge_ops import payload_plus, payload_second, payload_first
    from rekall.temporal_predicates import overlaps, before, after
    from rekall.spatial_predicates import make_region, scene_graph, _region_contains_bbox
    from rekall.list_predicates import length_at_most
    from rekall.logical_predicates import and_pred, or_pred
    from rekall.payload_predicates import payload_satisfies
    from esper.rekall import intrvllists_to_result_with_objects
    # Keep the shots that overlap with face_frames. Each kept shot's payload is a list with one element per
    # overlapping frame; each element is the list of faces in that frame.
    def get_shots_with_face(shots, face_frames):
        # Wrap the payload in a list.
        def wrap_list(intvl):
            intvl.payload = [intvl.payload]
            return intvl
        return shots.merge(
            face_frames, predicate=overlaps(), payload_merge_op=payload_second
        ).map(wrap_list).coalesce(payload_merge_op=payload_plus)
    # Check whether the faces in the two payloads share at least one identity label.
    def share_face(int1, int2):
        def get_identities_for_face_ids(ids):
            return {face_identity.identity_id for face_identity in FaceIdentity.objects.filter(face_id__in=ids)}
        def has_common_face(ids1, ids2):
            identities1 = get_identities_for_face_ids(ids1)
            identities2 = get_identities_for_face_ids(ids2)
            return len(identities1.intersection(identities2)) > 0
        faces1 = {face['face_id'] for faces in int1.payload for face in faces}
        faces2 = {face['face_id'] for faces in int2.payload for face in faces}
        return has_common_face(faces1, faces2)
    # Returns a function that transforms an interval by keeping only the faces in its payload that lie inside `region`.
    def filter_faces_to_region(region):
        def fn(intvl):
            intvl.payload = [face for face in intvl.payload if _region_contains_bbox(region, face)]
            return intvl
        return fn

    shots = VideoIntervalCollection.from_django_qs(
        Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER),
        with_payload=lambda obj:[]
    )
    # For each frame, payload is a list of faces
    face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
            min_frame=F('frame__number'),
            max_frame=F('frame__number'),
            video_id=F('frame__video_id')
        ).filter(video_id=VIDEO_ID, probability__gte=MIN_FACE_PROBABILITY),
        with_payload=in_array(
            merge_dict_parsers([
                bbox_payload_parser(VideoIntervalCollection.django_accessor),
                dict_payload_parser(VideoIntervalCollection.django_accessor, {"face_id": "id"})
            ]))
    ).coalesce(payload_merge_op=payload_plus)
    
    right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
    left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
    graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
    faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    ).map(filter_faces_to_region(right_half))

    faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    ).map(filter_faces_to_region(left_half))
    shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right)
    shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left)

    shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_right,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

    shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1),
        payload_merge_op=payload_first
    ).merge(
        shots_with_faces_on_left,
        predicate=and_pred(before(max_dist=1),
                           share_face,
                           arity=2)
    )

    shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()
    return intrvllists_to_result_with_objects(shot_reverse_shot.get_allintervals(), payload_to_objs=lambda p,v:[])

esper_widget(shot_reverse_shot_consistent_identities_in_alternating_regions())

VGridWidget(jsglobals={'schema': [['Identity', ['id', 'name']], ['Genre', ['id', 'name']], ['Video', ['id', 'p…
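
Because `RIGHT_HALF_MIN_X=0.33` and `LEFT_HALF_MAX_X=0.66`, the two regions overlap in the middle third of the screen, so a centred face can count for both sides. The sketch below illustrates the region filter with plain dicts. It assumes a box is "inside" a region when it is fully contained, which matches the `f['x1']>=x1 and f['x2']<=x2` test in the scratchpad; the private rekall helper `_region_contains_bbox` may behave differently. `region_dict` and `region_contains` are hypothetical names.

```python
RIGHT_HALF_MIN_X = 0.33
LEFT_HALF_MAX_X = 0.66

def region_dict(x1, y1, x2, y2):
    """Stand-in for a region: a plain dict of normalised coordinates."""
    return {"x1": x1, "y1": y1, "x2": x2, "y2": y2}

def region_contains(region, box):
    """True if `box` lies entirely inside `region` (assumed semantics)."""
    return (box["x1"] >= region["x1"] and box["x2"] <= region["x2"]
            and box["y1"] >= region["y1"] and box["y2"] <= region["y2"])

def filter_faces_to_region(region, faces):
    """Keep only the faces fully inside `region`."""
    return [face for face in faces if region_contains(region, face)]

right_half = region_dict(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
left_half = region_dict(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)

centred = {"x1": 0.40, "y1": 0.2, "x2": 0.60, "y2": 0.6}
far_left = {"x1": 0.10, "y1": 0.2, "x2": 0.30, "y2": 0.6}

on_right = filter_faces_to_region(right_half, [centred, far_left])
on_left = filter_faces_to_region(left_half, [centred, far_left])
```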

## Shot/Reverse shot sequence with consistent face bounding boxes

In [22]:
def shot_reverse_shot_consistent_face_bbox():
    VIDEO_ID=216
    CINEMATIC_SHOTS_LABELLER=64
    RIGHT_HALF_MIN_X=0.33
    LEFT_HALF_MAX_X=0.66
    MAX_FACES_ON_SCREEN=4
    MIN_FACE_HEIGHT=0.2
    MIN_FACE_PROBABILITY=0.99
    MAX_FACE_MOVEMENT=0.15
    
    import numpy as np  # used by movement_less_than below
    from rekall.video_interval_collection import VideoIntervalCollection
    from rekall.bbox_predicates import height_at_least
    from rekall.parsers import in_array, bbox_payload_parser
    from rekall.merge_ops import payload_plus, payload_second
    from rekall.temporal_predicates import overlaps, before, after
    from rekall.spatial_predicates import make_region, scene_graph, _region_contains_bbox
    from rekall.list_predicates import length_at_most
    from rekall.logical_predicates import and_pred, or_pred
    from rekall.payload_predicates import payload_satisfies
    from esper.rekall import intrvllists_to_result_with_objects
    # Keep the shots that overlap with face_frames. Each kept shot's payload is a list with one element per
    # overlapping frame; each element is the list of faces in that frame.
    def get_shots_with_face(shots, face_frames):
        # Wrap the payload in a list.
        def wrap_list(intvl):
            intvl.payload = [intvl.payload]
            return intvl
        return shots.merge(
            face_frames, predicate=overlaps(), payload_merge_op=payload_second
        ).map(wrap_list).coalesce(payload_merge_op=payload_plus)
    # Returns a function that transforms an interval by keeping only the faces in its payload that lie inside `region`.
    def filter_faces_to_region(region):
        def fn(intvl):
            intvl.payload = [face for face in intvl.payload if _region_contains_bbox(region, face)]
            return intvl
        return fn
    # Returns the tallest bbox in `boxes` (largest y2 - y1), or None if `boxes` is empty.
    def find_highest_box(boxes):
        def get_height(box):
            return box['y2'] - box['y1']
        if len(boxes) == 0:
            return None
        # max() returns the first box among ties.
        return max(boxes, key=get_height)
    # Transforms the interval's payload (one list of bboxes per frame) by keeping only the tallest bbox of each frame.
    def take_highest_in_frame(intvl):
        result = []
        for faces_in_frame in intvl.payload:
            largest = find_highest_box(faces_in_frame)
            if largest is not None:
                result.append(largest)
        intvl.payload = result
        return intvl
    # Returns a function that checks if the distance between centers of consecutive boxes
    # is within `dist`.
    def movement_less_than(dist):
        def get_center(box):
            return ((box['x1'] + box['x2']) / 2, (box['y1']+box['y2']) / 2)
        def get_distance(pt1, pt2):
            return np.sqrt((pt1[0]-pt2[0])**2+(pt1[1]-pt2[1])**2)
        def check(boxes):
            for b1, b2 in zip(boxes, boxes[1:]):
                if get_distance(get_center(b1), get_center(b2)) > dist:
                    return False
            return True
        return check
    
    shots = VideoIntervalCollection.from_django_qs(
        Shot.objects.filter(video_id=VIDEO_ID, labeler_id=CINEMATIC_SHOTS_LABELLER),
        with_payload=lambda obj:[]
    )
    # For each frame, payload is a list of faces
    face_frames = VideoIntervalCollection.from_django_qs(
        Face.objects.annotate(
            min_frame=F('frame__number'),
            max_frame=F('frame__number'),
            video_id=F('frame__video_id')
        ).filter(video_id=VIDEO_ID, probability__gte=MIN_FACE_PROBABILITY),
        with_payload=in_array(
            bbox_payload_parser(VideoIntervalCollection.django_accessor))
    ).coalesce(payload_merge_op=payload_plus)
    
    right_half = make_region(RIGHT_HALF_MIN_X, 0.0, 1.0, 1.0)
    left_half = make_region(0.0, 0.0, LEFT_HALF_MAX_X, 1.0)
    graph = {
        'nodes': [ { 'name': 'face', 'predicates': [ height_at_least(MIN_FACE_HEIGHT) ] } ],
        'edges': []
    }
    faces_on_right = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=right_half))
        )
    ).map(filter_faces_to_region(right_half))

    faces_on_left = face_frames.filter(
        and_pred(
            payload_satisfies(length_at_most(MAX_FACES_ON_SCREEN)),
            payload_satisfies(scene_graph(graph, region=left_half))
        )
    ).map(filter_faces_to_region(left_half))
    
    shots_with_faces_on_right = get_shots_with_face(shots, faces_on_right).map(take_highest_in_frame).filter(
        payload_satisfies(movement_less_than(MAX_FACE_MOVEMENT)))
    shots_with_faces_on_left = get_shots_with_face(shots, faces_on_left).map(take_highest_in_frame).filter(
        payload_satisfies(movement_less_than(MAX_FACE_MOVEMENT)))
    
    shot_reverse_shot_1 = shots_with_faces_on_right.merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    )

    shot_reverse_shot_2 = shots_with_faces_on_left.merge(
        shots_with_faces_on_right,
        predicate=before(max_dist=1)
    ).merge(
        shots_with_faces_on_left,
        predicate=before(max_dist=1)
    )

    shot_reverse_shot = shot_reverse_shot_1.set_union(shot_reverse_shot_2).coalesce()
    return intrvllists_to_result_with_objects(shot_reverse_shot.get_allintervals(), payload_to_objs=lambda p,v:[])

esper_widget(shot_reverse_shot_consistent_face_bbox())

VGridWidget(jsglobals={'schema': [['Identity', ['id', 'name']], ['Genre', ['id', 'name']], ['Video', ['id', 'p…
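
This variant replaces identity labels with a geometric check: within a shot, the centre of the tallest face should not jump by more than `MAX_FACE_MOVEMENT` between consecutive frames. Below is a standalone sketch of that check using `math.hypot` instead of NumPy, operating on plain dicts with normalised `x1`, `y1`, `x2`, `y2` keys as in the query. The sample boxes are made up for illustration.

```python
import math

MAX_FACE_MOVEMENT = 0.15

def center(box):
    """Centre point of a bounding box."""
    return ((box["x1"] + box["x2"]) / 2, (box["y1"] + box["y2"]) / 2)

def movement_less_than(dist):
    """Return a check that consecutive box centres move at most `dist`."""
    def check(boxes):
        for b1, b2 in zip(boxes, boxes[1:]):
            (x1, y1), (x2, y2) = center(b1), center(b2)
            if math.hypot(x1 - x2, y1 - y2) > dist:
                return False
        return True
    return check

steady = [
    {"x1": 0.40, "y1": 0.3, "x2": 0.60, "y2": 0.7},
    {"x1": 0.45, "y1": 0.3, "x2": 0.65, "y2": 0.7},   # centre moves by 0.05
]
jumpy = [
    {"x1": 0.10, "y1": 0.3, "x2": 0.30, "y2": 0.7},
    {"x1": 0.60, "y1": 0.3, "x2": 0.80, "y2": 0.7},   # centre moves by 0.5
]

is_steady = movement_less_than(MAX_FACE_MOVEMENT)
steady_ok = is_steady(steady)
jumpy_ok = is_steady(jumpy)
```

A shot with zero or one face box passes trivially, since there are no consecutive pairs to compare.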

# Scratchpad

In [None]:
def get_height(box):
    return box['y2'] - box['y1']

def find_highest_box(boxes):
    if len(boxes) == 0:
        return None
    result = boxes[0]
    best = get_height(result)
    for i in range(1, len(boxes)):
        h = get_height(boxes[i])
        if h > best:
            best = h
            result = boxes[i]
    return result

def take_highest_in_frame(intvl):
    result = []
    for faces_in_frame in intvl.payload:
        largest = find_highest_box(faces_in_frame)
        if largest is not None:
            result.append(largest)
    intvl.payload = result
    return intvl

def take_highest_in_region(x1,x2):
    def fn(intvl):
        filtered = []
        for faces_in_frame in intvl.payload:
            faces = [f for f in faces_in_frame if f['x1']>=x1 and f['x2']<=x2]
            if len(faces) > 0:
                filtered.append(faces)
        intvl.payload = filtered
        return take_highest_in_frame(intvl)
    return fn

shot_reverse_shot_with_faces = shots_with_faces_on_right.map(take_highest_in_region(RIGHT_HALF_MIN_X, 1.0)).merge(
        shots_with_faces_on_left.map(take_highest_in_region(0, LEFT_HALF_MAX_X)),
        predicate=or_pred(before(max_dist=1), after(max_dist=1), arity=2),
        payload_merge_op=lambda p1, p2: (p1,p2)
    ).coalesce()

def double_list_to_objects(p, v):
    right_faces, left_faces = p
    def to_obj(box, i):
        obj = bbox_to_result_object(box, v)
        obj['gender_id'] = i
        return obj
    return [to_obj(box, 1) for faces in left_faces for box in faces] + [to_obj(box, 2) for faces in right_faces for box in faces]

def list_to_objects(p,v):
    right_faces, left_faces = p
    def to_obj(box, i):
        obj = bbox_to_result_object(box, v)
        obj['gender_id'] = i
        return obj
    return [to_obj(box, 1) for box in left_faces] + [to_obj(box, 2) for box in right_faces]

esper_widget(intrvllists_to_result_with_objects(shot_reverse_shot_with_faces.filter(lambda intvl: True).get_allintervals(), payload_to_objs=list_to_objects))

In [None]:
from esper.prelude import *