## A notebook for exploring video posedata

**Intended use:** the user selects a video that is accompanied by already extracted posedata in a .json file. The notebook provides visualizations that summarize the quality and content of the poses extracted across all frames of the video, as well as armature plots of the detected poses in a selected frame. These can be viewed separately from the source video, compared numerically, grouped and searched by similarity, and even animated.

Note that at present, this only works with .json output files generated by the OpenPifPaf command-line tools.


In [None]:
from datetime import datetime
import json
import os
from pathlib import Path
from time import sleep
import warnings

from bokeh.io import output_notebook
from bokeh.layouts import column, row
from bokeh.models import (
    Button,
    CrosshairTool,
    DatetimeTickFormatter,
    Div,
    LegendItem,
    Line,
    LinearAxis,
    Range1d,
    Slider,
    Span,
    TapTool,
    Toggle,
)
from bokeh.models.sources import ColumnDataSource
from bokeh.models.widgets.inputs import Select
from bokeh.plotting import figure, show
from bokeh.themes import Theme
import cv2
import faiss
from ipyfilechooser import FileChooser
from IPython.display import HTML, display, clear_output
from ipywidgets import Dropdown, Layout, IntProgress
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
from PIL import Image

from pose_functions import *
from posedata_preprocessing import *


### Build and display the video/posedata selector widget

Clicking the "Select" button that appears after running this cell will display a filesystem navigator/selector widget that can be used to select a video for analysis. Note that for now, this video **must** be in the same folder as its posedata output, and the names of the matched video and posedata files must be identical, except that the posedata file has `.openpifpaf.json` appended to the name of the video file.

The default folder the selector widget shows first is either the value of the `$DATA_FOLDER` environment variable (see README.md for information about how to set this via a `.env` file) or else the folder from which the notebook is being run.
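
As a minimal sketch of the naming convention described above, the posedata path can be derived from the video path by appending the suffix. The helper name `posedata_path_for` and the file name `performance.mp4` are hypothetical and used only for illustration; `DATA_FOLDER` is the variable read by the selector cell.

```python
import os
from pathlib import Path


def posedata_path_for(video_path):
    """Return the expected posedata path: the video file name plus '.openpifpaf.json'."""
    video_path = Path(video_path)
    return video_path.with_name(video_path.name + ".openpifpaf.json")


# Fall back to the current working directory when DATA_FOLDER is unset
default_folder = Path(os.getenv("DATA_FOLDER", Path.cwd()))

# Hypothetical video name, for illustration only
example_video = default_folder / "performance.mp4"
print(posedata_path_for(example_video).name)
```

Checking `posedata_path_for(video).is_file()` before loading would catch a missing or misnamed posedata file early.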


In [None]:
source_data_folder = Path(os.getenv("DATA_FOLDER", Path.cwd()))


def get_available_videos(data_folder):
    """
    Available videos will be limited to those with a .json and matching video (.mp4, .avi, etc)
    file in a predefined directory (defaulting to the notebook's running directory)
    """
    available_json_files = list(data_folder.glob("*.json"))
    available_video_files = (
        p.resolve()
        for p in Path(data_folder).glob("*")
        if p.suffix in {".avi", ".mp4", ".mov", ".mkv", ".webm"}
    )
    available_json = [
        json_file.stem.split(".")[0] for json_file in available_json_files
    ]

    available_videos = []

    for video_name in available_video_files:
        if video_name.stem.split(".")[0] in available_json:
            available_videos.append(video_name.name)

    return available_videos


fc = FileChooser(source_data_folder)
fc.title = '<b>Use "Select" to choose a video file.</b><br>It must have an accompanying .openpifpaf.json file in the same folder.'
fc.filter_pattern = ["*.mp4", "*.mkv", "*.avi", "*.webm", "*.mov"]

display(fc)


### Collect video and per-frame pose metadata for the selected video

Run this cell after selecting a video above.


In [None]:
pose_file = f"{fc.selected}.openpifpaf.json"
video_file = fc.selected

print("Video file:", video_file)
print("Posedata file:", pose_file)

cap = cv2.VideoCapture(video_file)
video_fps = cap.get(cv2.CAP_PROP_FPS)
video_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
video_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print("Video FPS:", video_fps)

print("Loading video and JSON files, please wait...")

pose_data, pose_series = preprocess_pose_json(pose_file, video_fps)

print("Total frames:", len(pose_series["frame"]))

print("Duration:", pose_series["timestamp"][len(pose_series["timestamp"]) - 1].time())


In [None]:
tracked_pose_file = f"{fc.selected}.tracked.openpifpaf.json"

if os.path.isfile(tracked_pose_file):
    # Use context managers so the file handles are always closed
    with open(tracked_pose_file, "r") as f:
        tracked_pose_data = json.load(f)
else:
    tracked_pose_data = track_poses(
        pose_data, video_fps, video_width, video_height, show_progress=True
    )
    print("Writing pose data with tracking info to", tracked_pose_file)
    with open(tracked_pose_file, "w") as f:
        json.dump(tracked_pose_data, f)

pose_data = tracked_pose_data
pose_series["tracked_poses"] = count_tracked_poses(
    tracked_pose_data,
)


## Pose normalization and clustering

The following two cells need to be run to enable the pose search features of the posedata explorer app.

The normalization process in the first cell can take quite a while if it has never been run on a particular set of video/posedata files (~10 minutes for a full-length play). But it then caches the results in pickle (\*.p) files in the same folder as the video and posedata files, meaning the cell will take a very short amount of time on every subsequent invocation for that video.
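
The caching behavior described above follows a common load-or-compute pattern. The sketch below illustrates it with a hypothetical helper, `load_or_compute`, and a stand-in for the slow normalization loop; neither is part of the notebook's own modules.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute_fn):
    """Load a pickled result if the cache file exists; otherwise compute and cache it."""
    cache_path = Path(cache_path)
    if cache_path.is_file():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    result = compute_fn()
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_normalization():
    # Stand-in for the slow per-pose normalization loop
    calls.append(1)
    return [[0.0, 1.0], [2.0, 3.0]]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "example.normalized.p"
    first = load_or_compute(cache_file, expensive_normalization)
    second = load_or_compute(cache_file, expensive_normalization)  # served from the cache

print(len(calls))
```

Deleting the cached `.p` files forces the normalization to be recomputed, which is useful if the posedata file has changed.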

When the explorer's infrastructure switches over to using a local database to store the normalized pose coordinates and other data, these normalization and indexing steps should be entirely replaced by the database ingest process for a new video/posedata corpus.
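
To make the indexing step more concrete, here is a brute-force NumPy sketch of what an exact (flat) L2 index computes: squared Euclidean distances from a query vector to every stored vector, with missing keypoints (NaNs) replaced by -1 first. The function `l2_nearest` and the toy 4-dimensional vectors are illustrative assumptions; the notebook itself uses FAISS on 34-dimensional pose vectors.

```python
import numpy as np


def l2_nearest(query, vectors, k):
    """Return (squared distances, indices) of the k vectors closest to the query."""
    # Replace NaNs (missing keypoints) with -1 so distances are always defined
    vectors = np.nan_to_num(np.asarray(vectors, dtype="float32"), nan=-1)
    query = np.nan_to_num(np.asarray(query, dtype="float32"), nan=-1)
    sq_dists = np.sum((vectors - query) ** 2, axis=1)
    order = np.argsort(sq_dists, kind="stable")[:k]
    return sq_dists[order], order


# Toy "poses": flattened coordinate vectors, one with missing values
poses = np.array(
    [
        [0.0, 0.0, 10.0, 10.0],
        [1.0, 1.0, 10.0, 10.0],
        [50.0, 50.0, np.nan, np.nan],
    ]
)

dists, idx = l2_nearest(poses[0], poses, k=2)
print(idx.tolist(), dists.tolist())
```

Note that the query pose itself comes back as the closest match with distance 0, which is why the explorer skips it when displaying search results.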


In [None]:
import pickle


normalized_pose_file = pose_file.replace(".openpifpaf.json", ".normalized.p")
metadata_file = pose_file.replace(".openpifpaf.json", ".metadata.p")

if (os.path.isfile(normalized_pose_file)) and (os.path.isfile(metadata_file)):
    with open(normalized_pose_file, "rb") as f:
        normalized_pose_data = pickle.load(f)
    with open(metadata_file, "rb") as f:
        normalized_pose_metadata, framepose_to_seqno = pickle.load(f)

else:
    print("Computing normalized poses for comparison and clustering")
    print("This may take a while...")
    bar = IntProgress(min=0, max=len(pose_data))
    display(bar)

    # For cluster analysis, each pose must be a 1D array, and all poses must be in a 1D list
    # that includes only the pose keypoint coordinates (not the confidence scores).
    # So we also create a parallel data structure to keep track of the frame number and numbering
    # within the frame of each of the poses.
    normalized_pose_data = []
    normalized_pose_metadata = []

    framepose_to_seqno = {}
    pose_seqno = 0

    for i, frame in enumerate(pose_data):

        if i % 100 == 0:
            bar.value = i

        for j, pose in enumerate(frame["predictions"]):
            normalized_coords = extract_trustworthy_coords(
                shift_normalize_rescale_pose_coords(pose)
            )
            normalized_pose_data.append(normalized_coords)
            normalized_pose_metadata.append({"frameno": i, "poseno": j})

            if i in framepose_to_seqno:
                framepose_to_seqno[i][j] = pose_seqno
            else:
                framepose_to_seqno[i] = {j: pose_seqno}

            pose_seqno += 1

    bar.bar_style = "success"

    with open(normalized_pose_file, "wb") as f:
        pickle.dump(normalized_pose_data, f)
    with open(metadata_file, "wb") as f:
        pickle.dump([normalized_pose_metadata, framepose_to_seqno], f)


In [None]:
# Build FAISS indexes of the poses, for fast nearest neighbor similarity search

print("Indexing video posedata set for similarity search")

# FAISS can't handle NaNs in the input vectors so use -1s instead
faiss_pose_data = [
    tuple(np.nan_to_num(raw_pose, nan=-1).tolist()) for raw_pose in normalized_pose_data
]

# This builds an exact (flat) index based on Euclidean distance
faiss_L2_index = faiss.IndexFlatL2(34)
faiss_L2_input = np.array(faiss_pose_data).astype("float32")
faiss_L2_index.add(faiss_L2_input)

# The below builds an exact (flat) index based on inner-product distance,
# which is equivalent to cosine similarity when the inputs are normalized.
# So far, its results have not been noticeably preferable to the L2
# (Euclidean) distance-based index, but it may be useful in the future.
# faiss_IP_index = faiss.IndexFlatIP(34)
# faiss_IP_input = np.array(faiss_pose_data).astype('float32')
# faiss.normalize_L2(faiss_IP_input) # Must normalize the inputs!
# faiss_IP_index.add(faiss_IP_input)


#### Optional: cluster analysis of normalized poses

The cell below computes a K-means clustering of the poses based on the L2 (Euclidean) similarities of their normalized coordinate vectors, then calculates and visualizes the relative sizes of the clusters and the averaged armature positions of their poses.

The averaged poses are visualized on blended averages of their background (source) image regions if `AVERAGE_BACKGROUNDS` is set to `True` -- although generating these plots takes quite a bit longer than plotting averaged poses with no backgrounds, due to the overhead of averaging the background images' pixel values.
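
The bookkeeping that follows the clustering (counting cluster sizes, ranking clusters, and averaging the poses in each one while ignoring missing coordinates) can be sketched with NumPy alone. The labels and 2-dimensional "poses" below are made-up toy data, not output from the notebook.

```python
import warnings

import numpy as np

# Hypothetical inputs: one cluster label per pose, and flattened pose vectors
cluster_labels = np.array([2, 0, 2, 2, 1, 0])
poses = np.array(
    [
        [10.0, 20.0],
        [0.0, 0.0],
        [30.0, np.nan],
        [20.0, 40.0],
        [5.0, 5.0],
        [2.0, 4.0],
    ]
)

# Number of poses per cluster, then cluster ids ordered from largest to smallest
sizes = np.bincount(cluster_labels)
clusters_by_size = np.argsort(-sizes, kind="stable")

# Average pose per cluster, ignoring missing (NaN) coordinates
cluster_averages = {}
with warnings.catch_warnings():
    warnings.filterwarnings(action="ignore", message="Mean of empty slice")
    for c in clusters_by_size:
        cluster_averages[int(c)] = np.nanmean(poses[cluster_labels == c], axis=0)

print(clusters_by_size.tolist(), cluster_averages[2].tolist())
```

Using `np.nanmean` means a keypoint that is missing from some poses in a cluster is still averaged over the poses where it was detected.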


In [None]:
# OPTIONAL cluster analysis of the normalized poses:
# Compute a K-means clustering of the poses based on the L2 (Euclidean) similarities
# of their normalized coordinate vectors, then compute and visualize the sizes of
# the clusters and the averages of their poses.
NUMBER_OF_CLUSTERS = 100
IMAGE_SAMPLE = 100  # Only contribute 1 in 100 images to the background average
AVERAGE_BACKGROUNDS = True  # Set to True to average the pose backgrounds

print(f"Clustering video poses into {NUMBER_OF_CLUSTERS} clusters")
kmeans_faiss = faiss.Kmeans(d=faiss_L2_input.shape[1], k=NUMBER_OF_CLUSTERS, niter=100)
kmeans_faiss.train(faiss_L2_input)
_, cluster_labels = kmeans_faiss.index.search(faiss_L2_input, 1)
cluster_labels = np.array(cluster_labels).flatten()

bin_counts = {}
cluster_to_pose = {}

for i in range(len(cluster_labels)):
    ct = cluster_labels[i]
    if ct not in bin_counts:
        bin_counts[ct] = 1
    else:
        bin_counts[ct] += 1
    if ct not in cluster_to_pose:
        cluster_to_pose[ct] = [i]
    else:
        cluster_to_pose[ct].append(i)

sorted_bin_counts = dict(
    sorted(bin_counts.items(), key=lambda item: item[1], reverse=True)
)
sorted_bin_counts_list = list(sorted_bin_counts.values())

fig = plt.figure(figsize=(10, 5))
plt.bar(range(len(sorted_bin_counts_list)), sorted_bin_counts_list)
plt.xlabel("cluster")
plt.ylabel("# poses")
plt.show()

print("Drawing averages of cluster poses and backgrounds")

for k in list(sorted_bin_counts.keys())[:10]:
    cluster_poses = []
    images_array = []

    # Use some matplotlib weirdness to draw the stick figures in higher resolution
    # but with the same axis labels (0-100 "pixels")
    fig, ax = plt.subplots()
    fig.set_size_inches(UPSCALE * 100 / fig.dpi, UPSCALE * 100 / fig.dpi)
    fig.canvas.draw()

    print("Cluster:", k, "| total poses:", len(cluster_to_pose[k]))

    for i, pose_id in enumerate(cluster_to_pose[k]):

        cluster_poses.append(normalized_pose_data[pose_id])

        # Don't average the background of every pose in the cluster,
        # because that usually takes way too long
        if AVERAGE_BACKGROUNDS and (i % IMAGE_SAMPLE) == 0:
            # Get the original posedata for the pose in order to extract the background image
            pose_frameno = normalized_pose_metadata[pose_id]["frameno"]
            poseno = normalized_pose_metadata[pose_id]["poseno"]
            pose_pred = pose_data[pose_frameno]["predictions"][poseno]

            pose_base_image = extract_pose_background(
                pose_pred, video_file, pose_frameno
            )

            # Resize/normalize the cutout background dimensions, just as is done
            # for the pose itself
            resized_image = cv2.resize(
                pose_base_image,
                dsize=(POSE_MAX_DIM * UPSCALE, POSE_MAX_DIM * UPSCALE),
                interpolation=cv2.INTER_LANCZOS4,
            )
            images_array.append(resized_image)

    if AVERAGE_BACKGROUNDS:
        images_array = np.array(images_array, dtype=float)

        # Average the RGB values of all of the pose background images
        avg_background_img = np.mean(images_array, axis=0).astype(np.uint8)
        plt.imshow(avg_background_img)

    with warnings.catch_warnings():
        warnings.filterwarnings(action="ignore", message="Mean of empty slice")
        cluster_average = np.nanmean(np.array(cluster_poses), axis=0).tolist()

    armature_prevalences = get_armature_prevalences(cluster_poses)
    cluster_average = np.array_split(cluster_average, len(cluster_average) / 2)
    cluster_average_img = draw_normalized_and_unflattened_pose(
        cluster_average, armature_prevalences=armature_prevalences
    )

    plt.imshow(cluster_average_img)
    axis_labels = [0] + list(range(0, 100, 20))
    axis_label_locs = [lab * UPSCALE for lab in axis_labels]

    ax.xaxis.set_major_locator(mticker.FixedLocator(axis_label_locs))
    ax.set_xticklabels(axis_labels)
    ax.yaxis.set_major_locator(mticker.FixedLocator(axis_label_locs))
    ax.set_yticklabels(axis_labels)
    plt.show()

    # If we want to inspect some of the poses in the cluster
    # for i in range (10):
    #     this_pose = np.array_split(cluster_poses[i], len(cluster_poses[i]) / 2)
    #     pose_img =  draw_normalized_and_unflattened_pose(this_pose)

    #     plt.imshow(pose_img)
    #     plt.show()


### Build and launch the explorer app

This displays an interactive chart visualization of the attributes of the posedata in the .json output file across the runtime of the video.

Clicking anywhere in the chart, moving the slider, or clicking the prev/next buttons will select a frame and draw the poses detected in that frame, with the option of displaying the actual image from the source video as the "background." When a frame is selected, it is also possible to click a specific pose in the frame window to select that pose for comparison with a second pose (which is also selected by clicking on it). And the first selected pose can be used as the "query" to search for the most similar poses across the entire video, which can then be viewed and paged through.
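
Selecting a frame by clicking the chart requires converting the clicked x position, which Bokeh reports for datetime axes as milliseconds since the Unix epoch, into a 1-indexed frame number. The sketch below assumes the timeline timestamps start at 1900-01-01 00:00:00 and that frames are evenly spaced; the function names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
START = datetime(1900, 1, 1)  # assumed origin of the timeline timestamps


def frame_to_timestamp(frameno, fps):
    """Timestamp of a 1-indexed frame, measured from START."""
    return START + timedelta(seconds=(frameno - 1) / fps)


def timestamp_to_x_ms(ts):
    """Milliseconds since the Unix epoch (negative for dates before 1970)."""
    return (ts - EPOCH).total_seconds() * 1000


def click_x_to_frame(x_ms, fps):
    """Convert a clicked x position in epoch milliseconds to a 1-indexed frame number."""
    clicked = EPOCH + timedelta(milliseconds=x_ms)
    return round((clicked - START).total_seconds() * fps) + 1


x = timestamp_to_x_ms(frame_to_timestamp(751, 25.0))
print(click_x_to_frame(x, 25.0))
```

Doing the arithmetic with `timedelta` rather than `datetime.utcfromtimestamp` sidesteps platform differences in handling negative (pre-1970) timestamps.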

Please see the cell below the next if you are running this notebook in VS Code. Note also that the Jupyter server must be running on port 8888 (or 8889) for the explorer app to work in Jupyter/JupyterLab.


In [None]:
def pil_to_bokeh_image(pil_img, target_width, target_height):
    """The Bokeh interactive notebook tools will only display image data if it's formatted in a particular way"""
    img_array = np.array(pil_img.transpose(Image.Transpose.FLIP_TOP_BOTTOM))

    img = np.empty(img_array.shape[:2], dtype=np.uint32)
    view = img.view(dtype=np.uint8).reshape(img_array.shape)

    # Copy all four RGBA channels in one vectorized assignment instead of pixel by pixel
    view[:target_height, :target_width] = img_array[:target_height, :target_width]

    return img


def bkapp(doc):
    """Define and run the Bokeh interactive notebook (Python + Javascript) application"""

    # Some session data is best stored in a global dictionary
    data = {}

    max_y = max(pose_series["avg_coords_per_pose"] + pose_series["num_poses"])

    # This is the main interactive timeline chart
    tl = figure(
        width=FIGURE_WIDTH,
        height=FIGURE_HEIGHT,
        title=video_file,
        min_border=10,
        y_range=(0, max_y + 1),
        tools="save,box_zoom,pan,reset",
    )
    # Format the X axis as hour-minute-second timecodes
    tl.x_range = Range1d(min(pose_series["timestamp"]), max(pose_series["timestamp"]))
    tl.xaxis.axis_label = "Time"
    time_formatter = DatetimeTickFormatter(
        hourmin="%H:%M:%S",
        minutes="%H:%M:%S",
        minsec="%H:%M:%S",
        seconds="%Ss",
        milliseconds="%3Nms",
    )
    # The 3 main pose-related time series to be visualized on the timeline
    tl.line(
        pose_series["timestamp"],
        pose_series["num_poses"],
        legend_label="Poses per frame",
        color="blue",
        alpha=0.6,
        line_width=2,
    )
    tl.line(
        pose_series["timestamp"],
        pose_series["avg_coords_per_pose"],
        legend_label="Coords per pose",
        color="red",
        alpha=0.6,
        line_width=2,
    )
    # Only display this if figure tracking has been run
    if "tracked_poses" in pose_series:
        tl.line(
            pose_series["timestamp"],
            pose_series["tracked_poses"],
            legend_label="Tracked poses",
            color="purple",
            alpha=0.6,
            line_width=2,
        )
    tl.line(
        pose_series["timestamp"],
        [0] * len(pose_series["frame"]),
        color="orange",
        alpha=0,
        line_width=2,
        name="similar_poses",
    )
    # Only display this if scene/activity detection has been run
    if "activity" in pose_series:
        tl.line(
            pose_series["timestamp"],
            pose_series["activity"],
            y_range_name="avg_score",
            legend_label="Activity",
            color="black",
            alpha=.6,
            line_width=1,
            name="activity",
        )

    # The left Y axis corresponds to counts of poses and coordinates
    tl.yaxis.axis_label = "Poses or Coords"
    tl.extra_y_ranges = {"avg_score": Range1d(0, 1)}
    tl.line(
        pose_series["timestamp"],
        pose_series["avg_score"],
        y_range_name="avg_score",
        legend_label="Avg pose score",
        color="green",
        alpha=0.4,
        line_width=2,
    )
    # The right Y axis corresponds to the average pose score (from 0 to 1), and to the
    # per-frame similarity score (0 to 1) when a pose search query has been run.
    tl.add_layout(
        LinearAxis(
            y_range_name="avg_score", axis_label="Avg Pose Score or Cosine Similarity"
        ),
        "right",
    )
    tl.xaxis.formatter = time_formatter
    tl.xaxis.ticker.desired_num_ticks = 10
    tl.legend.click_policy = "hide"
    frame_line = Span(
        location=pose_series["timestamp"][0],
        dimension="height",
        line_color="red",
        line_width=3,
    )
    tl.add_layout(frame_line)

    def tl_tap(event):
        """When the chart is clicked, move the slider to the appropriate frame"""
        # Do not respond to clicks on the top 25% of the plot. This is to avoid inadvertently
        # selecting a new frame when the user just wants to click on the legend to hide or
        # show one of the time-series glyphs.
        if event.y > 0.75 * max_y:
            return
        # event.x is a timestamp, so it needs to be converted to a frameno
        start_dt = datetime(1900, 1, 1)
        dt = datetime.utcfromtimestamp(event.x / 1000)
        t_delta = dt - start_dt
        clicked_frame = (
            round(t_delta.total_seconds() * video_fps) + 1
        )  # Slider framenos are 1-indexed
        slider_callback(None, slider.value, clicked_frame)

    tl_tap_tool = TapTool()
    tl_crosshair_tool = CrosshairTool()

    def get_frame_info(fn):
        frame_dt = pose_series["timestamp"][fn]
        frame_tc = frame_dt.strftime("%H:%M:%S.%f")[:-4]
        return f"{frame_tc}: {pose_series['num_poses'][fn]} detected poses, {pose_series['tracked_poses'][fn]} tracked, {pose_series['avg_coords_per_pose'][fn]:.3f} avg coords/pose, {pose_series['avg_score'][fn]:.3f} avg pose score"

    info_div = Div(text=get_frame_info(0))

    tl.add_tools(tl_tap_tool, tl_crosshair_tool)
    tl.on_event("tap", tl_tap)

    # This is the second figure, where the poses in the selected frame are drawn
    fr = figure(
        x_range=(0, video_width),
        y_range=(0, video_height),
        width=FIGURE_WIDTH,
        height=int(FIGURE_WIDTH / video_width * video_height),
        title="Poses in selected frame",
        tools="save",
    )
    # Add an invisible glyph to suppress the "figure has no renderers" warning
    fr.circle(0, 0, size=0, alpha=0.0)

    pose_info_div = Div(text="Click two poses to compare")

    # This is the drawing of the first pose selected from a frame
    pose_p1 = figure(
        x_range=(0, POSE_MAX_DIM),
        y_range=(0, POSE_MAX_DIM),
        width=POSE_MAX_DIM * 2,
        height=POSE_MAX_DIM * 2,
        title="",
        tools="",
    )
    # Add an invisible glyph to suppress the "figure has no renderers" warning
    pose_p1.circle(0, 0, size=0, alpha=0.0)

    # This is the second pose selected from a frame
    pose_p2 = figure(
        x_range=(0, POSE_MAX_DIM),
        y_range=(0, POSE_MAX_DIM),
        width=POSE_MAX_DIM * 2,
        height=POSE_MAX_DIM * 2,
        title="",
        tools="",
    )
    # Add an invisible glyph to suppress the "figure has no renderers" warning
    pose_p2.circle(0, 0, size=0, alpha=0.0)

    def background_toggle_handler(event):
        """When the image underlay is toggled on or off, prompt the slider to redraw the frame"""
        slider_callback(None, slider.value, slider.value)

    background_switch = Toggle(label="show background", active=False)
    background_switch.on_click(background_toggle_handler)

    def slider_callback(attr, old, new):
        """
        When the slider moves, draw the poses in the new frame and show the background if desired.
        Also erase the selected pose drawings and the search results (not sure this is desirable).
        """
        slider.value = new
        fr.renderers = []
        if background_switch.active:
            rgba_bg = image_from_video_frame(video_file, new - 1)
            pil_bg = Image.fromarray(rgba_bg)
            frame_img = draw_frame(
                pose_data[new - 1], video_width, video_height, pil_bg
            )
        else:
            frame_img = draw_frame(pose_data[new - 1], video_width, video_height)
        img = pil_to_bokeh_image(frame_img, video_width, video_height)
        fr.image_rgba(image=[img], x=0, y=0, dw=img.shape[1], dh=img.shape[0])
        if old != new:
            info_div.text = get_frame_info(new - 1)
            frame_line.location = pose_series["timestamp"][new - 1]
            pose_p1.title.text = ""
            pose_p1.renderers = []
            pose_p2.title.text = ""
            pose_p2.renderers = []
            pose_info_div.text = "Click two poses to compare"
            for pose_box in similar_poses:
                pose_box.renderers = []

    slider = Slider(
        start=1, end=len(pose_data), value=1, step=1, title="Selected frame"
    )
    slider.on_change("value_throttled", slider_callback)

    def get_pose_extent_maps(frameno):
        """
        Determine the regions around each pose drawn on the frame that can be clicked
        to select them.
        """
        pose_extent_maps = []
        for i, pose_prediction in enumerate(pose_data[frameno]["predictions"]):

            if "bbox" in pose_prediction:
                bbox = pose_prediction["bbox"]
            else:
                extent = get_pose_extent(pose_prediction)
                bbox = [
                    extent[0],
                    extent[1],
                    extent[2] - extent[0],
                    extent[3] - extent[1],
                ]

            extent_map = {
                "poseno": i,
                "min_x": bbox[0],
                "min_y": video_height - bbox[3] - bbox[1],
                "max_x": bbox[0] + bbox[2],
                "max_y": video_height - bbox[1],
            }

            pose_extent_maps.append(extent_map)

        return pose_extent_maps

    def match_pose_pixel_maps(x, y, pose_extent_maps):
        """
        When an x,y coordinate on the frame is clicked, check the regions calculated in
        get_pose_extent_maps() to see if the user wants to select one (or more) of the
        poses in the frame.
        """
        matched_poses = []
        for extent_map in pose_extent_maps:
            if (
                x >= extent_map["min_x"]
                and x <= extent_map["max_x"]
                and y >= extent_map["min_y"]
                and y <= extent_map["max_y"]
            ):
                matched_poses.append(extent_map["poseno"])
        return matched_poses

    def fr_tap(event):
        """
        When the frame is clicked, determine if one of the poses in the frame has
        been selected, then draw it in one of the two boxes below (if available)
        and, if there are now two poses drawn in the boxes below, calculate and
        display their cosine similarity score.
        """
        pixel_key = f"{int(event.x)}, {int(event.y)}"
        pose_extent_maps = get_pose_extent_maps(slider.value - 1)
        clicked_poses = match_pose_pixel_maps(event.x, event.y, pose_extent_maps)
        if len(clicked_poses):
            pose_img = normalize_and_draw_pose(
                # Slider framenos are 1-indexed, pose_data is 0-indexed
                pose_data[slider.value - 1]["predictions"][clicked_poses[0]],
                video_file,
            )
            pose_img = pil_to_bokeh_image(pose_img, POSE_MAX_DIM, POSE_MAX_DIM)

            if pose_p1.title.text == "":
                pose_p1.image_rgba(
                    image=[pose_img],
                    x=0,
                    y=0,
                    dw=pose_img.shape[1],
                    dh=pose_img.shape[0],
                )
                pose_p1.title = f"{clicked_poses[0]+1}"
                pose_info_div.text = "Please click another pose for comparison"
            elif pose_p1.title.text != "" and pose_p2.title.text == "":
                pose_p2.image_rgba(
                    image=[pose_img],
                    x=0,
                    y=0,
                    dw=pose_img.shape[1],
                    dh=pose_img.shape[0],
                )
                pose_p2.title = f"{clicked_poses[0]+1}"

                normalized_p1 = shift_normalize_rescale_pose_coords(
                    pose_data[slider.value - 1]["predictions"][
                        int(pose_p1.title.text) - 1
                    ]
                )
                normalized_p2 = shift_normalize_rescale_pose_coords(
                    pose_data[slider.value - 1]["predictions"][
                        int(pose_p2.title.text) - 1
                    ]
                )

                cosine_similarity = compare_poses_cosine(
                    normalized_p1,
                    normalized_p2,
                )
                p1_angles = compute_joint_angles(normalized_p1)
                p2_angles = compute_joint_angles(normalized_p2)
                pose_p2.title = f"{clicked_poses[0]+1}"
                angle_similarity = compare_poses_angles(p1_angles, p2_angles)
                pose_info_div.text = f"Cosine similarity between pose keypoints: {(cosine_similarity*100):3.3f}% | Similarity between pose joint angles: {(angle_similarity*100):3.3f}%"

    fr_tap_tool = TapTool()

    fr.add_tools(fr_tap_tool)
    fr.on_event("tap", fr_tap)

    # Buttons to advance or back up the frame selector slider by one frame
    def prev_handler(event):
        slider_callback(None, slider.value, max(1, slider.value - 1))

    def next_handler(event):
        slider_callback(None, slider.value, min(slider.value + 1, len(pose_data)))

    prev_button = Button(label="prev")
    prev_button.on_click(prev_handler)
    next_button = Button(label="next")
    next_button.on_click(next_handler)

    search_info_div = Div(text="L2 (Euclidean distance) similar pose search")

    SIMILAR_POSES_TO_SHOW = 4
    SIMILAR_MATCHES_TO_FIND = 1000
    POSE_SIMILARITY_THRESHOLD = 0.8
    similar_poses = []

    for s in range(SIMILAR_POSES_TO_SHOW):
        similar_poses.append(
            figure(
                x_range=(0, POSE_MAX_DIM),
                y_range=(0, POSE_MAX_DIM),
                width=POSE_MAX_DIM * 2,
                height=POSE_MAX_DIM * 2,
                title="",
                tools="",
            )
        )
    for pose_box in similar_poses:
        pose_box.circle(0, 0, size=0, alpha=0.0)

    # Need to keep track of match data for paging through all of the results
    data["match_indices"] = None
    data["valid_search_results"] = 0
    data["search_results_index"] = 0
    similar_frame_scores = [0] * len(pose_series["frame"])
    match_cosine_similarities = {}
    target_frameno = None
    target_poseno = None

    def draw_similar_poses(start_rank):
        """
        Draw up to SIMILAR_POSES_TO_SHOW in boxes below the search/query info div and
        search results paging back/forward buttons.
        """
        # Clear any previously drawn poses
        for pose_box in similar_poses:
            pose_box.renderers = []

        matches_to_show = 0
        data["search_results_index"] = start_rank

        match_framenos = []
        match_scores = []
        match_timecodes = []
        matches_advanced = 0

        while matches_to_show < SIMILAR_POSES_TO_SHOW:

            current_match_rank = matches_advanced + start_rank
            matches_advanced += 1

            match_index = data["match_indices"][current_match_rank]
            match_frameno = normalized_pose_metadata[match_index]["frameno"]
            match_poseno = normalized_pose_metadata[match_index]["poseno"]

            # Skip the query pose if it's returned as a (100%) match. This is *usually*
            # the highest-ranked match, but may not always be so, depending on the indexing
            # process.
            if target_frameno == match_frameno and target_poseno == match_poseno:
                continue

            matches_to_show += 1

            match_framenos.append(
                str(match_frameno + 1)
            )  # All framenos in the UI are 1-indexed
            match_scores.append(f"{match_cosine_similarities[match_index]*100:3.3f}%")
            match_timecodes.append(
                pose_series["timestamp"][match_frameno].strftime("%H:%M:%S.%f")[:-4]
            )

            if background_switch.active:
                match_img = normalize_and_draw_pose(
                    pose_data[match_frameno]["predictions"][match_poseno],
                    video_file,
                    match_frameno
                )
            else:
                match_img = normalize_and_draw_pose(
                    pose_data[match_frameno]["predictions"][match_poseno],
                    video_file
                )

            match_img = pil_to_bokeh_image(match_img, POSE_MAX_DIM, POSE_MAX_DIM)

            similar_poses[matches_to_show - 1].image_rgba(
                image=[match_img],
                x=0,
                y=0,
                dw=match_img.shape[1],
                dh=match_img.shape[0],
            )

        search_info_div.text = f"matches in frames {', '.join(match_framenos)} | {', '.join(match_timecodes)} | scores {', '.join(match_scores)}"

    def find_similar_poses():
        """
        Run the query against the FAISS pose vector index to find the SIMILAR_MATCHES_TO_FIND
        most similar poses to the query pose (the first pose in the comparison boxes) whose
        calculated cosine similarities are above the POSE_SIMILARITY_THRESHOLD. Note that the
        cosine similarity is (for now) calculated on the fly and used for the thresholding,
        rather than the similarity scores returned by the FAISS index (these determine the
        order in which the search results are ranked, but are relatively more difficult to
        interpret and to present to the user in an intuitive manner).
        """
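        # Rebind the enclosing-scope query identifiers so that draw_similar_poses()
        # can recognize and skip the query pose in the search results
        nonlocal target_frameno, target_poseno

        if pose_p1.title.text == "":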
            return

        for pose_box in similar_poses:
            pose_box.renderers = []

        target_frameno = slider.value - 1
        target_poseno = int(pose_p1.title.text) - 1

        target_pose_w_confs = shift_normalize_rescale_pose_coords(
            pose_data[target_frameno]["predictions"][target_poseno]
        )
        target_pose = extract_trustworthy_coords(target_pose_w_confs)

        target_pose_query = np.array([np.nan_to_num(target_pose, nan=-1)]).astype(
            "float32"
        )

        D, I = faiss_L2_index.search(target_pose_query, SIMILAR_MATCHES_TO_FIND)

        data["match_indices"] = I[0]
        data["valid_search_results"] = 0
        data["search_results_index"] = 0
        similar_frame_scores = [0] * len(pose_series["frame"])

        for m in range(SIMILAR_MATCHES_TO_FIND):
            match_index = data["match_indices"][m]
            if match_index != -1:
                match_frameno = normalized_pose_metadata[match_index]["frameno"]
                match_poseno = normalized_pose_metadata[match_index]["poseno"]

                cosine_similarity = compare_poses_cosine(
                    target_pose_w_confs,
                    shift_normalize_rescale_pose_coords(
                        pose_data[match_frameno]["predictions"][match_poseno]
                    ),
                )

                match_cosine_similarities[match_index] = cosine_similarity

                if cosine_similarity >= POSE_SIMILARITY_THRESHOLD:
                    if similar_frame_scores[match_frameno] > 0:
                        similar_frame_scores[match_frameno] = max(
                            similar_frame_scores[match_frameno], cosine_similarity
                        )
                    else:
                        similar_frame_scores[match_frameno] = cosine_similarity

                    data["valid_search_results"] += 1

        # Remove the marker line left on the timeline by any previous search
        for sim_pose_renderer in tl.select(name="similar_poses"):
            try:
                tl.renderers.remove(sim_pose_renderer)
            except Exception:
                pass

        # Mark the frames on the video posedata timeline that have high match scores
        tl.line(
            pose_series["timestamp"],
            similar_frame_scores,
            y_range_name="avg_score",
            legend_label="Similar poses",
            color="orange",
            alpha=0.8,
            line_width=2,
            name="similar_poses",
        )

    def find_and_draw_similar_poses():
        find_similar_poses()
        draw_similar_poses(0)

    def get_similar_poses_handler(event):
        # Show the "please wait" message now and defer the search to the next tick,
        # so that the message is displayed before the search starts running
        search_info_div.text = (
            "<strong>Searching for similar poses, please wait...</strong>"
        )
        doc.add_next_tick_callback(find_and_draw_similar_poses)

    def reset_subposes_handler(event):
        """Clear the two query/comparison pose boxes, the match boxes, and the timeline markers."""
        pose_p1.title.text = ""
        pose_p1.renderers = []
        pose_p2.title.text = ""
        pose_p2.renderers = []
        pose_info_div.text = ""
        for pose_box in similar_poses:
            pose_box.renderers = []
        # Remove the similar-poses marker line from the timeline, if present
        for sim_pose_renderer in tl.select(name="similar_poses"):
            try:
                tl.renderers.remove(sim_pose_renderer)
            except Exception:
                pass
        for li in tl.legend.items:
            if li.label["value"] == "Similar poses":
                li.visible = False
        search_info_div.text = "L2 (Euclidean distance) similar pose search"

    reset_subposes_button = Button(label="clear")
    reset_subposes_button.on_click(reset_subposes_handler)

    get_similar_poses_button = Button(label="look up 1st pose")
    get_similar_poses_button.on_click(get_similar_poses_handler)

    frame_control_row = row(children=[prev_button, next_button, background_switch])

    pose_buttons_column = column(reset_subposes_button, get_similar_poses_button)

    subposes_row = row(children=[pose_p1, pose_p2, pose_buttons_column])

    # Buttons to page through the search matches in groups of SIMILAR_POSES_TO_SHOW
    def prev_similar_poses_handler(event):
        draw_similar_poses(max(0, data["search_results_index"] - SIMILAR_POSES_TO_SHOW))

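    def next_similar_poses_handler(event):
        # Clamp at 0 so that having fewer than SIMILAR_POSES_TO_SHOW valid
        # results cannot produce a negative start index
        draw_similar_poses(
            max(
                0,
                min(
                    data["search_results_index"] + SIMILAR_POSES_TO_SHOW,
                    data["valid_search_results"] - SIMILAR_POSES_TO_SHOW,
                ),
            )
        )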

    prev_similar_button = Button(label="previous group of poses")
    prev_similar_button.on_click(prev_similar_poses_handler)

    next_similar_button = Button(label="next group of poses")
    next_similar_button.on_click(next_similar_poses_handler)

    similar_poses_controls = row(children=[prev_similar_button, next_similar_button])

    similar_poses_row = row(children=similar_poses)

    layout_column = column(
        tl,
        slider,
        info_div,
        frame_control_row,
        fr,
        pose_info_div,
        subposes_row,
        search_info_div,
        similar_poses_controls,
        similar_poses_row,
    )

    doc.add_root(layout_column)

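The similar-pose search in `find_similar_poses` works in two stages: the FAISS L2 index proposes ranked candidates, and a cosine similarity computed on the fly decides which of them are kept. The following is a minimal, self-contained sketch of that pattern. It uses a brute-force L2 scan in plain Python rather than FAISS, and the names `l2_search`, `cosine_similarity`, and `filter_matches`, the toy vectors, and the 0.9 threshold are all illustrative assumptions. In particular, the notebook's `compare_poses_cosine` helper may treat keypoint confidences or missing keypoints differently.

```python
import math


def l2_search(query, vectors, k):
    """Return the indices of the k vectors nearest to query by squared L2 distance."""
    dists = [
        (sum((q - v) ** 2 for q, v in zip(query, vec)), i)
        for i, vec in enumerate(vectors)
    ]
    return [i for _, i in sorted(dists)[:k]]


def cosine_similarity(a, b):
    """Plain cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_matches(query, vectors, k, threshold):
    """Rank candidates by L2 distance, then keep only those whose cosine
    similarity to the query meets the threshold.

    Returns (index, similarity) pairs in L2 rank order.
    """
    kept = []
    for i in l2_search(query, vectors, k):
        sim = cosine_similarity(query, vectors[i])
        if sim >= threshold:
            kept.append((i, sim))
    return kept


# Toy 4-dimensional "pose vectors" (hypothetical data)
vectors = [
    [1.0, 0.0, 1.0, 0.0],  # identical to the query
    [0.9, 0.1, 1.0, 0.0],  # close to the query
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
]
query = [1.0, 0.0, 1.0, 0.0]
matches = filter_matches(query, vectors, k=3, threshold=0.9)
```

Here the first two vectors survive the threshold and the orthogonal one is dropped, even though all three were returned by the nearest-neighbor stage.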

## Running the notebook in VS Code or JupyterLab Desktop

As of early 2023, the cell below does not work out of the box if you are running this notebook in VS Code or JupyterLab Desktop rather than Jupyter or JupyterLab: BokehJS will load, but no figures will appear. Use one of the following workarounds:

### VS Code

Take note of the error message that appears when you try to run the cell below, particularly the long alphanumeric string suggested as a value for `BOKEH_ALLOW_WS_ORIGIN`. Copy this string, uncomment the lines indicated in the cell below, and paste the string in place of the `INSERT_BOKEH_ALLOW_WS_ORIGIN_VALUE_HERE` text. Then run the cell again to launch the explorer app.

### JupyterLab Desktop

Take note of the error message that appears when you try to run the cell below, particularly the number that follows each occurrence of `localhost:` in the message. Copy that number, replace the value following `bokeh_port =` with it, and uncomment the lines indicated in the cell below. Then run the cell again to launch the explorer app.

In [None]:
import os

# If you are following the steps above to run the explorer app in VS Code,
# uncomment the following 3 lines (delete the '# 's) before running this cell:

# os.environ[
#    "BOKEH_ALLOW_WS_ORIGIN"
# ] = "INSERT_BOKEH_ALLOW_WS_ORIGIN_VALUE_HERE"

# If you are following the steps above to run the explorer app in JupyterLab
# Desktop, change the "bokeh_port = ..." line below to the number displayed in
# the error message, and uncomment the 3 lines below it (delete the '# 's) before
# running this cell:

bokeh_port = 8888  # <- May need to be replaced to run in JupyterLab Desktop

# os.environ[
#    "BOKEH_ALLOW_WS_ORIGIN"
# ] = f"localhost:{bokeh_port}"

output_notebook()

show(bkapp, notebook_url=f"localhost:{bokeh_port}")


### Demo of frame-by-frame pose drawing

The cell below uses a different viz library to draw the poses in each successive frame on an HTML canvas, at the same frame rate as the source video.

Note that this drawing library (`ipycanvas`) doesn't play well with the Bokeh interactive application above, which is why the somewhat clunkier PIL ImageDraw library is used to draw the poses there instead.

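Whichever library does the drawing, the preparation step is the same: the flat keypoint list of each pose is grouped into (x, y, confidence) triplets, and a skeleton segment is drawn only if both of its endpoints have a nonzero confidence. A minimal sketch of that step in plain Python follows. The function names and the three-keypoint toy pose and skeleton are hypothetical; the notebook itself takes its skeleton from `OPP_COCO_SKELETON`, which, like the toy skeleton here, uses 1-based keypoint indices.

```python
def keypoint_triplets(flat_keypoints):
    """Group a flat [x1, y1, c1, x2, y2, c2, ...] list into (x, y, confidence) tuples."""
    return [
        tuple(flat_keypoints[i : i + 3]) for i in range(0, len(flat_keypoints), 3)
    ]


def drawable_segments(flat_keypoints, skeleton):
    """Return ((x1, y1), (x2, y2)) for each skeleton segment whose two
    endpoints both have a nonzero confidence. Skeleton indices are 1-based."""
    points = keypoint_triplets(flat_keypoints)
    segments = []
    for start, end in skeleton:
        x1, y1, c1 = points[start - 1]
        x2, y2, c2 = points[end - 1]
        if c1 == 0 or c2 == 0:
            continue  # skip segments with a zero-confidence endpoint
        segments.append(((x1, y1), (x2, y2)))
    return segments


# Hypothetical 3-keypoint pose; the third keypoint has zero confidence
toy_keypoints = [10.0, 20.0, 0.9, 30.0, 40.0, 0.8, 0.0, 0.0, 0.0]
toy_skeleton = [(1, 2), (2, 3)]
segments = drawable_segments(toy_keypoints, toy_skeleton)
```

Only the first segment is returned for the toy pose, because the second one ends at the zero-confidence keypoint.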

In [None]:
from ipycanvas import Canvas, hold_canvas

canvas = Canvas(width=video_width, height=video_height, sync_image_data=True)

display(canvas)


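def draw_frame_on_canvas(frame, canvas):
    """Draw every detected pose in the frame as colored skeleton segments."""
    for pose_prediction in frame["predictions"]:
        # Keypoints are stored as a flat list of (x, y, confidence) triplets
        pose_coords = np.array_split(
            pose_prediction["keypoints"], len(pose_prediction["keypoints"]) // 3
        )

        for i, seg in enumerate(OPP_COCO_SKELETON):
            # Skeleton indices are 1-based; skip segments with a zero-confidence endpoint
            if pose_coords[seg[0] - 1][2] == 0 or pose_coords[seg[1] - 1][2] == 0:
                continue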

            canvas.stroke_style = OPP_COCO_COLORS[i]
            canvas.line_width = 2

            canvas.stroke_line(
                pose_coords[seg[0] - 1][0],
                pose_coords[seg[0] - 1][1],
                pose_coords[seg[1] - 1][0],
                pose_coords[seg[1] - 1][1],
            )


# This will "animate" all of the detected poses starting from the beginning of the video
for frame in pose_data:

    with hold_canvas():

        canvas.clear()

        draw_frame_on_canvas(frame, canvas)

        sleep(1 / video_fps)

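The loop above sleeps for a fixed `1 / video_fps` per frame, so the time spent drawing each frame is added on top and playback runs somewhat slower than the source video. One possible refinement, sketched here as an assumption rather than as part of the notebook, is to sleep only until each frame's deadline. `play_frames` and its parameters are hypothetical names, and `draw` stands in for the canvas-drawing step.

```python
import time


def play_frames(frames, fps, draw, clock=time.monotonic, sleep=time.sleep):
    """Call draw(frame) for each frame, aiming to finish frame i at start + (i + 1) / fps.

    Sleeps only for the time remaining until the next deadline, so time spent
    in draw() does not accumulate as drift. If drawing falls behind, the loop
    skips sleeping until it has caught up.
    """
    start = clock()
    for i, frame in enumerate(frames):
        draw(frame)
        remaining = start + (i + 1) / fps - clock()
        if remaining > 0:
            sleep(remaining)


# Usage with a stand-in draw function that just records the frames it was given
drawn = []
play_frames(range(3), fps=100, draw=drawn.append)
```

The `clock` and `sleep` parameters exist only so that the pacing logic can be exercised with a fake clock; in the notebook, `draw` would wrap the `hold_canvas()` block.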