# Helmet Keypoint annotation tool

Simple and lightweight tool to extract a frame from a video file and annotate the helmet center coordinates.
This notebook is part of a more comprehensive processing pipeline that is still being developed. I will be updating this notebook quite often.

If you find this helpful, consider upvoting it! Thanks =)

## Importing the dependencies

In [None]:
import pandas as pd
from PIL import Image, ImageDraw
from pathlib import Path
import os
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm

## Extracting a frame from a video

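This function uses ffmpeg to extract a single frame from a video. It saves the frame as `frame.png` in the working directory, loads it as a PIL image object and then deletes the file from disk. Frames in the dataset are numbered from 1, while ffmpeg's frame counter `n` starts at 0, hence the `frame - 1`.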

In [None]:
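def get_frame_from_video(frame, video):
    # Dataset frames start at 1, ffmpeg's frame counter `n` starts at 0
    frame = frame - 1
    # -y overwrites a stale frame.png left behind by an interrupted call
    !ffmpeg \
        -hide_banner \
        -loglevel fatal \
        -nostats \
        -y \
        -i $video -vf "select=eq(n\,$frame)" -vframes 1 frame.png
    img = Image.open('frame.png')
    img.load()  # Image.open is lazy: read the pixels before the file is deleted
    os.remove('frame.png')
    return img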

Here is an example:

In [None]:
video = '../input/nfl-health-and-safety-helmet-assignment/train/57583_000082_Endzone.mp4'
frame = 1
img = get_frame_from_video(frame, video)
img
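The `!ffmpeg` line relies on IPython shell magic, so `get_frame_from_video` only runs inside a notebook. Below is a minimal sketch of the same call from plain Python, assuming an `ffmpeg` binary on `PATH`; `build_ffmpeg_cmd` and `extract_frame` are hypothetical helper names, not part of the original notebook. Passing the arguments as a list avoids shell quoting, and the 1-based dataset frame is converted to ffmpeg's 0-based `n` in the same way.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(frame, video, out="frame.png"):
    """Build an ffmpeg command that writes the 1-based `frame` of `video` to `out`."""
    n = frame - 1  # ffmpeg's `n` counter starts at 0; dataset frames start at 1
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "fatal",
        "-nostats",
        "-y",  # overwrite `out` if a previous call left it behind
        "-i", str(video),
        "-vf", f"select=eq(n\\,{n})",  # `\,` keeps the comma inside the expression
        "-vframes", "1",
        str(out),
    ]


def extract_frame(frame, video, out="frame.png"):
    """Run ffmpeg (must be installed) and return the path of the extracted PNG."""
    subprocess.run(build_ffmpeg_cmd(frame, video, out), check=True)
    return Path(out)
```

To mirror `get_frame_from_video`, open the returned path with `Image.open`, call `.load()` on the image, and then delete the file.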

## Keypoint overlaying

To annotate the image with keypoints, we first need the x-y coordinates of each helmet's center.
For now, I use the baseline bounding boxes from `train_baseline_helmets.csv` and take the center of each box.

In [None]:
bboxes_df = pd.read_csv('../input/nfl-health-and-safety-helmet-assignment/train_baseline_helmets.csv')
video_frame = Path(video).stem + '_' + str(frame)
df = bboxes_df[bboxes_df['video_frame'] == video_frame].copy()
xc = (df['left'] + df['width']/2).astype(int).values
yc = (df['top'] + df['height']/2).astype(int).values
xc, yc
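The two steps in this cell are plain string formatting and arithmetic: the `video_frame` key is the video file name without its extension, followed by `_` and the frame number, and the helmet center is `xc = left + width/2`, `yc = top + height/2`, truncated to an integer pixel. A stdlib-only sketch with hypothetical helper names:

```python
from pathlib import Path


def video_frame_key(video, frame):
    """Build the `video_frame` key used in train_baseline_helmets.csv,
    e.g. '57583_000082_Endzone_1' for frame 1 of 57583_000082_Endzone.mp4."""
    return f"{Path(video).stem}_{frame}"


def helmet_center(left, top, width, height):
    """Center of a bounding box, truncated to int like `.astype(int)` above."""
    return int(left + width / 2), int(top + height / 2)
```

For a box with `left=100, top=50, width=21, height=30` this gives `(110, 65)`: the half pixel from `100 + 10.5` is dropped, as with `.astype(int)`.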

Now we can annotate the image by passing the helmet centers and the radius of the circle. Originally I drew a single point, but one pixel is too small to see.

In [None]:
def annotate_frame(img, xc, yc, r, col = (57, 255, 20)):
    draw = ImageDraw.Draw(img)
    for x, y in zip(xc, yc):
#         draw.point((x, y), fill=col)
        draw.ellipse((x-r, y-r, x+r, y+r), fill=col, outline = 'black')
    return img

In [None]:
annotate_frame(img, xc, yc, 5)

In [None]:
# code from: https://www.kaggle.com/robikscube/nfl-helmet-assignment-getting-started-guide
def add_track_features(tracks, fps=59.94, snap_frame=10):
    """
    Add column features helpful for syncing with video data.
    """
    tracks = tracks.copy()
    tracks["game_play"] = (
        tracks["gameKey"].astype("str")
        + "_"
        + tracks["playID"].astype("str").str.zfill(6)
    )
    tracks["time"] = pd.to_datetime(tracks["time"])
    snap_dict = (
        tracks.query('event == "ball_snap"')
        .groupby("game_play")["time"]
        .first()
        .to_dict()
    )
    tracks["snap"] = tracks["game_play"].map(snap_dict)
    tracks["isSnap"] = tracks["snap"] == tracks["time"]
    tracks["team"] = tracks["player"].str[0].replace("H", "Home").replace("V", "Away")
    # Seconds between each tracking row and the snap
    tracks["snap_offset"] = (tracks["time"] - tracks["snap"]).dt.total_seconds()
    # Estimated video frame
    tracks["est_frame"] = (
        ((tracks["snap_offset"] * fps) + snap_frame).round().astype("int")
    )
    return tracks

def add_video_features(videos):
    videos['game_play'] = videos['video_frame'].apply(lambda x: '_'.join(x.split('_')[:2]))
    videos['camera'] = videos['video_frame'].apply(lambda x: x.split('_')[2])
    videos['frame'] = videos['video_frame'].apply(lambda x: x.split('_')[-1])
    videos['xc'] = (videos['left'] + videos['width']/2).astype(int).values
    videos['yc'] = (videos['top'] + videos['height']/2).astype(int).values
    return videos
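`add_track_features` lines the tracking rows up with video frames by assuming that the snap happens at video frame `snap_frame=10` and that the video runs at a constant `fps=59.94`, i.e. `est_frame = round(snap_offset * fps) + snap_frame`. If tracking rows are 0.1 s apart (10 Hz), consecutive rows land about 6 video frames apart. A stdlib sketch of that arithmetic and of the `game_play` key, with hypothetical helper names:

```python
def game_play_key(game_key, play_id):
    """gameKey + '_' + playID zero-padded to 6 digits, as in add_track_features."""
    return f"{game_key}_{str(play_id).zfill(6)}"


def estimate_video_frame(snap_offset_s, fps=59.94, snap_frame=10):
    """Map seconds relative to the snap onto an estimated video frame.

    Same assumption as add_track_features: the snap is at video frame
    `snap_frame` and the frame rate is constant.
    """
    return round(snap_offset_s * fps) + snap_frame
```

For instance, a row 0.5 s after the snap maps to `round(29.97) + 10 = 40`, and a row 0.1 s before it maps to `round(-5.994) + 10 = 4`.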

## Visualizing the tracking data

It would be helpful to visualize the tracking data as well, so here is a function that plots the players as dots on a 2D drawing of the football field.

In [None]:
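def annotate_field(xc, yc, player, r = 10, width = 3, col = [(27, 3, 163), (255, 7, 58)], crop = None, box = True):
    """Draw players as dots on a 2D drawing of the field.

    xc, yc: tracking coordinates in yards (x along the 120-yard length, y across the 53.3-yard width)
    player: player labels; the first character ('H' = home, 'V' = visitor) selects the color in `col`
    crop: if a float, crop around the players with a margin of `crop` times the image size
    box: if True, crop around the players in both directions; otherwise crop only horizontally
    """
    field = Image.open('../input/nflhelmet-helper-dataset/field.png')
    w, h = field.size
    zero = (68,68)      # pixel offset of the playing area inside field.png
    fs = (2424,1100)    # size of the playing area in pixels
    draw = ImageDraw.Draw(field)
    # Yards -> pixels; y is flipped because image rows grow downward
    xc, yc = xc*fs[0]/120 + zero[0], (1 - yc/53.3)*fs[1] + zero[1]
    for x, y, p in zip(xc, yc, player):
        c = col[0] if p[0] == 'H' else col[1]
        draw.ellipse((x-r, y-r, x+r, y+r), fill=c, width=width, outline = 'black')
    if isinstance(crop, float):
        if box:
            cp = [xc.min() - crop*w, yc.min() - crop*h, xc.max() + crop*w, yc.max() + crop*h]
        else:
            cp = [xc.min() - crop*w, 0, xc.max() + crop*2*w, h]
        field = field.crop(cp)

    return field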

The function takes the `x`, `y` tracking coordinates (in yards) and the label of each player, whose first character (`H` or `V`) sets the dot color. Here is an example:

In [None]:
tracking_df = pd.read_csv('../input/nfl-health-and-safety-helmet-assignment/train_player_tracking.csv')
tracking_df = add_track_features(tracking_df)
x, y, player = tracking_df.query(f"game_play == '57583_000082' and est_frame == 10")[['x', 'y', 'player']].values.transpose()
annotate_field(x, y, player, r = 20)
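The mapping inside `annotate_field` converts yards to pixels of `field.png`: x is scaled by 2424 px / 120 yd and shifted by 68 px, and y is flipped (image rows grow downward) and scaled by 1100 px / 53.3 yd. A stdlib sketch of just that conversion follows; reading `(68, 68)` as the pixel position of the field corner where x = 0 and y = 53.3 is inferred from the formula, not documented by the helper dataset, and `yards_to_pixels` is a hypothetical name.

```python
FIELD_LENGTH_YD = 120.0  # field length in yards, end zones included
FIELD_WIDTH_YD = 53.3    # sideline to sideline, in yards
ZERO_PX = (68, 68)       # offset used in annotate_field (inferred: corner at x=0, y=53.3)
FIELD_PX = (2424, 1100)  # playing-area size in pixels used in annotate_field


def yards_to_pixels(x, y):
    """Map tracking coordinates (yards) to field.png pixel coordinates."""
    px = x * FIELD_PX[0] / FIELD_LENGTH_YD + ZERO_PX[0]
    # Flip y: larger y in yards means a smaller row index in the image
    py = (1 - y / FIELD_WIDTH_YD) * FIELD_PX[1] + ZERO_PX[1]
    return px, py
```

Midfield on the center line, `(60, 26.65)`, lands at roughly `(1280, 618)`.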

## Combining everything (tracking + cameras)

For this, I use a class whose constructor loads the dataframes and adds the extra feature columns, so this preprocessing runs only once and each call only extracts and draws the frames.

In [None]:
class show_play_with_tracking():
    """Show the Sideline and Endzone frames next to the tracking data of a play."""

    def __init__(self, video_df = None, track_df = None):
        # Fall back to the competition files when no dataframes are passed,
        # and add the extra feature columns in both cases
        if video_df is None:
            video_df = pd.read_csv('../input/nfl-health-and-safety-helmet-assignment/train_baseline_helmets.csv')
        self.video_df = add_video_features(video_df)
        if track_df is None:
            track_df = pd.read_csv('../input/nfl-health-and-safety-helmet-assignment/train_player_tracking.csv')
        track_df = add_track_features(track_df)
        self.tracking_df = track_df.query("est_frame > 0")

    def __call__(self, game_play, frame, img_size = 800, video_folder = '../input/nfl-health-and-safety-helmet-assignment/train/'):

        # Sideline view with the baseline helmet centers
        camera = 'Sideline'
        frame_side = get_frame_from_video(frame, video_folder + game_play + '_' + camera + '.mp4')
        df = self.video_df.query(f"game_play == '{game_play}' and frame == '{frame}' and camera == '{camera}'")
        frame_side = annotate_frame(frame_side, df.xc, df.yc, 10)

        # Endzone view with the baseline helmet centers
        camera = 'Endzone'
        frame_end = get_frame_from_video(frame, video_folder + game_play + '_' + camera + '.mp4')
        df = self.video_df.query(f"game_play == '{game_play}' and frame == '{frame}' and camera == '{camera}'")
        frame_end = annotate_frame(frame_end, df.xc, df.yc, 10)

        # Tracking data: if this video frame has no estimated tracking frame,
        # use the closest one within the same play
        tracks = self.tracking_df.query(f"game_play == '{game_play}'")
        frames = tracks['est_frame'].values
        if frame not in frames:
            index = np.absolute(frames-frame).argmin()
            frame = frames[index]
        df = tracks.query(f"est_frame == {frame}")
        field = annotate_field(df.x, df.y, df.player, 10, crop = 0.01)

        # Resize the field to the height of the two stacked camera frames
        wf, hf = field.size
        wc, hc = frame_side.size
        field = field.resize((int(wf*2*hc/hf), 2*hc))
        wf, hf = field.size

        # Layout: field on the left, Sideline (top) and Endzone (bottom) on the right
        img = Image.new('RGB', (wf+wc+20, 2*hc+20))
        img.paste(im=field, box=(5, 10))
        img.paste(im=frame_side, box=(wf+15, 5))
        img.paste(im=frame_end, box=(wf+15, hc+15))
        img.thumbnail((img_size,img_size))
        return img
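When the requested frame has no tracking row, `__call__` falls back to the closest estimated frame. That search should only consider the frames of the play being drawn; otherwise the nearest frame can come from a different play and the tracking query returns no players. A small stdlib sketch of the lookup; `nearest_tracking_frame` is a hypothetical name, and breaking ties toward the earlier frame is a choice made here, not something the notebook specifies.

```python
def nearest_tracking_frame(frame, available_frames):
    """Return the estimated tracking frame closest to `frame`.

    `available_frames` should hold only the est_frame values of one play.
    Ties are broken toward the earlier frame.
    """
    frames = set(available_frames)
    if not frames:
        raise ValueError("no tracking frames for this play")
    # Sort key: distance first, then the frame itself for a stable tie-break
    return min(frames, key=lambda f: (abs(f - frame), f))
```

With tracking frames `[10, 16, 22]`, video frame 1 maps to 10, frame 13 (equidistant) maps to 10, and frame 20 maps to 22.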

Initializing the class with the default dataframes (`train_baseline_helmets.csv` and `train_player_tracking.csv`):

In [None]:
spwt = show_play_with_tracking()

And here is an example for the gameplay `57682_002630` at frame 1:

In [None]:
spwt('57682_002630', 1)

With this class, we can easily do the same for all 60 plays (120 videos, since each play has a Sideline and an Endzone view)!

In [None]:
all_plays = pd.read_csv('../input/nfl-health-and-safety-helmet-assignment/train_baseline_helmets.csv')['video_frame'].\
                apply(lambda x: '_'.join(x.split('_')[:2])).unique()
len(all_plays)
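The play list is built by keeping the first two `_`-separated fields of each `video_frame` key (`gameKey_playID`) and removing duplicates. A stdlib sketch of the same logic with hypothetical helper names; `dict.fromkeys` keeps the first-seen order, as `Series.unique()` does:

```python
def play_from_video_frame(video_frame):
    """'57583_000082_Endzone_1' -> '57583_000082' (gameKey_playID)."""
    return "_".join(video_frame.split("_")[:2])


def unique_plays(video_frames):
    """Unique game_play keys in order of first appearance."""
    return list(dict.fromkeys(play_from_video_frame(v) for v in video_frames))
```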

Grab a coffee, because this will take a while ☕

In [None]:
imgs = []
for play in tqdm(all_plays):
    imgs.append(spwt(play, 1, 400)) # gameplay, frame, img_size
imgs = [imgs[0:20], imgs[20:40], imgs[40:60]]

In [None]:
W, H = 400, 250
img = Image.new('RGB', (W*3, H*20), (255, 255, 255))
for x in range(3):
    for y in range(20):
        img.paste(im=imgs[x][y], box=(W*x, H*y))
img
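The montage fills the grid column by column: thumbnails 0 to 19 go down the first column, 20 to 39 down the second, and so on. A sketch that computes the paste position and canvas size for any number of plays, assuming the same 400×250 cell and 20 rows; `grid_box` and `canvas_size` are hypothetical helpers, and each thumbnail is assumed to fit inside one cell.

```python
def grid_box(i, rows=20, cell=(400, 250)):
    """Top-left paste position of the i-th thumbnail in a column-major grid."""
    col, row = divmod(i, rows)
    return cell[0] * col, cell[1] * row


def canvas_size(n, rows=20, cell=(400, 250)):
    """Canvas (width, height) needed for n thumbnails with `rows` per column."""
    cols = -(-n // rows)  # ceiling division without importing math
    return cell[0] * cols, cell[1] * rows
```

For the 60 plays this reproduces the notebook layout: a 1200×5000 canvas, with thumbnail 59 pasted at `(800, 4750)`.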

## A few lessons learned

### Lesson 1 

If you are planning to map helmets to the tracking data, you can't get a perfect score just by predicting each frame independently. In this example, the red player in the end zone can't be seen in either camera.

In [None]:
spwt('57682_002630', 300)

The only way to predict the red dot in the end zone is to use temporal coherence: if you rewind 100 frames, you can see him.

In [None]:
spwt('57682_002630', 200)
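One simple form of temporal coherence is to hold a player's last detected position for a limited number of frames while he is hidden. The naive sketch below assumes a hypothetical per-player `track` dictionary of detected helmet centers keyed by frame; a real solution would more likely use a proper tracker, and this only illustrates the idea.

```python
def last_seen(track, frame, max_gap):
    """Most recent (frame, position) of a player at or before `frame`.

    `track` maps frame -> (x, y) for the frames where the player was detected.
    Returns None if the player was not seen within the last `max_gap` frames.
    """
    seen = [f for f in track if f <= frame and frame - f <= max_gap]
    if not seen:
        return None
    latest = max(seen)
    return latest, track[latest]
```

With detections at frames 200 and 250, a query at frame 300 returns the frame-250 position when `max_gap=100`, but `None` when `max_gap=40`.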

### Lesson 2

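Camera placement is not consistent. In some plays the camera is on the home end zone, in others on the visitor end zone. The examples below show the cameras on different sides of the field relative to the tracking data.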

In [None]:
# Camera is on the TOP and LEFT in relation to the tracking data
spwt(all_plays[30], 1)

In [None]:
# Camera is on the BOTTOM and LEFT in relation to the tracking data
spwt(all_plays[1], 1)

In [None]:
# Camera is on the BOTTOM and RIGHT in relation to the tracking data
spwt(all_plays[2], 1)
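Because the cameras can sit on either side or at either end relative to the tracking coordinates, it can help to reflect the tracking points before comparing them with a camera view. The sketch below assumes the 120 × 53.3 yard frame used by `annotate_field`. Which flips apply to a given play is not stored anywhere in this notebook and would have to be decided per play (for example by inspecting plots like the ones above), so `flip_x` and `flip_y` are hypothetical inputs.

```python
FIELD_LENGTH_YD = 120.0  # same field length as annotate_field
FIELD_WIDTH_YD = 53.3    # same field width as annotate_field


def mirror_tracking(x, y, flip_x=False, flip_y=False):
    """Reflect tracking coordinates across the field's center lines.

    flip_x swaps the two end zones; flip_y swaps the two sidelines.
    """
    if flip_x:
        x = FIELD_LENGTH_YD - x
    if flip_y:
        y = FIELD_WIDTH_YD - y
    return x, y
```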

## That is it for now! More coming soon! =)