## This notebook

This notebook is based on [NFL Baseline - Simple Helmet Mapping](https://www.kaggle.com/its7171/nfl-baseline-simple-helmet-mapping) by [tito](https://www.kaggle.com/its7171).  

The major change is how tracking records are selected to match the number of detected helmets.  
In tito's notebook, `itertools.combinations` was used to randomly choose which players to leave out. With that method, some combinations are never checked, and there is a trade-off between execution speed and accuracy.
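
To give a feel for why an exhaustive search over combinations is costly, here is a small sketch that counts the candidate subsets. It assumes 22 tracked players per play; `N_PLAYERS` and `n_subsets` are names made up for this illustration and are not part of either notebook.

```python
from math import comb

# Assumption for illustration: 22 tracked players per play.
N_PLAYERS = 22


def n_subsets(n_helmets: int, n_players: int = N_PLAYERS) -> int:
    """Number of ways to choose which n_helmets players to keep."""
    return comb(n_players, n_helmets)


# The count grows quickly as fewer helmets are visible in the frame.
for n_helmets in (20, 15, 11):
    print(n_helmets, n_subsets(n_helmets))
```

Because checking every subset quickly becomes impractical, a combination-based search has to sample, which is where the speed versus accuracy trade-off comes from.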

In this implementation, the distances between players are measured and, for each player, the players closest to it are selected as a candidate set. This is probably why execution is faster (almost 28 times faster!).

I chose this approach because I expected the players that fit in a frame to be close to one another, while players who are far apart would not be captured in the first place. The implementation uses `sklearn.neighbors.KDTree`; this is the first time I have used it, so there may be better ways to handle it.
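
The neighbor-based selection described above can be sketched in isolation as follows. This is a minimal illustration, not the notebook's `mapping` function: `candidate_subsets` and the toy coordinates are hypothetical, and the sketch only shows how one candidate set per player is built with `KDTree.query`.

```python
import numpy as np
from sklearn.neighbors import KDTree


def candidate_subsets(t_arr: np.ndarray, k: int) -> list:
    """For each player, return the sorted indices of its k nearest players.

    The query point is part of the tree, so each player is included
    in its own candidate set.
    """
    tree = KDTree(t_arr)
    _, idx = tree.query(t_arr, k=k)
    return [np.sort(row) for row in idx]


# Toy positions (hypothetical): a tight group of four and two distant players.
t_arr = np.array(
    [[0, 0], [1, 0], [0, 1], [1, 1], [30, 30], [40, -20]], dtype=float
)
subsets = candidate_subsets(t_arr, k=4)
print(len(subsets))   # one candidate set per player
print(subsets[0])     # neighbors of the first player
```

With this scheme the number of candidate sets is at most the number of players, rather than the number of possible combinations.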

There are also some minor changes: `scipy.optimize.minimize_scalar` is used to search for DIG (the rotation angle in degrees), and the `normalize_arr` function is different.  
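
As a rough sketch of the angle search, the example below uses `minimize_scalar` with `method='bounded'` to look for a rotation angle within a fixed range. The synthetic points, the `rotate` and `cost` helpers, and the 12 degree target are assumptions made for this illustration only. Because the cost involves sorting, it is not smooth, so the bounded search is not guaranteed to recover the exact angle.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def rotate(points: np.ndarray, deg: float) -> np.ndarray:
    """Rotate 2D points counter-clockwise by deg degrees."""
    t = np.deg2rad(deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return points @ R.T


# Synthetic data (hypothetical): the target is the same cloud rotated by 12 degrees.
rng = np.random.default_rng(0)
points = rng.normal(size=(8, 2))
target_x = np.sort(rotate(points, 12.0)[:, 0])


def cost(deg: float) -> float:
    """Mismatch between sorted x coordinates after rotation and the target."""
    return float(np.linalg.norm(np.sort(rotate(points, deg)[:, 0]) - target_x))


# Search only within +/-30 degrees, similar in spirit to DIG_MAX in the config.
result = minimize_scalar(cost, bounds=(-30, 30), method='bounded')
print(result.x, result.fun)
```

Restricting the bounds keeps the search cheap, at the price of missing any rotation outside the allowed range.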

I haven't tested which approach is better, so if anyone has, I'd appreciate it if you'd share!
If you have any questions or suggestions for improvement, please comment.

## Reference

* [NFL Baseline - Simple Helmet Mapping](https://www.kaggle.com/its7171/nfl-baseline-simple-helmet-mapping) by [tito](https://www.kaggle.com/its7171).  
* [Helper Code + Helmet Mapping + Deepsort](https://www.kaggle.com/robikscube/helper-code-helmet-mapping-deepsort) by [Rob Mulla](https://www.kaggle.com/robikscube)

## IMPORT MODULE & CONFIG

In [1]:
import os
import sys
import random
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import cv2
import itertools
from pathlib import Path
from glob import glob
from tqdm.notebook import tqdm
from multiprocessing import Pool

from scipy.optimize import minimize, minimize_scalar
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KDTree

In [3]:
from external_lib.NFLlib.score import NFLAssignmentScorer, check_submission
from external_lib.NFLlib.features import add_track_features


In [4]:
# Config
SEED = 42
CONF_THRE = 0.30
DIG_MAX = 30

## MAIN

In [6]:
def add_cols(df):
    df['game_play'] = df['video_frame'].str.split('_').str[:2].str.join('_')
    if 'video' not in df.columns:
        df['video'] = df['video_frame'].str.split('_').str[:3].str.join('_') + '.mp4'
    return df


# path 
BASE_DIR = Path('/work/data/input/nfl-health-and-safety-helmet-assignment')
TRAIN_DIR = BASE_DIR / 'train'
TEST_DIR = BASE_DIR / 'test'
IMAGES_DIR = BASE_DIR / 'images'
    
# Reading data
DEBUG = (len(glob(str(TEST_DIR / '*'))) == 6)

if DEBUG:
    tracking = pd.read_csv(BASE_DIR / 'train_player_tracking.csv')
    helmets = pd.read_csv(BASE_DIR / 'train_baseline_helmets.csv')
else:
    tracking = pd.read_csv(BASE_DIR / 'test_player_tracking.csv') 
    helmets = pd.read_csv(BASE_DIR / 'test_baseline_helmets.csv')
    
labels = pd.read_csv(BASE_DIR / 'train_labels.csv')
sub = pd.read_csv(BASE_DIR / 'sample_submission.csv')

# processing data
tracking = add_track_features(tracking)
helmets = add_cols(helmets)
labels = add_cols(labels)

# sampling Data
if DEBUG:
    sample_videos = labels['video'].drop_duplicates().sample(1, random_state=SEED).tolist()
    sample_gameplays = ['_'.join(x.split('_')[:2]) for x in sample_videos]
    tracking = tracking[tracking['game_play'].isin(sample_gameplays)]
    helmets = helmets[helmets['video'].isin(sample_videos)]
    labels = labels[labels['video'].isin(sample_videos)]

print(tracking.shape, helmets.shape, labels.shape)

(6424, 18) (8746, 8) (7888, 15)


In [None]:
def find_nearest(arr: np.ndarray, value: int):
    # return the element of arr closest to value (compared as integers)
    arr, value = arr.astype(int), int(value)
    idx = np.abs(arr - value).argmin()
    return arr[idx]

def normalize_arr(arr: np.ndarray):
    # center on the column means, then scale the whole array to unit norm
    _mean = np.mean(arr, axis=0)
    out_arr = arr - _mean
    _norm = np.linalg.norm(out_arr)
    out_arr = out_arr / _norm
    return out_arr

def rotate_arr(u, t):
    t = np.deg2rad(t)
    R = np.array([
        [np.cos(t), -np.sin(t)],
        [np.sin(t),  np.cos(t)]
    ])
    return  np.dot(R, u.T).T

def mapping(h_arr: np.ndarray, t_arr: np.ndarray):
    out_norm = float('INF')
    out_idx = None
    out_x = None
    out_dig = None
    
    tree = KDTree(t_arr)
    for i in range(len(t_arr)):
        dist, idx = tree.query([t_arr[i]], k=len(h_arr))
        idx = np.sort(idx[0])
        norm_t_arr = normalize_arr(t_arr[idx])
                
        # cost to minimize: mismatch between sorted rotated tracking x and helmet x
        def opt_rot(dig):
            rot_t_arr = rotate_arr(norm_t_arr, dig)
            return np.linalg.norm(np.sort(rot_t_arr[:, 0])-h_arr[:, 0])
                
        for bounds in [(-DIG_MAX , DIG_MAX), (180-DIG_MAX, 180+DIG_MAX)]:
            result = minimize_scalar(opt_rot, bounds=bounds, method='bounded') 
            if out_norm > result.fun:
                out_norm = result.fun
                out_idx = idx
                out_x = rotate_arr(norm_t_arr, result.x)[:, 0]
                out_dig = result.x
                
    return out_idx, out_x

def main(args: tuple):
    # args is one (video_frame, helmets DataFrame) pair from helmets.groupby('video_frame')
    video_frame, subhelmets = args
    gameKey, playID, view, frame = video_frame.split('_')
    gameKey, playID, frame = int(gameKey), int(playID), int(frame)

    # get nearest-frame
    _index = (tracking['gameKey']==gameKey) & (tracking['playID']==playID)
    subtracking = tracking[_index].copy()
    est_frame = find_nearest(subtracking["est_frame"].values, frame)
    subtracking = subtracking[subtracking['est_frame']==est_frame]
    subtracking = subtracking.reset_index(drop=True)

    if view == 'Endzone':
        subtracking[['x', 'y']] = subtracking[['y', 'x']].values

    # normalizing 
    subhelmets = subhelmets[subhelmets['conf']>CONF_THRE].copy()

    if len(subhelmets) > len(subtracking):
        subhelmets = subhelmets.tail(len(subtracking))

    subhelmets['x'] = subhelmets['left'] + subhelmets['width'] // 2
    subhelmets['y'] = subhelmets['top'] + subhelmets['height'] // 2
    subhelmets[['norm_x', 'norm_y']] = normalize_arr(subhelmets[['x', 'y']].values)
    subhelmets = subhelmets.sort_values('norm_x').reset_index(drop=True)

    # mapping tracking2helmets
    h_arr = subhelmets[['norm_x', 'norm_y']].values
    t_arr = subtracking[['x', 'y']].values
    out_idx, out_x = mapping(h_arr, t_arr)

    # helmets labeling 
    players = subtracking['player'].tolist()
    players = [p for i, p in enumerate(players) if i in out_idx]
    _pred = pd.DataFrame({'label': players, 'x': out_x}).sort_values('x')['label']
    subhelmets['label'] = _pred.values
    
    return subhelmets[['video_frame', 'left', 'width', 'top', 'height', 'label']]

In [None]:
# multi processing
df_list = helmets.groupby('video_frame')
submission_df = []

# the context manager cleans up the worker processes once all results are collected
with Pool(processes=4) as p, tqdm(total=len(df_list)) as pbar:
    for subdf in p.imap(main, df_list):
        submission_df.append(subdf)
        pbar.update(1)

# submission
submission_df = pd.concat(submission_df).reset_index(drop=True)
submission_df.to_csv('submission.csv', index=False)

if DEBUG:
    scorer = NFLAssignmentScorer(labels[labels['video_frame'].isin(submission_df['video_frame'].unique())])
    score = scorer.score(submission_df)
    print(f'score: {round(score, 5)}')