# Introduction to SportsLabKit

This quick tutorial introduces the key concepts and basic features of `SportsLabKit` to help you get started with your projects.

## Multiple Object Tracking in `SportsLabKit`

Broadly defined, Multiple Object Tracking (MOT) is the problem of automatically identifying multiple objects in a video and representing them as a set of trajectories.

The typical approach to MOT algorithms follows the tracking-by-detection paradigm, which attempts to solve the problem by first detecting objects in each frame and then associating them with the objects in the previous frame.

A major challenge in the tracking-by-detection paradigm is scalability. The detection model is typically a deep learning model, which is computationally expensive. The association model is also computationally expensive, as it requires ReID features to be extracted for each bounding box. Recent approaches, such as [TrackFormer](https://arxiv.org/abs/2101.02702) and [TransTrack](https://arxiv.org/abs/2012.15460), attempt to address this challenge by using a single deep learning model to perform both detection and association. However, there is no clear consensus on the best approach to MOT, as tracking-by-detection models remain competitive ([ByteTrack](https://arxiv.org/abs/2110.06864), [BoT-SORT](https://arxiv.org/abs/2206.14651), [StrongSORT](https://arxiv.org/abs/2202.13514)).


> IMHO, approaches that adhere to “The Bitter Lesson” are the most promising. [Unicorn: Towards Grand Unification of Object Tracking](https://arxiv.org/abs/2207.07078) demonstrates that a single network can solve four tracking problems (SOT, MOT, VOS, MOTS) simultaneously. I think this is a direction many will follow.

SportsLabKit implements the tracking-by-detection paradigm. 

In brief, the algorithm works as follows:

1. The detection model (YOLOX, DETR, RCNN, etc.) detects objects of interest as bounding boxes in each frame.
2. Several feature extractors compute descriptors for each detection (e.g. ReID features, optical flow features).
3. The association model (minimum-cost bipartite matching) associates detections in the current frame with detections in the previous frame or with existing tracklets.
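The association step can be sketched in a few lines. The example below is only an illustration and is not the `SportsLabKit` implementation: it assumes boxes are given as `(bb_left, bb_top, bb_width, bb_height)`, uses `1 - IoU` as the cost, and finds the minimum-cost matching by brute force, which is only practical for a handful of boxes. The function names `iou` and `associate` and the `min_iou` threshold are hypothetical. Real trackers usually solve the same problem with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (left, top, width, height)."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (track_idx, detection_idx) pairs minimising the total (1 - IoU) cost.

    Brute-force search over all assignments (illustration only).
    Pairs whose IoU is below `min_iou` are discarded after matching.
    """
    n_t, n_d = len(tracks), len(detections)
    if n_t == 0 or n_d == 0:
        return []
    cost = [[1.0 - iou(t, d) for d in detections] for t in tracks]
    if n_t <= n_d:
        # Every track gets a distinct detection.
        candidates = (
            [(i, p[i]) for i in range(n_t)] for p in permutations(range(n_d), n_t)
        )
    else:
        # Every detection gets a distinct track.
        candidates = (
            [(p[j], j) for j in range(n_d)] for p in permutations(range(n_t), n_d)
        )
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    return [(i, j) for i, j in best if 1.0 - cost[i][j] >= min_iou]


# Two existing tracks and three new detections (toy values).
tracks = [(0, 0, 10, 10), (100, 100, 10, 10)]
detections = [(101, 100, 10, 10), (1, 0, 10, 10), (500, 500, 10, 10)]
matches = associate(tracks, detections)
print(matches)  # [(0, 1), (1, 0)]
```

The third detection stays unmatched; in a tracker it would typically start a new tracklet.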

For a more detailed explanation of the tracking-by-detection paradigm, please refer to the original [DeepSORT paper](https://arxiv.org/abs/1703.07402), [this blog](https://medium.com/augmented-startups/deepsort-deep-learning-applied-to-object-tracking-924f59f99104) explaining DeepSORT or our [SoccerTrack paper](https://openaccess.thecvf.com/content/CVPR2022W/CVSports/papers/Scott_SoccerTrack_A_Dataset_and_Tracking_Algorithm_for_Soccer_With_Fish-Eye_CVPRW_2022_paper.pdf).

![](assets/tracking-by-detection.png)

We chose to start with this approach for two reasons: 1) it is a simple and modular approach and 2) we can explicitly control the use of appearance and motion features. 

## DataFrames in `SportsLabKit`

`SportsLabKit` extends the popular data science library [pandas](https://pandas.pydata.org/) by adding an interface to handle tracking data. If you are not familiar with [pandas](https://pandas.pydata.org/), we recommend taking a quick look at its [Getting started](https://pandas.pydata.org/docs/getting_started/index.html#getting-started) documentation before proceeding.

There are two core data structures in `SportsLabKit`: the `BBoxDataFrame` and the `CoordinateDataFrame`. Both:

   1. are subclasses of `pandas.DataFrame` and inherit all of its functionality
   2. inherit from the `SLKMixin` and are designed to work with the `SportsLabKit` API

![](./assets/dataframe_inheritance.png)

The main difference is that the `CoordinateDataFrame` comes with built-in functionality for handling coordinates, while the `BBoxDataFrame` is designed to be compatible with bounding box data.

In a nutshell, both data structures are multi-indexed `pandas.DataFrame`s with a few extra methods and attributes/metadata. This means that we can use the usual dataframe methods directly:

* `df.head()` returns the first 5 rows of the dataframe
* `df.columns` returns the column names
* `df.iloc[0]` returns the first row of the dataframe

Both classes also add convenience methods such as:

* `df.iter_frames()` iterates over the data frame by frame
* `df.iter_teams()` iterates over the data frame by team
* `df.iter_players()` iterates over the data frame by player
* `df.iter_attributes()` iterates over the data frame by attribute

See more about how to use dataframes in the [DataFrame Manipulation](../02_user_guide/dataframe_manipulation.ipynb) tutorial.
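To get a feel for the multi-indexed layout without installing anything beyond pandas, the sketch below builds a small stand-in DataFrame by hand. It is an assumption-laden illustration: the column levels (`TeamID`, `PlayerID`, `Attributes`) and the values are copied from the sample output shown later in this tutorial, the IDs are stored as strings for simplicity, and only plain pandas calls are used rather than the `SportsLabKit` `iter_*` helpers.

```python
import pandas as pd

attributes = ["bb_left", "bb_top", "bb_width", "bb_height", "conf"]
players = [("0", "1"), ("0", "10"), ("1", "9")]  # (TeamID, PlayerID), toy subset

# Three column levels: team, player, and bounding box attribute.
columns = pd.MultiIndex.from_tuples(
    [(team, player, attr) for team, player in players for attr in attributes],
    names=["TeamID", "PlayerID", "Attributes"],
)
data = [
    [3543.0, 607.0, 30.0, 52.5, 1.0, 3536.42, 555.93, 13.57, 42.39, 1.0,
     2919.31, 538.44, 23.59, 47.18, 1.0],
    [3542.0, 609.0, 32.0, 51.0, 1.0, 3536.13, 555.96, 13.66, 42.27, 1.0,
     2919.44, 538.55, 23.59, 47.18, 1.0],
]
df = pd.DataFrame(data, columns=columns, index=pd.Index([0, 1], name="frame"))

# All columns belonging to team "0" (the TeamID level is dropped).
team0 = df.xs("0", level="TeamID", axis=1)

# One attribute for every player.
lefts = df.xs("bb_left", level="Attributes", axis=1)

# All attributes of a single player.
player10 = df.xs(("0", "10"), level=["TeamID", "PlayerID"], axis=1)

# Unique (TeamID, PlayerID) pairs, e.g. to loop over players manually.
player_keys = list(df.columns.droplevel("Attributes").unique())

print(team0.shape)
print(lefts)
print(player10)
```

Selecting with `xs` keeps the `frame` index intact, so the results can be joined back together or plotted per frame.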

## Reading and writing files

First, we need to read some data.

### Reading files

Assuming you have a file containing either bounding box or coordinate data, you can read it using `slk.load_df()`, which automatically detects the file type and creates a `BBoxDataFrame` or a `CoordinateDataFrame`. This tutorial uses a sample from the SoccerTrack dataset, which is included with the `SportsLabKit` installation. We therefore use `slk.datasets.get_path()` to retrieve the path to the sample.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sportslabkit as slk
from sportslabkit.logger import show_df # This just makes the df viewable in the notebook.


dataset_path = slk.datasets.get_path("wide_view")
path_to_csv = sorted(dataset_path.glob("annotations/*.csv"))[0]

bbdf = slk.load_df(path_to_csv)

bbdf.head()

TeamID,0,0,0,0,0,0,0,0,0,0,...,1,1,1,1,1,3,3,3,3,3
PlayerID,1,1,1,1,1,10,10,10,10,10,...,9,9,9,9,9,0,0,0,0,0
Attributes,bb_left,bb_top,bb_width,bb_height,conf,bb_left,bb_top,bb_width,bb_height,conf,...,bb_left,bb_top,bb_width,bb_height,conf,bb_left,bb_top,bb_width,bb_height,conf
frame,,,,,,,,,,,,,,,,,,,,,
0,3543.0,607.0,30.0,52.5,1.0,3536.42,555.93,13.57,42.39,1.0,...,2919.31,538.44,23.59,47.18,1.0,3542.77,549.47,6.4,7.0,1.0
1,3542.0,609.0,32.0,51.0,1.0,3536.13,555.96,13.66,42.27,1.0,...,2919.44,538.55,23.59,47.18,1.0,3548.55,549.43,6.4,7.0,1.0
2,3542.0,611.0,32.0,50.0,1.0,3535.85,555.99,13.73,42.16,1.0,...,2919.57,538.66,23.59,47.18,1.0,3554.32,549.4,6.4,7.0,1.0
3,3542.0,613.0,32.0,49.0,1.0,3535.57,556.02,13.8,42.04,1.0,...,2919.7,538.77,23.59,47.18,1.0,3560.1,549.36,6.4,7.0,1.0
4,3539.0,615.0,36.0,46.0,1.0,3535.28,556.04,13.88,41.94,1.0,...,2919.84,538.88,23.59,47.18,1.0,3565.87,549.33,6.4,7.0,1.0


To use the full SoccerTrack dataset, see ["Dataset Preparation"](../02_user_guide/dataset_preparation.ipynb).

### Writing files

To write the data back to a file, use `BBoxDataFrame.to_csv()`.

In [3]:
bbdf.to_csv("assets/soccertrack_sample.csv") 

## Visualization

Now that we have a bounding box dataframe, we can visualize the results.

In [4]:
path_to_mp4 = sorted(dataset_path.glob("videos/*.mp4"))[0]

It is also possible to download the full SoccerTrack dataset using `soccertrack.datasets.Downloader`. See ["Dataset Preparation"](../02_user_guide/dataset_preparation.ipynb) for more details.

The `BBoxDataFrame` has a built-in `visualize_frame()` method that can be used to visualize the bounding boxes in a single frame.

In [5]:
from sportslabkit.utils import cv2pil

frame_idx = 50
cam = slk.Camera(path_to_mp4)
frame = cam.get_frame(frame_idx)
resized_frame = cv2pil(bbdf.visualize_frame(frame_idx=frame_idx, frame=frame), False).resize((frame.shape[1]//8, frame.shape[0]//8))
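For intuition about what drawing a bounding box on a frame involves, here is a minimal NumPy-only sketch. It is not how `visualize_frame()` is implemented; the helper `draw_bbox` and its parameters are hypothetical, and the frame is assumed to be an `H x W x 3` `uint8` array with boxes given as `bb_left`, `bb_top`, `bb_width`, `bb_height` in pixels.

```python
import numpy as np


def draw_bbox(frame, bb_left, bb_top, bb_width, bb_height,
              color=(255, 0, 0), thickness=2):
    """Return a copy of `frame` with a rectangle outline drawn on it."""
    out = frame.copy()
    height, width = out.shape[:2]

    # Convert (left, top, width, height) to clipped corner coordinates.
    x1 = int(np.clip(round(bb_left), 0, width))
    y1 = int(np.clip(round(bb_top), 0, height))
    x2 = int(np.clip(round(bb_left + bb_width), 0, width))
    y2 = int(np.clip(round(bb_top + bb_height), 0, height))

    out[y1:y1 + thickness, x1:x2] = color            # top edge
    out[max(y2 - thickness, y1):y2, x1:x2] = color   # bottom edge
    out[y1:y2, x1:x1 + thickness] = color            # left edge
    out[y1:y2, max(x2 - thickness, x1):x2] = color   # right edge
    return out


# Draw one box on a blank 100 x 200 frame.
blank = np.zeros((100, 200, 3), dtype=np.uint8)
drawn = draw_bbox(blank, bb_left=50, bb_top=20, bb_width=30, bb_height=40)
print(drawn[20, 50], drawn[40, 65])
```

Only the outline is coloured, so pixels inside the box keep their original values.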

The `BBoxDataFrame` also has a built-in `visualize_frames()` method. It draws the bounding boxes on each frame of a video and writes the result to a video file.

The following cell writes `assets/visualize_frames.mp4`.

In [6]:
save_path = 'assets/visualize_frames.mp4'
bbdf_short = bbdf.iloc[:100].copy()
bbdf_short.visualize_frames(path_to_mp4, save_path)

Writing video: 100it [00:08, 11.88it/s]


## Tracking

Below we provide a snippet of code that detects and tracks the players and ball in the downloaded video. First, we define the components of our tracking pipeline.

- `Camera`: A class that handles camera calibration and coordinate transformations.
- `detection_model`: A detection model, such as YOLOX, DETR, etc.
- `motion_model`: A motion model that predicts the next position of the players and ball, such as Kalman Filter, Constant Velocity, etc.
- `SORTTracker`: A tracker based on the [SORT](https://arxiv.org/abs/1602.00763) algorithm.
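To make the role of the motion model concrete, the sketch below implements a textbook constant-velocity Kalman filter for a single coordinate in NumPy. It is an illustration under stated assumptions, not the `SportsLabKit` `KalmanFilter`: the class name `ConstantVelocityKalman1D`, the initial covariance, and the noise values are all hypothetical choices made for this example.

```python
import numpy as np


class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, dt, process_noise, measurement_noise):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
        self.H = np.array([[1.0, 0.0]])             # we observe position only
        self.Q = process_noise * np.array(
            [[dt**4 / 4, dt**3 / 2], [dt**3 / 2, dt**2]]
        )                                           # process noise covariance
        self.R = np.array([[measurement_noise]])    # measurement noise covariance
        self.x = np.zeros(2)                        # initial state (assumed)
        self.P = np.eye(2) * 1000.0                 # large initial uncertainty (assumed)

    def predict(self):
        """Propagate the state one time step and return the predicted position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured position `z`."""
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P


# An object moving 1 pixel per frame at 30 fps, i.e. 30 pixels per second.
kf = ConstantVelocityKalman1D(dt=1 / 30, process_noise=1.0, measurement_noise=1.0)
for k in range(1, 201):
    kf.predict()
    kf.update(float(k))

position, velocity = kf.x
next_position = kf.predict()
print(position, velocity, next_position)
```

In a tracker, the predicted position is what gets compared against new detections during association, and `update` is called once a detection has been matched to the track.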


> Deciding how to architect the tracking part of the library was tricky, and the current design is probably suboptimal. We are still working on improving it, so there may be breaking changes in the future.

In [7]:
from sportslabkit.utils import get_git_root
from sportslabkit.mot import SORTTracker

root = get_git_root()
cam = slk.Camera(path_to_mp4)

det_model = slk.detection_model.load('YOLOv8x', imgsz=640)
motion_model = slk.motion_model.load('KalmanFilter', dt=1/30, process_noise=10000, measurement_noise=10)

tracker = SORTTracker(detection_model=det_model, motion_model=motion_model)
tracker.track(cam[:100])
res = tracker.to_bbdf()

Tracking Progress:   0%|          | 0/100 [00:00<?, ?it/s][W NNPACK.cpp:64] Could not initialize NNPACK! Reason: Unsupported hardware.
Tracking Progress: 100%|██████████| 100/100 [03:47<00:00,  2.27s/it, Active: 1, Dead: 2]


In [8]:
save_path = "assets/tracking_results.mp4"
res.visualize_frames(cam.video_path, save_path)

Writing video: 97it [00:21,  4.47it/s]


## What next?

> This tutorial is a work in progress. If you have any questions or suggestions, please feel free to [reach out](https://twitter.com/AtomJamesScott).