
Real-time video analytics

The goal of this repo is to build a real-time video analytics system capable of solving several distinct tasks.

Features:

  • Detect and count all people within a frame
  • Count unique people using tracking+reid
  • Identify number of unique customers in the shop
  • Track each customer
  • Identify how many people entered and exited shop during the video
  • Identify gender (male/female) and age (child/adult as classification) for each customer
  • Identify number of unique customers that made a purchase
  • Identify if cashier is right or left-handed

example output

Usage

Installation

  1. Install poetry and use the system env (if you are using conda or another environment manager):
poetry env use system
  2. Install torch and torchvision with conda (or another package manager):
conda install pytorch torchvision -c pytorch
  3. Install dependencies (poetry is recommended but requirements.txt is also available):
poetry install
  4. Install ffmpeg as appropriate for your system.

Tools and packages

  • Fiftyone is used as the main tool for managing data and labels.
  • Fire is the main cli entrypoint to run all scripts.
  • User inputs are validated with pydantic.

The FiftyoneDataset class is the main entrypoint to run all commands from cli. Check out its usage below:

NAME
    main.py FiftyoneDataset - Usage docs: https://docs.pydantic.dev/2.7/concepts/models/

SYNOPSIS
    main.py FiftyoneDataset GROUP | COMMAND | <flags>

FLAGS
    -n, --name=NAME (required)
        Type: str
    -o, --overwrite=OVERWRITE
        Type: bool
        Default: False

Run the commands

Create a dataset and track objects

  1. Create a fiftyone dataset to visualize data and labels:
python main.py FiftyoneDataset --name <name> create <path/to/video/folder>

# keep fiftyone running to observe changes
fiftyone app launch <name>
  2. Run detection and tracking:
python main.py FiftyoneDataset --name <name> track --model yolov8s --label-field yolov8s

This command will download the corresponding yolov8 model to the current directory and create a label field "yolov8s" containing detections with COCO classes and track ids.

  • Ultralytics yolov8 family of models is used for object detection.
  • BoxSort is used to connect frame-level detections into tracklets.
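Conceptually, the tracker's output is a set of per-frame detections, each carrying a persistent track index; a tracklet is just the detections grouped by that index. A minimal sketch of this grouping (the dict layout and key names here are illustrative, not the repo's actual schema):

```python
from collections import defaultdict

def group_tracklets(frames):
    """Group frame-level detections into tracklets keyed by track index.

    `frames` maps frame number -> list of detections, where each
    detection is a dict with at least an "index" (track id) key.
    Returns {track_id: [(frame_number, detection), ...]} in frame order.
    """
    tracklets = defaultdict(list)
    for frame_number in sorted(frames):
        for det in frames[frame_number]:
            tracklets[det["index"]].append((frame_number, det))
    return dict(tracklets)

frames = {
    1: [{"index": 1, "label": "person"}, {"index": 2, "label": "person"}],
    2: [{"index": 1, "label": "person"}],
}
tracklets = group_tracklets(frames)
# track 1 spans frames 1-2, track 2 appears only in frame 1
```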

Count unique people with reid

To combine tracking algorithms with a reid model, install boxmot and run it on top of existing detections.

pip install boxmot

python main.py FiftyoneDataset --name <name>
    track_reid
    --label-field yolov8s
    --tracking_method deepocsort
    --reid-model osnet_x0_25_market1501.pt
    --new_field deepocsort_market

A new label field will be created; it will contain an index parameter and will be available in the fiftyone app for inspection.

Other reid models and tracking methods are also available: check [the source repo](https://github.com/mikel-brostrom/boxmot?tab=readme-ov-file) for more details.
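With reid-backed track indices in place, counting unique people reduces to counting distinct index values across all frames of the label field. A hedged sketch over plain dicts (the actual FiftyOne field layout may differ):

```python
def count_unique_people(frames, field="deepocsort_market"):
    """Count distinct track indices across all frames.

    `frames` maps frame number -> {field: [detections]}, where each
    detection is a dict carrying the "index" assigned by the tracker.
    """
    ids = {
        det["index"]
        for dets in frames.values()
        for det in dets.get(field, [])
        if det.get("index") is not None
    }
    return len(ids)

frames = {
    1: {"deepocsort_market": [{"index": 3}, {"index": 5}]},
    2: {"deepocsort_market": [{"index": 3}]},
}
print(count_unique_people(frames))  # → 2
```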

Count customers

To identify a customer we need to annotate the cashier's zone in a yaml config file. Once a fiftyone dataset is created, find the video sample id in the web UI and add an entry to the config file following the existing example.

  1. Add these annotations to fiftyone:
python main.py FiftyoneDataset --name <name> annotate_zones configs/annotations.yaml --label_field zone

Verify the correctness of zone annotations in the fiftyone app.

  2. After adding zones, run the following command to label each person as a cashier or a customer:
❯ python main.py FiftyoneDataset --name <name> identify_customer --tracking_field yolov8s --zone_field zone --iou-threshold 0.5

Output:
6665610ec7102da65e0f95ef cashier count: 2
6665610ec7102da65e0f95ef customer count: 41

A new label field is created that maps a "person" class to either "cashier" or "customer" based on the intersection with the "cashier" zone.
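The decision rule can be sketched as: a person box that overlaps the cashier zone with IoU above the threshold is labeled "cashier", otherwise "customer". A simplified per-box version (boxes as `(x1, y1, x2, y2)` tuples; the repo's exact aggregation across frames may differ):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def visitor_type(person_box, cashier_zone, iou_threshold=0.5):
    """Label a person as cashier or customer by zone overlap."""
    return "cashier" if iou(person_box, cashier_zone) >= iou_threshold else "customer"

zone = (0, 0, 10, 10)
print(visitor_type((1, 1, 9, 9), zone))      # → cashier
print(visitor_type((20, 20, 30, 30), zone))  # → customer
```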

Number of customers exiting the shop

This step assumes that the exit zone was annotated in the previous step, and that customer identification has been run.

❯ python main.py FiftyoneDataset --name <name> identify_exit --tracking_field visitor_type --zone_field zone

Output:
6665610ec7102da65e0f95ef: Number of customers exiting is 4

An "exit" is counted when a tracklet's last frame intersects the annotated "exit" zone.
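Under that definition, the per-tracklet check is: take the detection at the tracklet's final frame and test whether its box intersects the exit zone. A sketch under assumed box and tracklet formats:

```python
def boxes_intersect(a, b):
    """True if two (x1, y1, x2, y2) boxes overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def count_exits(tracklets, exit_zone):
    """Count tracklets whose final box intersects the exit zone.

    `tracklets` maps track id -> list of (frame_number, box),
    assumed sorted by frame number.
    """
    exits = 0
    for boxes in tracklets.values():
        _, last_box = boxes[-1]
        if boxes_intersect(last_box, exit_zone):
            exits += 1
    return exits

exit_zone = (90, 0, 100, 50)
tracklets = {
    1: [(1, (10, 10, 20, 20)), (50, (92, 5, 99, 15))],   # ends inside the exit zone
    2: [(1, (10, 10, 20, 20)), (60, (30, 30, 40, 40))],  # ends elsewhere
}
print(count_exits(tracklets, exit_zone))  # → 1
```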

Age and Gender prediction

Age and gender classification is handled by the clip model with 4 classes:

  • adult woman
  • adult man
  • girl
  • boy

The script below will create a new dataset with customer patches, download the clip model using the fiftyone zoo and classify each patch into the defined categories. Then the predictions are grouped per customer and simple voting decides the final class probabilities.

❯ python main.py FiftyoneDataset --name <name> classify_customers --patch-field visitor_type --export-dir data/interim/patches

Output:
Customer 'customer-1' class probabilities: {'adult man': 0.64, 'boy': 0.18, 'adult woman': 0.18}
Customer 'customer-10' class probabilities: {'adult woman': 0.87, 'girl': 0.13}
Customer 'customer-11' class probabilities: {'adult man': 0.62, 'adult woman': 0.38}
Customer 'customer-14' class probabilities: {'adult man': 0.85, 'adult woman': 0.03, 'boy': 0.13}
Customer 'customer-15' class probabilities: {'adult man': 0.33, 'adult woman': 0.51, 'girl': 0.16}
Customer 'customer-17' class probabilities: {'adult man': 0.87, 'adult woman': 0.1,

It's also possible to review the clip predictions in the fiftyone app by opening the <name>-patches dataset. Filter customers by the customer_id field and review the label counts in the clip field.
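The "simple voting" aggregation can be read as tallying the per-patch clip predictions for each customer into normalized vote shares. A hedged sketch (the repo may instead weight by prediction confidence):

```python
from collections import Counter

def vote(patch_predictions):
    """Turn per-patch class predictions into class probabilities.

    `patch_predictions` is a list of predicted labels, one per patch
    of the same customer; probabilities are simple vote shares.
    """
    counts = Counter(patch_predictions)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

preds = ["adult man", "adult man", "boy", "adult man"]
print(vote(preds))  # → {'adult man': 0.75, 'boy': 0.25}
```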

Example age and gender prediction for a single customer: age and gender

Possible improvements:

  1. Select top-k patches per customer to optimize speed
  2. Run a face detection algorithm and apply a model trained on the adience dataset

Customer purchase detection (TODO)

There are a few steps to tackle this problem:

  1. Detect customers spending at least n seconds around the register.
  2. Estimate customer pose, ensure hands intersect with the cashier's desk.
  3. Add these identifications for human verification.
  4. Train a video action recognition model to identify a target event.

Identify if cashier is right or left-handed

To identify a cashier's handedness we apply a pose estimation model to cashier patches and compute the magnitude (i.e. xy-variance) of each hand's movement across frames.

❯ python main.py FiftyoneDataset --name <name> cashier_hand

Output:
Cashier 'cashier-12' is 'right'-handed: left_var=0.07, right_var=0.12
Cashier 'cashier-16' is 'left'-handed: left_var=0.03, right_var=0.03
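The variance comparison itself can be sketched as follows: collect the (x, y) wrist keypoints of each hand across frames and compare their summed coordinate variances (keypoint extraction is elided here; the tie-breaking rule is an assumption):

```python
def xy_variance(points):
    """Summed variance of x and y coordinates of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in points) / n

def handedness(left_wrist, right_wrist):
    """The hand that moves more (higher xy-variance) is the active one."""
    left_var = xy_variance(left_wrist)
    right_var = xy_variance(right_wrist)
    return ("right" if right_var > left_var else "left"), left_var, right_var

left = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51)]   # nearly still
right = [(0.40, 0.60), (0.55, 0.45), (0.70, 0.62)]  # moving a lot
hand, lv, rv = handedness(left, right)
print(hand)  # → right
```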

Keypoint detections can be reviewed in the fiftyone app as well. Filter on a specific cashier with the cashier_id field, then select wrist labels in the keypoints field. It might be useful to set a confidence threshold around 0.4-0.6.

Example hand detection for a cashier: keypoint detection result

Possible improvements:

  1. Select top patches for each cashier
  2. Track hand detection across frames and compute movement magnitude.
