
Real-time video analytics

The goal of this repo is to build a real-time video analytics system capable of solving several distinct tasks.

Features:

  • Detect and count all people within a frame
  • Count unique people using tracking+reid
  • Identify number of unique customers in the shop
  • Track each customer
  • Identify how many people entered and exited shop during the video
  • Identify gender (male/female) and age (child/adult as classification) for each customer
  • Identify number of unique customers that made a purchase
  • Identify if cashier is right or left-handed

example output

Usage

Installation

  1. Install poetry and use the system env (if you are using conda or another environment manager):
poetry env use system
  2. Install torch and torchvision with conda (or another package manager):
conda install pytorch torchvision -c pytorch
  3. Install dependencies (poetry is recommended but requirements.txt is also available):
poetry install
  4. Install ffmpeg as appropriate for your system.

Tools and packages

  • Fiftyone is used as the main tool for managing data and labels.
  • Fire is the main cli entrypoint to run all scripts.
  • User inputs are validated with pydantic.

The FiftyoneDataset class is the main entrypoint to run all commands from cli. Check out its usage below:

NAME
    main.py FiftyoneDataset - Usage docs: https://docs.pydantic.dev/2.7/concepts/models/

SYNOPSIS
    main.py FiftyoneDataset GROUP | COMMAND | <flags>

FLAGS
    -n, --name=NAME (required)
        Type: str
    -o, --overwrite=OVERWRITE
        Type: bool
        Default: False

Run the commands

Create a dataset and track objects

  1. Create a fiftyone dataset to visualize data and labels:
python main.py FiftyoneDataset --name <name> create <path/to/video/folder>

# keep fiftyone running to observe changes
fiftyone app launch <name>
  2. Run detection and tracking:
python main.py FiftyoneDataset --name <name> track --model yolov8s --label-field yolov8s

This command will download the corresponding yolov8 model to the current directory and create a label field "yolov8s" containing detections with COCO classes and track ids.

  • Ultralytics yolov8 family of models is used for object detection.
  • BoxSort is used to connect frame-level detections into tracklets.
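Conceptually, the tracker's output is a set of per-frame detections, each carrying a persistent track index; a tracklet is just the detections grouped by that index. A minimal sketch of this grouping (the dict layout and key names here are illustrative, not the repo's actual schema):

```python
from collections import defaultdict

def group_tracklets(frames):
    """Group frame-level detections into tracklets keyed by track index.

    `frames` maps frame number -> list of detections, where each
    detection is a dict with at least an "index" (track id) key.
    Returns {track_id: [(frame_number, detection), ...]} in frame order.
    """
    tracklets = defaultdict(list)
    for frame_number in sorted(frames):
        for det in frames[frame_number]:
            tracklets[det["index"]].append((frame_number, det))
    return dict(tracklets)

frames = {
    1: [{"index": 1, "label": "person"}, {"index": 2, "label": "person"}],
    2: [{"index": 1, "label": "person"}],
}
tracklets = group_tracklets(frames)
# track 1 spans frames 1-2, track 2 appears only in frame 1
```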

Count unique people with reid

To combine tracking algorithms with a reid model, install boxmot and run it on top of existing detections.

pip install boxmot

python main.py FiftyoneDataset --name <name>
    track_reid
    --label-field yolov8s
    --tracking_method deepocsort
    --reid-model osnet_x0_25_market1501.pt
    --new_field deepocsort_market

A new label field will be created; it will contain an index parameter and will be available in the fiftyone app for inspection.

Other reid models and tracking methods are also available: check [the source repo](https://github.com/mikel-brostrom/boxmot?tab=readme-ov-file) for more details.
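With reid-backed track indices in place, counting unique people reduces to counting distinct index values across all frames of the label field. A hedged sketch over plain dicts (the actual FiftyOne field layout may differ):

```python
def count_unique_people(frames, field="deepocsort_market"):
    """Count distinct track indices across all frames.

    `frames` maps frame number -> {field: [detections]}, where each
    detection is a dict carrying the "index" assigned by the tracker.
    """
    ids = {
        det["index"]
        for dets in frames.values()
        for det in dets.get(field, [])
        if det.get("index") is not None
    }
    return len(ids)

frames = {
    1: {"deepocsort_market": [{"index": 3}, {"index": 5}]},
    2: {"deepocsort_market": [{"index": 3}]},
}
print(count_unique_people(frames))  # → 2
```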

Count customers

To identify a customer we need to annotate the cashier's zone in a yaml config file. Once a fiftyone dataset is created, find the video sample id in the web UI and add an entry to the config file following the existing example.

  1. Add these annotations to fiftyone:
python main.py FiftyoneDataset --name <name> annotate_zones configs/annotations.yaml --label_field zone

Verify the correctness of zone annotations in the fiftyone app.

  2. After adding zones, run the following command to label each person as a cashier or a customer:
❯ python main.py FiftyoneDataset --name <name> identify_customer --tracking_field yolov8s --zone_field zone --iou-threshold 0.5

Output:
6665610ec7102da65e0f95ef cashier count: 2
6665610ec7102da65e0f95ef customer count: 41

A new label field is created that maps a "person" class to either "cashier" or "customer" based on the intersection with the "cashier" zone.
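The decision rule can be sketched as: a person box that overlaps the cashier zone with IoU above the threshold is labeled "cashier", otherwise "customer". A simplified per-box version (boxes as `(x1, y1, x2, y2)` tuples; the repo's exact aggregation across frames may differ):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def visitor_type(person_box, cashier_zone, iou_threshold=0.5):
    """Label a person as cashier or customer by zone overlap."""
    return "cashier" if iou(person_box, cashier_zone) >= iou_threshold else "customer"

zone = (0, 0, 10, 10)
print(visitor_type((1, 1, 9, 9), zone))      # → cashier
print(visitor_type((20, 20, 30, 30), zone))  # → customer
```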

Number of customers exiting the shop

This step assumes that the exit zone was annotated in the previous step, and that customer identification has been run.

❯ python main.py FiftyoneDataset --name <name> identify_exit --tracking_field visitor_type --zone_field zone

Output:
6665610ec7102da65e0f95ef: Number of customers exiting is 4

An "exit" is counted when a tracklet's last frame intersects the annotated "exit" zone.
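Under that definition, the per-tracklet check is: take the detection at the tracklet's final frame and test whether its box intersects the exit zone. A sketch under assumed box and tracklet formats:

```python
def boxes_intersect(a, b):
    """True if two (x1, y1, x2, y2) boxes overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def count_exits(tracklets, exit_zone):
    """Count tracklets whose final box intersects the exit zone.

    `tracklets` maps track id -> list of (frame_number, box),
    assumed sorted by frame number.
    """
    exits = 0
    for boxes in tracklets.values():
        _, last_box = boxes[-1]
        if boxes_intersect(last_box, exit_zone):
            exits += 1
    return exits

exit_zone = (90, 0, 100, 50)
tracklets = {
    1: [(1, (10, 10, 20, 20)), (50, (92, 5, 99, 15))],   # ends inside the exit zone
    2: [(1, (10, 10, 20, 20)), (60, (30, 30, 40, 40))],  # ends elsewhere
}
print(count_exits(tracklets, exit_zone))  # → 1
```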

Age and Gender prediction

Age and gender classification is handled by the clip model with 4 classes:

  • adult woman
  • adult man
  • girl
  • boy

The script below will create a new dataset with customer patches, download the clip model using the fiftyone zoo and classify each patch into the defined categories. Then the predictions are grouped per customer and simple voting decides the final class probabilities.

❯ python main.py FiftyoneDataset --name <name> classify_customers --patch-field visitor_type --export-dir data/interim/patches

Output:
Customer 'customer-1' class probabilities: {'adult man': 0.64, 'boy': 0.18, 'adult woman': 0.18}
Customer 'customer-10' class probabilities: {'adult woman': 0.87, 'girl': 0.13}
Customer 'customer-11' class probabilities: {'adult man': 0.62, 'adult woman': 0.38}
Customer 'customer-14' class probabilities: {'adult man': 0.85, 'adult woman': 0.03, 'boy': 0.13}
Customer 'customer-15' class probabilities: {'adult man': 0.33, 'adult woman': 0.51, 'girl': 0.16}
Customer 'customer-17' class probabilities: {'adult man': 0.87, 'adult woman': 0.1,

It's also possible to review the clip predictions in the fiftyone app by opening the <name>-patches dataset. Filter customers by the customer_id field and review the label counts in the clip field.
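The "simple voting" aggregation can be read as tallying the per-patch clip predictions for each customer into normalized vote shares. A hedged sketch (the repo may instead weight by prediction confidence):

```python
from collections import Counter

def vote(patch_predictions):
    """Turn per-patch class predictions into class probabilities.

    `patch_predictions` is a list of predicted labels, one per patch
    of the same customer; probabilities are simple vote shares.
    """
    counts = Counter(patch_predictions)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

preds = ["adult man", "adult man", "boy", "adult man"]
print(vote(preds))  # → {'adult man': 0.75, 'boy': 0.25}
```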

Example age and gender prediction for a single customer: age and gender

Possible improvements:

  1. Select top-k patches per customer to optimize speed
  2. Run a face detection algorithm and apply a model trained on the adience dataset

Customer purchase detection (TODO)

There are a few steps to tackle this problem:

  1. Detect customers spending at least n seconds around the register.
  2. Estimate customer pose, ensure hands intersect with the cashier's desk.
  3. Add these identifications for human verification.
  4. Train a video action recognition model to identify a target event.

Identify if cashier is right or left-handed

To identify a cashier's handedness we apply a pose estimation model to cashier patches and compute the magnitude (i.e. xy-variance) of each hand's movement across frames.

❯ python main.py FiftyoneDataset --name <name> cashier_hand

Output:
Cashier 'cashier-12' is 'right'-handed: left_var=0.07, right_var=0.12
Cashier 'cashier-16' is 'left'-handed: left_var=0.03, right_var=0.03
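The variance comparison itself can be sketched as follows: collect the (x, y) wrist keypoints of each hand across frames and compare their summed coordinate variances (keypoint extraction is elided here; the tie-breaking rule is an assumption):

```python
def xy_variance(points):
    """Summed variance of x and y coordinates of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return sum((p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in points) / n

def handedness(left_wrist, right_wrist):
    """The hand that moves more (higher xy-variance) is the active one."""
    left_var = xy_variance(left_wrist)
    right_var = xy_variance(right_wrist)
    return ("right" if right_var > left_var else "left"), left_var, right_var

left = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51)]   # nearly still
right = [(0.40, 0.60), (0.55, 0.45), (0.70, 0.62)]  # moving a lot
hand, lv, rv = handedness(left, right)
print(hand)  # → right
```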

Keypoint detections can be reviewed in the fiftyone app as well. Filter on a specific cashier with the cashier_id field, then select wrist labels in the keypoints field. It might be useful to set a confidence threshold around 0.4-0.6.

Example hand detection for a cashier: keypoint detection result

Possible improvements:

  1. Select top patches for each cashier
  2. Track hand detection across frames and compute movement magnitude.
