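This step converts JSON-MIN exports from Label Studio Video Object Tracking projects into images and label files for YOLO training.

The input directory must contain the videos and their JSON annotation files. Each JSON file is matched to the video with the same base name.

Videos must be in a format that MoviePy's `VideoFileClip` can read (for example, mp4).

Example: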
```
label-studio-inputs
|
|-video1.mp4
|-video1.json
|-video2.mp4
|-video2.json
```
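
Before running the conversion, it can help to confirm that every annotation file has a video beside it. The sketch below is only illustrative: `find_pairs` is a hypothetical helper, not part of this notebook, and it assumes that any non-JSON file sharing the JSON file's stem is the video.

```python
from pathlib import Path


def find_pairs(inputs_dir):
    """Return (paired, orphans): JSON exports with and without a same-stem video."""
    root = Path(inputs_dir)
    paired, orphans = {}, []
    for json_path in sorted(root.glob("*.json")):
        # Assumption: any non-JSON file with the same stem is the matching video.
        videos = sorted(
            p for p in root.iterdir()
            if p.is_file() and p.stem == json_path.stem and p.suffix.lower() != ".json"
        )
        if videos:
            paired[json_path.name] = videos[0].name
        else:
            orphans.append(json_path.name)
    return paired, orphans
```

Calling `find_pairs("label-studio-inputs")` on the layout above should list `video1.json` and `video2.json` as paired and leave `orphans` empty.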

## IMPORTANT
Make sure every source video has the same frame rate as its Label Studio annotations.
In my case, Label Studio ran at 24 FPS on a 29.97 FPS video. Convert the video to the frame rate used in Label Studio before processing it here.
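
To see why the mismatch matters, here is a small arithmetic sketch. `frame_at_other_rate` is a hypothetical helper, and it assumes a simple model in which frame `n` is shown at `n / fps` seconds; the exact convention Label Studio uses may differ slightly.

```python
def frame_at_other_rate(frame: int, annotated_fps: float, video_fps: float) -> int:
    """Map a frame number counted at annotated_fps to the nearest frame at video_fps."""
    seconds = frame / annotated_fps  # assumption: frame n is shown at n / fps seconds
    return round(seconds * video_fps)


# Ten seconds into the clip, a 24 FPS annotation and a 29.97 FPS video disagree.
annotated_frame = 240
video_frame = frame_at_other_rate(annotated_frame, 24, 29.97)
drift = video_frame - annotated_frame
print(f"annotation frame {annotated_frame} -> video frame {video_frame} (drift: {drift} frames)")
```

Under that assumption the two counts are already 60 frames apart after ten seconds, so boxes would land on the wrong images unless the video is converted first.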

In [41]:
# Install requirements
%pip install opencv-python~=4.11
%pip install moviepy
%pip install tqdm~=4.67.1
%pip install loguru~=0.7.3
%pip install ipywidgets


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49m/home/kevin/AI/FRC/Tools/.venv/bin/python -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
Collecting moviepy
  Downloading moviepy-2.2.1-py3-none-any.whl.metadata (6.9 kB)
Collecting imageio<3.0,>=2.5 (from moviepy)
  Downloading imageio-2.37.0-py3-none-any.whl.metadata (5.2 kB)
Collecting imageio_ffmpeg>=0.2.0 (from moviepy)
  Downloading imageio_ffmpeg-0.6.0-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting proglog<=1.0.0 (from moviepy)
  Downloading proglog-0.1.12-py3-none-any.whl.metadata (794 bytes)
Collecting python-dotenv>=0.10 (from moviepy)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Downloading moviepy-2.2.1-py3-none-any.whl (129 kB)
Downloading imageio-2.37.0-py3-none-a

**NOTE: A Jupyter kernel restart is required for fancy progress output**

In [82]:
# Configuration

INPUTS_DIRECTORY="label-studio-inputs"
OUTPUTS_DIRECTORY="yolo-training-data"
LOG_LEVEL = "TRACE" # TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL
SAVING_FRAMES_PER_SECOND = 25
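
How `SAVING_FRAMES_PER_SECOND` turns into frame timestamps later in `process` can be sketched in isolation. `frame_timestamps` is a hypothetical helper written for illustration; it mirrors the clamping rule (never sample faster than the video, with `0` meaning every frame) but uses plain Python instead of NumPy.

```python
def frame_timestamps(video_fps: float, duration: float, saving_fps: float) -> list[float]:
    """Timestamps (in seconds) at which frames would be saved."""
    effective_fps = min(video_fps, saving_fps)
    # 0 means "save every frame", so fall back to the video's own frame rate.
    step = 1 / video_fps if effective_fps == 0 else 1 / effective_fps
    timestamps = []
    count = 0
    while count * step < duration:
        timestamps.append(count * step)
        count += 1
    return timestamps
```

For example, `frame_timestamps(4, 0.5, 8)` is clamped to the video's 4 FPS and returns two timestamps, `0.0` and `0.25`.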


In [83]:
# Import everything needed
import json
import csv
import copy
import cv2
import os
from decimal import Decimal
from pathlib import Path
from tqdm.notebook import tqdm
from moviepy import VideoFileClip

In [84]:
# Configure logging

from loguru import logger
import sys
logger.remove()
logger.add(sys.stderr, level=LOG_LEVEL) 

2

In [91]:
# Make output directories

Path(OUTPUTS_DIRECTORY).mkdir(parents=True, exist_ok=True)  # do not fail when the directory already exists
[(Path(OUTPUTS_DIRECTORY) / p).mkdir(parents=True, exist_ok=True) for p in ('images/', 'labels/')]

[None, None]

In [86]:
# Util: Linear Interpolation for bounding box processing. Label Studio will not export interpolated boxes unless a keyframe is present.

def linear_interpolation(prev_seq, seq, label):
    # Define the start and end frame numbers
    a0 = prev_seq['frame']
    a1 = seq['frame']
    frames_info = dict()
    # Loop over all intermediate frames
    for frame in range(a0+1, a1):
        t = Decimal(frame-a0)/Decimal(a1-a0)
        info = [label]
        # Interpolate bounding box dimensions for the current frame
        for b0, b1 in ((prev_seq[k], seq[k]) for k in ('x', 'y', 'width', 'height')):
            info.append(str(b0 + t*(b1-b0)))
        # Add interpolated information for the current frame to 'frames_info'
        frames_info[frame] = info
    return frames_info
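
A worked example makes the interpolation easier to check by hand. The sketch below is a self-contained variant written for illustration: `interpolate_boxes` is a hypothetical name, and it returns a dict of values per frame instead of the label lines built above.

```python
from decimal import Decimal


def interpolate_boxes(prev_box: dict, box: dict) -> dict:
    """Linearly interpolate x, y, width and height for frames strictly between two keyframes."""
    start, end = prev_box["frame"], box["frame"]
    result = {}
    for frame in range(start + 1, end):
        t = Decimal(frame - start) / Decimal(end - start)  # 0 < t < 1
        result[frame] = {
            k: prev_box[k] + t * (box[k] - prev_box[k])
            for k in ("x", "y", "width", "height")
        }
    return result


first = {"frame": 10, "x": Decimal(10), "y": Decimal(20), "width": Decimal(30), "height": Decimal(40)}
last = {"frame": 14, "x": Decimal(18), "y": Decimal(20), "width": Decimal(34), "height": Decimal(40)}
boxes = interpolate_boxes(first, last)
print(sorted(boxes))  # frames 11, 12 and 13 are filled in
```

Halfway between the keyframes (frame 12), `x` sits halfway between 10 and 18, and the unchanged `y` and `height` stay constant throughout.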

In [118]:
# Main processing function

import numpy as np


def process(input_directory: str, output_directory: Path) -> None:
    files = os.listdir(input_directory)
    json_files = [f for f in files if f.endswith('.json')]
    inputs = []
    for json_file in json_files:
        base_name = os.path.splitext(json_file)[0]
        videos = [f for f in files if os.path.splitext(f)[0] == base_name and not f.endswith('.json')]
        if videos:
            inputs.append({"video": videos[0], "json": json_file})
        else:
            logger.warning(f"No matching video file found for {json_file}, skipping...")
    logger.debug(f"File inputs: {inputs}")

    for work_index, inp in enumerate(inputs):
        # Single quotes inside the f-string keep this valid on Python versions before 3.12
        logger.info(f"Processing video+annotations for {inp['json']}, {work_index + 1}/{len(inputs)}")
        with open(os.path.join(input_directory, inp["json"])) as f:
            video_labels = json.load(f, parse_float=Decimal)

            labels = set()
            for subject in video_labels[0]['box']:
                # update() accepts any number of labels; the one-label rule is enforced below
                labels.update(subject['labels'])
            labels_dict = {k: i for i, k in enumerate(sorted(labels))}

            logger.debug(f"Found labels: {labels_dict}")

            files_dict = dict()
            frame_times = dict()

            logger.debug(f"Found {len(video_labels[0]['box'])} subjects in {inp['json']}")

            for subject in copy.deepcopy(video_labels[0]['box']):
                subject_labels = subject['labels']

                # Map label to int
                if len(subject_labels) == 1:
                    label = labels_dict[subject_labels[0]]
                else:
                    raise ValueError("Each subject must have exactly one label")
                
                prev_seq = None

                # Process each sequence in the subject's timeline
                for seq in subject['sequence']:
                    frame = seq['frame']

                    # Adjust the x and y coordinates to be the center of the bounding box
                    seq['x'] += seq['width'] / Decimal('2')
                    seq['y'] += seq['height'] / Decimal('2')

                    # Adjust the scale of bounding box dimensions
                    for k in ('x', 'y', 'width', 'height'):
                        seq[k] /= Decimal('100')

                    # If the current sequence is not adjacent to the previous sequence, perform linear interpolation
                    if (prev_seq is not None) and prev_seq['enabled'] and (frame - prev_seq['frame'] > 1):
                        lines = linear_interpolation(prev_seq, seq, label)
                    else:
                        lines = dict()

                    # Create the bounding box information line for the current frame
                    lines[frame] = [label] + [str(seq[k]) for k in ('x', 'y', 'width', 'height')]

                    # Add the bounding box information line to the corresponding frame in 'files_dict'
                    for frame, info in lines.items():
                        if frame in files_dict:
                            files_dict[frame].append(info)
                        else:
                            files_dict[frame] = [info]

                    # Store the timestamp for the current frame
                    frame_times.update({frame: float(seq['time'])})

                    prev_seq = seq
        
            files_dict = dict(sorted(files_dict.items()))
            frame_times = dict(sorted(frame_times.items()))
            logger.trace(f"Frame times: {frame_times.keys()}")

        # Create classes.txt
        classes_file = output_directory / 'classes.txt'
        existing_lines = set()
        if classes_file.exists():
            with open(classes_file, 'r') as f:
                existing_lines = set(line.strip() for line in f)
                logger.trace(f"Found existing lines in classes.txt, {existing_lines}")
        with open(classes_file, 'a') as f:
            for line in labels_dict:
                if line not in existing_lines:
                    f.write(f'{line}\n')
                    logger.trace(f"Line not in classes.txt, {line}, appending it")
                else:
                    logger.trace(f"Line already in classes.txt, {line}, ignoring it")

        
        max_frame = max(files_dict.keys())
        padding = len(str(max_frame))

        # Write labels (annotations)
        for frame, lines in tqdm(files_dict.items(), "Writing label files"):
            with open(output_directory / 'labels' / f'{inp["json"].rsplit(".", 1)[0]}_frame_{frame:0{padding}d}.txt', 'w') as csvfile:
                csvwriter = csv.writer(csvfile, delimiter=' ')
                csvwriter.writerows(lines)

        # load the video clip
        video_clip = VideoFileClip(os.path.join(input_directory, inp["video"]))
        # if the SAVING_FRAMES_PER_SECOND is above video FPS, then set it to FPS (as maximum)
        saving_frames_per_second = min(video_clip.fps, SAVING_FRAMES_PER_SECOND)
        # if SAVING_FRAMES_PER_SECOND is set to 0, step is 1/fps, else 1/SAVING_FRAMES_PER_SECOND
        step = 1 / video_clip.fps if saving_frames_per_second == 0 else 1 / saving_frames_per_second
        # Reuse the padding computed from the label frame numbers so that image
        # and label filenames always line up (a separately computed width could differ).
        images_dir = output_directory / 'images'
        images_dir.mkdir(parents=True, exist_ok=True)
        # Build the timestamps once so the progress bar total matches the real frame count
        timestamps = np.arange(0, video_clip.duration, step)
        for idx, current_duration in tqdm(enumerate(timestamps), "Processing frames", unit="frame", total=len(timestamps)):
            frame_filename = images_dir / f'{inp["json"].rsplit(".", 1)[0]}_frame_{idx+1:0{padding}d}.jpg'
            video_clip.save_frame(str(frame_filename), current_duration)
        # Release the underlying reader once all frames are saved
        video_clip.close()

        # Look through the saved frames and delete the ones that have no label file
        label_files = set(os.listdir(output_directory / 'labels'))
        for frame_file in tqdm(os.listdir(images_dir), "Filtering frames"):
            if frame_file.rsplit(".", 1)[0] + ".txt" not in label_files:
                os.remove(os.path.join(output_directory, "images", frame_file))

        logger.success(f"Completed video+annotations for {inp['json']}, {work_index + 1}/{len(inputs)}")
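
The coordinate handling inside `process` can be checked on a single box. This is a minimal sketch, assuming (as the code above does) that the export stores the top-left corner and size as percentages of the frame; `to_yolo` is a hypothetical helper, not a function from this notebook.

```python
from decimal import Decimal


def to_yolo(x: Decimal, y: Decimal, width: Decimal, height: Decimal) -> tuple:
    """Convert a percent-based top-left box to normalized (x_center, y_center, width, height)."""
    x_center = (x + width / 2) / 100
    y_center = (y + height / 2) / 100
    return x_center, y_center, width / 100, height / 100


# A box whose top-left corner is at (10%, 20%) and that spans 30% x 40% of the frame.
box = to_yolo(Decimal("10"), Decimal("20"), Decimal("30"), Decimal("40"))
print(" ".join(["0", *map(str, box)]))  # one label line for class 0
```

The center comes out at 25% across and 40% down the frame, expressed as fractions between 0 and 1, which is the layout written to each label file.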


In [None]:
# GO
process(INPUTS_DIRECTORY, Path(OUTPUTS_DIRECTORY))

2025-06-15 14:58:48.615 | DEBUG    | __main__:process:17 - File inputs: [{'video': 'video.mp4', 'json': 'video.json'}]
2025-06-15 14:58:48.616 | INFO     | __main__:process:20 - Processing video+annotations for video.json, 1/1
2025-06-15 14:58:48.629 | DEBUG    | __main__:process:29 - Found labels: {'blueRobot': 0, 'redRobot': 1}
2025-06-15 14:58:48.630 | DEBUG    | __main__:process:34 - Found 17 subjects in video.json
2025-06-15 14:58:48.762 | TRACE    | __main__:process:82 - Frame times: dict_keys([77, 129, 151, 191, 206, 211, 218, 229, 232, 234, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266,

Writing label files:   0%|          | 0/4114 [00:00<?, ?it/s]

ffmpeg output:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'label-studio-inputs/video.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    title           : Einstein Final 1 - 2023 FIRST Championship
    artist          : FIRSTRoboticsCompetition
    date            : 20230422
    encoder         : Lavf61.1.100
    comment         : https://www.youtube.com/watch?v=yovTwDUIJI4
    description     : Einstein Final 1 - 2023 FIRST Championship - FIRST Robotics Competition
                    : Red (Teams 1323, 4096, 4414) - 213
                    : Blue (Teams 5460, 125, 870) - 141
                    : https://frc-events.firstinspires.org/2023/CMPTX/playoffs/14
                    : 
                    : Uploaded by MatchLIVE from JK Productions
                    : (c) 2023 FIRST Robotics Competition
    synopsis        : Einstein Final 1 - 2023 FIRST Championship - FIRST Robotics Competition
                    : Red (Teams 1

Processing frames:   0%|          | 0/4755 [00:00<?, ?frame/s]

Filtering frames:   0%|          | 0/4755 [00:00<?, ?it/s]

2025-06-15 14:59:36.912 | SUCCESS  | __main__:process:128 - Completed video+annotations for video.json, 1/1
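
After a run, a quick consistency check on the output directory can catch naming mismatches between images and labels. The sketch below is an assumption-laden illustration: `unmatched_files` is a hypothetical helper, and it relies only on the `images/*.jpg` and `labels/*.txt` layout written above.

```python
from pathlib import Path


def unmatched_files(output_dir):
    """Return (images_without_labels, labels_without_images) as sorted lists of file stems."""
    root = Path(output_dir)
    image_stems = {p.stem for p in (root / "images").glob("*.jpg")}
    label_stems = {p.stem for p in (root / "labels").glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

I would expect `unmatched_files("yolo-training-data")` to return two empty lists after a successful run; anything else suggests a frame rate or filename padding mismatch.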

