
commaVQ challenge

Source Video        Compressed Video        Future Prediction
source_video.mp4    compressed_video.mp4    generated.mp4

A world model predicts the next state of the world given the observed previous states and actions.

World models are essential for training all kinds of intelligent agents, especially self-driving agents.

commaVQ contains:

  • encoder/decoder models used to heavily compress driving scenes
  • a world model trained on 3,000,000 minutes of driving videos
  • a dataset of 100,000 minutes of compressed driving videos

Task

Lossless compression challenge: make me smaller! $500 challenge

Losslessly compress 5,000 minutes of driving video "tokens". Go to ./compression/ to start.

Prize: $500 for the highest compression rate on 5,000 minutes of driving video (~915MB). The challenge ended July 1st, 2024, 11:59pm AoE.

Submit a single zip file containing the compressed data and a Python script that decompresses it into its original form using the submission form. Top solutions are listed on comma's official leaderboard.

Implementation                                  Compression rate
szabolcs-cs (self-compressing neural network)   3.4
pkourouklidis (arithmetic coding with GPT)      2.6
anonymous (zpaq)                                2.3
rostislav (zpaq)                                2.3
anonymous (zpaq)                                2.2
anonymous (zpaq)                                2.2
0x41head (zpaq)                                 2.2
tillinf (zpaq)                                  2.2
baseline (lzma)                                 1.6

Overview

A VQ-VAE [1,2] was used to heavily compress each video frame into 128 "tokens" of 10 bits each. Each entry of the dataset is a "segment" of compressed driving video, i.e. one minute of frames at 20 FPS. Each file has shape 1200x8x16 (1200 frames = 60 s x 20 FPS; 8x16 = 128 tokens per frame) and is saved as int16.
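To get a feel for the numbers, here is a back-of-the-envelope sketch (pure arithmetic, no repository code assumed); at the true 10 bits per token, 5,000 minutes works out to the ~915MB challenge size quoted above:

tokens_per_segment = 1200 * 8 * 16                 # 153,600 tokens per minute of video
print(tokens_per_segment * 2)                      # 307,200 bytes per segment stored as int16
print(5000 * tokens_per_segment * 10 / 8 / 2**20)  # ~915 MiB for 5,000 min at 10 bits per token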

A world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos following a similar recipe to [5].
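The actual model and its usage live in ./notebooks/gpt.ipynb; the sketch below only illustrates the autoregressive data flow, with a hypothetical helper (predict_next_token, uniform-random here) standing in for the real GPT:

import numpy as np

TOKENS_PER_FRAME = 8 * 16  # 128 tokens per frame
VOCAB_SIZE = 1024          # 10-bit codebook
rng = np.random.default_rng(0)

def predict_next_token(context):
    # Stand-in for the real model: the GPT would return logits over the
    # 1024-entry codebook given the flattened token history.
    return int(rng.integers(VOCAB_SIZE))

def imagine(past_frames, n_frames):
    # Flatten past frames into one token sequence, then extend it one
    # token at a time until n_frames new frames are complete.
    seq = past_frames.reshape(-1).tolist()
    for _ in range(n_frames * TOKENS_PER_FRAME):
        seq.append(predict_next_token(seq))
    return np.asarray(seq, dtype=np.int16).reshape(-1, 8, 16)

past = rng.integers(VOCAB_SIZE, size=(20, 8, 16), dtype=np.int16)  # 1 s of context
future = imagine(past, n_frames=5)
print(future.shape)  # (25, 8, 16): the 20 context frames plus 5 imagined ones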

Examples

  • ./notebooks/encode.ipynb and ./notebooks/decode.ipynb: how to visualize the dataset using a segment of driving video from comma's drive to Taco Bell.
  • ./notebooks/gpt.ipynb: how to use the world model to imagine future frames.
  • ./compression/compress.py: how to compress the tokens using lzma (see the standalone sketch after this list).
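Along the lines of the lzma baseline, a minimal standalone sketch (this is not the repository's compress.py; the segment here is synthetic noise, so the printed rate will differ from the 1.6 measured on real tokens):

import lzma
import numpy as np

# Synthetic stand-in for one segment; real driving tokens are far more
# structured and therefore compress differently than uniform noise.
tokens = np.random.default_rng(0).integers(0, 1024, size=(1200, 8, 16), dtype=np.int16)

raw = tokens.tobytes()
compressed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
print(len(raw) / len(compressed))  # compression rate

# The challenge is lossless: the round trip must restore the exact bytes.
restored = np.frombuffer(lzma.decompress(compressed), dtype=np.int16).reshape(tokens.shape)
assert np.array_equal(restored, tokens)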

Download the dataset

  • Using huggingface datasets
import numpy as np
from datasets import load_dataset
num_proc = 40 # CPUs go brrrr
ds = load_dataset('commaai/commavq', num_proc=num_proc)
tokens = np.load(ds['0'][0]['path']) # first segment from the first data shard

References

[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

[2] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

[3] Ha, David, and Jürgen Schmidhuber. "World Models." (2018). https://worldmodels.github.io/

[4] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. "Transformers are Sample-Efficient World Models." The Eleventh International Conference on Learning Representations. 2023.
