Stars
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first work to systematically explore R1 for video]
A curated list of Awesome Personalized Large Multimodal Models resources
CUDA Python: Performance meets Productivity
Code for "How far can we go with ImageNet for Text-to-Image generation?" paper
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
GenEval: An object-focused framework for evaluating text-to-image alignment
A Unified Tokenizer for Visual Generation and Understanding
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models
Cosmos is a world model development platform consisting of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
A suite of image and video neural tokenizers
High-performance Image Tokenizers for VAR and AR
SEED-Voken: A Series of Powerful Visual Tokenizers
[ICLR 2025][arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
[CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"
Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral)
[CVPR'25] Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.