Lists (21)
3d
agent
audio
comfyui
cv
cv_work
datasets
diffusion language model
digital-human
flow matching
inference optimization
infrastructure
Language Diffusion Models
LLM reason model
LLMs
multi-modal
Personal skill improvement
RAG
RL
tool
video
Stars
Official Implementation of Video-T1: Test-Time Scaling for Video Generation
Code of LHM: Large Animatable Human Reconstruction Model for Single Image to 3D in Seconds
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Unleashing Vecset Diffusion Model for Fast Shape Generation within 1 Second.
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"
[ARXIV'25] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Autoregressive Image Generation with Randomized Parallel Decoding
Distilling Diversity and Control in Diffusion Models
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
The official implementation of our paper "CoRe^2: Collect, Reflect and Refine to Generate Better and Faster".
Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
An Efficient Text-to-Image Generation Pretrain Pipeline
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
Model Context Protocol Servers
zero-shot voice conversion & singing voice conversion, with real-time support
This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incentivize reasoning ca…
A series of technical reports on Slow Thinking with LLM
Explore the Multimodal “Aha Moment” on 2B Model