
Starred repositories
A pipeline parallel training script for diffusion models.
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Wan: Open and Advanced Large-Scale Video Generative Models
A collection of diffusion model papers, categorized by subarea.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A generative world for general-purpose robotics & embodied AI learning.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
"Deep Generative Modeling": Introductory Examples
Official inference repo for FLUX.1 models
Enjoy the magic of Diffusion models!
Lumina-T2X is a unified framework for Text to Any Modality Generation
llama3 implementation one matrix multiplication at a time
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Accepted as a [NeurIPS 2024] Spotlight presentation paper.
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
My ComfyUI workflow collection
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Open-Sora: Democratizing Efficient Video Production for All
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
The official PyTorch implementation of Google's Gemma models
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
ENFUGUE is an open-source web app for making studio-grade images and video using generative AI.
Instant voice cloning by MIT and MyShell. Audio foundation model.