Stars
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Official implementation of the paper: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models"
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
[CVPR2025] MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Official PyTorch Implementation for “Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation” (CVPR 2023)
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
Self-supervised Spatiotemporal Learning via Video Clip Order Prediction
EVA Series: Visual Representation Fantasies from BAAI
This is the official implementation of TrivialAugment and a mini-library for the application of multiple image augmentation strategies including RandAugment and TrivialAugment.
ConvMAE: Masked Convolution Meets Masked Autoencoders
PyTorch implementation of Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Example models using DeepSpeed
Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton based action recognition model, practical applications for video ta…
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Source code for "Visually aligned sound generation via sound-producing motion parsing" (Published at Neurocomputing)
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned Sound (VAS) dataset.
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 spotlight paper)
Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images"
Official Implementation of the paper "A U-Net Based Discriminator for Generative Adversarial Networks" (CVPR 2020)
A mix of GAN implementations including progressive growing
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…