Stars
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first work to systematically explore R1 for video]
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
Medical Diffusion: This repository contains the code for our paper Medical Diffusion: Denoising Diffusion Probabilistic Models for 3D Medical Image Synthesis
Diffusion Models in Medical Imaging (Published in Medical Image Analysis Journal)
Align Anything: Training All-modality Model with Feedback
DepictQA: Depicted Image Quality Assessment with Vision Language Models
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including PSNR, SSIM, LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
Reproduction of the DDPO paper (RLHF for diffusion)
Solve Visual Understanding with Reinforced VLMs
A comprehensive collection of IQA papers
Witness the aha moment of VLM for less than $3.
[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
②[CVPR 2024] Low-level visual instruction tuning, with a 200K dataset and a model zoo for fine-tuned checkpoints.
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
An expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
[ICLR 2023 Oral] Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
③[ICML 2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
Memory-optimized training library for diffusion models
HunyuanVideo: A Systematic Framework For Large Video Generation Model