VPD

Created by Wenliang Zhao*, Yongming Rao*, Zuyan Liu*, Benlin Liu, Jie Zhou, Jiwen Lu

This repository contains the PyTorch implementation of the paper "Unleashing Text-to-Image Diffusion Models for Visual Perception".

VPD (Visual Perception with Pre-trained Diffusion Models) is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.

[Project Page] [arXiv]

Installation

Clone this repo and run:

git submodule init
git submodule update

Download the Stable Diffusion checkpoint (we use v1-5 by default) and put it in the checkpoints folder.
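Before launching any of the tasks, it can help to confirm that the checkpoint landed in the right place. The snippet below is a minimal sketch, not part of this repository: the helper name `find_checkpoints` and the file suffixes `.ckpt` and `.safetensors` are assumptions, so adjust them to match the file you actually downloaded.

```python
from pathlib import Path


def find_checkpoints(folder="checkpoints", suffixes=(".ckpt", ".safetensors")):
    """Return the checkpoint-like files found directly inside `folder`.

    Hypothetical helper: the suffix list is an assumption about how
    Stable Diffusion weights are commonly distributed.
    """
    root = Path(folder)
    if not root.is_dir():
        return []  # folder missing: nothing to report
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix in suffixes)


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        for path in found:
            print(f"found checkpoint: {path}")
    else:
        print("no checkpoint found in ./checkpoints")
```

Run it from the repository root after the download step; an empty result means the file is missing or uses a different suffix.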

Semantic Segmentation with VPD

Equipped with a lightweight Semantic FPN and trained for 80K iterations on $512\times512$ crops, our VPD can achieve 54.6 mIoU on ADE20K.

Please check segmentation.md for detailed instructions.
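For readers unfamiliar with the metric, mIoU is the per-class intersection over union averaged across classes. The following is a minimal pure-Python sketch of that usual definition, not the evaluation code used in this repository; the function name `mean_iou` and the `ignore_index=255` default are assumptions for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in `pred` or `target`.

    `pred` and `target` are flat, equal-length sequences of integer class ids.
    Pixels whose target equals `ignore_index` are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

To get a dataset-level score with this sketch, concatenate the pixels of all images before calling the function, so that intersections and unions are accumulated over the whole set.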

Referring Image Segmentation with VPD

VPD achieves 73.25, 63.51, and 62.80 oIoU on the validation sets of RefCOCO, RefCOCO+, and G-Ref, respectively.

Please check refer.md for detailed instructions.
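The oIoU (overall IoU) reported above is commonly defined as the total intersection over the total union, accumulated across all samples, which weights large objects more heavily than a per-sample average would. Here is a minimal sketch of that definition, assuming binary masks given as flat sequences; `overall_iou` is a hypothetical name and this is not the repository's evaluation script.

```python
def overall_iou(pred_masks, gt_masks):
    """Cumulative intersection divided by cumulative union over all samples.

    Each mask is a flat sequence of 0/1 (or False/True) values.
    """
    total_inter = 0
    total_union = 0
    for pred, gt in zip(pred_masks, gt_masks):
        for p, g in zip(pred, gt):
            total_inter += int(bool(p) and bool(g))
            total_union += int(bool(p) or bool(g))
    return total_inter / total_union if total_union else 0.0


if __name__ == "__main__":
    preds = [[1, 1, 0, 0], [1, 0, 0, 0]]
    gts = [[1, 0, 0, 0], [1, 1, 1, 1]]
    # Per-sample IoUs are 0.5 and 0.25, but oIoU pools the counts: 2 / 6.
    print(overall_iou(preds, gts))
```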

Depth Estimation with VPD

VPD obtains 0.254 RMSE on the NYUv2 depth estimation benchmark, establishing a new state of the art.

| Method | RMSE | d1 | d2 | d3 | REL | log_10 |
| --- | --- | --- | --- | --- | --- | --- |
| VPD | 0.254 | 0.964 | 0.995 | 0.999 | 0.069 | 0.030 |

Please check depth.md for detailed instructions.
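As a reference for the columns reported above, the sketch below computes the metrics under their standard definitions in the depth estimation literature: RMSE, the threshold accuracies d1, d2, d3 (fraction of pixels whose ratio max(pred/gt, gt/pred) is below 1.25, 1.25^2, 1.25^3), absolute relative error (REL), and mean log10 error. The function name `depth_metrics` is hypothetical, and the code assumes strictly positive depths with invalid pixels already filtered out; it is not the evaluation code shipped with this repository.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth metrics for flat sequences of positive depth values."""
    n = len(gt)
    pairs = list(zip(pred, gt))
    # Symmetric ratio used by the threshold accuracies.
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "d1": sum(r < 1.25 for r in ratios) / n,
        "d2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "d3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "log_10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }


if __name__ == "__main__":
    for name, value in depth_metrics([2.0, 4.0], [2.0, 2.0]).items():
        print(f"{name}: {value:.4f}")
```

Lower is better for RMSE, REL, and log_10; higher is better for d1, d2, and d3.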

License

MIT License

Acknowledgements

This code is based on mmsegmentation, LAVT, and MIM-Depth-Estimation.

Citation

If you find our work useful in your research, please consider citing:

@article{zhao2023unleashing,
  title={Unleashing Text-to-Image Diffusion Models for Visual Perception},
  author={Zhao, Wenliang and Rao, Yongming and Liu, Zuyan and Liu, Benlin and Zhou, Jie and Lu, Jiwen},
  journal={arXiv preprint arXiv:2303.02153},
  year={2023}
}
