apple/ml-jointnet

JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling

Jingyang Zhang1, Shiwei Li1, Yuanxun Lu3, Tian Fang1, David McKinnon1, Yanghai Tsin1, Long Quan2, Yao Yao3*
1Apple, 2HKUST, 3Nanjing University

This is the official implementation of JointNet, a novel neural network architecture for modeling the joint distribution of images and an additional dense modality (e.g., depth maps). It is extended from a pre-trained text-to-image diffusion model, enabling efficient learning of the new modality while maintaining the base model's strong generalization. JointNet supports a variety of applications, including joint RGBD generation, dense depth prediction, depth-conditioned image generation, and tile-based 3D panorama generation.

Usage

Setup

Install dependencies:

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt

Note that the version requirement on diffusers is strict: the class JointNetModel in jointnet.py inherits from diffusers' UNet2DConditionModel, so if diffusers must be upgraded, the interface of JointNetModel has to be updated accordingly.
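Because the pin is strict, a small runtime guard can surface a version mismatch early instead of failing deep inside the model code. The helper below is an illustrative sketch, not repository code; the pinned version string should come from requirements.txt and is shown here only as a placeholder.

```python
from importlib.metadata import version, PackageNotFoundError

def parse_version(v: str) -> tuple:
    # "0.15.2" -> (0, 15, 2); non-numeric parts (e.g. local tags) are dropped.
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def check_pinned(package: str, pinned: str) -> bool:
    # True only if the installed version matches the pin exactly.
    try:
        return parse_version(version(package)) == parse_version(pinned)
    except PackageNotFoundError:
        return False

# Example (placeholder pin, substitute the value from requirements.txt):
# assert check_pinned("diffusers", "<pinned version>"), "diffusers version mismatch"
```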

About Pretrained Models

We provide two models: rgbd_finetune_sd21b (link) is the default joint generation model, and rgbd_inpaint_sd21b (link) is fine-tuned for inpainting tasks, conditioning on masked input images.

Joint Generation

jointnet_pipeline.py provides the pipeline for joint generation. Run it directly for a quick test:

python jointnet_pipeline.py \
  --model path/to/model \
  --prompt "A cat sitting on a wooden bench outside" \
  --modalities depth \
  --out_prefix examples/test_gen
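The pipeline writes both an RGB image and a depth map under the given output prefix. If you post-process raw depth values yourself, the usual step before saving or displaying them is min-max normalization to 8-bit. A hedged sketch (generic practice, not code from this repository):

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    # Min-max normalize a float depth map to [0, 255] for visualization.
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:          # constant map: avoid divide-by-zero
        return np.zeros_like(depth, dtype=np.uint8)
    norm = (depth - d_min) / (d_max - d_min)
    return (norm * 255.0).round().astype(np.uint8)
```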

Please refer to the script for more available arguments.

Inpainting

Use jointnet_inpaint_pipeline.py for inpainting tasks. Different masks can be applied to the RGB image and the joint modality. An example of general usage:

python jointnet_inpaint_pipeline.py \
  --model path/to/model \
  --prompt "A dog sitting on a wooden bench outside" \
  --modalities depth \
  --image examples/inpaint_image.png \
  --mask examples/inpaint_image_mask.png \
  --joint_input examples/inpaint_depth.png \
  --joint_mask examples/inpaint_depth_mask.png \
  --denoising_strength 0.9 \
  --out_prefix examples/test_inpaint
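Conceptually, diffusion inpainting keeps known pixels anchored by re-noising the known content at each denoising step and blending it with the sampler's current state through the mask (here, mask = 1 where content should be regenerated). A minimal NumPy sketch of that blend step, with hypothetical variable names (not the repository's implementation):

```python
import numpy as np

def blend_known(x_t, x_known_noised, mask):
    # Masked regions (mask == 1) come from the sampler's current state x_t;
    # known regions (mask == 0) are taken from the re-noised known content.
    return mask * x_t + (1.0 - mask) * x_known_noised
```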

As mentioned in the paper, bidirectional conversion between RGB and the joint modality can be performed via channel-wise inpainting. We provide presets for these tasks:

# Image to depth
python jointnet_inpaint_pipeline.py \
  --model path/to/model \
  --prompt "A cat sitting on a wooden bench outside" \
  --modalities depth \
  --image examples/inpaint_image.png \
  --preset rgb2j \
  --denoising_strength 1.0 \
  --out_prefix examples/test_rgb2d
# Depth to image
python jointnet_inpaint_pipeline.py \
  --model path/to/model \
  --prompt "A cat sitting on a wooden bench outside" \
  --modalities depth \
  --joint_input examples/inpaint_depth.png \
  --preset j2rgb \
  --denoising_strength 1.0 \
  --out_prefix examples/test_d2rgb
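The rgb2j and j2rgb presets amount to special mask configurations for channel-wise inpainting: one modality is fully known (an all-zeros mask) while the other is fully regenerated (an all-ones mask). A sketch of how such preset masks could be built (illustrative; the function and names are not from the repository):

```python
import numpy as np

def preset_masks(preset: str, h: int, w: int):
    # Returns (rgb_mask, joint_mask); 1 = region to generate, 0 = known.
    if preset == "rgb2j":        # image -> depth: keep RGB, generate depth
        return np.zeros((h, w)), np.ones((h, w))
    if preset == "j2rgb":        # depth -> image: keep depth, generate RGB
        return np.ones((h, w)), np.zeros((h, w))
    raise ValueError(f"unknown preset: {preset}")
```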

Tile-based Diffusion

jointnetpano_pipeline.py and jointnetpano_inpaint_pipeline.py provide tile-based diffusion pipelines for panoramas. The interface is similar to the non-tile-based ones.

jointnet_upsample_pipeline.py is a tile-based pipeline for general-purpose upsampling, with an interface similar to the inpainting pipeline. You can also upsample only the image or only the joint modality; in that case, upsampling is conditioned on the other modality at the target resolution. Select this with --preset both, --preset image, or --preset joint.
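Tile-based diffusion splits a large canvas (such as a panorama) into overlapping tiles, denoises each tile, and blends the overlaps. The tiling step itself can be sketched as follows (an illustrative helper, assuming the canvas is at least one tile long; not the repository's implementation):

```python
def tile_starts(length: int, tile: int, overlap: int):
    # Start offsets of overlapping tiles covering [0, length),
    # stepping by (tile - overlap) and clamping the last tile so it
    # ends exactly at `length`. Assumes length >= tile > overlap >= 0.
    stride = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts
```

For example, a length-10 axis with tile size 4 and overlap 2 yields starts [0, 2, 4, 6]; with length 9 the final tile is clamped, yielding [0, 2, 4, 5].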

License

This sample code is released under the terms in the LICENSE file.

Citation

@inproceedings{zhang2024jointnet,
  title={JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling},
  author={Zhang, Jingyang and Li, Shiwei and Lu, Yuanxun and Fang, Tian and McKinnon, David and Tsin, Yanghai and Quan, Long and Yao, Yao},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
