pix2pix-zero

[website]

This is the authors' reimplementation of "Zero-shot Image-to-Image Translation" using the diffusers library.
The results in the paper are based on the CompVis library, which will be released later.

[New!] Code for editing real and synthetic images released!

We propose pix2pix-zero, a diffusion-based image-to-image approach that allows users to specify the edit direction on the fly (e.g., cat to dog). Our method can directly use pre-trained Stable Diffusion to edit real and synthetic images while preserving the input image's structure. Our method is training-free and prompt-free: it requires neither manual text prompting for each input image nor costly fine-tuning for each task.

TL;DR: no finetuning required, no text input needed, input structure preserved.

Results

All our results are based on the stable-diffusion-v1-4 model. Please see the website for more results.

In each of the results below, the top row shows real image editing and the bottom row shows synthetic image editing.

Real Image Editing

Synthetic Image Editing

Method Details

Given an input image, we first generate text captions using BLIP and apply regularized DDIM inversion to obtain our inverted noise map. Then, we obtain reference cross-attention maps that correspond to the structure of the input image by denoising, guided by the CLIP embeddings of our generated text (c). Next, we denoise with the edited text embeddings while enforcing a loss that matches the current cross-attention maps to the reference cross-attention maps.
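Regularized DDIM inversion builds on the deterministic DDIM update run in reverse. The minimal NumPy sketch below is not the repository's implementation: it shows one step that maps a latent to the next, noisier timestep and one step back. It omits the paper's noise regularization and the UNet entirely, using a fixed random vector as a hypothetical stand-in for the predicted noise, and the `alpha_bar` values are arbitrary illustrative choices.

```python
import numpy as np

def ddim_step(x, eps, alpha_bar_from, alpha_bar_to):
    """Deterministic (eta = 0) DDIM update between two noise levels.

    With alpha_bar_to < alpha_bar_from this moves toward noise (inversion);
    with alpha_bar_to > alpha_bar_from it is an ordinary denoising step.
    """
    # Clean latent implied by the current noise estimate.
    x0_pred = (x - np.sqrt(1.0 - alpha_bar_from) * eps) / np.sqrt(alpha_bar_from)
    # Re-noise that clean estimate to the target noise level with the same eps.
    return np.sqrt(alpha_bar_to) * x0_pred + np.sqrt(1.0 - alpha_bar_to) * eps

rng = np.random.default_rng(0)
x_t = rng.standard_normal(16)   # stand-in for a Stable Diffusion latent
eps = rng.standard_normal(16)   # hypothetical stand-in for the UNet noise prediction

x_next = ddim_step(x_t, eps, alpha_bar_from=0.9, alpha_bar_to=0.8)    # inversion step
x_back = ddim_step(x_next, eps, alpha_bar_from=0.8, alpha_bar_to=0.9)  # denoising step
print(np.allclose(x_back, x_t))  # exact only because eps is held fixed here
```

In the real model the UNet re-predicts the noise at each latent, so inversion likely reconstructs the image only approximately, and smaller steps (more `--num_ddim_steps`) tend to reduce that error.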

Getting Started

Environment Setup

We provide a conda environment file that contains all the required dependencies:
```
conda env create -f environment.yml
```
Following this, you can activate the conda environment with the command below.
```
conda activate pix2pix-zero
```

Real Image Translation

First, run the inversion command below to obtain the input noise that reconstructs the image. The command below will save the inversion in the results folder as output/test_cat/inversion/cat_1.pt and the BLIP-generated prompt as output/test_cat/prompt/cat_1.txt.
```
python src/inversion.py  \
        --input_image "assets/test_images/cats/cat_1.png" \
        --results_folder "output/test_cat"
```
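The output paths in this README follow a visible pattern. The sketch below assumes that this naming generalizes to other inputs, i.e. `<results_folder>/inversion/<name>.pt`, `<results_folder>/prompt/<name>.txt`, and `<results_folder>/edit/<name>.png`, and uses it to build both the inversion and the real-image editing commands for a batch of images. The helper names are hypothetical; only the script names and flags come from this README.

```python
from pathlib import Path

def pix2pix_zero_paths(input_image, results_folder):
    """Hypothetical helper: predict where inversion.py / edit_real.py write outputs.

    Assumes the README's cat_1 example generalizes to any image name.
    """
    stem = Path(input_image).stem
    root = Path(results_folder)
    return {
        "inversion": root / "inversion" / f"{stem}.pt",
        "prompt": root / "prompt" / f"{stem}.txt",
        "edit": root / "edit" / f"{stem}.png",
    }

def build_commands(input_image, results_folder, task_name):
    """Return the inversion and editing invocations for one real image as argv lists."""
    paths = pix2pix_zero_paths(input_image, results_folder)
    invert = ["python", "src/inversion.py",
              "--input_image", str(input_image),
              "--results_folder", str(results_folder)]
    edit = ["python", "src/edit_real.py",
            "--inversion", str(paths["inversion"]),
            "--prompt", str(paths["prompt"]),
            "--task_name", task_name,
            "--results_folder", str(results_folder)]
    return invert, edit

if __name__ == "__main__":
    for image in ["assets/test_images/cats/cat_1.png"]:
        for cmd in build_commands(image, "output/test_cat", "cat2dog"):
            print(" ".join(cmd))  # or run it with subprocess.run(cmd, check=True)
```

Passing argv lists rather than a single shell string avoids quoting issues when image paths contain spaces.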

Next, we can perform image editing with the editing direction as shown below. The command below will save the edited image as output/test_cat/edit/cat_1.png.
```
python src/edit_real.py \
    --inversion "output/test_cat/inversion/cat_1.pt" \
    --prompt "output/test_cat/prompt/cat_1.txt" \
    --task_name "cat2dog" \
    --results_folder "output/test_cat/"
```

Editing Synthetic Images

Similarly, we can edit the synthetic images generated by Stable Diffusion with the following command.
```
python src/edit_synthetic.py \
    --results_folder "output/synth_editing" \
    --prompt_str "a high resolution painting of a cat in the style of van gogh" \
    --task "cat2dog"
```

Tips and Debugging

Controlling the Image Structure:
The --xa_guidance flag controls the amount of cross-attention guidance to be applied when performing the edit. If the output edited image does not retain the structure from the input, increasing the value will typically address the issue. We recommend changing the value in increments of 0.05.
Improving Image Quality:
If the output image quality is low or has artifacts, using more steps for both the inversion and the editing should help. This can be controlled with the --num_ddim_steps flag.
Reducing the VRAM Requirements:
We can reduce the VRAM requirements by using lower precision, enabled with the --use_float_16 flag.
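To build intuition for --xa_guidance, here is a toy NumPy sketch of cross-attention guidance: a gradient step on the latent that pulls the current attention map toward the reference map. It assumes the flag acts like the step size of this update, and it replaces the UNet's attention maps with a hypothetical linear map `A`, so it only illustrates the trend, not the real model.

```python
import numpy as np

def xa_loss_and_grad(x, A, m_ref):
    """Toy cross-attention loss ||A @ x - m_ref||^2 and its gradient w.r.t. x.

    A is a hypothetical linear stand-in for the UNet's latent -> attention-map path.
    """
    residual = A @ x - m_ref
    return float(residual @ residual), 2.0 * A.T @ residual

def guided_step(x, A, m_ref, xa_guidance):
    """One gradient step that pulls the current attention map toward m_ref."""
    _, grad = xa_loss_and_grad(x, A, m_ref)
    return x - xa_guidance * grad

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))
A /= np.linalg.norm(A, 2)          # unit spectral norm keeps these step sizes stable
x_edit = rng.standard_normal(16)   # stand-in latent while denoising with edited text
m_ref = rng.standard_normal(8)     # stand-in reference map from the source caption

initial_loss, _ = xa_loss_and_grad(x_edit, A, m_ref)
final_loss = {}
for xa_guidance in (0.05, 0.10, 0.15):   # increments of 0.05, as suggested above
    x = x_edit.copy()
    for _ in range(10):                  # repeated steps on a fixed toy objective
        x = guided_step(x, A, m_ref, xa_guidance)
    final_loss[xa_guidance] = xa_loss_and_grad(x, A, m_ref)[0]
    print(f"xa_guidance={xa_guidance:.2f}  loss {initial_loss:.3f} -> {final_loss[xa_guidance]:.3f}")
```

In this toy, larger values bring the attention map closer to the reference, which mirrors the advice to raise --xa_guidance when the structure is lost; in the real model, too much guidance may also work against the edit itself.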

Finding Custom Edit Directions

We provide some pre-computed directions in the assets folder. To generate new edit directions, users can first create two files, each containing a large number of sentences (~1000), one describing the source concept and one describing the target concept, and then run the command as shown below.
```
python src/make_edit_direction.py \
    --file_source_sentences sentences/apple.txt \
    --file_target_sentences sentences/orange.txt \
    --output_folder assets/embeddings_sd_1.4
```
After running the above command, you can set the flag --task apple2orange for the new edit.
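Conceptually, an edit direction is the difference between the average text embeddings of the two sentence sets, which is then added to the embedding of the input caption. The sketch below shows only that arithmetic, with random stand-in embeddings; it does not call a text encoder, and it makes no assumption about the file format that src/make_edit_direction.py actually writes.

```python
import numpy as np

def edit_direction(source_emb, target_emb):
    """Mean difference of target and source sentence embeddings.

    Both arrays have shape (num_sentences, ...); the result drops the first axis.
    """
    return target_emb.mean(axis=0) - source_emb.mean(axis=0)

rng = np.random.default_rng(0)
# Hypothetical stand-ins for text-encoder outputs of sentences/apple.txt and
# sentences/orange.txt (real SD v1.4 CLIP outputs are 77 x 768 per sentence).
apple_emb = rng.standard_normal((1000, 8)) + 1.0
orange_emb = rng.standard_normal((1000, 8)) - 1.0

delta = edit_direction(apple_emb, orange_emb)  # roughly -2 in every dimension
caption_emb = rng.standard_normal(8)           # stand-in for the input caption embedding c
edited_emb = caption_emb + delta               # edited embedding used during guided denoising
```

Averaging over many sentences is what makes the direction robust: sentence-specific content tends to cancel out, leaving mostly the apple-to-orange difference.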

Comparison

Comparisons with different baselines, including SDEdit + word swap, DDIM + word swap, and prompt-to-prompt. Our method successfully applies the edit while preserving the structure of the input image.
