Attention Refocusing

[Website][Demo]

This is the official implementation of the paper "Grounded Text-to-Image Synthesis with Attention Refocusing".

(Demo video: intro_small.mp4)

Setup

conda create --name ldm_layout python==3.8.0
conda activate ldm_layout
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
pip install git+https://github.com/CompVis/taming-transformers.git
pip install git+https://github.com/openai/CLIP.git
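
Optionally, you can verify the environment with the short Python check below. This snippet is not part of the repository; it only confirms that PyTorch was installed with CUDA 11.7 support and can see a GPU:

# Optional sanity check (not part of the repo): confirm PyTorch and CUDA are set up.
import torch

print(torch.__version__)          # PyTorch build installed above
print(torch.version.cuda)         # expected to report 11.7 for the install command above
print(torch.cuda.is_available())  # True if a GPU is visible to PyTorch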

Inference

Teaser figure

Download the GLIGEN model checkpoint and put it in gligen_checkpoints.

Run with the prompts from the HRS or DrawBench benchmarks:

python guide_gligen.py --ckpt [model_checkpoint]  --file_save [save_path] \
                       --type [category] --box_pickle [saved_boxes] --use_gpt4

Where

  • --ckpt: Path to the GLIGEN checkpoint
  • --file_save: Path to save the generated images
  • --type: The category to test (options include counting, spatial, color, size)
  • --box_pickle: Path to the pickle file of layouts generated by GPT-4
  • --use_gpt4: Whether to use GPT-4 to generate the layout. If you're using GPT-4, set your GPT-4 API key as follows:
export OPENAI_API_KEY='your-api-key'

For instance, to generate images according to the layouts and prompts of the counting category:

python guide_gligen.py --ckpt gligen_checkpoints/diffusion_pytorch_model.bin --file_save counting_500 \
                       --type counting --box_pickle ../data_evaluate_LLM/gpt_generated_box/counting.p
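
To sweep all four categories in one go, a small driver script like the sketch below can be used. It is not part of the repository: the pickle filenames for categories other than counting are assumed to follow the same naming pattern as counting.p, and the output folder names are arbitrary.

# Hypothetical driver script: run guide_gligen.py for every HRS category.
# Assumes each category has a GPT-4 layout pickle named <category>.p; adjust paths as needed.
import subprocess

CKPT = "gligen_checkpoints/diffusion_pytorch_model.bin"

for category in ["counting", "spatial", "color", "size"]:
    subprocess.run(
        [
            "python", "guide_gligen.py",
            "--ckpt", CKPT,
            "--file_save", f"{category}_500",
            "--type", category,
            "--box_pickle", f"../data_evaluate_LLM/gpt_generated_box/{category}.p",
        ],
        check=True,  # stop if one category fails
    )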

To run with user input text prompts:

export OPENAI_API_KEY='your-api-key'
python inference.py --ckpt gligen_checkpoints/diffusion_pytorch_model.bin

We provide the layouts generated by GPT-4 for the HRS benchmark in HRS boxes and DrawBench boxes.
We also provide images generated by our method and by other baselines, including Stable Diffusion, Attend-and-Excite, MultiDiffusion, Layout-guidance, and GLIGEN, here.
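
If you want to inspect the provided layouts before running generation, they can be loaded with the standard pickle module. The exact structure of the stored objects is an assumption here (e.g., prompts paired with phrases and bounding boxes), so print a sample to see what a file actually contains:

# Inspect a provided GPT-4 layout pickle; the structure is an assumption, so print a sample.
import pickle

with open("../data_evaluate_LLM/gpt_generated_box/counting.p", "rb") as f:
    layouts = pickle.load(f)

print(type(layouts), len(layouts))  # container type and number of entries
sample = next(iter(layouts.items())) if isinstance(layouts, dict) else layouts[0]
print(sample)                       # one entry, e.g., a prompt with its phrases and boxes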

Evaluation

To set up the environment, download the detector models, and run the evaluation for each category, see the evaluation instructions.

Attention-refocusing with other baselines

ControlNet + attention-refocusing

Acknowledgments

This project is built on the following resources:

  • GLIGEN: Our code builds on the implementation provided by GLIGEN.

  • HRS: The evaluation component of our project is adapted from HRS.
