AAAI 2024
@inproceedings{hulfm,
title = {Latent Space Editing in Transformer-based Flow Matching},
author = {Hu, Tao and Zhang, David W and Mettes, Pascal and Tang, Meng and Zhao, Deli and Snoek, Cees G.M.},
year = {2024},
booktitle = {AAAI},
}
Editing operations: Adding (sequentially), Replacing, and Rescaling.
python dissect_lfm_t2i.py
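These operations act on the latent code through a learned semantic direction; the sketch below shows the vector arithmetic they roughly correspond to. The names `z`, `d` and the scale values are illustrative, not the script's actual API.

```python
import torch

# Illustrative sketch of the three editing operations on a latent code z
# and a unit-norm semantic direction d; names and scales are hypothetical.

def add_direction(z: torch.Tensor, d: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Adding: shift the latent along the semantic direction."""
    return z + scale * d

def replace_component(z: torch.Tensor, d: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Replacing: strip the latent's component along d and set it to a fixed value."""
    proj = (z * d).sum() * d  # projection of z onto d
    return z - proj + scale * d

def rescale_component(z: torch.Tensor, d: torch.Tensor, factor: float = 2.0) -> torch.Tensor:
    """Rescaling: amplify or attenuate the latent's existing component along d."""
    proj = (z * d).sum() * d
    return z + (factor - 1.0) * proj

# Sequential adding: apply several directions one after another, e.g.
# z_edit = add_direction(add_direction(z, d_smile), d_age, scale=0.5)
```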
Step 1: semantic direction collection
python dissect_lfm.py
Step 2: generate the semantic direction (see the sketch after Step 3)
python tools/utils_attr.py
Step 3: semantic steering
python dissect_lfm.py
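Conceptually, Step 1 collects latent codes for samples with and without a target attribute, Step 2 reduces them to a single direction, and Step 3 steers sampling with that direction. Below is a minimal sketch of Step 2 as a mean-difference direction, assuming the Step 1 codes are stored as (N, D) tensors; the real logic lives in tools/utils_attr.py and may differ.

```python
import torch

# Hypothetical sketch: derive a semantic direction from latents collected in Step 1.
# pos_codes / neg_codes: (N, D) tensors of codes with / without the target attribute.

def semantic_direction(pos_codes: torch.Tensor, neg_codes: torch.Tensor) -> torch.Tensor:
    """Mean-difference direction, normalized to unit length."""
    direction = pos_codes.mean(dim=0) - neg_codes.mean(dim=0)
    return direction / direction.norm()

# Step 3 then steers generation by injecting the (scaled) direction into the
# latent during sampling, e.g. z_edit = z + 1.5 * direction.
```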
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train_lfm.py --config=configs/lfm_cm256_uvit_large.py --config.train.batch_size=512
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch --multi_gpu --num_processes 4 --main_process_port 8839 --mixed_precision fp16 train_lfm_t2i.py --config=configs/lfm_mmcelebahq256_uvit_large.py --config.train.batch_size=512
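The configs follow U-ViT's ml_collections convention, which is why fields such as train.batch_size can be overridden directly on the command line as above. A minimal sketch of such a config file follows; every field and value other than train.batch_size is an illustrative placeholder, not the repository's real defaults.

```python
import ml_collections

# Minimal sketch in the style of configs/lfm_cm256_uvit_large.py; values other
# than train.batch_size are placeholders.

def get_config():
    config = ml_collections.ConfigDict()

    config.train = ml_collections.ConfigDict()
    config.train.batch_size = 512   # overridable via --config.train.batch_size=...
    config.train.n_steps = 500000   # placeholder

    config.optimizer = ml_collections.ConfigDict()
    config.optimizer.lr = 2e-4      # placeholder

    return config
```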
python scripts/extract_mscoco_feature.py  # train
python scripts/extract_mscoco_feature.py  # val
python scripts/extract_empty_feature.py
python scripts/extract_test_prompt_feature.py
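These scripts cache image latents and caption embeddings to disk so training never has to run the encoders online. A rough sketch of what one extraction step amounts to is shown below, assuming the Stable Diffusion autoencoder from diffusers and a CLIP text encoder from transformers; the model ids, the 0.18215 scaling and the save layout are assumptions, not the exact behaviour of scripts/extract_*.py.

```python
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTextModel, CLIPTokenizer

# Rough sketch of feature extraction; model ids and layout are assumptions.
device = "cuda" if torch.cuda.is_available() else "cpu"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").to(device).eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()

@torch.no_grad()
def extract(image: torch.Tensor, caption: str):
    """image: (1, 3, 256, 256) tensor in [-1, 1]; returns the cached features."""
    latent = vae.encode(image.to(device)).latent_dist.sample() * 0.18215
    tokens = tokenizer(caption, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt").input_ids.to(device)
    text_feat = text_encoder(tokens).last_hidden_state
    return latent.cpu().numpy(), text_feat.cpu().numpy()
```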
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train_lfm_t2i.py --config=configs/lfm_mscoco_uvit_from_in256.py --config.train.batch_size=256
Following U-ViT, prepare the following assets:
- fid_stats: a dummy file
- pretrained_weights: for initialization and fine-tuning
- stable-diffusion: the Stable Diffusion encoder-decoder (autoencoder) weights
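Following U-ViT's convention, these typically sit under an assets/ directory; the quick check below assumes that layout (adjust the paths if the configs point elsewhere).

```python
from pathlib import Path

# Assumed layout, following U-ViT; adjust if your configs use different paths.
for d in ["assets/fid_stats", "assets/pretrained_weights", "assets/stable-diffusion"]:
    print(d, "found" if Path(d).exists() else "MISSING")
```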
conda create -n uspace python=3.10
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install pytorch-lightning torchdiffeq matplotlib h5py timm diffusers accelerate loguru blobfile ml_collections
pip install hydra-core wandb einops scikit-learn --upgrade
pip install transformers==4.23.1 pycocotools # for text-to-image task
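A quick sanity check that the environment is usable after installation (these packages are the ones listed above):

```python
import torch, accelerate, diffusers, transformers

# Verify the installation sees the GPU and report the library versions.
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("accelerate", accelerate.__version__, "| diffusers", diffusers.__version__,
      "| transformers", transformers.__version__)
```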
This codebase is developed based on U-ViT; if you find this repo useful, please consider citing the following paper:
@inproceedings{bao2022all,
title = {All are Worth Words: A ViT Backbone for Diffusion Models},
author = {Bao, Fan and Nie, Shen and Xue, Kaiwen and Cao, Yue and Li, Chongxuan and Su, Hang and Zhu, Jun},
booktitle = {CVPR},
year = {2023}
}