StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

🔥🔥🔥 StyleCrafter on SDXL for stylized image generation is available! It enables higher resolution (1024×1024) and more visually pleasing results!

Gongye Liu, Menghan Xia*, Yong Zhang, Haoxin Chen, Jinbo Xing,
Xintao Wang, Yujiu Yang*, Ying Shan

(* corresponding authors)

From Tsinghua University and Tencent AI Lab.

🔆 Introduction

TL;DR: We propose StyleCrafter, a generic method that enhances pre-trained T2V models with style control, supporting Style-Guided Text-to-Image Generation and Style-Guided Text-to-Video Generation.

1. ⭐⭐ Style-Guided Text-to-Video Generation.

Style-guided text-to-video results. Resolution: 320 x 512; Frames: 16. (Compressed)

2. Style-Guided Text-to-Image Generation.

Style-guided text-to-image results. Resolution: 512 x 512. (Compressed)

📝 Changelog

[2024.06.25]: 🔥🔥 Support StyleCrafter on SDXL!
[2023.12.08]: 🔥🔥 Release the Hugging Face online demo.
[2023.12.05]: 🔥🔥 Release the code and checkpoint.
[2023.11.30]: 🔥🔥 Release the project page.

🧰 Models

Base Model	Gen Type	Resolution	Checkpoint	How to run
VideoCrafter	Image/Video	320x512	Hugging Face	StyleCrafter on VideoCrafter
SDXL	Image	1024x1024	Hugging Face	StyleCrafter on SDXL

It takes approximately 5 seconds to generate a 512×512 image and 85 seconds to generate a 320×512 video with 16 frames on a single NVIDIA A100 (40 GB) GPU. Inference requires a GPU with at least 16 GB of memory.
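
As a rough planning aid, the timings above can be turned into a back-of-the-envelope estimate for a batch job. The sketch below assumes that generation time scales linearly with the number of samples and that model loading time is negligible; neither assumption is confirmed by the authors. The helper name `estimate_seconds` is hypothetical and not part of this repository, and no timing is given here for 1024×1024 SDXL images, so that case is not covered.

```python
# Rough batch-time estimate from the single-A100 timings quoted above.
# Assumption (not from the authors): cost scales linearly with sample count
# and model loading time is ignored.
SECONDS_PER_IMAGE = 5    # one 512x512 image
SECONDS_PER_VIDEO = 85   # one 320x512 video with 16 frames


def estimate_seconds(num_images: int, num_videos: int) -> int:
    """Return an approximate total generation time in seconds (hypothetical helper)."""
    if num_images < 0 or num_videos < 0:
        raise ValueError("counts must be non-negative")
    return num_images * SECONDS_PER_IMAGE + num_videos * SECONDS_PER_VIDEO


# Example: 100 images and 20 videos.
total = estimate_seconds(num_images=100, num_videos=20)
print(f"~{total} s (~{total / 60:.1f} min)")  # ~2200 s (~36.7 min)
```

Actual times will vary with the GPU, sampler settings, and resolution, so treat the result as an order-of-magnitude figure.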

⚙️ Setup

conda create -n stylecrafter python=3.8.5
conda activate stylecrafter
pip install -r requirements.txt

💫 Inference

Download all checkpoints according to the instructions.
Run the commands in a terminal.

# style-guided text-to-image generation
sh scripts/run_infer_image.sh

# style-guided text-to-video generation
sh scripts/run_infer_video.sh

(Optional) Run inference on your own data according to the instructions.
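
For scripted use, the two inference commands above can be wrapped in a small Python launcher. This is only a sketch under stated assumptions: it must be run from the repository root with the checkpoints already downloaded, and the names `SCRIPTS`, `build_command`, and `run` are hypothetical helpers that do not exist in this repository. Only the two script paths are taken from the commands above.

```python
import subprocess
from typing import List

# Script paths come from the commands above; the "image"/"video" keys are
# illustrative labels, not options defined by the repository.
SCRIPTS = {
    "image": "scripts/run_infer_image.sh",
    "video": "scripts/run_infer_video.sh",
}


def build_command(mode: str) -> List[str]:
    """Return the shell command for the given mode (hypothetical helper)."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return ["sh", SCRIPTS[mode]]


def run(mode: str) -> int:
    """Run the inference script from the current directory and return its exit code."""
    return subprocess.run(build_command(mode), check=False).returncode
```

Calling `run("video")` from the repository root is then equivalent to `sh scripts/run_infer_video.sh`; a non-zero return value indicates the script failed.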

👨‍👩‍👧‍👦 Crafter Family

VideoCrafter1: Framework for high-quality text-to-video generation.

ScaleCrafter: Tuning-free method for high-resolution image/video generation.

TaleCrafter: An interactive story visualization tool that supports multiple characters.

LongerCrafter: Tuning-free method for longer high-quality video generation.

DynamiCrafter: Animates open-domain still images into high-quality videos.

📢 Disclaimer

We develop this repository for RESEARCH purposes, so it can only be used for personal/research/non-commercial purposes.

🙏 Acknowledgements

We would like to thank AK (@_akhaliq) for help setting up the online demo.

📭 Contact

If you have any comments or questions, feel free to contact lgy22@mails.tsinghua.edu.cn.

BibTex

@article{liu2023stylecrafter,
  title={StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter},
  author={Liu, Gongye and Xia, Menghan and Zhang, Yong and Chen, Haoxin and Xing, Jinbo and Wang, Xintao and Yang, Yujiu and Shan, Ying},
  journal={arXiv preprint arXiv:2312.00330},
  year={2023}
}

