
Kandimate

Kandinsky-2-2 with temporal blocks for short GIF generation.

The approach is based on AnimateDiff, combined with Kandinsky-2 diffusion models and FILM interpolation models.
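
As in AnimateDiff, the core idea is to insert trainable temporal attention blocks between the (frozen) spatial layers of the Kandinsky-2-2 UNet, so the model learns motion across frames while reusing the pretrained image weights. A minimal sketch of such a block (dimensions and names here are illustrative, not the exact layout used in this repo):

import torch.nn as nn

class TemporalAttention(nn.Module):
    # AnimateDiff-style temporal block: self-attention runs along the
    # frame axis independently for every spatial position.
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, height * width, dim)
        b, f, s, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * s, f, d)  # fold space into batch
        n = self.norm(h)
        out, _ = self.attn(n, n, n)
        h = h + out                                     # residual connection
        return h.reshape(b, s, f, d).permute(0, 2, 1, 3)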

Some examples (more in the Gallery below):

Common Issues

WARNING! The current model version was trained for 1 epoch on ~3% of the WebVid dataset (~350k videos),
so it may be difficult to get a good result.

GPU memory requirements (at least an RTX 3090 or 4090; a quick check follows the list):

  • 512x512 generation ~ 20.5 GB
  • 768x768 generation ~ 23.5 GB
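
If you are not sure your card fits these numbers, you can check with PyTorch (the thresholds below simply restate the list above):

import torch

total_gb = torch.cuda.get_device_properties(0).total_memory / 2**30
print(f"GPU memory: {total_gb:.1f} GB")
if total_gb < 20.5:
    print("Probably not enough memory even for 512x512 generation.")
elif total_gb < 23.5:
    print("512x512 should fit; 768x768 probably will not.")
else:
    print("Both 512x512 and 768x768 should fit.")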

P.S.
The best results are obtained with 4 < guidance_scale < 8 and image_size = 768.

Setup for Inference

Prepare Environment

git clone https://github.com/TheDenk/Kandimate.git
cd Kandimate

Requirements with pip

pip install -r requirements.txt

Or with conda

conda env create -f environment.yaml
conda activate kandimate

Download Base Models And Motion Module Checkpoints

git lfs install

git clone https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder ./models/kandinsky-2-2-decoder

git clone https://huggingface.co/kandinsky-community/kandinsky-2-2-prior ./models/kandinsky-2-2-prior

bash download_bashscripts/download-motion-module.sh
bash download_bashscripts/download-interpolation-models.sh

You may also download the motion module and interpolation model checkpoints directly from Google Drive, then put them in the models/motion-modules/ and models/interpolation-models/ folders respectively.

The interpolation models can also be found here.
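
If you prefer not to use git lfs, the two base models can also be fetched from Python with huggingface_hub into the same folders (an equivalent alternative to the git clone commands above):

from huggingface_hub import snapshot_download

for name in ("kandinsky-2-2-decoder", "kandinsky-2-2-prior"):
    snapshot_download(
        repo_id=f"kandinsky-community/{name}",
        local_dir=f"./models/{name}",
    )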

Inference

Main inference

All generation parameters, such as prompt, negative_prompt, seed, etc., are stored in the config file configs/inference/inference.yaml.
After downloading all the models, run the following command to generate animations.

The results are automatically saved to the samples/ folder.

python -m scripts.animate --config ./configs/inference/inference.yaml

It is recommended to generate animations with 16 frames at 768 resolution; other resolutions and frame counts may affect quality to varying degrees.
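
If you prefer to tweak parameters from Python instead of editing the file by hand, the config can be loaded and rewritten with PyYAML before launching the script (the exact key names and nesting should be checked against the real inference.yaml; seed below is an assumption based on the parameter list above):

import yaml

with open("configs/inference/inference.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)        # inspect all generation parameters
cfg["seed"] = 42  # assumed top-level key; verify against the actual file

with open("configs/inference/inference.yaml", "w") as f:
    yaml.safe_dump(cfg, f)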

Interpolation (optional)

(Example gifs: original vs. interpolated.)

You can also apply interpolation between frames to make the gif smoother. Set the path to the gif and the interpolation parameters in ./configs/interpolate/interpolate.yaml.

python -m scripts.interpolate --config ./configs/interpolate/interpolate.yaml

P.S.
It is not recommended to use interpolation when generating nature videos.
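
For intuition: FILM-style interpolation recursively inserts a synthesized midpoint frame between every pair of neighbors, so each pass roughly doubles the frame count, and n passes turn F frames into (F - 1) * 2**n + 1. A toy sketch, where midpoint stands in for the actual FILM model call:

def interpolate(frames, midpoint, passes=1):
    # Each pass inserts midpoint(a, b) between every neighboring pair,
    # growing the sequence from F to (F - 1) * 2**passes + 1 frames.
    for _ in range(passes):
        out = []
        for a, b in zip(frames, frames[1:]):
            out += [a, midpoint(a, b)]
        out.append(frames[-1])
        frames = out
    return frames

# e.g. 16 frames with 2 passes -> 61 frames
print(len(interpolate(list(range(16)), lambda a, b: (a + b) / 2, passes=2)))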

Steps for Training

Dataset

Before training, download the video files and the .csv annotations of WebVid10M to your local machine. Note that the training script requires all the videos to be saved in a single folder. You may change this by modifying kandimate/data/dataset.py.
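
For reference, a minimal index over such a flat folder might look like the sketch below; the videoid and name column names are assumptions based on the WebVid .csv format, and the real logic lives in kandimate/data/dataset.py:

import csv
import os

class WebVidIndex:
    # Minimal sketch: map each annotation row to a video file in one flat folder.
    def __init__(self, csv_path, video_folder):
        with open(csv_path, newline="") as f:
            self.rows = list(csv.DictReader(f))
        self.video_folder = video_folder

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        # "videoid" names the file, "name" holds the caption (assumed columns).
        path = os.path.join(self.video_folder, f"{row['videoid']}.mp4")
        return path, row["name"]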

Configuration

After preparing the dataset, update the data paths below in the .yaml config files in the configs/training/ folder:

train_data:
  csv_path:     [Replace with .csv Annotation File Path]
  video_folder: [Replace with Video Folder Path]
  sample_size:  256

Other training parameters (lr, epochs, validation settings, etc.) are also included in the config files.
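
Before launching a long run, it can be worth sanity-checking the two paths from the snippet above (this assumes the train_data block sits at the top level of the yaml, as shown):

import os
import yaml

with open("configs/training/training.yaml") as f:
    cfg = yaml.safe_load(f)

train_data = cfg["train_data"]
assert os.path.isfile(train_data["csv_path"]), "csv_path is not a file"
assert os.path.isdir(train_data["video_folder"]), "video_folder is not a folder"
print("Dataset paths look good.")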

Training

To train the motion modules:

torchrun --nnodes=1 --nproc_per_node=1 train.py --config configs/training/training.yaml

Todo

  • Add train and inference scripts (py and jupyter).
  • Add interpolation inference scripts (py and jupyter).
  • Add Gradio Demo (probably).
  • Add controlnet (probably).

Gallery

Here are several of the best results.

Acknowledgements

Codebase: AnimateDiff and Tune-a-Video.
Diffusion models: Kandinsky-2.
Interpolation models: FILM.

Contacts

Issues should be raised directly in the repository. For professional support and recommendations, please contact welcomedenk@gmail.com.
