Freehand-Genshin-Diffusion

Transferring Genshin PVs into a freehand style with a diffusion model.

Plans

  • Inference code for the image-model
  • Pretrained weights for 480x320 resolution
  • Inference code for the video-model incorporating the temporal module
  • Training scripts

Examples

  • Here are some results generated with the pretrained image-model at a resolution of 480x320.

  • Here are the results generated by our pretrained video-model.

  • The model also generalizes to real-world videos:

Limitations

We observe the following shortcomings in the current version:

  1. The primary issue is the temporal inconsistency in the generated frames, which causes flickering and jittering in the video.
  2. Training and inference for this model are inefficient, requiring substantial computational resources.

Installation

Build Environment

We recommend Python >= 3.10 and CUDA 11.7. Build the environment as follows:

conda create -n Genshin python=3.10
conda activate Genshin
# Install requirements with pip:
pip install -r requirements.txt
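
After installation, a quick sanity check can confirm the CUDA build and GPU visibility. This is a minimal sketch, assuming PyTorch is pulled in by requirements.txt:

# sanity_check.py -- verify the PyTorch/CUDA setup
import torch
print(torch.__version__, torch.version.cuda)  # expect a CUDA 11.7 (cu117) build
print(torch.cuda.is_available())              # should print True on a GPU machine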

Download weights

You can download the weights manually in the following steps:

  1. Download our trained weights from BaiduDisk, which include two parts: denoising_unet.pth and reference_unet.pth.

  2. (Optional) Download our newly trained weights from BaiduDisk, which include three parts: denoising_unet-54400.pth, reference_unet-54400.pth, and motion_module-146.pth.

  3. Download the pretrained weights of the base models and other components (see the download sketch after the weight layout below):

  4. Download the pretrained motion module weights of AnimateDiff: mm_sd_v15_v2.

Finally, these weights should be organized as follows:

./pretrained_weights/
|-- denoising_unet.pth
|-- reference_unet.pth
|-- denoising_unet-54400.pth
|-- reference_unet-54400.pth
|-- motion_module-146.pth
|-- mm_sd_v15_v2.ckpt
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
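
The base components in the layout above are standard public releases. Below is a minimal sketch of fetching them with huggingface_hub; the repo IDs are assumptions inferred from the directory names (refer to step 3 for the authoritative sources), and the stable-diffusion-v1-5 repository may require a mirror if the original is unavailable.

# fetch_base_weights.py -- minimal sketch, repo IDs are assumptions
from huggingface_hub import snapshot_download

# VAE (sd-vae-ft-mse)
snapshot_download(repo_id="stabilityai/sd-vae-ft-mse",
                  local_dir="./pretrained_weights/sd-vae-ft-mse")

# Stable Diffusion v1.5 (only the parts listed in the layout above)
snapshot_download(repo_id="runwayml/stable-diffusion-v1-5",
                  local_dir="./pretrained_weights/stable-diffusion-v1-5",
                  allow_patterns=["feature_extractor/*", "unet/*",
                                  "model_index.json", "v1-inference.yaml"])

# CLIP image encoder -- commonly taken from sd-image-variations-diffusers
snapshot_download(repo_id="lambdalabs/sd-image-variations-diffusers",
                  local_dir="./pretrained_weights",
                  allow_patterns=["image_encoder/*"])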

Inference

Here are the CLI commands for running the inference scripts:

  • image-model inference:
python -m scripts.genshin_paint_image --config ./configs/prompts/genshin_paint_image.yaml -W 480 -H 320
  • video-model inference:
python -m scripts.genshin_paint_video --config ./configs/prompts/genshin_paint_video.yaml -W 480 -H 320
  • You can refer to the format of genshin_paint_image(video).yaml and modify input_video_path to transfer other Genshin PVs in MP4 format; see the sketch below this list.
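
A minimal sketch of retargeting the video config to another PV, assuming the config is plain YAML and input_video_path is a top-level key (check the shipped genshin_paint_video.yaml for the exact structure); the input path shown is hypothetical:

# retarget_config.py -- point the prompt config at a different Genshin PV (MP4)
import yaml

cfg_path = "./configs/prompts/genshin_paint_video.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["input_video_path"] = "./inputs/my_pv.mp4"  # hypothetical input path
with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)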

Training

The training process involves two steps:

  • Step 1, train the image-model:
accelerate launch genshin_train_stage_1.py --config ./configs/train/genshin_stage1.yaml
  • Step 2, train the temporal module of the video-model:
accelerate launch genshin_train_stage_2.py --config ./configs/train/genshin_stage2.yaml

I am sorry that I can't open source the training data.

Disclaimer

This project is intended for academic research, and we explicitly disclaim any responsibility for user-generated content. Users are solely liable for their actions while using the generative model. The project contributors have no legal affiliation with, nor accountability for, users' behaviors. It is imperative to use the generative model responsibly, adhering to both ethical and legal standards.

Acknowledgements

This repository is built on Moore-AnimateAnyone. We thank them for their excellent work and for releasing high-quality code.
