ComfyUI_ModelScopeT2V

Allows native use of ModelScope-based text-to-video models in ComfyUI.

Getting Started

Clone The Repository

cd /your/path/to/ComfyUI/custom_nodes
git clone https://github.com/ExponentialML/ComfyUI_ModelScopeT2V.git

Preparation

Create a folder named text2video inside your ComfyUI models folder.
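Here is a minimal shell sketch of this step, assuming a Unix-like shell. COMFYUI_DIR is a placeholder variable introduced for this example. Set it to your actual ComfyUI path, because the ./ComfyUI default is only an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder: set COMFYUI_DIR to your ComfyUI install before running.
# The ./ComfyUI default is an assumption, not a required location.
COMFYUI_DIR="${COMFYUI_DIR:-./ComfyUI}"

# Create models/text2video; -p makes this a no-op if it already exists.
mkdir -p "$COMFYUI_DIR/models/text2video"

echo "Created: $COMFYUI_DIR/models/text2video"
```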

Download Models

Models that have been converted to the A1111 (AUTOMATIC1111 WebUI) format will work.

ModelScope

https://huggingface.co/kabachuha/modelscope-damo-text2video-pruned-weights/tree/main

Zeroscope

https://huggingface.co/cerspense/zeroscope_v2_1111models

Instructions

Place the text2video_pytorch_model.pth model file in the text2video directory.

You must also download the accompanying open_clip_pytorch_model.bin and place it in the clip folder under your models directory.

The CLIP model is optional if you are not using the attention layers and are instead using something like AnimateDiff (see Usage).
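The small shell sketch below checks that the expected layout is in place. check_file is a hypothetical helper written for this example and is not part of the node pack. COMFYUI_DIR is again a placeholder for your install path. The script only reports what it finds. It does not download or move anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder: set COMFYUI_DIR to your ComfyUI install (./ComfyUI is an assumed default).
COMFYUI_DIR="${COMFYUI_DIR:-./ComfyUI}"

# Hypothetical helper: prints whether a regular file exists at the given path.
check_file() {
  if [ -f "$1" ]; then
    echo "found:   $1"
  else
    echo "missing: $1"
  fi
}

# ModelScope/Zeroscope weights go in models/text2video.
check_file "$COMFYUI_DIR/models/text2video/text2video_pytorch_model.pth"

# The OpenCLIP encoder goes in models/clip; it can be skipped when the
# attention layers are unused (e.g. when pairing with AnimateDiff).
check_file "$COMFYUI_DIR/models/clip/open_clip_pytorch_model.bin"
```

A "missing" line for the CLIP file is expected if you only plan to use the temporal convolutions with a 1.5 model.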

Usage

model_path: The path to your ModelScope model.
enable_attn: Enables the temporal attention of the ModelScope model. If this is disabled, you must connect a 1.5-based model. If this is enabled and you connect a 1.5-based model, it will be disabled automatically. This is because ModelScope uses the SD 2.0-based CLIP model rather than the 1.5 one.
enable_conv: Enables the temporal convolution modules of the ModelScope model. Enabling this with a 1.5-based model as input lets you use the temporal convolutions alongside other modules (such as AnimateDiff).
temporal_attn_strength: Controls the strength of the temporal attention. Lower values bring the result closer to the dataset input without temporal properties.
temporal_conv_strength: Controls the strength of the temporal convolution. Lower values bring the result closer to the model input without temporal properties.
sd_15_model: Optional. If left unconnected, pure ModelScope is used.
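The interaction between enable_attn and sd_15_model described above can be summarized as a small decision table. The shell function below is a hypothetical illustration of those rules as stated in this README. It is not the node's code, and the node's actual behaviour may differ in details.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical illustration of the enable_attn / sd_15_model rules above.
# Arguments: enable_attn (true|false), has_sd15_model (true|false)
resolve_attn() {
  local enable_attn="$1" has_sd15="$2"
  if [ "$has_sd15" = "true" ]; then
    # A 1.5 model uses a different CLIP than ModelScope's SD 2.0-based one,
    # so temporal attention is turned off even if enable_attn is true.
    echo "attn=off"
  elif [ "$enable_attn" = "true" ]; then
    # Pure ModelScope: temporal attention runs with the SD 2.0-based CLIP.
    echo "attn=on"
  else
    # enable_attn disabled without a 1.5 model is not a supported setup.
    echo "error: enable_attn=false requires an sd_15_model" >&2
    return 1
  fi
}

resolve_attn true false   # pure ModelScope
resolve_attn true true    # 1.5 model connected
```

enable_conv is left out of the table because, per the description above, the temporal convolutions can be used with or without a 1.5 model.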

Tips

Use the recently released ResAdapter LoRA for better quality at lower resolutions.
If you're using pure ModelScope, try a higher CFG (around 15) for better coherence. You can also try other rescale nodes.
When using pure ModelScope, ensure that you use a minimum of 24 frames.
If using with AnimateDiff, make sure to use 16 frames if you're not using context options.
You must use the same CLIP model as the 1.5 checkpoint if you have enable_attn disabled.

TODO

Unconditional guidance (CFG 1) is not yet implemented.
Explore ensembling 1.5 models with the 2.0 CLIP encoder to use all modules.

Attributions

The temporal code was adapted from https://github.com/kabachuha/sd-webui-text2video. Thanks @kabachuha!

Thanks to the ModelScope team for open-sourcing their work. Check out their other projects at https://github.com/modelscope.
