Introducing the Stable Video Diffusion Temporal ControlNet! This tool uses a ControlNet-style encoder with the SVD base model. It's designed to enhance your video diffusion projects by providing precise temporal control.
- ControlNet Model: Running the inference script automatically downloads the depth model to the cache. The model files can also be found here: temporal-controlnet-depth-svd-v1
- Installation: Run
pip install -r requirements.txt
- Execution: Run "run_inference.py".
- Focus on Central Object: The system tends to extract motion features primarily from a central object and, occasionally, from the background. It's best to avoid overly complex motion or obscure objects.
- Simplicity in Motion: Stick to motions that SVD can already handle well without the ControlNet. This makes it much more likely that the model can apply the motion.
My example training config looks like this:
accelerate launch train_svd.py \
--pretrained_model_name_or_path="stabilityai/stable-video-diffusion-img2vid" \
--output_dir="model_out" \
--csv_path="path-to-your-csv" \
--video_folder="path-to-your-videos" \
--depth_folder="path-to-your-depth" \
--motion_folder="path-to-your-motion" \
--validation_image_folder="./validation_demo/rgb" \
--validation_control_folder="./validation_demo/depth" \
--width=512 \
--height=512 \
--learning_rate=2e-5 \
--per_gpu_batch_size=8 \
--num_train_epochs=5 \
--mixed_precision="fp16" \
--gradient_accumulation_steps=2 \
--checkpointing_steps=2000 \
--validation_steps=400 \
--gradient_checkpointing
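The batch-related flags in this config interact, so here is a rough sketch of the arithmetic. It assumes that `--per_gpu_batch_size` applies to each GPU process and that gradient accumulation behaves in the usual way; the `effective_batch_size` helper and the `num_gpus` values are hypothetical and not part of the training script.

```python
def effective_batch_size(per_gpu_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Number of samples that contribute to one optimizer update."""
    return per_gpu_batch_size * gradient_accumulation_steps * num_gpus


# Values taken from the example config; num_gpus is an assumption.
single_gpu = effective_batch_size(8, 2, num_gpus=1)
four_gpus = effective_batch_size(8, 2, num_gpus=4)
print(single_gpu)  # 16
print(four_gpus)   # 64

# Assuming --checkpointing_steps and --validation_steps count the same
# kind of step, validation runs this many times per checkpoint interval.
validations_per_checkpoint = 2000 // 400
print(validations_per_checkpoint)  # 5
```

If you train on a different number of GPUs, the effective batch size changes with it, so the learning rate from the example may need adjusting.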
- lllyasviel: For the original ControlNet implementation.
- Stability: For Stable Video Diffusion.
- Diffusers Team: For the SVD implementation.
- Pixeli99: For providing a practical SVD training script: SVD_Xtend