# **VideoCrafter: A Toolkit for Text-to-Video Generation and Editing**


VideoCrafter is an open-source video generation and editing toolbox for crafting video content.

More details can be found on GitHub: [![GitHub](https://img.shields.io/github/stars/VideoCrafter/VideoCrafter?style=social)](https://github.com/VideoCrafter/VideoCrafter)

In [None]:
### Make sure a CUDA GPU is enabled under Edit -> Notebook settings -> GPU
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader

Tesla T4, 15360 MiB, 15101 MiB
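
The same check can be done from Python, so that later cells can fail early when no GPU is attached. The following is a minimal sketch, assuming `nvidia-smi` prints one `name, total, free` line per GPU as in the output above; the helper names `parse_gpu_line` and `query_gpus` are hypothetical and not part of VideoCrafter.

```python
import shutil
import subprocess

def parse_gpu_line(line):
    """Parse one 'name, total MiB, free MiB' CSV line printed by nvidia-smi."""
    # Split from the right so a comma inside the GPU name would not break parsing.
    name, total, free = [field.strip() for field in line.rsplit(",", 2)]
    # Memory fields look like '15360 MiB'; keep only the integer part.
    return {
        "name": name,
        "total_mib": int(total.split()[0]),
        "free_mib": int(free.split()[0]),
    }

def query_gpus():
    """Return a list of GPU dicts, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.free",
         "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            # Skip lines that do not match the expected three-field format.
            continue
    return gpus

gpus = query_gpus()
if not gpus:
    print("No GPU detected; enable one under Edit -> Notebook settings.")
for gpu in gpus:
    print(gpu)
```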


## Installation

In [None]:
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.8 2  
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.9 1  
!python --version  
!apt-get update
!apt install software-properties-common
!sudo dpkg --remove --force-remove-reinstreq python3-pip python3-setuptools python3-wheel
!apt-get install python3-pip

print('Git clone project and install requirements...')
!git clone https://github.com/VideoCrafter/VideoCrafter &> /dev/null
%cd VideoCrafter 
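# `!export` only changes a throwaway subshell, so set the variable on the kernel process instead
import os
os.environ['PYTHONPATH'] = '/content/VideoCrafter:' + os.environ.get('PYTHONPATH', '')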

!python3.8 -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
!apt update
!apt install ffmpeg &> /dev/null  
!python3.8 -m pip install pytorch-lightning==1.8.3 omegaconf==2.1.1 einops==0.3.0 transformers==4.25.1
!python3.8 -m pip install opencv-python==4.1.2.30 imageio==2.9.0 imageio-ffmpeg==0.4.2
!python3.8 -m pip install av moviepy
!python3.8 -m pip install -e .

update-alternatives: renaming python3 link from /usr/bin/python3 to /usr/local/bin/python3
update-alternatives: using /usr/bin/python3.8 to provide /usr/local/bin/python3 (python3) in auto mode
Python 3.8.10
Get:1 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease [3,622 B]
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease
Get:3 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Hit:4 http://archive.ubuntu.com/ubuntu focal InRelease
Get:5 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Hit:6 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease
Get:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Hit:8 http://ppa.launchpad.net/cran/libgit2/ubuntu focal InRelease
Get:9 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2,590 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1,324 kB]
Get:11 http://ppa.launc

In [None]:
### Download all models from Hugging Face
! rm -rf models/
! git lfs install
! git clone https://huggingface.co/VideoCrafter/t2v-version-1-1/
! mv t2v-version-1-1/models .

Updated git hooks.
Git LFS initialized.
Cloning into 't2v-version-1-1'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 18 (delta 2), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (18/18), 3.15 KiB | 1.05 MiB/s, done.
Filtering content: 100% (5/5), 374.82 MiB | 2.91 MiB/s, done.
Encountered 1 file(s) that may not have been copied correctly on Windows:
	models/base_t2v/model.ckpt

See: `git lfs help smudge` for more details.
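
If Git LFS fails to fetch a file, the clone leaves a small text pointer in place of the real checkpoint, and the failure only surfaces later when the model is loaded. The sketch below checks for that case by looking for the standard LFS pointer header at the start of each file. The helper names are hypothetical, and the two paths are the ones used in the cells of this notebook.

```python
import os

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if the file looks like an unresolved Git LFS pointer."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX

def check_checkpoint(path):
    """Classify a checkpoint file as 'missing', 'lfs-pointer', or 'ok'."""
    if not os.path.isfile(path):
        return "missing"
    if is_lfs_pointer(path):
        return "lfs-pointer"
    return "ok"

for ckpt in [
    "models/base_t2v/model.ckpt",
    "models/videolora/lora_001_Loving_Vincent_style.ckpt",
]:
    print(ckpt, "->", check_checkpoint(ckpt))
```

A result of `lfs-pointer` suggests re-running `git lfs pull` inside the cloned repository before moving the `models` directory.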


## Base T2V: Generic Text-to-Video Generation

In [None]:
PROMPT="astronaut riding a horse outer space" 
OUTDIR="results/"

BASE_PATH="models/base_t2v/model.ckpt"
CONFIG_PATH="models/base_t2v/model_config.yaml"

! python scripts/sample_text2video.py \
    --ckpt_path $BASE_PATH \
    --config_path $CONFIG_PATH \
    --prompt "$PROMPT" \
    --save_dir $OUTDIR \
    --n_samples 1 \
    --batch_size 1 \
    --seed 1000 \
    --show_denoising_progress

Global seed set to 1000
config: 
 {'model': {'load_from_pretrained_img_model': True, 'ckpt_path': '/apdcephfs_cq2/share_1290939/yingqinghe/dependencies/stable_diffusion/compvis-sd-v1-4-original/sd-v1-4-full-ema.ckpt', 'config_path': 'configs/latent-diffusion/txt2img-1p4B-eval-Clipembedder.yaml', 'load_from_checkpoint': '/apdcephfs/share_1290939/yingqinghe/results/latent_diffusion/text2video/tv_054_NoFPSEmbd_NoMotionAdapter_FS32_basedon050_2_8nodes_e0_V/checkpoints/trainstep_checkpoints/epoch=000003-step=000020000.ckpt', 'base_learning_rate': 5e-07, 'scale_lr': False, 'target': 'lvdm.models.ddpm3d.LatentDiffusion', 'params': {'linear_start': 0.00085, 'linear_end': 0.012, 'num_timesteps_cond': 1, 'log_every_t': 200, 'timesteps': 1000, 'first_stage_key': 'video', 'cond_stage_key': 'caption', 'image_size': [32, 32], 'video_length': 16, 'channels': 4, 'cond_stage_trainable': False, 'conditioning_key': 'crossattn', 'monitor': 'train/loss_simple_step', 'scale_by_std': False, 'scale_factor': 0

In [None]:
# visualize
from IPython.display import HTML, display
from base64 import b64encode
import os, sys

# pick the first video (in sorted filename order) from the results directory
video_dir = OUTDIR + '/videos'
mp4_name = os.path.join(video_dir, sorted(os.listdir(video_dir))[0])

# read the file and embed it in the page as a base64 data URL
with open(mp4_name, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

print('Display animation: {}'.format(mp4_name), file=sys.stderr)
display(HTML("""
  <video width=256 controls>
        <source src="%s" type="video/mp4">
  </video>
  """ % data_url))

Display animation: results//videos/astronaut_riding_a_horse_outer_space_seed01000_000.mp4
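
The visualization cell picks a file by sorted filename, which shows the same video again once several prompts have been sampled into the same directory. One alternative is to pick the most recently written file instead. This is a minimal sketch; `latest_video` is a hypothetical helper, and it assumes the sampling script writes `.mp4` files into a `videos` subdirectory, as the output above shows.

```python
import glob
import os

def latest_video(video_dir):
    """Return the most recently modified .mp4 in video_dir, or None if there is none."""
    candidates = glob.glob(os.path.join(video_dir, "*.mp4"))
    if not candidates:
        return None
    # Modification time approximates "the video produced by the last run".
    return max(candidates, key=os.path.getmtime)

print(latest_video(os.path.join("results", "videos")))
```

The returned path can be passed to the same `open(...)` and `b64encode(...)` steps used in the visualization cell.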


## VideoLoRA: Personalized Text-to-Video Generation with LoRA

In [None]:
PROMPT="astronaut riding a horse"
OUTDIR="results/videolora"

BASE_PATH="models/base_t2v/model.ckpt"
CONFIG_PATH="models/base_t2v/model_config.yaml"

LORA_PATH="models/videolora/lora_001_Loving_Vincent_style.ckpt"
TAG=", Loving Vincent style"

! python scripts/sample_text2video.py \
    --ckpt_path $BASE_PATH \
    --config_path $CONFIG_PATH \
    --prompt "$PROMPT" \
    --save_dir $OUTDIR \
    --n_samples 1 \
    --batch_size 1 \
    --seed 1000 \
    --show_denoising_progress \
    --inject_lora \
    --lora_path $LORA_PATH \
    --lora_trigger_word "$TAG" \
    --lora_scale 1.0

Global seed set to 1000
config: 
 {'model': {'load_from_pretrained_img_model': True, 'ckpt_path': '/apdcephfs_cq2/share_1290939/yingqinghe/dependencies/stable_diffusion/compvis-sd-v1-4-original/sd-v1-4-full-ema.ckpt', 'config_path': 'configs/latent-diffusion/txt2img-1p4B-eval-Clipembedder.yaml', 'load_from_checkpoint': '/apdcephfs/share_1290939/yingqinghe/results/latent_diffusion/text2video/tv_054_NoFPSEmbd_NoMotionAdapter_FS32_basedon050_2_8nodes_e0_V/checkpoints/trainstep_checkpoints/epoch=000003-step=000020000.ckpt', 'base_learning_rate': 5e-07, 'scale_lr': False, 'target': 'lvdm.models.ddpm3d.LatentDiffusion', 'params': {'linear_start': 0.00085, 'linear_end': 0.012, 'num_timesteps_cond': 1, 'log_every_t': 200, 'timesteps': 1000, 'first_stage_key': 'video', 'cond_stage_key': 'caption', 'image_size': [32, 32], 'video_length': 16, 'channels': 4, 'cond_stage_trainable': False, 'conditioning_key': 'crossattn', 'monitor': 'train/loss_simple_step', 'scale_by_std': False, 'scale_factor': 0

In [None]:
# visualize
from IPython.display import HTML, display
from base64 import b64encode
import os, sys

# pick the first video (in sorted filename order) from the results directory
video_dir = OUTDIR + '/videos'
mp4_name = os.path.join(video_dir, sorted(os.listdir(video_dir))[0])

# read the file and embed it in the page as a base64 data URL
with open(mp4_name, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

print('Display animation: {}'.format(mp4_name), file=sys.stderr)
display(HTML("""
  <video width=256 controls>
        <source src="%s" type="video/mp4">
  </video>
  """ % data_url))

Display animation: results/videolora/videos/astronaut_riding_a_horse,_Loving_Vincent_style_seed01000_000.mp4
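
To compare how strongly the style is applied, the VideoLoRA command can be repeated with different `--lora_scale` values. The sketch below only builds the argument lists and prints them. It reuses the flags and paths from the sampling cell above; the helper `build_lora_command`, the chosen scale values, and the per-scale output directories are assumptions made for illustration.

```python
import shlex

def build_lora_command(prompt, scale,
                       out_dir="results/videolora",
                       ckpt="models/base_t2v/model.ckpt",
                       config="models/base_t2v/model_config.yaml",
                       lora="models/videolora/lora_001_Loving_Vincent_style.ckpt",
                       tag=", Loving Vincent style",
                       seed=1000):
    """Build the argument list for one VideoLoRA sampling run (does not execute it)."""
    return [
        "python", "scripts/sample_text2video.py",
        "--ckpt_path", ckpt,
        "--config_path", config,
        "--prompt", prompt,
        "--save_dir", out_dir,
        "--n_samples", "1",
        "--batch_size", "1",
        "--seed", str(seed),
        "--show_denoising_progress",
        "--inject_lora",
        "--lora_path", lora,
        "--lora_trigger_word", tag,
        "--lora_scale", str(scale),
    ]

# Hypothetical sweep: one output directory per scale so results do not mix.
for scale in (0.5, 0.75, 1.0):
    cmd = build_lora_command("astronaut riding a horse", scale,
                             out_dir="results/videolora/scale_{}".format(scale))
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each printed line can be pasted into a cell after `!`, or the list can be passed to `subprocess.run(cmd, check=True)` from the repository root.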
