<a href="https://colab.research.google.com/github/alanland/colab-notebooks/blob/main/ChuanhuChatGPT/ChuanhuChatGPT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **VideoCrafter: A Toolkit for Text-to-Video Generation and Editing**


VideoCrafter is an open-source video generation and editing toolbox for crafting video content.

More details can be found on GitHub: [![GitHub](https://img.shields.io/github/stars/VideoCrafter/VideoCrafter?style=social)](https://github.com/VideoCrafter/VideoCrafter)

In [None]:
### Make sure a GPU is enabled: Edit -> Notebook settings -> GPU
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader

Tesla T4, 15360 MiB, 15101 MiB
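
The same check can be done from Python, so that later cells can stop early when no GPU is attached. The following is a minimal sketch, not part of VideoCrafter: `parse_gpu_line` and `query_gpus` are hypothetical helper names, and the parser assumes the `name, total MiB, free MiB` row format shown in the output above.

```python
import subprocess

def parse_gpu_line(line):
    """Split one 'name, total MiB, free MiB' CSV row into typed fields."""
    name, total, free = [field.strip() for field in line.split(",")]
    return name, int(total.split()[0]), int(free.split()[0])

def query_gpus():
    """Run nvidia-smi with the same query as above; return [] if no GPU is visible."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total,memory.free",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(row) for row in out.splitlines() if row.strip()]

gpus = query_gpus()
if not gpus:
    print("No GPU detected; select a GPU runtime before continuing.")
for name, total_mib, free_mib in gpus:
    print(f"{name}: {free_mib} of {total_mib} MiB free")
```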


## Installation

In [None]:
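# Register Python 3.8 as the preferred python3 (the higher priority wins)
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.8 2
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.9 1
!python3 --version
!apt-get update
# -y answers the confirmation prompt, which a notebook cell cannot do interactively
!apt-get install -y software-properties-common
# Reinstall pip for the system Python
!sudo dpkg --remove --force-remove-reinstreq python3-pip python3-setuptools python3-wheel
!apt-get install -y python3-pip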

print('Git clone project and install requirements...')
!git clone https://github.com/VideoCrafter/VideoCrafter &> /dev/null
%cd VideoCrafter 
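# A `!export` only affects its own subshell, so set PYTHONPATH on the kernel's environment instead
import os
os.environ['PYTHONPATH'] = '/content/VideoCrafter:' + os.environ.get('PYTHONPATH', '')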

!python3.8 -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
!apt-get update
!apt-get install -y ffmpeg &> /dev/null
!python3.8 -m pip install pytorch-lightning==1.8.3 omegaconf==2.1.1 einops==0.3.0 transformers==4.25.1
!python3.8 -m pip install opencv-python==4.1.2.30 imageio==2.9.0 imageio-ffmpeg==0.4.2
!python3.8 -m pip install av moviepy
!python3.8 -m pip install -e .
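
Because the install cell pins exact versions, it can help to confirm afterwards that the pins were applied. The sketch below is an illustration only: `PINNED` and `find_mismatches` are hypothetical names, and the versions are copied from the pip commands above. It inspects whichever interpreter executes it, so to check the `python3.8` environment the code would need to be saved to a file and run with `!python3.8`.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINNED = {
    "torch": "1.12.1+cu113",
    "torchvision": "0.13.1+cu113",
    "torchaudio": "0.12.1",
    "pytorch-lightning": "1.8.3",
    "omegaconf": "2.1.1",
    "einops": "0.3.0",
    "transformers": "4.25.1",
    "opencv-python": "4.1.2.30",
    "imageio": "2.9.0",
    "imageio-ffmpeg": "0.4.2",
}

def find_mismatches(pinned):
    """Return {package: (wanted, installed_or_None)} for every unsatisfied pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Accept a local build suffix such as '+cu113' on top of the pinned version.
        ok = installed is not None and (
            installed == wanted or installed.startswith(wanted + "+")
        )
        if not ok:
            mismatches[name] = (wanted, installed)
    return mismatches

for name, (wanted, installed) in find_mismatches(PINNED).items():
    print(f"{name}: wanted {wanted}, found {installed}")
```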

In [None]:
### Download all models from Hugging Face
! rm -rf models/
! git lfs install
! git clone https://huggingface.co/VideoCrafter/t2v-version-1-1/
! mv t2v-version-1-1/models .

Updated git hooks.
Git LFS initialized.
Cloning into 't2v-version-1-1'...
remote: Enumerating objects: 55, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 55 (delta 1), reused 0 (delta 0), pack-reused 45
Unpacking objects: 100% (55/55), 7.17 KiB | 667.00 KiB/s, done.
Filtering content: 100% (10/10), 1.74 GiB | 8.94 MiB/s, done.
Encountered 2 file(s) that may not have been copied correctly on Windows:
	models/base_t2v/model_rm_wtm.ckpt
	models/base_t2v/model.ckpt

See: `git lfs help smudge` for more details.
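
If Git LFS is not active during the clone, the checkpoint paths exist but contain only small text pointers, which start with the line `version https://git-lfs.github.com/spec/v1`. A minimal sketch for catching that before sampling follows; `check_model_file` is a hypothetical helper, and the two paths are the ones used by the sampling cells in this notebook.

```python
import os

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def check_model_file(path):
    """Return 'missing', 'lfs-pointer', or 'ok' for a downloaded file."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    # A real checkpoint is binary; a pointer file begins with the LFS spec line.
    return "lfs-pointer" if head == LFS_POINTER_PREFIX else "ok"

for path in ["models/base_t2v/model.ckpt", "models/base_t2v/model_config.yaml"]:
    print(f"{path}: {check_model_file(path)}")
```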


## Base T2V: Generic Text-to-Video Generation

In [None]:
!python3.8 -m pip install omegaconf

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
PROMPT="astronaut riding a horse outer space" #@param {type:"string"}
OUTDIR="results/"

BASE_PATH="models/base_t2v/model.ckpt"
CONFIG_PATH="models/base_t2v/model_config.yaml"

! python3.8 scripts/sample_text2video.py \
    --ckpt_path $BASE_PATH \
    --config_path $CONFIG_PATH \
    --prompt "$PROMPT" \
    --save_dir $OUTDIR \
    --n_samples 1 \
    --batch_size 1 \
    --seed 1000 \
    --show_denoising_progress
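
To run the same command for several prompts or seeds, it can be assembled as an argument list in Python, which avoids shell quoting problems with prompts that contain quotes. This is a sketch under assumptions: `build_sample_command` is a hypothetical wrapper, and it uses only the flags that appear in the cell above.

```python
import shlex
import subprocess

def build_sample_command(prompt, save_dir, ckpt_path, config_path,
                         seed=1000, python="python3.8"):
    """Assemble the sample_text2video.py call as an argument list."""
    return [
        python, "scripts/sample_text2video.py",
        "--ckpt_path", ckpt_path,
        "--config_path", config_path,
        "--prompt", prompt,
        "--save_dir", save_dir,
        "--n_samples", "1",
        "--batch_size", "1",
        "--seed", str(seed),
        "--show_denoising_progress",
    ]

cmd = build_sample_command(
    prompt="astronaut riding a horse outer space",
    save_dir="results/",
    ckpt_path="models/base_t2v/model.ckpt",
    config_path="models/base_t2v/model_config.yaml",
)
print(shlex.join(cmd))
# Uncomment to execute from inside the VideoCrafter checkout:
# subprocess.run(cmd, check=True)
```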

In [None]:
# Visualize the most recently generated video
from IPython.display import HTML, display
from base64 import b64encode
import os, sys, glob

# Pick the newest .mp4 in the output directory (by modification time)
video_dir = os.path.join(OUTDIR, 'videos')
mp4_name = max(glob.glob(os.path.join(video_dir, '*.mp4')), key=os.path.getmtime)

# Embed the video in the page as a base64 data URL
with open(mp4_name, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

print('Display animation: {}'.format(mp4_name), file=sys.stderr)
display(HTML("""
  <video width=256 controls>
        <source src="%s" type="video/mp4">
  </video>
  """ % data_url))

## VideoLoRA: Personalized Text-to-Video Generation with LoRA

In [None]:
PROMPT="astronaut riding a horse" #@param {type:"string"}
OUTDIR="results/videolora"

BASE_PATH="models/base_t2v/model.ckpt"
CONFIG_PATH="models/base_t2v/model_config.yaml"

LORA_PATH="models/videolora/lora_003_MakotoShinkaiYourName_style.ckpt" #@param ["models/videolora/lora_001_Loving_Vincent_style.ckpt", "models/videolora/lora_002_frozenmovie_style.ckpt", "models/videolora/lora_003_MakotoShinkaiYourName_style.ckpt", "models/videolora/lora_004_coco_style.ckpt"]


### Look up the trigger word that matches the selected LoRA checkpoint
lora_dict = {
    "models/videolora/lora_001_Loving_Vincent_style.ckpt": ", Loving Vincent style", 
    "models/videolora/lora_002_frozenmovie_style.ckpt": ", frozenmovie style",
    "models/videolora/lora_003_MakotoShinkaiYourName_style.ckpt": ", MakotoShinkaiYourName style", 
    "models/videolora/lora_004_coco_style.ckpt": ", coco style"
}

TAG=lora_dict[LORA_PATH]

! python3.8 scripts/sample_text2video.py \
    --ckpt_path $BASE_PATH \
    --config_path $CONFIG_PATH \
    --prompt "$PROMPT" \
    --save_dir $OUTDIR \
    --n_samples 1 \
    --batch_size 1 \
    --seed 1000 \
    --show_denoising_progress \
    --inject_lora \
    --lora_path $LORA_PATH \
    --lora_trigger_word "$TAG" \
    --lora_scale 1.0

Global seed set to 1000
config: 
 {'model': {'target': 'lvdm.models.ddpm3d.LatentDiffusion', 'params': {'linear_start': 0.00085, 'linear_end': 0.012, 'num_timesteps_cond': 1, 'log_every_t': 200, 'timesteps': 1000, 'first_stage_key': 'video', 'cond_stage_key': 'caption', 'image_size': [32, 32], 'video_length': 16, 'channels': 4, 'cond_stage_trainable': False, 'conditioning_key': 'crossattn', 'scale_by_std': False, 'scale_factor': 0.18215, 'unet_config': {'target': 'lvdm.models.modules.openaimodel3d.UNetModel', 'params': {'image_size': 32, 'in_channels': 4, 'out_channels': 4, 'model_channels': 320, 'attention_resolutions': [4, 2, 1], 'num_res_blocks': 2, 'channel_mult': [1, 2, 4, 4], 'num_heads': 8, 'transformer_depth': 1, 'context_dim': 768, 'use_checkpoint': True, 'legacy': False, 'kernel_size_t': 1, 'padding_t': 0, 'temporal_length': 16, 'use_relative_position': True}}, 'first_stage_config': {'target': 'lvdm.models.autoencoder.AutoencoderKL', 'params': {'embed_dim': 4, 'monitor': 'val
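
The entries in `lora_dict` follow a visible pattern: each trigger word is the checkpoint file name without the `lora_NNN_` prefix and the `.ckpt` extension, with underscores turned into spaces. As a sketch, the tag could be derived instead of listed by hand. `trigger_word_from_path` is a hypothetical helper, and the naming rule is inferred only from the four checkpoints above, so it may not hold for other LoRA files.

```python
import os
import re

def trigger_word_from_path(lora_path):
    """Derive ', <style words>' from a 'lora_NNN_<style>.ckpt' file name."""
    stem = os.path.splitext(os.path.basename(lora_path))[0]
    style = re.sub(r"^lora_\d+_", "", stem)  # drop the numeric prefix
    return ", " + style.replace("_", " ")

lora_path = "models/videolora/lora_003_MakotoShinkaiYourName_style.ckpt"
tag = trigger_word_from_path(lora_path)
print("astronaut riding a horse" + tag)
```

The printed string shows the prompt and trigger word joined the same way they appear in the generated video's file name.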

In [None]:
# Visualize the most recently generated video
from IPython.display import HTML, display
from base64 import b64encode
import os, sys, glob

# Pick the newest .mp4 in the output directory (by modification time)
video_dir = os.path.join(OUTDIR, 'videos')
mp4_name = max(glob.glob(os.path.join(video_dir, '*.mp4')), key=os.path.getmtime)

# Embed the video in the page as a base64 data URL
with open(mp4_name, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

print('Display animation: {}'.format(mp4_name), file=sys.stderr)
display(HTML("""
  <video width=256 controls>
        <source src="%s" type="video/mp4">
  </video>
  """ % data_url))

Display animation: results/videolora/videos/astronaut_riding_a_horse,_MakotoShinkaiYourName_style_seed01000_000.mp4
