# Video generation inference with SageMaker

In this notebook, we walk through deploying the pretrained or fine-tuned models and running video generation inference with them.

In [1]:
!pip install huggingface_hub

DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063

## Download pretrained models

Download the pretrained AnimateAnyone models from the Hugging Face model hub, or download the fine-tuned models trained with SageMaker HyperPod from your S3 bucket.

In [2]:
from huggingface_hub import hf_hub_download

repo_id = 'patrolli/AnimateAnyone'
local_path = 'pretrained_weights/animateanyone'

files = ['denoising_unet.pth', 'reference_unet.pth', 'pose_guider.pth', 'motion_module.pth']
for filename in files:
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_path)

reference_unet.pth:  69%|######8   | 2.36G/3.44G [00:00<?, ?B/s]

pose_guider.pth:   0%|          | 0.00/4.35M [00:00<?, ?B/s]

motion_module.pth:   0%|          | 0.00/1.82G [00:00<?, ?B/s]
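If the fine-tuned weights live in S3 rather than on the Hub, something along the following lines could replace the download cell above. This is a minimal sketch built on boto3's `list_objects_v2` paginator and `download_file`. The bucket name, the key prefix, and the helper names `local_path_for_key` and `download_prefix` are hypothetical, and the sketch assumes the checkpoints were stored under a single prefix with the same file names as the pretrained ones.

```python
from pathlib import Path, PurePosixPath


def local_path_for_key(key: str, prefix: str, local_dir: str) -> Path:
    """Map an S3 object key under `prefix` to a file path under `local_dir`."""
    relative = PurePosixPath(key).relative_to(PurePosixPath(prefix))
    return Path(local_dir).joinpath(*relative.parts)


def download_prefix(s3_client, bucket: str, prefix: str, local_dir: str) -> list:
    """Download every object under s3://bucket/prefix into local_dir."""
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "folder" placeholder objects
            target = local_path_for_key(key, prefix, local_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded


# Example usage (hypothetical bucket and prefix; requires AWS credentials):
# import boto3
# download_prefix(
#     boto3.client("s3"),
#     "my-hyperpod-bucket",
#     "animateanyone/checkpoints",
#     "pretrained_weights/animateanyone",
# )
```

Passing the client in as an argument keeps the helper independent of how credentials and regions are configured in your SageMaker environment.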

Download the pretrained DWPose models.

In [3]:
repo_id = 'yzd-v/DWPose'
local_path = 'pretrained_weights/DWPose'

files = ['dw-ll_ucoco_384.onnx', 'yolox_l.onnx']
for filename in files:
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_path)

dw-ll_ucoco_384.onnx:   0%|          | 0.00/134M [00:00<?, ?B/s]

yolox_l.onnx:   0%|          | 0.00/217M [00:00<?, ?B/s]

Download the pretrained VAE model.

In [4]:
repo_id = 'stabilityai/sd-vae-ft-mse'
local_path = 'pretrained_weights/sd-vae-ft-mse'

files = ['diffusion_pytorch_model.bin', 'config.json']
for filename in files:
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_path)

diffusion_pytorch_model.bin:   0%|          | 0.00/335M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/547 [00:00<?, ?B/s]

Download the UNet weights of the pretrained SD1.5 model.

In [5]:
repo_id = 'runwayml/stable-diffusion-v1-5'
local_path = 'pretrained_weights/stable-diffusion-v1-5'

files = ['diffusion_pytorch_model.bin', 'config.json']
for filename in files:
    hf_hub_download(repo_id=repo_id, subfolder='unet', filename=filename, local_dir=local_path)

diffusion_pytorch_model.bin:   0%|          | 0.00/3.44G [00:00<?, ?B/s]

unet/config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]

Download the pretrained CLIP image embedding model.

In [6]:
repo_id = 'lambdalabs/sd-image-variations-diffusers'
local_path = 'pretrained_weights'

files = ['pytorch_model.bin', 'config.json']
for filename in files:
    hf_hub_download(repo_id=repo_id, filename=filename, subfolder='image_encoder', local_dir=local_path)

pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

image_encoder/config.json:   0%|          | 0.00/703 [00:00<?, ?B/s]
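Before moving on, it can be worth confirming that every file requested in the download cells above actually landed on disk. The sketch below assumes that `hf_hub_download` with `local_dir` mirrors the repository layout, so files requested with `subfolder='unet'` end up under `unet/`. The `EXPECTED_FILES` list and the `missing_weights` helper are illustrative names, not part of any library.

```python
from pathlib import Path

# Relative paths implied by the download cells above (assumed layout).
EXPECTED_FILES = [
    "animateanyone/denoising_unet.pth",
    "animateanyone/reference_unet.pth",
    "animateanyone/pose_guider.pth",
    "animateanyone/motion_module.pth",
    "DWPose/dw-ll_ucoco_384.onnx",
    "DWPose/yolox_l.onnx",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-vae-ft-mse/config.json",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
    "stable-diffusion-v1-5/unet/config.json",
    "image_encoder/pytorch_model.bin",
    "image_encoder/config.json",
]


def missing_weights(root="pretrained_weights", expected=EXPECTED_FILES):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


missing = missing_weights()
if missing:
    print("Missing weight files:")
    for rel in missing:
        print("  ", rel)
else:
    print("All expected weight files are present.")
```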

In [9]:
%%bash
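# Clone the Moore-AnimateAnyone implementation
git clone https://github.com/MooreThreads/Moore-AnimateAnyone.git
# Copy the local versions of the inference scripts into the cloned repository
cp animateanyone_infer/pose2vid.py Moore-AnimateAnyone/scripts/
cp animateanyone_infer/vid2pose.py Moore-AnimateAnyone/tools/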

Cloning into 'Moore-AnimateAnyone'...


In [10]:
!pip install -r ./Moore-AnimateAnyone/requirements.txt

Collecting clip@ https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip#sha256=b5842c25da441d6c581b53a5c60e0c2127ebafe0f746f8e15561a006c6c3be6a (from -r ./Moore-AnimateAnyone/requirements.txt (line 3))
  Using cached https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip (4.3 MB)
  Preparing metadata (setup.py) ... done
DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063

## Inference in a local environment

### (Optional) Generate a pose sequence file

In [None]:
sample_data_path = 'sample_data/ref.mp4' # replace it with your own reference video file

In [None]:
%%time
!python ./Moore-AnimateAnyone/tools/vid2pose.py --video_path $sample_data_path

### Generate a video with reference image and pose sequence

Prepare an inference config file.

In [11]:
%%writefile infer_config.yaml
pretrained_base_model_path: "./pretrained_weights/stable-diffusion-v1-5/"
pretrained_vae_path: "./pretrained_weights/sd-vae-ft-mse"
image_encoder_path: "./pretrained_weights/image_encoder"
denoising_unet_path: "./pretrained_weights/animateanyone/denoising_unet.pth"
reference_unet_path: "./pretrained_weights/animateanyone/reference_unet.pth"
pose_guider_path: "./pretrained_weights/animateanyone/pose_guider.pth"
motion_module_path: "./pretrained_weights/animateanyone/motion_module.pth"

test_cases:
    "Moore-AnimateAnyone/configs/inference/ref_images/anyone-1.png": # replace with your own inference image
    - "Moore-AnimateAnyone/configs/inference/pose_videos/anyone-video-1_kps.mp4" # replace with your own pose sequence

weight_dtype: 'fp16'
inference_config:
    unet_additional_kwargs:
      use_inflated_groupnorm: true
      unet_use_cross_frame_attention: false 
      unet_use_temporal_attention: false
      use_motion_module: true
      motion_module_resolutions:
      - 1
      - 2
      - 4
      - 8
      motion_module_mid_block: true 
      motion_module_decoder_only: false
      motion_module_type: Vanilla
      motion_module_kwargs:
        num_attention_heads: 8
        num_transformer_block: 1
        attention_block_types:
        - Temporal_Self
        - Temporal_Self
        temporal_position_encoding: true
        temporal_position_encoding_max_len: 32
        temporal_attention_dim_div: 1

    noise_scheduler_kwargs:
      beta_start: 0.00085
      beta_end: 0.012
      beta_schedule: "linear"
      clip_sample: false
      steps_offset: 1
      ### Zero-SNR params
      prediction_type: "v_prediction"
      rescale_betas_zero_snr: True
      timestep_spacing: "trailing"

    sampler: DDIM



Writing infer_config.yaml
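The `noise_scheduler_kwargs` in the config above combine a linear beta schedule with `rescale_betas_zero_snr: True`. As a rough illustration of what that rescaling does, the sketch below rebuilds the schedule in plain Python and shifts it so the final timestep carries no signal (cumulative alpha of zero). It assumes 1000 training timesteps, which is not stated in the config, and the function names are illustrative. It follows the published zero-terminal-SNR rescaling procedure, not the scheduler's actual source code.

```python
import math


def linear_betas(beta_start, beta_end, num_steps):
    """Evenly spaced betas from beta_start to beta_end (inclusive)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta), i.e. alpha-bar at each timestep."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def rescale_zero_terminal_snr(betas):
    """Rescale betas so that the last cumulative alpha is exactly zero."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value is zero, then scale so the first is unchanged.
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Convert cumulative alphas back to per-step alphas, then to betas.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]


# beta_start and beta_end come from the config; 1000 steps is an assumption.
betas = linear_betas(0.00085, 0.012, 1000)
rescaled = rescale_zero_terminal_snr(betas)

print("terminal alpha-bar before rescaling:", cumulative_alphas(betas)[-1])
print("terminal alpha-bar after rescaling: ", cumulative_alphas(rescaled)[-1])
```

With zero signal at the final timestep, predicting the added noise is no longer informative there, which is why a zero-SNR schedule is typically paired with `prediction_type: "v_prediction"` and `timestep_spacing: "trailing"`, as in the config.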


In [12]:
save_dir = "output"

In [13]:
!python Moore-AnimateAnyone/scripts/pose2vid.py --config infer_config.yaml -W 512 -H 784 -L 64 --save_dir $save_dir

Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
  return self.fget.__get__(instance, owner)()
pose video has 200 frames, with 30 fps
  num_channels_latents = self.denoising_unet.in_channels
100%|███████████████████████████████████████████| 30/30 [06:39<00:00, 13.30s/it]
100%|███████████████████████████████████████████| 64/64 [00:07<00:00,  8.81it/s]
save_path: output/20240718/0807--seed_42-512x784/anyone-1_anyone-video-1_784x512_3_0807.mp4


In [14]:
!jupyter labextension install @jupyter-widgets/jupyterlab-manager

(Deprecated) Installing extensions with the jupyter labextension install command is now deprecated and will be removed in a future major version of JupyterLab.

Users should manage prebuilt extensions with package managers like pip and conda, and extension authors are encouraged to distribute their extensions as prebuilt packages


In [15]:
from IPython.display import HTML

video_path = "output/20240718/0807--seed_42-512x784/anyone-1_anyone-video-1_784x512_3_0807.mp4"

video_html = f"""
<video width="640" height="480" controls>
  <source src="{video_path}" type="video/mp4">
  Your browser does not support the video tag.
</video>
"""
HTML(video_html)
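The output path in the cell above is hard-coded and includes the run date and time, so it changes on every run. One way to avoid editing it by hand is to pick the most recently modified `.mp4` under the save directory. The following is a minimal sketch, assuming the script writes its results somewhere below `save_dir`; `latest_video` is an illustrative helper name.

```python
from pathlib import Path


def latest_video(save_dir="output"):
    """Return the most recently modified .mp4 under save_dir, or None."""
    videos = list(Path(save_dir).rglob("*.mp4"))
    if not videos:
        return None
    return max(videos, key=lambda p: p.stat().st_mtime)


newest = latest_video("output")
print(newest if newest is not None else "No generated video found yet.")
```

The returned path can then be converted with `str(...)` and assigned to `video_path` before building the HTML snippet.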

The video above was generated with the pretrained weights, which were originally trained on a small sample of video data. To improve quality, fine-tuning on a larger, higher-quality video dataset is suggested.