## Deploy Video Generation Endpoint on SageMaker

In this notebook, we will deploy an asynchronous [Animate Anyone](https://github.com/MooreThreads/Moore-AnimateAnyone) endpoint with SageMaker.

### Setup

In [1]:
!pip install sagemaker boto3 huggingface_hub transformers torch --upgrade --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
awscli 1.33.24 requires botocore==1.34.142, but you have botocore 1.34.146 which is incompatible.

In [2]:
import sagemaker
import jinja2
from sagemaker import image_uris
import boto3
import os
import time
import json
from pathlib import Path
import base64

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [3]:
role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs

In [None]:
model_bucket = sess.default_bucket()  # bucket to house artifacts
s3_code_prefix = "animateanyone-serving"  # folder within bucket where code artifact will go
s3_model_prefix = "model_animateanyone_inference"  # folder within bucket where model artifacts will go
region = sess.boto_region_name
account_id = sess.account_id()

s3_client = boto3.client("s3")
sm_client = boto3.client("sagemaker")
smr_client = boto3.client("sagemaker-runtime")

jinja_env = jinja2.Environment()

# define a variable to contain the s3url of the location that has the model
pretrained_model_location = f"s3://{model_bucket}/{s3_model_prefix}/"
print(f"Pretrained model will be uploaded to ---- > {pretrained_model_location}")

## Prepare inference script and container image

In [5]:
inference_image_uri = image_uris.retrieve(
    framework="djl-deepspeed", region=sess.boto_session.region_name, version="0.27.0"
)
inference_image_uri

'763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.27.0-deepspeed0.12.6-cu121'

In [6]:
!pip install pyyaml



The endpoint uses the default inference configuration from [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone). You can modify it to suit your own requirements.

In [7]:
import yaml

def read_yaml(file_path):
    with open(file_path, 'r') as file:
        try:
            yaml_data = yaml.safe_load(file)
            return yaml_data
        except yaml.YAMLError as e:
            print(f"Error reading YAML file: {e}")
            return None

# We use the default configuration for inference
infer_config_dict = read_yaml('./Moore-AnimateAnyone/configs/inference/inference_v2.yaml')
infer_config_dict

{'unet_additional_kwargs': {'use_inflated_groupnorm': True,
  'unet_use_cross_frame_attention': False,
  'unet_use_temporal_attention': False,
  'use_motion_module': True,
  'motion_module_resolutions': [1, 2, 4, 8],
  'motion_module_mid_block': True,
  'motion_module_decoder_only': False,
  'motion_module_type': 'Vanilla',
  'motion_module_kwargs': {'num_attention_heads': 8,
   'num_transformer_block': 1,
   'attention_block_types': ['Temporal_Self', 'Temporal_Self'],
   'temporal_position_encoding': True,
   'temporal_position_encoding_max_len': 32,
   'temporal_attention_dim_div': 1}},
 'noise_scheduler_kwargs': {'beta_start': 0.00085,
  'beta_end': 0.012,
  'beta_schedule': 'linear',
  'clip_sample': False,
  'steps_offset': 1,
  'prediction_type': 'v_prediction',
  'rescale_betas_zero_snr': True,
  'timestep_spacing': 'trailing'},
 'sampler': 'DDIM'}
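
If you want to change a few settings without editing the YAML file, one option is to merge overrides into the loaded dictionary before it is written to `deployment_config.json`. The following is a minimal sketch: `merge_overrides` is a hypothetical helper (not part of Moore-AnimateAnyone), and the override value is illustrative rather than a recommended setting.

```python
import copy

def merge_overrides(base, overrides):
    """Return a deep copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dictionaries: merge them key by key
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Otherwise the override replaces the existing value
            merged[key] = copy.deepcopy(value)
    return merged

# Stand-in for a subset of the dictionary loaded from inference_v2.yaml
example_config = {
    "noise_scheduler_kwargs": {"beta_schedule": "linear", "steps_offset": 1},
    "sampler": "DDIM",
}

# Illustrative override only; choose values that suit your model
custom_config = merge_overrides(
    example_config, {"noise_scheduler_kwargs": {"steps_offset": 0}}
)
print(custom_config)
```

Because the helper works on a copy, the dictionary loaded from the YAML file is left unchanged and can still be used as a reference.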

In [8]:
deployment_config = {
    "model_name": "animateanyone",
    "pretrained_base_model_path": "stable-diffusion-v1-5",
    "pretrained_vae_path": "sd-vae-ft-mse",
    "image_encoder_path": "image_encoder",
    "denoising_unet_path": "animateanyone/denoising_unet.pth",
    "reference_unet_path": "animateanyone/reference_unet.pth",
    "pose_guider_path": "animateanyone/pose_guider.pth",
    "motion_module_path": "animateanyone/motion_module.pth",
    "weight_dtype": "fp16",
    "infer_config": infer_config_dict
}

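# make sure the local code directory exists before writing the config into it
os.makedirs(s3_code_prefix, exist_ok=True)
with open(f"{s3_code_prefix}/deployment_config.json", "w") as file:
    json.dump(deployment_config, file)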

### Upload the model artifacts on Amazon S3

We use the pretrained model to create the endpoint. Alternatively, you can deploy a fine-tuned model.

In [None]:
# upload the model artifacts to s3
model_download_path = "pretrained_weights"

model_artifact = sess.upload_data(path=model_download_path, key_prefix=s3_model_prefix)
print(f"Model uploaded to --- > {model_artifact}")
print(f"We will set option.model_id={model_artifact}")
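
Before uploading, it can help to confirm that the local weights directory contains what `deployment_config` refers to. The sketch below assumes that every `*_path` entry in the configuration is a path relative to the weights directory; that layout is an assumption, and `find_missing_weights` is a hypothetical helper rather than part of the SageMaker SDK.

```python
from pathlib import Path

def find_missing_weights(weights_root, config):
    """List config keys ending in '_path' whose target is absent under weights_root."""
    root = Path(weights_root)
    missing = []
    for key, value in config.items():
        # Only string entries named '*_path' are treated as relative file paths
        if key.endswith("_path") and isinstance(value, str):
            if not (root / value).exists():
                missing.append(key)
    return missing

# Example usage in the notebook (assumed layout):
# missing = find_missing_weights(model_download_path, deployment_config)
# if missing:
#     print(f"Missing weights for: {missing}")
```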

In [10]:
%%writefile animateanyone-serving/serving.properties
engine = Python
option.tensor_parallel_degree = 1
option.model_id = {{s3url}}
option.prediction_timeout=1200

Overwriting animateanyone-serving/serving.properties


In [11]:
# render the model's S3 location into the `serving.properties` template
properties_path = Path("animateanyone-serving/serving.properties")
template = jinja_env.from_string(properties_path.read_text())
properties_path.write_text(template.render(s3url=pretrained_model_location))
!pygmentize animateanyone-serving/serving.properties | cat -n

     1	engine = Python
     2	option.tensor_parallel_degree = 1
     3	option.model_id = s3://sagemaker-us-east-1-822507008821/model_animateanyone_inference/
     4	option.prediction_timeout=1200
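
A quick way to confirm that the template was rendered is to parse the file and look for leftover `{{ ... }}` placeholders. This is a minimal sketch with hypothetical helper names, and the bucket name in the sample text is a placeholder, not a real bucket.

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dict, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def unrendered_keys(props):
    """Return keys whose values still contain a Jinja-style placeholder."""
    return [key for key, value in props.items() if "{{" in value or "}}" in value]

# Sample content; 'example-bucket' is a placeholder name
rendered = """engine = Python
option.tensor_parallel_degree = 1
option.model_id = s3://example-bucket/model_animateanyone_inference/
option.prediction_timeout=1200
"""
props = parse_properties(rendered)
print(unrendered_keys(props))
```

In the notebook, the same check could be applied to the text of `animateanyone-serving/serving.properties` before the tarball is built.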


## Prepare the model tarball file and upload to S3

Note that you need to clone [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone) first if it is not already in the current directory.

In [12]:
%%sh
rm -rf animateanyone-serving/.ipynb_checkpoints
find ./Moore-AnimateAnyone -type f -name "*.pyc" -delete
cp -r Moore-AnimateAnyone/src animateanyone-serving/src
tar czvf model.tar.gz animateanyone-serving/

animateanyone-serving/
animateanyone-serving/requirements.txt
animateanyone-serving/deployment_config.json
animateanyone-serving/serving.properties
animateanyone-serving/src/
animateanyone-serving/src/__pycache__/
animateanyone-serving/src/__pycache__/__init__.cpython-310.pyc
animateanyone-serving/src/utils/
animateanyone-serving/src/utils/__pycache__/
animateanyone-serving/src/utils/__pycache__/util.cpython-310.pyc
animateanyone-serving/src/utils/.ipynb_checkpoints/
animateanyone-serving/src/utils/.ipynb_checkpoints/util-checkpoint.py
animateanyone-serving/src/utils/util.py
animateanyone-serving/src/pipelines/
animateanyone-serving/src/pipelines/__pycache__/
animateanyone-serving/src/pipelines/__pycache__/pipeline_pose2vid_long.cpython-310.pyc
animateanyone-serving/src/pipelines/__pycache__/context.cpython-310.pyc
animateanyone-serving/src/pipelines/__pycache__/__init__.cpython-310.pyc
animateanyone-serving/src/pipelines/__pycache__/utils.cpython-310.pyc
animateanyone-serving/src/pipe

In [None]:
s3_code_artifact = sess.upload_data("model.tar.gz", model_bucket, s3_code_prefix)
print(f"S3 Code or Model tar ball uploaded to --- > {s3_code_artifact}")
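
The tar listing above includes `__pycache__` and `.ipynb_checkpoints` entries, which the endpoint does not need. As an alternative to the shell commands, the archive can be built with Python's `tarfile` module and a filter that skips those entries. This is a sketch, and `build_tarball` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path

EXCLUDED_DIRS = {"__pycache__", ".ipynb_checkpoints"}

def _skip_unwanted(tarinfo):
    """Return None for cache/checkpoint directories and .pyc files so they are left out."""
    parts = Path(tarinfo.name).parts
    if EXCLUDED_DIRS.intersection(parts) or tarinfo.name.endswith(".pyc"):
        return None
    return tarinfo

def build_tarball(source_dir, output_path="model.tar.gz"):
    """Create a gzip-compressed tarball of source_dir, filtered by _skip_unwanted."""
    source = Path(source_dir)
    with tarfile.open(output_path, "w:gz") as tar:
        # arcname keeps the top-level folder name inside the archive
        tar.add(str(source), arcname=source.name, filter=_skip_unwanted)
    return output_path

# Example usage in the notebook:
# build_tarball("animateanyone-serving", "model.tar.gz")
```

Returning `None` from the filter for a directory also skips everything beneath it, so the cache folders do not need to be deleted beforehand.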

## Deploy model as an asynchronous endpoint

In [15]:
from sagemaker.model import Model
from sagemaker.utils import name_from_base

#inference_image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.3.0-gpu-py311-cu121-ubuntu20.04-sagemaker"

model_name = deployment_config["model_name"]
model = Model(
    image_uri=inference_image_uri,
    model_data=s3_code_artifact,
    role=role,
    name=model_name,
)

In [None]:
from datetime import datetime
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")

async_endpoint_config_name = f"Async-animateanyone-{timestamp}"

s3_output_path = f"s3://{model_bucket}/{s3_code_prefix}/output/async"

sagemaker_session = sagemaker.Session()
boto_session = sagemaker_session.boto_session
sagemaker_client = boto_session.client('sagemaker')

create_endpoint_config_response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=async_endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": model_name,
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1
        }
    ],
    AsyncInferenceConfig={
        "OutputConfig": {
            "S3OutputPath": s3_output_path,
            #  Optionally specify Amazon SNS topics
            #"NotificationConfig": {
            #  "SuccessTopic": success_topic,
            #  "ErrorTopic": error_topic,
            #}
        },
        "ClientConfig": {
            "MaxConcurrentInvocationsPerInstance": 10
        }
    }
)
print(f"Created EndpointConfig: {create_endpoint_config_response['EndpointConfigArn']}")
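
The commented-out `NotificationConfig` above can be enabled when Amazon SNS topics are available. The sketch below builds the `AsyncInferenceConfig` dictionary and adds the notification block only when topic ARNs are supplied. The helper name is hypothetical, and the ARN in the usage comment is a placeholder.

```python
def build_async_inference_config(s3_output_path, max_concurrent=10,
                                 success_topic=None, error_topic=None):
    """Build the AsyncInferenceConfig dict, adding SNS topics only if provided."""
    output_config = {"S3OutputPath": s3_output_path}
    notification = {}
    if success_topic:
        notification["SuccessTopic"] = success_topic
    if error_topic:
        notification["ErrorTopic"] = error_topic
    if notification:
        output_config["NotificationConfig"] = notification
    return {
        "OutputConfig": output_config,
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": max_concurrent},
    }

# Example usage (placeholder ARN):
# AsyncInferenceConfig=build_async_inference_config(
#     s3_output_path,
#     success_topic="arn:aws:sns:us-east-1:111122223333:async-success",
# )
```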

In [18]:
endpoint_name = f"endpoint-{timestamp}" + model_name

create_endpoint_response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=async_endpoint_config_name
)

In [None]:
waiter = boto3.client('sagemaker').get_waiter('endpoint_in_service')
print("Waiting for endpoint to create...")
waiter.wait(EndpointName=endpoint_name)

Waiting for endpoint to create...


In [21]:
resp = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
print(f"Endpoint Status: {resp['EndpointStatus']}")

Endpoint Status: InService


In [33]:
endpoint_name

'endpoint-20240723-072940animateanyone'

### Set up autoscaling

In [24]:
client = boto3.client('application-autoscaling') # Common class representing Application Auto Scaling for SageMaker amongst other services

resource_id='endpoint/' + endpoint_name + '/variant/' + 'variant1' # This is the format in which application autoscaling references the endpoint

response = client.register_scalable_target(
    ServiceNamespace='sagemaker', 
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=0,  
    MaxCapacity=5
)

response = client.put_scaling_policy(
    PolicyName='Invocations-ScalingPolicy',
    ServiceNamespace='sagemaker',  # namespace of the AWS service that provides the resource
    ResourceId=resource_id,  # endpoint variant registered above
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',  # SageMaker supports only instance count
    PolicyType='TargetTrackingScaling',  # 'StepScaling' | 'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 1,  # target value for the metric
        'CustomizedMetricSpecification': {
            # queue backlog per instance, the CloudWatch metric published for asynchronous endpoints
            'MetricName': 'ApproximateBacklogSizePerInstance',
            'Namespace': 'AWS/SageMaker',
            'Dimensions': [
                {'Name': 'EndpointName', 'Value': endpoint_name}
            ],
            'Statistic': 'Average',
        },
        # Cooldown periods prevent further scaling before the effects of the previous
        # activity are visible; tune them to your instance startup time.
        'ScaleInCooldown': 120,  # seconds after a scale-in activity before another scale-in can start
        'ScaleOutCooldown': 120  # seconds after a scale-out activity before another scale-out can start
    }
)
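
To get a feel for how `TargetValue`, `MinCapacity`, and `MaxCapacity` interact, the sketch below estimates how many instances keep a per-instance metric at or below the target. It is a simplified model for intuition only, not the actual Application Auto Scaling algorithm, which also depends on cooldowns and alarm evaluation periods. The function name is hypothetical.

```python
import math

def estimate_desired_instances(total_metric, target_value, min_capacity, max_capacity):
    """Instances needed so total_metric / instances stays at or below target_value."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    needed = math.ceil(total_metric / target_value)
    # Clamp to the registered capacity range
    return max(min_capacity, min(max_capacity, needed))

# With the settings above (target 1, capacity 0 to 5):
print(estimate_desired_instances(0, 1, 0, 5))   # no pending work
print(estimate_desired_instances(3, 1, 0, 5))   # three pending requests
print(estimate_desired_instances(12, 1, 0, 5))  # more work than MaxCapacity allows
```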

## Async inference

We use the sample data from `Moore-AnimateAnyone` to test the asynchronous endpoint.

In [None]:
pose_seq_path = "./Moore-AnimateAnyone/configs/inference/pose_videos/anyone-video-1_kps.mp4"
ref_img_path = "./Moore-AnimateAnyone/configs/inference/ref_images/anyone-1.png"
pose_s3_path = f"s3://{model_bucket}/{s3_code_prefix}/inputs/anyone-video-1_kps.mp4"
ref_s3_path = f"s3://{model_bucket}/{s3_code_prefix}/inputs/anyone-1.png"
input_async_s3_path = f"s3://{model_bucket}/{s3_code_prefix}/inputs/input_data_async.json"
output_s3uri = f"s3://{model_bucket}/{s3_code_prefix}/outputs/anyone-video-1.mp4"

input_data_async = {
    "pose_seq_s3uri": pose_s3_path,
    "ref_s3_path": ref_s3_path,
    'height': 512,
    'width': 512,
    'steps': 30,
    'cfg': 3.5,
    'fps': -1,
    'seed': 42,
    'length': 30,
    'output_s3uri': output_s3uri
}

input_async_path = "./input_data_async.json"
with open(input_async_path, 'w') as file:
    json.dump(input_data_async, file)

!aws s3 cp $pose_seq_path $pose_s3_path
!aws s3 cp $ref_img_path $ref_s3_path
!aws s3 cp $input_async_path $input_async_s3_path
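
Because an asynchronous request only fails after it has been queued, a quick local check of the payload can save a round trip. The sketch below validates the fields used in `input_data_async`. Which fields the inference script actually requires is an assumption here, and `validate_payload` is a hypothetical helper.

```python
S3_URI_KEYS = ("pose_seq_s3uri", "ref_s3_path", "output_s3uri")
POSITIVE_INT_KEYS = ("height", "width", "steps", "length")

def validate_payload(payload):
    """Return a list of human-readable problems found in the request payload."""
    errors = []
    for key in S3_URI_KEYS:
        value = payload.get(key)
        if not isinstance(value, str) or not value.startswith("s3://"):
            errors.append(f"{key} must be an s3:// URI")
    for key in POSITIVE_INT_KEYS:
        value = payload.get(key)
        if not isinstance(value, int) or value <= 0:
            errors.append(f"{key} must be a positive integer")
    return errors

# Example usage in the notebook:
# problems = validate_payload(input_data_async)
# assert not problems, problems
```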


In [None]:
sm_runtime = boto3.Session().client("sagemaker-runtime")

response = sm_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name, 
    InputLocation=input_async_s3_path,
    InvocationTimeoutSeconds=3600
)
output_location = response['OutputLocation']
print(f"OutputLocation: {output_location}")

In [28]:
from botocore.exceptions import ClientError
from urllib.parse import urlparse
import sys

def get_output(output_location, timeout=1200):
    """Poll S3 until the async inference result exists or `timeout` seconds elapse."""
    output_url = urlparse(output_location)
    bucket = output_url.netloc
    key = output_url.path[1:]
    start_time = time.time()
    while True:
        try:
            return sess.read_s3_file(bucket=bucket, key_prefix=key)
        except ClientError as e:
            # any error other than a missing object is unexpected, so re-raise it
            if e.response['Error']['Code'] != 'NoSuchKey':
                raise
            if time.time() - start_time > timeout:
                raise TimeoutError(f"No output at {output_location} after {timeout} seconds")
            print("waiting for output...")
            time.sleep(2)
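
Video generation can take several minutes, so a fixed two-second sleep leads to many S3 requests. One alternative is to poll with an increasing delay. The sketch below is generic: `fetch` is any callable that returns `None` while the result is missing, for example a wrapper around `sess.read_s3_file` that catches the `NoSuchKey` error. The helper name and default delays are assumptions.

```python
import time

def poll_with_backoff(fetch, timeout=1200, initial_delay=2.0, max_delay=30.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns a non-None value or `timeout` seconds pass."""
    delay = initial_delay
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        # Stop if the next wait would run past the deadline
        if clock() + delay > deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(delay)
        # Double the wait each round, capped at max_delay
        delay = min(delay * 2, max_delay)
```

The `sleep` and `clock` parameters exist only so the function can be exercised without real waiting; in the notebook the defaults are sufficient.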

In [29]:
output = get_output(output_location)
print(f"Output size in bytes: {((sys.getsizeof(output)))}")

waiting for output...

In [None]:
!aws s3 cp $output_s3uri generated_res.mp4

In [31]:
!jupyter labextension install @jupyter-widgets/jupyterlab-manager

(Deprecated) Installing extensions with the jupyter labextension install command is now deprecated and will be removed in a future major version of JupyterLab.

Users should manage prebuilt extensions with package managers like pip and conda, and extension authors are encouraged to distribute their extensions as prebuilt packages


In [32]:
from IPython.display import HTML

video_path = "generated_res.mp4"

video_html = f"""
<video width="640" height="480" controls>
  <source src="{video_path}" type="video/mp4">
  Your browser does not support the video tag.
</video>
"""
HTML(video_html)
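
If the relative `src` path does not resolve in your notebook environment, one alternative is to embed the video bytes directly in the HTML as a base64 data URI. This is a sketch under that assumption: `video_html_from_bytes` is a hypothetical helper, and embedding increases the notebook size by roughly a third more than the video file size.

```python
import base64

def video_html_from_bytes(video_bytes, width=640, height=480):
    """Return an HTML <video> tag with the MP4 bytes embedded as a base64 data URI."""
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return (
        f'<video width="{width}" height="{height}" controls>'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4">'
        "Your browser does not support the video tag."
        "</video>"
    )

# Example usage in the notebook:
# from pathlib import Path
# from IPython.display import HTML
# HTML(video_html_from_bytes(Path("generated_res.mp4").read_bytes()))
```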

## Clean up
Run the cell below to delete the model, endpoint, and endpoint configuration when you finish the experiment.

In [41]:
sm_client.delete_model(ModelName=model_name)
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=async_endpoint_config_name)

{'ResponseMetadata': {'RequestId': '251a1cfb-9ec0-44f6-9666-525908903f39',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '251a1cfb-9ec0-44f6-9666-525908903f39',
   'content-type': 'application/x-amz-json-1.1',
   'date': 'Tue, 23 Jul 2024 08:52:52 GMT',
   'content-length': '0'},
  'RetryAttempts': 0}}
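
If one of the delete calls fails (for example because a resource was already removed), the remaining calls in the cell are skipped. The sketch below runs each cleanup step independently and reports failures at the end. `run_cleanup` is a hypothetical helper, and the usage comment also deregisters the scalable target created in the autoscaling section.

```python
def run_cleanup(steps):
    """Run each (label, callable) step, collecting failures instead of stopping early."""
    failures = []
    for label, action in steps:
        try:
            action()
        except Exception as exc:  # broad on purpose so later resources are still removed
            failures.append((label, str(exc)))
    return failures

# Example usage in the notebook:
# failures = run_cleanup([
#     ("scalable target", lambda: client.deregister_scalable_target(
#         ServiceNamespace="sagemaker",
#         ResourceId=resource_id,
#         ScalableDimension="sagemaker:variant:DesiredInstanceCount",
#     )),
#     ("endpoint", lambda: sm_client.delete_endpoint(EndpointName=endpoint_name)),
#     ("endpoint config", lambda: sm_client.delete_endpoint_config(
#         EndpointConfigName=async_endpoint_config_name)),
#     ("model", lambda: sm_client.delete_model(ModelName=model_name)),
# ])
# print(failures)
```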