## 1: Install Required Packages


In [None]:
%%sh

# optional: update OS packages in Amazon SageMaker Studio Ubuntu environment
sudo bash -c 'export DEBIAN_FRONTEND=noninteractive && apt-get update -qq -y && apt-get upgrade -qq -y'

# install dependencies
sudo bash -c 'export DEBIAN_FRONTEND=noninteractive && apt-get install -y git git-lfs libgl1 ffmpeg wget'

In [None]:
%pip install -Uq sagemaker boto3 botocore ffmpeg-python ipython diffusers pywget

In [None]:
# restart kernel after installing new packages

import os
os._exit(0)

## 2: Prepare the SVD-XT Model for Inference

Preparing the model for inference takes four steps: 1/ download the model artifacts from Hugging Face, 2/ add the custom inference script, 3/ create an archive file from the model artifacts, and 4/ upload the archive file to Amazon S3 for deployment.

Alternatively, if the model archive is already available in Amazon S3, skip steps 2.2-2.4 and follow '2.2-2.4: Alternate Method if Model Already Exists in S3', below.


### 2.1: Import Packages and Set SageMaker Variables

In [None]:
import os
import json
import shutil

import boto3
from botocore.exceptions import ClientError

import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3 import s3_path_join

In [None]:
sm_session_bucket = None

sm_session = sagemaker.Session()

if sm_session_bucket is None and sm_session is not None:
    # set to default bucket if a bucket name is not given
    sm_session_bucket = sm_session.default_bucket()
try:
    sm_role = sagemaker.get_execution_role()
except ValueError:
    iam_client = boto3.client("iam")
    sm_role = iam_client.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

In [None]:
# name of packaged model archive file
MODEL_ARCHIVE = "model_v2.tar.gz"

In [None]:
print(f"sagemaker role arn: {sm_role}")
print(f"sagemaker bucket: {sm_session_bucket}")
print(f"sagemaker session region: {sm_session.boto_region_name}")

### 2.2: Download the Model Artifacts from Hugging Face

Downloading the model artifacts from Hugging Face takes 6-7 minutes and requires approximately 34 GB of disk space. You will need a Hugging Face account and a personal access token.

Check that the `/dev/nvme1n1` volume, mounted at `/home/sagemaker-user`, has enough space.
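
The same check can be done from Python. The following is a minimal sketch using only the standard library; the `REQUIRED_GB` constant and the helper names are illustrative, with the threshold taken from the approximate 34 GB figure above.

```python
import shutil

REQUIRED_GB = 34  # approximate space needed for the model artifacts


def free_space_gb(path: str = ".") -> float:
    """Return free space, in GB (10^9 bytes), on the volume holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the volume holding `path` has at least `required_gb` GB free."""
    return free_space_gb(path) >= required_gb


print(f"free: {free_space_gb():.1f} GB, enough: {has_enough_space()}")
```

Run it from the directory where the repository will be cloned, since the check applies to the volume that holds the given path.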


In [None]:
%%sh

df -h $PWD

In [None]:
%%sh

git lfs install

Downloading the model weights from the Hugging Face repository requires a username and a personal access token.

You can create a read-only access token in your [Hugging Face profile settings](https://huggingface.co/settings/tokens).

### 403 Access Denied errors

If you encounter errors during cloning, make sure your username and access token are correct and that you have accepted the Terms & Conditions of the Stable Video Diffusion model. Visit the [model card](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1) and accept the terms to get access.

In [None]:
%%time
%%sh

user_name="<YOUR_HUGGINGFACE_USERNAME>"
access_token="<YOUR_HUGGING_FACE_ACCESS_TOKEN>"

git clone "https://${user_name}:${access_token}@huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1.git"
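
To avoid hard-coding the access token in the notebook, one option is to read the credentials from environment variables and build the clone URL from them. The sketch below assumes two environment variable names, `HF_USERNAME` and `HF_TOKEN`, chosen for this example; the helper `build_clone_url` is hypothetical and not part of any library.

```python
import os
from urllib.parse import quote


def build_clone_url(user_name: str, access_token: str, repo_id: str) -> str:
    """Build an HTTPS clone URL with percent-encoded credentials."""
    # quote(..., safe="") also encodes characters such as "@" and "/"
    return (
        f"https://{quote(user_name, safe='')}:{quote(access_token, safe='')}"
        f"@huggingface.co/{repo_id}.git"
    )


# HF_USERNAME and HF_TOKEN are assumed names for this sketch
user_name = os.environ.get("HF_USERNAME", "")
access_token = os.environ.get("HF_TOKEN", "")

if user_name and access_token:
    clone_url = build_clone_url(
        user_name, access_token, "stabilityai/stable-video-diffusion-img2vid-xt-1-1"
    )
    # pass clone_url to `git clone`; avoid printing it, since it contains the token
else:
    print("Set HF_USERNAME and HF_TOKEN before cloning.")
```

Percent-encoding matters when a username or token contains characters that would otherwise break the URL.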

### 2.3-2.4: Add Inference Script and TAR GZIP Model Artifacts

Important: The final model archive file will be 14-15 GB and **takes 20-30 minutes** to package and compress.

To monitor progress, poll the size of the model archive file every 15 seconds from your terminal:

```sh
while sleep 15; do ls -la model_v2.tar.gz; done
```
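
A rough Python equivalent of the polling loop is sketched below, for cases where a terminal is not convenient. It must run in a separate process or notebook, because the packaging cell blocks the kernel while `tar` runs. The function name and the `max_polls` limit are assumptions added for this example.

```python
import os
import time


def watch_file_size(path: str, interval_seconds: float = 15, max_polls: int = 120):
    """Yield the file's size in bytes every `interval_seconds`, up to `max_polls` times."""
    for _ in range(max_polls):
        time.sleep(interval_seconds)
        # the archive may not exist yet when polling starts
        yield os.path.getsize(path) if os.path.exists(path) else 0


# Example usage (blocks until max_polls is reached or the loop is interrupted):
# for size in watch_file_size("model_v2.tar.gz"):
#     print(f"{size / 1e9:.2f} GB")
```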


In [None]:
%%time

import shutil

model_repo = "stable-video-diffusion-img2vid-xt-1-1/"

# copy custom inference script and requirements.txt to model repo
shutil.copy("inference/inference.py", model_repo)
shutil.copy("inference/requirements.txt", model_repo)

# use CLI tools to create model archive (faster than Python-based tar'ing)
! cd {model_repo} && tar --verbose --exclude='.[^/]*' -c --gzip --file ../{MODEL_ARCHIVE} .
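
To illustrate what the `tar` command above does, here is a minimal sketch using Python's `tarfile` module: files are placed at the archive root, and hidden entries such as `.git` are excluded. It is slower than the CLI for a multi-gigabyte model, so treat it as a portable illustration rather than a replacement. The names `skip_hidden` and `create_archive` are hypothetical.

```python
import os
import tarfile


def skip_hidden(member: tarfile.TarInfo):
    """Drop any entry with a path component that starts with a dot (e.g. .git)."""
    parts = [p for p in member.name.split("/") if p not in ("", ".")]
    return None if any(p.startswith(".") for p in parts) else member


def create_archive(source_dir: str, archive_path: str) -> list:
    """Gzip `source_dir` into `archive_path` and return the sorted member names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." places the contents at the archive root
        tar.add(source_dir, arcname=".", filter=skip_hidden)
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Listing the member names after packaging is a quick way to confirm that `inference.py` and `requirements.txt` sit at the archive root alongside the model files.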

### 2.2-2.4: Alternate Method if Model Already Exists in S3

If the model archive file already exists in S3, skip steps 2.2-2.4 above. Instead, create an Amazon S3 presigned URL and use it to download the model archive. This replaces the steps above: downloading the model artifacts and creating the TAR GZIP archive. The download takes 4-7 minutes within the same AWS Region.


In [None]:
%%time

import os
from pywget import wget

presigned_s3_url = "<YOUR_PRESIGNED_URL_GOES_HERE>"

wget.download(presigned_s3_url, MODEL_ARCHIVE)
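
If you have access to the bucket that holds the archive, a presigned URL can be generated with boto3's `generate_presigned_url`. The sketch below wraps that call in a hypothetical helper, `make_presigned_url`; the bucket and key are placeholders, and the one-hour expiry is an assumed default.

```python
def make_presigned_url(s3_client, bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited GET URL for s3://bucket/key."""
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL, in seconds
    )


# Example usage (requires boto3 and AWS credentials):
# import boto3
# url = make_presigned_url(boto3.client("s3"), "<YOUR_BUCKET>", "<YOUR_MODEL_KEY>")
```

The expiry should be long enough to cover the 4-7 minute download.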

### 2.5: Copy Model Artifacts to S3

Copying the model archive file, which is approximately 14 GB, to Amazon S3 takes 2-3 minutes within the same AWS Region.


In [None]:
%%time

import boto3

s3_client = boto3.client("s3")

# upload_file returns None and raises an exception on failure
s3_client.upload_file(
    MODEL_ARCHIVE,
    sm_session_bucket,
    f"async_inference/model/{MODEL_ARCHIVE}",
)
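
After a multi-gigabyte upload, it can be worth confirming that the object in S3 matches the local archive. The following sketch compares sizes using the S3 `head_object` call; the helper name `upload_is_complete` is hypothetical, and a size match is only a basic sanity check, not a checksum verification.

```python
import os


def upload_is_complete(s3_client, bucket: str, key: str, local_path: str) -> bool:
    """Compare the S3 object's size with the local file's size."""
    remote_size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return remote_size == os.path.getsize(local_path)


# Example usage with the variables defined in this notebook:
# upload_is_complete(
#     s3_client, sm_session_bucket, f"async_inference/model/{MODEL_ARCHIVE}", MODEL_ARCHIVE
# )
```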

## 3: Deploy Model to Amazon SageMaker Endpoint


### 3.1: Deploy Model to Amazon SageMaker Endpoint

Deploying the Amazon SageMaker Asynchronous Inference Endpoint takes 5-7 minutes.

In [None]:
env = {
    "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
    "TS_MAX_RESPONSE_SIZE": "1000000000",
    "TS_MAX_REQUEST_SIZE": "1000000000",
    "MMS_MAX_RESPONSE_SIZE": "1000000000",
    "MMS_MAX_REQUEST_SIZE": "1000000000",
}

huggingface_model = HuggingFaceModel(
    model_data=s3_path_join(
        "s3://", sm_session_bucket, f"async_inference/model/{MODEL_ARCHIVE}"
    ),
    transformers_version="4.37.0",
    pytorch_version="2.1.0",
    py_version="py310",
    env=env,
    role=sm_role,
)

# where the response payload or error will be stored
async_config = AsyncInferenceConfig(
    output_path=s3_path_join("s3://", sm_session_bucket, "async_inference/output"),
    failure_path=s3_path_join(
        "s3://", sm_session_bucket, "async_inference/output_errors"
    ),
)

In [None]:
%%time

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    async_inference_config=async_config,
)

In [None]:
endpoint_name = predictor.endpoint_name
print(endpoint_name)

### 3.2: Optional: Set Endpoint Name Manually

If the model was previously deployed to an endpoint, then uncomment and set the `endpoint_name` variable manually.


In [None]:
# endpoint_name = "<YOUR_MODEL_ENDPOINT_NAME>"
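
When reusing a previously deployed endpoint, it helps to confirm that it is still in service before sending requests. The sketch below uses the SageMaker `describe_endpoint` API through a boto3 client; the helper names are hypothetical.

```python
def endpoint_status(sm_client, endpoint_name: str) -> str:
    """Return the endpoint's current status, e.g. 'Creating' or 'InService'."""
    response = sm_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


def is_in_service(sm_client, endpoint_name: str) -> bool:
    """True if the endpoint is ready to accept inference requests."""
    return endpoint_status(sm_client, endpoint_name) == "InService"


# Example usage (requires boto3 and AWS credentials):
# import boto3
# print(endpoint_status(boto3.client("sagemaker"), endpoint_name))
```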