# Notebook Requirements

This notebook requires:
* at least an `ml.m5.2xlarge` instance
* at least 80 GiB of storage

Otherwise, the download and packaging of the SVD model assets might fail. If the storage sizes reported below do not match your system, verify that it still meets these requirements.

## 1: Install Required Packages


In [None]:
%%sh

# optional: update system packages in Amazon SageMaker Studio Ubuntu environment
sudo bash -c 'export DEBIAN_FRONTEND=noninteractive && apt-get -qq update -y && apt-get -qq upgrade -y'

# install system packages
sudo bash -c 'export DEBIAN_FRONTEND=noninteractive && apt-get -qq install -y git git-lfs libgl1 ffmpeg wget pigz pv'

In [None]:
%%sh

# install new Python packages
pip install -Uq sagemaker boto3 botocore ffmpeg-python ipython diffusers pywget opencv-python

In [None]:
# restart Python kernel after installing packages

import os
os._exit(0)

## 2: Prepare the SVD-XT Model for Inference

Preparing the model for inference takes four steps: (1) download the model artifacts from Hugging Face, (2) add the custom inference script, (3) create an archive file from the model artifacts, and (4) upload the archive file to Amazon S3 for deployment.

Alternatively, if the model archive is already available in Amazon S3, skip steps 2.2-2.4 and see **2.2-2.4: Alternate Method if Model Already Exists in S3**, below.


### 2.1: Import Packages and Set SageMaker Variables

In [None]:
import os
import json
import shutil

import boto3
from botocore.exceptions import ClientError

import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3 import s3_path_join


MODEL_REPO_PATH = "stable-video-diffusion-img2vid-xt-1-1/"
MODEL_ARCHIVE = "model_v2.tar.gz"

In [None]:
sm_session_bucket = None

sm_session = sagemaker.Session()

if sm_session_bucket is None and sm_session is not None:
    # set to default bucket if a bucket name is not given
    sm_session_bucket = sm_session.default_bucket()
try:
    sm_role = sagemaker.get_execution_role()
except ValueError:
    iam_client = boto3.client("iam")
    sm_role = iam_client.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

In [None]:
print(f"sagemaker role arn: {sm_role}")
print(f"sagemaker bucket: {sm_session_bucket}")
print(f"sagemaker session region: {sm_session.boto_region_name}")

### 2.2: Download the Model Artifacts from Hugging Face

Downloading the model artifacts from Hugging Face takes 6-7 minutes and requires approximately 34 GB of disk space. You will need a Hugging Face account to create a personal access token.

Check that the `/dev/nvme1n1` volume, mounted at `/home/sagemaker-user`, has enough free space.


In [None]:
%%sh

df -h $PWD
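
The same check can be done from Python, which makes it easy to compare the free space against a threshold. The following is a minimal sketch: the 50 GiB threshold is an assumption (roughly the 34 GB download plus the approximately 14 GiB archive, rounded up), and the helper names are hypothetical.

```python
import shutil

# Assumed threshold: ~34 GB of model artifacts plus a ~14 GiB archive, rounded up.
REQUIRED_FREE_GIB = 50


def free_space_gib(path="."):
    """Return the free space, in GiB, on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path=".", required_gib=REQUIRED_FREE_GIB):
    """Return True if the volume holding `path` has at least `required_gib` free."""
    return free_space_gib(path) >= required_gib


free_gib = free_space_gib(".")
print(f"Free space: {free_gib:.1f} GiB (need about {REQUIRED_FREE_GIB} GiB)")
if not has_enough_space("."):
    print("Warning: the download and packaging steps may run out of disk space.")
```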

In [None]:
%%sh

git lfs install

Downloading the model weights from the Hugging Face repository requires a username and a personal access token.

You can create a read-only access token in your [Hugging Face profile settings](https://huggingface.co/settings/tokens).

### 403 Access Denied errors

If cloning fails with a 403 error, check that your username and access token are correct and that you have accepted the Terms & Conditions of the Stable Video Diffusion model. Visit the [model card](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1) and accept the terms to get access.
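
Credentials embedded in a clone URL can also break the URL if they happen to contain reserved characters such as `@` or `/`. The sketch below percent-encodes both values before building the URL. The `build_clone_url` helper is hypothetical, and whether your username or token needs encoding at all is an assumption to check against your own credentials.

```python
from urllib.parse import quote

HF_REPO = "huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1.git"


def build_clone_url(user_name, access_token, repo=HF_REPO):
    """Build an HTTPS clone URL with percent-encoded credentials."""
    # safe="" also encodes "/" and "@", which would otherwise split the URL
    # in the wrong place.
    user = quote(user_name, safe="")
    token = quote(access_token, safe="")
    return f"https://{user}:{token}@{repo}"


# Placeholder credentials, for illustration only.
print(build_clone_url("me@example.com", "hf_abc/123"))
```

Avoid printing the URL with a real token, because notebook output is saved together with the notebook.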

In [None]:
%%time

user_name = "<YOUR_HUGGINGFACE_USERNAME>"
access_token = "<YOUR_HUGGING_FACE_ACCESS_TOKEN>"

# use CLI tool to clone the repo with working credentials
! git clone "https://{user_name}:{access_token}@huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1.git" {MODEL_REPO_PATH}

### 2.3-2.4: Add the Inference Script and Package the Model Artifacts

The final model archive file will be **approx. 14 GiB** and takes **about 10 minutes on an ml.m5.2xlarge instance** to package and compress.

In [None]:
%%time

import shutil

# copy custom inference script and requirements.txt to model repo
shutil.copy("inference/inference.py", MODEL_REPO_PATH)
shutil.copy("inference/requirements.txt", MODEL_REPO_PATH)

# use CLI tools to create model archive (faster than Python-based tar'ing)
! cd {MODEL_REPO_PATH} && tar --verbose --use-compress-program="pigz --best --recursive" --exclude='.[^/]*' -c . | pv --timer --bytes > ../{MODEL_ARCHIVE}
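
Before uploading, it can be worth confirming that the custom inference script and `requirements.txt` actually ended up in the archive. The following is a minimal sketch using the standard-library `tarfile` module. The `missing_from_archive` helper is hypothetical, and listing a compressed archive of this size means reading it end to end, so expect the check to take a few minutes.

```python
import os
import tarfile

REQUIRED_FILES = {"inference.py", "requirements.txt"}
ARCHIVE = "model_v2.tar.gz"  # same file name as MODEL_ARCHIVE in this notebook


def missing_from_archive(archive_path, required=REQUIRED_FILES):
    """Return the set of required top-level files that are absent from the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # `tar -c .` stores members as "./name", so strip the leading "./".
        names = {
            name[2:] if name.startswith("./") else name for name in tar.getnames()
        }
    return set(required) - names


# Only run the check if the archive has already been created.
if os.path.exists(ARCHIVE):
    missing = missing_from_archive(ARCHIVE)
    print("Archive looks complete." if not missing else f"Missing: {sorted(missing)}")
```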

### 2.2-2.4: Alternate Method if Model Already Exists in S3

If the model archive file already exists in S3, skip steps 2.2-2.4 above. Instead, create an Amazon S3 presigned URL and use it to download the model archive. This replaces downloading the model artifacts, adding the inference script, and creating the compressed archive. The download takes 4-7 minutes within the same AWS Region.


In [None]:
%%time

import os
from pywget import wget

# presigned_s3_url = "<YOUR_PRESIGNED_URL_GOES_HERE>"
# wget.download(presigned_s3_url, MODEL_ARCHIVE)
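
A presigned URL can be created with the boto3 S3 client's `generate_presigned_url` method. The sketch below wraps that call in a hypothetical `make_presigned_url` helper. The bucket name and object key in the usage comment are placeholders, and the one-hour expiry is an assumption, so adjust both to your setup.

```python
def make_presigned_url(s3_client, bucket, key, expires_in=3600):
    """Return a time-limited GET URL for the object s3://bucket/key."""
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )


# Example usage (bucket and key are placeholders):
# import boto3
# presigned_s3_url = make_presigned_url(
#     boto3.client("s3"), "<YOUR_BUCKET>", "<YOUR_KEY>/model_v2.tar.gz"
# )
```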

### 2.5: Upload Model Archive to S3

Copying the model archive file to Amazon S3 takes 2-3 minutes within the same AWS Region.


In [None]:
%%time

import boto3
from ipywidgets import IntProgress
from IPython.display import display

f = IntProgress(min=0, max=os.path.getsize(MODEL_ARCHIVE), description="Uploading:")
display(f)
def progress_update(bytes_amount):
    f.value += bytes_amount

print(f"Uploading model archive {MODEL_ARCHIVE} to S3 bucket {sm_session_bucket}...")

s3_client = boto3.client("s3")
# upload_file returns None and raises an exception on failure,
# so there is no response object to print
s3_client.upload_file(
    Filename=MODEL_ARCHIVE,
    Bucket=sm_session_bucket,
    Key=f"async_inference/model/{MODEL_ARCHIVE}",
    Callback=progress_update,
)
print("Upload completed.")
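
As an optional sanity check, the size of the uploaded object can be compared with the local file using the S3 client's `head_object` call. This is a minimal sketch. The `uploaded_size_matches` helper is hypothetical, and a matching size is only a rough indication that the upload is complete, not a content checksum.

```python
import os


def uploaded_size_matches(s3_client, local_path, bucket, key):
    """Return True if the S3 object has the same size in bytes as the local file."""
    local_size = os.path.getsize(local_path)
    remote_size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    return local_size == remote_size


# Example usage with the variables defined earlier in this notebook:
# ok = uploaded_size_matches(
#     s3_client, MODEL_ARCHIVE, sm_session_bucket,
#     f"async_inference/model/{MODEL_ARCHIVE}",
# )
# print("Sizes match." if ok else "Size mismatch: consider uploading again.")
```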

## 3: Deploy Model to Amazon SageMaker Endpoint

Deploying the Amazon SageMaker Asynchronous Inference Endpoint takes 5-7 minutes.

In [None]:
%%time

env = {
    "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
    "TS_MAX_RESPONSE_SIZE": "1000000000",
    "TS_MAX_REQUEST_SIZE": "1000000000",
    "MMS_MAX_RESPONSE_SIZE": "1000000000",
    "MMS_MAX_REQUEST_SIZE": "1000000000",
}

huggingface_model = HuggingFaceModel(
    model_data=s3_path_join(
        "s3://", sm_session_bucket, f"async_inference/model/{MODEL_ARCHIVE}"
    ),
    transformers_version="4.37.0",
    pytorch_version="2.1.0",
    py_version="py310",
    env=env,
    role=sm_role,
)

# where the response payload or error will be stored
async_config = AsyncInferenceConfig(
    output_path=s3_path_join("s3://", sm_session_bucket, "async_inference/output"),
    failure_path=s3_path_join(
        "s3://", sm_session_bucket, "async_inference/output_errors"
    ),
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    async_inference_config=async_config,
)

with open("deployed_endpoint_name.txt", "w") as f:
    f.write(predictor.endpoint_name)

print("")
print(f"Deployed endpoint name: {predictor.endpoint_name}")
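
The endpoint name saved to `deployed_endpoint_name.txt` can be read back later, for example after a kernel restart, to look up the endpoint's status with the SageMaker `describe_endpoint` API. The following is a minimal sketch, and both helper names are hypothetical.

```python
def read_endpoint_name(path="deployed_endpoint_name.txt"):
    """Read the endpoint name that the deployment cell wrote to disk."""
    with open(path) as name_file:
        return name_file.read().strip()


def endpoint_status(sagemaker_client, endpoint_name):
    """Return the endpoint's status string, such as "Creating" or "InService"."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


# Example usage:
# import boto3
# name = read_endpoint_name()
# print(f"{name}: {endpoint_status(boto3.client('sagemaker'), name)}")
```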