# Generative Fill example on Amazon SageMaker using a DLC container

In this notebook, we explore how to build a generative fill application and host Stable Diffusion, ControlNet, and Segment Anything models on a SageMaker asynchronous endpoint using BYOC (bring your own container).

Under the hood, we use stable-diffusion-webui and its extensions to generate images.

Note - Amazon Web Services has no control or authority over the third-party generative AI service referenced in this Workshop, and does not make any representations or warranties that the third-party generative AI service is secure, virus-free, operational, or compatible with your production environment and standards. You are responsible for making your own independent assessment of the content provided in this Workshop, and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the third-party generative AI service referenced in this Workshop. The content of this Workshop: (a) is for informational purposes only, (b) represents current Amazon Web Services product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from Beijing Sinnet Technology Co., Ltd. (“Sinnet”), Ningxia Western Cloud Data Technology Co., Ltd. (“NWCD”), Amazon Connect Technology Services (Beijing) Co., Ltd. (“Amazon”), or their respective affiliates, suppliers or licensors.  Amazon Web Services’ content, products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied.  The responsibilities and liabilities of Sinnet, NWCD or Amazon to their respective customers are controlled by the applicable customer agreements. 

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

---

## Build Docker image and push to ECR.

Initialize variables for the SageMaker default bucket, the execution role, the AWS account ID, and the current AWS region.

In [1]:
import sagemaker
import boto3 

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
role="AmazonSageMaker-ExecutionRole-20220920T203057"

account_id = boto3.client("sts").get_caller_identity().get("Account")
region_name = boto3.session.Session().region_name
inference_image="sd3-compyui-notebook-7-12"

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


In [4]:
%%sh -s "$region_name" "$role" "$bucket" "$account_id" "$inference_image"
region=$1
account=$4
inference_image=$5
echo $region   $account
aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $account.dkr.ecr.$region.amazonaws.com.cn

inference_fullname=$account.dkr.ecr.$region.amazonaws.com.cn/$inference_image:latest
echo $inference_fullname

# If the repository doesn't exist in ECR, create it.
if ! aws ecr describe-repositories --repository-names "${inference_image}" --region "${region}"
then
    aws ecr create-repository --repository-name "${inference_image}" --region "${region}"
    echo "Created new ECR repository ${inference_image}"
fi

AWSchinaAccount="727897471807"

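# Log in to the registry that hosts the base image, passing the password on stdin
# so that it does not appear in the process list or shell history
aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $AWSchinaAccount.dkr.ecr.$region.amazonaws.com.cn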

aws ecr set-repository-policy \
    --repository-name "${inference_image}" \
    --policy-text "file://ecr-policy.json" \
    --region ${region}

docker build -t ${inference_image} -f Dockerfile.inference . --build-arg REGION=$region


docker tag ${inference_image} ${inference_fullname}

docker push ${inference_fullname}

cn-north-1 415056049790


https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded
415056049790.dkr.ecr.cn-north-1.amazonaws.com.cn/sd3-compyui-notebook-7-11:latest
{
    "repositories": [
        {
            "repositoryArn": "arn:aws-cn:ecr:cn-north-1:415056049790:repository/sd3-compyui-notebook-7-11",
            "registryId": "415056049790",
            "repositoryName": "sd3-compyui-notebook-7-11",
            "repositoryUri": "415056049790.dkr.ecr.cn-north-1.amazonaws.com.cn/sd3-compyui-notebook-7-11",
            "createdAt": 1720698326.499,
            "imageTagMutability": "MUTABLE",
            "imageScanningConfiguration": {
                "scanOnPush": false
            },
            "encryptionConfiguration": {
                "encryptionType": "AES256"
            }
        }
    ]
}


https://docs.docker.com/engine/reference/commandline/login/#credentials-store



Login Succeeded
{
    "registryId": "415056049790",
    "repositoryName": "sd3-compyui-notebook-7-11",
    "policyText": "{\n  \"Version\" : \"2008-10-17\",\n  \"Statement\" : [ {\n    \"Sid\" : \"new statement\",\n    \"Effect\" : \"Allow\",\n    \"Principal\" : \"*\",\n    \"Action\" : [ \"ecr: CompleteLayerUpload\", \"ecr: InitiateLayerUpload\", \"ecr: ListImages\", \"ecr:BatchCheckLayerAvailability\", \"ecr:BatchGetImage\", \"ecr:DescribeImages\", \"ecr:DescribeRepositories\", \"ecr:GetDownloadUrlForLayer\" ]\n  } ]\n}"
}


#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.inference
#1 transferring dockerfile: 590B done
#1 DONE 0.0s

#2 [auth] sharing credentials for 727897471807.dkr.ecr.cn-north-1.amazonaws.com.cn
#2 DONE 0.0s

#3 [internal] load metadata for 727897471807.dkr.ecr.cn-north-1.amazonaws.com.cn/pytorch-inference:2.3.0-gpu-py311
#3 DONE 0.1s

#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s

#5 [ 1/10] FROM 727897471807.dkr.ecr.cn-north-1.amazonaws.com.cn/pytorch-inference:2.3.0-gpu-py311@sha256:1a5b621b7e15af17a989af9628ceef820984b2a589dd442a654098564fd4b5f2
#5 DONE 0.0s

#6 [ 3/10] WORKDIR /opt/ml/code
#6 CACHED

#7 [ 2/10] RUN mkdir -p /opt/ml/code
#7 CACHED

#8 [ 4/10] RUN git clone https://github.com/comfyanonymous/ComfyUI.git /opt/ml/code
#8 CACHED

#9 [internal] load build context
#9 transferring context: 374B done
#9 DONE 0.0s

#10 [ 5/10] RUN pip install -r requirements.txt
#10 0.786 Collecting t

The push refers to repository [415056049790.dkr.ecr.cn-north-1.amazonaws.com.cn/sd3-compyui-notebook-7-11]
4f57158c2ca4: Preparing
35d882d91a5a: Preparing
7989612c6aed: Preparing
5f70bf18a086: Preparing
5f70bf18a086: Preparing
fef069097b59: Preparing
ae856c1a8554: Preparing
5f70bf18a086: Preparing
b353e8f30fed: Preparing
02428eed6535: Preparing
4a0a5c128293: Preparing
3e6a356c8e45: Preparing
484aaa05a493: Preparing
48786340ab74: Preparing
5b0bdb8eec88: Preparing
348f0dd546ad: Preparing
b89286a316e3: Preparing
87253ce52fc2: Preparing
dc1df578e095: Preparing
a976e35b0162: Preparing
9020defa345e: Preparing
2243997e691d: Preparing
51e59867d98a: Preparing
8b942a46ff73: Preparing
2316d1d13f80: Preparing
23aa9e3f79b2: Preparing
1de5b0cf4a95: Preparing
2b4ab9c66c33: Preparing
4c4b5ef162c6: Preparing
def14ac3e467: Preparing
ab8967843163: Preparing
46d81ba1b88f: Preparing
cca198a3aa21: Preparing
6e2c6cec7c79: Preparing
1165bb2e119f: Preparing
ef177fb935b3: Preparing
5cb9ea7abb90: Preparing
5f0f0

Upload a dummy archive to S3 to satisfy the SageMaker endpoint's requirement for model data.

In [5]:
model_data = "s3://{0}/SD3-ComfyUI-fake/data/model.tar.gz".format(bucket)
!touch dummy
!tar czvf model.tar.gz dummy
!rm dummy
!aws s3 cp model.tar.gz $model_data

dummy
upload: ./model.tar.gz to s3://sagemaker-cn-north-1-415056049790/SD3-ComfyUI-fake/data/model.tar.gz
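
Where the `tar` shell command is not available, the same placeholder archive could be built with Python's standard `tarfile` module. This is only a minimal sketch: the helper name `build_dummy_archive` is hypothetical and not part of the original notebook, and the upload itself is still left to `aws s3 cp` as above.

```python
import io
import tarfile


def build_dummy_archive(member_name="dummy"):
    """Return the bytes of a gzip-compressed tar holding one empty file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = 0  # empty placeholder, equivalent to `touch dummy`
        archive.addfile(info, io.BytesIO(b""))
    return buffer.getvalue()


# Re-open the archive in memory to confirm what it contains
archive_bytes = build_dummy_archive()
with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
    print(archive.getnames())
```

The returned bytes can be written to `model.tar.gz` before uploading.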


## Deploy to SageMaker Asynchronous Endpoint

Initialize the variable for the URI of the inference Docker image.

In [6]:
model_name = None
image_uri = "{0}.dkr.ecr.{1}.amazonaws.com.cn/{2}:latest".format(
    account_id, region_name, inference_image
)
print(image_uri)

415056049790.dkr.ecr.cn-north-1.amazonaws.com.cn/sd3-compyui-notebook-7-11:latest
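
The URI above hard-codes the `amazonaws.com.cn` suffix used by the AWS China regions. If the notebook were run in another partition, the suffix could be derived from the region name. The helper `ecr_image_uri` below is a hypothetical illustration, not part of the original notebook, and it only distinguishes China regions from the standard `amazonaws.com` suffix.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build an ECR image URI, choosing the DNS suffix from the region name."""
    # China regions (cn-north-1, cn-northwest-1) use the amazonaws.com.cn suffix
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{suffix}/{repository}:{tag}"


print(ecr_image_uri("415056049790", "cn-north-1", "sd3-compyui-notebook-7-11"))
```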


Define the model configuration so that the models can be downloaded from one of the following sources: HTTP, S3, or Hugging Face. Note: as an example, the LoRA model 2bNierAutomataLora_v2b.safetensors and the ControlNet model control_sd15_canny.pth are downloaded directly from Civitai and Hugging Face once the SageMaker endpoint is created.

In [9]:
import json

huggingface_models = [
    {
    }
]

model_environment = {
}

Define the model, instance type, and initial instance count for the SageMaker endpoint.

In [10]:
from sagemaker.model import Model
from sagemaker.predictor import Predictor

model = Model(
    name=model_name,
    model_data=model_data,
    role=role,
    image_uri=image_uri,
    env=model_environment,
    predictor_cls=Predictor,
)

instance_type = "ml.g5.xlarge"
instance_count = 1

Define the SageMaker asynchronous inference configuration.

In [11]:
from sagemaker.async_inference import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://{0}/{1}/asyncinvoke/out/".format(bucket, "sd3-comfyui")
)

We use asynchronous inference here because it is better suited to workloads with large payloads and long processing times.

In [None]:
predictor = model.deploy(
    instance_type=instance_type,
    initial_instance_count=instance_count,
    container_startup_health_check_timeout=1800,
    async_inference_config=async_config,
)

-----------------------

## Generate initial image using text prompt

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

inputs = {
    "task": "text-to-image",
    "txt2img_payload": {
        "prompt": "((Best quality)), ((masterpiece)), ((realistic)), (detailed), cute panda ((standing in a asian garden with cherry trees)) ((masterpiece)), absurdres, HDR",
        "negative_prompt": "(bad quality)",
        "seed": 2816240246,
        "sampler_name": "Euler a",
        "batch_size": 1,
        "n_iter": 1,
        "steps": 20,
        "cfg_scale": 7,
        "width": 512,
        "height": 768,
        "alwayson_scripts": {},
    },
}

prediction = predictor.predict_async(inputs)
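
For readers calling the endpoint without the SageMaker Python SDK, the request above roughly corresponds to uploading the payload to S3 and calling `invoke_endpoint_async` on the `sagemaker-runtime` client. The following is a sketch under stated assumptions: the function name `invoke_async` and the `asyncinvoke/in` key prefix are hypothetical, and the clients are passed in so the function can be exercised without AWS access.

```python
import json
import uuid


def invoke_async(s3_client, runtime_client, endpoint_name, bucket, payload,
                 prefix="asyncinvoke/in"):
    """Upload a JSON payload to S3 and submit it to an asynchronous endpoint."""
    # Asynchronous endpoints read their input from S3, not from the request body
    key = f"{prefix}/{uuid.uuid4()}.json"
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=json.dumps(payload).encode("utf-8")
    )
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=f"s3://{bucket}/{key}",
        ContentType="application/json",
    )
    # The call returns immediately; the result is written later to this S3 URI
    return response["OutputLocation"]
```

In the notebook this might be called as `invoke_async(boto3.client("s3"), boto3.client("sagemaker-runtime"), predictor.endpoint_name, bucket, inputs)`.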

Helper function that splits an S3 URI into its bucket and key.

In [None]:
import json
import io

s3_resource = boto3.resource("s3")


def get_bucket_and_key(s3uri):
    pos = s3uri.find("/", 5)
    bucket = s3uri[5:pos]
    key = s3uri[pos + 1 :]
    return bucket, key
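
The helper above relies on fixed string offsets. An alternative sketch, assuming the input is always of the form `s3://bucket/key`, uses `urllib.parse` from the standard library and rejects anything else. The name `split_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(s3uri):
    """Split 's3://bucket/key' into (bucket, key), validating the scheme."""
    parsed = urlparse(s3uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3uri!r}")
    # The path keeps its leading slash, which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://my-bucket/sd3-comfyui/asyncinvoke/out/result.out"))
```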

Wait for the asynchronous inference to finish.

In [None]:
from sagemaker.async_inference.waiter_config import WaiterConfig

print(f"Response object: {prediction}")
print(f"Response output path: {prediction.output_path}")
print("Start Polling to get response:")

import time

start = time.time()

config = WaiterConfig(
    max_attempts=100,  # number of attempts
    delay=10,  # time in seconds to wait between attempts
)

prediction.get_result(config)

print(f"Time taken: {time.time() - start}s")
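
With `max_attempts=100` and `delay=10`, the waiter above gives up after roughly 1,000 seconds of polling. The idea can be illustrated with a generic polling loop. This is not the SDK's implementation; `poll_until` and its `check` callback are hypothetical names used only for this sketch.

```python
import time


def poll_until(check, max_attempts=100, delay=10, sleep=time.sleep):
    """Call check() until it returns a non-None value or attempts run out.

    Returns a (result, attempts_used) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        result = check()
        if result is not None:
            return result, attempt
        sleep(delay)  # wait before asking again
    raise TimeoutError(f"No result after {max_attempts} attempts")


# Simulated job that becomes ready on the third check; sleeping is stubbed out
answers = iter([None, None, "done"])
print(poll_until(lambda: next(answers), sleep=lambda seconds: None))
```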

Process the generated images from asynchronous inference result.

In [None]:
import traceback
from PIL import Image
import uuid
from io import BytesIO
from datetime import datetime
import base64

try:
    output_bucket, output_key = get_bucket_and_key(prediction.output_path)
    output_obj = s3_resource.Object(output_bucket, output_key)
    body = output_obj.get()["Body"].read().decode("utf-8")
    image_object = json.loads(body)["images"][0]
    image = Image.open(BytesIO(base64.b64decode(image_object)))
    image.show()
    initial_image_filename = datetime.now().strftime(f"%Y%m%d%H%M%S-{uuid.uuid4()}.png")
    image.save(initial_image_filename)
except Exception as e:
    traceback.print_exc()
    print(e)
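
The cell above reads only the first entry of the `images` list. When `batch_size` or `n_iter` is greater than 1, every entry could be decoded instead. This sketch assumes the response body has the same shape as above (a JSON object whose `images` key holds base64 strings); `decode_images` is a hypothetical helper that returns raw bytes so that it does not depend on PIL.

```python
import base64
import json


def decode_images(body):
    """Return the raw bytes of every base64-encoded image in a response body."""
    payload = json.loads(body)
    return [base64.b64decode(item) for item in payload.get("images", [])]
```

Each element can then be passed to `Image.open(BytesIO(data))` as in the cell above.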

## Expand initial image using text prompt and ControlNet models

ControlNet is a neural network structure to control diffusion models by adding extra conditions.

In [None]:
from PIL import Image
import base64
import io


def encode_image_to_base64(image):
    # Accept either a PIL image or a dict that wraps one under the "image" key
    if isinstance(image, dict):
        image = image["image"]
    # Keep the alpha channel as PNG; encode everything else as JPEG
    image_format = "PNG" if image.mode == "RGBA" else "JPEG"
    with io.BytesIO() as output_bytes:
        image.save(output_bytes, format=image_format)
        bytes_data = output_bytes.getvalue()

    base64_str = base64.b64encode(bytes_data).decode("utf-8")
    mimetype = "image/jpeg" if image_format == "JPEG" else "image/png"
    # Return a data URI of the form "data:<mimetype>;base64,<data>"
    return "data:" + mimetype + ";base64," + base64_str


def decode_base64_to_image(encoding):
    if encoding.startswith("data:image/"):
        encoding = encoding.split(";")[1].split(",")[1]
    try:
        image = Image.open(io.BytesIO(base64.b64decode(encoding)))
        return image
    except Exception as e:
        print(e)
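
The two helpers above wrap image bytes in a data URI and unwrap them again. The round trip can be illustrated on raw bytes with the standard library alone. `to_data_uri` and `from_data_uri` are hypothetical names for this sketch and are not used elsewhere in the notebook.

```python
import base64


def to_data_uri(data, mimetype="image/png"):
    """Wrap raw bytes in a 'data:<mimetype>;base64,<data>' string."""
    return f"data:{mimetype};base64," + base64.b64encode(data).decode("utf-8")


def from_data_uri(encoding):
    """Return the raw bytes from a data URI or a bare base64 string."""
    # Drop the "data:<mimetype>;base64," header when present
    if encoding.startswith("data:"):
        encoding = encoding.split(",", 1)[1]
    return base64.b64decode(encoding)


png_signature = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of any PNG file
uri = to_data_uri(png_signature)
print(uri)
print(from_data_uri(uri) == png_signature)
```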

Define the payload for SageMaker inference.

In [None]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

inputs = {
    "task": "image-to-image",
    "img2img_payload": {
        "prompt": "((Best quality)), ((masterpiece)), ((realistic)), (detailed), cute panda ((standing in a asian garden with cherry trees)) ((masterpiece)), absurdres, HDR",
        "negative_prompt": "(bad quality)",
        "init_images": [encode_image_to_base64(image)],
        "mask": None,
        "steps": 20,
        "sampler_name": "Euler a",
        "batch_size": 1,
        "n_iter": 1,
        "cfg_scale": 7,
        "denoising_strength": 0.8,
        "seed": 2866147124,
        "height": 768,
        "width": 1280,
        "resize_mode": 0,
        "include_init_images": False,
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    {
                        "enabled": True,
                        "module": "inpaint_only+lama",
                        "model": "control_v11p_sd15_inpaint [ebff9138]",
                        "image": encode_image_to_base64(image),
                        "resize_mode": "Resize and Fill",
                        "low_vram": False,
                        "weight": 1,
                        "guidance_start": 0,
                        "guidance_end": 1,
                        "pixel_perfect": False,
                        "control_mode": "ControlNet is more important",
                    }
                ]
            }
        },
    },
}

prediction = predictor.predict_async(inputs)
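
The payload above asks for a 1280x768 result from the 512x768 initial image, so the extra width has to be generated. As an illustration only, and assuming that "Resize and Fill" keeps the aspect ratio and centers the original on the new canvas (the extension's actual behavior may differ), the placement can be computed as follows. `fit_and_center` is a hypothetical helper.

```python
def fit_and_center(src_size, target_size):
    """Scale src to fit inside target (keeping aspect ratio) and center it.

    Returns ((new_width, new_height), (left_offset, top_offset)).
    """
    src_w, src_h = src_size
    tgt_w, tgt_h = target_size
    scale = min(tgt_w / src_w, tgt_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return (new_w, new_h), ((tgt_w - new_w) // 2, (tgt_h - new_h) // 2)


# 512x768 initial image placed on the 1280x768 canvas requested above
print(fit_and_center((512, 768), (1280, 768)))
```

Under these assumptions the original image keeps its size and 384 pixels on each side are left for the model to fill.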

Wait for the asynchronous inference to finish.

In [None]:
from sagemaker.async_inference.waiter_config import WaiterConfig

print(f"Response object: {prediction}")
print(f"Response output path: {prediction.output_path}")
print("Start Polling to get response:")

import time

start = time.time()

config = WaiterConfig(
    max_attempts=100,  # number of attempts
    delay=10,  # time in seconds to wait between attempts
)

prediction.get_result(config)

print(f"Time taken: {time.time() - start}s")

Process the generated images from asynchronous inference result.

In [None]:
import traceback
from PIL import Image
import uuid
from io import BytesIO
from datetime import datetime
import base64

try:
    output_bucket, output_key = get_bucket_and_key(prediction.output_path)
    output_obj = s3_resource.Object(output_bucket, output_key)
    body = output_obj.get()["Body"].read().decode("utf-8")
    image_object = json.loads(body)["images"][0]
    image2 = Image.open(BytesIO(base64.b64decode(image_object)))
    image2.show()
    image2.save(datetime.now().strftime(f"%Y%m%d%H%M%S-{uuid.uuid4()}.png"))
except Exception as e:
    traceback.print_exc()
    print(e)

## Run generative fill application built with Gradio framework

In [None]:
endpoint_name = predictor.endpoint_name

In [None]:
!git clone https://github.com/xieyongliang/generative-fill-webui.git

In [None]:
!cd ./generative-fill-webui && export sagemaker_endpoint=$endpoint_name && pip install -r requirements.txt && python ui.py

## [Optional] Create auto-scaling group for SageMaker endpoint in case you want to scale it based on specific metrics automatically.

In [None]:
def create_autoscaling_group_for_sagemaker_endpoint(
    endpoint_name, min_capacity=1, max_capacity=2, target_value=5
):
    # application-autoscaling client
    asg_client = boto3.client("application-autoscaling")

    # This is the format in which application autoscaling references the endpoint
    resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"

    # Register the endpoint variant as a scalable target
    # (with the defaults above it scales between one and two instances)
    response = asg_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )

    response = asg_client.put_scaling_policy(
        PolicyName=f"Request-ScalingPolicy-{endpoint_name}",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_value,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,  # seconds after a scale-in activity before another scale-in can start
            "ScaleOutCooldown": 300,  # seconds after a scale-out activity before another scale-out can start
        },
    )


create_autoscaling_group_for_sagemaker_endpoint(predictor.endpoint_name)
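
As a rough mental model of the policy above, target tracking tries to keep `ApproximateBacklogSizePerInstance` near `target_value` while staying within the registered capacity limits. The arithmetic below is a simplification for intuition only. It is not the algorithm Application Auto Scaling uses, and `desired_instances` is a hypothetical helper.

```python
import math


def desired_instances(backlog_size, target_value=5, min_capacity=1, max_capacity=2):
    """Estimate how many instances keep the per-instance backlog at the target."""
    wanted = math.ceil(backlog_size / target_value)
    # Clamp to the capacity range registered for the scalable target
    return max(min_capacity, min(max_capacity, wanted))


for backlog in (0, 3, 8, 12):
    print(backlog, desired_instances(backlog))
```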

## Resource cleanup.

In [None]:
predictor.delete_endpoint()
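
If the optional auto scaling step was run, the scalable target registered there can also be removed, ideally before the endpoint itself is deleted. A minimal sketch, assuming the same `AllTraffic` variant name as above: `remove_autoscaling` is a hypothetical helper, and the client is passed in so that the function can be tested without AWS access.

```python
def remove_autoscaling(asg_client, endpoint_name, variant_name="AllTraffic"):
    """Deregister the endpoint variant from Application Auto Scaling."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    asg_client.deregister_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    )
    return resource_id
```

In the notebook this might be called as `remove_autoscaling(boto3.client("application-autoscaling"), endpoint_name)`.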

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.


![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/inference|generativeai|llm-workshop|lab12-hosting-controlnet-models-on-sagemaker|stable-diffusion-webui-async-inference-sagemaker-notebook.ipynb)
