# Building a Personalized Avatar with Generative AI Using Amazon SageMaker

Generative AI has become a popular tool for enhancing and accelerating the creative process across various industries, including entertainment, advertising, and art. It enables more personalized experiences for audiences and improves the overall quality of the final products. 

In this workshop, you will fine-tune a Stable Diffusion (SD) model to build a personalized avatar generator on Amazon SageMaker, and reduce inference cost at the same time by hosting it on a Multi-Model Endpoint (MME).

The entire example takes about 1 hour to complete. At the end, you will have a simple [Gradio](https://www.gradio.app/) application for experimenting with different prompts and generating avatar images of yourself.


We recommend using the `Data Science 3.0` kernel in SageMaker Studio with an `ml.m5.large` instance.

---


1. [Set Up](#1.-Set-Up)
2. [Prepare Image Data](#2.-Prepare-Image-Data)
3. [Run LoRA Finetuning](#3.-Run-LoRA-Finetuning)
4. [Host Multi-Model Endpoints](#4.-Host-Multi-Model-Endpoints)
5. [Invoke Model](#5.-Invoke-the-utility-model)
6. [Run The Gradio App](#6.-Run-The-Gradio-App)
7. [Clean Up](#7.-Clean-Up)

### 1. Set Up

Install the dependencies required to package the model and test the fine-tuned model.

In [None]:
!pip install -Uq diffusers
!pip install -Uq peft
!pip install -Uq conda-pack
!pip install -Uq gradio

#### Permissions and environment variables

***
To use Amazon SageMaker, you need to set up and authenticate the use of AWS services. Here, you use the execution role associated with the current notebook as the AWS account role with SageMaker access.
***

In [None]:
import sagemaker
from sagemaker.utils import name_from_base
from sagemaker.experiments.run import Run
from sagemaker.huggingface import HuggingFace
from sagemaker.model import Model
import boto3
from pathlib import Path
import time
from io import BytesIO
import os
import tarfile
import base64
from PIL import Image
import json

In [None]:
role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
bucket = sess.default_bucket()  # bucket to house artifacts
s3_prefix = (
    "stable-diffusion-dreambooth-workshop"  # folder within bucket where code artifact will go
)

region = sess.boto_region_name  # public property instead of the private _region_name attribute
account_id = sess.account_id()

### 2. Prepare Image Data

**<span style="color:red">IMPORTANT</span>: upload at least 10 images of yourself to the `data` folder.** The images need to capture clearly how you look from multiple perspectives: include a front-facing photo, a profile shot from each side, and photos from angles in between. You should also include photos with different facial expressions, such as smiling, frowning, and neutral. A mix of expressions allows the model to better reproduce your unique facial features.
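Before uploading, a quick local check can confirm the minimum image count. The sketch below is a hypothetical helper, not part of the workshop code; it assumes your photos are `.jpg`, `.jpeg`, `.png`, or `.webp` files placed directly in `data`, and it does not count files in subfolders even though the upload cell uses `--recursive`.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever formats you upload.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def count_training_images(folder: str = "data") -> int:
    """Count image files directly inside `folder` (case-insensitive suffix match)."""
    path = Path(folder)
    if not path.is_dir():
        return 0
    return sum(
        1 for p in path.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_training_images(folder: str = "data", minimum: int = 10) -> int:
    """Raise if fewer than `minimum` images are present; otherwise return the count."""
    n = count_training_images(folder)
    if n < minimum:
        raise ValueError(f"Found {n} images in '{folder}', need at least {minimum}.")
    return n
```

Run `check_training_images("data")` in a cell before the upload; it raises a `ValueError` if fewer than 10 images are found.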

---

Next, upload these images to an S3 location for training.

In [None]:
local_path = "data"
s3_path = f"s3://{bucket}/{s3_prefix}/images"

print(f"your images are uploaded here: {s3_path}\n\n")
print("----------------------\n")
!aws s3 cp {local_path} {s3_path} --recursive

### 3. Run LoRA Finetuning

You are going to use the SageMaker Hugging Face estimator to fine-tune a personalized avatar model. It runs training in a managed container with the Hugging Face Transformers libraries preinstalled, which makes it easy to train transformer-based models. (Training takes about 26 minutes.)

The training scripts are in the `src` folder. Here is an overview of what each one does:

```
|-- src                      Training code directory for the fine-tuning job
|   |-- launch.py            Entry script for the training job
|   |-- requirements.txt     Python modules to extend the container
|   |-- trainer.py           LoRA fine-tuning script
|   |-- train_dreambooth.py  DreamBooth script
|   |-- utils.py             Utility functions
|   |-- sd_lora              A Triton Python backend model template directory for LoRA fine-tuned Stable Diffusion models
|       |-- 1
|       |   |-- model.py
|       |-- config.pbtxt
```
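`trainer.py` is described as a LoRA fine-tuning script, and the `lora_*` hyperparameters in the next cell control the adapter. As a rough sketch, assuming the script follows the standard LoRA formulation (as implemented in `peft`, which scales the update by `lora_alpha / r`), each adapted layer keeps its pretrained weight `W` frozen and learns a low-rank update `B @ A`. The NumPy code below illustrates that idea with arbitrary illustrative layer sizes; it is not the workshop's training code.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, r):
    """Frozen weight W (d_out x d_in) plus a low-rank update B @ A scaled by alpha / r."""
    scaling = alpha / r
    return x @ W.T + scaling * (x @ A.T) @ B.T


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


rng = np.random.default_rng(0)
d_in, d_out = 1024, 1024  # illustrative layer size, not taken from SD 2.1
r, alpha = 128, 1         # mirror lora_r / lora_alpha in the hyperparameters cell
W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # small random init
B = np.zeros((d_out, r))                     # zero init: adapted layer starts equal to the base layer
x = rng.standard_normal((4, d_in))           # a batch of 4 input vectors

assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)
print(lora_param_count(d_in, d_out, r), "trainable vs", d_in * d_out, "frozen")
```

With the workshop's `lora_r=128` and `lora_alpha=1`, this formulation would scale the update by 1/128. Because `B` starts at zero, training begins from the unmodified base model, and in this formulation only `A` and `B`, which hold far fewer parameters than `W`, are updated.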

In [None]:
hyperparameters = {
    "input_data": "/opt/ml/input/data/training",
    "resolution": 512,
    "num_steps": 1000,
    "concept_prompt": "photo of <<TOK>>",  # instance prompt (the fine-tuned subject)
    "class_prompt": "a photo of person",  # class prompt (the generic class of the subject)
    "lr": 1e-4,
    "grad_accum": 1,
    "train_text_encoder": True,  # to disable, comment this line out; setting it to False does not disable it
    "prior_preservation": True,  # to disable, comment this line out; setting it to False does not disable it
    "prior_loss_weight": 1.0,
    "num_class_images": 50,
    "lora_r": 128,
    "lora_alpha": 1,
    "lora_dropout": 0.05,
    "lora_text_encoder_r": 64,
    "lora_text_encoder_alpha": 1,
    "lora_text_encoder_dropout": 0.05,
    "face_preprocessing": True,
}

# Define the configuration for the training job
estimator = HuggingFace(
    entry_point="launch.py",
    source_dir="src",
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    instance_count=1,
    instance_type="ml.g5.xlarge",
    hyperparameters=hyperparameters,
    keep_alive_period_in_seconds=600,
)
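One plausible reason for the "comment it out instead of setting it to False" advice: SageMaker script mode passes hyperparameters to the entry point as command-line arguments, so they arrive as strings. `launch.py` is not shown here, so this cause is an assumption, but a common pitfall is parsing such flags with `type=bool`. The hypothetical `str2bool` converter below shows the pitfall and one way to avoid it with `argparse`.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common true/false spellings; unlike bool(), "False" maps to False."""
    lowered = str(value).strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
# Pitfall: type=bool turns any non-empty string, including "False", into True.
parser.add_argument("--naive_flag", type=bool, default=False)
# Safer: an explicit string-to-bool converter.
parser.add_argument("--train_text_encoder", type=str2bool, default=False)

args = parser.parse_args(["--naive_flag", "False", "--train_text_encoder", "False"])
print(args.naive_flag, args.train_text_encoder)  # prints: True False
```

If you control the training script, a converter like this lets `False` work as expected; otherwise, follow the comment in the hyperparameters cell and remove the line.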

In [None]:
%%time
experiment_name = "picture-book-stable-diffusion"
run_name = name_from_base("lora-with-dreambooth")
with Run(experiment_name=experiment_name, sagemaker_session=sess, run_name=run_name) as run:

    # Start the training job
    data = {'training': s3_path}

    estimator.fit(data)  # blocks until the training job completes

In [None]:
initial_model_path = estimator.model_data

%store initial_model_path

print(f"model.tar.gz is located here: {initial_model_path}")

### 4. Host Multi-Model Endpoints

You are going to use SageMaker MME to host your personalized avatar models. With MME, you define a central S3 location for your models, and SageMaker takes care of loading and caching them dynamically based on your traffic patterns. This hosting approach optimizes resource utilization, saves costs, and minimizes the operational burden of managing thousands of endpoints.


At the time of writing, MME with GPU supports NVIDIA Triton Inference Server as the model server. For more details on MME with Triton, refer to this [blog](https://aws.amazon.com/blogs/machine-learning/run-multiple-deep-learning-models-on-gpu-with-amazon-sagemaker-multi-model-endpoints/).

At a high level, serving models with Triton requires a specific model package format. When using the Python backend, a Triton config file and a Python script are required. The Python script must be named `model.py`, and the final `model.tar.gz` file should have a file structure similar to the following:

```
|--sd_lora
   |--config.pbtxt        # Triton server configuration
   |--1
      |--model.py         # inference handler script
    ...
```
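A minimal sketch of producing that layout with Python's `tarfile` module follows. `package_triton_model` is a hypothetical helper, not part of the workshop; it checks for the two files the Python backend needs and archives the directory so that the model folder (for example `sd_lora`) sits at the top level of the archive.

```python
import tarfile
from pathlib import Path


def package_triton_model(model_dir: str, output_path: str) -> list:
    """Tar `model_dir` so the archive's top level is the model directory itself.

    Expects the Triton Python backend layout:
        <model_dir>/config.pbtxt
        <model_dir>/1/model.py
    Returns the sorted member names of the written archive.
    """
    root = Path(model_dir)
    required = [root / "config.pbtxt", root / "1" / "model.py"]
    missing = [str(p) for p in required if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing required Triton files: {missing}")
    # arcname keeps paths relative to the model folder (e.g. "sd_lora/1/model.py").
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(str(root), arcname=root.name)
    with tarfile.open(output_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Write the archive outside the model directory, otherwise `tar.add` could pick up the partially written archive itself.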

In [None]:
# define the central MME S3 bucket location
mme_prefix = f"{s3_prefix}/inference/models"

model_data_url = f"s3://{bucket}/{mme_prefix}/"

print(f"MME S3 bucket: {model_data_url}")

SD can be difficult to load dynamically because of its large size. A full SD model is about 6 GB on disk, which makes the initial load slow: a cold start can take well over 60 seconds to download and unpack the model files. To optimize this, you decouple the base SD model from its fine-tuned LoRA weights. The diagram below illustrates a design in which the base SD model and its conda environment are shared from a central location.

<img src="statics/mme_diagram.png">

When a model is loaded from S3 for the first time, your `model.tar.gz` contains only the LoRA weights (68 MB), compared to the entire SD model (~6 GB).

**Note: this only shares model storage. In GPU memory, a full SD model still needs to be loaded for every model instance.**
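To put the sizes above in perspective, here is a back-of-the-envelope comparison of artifact storage for N personalized models, using the approximate 6 GB and 68 MB figures from the text. It deliberately ignores the conda environment and, as the note says, tells you nothing about GPU memory.

```python
FULL_SD_MB = 6 * 1024  # ~6 GB full Stable Diffusion model (approximate figure from the text)
LORA_MB = 68           # LoRA-only model.tar.gz (approximate figure from the text)


def storage_mb(num_models: int, shared_base: bool) -> int:
    """Approximate artifact storage for `num_models` personalized models, in MB."""
    if shared_base:
        return FULL_SD_MB + num_models * LORA_MB  # one shared base plus small adapters
    return num_models * FULL_SD_MB  # every model ships its own full copy


for n in (3, 100):
    print(n, storage_mb(n, shared_base=False), storage_mb(n, shared_base=True))
```

For 3 models this is roughly 18,432 MB versus 6,348 MB; for 100 models, roughly 614,400 MB versus 12,944 MB.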

#### Prepare the utility model

To accomplish this without building a custom container, you will build a utility model that preloads the shared resources onto the instance. A template of this utility model is located in the `models/model_setup` folder.

The utility model does two things when invoked:

1. Loads a conda pack from S3. This contains all the modules required to run the model.
2. Loads the base SD model from S3.

#### 1. Prepare the conda pack

When using the Triton Python backend (which our Stable Diffusion model will run on), you can include your own environment and dependencies. The recommended way to do this is to use [conda pack](https://conda.github.io/conda-pack/) to generate a conda environment archive in `tar.gz` format, and point to it in the `config.pbtxt` file of the models that should use it, adding the snippet: 

```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "path_to_your_env.tar.gz"}
}

```
You can use a different environment per model, or the same one for all models (read more on this [here](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments)). Since all of the models that we'll be deploying have the same set of environment requirements, you will create a single conda environment and use a Python backend model to copy that environment into a location where all models can access it.
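If you generate or patch `config.pbtxt` files programmatically, the `EXECUTION_ENV_PATH` snippet can be rendered from Python. The helpers below are a hypothetical sketch; the environment path you pass in must match wherever your setup actually places the archive (in this workshop that location is decided by the utility model's `model.py`, which is not shown here, so the path in the example is only a placeholder).

```python
def execution_env_parameter(env_path: str) -> str:
    """Render the EXECUTION_ENV_PATH parameter block for a Triton config.pbtxt."""
    return (
        "parameters: {\n"
        '  key: "EXECUTION_ENV_PATH",\n'
        f'  value: {{string_value: "{env_path}"}}\n'
        "}\n"
    )


def add_execution_env(config_text: str, env_path: str) -> str:
    """Append the block unless the config already sets EXECUTION_ENV_PATH."""
    if "EXECUTION_ENV_PATH" in config_text:
        return config_text
    return config_text.rstrip("\n") + "\n\n" + execution_env_parameter(env_path)


# Placeholder path for illustration only.
print(add_execution_env('name: "sd_lora"\nbackend: "python"\n', "/tmp/conda/sd_env.tar.gz"))
```

Checking for an existing key keeps the helper idempotent, so re-running a packaging cell does not add a duplicate parameter block.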

> ⚠ **Warning**: The approach for creating a shared conda environment highlighted here is limited to single-instance deployments. In the event of auto-scaling, there is no guarantee that a new instance will have the conda environment configured. Since the conda environment for hosting Stable Diffusion models is quite large, the recommended approach for production deployments is to create the shared environment by extending the Triton Inference Server image.

Let's start by creating the conda environment with the necessary dependencies; running these cells will output a `sd_env.tar.gz` file.

In [None]:
%%writefile environment.yml
name: mme_env
dependencies:
  - python=3.8
  - pip
  - pip:
      - numpy
      - --extra-index-url https://download.pytorch.org/whl/cu118
      - torch
      - accelerate
      - transformers
      - diffusers
      - xformers
      - peft
      - conda-pack

In [None]:
!conda env create -f environment.yml

The cell above creates the environment from the YAML spec; this can take up to 5 minutes.

Next, package the environment with conda-pack. The packaged conda environment will be stored in the `models/model_setup/` directory.

In [None]:
!conda pack -n mme_env -o models/model_setup/sd_env.tar.gz

#### 2. Prepare the Stable Diffusion base model

In [None]:
import diffusers
import torch

# Download the SD 2.1 base model with half-precision (fp16) weights.
pipe = diffusers.StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    cache_dir="hf_cache",
    torch_dtype=torch.float16,
    revision="fp16",
)

# Save the base model locally; the utility model shares it with all LoRA models at inference time.
sd_dir = "stable_diff"
pipe.save_pretrained(sd_dir)

Archive the saved model into the `models/model_setup/` directory:

In [None]:
sd_tar = f"models/model_setup/{sd_dir}.tar.gz"

s3_client = boto3.client("s3")

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
        
    print(f"SD base model created here: {output_filename}")

make_tarfile(sd_tar, sd_dir)

Upload the utility model to the S3 bucket (this may take a few minutes).

In [None]:
!rm -rf `find -type d -name .ipynb_checkpoints`

In [None]:
model_repo = "models"

model_name = "model_setup"
tar_name = f"{model_name}.tar.gz"
!tar -C $model_repo -czvf $tar_name $model_name
sess.upload_data(path=tar_name, bucket=bucket, key_prefix=mme_prefix)
!rm $tar_name

#### Deploy endpoint
Next, get the correct URI for the SageMaker Triton container image. Check out all the available Deep Learning Container images that AWS maintains [here](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).

In [None]:
# account mapping for SageMaker Triton Image
account_id_map = {
    "us-east-1": "785573368785",
    "us-east-2": "007439368137",
    "us-west-1": "710691900526",
    "us-west-2": "301217895009",
    "eu-west-1": "802834080501",
    "eu-west-2": "205493899709",
    "eu-west-3": "254080097072",
    "eu-north-1": "601324751636",
    "eu-south-1": "966458181534",
    "eu-central-1": "746233611703",
    "ap-east-1": "110948597952",
    "ap-south-1": "763008648453",
    "ap-northeast-1": "941853720454",
    "ap-northeast-2": "151534178276",
    "ap-southeast-1": "324986816169",
    "ap-southeast-2": "355873309152",
    "cn-northwest-1": "474822919863",
    "cn-north-1": "472730292857",
    "sa-east-1": "756306329178",
    "ca-central-1": "464438896020",
    "me-south-1": "836785723513",
    "af-south-1": "774647643957",
}


region = boto3.Session().region_name
if region not in account_id_map:
    # Raising a bare string is a TypeError in Python 3; raise a real exception instead.
    raise ValueError(f"Unsupported region: {region}")

base = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
mme_triton_image_uri = (
    "{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:22.12-py3".format(
        account_id=account_id_map[region], region=region, base=base
    )
)

You are now ready to configure and deploy the multi-model endpoint.

In [None]:
sm_client = boto3.client(service_name="sagemaker")

container = {
    "Image": mme_triton_image_uri,
    "ModelDataUrl": model_data_url,     # S3 location of the models
    "Mode": "MultiModel",
}

In [None]:
sm_model_name = name_from_base(f"{mme_prefix.split('/')[0]}-models")

create_model_response = sm_client.create_model(
    ModelName=sm_model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

print("Model Arn: " + create_model_response["ModelArn"])

Create a SageMaker endpoint configuration.

In [None]:
endpoint_config_name = name_from_base(f"{mme_prefix.split('/')[0]}-epc")

instance_type = 'ml.g5.xlarge'

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": instance_type,
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": sm_model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])

Create the endpoint and wait for it to transition to the `InService` state (this takes about 5 minutes).

In [None]:
endpoint_name = name_from_base(f"{mme_prefix.split('/')[0]}-ep")

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)
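As an alternative to the manual polling loop above, boto3 ships a SageMaker waiter, `endpoint_in_service`, that polls `DescribeEndpoint` for you. The sketch below wraps it in a hypothetical `wait_for_endpoint` helper; the `delay` and `max_attempts` defaults are illustrative choices, not values from the workshop.

```python
def wait_for_endpoint(sm_client, endpoint_name: str, delay: int = 60, max_attempts: int = 30) -> str:
    """Block until the endpoint is InService using boto3's built-in waiter, then return its status."""
    waiter = sm_client.get_waiter("endpoint_in_service")
    # The waiter polls DescribeEndpoint every `delay` seconds and raises
    # botocore.exceptions.WaiterError if the endpoint fails or attempts run out.
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
    return sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```

Usage: `wait_for_endpoint(sm_client, endpoint_name)` right after `create_endpoint`. Unlike the `while status == "Creating"` loop, a failed deployment surfaces as an exception rather than a printed status.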

### 5. Invoke the utility model
Before invoking any of the personalized avatar models, you first invoke the utility model to load the conda environment and the Stable Diffusion base model. Refer to the [model.py](./models/model_setup/1/model.py) file in the `models/model_setup/1` directory for more details on the implementation.

In [None]:
# invoke the model_setup utility model to create the shared conda environment and load the base SD model
sm_runtime = boto3.client("sagemaker-runtime")

inputs = dict(input_args = "hello")

payload = {
    "inputs":
        [{"name": name, "shape": [1,1], "datatype": "BYTES", "data": [data]} for name, data in inputs.items()]
}

response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",
    Body=json.dumps(payload),
    TargetModel="model_setup.tar.gz",
)

output = json.loads(response["Body"].read().decode("utf8"))["outputs"]
output
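Every request in this notebook uses the same JSON body: Triton's KServe v2 inference protocol, with one `BYTES` tensor of shape `[1, 1]` per named input. A small hypothetical helper can build it once instead of repeating the list comprehension:

```python
import json


def make_triton_payload(inputs: dict) -> str:
    """Serialize string inputs as a KServe v2 / Triton JSON inference request body.

    Each input becomes one BYTES tensor of shape [1, 1], matching the payload
    dictionaries built by hand in this notebook.
    """
    payload = {
        "inputs": [
            {"name": name, "shape": [1, 1], "datatype": "BYTES", "data": [data]}
            for name, data in inputs.items()
        ]
    }
    return json.dumps(payload)
```

For example, the utility-model request above could pass `Body=make_triton_payload(dict(input_args="hello"))`, since the helper already returns the JSON string.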

#### Invoke the LoRA fine tuned model

To demonstrate the capability of MME, you will make multiple copies of your fine-tuned model and load each one into the endpoint.

In [None]:
models_loaded = []
model_count = 0

# create a dummy payload ====================
prompt = "<<TOK>>"
negative_prompt = ""

gen_args = json.dumps(dict(num_inference_steps=50, guidance_scale=7, seed=0))

inputs = dict(prompt = prompt,
              negative_prompt = negative_prompt,
              gen_args = gen_args)

payload = {
    "inputs":
        [{"name": name, "shape": [1,1], "datatype": "BYTES", "data": [data]} for name, data in inputs.items()]
}

# make replica and load the models to MME S3 location

for x in range(3):
    # make a copy of the model
    model_name = f"avatar-model-v{x}.tar.gz"
    !aws s3 cp {initial_model_path} {model_data_url}{model_name}
    
    # make an inference request to load the model into memory
    response = sm_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/octet-stream",
            Body=json.dumps(payload),
            TargetModel=model_name, 
        )
    
    models_loaded.append(model_name)
        
    model_count+=1

print(f"\n\nLoaded {model_count} personalized avatar models ")
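The loop above shells out to the AWS CLI for each copy. If you prefer to stay in Python, a hedged boto3 sketch follows; `split_s3_uri` and `copy_model_replica` are hypothetical helpers, and the sketch assumes the destination prefix ends with a `/`, as `model_data_url` does in this notebook.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def copy_model_replica(s3_client, source_uri: str, dest_prefix_uri: str, model_name: str) -> str:
    """Copy one model artifact into the MME prefix with boto3 instead of the AWS CLI."""
    src_bucket, src_key = split_s3_uri(source_uri)
    dst_bucket, dst_prefix = split_s3_uri(dest_prefix_uri)
    dst_key = f"{dst_prefix}{model_name}"
    # S3.Client.copy is boto3's managed copy; it switches to multipart for large objects.
    s3_client.copy({"Bucket": src_bucket, "Key": src_key}, dst_bucket, dst_key)
    return f"s3://{dst_bucket}/{dst_key}"
```

Inside the loop, `copy_model_replica(boto3.client("s3"), initial_model_path, model_data_url, model_name)` would replace the `!aws s3 cp` line.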

In [None]:
# helper function to decode base64-encoded images returned by the endpoint
def decode_image(img):
    buff = BytesIO(base64.b64decode(img.encode("utf8")))
    image = Image.open(buff)
    return image

In [None]:
import random

prompt = "<<TOK>> style illustration of a boy in a yard, cute, smiling, trees in the background, high Res, 2 dimension"

# prompt = """<<TOK>> epic portrait, zoomed out, blurred background cityscape, bokeh, perfect symmetry, by artgem, artstation ,concept art,cinematic lighting, highly detailed, 
# octane, concept art, sharp focus, rockstar games,
# post processing, picture of the day, ambient lighting, epic composition"""
negative_prompt = """
beard, goatee, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, 
watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile, 
unprofessional, failure, crayon, oil, label, thousand hands
"""

seed = random.randint(1, 1000000000)
gen_args = json.dumps(dict(num_inference_steps=50, guidance_scale=7, seed=seed))

inputs = dict(prompt = prompt,
              negative_prompt = negative_prompt,
              gen_args = gen_args)

payload = {
    "inputs":
        [{"name": name, "shape": [1,1], "datatype": "BYTES", "data": [data]} for name, data in inputs.items()]
}

In [None]:
%%time
response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/octet-stream",
    Body=json.dumps(payload),
    TargetModel=models_loaded[0],
)
output = json.loads(response["Body"].read().decode("utf8"))["outputs"]
original_image = decode_image(output[0]["data"][0])
original_image

### 6. Run The Gradio App

Gradio is an open-source Python library that allows developers to easily create and share custom web-based interfaces for their machine learning models, without requiring any web development skills. 

After you have installed Gradio, run the code below. The interactive UI renders directly in the output cell, where you can interact with your models and generate avatars. Have fun :)

---

**Example prompt:** front portrait, with glasses, zoomed out, young and handsome, perfectly centered, anime, cute-fine-face, illustration, realistic shaded perfect face, fine details, image premiere, 4k resolution, a masterpiece

In [None]:
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("# Personalized Avatar Generator")
    with gr.Row():
        with gr.Column(scale=1):

            models = gr.Dropdown(choices=models_loaded, type="value",
                                 info="Choose a model", show_label=False)

            prompt = gr.Textbox(show_label=False,
                                info="Prompt:",
                                placeholder="Enter a prompt for your avatar")
            nprompt = gr.Textbox(show_label=False,
                                 info="Negative prompt:",
                                 placeholder="""beard, goatee, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, 
watermark, grainy, signature, cut off, draft, amateur, multiple, gross, weird, uneven, furnishing, decorating, decoration, furniture, text, poor, low, basic, worst, juvenile, 
unprofessional, failure, crayon, oil, label, thousand hands
""")

            create = gr.Button(value="Create")
        with gr.Column(scale=1):
            output_img = gr.Image(label="Output Image", type="pil", height=400)


    def generate_avatar(model_name, p, np, inf_steps=50, scale=10):
        
        s = random.randint(1, 1000000000)
        
        gen_args = json.dumps(dict(num_inference_steps=inf_steps, guidance_scale=scale, seed=s))

        inputs = dict(prompt = f"<<TOK>>, {p}",
                      negative_prompt = np,
                      gen_args = gen_args)

        payload = {
            "inputs":
                [{"name": name, "shape": [1,1], "datatype": "BYTES", "data": [data]} for name, data in inputs.items()]
        }
        
        response = sm_runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/octet-stream",
            Body=json.dumps(payload),
            TargetModel=model_name,
        )
        output = json.loads(response["Body"].read().decode("utf8"))["outputs"]
        output_image = decode_image(output[0]["data"][0])
        
        return output_image

    create.click(generate_avatar, [models, prompt, nprompt], output_img)

demo.launch()

### 7. Clean Up

In [None]:
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_model(ModelName=sm_model_name)
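The cell above removes the SageMaker resources but leaves the S3 artifacts in place. A hedged sketch for deleting everything under a prefix with boto3 follows; `delete_s3_prefix` is a hypothetical helper. Deletion is irreversible, so double-check the prefix first.

```python
def delete_s3_prefix(s3_client, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix`; returns how many keys were requested for deletion."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A list_objects_v2 page holds at most 1,000 keys, which matches the
            # 1,000-key limit of a single DeleteObjects request.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

For example, `delete_s3_prefix(boto3.client("s3"), bucket, f"{s3_prefix}/")` would remove the uploaded photos and the MME model artifacts. The training output referenced by `initial_model_path` may live outside this prefix, so check that location and remove it separately if needed.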