# Visual Tourism: ML Application Design on AWS with Hugging Face

The application, called **Visual Tourism**, aims to be an LLM travel agent for users seeking advice when visiting a new place. The user takes a picture of a landmark and uploads it to AWS, where an inference runs and returns information about the landmark and advice on where to eat.

We start by deploying the model to a real-time inference endpoint. We then add autoscaling to the endpoint, and finally test it by uploading a picture of the Eiffel Tower to the dedicated S3 bucket.


1. [Deploy the Model](#deploy-the-model)
   
2. [Add Autoscaling](#add-autoscling)
   
3. [Testing the endpoint](#test-using-the-boto3-sdk)
   
4. [Clean up](#clean-up)

In [2]:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.huggingface import get_huggingface_llm_image_uri
import sagemaker
import boto3
import json
import time

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


In [3]:
sess = sagemaker.Session()

from_sagemaker_notebook = True
if from_sagemaker_notebook:
    # The execution role is available only when running a notebook within SageMaker.
    try:
        role = sagemaker.get_execution_role()
    except ValueError:
        iam = boto3.client('iam')
        role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

else:
    # Outside a SageMaker notebook instance, use the ARN of a SageMaker execution role.
    # The role must be created beforehand. AWS account IDs have 12 digits (placeholder below).
    ACCOUNT_ID = "123456789012"
    role = f"arn:aws:iam::{ACCOUNT_ID}:role/service-role/AmazonSageMaker-ExecutionRole-Example"

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")


sagemaker role arn: arn:aws:iam::491809630699:role/service-role/AmazonSageMaker-ExecutionRole-20240303T091759
sagemaker session region: eu-west-3
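IAM role ARNs follow the layout `arn:<partition>:iam::<account-id>:role/<path>/<name>`: the region field is empty, so two colons come before the 12-digit account ID and one after it. The helper below is a minimal sketch for building and sanity-checking such an ARN outside a SageMaker notebook. The name `build_role_arn` and its defaults are assumptions made for illustration and are not part of the SageMaker SDK.

```python
import re


def build_role_arn(account_id, role_name, path="/service-role/", partition="aws"):
    """Build an IAM role ARN, validating the pieces that are easy to get wrong.

    Hypothetical helper for illustration; not part of boto3 or the SageMaker SDK.
    """
    # AWS account IDs are exactly 12 digits.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"account_id must be 12 digits, got {account_id!r}")
    # IAM paths start and end with a slash ("/" alone means no path).
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError(f"path must start and end with '/', got {path!r}")
    # The region field of an IAM ARN is empty, hence the double colon.
    return f"arn:{partition}:iam::{account_id}:role{path}{role_name}"
```

For example, `build_role_arn("123456789012", "AmazonSageMaker-ExecutionRole-Example")` yields an ARN in the same shape as the one printed by the cell above.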


## Deploy the model

We deploy the model following the instructions in the Idefics model card, but we select the ml.g4dn.12xlarge instance, which is sufficient to host the model for this demo.

In [4]:
# Hub model configuration
hub = {
    'HF_MODEL_ID': 'HuggingFaceM4/idefics-9b-instruct',
    'SM_NUM_GPUS': json.dumps(4)  # 4 GPUs are available on the ml.g4dn.12xlarge instance
}

# This name is used for the SageMaker model; the endpoint name is derived from it at deploy time
endpoint_name = "Idefics"
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.1.0")

# Create the Hugging Face model object
huggingface_model = HuggingFaceModel(
    name=endpoint_name,           # model name
    env=hub,                      # configuration for loading the model from the Hub
    role=role,                    # IAM role with permissions to create an endpoint
    image_uri=image_uri,          # Hugging Face LLM container image
)



In [5]:
# Deploy the model to a real-time endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.12xlarge",
    container_startup_health_check_timeout=900,  # seconds allowed for the container to load the model
)

Using already existing model: Idefics


-----------!

In [6]:
print(predictor.endpoint_name)

Idefics-2024-03-07-14-27-03-927
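Once the endpoint is in service, it can also be queried directly. The sketch below builds a request payload in the prompt format we understand the Idefics instruct models to expect when served by the Hugging Face LLM container: a `User:` turn with the image given as a Markdown image link, closed by `<end_of_utterance>`. The function name, the default parameter values and the example URL are assumptions for illustration; the exact format should be checked against the model card.

```python
def build_idefics_payload(image_url, question, max_new_tokens=256):
    """Return a JSON-serializable request body for an Idefics text-generation endpoint.

    Hypothetical helper; the prompt layout is an assumption based on the
    Idefics instruct prompt format and should be verified on the model card.
    """
    # The image is referenced inline with Markdown image syntax.
    prompt = f"User:{question}![]({image_url})<end_of_utterance>\nAssistant:"
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # Stop when the model ends its turn or starts a new user turn.
            "stop": ["<end_of_utterance>", "\nUser:"],
        },
    }
```

Under these assumptions, a call such as `predictor.predict(build_idefics_payload(url, "Where can I eat near this landmark?"))` would send the request to the endpoint deployed above.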


<a id="add-autoscling"></a>
## Add autoscaling

To cope with varying traffic, we can attach an autoscaling policy to the endpoint, so that more instances are created when, for example, the number of invocations increases or GPU utilization is high. The endpoint then scales out and in automatically based on the metric we choose, up to the maximum number of instances set by the MaxCapacity parameter.

In [7]:
# Application Auto Scaling client 
autoscale = boto3.client('application-autoscaling') 

# The resource ID is the unique identifier for the endpoint
# There is only one variant for this endpoint that receives 'AllTraffic'
resource_id=f"endpoint/{predictor.endpoint_name}/variant/AllTraffic"


# The scalable target is the SageMaker endpoint
# We can scale out the instance count up to 5 instances
# But note that you may be limited by quotas.
response = autoscale.register_scalable_target(
    ServiceNamespace='sagemaker', 
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=5
)

# We can choose among a variety of scaling policies
# For this example we use the InvocationsPerInstance metric
response = autoscale.put_scaling_policy(
    PolicyName='InvocationsPerInstance-Idefics',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 5.0,  # target average number of invocations per instance per minute
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        }
    }
)
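To get a feel for what a target of 5 invocations per instance means, the sketch below estimates how many instances the policy would aim for at a given request rate. It is a deliberately simplified model: the real service acts on CloudWatch alarms, applies cooldown periods and scales in more conservatively than it scales out. The function name and its defaults are assumptions that mirror the values used in the cell above.

```python
import math


def desired_instances(invocations_per_minute, target_per_instance=5.0,
                      min_capacity=1, max_capacity=5):
    """Estimate the instance count a target-tracking policy would converge to.

    Simplified illustration only; it ignores alarms, cooldowns and metric delays.
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    # Enough instances so that each one handles at most the target rate.
    needed = math.ceil(invocations_per_minute / target_per_instance)
    # Application Auto Scaling never goes outside the registered capacity bounds.
    return max(min_capacity, min(max_capacity, needed))
```

With these defaults, 12 invocations per minute would call for 3 instances, while 100 invocations per minute would be capped at the MaxCapacity of 5.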

## Test using the boto3 sdk

We can use the Python SDK (boto3) to upload a picture of the Eiffel Tower to the S3 bucket. The answer from Idefics is then downloaded from the bucket's output directory.

In [9]:
# Upload a test image to the S3 bucket
s3 = boto3.client('s3')
bucket = "YOUR_BUCKET_NAME"
key = "input/"
image_name = "tour_eiffel.jpeg"
image_origin = "images/" + image_name

s3.upload_file(image_origin, bucket, key + image_name)

# Wait a few seconds for the result to be written to the output prefix
time.sleep(3)

# Download the output from the S3 bucket
key = "output/"
tourism_indications = "result_" + image_name.split(".")[0] + ".txt"
s3.download_file(bucket, key + tourism_indications, tourism_indications)

# Read the output
with open(tourism_indications, 'r') as file:
    data = file.read().replace('.', '.\n')
    print(data)

The Eiffel Tower is a famous landmark in Paris, France.
 It is a symbol of the city and a popular tourist attraction.
 The best place to eat near this location is the Michelin-starred Le Jules Verne restaurant.
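A fixed three-second pause may be too short when the endpoint is busy or scaling out. A more robust variant, sketched below, derives the expected output key and then blocks on the S3 `object_exists` waiter until the result file appears. The helper names, the delay and the attempt count are assumptions chosen for illustration; only the `result_<name>.txt` naming scheme is taken from the cell above.

```python
import os


def result_key_for(image_name, prefix="output/"):
    """Return the S3 key where the result for `image_name` is expected."""
    # splitext keeps dots inside the file name, unlike split(".")[0].
    stem, _ext = os.path.splitext(os.path.basename(image_name))
    return f"{prefix}result_{stem}.txt"


def wait_for_result(s3_client, bucket, key, delay_s=5, max_attempts=24):
    """Block until `key` exists in `bucket`, polling with the S3 waiter.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The waiter raises an error if the object does not appear in time.
    """
    waiter = s3_client.get_waiter("object_exists")
    waiter.wait(
        Bucket=bucket,
        Key=key,
        WaiterConfig={"Delay": delay_s, "MaxAttempts": max_attempts},
    )
    return key
```

With the names from the cell above, `wait_for_result(s3, bucket, result_key_for(image_name))` could stand in for the fixed sleep before the download step.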



## Clean up

Don't forget to delete the endpoint once testing is over, to avoid high costs.

In [10]:
predictor.delete_model()
predictor.delete_endpoint()
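The autoscaling configuration registered earlier is a separate Application Auto Scaling resource and is worth removing as well, ideally before the endpoint itself is deleted. The sketch below assumes the single AllTraffic variant used in this notebook; `remove_autoscaling` is a hypothetical helper name. To our understanding, deregistering the scalable target also deletes the scaling policies attached to it.

```python
def remove_autoscaling(autoscale_client, endpoint_name, variant_name="AllTraffic"):
    """Deregister the endpoint variant from Application Auto Scaling.

    `autoscale_client` is expected to be boto3.client("application-autoscaling").
    Returns the resource ID that was deregistered.
    """
    # Same resource ID format as the one used when registering the target.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    autoscale_client.deregister_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    )
    return resource_id
```

In this notebook the call would be `remove_autoscaling(autoscale, predictor.endpoint_name)`.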