# Hybrid Search with Amazon OpenSearch Service

**Welcome to Hybrid search notebook. Use this notebook to build a Hybrid Search application powered by Amazon OpenSearch Service**

In this notebook, you will perform the following steps in sequence,

The lab includes the following steps:
1. [Step 1: Get the Cloudformation outputs](#Step-1:-Get-the-Cloudformation-outputs)
2. [Step 2: Create the OpenSearch-Sagemaker ML connector](#Step-2:-Create-the-OpenSearch-Sagemaker-ML-connector)
3. [Step 3: Register and deploy the embedding model in OpenSearch](#Step-3:-Register-and-deploy-the-embedding-model-in-OpenSearch)
4. [Step 4: Create the OpenSearch ingest pipeline with text-embedding processor](#TODO-Step-4:-Create-the-OpenSearch-ingest-pipeline-with-text-embedding-processor)
5. [Step 5: Create the k-NN index](#Step-5:-Create-the-k-NN-index)
6. [Step 6: Prepare the image dataset](#Step-6:-Prepare-the-image-dataset)
7. [Step 7: Ingest the prepared data into OpenSearch](#Step-7:-Ingest-the-prepared-data-into-OpenSearch)
8. [Step 8: Update the environment variables of lambda](#Step-8:-Update-the-environment-variables-of-lambda)
9. [Step 9: Create the Lambda URL](#Step-9:-Create-the-Lambda-URL)
10. [Step 10: Host the Hybrid Search application in EC2](#Step-7:-Host-the-Hybrid-Search-application-in-EC2)

In [157]:
#Install dependencies
#Implement header-based authentication and request authentication for AWS services that support AWS auth v4
%pip install requests_aws4auth
#OpenSearch Python SDK
%pip install opensearch_py
#Progress bar for for loop
%pip install alive-progress

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Collecting alive-progress
  Downloading alive_progress-3.1.5-py3-none-any.whl.metadata (68 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m68.4/68.4 kB[0m [31m9.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting about-time==4.2.1 (from alive-progress)
  Downloading about_time-4.2.1-py3-none-any.whl (13 kB)
Collecting grapheme==0.6.0 (from alive-progress)
  Downloading grapheme-0.6.0.tar.gz (207 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m207.3/207.3 kB[0m [31m24.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25ldone
[?25hDownloading alive_progress-3.1.5-py3-none-any.whl (75 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m76.0/76.0 kB[0m [31m14.5 MB/s[0m eta [36m0:00:00[0m
[?25hBuilding wheels for collected packages: grapheme
  Building wheel for grapheme (set

## Step 1: Get the Cloudformation outputs

Here, we retrieve the services that are already deployed as a part of the cloudformation template to be used in building the application. The services include,
1. **Sagemaker Endpoint**
2. **OpenSearch Domain** Endpoint
3. **S3** Bucket name
4. **Lambda** Function name 

In [305]:
import sagemaker, boto3, json, time
from sagemaker.session import Session
import subprocess
from IPython.utils import io

cfn = boto3.client('cloudformation')

response = cfn.list_stacks(StackStatusFilter=['CREATE_COMPLETE','UPDATE_COMPLETE'])

for cfns in response['StackSummaries']:
    if('TemplateDescription' in cfns.keys()):
        if('hybrid search' in cfns['TemplateDescription']):
            stackname = cfns['StackName']
stackname

response = cfn.describe_stack_resources(
    StackName=stackname
)
# for resource in response['StackResources']:
#     if(resource['ResourceType'] == "AWS::SageMaker::Endpoint"):
#         SagemakerEmbeddingEndpoint = resource['PhysicalResourceId']

cfn_outputs = cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']

for output in cfn_outputs:
    if('OpenSearchDomainEndpoint' in output['OutputKey']):
        OpenSearchDomainEndpoint = output['OutputValue']
        
    if('EmbeddingEndpointName' in output['OutputKey']):
        SagemakerEmbeddingEndpoint = output['OutputValue']
        
    if('s3' in output['OutputKey'].lower()):
        s3_bucket = output['OutputValue']
        
    if('lambdafunction' in output['OutputKey'].lower()):
        lambdaFunction = output['OutputValue']

region = boto3.Session().region_name  
        

account_id = boto3.client('sts').get_caller_identity().get('Account')



print("stackname: "+stackname)
print("account_id: "+account_id)  
print("region: "+region)
print("SagemakerEmbeddingEndpoint: "+SagemakerEmbeddingEndpoint)
print("OpenSearchDomainEndpoint: "+OpenSearchDomainEndpoint)
print("S3 Bucket: "+s3_bucket)
print("lambda Function : "+lambdaFunction)

stackname: hybridsearch-opensearch-app
account_id: 445083327804
region: us-east-1
SagemakerEmbeddingEndpoint: opensearch-hybrid-search-embedding-gpt-j-6b-b182bc90
OpenSearchDomainEndpoint: search-opensearchservi-75ucark0bqob-bzk6r6h2t33dlnpgx2pdeg22gi.us-east-1.es.amazonaws.com
S3 Bucket: hybridsearch-opensearch-app-s3buckethosting-b7nfsjknlc83
lambda Function : OpenSearchHybridSearch


## Step 2: Create the OpenSearch-Sagemaker ML connector 

Amazon OpenSearch Service AI connectors allows you to create a connector from OpenSearch Service to SageMaker Runtime.
To create a connector, we use the Amazon OpenSearch Domain endpoint, SagemakerEndpoint that hosts the GPT-J-6B embedding model and an IAM role that grants OpenSearch Service access to invoke the sagemaker model (this role is already created as a part of the cloudformation template)

Here, Using the connector_id obtained from the previous step, we register and deploy the model in OpenSearch and get a model identifier (model_id)

In [306]:
import boto3
import requests 
from requests_aws4auth import AWS4Auth
import json

host = 'https://'+OpenSearchDomainEndpoint+'/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)


remote_ml = {
                "sagemaker":
                 {
                     "endpoint_url":"https://runtime.sagemaker."+region+".amazonaws.com/endpoints/"+SagemakerEmbeddingEndpoint+"/invocations",
                     "pre_process_fun": '\n    StringBuilder builder = new StringBuilder();\n    builder.append("\\"");\n    builder.append(params.text_docs[0]);\n    builder.append("\\"");\n    def parameters = "{" +"\\"inputs\\":" + builder + "}";\n    return "{" +"\\"parameters\\":" + parameters + "}";\n    ', 
                    "post_process_fun": '\n    def name = "sentence_embedding";\n    def dataType = "FLOAT32";\n    if (params.embedding == null || params.embedding.length == 0) {\n        return null;\n    }\n    def shape = [params.embedding[0].length];\n    def json = "{" +\n               "\\"name\\":\\"" + name + "\\"," +\n               "\\"data_type\\":\\"" + dataType + "\\"," +\n               "\\"shape\\":" + shape + "," +\n               "\\"data\\":" + params.embedding[0] +\n               "}";\n    return json;\n    ',
                    "request_body": "{ \"text_inputs\": \"${parameters.inputs}\"}"
             
                 },
                
                 "bedrock_text":
                {
                     "endpoint_url":"https://bedrock-runtime."+region+".amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
                    "pre_process_fun": "\n    StringBuilder builder = new StringBuilder();\n    builder.append(\"\\\"\");\n    String first = params.text_docs[0];\n    builder.append(first);\n    builder.append(\"\\\"\");\n    def parameters = \"{\" +\"\\\"inputText\\\":\" + builder + \"}\";\n    return  \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
      
                    "post_process_fun":'\n    def name = "sentence_embedding";\n    def dataType = "FLOAT32";\n    if (params.embedding == null || params.embedding.length == 0) {\n        return null;\n    }\n    def shape = [params.embedding.length];\n    def json = "{" +\n               "\\"name\\":\\"" + name + "\\"," +\n               "\\"data_type\\":\\"" + dataType + "\\"," +\n               "\\"shape\\":" + shape + "," +\n               "\\"data\\":" + params.embedding +\n               "}";\n    return json;\n    ',
                    "request_body": "{ \"inputText\": \"${parameters.inputText}\"}"
                 },
                
                 "bedrock_multimodal":
                {
                     "endpoint_url": "https://bedrock-runtime."+region+".amazonaws.com/model/amazon.titan-embed-image-v1/invoke",
                     "request_body": "{ \"inputText\": \"${parameters.inputText:-null}\", \"inputImage\": \"${parameters.inputImage:-null}\" }",
                      "pre_process_fun": "\n    StringBuilder parametersBuilder = new StringBuilder(\"{\");\n    if (params.text_docs.length > 0 && params.text_docs[0] != null) {\n      parametersBuilder.append(\"\\\"inputText\\\":\");\n      parametersBuilder.append(\"\\\"\");\n      parametersBuilder.append(params.text_docs[0]);\n      parametersBuilder.append(\"\\\"\");\n      \n      if (params.text_docs.length > 1 && params.text_docs[1] != null) {\n        parametersBuilder.append(\",\");\n      }\n    }\n    \n    \n    if (params.text_docs.length > 1 && params.text_docs[1] != null) {\n      parametersBuilder.append(\"\\\"inputImage\\\":\");\n      parametersBuilder.append(\"\\\"\");\n      parametersBuilder.append(params.text_docs[1]);\n      parametersBuilder.append(\"\\\"\");\n    }\n    parametersBuilder.append(\"}\");\n    \n    return  \"{\" +\"\\\"parameters\\\":\" + parametersBuilder + \"}\";",
                     "post_process_fun":'\n    def name = "sentence_embedding";\n    def dataType = "FLOAT32";\n    if (params.embedding == null || params.embedding.length == 0) {\n        return null;\n    }\n    def shape = [params.embedding.length];\n    def json = "{" +\n               "\\"name\\":\\"" + name + "\\"," +\n               "\\"data_type\\":\\"" + dataType + "\\"," +\n               "\\"shape\\":" + shape + "," +\n               "\\"data\\":" + params.embedding +\n               "}";\n    return json;\n    '
                    }
            }



In [286]:
connector_path_url = host+'_plugins/_ml/connectors/_create'
register_model_path_url = host+'_plugins/_ml/models/_register'


headers = {"Content-Type": "application/json"}

for remote_ml_key in remote_ml.keys():
    
    #create connector
    payload_1 = {
       "name": remote_ml_key+": embedding",
       "description": "Test connector for"+remote_ml_key+" remote embedding model",
       "version": 1,
       "protocol": "aws_sigv4",
       "credential": {
          "roleArn": "arn:aws:iam::"+account_id+":role/opensearch-sagemaker-role"
       },
       "parameters": {
          "region": region,
          "service_name": remote_ml_key.split("_")[0]
       },
       "actions": [
          {
             "action_type": "predict",
             "method": "POST",
             "headers": {
                "content-type": "application/json"
             },
             "url": remote_ml[remote_ml_key]["endpoint_url"],
              "pre_process_function": remote_ml[remote_ml_key]["pre_process_fun"],
              "request_body": remote_ml[remote_ml_key]["request_body"],
             "post_process_function": remote_ml[remote_ml_key]["post_process_fun"]
          }
       ]
    }
    

    r_1 = requests.post(connector_path_url, auth=awsauth, json=payload_1, headers=headers)
    remote_ml[remote_ml_key]["connector_id"] = json.loads(r_1.text)["connector_id"]
    
    time.sleep(2)
    
    #register model
    
    payload_2 = { 
                "name": remote_ml_key,
                "function_name":"remote",
                "description": remote_ml_key+" embeddings model",
                "connector_id": remote_ml[remote_ml_key]["connector_id"]
                
                }

    r_2 = requests.post(register_model_path_url, auth=awsauth, json=payload_2, headers=headers)
    remote_ml[remote_ml_key]["model_id"] = json.loads(r_2.text)["model_id"]
    
    time.sleep(2)
    
    #deploy model
    
    deploy_model_path_url = host+'_plugins/_ml/models/'+remote_ml[remote_ml_key]["model_id"]+'/_deploy'

    r_3 = requests.post(deploy_model_path_url, auth=awsauth, headers=headers)
    deploy_status = json.loads(r_3.text)["status"]
    print("Deployment status of the "+remote_ml_key+" model, "+remote_ml[remote_ml_key]["model_id"]+" : "+deploy_status)
    
    
    #test model
    payload_4 = {
                  "parameters": {
                    "inputText": "hello",
                      "inputImage":'/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAEAAHIDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD36oZ51gUEgknoBU1U73+H6GmtxMZ9tZvRR9MmniaRlB3nn0GKz8fNirUZOwYNXZCTLhA/vMfxprqAOp/M1XzJ6/pSlpAOo/KlYdx4UlAdzZ/3jTS8yn5ZG/HmoxMwHb86XzSTkinYVyyssmcFQ3H0qVZAeuV+tUo5f3h5Harw5FS0NDqKjbKY29PSnKwcZH4j0qRjqKKKACiiigAqOWES4ySMelSUUAVxZxe5quVCsyqRgHAya0Kykclix5BYmqQmSru5G4DFNcMON2ajkxvOOlIDmquICMVNEylSGPIqE0gGTgUATBVJ49atCN1GUbHtVSIHcB2zWkAAtJsEVzMV/wBahwO4qIzqsyOjAgnB+lWv9ZnHT1qBtPic5OQfaloGpboooqSgooooAKKKKAGTP5cLt6CoIYFWNARzipJfndY+w+ZqkUY5p7IRTltXklYqAFz3qJrWSM4xn3FadFFwsZLoy8EU3GCD0qW4ZnuWBOQDgVG6cdKoRPFIijkjNT+d5zCNT16kelUkTg1LYbxOwJ+XbSA0QABgdKKKKkoKKKKACiiigApCcDPWlpCuWySfp2oAYinknqTk+9SUUUAFFFQ3Fx5IXABY9jQBQfLXDEepocEDmnAKTuwBk54px2nr/OqJCMfKfrS2hxckeoNAKjpSoFWQOoBYdM0AX6KiimEhKkYYdqlqSgooooAKKKKACiiigAooooAKztVQ+XHIvY4NaNVNRI+zBT1LDFNAzNhZuhOanpsaYXNOqiQpykg9KTn1o/GgCbdslSTt3+lX6zQQ0ZXNXbd98Kk9RwaljRLRRRSGFFFFABRRRQAUUUUAFZmpS7Z41b7uMitOs7VUDRRnuGx+lNAyISKy4FFRQrhalpkhx3o+i/pSjNHPc0wHoxDen41PattldD35FVRgdzUqsFljceuDSGaFFFFSMKKKKACiiigAooooAKz9Rbc8cY7fMa0KyL0/6efYCmhMcF2oKSnhcil20xDMUuBUm2l20ARfhTjlo+tP8uk2kAimBejbfGreozTqhtTmAD0JFTVBQUUUUAFFFFABRRSEhQSTgCgAZgqknoKxpG864eT1PFWLm6807E+7UKrVWETI3ygU6mKKfQIdRmkpQM0AG7FBcEUpXioyKALFoflcehzVms6GXyZQT908GtGkxoKKKKQwooooAKzLu681vLQ/KO/rUt/c7R5KH5j972qiopoTHotTKtNUU5nCD3piH8Ac1GZR2qJnLHmqF1qdva5UkySD+BOT+PYUAaXnH0qRJx3FYz6jLn93EhHUEtnI9aSLVSQA8WWz0Bx2/wD10xXOgVww4NNYVSt7uOYkRv8AMvJU9RVwNuFIZE4q7aSb4AD1Xg1SepLJ9s5Tsw/UUMaNCiiipGFQ3EwghZz17D1NTVkX83m3Hlj7qfzpoCAEuxZjknkmpVFRrUgOBTJHltowOtV5pVhQySNgD9aJ5hDGXILHoAO5rC8+7uW3zjKM3yxqOFGO5oBkGoa/KkM1x5bpbRIWZQMM3OOtc7bawLqSVFtZVRjkqki8g9Bk/SpPFst5B4S1mUfLstXZWK9COmK8q8PX/iTVbVZrTWmjeS5jgkElsNg3k9HPBIALEccd65K6rtfuWvmXS5faJ1Ph623PUZfEqwsI/slxI2cAtKu0Y4wMHitPTdaXUoZHRZI/JfaAxBOcA54+teNXFzrv2e6vjqU39nKvmQSw26XG4HcMsVwE5Ug56E/jXafC6/u9Q8NXFzdI08huWy+zsFX0Fb0OZUbVfjv02t/mVifZuqnQ+C3Xe/8AkekWk0pkQqVMgB2M2ePb6V0VtMZEBYAMPvAVyljIZJWJgIUDOQpOOfat2ycib7rbSo6qRVrYxNN6jV9ksbejDNPbpUEn3TVDNuio1fcoPqM0VBQlxKIIHk9Bx9aw155JyT1q9qsvEcQ7ncaorVITJVpc96aKazU0rkt2GyEnmos09ulR1qjJmL4ytbi+8Fa1a2sTzXE1o6RxoMsxI6AV4Zp3hTxVpVpIlho+oQXMu1XuRbyk4Vw4wuMA5Veefwya+gNSacNZpDcPD5twEcoqnIKse4PpVP8AthrWKUTKLjypWUyphdyqFyceoLYOOMjtV+wdS1iPbqDaZ4fZeG/F1tdNcnQbp7hEaCF/srxokbAhgqKAP4m/E55r1L4U6PfaH4Qks7+1mtpftsjqkqlSVIXBweex/Kuglup/tg2CR2juZYxEGADgRbgP5deau6fffbkeRYtsYwFbfnccAkdO2cfXNL2DiuYft1J8peQnNWV4qqvWrI6CpZSJgcioX6GnqaY/esnuap6GjDL+4j5/hH8qKzAxCjmilYoL9998/ooCio0qOR99xK/q5P61IvSgQ4tgUykLZb6UVpFGcndiN0rI1PUZrS6gt4jbxmVWYSXJIQsMYQY7nPc9q1JHCrk1j34uLiT9zLCYWjKPBcR70PP3uO/auihG712OetK0dNyCa9t2v/K1kW6R/Z45UilUN5TkkMAw6jjrV+d9H82OynFruUbUiZBhd3QdMDPp3rJOmzLBLAl1DsksltFaRTu4zknn3PH0qw2m3DrcQNcQC3uXWScbTuBAXO056HaOvSulxhp71jnjKfa5bV9Ikie4gaBGRnIlSP5lYD5mAI5IGM8Gn2t9ptvDBbxXAwwBUkHnceCxxgFjnrjNUL3RXnnuZ/tHlmR90f8A0zDD94P+BCnS6XHNqDzK8RimKM8bhjjbgfLhgOgHUHFTy02tZP8Ar/hyueonpFGta6jaXU5ignDyLnK7SCMHBzketaY6Vk6dby2qzCaWN/MmaUbVK43HJB5Nae9R1Nc04pOy2Omm243kSr1pslMWVd4HqadJWElZm0HdFfNFGKKksiQ5/GpXcRxs56AZqFODj0qLUXK2LgdyB+tVCPNJIipLlg5diJbgjkNVcatdu83l2JkiicoX89FyQATwcetVVlwODT9OIaO4d4TK3207QOinYvJ9q9KdKMIuTV/6+R5lGrKpLlvYkmvbzbltHvc/7JRv/ZqrG8u++kaiB/1zX/4qr12lv5snmQzFmJyVR8H8QaqtHbK5BS5B5BP7z/GsoVklsbzoXfxfl/kRNcuzbpNK1P7u3iJf8aeJw27do+qMX+/+6HzfX5qUfZW2rmc4+bjeMcd6en2LYTm6wOvMlJuMt4jUJR2l+X+Q0zlgyjRtUO4chlBzxjPLdaas12svmJo1/uxjJ2dP++qux3kFqDGqy4z/AM8nNTDUFOSI5T64gapUlHaCKcHLeT/ApJe6gXC/2Pe/nH/8VVlbjUHOBpEw/wB6aMf1q9GXIZsqOf7tWEVhg7h+VS8T/dX4/wCY1hv7z/D/ACKFv/aTTxmSyhjjDDcftG4gfQCtOU0KccFsk9uKZMcZrCdTnd7WN6dNQVkyVEJjU+worSt4f9Gi/wBwfyorK5qYDjZdTJ6OR+tQ3sZmgCZ/iHWrN8PL1SYepB/MVWvFZ7R9v3l+YfhWlJ2mmZVVeDRSXTGB5PB9KqxSNYSXUDi8jVrkuHjt2cOhVRwR7g1dt7ppCkecbjj3qrDcXt3LO0ciRW8E5twzMwMjg4IUDHAPGTknB7V018RKmrVNbnPhsPGp71PS3f8AIS41W3XexneMdhJYy8fjjmqn9s22TnUbXPo0Ei5q1JezmCd5HY/Z5fKkUnkH8PqMHuDVb7cv9/B+tb4eFOtBSSf4f5GFetOlPluvx/zFXW7QEj+0bXPGCY5B+lPGu2gOP7Sszx1KyDFV31QLkCXJ+tVm1N88c/jXSsFF9H96/wAjneNmu34/5mn/AG5ZAH/ia2e7PB2P0+lDa9ZsCF1G3Hp+4lP9Kz11MHG7OfrVgzCSPIcj8aPqcFun+H+Qvr0/L8f8y1b+ItLjV/OvFkyRjy7aQf0q6PE+k4G3z3/3bZv6iuee6iX5fvk+tOXY3cD2oeBpbu/3r/IFmFXpb8f8zqdN1eHUbho4be4QIu4vIgUfzzVuU549eKz9CthBaST95Tx9B/8AXzWlEu+8hT1cf415WIjCNRqGyPWw8pypqU92b4UKoHpxRTqK5joOd1xdl+j9nT+RqCNty1oa/CWtI5h1jbn6H/IrGikxiqQmYOpiXS75JACYy2UI/l9aat3BHHgSXwjJVkZLgt0z6j3571000MN5A0UyB0bqDXI3/hTVbZmfStQaRDz5cjBXH49D+ldNSmsZGMXU5JLra6fqcacsLJyjHmi+nVDdRv2jsJl86d5Jyvyy7eADnccAc9AM9qwDdueMn6028ttetj/pNnMPUmHIP4isp5Lkt8yMvtsNfRZfgVQoqHOpeemp4eMxLrVXJxcfI1Azvyhzzjr3p6TOm4lW+U4bp1/yax1ublAVVioYgkBe46Un2u6ww85sMQSMDkjGP5Cux0KnSxgp07a3udEpmGG8p8HGPlz1oa4kOVIZSPXisVNRvVjCCZygGAuTgU8Xl9M+Qhcn/YJNSqNS/vNClOnb3b3NHzZdw5P5Vr6XBPqF3Hbx9/vN/dHc1R0nQNd1GRWMRghPWSZdo/AdTXommaZBpNt5cZLyN9+UjBY/0HtXnY7FQpLli05eXQ7cFhJ1HeSsizsSCFIoxhEGAKm0xd9/u7IpP58VWdq0dFj/AHMsx/jbA+gr5yT6n0SRqUUUVBRHPCtxA8L/AHXXBrjpIpLS4aCUYZf1HqK7Wqt3Y299GFnTOOjDgj6GmmI5hJMd6nWX1qebw9OhJtrlWH92QYP5iqr2GpxdbUuPVGBpiJxMO2RThIp681Rb7VH9+0nX/gBpv2nb96Nx9VNO7CyNDMZ/hH5UYi/uL/3yKofbU9/ypftie/5VXNInlRfBj/uj8hTg4HTj6VQW53fdRz9FNTJ9of7lrO3/AAA0rsLItb6jZ6VLW/fpalfd2AqePSbp/wDWyxxj0UbjSuVYpHc7BEGXY4Aro7WAW9ukQ52jk+p71Ha2EFpygLOerscmrVS3caQUUUUhn//Z'
                      }
                        }
    
    if(remote_ml_key == 'sagemaker'):
        payload_4 = {
      "parameters": {
        "inputs": "hello"
          }
            }

    path_4 = host+'_plugins/_ml/models/'+remote_ml[remote_ml_key]["model_id"]+'/_predict'
    r_4 = requests.post(path_4, auth=awsauth, json=payload_4, headers=headers)
    embed = json.loads(r_4.text)['inference_results'][0]['output'][0]['data'][0:4]
    shape = json.loads(r_4.text)['inference_results'][0]['output'][0]['shape']
    remote_ml[remote_ml_key]['dimensions'] = shape[0]
    print(remote_ml_key+ " : "+str(embed))
    print(shape)
    print("\n")

Deployment status of the sagemaker model, dUuIS4wBuQkLO8mDIYXp : COMPLETED
sagemaker : [-0.0018504525069147348, -0.001708334544673562, 0.011649771593511105, -0.01878291927278042]
[4096]


Deployment status of the bedrock_text model, eUuIS4wBuQkLO8mDMoXw : COMPLETED
bedrock_text : [0.5390625, -0.46679688, -0.125, 0.45703125]
[1536]


Deployment status of the bedrock_multimodal model, fUuIS4wBuQkLO8mDQ4Xx : COMPLETED
bedrock_multimodal : [-0.028092243, 0.031673167, -0.003421167, -0.047097858]
[1024]




## Step 4: Create the OpenSearch ingest pipeline with text-embedding processor

In the ingestion pipeline, you choose "text_embedding" processor to generate vector embeddings from "caption" field and store vector data in "caption_embedding" field of type knn_vector.

In [287]:
for remote_ml_key in remote_ml.keys():
    if("multimodal" not in remote_ml_key):
        path = "_ingest/pipeline/"+remote_ml_key+"-ingest-pipeline"
        url = host + path
        payload = {
      "description": remote_ml_key+" ingest pipeline",
      "processors": [
                        {
                          "text_embedding": {
                            "model_id": remote_ml[remote_ml_key]["model_id"],
                            "field_map": {
                              "caption": "caption_embedding"
                            }
                          }
                        }
                      ]
                    }

        r = requests.put(url, auth=awsauth, json=payload, headers=headers)
        print(r.status_code)
        print(r.text)
        print(remote_ml[remote_ml_key]["model_id"])
        print(remote_ml_key)

200
{"acknowledged":true}
dUuIS4wBuQkLO8mDIYXp
sagemaker
200
{"acknowledged":true}
eUuIS4wBuQkLO8mDMoXw
bedrock_text


In [290]:
path = "_ingest/pipeline/bedrock-multimodal-ingest-pipeline"
url = host + path
payload = {
  "description": "A text/image embedding pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": remote_ml['bedrock_multimodal']['model_id'],
        "embedding": "vector_embedding",
        "field_map": {
          "text": "image_description",
          "image": "image_binary"
        }
      }
    }
  ]
}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)



200
{"acknowledged":true}


## Step 5: Create the k-NN index

Create the K-NN index and set the pipeline created in the previous step "nlp-ingest-pipeline" as the default pipeline. The caption_embedding field must be mapped as a k-NN vector with 4096 dimensions matching the model dimension. 

For the kNN index we use **nmslib** engine with **hnsw** algorithm and **l2** spacetype

In [288]:
for remote_ml_key in remote_ml.keys():
    if("multimodal" not in remote_ml_key):
        path = remote_ml_key+"-search-index"  
        url = host + path
        payload = {
          "settings": {
            "index.knn": True,
            "default_pipeline": remote_ml_key+"-ingest-pipeline",
            "number_of_shards": "4",
            "number_of_replicas": "0"
          },
          "mappings": {
            "properties": {
              "image_s3_url": {
                "type": "text"
              },
              "caption_embedding": {
                "type": "knn_vector",
                "dimension": remote_ml[remote_ml_key]['dimensions'],
                "method": {
                  "engine": "nmslib",
                  "space_type": "l2",
                  "name": "hnsw",
                  "parameters": {}
                }
              },
              "caption": {
                "type": "text"
              }
            }
          }
        }
        r = requests.put(url, auth=awsauth, json=payload, headers=headers)
        print(r.status_code)
        print(r.text)

200
{"acknowledged":true,"shards_acknowledged":true,"index":"sagemaker-search-index"}
200
{"acknowledged":true,"shards_acknowledged":true,"index":"bedrock_text-search-index"}


In [291]:
path = "bedrock-multimodal-search-index"
url = host + path
payload = {
  "settings": {
    "index.knn": True,
    "default_pipeline": "bedrock-multimodal-ingest-pipeline",
    "number_of_shards": 4,
    "number_of_replicas": "0"
  },
  "mappings": {
    "properties": {
      "vector_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {}
        }
      },
      "image_description": {
        "type": "text"
      },
        "image_s3_url": {
        "type": "text"
      },
      "image_binary": {
        "type": "binary"
      }
    }
  }
}
r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)

200
{"acknowledged":true,"shards_acknowledged":true,"index":"bedrock-multimodal-search-index"}


## Step 6: Prepare the dataset

Download the Amazon Bekerley dataset from S3 and pre-process in such a way that you get the image properties in a dataframe

For simplicity we use only 1655 sample images from the dataset

In [265]:
import pandas as pd
import string
#meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_0.json.gz", lines=True)

appended_data = []

for character in string.digits[0:]+string.ascii_lowercase:
    if(character == 'g'):
        break
    meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_"+character+".json.gz", lines=True)
    appended_data.append(meta)

appended_data_frame = pd.concat(appended_data)

appended_data_frame.shape
meta = appended_data_frame
def func_(x):
    us_texts = [item["value"] for item in x if item["language_tag"] == "en_US"]
    return us_texts[0] if us_texts else None
 
meta = meta.assign(item_name_in_en_us=meta.item_name.apply(func_))
meta = meta[~meta.item_name_in_en_us.isna()][["item_id", "item_name_in_en_us", "main_image_id"]]
print(f"#products with US English title: {len(meta)}")
meta.head()

image_meta = pd.read_csv("s3://amazon-berkeley-objects/images/metadata/images.csv.gz")
dataset = meta.merge(image_meta, left_on="main_image_id", right_on="image_id")
dataset.head()

#products with US English title: 26424


Unnamed: 0,item_id,item_name_in_en_us,main_image_id,image_id,height,width,path
0,B0896LJNLH,AmazonBasics Serene 16-Piece Old Fashioned and...,61izEZdhlaL,61izEZdhlaL,1197,894,07/075e5d67.jpg
1,B07HCR1LSQ,[Find] Amazon Collection Platinum Plated Sterl...,61kDp2x8tPL,61kDp2x8tPL,1000,1000,c9/c923418f.jpg
2,B075DQBBJZ,Arizona Desert Sand Horizon Photo with Wood Ha...,91IjyKZ76qL,91IjyKZ76qL,2560,2560,c6/c6889ed4.jpg
3,B073P6DSBQ,Amazon Brand – Rivet Arizona Desert Sand Horiz...,91IjyKZ76qL,91IjyKZ76qL,2560,2560,c6/c6889ed4.jpg
4,B07S74D9T7,AmazonBasics Adjustable Speaker Stand - 3.8 to...,71x4c-BafpL,71x4c-BafpL,2560,2560,2b/2b90e918.jpg


In [314]:
a = dataset.loc[dataset['path'] == 'b5/b5319e00.jpg']

# options = ['abc', 'def']
# dataset[dataset.a.str.startswith(tuple(options))]

In [316]:
a['item_name_in_en_us']

449                            Sliced Mushrooms
450    Chipotle Chicken Avocado Sandwich, 16 Oz
451                      Corn and Arugula Salad
452                               Shredded Kale
453                                Beet Noodles
                         ...                   
600                         Cauliflower Florets
601                              Orange Chicken
602                        Blue Cheese Dressing
603                            Julienne Peppers
604                               Pickled Beets
Name: item_name_in_en_us, Length: 156, dtype: object

## Step 7: Ingest the prepared data into OpenSearch

We ingest only the captions and the image urls of the images into the opensearch index

This step takes approcimately 10 minutes to load the data into opensearch

In [289]:
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from time import sleep
from tqdm import tqdm
from alive_progress import alive_bar
port = 443


host = 'https://'+OpenSearchDomainEndpoint+'/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = { "Content-Type": "application/json"}
client = OpenSearch(
    hosts = [{'host': OpenSearchDomainEndpoint, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    #verify_certs = True,
    connection_class = RequestsHttpConnection
)

cnt = 0
batch = 0
action = json.dumps({ "index": { "_index": "sagemaker-search-index" } })
action_1 = json.dumps({ "index": { "_index": "bedrock_text-search-index" } })
body_ = ''
body_1 = ''


with alive_bar(len(dataset), force_tty = True) as bar:
    for index, row in (dataset.iterrows()):
        if(row['path'] == '87/874f86c4.jpg' or row['path'] ==  'b5/b5319e00.jpg'):
            continue

        payload = {}
        payload['image_s3_url'] = "https://amazon-berkeley-objects.s3.amazonaws.com/images/small/"+row['path']
        payload['caption'] = row['item_name_in_en_us']
        body_ = body_ + action + "\n" + json.dumps(payload) + "\n"
        body_1 = body_1 + action_1 + "\n" + json.dumps(payload) + "\n"
        cnt = cnt+1


        if(cnt == 100):
            
            response = client.bulk(
                                index = 'sagemaker-search-index',
                                 body = body_)
            response_1 = client.bulk(
                                index = 'bedrock_text-search-index',
                                 body = body_1)
            #r = requests.post(url, auth=awsauth, json=body_+"\n", headers=headers)
            cnt = 0
            batch = batch +1
            body_ = ''
            body_1 = ''
        
        bar()
print("Total Bulk batches completed: "+str(batch))

|███████████████████████████████████████▉⚠︎ (!) 26221/26296 [100%] in 26:30.1 (16
Total Bulk batches completed: 262


In [296]:
def resize_image(photo, bucket, width, height):
    
    Image.MAX_IMAGE_PIXELS = 100000000
    
    with Image.open(photo) as image:
        image.verify()
    with Image.open(photo) as image:    
        
        if image.format in ["JPEG", "PNG"]:
            file_type = image.format.lower()
            path = image.filename.rsplit(".", 1)[0]

            image.thumbnail((width, height))
            image.save(f"{path}-resized.{file_type}")

            #fileshort = os.path.basename(path)
            
            #print(path)

            s3.upload_file(
                f"{path}-resized.{file_type}",
                bucket,
                f"resized/{fileshort}-resized.{file_type}",
                ExtraArgs={"ContentType": f"image/{file_type}"},
            )
            
        else:
            raise Exception("Unsupported image format")
        
    return file_type, path


In [318]:
images_ = os.path.join("images_")
if not os.path.exists(images_):
    os.mkdir(images_)

In [301]:

host = 'https://'+OpenSearchDomainEndpoint+'/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = { "Content-Type": "application/json"}
client = OpenSearch(
    hosts = [{'host': OpenSearchDomainEndpoint, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    #verify_certs = True,
    connection_class = RequestsHttpConnection
)

path = "bedrock-multimodal-search-index/_doc" 
url = host + path
width = 2048
height = 2048
cnt = 0
count = 0
batch = 0
action = json.dumps({ "index": { "_index": "bedrock-multimodal-search-index" } })
body_ = ''

with alive_bar(len(dataset), force_tty = True) as bar:
    for index, row in dataset.iterrows():
        if(row['path'] == '87/874f86c4.jpg' or row['path'] ==  'b5/b5319e000000000.jpg'):
            continue
        count = count+1


        
        fileshort = "images_/"+row['path'].replace("/","_")

        s3.download_file('amazon-berkeley-objects', 'images/small/'+row['path'], fileshort)



        payload = {}
        payload['image_description'] = row['item_name_in_en_us']
        payload['image_s3_url'] = "https://amazon-berkeley-objects.s3.amazonaws.com/images/small/"+row['path']

        file_type, path = resize_image(fileshort, s3_bucket, width, height)

        with open(fileshort.split(".")[0]+"-resized."+file_type, "rb") as image_file:
            input_image = base64.b64encode(image_file.read()).decode("utf8")


        payload['image_binary'] = input_image
        body_ = body_ + action+ "\n"+json.dumps(payload) + "\n"
        #print(payload)
        #r = requests.post(url, auth=awsauth, json=payload, headers=headers)
        cnt = cnt+1

        if(cnt == 100):
            #r = requests.post(url, auth=awsauth, json=body_+"\\\\n", headers=headers)


            response = client.bulk(
                                 index = 'bedrock-multimodal-search-index',
                                  body = body_)

            cnt = 0
            batch = batch +1
            body_ = ''
            #print(batch)
            
        bar()

print("Total Bulk batches completed: "+str(batch))


|████████████████████████████████████████| 26296/26296 [100%] in 9:27.2 (46.37/s
Total Bulk batches completed: 262


### The following 2 steps are optional because in the final web application, these steps are performed by the Lambda function itself that is already deployed by the cloud formation template.
## Create the Search pipeline in OpenSearch

Create a search pipeline in OpenSearch to normalize the search results from the text and vector search queries. The search pipeline combines the results from each subquery.

In [None]:
path = "_search/pipeline/nlp-search-pipeline" 
url = host + path

payload = {
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}


r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)

## Search with Hybrid Query

This is an example of Hybrid query that you will run uing the web application later.

In [None]:
path = "nlp-image-search-index/_search?search_pipeline=nlp-search-pipeline" 
url = host + path
query_ = "wine glass"

payload = {
  "_source": {
    "exclude": [
      "caption_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "caption": {
              "query": query_
            }
          }
        },
        {
          "neural": {
            "caption_embedding": {
              "query_text": query_,
              "model_id": model_id,
              "k": 2
            }
          }
        }
      ]
    }
  },"size":1
}

r = requests.get(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)

In [114]:
path = "multimodal-image-search-index/_search"
url = host + path
query_ = "decorative materials for living room"

payload = {
  "size": 1,
     "_source": {
    "exclude": [
      "image_binary"
    ]
  },
  "query": {
    "neural": {
      "vector_embedding": {
        "query_text": query_,
        #"query_image": "iVBORw0KGgoAAAANSUI...",
        "model_id": model_id,
        "k": 5
      }
    }
  }
}

r = requests.get(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)

200
{"took":196,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":25,"relation":"eq"},"max_score":0.59631926,"hits":[{"_index":"multimodal-image-search-index","_id":"D0pNQIwBuQkLO8mDdf7F","_score":0.59631926,"_source":{"image_description":"Urban Living Coffee Tables","vector_embedding":[0.013061523,-0.0072021484,-0.0016708374,-4.6348572E-4,0.022460938,-0.0055236816,0.12207031,0.010375977,0.030151367,0.040527344,-0.012023926,-0.005859375,-0.009094238,-0.052734375,0.03930664,0.044189453,-0.020751953,-0.0018539429,-0.0078125,-0.03173828,0.018066406,0.08105469,-0.0065612793,0.017700195,0.05859375,-0.056396484,-0.008483887,0.008911133,-0.00390625,-0.0023040771,0.05126953,-0.032714844,0.041015625,0.025512695,-1.9931793E-4,0.003036499,0.055419922,-0.030761719,-0.015014648,0.027832031,0.012939453,-0.034179688,-0.028808594,0.00491333,-0.0095825195,0.024536133,0.021484375,-0.0021820068,0.020996094,0.06591797,0.030639648,-0.020019531,0.01770019

## Step 8: Update the environment variables of lambda

Here, we pass the OpenSearch endpoint, AWS region and OpenSearch model identifier to Lambda.

In [150]:
lambda_client = boto3.client('lambda')

response = lambda_client.update_function_configuration(
            FunctionName=lambdaFunction,
            Environment={
                'Variables': {
                    'DOMAIN_ENDPOINT': OpenSearchDomainEndpoint,
                    'REGION':region,
                    'SAGEMAKER_MODEL_ID':sagemaker_model_id,
                    'BEDROCK_TEXT_MODEL_ID':bedrock_text_model_id,
                    'BEDROCK_MULTIMODAL_MODEL_ID':bedrock_multimodal_model_id,
                }
            }
        )

## Step 9: Create the Lambda URL

Here we create external Lambda URL for lambda function to be called from the outside world.

In [151]:
lambda_ = boto3.client('lambda')


response_ = lambda_.add_permission(
FunctionName=lambdaFunction,
StatementId=lambdaFunction+'_permissions',
Action="lambda:InvokeFunctionUrl",
Principal=account_id,
FunctionUrlAuthType='AWS_IAM')


response = lambda_.create_function_url_config(
FunctionName=lambdaFunction,
AuthType='AWS_IAM',
Cors={
    'AllowCredentials': True,

    'AllowMethods':["*"],
    'AllowOrigins': ["*"]

},
InvokeMode='RESPONSE_STREAM'
)

query_invoke_URL = response['FunctionUrl']

## Step 10: Host the Hybrid Search application in EC2

## Notice

To ensure security access to the provisioned resources, we use EC2 security group to limit access scope. Before you go into the final step, you need to add your current **PUBLIC IP** address to the ec2 security group so that you are able to access the web application (chat interface) that you are going to host in the next step.

<h3 style="color:red;"><U>Warning</U></h3>
<h4>Without doing the below steps, you will not be able to proceed further.</h4>

<div>
    <h3 style="color:red;"><U>Enter your IP address </U></h3>
    <h4> STEP 1. Get your IP address <span style="display:inline;color:blue"><a href = "https://ipinfo.io/ip ">HERE</a></span>. If you are connecting with VPN, we recommend you disconnect VPN first.</h4>
</div>

<h4>STEP 2. Run the below cell </h4>
<h4>STEP 3. Paste the IP address in the input box that prompts you to enter your IP</h4>
<h4>STEP 4. Press ENTER</h4>

In [152]:
my_ip = (input("Enter your IP : ")).split(".")
my_ip.pop()
IP = ".".join(my_ip)+".0/24"

port_protocol = {443:'HTTPS',80:'HTTP',8501:'streamlit'}

IpPermissions = []

for port in port_protocol.keys():
     IpPermissions.append({
            'FromPort': port,
            'IpProtocol': 'tcp',
            'IpRanges': [
                {
                    'CidrIp': IP,
                    'Description': port_protocol[port]+' access',
                },
            ],
            'ToPort': port,
        })

IpPermissions

for output in cfn_outputs:
    if('securitygroupid' in output['OutputKey'].lower()):
        sg_id = output['OutputValue']
        
#sg_id = 'sg-0e0d72baa90696638'

ec2_ = boto3.client('ec2')        

response = ec2_.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=IpPermissions,
)

print("\nIngress rules added for the security group, ports:protocol - "+json.dumps(port_protocol)+" with my ip - "+IP)

Enter your IP : 54.239.6.190

Ingress rules added for the security group, ports:protocol - {"443": "HTTPS", "80": "HTTP", "8501": "streamlit"} with my ip - 54.239.6.0/24


Finally, We are ready to host our conversational search application, here we perform the following steps, Steps 2-5 are achieved by executing the terminal commands in the ec2 instance using a SSM client.
1. Update the web application code files with lambda url (in [api.py](https://github.com/aws-samples/semantic-search-with-amazon-opensearch/blob/main/generative-ai/Module_1_Build_Conversational_Search/webapp/api.py)) and s3 bucket name (in [app.py](https://github.com/aws-samples/semantic-search-with-amazon-opensearch/blob/main/generative-ai/Module_1_Build_Conversational_Search/webapp/app.py))
2. Archieve the application files and push to the configured s3 bucket.
3. Download the application (.zip) from s3 bucket into ec2 instance (/home/ec2-user/), and uncompress it.
4. We install the streamlit and boto3 dependencies inside a virtual environment inside the ec2 instance.
5. Start the streamlit application.

In [303]:
#modify the code files with lambda url and s3 bucket names
query_invoke_URL_cmd = query_invoke_URL.replace("/","\/")

with io.capture_output() as captured:
    #Update the webapp files to include the s3 bucket name and the LambdaURL
    !sed -i 's/API_URL_TO_BE_REPLACED/{query_invoke_URL_cmd}/g' webapp/api.py
    #Push the WebAPP code artefacts to s3
    !cd webapp && zip -r webapp.zip *
    !aws s3 cp webapp/webapp.zip s3://$s3_bucket
        
#Get the Ec2 instance ID which is already deployed
response = cfn.describe_stack_resources(
    StackName=stackname
)
for resource in response['StackResources']:
    if(resource['ResourceType'] == 'AWS::EC2::Instance'):
        ec2_instance_id = resource['PhysicalResourceId']
   
ec2_instance_id

'i-05bbd71645c0b0990'

Copy the URL that will be generated after running the next cell and open the URL in your web browser to start using the application.

In [317]:
# function to execute commands in ec2 terminal
def execute_commands_on_linux_instances(client, commands):
    resp = client.send_command(
        DocumentName="AWS-RunShellScript", # One of AWS' preconfigured documents
        Parameters={'commands': commands},
        InstanceIds=[ec2_instance_id],
    )
    return resp['Command']['CommandId']

ssm_client = boto3.client('ssm') 

commands = [
            'aws s3 cp s3://'+s3_bucket+'/webapp.zip /home/ec2-user/',
            'unzip -o /home/ec2-user/webapp.zip -d /home/ec2-user/'  ,  
            'sudo chmod -R 0777 /home/ec2-user/',
            'python3 -m venv /home/ec2-user/.myenv',
            'source /home/ec2-user/.myenv/bin/activate',
            'pip install streamlit',
            'pip install boto3',
    
            #start the web applicaiton
            'streamlit run /home/ec2-user/app.py',
            ]

command_id = execute_commands_on_linux_instances(ssm_client, commands)

ec2_ = boto3.client('ec2')
response = ec2_.describe_instances(
    InstanceIds=[ec2_instance_id]
)
public_ip = response['Reservations'][0]['Instances'][0]['PublicIpAddress']
print("Please wait while the application is being hosted . . .")
time.sleep(10)
print("\nApplication hosted successfully")
print("\nClick the below URL to open the application. It may take up to a minute or two to start the application, Please keep refreshing the page if you are seeing connection error.\n")
print('http://'+public_ip+":8501")
#print("\nCheck the below video on how to interact with the application")

Please wait while the application is being hosted . . .

Application hosted successfully

Click the below URL to open the application. It may take up to a minute or two to start the application, Please keep refreshing the page if you are seeing connection error.

http://44.204.151.62:8501
