# Hybrid Search with Amazon OpenSearch Service

**Welcome to the Hybrid Search notebook. Use this notebook to build a Hybrid Search application powered by Amazon OpenSearch Service.**

In this notebook, you will perform the following steps in sequence:
1. [Step 1: Get the Cloudformation outputs](#Step-1:-Get-the-Cloudformation-outputs)
2. [Step 2: Create the OpenSearch-Sagemaker ML connector](#Step-2:-Create-the-OpenSearch-Sagemaker-ML-connector)
3. [Step 3: Register and deploy the embedding model in OpenSearch](#Step-3:-Register-and-deploy-the-embedding-model-in-OpenSearch)
4. [Step 4: Create the OpenSearch ingest pipeline with sparse_encoding processor](#Step-4:-Create-the-OpenSearch-ingest-pipeline-with-sparse_encoding-processor)
5. [Step 5: Create the Sparse index with rank_features](#Step-5:-Create-the-Sparse-index-with-rank_features)
6. [Step 6: Prepare the dataset](#Step-6:-Prepare-the-dataset)
7. [Step 7: Ingest the prepared data into OpenSearch](#Step-7:-Ingest-the-prepared-data-into-OpenSearch)
8. [Step 8: Update the environment variables of lambda](#Step-8:-Update-the-environment-variables-of-lambda)
9. [Step 9: Create the Lambda URL](#Step-9:-Create-the-Lambda-URL)
10. [Step 10: Host the Hybrid Search application in EC2](#Step-10:-Host-the-Hybrid-Search-application-in-EC2)

In [None]:
# Install dependencies
# AWS Signature Version 4 request signing for the requests library
%pip install requests_aws4auth
# OpenSearch Python client
%pip install opensearch_py
# Progress bar for loops
%pip install alive-progress
# YAML parser used to read products.yaml in Step 7
%pip install ruamel.yaml

## Step 1: Get the Cloudformation outputs

Here, we retrieve the resources that were already deployed by the CloudFormation template and that are used to build the application:
1. **SageMaker** endpoint
2. **OpenSearch Domain** endpoint
3. **S3** bucket name
4. **Lambda** function name
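
The lookup in the next cell matches stack output keys by substring. As a minimal sketch of the same idea, the snippet below runs on a hand-written list shaped like the `Outputs` list returned by `describe_stacks`. The sample keys and values, and the helper name `find_output`, are assumptions made for illustration. Raising an error for a missing key makes a misnamed output visible immediately instead of surfacing later as an undefined variable.

```python
# Sample data shaped like the 'Outputs' list returned by describe_stacks.
# The keys and values are made up for illustration.
sample_outputs = [
    {"OutputKey": "OpenSearchDomainEndpoint", "OutputValue": "search-demo.us-east-1.es.amazonaws.com"},
    {"OutputKey": "EmbeddingEndpointName", "OutputValue": "demo-embedding-endpoint"},
    {"OutputKey": "s3BucketName", "OutputValue": "demo-bucket"},
    {"OutputKey": "LambdaFunctionName", "OutputValue": "demo-function"},
]

def find_output(outputs, fragment):
    """Return the value of the first output whose key contains `fragment` (case-insensitive)."""
    for output in outputs:
        if fragment.lower() in output["OutputKey"].lower():
            return output["OutputValue"]
    raise KeyError("No stack output with a key containing '" + fragment + "'")

# Resolve the four values the notebook needs
resolved = {
    name: find_output(sample_outputs, name)
    for name in ["OpenSearchDomainEndpoint", "EmbeddingEndpointName", "s3", "lambdafunction"]
}
print(resolved)
```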

In [None]:
import sagemaker, boto3, json, time
from sagemaker.session import Session
import subprocess
from IPython.utils import io

cfn = boto3.client('cloudformation')

response = cfn.list_stacks(StackStatusFilter=['CREATE_COMPLETE','UPDATE_COMPLETE'])

for cfns in response['StackSummaries']:
    if('TemplateDescription' in cfns.keys()):
        if('hybrid search' in cfns['TemplateDescription']):
            stackname = cfns['StackName']
stackname

response = cfn.describe_stack_resources(
    StackName=stackname
)
# for resource in response['StackResources']:
#     if(resource['ResourceType'] == "AWS::SageMaker::Endpoint"):
#         SagemakerEmbeddingEndpoint = resource['PhysicalResourceId']

cfn_outputs = cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']

for output in cfn_outputs:
    if('OpenSearchDomainEndpoint' in output['OutputKey']):
        OpenSearchDomainEndpoint = output['OutputValue']
        
    if('EmbeddingEndpointName' in output['OutputKey']):
        SagemakerEmbeddingEndpoint = output['OutputValue']
        
    if('s3' in output['OutputKey'].lower()):
        s3_bucket = output['OutputValue']
        
    if('lambdafunction' in output['OutputKey'].lower()):
        lambdaFunction = output['OutputValue']

region = boto3.Session().region_name  
        

account_id = boto3.client('sts').get_caller_identity().get('Account')



print("stackname: "+stackname)
print("account_id: "+account_id)  
print("region: "+region)
print("SagemakerEmbeddingEndpoint: "+SagemakerEmbeddingEndpoint)
print("OpenSearchDomainEndpoint: "+OpenSearchDomainEndpoint)
print("S3 Bucket: "+s3_bucket)
print("lambda Function : "+lambdaFunction)

## Step 2: Create the OpenSearch-Sagemaker ML connector 

Amazon OpenSearch Service AI connectors allow you to create a connector from OpenSearch Service to SageMaker Runtime.
To create a connector, we use the Amazon OpenSearch domain endpoint, the SageMaker endpoint that hosts the sparse encoding model, and an IAM role that grants OpenSearch Service permission to invoke the SageMaker model (this role is already created as part of the CloudFormation template).

Using the connector_id returned when the connector is created, we then register and deploy the model in OpenSearch and get a model identifier (model_id).
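
Each call in this step (create connector, register model, deploy model) returns a JSON body from which the next call reads an identifier. If a call fails, the expected key is missing and a plain dictionary lookup raises an uninformative `KeyError`. The sketch below shows one way to surface the full response instead. The helper name `require_field` and both sample response bodies are hypothetical.

```python
import json

def require_field(response_text, field):
    """Parse a JSON response and return `field`, or raise an error that includes the full body."""
    body = json.loads(response_text)
    if field not in body:
        raise RuntimeError("Expected '" + field + "' in response, got: " + response_text)
    return body[field]

# Hypothetical response bodies, for illustration only
ok_response = '{"connector_id": "abc123"}'
error_response = '{"error": {"type": "illegal_argument_exception"}, "status": 400}'

print(require_field(ok_response, "connector_id"))
try:
    require_field(error_response, "connector_id")
except RuntimeError as err:
    print(err)
```

In the notebook, the same helper could wrap `r_1.text`, `r_2.text`, and `r_3.text` with the fields `connector_id`, `model_id`, and `status` respectively.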

In [None]:
import boto3
import requests 
from requests_aws4auth import AWS4Auth
import json

host = 'https://'+OpenSearchDomainEndpoint+'/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)


remote_ml = {
                "sagemaker_sparse":
                 {
                     "endpoint_url":"https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/sparse-os-search-os/invocations",
                     "pre_process_fun": '\n    StringBuilder builder = new StringBuilder();\n    builder.append("\\"");\n    builder.append(params.text_docs[0]);\n    builder.append("\\"");\n    def parameters = "{" +"\\"inputs\\":" + builder + "}";\n    return "{" +"\\"parameters\\":" + parameters + "}";\n    ', 
                    "post_process_fun": '\n    def name = "sentence_embedding";\n    def dataType = "FLOAT32";\n    if (params.result == null || params.result.length == 0) {\n        return null;\n    }\n    def shape = [params.result[0].length];\n    def json = "{" +\n               "\\"name\\":\\"" + name + "\\"," +\n               "\\"data_type\\":\\"" + dataType + "\\"," +\n               "\\"shape\\":" + shape + "," +\n               "\\"data\\":" + params.result[0] +\n               "}";\n    return json;\n    ',
                    "request_body": "[\"${parameters.inputs}\"]"
             
                 }
                
               
            }



In [None]:
connector_path_url = host+'_plugins/_ml/connectors/_create'
register_model_path_url = host+'_plugins/_ml/models/_register'


headers = {"Content-Type": "application/json"}

for remote_ml_key in remote_ml.keys():
    
    #create connector
    payload_1 = {
       "name": remote_ml_key+": embedding",
       "description": "Test connector for "+remote_ml_key+" remote embedding model",
       "version": 1,
       "protocol": "aws_sigv4",
       "credential": {
          "roleArn": "arn:aws:iam::"+account_id+":role/opensearch-sagemaker-role"
       },
       "parameters": {
          "region": region,
          "service_name": remote_ml_key.split("_")[0]
       },
       "actions": [
          {
             "action_type": "predict",
             "method": "POST",
             "headers": {
                "content-type": "application/json"
             },
             "url": remote_ml[remote_ml_key]["endpoint_url"],
             "pre_process_function": remote_ml[remote_ml_key]["pre_process_fun"],
              "request_body": remote_ml[remote_ml_key]["request_body"],
             #"post_process_function": remote_ml[remote_ml_key]["post_process_fun"]
          }
       ]
    }
    

    r_1 = requests.post(connector_path_url, auth=awsauth, json=payload_1, headers=headers)
    remote_ml[remote_ml_key]["connector_id"] = json.loads(r_1.text)["connector_id"]
    
    time.sleep(2)
    
    #register model
    
    payload_2 = { 
                "name": remote_ml_key,
                "function_name":"remote",
                "description": remote_ml_key+" embeddings model",
                "connector_id": remote_ml[remote_ml_key]["connector_id"]
                
                }

    r_2 = requests.post(register_model_path_url, auth=awsauth, json=payload_2, headers=headers)
    remote_ml[remote_ml_key]["model_id"] = json.loads(r_2.text)["model_id"]
    
    time.sleep(2)
    
    #deploy model
    
    deploy_model_path_url = host+'_plugins/_ml/models/'+remote_ml[remote_ml_key]["model_id"]+'/_deploy'

    r_3 = requests.post(deploy_model_path_url, auth=awsauth, headers=headers)
    deploy_status = json.loads(r_3.text)["status"]
    print("Deployment status of the "+remote_ml_key+" model, "+remote_ml[remote_ml_key]["model_id"]+" : "+deploy_status)
    
    
    #test model

    payload_4 = {
      "parameters": {
        "inputs": "hello"
          }
            }

    path_4 = host+'_plugins/_ml/models/'+remote_ml[remote_ml_key]["model_id"]+'/_predict'
    r_4 = requests.post(path_4, auth=awsauth, json=payload_4, headers=headers)
    print(r_4.text)

## Step 4: Create the OpenSearch ingest pipeline with sparse_encoding processor

In the ingest pipeline, you use the "sparse_encoding" processor to generate sparse embeddings (token and weight pairs) from the "caption" field and store them in the "caption_embedding" field of type rank_features.
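
As a conceptual sketch of what the `field_map` in the pipeline does, the snippet below adds an encoded field to a document for each mapping entry. The encoder here is a toy stand-in that counts tokens; it is an assumption for illustration and does not reproduce the weights of the remote model, which the real processor obtains through `model_id`. The helper names and the sample document are hypothetical.

```python
def apply_field_map(document, field_map, encode):
    """Return a copy of `document` with an encoded field added for each field_map entry."""
    enriched = dict(document)
    for source_field, target_field in field_map.items():
        if source_field in document:
            enriched[target_field] = encode(document[source_field])
    return enriched

def toy_encode(text):
    # Stand-in for the remote model: weight each lowercase token by its count.
    weights = {}
    for token in text.lower().split():
        weights[token] = weights.get(token, 0.0) + 1.0
    return weights

doc = {"caption": "Red leather shoes", "image_s3_url": "https://example.com/shoe.jpg"}
enriched_doc = apply_field_map(doc, {"caption": "caption_embedding"}, toy_encode)
print(enriched_doc)
```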

In [None]:
path = "_ingest/pipeline/sagemaker_sparse-ingest-pipeline"
url = host + path
payload = {
  "description": "A sparse encoding ingest pipeline",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": remote_ml["sagemaker_sparse"]["model_id"],
        "field_map": {
          "caption": "caption_embedding"
        }
      }
    }
  ]
}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)


## Step 5: Create the Sparse index with rank_features

Create the sparse index and set the pipeline created in the previous step, "sagemaker_sparse-ingest-pipeline", as the default pipeline. The caption_embedding field must be mapped as rank_features so that it can store the token and weight pairs produced by the sparse encoding model.

A second index with the same pipeline and additional product fields is created for the retail dataset.
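
To illustrate why token weights are stored per document in a `rank_features` field, the sketch below ranks two documents against a query by taking the dot product over shared tokens. This is a simplified model of sparse retrieval, stated here as an assumption; the scoring that OpenSearch performs internally may differ in detail. All token weights are made up.

```python
def sparse_score(query_weights, doc_weights):
    """Dot product over the tokens that appear in both sparse vectors."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )

# Hypothetical token weights, for illustration only
doc_a = {"shoes": 1.5, "leather": 0.5, "red": 1.0}
doc_b = {"shirt": 2.0, "red": 0.5}
query = {"red": 1.0, "shoes": 2.0}

ranked = sorted(
    [("doc_a", sparse_score(query, doc_a)), ("doc_b", sparse_score(query, doc_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```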

In [None]:
path = "sagemaker_sparse-search-index"
url = host + path
payload = {
  "settings": {
    
    "default_pipeline": "sagemaker_sparse-ingest-pipeline",
    "number_of_shards": 4,
    "number_of_replicas": "0"
  },
  "mappings": {
    "properties": {
      "caption_embedding": {
        "type": "rank_features"
      },
      "caption": {
        "type": "text"
      },
      "image_s3_url": {
        "type": "text"
      }
    }
  }
}
r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)




In [None]:
path = "sagemaker_sparse-search-index-retail"
url = host + path
payload = {
  "settings": {
    
    "default_pipeline": "sagemaker_sparse-ingest-pipeline",
    "number_of_shards": 4,
    "number_of_replicas": "0"
  },
  "mappings": {
    "properties": {
      "caption_embedding": {
        "type": "rank_features"
      },
      "image_caption": {
        "type": "text"
      },
      "image_category": {
        "type": "text"
      },
      "image_style": {
        "type": "text"
      },
      "image_price": {
        "type": "double"
      },
      "image_gender": {
        "type": "text"
      },
      "image_stock": {
        "type": "integer"
      },
      "caption": {
        "type": "text"
      },
      "image_s3_url": {
        "type": "text"
      }
    }
  }
}
r = requests.put(url, auth=awsauth, json=payload, headers=headers)
print(r.status_code)
print(r.text)




## Step 6: Prepare the dataset

Download the Amazon Berkeley Objects dataset from S3 and pre-process it so that the image properties are available in a dataframe.

For simplicity we use only 1655 sample images from the dataset
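
The pre-processing in the next cell keeps only products that have a US English name. As a minimal sketch of that filter without pandas, the snippet below applies the same rule to two hand-written records. The records and the helper name `first_en_us_value` are assumptions for illustration; the extra type check guards against rows where the name field is missing.

```python
def first_en_us_value(entries):
    """Return the first value tagged en_US, or None when there is none."""
    if not isinstance(entries, list):
        return None  # the field is missing for this record
    for entry in entries:
        if entry.get("language_tag") == "en_US":
            return entry.get("value")
    return None

# Hypothetical records shaped like the listings metadata
records = [
    {"item_id": "A1", "item_name": [{"language_tag": "de_DE", "value": "Roter Schuh"},
                                    {"language_tag": "en_US", "value": "Red shoe"}]},
    {"item_id": "A2", "item_name": [{"language_tag": "fr_FR", "value": "Chemise"}]},
]

# Keep only the records that have a US English name
kept = [
    (record["item_id"], first_en_us_value(record["item_name"]))
    for record in records
    if first_en_us_value(record["item_name"]) is not None
]
print(kept)
```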

In [None]:
import pandas as pd
import string
#meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_0.json.gz", lines=True)

appended_data = []

# Load the listing metadata files listings_0 ... listings_9 and listings_a ... listings_f
for character in string.digits+string.ascii_lowercase[:6]:
    meta = pd.read_json("s3://amazon-berkeley-objects/listings/metadata/listings_"+character+".json.gz", lines=True)
    appended_data.append(meta)

appended_data_frame = pd.concat(appended_data)

meta = appended_data_frame
def func_(x):
    us_texts = [item["value"] for item in x if item["language_tag"] == "en_US"]
    return us_texts[0] if us_texts else None
 
meta = meta.assign(item_name_in_en_us=meta.item_name.apply(func_))
meta = meta[~meta.item_name_in_en_us.isna()][["item_id", "item_name_in_en_us", "main_image_id"]]
print(f"#products with US English title: {len(meta)}")
meta.head()

image_meta = pd.read_csv("s3://amazon-berkeley-objects/images/metadata/images.csv.gz")
dataset = meta.merge(image_meta, left_on="main_image_id", right_on="image_id")
dataset.head()

## Step 7: Ingest the prepared data into OpenSearch

We ingest only the captions and the image URLs into the OpenSearch index.

This step takes approximately 10 minutes to load the data into OpenSearch.
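
The bulk API expects a newline-delimited body in which every document line is preceded by an action line, and the body ends with a newline. A minimal sketch of batching documents into such bodies is shown below. The generator name `bulk_bodies` and the sample documents are hypothetical, and the batch size of 2 is chosen only to keep the output short. The final partial batch is yielded as well, so no documents are left unsent.

```python
import json

def bulk_bodies(documents, index_name, batch_size):
    """Yield newline-delimited bulk request bodies with at most batch_size documents each."""
    action = json.dumps({"index": {"_index": index_name}})
    lines = []
    for position, document in enumerate(documents, start=1):
        lines.append(action)
        lines.append(json.dumps(document))
        if position % batch_size == 0:
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:  # final partial batch
        yield "\n".join(lines) + "\n"

docs = [{"caption": "item " + str(i)} for i in range(5)]
bodies = list(bulk_bodies(docs, "sagemaker_sparse-search-index", batch_size=2))
print(len(bodies))
print(bodies[-1])
```

Each yielded string could be passed as the `body` argument of `client.bulk`.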

In [None]:
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from time import sleep
from tqdm import tqdm
from alive_progress import alive_bar
port = 443


host = 'https://'+OpenSearchDomainEndpoint+'/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = { "Content-Type": "application/json"}
client = OpenSearch(
    hosts = [{'host': OpenSearchDomainEndpoint, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    #verify_certs = True,
    connection_class = RequestsHttpConnection
)

cnt = 0
batch = 0
action = json.dumps({ "index": { "_index": "sagemaker_sparse-search-index" } })
body_ = ''


with alive_bar(len(dataset), force_tty = True) as bar:
    for index, row in (dataset.iterrows()):
        # Skip two specific images
        if(row['path'] == '87/874f86c4.jpg' or row['path'] ==  'b5/b5319e00.jpg'):
            continue

        payload = {}
        payload['image_s3_url'] = "https://amazon-berkeley-objects.s3.amazonaws.com/images/small/"+row['path']
        payload['caption'] = row['item_name_in_en_us']
        body_ = body_ + action + "\n" + json.dumps(payload) + "\n"
        cnt = cnt+1

        # Send the accumulated documents in bulk requests of 100
        if(cnt == 100):
            response = client.bulk(
                                index = 'sagemaker_sparse-search-index',
                                 body = body_)
            cnt = 0
            batch = batch +1
            body_ = ''

        bar()

# Send the final partial batch, if any
if(cnt > 0):
    response = client.bulk(
                        index = 'sagemaker_sparse-search-index',
                         body = body_)
    batch = batch +1
print("Total Bulk batches completed: "+str(batch))

In [None]:
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from time import sleep
from tqdm import tqdm
from alive_progress import alive_bar
port = 443
from ruamel.yaml import YAML

yaml = YAML()
input_file = 'products.yaml'

# Read the product catalog and close the file afterwards
with open(input_file) as f:
    items_ = yaml.load(f)

host = 'https://'+OpenSearchDomainEndpoint+'/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = { "Content-Type": "application/json"}
client = OpenSearch(
    hosts = [{'host': OpenSearchDomainEndpoint, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    #verify_certs = True,
    connection_class = RequestsHttpConnection
)

cnt = 0
with alive_bar(len(items_), force_tty = True) as bar:
    for item in items_:
        cnt = cnt +1
#         if(cnt<1430):
#             print("skipping")
#             continue
        if(cnt%100 == 0):
            host = 'https://'+OpenSearchDomainEndpoint+'/'
            service = 'es'
            credentials = boto3.Session().get_credentials()
            awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
            headers = { "Content-Type": "application/json"}
            client = OpenSearch(
                hosts = [{'host': OpenSearchDomainEndpoint, 'port': 443}],
                http_auth = awsauth,
                use_ssl = True,
                #verify_certs = True,
                connection_class = RequestsHttpConnection
            )
            
        payload = {}
        payload['image_s3_url'] = "https://retail-demo-store-us-east-1.s3.amazonaws.com/images/"+item["category"]+"/"+item["image"]
        payload['caption'] = item['description']
        payload['image_price'] = item['price']
        if('style' in item):
            payload['image_style'] = item['style']
        payload['image_category'] = item['category']
        if('current_stock' in item):
            payload['image_current_stock'] = item['current_stock']
        if('gender_affinity' in item):
            payload['image_gender'] = item['gender_affinity']
        payload['image_caption'] = item['name']
        
        
        

        response = client.index(
            index = 'sagemaker_sparse-search-index-retail',
            body = payload
        )
      
        
        
        bar()
    


In [None]:
#optional code for calling the model directly from local

import json

import torch
from transformers import PreTrainedTokenizerFast

# Load the tokenizer and the TorchScript model from the local "test" directory
tokenizer = PreTrainedTokenizerFast(tokenizer_file='test/tokenizer.json')
# Use the GPU when available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.jit.load('test/opensearch-neural-sparse-encoding-v1.pt', map_location=device)

from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from time import sleep
from tqdm import tqdm
from alive_progress import alive_bar
port = 443


host = 'https://'+OpenSearchDomainEndpoint+'/'
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
headers = { "Content-Type": "application/json"}
client = OpenSearch(
    hosts = [{'host': OpenSearchDomainEndpoint, 'port': 443}],
    http_auth = awsauth,
    use_ssl = True,
    #verify_certs = True,
    connection_class = RequestsHttpConnection
)

cnt = 0        # rows seen so far
pending = 0    # documents waiting in the current bulk body
batch__ = 0    # bulk requests sent
action = json.dumps({ "index": { "_index": "sagemaker_sparse-search-index" } })
body_ = ''


with alive_bar(len(dataset), force_tty = True) as bar:
    for index, row in (dataset.iterrows()):
        cnt = cnt+1
        # Skip two specific images and the first 26000 rows
        if(row['path'] == '87/874f86c4.jpg' or row['path'] ==  'b5/b5319e00.jpg' or cnt<=26000):
            continue
        # Refresh the credentials and the client at the start of every 25th bulk batch
        if(pending == 0 and batch__%25 == 0):
            credentials = boto3.Session().get_credentials()
            awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
            client = OpenSearch(
                hosts = [{'host': OpenSearchDomainEndpoint, 'port': 443}],
                http_auth = awsauth,
                use_ssl = True,
                #verify_certs = True,
                connection_class = RequestsHttpConnection
            )
        payload = {}
        payload['image_s3_url'] = "https://amazon-berkeley-objects.s3.amazonaws.com/images/small/"+row['path']
        payload['caption'] = row['item_name_in_en_us']

        # Encode the caption locally into token weights
        inputSentence = [row['item_name_in_en_us']]
        input_data = tokenizer(inputSentence, return_tensors="pt", padding=True, add_special_tokens=True,
                                            max_length=256,
                                            truncation="longest_first", return_attention_mask=True)
        input_data = input_data.to(device)
        predictions = model(input_data)
        weights = predictions["output"][0]  # one row, because a single sentence is encoded
        tokenWeights = {}
        for idx in weights.nonzero().squeeze(-1):
            if weights[idx] > 0:
                tokenWeights[tokenizer.decode([idx.item()])] = float(weights[idx])

        payload['caption_embedding'] = tokenWeights
        body_ = body_ + action + "\n" + json.dumps(payload) + "\n"
        pending = pending+1

        # Send the accumulated documents in bulk requests of 1000
        if(pending == 1000):
            response = client.bulk(
                                index = 'sagemaker_sparse-search-index',
                                 body = body_)
            pending = 0
            batch__ = batch__ +1
            body_ = ''

        bar()

# Send the final partial batch, if any
if(pending > 0):
    response = client.bulk(
                        index = 'sagemaker_sparse-search-index',
                         body = body_)
    batch__ = batch__ +1
print("Total Bulk batches completed: "+str(batch__))

## Step 8: Update the environment variables of lambda

Here, we pass the OpenSearch endpoint, AWS region and OpenSearch model identifier to Lambda.

In [None]:
lambda_client = boto3.client('lambda')

response = lambda_client.update_function_configuration(
            FunctionName=lambdaFunction,
            Environment={
                'Variables': {
                    'DOMAIN_ENDPOINT': OpenSearchDomainEndpoint,
                    'REGION':region,
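                    'SAGEMAKER_MODEL_ID':remote_ml["sagemaker_sparse"]["model_id"],
                    # The Bedrock model IDs are not defined in this notebook; define them before enabling these entries
                    #'BEDROCK_TEXT_MODEL_ID':bedrock_text_model_id,
                    #'BEDROCK_MULTIMODAL_MODEL_ID':bedrock_multimodal_model_id,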
                    
                }
            }
        )

## Step 9: Create the Lambda URL

Here, we create a Lambda function URL so that the Lambda function can be called by external clients. The URL uses the AWS_IAM auth type, so callers must sign their requests.

In [None]:
lambda_ = boto3.client('lambda')


response_ = lambda_.add_permission(
    FunctionName=lambdaFunction,
    StatementId=lambdaFunction+'_permissions',
    Action="lambda:InvokeFunctionUrl",
    Principal=account_id,
    FunctionUrlAuthType='AWS_IAM'
)


response = lambda_.create_function_url_config(
    FunctionName=lambdaFunction,
    AuthType='AWS_IAM',
    Cors={
        'AllowCredentials': True,
        'AllowMethods':["*"],
        'AllowOrigins': ["*"]
    },
    InvokeMode='RESPONSE_STREAM'
)

query_invoke_URL = response['FunctionUrl']

## Step 10: Host the Hybrid Search application in EC2

## Notice

To secure access to the provisioned resources, we use an EC2 security group to limit the access scope. Before you go to the final step, you need to add your current **PUBLIC IP** address to the EC2 security group so that you can access the web application that you are going to host in the next step.
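
The cell that follows these instructions widens the address you enter to a /24 range by dropping its last octet, without validating the input. As a minimal sketch, the standard library `ipaddress` module can do the same conversion and reject malformed addresses. The helper name `to_slash_24` and the sample address (taken from a documentation range) are assumptions for illustration.

```python
import ipaddress

def to_slash_24(ip_text):
    """Validate an IPv4 address and return the /24 network that contains it."""
    address = ipaddress.IPv4Address(ip_text.strip())
    network = ipaddress.ip_network(str(address) + "/24", strict=False)
    return str(network)

print(to_slash_24("203.0.113.57"))
```

An invalid input such as `"203.0.113"` raises a `ValueError`, which is easier to diagnose than a rejected security group rule.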

<h3 style="color:red;"><U>Warning</U></h3>
<h4>Without doing the below steps, you will not be able to proceed further.</h4>

<div>
    <h3 style="color:red;"><U>Enter your IP address </U></h3>
    <h4> STEP 1. Get your IP address <span style="display:inline;color:blue"><a href = "https://ipinfo.io/ip ">HERE</a></span>. If you are connecting with VPN, we recommend you disconnect VPN first.</h4>
</div>

<h4>STEP 2. Run the below cell </h4>
<h4>STEP 3. Paste the IP address in the input box that prompts you to enter your IP</h4>
<h4>STEP 4. Press ENTER</h4>

In [None]:
my_ip = (input("Enter your IP : ")).split(".")
my_ip.pop()
IP = ".".join(my_ip)+".0/24"

port_protocol = {443:'HTTPS',80:'HTTP',8501:'streamlit'}

IpPermissions = []

for port in port_protocol.keys():
     IpPermissions.append({
            'FromPort': port,
            'IpProtocol': 'tcp',
            'IpRanges': [
                {
                    'CidrIp': IP,
                    'Description': port_protocol[port]+' access',
                },
            ],
            'ToPort': port,
        })

IpPermissions

for output in cfn_outputs:
    if('securitygroupid' in output['OutputKey'].lower()):
        sg_id = output['OutputValue']
        
#sg_id = 'sg-0e0d72baa90696638'

ec2_ = boto3.client('ec2')        

response = ec2_.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=IpPermissions,
)

print("\nIngress rules added for the security group, ports:protocol - "+json.dumps(port_protocol)+" with my ip - "+IP)

Finally, we are ready to host our Hybrid Search application. Here we perform the following steps; steps 3-5 are carried out by executing terminal commands in the EC2 instance using an SSM client.
1. Update the web application code files with the Lambda URL (in [api.py](https://github.com/aws-samples/semantic-search-with-amazon-opensearch/blob/main/generative-ai/Module_1_Build_Conversational_Search/webapp/api.py)) and the S3 bucket name (in [app.py](https://github.com/aws-samples/semantic-search-with-amazon-opensearch/blob/main/generative-ai/Module_1_Build_Conversational_Search/webapp/app.py)).
2. Archive the application files and push them to the configured S3 bucket.
3. Download the application (.zip) from the S3 bucket into the EC2 instance (/home/ec2-user/) and uncompress it.
4. Install the streamlit and boto3 dependencies in a virtual environment inside the EC2 instance.
5. Start the streamlit application.
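
The first step replaces the placeholder `API_URL_TO_BE_REPLACED` in `webapp/api.py`. The notebook does this with `sed`, which requires escaping the slashes in the URL. As a sketch of an alternative that needs no escaping, the same substitution can be done in plain Python. The helper name `replace_placeholder` and the example URL are hypothetical, and the demonstration writes to a temporary file rather than the real `api.py`.

```python
import os
import tempfile

def replace_placeholder(path, placeholder, value):
    """Replace every occurrence of `placeholder` in the file at `path` and return the count."""
    with open(path, "r", encoding="utf-8") as handle:
        text = handle.read()
    count = text.count(placeholder)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text.replace(placeholder, value))
    return count

# Demonstration on a temporary file; the URL is a made-up example
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "api.py")
with open(demo_path, "w", encoding="utf-8") as handle:
    handle.write('API_URL = "API_URL_TO_BE_REPLACED"\n')

replaced = replace_placeholder(
    demo_path, "API_URL_TO_BE_REPLACED", "https://example.lambda-url.us-east-1.on.aws/"
)
with open(demo_path, encoding="utf-8") as handle:
    demo_text = handle.read()
print(replaced, demo_text)
```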

In [None]:
#modify the code files with lambda url and s3 bucket names
# Escape the slashes so that the URL can be used inside the sed expression
query_invoke_URL_cmd = query_invoke_URL.replace("/","\\/")

with io.capture_output() as captured:
    #Update the webapp files to include the s3 bucket name and the LambdaURL
    !sed -i 's/API_URL_TO_BE_REPLACED/{query_invoke_URL_cmd}/g' webapp/api.py
    #Push the WebAPP code artefacts to s3
    !cd webapp && zip -r webapp.zip *
    !aws s3 cp webapp/webapp.zip s3://$s3_bucket
        
#Get the Ec2 instance ID which is already deployed
response = cfn.describe_stack_resources(
    StackName=stackname
)
for resource in response['StackResources']:
    if(resource['ResourceType'] == 'AWS::EC2::Instance'):
        ec2_instance_id = resource['PhysicalResourceId']
   
ec2_instance_id

Copy the URL that will be generated after running the next cell and open the URL in your web browser to start using the application.

In [None]:
# function to execute commands in ec2 terminal
def execute_commands_on_linux_instances(client, commands):
    resp = client.send_command(
        DocumentName="AWS-RunShellScript", # One of AWS' preconfigured documents
        Parameters={'commands': commands},
        InstanceIds=[ec2_instance_id],
    )
    return resp['Command']['CommandId']

ssm_client = boto3.client('ssm') 

commands = [
            'aws s3 cp s3://'+s3_bucket+'/webapp.zip /home/ec2-user/',
            'unzip -o /home/ec2-user/webapp.zip -d /home/ec2-user/'  ,  
            'sudo chmod -R 0777 /home/ec2-user/',
            'python3 -m venv /home/ec2-user/.myenv',
            'source /home/ec2-user/.myenv/bin/activate',
            'pip install streamlit',
            'pip install boto3',
    
            #start the web application
            'streamlit run /home/ec2-user/app.py',
            ]

command_id = execute_commands_on_linux_instances(ssm_client, commands)

ec2_ = boto3.client('ec2')
response = ec2_.describe_instances(
    InstanceIds=[ec2_instance_id]
)
public_ip = response['Reservations'][0]['Instances'][0]['PublicIpAddress']
print("Please wait while the application is being hosted . . .")
time.sleep(10)
print("\nApplication hosted successfully")
print("\nClick the below URL to open the application. It may take up to a minute or two to start the application, Please keep refreshing the page if you are seeing connection error.\n")
print('http://'+public_ip+":8501")
#print("\nCheck the below video on how to interact with the application")