# Deploying Semantic Search with Amazon OpenSearch Service 

#### Note: we are currently testing the architecture using semantic search with a pretrained model.
The deployment architecture for semantic search consists of the following steps:
- Choose a pretrained BERT-based model; here we use the all-MiniLM-L6-v2 model.
- Save the ML model in an S3 bucket.
- Host the ML model using a SageMaker endpoint.
- Create a vector index and load data into the index.
- Create an API Gateway API that handles queries from web applications and passes them to Lambda.
- Create a Lambda function that calls the SageMaker endpoint to generate an embedding from the user query and sends the query results back to API Gateway.
- API Gateway sends the search results to the frontend, which returns them to the users.
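The Lambda step of this architecture could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the project's actual function: the environment variable names (SAGEMAKER_ENDPOINT_NAME, OPENSEARCH_HOST, OPENSEARCH_INDEX, VECTOR_FIELD), the q query-string parameter and the embedding field name are all hypothetical, and the search step assumes the opensearch-py client is packaged with the function. The index body and k-NN query follow the OpenSearch k-NN plugin syntax, with dimension 384 to match all-MiniLM-L6-v2.

```python
import json
import os

# Hypothetical configuration, read from Lambda environment variables
ENDPOINT_NAME = os.environ.get("SAGEMAKER_ENDPOINT_NAME", "")
OPENSEARCH_HOST = os.environ.get("OPENSEARCH_HOST", "")
INDEX_NAME = os.environ.get("OPENSEARCH_INDEX", "documents")
VECTOR_FIELD = os.environ.get("VECTOR_FIELD", "embedding")

# Example body for creating the vector index: a knn_vector field sized for
# the 384-dimensional all-MiniLM-L6-v2 embeddings
INDEX_BODY = {
    "settings": {"index": {"knn": True}},
    "mappings": {"properties": {VECTOR_FIELD: {"type": "knn_vector", "dimension": 384}}},
}


def embed_query(text):
    """Call the SageMaker endpoint (text/plain in, JSON list of floats out)."""
    import boto3  # included in the Lambda Python runtime

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME, ContentType="text/plain", Body=text
    )
    return json.loads(response["Body"].read().decode())


def search_opensearch(body):
    """Send the query to the OpenSearch domain using SigV4-signed requests."""
    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

    region = os.environ.get("AWS_REGION", "ca-central-1")
    auth = AWSV4SignerAuth(boto3.Session().get_credentials(), region)
    client = OpenSearch(
        hosts=[{"host": OPENSEARCH_HOST, "port": 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )
    return client.search(index=INDEX_NAME, body=body)


def build_knn_query(vector, k=5):
    """OpenSearch k-NN query returning the k nearest neighbours of vector."""
    return {"size": k, "query": {"knn": {VECTOR_FIELD: {"vector": vector, "k": k}}}}


def lambda_handler(event, context, embed=embed_query, search=search_opensearch):
    """Handle an API Gateway (Lambda proxy) request such as GET /search?q=..."""
    params = event.get("queryStringParameters") or {}
    query = params.get("q", "").strip()
    if not query:
        return {"statusCode": 400, "body": json.dumps({"error": "missing query parameter q"})}
    # Embed the query text, then run the nearest-neighbour search
    response = search(build_knn_query(embed(query)))
    results = [hit["_source"] for hit in response["hits"]["hits"]]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(results),
    }
```

The embed and search functions are parameters only so the handler can be exercised with stubs; Lambda itself calls lambda_handler(event, context) and uses the defaults.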

![Semantic_search_pretrain_fullstack](image/Semantic_search_pretrain_fullstack.png)

In [1]:
import torch
print(torch.__version__)

2.1.0



### 1. Initialize boto3

We will use boto3 to interact with other AWS services.

Note: You can ignore any PythonDeprecationWarning warnings.

In [40]:
import boto3
import re
import time
import sagemaker
from sagemaker import get_execution_role

s3_resource = boto3.resource("s3")
s3 = boto3.client('s3')

### 2. Save pre-trained all-MiniLM-L6-v2 model to S3

First, we will host a pretrained ['all-MiniLM-L6-v2'](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model in a SageMaker PyTorch model server to generate fixed-length, 384-dimensional sentence embeddings with the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) library, which is built on Hugging Face Transformers.

The application will call this SageMaker endpoint to generate a vector for each search query.
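To illustrate what these query vectors are used for: semantic search ranks documents by how close their embeddings are to the query embedding. The pure-Python sketch below uses cosine similarity on toy 3-dimensional vectors (the document IDs and values are made up for illustration); in the deployed system the vector index performs this nearest-neighbour search. Because all-MiniLM-L6-v2 ends with a Normalize module, its embeddings have unit length, so cosine similarity reduces to a dot product.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, doc_vectors):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [(doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in doc_vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors stand in for real 384-dimensional embeddings
docs = {"doc-a": [1.0, 0.0, 0.0], "doc-b": [0.6, 0.8, 0.0], "doc-c": [0.0, 0.0, 1.0]}
print(rank_documents([0.8, 0.6, 0.0], docs))
```

Here doc-b ranks first because its direction is closest to the query vector, even though doc-a shares the query's largest component.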

First, we download the pretrained model and save it locally so that it can be uploaded to S3.
The model.save() method provided by the Sentence Transformers library saves the model and its associated tokenizer together in the directory specified by saved_model_dir.


In [41]:
# !pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer, util
import numpy as np
import os

# Name of the pretrained model on the Hugging Face Hub and the local directory to save it to
model_name = "all-MiniLM-L6-v2"
saved_model_dir = 'model/all-MiniLM-L6-v2'
os.makedirs(saved_model_dir, exist_ok=True)

# Download and load the pretrained model using the Sentence Transformers library
model = SentenceTransformer(model_name)

# Save the model (and its tokenizer) to the specified directory
model.save(saved_model_dir)
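Before packing, a quick check can confirm that model.save() wrote the files you expect. This is a hypothetical helper: the file list is taken from the archive listing of this particular model shown further down, not from an official list of requirements.

```python
import os

# Files seen in the saved all-MiniLM-L6-v2 directory (assumption: taken from the archive listing)
REQUIRED_FILES = ["modules.json", "config.json", "tokenizer.json", "1_Pooling/config.json"]
# Either weight format may be present, depending on the library version
WEIGHT_FILES = ["model.safetensors", "pytorch_model.bin"]

saved_model_dir = 'model/all-MiniLM-L6-v2'


def missing_model_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    missing = [f for f in REQUIRED_FILES if not os.path.isfile(os.path.join(model_dir, f))]
    if not any(os.path.isfile(os.path.join(model_dir, f)) for f in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


# An empty list means every expected file is present
print(missing_model_files(saved_model_dir))
```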




Create a SageMaker session and get the execution role to be used later.

In [42]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

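Pack the model. The command below changes into model/all-MiniLM-L6-v2 and compresses all of its files and directories into a single tarball named all-MiniLM-L6-v2-pretrain.tar.gz. Because the archive path is ../all-MiniLM-L6-v2-pretrain.tar.gz, the archive is written one level up, i.e. to model/all-MiniLM-L6-v2-pretrain.tar.gz. The files sit at the root of the archive, so SageMaker extracts them directly into the model directory on the endpoint.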

In [43]:
!cd model/all-MiniLM-L6-v2 && tar czvf ../all-MiniLM-L6-v2-pretrain.tar.gz *

1_Pooling/
1_Pooling/config.json
2_Normalize/
config.json
config_sentence_transformers.json
model.safetensors
modules.json
README.md
sentence_bert_config.json
special_tokens_map.json
tokenizer_config.json
tokenizer.json
vocab.txt
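If you prefer to stay in Python, or the tar CLI is unavailable, the standard-library tarfile module can build an equivalent archive. This is a sketch; pack_model is a hypothetical helper name. Like the shell * glob, it skips hidden entries and places every file at the archive root.

```python
import os
import tarfile


def pack_model(model_dir, archive_path):
    """Create a gzipped tarball with model_dir's contents at the archive root,
    mirroring `cd model_dir && tar czvf archive_path *`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            if name.startswith("."):
                continue  # the shell * glob does not match hidden entries
            # arcname drops the model_dir prefix; directories are added recursively
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path


# pack_model("model/all-MiniLM-L6-v2", "model/all-MiniLM-L6-v2-pretrain.tar.gz")
```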


Upload the model archive to S3. The upload_data call uploads all-MiniLM-L6-v2-pretrain.tar.gz to the default S3 bucket of sagemaker_session, under a key that starts with the prefix sentence-transformers-model, which acts like a folder in S3.

In [52]:
inputs = sagemaker_session.upload_data(path='model/all-MiniLM-L6-v2-pretrain.tar.gz', key_prefix='sentence-transformers-model')
inputs

's3://sagemaker-ca-central-1-759472643633/sentence-transformers-model/all-MiniLM-L6-v2-pretrain.tar.gz'
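The same upload can be done with the boto3 S3 client's upload_file, for example to target a bucket other than the session default. A minimal sketch; upload_model_archive is a hypothetical helper, and the s3_client parameter exists only so a stub client can be passed in.

```python
import os


def upload_model_archive(local_path, bucket, prefix, s3_client=None):
    """Upload local_path to s3://bucket/prefix/<file name> and return the S3 URI."""
    if s3_client is None:
        import boto3  # imported lazily so a stub client can be used instead

        s3_client = boto3.client("s3")
    key = f"{prefix}/{os.path.basename(local_path)}"
    s3_client.upload_file(local_path, bucket, key)  # arguments: Filename, Bucket, Key
    return f"s3://{bucket}/{key}"


# Example, with sagemaker_session from the cells above:
# upload_model_archive("model/all-MiniLM-L6-v2-pretrain.tar.gz",
#                      sagemaker_session.default_bucket(), "sentence-transformers-model")
```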

### 3. Create PyTorch Model Object

Next, we create a PyTorchModel object. Its deploy() method creates an endpoint that serves prediction requests in real time. If instance_type is set to a SageMaker instance type (e.g. ml.m5.large), the model is deployed on SageMaker; if it is set to 'local', the model is deployed locally as a Docker container for testing.

We also need a custom Predictor class that sends text as input and receives JSON output, because by default the predictor for a PyTorchModel serializes its input as a NumPy array.

In [53]:
from sagemaker.pytorch import PyTorchModel
from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer
from sagemaker.deserializers import BytesDeserializer

# StringPredictor sends raw text to the deployed endpoint, which suits NLP models
# whose input is a plain string. In SageMaker Python SDK v2 the request content
# type is set through the serializer; a content_type argument is a no-op.
class StringPredictor(Predictor):
    def __init__(self, endpoint_name, sagemaker_session=None):
        super().__init__(
            endpoint_name,
            sagemaker_session=sagemaker_session,
            serializer=IdentitySerializer(content_type='text/plain'),  # send the string as-is
            deserializer=BytesDeserializer(),  # return raw bytes; parse the JSON with json.loads
        )

### 4. Deploy the sentence transformer model to SageMaker Endpoint
Now that we have the predictor class, let's deploy a SageMaker endpoint for our application to invoke.

#### Note: This process will take about 5 minutes to complete.

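If you see the warning "content_type is a no-op in sagemaker>=2", it means a content_type argument was ignored: in SageMaker Python SDK v2, a predictor's request content type is set through its serializer (for example IdentitySerializer(content_type='text/plain')).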

This example assumes an inference script (inference.py) in ./deployment/pytorch/code that defines how to load the model and how to process inputs and outputs. The framework_version selects the PyTorch serving container and should be compatible with the PyTorch version used to create and save the model.
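The notebook does not show inference.py. Below is a minimal sketch of what such a script could look like, assuming the SageMaker PyTorch serving container's handler convention (model_fn, input_fn, predict_fn, output_fn) and that sentence-transformers is installed in the container (for example via a requirements.txt in source_dir). It is a hypothetical illustration, not the project's actual script.

```python
# inference.py: hypothetical sketch of the endpoint's entry point
import json


def model_fn(model_dir):
    """Load the model from the directory where SageMaker extracted the archive."""
    # Imported here so the request handlers can be exercised without the library
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_dir)


def input_fn(request_body, request_content_type):
    """Turn the raw request body into the text to embed."""
    content_type = request_content_type.split(";")[0].strip()  # drop e.g. "; charset=utf-8"
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    if content_type == "text/plain":
        return request_body
    if content_type == "application/json":
        return json.loads(request_body)  # e.g. a JSON-encoded string
    raise ValueError(f"Unsupported content type: {request_content_type}")


def predict_fn(input_data, model):
    """Encode the text into an embedding and return it as a plain list."""
    return model.encode(input_data).tolist()


def output_fn(prediction, accept):
    """Serialize the embedding as a JSON list of floats."""
    return json.dumps(prediction)
```

Returning a plain JSON list keeps the client side simple: json.loads on the response body yields the 384 floats directly.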

In [59]:
pytorch_model = PyTorchModel(
    model_data=inputs,                       # S3 path to the model .tar.gz
    role=role,
    entry_point='inference.py',              # script that loads the model and processes inputs and outputs
    source_dir='./deployment/pytorch/code',  # directory containing inference.py
    py_version='py310',
    framework_version='2.0.1',               # PyTorch version of the serving container
    predictor_cls=StringPredictor,
)

# Instance types and prices: https://aws.amazon.com/sagemaker/pricing/ (ml.m5d.large is an alternative)
predictor = pytorch_model.deploy(
    instance_type='ml.t2.medium',
    initial_instance_count=1,
    endpoint_name=f'semantic-search-pretrain-all-MiniLM-L6-v2-{int(time.time())}',
)

---------!

content_type is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


### 5. Test the SageMaker Endpoint

Now that the endpoint is created, let's quickly test it out.

In [None]:
# Test the predictor by generating an embedding for a sample query
import json
original_payload = 'Riverice events in ottawa'
features = predictor.predict(original_payload)  # raw bytes returned by the endpoint
vector_data = json.loads(features)              # parse the JSON list of floats
len(vector_data)
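A quick sanity check on the returned vector: it should have 384 entries and, assuming inference.py returns the output of SentenceTransformer.encode (which applies the model's Normalize module), unit length. check_embedding is a hypothetical helper, and the tolerance is an arbitrary choice.

```python
import math


def check_embedding(vector, expected_dim=384, tol=1e-3):
    """Return True if vector has the expected length and approximately unit norm."""
    if len(vector) != expected_dim:
        return False
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tol


# With vector_data from the cell above:
# check_embedding(vector_data)
```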

In [66]:
import json

import boto3

# Region where the endpoints are deployed
REGION = 'ca-central-1'

# Clients for the SageMaker control plane (listing endpoints) and runtime (invocations)
sagemaker_client = boto3.client('sagemaker', region_name=REGION)
runtime_client = boto3.client('sagemaker-runtime', region_name=REGION)

def list_sagemaker_endpoints():
    """List all SageMaker endpoints with their status."""
    try:
        response = sagemaker_client.list_endpoints(SortBy='Name')
        print("Listing SageMaker Endpoints:")
        for endpoint in response['Endpoints']:
            print(f"Endpoint Name: {endpoint['EndpointName']}, Status: {endpoint['EndpointStatus']}")
    except Exception as e:
        print(f"Error listing SageMaker endpoints: {e}")

def invoke_sagemaker_endpoint_ft(endpoint_name, payload):
    """Invoke an endpoint that expects JSON input (ContentType='application/json').

    Returns the decoded JSON response, or None if the call fails.
    """
    try:
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='application/json',
            Body=json.dumps(payload),  # serialize the payload as JSON
        )
        return json.loads(response['Body'].read().decode())
    except Exception as e:
        print(f"Error invoking SageMaker endpoint {endpoint_name}: {e}")
        return None

def invoke_sagemaker_endpoint_pretrain(endpoint_name, payload):
    """Invoke an endpoint that expects plain text (ContentType='text/plain').

    Returns the decoded JSON response (the embedding), or None if the call fails.
    """
    # The body must be a string because the content type is text/plain
    if not isinstance(payload, str):
        payload = str(payload)
    try:
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='text/plain',
            Body=payload,
        )
        return json.loads(response['Body'].read().decode())
    except Exception as e:
        print(f"Error invoking SageMaker endpoint {endpoint_name}: {e}")
        return None

In [67]:
list_sagemaker_endpoints()

Listing SageMaker Endpoints:
Endpoint Name: semantic-search-pretrain-all-MiniLM-L6-v2-1719456491, Status: InService
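The endpoint name used in the next cell is copied by hand from this listing. Since each deployment appends a timestamp to the name, a small helper can instead pick the newest InService endpoint whose name contains a given string. This is a sketch; latest_endpoint_name is hypothetical, while the list_endpoints paginator and its NameContains, SortBy and SortOrder parameters are part of the boto3 SageMaker client.

```python
def latest_endpoint_name(sagemaker_client, name_contains, status="InService"):
    """Return the most recently created endpoint whose name contains name_contains
    and whose status matches, or None if there is none."""
    paginator = sagemaker_client.get_paginator("list_endpoints")
    pages = paginator.paginate(
        NameContains=name_contains, SortBy="CreationTime", SortOrder="Descending"
    )
    # Results arrive newest first, so the first match is the latest endpoint
    for page in pages:
        for endpoint in page["Endpoints"]:
            if endpoint["EndpointStatus"] == status:
                return endpoint["EndpointName"]
    return None


# Example, using the sagemaker_client created above:
# endpoint_name = latest_endpoint_name(sagemaker_client, "semantic-search-pretrain-all-MiniLM-L6-v2")
```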


In [68]:
endpoint_name = 'semantic-search-pretrain-all-MiniLM-L6-v2-1719456491'
payload = "This is an example of how to invoke SageMaker endpoints!"
vector = invoke_sagemaker_endpoint_pretrain(endpoint_name, payload)
len(vector)

384