# Semantic Search with OpenSearch Neural Search 

We will use semantic search to find the wines whose review descriptions best match a free-text request.

### 1. Check PyTorch Version


As in the previous modules, let's import PyTorch and confirm that we have the latest version. It should be 1.10.2 or higher. If it isn't, run the earlier labs in order to get everything set up.

In [None]:
import torch
print(torch.__version__)
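If you'd rather have the notebook check the version than read the printed string, a small standard-library comparison is enough. This is a sketch that assumes `torch.__version__` follows the usual `MAJOR.MINOR.PATCH[+local]` shape (for example `1.13.1+cu117`); `version_at_least` is a hypothetical helper name, not part of PyTorch.

```python
def version_at_least(version: str, minimum: tuple) -> bool:
    """Return True if a PyTorch-style version string is >= minimum.

    Drops a local build suffix such as "+cu117" and keeps only the leading
    digits of each dot-separated piece, so "0a0" is read as 0.
    """
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions so "2.0" compares like (2, 0, 0)
    while len(parts) < len(minimum):
        parts.append(0)
    return tuple(parts[: len(minimum)]) >= tuple(minimum)
```

In the notebook this would be used as `assert version_at_least(torch.__version__, (1, 10, 2))`. For anything beyond a quick check, `packaging.version.Version` handles pre-release and post-release tags more rigorously.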

### 2. Retrieve notebook variables

The line below will retrieve your shared variables from the previous notebook.

In [None]:
%store -r

### 3. Install OpenSearch ML Python library

In [None]:
!pip install opensearch-py-ml
!pip install accelerate

Now restart the kernel by running the cell below so that the newly installed packages are picked up.

In [None]:
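from IPython.display import display_html

# Restart the kernel from code. This relies on the classic Jupyter Notebook
# JavaScript API (Jupyter.notebook); in JupyterLab, use Kernel > Restart Kernel instead.
def restartkernel():
    display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)

restartkernel()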

### 4. Import libraries

In [None]:
import boto3
import re
import time

### 5. Prepare the winemag (WMD) review data
The dataset is available from several sources, including Kaggle:
https://www.kaggle.com/datasets/christopheiv/winemagdata130k?select=winemag-data-130k-v2.json

After downloading the zip file and copying it into this directory, unzip it in the working directory.

In [None]:
# https://www.kaggle.com/datasets/christopheiv/winemagdata130k?select=winemag-data-130k-v2.json

!unzip -o winemag-data-130k-v2.json.zip

In [None]:
import pandas as pd

df = pd.read_json('winemag-data-130k-v2.json')

df.sample(3)

In [None]:
import json
import pandas as pd

# wm_list = df.to_dict('records')
wm_list = df.sample(500).to_dict('records') # sample to keep lab quick

wm_list[:5]

### 6. Create an OpenSearch cluster connection.
Next, we'll use the Python client to set up a connection to the OpenSearch cluster.

Note: if you're using a region other than us-east-1, please update the region in the code below.

#### Get CloudFormation stack output variables

We also need to grab some key values from the infrastructure we provisioned using CloudFormation. To do this, we will list the outputs from the stack and store them in `outputs` to be used later.

You can ignore any "PythonDeprecationWarning" warnings.

In [None]:
import boto3

cfn = boto3.client('cloudformation')

def get_cfn_outputs(stackname):
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs

# Set up variables used in the rest of the demo
cloudformation_stack_name = "semantic-search"

outputs = get_cfn_outputs(cloudformation_stack_name)

bucket = outputs['s3BucketTraining']
aos_host = outputs['OpenSearchDomainEndpoint']

outputs
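`get_cfn_outputs` relies on the shape of the `describe_stacks` response: a `Stacks` list whose first entry holds an `Outputs` list of `OutputKey`/`OutputValue` pairs. The sketch below shows that flattening as a pure function you can try without AWS access. The sample response and its values are hypothetical, and the `.get("Outputs", [])` fallback is an assumption for stacks that have not finished creating (the original function may raise `KeyError` in that case).

```python
def outputs_from_describe_stacks(response: dict) -> dict:
    """Flatten the Outputs list of the first stack into {OutputKey: OutputValue}."""
    stacks = response.get("Stacks", [])
    if not stacks:
        raise ValueError("describe_stacks returned no stacks")
    # A stack that is still being created may not report any outputs yet
    return {o["OutputKey"]: o["OutputValue"] for o in stacks[0].get("Outputs", [])}


# Hypothetical response fragment; real values come from cfn.describe_stacks(StackName=...)
sample_response = {
    "Stacks": [
        {
            "StackName": "semantic-search",
            "Outputs": [
                {"OutputKey": "s3BucketTraining", "OutputValue": "example-bucket"},
                {"OutputKey": "OpenSearchDomainEndpoint",
                 "OutputValue": "search-example.us-east-1.es.amazonaws.com"},
            ],
        }
    ]
}

outputs_example = outputs_from_describe_stacks(sample_response)
```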

In [None]:
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import boto3

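region = 'us-east-1'  # only used by the SigV4 option below

# Option 1 (commented out): sign requests with your IAM credentials (SigV4)
#credentials = boto3.Session().get_credentials()
#auth = AWSV4SignerAuth(credentials, region)
# Option 2 (used here): HTTP basic auth with the domain's master user for this lab
auth = ("master","Semantic123!")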
index_name = 'nlp_wmd'

aos_client = OpenSearch(
    hosts = [{'host': aos_host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)

### 7. Configure the OpenSearch domain to run machine learning models on data nodes

In [None]:
# Allow ML Commons to run models on data nodes, not only on dedicated ML nodes
settings = {"transient": {"plugins.ml_commons.only_run_on_ml_node": False}}
aos_client.cluster.put_settings(body=settings)

Verify that `plugins.ml_commons.only_run_on_ml_node` is now set to `false`.

In [None]:
aos_client.cluster.get_settings(flat_settings=True)
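With `flat_settings=True`, the response holds `persistent` and `transient` maps keyed by dotted setting names, and the values come back as strings. Transient settings take precedence over persistent ones, so a check should look there first. This is a minimal sketch; `effective_cluster_setting` is a hypothetical helper, and the example response is made up to match that shape. A missing key means neither scope overrides the node default.

```python
def effective_cluster_setting(settings: dict, key: str, default=None):
    """Return the effective value of a flat cluster setting.

    Expects the shape returned by cluster.get_settings(flat_settings=True).
    Transient settings override persistent ones, so they are checked first.
    """
    for scope in ("transient", "persistent"):
        if key in settings.get(scope, {}):
            return settings[scope][key]
    return default


# Hypothetical response, shaped like aos_client.cluster.get_settings(flat_settings=True)
example_settings = {
    "persistent": {},
    "transient": {"plugins.ml_commons.only_run_on_ml_node": "false"},
}

# Values are strings, so compare against "false", not the boolean False
ml_node_only = effective_cluster_setting(
    example_settings, "plugins.ml_commons.only_run_on_ml_node")
```

Against the live domain, pass `aos_client.cluster.get_settings(flat_settings=True)` instead of `example_settings`.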

### 8. Download the pre-trained BERT model

In [None]:
import os
import urllib.request

# urlretrieve does not create directories, so make sure ./model exists first
os.makedirs('model', exist_ok=True)
urllib.request.urlretrieve('https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true', 'model/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip')


Verify that the model was downloaded successfully to the `model` folder.

In [None]:
!ls -al model
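`ls` confirms that a file exists, but a failed download can leave behind a small HTML error page with the right name. A stricter check, sketched here with only the standard library, verifies that the file is a readable zip archive and lists its members. `describe_model_archive` is a hypothetical helper, and no particular member names are assumed.

```python
import zipfile
from pathlib import Path


def describe_model_archive(path: str) -> list:
    """Return the member names of a model zip, raising if it is missing or not a valid zip."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(path)
    # A truncated download or an HTML error page is rejected here
    if not zipfile.is_zipfile(p):
        raise ValueError(f"{path} is not a valid zip archive")
    with zipfile.ZipFile(p) as zf:
        bad = zf.testzip()  # name of the first member with a bad CRC, or None
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        return zf.namelist()
```

For this notebook: `describe_model_archive('model/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip')`.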

### 9. Upload BERT model to OpenSearch domain

In [None]:
from opensearch_py_ml.ml_models import SentenceTransformerModel
from opensearch_py_ml.ml_commons import MLCommonClient

ml_client = MLCommonClient(aos_client)

In [None]:

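model_path = './model/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip'
# Model config JSON; it is not downloaded above, so make sure it exists in ./model
model_config_path = './model/all-MiniLM-L6-v2_torchscript.json'

# Register the model with ML Commons and get back its model ID
model_id=ml_client.upload_model(model_path, model_config_path, isVerbose=True)

print("model id:" + model_id)

# Depending on the opensearch-py-ml version, upload_model may also load the model;
# unload it here so that the next step loads it explicitly
ml_client.unload_model(model_id)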

### 10. Load the model for inference.

In [None]:
load_model_output = ml_client.load_model(model_id)

print(load_model_output)
task_id = load_model_output['task_id']

Get detailed information about the load task.

In [None]:
task_info = ml_client.get_task_info(task_id)

print(task_info)
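Loading is asynchronous, so a single `get_task_info` call may still report the task as running. The sketch below polls until the task reaches a terminal state. It assumes the task dictionary carries a `state` key, with `COMPLETED` on success; the set of failure states is also an assumption to verify against your OpenSearch version. `wait_for_task` is a hypothetical helper that takes the getter as a parameter, so it can be tried without a cluster.

```python
import time
from typing import Callable, Dict

# Failure states assumed from ML Commons task responses; verify for your version
FAILURE_STATES = {"FAILED", "CANCELLED", "COMPLETED_WITH_ERROR"}


def wait_for_task(get_task_info: Callable[[str], Dict], task_id: str,
                  timeout_s: float = 300.0, poll_s: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll a task until it reports COMPLETED, a failure state, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        info = get_task_info(task_id)
        state = info.get("state")
        if state == "COMPLETED":
            return info
        if state in FAILURE_STATES:
            raise RuntimeError(f"task {task_id} ended in state {state}: {info.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still in state {state} after {timeout_s}s")
        sleep(poll_s)
```

In the notebook this could be called as `wait_for_task(ml_client.get_task_info, task_id)` before fetching the model information.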

Get detailed information about the model.

In [None]:
model_info = ml_client.get_model_info(model_id)

print(model_info)

### 11. Create a pipeline to convert text into vectors with the BERT model
We will use the model we just uploaded to convert the `description` field into a vector (embedding) and store it in the `description_vector` field.

In [None]:
pipeline={
  "description": "An example neural search pipeline",
  "processors" : [
    {
      "text_embedding": {
        "model_id": model_id,
        "field_map": {
           "description": "description_vector"
        }
      }
    }
  ]
}
pipeline_id = 'nlp_pipeline'
aos_client.ingest.put_pipeline(id=pipeline_id,body=pipeline)
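Conceptually, the `text_embedding` processor reads each source field named in `field_map`, runs its text through the model, and writes the resulting vector into the mapped target field. The sketch below mimics that on a plain dictionary so the before/after shape of a document is visible. `apply_field_map` and `toy_embed` are hypothetical stand-ins: the real processor runs inside OpenSearch with the deployed model, and `toy_embed` returns a made-up 3-dimensional vector, whereas all-MiniLM-L6-v2 produces 384 dimensions.

```python
from typing import Callable, Dict, List


def apply_field_map(doc: dict, field_map: Dict[str, str],
                    embed: Callable[[str], List[float]]) -> dict:
    """Return a copy of doc with embed(text) written to each mapped target field."""
    out = dict(doc)  # leave the caller's document untouched
    for source_field, target_field in field_map.items():
        if doc.get(source_field) is not None:
            out[target_field] = embed(doc[source_field])
    return out


def toy_embed(text: str) -> List[float]:
    # Illustrative "embedding" only: character count, space count, constant
    return [float(len(text)), float(text.count(" ")), 1.0]


doc = {"description": "Ripe blackberry and cedar", "winery": "Example Winery"}
enriched = apply_field_map(doc, {"description": "description_vector"}, toy_embed)
```

To see what the real pipeline produces, the ingest simulate API can run it against a sample document without indexing anything, for example `aos_client.ingest.simulate(id=pipeline_id, body={"docs": [{"_source": doc}]})`.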

Verify that the pipeline was created successfully.

In [None]:
aos_client.ingest.get_pipeline(id=pipeline_id)

### 12. Create an index in Amazon OpenSearch Service
Whereas we previously created an index with 2 fields, this time we define the index with a vector field plus several text and numeric fields: `description_vector` holds the 384-dimensional vector representation of the review, `description` holds the raw review text, and `designation`, `variety`, `country`, `winery`, and `points` hold the wine's metadata. The `default_pipeline` setting routes every document through `nlp_pipeline`, so `description_vector` is filled in automatically at ingest time.

To create the index, we first define it in JSON, then use the `aos_client` connection we initiated earlier to create it in OpenSearch.

In [None]:
knn_index = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "default_pipeline": pipeline_id,
        "analysis": {
          "analyzer": {
            "default": {
              "type": "standard",
              "stopwords": "_english_"
            }
          }
        }
    },
    "mappings": {
        "properties": {
            "description_vector": {
                "type": "knn_vector",
                "dimension": 384,
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "faiss"
                },
                "store": True
            },
            "description": {
                "type": "text",
                "store": True
            },
            "designation": {
                "type": "text",
                "store": True
            },
            "variety": {
                "type": "text",
                "store": True
            },
            "country": {
                "type": "text",
                "store": True
            },
            "winery": {
                "type": "text",
                "store": True
            },
            "points": {
                "type": "integer",
                "store": True
            },
        }
    }
}
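The index settings mention `cosinesimil`, while the `description_vector` method block uses `l2`. Assuming the method-level `space_type` is the one applied to that field, this is harmless for ranking when embeddings have unit length (sentence-transformer models such as all-MiniLM-L6-v2 are commonly configured to normalize their output, but check your model config): for unit vectors, ||a - b||² = 2 - 2·cos(a, b), so ordering by smallest L2 distance equals ordering by largest cosine similarity. The sketch checks this numerically on made-up vectors and uses the k-NN plugin's documented `l2` score, 1 / (1 + squared L2 distance).

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2_score(a: Sequence[float], b: Sequence[float]) -> float:
    # Score used for the l2 space type: higher means closer
    return 1.0 / (1.0 + l2_squared(a, b))


# Made-up 3-d vectors standing in for 384-d embeddings
query_vec = normalize([0.9, 0.1, 0.3])
doc_vecs = {name: normalize(v) for name, v in {
    "a": [1.0, 0.0, 0.2], "b": [0.0, 1.0, 0.0], "c": [0.5, 0.5, 0.5]}.items()}

by_cosine = sorted(doc_vecs, key=lambda n: cosine(query_vec, doc_vecs[n]), reverse=True)
by_l2 = sorted(doc_vecs, key=lambda n: l2_score(query_vec, doc_vecs[n]), reverse=True)
```

If the embeddings were not normalized, the two orderings could differ, and the space type would then matter.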


If you need to recreate the index for any reason, uncomment and run the following line to delete the previously created index. If this is the first time you're running this notebook, skip this step.

In [None]:
# aos_client.indices.delete(index="nlp_wmd")

Using the index definition above, we can now create the index in Amazon OpenSearch Service.

In [None]:
aos_client.indices.create(index="nlp_wmd",body=knn_index,ignore=400)


Let's verify the settings and mappings of the new index.

In [None]:
aos_client.indices.get(index="nlp_wmd")

### 13. Load the raw data into the index
Next, let's load the sampled wine review data into the index we've just created. During ingestion, the `description` field is also converted to a vector (embedding) by the `nlp_pipeline` we defined.

In [None]:
# Index the sampled reviews one document at a time; the index's default_pipeline
# adds description_vector to each document during ingestion
for c in wm_list:
    aos_client.index(index='nlp_wmd', body={
        "content": c["description"],
        "points": c["points"],
        "variety": c["variety"],
        "country": c["country"],
        "description": c["description"],
        "designation": c["designation"],
        "winery": c["winery"],
    })
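Each `aos_client.index` call is a separate HTTP round trip. That is fine for the 500-record sample, but for the full dataset a bulk request is usually much faster. The sketch below builds bulk actions with the same fields the loop writes; the `default_pipeline` on the index still applies to bulk requests, so `description_vector` is still generated server-side. `build_bulk_actions` is a hypothetical helper, and the `chunk_size=100` in the commented usage is an assumed starting point, since embedding inference makes each chunk slower than plain indexing.

```python
from typing import Dict, Iterable, Iterator

# Fields copied from each review record, matching the per-document loop
WINE_FIELDS = ("description", "points", "variety", "country", "designation", "winery")


def build_bulk_actions(records: Iterable[Dict], index_name: str) -> Iterator[Dict]:
    """Yield one bulk index action per wine record.

    `content` duplicates `description`, as in the per-document loop.
    """
    for record in records:
        source = {field: record.get(field) for field in WINE_FIELDS}
        source["content"] = record.get("description")
        yield {"_index": index_name, "_source": source}


# Usage against the live cluster (not executed here):
# from opensearchpy import helpers
# success, errors = helpers.bulk(aos_client, build_bulk_actions(wm_list, "nlp_wmd"), chunk_size=100)
```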

To validate the load, we'll query the number of documents in the index. We should have 500 hits, one per sampled review.

In [None]:
# Refresh first so that documents indexed moments ago are visible to search
aos_client.indices.refresh(index="nlp_wmd")
res = aos_client.search(index="nlp_wmd", body={"query": {"match_all": {}}})
print("Records found: %d." % res['hits']['total']['value'])


In [None]:
# res

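### 14. Search vectors with semantic search

We can now search the data with a `neural` query: OpenSearch converts `query_text` into an embedding with the model identified by `model_id` and returns the `k` documents whose `description_vector` is nearest to it.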
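The neural query body and the hit-flattening loop appear more than once in this notebook. One way to factor them out is sketched below; `build_neural_query` and `hits_to_records` are hypothetical helpers, and the fake response in the usage is made up to match the shape of `aos_client.search` results. Using `.get()` on `_source` is a design choice so that a document missing a field (for example no `designation`) yields `None` instead of a `KeyError`.

```python
from typing import Dict, List, Sequence

# Fields returned for each hit; the vector itself is excluded to keep responses small
RESULT_FIELDS = ("description", "winery", "points", "designation", "country")


def build_neural_query(query_text: str, model_id: str, k: int = 30, size: int = 30) -> Dict:
    """Build a neural query body against description_vector."""
    return {
        "_source": {"exclude": ["description_vector"]},
        "size": size,
        "query": {
            "neural": {
                "description_vector": {
                    "query_text": query_text,
                    "model_id": model_id,
                    "k": k,
                }
            }
        },
    }


def hits_to_records(response: Dict, fields: Sequence[str] = RESULT_FIELDS) -> List[Dict]:
    """Flatten search hits into dicts with _id, _score and the requested _source fields."""
    records = []
    for hit in response["hits"]["hits"]:
        row = {"_id": hit["_id"], "_score": hit["_score"]}
        row.update({f: hit["_source"].get(f) for f in fields})
        records.append(row)
    return records
```

Against the cluster this would read `pd.DataFrame(hits_to_records(aos_client.search(index="nlp_wmd", body=build_neural_query("big bold cab with berries and cherries", model_id))))`.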


In [None]:
query={
  "_source": {
        "exclude": [ "description_vector" ]
    },
  "size": 30,
  "query": {
    "neural": {
      "description_vector": {
        "query_text": "big bold cab with berries and cherries",
        "model_id": model_id,
        "k": 30
      }
    }
  }
}

res = aos_client.search(index="nlp_wmd", 
                       body=query,
                       stored_fields=["description","winery","points", "designation", "country"])

print("Got %d Hits:" % res['hits']['total']['value'])
query_result=[]
for hit in res['hits']['hits']:
    row=[
            hit['_id'],
            hit['_score'],
            hit['_source']['description'],
            hit['_source']['winery'],
            hit['_source']['points'],
            hit['_source']['designation'],
            hit['_source']['country'],
        ]
    query_result.append(row)
    
query_result[0]

query_result_df = pd.DataFrame(data=query_result,columns=[
                                                        "_id",
                                                        "_score",
                                                        "description",
                                                        "winery", 
                                                        "points", 
                                                        "designation",
                                                        "country",                                                                        
                                                     ])
display(query_result_df)




### 15. Deploy the AlexaTM 20B text-generation model with SageMaker JumpStart

In [None]:
import sagemaker, json
from sagemaker import get_execution_role
from datetime import datetime
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

aws_role = get_execution_role()

# model_version = "*" fetches the latest version of the model.
# Use a separate name so the OpenSearch embedding model_id defined earlier is not
# overwritten; the neural queries below still need it.
jumpstart_model_id, model_version = "pytorch-textgeneration1-alexa20b", "*"

endpoint_name = name_from_base(f"jumpstart-console-infer-{jumpstart_model_id}")

endpoint_config_name = "config-" + endpoint_name


# GPU instance requirements: >50 GB of CPU RAM and >42 GB of GPU memory in total
# Tested with ml.g4dn.12xlarge, ml.p3.8xlarge and ml.p3.16xlarge
instance_type = "ml.g4dn.12xlarge"

# If using an EBS-backed instance, you must specify at least 256 GB of storage
# If using an instance with local SSD storage, volume_size must be None
if instance_type == "ml.g4dn.12xlarge":
    volume_size = None
elif instance_type in ["ml.p3.8xlarge", "ml.p3.16xlarge"]:
    volume_size = 256
else:
    volume_size = None
    print(
        f"Instance_type={instance_type} not tested. Setting volume_size = None. "
        "If you run into out of space errors and your instance supports EBS storage, "
        "please set volume_size = 256."
    )

# Retrieve the inference docker container uri. This is the base PyTorch container image.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=jumpstart_model_id,
    model_version=model_version,
    instance_type=instance_type,
)


# Retrieve the model uri. This includes both pre-trained parameters, inference handling scripts and any dependencies.
model_uri = model_uris.retrieve(
    model_id=jumpstart_model_id, model_version=model_version, model_scope="inference"
)

env = {
    "SAGEMAKER_MODEL_SERVER_TIMEOUT": str(3600),
    "MODEL_CACHE_ROOT": "/opt/ml/model",
    "SAGEMAKER_ENV": "1",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code/",
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_MODEL_SERVER_WORKERS": "1",  # without this, there will be one process per GPU
    "TS_DEFAULT_WORKERS_PER_MODEL": "1",  # without this, each worker will have 1/num_gpus the RAM
}

# Create the SageMaker model instance. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
    env=env,
)

print("☕ Spinning up the endpoint. This will take a little while ☕")

# deploy the Model.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
    volume_size=volume_size,  # Specify the size of the Amazon EBS volume.
    model_data_download_timeout=3600,  # Specify the model download timeout in seconds.
    container_startup_health_check_timeout=3600,  # Specify the health checkup timeout in seconds
)
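The `if`/`elif` chain that picks `volume_size` can also be written as a lookup table, which keeps the tested instance types and their storage requirements in one place. This is a restructuring sketch with the same behavior, not a change in what gets deployed; `TESTED_VOLUME_SIZES` and `choose_volume_size` are hypothetical names.

```python
from typing import Optional

# Tested instance types mapped to the volume_size passed to Model.deploy.
# None means the instance uses local SSD storage, so no EBS volume size is set.
TESTED_VOLUME_SIZES = {
    "ml.g4dn.12xlarge": None,
    "ml.p3.8xlarge": 256,
    "ml.p3.16xlarge": 256,
}


def choose_volume_size(instance_type: str) -> Optional[int]:
    """Return volume_size for an instance type, warning for untested types."""
    if instance_type in TESTED_VOLUME_SIZES:
        return TESTED_VOLUME_SIZES[instance_type]
    print(
        f"Instance_type={instance_type} not tested. Setting volume_size = None. "
        "If you run into out of space errors and your instance supports EBS storage, "
        "please set volume_size = 256."
    )
    return None
```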

### 16. Generate wine recommendations with retrieved context

In [None]:
def query(model_predictor, text, generate_kwargs=None, max_num_attempts=5):
    """Query the model predictor.

    model_predictor: The deployed model pipeline.
    text: a string or list of strings to input to the model pipeline.
    generate_kwargs: A dictionary of generation arguments.
    max_num_attempts: Maximum number of invocation requests.

    returns: The raw JSON response body with the model outputs.
    """

    payload = {"text_inputs": text}
    if generate_kwargs is not None:
        payload.update(generate_kwargs)

    encoded_inp = json.dumps(payload).encode("utf-8")
    last_error = None
    for _ in range(max_num_attempts):
        try:
            return model_predictor.predict(
                encoded_inp,
                {"ContentType": "application/json", "Accept": "application/json"},
            )
        except Exception as e:
            last_error = e
            print(f"Invocation request unsuccessful ({e}). Retrying.")
    # Every attempt failed: raise instead of returning an unassigned response
    raise RuntimeError(f"All {max_num_attempts} invocation attempts failed") from last_error


def parse_response(query_response):
    """Parse response and return the list of generated texts."""

    return json.loads(query_response)["generated_texts"]


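newline, bold, unbold = "\n", "\033[1m", "\033[0m"

# The prompt template below interpolates {recomendation}; for this first test, use one
# sampled review, restricted to the fields the neural search returns.
recomendation = {k: wm_list[0][k] for k in ["description", "winery", "points", "designation", "country"]}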

text = f"[CLM] Wine Recomendation: [{{'description': 'Big, tough, gutsy, fruity, tannic. In other words, Petite Sirah, and classic at that. Shows very ripe, deep and long-lasting flavors of blackberries, blueberries, currants, chocolate, cedar and spices, in a bone-dry, full-bodied red wine. Good now, and should develop over a decade.','winery': 'Field Stone','points': 92,'designation': 'Staten Family Reserve','country': 'US'}}] ==> My Recomendation: [You should try Staten Family Reserve by Field Stone in the US. I was blown away the first time I tried it. It's an instant classic. Big, tough, gutsy, fruity, tannic. It's a dry, full bodied wine flavors of blackberries, blueberries, currants, chocolate, cedar and spices and scored 92 points in wine spectator.] <br><br><br> Wine Recomendation [{recomendation}] ==> My Recomendation:"

kwargs = {
    "num_beams": 5, 
    "no_repeat_ngram_size": 3, 
    "temperature": 1, 
#     "top_p": .8,
    "top_k": 147,
    "max_length": 250,
    "early_stopping": True,
    "seed": 0,
}
query_response = query(model_predictor, text, kwargs)
generated_texts = parse_response(query_response)
print(f"Input text: {text}{newline}" f"Generated text: {bold}{generated_texts}{unbold}{newline}")
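`query` retries immediately after a failure. A common refinement for throttling or transient endpoint errors is exponential backoff with jitter, sketched below. `query_with_backoff` is a hypothetical helper; it assumes `predictor` exposes a `predict(data, initial_args)` method like the SageMaker `Predictor` used above, and the delay parameters are illustrative rather than tuned values. The `sleep` argument is injectable so the logic can be exercised without waiting.

```python
import json
import random
import time


def query_with_backoff(predictor, payload: dict, max_attempts: int = 5,
                       base_delay_s: float = 1.0, sleep=time.sleep):
    """Invoke predictor.predict, backing off exponentially (with jitter) between failures.

    Re-raises the last error once all attempts are used up.
    """
    data = json.dumps(payload).encode("utf-8")
    headers = {"ContentType": "application/json", "Accept": "application/json"}
    for attempt in range(max_attempts):
        try:
            return predictor.predict(data, headers)
        except Exception as err:
            if attempt == max_attempts - 1:
                raise
            # 1x, 2x, 4x, ... the base delay, scaled by a random factor in [0.5, 1.0)
            delay = base_delay_s * (2 ** attempt) * (0.5 + random.random() / 2)
            print(f"Invocation failed ({err}); retrying in {delay:.1f}s")
            sleep(delay)
```

With the deployed endpoint: `query_with_backoff(model_predictor, {"text_inputs": text, **kwargs})`.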

In [None]:
def query_wines(desired_description, n=1):
    """Return the top-n matching wines (description, winery, points, designation, country)."""
    osquery={
      "_source": {
            "exclude": [ "description_vector" ]
        },
      "size": 30,
      "query": {
        "neural": {
          "description_vector": {
            "query_text": desired_description,
            "model_id": model_id,
            "k": 30
          }
        }
      }
    }

    res = aos_client.search(index="nlp_wmd", 
                           body=osquery,
                           stored_fields=["description","winery","points", "designation", "country"])

    print("Got %d Hits:" % res['hits']['total']['value'])
    query_result=[]
    for hit in res['hits']['hits']:
        row=[
                hit['_id'],
                hit['_score'],
                hit['_source']['description'],
                hit['_source']['winery'],
                hit['_source']['points'],
                hit['_source']['designation'],
                hit['_source']['country'],
            ]
        query_result.append(row)

    query_result_df = pd.DataFrame(data=query_result,columns=[
                                                            "_id",
                                                            "_score",
                                                            "description",
                                                            "winery", 
                                                            "points", 
                                                            "designation",
                                                            "country",                                                                        
                                                         ])
    
    query_result_df.drop(['_id', '_score'], inplace=True, axis=1)
    result = query_result_df.head(n).to_dict('records')
    return result

query_wines('big and bold, jammy, blackberries', 2)

In [None]:
def render_prompt(requested_description):
    recomendation = query_wines(requested_description)[0]
    prompt = f"[CLM] Wine Recomendation: [{{'description': 'Big, tough, gutsy, fruity, tannic. In other words, Petite Sirah, and classic at that. Shows very ripe, deep and long-lasting flavors of blackberries, blueberries, currants, chocolate, cedar and spices, in a bone-dry, full-bodied red wine. Good now, and should develop over a decade.','winery': 'Field Stone','points': 92,'designation': 'Staten Family Reserve','country': 'US'}}] ==> My Recomendation: [You should try Staten Family Reserve by Field Stone in the US. I was blown away the first time I tried it. It's an instant classic. Big, tough, gutsy, fruity, tannic. It's a dry, full bodied wine flavors of blackberries, blueberries, currants, chocolate, cedar and spices and scored 92 points in wine spectator.] <br><br><br> Wine Recomendation [{recomendation}] ==> My Recomendation:"
    return prompt

prompt = render_prompt("light, fruity goes great with fish")

prompt
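`render_prompt` builds the prompt with an f-string: doubled braces `{{` and `}}` are emitted as literal braces for the hard-coded few-shot example, while `{recomendation}` is replaced by `str()` of the retrieved dictionary. The small sketch below, using made-up values, shows both behaviors and one quirk worth knowing: `str()` of a dict uses `repr()` for its strings, so a description containing an apostrophe is wrapped in double quotes rather than the single quotes the few-shot example uses.

```python
retrieved = {"description": "Light and crisp.", "points": 88}

# {{ and }} become literal braces; {retrieved} is replaced by str(retrieved)
prompt_demo = (
    f"Example: [{{'points': 92}}] ==> ... "
    f"Wine Recommendation [{retrieved}] ==> My Recommendation:"
)

# A string containing an apostrophe switches to double quotes in the repr
tricky = {"description": "It's bright."}
tricky_demo = f"[{tricky}]"
```

If consistent quoting matters, serializing the retrieved wine with `json.dumps` is one option, though it would also change the format relative to the few-shot example in the template.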

In [None]:
query_response = query(model_predictor, prompt, kwargs)
generated_texts = parse_response(query_response)

print(f"Input text: {prompt}{newline}" f"Generated text: {bold}{generated_texts}{unbold}{newline}")

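### 17. Summary

In this notebook we embedded wine review descriptions with a sentence-transformer model hosted in OpenSearch, retrieved matching wines with neural search, and passed the best match to the AlexaTM 20B model on SageMaker to generate a natural-language recommendation. The cell below prints the final prompt that was sent to the model.
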
In [None]:
for line in prompt.split("\n"):
    print(line)