# Retrieval Augmented Generation

We will use semantic search to find the wine that best matches a review-style description. [Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401) combines a retrieval model with a generative model: relevant information is retrieved first and then incorporated into the generation step. In this notebook, we'll walk through enhancing an OpenSearch cluster search with generative AI to output conversational wine recommendations based on a desired description.

### 1. Install OpenSearch ML Python library

For this notebook we require the use of a few key libraries. We'll use the Python clients for OpenSearch and SageMaker, and Python frameworks for text embeddings.

In [None]:
!pip install opensearch-py-ml accelerate tqdm --quiet
!pip install sagemaker --upgrade --quiet

### 2. Check PyTorch Version


As in the previous modules, let's import PyTorch and confirm which version is running. The version should already be 1.13.1 or higher. If it is not, please run the labs in order so that everything is set up.

In [None]:
import torch
print(torch.__version__)
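
If you would rather check the version programmatically than read the printed value, a small helper along these lines could be used. This is a minimal sketch: `meets_minimum` is a hypothetical helper, not part of PyTorch, and it assumes a plain `major.minor.patch` string with an optional local suffix such as `+cu117`.

```python
def meets_minimum(version: str, minimum: str = "1.13.1") -> bool:
    """Return True if a 'major.minor.patch' version string is >= minimum."""
    def parse(v: str) -> tuple:
        # Drop a local build suffix such as '+cu117' before splitting.
        core = v.split("+", 1)[0]
        return tuple(int(part) for part in core.split(".")[:3])
    return parse(version) >= parse(minimum)


print(meets_minimum("1.13.1+cu117"))  # True
print(meets_minimum("1.12.0"))        # False
print(meets_minimum("2.1.0"))         # True
```

In the notebook this would be called as `meets_minimum(torch.__version__)`. Pre-release strings (for example, ones containing `a0` or `rc1`) are outside what this sketch handles.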

Now we need to restart the kernel by running the cell below.

In [None]:
from IPython.display import display_html
def restartkernel() :
    display_html("<script>Jupyter.notebook.kernel.restart()</script>",raw=True)
restartkernel()

### 3. Import libraries
The cell below imports all the libraries and modules used in this notebook.

In [None]:
import boto3
import os
import time
import json
import pandas as pd
from tqdm import tqdm
import sagemaker
from opensearchpy import OpenSearch, RequestsHttpConnection
from sagemaker import get_execution_role


### 4. Prepare data

This lab combines semantic search with a generative model to present the retrieved data to the user. Below is a dataset of wine reviews. We'll sample this dataset to recommend wines that resemble the description the user provides.

### Note
You can download the dataset from various sources. One is Kaggle (you will need to create a free account):
https://www.kaggle.com/datasets/christopheiv/winemagdata130k?select=winemag-data-130k-v2.json

1. Navigate to the SageMaker Notebook URL (refer back to the outputs tab on the CloudFormation stack if you can't find it)
2. Click "Upload" to upload the zip downloaded from Kaggle
3. Click "New" -> "Terminal" to open a terminal window
4. Navigate to the SageMaker -> semantic-search-with-amazon-opensearch directory e.g. `cd SageMaker/semantic-search-with-amazon-opensearch`
5. Unzip the uploaded zip file e.g. `unzip archive.zip`

After downloading the dataset and copying it here, execute the following cells to load it into a pandas DataFrame, inspect it, and sample a subset of the data.

In [None]:
df = pd.read_json('winemag-data-130k-v2.json')

df.sample(3)

In [None]:
df.columns

In [None]:
wm_list = df.sample(300,
                   random_state=37).to_dict('records') # sample to keep lab quick

wm_list[:1]

### 5. Create an OpenSearch cluster connection.
Next, we'll use the Python client to set up a connection to the OpenSearch cluster.

Note: if you're using a region other than us-east-1, please update the region in the code below.

#### Get CloudFormation stack output variables

We also need to grab some key values from the infrastructure we provisioned using CloudFormation. To do this, we will list the outputs from the stack and store this in "outputs" to be used later.

You can ignore any "PythonDeprecationWarning" warnings.

In [None]:
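region = 'us-east-1'

# Pass the region explicitly so that changing `region` above actually takes effect.
cfn = boto3.client('cloudformation', region_name=region)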

def get_cfn_outputs(stackname):
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs

## Setup variables to use for the rest of the demo
cloudformation_stack_name = "semantic-search"

outputs = get_cfn_outputs(cloudformation_stack_name)
aos_host = outputs['OpenSearchDomainEndpoint']

outputs
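
To see what `get_cfn_outputs` does without calling AWS, the same flattening can be applied to a hand-written response. The sketch below assumes the `describe_stacks` response shape used above (`Stacks[0]['Outputs']` as a list of `OutputKey`/`OutputValue` pairs). `outputs_to_dict` is a hypothetical helper and the sample values are placeholders, not real endpoints.

```python
def outputs_to_dict(describe_stacks_response: dict) -> dict:
    """Map each OutputKey to its OutputValue for the first stack in the response."""
    stack = describe_stacks_response["Stacks"][0]
    # Default to an empty list in case the stack defines no outputs.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Made-up response in the shape returned by describe_stacks; values are placeholders.
sample_stack_response = {
    "Stacks": [
        {
            "StackName": "semantic-search",
            "Outputs": [
                {"OutputKey": "OpenSearchDomainEndpoint",
                 "OutputValue": "search-example.us-east-1.es.amazonaws.com"},
                {"OutputKey": "EmbeddingEndpointName",
                 "OutputValue": "example-embedding-endpoint"},
            ],
        }
    ]
}

sample_outputs = outputs_to_dict(sample_stack_response)
print(sample_outputs["OpenSearchDomainEndpoint"])
```

Keeping the outputs in a plain dictionary means later cells can look up values such as the endpoint names by key.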

In [None]:
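# This is a Secrets Manager client (not KMS), so name it accordingly.
secrets_client = boto3.client('secretsmanager', region_name=region)
aos_credentials = json.loads(secrets_client.get_secret_value(SecretId=outputs['OpenSearchSecret'])['SecretString'])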

#credentials = boto3.Session().get_credentials()
#auth = AWSV4SignerAuth(credentials, region)
auth = (aos_credentials['username'], aos_credentials['password'])

aos_client = OpenSearch(
    hosts = [{'host': aos_host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)

### 6. Get SageMaker endpoint for embedding

---
This is a SageMaker endpoint hosting the GPT-J 6B model, which we use to convert text into vectors.

In [None]:
embedding_endpoint_name=outputs['EmbeddingEndpointName']
print(embedding_endpoint_name)

Define a function that converts text into a vector using the SageMaker embedding endpoint.

In [None]:
def query_endpoint_with_json_payload(encoded_json, endpoint_name, content_type="application/json"):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=encoded_json
    )
    # The response body is JSON; its "embedding" key holds a list of vectors.
    response_json = json.loads(response['Body'].read().decode("utf-8"))
    return response_json["embedding"]


### 7. Test the embeddings endpoint with a sample phrase
Given any text phrase, the endpoint converts the text to a vector of size 4096 (the hidden size of GPT-J 6B, and the dimension used in the index mapping later in this notebook). We're also creating a function `embed_phrase` so that we can call it later.

In [None]:
def embed_phrase(input_data):
    input_str = json.dumps({"text_inputs": input_data})
    encoded_input_str = input_str.encode("utf-8")
    features = query_endpoint_with_json_payload(encoded_input_str,embedding_endpoint_name)
    return features

Let's ask a question about wine.

In [None]:
question_on_wine="A wine that pairs well with meat."

In [None]:
result = embed_phrase(question_on_wine)

print(len(result[0]))
result[0][:10]
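
An embedding only becomes useful once vectors are compared with each other. The index defined in the next step uses cosine similarity (`cosinesimil`), which can be sketched in plain Python as follows. The three-dimensional vectors are toy stand-ins for real embeddings, and `cosine_similarity` is an illustrative helper rather than something the notebook defines.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings.
red_meat = [0.9, 0.1, 0.0]
steak_wine = [0.8, 0.2, 0.1]
dessert_wine = [0.0, 0.2, 0.9]

# The vector pointing in a similar direction scores higher.
print(cosine_similarity(red_meat, steak_wine) > cosine_similarity(red_meat, dessert_wine))  # True
```

Cosine similarity depends only on direction, not magnitude, which is why it is a common choice for comparing text embeddings.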

### 8. Create an index in Amazon OpenSearch Service
Whereas we previously created an index with 2-3 fields, this time we'll define the index with multiple fields: the vectorization of the `description` field, and all others present within the dataset.

To create the index, we first define the index in JSON, then use the `aos_client` connection we initiated earlier to create the index in OpenSearch.

In [None]:
knn_index = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "analysis": {
          "analyzer": {
            "default": {
              "type": "standard",
              "stopwords": "_english_"
            }
          }
        }
    },
    "mappings": {
        "properties": {
            "description_vector": {
                "type": "knn_vector",
                "dimension": 4096,
                "store": True
            },
            "description": {
                "type": "text",
                "store": True
            },
            "designation": {
                "type": "text",
                "store": True
            },
            "variety": {
                "type": "text",
                "store": True
            },
            "country": {
                "type": "text",
                "store": True
            },
            "winery": {
                "type": "text",
                "store": True
            },
            "points": {
                "type": "integer",
                "store": True
            },
        }
    }
}
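
The `dimension` declared for `description_vector` has to match the length of the vectors the embedding endpoint returns, since OpenSearch should reject documents whose vector length differs from the mapping. A quick local check can catch a mismatch before any data is loaded. The helpers below are hypothetical and use a trimmed-down index body with a toy dimension.

```python
def vector_dimension(index_body: dict, field: str) -> int:
    """Read the declared dimension of a knn_vector field from an index body."""
    mapping = index_body["mappings"]["properties"][field]
    if mapping.get("type") != "knn_vector":
        raise ValueError(f"{field!r} is not a knn_vector field")
    return mapping["dimension"]


def check_embedding(index_body: dict, field: str, vector: list) -> None:
    """Raise ValueError if the vector length differs from the mapped dimension."""
    expected = vector_dimension(index_body, field)
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values for {field!r}, got {len(vector)}")


# Trimmed-down index body with the same structure as knn_index, using a toy dimension.
toy_index = {
    "mappings": {
        "properties": {
            "description_vector": {"type": "knn_vector", "dimension": 4}
        }
    }
}
check_embedding(toy_index, "description_vector", [0.1, 0.2, 0.3, 0.4])  # passes silently
```

In the notebook, the equivalent call would be `check_embedding(knn_index, "description_vector", result[0])`.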


Using the above index definition, we now need to create the index in Amazon OpenSearch. Running this cell will recreate the index if you have already executed this notebook.

In [None]:
index_name = "wine_knowledge_base"

# Delete any index left over from a previous run, then create it from scratch.
# Checking for existence avoids a bare `except` that would hide unrelated errors.
if aos_client.indices.exists(index=index_name):
    print("Recreating index '" + index_name + "' on cluster.")
    aos_client.indices.delete(index=index_name)
else:
    print("Index '" + index_name + "' not found. Creating index on cluster.")
aos_client.indices.create(index=index_name,body=knn_index,ignore=400)


Let's verify the index that was created.

In [None]:
aos_client.indices.get(index=index_name)

### 9. Load the raw data into the Index
Next, let's load the wine review data into the index we've just created. During ingestion, the `os_import` function also converts the `description` field to a vector (embedding) by calling the embedding endpoint we set up earlier.

In [None]:
def os_import(record, aos_client, index_name):
    description = record["description"]
    search_vector = embed_phrase(description)
    aos_client.index(index=index_name,
             body={"description_vector": search_vector[0], 
                   "description": record["description"],
                   "points":record["points"],
                   "variety":record["variety"],
                   "country":record["country"],
                   "designation":record["designation"],
                   "winery":record["winery"]
                  }
            )

print("Loading records...")
for record in tqdm(wm_list): 
    os_import(record, aos_client, index_name)
print("Records loaded.")
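
The loop above sends one request per record, which is fine for 300 documents. For larger samples, a common alternative is to build a list of bulk actions and send them together. The sketch below only builds the actions; `build_bulk_actions` and `fake_embed` are hypothetical names, and the embedder is a stub so the code runs without a SageMaker endpoint.

```python
def build_bulk_actions(records, index_name, embed):
    """Yield one bulk action per record; `embed` maps a description to a vector."""
    fields = ("description", "points", "variety", "country", "designation", "winery")
    for record in records:
        source = {name: record.get(name) for name in fields}
        source["description_vector"] = embed(record["description"])
        yield {"_index": index_name, "_source": source}


# Stub embedder so the sketch runs offline; it is not a real embedding.
def fake_embed(text):
    return [float(len(text)), 0.0]


sample_records = [
    {"description": "Bold and tannic.", "points": 90, "variety": "Malbec",
     "country": "Argentina", "designation": None, "winery": "Example Winery"},
]
actions = list(build_bulk_actions(sample_records, "wine_knowledge_base", fake_embed))
print(actions[0]["_source"]["description_vector"])  # [16.0, 0.0]
```

With a live connection, actions in this shape could be passed to `opensearchpy.helpers.bulk(aos_client, actions)`, with `embed` set to something like `lambda text: embed_phrase(text)[0]`.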

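To validate the load, we'll query the number of documents in the index. We should see 300 hits, or however many records were sampled earlier. Newly indexed documents become searchable only after an index refresh (typically about once per second by default), so rerun the cell if the count is lower than expected.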

In [None]:
res = aos_client.search(index=index_name, body={"query": {"match_all": {}}})
print("Records found: %d." % res['hits']['total']['value'])

### 10. Search vector with "Semantic Search" 

Now we can define a helper function to execute the search query for us to find a wine whose review most closely matches the requested description. `retrieve_opensearch_with_semantic_search` embeds the search phrase, searches the index for the closest matching vector, and returns the top result.


In [None]:
def retrieve_opensearch_with_semantic_search(phrase, n=1):
    search_vector = embed_phrase(phrase)[0]
    osquery={
        "_source": {
            "exclude": [ "description_vector" ]
        },
        
      "size": n,
      "query": {
        "knn": {
          "description_vector": {
            "vector":search_vector,
            "k":n
          }
        }
      }
    }

    res = aos_client.search(index=index_name, 
                           body=osquery,
                           stored_fields=["description","winery","points", "designation", "country"],
                           explain = True)
    top_result = res['hits']['hits'][0]
    
    result = {
        "description":top_result['_source']['description'],
        "winery":top_result['_source']['winery'],
        "points":top_result['_source']['points'],
        "designation":top_result['_source']['designation'],
        "country":top_result['_source']['country'],
        "variety":top_result['_source']['variety'],
    }
    
    return result


Use semantic search to retrieve the record that best matches the sample question.

In [None]:
example_request = retrieve_opensearch_with_semantic_search(question_on_wine)
print(example_request)
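
The helper accepts `n` but returns only the first hit. If you want to inspect several candidates, the search response can be unpacked into a list instead. This is a sketch against a hand-written response in the shape OpenSearch returns (`hits.hits[*]._source` and `_score`). `extract_hits` is a hypothetical helper and the sample wines are made up.

```python
def extract_hits(response, fields=("description", "winery", "points",
                                   "designation", "country", "variety")):
    """Return one dict per hit with the selected _source fields plus the score."""
    results = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        row = {name: source.get(name) for name in fields}
        row["score"] = hit.get("_score")
        results.append(row)
    return results


# Hand-written response in the shape of an OpenSearch search result.
sample_search_response = {
    "hits": {
        "hits": [
            {"_score": 0.91, "_source": {"winery": "Winery A", "variety": "Syrah", "points": 91}},
            {"_score": 0.87, "_source": {"winery": "Winery B", "variety": "Malbec", "points": 89}},
        ]
    }
}
for row in extract_hits(sample_search_response, fields=("winery", "variety", "points")):
    print(row)
```

Unlike indexing into `res['hits']['hits'][0]` directly, this returns an empty list rather than raising an `IndexError` when the search finds nothing.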

### 11. Get SageMaker endpoint for content generation

We are using the Falcon 7B LLM in this lab. Please refer to the Hugging Face documentation for more information: https://huggingface.co/tiiuae/falcon-7b

In [None]:
llm_endpoint_name=outputs['LLMEndpointName']
print(llm_endpoint_name)


Define a function that uses the LLM to generate content. Because an LLM is trained on static data that may be outdated, and it lacks business domain knowledge, the generated content can be non-factual (hallucination).

In [None]:
def query_llm_endpoint_with_json_payload(encoded_json, endpoint_name, content_type="application/json"):
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=encoded_json
    )
    model_predictions = json.loads(response["Body"].read())
    return [gen["generated_text"] for gen in model_predictions]

def query_llm_with_hallucination(question):
    payload = {
        "inputs": question,
        "parameters":{
            "max_new_tokens": 1024,
            "num_return_sequences": 1,
            "top_k": 100,
            "top_p": 0.95,
            "do_sample": False,
            "return_full_text": True,
            "temperature": 0.9
        }
    }
    query_response = query_llm_endpoint_with_json_payload(json.dumps(payload).encode("utf-8"), endpoint_name=llm_endpoint_name)
    return query_response


Check the result generated by the LLM without any retrieved context, which may contain hallucinations.

In [None]:
generated_texts = query_llm_with_hallucination(question_on_wine)

print(f"The recommended wine from LLM with hallucination: \n{generated_texts[0]}\n")
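
Because the payload sets `return_full_text` to `True`, the generated text is expected to begin with the original question followed by the completion. If you only want the completion, the echoed prompt can be removed afterwards. This is a sketch with a made-up completion; `strip_prompt` is a hypothetical helper, and it assumes the endpoint echoes the prompt verbatim.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Remove the leading prompt from a completion that echoes it back."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Fall back to the text as-is if the prompt was not echoed.
    return generated_text.strip()


prompt = "A wine that pairs well with meat."
full_text = prompt + " Try a bold red."  # made-up completion for illustration
print(strip_prompt(full_text, prompt))  # Try a bold red.
```

In the notebook, this would be applied as `strip_prompt(generated_texts[0], question_on_wine)`.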

### Retrieval Augmented Generation
---
To address the LLM hallucination problem, we can give the LLM more context so that it can ground its answer in that context and generate a factual result. RAG is one solution to LLM hallucination.
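
The idea can be illustrated end to end without any endpoint. In the sketch below, a toy keyword-overlap retriever stands in for the vector search, and the prompt is simply printed instead of being sent to an LLM. The function names, the two documents, and the prompt wording are all made up for illustration.

```python
def retrieve(question, documents):
    """Pick the document sharing the most words with the question (toy retriever)."""
    question_words = set(question.lower().split())
    return max(
        documents,
        key=lambda d: len(question_words & set(d["description"].lower().split())),
    )


def augment(question, document):
    """Place the retrieved record in front of the question as context."""
    return f"Data: {document}\nQuestion: {question}\nRecommendation:"


documents = [
    {"winery": "Winery A", "description": "Crisp white with citrus notes, good with fish"},
    {"winery": "Winery B", "description": "Full bodied red that pairs well with grilled meat"},
]
question = "A wine that pairs well with meat"
context = retrieve(question, documents)
print(augment(question, context))
```

The model then completes the text after `Recommendation:` with the retrieved record in view, rather than relying only on what it memorized during training.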


### 12. Create a prompt for the LLM using the search results from OpenSearch

We will be using the Falcon-7B model for one-shot generation, using a canned recommendation and response to guide the output. 

Before querying the model, the function `generate_prompt_to_llm` below is used to build a prompt for one-shot generation. The function takes an input string, searches the OpenSearch cluster for a matching wine, then composes the prompt for the LLM. The prompt is in the following format:

```
A sommelier uses their vast knowledge of wine to make great recommendations people will enjoy. As a sommelier, you must include the wine variety, the country of origin, and a colorful description relating to the following phrase: {original_question_on_win}.

Data:{'description': 'This perfumey white dances in intense and creamy layers of stone fruit and vanilla, remaining vibrant and balanced from start to finish. The generous fruit is grown in the relatively cooler Oak Knoll section of the Napa Valley. This should develop further over time and in the glass.', 'winery': 'Darioush', 'points': 92, 'designation': None, 'country': 'US'}

Recommendation:I have a wonderful wine for you. It's a dry, medium bodied white wine from Darioush winery in the Oak Knoll section of Napa Valley, US. It has flavors of vanilla and oak. It scored 92 points in wine spectator.

Data: {retrieved_documents}

Recommendation:
```



In [None]:
def generate_prompt_to_llm(original_question_on_win):
    retrieved_documents = retrieve_opensearch_with_semantic_search(original_question_on_win)
    print("retrieved relevant wine per your query is : \n" + str(retrieved_documents))
    print("------------")
    one_shot_description_example = "{'description': 'This perfumey white dances in intense and creamy layers of stone fruit and vanilla, remaining vibrant and balanced from start to finish. The generous fruit is grown in the relatively cooler Oak Knoll section of the Napa Valley. This should develop further over time and in the glass.', 'winery': 'Darioush', 'points': 92, 'designation': None, 'country': 'US'}"
    one_shot_response_example = "I have a wonderful wine for you. It's a dry, medium bodied white wine from Darioush winery in the Oak Knoll section of Napa Valley, US. It has flavors of vanilla and oak. It scored 92 points in wine spectator."
    prompt = (
        f"A sommelier uses their vast knowledge of wine to make great recommendations people will enjoy. As a sommelier, you must include the wine variety, the country of origin, and a colorful description relating to the following phrase: {original_question_on_win}.\n"
        f"Data: {one_shot_description_example} \n Recommendation: {one_shot_response_example} \n"
        f"Data: {retrieved_documents} \n Recommendation:"
    )
    
    return prompt

### 13. Format LLM prompt and query using the generated prompt
We also need a helper function to query the LLM: `generate_llm_input` transforms the generated prompt into the input format the endpoint expects. The response is parsed by `query_llm_endpoint_with_json_payload`, defined earlier.

`query_llm_with_rag` combines everything we've done in this module. It does all of the following:
- generates a vector for the input
- searches the OpenSearch index with semantic search for the relevant wine, using the `description_vector` field
- generates an LLM prompt from the search results
- queries the LLM with that prompt for a response

In [None]:
def generate_llm_input(data, **kwargs):
    default_kwargs = {
        "num_beams": 5,
        "no_repeat_ngram_size": 3,
        "do_sample": True,
        "max_new_tokens": 100,
        "temperature": 0.01,
        "watermark": True,
        "top_k": 200,
        "max_length": 200,
        "early_stopping": True
    }
    
    default_kwargs = {**default_kwargs, **kwargs}
    
    input_data = {
        "inputs": data,
        "parameters": default_kwargs
    }
    
    return input_data

def query_llm_with_rag(description, **kwargs):
    prompt = generate_prompt_to_llm(description)
    query_payload = generate_llm_input(prompt, **kwargs)
    response = query_llm_endpoint_with_json_payload(json.dumps(query_payload).encode("utf-8"), endpoint_name=llm_endpoint_name)
    return response

#### And finally, let's call the function and get a wine recommendation.

In [None]:
recommendation = query_llm_with_rag(question_on_wine)
print(recommendation)

### Additional info: changing kwargs for querying the LLM
If you want to change or add parameters for the LLM query, pass them as keyword arguments to the `query_llm_with_rag` function. For example, to change the `temperature` value, simply change the function call:
`query_llm_with_rag(<description phrase>, temperature=<new float value>)`
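
The override works because `generate_llm_input` merges the keyword arguments over its defaults with `{**default_kwargs, **kwargs}`. The merge behavior can be checked in isolation. The sketch below uses a shortened defaults dictionary, and `merge_parameters` is a hypothetical stand-in for that one line.

```python
def merge_parameters(defaults: dict, **overrides) -> dict:
    """Return a new dict where overrides replace or extend the defaults."""
    return {**defaults, **overrides}


defaults = {"temperature": 0.01, "max_new_tokens": 100}
merged = merge_parameters(defaults, temperature=0.7, top_p=0.9)
print(merged)  # {'temperature': 0.7, 'max_new_tokens': 100, 'top_p': 0.9}
```

Keys present in both dictionaries take the overriding value, new keys are appended, and the original defaults are left unchanged.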