# RAG with Kendra

#### If you are running this in SageMaker Studio, select a kernel with a Python version later than 3.8



---

This notebook is based on this AWS blog post:
* https://aws.amazon.com/blogs/machine-learning/quickly-build-high-accuracy-generative-ai-applications-on-enterprise-data-using-amazon-kendra-langchain-and-large-language-models/

And this repo
* https://github.com/aws-samples/amazon-kendra-langchain-extensions

---

### Solution overview
The following diagram shows the architecture of a GenAI application with a RAG approach.

<img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2023/05/02/ML-13807-image001-new.png">

We use an Amazon Kendra index to ingest enterprise unstructured data from data sources such as wiki pages, MS SharePoint sites, Atlassian Confluence, and document repositories such as Amazon S3. When a user interacts with the GenAI app, the flow is as follows:

1. The user makes a request to the GenAI app.
2. The app issues a search query to the Amazon Kendra index based on the user request.
3. The index returns search results with excerpts of relevant documents from the ingested enterprise data.
4. The app sends the user request, along with the data retrieved from the index, as context in the LLM prompt.
5. The LLM returns a succinct response to the user request based on the retrieved data.
6. The response from the LLM is sent back to the user.
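
The six steps above can be sketched in plain Python. This is a toy illustration only: the in-memory `search_index` and the `fake_llm` function are hypothetical stand-ins for the Amazon Kendra index and the LLM endpoint, not the APIs used later in this notebook.

```python
from typing import Callable, List

# Hypothetical stand-in for documents ingested into the index.
DOCUMENTS = [
    "Amazon Lex is a service for building conversational interfaces.",
    "Amazon Kendra is an intelligent search service.",
]


def _words(text: str) -> set:
    """Lower-case the text and strip basic punctuation from each word."""
    return {w.strip("?.,").lower() for w in text.split()}


def search_index(query: str, top_k: int = 2) -> List[str]:
    """Steps 2-3: return excerpts ranked by word overlap with the query."""
    scored = []
    for doc in DOCUMENTS:
        overlap = len(_words(query) & _words(doc))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question: str, excerpts: List[str]) -> str:
    """Step 4: combine the user request with the retrieved context."""
    context = "\n".join(excerpts)
    return f"{context}\nBased on the above documents, answer: {question}"


def fake_llm(prompt: str) -> str:
    """Step 5: placeholder model that echoes the first line of the prompt."""
    return prompt.splitlines()[0]


def answer(question: str, llm: Callable[[str], str] = fake_llm) -> str:
    """Steps 1 and 6: accept the user request and return the LLM response."""
    return llm(build_prompt(question, search_index(question)))


print(answer("What is Amazon Lex?"))
```

Swapping `search_index` for a real retriever and `fake_llm` for a call to a model endpoint gives the structure that the rest of this notebook builds with LangChain.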

With this architecture, you can choose the most suitable LLM for your use case. LLM options include our partners Hugging Face, AI21 Labs, Cohere, and others hosted on an Amazon SageMaker endpoint, as well as models by companies like Anthropic and OpenAI. With Amazon Bedrock, you will be able to choose Amazon Titan, Amazon’s own LLM, or partner LLMs such as those from AI21 Labs and Anthropic, and call them securely through APIs without your data leaving the AWS ecosystem. The additional benefits that Amazon Bedrock will offer include a serverless architecture, a single API to call the supported LLMs, and a managed service to streamline the developer workflow.

For the best results, a GenAI app needs to engineer the prompt based on the user request and the specific LLM being used. Conversational AI apps also need to manage the chat history and the context. GenAI app developers can use open-source frameworks such as LangChain that provide modules to integrate with the LLM of choice, and orchestration tools for activities such as chat history management and prompt engineering. We have provided the KendraIndexRetriever class, which implements a LangChain retriever interface. Applications can use it in conjunction with other LangChain interfaces, such as chains, to retrieve data from an Amazon Kendra index. We have also provided a few sample applications in the GitHub repo. You can deploy this solution in your AWS account using the step-by-step guide in this notebook.

---

0. [Prerequisites](#Prerequisites)
1. [Permissions and environment variables](#1.-Permissions-and-environment-variables)
2. [Select a pre-trained model](#2.-Select-a-pre-trained-model)
3. [Retrieve Artifacts & Deploy an Endpoint](#3.-Retrieve-Artifacts-&-Deploy-an-Endpoint)
4. [Query endpoint and parse response](#4.-Query-endpoint-and-parse-response)
5. [Query endpoint with Langchain and Kendra Index](#5.-Query-endpoint-with-Langchain-and-Kendra-Index)
6. [[OPTIONAL] Installing Streamlit application and running a WebUI for a chatbot](#6.-[OPTIONAL]-Installing-Streamlit-application-and-running-a-WebUI-for-a-chatbot)
7. [Clean up the endpoint](#7.-Clean-up-the-endpoint)

---

## Prerequisites

### Kendra Index

---

Use the provided AWS CloudFormation template to create a new Amazon Kendra index. This template includes sample data containing AWS online documentation for Amazon Kendra, Amazon Lex, and Amazon SageMaker. Alternatively, if you already have an Amazon Kendra index with your own dataset indexed, you can use that.


Deployment steps:
   1. Download the [template](https://github.com/aws-samples/amazon-kendra-langchain-extensions/blob/main/kendra_retriever_samples/kendra-docs-index.yaml) from the GitHub repo
   2. Deploy it using the [CloudFormation console](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create)
      1. Select Upload a template file, then use the Choose file button to select the template you just downloaded
      2. Press Next
   3. Provide a stack name and press Next   
   4. Leave all default options and press Next
   5. Check the acknowledgement at the bottom and press Submit
   6. The stack will take around 15 minutes to deploy
   7. Take note of the `AWSRegion` and `KendraIndexID` values in the Outputs tab, as we will need them in later steps
---
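
If you prefer to read the stack outputs programmatically rather than from the console, the sketch below converts the `Outputs` list returned by the CloudFormation `describe_stacks` call into a dictionary. The stack name `kendra-docs-index` and the sample values are placeholders; substitute the stack name you entered during deployment.

```python
from typing import Dict, List


def outputs_to_dict(outputs: List[dict]) -> Dict[str, str]:
    """Map each CloudFormation OutputKey to its OutputValue."""
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


def get_stack_outputs(cfn_client, stack_name: str) -> Dict[str, str]:
    """Fetch the outputs of one stack using a boto3 CloudFormation client."""
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    return outputs_to_dict(stack.get("Outputs", []))


# Shape of the Outputs list returned by describe_stacks (values are made up).
sample_outputs = [
    {"OutputKey": "AWSRegion", "OutputValue": "us-east-1"},
    {"OutputKey": "KendraIndexID", "OutputValue": "00000000-0000-0000-0000-000000000000"},
]
print(outputs_to_dict(sample_outputs))

# With AWS credentials available (the stack name is a placeholder):
# import boto3
# outputs = get_stack_outputs(boto3.client("cloudformation"), "kendra-docs-index")
# kendra_index_id, region = outputs["KendraIndexID"], outputs["AWSRegion"]
```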

### Kendra Permissions
---

You will need to update your SageMaker execution role with permissions to query Kendra. 

1. Navigate to the IAM console and select Roles
2. Search for your SageMaker execution role. Its name will look like `AmazonSageMaker-ExecutionRole-<TIMESTAMP>`
3. Select your role, choose to add permissions, search for `AmazonKendraReadOnlyAccess`, and attach the policy
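
The same attachment can be made with the IAM `attach_role_policy` API. The sketch below is one way to do it, assuming it runs with credentials that are allowed to modify IAM roles (the SageMaker execution role itself usually is not). The account ID and timestamp in the example ARN are made up.

```python
KENDRA_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonKendraReadOnlyAccess"


def role_name_from_arn(role_arn: str) -> str:
    """Return the role name, which is the last path segment of a role ARN."""
    return role_arn.split("/")[-1]


def attach_kendra_read_only(iam_client, role_name: str) -> str:
    """Attach the AmazonKendraReadOnlyAccess managed policy to the role."""
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=KENDRA_READ_ONLY_ARN)
    return KENDRA_READ_ONLY_ARN


# Example ARN with an invented account ID and timestamp.
example_arn = (
    "arn:aws:iam::123456789012:role/service-role/"
    "AmazonSageMaker-ExecutionRole-20230101T000000"
)
print(role_name_from_arn(example_arn))

# With suitable credentials:
# import boto3
# attach_kendra_read_only(boto3.client("iam"), role_name_from_arn(example_arn))
```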

---

### AWS Langchain

---

This repo provides a set of utility classes to work with LangChain. It currently has a retriever class, `KendraIndexRetriever`, for working with a Kendra index, and sample scripts to execute the QA chain for SageMaker, OpenAI, and Anthropic providers.

---

Clone the repository:

In [None]:
!git clone https://github.com/aws-samples/amazon-kendra-langchain-extensions.git

Install the classes:
We pin a specific commit of amazon-kendra-langchain-extensions because the retriever was later integrated into LangChain itself, and newer versions would require changes to the code in this lab.

In [None]:
!cd amazon-kendra-langchain-extensions && git checkout 28cb1d4de7cf3bfe8984c6365ce248c12e8b77e0 && pip install . --quiet

### 1. Permissions and environment variables

---
To host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook as the AWS account role with SageMaker access. 

---

In [None]:
import json

import boto3
import sagemaker

# Use the execution role attached to this notebook for all SageMaker calls.
sagemaker_session = sagemaker.Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name

### 2. Select a pre-trained model
***
You can continue with the default model or choose a different one. A complete list of SageMaker pre-trained models is available at [SageMaker pre-trained Models](https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html#). Be sure to select a model that can be used for text2text generation.
***

In [None]:
model_id, model_version = "huggingface-text2text-flan-t5-xl", "*"

### 3. Retrieve Artifacts & Deploy an Endpoint

***

Using SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. This may take a few minutes.

***

In [None]:
from sagemaker import image_uris, instance_types, model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


endpoint_name = name_from_base(f"jumpstart-example-{model_id}")

# Retrieve the inference instance type for the specified model.
instance_type = instance_types.retrieve_default(
    model_id=model_id, model_version=model_version, scope="inference"
)

# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

# Retrieve the model uri.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# Deploy the model. When deploying through the Model class, we pass the Predictor class
# so that the returned object can run inference through the SageMaker API.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

**If you have a predeployed endpoint that you want to use, you can define it here and skip the code block above**

In [None]:
# endpoint_name = "<YOUR_PREDEPLOYED_ENDPOINT_NAME>"

### 4. Query endpoint and parse response

---
Input to the endpoint is any string of text, encoded in `utf-8` and sent with the content type `application/x-text`. Output of the endpoint is a `json` object with the generated text under the `generated_text` key.

---

In [None]:
def query_endpoint(encoded_text, endpoint_name):
    """Send UTF-8 encoded text to the endpoint and return the raw response."""
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/x-text", Body=encoded_text
    )
    return response


def parse_response(query_response):
    """Extract the generated text from the endpoint's JSON response."""
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_text"]
    return generated_text
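
To make the expected response shape concrete without calling a live endpoint, the sketch below feeds a hand-built response to a copy of `parse_response`. The `io.BytesIO` object stands in for the streaming body that `invoke_endpoint` returns, and the sample text is invented.

```python
import io
import json


def parse_response(query_response):
    """Same logic as the notebook cell, repeated so this block runs alone."""
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions["generated_text"]


# Simulated invoke_endpoint result: "Body" is a file-like object holding JSON bytes.
fake_response = {
    "Body": io.BytesIO(json.dumps({"generated_text": "a chatbot service"}).encode("utf-8"))
}
print(parse_response(fake_response))
```

Note that the body is a stream, so it can only be read once per response.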

---
Below, we query the model directly, without any retrieved context, about a topic it may not know.

---

In [None]:
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

text = "What is Amazon Lex?"

query_response = query_endpoint(text.encode("utf-8"), endpoint_name=endpoint_name)
generated_text = parse_response(query_response)
print(
    f"Inference:{newline}"
    f"input text: {text}{newline}"
    f"generated text: {bold}{generated_text}{unbold}{newline}"
)

### 5. Query endpoint with Langchain and Kendra Index

---
Now we will use the `KendraIndexRetriever` class with LangChain to retrieve information from our Kendra index that matches the query.

---

In [None]:
# UPDATE THE FOLLOWING WITH THE OUTPUTS FROM THE CLOUDFORMATION DEPLOYMENT
kendra_index_id="<YOUR_KENDRA_INDEX_ID>"
region="<YOUR_KENDRA_INDEX_DEPLOYMENT_REGION>"

In [None]:
import json

from aws_langchain.kendra_index_retriever import KendraIndexRetriever
from langchain import SagemakerEndpoint
from langchain.chains import RetrievalQA
from langchain.llms.sagemaker_endpoint import ContentHandlerBase
from langchain.prompts import PromptTemplate

class ContentHandler(ContentHandlerBase):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode('utf-8')

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json["generated_texts"][0]

content_handler = ContentHandler()
llm = SagemakerEndpoint(
    endpoint_name=endpoint_name,
    region_name=aws_region,  # region of the SageMaker endpoint, set in section 1
    model_kwargs={"temperature": 1e-10, "max_length": 500},
    content_handler=content_handler,
)

retriever = KendraIndexRetriever(kendraindex=kendra_index_id,
        awsregion=region,
        return_source_documents=True
    )

prompt_template = """
The following is a friendly conversation between a human and an AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it
does not know.
{context}
Instruction: Based on the above documents, provide a detailed answer for, {question} Answer "don't know" if not present in the document. Solution:
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(
    llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=True
)

result = qa("What's Amazon Lex?")
print(f'{bold}Answer{unbold}: {result["result"]}\n\n{bold}Sources:{unbold}')

for doc in result['source_documents']:
    print(f'''\n{doc.metadata["title"]}\n{doc.metadata["source"]}\n{doc.metadata["excerpt"]}\n''')
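
The "stuff" chain type places every retrieved excerpt into the `{context}` slot of the prompt before the content handler serializes the request. The sketch below reproduces those steps with plain string formatting and `json`, independent of LangChain. The shortened template, the excerpts, the blank-line separator, and the response body are all invented for illustration.

```python
import json

template = (
    "{context}\n"
    "Instruction: Based on the above documents, provide a detailed answer for, "
    "{question} Answer \"don't know\" if not present in the document. Solution:"
)

excerpts = [
    "Amazon Lex is a service for building chatbots.",
    "Lex supports voice and text.",
]

# "Stuffing": join all excerpts and substitute them for {context}.
prompt = template.format(context="\n\n".join(excerpts), question="What's Amazon Lex?")

# Request body in the shape built by ContentHandler.transform_input.
model_kwargs = {"temperature": 1e-10, "max_length": 500}
payload = json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

# Response body in the shape read by ContentHandler.transform_output.
raw_output = json.dumps({"generated_texts": ["Amazon Lex is a chatbot service."]}).encode("utf-8")
answer = json.loads(raw_output.decode("utf-8"))["generated_texts"][0]

print(prompt)
print(answer)
```

Because all excerpts go into a single prompt, the combined length has to fit within the model's input limit, which is one reason the retriever returns only a few excerpts per query.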


### Example Queries

The quality of the responses you receive depends on the model you have deployed. Try deploying a different model endpoint and comparing the results you receive for the same queries.

In [None]:
result = qa("What service can I use to build a chatbot?")
print(f'{bold}Answer{unbold}: {result["result"]}')

In [None]:
result = qa("How can I track the health of my Amazon Lex bot?")
print(f'{bold}Answer{unbold}: {result["result"]}')

In [None]:
result = qa("What's the pricing for amazon Kendra?")
print(f'{bold}Answer{unbold}: {result["result"]}')

In [None]:
result = qa("What do I need for a sagemaker endpoint configuration?")
print(f'{bold}Answer{unbold}: {result["result"]}')

In [None]:
result = qa("List the steps to deploy a sagemaker endpoint?")
print(f'{bold}Answer{unbold}: {result["result"]}')

### 6. [OPTIONAL] Installing Streamlit application and running a WebUI for a chatbot

---
This section provides instructions on how to run a Streamlit application within SageMaker Studio and access it through the Jupyter proxy. The commands and instructions below need to be run inside a **SageMaker System Terminal**.

---

1. Launch a new SageMaker System Terminal 
   1. From the SageMaker Studio Home screen select `Open Launcher`
   2. From the Launcher panel under `Utilities and files` select `System terminal`
2. Activate the conda environment
```
conda activate studio
```
3. Install the AWS Langchain utility classes from the repo cloned in the Prerequisites section (make sure you're in the folder that contains the clone)
```
pip install ./amazon-kendra-langchain-extensions
```
4. [Optional] For executing sample chains, install the optional dependencies
```
pip install "./amazon-kendra-langchain-extensions/[samples]"
```
5. Set your environment variables
```
export AWS_REGION="<YOUR_AWS_REGION>"
export KENDRA_INDEX_ID="<YOUR_KENDRA_INDEX_ID>"
export FLAN_XL_ENDPOINT="<YOUR_SAGEMAKER_ENDPOINT_FOR_FLAN_T5_XL>"
```
6. Run the streamlit application
```
cd ./amazon-kendra-langchain-extensions/samples
streamlit run app.py flanxl
```
7. This will output something similar to the following. Take note of the port (in this case 8501)
```
Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.


  You can now view your Streamlit app in your browser.

  Network URL: http://169.255.255.2:8501
  External URL: http://18.213.200.192:8501
```
8. Copy the current URL of your SageMaker Studio session, which should have the form:
```
https://<YOUR_STUDIO_DOMAIN>.studio.<AWS_REGION>.sagemaker.aws/jupyter/default/lab/workspaces/auto-Z/tree/kendra_rag_demo.ipynb
```
9. Delete everything from `lab/` onwards and replace it with `proxy/<PORT>/`
   1. Don't forget the trailing `/`
```
https://<YOUR_STUDIO_DOMAIN>.studio.<AWS_REGION>.sagemaker.aws/jupyter/default/proxy/8501/
```
10. Paste the new address into the browser and you will now be able to access your chatbot UI which uses Langchain and Kendra. Each response will list the sources from Kendra it used for its answers.
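
The URL rewrite in steps 8 and 9 can also be done with a small helper. This is a sketch that assumes the Studio URL contains the `/jupyter/default/` segment shown above; the domain in the example is a placeholder.

```python
def studio_proxy_url(studio_url: str, port: int) -> str:
    """Keep the URL up to '/jupyter/default/' and append 'proxy/<port>/'."""
    marker = "/jupyter/default/"
    base, found, _ = studio_url.partition(marker)
    if not found:
        raise ValueError("URL does not contain '/jupyter/default/'")
    return f"{base}{marker}proxy/{port}/"


# Placeholder Studio domain; use the URL copied from your own browser.
example_url = (
    "https://d-example.studio.us-east-1.sagemaker.aws"
    "/jupyter/default/lab/workspaces/auto-Z/tree/kendra_rag_demo.ipynb"
)
print(studio_proxy_url(example_url, 8501))
```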

### 7. Clean up the endpoint

In [None]:
# Delete the SageMaker model, then the endpoint and its configuration
model_predictor.delete_model()
model_predictor.delete_endpoint()
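
If the `model_predictor` object is no longer available, for example after a kernel restart, the resources can be removed by name with a boto3 SageMaker client. The sketch below assumes the endpoint, its endpoint configuration, and the model all share the endpoint name, which should hold for the deployment in this notebook but is not guaranteed for endpoints created elsewhere. The endpoint name in the usage comment is a placeholder.

```python
def delete_endpoint_resources(sm_client, endpoint_name: str) -> None:
    """Delete the endpoint, its configuration, and the model by name.

    Assumes all three resources were created with the same name.
    """
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
    sm_client.delete_model(ModelName=endpoint_name)


# With AWS credentials available (the endpoint name is a placeholder):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), "<YOUR_ENDPOINT_NAME>")
```

Check the SageMaker console afterwards to confirm that nothing is left running, since a live endpoint continues to incur charges.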