# RAG with Kendra and Falcon 40b Instruct

## Prerequisite

### Kendra Index

---

We will reuse the Kendra index from the previous lab for demonstration purposes. For this notebook to work, add the `AmazonKendraFullAccess` policy to the notebook's IAM role:

1. Navigate to your notebook instance `https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/notebook-instances/` and open the associated IAM Role
![Notebook instance](./assets/lab2/01_notebook_instance.png)

2. Attach the `AmazonKendraFullAccess` policy to the role
![Permissions](./assets/lab2/02_permissions.png)

---

## Lab: Adding RAG capabilities to LangChain

---

We start by setting up the variables for the workshop. Fill in your Kendra index ID (found in the Kendra console), the current AWS region, and the API key you were provided for Falcon API access.

---

In [None]:
CONFIG = {
    "api_url": 'https://falcon.cliffordduke.cloud',
    "api_key": '',
    "kendra_index": '', 
    "region": 'us-west-2'
}
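
Both `api_key` and `kendra_index` are left blank above and need values before the later cells can work. A minimal sketch of an early check, assuming a hypothetical helper named `missing_config_keys` that is not part of the workshop code:

```python
def missing_config_keys(config):
    """Return the names of config entries that are empty or whitespace-only."""
    return [key for key, value in config.items() if not str(value).strip()]


# Example values only; pass your own CONFIG dictionary instead.
example_config = {
    "api_url": "https://falcon.cliffordduke.cloud",
    "api_key": "",
    "kendra_index": "",
    "region": "us-west-2",
}

missing = missing_config_keys(example_config)
if missing:
    print(f"Please fill in: {', '.join(missing)}")
```

Running a check like this right after the configuration cell surfaces a forgotten value immediately rather than as an authentication or index error several cells later.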

---

Let's install and load the required dependencies.

---

In [None]:
!pip install boto3 langchain wikipedia --quiet

In [None]:
import boto3
from langchain import LLMChain
from langchain.llms import AmazonAPIGateway
from langchain.retrievers import AmazonKendraRetriever
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory
from langchain.prompts import PromptTemplate
from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.tools import Tool
from langchain.utilities import WikipediaAPIWrapper


---

We start by defining a LangChain-supported LLM. In this case we use an already deployed `AmazonAPIGateway` endpoint that proxies to a cluster of `Falcon 40b Instruct` models hosted on Amazon SageMaker Hosted Inference.

Next, we configure the `AmazonKendraRetriever`, which allows LangChain to programmatically access our Kendra document index.

If you are interested in deploying your own API cluster, you can use this [CDK Template](https://github.com/cliffordduke/cdk-llm-api-gateway).

---

In [None]:
llm = AmazonAPIGateway(
    api_url=CONFIG['api_url'],
    headers= {
        "X-API-Key": CONFIG['api_key']
    }
)

falcon_kwargs = {
    "max_new_tokens": 300,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": False,
    "return_full_text": False,
    "temperature": 0.2,
}
llm.model_kwargs = falcon_kwargs

retriever = AmazonKendraRetriever(
    index_id=CONFIG['kendra_index'],
    region_name=CONFIG['region']
)


def print_qa(result):
    bold, unbold = "\033[1m", "\033[0m"
    print(f'{bold}Answer{unbold}: {result["result"]}\n\n{bold}Sources:{unbold}')
    for doc in result['source_documents']:
        print(f'''\n{doc.metadata["title"]}\n{doc.metadata["source"]}\n{doc.metadata["excerpt"]}\n''')


---

Next, let's try some basic prompting capabilities. We start by creating a prompt template that wraps the user input, using the `PromptTemplate` class. We then load both the prompt template and the Kendra retriever into a `RetrievalQA` chain.

For advanced users, feel free to try out different prompt templates and see how that can affect the overall generated completions!

---

In [None]:
prompt_template = """
The following is a friendly conversation between a human and an AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it
does not know.
{context}
Instruction: Based on the above documents, provide a detailed answer for, {question} Answer "don't know" if not present in the document. Solution:
"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(
    llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=True
)

---

Now that we have the chain set up, let's try some prompts! The Kendra index was loaded with AWS whitepapers, so let's ask it some AWS-related questions!

---

In [None]:
print_qa(qa("What is SageMaker?"))

In [None]:
print_qa(qa("When should I choose single-region vs multi-region architecture"))

---

Awesome! As you can see, the chain passes the user query to Kendra, which returns the documents it considers most relevant to the question. We then use the large language model to extract information from those documents and provide a more concise answer!
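
To make the `chain_type="stuff"` step more concrete, here is a simplified sketch of how retrieved excerpts can be joined and placed into the `{context}` slot of a template before the prompt is sent to the LLM. It is an illustration only, not LangChain's actual implementation; the function name `stuff_prompt` and the sample excerpts are hypothetical.

```python
# Simplified illustration of the "stuff" strategy; not LangChain's real code.
TEMPLATE = (
    "{context}\n"
    "Instruction: Based on the above documents, provide a detailed answer for, "
    "{question} Answer \"don't know\" if not present in the document. Solution:"
)


def stuff_prompt(excerpts, question, template=TEMPLATE):
    """Join every retrieved excerpt into one context string and fill the template."""
    context = "\n\n".join(excerpts)
    return template.format(context=context, question=question)


# Hypothetical excerpts standing in for Kendra results.
excerpts = [
    "Excerpt one from a retrieved document.",
    "Excerpt two from another retrieved document.",
]
prompt = stuff_prompt(excerpts, "What is SageMaker?")
print(prompt)
```

Because every excerpt is placed into a single prompt, the number and length of retrieved documents directly determine how much of the model's context window is used.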

---

### Adding Memory

---

It's great that we already have a working chain that can respond to queries, but each query is treated as a standalone question. What happens if a user asks a follow-up question? For that, we need to give the chain some short-term memory!

In this example, let's use DynamoDB as a memory store to hold the chat history. We begin by creating a DynamoDB table to store this information.

---

In [None]:
dynamodb = boto3.resource('dynamodb', region_name=CONFIG['region'])

try:
    table = dynamodb.create_table(
        TableName="SessionTable",
        KeySchema=[{"AttributeName": "SessionId", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "SessionId", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    table.meta.client.get_waiter("table_exists").wait(TableName="SessionTable")
except dynamodb.meta.client.exceptions.ResourceInUseException:
    # The table was created in an earlier run, so we can reuse it.
    print("DynamoDB table already exists")

---

Once the table is ready, we use the `DynamoDBChatMessageHistory` class as an external data store for the `ConversationBufferMemory`.

We can then create a `ConversationalRetrievalChain` using our previously created LLM and retriever.

---

In [None]:
qaHistory = DynamoDBChatMessageHistory(table_name="SessionTable", session_id="2")

qaMemory = ConversationBufferMemory(
    memory_key="chat_history", input_key='question', output_key='answer', chat_memory=qaHistory, return_messages=True
)

qa = ConversationalRetrievalChain.from_llm(
    llm, 
    retriever, 
    memory=qaMemory,
    return_source_documents=True,
    verbose=False
)

def question(query):
    # `query` avoids shadowing Python's built-in input() function
    bold, unbold = "\033[1m", "\033[0m"
    result = qa({"question": query})
    print(f'{bold}Answer{unbold}: {result["answer"]}\n\n{bold}Sources:{unbold}')
    for doc in result['source_documents']:
        print(f'''\n{doc.metadata["title"]}\n{doc.metadata["source"]}\n{doc.metadata["excerpt"]}\n''')


---

Now let's ask it a question, as before.


---

In [None]:
question("what is SageMaker")

---

But this time, we try a follow-up question that is missing some context on its own.

---

In [None]:
question("What capabilities does it have?")

---

We see that the chain can take the previous conversation and apply it as context to the follow-up question!
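
One way to picture what happens with the follow-up: a conversational retrieval chain typically asks the LLM to rewrite the follow-up into a standalone question using the stored chat history, and then retrieves documents for that rewritten question. The sketch below only builds such a rewriting prompt. The template wording is an approximation and may differ from what LangChain ships, and `build_condense_prompt` and the sample answer text are hypothetical.

```python
# Approximation of a question-condensing prompt; exact library wording may differ.
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)


def build_condense_prompt(history, follow_up):
    """Format (human, ai) message pairs plus a follow-up into one prompt string."""
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return CONDENSE_TEMPLATE.format(
        chat_history="\n".join(lines), question=follow_up
    )


# Placeholder history; in the notebook this comes from the DynamoDB-backed memory.
history = [("what is SageMaker", "SageMaker is a managed machine learning service.")]
condense_prompt = build_condense_prompt(history, "What capabilities does it have?")
print(condense_prompt)
```

The LLM's completion of this prompt would be something like a question that names SageMaker explicitly, which is what lets the retriever find relevant documents for an otherwise ambiguous "it".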

---

### External Tools

---

We can apply this approach not just to your own data but also to external tools, for example using Wikipedia to pull in information. Remember, though, that each LLM has a context window limit that restricts how much information you can pass into the prompt!
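
As a rough illustration of working within that limit, the sketch below keeps documents in order until a character budget is used up and truncates the last one that fits only partially. The helper `fit_documents` is hypothetical, and a character count is only a crude stand-in for the token count that the model's limit is actually measured in.

```python
def fit_documents(docs, budget_chars):
    """Keep documents in order until the character budget is exhausted.

    The last document that does not fit entirely is truncated to the
    remaining budget; anything after that is dropped.
    """
    kept, remaining = [], budget_chars
    for doc in docs:
        if remaining <= 0:
            break
        piece = doc[:remaining]
        kept.append(piece)
        remaining -= len(piece)
    return kept


# Three dummy documents of 300 characters each, with a 500-character budget.
docs = ["a" * 300, "b" * 300, "c" * 300]
fitted = fit_documents(docs, 500)
print([len(d) for d in fitted])  # prints [300, 200]
```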

---

In [None]:
wikipedia = WikipediaAPIWrapper(
    top_k_results=1,
    doc_content_chars_max=500
)

tools = [
    Tool(
        name="Wiki",
        description="useful for finding general information, use this",
        func=wikipedia.run,
    )
]

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [None]:
agent.run("When did AWS first launch?")
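
With `verbose=True`, the agent prints the intermediate text the LLM produces. A ReAct-style agent prompts the model to write lines such as `Action:` and `Action Input:` when it wants to call a tool, and `Final Answer:` when it is done. The sketch below shows how such text could be parsed; the regular expression is a simplification, the function `parse_step` is hypothetical, and LangChain's own output parser is more permissive.

```python
import re

# Simplified pattern; the library's real parser accepts more variations.
ACTION_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL
)


def parse_step(llm_text):
    """Return ("final", answer) or ("action", tool_name, tool_input)."""
    if "Final Answer:" in llm_text:
        return ("final", llm_text.split("Final Answer:")[-1].strip())
    match = ACTION_RE.search(llm_text)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {llm_text!r}")
    return ("action", match.group(1).strip(), match.group(2).strip())


# Example model output requesting the "Wiki" tool defined above.
step = parse_step(
    "Thought: I should look this up.\nAction: Wiki\nAction Input: Amazon Web Services"
)
print(step)
```

If the model's output matches neither form, the sketch raises an error, which mirrors a common failure mode when a model does not follow the expected agent format closely.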