# Retrieval Augmented Generation with Amazon Bedrock and Couchbase

Retrieval Augmented Generation (RAG) is a machine learning approach that combines a retrieval system with a large language model (LLM), so that the generated output is grounded in contextual information drawn from private data and external knowledge sources.

#### Prepare Embeddings
![Embeddings](./RAG_pipeline_Couchbase_Bedrock.png)

Before questions can be answered, the documents must be processed and stored in a document store index:
- Create a numerical vector representation of each document using the Amazon Bedrock Titan Embeddings model
- Create a vector index using the corresponding embeddings
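
The two preparation steps can be sketched without any cloud service. The example below is only an illustration: `toy_embed` is a hypothetical hashed bag-of-words function standing in for the Titan Embeddings model, and a plain dictionary stands in for the Couchbase vector index. The hotel texts are made up.

```python
import hashlib
import math


def toy_embed(text: str, dims: int = 16) -> list[float]:
    """Toy stand-in for an embeddings model: hashed bag-of-words, L2-normalised."""
    vector = [0.0] * dims
    for word in text.lower().split():
        # A stable hash keeps the result reproducible across runs.
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def build_index(documents: dict[str, str]) -> dict[str, list[float]]:
    """Map each document id to its embedding; the dict plays the role of the index."""
    return {doc_id: toy_embed(text) for doc_id, text in documents.items()}


docs = {
    "hotel_1": "Quiet hotel near the Golden Gate bridge",
    "hotel_2": "Budget hostel close to the train station",
}
index = build_index(docs)
print({doc_id: len(vector) for doc_id, vector in index.items()})
```

A real embeddings model produces vectors that capture meaning rather than word hashes, but the shape of the pipeline (text in, fixed-length vector out, vectors stored by document id) is the same.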

#### Ask question
![Question](./RAG_runtime_Couchbase_Bedrock.png)

Once the vector index is prepared, you can ask questions, and relevant documents are fetched based on each question. The following steps are executed:
- Create an embedding of the input question
- Compare the question embedding with the embeddings in the index
- Fetch the (top N) relevant document chunks
- Add those chunks as part of the context in the prompt
- Send the prompt to the model under Amazon Bedrock
- Get the contextual answer based on the documents retrieved
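
The first three steps (embed the question, compare it with the indexed embeddings, fetch the top N chunks) can be sketched as follows. This is a toy illustration, not what Couchbase does internally: `toy_embed` is a hypothetical word-count embedding, the comparison is plain cosine similarity, and the chunk texts are made up.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embeddings model: a sparse word-count vector."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(question: str, index: dict[str, Counter], n: int = 2) -> list[str]:
    """Return the ids of the n chunks whose embeddings are closest to the question."""
    question_vector = toy_embed(question)
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(question_vector, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:n]]


chunks = {
    "a": "hotel near the golden gate bridge with great reviews",
    "b": "hostel next to the central train station",
    "c": "museum of modern art downtown",
}
index = {chunk_id: toy_embed(text) for chunk_id, text in chunks.items()}
print(top_n("which hotel is near the golden gate?", index))
```

The remaining steps take the text of the returned chunks, place it in the prompt as context, and send that prompt to the model.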

## Use case
#### Dataset
In this example, you will use the Couchbase travel sample dataset.

## Implementation
To follow the RAG approach, this notebook uses the LangChain framework, which integrates with different services and tools and makes it efficient to build patterns such as RAG. We will use the following tools:

- **LLM (Large Language Model)**: Anthropic Claude available through Amazon Bedrock

- **Embeddings Model**: Amazon Titan Embeddings available through Amazon Bedrock

  This model will be used to generate a numerical representation of the hotel description and reviews.

- **Vector Store**: Couchbase available through LangChain

- **Index**: VectorIndex

  The index helps to compare the input embedding with the document embeddings to find relevant documents.

### Python 3.10

⚠️⚠️⚠️ For this lab we need to run the notebook based on a Python 3.10 runtime. ⚠️⚠️⚠️

If you carry out the workshop from your local environment, outside of Amazon SageMaker Studio, please make sure you are running Python 3.10 or later.
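
A quick way to confirm the runtime before going further is a version check like the sketch below. It assumes that Python 3.10 is the intended minimum; `meets_minimum` and `MINIMUM_PYTHON` are illustrative names, not part of the workshop.

```python
import sys

MINIMUM_PYTHON = (3, 10)  # assumed minimum for this lab


def meets_minimum(version=None, minimum=MINIMUM_PYTHON) -> bool:
    """Return True when `version` (default: the running interpreter) is at least `minimum`."""
    current = tuple(version) if version is not None else tuple(sys.version_info)
    # Compare only (major, minor); patch releases do not matter here.
    return current[:2] >= minimum


if not meets_minimum():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "this notebook expects 3.10 or later."
    )
```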

### Note
It is possible to choose other models available through Bedrock. To change the model, replace the `model_id` as follows:

`llm = BedrockLLM(model_id="...")`

# Complete the prerequisite steps from the blog post

In [None]:
%pip install langchain==0.2.1 langchain-couchbase==0.1.1 langchain-aws==0.1.6 --quiet

Define a helper print function.

In [None]:
from io import StringIO
import sys
import textwrap


def print_ww(*args, width: int = 100, **kwargs):
    """Like print(), but wraps output to `width` characters (default 100)"""
    buffer = StringIO()
    try:
        _stdout = sys.stdout
        sys.stdout = buffer
        print(*args, **kwargs)
        output = buffer.getvalue()
    finally:
        sys.stdout = _stdout
    for line in output.splitlines():
        print("\n".join(textwrap.wrap(line, width=width)))

Create boto3 Bedrock client.

In [None]:
import boto3
import json
import os
import sys

session = boto3.session.Session()
bedrock_client = session.client('bedrock')

Create a boto3 Bedrock runtime client for the embeddings model and the LLM.

In [None]:
bedrock_runtime_client = session.client('bedrock-runtime', region_name="us-east-1")
# We will be using the Titan Embeddings Model to generate our embeddings.
# Both Bedrock classes come from the langchain-aws package installed above.
from langchain_aws import BedrockEmbeddings, BedrockLLM
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# - create the Anthropic Model
llm = BedrockLLM(model_id="anthropic.claude-v2", 
              client=bedrock_runtime_client, 
              model_kwargs={
                  'max_tokens_to_sample': 200
              }, 
              callbacks=[StreamingStdOutCallbackHandler()])

# - create the Titan Embeddings Model
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02",
                                       client=bedrock_runtime_client)

<div class="alert alert-block alert-warning">
    <div>
        <p><b>Note</b></p>
        <p> Please add Couchbase Capella connection details to continue. </p>
    </div>
</div>

In [None]:
from langchain.chains.question_answering import load_qa_chain
from langchain_couchbase import CouchbaseVectorStore
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

COUCHBASE_CONNECTION_STRING = (
    "<Enter Couchbase endpoint URL>"
)
DB_USERNAME = "<Enter Couchbase Database Username>"
DB_PASSWORD = "<Enter Couchbase Database Password>"
DB_BUCKET = "travel-sample"
DB_SCOPE = "inventory"
DB_COLLECTION = "hotel"
SEARCH_INDEX_NAME="hotel-vector-search"

from datetime import timedelta
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))
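
If you prefer not to paste credentials into the notebook, the connection details can be read from environment variables instead. The sketch below is one possible approach; the variable names (`CB_CONNECTION_STRING`, `CB_USERNAME`, `CB_PASSWORD`) and the `CouchbaseSettings` class are hypothetical and not defined by Couchbase or the workshop.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CouchbaseSettings:
    connection_string: str
    username: str
    password: str


def load_settings(env: Mapping[str, str] = os.environ) -> CouchbaseSettings:
    """Read connection details from `env`, failing early if any are missing."""
    # Hypothetical variable names; pick whatever suits your environment.
    names = ("CB_CONNECTION_STRING", "CB_USERNAME", "CB_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return CouchbaseSettings(*(env[name] for name in names))


# Demonstration with an explicit mapping instead of the real environment.
example = load_settings(
    {
        "CB_CONNECTION_STRING": "couchbases://example.invalid",
        "CB_USERNAME": "demo-user",
        "CB_PASSWORD": "demo-password",
    }
)
print(example.connection_string, example.username)
```

The resulting values can then be assigned to `COUCHBASE_CONNECTION_STRING`, `DB_USERNAME` and `DB_PASSWORD` before the cluster connection is created.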


<div class="alert alert-block alert-warning">
    <div>
        <p><b>Note</b></p>
        <p> The next block generates vector embeddings for the travel-sample data and updates the Couchbase Capella cluster with them. It might take some time to run. Please wait.</p>
    </div>
</div>

In [None]:
import couchbase.subdocument as SD

bucket = cluster.bucket(DB_BUCKET)
collection = bucket.scope(DB_SCOPE).collection(DB_COLLECTION)

# Fetch the fields used to build the text that is embedded for each hotel.
result = cluster.query("SELECT META(h).id AS meta_id, h.city as city, h.name as name, h.description as description, ARRAY_COUNT(h.reviews) AS review_count FROM `travel-sample`.inventory.hotel h")
for row in result:
    if row['review_count'] > 0:
        # Concatenate city, name, description and all review texts into one string.
        data_string = ". city: " + str(row['city'])
        data_string = data_string + ". name: " + row['name']
        data_string = data_string + ". description: " + row['description']
        data_string = data_string + ". review: "
        try:
            for i in range(row['review_count']):
                row_result = collection.lookup_in(row['meta_id'], [SD.get("reviews[" + str(i) + "].content")])
                data_string = data_string + row_result.content_as[str](0)
            # Store the text and its embedding back on the hotel document.
            collection.mutate_in(row['meta_id'], [SD.upsert("data_string", data_string)])
            collection.mutate_in(row['meta_id'], [SD.upsert("vector_embedding", bedrock_embeddings.embed_query(data_string))])
        except Exception as e:
            # Skip hotels that fail, but report them instead of failing silently.
            print(f"Skipping {row['meta_id']}: {e}")
    else:
        collection.mutate_in(row['meta_id'], [SD.upsert("data_string",'')])
        collection.mutate_in(row['meta_id'], [SD.upsert("vector_embedding",'')])
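
The string that gets embedded for each hotel can also be built by a small pure function, which is easier to test than the loop above. The sketch below assumes the whole hotel document is already available as a dictionary with a `reviews` list of objects that each carry a `content` field; `build_data_string` and the sample hotel are illustrative and not part of the notebook.

```python
def build_data_string(hotel: dict) -> str:
    """Concatenate city, name, description and review texts in the notebook's format."""
    parts = [
        ". city: " + str(hotel.get("city")),
        ". name: " + str(hotel.get("name")),
        ". description: " + str(hotel.get("description")),
        ". review: ",
    ]
    # Reviews are appended back to back, mirroring the loop in the notebook.
    reviews = hotel.get("reviews") or []
    return "".join(parts) + "".join(str(review.get("content", "")) for review in reviews)


sample_hotel = {
    "city": "San Francisco",
    "name": "Bay View",  # made-up hotel
    "description": "Near the bridge",
    "reviews": [{"content": "Great stay."}, {"content": "Noisy rooms."}],
}
print(build_data_string(sample_hotel))
```

Building the string from one fetched document would avoid a separate sub-document lookup per review, at the cost of transferring the full document.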
    
    

Define Couchbase as the vector store.

In [None]:
vectorstore_cb = CouchbaseVectorStore(
    embedding=bedrock_embeddings,
    cluster=cluster,
    bucket_name=DB_BUCKET,
    scope_name=DB_SCOPE,
    collection_name=DB_COLLECTION,
    index_name=SEARCH_INDEX_NAME,
    text_key="data_string",
    embedding_key="vector_embedding",
    
)

In [None]:
query = "which hotel has best reviews in the city with golden gate?"

results = vectorstore_cb.similarity_search(query, k=5)
print_ww(results)

In [None]:
results = vectorstore_cb.similarity_search(
    query,
    search_options={"query": {"field": "metadata.city", "match": "San Francisco"}},
    k=5
)
print_ww(results)

In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """

Human: Use the following pieces of context to provide an answer to the question at the end with reason. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Assistant:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore_cb.as_retriever(
        search_type="similarity", search_kwargs={"k": 10}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT},
    callbacks=[StreamingStdOutCallbackHandler()]
)

In [None]:
query = "which hotel has negative reviews in San Francisco?"
result = qa.invoke({"query": query})

print(f'Query: {result["query"]}\n')
print_ww(f'Result: {result["result"]}\n')
print('\nContext Documents: ')
for srcdoc in result["source_documents"]:
    print_ww(f'{srcdoc.metadata}\n')

## Conclusion
Congratulations on completing this module on retrieval augmented generation! This is an important technique that combines the power of large language models with the precision of retrieval methods. By augmenting generation with relevant retrieved examples, the responses we received become more coherent, consistent and grounded. I'm sure the knowledge you've gained will be very useful for building creative and engaging language generation systems. Well done!

In the above implementation of RAG-based question answering, we have explored the following concepts and how to implement them using Amazon Bedrock and its LangChain integration.

- Retrieving documents related to the question
- Preparing a prompt which goes as input to the LLM
- Presenting an answer in a human-friendly manner

# Thank You