# Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) retrieves data from outside a foundation model and augments your prompt by adding the relevant retrieved data as context. This is especially useful when the data is private and should not be shared on the internet.

![RAG](assets/rag.jpg)

(Image source: SageMaker documentation)

LangChain provides many ways to retrieve data from outside an LLM. Its `Retriever` interface returns documents given an unstructured query.

A retriever is more general than a vector store. It does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well, such as Amazon Kendra.
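To make the distinction concrete, here is a minimal sketch of a retriever that has no vector index at all. The `Document` and `KeywordRetriever` classes are hypothetical stand-ins written for this illustration, not LangChain classes; only the method name `get_relevant_documents` mirrors the call used later in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever that ranks documents by shared words with the query.

    It computes no embeddings and owns no vector index. It only returns
    documents for a query, which is all a retriever has to do.
    """

    def __init__(self, documents, k=2):
        self.documents = documents
        self.k = k

    def get_relevant_documents(self, query):
        query_words = set(query.lower().split())

        def overlap(doc):
            return len(query_words & set(doc.page_content.lower().split()))

        ranked = sorted(self.documents, key=overlap, reverse=True)
        # Keep at most k documents and drop those sharing no word with the query.
        return [doc for doc in ranked[: self.k] if overlap(doc) > 0]


# Made-up example corpus.
corpus = [
    Document("Bedrock is a managed service for foundation models"),
    Document("FAISS is a library for vector similarity search"),
    Document("Kendra is an enterprise search service"),
]
keyword_retriever = KeywordRetriever(corpus, k=2)
hits = keyword_retriever.get_relevant_documents("bedrock foundation models")
```

Here `hits` contains only the first document, because the other two share no word with the query.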

This notebook focuses on the vector store-backed retriever, but you can try the others yourself.

## Vector store-backed retriever

The diagram below shows the basic data flow and its stages. We will go through each component in turn.

![RAG Flow](assets/qa_flow.jpeg)

(Image source: LangChain documentation)

### Document Loader

In this notebook, we will again use questions about Amazon Bedrock as the example.

AWS has published a good blog post about Bedrock, and we will load it with `WebBaseLoader`.

LangChain provides many other loaders (for PDF, CSV, unstructured data, and so on), and you can try them with different sources yourself.

In [1]:
from langchain.document_loaders import WebBaseLoader
web_loader = WebBaseLoader("https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/")
data = web_loader.load()

In [2]:
# For some loaders such as PDF, you can load and split by pages.
len(data)

1

### Text Splitter

LangChain provides several text splitters out of the box. We will use `RecursiveCharacterTextSplitter` here; refer to the LangChain documentation for the other types.

The purpose of a splitter is to split a long document into smaller chunks.
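The idea behind recursive splitting can be sketched in a few lines of plain Python. This is a simplified illustration, not LangChain's implementation: the hypothetical `recursive_split` function below drops the separators, does not merge small pieces back together, and has no overlap support.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into pieces of at most chunk_size characters.

    The first separator is tried first. Any piece that is still too long
    is split again using the remaining separators.
    """
    if len(text) <= chunk_size:
        # Small enough already; skip pieces that are empty or whitespace only.
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    first, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(first):
        chunks.extend(recursive_split(piece, rest, chunk_size))
    return chunks


sample = "First paragraph.\n\nSecond paragraph is a bit longer. It has two sentences."
chunks = recursive_split(sample, ["\n\n", "\n", "."], chunk_size=40)
```

The first paragraph fits within 40 characters and stays whole. The second one is too long, so it is split further at the sentence boundary.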

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# split a long document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", "."],
    chunk_size=512,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False,
)

In [4]:
docs = text_splitter.split_documents(data)

In [5]:
# check how many chunks
len(docs)

81

In [6]:
# Pick an arbitrary chunk. You can open the blog in a browser and find where this chunk appears.
docs[30]

Document(page_content='. Bedrock will offer the ability to access a range of powerful FMs for text and images—including Amazon’s Titan FMs, which consist of two new LLMs we’re also announcing today—through a scalable, reliable, and secure AWS managed service', metadata={'source': 'https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/', 'title': 'Announcing New Tools for Building with Generative AI on AWS | AWS Machine Learning Blog', 'language': 'en-US'})

### Text Embedding Model

If you are using OpenAI, you can simply use `OpenAIEmbeddings`, which uses the `text-embedding-ada-002` model by default.

In this notebook, we will use a small sentence-transformers embedding model with 384 dimensions. You can also try other models, such as BGE.

In [8]:
from langchain.embeddings import HuggingFaceEmbeddings

embeddings_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [9]:
#  list of texts
# embeddings = embeddings_model.embed_documents(["Hello World!", "Who are you?"])

# a single piece of text
embeddings = embeddings_model.embed_query("Hello World!")
embeddings[:10]

[-0.020386869087815285,
 0.025280870497226715,
 -0.0005662219482474029,
 0.01161543931812048,
 -0.037988364696502686,
 -0.11998133361339569,
 0.04170951247215271,
 -0.02085716277360916,
 -0.05900677293539047,
 0.024232564494013786]

In [10]:
len(embeddings)

384
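Embeddings are useful because texts with similar meaning map to vectors that point in similar directions. A minimal sketch of one common comparison, cosine similarity, is shown below. The 3-dimensional vectors are made up for illustration, and the vector store used later may rely on a different metric (for example Euclidean distance).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Tiny made-up "embeddings"; the real ones in this notebook have 384 dimensions.
v_query = [1.0, 0.0, 1.0]
v_close = [2.0, 0.0, 2.0]  # same direction as the query, different length
v_far = [0.0, 1.0, 0.0]    # orthogonal to the query

sim_close = cosine_similarity(v_query, v_close)  # close to 1.0
sim_far = cosine_similarity(v_query, v_far)      # 0.0
```

Cosine similarity ignores vector length, so `v_close` scores (up to rounding) 1.0 even though it is twice as long as the query vector.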

### Vector Store

LangChain supports many vector stores; see https://python.langchain.com/docs/integrations/vectorstores/ for the full list.

In this notebook, we will use FAISS. The Appendix also provides an example of using OpenSearch as the vector store.

In [None]:
# Install FAISS first. Pick ONE of the two packages below.
!pip install faiss-gpu  # for CUDA-capable GPUs (CUDA 7.5+)
# OR
!pip install faiss-cpu  # for CPU-only machines

In [11]:
from langchain.vectorstores import FAISS
db = FAISS.from_documents(docs, embeddings_model)

In [12]:

query = "What is Bedrock?"
query_result = db.similarity_search(query)
# query_result = db.similarity_search_with_score(query)
query_result

[Document(page_content='. Bedrock also makes it easy to access Stability AI’s suite of text-to-image foundation models, including Stable Diffusion (the most popular of its kind), which is capable of generating unique, realistic, high-quality images, art, logos, and designs.', metadata={'source': 'https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/', 'title': 'Announcing New Tools for Building with Generative AI on AWS | AWS Machine Learning Blog', 'language': 'en-US'}),
 Document(page_content='One of the most important capabilities of Bedrock is how easy it is to customize a model. Customers simply point Bedrock at a few labeled examples in Amazon S3, and the service can fine-tune the model for a particular task without having to annotate large volumes of data (as few as 20 examples is enough). Imagine a content marketing manager who works at a leading fashion retailer and needs to develop fresh, targeted ad and campaign copy for a
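Conceptually, a similarity search embeds the query and returns the stored vectors closest to it. The sketch below does this by brute force with Euclidean distance on made-up 2-dimensional vectors. The function name `top_k_by_distance` is hypothetical, and a real FAISS index is far more efficient than this loop.

```python
import math


def top_k_by_distance(query_vec, vectors, k):
    """Return (index, distance) pairs for the k vectors nearest to query_vec."""
    scored = [(i, math.dist(query_vec, vec)) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Made-up stored vectors standing in for chunk embeddings.
stored = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
nearest = top_k_by_distance([1.0, 1.0], stored, k=2)
```

The returned indices would then be mapped back to the chunks they were computed from.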

In [13]:
# Maximum marginal relevance (MMR) search
mmr_query_result = db.max_marginal_relevance_search(query, k=4, fetch_k=10)
mmr_query_result

[Document(page_content='. Bedrock also makes it easy to access Stability AI’s suite of text-to-image foundation models, including Stable Diffusion (the most popular of its kind), which is capable of generating unique, realistic, high-quality images, art, logos, and designs.', metadata={'source': 'https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/', 'title': 'Announcing New Tools for Building with Generative AI on AWS | AWS Machine Learning Blog', 'language': 'en-US'}),
 Document(page_content='One of the most important capabilities of Bedrock is how easy it is to customize a model. Customers simply point Bedrock at a few labeled examples in Amazon S3, and the service can fine-tune the model for a particular task without having to annotate large volumes of data (as few as 20 examples is enough). Imagine a content marketing manager who works at a leading fashion retailer and needs to develop fresh, targeted ad and campaign copy for a
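MMR picks results that are relevant to the query but not redundant with each other: `fetch_k` candidates are fetched first, and `k` of them are kept. The following is a minimal sketch of the selection step on made-up 2-dimensional vectors. The helper names are hypothetical, and the scoring details of the library implementation may differ.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Pick k candidate indices, trading off relevance against redundancy.

    lambda_mult=1.0 means pure relevance; lower values favour diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
candidates = [
    [1.0, 0.1],    # very relevant
    [1.0, 0.11],   # near-duplicate of the first candidate
    [0.7, -0.7],   # less relevant, but different
]
picked = mmr_select(query_vec, candidates, k=2, lambda_mult=0.5)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favour of the more diverse third candidate. With `lambda_mult=1.0` the two most relevant candidates are returned regardless of overlap.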

In [14]:
# You can persist the data for later use.
db.save_local("faiss_index")

In [15]:
# Later, load the index from disk instead of loading and splitting the source again.
db = FAISS.load_local("faiss_index", embeddings_model)

### Retriever

We can simply use the vector store as a retriever.

In [16]:

retriever = db.as_retriever(search_kwargs={"k": 4})
# Use a new name so the `docs` chunk list from the splitter is not overwritten.
relevant_docs = retriever.get_relevant_documents("What FMs are supported by bedrock?")
relevant_docs

[Document(page_content='. Bedrock will offer the ability to access a range of powerful FMs for text and images—including Amazon’s Titan FMs, which consist of two new LLMs we’re also announcing today—through a scalable, reliable, and secure AWS managed service', metadata={'source': 'https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/', 'title': 'Announcing New Tools for Building with Generative AI on AWS | AWS Machine Learning Blog', 'language': 'en-US'}),
 Document(page_content='We took all of that feedback from customers, and today we are excited to announce Amazon Bedrock, a new service that makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API. Bedrock is the easiest way for customers to build and scale generative AI-based applications using FMs, democratizing access for all builders', metadata={'source': 'https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generativ

A retriever can also be configured with other search types, such as MMR or a similarity score threshold.
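As a sketch of such configurations, the hypothetical helper below builds three retrievers from one vector store. It assumes `vector_store` is a LangChain vector store such as the FAISS `db` above and only calls its `as_retriever` method. The values for `k`, `fetch_k`, and `score_threshold` are illustrative, not recommendations.

```python
def build_retrievers(vector_store):
    """Return a few differently configured retrievers from one vector store.

    `vector_store` is assumed to be a LangChain vector store (for example the
    FAISS `db` in this notebook); only its `as_retriever` method is used.
    """
    return {
        # Plain similarity search, returning the top 4 chunks.
        "similarity": vector_store.as_retriever(search_kwargs={"k": 4}),
        # MMR: fetch 10 candidates, then keep 4 that are relevant but diverse.
        "mmr": vector_store.as_retriever(
            search_type="mmr",
            search_kwargs={"k": 4, "fetch_k": 10},
        ),
        # Only return chunks whose relevance score is at least 0.5.
        "threshold": vector_store.as_retriever(
            search_type="similarity_score_threshold",
            search_kwargs={"score_threshold": 0.5},
        ),
    }
```

With the store from this notebook, `build_retrievers(db)["mmr"]` would give a retriever that can be passed to a chain in place of the default one.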

## RetrievalQA Chain

Now that we have defined the vector store-backed retriever, we can use it with an LLM to generate answers. We will again use Llama-2 as the LLM.

In [17]:
import torch
from transformers import pipeline
from langchain.llms import HuggingFacePipeline

pipe = pipeline(
    task="text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    temperature=0.0, 
    max_new_tokens=1024,
)

llm = HuggingFacePipeline(pipeline=pipe)

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.55s/it]


In [18]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type="stuff", verbose=True)


In [19]:
query = "What FMs are supported by bedrock?"

qa_chain.run(query=query)




> Entering new RetrievalQA chain...

> Finished chain.


' Based on the provided context, Bedrock supports FMs from AI21 Labs, Anthropic, Stability AI, and Amazon.'

Check out the other chain types: https://python.langchain.com/docs/modules/chains/document/stuff
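The "stuff" chain type used above simply puts all retrieved chunks into a single prompt. A minimal sketch of that idea follows. The prompt wording and the `build_stuff_prompt` helper are illustrative assumptions rather than LangChain's exact template, and the two chunks are shortened paraphrases of the retrieved text.

```python
def build_stuff_prompt(question, chunks):
    """Join all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Shortened paraphrases of two retrieved chunks, for illustration only.
retrieved_chunks = [
    "Bedrock makes FMs from AI21 Labs, Anthropic, Stability AI, and Amazon accessible via an API.",
    "Bedrock will offer access to Amazon's Titan FMs.",
]
prompt = build_stuff_prompt("What FMs are supported by Bedrock?", retrieved_chunks)
```

Because every chunk lands in one prompt, this approach only works while the combined text fits into the model's context window.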

You can also add memory to the QA chain, so that follow-up questions can refer to earlier turns of the conversation.
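The idea behind memory can be sketched without any library: keep the earlier turns and fold them into the next prompt. LangChain ships its own memory classes and conversational chains. The `ChatMemory` class and `build_prompt_with_history` function below are hypothetical and do not reproduce that API, and the stored answer is a placeholder.

```python
class ChatMemory:
    """Hypothetical helper that stores past (question, answer) turns."""

    def __init__(self):
        self.turns = []

    def add_turn(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def build_prompt_with_history(memory, question):
    """Prefix the new question with the conversation so far, if there is any."""
    history = memory.as_text()
    if not history:
        return f"Question: {question}"
    return f"Chat history:\n{history}\n\nFollow-up question: {question}"


memory = ChatMemory()
# Placeholder answer; in practice this would be the chain's previous output.
memory.add_turn("What is Bedrock?", "A managed AWS service for foundation models.")
followup_prompt = build_prompt_with_history(memory, "Which FMs does it support?")
```

With the history included, the model can resolve "it" in the follow-up question to Bedrock.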

## Appendix - OpenSearch as Vector Store

This section is optional. Read it only if you are interested in using OpenSearch as the vector store.

OpenSearch supports three different methods for obtaining the k-nearest neighbors from an index of vectors: 
- Approximate k-NN
- Script Score k-NN (exact k-NN search)
- Painless extensions 

See https://opensearch.org/docs/latest/search-plugins/knn/index/ for more details.

The Approximate k-NN search methods leveraged by OpenSearch use approximate nearest neighbor (ANN) algorithms from the nmslib, faiss, and Lucene libraries to power k-NN search. This approach is the best choice for searches over large indexes (that is, hundreds of thousands of vectors or more) that require low latency.

In [None]:
from langchain.vectorstores import OpenSearchVectorSearch

For Amazon OpenSearch Service (AOS) 

In [None]:
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
import boto3

# Use "aoss" for OpenSearch Serverless and "es" for an OpenSearch Service domain.
service = "aoss"
region = "us-west-2"

opensearch_url = "https://xxx.us-west-2.aoss.amazonaws.com"  # replace this url with your own
credentials = boto3.Session().get_credentials()
awsauth = AWSV4SignerAuth(credentials, region, service)

docsearch = OpenSearchVectorSearch.from_documents(
    docs,
    embeddings_model,
    opensearch_url=opensearch_url,
    http_auth=awsauth,
    timeout=300,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    index_name="test-vector-index",
    engine="faiss",
    bulk_size=512,
)

In [None]:
query_result = docsearch.similarity_search(
    query,
    k=4,
)
query_result

For OpenSearch Local

In [None]:
docsearch = OpenSearchVectorSearch.from_documents(
    docs,
    embeddings_model,
    opensearch_url="https://localhost:9200",
    http_auth=("admin", "admin"),
    use_ssl=False,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
    bulk_size=512,
    index_name="test-vector-index1",
    engine="faiss",
    space_type="innerproduct",
    ef_construction=256,
    m=48,
)

In [None]:
# similarity_search using Approximate k-NN Search with Custom Parameters
query_result = docsearch.similarity_search(
    query,
    k=4,
)
query_result

To inspect the resulting index, run `GET test-vector-index1` from Dev Tools in OpenSearch Dashboards.