# Building a RAG application using Rafay Environment Manager and Jupyter Notebooks

This document describes how to implement a Retrieval-Augmented Generation (RAG) pipeline in a Jupyter notebook. It uses LangChain to build a pipeline that answers questions about the *Attention Is All You Need* paper from arXiv. Document embeddings, computed with an OpenAI embedding model, are stored in a Qdrant vector database, and an AWS Bedrock large language model (LLM) generates the answers.

The infrastructure is provisioned using [Rafay Environment Manager](https://docs.rafay.co/env_manager/overview/). Rafay Environment Manager simplifies setting up and managing the cloud-based environments needed to deploy AI applications. It gives Development and DevOps teams a streamlined way to quickly create environments tailored to specific needs, while ensuring that Ops, SRE, and Platform teams have the controls they need to enforce security, cost efficiency, governance, and standardization. For more information, refer to the [documentation](https://docs.rafay.co/env_manager/overview/).

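Install the required packages. The `langchain.*` integration imports used below are provided by the separate `langchain-community` package in recent LangChain releases, so it is installed as well.

```python
!pip install boto3 langchain langchain-community openai PyPDF2 pypdf qdrant-client tiktoken
```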

```python
import os

import boto3
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms.bedrock import Bedrock
from langchain.vectorstores import Qdrant
from qdrant_client import QdrantClient
```

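Read the configuration from environment variables. Rafay Environment Manager populates these variables for you; their values can be configured in the Environment Manager application. `QDRANT_URL` is the address of the Qdrant server, and `OPEN_AI_SECRET_PREFIX` is the tag value used to look up the OpenAI API key in AWS Secrets Manager.

```python
qdrant_url = os.environ.get('QDRANT_URL')
open_ai_secret_prefix = os.environ.get('OPEN_AI_SECRET_PREFIX')
```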

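`os.environ.get` returns `None` when a variable is missing, so a misconfigured environment only shows up later as a harder-to-diagnose failure in the Qdrant or Secrets Manager calls. The following is a minimal fail-fast sketch; `require_env` is a hypothetical helper written for this illustration, not part of Rafay Environment Manager or any library.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, raising if it is unset or empty.

    Hypothetical helper: a fail-fast alternative to os.environ.get().
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check the environment configuration."
        )
    return value


# Usage with the variable names read in this notebook:
# qdrant_url = require_env('QDRANT_URL')
# open_ai_secret_prefix = require_env('OPEN_AI_SECRET_PREFIX')
```
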
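Helper functions for fetching secrets from AWS Secrets Manager. `locate_secret_arn` lists the secrets whose `Name` tag matches the given value and returns the ARN of the first match; `get_secret` then reads that secret's value (here, the OpenAI API key).

```python
import boto3


def get_secret(secret_prefix):
    """Return the SecretString of the secret whose Name tag matches secret_prefix."""
    client = boto3.client('secretsmanager')
    secret_arn = locate_secret_arn(secret_prefix, client)
    secret_value = client.get_secret_value(SecretId=secret_arn)
    return secret_value['SecretString']


def locate_secret_arn(secret_tag_value, client):
    """Return the ARN of the first secret tagged Name=<secret_tag_value>."""
    response = client.list_secrets(
        Filters=[
            {'Key': 'tag-key', 'Values': ['Name']},
            {'Key': 'tag-value', 'Values': [secret_tag_value]},
        ]
    )
    secrets = response['SecretList']
    if not secrets:
        # Fail with a clear message instead of an IndexError.
        raise ValueError(f"No secret found with a Name tag matching '{secret_tag_value}'")
    return secrets[0]['ARN']
```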

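Each `get_secret` call makes two Secrets Manager requests (`list_secrets` and `get_secret_value`), and the notebook fetches the same key in both the ingestion and retrieval cells. A minimal sketch of caching the result for the session with `functools.lru_cache` follows; `make_cached_secret_getter` is a hypothetical name. The trade-off is that a rotated secret is not picked up until the kernel restarts.

```python
from functools import lru_cache


def make_cached_secret_getter(fetch_secret):
    """Wrap a secret-fetching callable so each prefix is fetched only once per session.

    `fetch_secret` is any callable that takes a secret prefix and returns the
    secret string, for example the get_secret helper defined in this notebook.
    """
    @lru_cache(maxsize=None)
    def cached(secret_prefix):
        return fetch_secret(secret_prefix)

    return cached


# Usage, assuming get_secret is already defined:
# get_secret_cached = make_cached_secret_getter(get_secret)
# open_ai_key = get_secret_cached(open_ai_secret_prefix)
```
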
## Ingestion

Load the PDF document, extract its text, and split it into small chunks. The OpenAI embedding model computes an embedding for each chunk, and the embeddings are stored in the Qdrant vector database. The example below ingests the *Attention Is All You Need* paper from arXiv: https://arxiv.org/pdf/1706.03762.pdf

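Called without arguments, `load_and_split()` uses LangChain's `RecursiveCharacterTextSplitter`, which tries to break text on paragraph, line, and word boundaries. The sketch below shows only the simpler core idea, fixed-size character windows with overlap, so it is an illustration of chunking and not LangChain's algorithm; `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Simplified illustration: unlike RecursiveCharacterTextSplitter, it ignores
    paragraph and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

The overlap keeps text that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.
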
```python
# Load the PDF directly from arXiv.
loader = PyPDFLoader("https://arxiv.org/pdf/1706.03762.pdf")

# Extract the text and split it into small chunks (one Document per chunk).
docs = loader.load_and_split()

# Fetch the OpenAI API key and create the embedding client.
open_ai_key = get_secret(open_ai_secret_prefix)
embeddings = OpenAIEmbeddings(openai_api_key=open_ai_key)

# Embed every chunk and store the vectors in the 'rag' Qdrant collection.
qdrant = Qdrant.from_documents(docs, embeddings, url=qdrant_url, collection_name='rag', prefer_grpc=True)
print("Stored documents successfully to Qdrant collection " + qdrant.collection_name)
```

## Retrieval

The retrieval step embeds the question with the same OpenAI embedding model used during ingestion and asks Qdrant for the `retrieve_top_k` most similar chunks. LangChain's question-answering chain (`load_qa_chain` with `chain_type="stuff"`) then places those chunks in a single prompt together with the question and sends it to an AWS Bedrock LLM (AI21 Jurassic-2 Ultra, `ai21.j2-ultra-v1`), which generates the answer. The cell prints the question, the answer, and the source chunks (page number, source URL, and text) that the answer was based on.

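Conceptually, `similarity_search(rag_query, k=retrieve_top_k)` embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below assumes cosine similarity, the default distance LangChain's `Qdrant` wrapper uses when it creates a collection, and scans every vector exhaustively; Qdrant itself uses an approximate index. `cosine_similarity`, `top_k`, and the toy vectors are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k):
    """Return indices of the k chunk vectors most similar to query_vector, best first."""
    scored = [(cosine_similarity(query_vector, v), i) for i, v in enumerate(chunk_vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings"; real OpenAI embeddings have far more dimensions.
chunk_vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([0.9, 0.1], chunk_vectors, k=2))
# → [0, 1]
```
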
```python
# Retrieval and generation settings
retrieve_top_k = 3   # number of chunks to retrieve from Qdrant
temperature = 0.5    # LLM sampling temperature
max_tokens = 200     # maximum length of the generated answer, in tokens
top_p = 0.5          # LLM nucleus-sampling threshold (a generation setting, not a retrieval one)

rag_query = "What are transformers?"

# Create the embedding client; Qdrant uses it to embed rag_query with the
# same OpenAI model that was used at ingestion time.
open_ai_key = get_secret(open_ai_secret_prefix)
embeddings = OpenAIEmbeddings(openai_api_key=open_ai_key)

# Load the Bedrock LLM (AI21 Jurassic-2 Ultra).
bedrock_client = boto3.client("bedrock-runtime")
bedrock_llm = Bedrock(
    model_id="ai21.j2-ultra-v1",
    client=bedrock_client,
    model_kwargs={'temperature': temperature, 'maxTokens': max_tokens, 'topP': top_p}
)

# Retrieve the top k chunks most similar to rag_query.
client = QdrantClient(url=qdrant_url, prefer_grpc=True)
qdrant = Qdrant(client=client, collection_name='rag', embeddings=embeddings)
search_results = qdrant.similarity_search(rag_query, k=retrieve_top_k)

# "Stuff" the retrieved chunks into one prompt and generate the answer.
chain = load_qa_chain(bedrock_llm, chain_type="stuff")
results = chain({"input_documents": search_results, "question": rag_query}, return_only_outputs=False)

print(f"Question:\n {results['question']} \n")
print(f"Answer:\n {results['output_text']} \n")
print("Sources:\n")
print("------------------------------------\n")
for doc in results["input_documents"]:
    print(f"Page No: {doc.metadata['page']} \n")  # 0-based page index set by PyPDFLoader
    print(f"Source: {doc.metadata['source']} \n")
    print(f"Page Content: \n {doc.page_content} \n")
    print("------------------------------------\n")
```
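
The `stuff` chain type gets its name from stuffing every retrieved chunk into a single prompt. A simplified sketch of that prompt assembly follows; `build_stuff_prompt` and its template wording are hypothetical and are not LangChain's actual default prompt, and the sample chunk strings are made up.

```python
def build_stuff_prompt(chunks, question):
    """Concatenate all retrieved chunk texts into one prompt (the 'stuff' strategy).

    The template wording is a hypothetical illustration, not LangChain's default prompt.
    """
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    ["The Transformer relies on attention mechanisms.", "It does not use recurrence."],
    "What are transformers?",
)
print(prompt)
```

Because all `retrieve_top_k` chunks must fit in the model's context window alongside the question and the `max_tokens` of answer, `stuff` works best with a small `k`; `load_qa_chain` also offers chain types such as `map_reduce` and `refine`, which spread the chunks across several LLM calls instead.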