# Preview

Retrieval Augmented Generation (RAG) is a technique that combines the natural language understanding of large language models (LLMs) with more traditional techniques such as keyword search and semantic search. It gives the LLM knowledge about a specific topic, knowledge base, or set of documents that it was not trained on, so we can query our domain-specific documents without training or fine-tuning the model. With the help of frameworks such as LangChain, a simple end-to-end RAG solution can be implemented in very few lines!

This tutorial shows you how to run a basic RAG system using LangChain and OpenAI. It follows this LangChain tutorial: https://python.langchain.com/v0.2/docs/tutorials/rag/

Prerequisites:
- Follow the README instructions in the base directory.
- Generate an OpenAI API key and save it in the OPENAI_API_KEY variable of your .env file.

# Running an end-to-end RAG using contents from the web

### 1. Import necessary dependencies

In [5]:
# Import necessary dependencies
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate

### 2. Initialize your llm instance

Initialize an LLM using an API key generated from OpenAI's web portal. The code cell below will prompt for the key.

In [7]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", max_tokens=256, temperature=0.0)
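
The cell above always prompts for the key, even when OPENAI_API_KEY is already set in the environment. A minimal sketch of prompting only when the key is missing is shown here. `ensure_api_key` is a hypothetical helper, not part of LangChain or the OpenAI SDK, and the demo uses a plain dict and a stand-in prompt function so that nothing blocks on input.

```python
import getpass
import os


def ensure_api_key(environ=os.environ, name="OPENAI_API_KEY", ask=getpass.getpass):
    """Return the key stored under `name`, prompting only when it is missing."""
    if not environ.get(name):
        environ[name] = ask(f"{name}: ")
    return environ[name]


# Demo with a plain dict instead of os.environ and a fake prompt function.
demo_env = {}
first = ensure_api_key(demo_env, ask=lambda prompt: "sk-placeholder")
second = ensure_api_key(demo_env, ask=lambda prompt: "should-not-be-used")
print(first == second)
```

In the notebook, calling `ensure_api_key()` with no arguments would read from `os.environ` and fall back to `getpass.getpass` only if the variable is empty.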

In [4]:
response = llm.invoke("Hi")

In [7]:
response

AIMessage(content='Hello! How can I assist you today?', response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 8, 'total_tokens': 17}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-bd2bf42c-1fcb-4d1b-9f05-e1d36f304a3e-0', usage_metadata={'input_tokens': 8, 'output_tokens': 9, 'total_tokens': 17})

In [10]:
response.response_metadata

{'token_usage': {'completion_tokens': 9,
  'prompt_tokens': 8,
  'total_tokens': 17},
 'model_name': 'gpt-3.5-turbo-0125',
 'system_fingerprint': None,
 'finish_reason': 'stop',
 'logprobs': None}

In [1]:
import sys
sys.path.append("..")

In [6]:
from rag.ingest import PdfIngestor
ingestion_config = {
        "pdf":{
            "doc_path": "docs/TayXueHao-Resume.pdf",
            "splitter": {
                "recursiveCharacterTextSplitter": {
                    "chunk_size": 200,
                    "chunk_overlap": 0
                }
            },
            "embedding":{
                "openAI": {
                    "model_name": "text-embedding-3-small",
                }
            }
        }
}
db_config = {"vectordb": "chroma"}
ingestor = PdfIngestor(ingestion_config, db_config)
vectordb = ingestor()



In [6]:
vectordb.search(search_type="similarity", query="Updates", )

[Document(page_content='Intel Corporation\nGraduate Trainee Focusing on AI Engineering\nDevelop LLM RAG for internal data 1.\nAI Compiler, model optimization. 2.\nUpdated in April’24ExcellentExcellent\nGood', metadata={'page': 1, 'source': 'c:\\Users\\USER\\adam-ai-poc\\llm-base\\rag\\../docs/TayXueHao-Resume.pdf'}),
 Document(page_content='Completed 3 different Natural language Processing gigs:\n1. Fine-Tuning Large Language Models (LLM) with mental health  \nq&a dataset.\n2. Format data from raw text to JSON to prepare for fine-tuning.\n3. Semi-supervised learning with Twitter comment dataset to\npredict users’ MBTI personality with 20:80 labeled and unlabled\ndata.\nCore Skills\nExperienced : \nPython / Java / C++\nMachine Learning / Deep Learning\nData Analytics / Data Science\nScikit Learn\nPytorch / Tensorflow\nPandas / Numpy\nMatplotlib / Tableau\nSQL / AWS S3\nReinforcement Learning \nGithub/Git\nLearning :\nDeployment / CICD pipelines\nDocker / Flask / RestfulAPI\nAWS ECR / EC

In [4]:
from rag.agent import OpenaiAgent
agent = OpenaiAgent(max_tokens=1, debug=True)
agent("Hi")

Query: Hi
Usage metadata: 
Tokens Used: 9
	Prompt Tokens: 8
	Completion Tokens: 1
Successful Requests: 1
Total Cost (USD): $5.5e-06
Response metadata: 
{
 "token_usage": {
  "completion_tokens": 1,
  "prompt_tokens": 8,
  "total_tokens": 9
 },
 "model_name": "gpt-3.5-turbo-0125",
 "system_fingerprint": null,
 "finish_reason": "length",
 "logprobs": null
}


'Hello'
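
The cost printed above can be reproduced from the token counts. The sketch below assumes prices of $0.50 per million prompt tokens and $1.50 per million completion tokens. These values are an assumption chosen because they match the printed total; they are not taken from the notebook and may have changed since. Note also that `finish_reason` is "length" here because `max_tokens=1` cut the reply off after a single token.

```python
# Assumed USD prices per one million tokens (not taken from the notebook).
ASSUMED_PRICE_PER_MILLION = {"prompt": 0.50, "completion": 1.50}


def estimate_cost(token_usage, prices=ASSUMED_PRICE_PER_MILLION):
    """Estimate the request cost in USD from a token_usage dictionary."""
    prompt_cost = token_usage["prompt_tokens"] * prices["prompt"] / 1_000_000
    completion_cost = token_usage["completion_tokens"] * prices["completion"] / 1_000_000
    return prompt_cost + completion_cost


# Token counts copied from the response metadata shown above.
usage = {"completion_tokens": 1, "prompt_tokens": 8, "total_tokens": 9}
cost = estimate_cost(usage)
print(f"Estimated cost (USD): {cost:.2e}")
```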

In [4]:
import os
print(os.environ["LANGCHAIN_API_KEY"])

...


### 3. Load our document of choice

For this demo, we will use LangChain's built-in web scraper to load contents from the web and store them as Document() objects, which LangChain uses for further processing.

In [1]:
# Import necessary dependencies
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [4]:
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://python.langchain.com/v0.2/docs/tutorials/rag/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("main-wrapper mainWrapper_z2l0 docsWrapper_BCFX", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
print(docs)

[Document(page_content='IntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a Simple LLM Application with LCELBuild a Query Analysis SystemBuild a ChatbotConversational RAGBuild an Extraction ChainBuild an AgentTaggingdata_generationBuild a Local RAG ApplicationBuild a PDF ingestion and Question/Answering systemBuild a Retrieval Augmented Generation (RAG) AppVector stores and retrieversBuild a Question/Answering system over SQL dataSummarize TextHow-to guidesHow-to guidesHow to use tools in a chainHow to use a vectorstore as a retrieverHow to add memory to chatbotsHow to use example selectorsHow to map values to a graph databaseHow to add a semantic layer over graph databaseHow to invoke runnables in parallelHow to stream chat model responsesHow to add default invocation args to a RunnableHow to add retrieval to chatbotsHow to use few shot examples in chat modelsHow to do tool/function callingHow to best prompt for Graph-RAGInstallationHow to

- The retrieved docs are about LLM-powered agents.
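
To make the class filter in the loader more concrete, here is a rough standard-library sketch that keeps only the text inside elements carrying one of the wanted CSS classes. It is an approximation for illustration, not how `bs4.SoupStrainer` or `WebBaseLoader` are implemented, and the HTML snippet and class names are made up for the demo.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect text found inside elements that carry a wanted CSS class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0  # greater than 0 while inside a wanted element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth or classes & self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html, wanted_classes):
    parser = ClassTextExtractor(wanted_classes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample_html = """
<html><body>
  <nav class="sidebar">Tutorials How-to guides</nav>
  <h1 class="post-title">LLM Powered Autonomous Agents</h1>
  <div class="post-content"><p>Task decomposition splits a big task.</p></div>
  <footer>Copyright</footer>
</body></html>
"""
text = extract_text(sample_html, ["post-title", "post-content"])
print(text)
```

Filtering this way keeps navigation menus and footers out of the text that later gets chunked and embedded.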

### 4. Ingest the document and store the chunks in a vectordb

We recursively split the document into text chunks, which will then be used to create embeddings to represent the meaning of the chunks as vectors. To create the embeddings, we will need to use an embedding model that has been trained to understand sentences and convert them into embeddings or vectors. For this demo, we use the cheapest one that OpenAI offers which is `text-embedding-3-small`.

For this tutorial, we use Chroma (chromadb) as our vectordb to store the embeddings of our text chunks.

More about:
- vector db: https://www.cloudflare.com/learning/ai/what-is-vector-database/
- chromadb: https://www.trychroma.com/
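
To see what the chunk size and chunk overlap settings control, here is a simplified sketch that slices text into fixed-width windows. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraphs and newlines. `split_text` is a hypothetical helper written for this illustration.

```python
def split_text(text, chunk_size, chunk_overlap=0):
    """Slice text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


no_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With overlap, text near a chunk boundary appears in two neighbouring chunks, which lowers the chance that a sentence is cut in a way that hides it from retrieval, at the price of storing more chunks.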

In [35]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings(model="text-embedding-3-small"))
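
The idea of representing chunks as vectors and comparing them can be illustrated without calling any API. The sketch below uses word counts as a toy stand-in for an embedding. A trained model such as `text-embedding-3-small` produces dense vectors that capture meaning rather than exact word matches, so treat this only as an illustration of the comparison step.

```python
import math
from collections import Counter


def toy_embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


chunk_about_agents = toy_embed("task decomposition breaks a task into steps")
chunk_about_pricing = toy_embed("embedding models are billed per token")
query = toy_embed("what is task decomposition")

score_agents = cosine_similarity(query, chunk_about_agents)
score_pricing = cosine_similarity(query, chunk_about_pricing)
print(score_agents > score_pricing)
```

A vector database stores one such vector per chunk and answers a query by finding the stored vectors closest to the query vector.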

### 5. Initialize our retriever

We will use the vector store's as_retriever function to create a retriever that fetches relevant contexts. Setting k=6 retrieves the 6 chunks most relevant to the query.

In [53]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

# Test our retriever
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")
print(retrieved_docs)
len(retrieved_docs)

[Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}), Document(page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree s

6
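
Conceptually, a similarity retriever with `k` set scores every stored chunk against the query and keeps the best `k`. A minimal sketch of that selection follows, using small hand-written vectors in place of real embeddings. The chunk texts and vector values are invented for the demo, and a real vector store typically uses an index rather than scoring every chunk.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query_vector, indexed_chunks, k):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = ((cosine(query_vector, vector), text) for text, vector in indexed_chunks)
    return heapq.nlargest(k, scored)


# Hypothetical index: (chunk text, embedding vector) pairs.
indexed_chunks = [
    ("chunk about task decomposition", [0.9, 0.1, 0.0]),
    ("chunk about tree of thoughts", [0.7, 0.3, 0.1]),
    ("chunk about pricing", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], indexed_chunks, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A larger `k` gives the model more context to work with but also more tokens to pay for and more room for irrelevant chunks.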

### 6. Create our prompt template

In [54]:
system_prompt = PromptTemplate.from_template("You are a helpful assistant. Please answer based on the contexts given: {context}. Question: {question}")
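
The template above has two placeholders, `{context}` and `{question}`, that are filled in for every query. The effect can be approximated with plain `str.format`, as in this sketch. `fill_prompt` is a hypothetical helper, and the context string is made up for the demo.

```python
TEMPLATE = (
    "You are a helpful assistant. Please answer based on the contexts given: "
    "{context}. Question: {question}"
)


def fill_prompt(context, question, template=TEMPLATE):
    """Substitute the retrieved context and the user question into the template."""
    return template.format(context=context, question=question)


prompt_text = fill_prompt(
    context="Task decomposition breaks a big task into smaller steps",
    question="What is Task Decomposition?",
)
print(prompt_text)
```

The filled-in string is what the model actually receives, so printing it is a quick way to check which context the answer was based on.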

### 7. Chain everything together into one rag chain

We define our RAG pipeline as a chain using the LangChain Expression Language (LCEL). The chain retrieves the contexts relevant to the query, formats them, feeds them into our prompt template, and finally passes the completed prompt to our LLM.

More about LCEL: https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel

In [55]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | system_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

"Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps, allowing an agent to better plan and execute the task effectively. It involves transforming big tasks into multiple manageable tasks, enabling a clearer interpretation of the agent's thinking process."
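
To show how data flows through the `|` chain above, here is a plain-Python sketch with stand-ins for each component. `Step` and `as_step` are hypothetical and much simpler than LangChain's runnables. The point is only that the dict runs each branch on the same input and that every later step receives the previous step's output.

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        following = as_step(other)
        return Step(lambda value: following.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Handles `dict | Step`, where the left operand is not a Step yet.
        return as_step(other) | self


def as_step(obj):
    """Coerce a Step, a dict of steps, or a plain callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        branches = {key: as_step(item) for key, item in obj.items()}
        return Step(lambda value: {k: s.invoke(value) for k, s in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot use {type(obj).__name__} as a step")


def format_docs(docs):
    return "\n\n".join(docs)


# Stand-ins for the real retriever, prompt, model and parser (all hypothetical).
fake_retriever = Step(lambda question: ["chunk one", "chunk two"])
passthrough = Step(lambda value: value)
prompt = Step(lambda d: f"Context: {d['context']} Question: {d['question']}")
fake_llm = Step(lambda text: {"content": f"(model reply to) {text}"})
parser = Step(lambda message: message["content"])

chain = (
    {"context": fake_retriever | format_docs, "question": passthrough}
    | prompt
    | fake_llm
    | parser
)
answer = chain.invoke("What is Task Decomposition?")
print(answer)
```

The question string enters both branches of the dict: one branch turns it into formatted context, the other passes it through unchanged, and the prompt step combines the two.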

### Challenge: Use the RAG chain above with the following modifications:
- Load a PDF document stored in docs/.
- Reduce the chunk size and overlap the document chunks. Observe the difference.
- Instruct the LLM to answer in Chinese.
- Experiment with your own resume and have fun with it!

In [5]:
from langchain_community.document_loaders import PyPDFLoader
# PyPDFLoader needs the path of the PDF to load
PyPDFLoader("../docs/TayXueHao-Resume.pdf")

In [10]:
print(text_splitter)
print()

<langchain_text_splitters.character.RecursiveCharacterTextSplitter object at 0x0000021BB4C300D0>


In [9]:
from langchain_community.document_loaders import PyPDFLoader

# LLM
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", max_tokens=256, temperature=0.0)

# Chunk / Ingest -> vectordb
loader = PyPDFLoader("../docs/TayXueHao-Resume.pdf")
pages = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=0)
splits = text_splitter.split_documents(pages)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings(model="text-embedding-3-small"))

# Retriever
retriever = vectorstore.as_retriever(
    search_type="similarity", 
    search_kwargs={"k": 6}
    )

# Prompt templates
system_prompt = PromptTemplate.from_template("You are a helpful assistant that screens resumes. Please answer based on the contexts given: {context}. Question: {question}")

# Preprocessing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | system_prompt
    | llm
    | StrOutputParser()
)

# Query
rag_chain.invoke("What is xue hao's cgpa?")

"Xue Hao's CGPA is 4.00."

Hurrah! You managed to chain all the pieces of RAG together! Good job and have a great day!