Breaking down the RAG process step by step

1. Document loader
2. Splitter
3. Embeddings creation
4. Vector stores
5. Retrieval
6. Generation

Loading environment variables

In [5]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [7]:
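# Enable LangSmith tracing and name the project that collects the runs.
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_PROJECT'] = 'test-rag'
# If the keys are defined in .env, load_dotenv() has already placed them in
# os.environ, so these re-assignments are redundant when the keys are set.
os.environ['LANGCHAIN_API_KEY'] = os.getenv("LANGCHAIN_API_KEY")
os.environ['GROQ_API_KEY'] = os.getenv("GROQ_API_KEY")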
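
`os.getenv` returns `None` for a key that is not set, and `os.environ` only accepts strings, so a missing key surfaces as a `TypeError` on assignment. A minimal sketch of a guard that fails with a readable message instead; `require_env` and the `DEMO_*` variable names are hypothetical and not part of `python-dotenv` or LangChain:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Demo with throwaway variables so the sketch runs without real credentials.
os.environ["DEMO_API_KEY"] = "dummy-value"
demo_key = require_env("DEMO_API_KEY")

os.environ.pop("DEMO_MISSING_KEY", None)  # make sure this one is absent
try:
    require_env("DEMO_MISSING_KEY")
    missing_message = ""
except RuntimeError as err:
    missing_message = str(err)

print(demo_key)
print(missing_message)
```

In the notebook, the same check could be applied to `LANGCHAIN_API_KEY` and `GROQ_API_KEY` right after `load_dotenv`.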

1. Document Loader

Link: https://python.langchain.com/v0.2/docs/integrations/document_loaders/

In [11]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2024-07-07-hallucination/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

In [12]:
blog_docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-07-07-hallucination/'}, page_content='\n\n      Extrinsic Hallucinations in LLMs\n    \nDate: July 7, 2024  |  Estimated Reading Time: 30 min  |  Author: Lilian Weng\n\n\nHallucination in large language models usually refers to the model generating unfaithful, fabricated, inconsistent, or nonsensical content. As a term, hallucination has been somewhat generalized to cases when the model makes mistakes. Here, I would like to narrow down the problem of hallucination to cases where the model output is fabricated and not grounded by either the provided context or world knowledge.\nThere are two types of hallucination:\n\nIn-context hallucination: The model output should be consistent with the source content in context.\nExtrinsic hallucination: The model output should be grounded by the pre-training dataset. However, given the size of the pre-training dataset, it is too expensive to retrieve and identify conflicts per g
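
The `parse_only` filter keeps just the elements whose class is `post-content`, `post-title`, or `post-header`, so navigation and footer text never reach the loader output. A rough standard-library sketch of that idea; it is not how BeautifulSoup implements `SoupStrainer`, and `ClassTextExtractor` and the sample HTML are made up for illustration:

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class ClassTextExtractor(HTMLParser):
    """Collect text only from elements whose class attribute matches."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0  # > 0 while inside a matching element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start counting at a matching element, keep counting for its children.
        if self.depth or self.wanted.intersection(classes):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.parts.append(data.strip())


sample_html = """
<html><body>
  <nav class="menu">Home | Archive</nav>
  <h1 class="post-title">Extrinsic Hallucinations in LLMs</h1>
  <div class="post-content"><p>Hallucination is <b>fabricated</b> output.</p></div>
  <footer class="site-footer">Powered by Hugo</footer>
</body></html>
"""

extractor = ClassTextExtractor(["post-content", "post-title", "post-header"])
extractor.feed(sample_html)
kept_text = " ".join(extractor.parts)
print(kept_text)
```

Only the title and body text survive; the `nav` and `footer` text is dropped because neither element carries one of the requested classes.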

2. Splitter

Link: https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Sizes are measured in tiktoken tokens rather than characters.
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50,
)

# Split the loaded documents into overlapping chunks.
splits = text_splitter.split_documents(blog_docs)

In [38]:
splits[0:2]

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-07-07-hallucination/'}, page_content='Extrinsic Hallucinations in LLMs\n    \nDate: July 7, 2024  |  Estimated Reading Time: 30 min  |  Author: Lilian Weng\n\n\nHallucination in large language models usually refers to the model generating unfaithful, fabricated, inconsistent, or nonsensical content. As a term, hallucination has been somewhat generalized to cases when the model makes mistakes. Here, I would like to narrow down the problem of hallucination to cases where the model output is fabricated and not grounded by either the provided context or world knowledge.\nThere are two types of hallucination:\n\nIn-context hallucination: The model output should be consistent with the source content in context.\nExtrinsic hallucination: The model output should be grounded by the pre-training dataset. However, given the size of the pre-training dataset, it is too expensive to retrieve and identify conflicts per generation.
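
To see how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch. It is not the recursive algorithm LangChain uses, and it counts list items (toy words) rather than tiktoken tokens; `split_with_overlap` is a hypothetical name:

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last item of the previous one, which is the role the 50-token overlap plays in the splitter above: text that straddles a boundary appears in both neighbouring chunks.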

3. Embeddings creation

Link: https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/ (this page covers OpenAI embeddings; the cell below uses a local Hugging Face BGE model instead)

In [19]:
question = "What kinds of pets do I like?"
document = "My favorite pet is a cat."

In [20]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}  # return unit-length vectors
hf_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)
query_result = hf_embeddings.embed_query(question)
# Both texts go through embed_query here; embed_documents is the method meant for passages.
document_result = hf_embeddings.embed_query(document)
len(query_result)  # embedding dimension

384

In [21]:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 0.902305234006825
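
Because `normalize_embeddings` is set to `True`, the vectors should already have unit length, in which case cosine similarity reduces to a plain dot product. A small check with made-up 3-dimensional vectors (not real BGE embeddings), written with the standard library only:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


vec_a = l2_normalize([1.0, 2.0, 2.0])
vec_b = l2_normalize([2.0, 1.0, 2.0])

# For unit vectors both values agree up to floating-point error.
print(dot(vec_a, vec_b), cosine(vec_a, vec_b))
```

This is why many vector stores can rank normalized embeddings with an inner product and skip the division by the norms.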


4. Vector stores

Link: https://python.langchain.com/v0.2/docs/integrations/vectorstores/

In [22]:
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(documents=splits, 
                                    embedding=hf_embeddings)

retriever = vectorstore.as_retriever()

In [23]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x000001BF53529970>)

5. Retrieval

Fetch the chunks whose embeddings are most similar to the query.

In [28]:
# retriever.invoke replaces get_relevant_documents, which is deprecated in recent LangChain releases.
docs = retriever.invoke("What is Task Hallucination?")


In [29]:
len(docs)

4

In [30]:
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-07-07-hallucination/'}, page_content='Fig. 3. The evaluation framework for the FactualityPrompt benchmark.(Image source: Lee, et al. 2022)\nGiven the model continuation and paired Wikipedia text, two evaluation metrics for hallucination are considered:\n\nHallucination NE (Named Entity) errors: Using a pretrained entity detection model and document-level grounding, this metric measures the fraction of detected named entities that do not appear in the ground truth document.\nEntailment ratios: Using a RoBERTa model fine-tuned on MNLI and sentence-level knowledge grounding, this metric calculates the fraction of generated sentences that are marked as relevant to the paired Wikipedia sentence by the entailment model.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-07-07-hallucination/'}, page_content='Or\n@article{weng2024hallucination,\n  title   = "Extrinsic Hallucinations in LLMs.",\n  auth
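
The retriever returned four chunks, which is consistent with the default of `k=4` for LangChain vector store retrievers. Conceptually, similarity search scores every stored vector against the query vector and keeps the best k. A brute-force sketch with toy 2-dimensional vectors; the index structure and distance metric FAISS actually uses may differ, and `top_k` is a hypothetical helper:

```python
import heapq


def top_k(query_vec, doc_vecs, k=4):
    """Return (score, index) pairs for the k highest dot-product scores."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    return heapq.nlargest(k, scored)


doc_vecs = [
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.6, 0.8],
]
hits = top_k([1.0, 0.0], doc_vecs, k=2)
for score, idx in hits:
    print(idx, score)
```

The returned indices point back at the stored chunks, which is how a retriever maps the nearest vectors to the `Document` objects it hands to the prompt.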

6. Generation

In [31]:
from langchain_groq import ChatGroq
from langchain.prompts import ChatPromptTemplate

# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'))])

In [32]:
# LLM
llm = ChatGroq(model="llama3-8b-8192", temperature=0)

In [33]:
# Chain
chain = prompt | llm

In [34]:
# Run
chain.invoke({"context": docs, "question": "What is Hallucination?"})

AIMessage(content='According to the provided context, Hallucination in large language models refers to the model generating "unfaithful, fabricated, inconsistent, or nonsensical content".', response_metadata={'token_usage': {'completion_tokens': 34, 'prompt_tokens': 999, 'total_tokens': 1033, 'completion_time': 0.026204326, 'prompt_time': 0.168682456, 'queue_time': None, 'total_time': 0.194886782}, 'model_name': 'llama3-8b-8192', 'system_fingerprint': 'fp_179b0f92c9', 'finish_reason': 'stop', 'logprobs': None}, id='run-cfb868e1-5649-4eda-bee6-fb72a137c32c-0', usage_metadata={'input_tokens': 999, 'output_tokens': 34, 'total_tokens': 1033})
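
In the call above, `docs` is a list of `Document` objects, so the template is filled with their string representation, metadata included. A common alternative is to join only the `page_content` fields before filling the template. The sketch below uses a stand-in dataclass named `Doc` (an assumption, so that it runs without LangChain) and a hypothetical `format_docs` helper:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

docs_demo = [
    Doc("Chunk one.", {"source": "demo"}),
    Doc("Chunk two.", {"source": "demo"}),
]
filled = template.format(
    context=format_docs(docs_demo),
    question="What is Hallucination?",
)
print(filled)
```

Passing plain text keeps source URLs and object syntax out of the prompt, which tends to reduce the prompt token count.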

RAG Chains

Link: https://python.langchain.com/v0.1/docs/expression_language/get_started/#rag-search-example

In [35]:
from langchain import hub
prompt_hub_rag = hub.pull("rlm/rag-prompt")

In [36]:
prompt_hub_rag


ChatPromptTemplate(input_variables=['context', 'question'], metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))])

In [37]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_hub_rag
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Hallucination?")

'Hallucination in large language models refers to the model generating unfaithful, fabricated, inconsistent, or nonsensical content. It is a type of mistake where the model output is fabricated and not grounded by either the provided context or world knowledge.'
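
The dictionary at the head of `rag_chain` sends the same input string to each of its values: the retriever turns it into context, while `RunnablePassthrough()` forwards it unchanged as the question. A plain-Python sketch of that data flow, with hypothetical stand-ins (`fan_out`, `pipe`, `fake_retriever`, `fill_prompt`, `fake_llm`, `parse_output`) in place of the LangChain runnables:

```python
def fan_out(branches):
    """Call every branch with the same input and collect the results in a dict."""
    return lambda value: {key: fn(value) for key, fn in branches.items()}


def pipe(*steps):
    """Compose steps left to right, like the | operator in the chain."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def fake_retriever(question):
    # Stand-in for vector search: returns one canned chunk.
    return ["chunk about " + question.lower()]


def fill_prompt(inputs):
    return f"Question: {inputs['question']}\nContext: {inputs['context']}\nAnswer:"


def fake_llm(prompt_text):
    # Stand-in for the chat model: returns a message-like dict.
    return {"content": f"(answer derived from {len(prompt_text)} prompt characters)"}


def parse_output(message):
    # Plays the role of StrOutputParser: keep only the text.
    return message["content"]


toy_chain = pipe(
    fan_out({"context": fake_retriever, "question": lambda q: q}),
    fill_prompt,
    fake_llm,
    parse_output,
)
answer = toy_chain("What is Hallucination?")
print(answer)
```

The final step explains why `rag_chain.invoke` returns a plain string, whereas the earlier `prompt | llm` chain returned an `AIMessage`.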