RAG Fusion with a Local LLM


Install the Python packages required for LangChain.

In [1]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain unstructured sentence-transformers



Set the OS environment variables.
They enable tracing of LangChain invocations, which can then be inspected on smith.langchain.com.

In [2]:
import os
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = "<your-api-key>"  # placeholder: never commit a real key to a shared notebook
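
Hardcoding the API key in a notebook makes it easy to leak. As a minimal sketch, the key could instead be read from the environment and requested interactively only when it is missing. The helper name `ensure_env` and its parameters are hypothetical and not part of LangChain.

```python
import os
from getpass import getpass


def ensure_env(name, prompt=getpass, env=os.environ):
    """Return env[name], asking the user for it once if it is not set.

    `prompt` and `env` are injectable so the helper can be tested
    without blocking on keyboard input (hypothetical helper).
    """
    value = env.get(name)
    if not value:
        # getpass hides the typed value, so the key does not end up in cell output.
        value = prompt(f"Enter {name}: ")
        env[name] = value
    return value
```

In the notebook this would be called as `ensure_env("LANGCHAIN_API_KEY")` in place of the assignment above.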

Load the Markdown documents, split them into chunks, embed the chunks, and index them in the vector store.

In [4]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import UnstructuredFileLoader
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings

# Load every Markdown file under ./documents/markdown.
loader = DirectoryLoader("./documents/markdown", glob="**/*.md", show_progress=True, loader_cls=UnstructuredFileLoader)
documents = loader.load()

# Split the documents into chunks of about 500 characters with a 100-character overlap.
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
split_docs = text_splitter.split_documents(documents)

# Embed the chunks and persist them in a local Chroma database under ./db.
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(documents=split_docs,
                                    embedding=embedding_function,
                                    persist_directory="./db")

# Expose the vector store as a retriever so it can be used inside chains.
retriever = vectorstore.as_retriever()

100%|██████████| 16/16 [00:02<00:00,  6.32it/s]
Created a chunk of size 554, which is longer than the specified 500
Created a chunk of size 551, which is longer than the specified 500
Created a chunk of size 660, which is longer than the specified 500
Created a chunk of size 699, which is longer than the specified 500
Created a chunk of size 586, which is longer than the specified 500
Created a chunk of size 758, which is longer than the specified 500
Created a chunk of size 770, which is longer than the specified 500
Created a chunk of size 504, which is longer than the specified 500
Created a chunk of size 564, which is longer than the specified 500
Created a chunk of size 642, which is longer than the specified 500
Created a chunk of size 562, which is longer than the specified 500
Created a chunk of size 605, which is longer than the specified 500
Created a chunk of size 643, which is longer than the specified 500
Created a chunk of size 661, which is longer than the specified 500
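
The warnings above appear because a separator-based splitter only cuts where the separator occurs. The sketch below is a simplified model of that behaviour, not the library's implementation: it assumes the separator is a blank line and ignores the overlap. A paragraph that is longer than `chunk_size` and contains no separator comes out as a single oversized chunk.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks (overlap ignored)."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # The piece starts a new chunk even if it is already too long,
            # because there is no separator inside it to cut on.
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "short paragraph\n\n" + "x" * 600 + "\n\nanother short one"
chunks = split_on_separator(text, chunk_size=500)
oversized = [len(c) for c in chunks if len(c) > 500]
print(oversized)
```

If strict chunk sizes matter, a splitter that falls back to finer separators, such as `RecursiveCharacterTextSplitter` from `langchain_text_splitters`, may be a better fit.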


Query the vector store to retrieve the documents most similar to the query.

In [5]:
query = "What is the code A_100?"
print(vectorstore)
docs = vectorstore.similarity_search(query)
for doc in docs:
    print(f"Document source: {doc.metadata}")
    print(f"Document page_content: {doc.page_content}\n")
    print("--------------------------------------------")

<langchain_community.vectorstores.chroma.Chroma object at 0x10569a410>
Document source: {'source': 'documents/markdown/general-system-failures.md'}
Document page_content: Feature: Error Codes

Scenario Outline: Authentication Error Codes
Given the user is not authenticated
When the user makes a request to a protected resource
Then the service should return an error response with status code 401
And the error response should contain error code A_100
And the error response should contain error message Authentication token is missing

--------------------------------------------
Document source: {'source': 'documents/markdown/general-system-failures.md'}
Document page_content: Scenario Outline: Authentication Error Codes
Given the user is not authenticated
When the user makes a request to a protected resource
Then the service should return an error response with status code 401
And the error response should contain error code A_101
And the error response should contain error message Authe
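
Conceptually, the similarity search above embeds the query and returns the stored chunks whose embeddings are closest to it. The toy sketch below illustrates the idea with made-up 3-dimensional vectors and cosine similarity. The vectors, the names in `toy_index`, and the choice of metric are assumptions for illustration, and the vector store's actual distance metric may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=4):
    """Return the names of the k entries most similar to query_vec."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up embeddings standing in for three indexed chunks.
toy_index = {
    "auth-errors": [0.9, 0.1, 0.0],
    "product-api": [0.1, 0.9, 0.1],
    "jwt-roles": [0.5, 0.2, 0.7],
}
nearest = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(nearest)
```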

Generate multiple search queries based on the user's input question.

In [78]:
from langchain_community.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

question = "What are the downstreams of purchase gateway?"

# RAG-Fusion: Related
template = """You are a helpful assistant that generates multiple sub-questions related to an input question.
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation.
Generate multiple search queries related to: {question}
The three queries are (3 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

# Debug helper: pipe it into a chain to print an intermediate value.
def print_output(output):
    print(output)
    return output

generate_queries = (
        prompt_rag_fusion
        | ChatOllama(model="ghyghoo8/minicpm-llama3-2_5:8b")
        | StrOutputParser()
        # One query per line; drop blank lines.
        | (lambda llm_response: llm_response.split("\n"))
        | (lambda queries: list(filter(lambda item: item.strip(), queries)))
)

# The local model does not always return exactly three lines, so retry until it does.
queries_output = []
while len(queries_output) != 3:
    queries_output = generate_queries.invoke({"question": question})

print(queries_output)


['What is the process for completing a purchase through the purchase gateway?', 'What are the steps involved in making a purchase using the purchase gateway?', 'Can you provide me with information about the purchase gateway and how it facilitates transactions?']
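
Local models often wrap the queries in numbering, bullets, or a preamble, and the retry loop above has no upper bound. The sketch below shows one way to clean the raw reply and cap the number of attempts. The helper names `parse_queries` and `generate_with_retry` are hypothetical, and `generate` stands in for any callable that returns the model's raw text.

```python
import re


def parse_queries(llm_response):
    """Split a raw LLM reply into clean, non-empty query strings."""
    queries = []
    for line in llm_response.splitlines():
        # Strip leading list markers such as "1.", "2)", "-" or "*".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


def generate_with_retry(generate, expected=3, max_attempts=5):
    """Call generate() until it yields exactly `expected` queries, up to max_attempts."""
    for _ in range(max_attempts):
        queries = parse_queries(generate())
        if len(queries) == expected:
            return queries
    raise RuntimeError(f"no reply with {expected} queries after {max_attempts} attempts")


# Canned replies standing in for the model: the first one has an extra preamble line.
replies = iter(["Sure, here you go:\n1. a\n2. b\n3. c", "1. a\n2. b\n3. c"])
result = generate_with_retry(lambda: next(replies))
print(result)
```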


Retrieve the documents related to each of the three queries generated by the LLM.
Then rank the retrieved documents with reciprocal rank fusion.

In [33]:
from langchain.load import dumps, loads

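def reciprocal_rank_fusion(results: list[list], k=60):
    """Fuse several ranked document lists into one list of (document, score) pairs."""
    fused_scores = {}
    for docs in results:
        for rank, doc in enumerate(docs):
            # Serialize the document so it can be used as a dictionary key.
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # rank is zero-based here, so the top document of each list contributes 1 / k.
            fused_scores[doc_str] += 1 / (rank + k)
    # Sort by fused score, highest first, and deserialize the documents again.
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results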

retrieval_chain = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain.invoke({"question":question})


print(f"Number of docs retrieved: {len(docs)}")
for doc, score in docs:
    print(f"Document score: {score}")
    print(f"Document source: {doc.metadata}")
    print("Document page content:")
    print(f"\t{doc.page_content}")
    print("--------------------------------------------\n")

Number of docs retrieved: 37
Document score: 1.4541558594176254
Document source: {'source': 'documents/markdown/product-and-catalogue-get-api.md'}
Document page content:
	X-Territory Y
the territory of the user

X-TraceId   N
used for request tracing in log

Success Response
HTTP/1.1 200 OK
X-TraceId: f81d4fae-7dec-11d0-a765-00a0c91e6bf6

Content-Type: vnd.availableProducts.v1+json
{
"products": [
{
"name": "Alpha",
"productId":"abc"
"priceInGBP":3,
"duration":"1M"
"voucherAvailability":"true"
},
{
"name": "Beta",
"productId":"def"
"priceInGBP":4,
"duration":"1W"
"voucherAvailability":"false"
}
--------------------------------------------

Document score: 1.0753439433238483
Document source: {'source': 'documents/markdown/content-access-permission-jwt.md'}
Document page content:
	Scenario Outline: Customer access based on role graph
Given the role graph is defined as:
"""
admin:
- editor
- viewer
editor:
- reviewer
- contributor

viewer:
- reader
- guest
"""
And the customer has the rol
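
To make the ranking step concrete, here is a small worked example of reciprocal rank fusion on plain string identifiers instead of documents. It uses the same zero-based formula as `reciprocal_rank_fusion` above: each list adds 1 / (rank + k) to a document's score. The three rankings are made up for illustration.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of ids; returns (id, score) pairs, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One ranked list per generated query (made-up results).
rankings = [
    ["A", "B", "C"],
    ["B", "A", "D"],
    ["B", "C", "A"],
]
fused = rrf(rankings)
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

B is ranked first twice and second once, so it ends up ahead of A, which appears in all three lists but at lower positions on average. D appears only once, in last place, and finishes at the bottom.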

Answer the original question, using the fused documents as context.

In [83]:
from operator import itemgetter

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
        {"context": retrieval_chain,
         "question": itemgetter("question")}
        | prompt
        | ChatOllama(model="ghyghoo8/minicpm-llama3-2_5:8b")
        | StrOutputParser()
)
response = ''
while not response:
    response = final_rag_chain.invoke({"question":question})
print(response)

 Answer.
There is no information provided on the page that lists the downstreams of Purchase Gateway. This would typically be found in a flowchart or diagram within the "Purchase Gateway" section. It may also be found within specific documents linked from the Purchase Gateway section, but it is not included in the text content of this page.
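
One possible reason for the weak answer is that `final_rag_chain` appears to pass the raw list of (document, score) pairs into the prompt, so the model sees their string representation, scores and metadata included, for every retrieved document. A hedged sketch of an alternative is to keep only the top few documents and join their text before filling the prompt. `Doc` is a hypothetical stand-in for the document type, and `format_context` and `top_n` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(scored_docs, top_n=4):
    """Turn (document, score) pairs into one readable context string."""
    parts = []
    for doc, _score in scored_docs[:top_n]:
        source = doc.metadata.get("source", "unknown")
        parts.append(f"Source: {source}\n{doc.page_content}")
    return "\n\n---\n\n".join(parts)


scored = [
    (Doc("Error code A_100 means the token is missing.", {"source": "errors.md"}), 0.05),
    (Doc("Products are listed by territory."), 0.03),
    (Doc("Roles are defined as a graph."), 0.01),
]
context = format_context(scored, top_n=2)
print(context)
```

In a chain, a function like this could be piped after the retrieval step so that the `{context}` placeholder receives plain text.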
