Query Expansion for Optimizing RAG

In [22]:
import os
os.environ["LANGCHAIN_TRACING_V2"]="true"

In [23]:
import getpass
os.environ["LANGCHAIN_API_KEY"]=getpass.getpass()

Defining the Ollama Model

In [1]:
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="llama3.2:1b")

In [2]:
#Testing just in case
llm.invoke("What is a LLM?")

"A Large Language Model (LLM) is a type of artificial intelligence (AI) model that uses machine learning algorithms to process and generate human-like text. It's a subset of deep learning, which involves training neural networks on large datasets to learn patterns and relationships in language.\n\nLLMs are designed to understand the meaning of words, phrases, and sentences, and to generate coherent and contextually relevant responses. They can be used for a wide range of applications, such as:\n\n1. Chatbots: LLMs can be used to power virtual assistants, like Siri or Alexa, that can understand voice commands and respond accordingly.\n2. Content generation: LLMs can generate text articles, social media posts, and other types of content for businesses, researchers, and writers.\n3. Language translation: LLMs can translate languages in real-time, allowing people to communicate with each other who speak different languages.\n4. Summarization: LLMs can summarize long pieces of text into sho

Using the PDFLoader from Langchain

In [11]:
from langchain_community.document_loaders import PyPDFLoader

pdf_loader = PyPDFLoader("notes.pdf")

loaded_document = pdf_loader.load()

In [12]:
#Content can be accessed through indexing as follows
loaded_document[1].page_content

'in the operational environments where systems\nare  deployed.  Increased  consideration  of\noperational security risk earlier in the acquisition\nand  development  processes  provides  an\nopportunity to tune decisions to address security\nrisk  and  reduce  the  total  cost  of  operational\nsecurity.  This  book  provides  key  operational\nmanagement  approaches,  methodologies,  and\npractices for assuring a greater level of software\nand system security throughout the development\nand acquisition lifecycle. \nThis  book  contains  recommendations  to  guide\nsoftware  professionals  in  creating  a\ncomprehensive lifecycle process for system and\nsoftware  security.  That  process  allows\norganizations to incorporate widely accepted and\nwell-defined assurance approaches into their own\nspecific  methods  for  ensuring  operational\nsecurity of their software and system assets. It’s\nworth pointing out that the material in this book\nis applicable to many different types of sys

Using RecursiveTextSplitter to convert the loaded text into chunks

In [13]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size =100,
    chunk_overlap=50
)

In [14]:
all_splits = text_splitter.split_documents(loaded_document)

In [15]:
len(all_splits)

700

Defining Embedding Model

In [16]:
from langchain_community import embeddings

embedding_model = embeddings.OllamaEmbeddings(
    model="nomic-embed-text"
)

Defining ChromaDB (Vector Database)

In [17]:
from langchain_community.vectorstores import Chroma

vector_store = Chroma.from_documents(
    embedding=embedding_model,
    documents=all_splits
)

Defining type of search to be done and number of nearest neighbors 

In [18]:
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k":6}
)

In [19]:
#Testing number of relevant documents which will be retrieved
retriever.get_relevant_documents(
    "What is life-cycle assurance?"
)

  retriever.get_relevant_documents(


[Document(metadata={'page': 0, 'page_label': '1', 'source': 'notes.pdf'}, page_content='In This Chapter \n• 1.1 Introduction \n• 1.2 What Do We Mean by Lifecycle Assurance?'),
 Document(metadata={'page': 3, 'page_label': '4', 'source': 'notes.pdf'}, page_content='software lifecycle. \n1.2 What  Do  We  Mean  by  Lifecycle\nAssurance?'),
 Document(metadata={'page': 8, 'page_label': '9', 'source': 'notes.pdf'}, page_content='for implementation and sustainment. As a result,\nassurance must be planned across the lifecycle'),
 Document(metadata={'page': 8, 'page_label': '9', 'source': 'notes.pdf'}, page_content='lifecycle assurance [Mead 2010a]: \nApplication  of  technologies  and  processes  to'),
 Document(metadata={'page': 16, 'page_label': '17', 'source': 'notes.pdf'}, page_content='assurance. \n1.4 Addressing Lifecycle Assurance3 \n3.  Material  in  this  section  comes  from'),
 Document(metadata={'page': 0, 'page_label': '1', 'source': 'notes.pdf'}, page_content='Assurance \n• 1.4 A

In [24]:
#Importing prompt from Langchain hub
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")



In [25]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

Function for query expansion

In [26]:
def expand_query(query):

    docs = retriever.invoke(query)
    context = format_docs(docs)

    #Generating answer based on initial context
    intermediate_answer = llm.invoke(
        f"Based on the following information, provide a concise answer:\n\n{context}"
    )

    #Expand the query
    expanded_query = f"{query} {intermediate_answer}"

    return expanded_query

Defining the RAG Chain

In [27]:
def rag_chain_function(question):

    #Normal Retrieval using the initial query
    expanded_question = expand_query(question)

    #Second retrieval using expanded query
    refined_docs = retriever.invoke(expanded_question)
    refined_context = format_docs(refined_docs)

    #Generating final answer
    final_answer = llm.invoke(
        f"Answer based on the following retrieved context:\n\n{refined_context}"
    )

    return final_answer

Testing the model

In [31]:
# Step 7: Execute Query Expansion RAG
query = "Summarize the Fly-By-Night case study."
final_response = rag_chain_function(query)

print("\nFinal Answer:", final_response)


Final Answer: Based on the provided context, here's an expanded version of the information:

As the demand for flights continues to grow, the industry is shifting towards more efficient and customer-centric approaches. For example, for a Fly-By-Night Airlines, it would be beneficial to develop along with a frequent flyer program.

To begin this process, capable devices in the targeted geographic area need to be implemented. This could include upgrading or introducing new technologies such as:

* High-speed internet connectivity
* Satellite communications
* Advanced customer relationship management (CRM) systems

One effective way to expand the number of passengers is by providing higher-quality service. This can be achieved through various initiatives, including:

* Investing in modernized aircraft and crew training programs
* Implementing advanced passenger experience features such as personalized check-in, mobile apps, and dedicated customer support teams
* Enhancing security protoc