In [1]:
# !pip install python-dotenv langchain langchain-nvidia-ai-endpoints langchain-core pydantic duckduckgo-search

## Enhancing RAG Results with QuantexAI: Advanced Prompting Techniques for Developers

In the rapidly advancing domain of AI, keeping up to date with the latest research and methodologies can be a daunting task for users and developers alike. A common challenge within AI systems is their tendency to "hallucinate" or generate inaccurate information when provided with insufficient context or data. This issue underscores the importance of innovative solutions to enhance the accuracy and reliability of AI-generated content.

### Introduction to Retrieval Augmented Generation (RAG)

To mitigate the problem of hallucinations in AI, the concept of Retrieval Augmented Generation (RAG) has been introduced. RAG addresses this issue by enabling AI models to query additional information relevant to the given question or prompt. This process enriches the AI's response with a broader context, potentially improving the accuracy and relevance of the generated output.

However, RAG is only as good as the information it retrieves. Retrieval often fails because the query itself is imprecise, particularly when users do not know how the content stored in the Vector Database is worded. Because these databases typically rank results with distance-based similarity search, a query phrased differently from the stored text can miss relevant documents.
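To make this sensitivity concrete, here is a minimal sketch of a distance-based search using cosine similarity. The three-dimensional vectors and document names are invented for illustration; a real system would obtain embeddings from a model and store them in a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumption: real ones come from an embedding model)
documents = {
    "hnsw_indexing": [0.9, 0.1, 0.0],
    "product_recommendations": [0.1, 0.9, 0.1],
    "gpu_pricing": [0.0, 0.2, 0.9],
}

def top_match(query_vector):
    """Return the name of the stored document most similar to the query vector."""
    return max(
        documents,
        key=lambda name: cosine_similarity(query_vector, documents[name]),
    )

# Two phrasings of a similar request may embed to different vectors
query_a = [0.8, 0.3, 0.0]
query_b = [0.4, 0.7, 0.1]
print(top_match(query_a))
print(top_match(query_b))
```

In this toy setup the two query vectors land on different documents, which is the failure mode the techniques below try to reduce.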

### Advanced Prompting Techniques

Recognizing these challenges, QuantexAI introduces a suite of advanced prompting techniques designed to significantly enhance the probability of retrieving the most relevant information through RAG. Our approach combines two innovative RAG techniques: MultiQuery Retrieval and Step-back Prompting.

In [2]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from dotenv import load_dotenv

load_dotenv()

## Store your API key in a .env file in the root directory of this project, or set os.environ['NVIDIA_API_KEY'] to your API key

True

In [3]:
# The LLM we are using is the "mixtral_8x7b" model, one of the best open-source models available
model = ChatNVIDIA(model="mixtral_8x7b")

## MultiQuery Retrieval

The process of data retrieval in AI systems, particularly in the context of RAG, is highly sensitive to the specific wording of queries. Subtle variations in query phrasing can lead to significantly different results, which poses a challenge in consistently retrieving the most relevant information.

MultiQuery Retrieval extends the traditional single-query approach by simultaneously deploying multiple queries with variations in phrasing and context. This method increases the coverage of potential matches within the Vector Database, thereby increasing the likelihood of retrieving the most relevant information related to the user's request.

### Automated Prompt Tuning with Language Models
MultiQuery Retrieval leverages Large Language Models (LLMs) to automate the creation of varied queries based on the original prompt. This process involves generating multiple versions of the query from different perspectives, each designed to explore a unique angle or aspect of the information sought. By doing so, MultiQuery Retrieval broadens the scope of the search, enhancing the probability of matching with the relevant data stored within the Vector Database.

### Diverse Perspectives and Keywords
The key to MultiQuery Retrieval's effectiveness lies in its ability to use unique keywords and perspectives in the generated queries. This diversity ensures a wider coverage in the search process, increasing the chances of retrieving information that is most pertinent to the user's query. By systematically exploring a range of phrasings and contexts, MultiQuery Retrieval mitigates the limitations of conventional, single-query searches, which may miss critical information due to the constraints of specific wording.

Source: https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever
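The idea of running several query variants and merging their results can be sketched without any external service. The retriever below is a toy stand-in that ranks documents by shared words, not the similarity search a vector database performs; the corpus and query variants are made up for illustration.

```python
def keyword_retriever(query, corpus, k=2):
    """Toy stand-in for a vector search: rank documents by shared words."""
    query_words = set(query.lower().split())
    scored = []
    for doc in corpus:
        overlap = len(query_words & set(doc.lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def multi_query_retrieve(queries, corpus, k=2):
    """Run every query variant and return the de-duplicated union of results."""
    seen = set()
    merged = []
    for query in queries:
        for doc in keyword_retriever(query, corpus, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Hypothetical corpus and query variants
corpus = [
    "HNSW graphs speed up approximate nearest neighbour search",
    "Product quantization compresses stored embeddings",
    "Hybrid search combines keyword filters with embeddings",
]
variants = [
    "fast nearest neighbour search",
    "how to compress embeddings",
]

single = keyword_retriever(variants[0], corpus)
merged = multi_query_retrieve(variants, corpus)
print(len(single), len(merged))
```

The second variant surfaces a document that the first one misses, so the merged result covers more of the corpus than either query alone.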

In [4]:
from typing import List

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    """
    A Pydantic model to represent a list of lines as output from the LLM.

    Attributes:
        lines (List[str]): A list of strings, each representing a line of text.
    """
    lines: List[str] = Field(description="Lines of text")

# Custom output parser to convert LLM text output into a structured LineList
class LineListOutputParser(PydanticOutputParser):
    """
    An output parser that extends PydanticOutputParser to parse LLM output text into a structured LineList object.

    Methods:
        parse(text: str) -> LineList:
            Parses a given text string into a LineList object by splitting the text into lines.
    """
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
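        # Drop blank lines so empty strings are never used as queries
        lines = [line.strip() for line in text.strip().split("\n") if line.strip()]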
        return LineList(lines=lines)

# Initialize the output parser
output_parser = LineListOutputParser()

# Define the prompt template for generating multi-query prompts
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""As an AI language model assistant, your primary objective is to enhance the user's ability to retrieve pertinent documents from a vector database through a distance-based 
                similarity search. To achieve this, create three distinct versions of the user's original question, focusing on using different wordings for each version. 
                Each version should offer a unique perspective or rephrasing, aiming to capture a broader range of potential search results by overcoming the inherent limitations of keyword-based searches in vector databases. Ensure that these alternative questions 
                are diverse in their phrasing and focus, yet remain true to the essence of the original query. Present the rephrased questions one after the other, separated by newlines.
                Original question: {question}""",
)

# Build the multi-query chain: prompt -> model (mixtral_8x7b hosted by NVIDIA) -> line-list parser
multiquery_chain = QUERY_PROMPT | model | output_parser

# Example usage
res = multiquery_chain.invoke({"question":"Latest development to query information from a vector database"})

res.lines

['1. What are the most recent advancements in extracting data from a vector database?',
 '2. How can one access the newest features for querying information within a vector database?',
 '3. In what ways have techniques for retrieving data evolved in vector databases recently?']
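The output above keeps the model's list numbering ("1.", "2.", "3.") inside each query string. One possible way to normalize such output before searching is sketched below; `clean_lines` is a hypothetical helper, not part of LangChain, and the regular expression only covers a few common list markers.

```python
import re

# Matches a leading list marker such as "1. ", "2) ", "- " or "* "
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")

def clean_lines(text):
    """Split LLM output into non-empty lines with leading list markers removed."""
    cleaned = []
    for line in text.splitlines():
        line = _LIST_MARKER.sub("", line).strip()
        if line:
            cleaned.append(line)
    return cleaned

raw = (
    "1. What are the most recent advancements?\n"
    "\n"
    "2. How can one access the newest features?"
)
print(clean_lines(raw))
```

The same logic could be placed inside a custom `parse` method if cleaner queries are preferred.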

## Step-Back Prompting: Leveraging High-Level Concepts for Enhanced LLM Reasoning

Step-back Prompting, a technique developed by Google DeepMind, asks the model to first pose a broader, more abstract "step-back" question about the concepts and principles behind the original query. The model works through that higher-level question before returning to the specific problem.

### Chain-of-Thought (CoT) and Its Limitations
While the Chain-of-Thought (CoT) prompting method has been recognized for improving AI model performance by outlining intermediate steps in reasoning, errors may occur within these intermediate steps, potentially leading to inaccurate conclusions. Step-back Prompting addresses this issue by reinforcing the model's foundational understanding before engaging in the step-by-step reasoning process.

<p align="center"><b>Step-back prompting technique</b><br>
<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*C-M88KonAuQhIGg5rlos-g.png" alt="Step-back prompting technique"></p>

<p align="center"><b>Comparison of performance among Base Models, CoT, and Step-back prompting</b><br>
<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*Jdv3FM1d625TJ7yDX4iHnQ.png" alt="Comparison of performance among Base Models, CoT, and Step-back prompting"></p>

Source: https://python.langchain.com/docs/templates/stepback-qa-prompting, https://arxiv.org/pdf/2310.06117.pdf, https://cobusgreyling.medium.com/a-new-prompt-engineering-technique-has-been-introduced-called-step-back-prompting-b00e8954cacb
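The two stages (abstraction first, then reasoning over the gathered context) can be sketched with plain callables. This is an illustration under stated assumptions rather than the implementation from the paper or the LangChain template: `llm` and `retrieve` are hypothetical stand-ins for a model call and a search call, and the prompt wording is invented.

```python
from typing import Callable, List

def step_back_answer(
    question: str,
    llm: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Answer a question using context for it and for a broader step-back question."""
    # Stage 1 (abstraction): ask for a more generic version of the question
    step_back_question = llm(f"Rewrite as a more generic step-back question: {question}")
    # Stage 2 (reasoning): gather context for both questions, then answer
    context = retrieve(question) + retrieve(step_back_question)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return llm(prompt)

# Stub model and retriever so the sketch runs without any API key
calls = []

def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("Rewrite"):
        return "what can the members of The Police do?"
    return "stub answer"

def fake_retrieve(query: str) -> List[str]:
    return [f"doc about: {query}"]

answer = step_back_answer(
    "Could the members of The Police perform lawful arrests?",
    fake_llm,
    fake_retrieve,
)
print(answer)
```

The final prompt contains context for both the original and the step-back question, which is what gives the model the broader grounding described above.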

In [5]:
# Few-shot examples provided to guide the LLM in understanding the task
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel’s was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]

# Transforming the examples into a format that can be used in chat-based prompting
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

# Creating a few-shot prompt template with the provided examples
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# Defining the main prompt template that incorporates few-shot examples to guide the LLM
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert at world knowledge. Your task is to step back "
            "and paraphrase a question to one more generic step-back question, which "
            "is easier to answer and gather foundational knowledge. Ensure only ONE generic step-back question is returned. ONLY RETURN step-back question. "
            "Here are a few examples:",
        ),
        # Integrating few-shot examples
        few_shot_prompt,
        # Placeholder for new question
        ("user", "{question}"),
    ]
)

# Configuring the pipeline to generate step-back questions from the original queries
question_gen = {"question":RunnablePassthrough()} | prompt | model | StrOutputParser()

# Generating step-back questions for each line in the previously obtained results
stepback_answers = question_gen.batch(res.lines)
stepback_answers

['what are the latest developments in vector database technology?\n\n(This question is more generic and focuses on the overall technology of vector databases, rather than specifically on data extraction methods.)',
 'what are the ways to obtain the most recent features for querying data in a vector database?',
 'how have recent developments impacted data retrieval techniques in vector databases?']
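The first result above includes a parenthetical explanation after the question, despite the instruction to return only the question. A small post-processing step, sketched here with a hypothetical `first_line` helper, would keep just the question before it is used as a search query.

```python
def first_line(answer: str) -> str:
    """Keep only the first non-empty line of a model response."""
    for line in answer.splitlines():
        if line.strip():
            return line.strip()
    return ""

raw = (
    "what are the latest developments in vector database technology?\n\n"
    "(This question is more generic.)"
)
print(first_line(raw))
```

This assumes the question always comes first in the response, which holds for the outputs shown here but is not guaranteed.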

## Refine and Summarize: Streamlining Information Presentation
In the final stage of applying advanced prompting techniques with Retrieval Augmented Generation (RAG), the focus shifts to organizing the retrieved information. This ensures that the data, whether sourced from a Vector Database, online searches, or other repositories, is coherently synthesized for the user.

### The Refine Chain Process
The refine chain process begins with generating an answer based on the first document retrieved. Subsequently, this initial answer undergoes a series of refinements with each additional document. This iterative approach allows for the gradual enhancement of the answer, incorporating broader perspectives and finer details with each step.

Source: https://python.langchain.com/docs/use_cases/summarization#option-3.-refine
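The iterative process described above can be sketched as a simple loop. This illustrates the idea only and is not LangChain's internal implementation; `llm` is a hypothetical callable that takes a prompt and returns text, and the prompt wording is invented.

```python
from typing import Callable, List

def refine_summary(documents: List[str], llm: Callable[[str], str]) -> str:
    """Summarize the first document, then refine the summary once per remaining document."""
    if not documents:
        return ""
    summary = llm(f"Summarize:\n{documents[0]}")
    for doc in documents[1:]:
        summary = llm(
            f"Existing summary:\n{summary}\n\n"
            f"Refine it using this additional context:\n{doc}"
        )
    return summary

# Stub model that records prompts so the sketch runs offline
calls = []

def counting_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary v{len(calls)}"

result = refine_summary(["doc A", "doc B", "doc C"], counting_llm)
print(result, len(calls))
```

The loop makes one model call per document, so cost and latency grow linearly with the number of retrieved documents.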

In [6]:
from langchain.tools import DuckDuckGoSearchRun
from langchain.docstore.document import Document
import time

# Initialize the DuckDuckGo search functionality
search = DuckDuckGoSearchRun()

# Combine the original queries and the generated step-back questions for searching
queries_to_search = res.lines + stepback_answers

queries_to_search

['1. What are the most recent advancements in extracting data from a vector database?',
 '2. How can one access the newest features for querying information within a vector database?',
 '3. In what ways have techniques for retrieving data evolved in vector databases recently?',
 'what are the latest developments in vector database technology?\n\n(This question is more generic and focuses on the overall technology of vector databases, rather than specifically on data extraction methods.)',
 'what are the ways to obtain the most recent features for querying data in a vector database?',
 'how have recent developments impacted data retrieval techniques in vector databases?']

In [7]:
# search.batch(queries_to_search)

# Due to DuckDuckGo's rate limitations, a batch search isn't feasible directly
# Instead, perform searches one by one, with a delay to avoid rate limiting
search_results = []
for query in queries_to_search:
    search_results.append(Document(page_content=search.run(query)))
    time.sleep(5)
    
search_results

[Document(page_content='The emergence of semantic search Sentence Embeddings: Enhancing search relevance Vector Database and Pinecone Step-1: Create a Pinecone Index Step-2: Loading Data into the index Step-3: Query the index Question answering and semantic search with GPT-4 Introduction The adoption of Vector Search and Vector Database Algorithms is witnessing a surge across various industries, revolutionizing how organizations leverage and interact with their data. E-commerce Personalization: Online retailers are leveraging Vector Search to enhance product recommendations. Vector databases have gained significant importance in various fields due to their unique ability to efficiently store, index, and search high-dimensional data points, often referred to as... Source: Pinecone : The database indexes vectors using algorithms like PQ (Product Quantization), LSH (Locality-Sensitive Hashing), or HNSW (Hierarchical Navigable Small World). This step maps vectors to data structures for fas
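A fixed five-second pause is the simplest way to stay under a rate limit. An alternative, sketched below, retries only when a call fails and doubles the wait each time. The helper name, its parameters, and the choice to catch every `Exception` are assumptions for illustration; the exception the search tool actually raises when rate limited is not shown in this notebook.

```python
import time
from typing import Callable, List

def run_with_backoff(
    queries: List[str],
    search_fn: Callable[[str], str],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run queries sequentially, retrying failures with exponentially growing delays."""
    results = []
    for query in queries:
        for attempt in range(retries):
            try:
                results.append(search_fn(query))
                break
            except Exception:  # assumption: any error is treated as retryable
                if attempt == retries - 1:
                    raise
                sleep(base_delay * 2 ** attempt)
    return results

# Stub search that fails once, and a fake sleep so the demo finishes instantly
attempts = {"n": 0}

def flaky_search(query: str) -> str:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("rate limited")
    return query.upper()

delays = []
out = run_with_backoff(["a", "b"], flaky_search, sleep=delays.append)
print(out, delays)
```

With the real tool, `search_fn` would be `search.run` and the default `time.sleep` would be used.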

In [8]:
from langchain.chains.summarize import load_summarize_chain

# Load the summarization chain designed for refining and summarizing information
chain = load_summarize_chain(model, chain_type="refine")

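# Invoke the summarization chain with the search results to obtain a refined summary
# (stored under a new name so the earlier `res` holding the query list is not overwritten)
summary = chain.invoke({"input_documents": search_results})

summary['output_text']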

"Databases have significantly evolved over the past half-century, extending and enhancing their capabilities to ingest, capture, snapshot, query, analyze, and now even understand high-dimensional vector data. Vector databases play a crucial role in enabling data-driven decision-making processes by providing efficient storage, retrieval, and analysis capabilities for such data.\n\nThe process begins with creating embeddings or vectors using a model. These models can be free and open-sourced, or they can be created by calling API endpoints provided by various platforms. A vector database stores data as vector embeddings, which is a technique of translating data into numerical vectors that capture essential properties of the data. These vectors are stored in a high-dimensional space, where each dimension represents a unique feature of the data. This allows the database to perform low-latency retrieval of data, significantly improving data interaction across industries.\n\nThe adoption of 