<a href="https://colab.research.google.com/github/Jashkine/RAG_Pipeline/blob/main/Workbook_Exercises_Module_3_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Introduction to Multi-Query RAG

![](https://i.ibb.co/26x6qpD/Screenshot-2024-11-02-at-19-34-48.png)

### What is Multi-Query RAG?

In traditional RAG systems, a single query is used to retrieve relevant documents to feed into the language model. **Multi-Query RAG** improves upon this by:
- Generating multiple queries from a single user input.
- Retrieving documents for each query and aggregating the results.

This increases the diversity of the retrieved documents, which can lead to richer and more accurate responses from the model.

### Benefits of Multi-Query RAG:
1. **More Contextual Information**: By retrieving documents from multiple perspectives, the model gets access to more varied information.
2. **Improved Response Quality**: With more relevant context to ground its answer, the model is less likely to hallucinate.

### How does Multi-Query RAG work?

1. **Initial Query Generation**: A single user question is split into multiple related queries.
2. **Parallel Retrieval**: Each query retrieves relevant documents or passages from a knowledge base.
3. **Fusion of Information**: The retrieved documents are combined and sent to the model to generate a response.
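
The three steps above can be sketched without any external services. This is only a minimal illustration, not the implementation used later in this notebook: `generate_queries` and `toy_retrieve` are hypothetical stand-ins for the LLM call and the vector-database lookup, and `KNOWLEDGE_BASE` is made-up example data.

```python
import re

# Toy illustration of the Multi-Query RAG flow (all names and data are hypothetical).
KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase.",
    "E-books can be downloaded from the account page.",
    "Courses include lifetime access to all video lessons.",
]

def tokenize(text):
    # Lowercase the text and keep only word-like tokens.
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def generate_queries(user_query):
    # Stand-in for an LLM call that rewrites the question from several angles.
    return [user_query, "refunds purchase days", "courses lifetime access"]

def toy_retrieve(query, documents, top_k=1):
    # Stand-in for vector search: rank documents by the number of shared words.
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:top_k]

def multi_query_retrieve(user_query, documents):
    # Generate queries, retrieve per query, then merge without duplicates.
    merged = []
    for q in generate_queries(user_query):
        for doc in toy_retrieve(q, documents):
            if doc not in merged:
                merged.append(doc)
    return merged

docs = multi_query_retrieve("How do refunds work for courses?", KNOWLEDGE_BASE)
print("\n\n".join(docs))
```

With `top_k=1`, the original question alone would surface only the refund passage; the extra query about courses adds the course-access passage to the context.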

In the next part, we will implement this architecture step by step.


# 2. Multi-Query RAG Implementation (Part 1)

In this section, we will implement the first part of the Multi-Query RAG system.

### Key Steps:
1. **Query Decomposition**: Split a single user query into multiple queries.
2. **Retrieve Documents for Each Query**: Use embeddings and a vector database to retrieve relevant documents for each query.




In [None]:
!pip install openai
!pip install pinecone

In [None]:
import json

import numpy as np
import openai
from pinecone import Pinecone

In [None]:
client = openai.OpenAI(api_key="YOUR_KEY")
pinecone_client = Pinecone(api_key="YOUR_API_KEY")
index_name = "faq-embeddings"
# Connect to an existing index
index = pinecone_client.Index(index_name)

In [None]:
def embedding_model(query, openai_client, model="text-embedding-3-small"):
  # Getting the embedding
  response = openai_client.embeddings.create(
      model=model,
      input=query
  )

  embedding = response.data[0].embedding
  return embedding

In [None]:
def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

In [None]:
system_prompt = {
                    "role": "system",
                    "content": """
                    You are a helpful e-commerce assistant helping customers with their general questions regarding policies and procedures when buying in our store.
                    Our store sells e-books and courses for IT professionals.

                    Use the following context to answer the customer's question:
                    {context}
                    """,
                }

def prompt_builder(system_message, context):
  # Fill the {context} placeholder and return the prompt text (a string)
  return system_message["content"].format(context=context)

In [None]:
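def candidates_generation(query, openai_client, n_candidates=5):

  # Example prompt (an assumption; adapt the wording to your use case).
  # With response_format "json_object", the prompt must mention JSON explicitly.
  system_prompt = (
      f"Generate {n_candidates} alternative phrasings of the user's question. "
      'Return a JSON object with keys "query_1", "query_2", ... '
      "and the rephrased questions as values."
  )

  messages = [{"role": "system", "content": system_prompt},
              {"role": "user", "content": query}]

  response = openai_client.chat.completions.create(
      model="gpt-4o",
      messages=messages,
      max_tokens=1500,
      response_format={ "type": "json_object" }
    )

  # Parse the JSON string into a dict of {key: candidate query}
  response_content = response.choices[0].message.content
  return json.loads(response_content)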

In [None]:
# Function to find the most similar FAQ in Pinecone
def retrieve_faq(query_embedding, index, top_k=1):
    query_result = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    # Return the answer stored in the metadata of the best match
    return query_result['matches'][0]['metadata']['answer']

In [None]:
def combine_documents(retrieved_docs):
    return "\n\n".join(retrieved_docs)

In [None]:
def multi_query_rag_chatbot(query, openai_client, index):

    # Step 1: Get multi-representation
    candidates = candidates_generation(query, openai_client)

    # Step 2: Retrieve the most relevant FAQ from Pinecone (for each candidate)
    relevant_docs = []
    for key, candidate in candidates.items():
      candidate_embedding = embedding_model(candidate, openai_client)
      best_match = retrieve_faq(candidate_embedding, index)
      relevant_docs.append(best_match)

    # Step 3: Combine docs
    context = combine_documents(relevant_docs)

    # Step 4: Augment the query with context
    augmented_prompt = prompt_builder(system_prompt, context)

    messages = [{"role": "system","content": augmented_prompt},
                {"role": "user","content": query}]

    # Step 5: Use OpenAI to generate a response
    response = openai_client.chat.completions.create(
      model="gpt-4o",
      messages=messages,
      max_tokens=250
    )

    # Chat completions return the text under message.content
    return response.choices[0].message.content

In [None]:
# Example usage
while True:
  query = input()
  response = multi_query_rag_chatbot(query, client, index)
  print(f"User: {query}")
  print(f"Bot: {response}")


# 4. What is Fusion RAG?

### Introduction to Fusion RAG

Fusion RAG improves upon traditional RAG by fusing multiple sources of information, such as different document types or databases. It’s similar to Multi-Query RAG but instead of using multiple queries, it uses multiple **sources** of data to retrieve relevant information.

### How Does Fusion RAG Work?
1. **Multi-source Retrieval**: Information is retrieved from multiple sources (databases, websites, document types).
2. **Context Fusion**: The retrieved information is combined and fed into the model.
3. **Response Generation**: The model generates a response using the fused context.

Fusion RAG is especially useful when we want to combine structured and unstructured data or information from different domains.
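
As a rough illustration of combining structured and unstructured data, the sketch below retrieves from two toy sources and fuses the hits into one context string. Everything here is an assumption made for the example: the `FAQ_DOCS` and `PRODUCT_TABLE` data, the helper names, and the naive word-matching that stands in for real retrieval.

```python
# Toy multi-source retrieval and context fusion (hypothetical data and helpers).
FAQ_DOCS = [
    "Refunds are available within 30 days of purchase.",
    "Courses include lifetime access to all video lessons.",
]
PRODUCT_TABLE = [
    {"title": "Python for DevOps", "type": "course", "price_usd": 49},
    {"title": "Kubernetes Handbook", "type": "e-book", "price_usd": 19},
]

def search_faq(query):
    # Unstructured source: keep passages sharing at least one word with the query.
    words = set(query.lower().split())
    return [doc for doc in FAQ_DOCS if words & set(doc.lower().rstrip(".").split())]

def search_products(query):
    # Structured source: match rows whose product type appears in the query.
    return [
        f"{row['title']} ({row['type']}) costs ${row['price_usd']}"
        for row in PRODUCT_TABLE
        if row["type"] in query.lower()
    ]

def fuse_context(query):
    # Label each source so the model can tell where a passage came from.
    sections = []
    for name, search in [("FAQ", search_faq), ("Products", search_products)]:
        hits = search(query)
        if hits:
            sections.append(f"[{name}]\n" + "\n".join(hits))
    return "\n\n".join(sections)

fused = fuse_context("how much is a course and are refunds possible")
print(fused)
```

Labeling each section by source is one possible design choice; it lets the prompt distinguish policy text from catalog data.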


# 5. Exercise: Implement Fusion RAG

### Task:

In this exercise, you will implement a Fusion RAG system that retrieves documents in the same way as the Multi-Query RAG above, but now we rank each document with reciprocal rank fusion (RRF).

![](https://i.ibb.co/xs33BNz/Screenshot-2024-11-02-at-19-38-53.png)

**Steps:**
1. Create multiple candidates (4 candidates) for the original user query
2. Get the top 5 documents for each candidate + the original query
3. Run the reciprocal rank fusion algorithm to re-rank all results
4. Take the top 4 documents
5. Generate a result based on the newly created context
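
Steps 3 and 4 can be sketched on made-up data before touching the vector database. This is one possible formulation, not the reference solution for the exercise: `rrf_sketch` is a hypothetical name, the `doc_*` strings are placeholders, and ranks are assumed to start at 1 with the commonly used constant `k=60`.

```python
from collections import defaultdict

def rrf_sketch(result_lists, k=60, top_n=4):
    # result_lists: one ranked list of documents per query, best match first.
    scores = defaultdict(float)
    for ranked in result_lists:
        for rank, doc in enumerate(ranked, start=1):
            # Each appearance contributes 1 / (k + rank) to the document's score.
            scores[doc] += 1.0 / (k + rank)
    # Sort documents by accumulated score, highest first.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered[:top_n]

results = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_e"],
]
top_docs = rrf_sketch(results, top_n=3)
print(top_docs)  # ['doc_b', 'doc_a', 'doc_c']
```

`doc_b` wins because it appears near the top of all three lists, even though it is ranked first in only two of them.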



In [None]:
def retrieve_faq_top_n(query_embedding, index, top_k=5):
    # TODO: Implement Top N Retrieval function
    raise NotImplementedError("Implement top-N retrieval as part of the exercise")

![](https://weaviate.io/img/blog/2023-01-03-hybrid-search-explained/RRF-calculation.png)
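
Written out, the commonly used form of the RRF score is shown below. It assumes that $Q$ is the set of queries (the candidates plus the original query), that $\mathrm{rank}_q(d)$ is the 1-based position of document $d$ in the results for query $q$, and that $k$ is a smoothing constant. Documents that do not appear in a query's results contribute nothing for that query.

$$
\mathrm{RRF}(d) = \sum_{q \in Q} \frac{1}{k + \mathrm{rank}_q(d)}
$$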

In [None]:
def reciprocal_rank_fusion(results, k=60, top_n=5):
    # TODO: Implement RRF function
    raise NotImplementedError("Implement reciprocal rank fusion as part of the exercise")

In [None]:
def fusion_rag_chatbot(query, openai_client, index):

    # Step 1: Get multi-representation
    candidates = candidates_generation(query, openai_client, n_candidates=4)

    # Step 2: Retrieve the top 5 FAQs from Pinecone (for each candidate + the original query)
    # Assumes retrieve_faq_top_n returns a ranked list of document strings
    relevant_docs = []
    for candidate in [query, *candidates.values()]:
      candidate_embedding = embedding_model(candidate, openai_client)
      top_matches = retrieve_faq_top_n(candidate_embedding, index, top_k=5)
      relevant_docs.append(top_matches)

    # Step 3: Ranking (keep the top 4 documents after reciprocal rank fusion)
    ranked_docs = reciprocal_rank_fusion(relevant_docs, top_n=4)

    # Step 4: Combine docs
    context = combine_documents(ranked_docs)

    # Step 5: Augment the query with context
    augmented_prompt = prompt_builder(system_prompt, context)

    messages = [{"role": "system","content": augmented_prompt},
                {"role": "user","content": query}]

    # Step 6: Use OpenAI to generate a response
    response = openai_client.chat.completions.create(
      model="gpt-4o",
      messages=messages,
      max_tokens=250
    )

    # Chat completions return the text under message.content
    return response.choices[0].message.content

# 7. What is HyDE RAG?

### Introduction to HyDE RAG

HyDE (Hypothetical Document Embeddings) RAG takes a different approach from traditional RAG. Instead of embedding the user's query directly, HyDE first generates a **hypothetical** document that answers the query and then uses that document's embedding to retrieve the real documents most similar to it.

![](https://i.ibb.co/v3grFj3/Screenshot-2024-11-02-at-19-41-36.png)

### How Does HyDE RAG Work?
1. **Query Understanding**: The user query is transformed into a hypothetical document or context by the model.
2. **Document Retrieval**: The system retrieves documents that align with the generated hypothetical document.
3. **Response Generation**: The final response is generated based on the retrieved real documents and the hypothetical context.

This approach is useful when relevant documents might not exist in the knowledge base, but the hypothetical documents guide the retrieval process to related information.
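
The flow above can be illustrated with a toy example that needs no API calls. It is a sketch under loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model, `fake_hypothetical_answer` returns a hard-coded passage in place of an LLM call, and `DOCS` is made-up data.

```python
import math
import re
from collections import Counter

DOCS = [
    "Gradient descent updates network weights to reduce the loss.",
    "Refunds are available within 30 days of purchase.",
]

def toy_embed(text):
    # Stand-in for an embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def toy_cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def fake_hypothetical_answer(query):
    # Stand-in for the LLM: a plausible (not necessarily correct) answer.
    return "Neural networks learn by using gradient descent to adjust weights and reduce the loss."

def hyde_retrieve(query, docs):
    # Embed the hypothetical answer, not the query, and pick the closest real document.
    hypo_vec = toy_embed(fake_hypothetical_answer(query))
    return max(docs, key=lambda d: toy_cosine(hypo_vec, toy_embed(d)))

query = "How do neural networks learn?"
best_doc = hyde_retrieve(query, DOCS)
query_similarity = toy_cosine(toy_embed(query), toy_embed(DOCS[0]))
print(best_doc)
print(query_similarity)
```

In this toy setup the raw query shares no words with the relevant passage, so its similarity is zero, while the hypothetical answer overlaps with it heavily. Real embedding models are far less brittle, but the same idea applies: an answer-shaped text tends to sit closer to answer-shaped documents than a short question does.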


# 9. HyDE RAG Implementation (Part 1)

We will implement the first part of HyDE RAG, where the model generates hypothetical documents based on the user's query.



In [None]:
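# Step 1: Generate hypothetical document from query
def generate_hypothetical_document(query, openai_client):
    # Example prompt (an assumption; adapt the wording to your use case)
    prompt = "Write a short passage that plausibly answers the user's question."
    messages = [{"role": "system", "content": prompt},
                {"role": "user", "content": query}]

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=1500,
    )

    # Chat completions return the text under message.content
    return response.choices[0].message.content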

In [None]:
# Example usage
query = "How do neural networks learn?"
hypothetical_doc = generate_hypothetical_document(query, client)
print("Hypothetical Document:", hypothetical_doc)



# 10. HyDE RAG Implementation (Part 2)

In this part, we will retrieve documents based on the hypothetical context and generate the final response.



In [None]:
def hypo_chatbot(query, openai_client, index):

    # Step 1: Generate a hypothetical document for the query
    hypo_candidate = generate_hypothetical_document(query, openai_client)

    # Step 2: Embed the hypothetical document
    candidate_embedding = embedding_model(hypo_candidate, openai_client)

    # Step 3: Retrieve the real answer closest to the hypothetical document
    best_match = retrieve_faq(candidate_embedding, index)

    # Step 4: Augment the query with the retrieved context
    augmented_prompt = prompt_builder(system_prompt, best_match)

    messages = [{"role": "system","content": augmented_prompt},
                {"role": "user","content": query}]

    # Step 5: Use OpenAI to generate a response
    response = openai_client.chat.completions.create(
      model="gpt-4o",
      messages=messages,
      max_tokens=250
    )

    # Chat completions return the text under message.content
    return response.choices[0].message.content

In [None]:
# Example usage
response = hypo_chatbot("How do neural networks learn?", client, index)
print("Generated Response:", response)