### RAG Pipeline for Question Answering

A Retrieval-Augmented Generation (RAG) pipeline for Q&A combines information retrieval with a large language model. First, the user's question is converted into an embedding (dense vector) or a set of keywords (sparse representation). Next, the most relevant documents or passages are retrieved from a knowledge base by similarity search. The retrieved chunks are then passed, together with the question, to the language model, which generates a context-aware answer. Grounding the response in external knowledge tends to improve accuracy and reliability compared with relying on the model alone.
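
To make the dense versus sparse distinction concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the `cosine` and `keyword_overlap` helpers are invented for illustration; a real pipeline would use an embedding model for the dense side and a scorer such as BM25 for the sparse side.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def keyword_overlap(query, text):
    """Sparse-style score: number of distinct query words that appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

# Hypothetical toy embeddings; real ones have hundreds of dimensions.
doc_vectors = {
    "banana cake recipe": [0.9, 0.1, 0.0],
    "apple almond cake": [0.6, 0.5, 0.1],
    "bicycle repair guide": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.2, 0.0]

# Rank the same documents with each representation.
dense_ranking = sorted(
    doc_vectors, key=lambda d: cosine(query_vector, doc_vectors[d]), reverse=True
)
sparse_ranking = sorted(
    doc_vectors, key=lambda d: keyword_overlap("banana cake", d), reverse=True
)
print(dense_ranking[0], "|", sparse_ranking[0])
```

Both rankings agree on this toy data. In practice they can differ, because dense vectors capture meaning while sparse scores depend on exact word matches.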

Basic RAG pipeline:

![RAG pipeline](../../data/images/basic_rag_pipeline.PNG)

### Steps for a Simple Q&A RAG Pipeline

1. **Input the Question**  
   - Take the user’s query as input (e.g., "What is sparse retrieval?").

2. **Embed the Question**  
   - Convert the query into a vector representation using an embedding model.

3. **Retrieve Relevant Documents**  
   - Search the vector store (or a BM25 index, for sparse retrieval) to fetch the most relevant document chunks.

4. **Prepare the Context**  
   - Collect and format the retrieved text passages to provide grounding for the model.

5. **Generate the Answer**  
   - Pass the user’s question along with the retrieved context to a language model.

6. **Return Final Response**  
   - The model outputs a concise and context-aware answer to the user’s query.
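
The six steps above can be sketched end to end in plain Python. This is a toy illustration, not the notebook's implementation: `embed` uses bag-of-words counts in place of a learned embedding, and `generate` is a placeholder callable standing in for the language model. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Step 2 (toy stand-in): bag-of-words counts instead of a learned embedding."""
    return Counter(re.findall(r"\w+", text.lower()))

def similarity(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Step 3: return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:k]

def build_prompt(question, passages):
    """Step 4: format the retrieved passages as grounding context."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question based on the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer_question(question, chunks, generate, k=2):
    """Steps 1-6: question in, answer out. `generate` maps a prompt string to text."""
    passages = retrieve(question, chunks, k=k)
    return generate(build_prompt(question, passages))

def echo_top_passage(prompt):
    """Placeholder generator: returns the first passage; swap in a real LLM call."""
    return prompt.split("[1] ")[1].split("\n")[0]

chunks = [
    "Sparse retrieval ranks documents by keyword overlap, for example with BM25.",
    "Dense retrieval compares embedding vectors of the query and the documents.",
    "Bananas should be very ripe before they are mashed into cake batter.",
]
print(answer_question("What is sparse retrieval?", chunks, echo_top_passage))
```

Passing `generate` as an argument keeps retrieval and prompt construction testable without a model running.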

### Prerequisites

Set up Ollama locally:

1. Download Ollama from https://ollama.com/
2. On the command line:
   a. Run `ollama` to check that it is installed correctly.

![Ollama setup 1](../../data/images/ollama_setup1.PNG)

   b. Run `ollama pull <model you want to pull>` to download a model (the code below uses `llama3`).

![Ollama setup 2](../../data/images/ollama_setup2.PNG)
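
The same installation check can be done from Python before running the rest of the notebook. This is a small sketch using only the standard library; `ollama_available` is a hypothetical helper name, and `ollama list` is the CLI subcommand that prints the models already pulled.

```python
import shutil
import subprocess

def ollama_available():
    """Return True if an `ollama` executable is found on PATH."""
    return shutil.which("ollama") is not None

if ollama_available():
    # `ollama list` prints the locally available models; if the server is not
    # running, the error message ends up in stderr instead.
    listing = subprocess.run(["ollama", "list"], capture_output=True, text=True)
    print(listing.stdout or listing.stderr)
else:
    print("ollama not found on PATH; install it from https://ollama.com/")
```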






In [9]:
# imports and setup
import os
import sys
import warnings

from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceBgeEmbeddings

warnings.filterwarnings('ignore')

# make the project root importable and use it to locate the data folder
project_root = os.path.abspath(os.path.join("..", ".."))
sys.path.append(project_root)

topk = 3  # number of chunks to retrieve
MODEL_NAME = "intfloat/e5-base-v2"
vec_store_path = os.path.join(project_root, "data", "vector_store", "faiss_index")

# load the embedding model; it must match the model and wrapper used to build the index
embeddings = HuggingFaceBgeEmbeddings(model_name=MODEL_NAME, encode_kwargs={"normalize_embeddings": True})

# load the saved FAISS index; allow_dangerous_deserialization unpickles the docstore,
# so only load index files that you created or trust
vectorstore = FAISS.load_local(vec_store_path, embeddings, allow_dangerous_deserialization=True)

# dense retrieval: returns a list of (Document, score) pairs
def dense_search(query, k=3):
    return vectorstore.similarity_search_with_score(query, k=k)


query = "recipe of banana cake and all the ingredients required"
results_dense = dense_search(query, k=topk)

# join the retrieved chunks into a single context string
context = "\n".join([doc.page_content for doc, _ in results_dense])
print(f"Context: {context}")

prompt = f"Answer the question based on the context below. Format the response in clear steps and tabular format\n\nContext: {context}\n\nQuestion: {query}\nAnswer:"

Context:  BANANA CAKE Directions Step 1 Heat oven to 180C/160C fan/gas 4. Step 2 Butter your tin and line the base and sides with baking parchment. Step 3 Mix the butter and sugar until light and fluffy, then slowly add the eggs with a little flour. Fold in the remaining flour, baking powder and bananas. Step 4 Pour into the tin and bake for about 30 mins until a skewer comes out clean. Cool in the tin for 10 mins, then remove to a wire rack. Step 5 Optional: Mix the icing sugar with 2-3 tsp water to make a runny icing. Drizzle the icing across the top of the cake and decorate with banana chips. PREP:  15  MINS COOK TIME:  40  MINS CUT INTO:  8-10  SLICES
Ingredients:1 tsp, 2 very ripe, mashed, 50g icing sugar, handful dried chips, for decoration, Ingredients:, 140g , softened, plus extra, for the tin, 140g caster sugar, 2 large eggs, beaten, 140g flour,
 APPLE AND ALMOND DESSERT CAKE Directions Step 1 Preheat the oven to 170 C . Brush around the base of your tin with melted butter to 
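
The retrieval code above keeps only the text of each hit and discards the score returned by `similarity_search_with_score`. As a hedged aside: assuming the index uses FAISS's default L2 metric (which reports squared Euclidean distance, so lower is better) and unit-normalized embeddings, as `normalize_embeddings=True` suggests, a score `d` corresponds to a cosine similarity of `1 - d/2`. The helper names and sample vectors below are illustrative only.

```python
import math

def l2sq_to_cosine(dist_sq):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - dist_sq / 2.0

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

# Compute both quantities directly and compare them.
dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
cosine = sum(x * y for x, y in zip(a, b))
print(round(l2sq_to_cosine(dist_sq), 4), round(cosine, 4))
```

If the index was built with a different metric (for example inner product), this conversion does not apply and higher scores may mean better matches.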

In [10]:
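from langchain.llms import Ollama

CHAT_MODEL = "llama3"
# sampling options are set on the wrapper; num_predict is Ollama's option
# for the maximum number of tokens to generate
llm = Ollama(model=CHAT_MODEL, temperature=0.02, top_p=0.95, num_predict=1000)
answer = llm.invoke(prompt)

print(f"Answer: {answer}")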

Answer: Here is the answer in a clear step-by-step format with a table:

**Banana Cake Recipe**

**Step-by-Step Instructions:**

1. Heat oven to 180C/160C fan/gas 4.
2. Butter your tin and line the base and sides with baking parchment.
3. Mix the butter and sugar until light and fluffy, then slowly add the eggs with a little flour. Fold in the remaining flour, baking powder, and bananas.
4. Pour into the tin and bake for about 30 minutes until a skewer comes out clean. Cool in the tin for 10 minutes, then remove to a wire rack.
5. Optional: Mix the icing sugar with 2-3 tsp water to make a runny icing. Drizzle the icing across the top of the cake and decorate with banana chips.

**Ingredients:**

| Ingredient | Quantity |
| --- | --- |
| Butter | 140g, softened (plus extra for greasing) |
| Sugar | 140g caster sugar |
| Eggs | 2 large eggs, beaten |
| Flour | 140g flour |
| Bananas | 2 very ripe, mashed |
| Icing sugar | 50g icing sugar |
| Banana chips | Handful, for decoration |

**Pr