## Retrieval Strategies: Choosing the Right Approach

Once we have embeddings and a vector store, the next step is **retrieval** — deciding how to fetch the most relevant information for a query. There are multiple retrieval strategies, each with its strengths and weaknesses. Choosing the right one depends on your data, the type of queries, and the application.  

---

### 1. Dense Retrieval

**How it works:**  
Dense retrieval uses **embeddings** to represent text and queries as vectors in a high-dimensional space. Similarity metrics like cosine similarity or Euclidean distance are used to find the closest matches.

**Pros:**  
- Captures **semantic meaning** beyond exact keywords.  
- Works well with **paraphrased queries** or loosely related text.  
- Ideal for **knowledge retrieval** where context matters.  

**Cons:**  
- Embedding generation can be **computationally expensive**.  
- May require **GPU** for large datasets.  
- Can sometimes miss **rare keyword-specific matches**.  

**Use cases:**  
- Retrieving recipe steps based on intent, e.g., “quick chocolate dessert.”  
- Question-answering over unstructured text.  
- Semantic search where users don’t know exact phrasing.  

---

### 2. Sparse Retrieval (Keyword-Based)

**How it works:**  
Sparse retrieval uses traditional text-based methods like **BM25** or **TF-IDF**, which score documents based on keyword matches between the query and the text.

**Pros:**  
- Very **fast and lightweight**, can run on CPU.  
- Effective for queries with **exact keywords**.  
- Works well for **small to medium datasets**.  

**Cons:**  
- Does not understand **semantics or context**.  
- Cannot match paraphrased queries well.  

**Use cases:**  
- Searching for documents with specific ingredients, e.g., “vanilla extract.”  
- Indexing structured documents where keywords are precise and consistent.  
- Filtering large datasets quickly before semantic processing.  

---

### 3. Hybrid Retrieval

**How it works:**  
Hybrid retrieval combines **dense embeddings** and **sparse keyword matching**. Results from both methods are merged, often with weighting, to produce a ranked list of relevant documents.

**Pros:**  
- Balances **semantic understanding** with **exact matches**.  
- Reduces chances of missing critical keyword-specific information.  
- Can improve overall **retrieval accuracy**.  

**Cons:**  
- Slightly more complex to implement.  
- Requires tuning of weights to balance dense vs sparse scores.  

**Use cases:**  
- Large recipe PDFs where exact ingredient names are important but context also matters.  
- QA systems where you need **precision + semantic recall**.  
- Enterprise knowledge bases with mixed structured and unstructured content.  

---

### Choosing the Right Strategy

| Strategy        | Best for                                                   | Notes                                         |
|-----------------|------------------------------------------------------------|-----------------------------------------------|
| Dense           | Semantic search, paraphrased queries, context-rich data    | Slower, may need GPU, good for embeddings    |
| Sparse          | Exact keyword match, structured data, fast retrieval      | Lightweight, but misses semantic matches     |
| Hybrid          | Mixed requirements, both keywords and semantic meaning    | Offers best of both worlds, slightly complex |

---

**Key takeaway:**  
- If your queries are **flexible and intent-driven**, use **dense retrieval**.  
- If you need **exact matches or lightweight search**, use **sparse retrieval**.  
- If you want **high accuracy across both semantics and keywords**, go **hybrid** — this is often the optimal choice for real-world applications like recipe PDFs or enterprise knowledge bases.


In [5]:
# imports and setup
import sys
import os
import json
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceBgeEmbeddings
import warnings
from langchain.schema import Document
warnings.filterwarnings('ignore')

project_root = os.path.abspath(os.path.join("..", ".."))
sys.path.append(project_root)

MODEL_NAME = "intfloat/e5-base-v2"
vec_store_path = os.path.join(project_root, "data", "vector_store", "faiss_index")
# load the embedding model  
embeddings = HuggingFaceBgeEmbeddings(model_name=MODEL_NAME, encode_kwargs={"normalize_embeddings": True})

vectorstore = FAISS.load_local(vec_store_path, embeddings, allow_dangerous_deserialization=True)

# dense retrieval
def dense_search(query, k=3):
    results = vectorstore.similarity_search_with_score(query, k=k)
    return results

In [6]:
# Example search
query = "Give me cake recipe with chocolate and frosting"
results_dense = dense_search(query)
for doc, score in results_dense:
    print(f"Score: {score:.4f}\nContent: {doc.page_content}\n")

Score: 0.3450
Content:  CHOCOLATE CAKE WITH CHOCOLATE BUTTERCREAM FROSTING Directions Step 1 Preheat oven to 350º F. Prepare two 9-inch cake pans by spraying with baking spray or buttering and lightly fl ouring. Step 2 Add fl our, sugar, cocoa, baking powder, baking soda, salt and espresso powder to a large bowl or the bowl of a stand mixer. Whisk through fl our mixture until combined well. Step 3 Add milk, vegetable oil, eggs, and vanilla to fl our mixture and mix together on medium speed until well combined. Reduce speed and carefully add boiling water to the cake batter until well combined. Step 4 Distribute cake batter evenly between the two prepared cake pans. Bake for 30-35 minutes, until a toothpick or cake tester inserted in the center of the chocolate cake comes out clean. Step 5 Remove from the oven and allow to cool for about 10 minutes, removing from the pan and cooling completely. PREP:  55  MINS COOK TIME:  40  MINS TOTAL TIME:  1:35  MINS
Ingredients:1 teaspoon (1 gram) 

### Notes on Dense Retrieval Scores

- **Score meaning**: The `score` returned by `similarity_search_with_score` represents a **distance or similarity metric**, depending on your vectorstore implementation.  
- **FAISS**: Typically returns **distance**, so **lower values indicate more similar** results.  
- **Cosine similarity**: Some implementations return **cosine similarity**, where **higher values indicate more similar** results.  
- **Adjusting results**: You can modify the parameter `k` to retrieve more or fewer results.  

Using scores along with the retrieved chunks allows you to **rank or filter results** based on relevance, providing more control over your retrieval pipeline.


In [3]:
# Example sparse retrieval using BM25
import nltk
from rank_bm25 import BM25Okapi
from nltk.tokenize import word_tokenize
warnings.filterwarnings('ignore')
# Download tokenizer if not already
nltk.download('punkt', quiet=True)

def sparse_search(query, k=3):

    query_tokens = word_tokenize(query.lower())
    scores = bm25.get_scores(query_tokens)
    
    # Get top k indices
    top_k_indices = scores.argsort()[-k:][::-1]
    
    results = [(chunks[i], scores[i]) for i in top_k_indices]
    return results
# Get all keys in the docstore
keys = vectorstore.docstore._dict.keys()  # _dict is internal dict

# Extract chunks
all_docs = vectorstore.docstore._dict.values()  # all loaded FAISS docs
corpus = [doc.page_content.split() for doc in all_docs]
bm25 = BM25Okapi(corpus)
chunks = list(all_docs)

query = "recipe of banana cake"
results_sparse = sparse_search(query, k=3)

for text, score in results_sparse:
    print(f"Score: {score:.2f}\nContent: {text}\n")

Score: 4.60
Content: page_content=' BANANA CAKE Directions Step 1 Heat oven to 180C/160C fan/gas 4. Step 2 Butter your tin and line the base and sides with baking parchment. Step 3 Mix the butter and sugar until light and fluffy, then slowly add the eggs with a little flour. Fold in the remaining flour, baking powder and bananas. Step 4 Pour into the tin and bake for about 30 mins until a skewer comes out clean. Cool in the tin for 10 mins, then remove to a wire rack. Step 5 Optional: Mix the icing sugar with 2-3 tsp water to make a runny icing. Drizzle the icing across the top of the cake and decorate with banana chips. PREP:  15  MINS COOK TIME:  40  MINS CUT INTO:  8-10  SLICES
Ingredients:1 tsp, 2 very ripe, mashed, 50g icing sugar, handful dried chips, for decoration, Ingredients:, 140g , softened, plus extra, for the tin, 140g caster sugar, 2 large eggs, beaten, 140g flour,' metadata={'page_num': 35, 'name': ' BANANA CAKE'}

Score: 2.14
Content: page_content=' ZEBRA MARBLE CAKE D

In [4]:
def hybrid_search(dense_results, sparse_results, alpha=0.7, top_k=5):
    # Normalize dense scores to 0-1
    dense_scores = [score for _, score in dense_results]
    max_dense, min_dense = max(dense_scores), min(dense_scores)
    dense_norm = {doc.page_content: (score - min_dense)/(max_dense - min_dense + 1e-8)
              for doc, score in dense_results}
    
    # Normalize sparse scores to 0-1
    sparse_scores = [score for _, score in sparse_results]
    max_sparse, min_sparse = max(sparse_scores), min(sparse_scores)
    sparse_norm = {doc.page_content: (score - min_sparse)/(max_sparse - min_sparse + 1e-8)
               for doc, score in sparse_results}
    
    # Combine scores
    combined_scores = {}
    texts = set(list(dense_norm.keys()) + list(sparse_norm.keys()))
    for text in texts:
        d_score = dense_norm.get(text, 0)
        s_score = sparse_norm.get(text, 0)
        combined_scores[text] = alpha * d_score + (1 - alpha) * s_score
    print(combined_scores)
    # Sort and return top_k
    sorted_combined = sorted(combined_scores.items(), key=lambda x: x[1], reverse=True)
    return sorted_combined[:top_k]

query = "Give me cake recipe with chocolate and frosting"
results_dense = dense_search(query,k=3)
results_sparse = sparse_search(query, k=3)
results = hybrid_search(results_dense, results_sparse, alpha=0.7, top_k=3)

for text, score in results:
    print(f"Combined Score: {score:.4f}\nContent: {text}\n")


{' CHOCOLATE FUDGE CAKE WITH CHOCOLATE CURL Directions Step 1 Preheat oven to 160°/140°C . Grease a 6cm-deep, 22cm round cake pan. Line base and side with baking paper, allowing a 2cm overhang. Step 2 Place butter, cold water, milk, chocolate and sugar in a saucepan over medium heat. Cook, stirring occasionally, for 8 to 10 minutes or until smooth and combined. Set aside for 20 minutes to cool. Step 3 Mix fl our and cocoa into chocolate mixture. Whisk in eggs until smooth and combined. Spoon mixture into prepared pan. Step 4 Bake for 1 hour 40 minutes or until a skewer inserted in centre of cake comes out clean. Cool cake in pan. Step 5 Make ganache: Place chocolate and cream in a microwave-safe bowl. Microwave on medium-high (75%) for 1 to 2 minutes, stirring with a metal spoon every 30 seconds, or until smooth. Refrigerate for 15 minutes or until thickened. Step 6 Using a serrated knife, cut cake into 3 layers. Place base on a plate. Spread with 1/4 ganache. Top with layer of cake, t