# Search Enhancements - Interactive Tutorial
In this notebook, you'll learn about advanced search features, focusing on auto-suggest and faceted search.

## 📚 Learning Objectives

By the end, you will:
- Implement auto-suggest functionality
- Implement faceted search
- Learn about search ranking improvements
- Understand search analytics and metrics

## 🔧 Prerequisites

Make sure the following are installed:
- Python 3.11+
- Elasticsearch or OpenSearch (optional; for production full-text search, not used in the examples below)
- pandas (imported in the setup cell)

## 📦 Setup

Let's start by importing the necessary libraries.

In [1]:
# Import required libraries
import re
from typing import List, Dict, Optional
from collections import Counter
import pandas as pd

# Print setup confirmation
print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


## 1. Auto-Suggest

### 1.1 Trie-based Auto-Suggest

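Implement auto-suggest using a trie (prefix tree). Each node represents one character, so all words that share a prefix live under the same node. Suggesting completions means walking down to the node for the typed prefix and collecting every word stored below it.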

In [2]:
class TrieNode:
    """Node in a trie; the root node also serves as the trie itself."""
    
    def __init__(self):
        self.children: Dict[str, 'TrieNode'] = {}
        self.is_end: bool = False  # True if a stored word ends at this node
    
    def insert(self, word: str) -> None:
        """Insert word into the trie, creating nodes as needed."""
        node = self
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]  # descend one level per character
        node.is_end = True
        
    def search_prefix(self, prefix: str, max_results: int = 10) -> List[str]:
        """Return up to max_results stored words that start with prefix, in alphabetical order."""
        node = self
        # Walk down to the node that represents the prefix
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        # Depth-first search below the prefix node collects the completions
        results: List[str] = []
        stack = [(node, prefix)]
        while stack and len(results) < max_results:
            current, path = stack.pop()
            if current.is_end:
                results.append(path)
            # Push children in reverse order so they are popped alphabetically
            for char in sorted(current.children, reverse=True):
                stack.append((current.children[char], path + char))
        return results

# Test trie
trie = TrieNode()
words = ["rag", "retrieval", "augmented", "generation", "documents", "search", "embedding", "vector", "query"]

for word in words:
    trie.insert(word)

print("✅ Trie built and populated!")
print(f"   Total words: {len(words)}")

# Test auto-suggest
prefix = "retri"
suggestions = trie.search_prefix(prefix)
print(f"\nSuggestions for '{prefix}':")
for i, suggestion in enumerate(suggestions, 1):
    print(f"  {i}. {suggestion}")

✅ Trie built and populated!
   Total words: 9

Suggestions for 'retri':
  1. retrieval
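A natural extension of trie-based auto-suggest, and one of the ranking improvements mentioned in the objectives, is ordering completions by a popularity signal such as query frequency instead of alphabetically. The sketch below assumes each word comes with a hypothetical count (for example, from a query log); `RankedTrie` and `suggest` are illustrative names, not part of the notebook.

```python
import heapq
from typing import Dict, List


class RankedTrie:
    """Trie whose word-ending nodes store a popularity count (hypothetical ranking signal)."""

    def __init__(self) -> None:
        self.children: Dict[str, "RankedTrie"] = {}
        self.count: int = 0  # 0 means no stored word ends at this node

    def insert(self, word: str, count: int = 1) -> None:
        """Insert word, adding count to its popularity."""
        node = self
        for char in word:
            node = node.children.setdefault(char, RankedTrie())
        node.count += count

    def suggest(self, prefix: str, k: int = 5) -> List[str]:
        """Return the k most popular words starting with prefix (ties broken alphabetically)."""
        node = self
        for char in prefix:
            if char not in node.children:
                return []
            node = node.children[char]
        # Collect every completion below the prefix node together with its count
        candidates = []
        stack = [(node, prefix)]
        while stack:
            current, path = stack.pop()
            if current.count > 0:
                candidates.append((current.count, path))
            for char, child in current.children.items():
                stack.append((child, path + char))
        # Highest count first; equal counts fall back to alphabetical order
        best = heapq.nsmallest(k, candidates, key=lambda item: (-item[0], item[1]))
        return [word for _, word in best]


trie = RankedTrie()
# Hypothetical popularity counts, for illustration only
for word, count in [("retrieval", 40), ("retriever", 12), ("rerank", 25), ("rag", 90)]:
    trie.insert(word, count)

print(trie.suggest("re"))   # ['retrieval', 'rerank', 'retriever']
print(trie.suggest("ret"))  # ['retrieval', 'retriever']
```

`heapq.nsmallest` with a negated count avoids sorting every candidate when only the top `k` are needed; for very large tries, production systems often cache the top completions at each node instead of searching the whole subtree per keystroke.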


## 2. Faceted Search

### 2.1 Implementing Facets

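Facets are aggregated counts of documents per field value (for example, how many documents have status `indexed`). Shown next to the search results, they let users narrow the results by selecting a category.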

In [3]:
# Mock documents for faceting
MOCK_DOCS = [
    {"id": "doc1", "filename": "paper.pdf", "status": "indexed", "content_type": "application/pdf", "size_bytes": 1024},
    {"id": "doc2", "filename": "data.xlsx", "status": "indexed", "content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", "size_bytes": 5120},
    {"id": "doc3", "filename": "presentation.pptx", "status": "created", "content_type": "application/vnd.openxmlformats-officedocument.presentationml.presentation", "size_bytes": 2048},
    {"id": "doc4", "filename": "report.pdf", "status": "indexed", "content_type": "application/pdf", "size_bytes": 3072},
]

def calculate_facets(documents: List[Dict]) -> Dict:
    """Calculate facets for documents."""
    facets = {
        "status": Counter(),
        "content_type": Counter(),
        "size_ranges": {
            "small": 0,
            "medium": 0,
            "large": 0,
        },
    }
    
    for doc in documents:
        facets["status"][doc["status"]] += 1
        facets["content_type"][doc["content_type"]] += 1
        
        # Size buckets: small < 1 KiB, medium 1 KiB to < 5 KiB, large >= 5 KiB
        size = doc["size_bytes"]
        if size < 1024:
            facets["size_ranges"]["small"] += 1
        elif size < 5120:
            facets["size_ranges"]["medium"] += 1
        else:
            facets["size_ranges"]["large"] += 1
    
    return facets

# Calculate facets
facets = calculate_facets(MOCK_DOCS)

print("📊 Search Facets:")
print("\nBy Status:")
for status, count in facets["status"].items():
    print(f"  {status}: {count}")

print("\nBy Content Type:")
for ct, count in facets["content_type"].items():
    print(f"  {ct}: {count}")

print("\nBy Size Range:")
for size_range, count in facets["size_ranges"].items():
    print(f"  {size_range}: {count}")

📊 Search Facets:

By Status:
  indexed: 3
  created: 1

By Content Type:
  application/pdf: 2
  application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: 1
  application/vnd.openxmlformats-officedocument.presentationml.presentation: 1

By Size Range:
  small: 0
  medium: 3
  large: 1

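With Elasticsearch or OpenSearch, the same facets are usually computed server-side with aggregations rather than in Python. The sketch below builds a request body that mirrors `calculate_facets`. It assumes an index where `status` and `content_type` are mapped as `keyword` fields, `size_bytes` is numeric, and full-text content lives in a hypothetical `content` field; these mappings are assumptions, not part of the notebook. In a `range` aggregation, `from` is inclusive and `to` is exclusive, which matches the thresholds above.

```python
import json
from typing import Any, Dict


def build_facet_query(query: str, size: int = 20) -> Dict[str, Any]:
    """Build a search body returning hits plus facet aggregations.

    Field names (`content`, `status`, `content_type`, `size_bytes`) are assumed mappings.
    """
    return {
        "size": size,  # number of hits to return
        "query": {"match": {"content": query}} if query else {"match_all": {}},
        "aggs": {
            # One bucket per distinct value, like the Counter() facets
            "status": {"terms": {"field": "status"}},
            "content_type": {"terms": {"field": "content_type"}},
            # Same small/medium/large thresholds as calculate_facets
            "size_ranges": {
                "range": {
                    "field": "size_bytes",
                    "ranges": [
                        {"key": "small", "to": 1024},
                        {"key": "medium", "from": 1024, "to": 5120},
                        {"key": "large", "from": 5120},
                    ],
                }
            },
        },
    }


print(json.dumps(build_facet_query("report"), indent=2))
```

The body can be sent to the `_search` endpoint of the index; the bucket counts then come back under `aggregations` in the response, alongside the hits.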

## 3. Practice Exercise

### Task: Implement search with faceting

Create a function that:
1. Accepts query text
2. Performs search (mock)
3. Filters by status (optional)
4. Filters by content_type (optional)
5. Filters by size_range (optional)
6. Calculates and returns facets
7. Returns paginated results

Try implementing this function yourself; a reference implementation follows.

In [4]:
# Reference implementation: search with faceting

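def search_with_facets(
    query: str,
    filters: Optional[Dict] = None,
    limit: int = 20,
    offset: int = 0,
) -> Dict:
    """
    Search the mock documents and compute facets over the filtered matches.

    Args:
        query: Search query (case-insensitive substring match on filename or
            content_type; an empty query matches all documents)
        filters: Optional exact-match filters with keys 'status', 'content_type',
            and 'size_range' ('small', 'medium', or 'large'); all given filters must match
        limit: Maximum number of results to return
        offset: Number of matching results to skip (for pagination)

    Returns:
        Dict with 'results' (the requested page), 'facets' (counts over all
        filtered matches), and 'total' (number of filtered matches)
    """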
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if offset < 0:
        raise ValueError("offset must be non-negative")

    query_norm = (query or "").strip().lower()
    filters = filters or {}

    # Mock search: match query against filename or content_type
    matched = []
    for doc in MOCK_DOCS:
        if not query_norm:
            matched.append(doc)
            continue
        if query_norm in doc["filename"].lower() or query_norm in doc["content_type"].lower():
            matched.append(doc)

    # Apply filters
    status_filter = filters.get("status")
    if status_filter:
        matched = [doc for doc in matched if doc["status"] == status_filter]

    content_type_filter = filters.get("content_type")
    if content_type_filter:
        matched = [doc for doc in matched if doc["content_type"] == content_type_filter]

    size_range_filter = filters.get("size_range")
    if size_range_filter:
        def in_size_range(doc: Dict) -> bool:
            size = doc["size_bytes"]
            if size_range_filter == "small":
                return size < 1024
            if size_range_filter == "medium":
                return 1024 <= size < 5120
            if size_range_filter == "large":
                return size >= 5120
            return True  # unrecognized size_range values do not filter anything

        matched = [doc for doc in matched if in_size_range(doc)]

    total = len(matched)
    results = matched[offset:offset + limit]

    return {
        "results": results,
        "facets": calculate_facets(matched),
        "total": total,
    }

print("✅ search_with_facets defined successfully!")

✅ search_with_facets defined successfully!
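`search_with_facets` computes facets over the already-filtered matches, so once a `status` filter is applied, the status facet shows only that one value. A common refinement, often called disjunctive (multi-select) faceting, computes each facet's counts using every filter except that facet's own, so users still see the alternatives they could switch to. A self-contained sketch; the `disjunctive_facets` helper and its exact-match filters are illustrative assumptions, using a subset of the notebook's mock documents.

```python
from collections import Counter
from typing import Dict, List


def disjunctive_facets(docs: List[Dict], filters: Dict[str, str], fields: List[str]) -> Dict[str, Counter]:
    """For each facet field, count values among docs that match all *other* filters."""
    facets: Dict[str, Counter] = {}
    for field in fields:
        # Ignore this field's own filter so its alternative values stay visible
        other_filters = {k: v for k, v in filters.items() if k != field}
        matching = [
            doc for doc in docs
            if all(doc.get(k) == v for k, v in other_filters.items())
        ]
        facets[field] = Counter(doc[field] for doc in matching)
    return facets


docs = [
    {"id": "doc1", "status": "indexed", "content_type": "application/pdf"},
    {"id": "doc3", "status": "created",
     "content_type": "application/vnd.openxmlformats-officedocument.presentationml.presentation"},
    {"id": "doc4", "status": "indexed", "content_type": "application/pdf"},
]
facets = disjunctive_facets(docs, {"status": "indexed"}, ["status", "content_type"])
print(facets["status"])        # Counter({'indexed': 2, 'created': 1})
print(facets["content_type"])  # Counter({'application/pdf': 2})
```

The status facet still reports `created: 1` even though the results are restricted to `indexed`, while the content-type facet reflects the active status filter.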


## 4. Summary

In this notebook, you learned:

1. **Auto-Suggest** - Trie-based prefix search
2. **Faceted Search** - Aggregated counts for filtering
3. **Facet Categories** - Status, content_type, size_ranges

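### 🎯 Key Takeaways

- Use a trie for prefix-based auto-suggest
- Calculate facets from the search results
- Support multiple facet filters: `search_with_facets` ANDs filters across facets, and OR-ing several values within one facet is a common extension
- Return facets together with the search results
- Use `collections.Counter` to tally facet values in a single pass over the documents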

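Supporting multiple facet filters with AND/OR logic usually means that values selected within one facet are ORed (status is `indexed` or `created`), while different facets are ANDed. Since `search_with_facets` accepts one value per facet, here is a sketch of the multi-value variant; the `matches_filters` name and the list-valued filter format are assumptions for illustration.

```python
from typing import Dict, List


def matches_filters(doc: Dict, filters: Dict[str, List[str]]) -> bool:
    """True if doc satisfies every facet (AND), where a facet accepts any listed value (OR)."""
    return all(
        not allowed or doc.get(field) in allowed  # an empty list places no constraint
        for field, allowed in filters.items()
    )


doc = {"status": "created", "content_type": "application/pdf"}
print(matches_filters(doc, {"status": ["indexed", "created"]}))                       # True: OR within status
print(matches_filters(doc, {"status": ["created"], "content_type": ["text/plain"]}))  # False: AND across facets
print(matches_filters(doc, {}))                                                        # True: no filters
```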
### 🚀 Next Steps

1. Replace the mock search with a production search backend that supports faceting
2. Add a faceted search API endpoint
3. Test with real data
4. Add search analytics tracking
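For analytics tracking, common starting metrics are the zero-result rate, the click-through rate (here, the fraction of searches with at least one click), and the most frequent queries. The sketch below assumes a hypothetical query-log schema (`QueryLogEntry`) recording the query, the number of results, and whether the user clicked a result; a real system would typically also log timestamps and clicked positions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryLogEntry:
    """One logged search (hypothetical schema for illustration)."""
    query: str
    result_count: int
    clicked: bool


def search_metrics(log: List[QueryLogEntry], top_n: int = 3) -> Dict:
    """Compute basic search-quality metrics from a query log."""
    total = len(log)
    if total == 0:
        return {"total_queries": 0, "zero_result_rate": 0.0,
                "click_through_rate": 0.0, "top_queries": []}
    zero_results = sum(1 for entry in log if entry.result_count == 0)
    clicks = sum(1 for entry in log if entry.clicked)
    # Normalize case and whitespace so "RAG " and "rag" count as the same query
    top = Counter(entry.query.strip().lower() for entry in log).most_common(top_n)
    return {
        "total_queries": total,
        "zero_result_rate": zero_results / total,
        "click_through_rate": clicks / total,
        "top_queries": top,
    }


# Hypothetical log entries, for illustration only
log = [
    QueryLogEntry("rag", 12, True),
    QueryLogEntry("RAG ", 12, False),
    QueryLogEntry("vector db", 0, False),
    QueryLogEntry("embedding", 5, True),
]
print(search_metrics(log))
```

Queries with zero results are often the most actionable signal: they point at missing content, synonyms, or spelling variants that auto-suggest could help with.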

### 📚 Further Reading

- [Search Algorithms](https://en.wikipedia.org/wiki/Search_algorithm)
- [Faceted Search](https://en.wikipedia.org/wiki/Faceted_search)
- [Auto-Complete](https://en.wikipedia.org/wiki/Autocomplete)