# Lab 3.4.3: LlamaIndex Query Engine - Solutions

This notebook contains complete solutions for the LlamaIndex exercises.

---

## Exercise: Combine Hybrid Retrieval + Reranking + Citations

**Solution:** Build a single query engine that combines hybrid retrieval, cross-encoder reranking, and source attribution.

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.node_parser import SentenceSplitter
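from llama_index.core.retrievers import QueryFusionRetriever
# BM25Retriever lives in a separate package: pip install llama-index-retrievers-bm25
from llama_index.retrievers.bm25 import BM25Retriever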
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from pathlib import Path

# Configure LlamaIndex
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3.1:8b", temperature=0.1)
Settings.chunk_size = 512
Settings.chunk_overlap = 50

# Load documents
DATA_DIR = Path.cwd().parent / "data" / "sample_documents"
documents = SimpleDirectoryReader(input_dir=str(DATA_DIR)).load_data()

# Create nodes
node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = node_parser.get_nodes_from_documents(documents)

print(f"Loaded {len(documents)} documents, created {len(nodes)} nodes")

In [None]:
# Create the index
index = VectorStoreIndex(nodes=nodes, show_progress=True)

# Create component retrievers
# 1. BM25 (keyword) retriever
bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=10  # Get more initially for reranking
)

# 2. Vector (semantic) retriever
vector_retriever = index.as_retriever(similarity_top_k=10)

# 3. Hybrid retriever using QueryFusion
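hybrid_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=1,  # Use only the original query (no LLM-generated variations)
    mode="reciprocal_rerank",  # RRF fusion
    use_async=False,  # Avoid nested event-loop issues inside notebooks
)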

print("Hybrid retriever created (BM25 + Vector with RRF fusion)")
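
To make the fusion step concrete, here is a minimal, library-free sketch of reciprocal rank fusion (RRF). The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions; the library's own implementation may differ in details such as the rank offset.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so only rank positions matter, not the retrievers' raw scores.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a vector retriever and a BM25 retriever
vector_ranking = ["doc_a", "doc_b", "doc_c"]
bm25_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([vector_ranking, bm25_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers rank highly (`doc_a`, `doc_c`) rise above documents that only one retriever found.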

In [None]:
# Create reranker
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_n=5  # Keep top 5 after reranking
)

# Create response synthesizer
# Note: "compact" mode does not add inline citations to the answer text;
# the retrieved sources are exposed separately via response.source_nodes.
response_synthesizer = get_response_synthesizer(
    response_mode="compact",
)

print("Reranker and response synthesizer created")
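
The reranking stage follows a retrieve-wide, then narrow pattern: fetch more candidates than needed, rescore each (query, passage) pair, and keep the best `top_n`. The sketch below illustrates only that control flow. The `token_overlap` scorer is a toy stand-in, not a cross-encoder, and all names and example passages are hypothetical.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Rescore every candidate against the query and keep the top_n best."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # Sort by score only; Python's sort is stable, so ties keep input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def token_overlap(query, text):
    """Toy scorer (NOT a cross-encoder): fraction of query tokens found in text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


candidates = [
    "LoRA adds low-rank adapter matrices to frozen weights",
    "BM25 is a keyword ranking function",
    "Fine-tuning with LoRA trains far fewer parameters",
]

top = rerank("how does lora work", candidates, token_overlap, top_n=2)
for score, text in top:
    print(f"{score:.2f}  {text}")
```

In the notebook, the scoring is done by the cross-encoder model passed to `SentenceTransformerRerank`, which reads the query and passage together and is therefore slower but more precise than the first-stage retrievers.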

In [None]:
# Create the ultimate query engine
ultimate_query_engine = RetrieverQueryEngine(
    retriever=hybrid_retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[reranker],  # Apply reranking
)

print("Ultimate Query Engine created!")
print("Features:")
print("  ✓ Hybrid retrieval (BM25 + Vector)")
print("  ✓ RRF fusion for combining results")
print("  ✓ Cross-encoder reranking")
print("  ✓ Compact response synthesis")

In [None]:
# Test the ultimate query engine
test_queries = [
    "What are the memory specifications of DGX Spark?",
    "How does LoRA enable efficient fine-tuning?",
    "What quantization formats does Blackwell support?",
]

for query in test_queries:
    print("\n" + "="*60)
    print(f"Query: {query}")
    print("="*60)
    
    response = ultimate_query_engine.query(query)
    
    print(f"\nResponse: {response}")
    print(f"\nSources used: {len(response.source_nodes)}")
    for i, source_node in enumerate(response.source_nodes[:3], 1):
        print(f"  [{i}] {source_node.get_content()[:100]}...")
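
For the citation part of the exercise, one simple option is to format `response.source_nodes` into a numbered source list. The helper below is a sketch under assumptions: `format_citations` is a hypothetical name, it expects each source to expose `metadata`, `score`, and `get_content()`, and it assumes the loader stored a `file_name` metadata key. The demo uses a stand-in object so the block runs without an index.

```python
from types import SimpleNamespace


def format_citations(source_nodes, max_chars=80):
    """Build a numbered source list from retrieved nodes."""
    lines = []
    for i, source_node in enumerate(source_nodes, start=1):
        metadata = getattr(source_node, "metadata", None) or {}
        # "file_name" is assumed to be set by the document loader
        source = metadata.get("file_name", "unknown source")
        score = getattr(source_node, "score", None)
        score_str = f"{score:.3f}" if score is not None else "n/a"
        snippet = source_node.get_content()[:max_chars].replace("\n", " ")
        lines.append(f"[{i}] {source} (score={score_str}): {snippet}")
    return "\n".join(lines)


# Stand-in for a retrieved node, for demonstration only
fake_source = SimpleNamespace(
    metadata={"file_name": "lora.md"},
    score=0.91234,
    get_content=lambda: "LoRA freezes the base weights.\nOnly adapters train.",
)

print(format_citations([fake_source]))
# [1] lora.md (score=0.912): LoRA freezes the base weights. Only adapters train.
```

With a real response, the call would be `print(format_citations(response.source_nodes))`.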

## Challenge: Multi-Query Retrieval

**Solution:** Generate multiple query variations and combine results.

In [None]:
from llama_index.core.retrievers import QueryFusionRetriever

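# QueryFusionRetriever can ask the LLM to rewrite the query before retrieving.
# num_queries counts the original query, so 4 means the original plus 3 generated variations.

multi_query_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever],
    similarity_top_k=5,
    num_queries=4,  # Original query + 3 LLM-generated variations
    mode="reciprocal_rerank",
    use_async=False,
    verbose=True,  # Print the generated queries during retrieval
)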

# Create query engine with multi-query retrieval
multi_query_engine = RetrieverQueryEngine(
    retriever=multi_query_retriever,
    response_synthesizer=response_synthesizer,
)

print("Multi-Query Retriever created!")
print("This will search with 4 queries (the original plus 3 generated variations) for broader coverage.")

In [None]:
# Test multi-query retrieval
query = "What makes DGX Spark good for AI?"

print(f"Original Query: {query}")
print("\nRunning multi-query retrieval (generated variations are printed when the retriever has verbose=True):")

response = multi_query_engine.query(query)

print(f"\n{'='*60}")
print(f"Response: {response}")
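
The fan-out behind multi-query retrieval can be sketched without an LLM. In the sketch below, `toy_variations` and `toy_retrieve` are hypothetical stand-ins for the LLM query generator and the vector retriever. It assumes the original query counts toward `num_queries`, so only `num_queries - 1` variations are generated.

```python
def multi_query_retrieve(query, generate_variations, retrieve,
                         num_queries=4, top_k=5, k=60):
    """Search with the original query plus generated variations, then fuse by rank."""
    # The original query is always kept; only num_queries - 1 are generated
    queries = [query] + generate_variations(query, num_queries - 1)
    scores = {}
    for q in queries:
        for rank, doc_id in enumerate(retrieve(q), start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return queries, ranked[:top_k]


def toy_variations(query, n):
    """Stand-in for the LLM: returns n labelled copies of the query."""
    return [f"{query} (variation {i})" for i in range(1, n + 1)]


def toy_retrieve(q):
    """Stand-in retriever: variations surface different documents than the original."""
    return ["doc_b", "doc_c"] if "variation" in q else ["doc_a", "doc_b"]


queries, results = multi_query_retrieve(
    "What makes DGX Spark good for AI?", toy_variations, toy_retrieve
)
print(len(queries), results)  # 4 ['doc_b', 'doc_c', 'doc_a']
```

`doc_c` is never returned for the original query, yet it ranks second after fusion. This is the coverage gain that query variations are meant to provide.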

## Key Takeaways

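1. **Hybrid retrieval** catches both exact keyword matches (BM25) and semantic similarity (vector search)
2. **RRF fusion** combines rankings from multiple retrievers using rank positions only, so their raw scores never need to share a scale
3. **Reranking** with a cross-encoder scores each query-passage pair jointly, which usually improves precision at the cost of extra latency
4. **Multi-query retrieval** broadens coverage by searching with LLM-generated variations of the query
5. **Combining** these techniques generally gives more robust retrieval than any one of them alone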