# Advanced RAG Techniques

This notebook covers advanced RAG features:
- Document chunking strategies
- Hybrid search (dense + sparse)
- Result reranking with MMR
- Query optimization

In [None]:
import sys
sys.path.insert(0, '../../sdk/python')

from ragamuffin import RagamuffinClient
import httpx

client = RagamuffinClient("http://localhost:8000")
client.login("demo@example.com", "SecurePass123!")

# Direct RAG service access for advanced features
RAG_URL = "http://localhost:8001"
print("Connected!")

## 1. Document Chunking

Split long documents into smaller chunks for better retrieval.

### Chunking Strategies
- **Character**: Fixed character count with overlap
- **Separator**: Split by paragraphs or sections
- **Sentence**: Group by sentence boundaries
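
To make the strategies above concrete, here is a minimal local sketch of character and sentence chunking. The function names `chunk_by_characters` and `chunk_by_sentences` are hypothetical and are not part of the Ragamuffin SDK; the RAG service's own implementation may split text differently.

```python
import re


def chunk_by_characters(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def chunk_by_sentences(text: str, chunk_size: int = 500) -> list[str]:
    """Group whole sentences into chunks of at most roughly `chunk_size` characters."""
    # Naive sentence split (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_by_characters("abcdefghij", chunk_size=4, overlap=1))
print(chunk_by_sentences("One. Two. Three.", chunk_size=9))
```

The overlap repeats the tail of one chunk at the head of the next, so a fact that straddles a boundary still appears whole in at least one chunk.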

In [None]:
# Sample long document
long_document = """
Machine Learning Fundamentals

Machine learning is a branch of artificial intelligence that focuses on building 
applications that learn from data and improve their accuracy over time without 
being programmed to do so.

Supervised Learning

In supervised learning, algorithms learn from labeled training data. The algorithm 
makes predictions and is corrected by the teacher. Learning continues until the 
algorithm achieves an acceptable level of performance.

Unsupervised Learning

Unsupervised learning algorithms work on unlabeled data. The system tries to learn 
the patterns and structure from the data without external guidance. Clustering and 
dimensionality reduction are common unsupervised techniques.

Deep Learning

Deep learning is a subset of machine learning that uses neural networks with many 
layers. These deep neural networks can learn complex patterns in large amounts of 
data, enabling breakthroughs in image recognition, natural language processing, 
and other domains.
"""

print(f"Document length: {len(long_document)} characters")

In [None]:
# Call chunking API directly
async def chunk_document(text, strategy="sentence", chunk_size=500, overlap=50):
    async with httpx.AsyncClient() as http:
        response = await http.post(
            f"{RAG_URL}/chunk",
            json={
                "text": text,
                "strategy": strategy,
                "chunk_size": chunk_size,
                "chunk_overlap": overlap
            }
        )
        response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
        return response.json()

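# Example usage (needs an async context)
import asyncio

async def demo_chunking():
    result = await chunk_document(long_document, strategy="separator")
    chunks = result.get('chunks', [])
    print(f"Created {len(chunks)} chunks:")
    # Preview the first three chunks only
    for i, chunk in enumerate(chunks[:3]):
        print(f"\nChunk {i+1}:")
        print(f"  {chunk[:100]}...")

# Jupyter already runs an event loop, so asyncio.run() raises RuntimeError in a notebook cell.
# await demo_chunking()         # Uncomment to run in a notebook (top-level await)
# asyncio.run(demo_chunking())  # Use this form in a plain Python script instead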

## 2. Hybrid Search

Combine dense (semantic) and sparse (keyword) retrieval.

- **Dense Search**: Uses vector embeddings for semantic similarity
- **Sparse Search**: Uses BM25 for keyword matching
- **Hybrid**: Combines both with Reciprocal Rank Fusion (RRF)
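
As an illustration of the fusion step, the sketch below implements Reciprocal Rank Fusion over two ranked lists of document IDs. The function name `reciprocal_rank_fusion`, the sample IDs, and the constant `k=60` (a commonly used default) are assumptions for this example; the RAG service may use different parameters.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from a dense and a sparse retriever
dense_ranking = ["d1", "d3", "d2"]
sparse_ranking = ["d1", "d4", "d3"]

for doc_id, score in reciprocal_rank_fusion([dense_ranking, sparse_ranking]):
    print(f"{doc_id}: {score:.4f}")
```

RRF uses only rank positions, so it does not need the dense and sparse scores to be on the same scale. Documents ranked well by both retrievers (here `d1` and `d3`) rise above those found by only one.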

In [None]:
# Embed documents for hybrid search demo
hybrid_docs = [
    "Python programming language is great for data science and machine learning.",
    "The python snake is a non-venomous constrictor found in tropical regions.",
    "TensorFlow and PyTorch are popular deep learning frameworks in Python.",
    "Anaconda is both a Python distribution and a large snake species.",
    "JavaScript is the most popular programming language for web development."
]

client.rag.embed(hybrid_docs, collection_name="hybrid_demo")
print("Documents embedded for hybrid search demo")

In [None]:
# Run a dense search on an ambiguous query ("Python" the language vs. the snake)
query = "Python programming"

# Dense search (semantic)
dense_results = client.rag.search(query, top_k=3, collection_name="hybrid_demo")

print(f"Query: '{query}'")
print("\nDense Search Results (semantic):")
for r in dense_results.get('results', []):
    print(f"  - {r.get('text', '')[:60]}...")

## 3. Result Reranking

Improve result quality with:
- **MMR (Maximal Marginal Relevance)**: Balance relevance and diversity
- **Cross-encoder reranking**: More accurate relevance scoring
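
The following is a minimal sketch of MMR over embedding vectors, assuming cosine similarity and toy 2-D vectors. The names `mmr_rerank` and `lambda_mult` are hypothetical and do not come from the Ragamuffin SDK.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(query_vec, doc_vecs, top_k=3, lambda_mult=0.5):
    """Return document indices chosen greedily by
    lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected)."""
    relevance = [cosine(query_vec, vec) for vec in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []

    def mmr_score(i: int) -> float:
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < top_k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.0],    # exact match
    [0.99, 0.1],   # near-duplicate of the first document
    [0.6, 0.8],    # less relevant, but different
]

print(mmr_rerank(query_vec, doc_vecs, top_k=2, lambda_mult=1.0))  # pure relevance
print(mmr_rerank(query_vec, doc_vecs, top_k=2, lambda_mult=0.3))  # favours diversity
```

With `lambda_mult=1.0` the near-duplicate is picked second; lowering it penalises similarity to already selected results, so the more distinct document is chosen instead.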

In [None]:
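# Retrieve candidates for reranking
query = "What are the best tools for machine learning?"

# Over-fetch (top_k=10) so a reranker has more candidates to choose from;
# this cell only prints the top 5 by retrieval score
results = client.rag.search(query, top_k=10, collection_name="hybrid_demo")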

print("Search Results:")
for i, r in enumerate(results.get('results', [])[:5]):
    score = r.get('score', 0)
    text = r.get('text', '')[:60]
    print(f"  {i+1}. [{score:.3f}] {text}...")

## 4. Query Optimization Tips

### Best Practices

1. **Clear, specific queries** work better than vague ones
2. **Include key terms** that match your documents
3. **Adjust top_k** based on your needs (3-10 typical)
4. **Use appropriate chunking** for your document types
5. **Combine with an LLM**: pass the retrieved chunks to a language model to generate the final answer

In [None]:
# Query optimization examples
queries = [
    # Vague query
    "tell me about learning",
    # Specific query
    "What is supervised machine learning?",
    # Keyword-rich query
    "Python deep learning frameworks TensorFlow PyTorch"
]

for query in queries:
    results = client.rag.search(query, top_k=2, collection_name="hybrid_demo")
    # `or [{}]` also covers an empty result list, which would otherwise raise IndexError
    hits = results.get('results') or [{}]
    top_score = hits[0].get('score', 0)
    print(f"Query: '{query[:40]}...'")
    print(f"  Top score: {top_score:.3f}\n")

## Summary

Advanced RAG techniques covered:

| Technique | Use Case |
|-----------|----------|
| Character Chunking | Fixed-size splits |
| Separator Chunking | Paragraph-based documents |
| Sentence Chunking | Conversational content |
| Hybrid Search | Better recall |
| MMR Reranking | Diverse results |

For more details, see:
- [API Reference](../../docs/API_REFERENCE.md)
- [Architecture Guide](../../docs/ARCHITECTURE.md)