
# 🔍 Airweave Search Tutorial

Welcome to the comprehensive guide for using Airweave's powerful search functionality! This notebook will walk you through **14 practical examples** that demonstrate every major search feature, from basic queries to advanced AI-powered capabilities.

## What You'll Learn

- **Basic Search**: Simple queries with default settings
- **Query Expansion**: AI-generated query variations for better recall
- **Search Methods**: Hybrid vs Neural search approaches
- **Filtering**: Structured filters to narrow down results
- **Query Interpretation**: Natural language filtering (Beta)
- **Temporal Relevance**: Boosting recent content
- **Pagination**: Handling large result sets
- **Score Filtering**: High-confidence results only
- **AI Reranking**: Improving result quality
- **AI Answer Generation**: Getting synthesized responses

## Prerequisites

Before running this notebook, make sure you have:

1. **Installed the Airweave Python SDK**: `pip install airweave`
2. **Your API key**: Get this from [Airweave Dashboard](https://app.airweave.ai/settings/api-keys)
3. **A collection with data**: At least one collection with indexed content to search

## Documentation Reference

This tutorial is based on the official [Airweave Search Documentation](https://docs.airweave.ai/search).

---

**Ready to explore powerful search capabilities? Let's get started! 🚀**




In [44]:
from datetime import datetime, timezone, timedelta
from airweave import AirweaveSDK

## 🔧 Step 1: Setup - Initialize the Airweave Client

First, we need to set up our connection to Airweave. Replace the placeholder values with your actual API key and collection ID.

In [None]:
print("🔧 Setting up Airweave client...")

# Replace with your actual API key
API_KEY = "YOUR_API_KEY"  # Get this from https://app.airweave.ai/settings/api-keys

# Initialize the client
client = AirweaveSDK(api_key="YOUR_API_KEY")

# Replace with your actual collection ID
COLLECTION_ID = "your-collection-id"  # Find this in your Airweave dashboard

print(f"✅ Client initialized for collection: {COLLECTION_ID}")
print()

🔧 Setting up Airweave client...
✅ Client initialized for collection: oooo-xf04yt



## 🔍 Example 1: Basic Search - Simple Queries with Default Settings

Let's start with the simplest search possible. This example shows how to perform a basic search using Airweave's default settings, which work great for most use cases.

In [46]:
print("🔍 BASIC SEARCH EXAMPLES")
print("=" * 50)

print("Example 1: Simple search with default settings")
try:
        results = client.collections.search(
            readable_id=COLLECTION_ID,
            query="customer feedback"
        )
        print(f"Found {len(results.results)} results")
        print(f"First result: {results.results[0] if results.results else 'No results'}")
except Exception as e:
        print(f"Error: {e}")

🔍 BASIC SEARCH EXAMPLES
Example 1: Simple search with default settings
Found 10 results
First result: {'score': 0.6013403, 'payload': {'entity_id': '1207573546742315', 'breadcrumbs': [{'entity_id': '1204858079189506', 'name': 'neena.io', 'type': 'workspace'}, {'entity_id': '1207324698039595', 'name': 'Neena Core Planning', 'type': 'project'}, {'entity_id': '1207324698039599', 'name': 'Done', 'type': 'section'}], 'airweave_system_metadata': {'db_entity_id': '504a6a89-0557-4edb-b03f-552d1a477c69', 'sync_id': 'a514bdea-5985-4f3c-b8b2-bb259c0701e3', 'sync_job_id': 'fd318df7-19e6-4668-a8d4-bbf486c97154', 'airweave_created_at': None, 'airweave_updated_at': '2024-06-14T14:25:10.713000+00:00', 'hash': None, 'source_name': 'asana', 'entity_type': 'AsanaTaskEntity', 'should_skip': False, 'sync_metadata': None}, 'name': 'Build Zendesk plugin for ticket screen for Whoppah', 'project_gid': '1207324698039595', 'section_gid': '1207324698039599', 'assignee': '{"gid": "1204858079189495", "name": "Rauf"

**What this example shows:**
- How to perform a basic search with minimal configuration
- The default search behavior (hybrid method with AI reranking)
- How to handle search results and errors gracefully

Try modifying the query to search for different terms in your collection!


## 🧠 Example 2: Query Expansion - Generate Variations of Your Query

Sometimes your initial query might not capture all the relevant content. Query expansion uses AI to generate related search terms automatically, improving recall without requiring you to think of all possible variations.

In [47]:
print("🧠 QUERY EXPANSION")
print("=" * 50)

print("Example 2: Using AI to expand queries for better recall")
try:
    results = client.collections.search_advanced(
     readable_id=COLLECTION_ID,
            query="customer churn analysis",
            expansion_strategy="llm"  # AI creates up to 4 query variations
        )
    print(f"Expanded search found {len(results.results)} results")
    print("The AI automatically searches for related terms like:")
    print("- 'customer retention analysis'")
    print("- 'user attrition patterns'")
    print("- 'churn prediction models'")
except Exception as e:
    print(f"Error: {e}")

print()

🧠 QUERY EXPANSION
Example 2: Using AI to expand queries for better recall
Expanded search found 10 results
The AI automatically searches for related terms like:
- 'customer retention analysis'
- 'user attrition patterns'
- 'churn prediction models'



**What this example shows:**
- How AI automatically generates related search terms
- The difference between basic and expanded search results
- When to use query expansion for better recall

The AI creates variations like "customer retention analysis" and "user attrition patterns" to find more relevant content!


## 🎯 Example 3-4: Search Methods - Choose How Airweave Searches Your Data

Airweave offers different search methods, each optimized for different scenarios. Let's explore the two main approaches: **Hybrid** (combines AI and keyword matching) and **Neural** (pure AI understanding).

In [48]:
print("🎯 SEARCH METHODS")
print("=" * 50)
    
print("Example 3: Hybrid search (AI + keyword matching)")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="authentication flow security vulnerabilities",
            search_method="hybrid"  # Best of both worlds
        )
        print(f"Hybrid search found {len(results.results)} results")
        print("This combines semantic understanding with exact keyword matching")
except Exception as e:
        print(f"Error: {e}")
    
print()
    
print("Example 4: Neural search (AI-powered semantic understanding)")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="user login problems",
            search_method="neural"  # Pure AI understanding
        )
        print(f"Neural search found {len(results.results)} results")
        print("This understands meaning, not just exact words")
except Exception as e:
        print(f"Error: {e}")
    
print()

🎯 SEARCH METHODS
Example 3: Hybrid search (AI + keyword matching)
Hybrid search found 10 results
This combines semantic understanding with exact keyword matching

Example 4: Neural search (AI-powered semantic understanding)
Neural search found 10 results
This understands meaning, not just exact words



**What these examples show:**
- **Hybrid search**: Combines AI understanding with exact keyword matching (recommended)
- **Neural search**: Pure AI semantic understanding (great for conceptual queries)
- Performance differences between search methods

Choose hybrid for most use cases, neural for highly conceptual searches!


## 🔍 Example 5-7: Filtering - Narrow Down Results with Structured Filters

When you have large datasets, filtering helps you find exactly what you need. These examples show how to filter by source, date ranges, and exclude unwanted results.

In [49]:
print("🔍 FILTERING RESULTS")
print("=" * 50)
    
print("Example 5: Filter by source (e.g., only GitHub issues)")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="deployment issues",
            filter={
                "must": [{
                    "key": "source_name",
                    "match": {"value": "GitHub"}  # Case-sensitive!
                }]
            }
        )
        print(f"GitHub-only search found {len(results.results)} results")
except Exception as e:
        print(f"Error: {e}")
    
print()
    
print("Example 6: Multiple filters (source + date range)")
try:
        # Search for recent customer feedback from specific sources
        one_week_ago = (datetime.now(timezone.utc) - timedelta(days=7)).isoformat()
        
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="customer feedback",
            filter={
                "must": [
                    {
                        "key": "source_name",
                        "match": {"value": "Zendesk"}
                    },
                    {
                        "key": "created_at",
                        "range": {
                            "gte": one_week_ago
                        }
                    }
                ]
            }
        )
        print(f"Recent feedback from support tools: {len(results.results)} results")
except Exception as e:
        print(f"Error: {e}")
    
print()
    
print("Example 7: Exclude results (e.g., hide resolved tickets)")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="bug reports",
            filter={
                "must_not": [{
                    "key": "status",
                    "match": {"any": ["resolved", "closed", "done"]}
                }]
            }
        )
        print(f"Active bug reports only: {len(results.results)} results")
except Exception as e:
        print(f"Error: {e}")
    
print()

🔍 FILTERING RESULTS
Example 5: Filter by source (e.g., only GitHub issues)
GitHub-only search found 0 results

Example 6: Multiple filters (source + date range)
Recent feedback from support tools: 0 results

Example 7: Exclude results (e.g., hide resolved tickets)
Active bug reports only: 10 results





**What these examples show:**
- **Source filtering**: Search only specific data sources (GitHub, Zendesk, etc.)
- **Date filtering**: Find content from specific time periods
- **Exclusion filtering**: Hide unwanted results (like resolved tickets)
- **Multiple filters**: Combine different filter types

Filters use structured syntax - check the Airweave docs for the complete filter reference!


## 🤖 Example 8: Query Interpretation - Let AI Extract Filters from Natural Language

Instead of manually writing filter syntax, you can use natural language and let AI interpret your intent. This beta feature automatically converts phrases like "open tickets from last week" into proper filters.

In [50]:
print("🤖 QUERY INTERPRETATION (Beta Feature)")
print("=" * 50)
    
print("Example 8: Natural language filtering")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="open asana tickets from last week",
            enable_query_interpretation=True  # AI extracts filters automatically
        )
        print(f"AI-interpreted search found {len(results.results)} results")
        print("The AI automatically understood:")
        print("- Source: Asana")
        print("- Status: open")
        print("- Time: last 7 days")
except Exception as e:
        print(f"Error: {e}")
    
print()

🤖 QUERY INTERPRETATION (Beta Feature)
Example 8: Natural language filtering
AI-interpreted search found 10 results
The AI automatically understood:
- Source: Asana
- Status: open
- Time: last 7 days



**What this example shows:**
- How AI interprets natural language queries
- Automatic filter extraction from conversational phrases
- The power of combining query interpretation with search

This beta feature makes search more intuitive - just describe what you want in plain English!


## ⏰ Example 9: Temporal Relevance - Prefer Newer Content

For time-sensitive data, you can boost the relevance of recent content. This is particularly useful for news, support tickets, or any content where freshness matters.

In [51]:
print("⏰ TEMPORAL RELEVANCE")
print("=" * 50)
print("Example 9: Boost recent content")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="project updates",
            recency_bias=0.7  # 0.0 = no bias, 1.0 = heavily prefer new
        )
        print(f"Recent-biased search found {len(results.results)} results")
        print("Newer content gets higher scores")
except Exception as e:
        print(f"Error: {e}")    
print()

⏰ TEMPORAL RELEVANCE
Example 9: Boost recent content
Recent-biased search found 10 results
Newer content gets higher scores



**What this example shows:**
- How to boost recent content in search results
- The `recency_bias` parameter (0.0 = no bias, 1.0 = heavily prefer new)
- When temporal relevance matters for your use case

Perfect for news, support tickets, or any time-sensitive content!


## 📄 Example 10: Pagination - Handle Large Result Sets

When you have many results, pagination helps you navigate through them efficiently. This example shows how to retrieve results in pages using `limit` and `offset` parameters.

In [52]:
print("📄 PAGINATION")
print("=" * 50)
print("Example 10: Paginated results")
try:
        # First page
        page1 = client.collections.search(
            readable_id=COLLECTION_ID,
            query="data retention policies",
            limit=10,
            offset=0
        )
        
        # Second page
        page2 = client.collections.search(
            readable_id=COLLECTION_ID,
            query="data retention policies",
            limit=10,
            offset=10  # Skip first 10 results
        )
        
        print(f"Page 1: {len(page1.results)} results")
        print(f"Page 2: {len(page2.results)} results")
except Exception as e:
        print(f"Error: {e}")
    
print()

📄 PAGINATION
Example 10: Paginated results
Page 1: 10 results
Page 2: 10 results



**What this example shows:**
- How to retrieve results in manageable chunks
- Using `limit` to control page size
- Using `offset` to navigate through pages
- Building pagination logic for large datasets

Essential for handling collections with thousands of documents!


## 🎯 Example 11: Score Filtering - Only High-Confidence Matches

Sometimes you only want results that Airweave is very confident about. Score filtering lets you set a minimum confidence threshold, useful for compliance or when you need high-precision results.

In [53]:
print("🎯 SCORE FILTERING")
print("=" * 50)
    
print("Example 11: High-confidence results only")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="security vulnerability CVE-2024",
            score_threshold=0.8  # Only results with 80%+ confidence
        )
        print(f"High-confidence results: {len(results.results)} matches")
        print("Useful for compliance or legal document retrieval")
except Exception as e:
        print(f"Error: {e}")
    
print()

🎯 SCORE FILTERING
Example 11: High-confidence results only
High-confidence results: 10 matches
Useful for compliance or legal document retrieval




**What this example shows:**
- How to filter results by confidence score
- The `score_threshold` parameter for precision control
- When to use high-confidence filtering

Great for compliance, legal documents, or when you need only the most relevant results!


## 🔄 Example 12: AI Reranking - Improve Result Quality

AI reranking takes your initial results and uses advanced AI to reorder them for better relevance. While it adds latency (~10 seconds), it significantly improves the quality of your top results.

In [54]:
print("🔄 AI RERANKING")
print("=" * 50)
    
print("Example 12: AI-powered result reordering")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="user authentication methods",
            enable_reranking=True  # AI reviews and reorders results
        )
        print(f"AI-reranked search found {len(results.results)} results")
        print("⚠️  Note: Reranking adds ~10 seconds of latency but improves accuracy")
except Exception as e:
        print(f"Error: {e}")
    
print()

🔄 AI RERANKING
Example 12: AI-powered result reordering
AI-reranked search found 10 results
⚠️  Note: Reranking adds ~10 seconds of latency but improves accuracy



**What this example shows:**
- How AI reranking improves result quality
- The trade-off between accuracy and speed (+10 seconds latency)
- When to enable reranking for better user experience

Enable for critical searches, disable for real-time applications!


## 💬 Example 13: AI Answer Generation - Get Synthesized Responses

Instead of just returning matching documents, Airweave can generate a synthesized answer based on your search results. This is perfect for question-answering scenarios where you want a direct response rather than a list of documents.

In [55]:
print("💬 AI ANSWER GENERATION")
print("=" * 50)
    
print("Example 13: Get AI-generated answers")
try:
        results = client.collections.search_advanced(
            readable_id=COLLECTION_ID,
            query="What are our customer refund policies?",
            response_type="completion"  # Get AI-generated answer
        )
        
        if hasattr(results, 'completion') and results.completion:
            print("AI Answer:")
            print(results.completion)
        else:
            print("No completion generated")
except Exception as e:
        print(f"Error: {e}")
    
print()

💬 AI ANSWER GENERATION
Example 13: Get AI-generated answers
AI Answer:
I don't have enough information to answer that question based on the available data.



**What this example shows:**
- How to get AI-generated answers instead of raw documents
- The `response_type="completion"` parameter
- When to use answer generation vs document retrieval

Perfect for question-answering systems and chatbots!




## 🚀 Example 14: Complete Example - Everything Together

Now let's combine all the features we've learned into one comprehensive search. This example demonstrates how to use multiple advanced features together for maximum search power.

In [57]:
print("🚀 COMPLETE EXAMPLE")
print("=" * 50)

print("Example 14: Advanced search with all features")
try:
    # Import the required filter classes
    from airweave.models import Filter, FieldCondition, MatchAny
    
    results = client.collections.search_advanced(
        readable_id=COLLECTION_ID,
        query="customer feedback about pricing",
        expansion_strategy="llm",           # AI query expansion
        search_method="hybrid",            # Best search method
        filter=Filter(                     # Structured filtering
            must=[FieldCondition(
                key="source_name",
                match=MatchAny(any=["Zendesk", "Slack"])
            )]
        ),
        recency_bias=0.5,                 # Prefer recent content
        score_threshold=0.7,              # High confidence only
        enable_reranking=True,            # AI reranking
        response_type="raw",              # Raw results
        limit=50,                         # Pagination
        offset=0
    )
    
    print(f"Advanced search completed!")
    print(f"Results: {len(results.results)}")
    print("This example demonstrates:")
    print("- AI query expansion")
    print("- Hybrid search method")
    print("- Source filtering")
    print("- Recency bias")
    print("- Score threshold")
    print("- AI reranking")
    print("- Pagination")
    
except Exception as e:
    print(f"Error: {e}")
    print("Note: This example requires proper filter imports and valid data")

print()
print("🎉 Tutorial completed! You now know how to use Airweave search effectively.")
print()
print("Next steps:")
print("1. Try these examples with your own data")
print("2. Experiment with different parameters")
print("3. Check out the interactive API docs: https://docs.airweave.ai/api-reference")
print("4. Read our blog for advanced tips: https://airweave.ai/blog")


# ============================================================================
# HELPER FUNCTIONS & TIPS
# ============================================================================

def print_search_tips():
    """
    Print helpful tips for using Airweave search effectively.
    """
    print("💡 SEARCH TIPS")
    print("=" * 50)
    print("1. Start simple: Use basic search first, then add complexity")
    print("2. Use hybrid search for best results (default)")
    print("3. Enable AI reranking for accuracy, disable for speed")
    print("4. Use filters to narrow down large datasets")
    print("5. Try query interpretation for natural language queries")
    print("6. Adjust recency_bias based on your data's time sensitivity")
    print("7. Use score_threshold for high-confidence results only")
    print("8. Paginate through large result sets")
    print()

def print_default_settings():
    """
    Print Airweave's default search settings for reference.
    """
    print("⚙️  DEFAULT SETTINGS")
    print("=" * 50)
    print("| Feature                | Default | Description")
    print("|------------------------|---------|----------------------------")
    print("| Query Expansion        | auto    | AI expansion when available")
    print("| Search Method          | hybrid  | AI + keyword matching")
    print("| Query Interpretation  | Off     | Manual filter control")
    print("| AI Reranking          | On      | Improves quality (+10s)")
    print("| Recency Boost         | 0.3     | Slight preference for new")
    print("| Score Filter          | None    | Return all matches")
    print("| Response Format       | raw     | Actual documents")
    print()

# Call the helper functions to show tips
print_search_tips()
print_default_settings()

🚀 COMPLETE EXAMPLE
Example 14: Advanced search with all features
Error: No module named 'airweave.models'
Note: This example requires proper filter imports and valid data

🎉 Tutorial completed! You now know how to use Airweave search effectively.

Next steps:
1. Try these examples with your own data
2. Experiment with different parameters
3. Check out the interactive API docs: https://docs.airweave.ai/api-reference
4. Read our blog for advanced tips: https://airweave.ai/blog
💡 SEARCH TIPS
1. Start simple: Use basic search first, then add complexity
2. Use hybrid search for best results (default)
3. Enable AI reranking for accuracy, disable for speed
4. Use filters to narrow down large datasets
5. Try query interpretation for natural language queries
6. Adjust recency_bias based on your data's time sensitivity
7. Use score_threshold for high-confidence results only
8. Paginate through large result sets

⚙️  DEFAULT SETTINGS
| Feature                | Default | Description
|-------------