# Azure AI Search Simulator - Python SDK Demo

This notebook demonstrates how to use the **official Azure AI Search Python SDK** with the local **Azure AI Search Simulator**.

## Prerequisites

1. **Start the Azure AI Search Simulator with HTTPS** (required by Azure SDK):
   ```bash
   cd src/AzureAISearchSimulator.Api && dotnet run --urls "https://localhost:7250"
   ```

2. **Start the Custom Skills API** (optional, for skillset demo):
   ```bash
   cd samples/CustomSkillSample && dotnet run
   ```

3. **Install Python dependencies**:
   ```bash
   pip install azure-search-documents httpx requests pandas
   ```

## What This Notebook Covers

- Creating search indexes with the **official Azure SDK**
- Setting up data sources for local file system
- Configuring custom Web API skills in skillsets
- Creating indexers with change detection
- Searching and displaying results

> ⚠️ **Note**: The Azure SDK requires HTTPS. The simulator must be started with `--urls "https://localhost:7250"`.
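Because this notebook depends on an HTTPS endpoint, a small guard that runs before any client is created can catch a mistyped `http://` URL early. The sketch below uses only the standard library. `require_https` is a hypothetical helper and is not part of the Azure SDK.

```python
from urllib.parse import urlparse

def require_https(endpoint: str) -> str:
    """Return the endpoint unchanged if it is an https:// URL with a host; raise otherwise."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "https":
        raise ValueError(
            f"Endpoint must use https:// (got {parsed.scheme or 'no scheme'!r}). "
            'Start the simulator with --urls "https://localhost:7250".'
        )
    if not parsed.hostname:
        raise ValueError("Endpoint is missing a host name")
    return endpoint

# Usage: the HTTPS URL passes, the plain-HTTP one is rejected
for url in ("https://localhost:7250", "http://localhost:5250"):
    try:
        print("OK:", require_https(url))
    except ValueError as err:
        print("Rejected:", err)
```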

## 1. Import Required Libraries and Configure Environment

In [113]:
# Install required packages (uncomment if needed)
# !pip install azure-search-documents requests pandas

import os
import json
import requests
import urllib3
from pathlib import Path

# Azure AI Search SDK imports
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    SearchIndexer,
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection,
    SearchIndexerSkillset,
    WebApiSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    FieldMapping,
    IndexingParameters,
    IndexingParametersConfiguration,
    SplitSkill,
)

# For displaying results
import pandas as pd
from IPython.display import display, HTML

# Suppress SSL warnings for local development
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


## 2. Initialize Azure AI Search Clients

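Configure the connection to the local Azure AI Search Simulator. The simulator listens on `http://localhost:5250` by default, but because the Azure SDK requires HTTPS, this notebook connects to the HTTPS endpoint `https://localhost:7250` started in the prerequisites.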
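Under the hood, key-based authentication in Azure AI Search is an `api-key` HTTP header sent with each REST call, and `AzureKeyCredential` supplies that key. The sketch below builds (but does not send) such a request with the standard library. The `api-version` value `2024-07-01` is an assumption; check which versions the simulator accepts. Actually sending it would also require skipping certificate verification, as the SDK setup below does.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_list_indexes_request(endpoint: str, api_key: str,
                               api_version: str = "2024-07-01") -> Request:
    """Build (but do not send) a GET /indexes request authenticated with an API key."""
    query = urlencode({"api-version": api_version})
    url = f"{endpoint.rstrip('/')}/indexes?{query}"
    req = Request(url, method="GET")
    req.add_header("api-key", api_key)  # key-based authentication header
    req.add_header("Accept", "application/json")
    return req

req = build_list_indexes_request("https://localhost:7250", "admin-key-12345")
print(req.get_method(), req.full_url)
# GET https://localhost:7250/indexes?api-version=2024-07-01
```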

In [114]:
# Configuration for Azure AI Search Simulator
# NOTE: The Azure SDK requires HTTPS. Run the simulator with HTTPS:
#   dotnet run --urls "https://localhost:7250"

SEARCH_ENDPOINT = "https://localhost:7250"  # HTTPS required for Azure SDK
ADMIN_API_KEY = "admin-key-12345"
QUERY_API_KEY = "query-key-67890"

# Index and resource names
INDEX_NAME = "pdf-documents"
DATA_SOURCE_NAME = "local-pdf-files"
SKILLSET_NAME = "pdf-enrichment"
INDEXER_NAME = "pdf-indexer"

# Custom Skill API (from CustomSkillSample project)
CUSTOM_SKILL_BASE_URL = "http://localhost:5260"

# Create credentials
admin_credential = AzureKeyCredential(ADMIN_API_KEY)
query_credential = AzureKeyCredential(QUERY_API_KEY)

# Configure HTTP client to skip SSL certificate validation for local development
# This is required because the simulator uses a self-signed dev certificate
from azure.core.pipeline.transport import RequestsTransport

# Create a requests session with SSL verification disabled
# (requests is already imported in the first cell)
session = requests.Session()
session.verify = False

# Create custom transport - pass connection_verify=False explicitly
transport = RequestsTransport(session=session, connection_verify=False)

# Create clients for index management
# Note: We also pass connection_verify=False to the client kwargs
index_client = SearchIndexClient(
    endpoint=SEARCH_ENDPOINT,
    credential=admin_credential,
    transport=transport,
    connection_verify=False
)

# Create client for indexer/data source/skillset management  
indexer_client = SearchIndexerClient(
    endpoint=SEARCH_ENDPOINT,
    credential=admin_credential,
    transport=transport,
    connection_verify=False
)

print(f"✅ Connected to Azure AI Search Simulator at {SEARCH_ENDPOINT}")
print(f"   Index Client: {type(index_client).__name__}")
print(f"   Indexer Client: {type(indexer_client).__name__}")
print(f"   ⚠️  SSL verification disabled for local development")

✅ Connected to Azure AI Search Simulator at https://localhost:7250
   Index Client: SearchIndexClient
   Indexer Client: SearchIndexerClient
   ⚠️  SSL verification disabled for local development


## 3. Download Sample PDF Files and Metadata

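Download sample PDF documents from the `Azure-Samples/azure-search-sample-data` GitHub repository, which is commonly used in Azure AI Search demos and tutorials. For each PDF, the cell also writes a JSON file containing the document's metadata.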
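`download_file` saves whatever bytes the server returns. If you want an extra guard against saving an HTML error page or a truncated response as `.pdf`, checking for the `%PDF-` header is a cheap heuristic. `looks_like_pdf` is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # well-formed PDF files normally begin with this header

def looks_like_pdf(path: Path) -> bool:
    """Heuristic check: True if the file exists and starts with the PDF header."""
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC

# Usage after downloading, e.g.:
#   for pdf in Path("./sample-documents").glob("*.pdf"):
#       print(pdf.name, "OK" if looks_like_pdf(pdf) else "NOT A PDF")
```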

In [None]:
# Create directory for sample documents
DOCS_PATH = Path("./sample-documents")
DOCS_PATH.mkdir(exist_ok=True)

# Sample PDF URLs from the Azure-Samples/azure-search-sample-data repository
# Using English documents from the health-plan folder
SAMPLE_PDFS = {
    "employee-handbook": {
        "url": "https://raw.githubusercontent.com/Azure-Samples/azure-search-sample-data/main/health-plan/employee_handbook.pdf",
        "title": "Employee Handbook",
        "category": "HR",
        "department": "Human Resources"
    },
    "benefit-options": {
        "url": "https://raw.githubusercontent.com/Azure-Samples/azure-search-sample-data/main/health-plan/Benefit_Options.pdf",
        "title": "Benefit Options",
        "category": "Benefits",
        "department": "Human Resources"
    },
    "perks-plus": {
        "url": "https://raw.githubusercontent.com/Azure-Samples/azure-search-sample-data/main/health-plan/PerksPlus.pdf",
        "title": "Perks Plus Program",
        "category": "Benefits",
        "department": "Human Resources"
    }
}

def download_file(url: str, filepath: Path) -> bool:
    """Download a file from URL to local path."""
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        filepath.write_bytes(response.content)
        return True
    except Exception as e:
        print(f"  ⚠️ Failed to download {url}: {e}")
        return False

# Download PDFs and create metadata JSON files
downloaded_files = []
for doc_id, doc_info in SAMPLE_PDFS.items():
    pdf_path = DOCS_PATH / f"{doc_id}.pdf"
    json_path = DOCS_PATH / f"{doc_id}.json"
    
    # Download PDF if not exists
    if not pdf_path.exists():
        print(f"📥 Downloading {doc_info['title']}...")
        if download_file(doc_info["url"], pdf_path):
            print(f"   ✅ Saved to {pdf_path}")
            downloaded_files.append(pdf_path)
    else:
        print(f"📄 {doc_info['title']} already exists")
        downloaded_files.append(pdf_path)
    
    # Create metadata JSON file
    metadata = {
        "id": doc_id,
        "title": doc_info["title"],
        "category": doc_info["category"],
        "department": doc_info["department"],
        "source_file": str(pdf_path.name),
        "last_modified": "2026-01-24T10:00:00Z"
    }
    json_path.write_text(json.dumps(metadata, indent=2))
    print(f"   📋 Metadata saved to {json_path}")

print(f"\n✅ {len(downloaded_files)} PDF files ready in {DOCS_PATH.absolute()}")

## 4. Create a Simple Search Index

Define the search index schema with fields for document content, metadata, and enrichments from custom skills.
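For reference, the SDK serializes this schema into a REST index definition. The sketch below hand-writes the JSON for three of the fields to illustrate the wire format (for example, `SearchableField(collection=True)` corresponds to a `Collection(Edm.String)` field). It is an illustration, not the SDK's exact output; attribute defaults such as `retrievable` may differ.

```python
import json

# Illustrative REST-style definitions for three fields of the pdf-documents index
fields = [
    {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
    {"name": "content", "type": "Edm.String", "searchable": True, "analyzer": "en.lucene"},
    {   # SearchableField(collection=True) maps to a string collection
        "name": "keywords",
        "type": "Collection(Edm.String)",
        "searchable": True,
        "filterable": True,
        "facetable": True,
    },
]
index_body = {"name": "pdf-documents", "fields": fields}
print(json.dumps(index_body, indent=2))
```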

In [116]:
# Define the search index schema
index = SearchIndex(
    name=INDEX_NAME,
    fields=[
        # Key field (required)
        SimpleField(
            name="id",
            type=SearchFieldDataType.String,
            key=True,
            filterable=True
        ),
        
        # Document content - searchable
        SearchableField(
            name="content",
            type=SearchFieldDataType.String,
            analyzer_name="en.lucene"
        ),
        
        # Metadata fields
        SearchableField(
            name="title",
            type=SearchFieldDataType.String,
            filterable=True,
            sortable=True
        ),
        SimpleField(
            name="category",
            type=SearchFieldDataType.String,
            filterable=True,
            facetable=True
        ),
        SimpleField(
            name="department",
            type=SearchFieldDataType.String,
            filterable=True,
            facetable=True
        ),
        
        # Storage metadata (populated by indexer)
        SimpleField(
            name="metadata_storage_path",
            type=SearchFieldDataType.String,
            filterable=True
        ),
        SimpleField(
            name="metadata_storage_name",
            type=SearchFieldDataType.String,
            filterable=True,
            sortable=True
        ),
        
        # Fields populated by custom skills
        SimpleField(
            name="wordCount",
            type=SearchFieldDataType.Int32,
            filterable=True,
            sortable=True
        ),
        SimpleField(
            name="sentenceCount",
            type=SearchFieldDataType.Int32,
            filterable=True,
            sortable=True
        ),
        # NOTE: For SearchableField with Collection(Edm.String), use collection=True parameter
        # The type parameter is ignored by SearchableField and defaults to Edm.String
        SearchableField(
            name="keywords",
            collection=True,  # This makes it Collection(Edm.String)
            filterable=True,
            facetable=True
        ),
        SimpleField(
            name="sentiment",
            type=SearchFieldDataType.String,
            filterable=True,
            facetable=True
        ),
        SimpleField(
            name="sentimentScore",
            type=SearchFieldDataType.Double,
            filterable=True,
            sortable=True
        ),
        SearchableField(
            name="summary",
            type=SearchFieldDataType.String
        )
    ]
)

# Try to create the index - if it exists, delete and recreate
from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError

try:
    # First try to delete any existing index
    index_client.delete_index(INDEX_NAME)
    print(f"🗑️ Deleted existing index '{INDEX_NAME}'")
except ResourceNotFoundError:
    print(f"ℹ️ Index '{INDEX_NAME}' does not exist, will create new")
except Exception as e:
    print(f"⚠️ Could not delete index: {e}")

# Create the index
try:
    result = index_client.create_index(index)
    print(f"✅ Created index '{result.name}' with {len(result.fields)} fields")
except ResourceExistsError:
    # If it still exists, try to get it
    result = index_client.get_index(INDEX_NAME)
    print(f"ℹ️ Using existing index '{result.name}' with {len(result.fields)} fields")

# Display field information
field_info = [(f.name, str(f.type), "✓" if getattr(f, 'searchable', False) else "",
               "✓" if getattr(f, 'filterable', False) else "",
               "✓" if getattr(f, 'facetable', False) else "")
              for f in result.fields]
df = pd.DataFrame(field_info, columns=["Field Name", "Type", "Searchable", "Filterable", "Facetable"])
display(df)

🗑️ Deleted existing index 'pdf-documents'
✅ Created index 'pdf-documents' with 13 fields


| | Field Name | Type | Searchable | Filterable | Facetable |
|---|---|---|---|---|---|
| 0 | id | Edm.String | | ✓ | |
| 1 | content | Edm.String | ✓ | | |
| 2 | title | Edm.String | ✓ | ✓ | |
| 3 | category | Edm.String | | ✓ | ✓ |
| 4 | department | Edm.String | | ✓ | ✓ |
| 5 | metadata_storage_path | Edm.String | | ✓ | |
| 6 | metadata_storage_name | Edm.String | | ✓ | |
| 7 | wordCount | Edm.Int32 | | ✓ | |
| 8 | sentenceCount | Edm.Int32 | | ✓ | |
| 9 | keywords | Collection(Edm.String) | ✓ | ✓ | ✓ |


## 5. Create Data Source for Local File System

Configure a data source that points to the local folder containing PDF documents. The simulator supports `filesystem` type for local development.
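In the REST API, a data source definition carries its connection string under `credentials` and the folder or container under `container`. The sketch below builds the equivalent JSON body for this notebook's data source. The `filesystem` type and the `path=` connection-string format are simulator-specific (as noted above), so treat this as an illustration rather than a guaranteed contract.

```python
import json
from pathlib import Path

def filesystem_data_source_body(name: str, folder: Path, container: str = ".") -> dict:
    """Build a REST-style data source definition for a local folder (simulator-specific type)."""
    return {
        "name": name,
        "type": "filesystem",  # simulator-only; the real service uses types such as "azureblob"
        "credentials": {"connectionString": f"path={folder.absolute()}"},
        "container": {"name": container},
    }

body = filesystem_data_source_body("local-pdf-files", Path("./sample-documents"))
print(json.dumps(body, indent=2))
```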

In [None]:
# Get the absolute path to the sample documents folder
docs_absolute_path = str(DOCS_PATH.absolute())

# Create data source connection for local file system
# Note: The simulator uses "filesystem" type for local paths
data_source = SearchIndexerDataSourceConnection(
    name=DATA_SOURCE_NAME,
    type="filesystem",  # Simulator-specific type for local files
    connection_string=f"path={docs_absolute_path}",
    container=SearchIndexerDataContainer(
        name="."  # Use "." for root folder (files are directly in sample-documents)
    )
)

# Delete existing data source if it exists
try:
    indexer_client.delete_data_source_connection(DATA_SOURCE_NAME)
    print(f"🗑️ Deleted existing data source '{DATA_SOURCE_NAME}'")
except ResourceNotFoundError:
    print(f"ℹ️ Data source '{DATA_SOURCE_NAME}' does not exist, will create new")
except Exception as e:
    print(f"⚠️ Could not delete data source: {e}")

# Create the data source
try:
    result = indexer_client.create_data_source_connection(data_source)
    print(f"✅ Created data source '{result.name}'")
    print(f"   Type: {result.type}")
    print(f"   Path: {docs_absolute_path}")
except ResourceExistsError:
    # Data source exists, try to get it
    result = indexer_client.get_data_source_connection(DATA_SOURCE_NAME)
    print(f"ℹ️ Using existing data source '{result.name}'")
    print(f"   Type: {result.type}")

## 6. Configure Custom Skills in Skillset

Create a skillset that uses the **Custom Web API Skills** from the `CustomSkillSample` project. These skills provide:
- **Text Stats**: Word count, sentence count, character count
- **Keyword Extraction**: Extract important keywords
- **Sentiment Analysis**: Detect positive/negative/neutral sentiment
- **Summarization**: Create extractive summaries

> ⚠️ Make sure the CustomSkillSample API is running on `http://localhost:5260`.
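Each `WebApiSkill` POSTs a batch of records to its `uri` and expects a response containing the same `recordId`s, each with a `data` object (plus optional `errors` and `warnings`). The sketch below shows that envelope with a toy text-stats handler written in Python. The real `CustomSkillSample` is a separate .NET project, and the word and sentence counting rules here are assumptions, not its actual logic.

```python
import re

def text_stats_skill(request_body: dict) -> dict:
    """Toy custom-skill handler for {"values": [{"recordId": ..., "data": {"text": ...}}]}."""
    results = []
    for record in request_body.get("values", []):
        text = (record.get("data") or {}).get("text") or ""
        # Assumed rules: words are whitespace-separated tokens; sentences end in . ! or ?
        words = text.split()
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        results.append({
            "recordId": record["recordId"],  # must echo the incoming recordId
            "data": {"wordCount": len(words), "sentenceCount": len(sentences)},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}

request = {"values": [{"recordId": "0", "data": {"text": "Hello world. How are you?"}}]}
print(text_stats_skill(request))
```

The output names `wordCount` and `sentenceCount` match the `OutputFieldMappingEntry` names used by `text-stats-skill` in the cell below.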

In [None]:
# Define custom skills using the CustomSkillSample API
skillset = SearchIndexerSkillset(
    name=SKILLSET_NAME,
    description="Skillset using custom Web API skills for PDF enrichment",
    skills=[
        # Skill 1: Text Statistics (word count, sentence count)
        WebApiSkill(
            name="text-stats-skill",
            description="Counts words, sentences, and characters in the document",
            uri=f"{CUSTOM_SKILL_BASE_URL}/api/skills/text-stats",
            http_method="POST",
            timeout="PT30S",
            batch_size=10,
            context="/document",
            inputs=[
                InputFieldMappingEntry(name="text", source="/document/content")
            ],
            outputs=[
                OutputFieldMappingEntry(name="wordCount", target_name="wordCount"),
                OutputFieldMappingEntry(name="sentenceCount", target_name="sentenceCount")
            ]
        ),
        
        # Skill 2: Keyword Extraction
        WebApiSkill(
            name="keywords-skill",
            description="Extracts keywords from document content",
            uri=f"{CUSTOM_SKILL_BASE_URL}/api/skills/extract-keywords",
            http_method="POST",
            timeout="PT30S",
            batch_size=10,
            context="/document",
            inputs=[
                InputFieldMappingEntry(name="text", source="/document/content")
            ],
            outputs=[
                OutputFieldMappingEntry(name="keywords", target_name="keywords")
            ]
        ),
        
        # Skill 3: Sentiment Analysis
        WebApiSkill(
            name="sentiment-skill",
            description="Analyzes document sentiment",
            uri=f"{CUSTOM_SKILL_BASE_URL}/api/skills/analyze-sentiment",
            http_method="POST",
            timeout="PT30S",
            batch_size=10,
            context="/document",
            inputs=[
                InputFieldMappingEntry(name="text", source="/document/content")
            ],
            outputs=[
                OutputFieldMappingEntry(name="sentiment", target_name="sentiment"),
                OutputFieldMappingEntry(name="score", target_name="sentimentScore")
            ]
        ),
        
        # Skill 4: Summarization
        WebApiSkill(
            name="summarize-skill",
            description="Creates an extractive summary",
            uri=f"{CUSTOM_SKILL_BASE_URL}/api/skills/summarize",
            http_method="POST",
            timeout="PT30S",
            batch_size=10,
            context="/document",
            inputs=[
                InputFieldMappingEntry(name="text", source="/document/content")
            ],
            outputs=[
                OutputFieldMappingEntry(name="summary", target_name="summary")
            ]
        )
    ]
)

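# Delete existing skillset if it exists
try:
    indexer_client.delete_skillset(SKILLSET_NAME)
    print(f"🗑️ Deleted existing skillset '{SKILLSET_NAME}'")
except ResourceNotFoundError:
    print(f"ℹ️ Skillset '{SKILLSET_NAME}' does not exist, will create new")
except Exception as e:
    print(f"⚠️ Could not delete skillset: {e}")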

# Create the skillset
result = indexer_client.create_skillset(skillset)
print(f"✅ Created skillset '{result.name}' with {len(result.skills)} skills:")
for skill in result.skills:
    print(f"   • {skill.name}: {skill.description}")

## 7. Create Indexer with Field Mappings

Create an indexer that:
1. Reads PDF files from the data source
2. Extracts text content (document cracking)
3. Applies the custom skills from the skillset
4. Maps enriched fields to the search index
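The two mapping lists play different roles: `field_mappings` copy fields from the source document (such as `metadata_storage_path`), while `output_field_mappings` read nodes from the enrichment tree (such as `/document/wordCount`) after the skills have run. The sketch below models the output-mapping lookup with plain dictionaries. It is a simplified mental model, not how the simulator or the Azure service implements it.

```python
from typing import Any, Dict, List, Optional, Tuple

def resolve_path(tree: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a '/document/a/b' style path through nested dicts; None if any node is missing."""
    node: Any = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def apply_output_mappings(tree: Dict[str, Any],
                          mappings: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Copy enrichment-tree values into a flat index document, skipping missing nodes."""
    doc: Dict[str, Any] = {}
    for source_path, target_field in mappings:
        value = resolve_path(tree, source_path)
        if value is not None:
            doc[target_field] = value
    return doc

enriched = {"document": {"content": "...", "wordCount": 31, "sentimentScore": 0.75}}
mappings = [("/document/wordCount", "wordCount"),
            ("/document/sentimentScore", "sentimentScore"),
            ("/document/summary", "summary")]  # no summary node, so the field is left unset
print(apply_output_mappings(enriched, mappings))  # {'wordCount': 31, 'sentimentScore': 0.75}
```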

In [None]:
# Create the indexer
indexer = SearchIndexer(
    name=INDEXER_NAME,
    description="Indexer for PDF documents with custom skill enrichment",
    data_source_name=DATA_SOURCE_NAME,
    target_index_name=INDEX_NAME,
    skillset_name=SKILLSET_NAME,
    
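    # Field mappings: source data -> index fields
    # NOTE: The real Azure service only allows letters, digits, '_', '-' and '=' in document keys,
    # so a storage path is usually base64-encoded (base64Encode mapping function) before it is
    # used as the key. Here the raw path is mapped to "id" for the simulator.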
    field_mappings=[
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="id"),
        FieldMapping(source_field_name="metadata_storage_name", target_field_name="metadata_storage_name"),
        FieldMapping(source_field_name="metadata_storage_path", target_field_name="metadata_storage_path")
    ],
    
    # Output field mappings: skill outputs -> index fields
    output_field_mappings=[
        FieldMapping(source_field_name="/document/wordCount", target_field_name="wordCount"),
        FieldMapping(source_field_name="/document/sentenceCount", target_field_name="sentenceCount"),
        FieldMapping(source_field_name="/document/keywords", target_field_name="keywords"),
        FieldMapping(source_field_name="/document/sentiment", target_field_name="sentiment"),
        FieldMapping(source_field_name="/document/sentimentScore", target_field_name="sentimentScore"),
        FieldMapping(source_field_name="/document/summary", target_field_name="summary")
    ],
    
    # Indexing parameters
    parameters=IndexingParameters(
        configuration=IndexingParametersConfiguration(
            parsing_mode="default",  # Use default for PDF cracking
            data_to_extract="contentAndMetadata"
        )
    )
)

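# Delete existing indexer if it exists
try:
    indexer_client.delete_indexer(INDEXER_NAME)
    print(f"🗑️ Deleted existing indexer '{INDEXER_NAME}'")
except ResourceNotFoundError:
    print(f"ℹ️ Indexer '{INDEXER_NAME}' does not exist, will create new")
except Exception as e:
    print(f"⚠️ Could not delete indexer: {e}")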

# Create the indexer
result = indexer_client.create_indexer(indexer)
print(f"✅ Created indexer '{result.name}'")
print(f"   Data Source: {result.data_source_name}")
print(f"   Target Index: {result.target_index_name}")
print(f"   Skillset: {result.skillset_name}")
print(f"   Field Mappings: {len(result.field_mappings)}")
print(f"   Output Field Mappings: {len(result.output_field_mappings)}")

## 8. Run Indexer and Monitor Status

Run the indexer, then poll its status every 2 seconds (up to 10 times) until it reports a terminal state such as `success` or `transientFailure`.
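The cell below polls a fixed number of times. A reusable variant polls until a terminal status or a deadline is reached. The sketch takes the status-fetching function as a parameter, so it can wrap `indexer_client.get_indexer_status(...)`, but it works on plain strings and runs without the SDK. `wait_for_status` is a hypothetical helper; the terminal states mirror the ones the cell below checks.

```python
import time
from typing import Callable, Optional

# Same terminal states as the monitoring cell; a tuple keeps membership checks equality-based
TERMINAL_STATES = ("success", "transientFailure", "reset")

def wait_for_status(get_status: Callable[[], Optional[str]],
                    timeout_s: float = 60.0,
                    interval_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll get_status() until it returns a terminal state or timeout_s elapses.

    Returns the last status seen (None if no status was ever reported).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES or time.monotonic() >= deadline:
            return status
        sleep(interval_s)

# Possible SDK usage (not executed here):
#   wait_for_status(lambda: getattr(
#       indexer_client.get_indexer_status(INDEXER_NAME).last_result, "status", None))

# Offline demo: a fake status source that finishes on the third poll
fake = iter([None, "inProgress", "success"])
print(wait_for_status(lambda: next(fake), sleep=lambda s: None))  # success
```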

In [None]:
import time

# First verify the indexer exists
try:
    indexer_client.get_indexer(INDEXER_NAME)
    print(f"✅ Found indexer '{INDEXER_NAME}'")
except Exception as e:
    print(f"❌ Indexer '{INDEXER_NAME}' not found. Please run the 'Create Indexer' cell first.")
    print(f"   Error: {e}")
    raise

# Run the indexer
print("🚀 Running indexer...")
indexer_client.run_indexer(INDEXER_NAME)

# Wait and check status
last_result = None
for i in range(10):
    time.sleep(2)
    status = indexer_client.get_indexer_status(INDEXER_NAME)
    last_result = status.last_result
    
    if last_result:
        print(f"   Status: {last_result.status}")
        if last_result.status in ["success", "transientFailure", "reset"]:
            break
        if last_result.status == "inProgress":
            # Azure SDK uses item_count, not items_processed
            print(f"   Items processed: {last_result.item_count or 0}")
    else:
        print(f"   Waiting for indexer to start... ({i+1}/10)")

# Display final status
if last_result:
    print(f"\n📊 Indexer Execution Results:")
    print(f"   Status: {last_result.status}")
    print(f"   Start Time: {last_result.start_time}")
    print(f"   End Time: {last_result.end_time}")
    # Azure SDK uses item_count and failed_item_count
    print(f"   Items Processed: {last_result.item_count}")
    print(f"   Items Failed: {last_result.failed_item_count}")

    if last_result.errors:
        print(f"\n⚠️ Errors:")
        for error in last_result.errors[:5]:
            # SearchIndexerError exposes its text as error_message
            print(f"   • {error.error_message}")

## 9. Alternative: Push Documents Directly (Without Indexer)

If the indexer isn't available or you prefer the **push model**, you can upload documents directly to the index.
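Three documents fit in a single call, but a larger corpus should be split into batches: the Azure service accepts at most 1,000 documents per indexing request and also limits the request payload size. Whether the simulator enforces the same limits is not stated here, so the batch size below is an assumption to tune. `batched_documents` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batched_documents(docs: Iterable[dict], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size documents, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(docs)
    while batch := list(islice(it, batch_size)):
        yield batch

# Possible SDK usage (not executed here):
#   for batch in batched_documents(sample_documents, batch_size=500):
#       search_client.upload_documents(documents=batch)

docs = [{"id": f"doc-{i}"} for i in range(2500)]
print([len(b) for b in batched_documents(docs)])  # [1000, 1000, 500]
```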

In [117]:
# Create a search client for document operations (using official SDK)
search_client = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name=INDEX_NAME,
    credential=admin_credential,
    transport=transport,
    connection_verify=False
)

# Sample documents to upload (simulating enriched content)
sample_documents = [
    {
        "id": "doc-handbook",
        "title": "Employee Handbook",
        "content": "Welcome to our company! This handbook covers company policies, employee benefits, vacation policies, and workplace guidelines. All employees are expected to follow these guidelines. Our company values integrity, teamwork, and innovation.",
        "category": "HR",
        "department": "Human Resources",
        "metadata_storage_path": "/documents/employee-handbook.pdf",
        "metadata_storage_name": "employee-handbook.pdf",
        "wordCount": 42,
        "sentenceCount": 4,
        "keywords": ["company", "policies", "employee", "benefits", "handbook"],
        "sentiment": "positive",
        "sentimentScore": 0.75,
        "summary": "This handbook covers company policies, employee benefits, vacation policies, and workplace guidelines."
    },
    {
        "id": "doc-benefits",
        "title": "Health Plan Benefits",
        "content": "Our health plan provides comprehensive medical coverage including dental and vision. Employees can choose from multiple plan options. Coverage begins on your first day of employment. Family coverage is also available at competitive rates.",
        "category": "Benefits",
        "department": "Human Resources",
        "metadata_storage_path": "/documents/health-plan.pdf",
        "metadata_storage_name": "health-plan.pdf",
        "wordCount": 38,
        "sentenceCount": 4,
        "keywords": ["health", "plan", "coverage", "medical", "dental", "vision"],
        "sentiment": "positive",
        "sentimentScore": 0.82,
        "summary": "Our health plan provides comprehensive medical coverage including dental and vision."
    },
    {
        "id": "doc-product-manual",
        "title": "Product User Manual",
        "content": "This manual provides instructions for using our product safely and effectively. Please read all warnings before operating the device. The product comes with a one-year warranty. Contact support for any technical issues.",
        "category": "Documentation",
        "department": "Product",
        "metadata_storage_path": "/documents/product-manual.pdf",
        "metadata_storage_name": "product-manual.pdf",
        "wordCount": 35,
        "sentenceCount": 4,
        "keywords": ["manual", "product", "instructions", "warranty", "support"],
        "sentiment": "neutral",
        "sentimentScore": 0.55,
        "summary": "This manual provides instructions for using our product safely and effectively."
    }
]

# Upload documents using official SDK
result = search_client.upload_documents(documents=sample_documents)
print(f"✅ Uploaded {len(sample_documents)} documents to index '{INDEX_NAME}'")

# Display upload results
for r in result:
    status = "✓" if r.succeeded else "✗"
    print(f"   {status} {r.key}: {r.status_code}")

✅ Uploaded 3 documents to index 'pdf-documents'
   ✓ doc-handbook: 200
   ✓ doc-benefits: 200
   ✓ doc-product-manual: 200


## 10. Test Search Queries

Now let's search the index with various query types!

In [None]:
# Helper function to display search results
def display_results(results, query_description):
    """Display search results in a formatted table."""
    print(f"\n🔍 {query_description}")
    print("=" * 60)
    
    docs = []
    for result in results:
        docs.append({
            "Score": f"{result['@search.score']:.4f}" if '@search.score' in result else "N/A",
            "Title": result.get("title", "N/A"),
            "Category": result.get("category", "N/A"),
            "Sentiment": result.get("sentiment", "N/A"),
            "Words": result.get("wordCount", "N/A")
        })
    
    if docs:
        df = pd.DataFrame(docs)
        display(df)
        print(f"\n📄 Found {len(docs)} documents")
    else:
        print("No results found.")
    
    return docs

### 10.1 Simple Text Search

Search for documents containing specific keywords.

In [None]:
# Simple text search
results = search_client.search(
    search_text="employee benefits",
    include_total_count=True
)

display_results(results, "Search: 'employee benefits'")

### 10.2 Filtered Search

Search with OData filter expressions.

In [None]:
# Filtered search - find HR documents with positive sentiment
results = search_client.search(
    search_text="*",
    filter="department eq 'Human Resources' and sentiment eq 'positive'",
    include_total_count=True
)

display_results(results, "Filter: department='Human Resources' AND sentiment='positive'")

### 10.3 Faceted Search

Get facet counts for categories and sentiments.
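Conceptually, a facet is a per-value document count over the matching results; for a collection field such as `keywords`, each document counts once for every distinct value it contains. The sketch reproduces that count locally with `collections.Counter` purely as an illustration. The service computes facets server-side and, by default, returns only a limited number of top values per facet.

```python
from collections import Counter

def facet_counts(docs: list, field: str) -> Counter:
    """Count documents per value; collection fields count each distinct value once per doc."""
    counts: Counter = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        values = set(value) if isinstance(value, list) else {value}
        counts.update(values)
    return counts

docs = [
    {"category": "HR", "keywords": ["company", "policies", "policies"]},
    {"category": "Benefits", "keywords": ["health", "plan"]},
    {"category": "Benefits", "keywords": ["plan"]},
]
print(facet_counts(docs, "category").most_common())  # [('Benefits', 2), ('HR', 1)]
print(facet_counts(docs, "keywords")["plan"])         # 2
```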

In [None]:
# Faceted search
results = search_client.search(
    search_text="*",
    facets=["category", "department", "sentiment"],
    include_total_count=True
)

# Convert to list to get facets
results_list = list(results)

print("📊 Facet Results")
print("=" * 60)

# Display facets
facets = results.get_facets()
if facets:
    for facet_name, facet_values in facets.items():
        print(f"\n📌 {facet_name}:")
        for fv in facet_values:
            print(f"   • {fv['value']}: {fv['count']} documents")
else:
    print("No facets returned")

### 10.4 Search with Sorting

Sort results by sentiment score (descending).

In [None]:
# Search with sorting by sentiment score
results = search_client.search(
    search_text="*",
    order_by=["sentimentScore desc"],
    select=["title", "category", "sentiment", "sentimentScore", "wordCount"],
    include_total_count=True
)

print("🔍 All Documents Sorted by Sentiment Score (Highest First)")
print("=" * 60)

docs = []
for result in results:
    docs.append({
        "Title": result.get("title"),
        "Category": result.get("category"),
        "Sentiment": result.get("sentiment"),
        "Score": f"{result.get('sentimentScore', 0):.2f}",
        "Words": result.get("wordCount")
    })

df = pd.DataFrame(docs)
display(df)

### 10.5 Search with Highlighting

Get search results with hit highlighting to show matching terms.

In [None]:
# Search with highlighting
results = search_client.search(
    search_text="coverage health",
    highlight_fields="content,summary",
    highlight_pre_tag="<mark>",
    highlight_post_tag="</mark>",
    include_total_count=True
)

print("🔍 Search: 'coverage health' with Highlighting")
print("=" * 60)

for result in results:
    print(f"\n📄 {result.get('title')}")
    print(f"   Score: {result.get('@search.score', 'N/A')}")
    
    # Display highlights if available
    highlights = result.get("@search.highlights", {})
    if highlights:
        for field, snippets in highlights.items():
            print(f"   {field} highlights:")
            for snippet in snippets[:2]:
                # Convert HTML marks to console-friendly format
                display_snippet = snippet.replace("<mark>", "**").replace("</mark>", "**")
                print(f"      • ...{display_snippet}...")
    else:
        # Show content preview
        content = result.get("content", "")[:150]
        print(f"   Preview: {content}...")

## 11. View Document Details

Retrieve a specific document by its key.
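On the wire, a lookup by key is a GET on `/indexes/{index}/docs/{key}`. Keys can contain characters that need URL encoding, so the sketch below percent-encodes both the index name and the key. The `api-version` value is an assumption, and `lookup_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

def lookup_url(endpoint: str, index: str, key: str, api_version: str = "2024-07-01") -> str:
    """Build the REST URL for fetching one document by key (the request is not sent)."""
    base = endpoint.rstrip("/")
    path = f"/indexes/{quote(index, safe='')}/docs/{quote(key, safe='')}"
    return base + path + "?" + urlencode({"api-version": api_version})

print(lookup_url("https://localhost:7250", "pdf-documents", "doc-handbook"))
# https://localhost:7250/indexes/pdf-documents/docs/doc-handbook?api-version=2024-07-01
```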

In [None]:
# Get a specific document by key
doc_id = "doc-handbook"
document = search_client.get_document(key=doc_id)

print(f"📄 Document: {doc_id}")
print("=" * 60)

# Print each field on its own line, truncating the long content field
for key, value in document.items():
    if key == "content":
        print(f"  {key}: {str(value)[:100]}...")
    else:
        print(f"  {key}: {value}")

## 12. Cleanup (Optional)

Delete the resources created in this notebook.
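Deleting the indexer first, as the commented code below does, is a sensible order because the indexer references the skillset, data source, and index. The sketch encodes that order with injected delete callables and skips resources that are already gone. `cleanup` and `NotFound` are hypothetical stand-ins; with the SDK you would pass the client methods and catch `ResourceNotFoundError` instead.

```python
from typing import Callable, Iterable, List, Tuple

class NotFound(Exception):
    """Stand-in for azure.core.exceptions.ResourceNotFoundError in this offline sketch."""

def cleanup(steps: Iterable[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run delete steps in order, skipping resources that no longer exist. Returns a log."""
    log = []
    for label, delete in steps:
        try:
            delete()
            log.append(f"deleted {label}")
        except NotFound:
            log.append(f"{label} not found (skipped)")
    return log

def missing() -> None:
    raise NotFound()

# Dependents first: indexer -> skillset -> data source -> index
steps = [
    ("indexer", lambda: None),
    ("skillset", missing),
    ("data source", lambda: None),
    ("index", lambda: None),
]
print(cleanup(steps))
# ['deleted indexer', 'skillset not found (skipped)', 'deleted data source', 'deleted index']
```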

In [None]:
# Uncomment to delete all resources created in this notebook

# Delete indexer
# try:
#     indexer_client.delete_indexer(INDEXER_NAME)
#     print(f"🗑️ Deleted indexer '{INDEXER_NAME}'")
# except Exception as e:
#     print(f"⚠️ Could not delete indexer: {e}")

# Delete skillset
# try:
#     indexer_client.delete_skillset(SKILLSET_NAME)
#     print(f"🗑️ Deleted skillset '{SKILLSET_NAME}'")
# except Exception as e:
#     print(f"⚠️ Could not delete skillset: {e}")

# Delete data source
# try:
#     indexer_client.delete_data_source_connection(DATA_SOURCE_NAME)
#     print(f"🗑️ Deleted data source '{DATA_SOURCE_NAME}'")
# except Exception as e:
#     print(f"⚠️ Could not delete data source: {e}")

# Delete index
# try:
#     index_client.delete_index(INDEX_NAME)
#     print(f"🗑️ Deleted index '{INDEX_NAME}'")
# except Exception as e:
#     print(f"⚠️ Could not delete index: {e}")

print("💡 Uncomment the code above to clean up resources")

## Summary

This notebook demonstrated:

✅ **Azure AI Search Python SDK** usage with the local simulator  
✅ **Index creation** with collection fields, filterable/facetable/sortable attributes, and a language analyzer  
✅ **Data source configuration** for the local file system  
✅ **Custom Web API Skills** integration for document enrichment  
✅ **Indexer setup** with field mappings and output field mappings  
✅ **Document upload** using the push model  
✅ **Search queries**: simple, filtered, faceted, sorted, highlighted  

## Next Steps

- Try the **pull model** with actual PDF documents
- Add **vector search** with Azure OpenAI embeddings
- Explore **hybrid search** (text + vector)
- Build a custom skill for your specific use case

## Resources

- [Azure AI Search Documentation](https://learn.microsoft.com/azure/search/)
- [Azure AI Search Python SDK](https://learn.microsoft.com/python/api/overview/azure/search-documents-readme)
- [Custom Skills Sample](../CustomSkillSample/README.md)
- [Simulator Limitations](../../docs/LIMITATIONS.md)