## **Building Your Coded Policy Knowledge Search Store 🚀**

In many healthcare systems, policy documents such as pre-authorization guidelines are still trapped in static, scanned PDFs. These documents are critical: they contain ICD codes, drug name coverage, and payer-specific logic, yet they are rarely structured or accessible in real time. This use case demonstrates how to transform these documents into intelligent, searchable knowledge stores using Azure AI Search.

## **📝 What You’ll Build**

This use case focuses on creating a **Coded Policy Knowledge Store** that enables real-time querying and retrieval of critical policy information. By the end of this tutorial, you will have built a pipeline that:

<img src="..\utils\images\Turning Policies into Knowledge Store.png" style="display: block; margin: 20px auto; border-radius: 15px; max-width: 80%; height: auto;" alt="Turning Policies into Knowledge Store" />

1. **Preprocesses Policy Documents**:
   - Extracts text and metadata from scanned PDFs using OCR (Optical Character Recognition).
   - Cleans and organizes the extracted data for indexing.

2. **Extracts Key Metadata**:
   - Identifies and structures critical fields such as:
     - **Policy Name**
     - **Payer Name**
     - **Drug Names**
     - **Medical Specialties**
     - **Covered Diseases**
     - **ICD Codes**

3. **Indexes Data into Azure AI Search**:
   - Converts the structured data into a machine-readable format.
   - Indexes the data into Azure AI Search for fast and intelligent retrieval.

4. **Enables Real-Time AI-Assisted Querying**:
   - Allows users to query the knowledge store for:
     - ICD codes
     - Payer-specific logic
     - Drug name coverage
     - Policy details

## **💡 Why It Matters**

- **Streamlines Workflows**:  
   Automates the extraction and structuring of policy data, reducing manual effort for providers and payers.

- **Improves Transparency**:  
   Makes policy details easily accessible, enabling faster decision-making and reducing errors.

- **Enhances Querying Capabilities**:  
   Provides AI-assisted querying for ICD codes, drug names, and payer-specific logic, improving efficiency and accuracy.


## **📚 Our Approach**

We will walk through the following key steps:

1. **Create an Index:** Define the index schema, the vector search configuration, and the semantic configuration in Azure AI Search.
2. **Context Engineering:** Apply context engineering before indexing; in this case, that means extracting metadata such as ICD codes from each policy.
3. **Indexing Patterns:** Compare different indexing patterns: the push API or an indexer.
4. **Retrieval Strategies:** Compare different retrieval strategies, especially ones that take advantage of the coded policy metadata to maximize retrieval confidence in any agent system.


## **1. Creating the Index**

In [3]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient
from azure.search.documents.indexes.models import (
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    HnswAlgorithmConfiguration,
    HnswParameters,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
    VectorSearch,
    VectorSearchProfile,
)

from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Define the target directory
target_directory = os.getcwd()  # Get the current working directory

# Move one directory back
parent_directory = os.path.dirname(target_directory)

# Check if the parent directory exists
if os.path.exists(parent_directory):
    # Change the current working directory to the parent directory
    os.chdir(parent_directory)
    print(f"Directory changed to {os.getcwd()}")
else:
    print(f"Parent directory {parent_directory} does not exist.")

Directory changed to c:\Users\pablosal\Desktop\aihlsignited-medindexer


In [4]:
# Read the service endpoint and admin key from the environment,
# then create an SDK client for managing indexes
endpoint = os.environ["AZURE_AI_SEARCH_SERVICE_ENDPOINT"]

# SearchIndexClient is scoped to the search service, not to a single index;
# the index name (e.g. ai-policies-v2-index) is set on the SearchIndex object below
admin_documents_index_client = SearchIndexClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_ADMIN_KEY"]),
)

In [5]:
# Define the schema for the Azure AI Search index used in the Policy Knowledge Store
fields = [
    # Unique identifier for the parent document (e.g., the full policy file)
    SearchField(
        name="parent_id",
        type=SearchFieldDataType.String,
        sortable=True,
        filterable=True,
        facetable=True,
    ),

    # Path to the original policy file in Azure Blob Storage
    SearchField(
        name="parent_path",
        type=SearchFieldDataType.String,
        filterable=True,
        facetable=False,
    ),

    # Title or name of the policy document
    SearchField(
        name="policy_name",
        type=SearchFieldDataType.String,
        searchable=True,
        filterable=True,
        facetable=True,
    ),

    # Name of the payer organization (e.g., Aetna, Cigna)
    SearchField(
        name="payer_name",
        type=SearchFieldDataType.String,
        searchable=True,
        filterable=True,
        facetable=True,
    ),

    # List of drug names mentioned in the policy
    SearchField(
        name="drug_names",
        type=SearchFieldDataType.Collection(SearchFieldDataType.String),
        searchable=True,
        filterable=True,
        facetable=True,
    ),

    # Medical specialties the policy applies to (e.g., oncology, nephrology)
    SearchField(
        name="medical_specialties",
        type=SearchFieldDataType.Collection(SearchFieldDataType.String),
        searchable=True,
        filterable=True,
        facetable=True,
    ),

    # Diseases or conditions covered in the policy
    SearchField(
        name="covered_diseases",
        type=SearchFieldDataType.Collection(SearchFieldDataType.String),
        searchable=True,
        filterable=True,
        facetable=True,
    ),

    # ICD codes corresponding to the diseases
    SearchField(
        name="covered_diseases_icd_codes",
        type=SearchFieldDataType.Collection(SearchFieldDataType.String),
        searchable=True,
        filterable=True,
        facetable=True,
    ),

    # Drug codes referenced in the policy (e.g., NDC, RxNorm)
    SearchField(
        name="covered_drug_codes",
        type=SearchFieldDataType.Collection(SearchFieldDataType.String),
        searchable=True,
        filterable=True,
        facetable=True,
    ),

    # Unique identifier for each chunk of the policy
    SearchField(
        name="chunk_id",
        type=SearchFieldDataType.String,
        key=True,
        sortable=True,
        filterable=True,
        facetable=True,
        analyzer_name="keyword",
    ),

    # Chunked content from the policy document (e.g., sections or paragraphs)
    SearchField(
        name="chunk",
        type=SearchFieldDataType.String,
        searchable=True,
        sortable=False,
        filterable=False,
        facetable=False,
    ),

    # Vector embedding of the chunk, used for vector (similarity) search
    SearchField(
        name="vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        vector_search_dimensions=3072,  # Depends on the embedding model; the large embedding model used here has 3072 dimensions
        vector_search_profile_name="myHnswProfile",
    ),
]


In [6]:
# Configure the vector search for policy chunk indexing and retrieval
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="myHnsw",  # Name of the HNSW algorithm configuration
            parameters=HnswParameters(
                m=5,  # Number of bi-directional links per element
                ef_construction=300,  # Size of the dynamic candidate list during index construction
                ef_search=400,  # Size of the dynamic candidate list during querying
            ),
        ),
    ],
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",  # Profile name referenced in the index's vector fields
            algorithm_configuration_name="myHnsw",  # Links to the HNSW algorithm configuration
            vectorizer_name="myOpenAIVectorizer",  # Associates with the defined vectorizer
        )
    ],
    vectorizers=[
        AzureOpenAIVectorizer(
            vectorizer_name="myOpenAIVectorizer",  # Name of the vectorizer
            parameters=AzureOpenAIVectorizerParameters(
                resource_url=os.environ['AZURE_OPENAI_ENDPOINT'],  # Azure OpenAI resource endpoint
                deployment_name=os.environ['AZURE_OPENAI_EMBEDDING_DEPLOYMENT'],  # Deployment name of the embedding model
                model_name=os.environ['AZURE_OPENAI_EMBEDDING_DEPLOYMENT'],  # Model name; reuses the deployment variable, so this assumes the deployment is named after the model
                api_key=os.environ['AZURE_OPENAI_KEY'],  # API key for authentication
            ),
        ),
    ],
)


In [7]:
# Configure semantic search for the policy knowledge store index
semantic_config_policy_index = SemanticConfiguration(
    name="policy-index-semantic-config",  # Give a descriptive name aligned with your index
    prioritized_fields=SemanticPrioritizedFields(
        # Use policy_name as the semantic title for better highlights in results
        title_field=SemanticField(field_name="policy_name"),

        # Use payer_name, medical_specialties, and drug_names as keyword hints
        keywords_fields=[
            SemanticField(field_name="payer_name"),
            SemanticField(field_name="medical_specialties"),
            SemanticField(field_name="drug_names"),
        ],

        # Chunk is your primary content for retrieval and semantic relevance
        content_fields=[
            SemanticField(field_name="chunk"),
        ],
    )
)

# Wrap into the semantic search settings
semantic_search_policy = SemanticSearch(
    configurations=[semantic_config_policy_index]
)


In [8]:
index = SearchIndex(
    name=os.environ["AZURE_SEARCH_INDEX_NAME"],
    fields=fields,
    vector_search=vector_search,
    semantic_search=semantic_search_policy,
)

try:
    result = admin_documents_index_client.create_or_update_index(index)
    print("Index", result.name, "created")
except Exception as ex:
    print("Error creating index:", ex)


Error creating index: <urllib3.connection.HTTPSConnection object at 0x000002179C0BD4E0>: Failed to resolve 'search-ai-factory-centralus.search.windows.net' ([Errno 11002] getaddrinfo failed)


## **2. Context Engineering**

Before indexing, extract the following metadata from each policy document:

- **Policy Name**: Extracted from document title or headers (e.g., "Antiseizure Medications – Epidiolex Prior Authorization Policy").
- **Payer Name**: Identifies which insurance company issued the policy (e.g., Cigna, UnitedHealthcare).
- **Drug Name(s)**: The medications referenced in the policy (e.g., Epidiolex, Dupixent).
- **Medical Specialties Involved**: Identifies if a policy is specific to Rheumatology, Neurology, Oncology, etc.
- **Indications & Diseases Covered**: Disease categories linked to a policy (e.g., Crohn’s disease, epilepsy).

In [10]:
from src.documentintelligence.document_intelligence_helper import AzureDocumentIntelligenceManager

text_extractor = AzureDocumentIntelligenceManager()

policy_raw_text_markdown = text_extractor.analyze_document(document_input="https://storageaeastusfactory.blob.core.windows.net/pre-auth-policies/policies_ocr/001.pdf", 
                                model_type="prebuilt-layout")

2025-04-02 22:45:10,751 - micro - MainProcess - INFO     Container 'pre-auth-policies' already exists. (blob_helper.py:_create_container_if_not_exists:89)
2025-04-02 22:45:10,754 - micro - MainProcess - INFO     Blob URL detected. Extracting content. (document_intelligence_helper.py:analyze_document:78)
2025-04-02 22:45:24,194 - micro - MainProcess - INFO     Downloaded blob 'policies_ocr/001.pdf' as bytes. (blob_helper.py:download_blob_to_bytes:311)


In [9]:
import json
from typing import List, Dict, Optional
from pydantic import BaseModel
import openai
import os
import time
import requests
from azure.core.credentials import AzureKeyCredential
from utils.ml_logging import get_logger

logger = get_logger()

# --------------------------------------------------------------------------
# Configure the Azure OpenAI client using key-based authentication.
# --------------------------------------------------------------------------
client = openai.AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint="https://pablo-m2areked-westeurope.cognitiveservices.azure.com",
    azure_deployment="gpt-4o-structured-outputs",
    api_key=os.getenv("AZURE_OPENAI_KEY_StructuredOutputs"),  # Ensure this env var is set.
)

model_name = "gpt-4o-structured-outputs"

In [None]:
# ----------------- DEFINE POLICY METADATA MODEL -----------------
class PolicyMetadata(BaseModel):
    policy_name: str
    payer_name: str
    drug_names: List[str]
    medical_specialties: List[str]
    indications_diseases: List[str]
    covered_diseases_icd_codes: Optional[List[str]]
    covered_drug_codes: Optional[List[str]]

    class Config:
        extra = "forbid"

# ----------------- ICD-10 LOOKUP FUNCTION -----------------
def lookup_icd_codes_for_disease(disease: str, max_retries: int = 2) -> List[str]:
    """Fetches up to three ICD-10 codes for a given disease with retries on failure."""

    if not isinstance(disease, str) or not disease.strip():
        logger.error("Invalid disease parameter provided.")
        return []

    url = "https://clinicaltables.nlm.nih.gov/api/icd10cm/v3/search"
    params = {"sf": "code,name", "terms": disease, "maxList": 3}

    for attempt in range(max_retries + 1):
        try:
            logger.info(f"[Attempt {attempt+1}] Fetching ICD-10 codes for disease: '{disease}' with params: {params}")
            resp = requests.get(url, params=params, timeout=10)
            resp.raise_for_status()

            if 'application/json' not in resp.headers.get('Content-Type', ''):
                logger.error(f"Unexpected content type: {resp.headers.get('Content-Type')}")
                return []

            data = resp.json()
            logger.debug(f"Full ICD-10 API Response: {data}")

            # Extract only the ICD-10 codes (second element of the response).
            # Codes can be as short as three characters (e.g., I10), so keep those too.
            icd_codes = [item for item in data[1] if isinstance(item, str) and len(item) >= 3]

            if icd_codes:
                logger.info(f"ICD-10 Codes Found for '{disease}': {icd_codes}")
                return icd_codes
            else:
                logger.warning(f"No valid ICD-10 codes found for '{disease}'.")
                return []

        except requests.RequestException as e:
            logger.error(f"ICD lookup failed for '{disease}' on attempt {attempt+1}: {e}")
            if attempt < max_retries:
                logger.info(f"Retrying ICD lookup for '{disease}'...")
                time.sleep(2)  # Wait before retrying

    logger.error(f"ICD lookup ultimately failed for '{disease}' after {max_retries+1} attempts.")
    return []


# ----------------- RXNORM LOOKUP FUNCTION -----------------
def lookup_drug_details(drug: str, max_retries: int = 2) -> List[str]:
    """Fetches RxNorm IDs for TAH-validated drugs with retries on failure."""
    
    if not isinstance(drug, str) or not drug.strip():
        logger.error("Invalid drug parameter provided.")
        return []

    url = "https://rxnav.nlm.nih.gov/REST/rxcui.json"
    params = {"name": drug}

    for attempt in range(max_retries + 1):
        try:
            logger.info(f"[Attempt {attempt+1}] Fetching RxNorm ID for drug: '{drug}' with params: {params}")
            resp = requests.get(url, params=params, timeout=10)
            resp.raise_for_status()

            if 'application/json' not in resp.headers.get('Content-Type', ''):
                logger.error(f"Unexpected content type: {resp.headers.get('Content-Type')}")
                return []

            data = resp.json()
            logger.debug(f"Full RxNorm API Response: {data}")

            # Extract only the RxNorm IDs
            rxnorm_ids = data.get("idGroup", {}).get("rxnormId", [])

            if rxnorm_ids:
                logger.info(f"RxNorm IDs Found for '{drug}': {rxnorm_ids}")
                return rxnorm_ids
            else:
                logger.warning(f"No RxNorm ID found for '{drug}'.")
                return []

        except requests.RequestException as e:
            logger.error(f"Drug lookup failed for '{drug}' on attempt {attempt+1}: {e}")
            if attempt < max_retries:
                logger.info(f"Retrying RxNorm lookup for '{drug}'...")
                time.sleep(2)  # Wait before retrying

    logger.error(f"RxNorm lookup ultimately failed for '{drug}' after {max_retries+1} attempts.")
    return []


# ----------------- FINAL ENRICHMENT FUNCTION -----------------
def enrich_metadata(metadata: PolicyMetadata) -> PolicyMetadata:
    """Enrich extracted metadata with ICD-10 and RxNorm codes, only for TAH-validated terms."""
    enriched = metadata.model_copy()
    enriched.covered_diseases_icd_codes = [
        code for disease in metadata.indications_diseases
        for code in lookup_icd_codes_for_disease(disease)
    ]
    enriched.covered_drug_codes = [
        code for drug in metadata.drug_names
        for code in lookup_drug_details(drug)
    ]
    return enriched


# --------------------------------------------------------------------------
# Extraction Phase: Use Azure OpenAI to parse policy text into the above schema.
# --------------------------------------------------------------------------
# Optimized function for metadata extraction
def extract_policy_metadata(policy_text: str) -> dict:
    """Extract policy metadata with Azure OpenAI and return the enriched result as a dict."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are an advanced AI system specializing in extracting **structured metadata** from clinical policy documents. "
                "Your goal is to achieve **100% accuracy** by leveraging **Tree of Thought reasoning, multi-step validation techniques, "
                "and automated normalization of payer names, drug names, and medical conditions**.\n\n"

                "### **1️⃣ Extract Policy Name (`policy_name`)** 📄\n"
                "- **Identify the official policy title** from document **headers, footers, or the first paragraph**.\n"
                "- **Ensure Standard Formatting:** Convert extracted titles into a **consistent naming format**.\n"
                "  - **Example Input (Raw OCR Extracted Text):**\n"
                "    - 'DUPIXENT (DUPILUMAB) - PRIOR AUTHORIZATION POLICY'\n"
                "    - 'Anthem BCBS Policy 2024: Prior Authorization - Dupixent'\n"
                "  - **Expected Standard Output:**\n"
                "    - 'Dupixent Prior Authorization Policy'\n\n"

                "2️⃣ **Payer Name (payer_name) - AUTOMATIC NORMALIZATION ENABLED:**\n"
                "   - Locate in **headers, footers, disclaimers, or embedded watermarks**.\n"
                "   - Normalize using the following standardized mapping:\n"
                "     - 'Cigna', 'Cigna Healthcare', 'Cigna Corp' → 'Cigna'\n"
                "     - 'Humana', 'Humana Inc', 'Humana Health Plan' → 'Humana'\n"
                "     - 'United Healthcare', 'United Health Care', 'UHC' → 'UnitedHealthcare'\n"
                "     - 'Anthem Blue Cross Blue Shield' → 'Anthem BCBS'\n"
                "     - 'Blue Cross Blue Shield' → 'BCBS'\n"
                "     - 'Kaiser Permanente' → 'Kaiser Permanente'\n"
                "     - 'Aetna', 'Aetna Health Inc', 'Aetna Insurance' → 'Aetna'\n"
                "     - 'WellCare Health Plans' → 'WellCare'\n"
                "     - 'Medicare Advantage' → 'Medicare'\n"
                "     - 'Medicaid' → 'Medicaid'\n"
                "     - 'MVP Health Care' → 'MVP HealthCare'\n"
                "     - 'HealthFirst' → 'HealthFirst'\n"
                "     - 'Molina Healthcare' → 'Molina Healthcare'\n"
                "     - 'Centene Corporation' → 'Centene'\n"
                "     - 'Blue Shield of California' → 'Blue Shield of California'\n"
                "     - 'Empire Blue Cross Blue Shield' → 'Empire BCBS'\n"
                "     - 'Horizon Blue Cross Blue Shield' → 'Horizon BCBS'\n"

                "3️⃣ **Drug Names (drug_names) - AUTOMATIC NORMALIZATION ENABLED:**\n"
                "   - Extract **all medications mentioned in the policy**.\n"
                "   - Include **both brand and generic names**.\n"
                "   - Normalize using **RxNorm drug database standards**.\n"

                "4️⃣ **Medical Specialties (medical_specialties):**\n"
                "   - Identify relevant **clinical specialties** (e.g., Neurology, Rheumatology, Oncology).\n"
                "   - Cross-check with **medical board designations** to avoid ambiguity.\n"

                "5️⃣ **Indications & Diseases (indications_diseases) - AUTOMATIC NORMALIZATION ENABLED:**\n"
                "   - Extract **every disease, condition, or indication mentioned**.\n"
                "   - Normalize to **ICD-10 classification**.\n"
                "   - Check **eligibility, exclusions, and coverage sections** for additional conditions.\n\n"

                "### **General Guidelines:**\n"
                "- If a field is missing, return an **empty string (`""`)** for text fields or an **empty list (`[]`)** for arrays.\n"
                "- Strictly **follow the JSON schema** without adding extra keys.\n"
                "- **Cross-validate extracted information** across multiple sections to prevent errors.\n"
            )
        },
        {
            "role": "user",
            "content": (
                "Extract and normalize structured metadata from the following insurance policy document:\n\n"
                f"{policy_text}\n\n"
                "### **Output Schema (Strict JSON Format):**\n"
                "{\n"
                '  "policy_name": "The official title of the prior authorization policy.",\n'
                '  "payer_name": "The name of the insurance company, automatically normalized.",\n'
                '  "drug_names": ["List of all referenced drugs, including brand and generic, automatically normalized."],\n'
                '  "medical_specialties": ["List of relevant clinical specialties (e.g., Neurology, Oncology)."],\n'
                '  "indications_diseases": ["List of all conditions covered under this policy, normalized to ICD-10 standards."]\n'
                "}"
            )
        }
    ]

    try:
        response = client.beta.chat.completions.parse(
            model=model_name,
            messages=messages,
            response_format=PolicyMetadata
        )
        metadata = response.choices[0].message.parsed

        # Enrich metadata with ICD-10 and RxNorm codes
        enriched_metadata = enrich_metadata(metadata)
        enriched = enriched_metadata.model_dump()

        return enriched
    
    except Exception as e:
        logger.error("Error during extraction: %s", e)
        raise

In [None]:
# Store the result under a new name so the function itself is not overwritten
policy_metadata = extract_policy_metadata(policy_raw_text_markdown.content)
policy_metadata

## **3. Indexing Patterns**
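
With the push pattern, each chunk becomes one index document that repeats the policy-level metadata. The sketch below is one possible way to assemble those documents from the extracted metadata dictionary. The function name `build_chunk_documents`, the `chunk_id` format, and the assumption that `parent_id` contains only key-safe characters are illustrative and not part of the original pipeline.

```python
from typing import Any, Dict, List


def build_chunk_documents(
    parent_id: str,
    parent_path: str,
    metadata: Dict[str, Any],
    chunks: List[str],
    vectors: List[List[float]],
) -> List[Dict[str, Any]]:
    """Build one index document per chunk, repeating the policy-level metadata."""
    if len(chunks) != len(vectors):
        raise ValueError("Each chunk needs exactly one embedding vector.")

    documents = []
    for position, (chunk, vector) in enumerate(zip(chunks, vectors)):
        documents.append(
            {
                # Hypothetical key format; assumes parent_id is key-safe.
                "chunk_id": f"{parent_id}_chunk_{position}",
                "parent_id": parent_id,
                "parent_path": parent_path,
                "policy_name": metadata.get("policy_name", ""),
                "payer_name": metadata.get("payer_name", ""),
                "drug_names": metadata.get("drug_names") or [],
                "medical_specialties": metadata.get("medical_specialties") or [],
                # The extraction model names this field indications_diseases;
                # the index schema names it covered_diseases.
                "covered_diseases": metadata.get("indications_diseases") or [],
                "covered_diseases_icd_codes": metadata.get("covered_diseases_icd_codes") or [],
                "covered_drug_codes": metadata.get("covered_drug_codes") or [],
                "chunk": chunk,
                "vector": vector,
            }
        )
    return documents
```

The resulting list could then be sent to the index with `SearchClient.upload_documents(documents=...)`. An indexer-based pattern would instead pull the files from Blob Storage on a schedule.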

## **4. Retrieval Strategies**

In [None]:
from azure.search.documents import SearchClient
from azure.search.documents.models import (
    QueryAnswerType,
    QueryCaptionType,
    QueryType,
    VectorizedQuery,
)

from src.aoai.aoai_helper import AzureOpenAIManager

# Set up Azure AI Search credentials
service_endpoint = os.getenv("AZURE_AI_SEARCH_SERVICE_ENDPOINT")
key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
credential = AzureKeyCredential(key)

# Name of the index where the policy chunks are stored
index_name = os.getenv("AZURE_SEARCH_INDEX_NAME")

# Create a search client bound to that index
search_client = SearchClient(service_endpoint, index_name, credential=credential)

# Create the Azure OpenAI helper and embed the query for vector search
aoai_client = AzureOpenAIManager()

search_query = "What is the prior authorization policy for Inflammatory Conditions?"
search_vector = aoai_client.generate_embedding(search_query)


The hybrid query below combines keyword and vector search, then applies semantic ranking. Semantic ranking uses machine learning models to score how well each document matches the meaning of the query. The score is returned in the `@search.rerankerScore` property (exposed as `@search.reranker_score` in the Python SDK) and ranges from 0.00 to 4.00.

A higher score indicates higher relevance of the document to the query.
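
Because the reranker score has a fixed range, an agent can use it as a simple confidence gate. The sketch below keeps only results at or above a threshold; the helper name `filter_by_reranker_score` and the default threshold of 2.0 are assumptions chosen for illustration, not values from the original notebook, and a suitable threshold would need to be tuned on real queries.

```python
from typing import Any, Dict, Iterable, List


def filter_by_reranker_score(
    results: Iterable[Dict[str, Any]], min_score: float = 2.0
) -> List[Dict[str, Any]]:
    """Keep results whose semantic reranker score is at least min_score."""
    kept = []
    for doc in results:
        score = doc.get("@search.reranker_score")
        # The score is missing or None when semantic ranking was not applied.
        if score is not None and score >= min_score:
            kept.append(doc)
    return kept
```

Passing the iterator returned by `search_client.search(...)` to this helper would return only the chunks the reranker considers sufficiently relevant.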

In [None]:
# Hybrid retrieval + rerank
r = search_client.search(
    search_text=search_query,
    top=5,
    vector_queries=[
        VectorizedQuery(vector=search_vector.data[0].embedding, exhaustive=True, k_nearest_neighbors=50, fields="vector", weight=0.5),
    ],
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name="policy-index-semantic-config",
    query_language="en-us",
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
)

# Iterate through the search results and print the scores with a content preview
for doc in r:
    content = doc["chunk"].replace("\n", " ")[:1000]
    print(
        f"score: {doc['@search.score']}, reranker: {doc['@search.reranker_score']}. {content}"
    )

### **Using Filters**

The filter expression in the search query is used to narrow down the results based on specific conditions:

1. **`payer_name eq 'Cigna'`**  
   - This condition filters documents where the `payer_name` field is exactly `'Cigna'`.  
   - It ensures that only documents related to the payer `'Cigna'` are included in the search results.

2. **`covered_diseases_icd_codes/any(c: c eq 'M45.6')`**  
   - This condition filters documents where the `covered_diseases_icd_codes` collection contains the value `'M45.6'`.  
   - The `any()` function is used to check if any element in the `covered_diseases_icd_codes` array matches the specified value.  
   - In this case, it ensures that only documents related to the ICD code `'M45.6'` (e.g., Ankylosing Spondylitis) are included.

3. **Combining Conditions with `and`**  
   - The `and` operator combines the two conditions, so the filter only returns documents where **both conditions are true**.  
   - This means the results will include documents where the payer is `'Cigna'` **and** the ICD code `'M45.6'` is present in the `covered_diseases_icd_codes` field.
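
The filter described above can also be assembled programmatically, which helps when the payer and ICD codes come from user input. This is a minimal sketch: the helper names `odata_quote` and `build_policy_filter` are hypothetical, and multiple ICD codes are combined with `and`, so every listed code must be present on the document.

```python
from typing import List, Optional


def odata_quote(value: str) -> str:
    """Quote a string literal for OData; single quotes are escaped by doubling."""
    return "'" + value.replace("'", "''") + "'"


def build_policy_filter(
    payer_name: Optional[str] = None,
    icd_codes: Optional[List[str]] = None,
) -> Optional[str]:
    """Build an OData filter on payer_name and covered_diseases_icd_codes."""
    clauses = []
    if payer_name:
        clauses.append(f"payer_name eq {odata_quote(payer_name)}")
    for code in icd_codes or []:
        clauses.append(
            f"covered_diseases_icd_codes/any(c: c eq {odata_quote(code)})"
        )
    # Return None when there is nothing to filter on, so the search runs unfiltered.
    return " and ".join(clauses) if clauses else None


print(build_policy_filter("Cigna", ["M45.6"]))
```

The returned string can be passed as the `filter` argument of `search_client.search(...)`.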


In [None]:
# Perform the search query with the updated filter
# Hybrid retrieval + rerank
r = search_client.search(
    search_text=search_query,
    top=5,
    vector_queries=[
        VectorizedQuery(vector=search_vector.data[0].embedding, k_nearest_neighbors=50, fields="vector", weight=0.5),
    ],
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name="policy-index-semantic-config",
    query_language="en-us",
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
    filter="payer_name eq 'Cigna' and covered_diseases_icd_codes/any(c: c eq 'M45.6')",  # Filter by payer name and ICD code
)

# Iterate through the search results and print all relevant metadata
for doc in r:
    content = doc.get("chunk", "").replace("\n", " ")[:1000]  # The index stores text in the "chunk" field; limit to 1000 characters for readability
    print(
        f"ID: {doc.get('parent_id', 'N/A')}, "
        f"Policy Name: {doc.get('policy_name', 'N/A')}, "
        f"Payer Name: {doc.get('payer_name', 'N/A')}, "
        f"Drug Names: {doc.get('drug_names', 'N/A')}, "
        f"Medical Specialties: {doc.get('medical_specialties', 'N/A')}, "
        f"Covered Diseases: {doc.get('covered_diseases', 'N/A')}, "
        f"Score: {doc.get('@search.score', 'N/A')}, "
        f"Reranker Score: {doc.get('@search.reranker_score', 'N/A')}. "
        f"Content: {content}"
    )