# Advanced Search Techniques with Azure AI Search: Keyword, Vector, and Hybrid Methods

This notebook demonstrates how to perform different types of searches using Azure AI Search, including keyword search, vector search, hybrid search, semantic ranking, and query rewriting.

## Prerequisites

Before running the notebook, ensure you have the following: 

- [Fork](https://github.com/microsoft/rag-time/fork) the repository, then clone your fork to your local machine:

    ```bash
    git clone https://github.com/your-org/rag-time.git
    cd rag-time
    ```

- An [Azure account](https://portal.azure.com) with proper permissions to access the following services:
    - An **Azure AI Search** service with an index that contains vectorized text data. Follow the instructions in the [Quickstart](https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors?tabs=sample-data-storage%2Cmodel-aoai%2Cconnect-data-storage) to index the [MSFT_cloud_architecture_contoso.pdf](MSFT_cloud_architecture_contoso.pdf) file. 
- Install Python 3.8 or later from [python.org](https://python.org).

## Steps to Use the Notebook

### 1. Install Required Libraries

Run the first code cell to install the required Python libraries:

In [1]:
%pip install azure-search-documents==11.6.0b9 azure-identity python-dotenv pandas jinja2 --quiet

Note: you may need to restart the kernel to use updated packages.


### 2. Set Up Environment Variables

To keep credentials out of the notebook, rename the `.env.sample` file to `.env` (in the same directory as the notebook) and update the following variables:

In [None]:
AZURE_SEARCH_SERVICE_ENDPOINT="<your_search_service_endpoint>"
AZURE_SEARCH_INDEX="<your_search_index_name>"
AZURE_SEARCH_ADMIN_KEY="<your_search_admin_key>"  # Leave blank if using Managed Identity

The notebook reads these values with `python-dotenv` in the next step.

### 3. Load Environment Variables

Run the following cell to load the environment variables from the `.env` file and select a credential:

In [1]:
import os
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv(override=True)  # read .env; values there override variables already set in the environment

endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
index_name = os.environ["AZURE_SEARCH_INDEX"]
admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")  # optional; empty means fall back to DefaultAzureCredential
credential = AzureKeyCredential(admin_key) if admin_key else DefaultAzureCredential()

This makes the endpoint, index name, and credential available before the API client is created.
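
Because `os.environ[...]` raises a bare `KeyError` when a variable is missing, it can help to check the configuration up front. The following is a minimal sketch, not part of the original notebook: `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names, the endpoint URL is a made-up placeholder, and the helper accepts a plain mapping so it can be tried without real credentials.

```python
import os

# Variables the notebook reads with os.environ[...]. The admin key is left out
# because the notebook falls back to DefaultAzureCredential when it is empty.
REQUIRED_SETTINGS = ("AZURE_SEARCH_SERVICE_ENDPOINT", "AZURE_SEARCH_INDEX")


def missing_settings(env=None, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


# Example with an in-memory mapping (placeholder values) instead of the real environment.
sample_env = {"AZURE_SEARCH_SERVICE_ENDPOINT": "https://example.search.windows.net"}
print(missing_settings(sample_env))  # prints ['AZURE_SEARCH_INDEX']
```

Calling `missing_settings()` with no arguments checks the real environment, so it could be run right after `load_dotenv`.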

### 4. Set Up API Client and Define the Display Function

Initialize the Azure AI Search client, then define a helper function that formats search results as a table so they are easier to read:

In [2]:
from azure.search.documents import SearchClient
import pandas as pd

search_client = SearchClient(endpoint, index_name, credential)

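def display_results(results):
    """Render search results as a styled table, truncating each chunk to 300 characters."""
    # Flatten the result dicts into columns and drop columns that are entirely empty
    df = pd.json_normalize(list(results)).dropna(axis=1, how='all')
    df["chunk"] = df["chunk"].apply(lambda c: c[:300] + '...' if len(c) > 300 else c)
    # Show title, chunk, and score first, followed by any remaining columns
    first_cols = ['title', 'chunk', '@search.score']
    df = df[first_cols + [col for col in df.columns if col not in first_cols]]

    # Wrap long text and hide the index (the pandas Styler requires jinja2)
    return df.style.set_properties(**{
        'max-width': '500px',
        'text-align': 'left',
        'white-space': 'normal',
        'word-wrap': 'break-word'
    }).hide(axis="index")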


### 5. Perform Different Search Methods

#### Keyword Search

Execute a traditional keyword-based search:

In [3]:
results = search_client.search(search_text="What is Contoso", top=5, select=["title", "chunk"])

display_results(results)


title,chunk,@search.score
MSFT_cloud_architecture_contoso.pdf,"freeing up local disk space and reducing maintenance. Both hot and cold data are in the same tables and are always available to applications and their users and for maintenance, such as backups and restores. Analyze databases Performed an analysis of the tables in the databases that they i...",2.014516
MSFT_cloud_architecture_contoso.pdf,ACLs for least privilege access Account permissions to access resources in the cloud and what they are allowed to do must follow least -privilege guidelines. Encryption for data at rest in the cloud All data stored on disks or elsewhere in the cloud must be in an encrypted form. Encryption for...,1.800901
MSFT_cloud_architecture_contoso.pdf,"and tenants for Microsoft s cloud offeringsSubscriptions, licenses, accounts, and tenants for Microsoft s cloud offerings Tenants: This topic is 5 of 7 in a series Sales.Production Admin.Production IT.Development IT.Testing IT.Production Sales.Production IT.Production Sales.Production IT....",1.784595
MSFT_cloud_architecture_contoso.pdf,"in the Paris headquarters. • The Paris campus has the datacenters that contain the centralized application servers that serve the entire organization. For users in satellite or regional hub offices, 60% of the resources needed by employees can be served by satellite and regional hub office...",1.732725
MSFT_cloud_architecture_contoso.pdf,have identified the following elements when planning for the adoption of Microsoft s cloud offerings. Microsoft Cloud Identity for Enterprise Architects Microsoft Cloud Identity for Enterprise Architects Microsoft Cloud Networking for Enterprise Architects Microsoft Cloud Networking for...,1.628655
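
Keyword scores such as the ones above come from a term-based ranking function (BM25 in Azure AI Search). The sketch below is a simplified, self-contained BM25 scorer that only illustrates how term frequency, term rarity, and document length combine into a score. The whitespace tokenizer, the `k1` and `b` values, and the sample documents are assumptions for illustration; the service's analyzers and exact scores will differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a simplified BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up sample documents, not taken from the index
sample_docs = [
    "Contoso is a fictional global organization",
    "Encryption is required for data at rest",
    "Regional offices connect to the Paris headquarters",
]
scores = bm25_scores("What is Contoso", sample_docs)
for doc, score in zip(sample_docs, scores):
    print(f"{score:.3f}  {doc}")
```

In this toy corpus, the first document matches both "is" and the rarer term "contoso", the second matches only "is", and the third matches nothing and scores zero.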


#### Vector Search

Retrieve documents using vector similarity search:

In [4]:
from azure.search.documents.models import VectorizableTextQuery

results = search_client.search(vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")], top=5, select=["title", "chunk"])

display_results(results)

title,chunk,@search.score
MSFT_cloud_architecture_contoso.pdf,How a fictional but representative global organization has implemented the Microsoft Cloud Contoso in the Microsoft Cloud This topic is 1 of 7 in a series The Contoso Corporation Contoso s worldwide organization Elements of Contoso s implementation of the Microsoft cloud Networking Net...,0.852663
MSFT_cloud_architecture_contoso.pdf,mailto:cloudadopt@microsoft.com mailto:cloudadopt@microsoft.com https://azure.microsoft.com/services/expressroute/ https://azure.microsoft.com/services/expressroute/ https://azure.microsoft.com/services/expressroute/ https://azure.microsoft.com/services/expressroute/ https://aka.ms/o365protect_devic...,0.846799
MSFT_cloud_architecture_contoso.pdf,a fictional but representative global organization has implemented the Microsoft Cloud Contoso s app infrastructure Contoso has the following networking infrastructure. On-premises network WAN links connect the Paris headquarters to regional offices and regional offices to satellite office...,0.844584
MSFT_cloud_architecture_contoso.pdf,have identified the following elements when planning for the adoption of Microsoft s cloud offerings. Microsoft Cloud Identity for Enterprise Architects Microsoft Cloud Identity for Enterprise Architects Microsoft Cloud Networking for Enterprise Architects Microsoft Cloud Networking for...,0.838508
MSFT_cloud_architecture_contoso.pdf,https://technet.microsoft.com/library/mt775341.aspx https://technet.microsoft.com/library/mt775341.aspx How a fictional but representative global organization has implemented the Microsoft Cloud Contoso in the Microsoft Cloud Contoso s IT infrastructure and needs Mapping Contoso s busin...,0.837523
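
Vector search ranks chunks by how close their embeddings are to the embedding of the query. The sketch below shows the idea with a brute-force nearest-neighbor search over toy three-dimensional vectors. It assumes cosine similarity as the metric; the vectors and the `chunk-*` identifiers are made up, and a real index would typically hold high-dimensional embeddings and may use an approximate algorithm instead of an exhaustive scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_vector, chunk_vectors, k):
    """Return the k (chunk_id, similarity) pairs most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones come from the embedding model configured on the index
toy_vectors = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.8, 0.6, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
top = nearest_neighbors([1.0, 0.1, 0.0], toy_vectors, k=2)
print([chunk_id for chunk_id, _ in top])  # prints ['chunk-a', 'chunk-b']
```

In the query above, `k_nearest_neighbors=50` plays the role of `k`, and `top=5` then limits how many of those neighbors are returned.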


#### Hybrid Search (Keyword + Vector Search)

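Combine keyword and vector search in a single request. The service merges the two ranked result lists with Reciprocal Rank Fusion (RRF), so `@search.score` is on a different scale than in the keyword-only and vector-only examples: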

In [5]:
results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"]
)

display_results(results)

title,chunk,@search.score
MSFT_cloud_architecture_contoso.pdf,have identified the following elements when planning for the adoption of Microsoft s cloud offerings. Microsoft Cloud Identity for Enterprise Architects Microsoft Cloud Identity for Enterprise Architects Microsoft Cloud Networking for Enterprise Architects Microsoft Cloud Networking for...,0.031498
MSFT_cloud_architecture_contoso.pdf,mailto:cloudadopt@microsoft.com mailto:cloudadopt@microsoft.com https://azure.microsoft.com/services/expressroute/ https://azure.microsoft.com/services/expressroute/ https://azure.microsoft.com/services/expressroute/ https://azure.microsoft.com/services/expressroute/ https://aka.ms/o365protect_devic...,0.031099
MSFT_cloud_architecture_contoso.pdf,"the Paris headquarters with a high-bandwidth WAN link. Each regional hub has an average of 2,000 workers. Satellite offices contain 80% sales and support staff and provide a physical and on-site presence for Contoso customers in key cities or sub- regions. Each satellite office is ...",0.030769
MSFT_cloud_architecture_contoso.pdf,How a fictional but representative global organization has implemented the Microsoft Cloud Contoso in the Microsoft Cloud This topic is 1 of 7 in a series The Contoso Corporation Contoso s worldwide organization Elements of Contoso s implementation of the Microsoft cloud Networking Net...,0.030751
MSFT_cloud_architecture_contoso.pdf,"and tenants for Microsoft s cloud offeringsSubscriptions, licenses, accounts, and tenants for Microsoft s cloud offerings Tenants: This topic is 5 of 7 in a series Sales.Production Admin.Production IT.Development IT.Testing IT.Production Sales.Production IT.Production Sales.Production IT....",0.030214
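
The hybrid scores above are much smaller than the keyword and vector scores because they are fused from ranks rather than from raw scores. The sketch below implements a basic Reciprocal Rank Fusion over two ranked lists. The constant `k=60` and the zero-based ranks are assumptions, chosen because they reproduce the top hybrid score shown above if the hybrid query ranks chunks the same way as the standalone queries. The short labels are hypothetical names for the five chunks in each earlier result table.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids; each list contributes 1 / (k + rank), with rank starting at 0."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical labels for the chunks in the keyword and vector outputs, in rank order
keyword_top5 = ["hot-cold-data", "least-privilege", "tenants", "paris-datacenters", "planning-elements"]
vector_top5 = ["contoso-overview", "links", "app-infrastructure", "planning-elements", "it-infrastructure"]

fused = dict(reciprocal_rank_fusion([keyword_top5, vector_top5]))
# "planning-elements" is 5th in the keyword list and 4th in the vector list: 1/64 + 1/63
print(round(fused["planning-elements"], 6))  # prints 0.031498
```

Only the chunk that appears in both top-five lists gets its full fused score here. The other chunks would also receive a contribution from their lower rank in the second list, which is not visible in the truncated outputs.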


#### Hybrid Search + Semantic Ranker

Rerank the hybrid results with the semantic ranker. Set `semantic_configuration_name` to a semantic configuration that is defined on your index:

In [None]:
results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"],
    query_type="semantic",
    semantic_configuration_name="contoso-architecture-semantic-configuration"
)

display_results(results)

If the name passed in `semantic_configuration_name` is not defined on the index, the request fails. For example, a run against an index without a configuration named `ragtime2-semantic-configuration` produced:

HttpResponseError: (InvalidRequestParameter) Unknown semantic configuration 'ragtime2-semantic-configuration'.
Parameter name: semanticConfiguration
Code: InvalidRequestParameter
Exception Details:	(UnknownSemanticConfiguration) Unknown semantic configuration 'ragtime2-semantic-configuration'.
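
Conceptually, semantic ranking is a second pass over the results of the first-stage query. The sketch below shows that two-stage pattern in plain Python. `overlap_score` is a hypothetical toy stand-in for the service's language-model reranker, and the candidate limit of 50 and the sample texts are assumptions for illustration.

```python
def rerank(candidates, score_fn, query, limit=50):
    """Rescore the first `limit` candidates with score_fn and sort by the new score."""
    head = candidates[:limit]
    rescored = [(doc, score_fn(query, doc)) for doc in head]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


def overlap_score(query, doc):
    """Hypothetical toy scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


# Made-up first-stage results, listed in their original order
first_stage = [
    "Encryption for data at rest in the cloud",
    "Contoso is a fictional but representative global organization",
]
reranked = rerank(first_stage, overlap_score, "What is Contoso")
for doc, score in reranked:
    print(f"{score:.2f}  {doc}")
```

The second candidate moves to the top because the toy scorer finds two of the three query words in it. The real ranker scores semantic relevance rather than word overlap.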

#### Hybrid Search + Semantic Ranker + Query Rewriting

Add generative query rewriting on top of semantic ranking, so that the service also searches with alternative phrasings of the query:

In [None]:
results = search_client.search(
    search_text="What is Contoso",
    vector_queries=[VectorizableTextQuery(text="What is Contoso", k_nearest_neighbors=50, fields="text_vector")],
    top=5,
    select=["title", "chunk"],
    query_type="semantic",
    semantic_configuration_name="contoso-architecture-semantic-configuration",
    query_rewrites="generative",
    query_language="en"
)

display_results(results)

## Troubleshooting

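- **Environment Variables Not Loaded:** Confirm that the `.env` file is in the same directory as the notebook and contains the variables from step 2, or export the variables in your terminal before starting the notebook.
- **Authentication Issues:** If using Managed Identity, make sure your Azure identity has the required role assignments on the search service.
- **Search Results Are Empty:** Ensure your Azure AI Search index contains vectorized data and that the vector field named in the query (`text_vector`) exists in the index.
- **Semantic Ranking or Query Rewriting Issues:** Ensure the semantic configuration name in the query is defined on your index and that your search service supports semantic ranking and generative query rewrites.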

## Summary

This notebook demonstrates different search techniques using Azure AI Search, including keyword search, vector search, hybrid search, semantic ranking, and query rewriting. Vector and hybrid queries retrieve chunks by meaning as well as by exact terms, and semantic ranking and query rewriting further refine which results appear first.

