# Azure AI Search with LangChain - Beginner's Guide

## What is Azure AI Search?
Azure AI Search (formerly known as Azure Cognitive Search) is a cloud search service that helps you build rich search experiences. Think of it like Google Search, but for your own documents and data!

## Why use Azure AI Search?
- **Smart Search**: Understands meaning, not just keywords
- **Scalable**: Handles millions of documents
- **AI-Powered**: Built-in features like semantic search and vector search
- **Easy Integration**: Works seamlessly with LangChain

## What you'll learn:
1. Connect to Azure AI Search
2. Create a search index (like organizing a library)
3. Upload documents
4. Perform different types of searches
5. Build a Q&A system using Azure AI Search

## Setup and Installation

First, let's install and import the necessary libraries.

In [30]:
# Install required packages (uncomment if needed)
# !pip install azure-search-documents langchain langchain-openai python-dotenv

from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SimpleField,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
)
from azure.core.credentials import AzureKeyCredential
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
import os
from dotenv import load_dotenv

load_dotenv()

True

## 1. Configure Azure AI Search Connection

**What you need:**
- Azure AI Search service endpoint (URL)
- Admin API key (like a password to access the service)
- Index name (the name for your searchable collection)

**Where to find these:**
Go to Azure Portal → Your Search Service → Keys section

In [31]:
# Azure AI Search Configuration
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_ENDPOINT")  # e.g., "https://your-service.search.windows.net"
AZURE_SEARCH_KEY = os.getenv("AZURE_SEARCH_KEY")
AZURE_SEARCH_INDEX_NAME = "langchain-demo-index"

# Create credential object for authentication
credential = AzureKeyCredential(AZURE_SEARCH_KEY)

print(f"Connected to: {AZURE_SEARCH_ENDPOINT}")
print(f"Index name: {AZURE_SEARCH_INDEX_NAME}")

Connected to: https://ntloc-ai-search.search.windows.net
Index name: langchain-demo-index
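
The cell above relies on `os.getenv`, which returns `None` when a variable is unset, so a missing entry in `.env` only shows up later as a less obvious error. Below is a minimal sketch of a fail-fast check. `require_env` is a hypothetical helper name introduced here, not part of any SDK, and the placeholder endpoint is only there so the snippet runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Placeholder value so this snippet is runnable in isolation;
# in the notebook the real value comes from your .env file.
os.environ.setdefault("AZURE_SEARCH_ENDPOINT", "https://your-service.search.windows.net")

endpoint = require_env("AZURE_SEARCH_ENDPOINT")
print(f"Endpoint configured: {endpoint}")
```

Calling the same helper for `AZURE_SEARCH_KEY` before building the credential would surface a missing key right away.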


## 2. Set Up Embeddings

Embeddings convert text into lists of numbers (vectors) so that texts with similar meaning end up close together. Vector search (semantic search) relies on them.

In [32]:
# Configure Azure OpenAI Embeddings
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME")  # Your embedding model deployment name
)

# Test embeddings
test_text = "Azure AI Search is powerful"
test_embedding = embeddings.embed_query(test_text)

print(f"Embedding dimension: {len(test_embedding)}")
print(f"First 5 values: {test_embedding[:5]}")

Embedding dimension: 3072
First 5 values: [0.013848443515598774, 0.003581428900361061, -0.008530277758836746, 0.008590883575379848, 0.013969656080007553]
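
To make "text as numbers" more concrete, here is a small sketch of how two embedding vectors can be compared. Cosine similarity is one common measure. The three-dimensional vectors are made up for illustration; they are not output from the embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" for illustration only;
# the real embeddings printed above have 3072 dimensions.
query_vec = [0.9, 0.1, 0.0]
doc_about_search = [0.8, 0.2, 0.1]
doc_about_cooking = [0.0, 0.1, 0.9]

print(cosine_similarity(query_vec, doc_about_search))   # closer to 1
print(cosine_similarity(query_vec, doc_about_cooking))  # closer to 0
```

A vector index performs this kind of comparison at scale, using approximate methods so it does not have to compare the query against every stored vector.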


## 3. Create Azure Search Vector Store

**What's a vector store?**
It's a special database that stores both your text and its numerical representation (embeddings), making semantic search possible.

LangChain makes this easy with the `AzureSearch` class!

In [33]:
# Create Azure Search vector store
# This will automatically create the index if it doesn't exist
vector_store = AzureSearch(
    azure_search_endpoint=AZURE_SEARCH_ENDPOINT,
    azure_search_key=AZURE_SEARCH_KEY,
    index_name=AZURE_SEARCH_INDEX_NAME,
    embedding_function=embeddings.embed_query,
)

print("Azure Search vector store created!")
print(f"Index: {AZURE_SEARCH_INDEX_NAME}")

Azure Search vector store created!
Index: langchain-demo-index


## 3a. (Optional) Create Custom Index Manually

**Why create a custom index?**
- Control exact field types and properties
- Configure vector search algorithms  
- Set up custom analyzers for different languages
- Define which fields are searchable, filterable, or facetable

**Note:** LangChain's `AzureSearch` automatically creates an index with sensible defaults, which is perfect for getting started! For production scenarios with specific requirements, you can create a custom index in the Azure Portal, through the REST API, or with the SDK's `SearchIndexClient`.

**To create a custom index:**
1. Go to Azure Portal → Your Search Service → Indexes
2. Click "Add Index" and configure fields manually
3. Or define the index in code (REST API or SDK) so the definition can live in version control

For this tutorial, we'll use the automatically created index which works great for most use cases!
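
For readers curious what a custom definition involves, here is a rough sketch of an index definition written as a plain Python dictionary in the shape of the JSON body used by the Azure AI Search REST API. Treat it as an assumption-laden illustration: the index, profile, and algorithm names are hypothetical, and the property names (`dimensions`, `vectorSearchProfile`, `kind`) reflect recent API versions as far as I know, so check them against the reference for the version you use. Nothing here is sent to Azure.

```python
import json

EMBEDDING_DIMENSIONS = 3072  # matches the embedding size printed earlier

# Hypothetical names chosen for this sketch.
index_definition = {
    "name": "langchain-demo-custom-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "content_vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": EMBEDDING_DIMENSIONS,
            "vectorSearchProfile": "demo-vector-profile",
        },
        {"name": "metadata", "type": "Edm.String", "searchable": True},
        # Metadata promoted to top-level fields so it can be filtered on.
        {"name": "category", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "difficulty", "type": "Edm.String", "filterable": True, "facetable": True},
    ],
    "vectorSearch": {
        "algorithms": [{"name": "demo-hnsw", "kind": "hnsw"}],
        "profiles": [{"name": "demo-vector-profile", "algorithm": "demo-hnsw"}],
    },
}

filterable = [f["name"] for f in index_definition["fields"] if f.get("filterable")]
print("Filterable fields:", filterable)
print(json.dumps(index_definition, indent=2)[:200], "...")
```

The model classes imported at the top of this notebook (`SearchIndex`, `SimpleField`, `SearchableField`, `SearchField`, `VectorSearch`, `HnswAlgorithmConfiguration`, `VectorSearchProfile`) express the same structure through the Python SDK.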

## 4. Add Documents to the Index

Let's add some sample documents. Think of this like adding books to a library catalog.

In [34]:
# Create sample documents with metadata
documents = [
    Document(
        page_content="Azure AI Search is a cloud search service with built-in AI capabilities.",
        metadata={"source": "azure-docs", "category": "overview", "difficulty": "beginner"}
    ),
    Document(
        page_content="Vector search in Azure AI Search enables semantic search using embeddings.",
        metadata={"source": "azure-docs", "category": "features", "difficulty": "intermediate"}
    ),
    Document(
        page_content="LangChain provides easy integration with Azure AI Search for building RAG applications.",
        metadata={"source": "langchain-docs", "category": "integration", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Semantic ranking improves search results by understanding query intent.",
        metadata={"source": "azure-docs", "category": "features", "difficulty": "advanced"}
    ),
    Document(
        page_content="Python SDK makes it easy to work with Azure AI Search programmatically.",
        metadata={"source": "azure-docs", "category": "sdk", "difficulty": "beginner"}
    ),
]

# Add documents to the index
# This converts text to embeddings and uploads everything
document_ids = vector_store.add_documents(documents)

print(f"Added {len(document_ids)} documents to Azure AI Search")
print(f"Document IDs: {document_ids[:3]}...")  # Show first 3 IDs

Added 5 documents to Azure AI Search
Document IDs: ['NDNhYjk3ZjktNjNhOS00ZDBiLTlmYjUtZGVhMWFiYmZjYzMx', 'ZDE1MDZhMTMtMWFiMy00MGQ0LWI0ZGYtOWZhMzg5ZjEwZGMx', 'Y2EwNDcxZTktOGY4NC00OTcyLWIxNjUtM2Y3ZDM4YzYyY2Rm']...
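
Each run of the cell above uploads the documents again under freshly generated IDs, so re-running it leaves duplicate copies in the index. One way to avoid that is to derive a repeatable key from the document text. The sketch below shows only the key derivation. `stable_key` is a hypothetical helper, and whether your `langchain-community` version lets you pass your own keys when adding documents (some versions accept a `keys` argument on `add_texts`) is an assumption to verify.

```python
import hashlib


def stable_key(text: str) -> str:
    """Derive a repeatable ID from document text (same text -> same key)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


texts = [
    "Azure AI Search is a cloud search service with built-in AI capabilities.",
    "Vector search in Azure AI Search enables semantic search using embeddings.",
]
keys = [stable_key(t) for t in texts]

# Re-computing yields identical keys, so an upload keyed this way would
# overwrite the earlier copy instead of adding a duplicate.
assert keys == [stable_key(t) for t in texts]
print(keys[0][:16], "...")
```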


## 5. Perform Similarity Search

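Now let's search! The search finds documents based on meaning, not just keywords. If the same document shows up more than once in the output below, the upload cell in step 4 was run more than once: each run adds fresh copies under new IDs.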

In [35]:
# Search for documents
query = "How do I use semantic search?"

# Perform similarity search (finds top 3 most relevant documents)
results = vector_store.similarity_search(query, k=3)

print(f"Query: {query}\n")
print("Top 3 most relevant documents:\n")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Metadata: {doc.metadata}\n")

Query: How do I use semantic search?

Top 3 most relevant documents:

1. Semantic ranking improves search results by understanding query intent.
   Metadata: {'id': 'ZWIwMDkyZjgtOTMyMS00M2Q0LTkyOWYtNDk0MWJmNjM5N2Fi', 'source': 'azure-docs', 'category': 'features', 'difficulty': 'advanced'}

2. Semantic ranking improves search results by understanding query intent.
   Metadata: {'id': 'OTk4NWIzYzAtMGY3My00YzMyLTg4MDQtMTk5ODMzYjc2MDZk', 'source': 'azure-docs', 'category': 'features', 'difficulty': 'advanced'}

3. Vector search in Azure AI Search enables semantic search using embeddings.
   Metadata: {'id': 'ZGUyMjFjYzYtMWQ5MC00OGQ4LWI4MWEtMDdjMmJjMTFhNDBi', 'source': 'azure-docs', 'category': 'features', 'difficulty': 'intermediate'}
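
When an index does contain repeated copies of a document, the repeats can be dropped on the client side before results are displayed. A minimal sketch follows. `Doc` is a stand-in for LangChain's `Document` with the same attribute names so the snippet runs on its own, and `dedupe_by_content` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (same attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


results = [
    Doc("Semantic ranking improves search results by understanding query intent."),
    Doc("Semantic ranking improves search results by understanding query intent."),
    Doc("Vector search in Azure AI Search enables semantic search using embeddings."),
]

for doc in dedupe_by_content(results):
    print(doc.page_content)
```

Because duplicates take up result slots, it can help to request a larger `k` and deduplicate afterwards.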



## 6. Search with Scores

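Let's see how similar each result is to our query. Higher scores mean a better match, but the scale depends on the search mode: small values around 0.03, like those below, are typical of the fused rank scores that hybrid search produces rather than raw cosine similarities, so compare scores only within the same query.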

In [36]:
# Search with similarity scores
results_with_scores = vector_store.similarity_search_with_score(query, k=3)

print(f"Query: {query}\n")
print("Results with similarity scores:\n")
for i, (doc, score) in enumerate(results_with_scores, 1):
    print(f"{i}. Score: {score:.4f}")
    print(f"   Content: {doc.page_content}")
    print(f"   Category: {doc.metadata.get('category', 'N/A')}\n")

Query: How do I use semantic search?

Results with similarity scores:

1. Score: 0.0333
   Content: Semantic ranking improves search results by understanding query intent.
   Category: features

2. Score: 0.0328
   Content: Semantic ranking improves search results by understanding query intent.
   Category: features

3. Score: 0.0320
   Content: Vector search in Azure AI Search enables semantic search using embeddings.
   Category: features



## 7. Search with Metadata

**Note about filtering:** To filter by specific metadata fields (like `difficulty` or `category`), you would need to create a custom index where those fields are marked as `filterable=True`. 

The automatic index created by LangChain stores metadata as a JSON string, so we can't filter by individual metadata properties. For now, we'll search and display the metadata of results.

In [37]:
# Search and show metadata (no filter applied)
# Note: filtering by metadata fields requires marking them as filterable when creating the index
# For now, we search without filters and display each result's metadata
query = "search service"

results = vector_store.similarity_search(
    query,
    k=3
)

print(f"Query: {query}")
print(f"\nTop {len(results)} results:\n")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Difficulty: {doc.metadata.get('difficulty', 'N/A')}")
    print(f"   Category: {doc.metadata.get('category', 'N/A')}\n")

Query: search service

Top 3 results:

1. Azure AI Search is a cloud search service with built-in AI capabilities.
   Difficulty: beginner
   Category: overview

2. Azure AI Search is a cloud search service with built-in AI capabilities.
   Difficulty: beginner
   Category: overview

3. Vector search in Azure AI Search enables semantic search using embeddings.
   Difficulty: intermediate
   Category: features



## 8. Build a RAG System with Azure AI Search

**What is RAG?**
Retrieval Augmented Generation combines:
1. **Retrieval**: Finding relevant documents from Azure AI Search
2. **Generation**: Using AI to create answers based on those documents

Let's build a complete Q&A system!

In [38]:
# Step 1: Set up the AI model for generating answers
llm = AzureChatOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    temperature=0  # 0 = focused answers, 1 = creative answers
)

# Step 2: Create a prompt template
template = """You are a helpful assistant. Answer the question based only on the following context from Azure AI Search:

Context:
{context}

Question: {question}

Answer: Provide a clear and concise answer based on the context above."""

prompt = ChatPromptTemplate.from_template(template)

# Step 3: Create a retriever from Azure Search
retriever = vector_store.as_retriever(
    search_type="similarity"
)

# Step 4: Helper function to format documents
def format_docs(docs):
    """Combine multiple documents into one text block"""
    return "\n\n".join(doc.page_content for doc in docs)

# Step 5: Build the RAG chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("RAG system ready! ✅")

RAG system ready! ✅
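
If the pipe syntax in the chain feels opaque, the same data flow can be written as ordinary function calls. The sketch below is not LangChain code: `fake_retrieve` and `fake_llm` are stubs invented for illustration, and the template is a shortened version of the prompt above.

```python
TEMPLATE = """Answer the question based only on the following context:

Context:
{context}

Question: {question}"""


def fake_retrieve(question: str) -> list[str]:
    """Stub standing in for the retriever; returns fixed text snippets."""
    return [
        "Vector search in Azure AI Search enables semantic search using embeddings.",
        "Azure AI Search is a cloud search service with built-in AI capabilities.",
    ]


def fake_llm(prompt_text: str) -> str:
    """Stub standing in for the chat model; reports the prompt size only."""
    return f"(model would answer here; prompt was {len(prompt_text)} characters)"


def run_rag(question: str) -> str:
    # 1. Retrieval: fetch relevant snippets for the question.
    snippets = fake_retrieve(question)
    # 2. Formatting: join snippets into one context block (like format_docs).
    context = "\n\n".join(snippets)
    # 3. Prompting: fill the template with the context and the original question.
    prompt_text = TEMPLATE.format(context=context, question=question)
    # 4. Generation: pass the filled prompt to the model.
    return fake_llm(prompt_text)


print(run_rag("What is vector search in Azure AI Search?"))
```

In the real chain, the dictionary step supplies `context` and `question`, and each `|` hands the output of one step to the next.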


## 9. Ask Questions!

In [39]:
# Ask a question
question = "What is vector search in Azure AI Search?"
answer = rag_chain.invoke(question)

print(f"Question: {question}\n")
print(f"Answer: {answer}\n")

# Show the documents the retriever returns for this question (retrieval runs again here)
source_docs = retriever.invoke(question)
print("Sources used:")
for i, doc in enumerate(source_docs, 1):
    print(f"{i}. {doc.page_content[:80]}...")

Question: What is vector search in Azure AI Search?

Answer: Vector search in Azure AI Search is a feature that enables semantic search using embeddings, allowing for more advanced and meaningful search capabilities.

Sources used:
1. Vector search in Azure AI Search enables semantic search using embeddings....
2. Vector search in Azure AI Search enables semantic search using embeddings....
3. Azure AI Search is a cloud search service with built-in AI capabilities....
4. Azure AI Search is a cloud search service with built-in AI capabilities....


## 10. Try Multiple Questions

In [40]:
# Test with multiple questions
questions = [
    "How can I integrate LangChain with Azure AI Search?",
    "What programming languages can I use with Azure AI Search?",
    "What is semantic ranking?",
]

for question in questions:
    answer = rag_chain.invoke(question)
    print(f"\nQ: {question}")
    print(f"A: {answer}")
    print("-" * 80)


Q: How can I integrate LangChain with Azure AI Search?
A: You can integrate LangChain with Azure AI Search to build RAG (Retrieval-Augmented Generation) applications, as LangChain provides easy integration with Azure AI Search, which is a cloud search service with built-in AI capabilities.
--------------------------------------------------------------------------------

Q: What programming languages can I use with Azure AI Search?
A: Based on the context provided, you can use Python to work with Azure AI Search programmatically.
--------------------------------------------------------------------------------

Q: What is semantic ranking?
A: Semantic ranking is a method that improves search results by understanding the query intent.
--------------------------------------------------------------------------------

## 11. Direct Azure Search SDK (Advanced)

If you need more control, you can use the Azure Search SDK directly.

In [41]:
# Create a direct search client
search_client = SearchClient(
    endpoint=AZURE_SEARCH_ENDPOINT,
    index_name=AZURE_SEARCH_INDEX_NAME,
    credential=credential
)

# Perform a simple text search (keyword-based)
search_query = "Azure"
results = search_client.search(
    search_text=search_query,
    top=3,
    select=["content", "metadata"]
)

print(f"Keyword search for: '{search_query}'\n")
for i, result in enumerate(results, 1):
    print(f"{i}. Score: {result['@search.score']:.2f}")
    print(f"   Content: {result.get('content', 'N/A')[:100]}...")
    print()

Keyword search for: 'Azure'

1. Score: 0.48
   Content: Azure AI Search is a cloud search service with built-in AI capabilities....

2. Score: 0.48
   Content: Vector search in Azure AI Search enables semantic search using embeddings....

3. Score: 0.48
   Content: Python SDK makes it easy to work with Azure AI Search programmatically....



## 12. Hybrid Search (Best of Both Worlds!)

Hybrid search combines:
- **Vector search** (semantic/meaning-based)
- **Keyword search** (exact word matching)

This often gives more relevant results than either method alone!

In [42]:
# Perform hybrid search using the vector store
query = "Python programming with search"

# This combines vector search and keyword search automatically
hybrid_results = vector_store.similarity_search(
    query,
    k=3,
    search_type="hybrid"  # Enable hybrid search
)

print(f"Hybrid search query: {query}\n")
print("Results (combining semantic + keyword search):\n")
for i, doc in enumerate(hybrid_results, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Source: {doc.metadata.get('source', 'N/A')}\n")

Hybrid search query: Python programming with search

Results (combining semantic + keyword search):

1. Python SDK makes it easy to work with Azure AI Search programmatically.
   Source: azure-docs

2. Python SDK makes it easy to work with Azure AI Search programmatically.
   Source: azure-docs

3. Vector search in Azure AI Search enables semantic search using embeddings.
   Source: azure-docs
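
Hybrid search has to merge two separately ranked lists into one. Azure AI Search documents Reciprocal Rank Fusion (RRF) as its method for doing so. The sketch below implements the general RRF formula, not Azure's exact implementation. The constant `k=60` is a commonly used default that I am assuming here, and the document IDs and rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings for the query "Python programming with search".
vector_ranking = ["python-sdk", "vector-search", "langchain"]
keyword_ranking = ["python-sdk", "overview", "vector-search"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

A document ranked highly by both methods rises to the top, while one that appears in only a single list still gets a chance to surface.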



## 13. Deleting Documents (Cleanup)

In [43]:
# Delete specific documents by ID (uncomment to use)
# vector_store.delete(ids=[document_ids[0]])
# print(f"Deleted document with ID: {document_ids[0]}")

# Or delete the entire index (uncomment to use)
# from azure.search.documents.indexes import SearchIndexClient
# index_client = SearchIndexClient(endpoint=AZURE_SEARCH_ENDPOINT, credential=credential)
# index_client.delete_index(AZURE_SEARCH_INDEX_NAME)
# print(f"Deleted index: {AZURE_SEARCH_INDEX_NAME}")

print("Cleanup options available (currently commented out)")

Cleanup options available (currently commented out)


## 🎉 Congratulations!

### What You Learned:

1. **Azure AI Search Basics** - Cloud-based search service with AI capabilities
2. **Connection Setup** - How to connect to Azure AI Search using credentials
3. **Index Creation** - Creating searchable indexes for your documents
4. **Document Upload** - Adding documents with metadata to the index
5. **Similarity Search** - Finding documents by meaning (semantic search)
6. **Metadata in Results** - Storing metadata with documents and reading it back from search results (filtering on it requires a custom index)
7. **RAG System** - Building Q&A applications with Azure AI Search
8. **Hybrid Search** - Combining vector and keyword search for best results
9. **Direct SDK Usage** - Advanced control with Azure Search SDK

### 🚀 Next Steps:

**Easy:**
- Add more documents with different metadata
- Try different search queries
- Experiment with different filter conditions

**Intermediate:**
- Upload PDF or text files to the index
- Implement semantic ranking
- Create a chatbot with conversation memory

**Advanced:**
- Set up custom analyzers for different languages
- Implement faceted search (filtering by categories)
- Build a production-ready search application
- Use Azure AI Search with Azure Blob Storage

### 💡 Key Benefits of Azure AI Search:

✅ **Scalable** - Handles millions of documents  
✅ **Fast** - Optimized for quick searches  
✅ **AI-Powered** - Built-in semantic understanding  
✅ **Flexible** - Supports text, vectors, and hybrid search  
✅ **Secure** - Enterprise-grade security and compliance  

### 📚 Resources:

- [Azure AI Search Documentation](https://learn.microsoft.com/azure/search/)
- [LangChain Azure Search Integration](https://python.langchain.com/docs/integrations/vectorstores/azuresearch)
- [Azure Search Python SDK](https://learn.microsoft.com/python/api/overview/azure/search-documents-readme)

Happy searching! 🔍