# Azure AI Agent with File Search Example

This notebook demonstrates how to create an Azure AI agent that uses a file search tool to answer user questions based on uploaded documents.

## Features Covered:
- File upload and management
- Vector store creation and management
- File search tool configuration
- Document-based question answering
- Resource cleanup and management

## Prerequisites

Before running this notebook, ensure you have:

1. **Azure AI Project**: Access to an Azure AI Foundry project with deployed models
2. **Authentication**: Azure CLI installed and authenticated (`az login --use-device-code`)
3. **Environment Variables**: Set up your `.env` file with connection details
4. **Dependencies**: Required agent-framework packages installed

If you need to use a different tenant, specify the tenant ID:
```bash
az login --tenant <tenant-id>
```
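
A quick pre-flight check can confirm that the variables this notebook reads are present before any Azure call is made. The following is a minimal sketch: `check_env` is a hypothetical helper, and the variable names and the `gpt-4o` default are the ones used by the import cell of this notebook.

```python
import os

# Variable names as read by this notebook's import cell.
REQUIRED_VARS = ("PROJECT_CONNECTION_STRING",)
DEFAULTS = {"MODEL_DEPLOYMENT_NAME": "gpt-4o"}


def check_env(environ=None):
    """Return (missing_required, resolved_optional) for the given mapping."""
    env = os.environ if environ is None else environ
    # A variable that is unset or empty counts as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Optional variables fall back to their defaults.
    resolved = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    return missing, resolved


missing, resolved = check_env()
if missing:
    print(f"Missing required variables: {', '.join(missing)}")
print(f"Model deployment: {resolved['MODEL_DEPLOYMENT_NAME']}")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.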

## Import Libraries

Import the required libraries for Azure AI agent functionality.

In [None]:
import asyncio
import os
from pathlib import Path

from agent_framework import ChatAgent, HostedFileSearchTool, HostedVectorStoreContent
from agent_framework.azure import AzureAIAgentClient
from azure.ai.agents.models import FileInfo, VectorStore
from azure.identity.aio import AzureCliCredential  # Async credential, matching the async_credential parameter
from dotenv import load_dotenv  # For loading environment variables from .env file

# Load environment variables from the .env file in the base directory
# (3 levels up, resolved relative to the current working directory)
load_dotenv('../../../.env')

# Read the project endpoint (stored under PROJECT_CONNECTION_STRING) and the model deployment name
conn_string = os.getenv("PROJECT_CONNECTION_STRING")
model_deployment = os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4o")
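
Because `'../../../.env'` is resolved against the current working directory, the load can silently find nothing when the notebook is started from a different folder. One possible workaround, sketched below with only the standard library, is to walk up the parent directories until a `.env` file turns up. `find_env_file` is a hypothetical helper, not part of python-dotenv.

```python
from pathlib import Path


def find_env_file(start, filename=".env"):
    """Walk from `start` up through its parents and return the first match, or None."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


env_path = find_env_file(Path.cwd())
print(f"Using .env at: {env_path}" if env_path else "No .env file found")
```

The resulting path can be passed to `load_dotenv`. python-dotenv also provides a `find_dotenv()` function that performs a similar upward search.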

## Define Sample Queries

First validate the configuration loaded above, then define some sample questions to ask about the uploaded document:

In [None]:
# Validate configuration
if not conn_string:
    raise ValueError("PROJECT_CONNECTION_STRING not found in environment variables")
if not model_deployment:
    raise ValueError("MODEL_DEPLOYMENT_NAME not found in environment variables")

print("✅ Configuration loaded successfully")
print(f"📋 Model deployment: {model_deployment}")

# Define sample queries for the employee document
USER_INPUTS = [
    "What employees are in the Sales department?",
    "Who is the Sales Manager?", 
    "Can you provide contact information for employees in the Marketing department?",
    "How many employees are listed in the document?",
    "What is Sarah Johnson's position and contact information?"
]

print(f"📝 Defined {len(USER_INPUTS)} sample queries for testing")

## Main File Search Example

This example demonstrates the complete workflow:
1. Upload a file and create a vector store from it
2. Create a file search tool that points at the vector store
3. Create an agent with file search capabilities
4. Query the agent about the document content
5. Report the IDs of the created resources
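
To build intuition for what the vector store contributes, the toy sketch below ranks chunks of a document against a question using bag-of-words cosine similarity. It is only an illustration: the hosted file search tool runs on the service side, and this code makes no claim about how it chunks or embeds documents. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(text):
    """Treat each blank-line-separated block as one chunk."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_chunks(chunks, query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vector = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(Counter(tokenize(chunk)), query_vector),
        reverse=True,
    )
    return ranked[:top_k]


DOCUMENT = """Mike Davis - Position: Sales Manager

Sarah Johnson - Position: Marketing Coordinator"""

print(rank_chunks(split_into_chunks(DOCUMENT), "Who is the Sales Manager?"))
```

A real vector store replaces the word counts with learned embeddings, but the retrieve-then-answer shape is the same: the agent receives the best-matching chunks and answers from them.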

In [None]:
async def main() -> None:
    """Main function demonstrating Azure AI agent with file search capabilities."""
    file: FileInfo | None = None
    vector_store: VectorStore | None = None

    credential = AzureCliCredential()
    
    # Use AzureAIAgentClient as async context manager to ensure proper lifecycle
    async with AzureAIAgentClient(
        project_endpoint=conn_string,
        model_deployment_name=model_deployment,
        async_credential=credential
    ) as client:
        try:
            # 1. Upload file and create vector store
            # Note: a plain-text file is used instead of a PDF for better compatibility and readability
            text_file_path = Path("./resources") / "employees.txt"
            print(f"Looking for file at: {text_file_path.absolute()}")
            
            if not text_file_path.exists():
                print("❌ File not found. Please ensure you have a text file to upload.")
                print("📝 For this example, run the create_working_sample_file() function to create the file.")
                return
            
            print(f"Uploading file from: {text_file_path}")

            file = await client.project_client.agents.files.upload_and_poll(
                file_path=str(text_file_path), purpose="assistants"
            )
            print(f"✅ Uploaded file, file ID: {file.id}")

            vector_store = await client.project_client.agents.vector_stores.create_and_poll(
                file_ids=[file.id], name="employees_vectorstore"
            )
            print(f"✅ Created vector store, vector store ID: {vector_store.id}")

            # 2. Create file search tool with uploaded resources
            file_search_tool = HostedFileSearchTool(
                inputs=[HostedVectorStoreContent(vector_store_id=vector_store.id)]
            )
            
            # 3. Create an agent with file search capabilities
            # The tool_resources are automatically extracted from HostedFileSearchTool
            async with ChatAgent(
                chat_client=client,
                name="EmployeeSearchAgent",
                instructions=(
                    "You are a helpful assistant that can search through uploaded employee files "
                    "to answer questions about employees. Provide specific information from the document."
                ),
                tools=file_search_tool,
            ) as agent:
                # 4. Simulate conversation with the agent
                print("\n=== Querying the Document ===")
                for user_input in USER_INPUTS:
                    print(f"\n🤔 User: '{user_input}'")
                    response = await agent.run(user_input)
                    print(f"🤖 Agent: {response.text}")

            # 5. Resource Information (cleanup disabled due to client lifecycle issues)
            print("\n=== Resource Information ===")
            if vector_store is not None:
                print(f"📋 Vector store created: {vector_store.id}")
            if file is not None:
                print(f"📄 File uploaded: {file.id}")
            
            print("💡 Note: Resources are left for reuse. To clean up manually:")
            print("   - Use the Azure AI Foundry portal to manage vector stores and files")
            print("   - Or implement cleanup in a separate script with fresh client connection")
            
            # Cleanup is commented out due to HTTP transport closure issues
            # Uncomment and modify if you need cleanup (may require separate client session)
            """
            print("\n=== Cleaning up resources ===")
            if vector_store is not None:
                await client.project_client.agents.vector_stores.delete(vector_store.id)
                print(f"🗑️  Deleted vector store: {vector_store.id}")
            if file is not None:
                await client.project_client.agents.files.delete(file.id)
                print(f"🗑️  Deleted file: {file.id}")
            """

        except Exception as e:
            print(f"❌ Error in main execution: {e}")
            # Resource info still provided on error
            if vector_store is not None:
                print(f"📋 Vector store that may need cleanup: {vector_store.id}")
            if file is not None:
                print(f"📄 File that may need cleanup: {file.id}")
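
The printed note suggests doing cleanup from a separate client session. A minimal sketch of such a helper follows. It assumes only that the object passed in exposes `vector_stores.delete(...)` and `files.delete(...)` as awaitable calls, which is how the disabled cleanup block above uses `client.project_client.agents`. The helper name and the fake classes are hypothetical and exist so the sketch can be dry-run without Azure.

```python
import asyncio


async def cleanup_resources(agents, vector_store_id=None, file_ids=()):
    """Delete the vector store first, then the files; return the deleted IDs in order."""
    deleted = []
    if vector_store_id is not None:
        await agents.vector_stores.delete(vector_store_id)
        deleted.append(vector_store_id)
    for file_id in file_ids:
        await agents.files.delete(file_id)
        deleted.append(file_id)
    return deleted


class _FakeOperations:
    """Stand-in that records delete calls instead of contacting a service."""

    def __init__(self):
        self.deleted = []

    async def delete(self, resource_id):
        self.deleted.append(resource_id)


class FakeAgents:
    """Mimics the two attributes the helper relies on."""

    def __init__(self):
        self.vector_stores = _FakeOperations()
        self.files = _FakeOperations()


async def dry_run():
    # Placeholder IDs for illustration only.
    return await cleanup_resources(FakeAgents(), "vs_example", ["file_example"])
```

In the notebook, `await dry_run()` exercises the helper against the fake. With a real, freshly opened `AzureAIAgentClient`, the first argument would be `client.project_client.agents` and the IDs would be the ones printed by `main()`.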

## Execute the Example

Run the main function to see file search in action:

In [None]:
# Run the main function
await main()
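
Top-level `await` works in a notebook because the kernel already runs an event loop. In a plain Python script there is no running loop, so the coroutine has to be handed to `asyncio.run`. The sketch below shows the pattern with a stand-in coroutine, `demo_main`, which is a placeholder for the notebook's `main()` and does not call Azure.

```python
import asyncio


async def demo_main() -> str:
    # Stand-in for the notebook's main(); the real one talks to Azure.
    await asyncio.sleep(0)
    return "done"


def run_from_script() -> str:
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(demo_main())


if __name__ == "__main__":
    print(run_from_script())
```

This block is meant for a `.py` file. Inside a notebook, `asyncio.run` raises a `RuntimeError` because a loop is already running, so keep using `await main()` there.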

## Create a Sample File for Testing

If you don't have an employee file, you can use this helper to create sample content that works directly with the file search example:

In [None]:
def create_sample_employee_file():
    """Create a sample employee file for testing."""
    from pathlib import Path
    
    # Ensure resources directory exists
    resources_dir = Path("./resources")
    resources_dir.mkdir(exist_ok=True)
    
    sample_content = """EMPLOYEE DIRECTORY - COMPANY ABC

SALES DEPARTMENT:
John Smith - Age: 28 - Position: Sales Representative
Contact: john.smith@company.com - Phone: (555) 123-4567
Skills: Customer relations, Product demos, Account management

Mike Davis - Age: 35 - Position: Sales Manager
Contact: mike.davis@company.com - Phone: (555) 345-6789
Skills: Team leadership, Strategic planning, Key account management

MARKETING DEPARTMENT:
Sarah Johnson - Age: 24 - Position: Marketing Coordinator  
Contact: sarah.johnson@company.com - Phone: (555) 234-5678
Skills: Social media, Content creation, Campaign management

CUSTOMER SERVICE DEPARTMENT:
Emily Brown - Age: 22 - Position: Support Specialist
Contact: emily.brown@company.com - Phone: (555) 456-7890
Skills: Problem solving, Customer communication, Ticket management

IT DEPARTMENT:
David Wilson - Age: 31 - Position: Software Developer
Contact: david.wilson@company.com - Phone: (555) 567-8901
Skills: Python, JavaScript, Database management

HR DEPARTMENT:
Lisa Garcia - Age: 29 - Position: HR Specialist
Contact: lisa.garcia@company.com - Phone: (555) 678-9012
Skills: Recruitment, Employee relations, Policy development

TOTAL EMPLOYEES: 6
LAST UPDATED: October 2024
"""
    
    # Save as text file in the resources directory
    file_path = resources_dir / "employees.txt"
    with open(file_path, "w", encoding='utf-8') as f:
        f.write(sample_content)
    
    print(f"✅ Created {file_path}")
    print("📝 This file can be directly used with the file search example")
    print("🔧 Text format is compatible with Azure AI file upload and indexing")

# Uncomment to create sample file
# create_sample_employee_file()
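
Since the sample file follows a regular layout, the expected answers to the sample queries can be computed locally and compared with what the agent returns. The parser below is a rough sketch under the assumption that the file keeps the exact layout written by the helper above. `parse_directory` is a hypothetical function, and `EXCERPT` is a shortened copy of the sample content.

```python
import re

# Shortened excerpt in the same layout as the generated employees.txt.
EXCERPT = """SALES DEPARTMENT:
John Smith - Age: 28 - Position: Sales Representative
Contact: john.smith@company.com - Phone: (555) 123-4567

Mike Davis - Age: 35 - Position: Sales Manager
Contact: mike.davis@company.com - Phone: (555) 345-6789

MARKETING DEPARTMENT:
Sarah Johnson - Age: 24 - Position: Marketing Coordinator
Contact: sarah.johnson@company.com - Phone: (555) 234-5678
"""

DEPARTMENT_RE = re.compile(r"^(?P<dept>[A-Z ]+) DEPARTMENT:\s*$")
PERSON_RE = re.compile(r"^(?P<name>.+?) - Age: (?P<age>\d+) - Position: (?P<position>.+?)\s*$")
CONTACT_RE = re.compile(r"^Contact: (?P<email>\S+) - Phone: (?P<phone>.+?)\s*$")


def parse_directory(text):
    """Parse the directory text into a list of employee dictionaries."""
    employees = []
    department = None
    for line in text.splitlines():
        if match := DEPARTMENT_RE.match(line):
            department = match["dept"]
        elif match := PERSON_RE.match(line):
            employees.append({
                "name": match["name"],
                "age": int(match["age"]),
                "position": match["position"],
                "department": department,
            })
        elif (match := CONTACT_RE.match(line)) and employees:
            # A contact line belongs to the most recently parsed person.
            employees[-1].update(email=match["email"], phone=match["phone"])
    return employees


employees = parse_directory(EXCERPT)
sales = [e["name"] for e in employees if e["department"] == "SALES"]
print(f"{len(employees)} employees parsed; Sales: {sales}")
```

Running the parser over the full `employees.txt` gives a ground truth for queries such as "How many employees are listed in the document?".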

## Advanced File Search Example

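This more comprehensive example uploads up to three PDF, TXT, or DOCX files from `./resources`, handles upload failures per file, and builds a single vector store over every file that uploaded successfully: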

In [None]:
async def advanced_file_search_example():
    """Advanced example with better error handling and multiple files."""
    print("=== Advanced File Search Example ===")
    
    uploaded_files = []
    vector_store = None
    
    credential = AzureCliCredential()
    
    # Use AzureAIAgentClient as async context manager for proper lifecycle
    async with AzureAIAgentClient(
        project_endpoint=conn_string,
        model_deployment_name=model_deployment,
        async_credential=credential
    ) as client:
        try:
            # Check for available files
            file_patterns = ["*.pdf", "*.txt", "*.docx"]
            available_files = []
            
            for pattern in file_patterns:
                available_files.extend(Path("./resources").glob(pattern))
            
            if not available_files:
                print("❌ No suitable files found in ./resources directory")
                print("📝 Please add some PDF, TXT, or DOCX files to ./resources/ to test with")
                return
            
            print(f"📁 Found {len(available_files)} files to process")
            
            # Upload files
            for file_path in available_files[:3]:  # Limit to first 3 files
                print(f"📤 Uploading: {file_path.name}")
                try:
                    file_info = await client.project_client.agents.files.upload_and_poll(
                        file_path=str(file_path), purpose="assistants"
                    )
                    uploaded_files.append(file_info)
                    print(f"✅ Uploaded: {file_path.name} (ID: {file_info.id})")
                except Exception as e:
                    print(f"❌ Failed to upload {file_path.name}: {e}")
            
            if not uploaded_files:
                print("❌ No files were successfully uploaded")
                return
            
            # Create vector store with all uploaded files
            file_ids = [f.id for f in uploaded_files]
            vector_store = await client.project_client.agents.vector_stores.create_and_poll(
                file_ids=file_ids, name="multi_file_vectorstore"
            )
            print(f"✅ Created vector store with {len(file_ids)} files")
            
            # Create agent with file search
            file_search_tool = HostedFileSearchTool(
                inputs=[HostedVectorStoreContent(vector_store_id=vector_store.id)]
            )
            
            async with ChatAgent(
                chat_client=client,
                name="DocumentSearchAgent",
                instructions=(
                    "You are a helpful assistant that can search through uploaded documents "
                    "to answer questions. Always cite specific information from the documents when possible."
                ),
                tools=file_search_tool,
            ) as agent:
                
                # Interactive queries
                queries = [
                    "What documents do you have access to?",
                    "Can you summarize the key information from the uploaded files?",
                    "What specific details can you find about people or entities in the documents?"
                ]
                
                for query in queries:
                    print(f"\n🤔 User: {query}")
                    response = await agent.run(query)
                    print(f"🤖 Agent: {response.text}")

            # Resource Information (cleanup disabled due to client lifecycle issues)
            print("\n=== Resource Information ===")
            if vector_store:
                print(f"📋 Vector store created: {vector_store.id}")
            for i, file_info in enumerate(uploaded_files, 1):
                print(f"📄 File {i} uploaded: {file_info.id}")
            
            print("💡 Note: Resources are left for reuse. To clean up manually:")
            print("   - Use the Azure AI Foundry portal to manage vector stores and files")
            print("   - Or implement cleanup in a separate script with fresh client connection")
            
            # Cleanup is commented out due to HTTP transport closure issues
            # Uncomment and modify if you need cleanup (may require separate client session)
            """
            print("\n=== Cleaning up resources ===")
            if vector_store:
                await client.project_client.agents.vector_stores.delete(vector_store.id)
                print(f"🗑️  Deleted vector store: {vector_store.id}")
            for file_info in uploaded_files:
                await client.project_client.agents.files.delete(file_info.id)
                print(f"🗑️  Deleted file: {file_info.id}")
            """
        
        except Exception as e:
            print(f"❌ Error in advanced example: {e}")
            # Resource info still provided on error
            if vector_store:
                print(f"📋 Vector store that may need cleanup: {vector_store.id}")
            for i, file_info in enumerate(uploaded_files, 1):
                print(f"📄 File {i} that may need cleanup: {file_info.id}")

# Run the advanced example
# await advanced_file_search_example()
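
The discovery step in the advanced example takes the first three files in whatever order `glob` yields them, and that order is not guaranteed to be the same across machines. A small helper can make the selection deterministic. This is a sketch only: `find_uploadable_files` is a hypothetical name, and the patterns and the limit of 3 mirror the example above.

```python
from pathlib import Path

DEFAULT_PATTERNS = ("*.pdf", "*.txt", "*.docx")


def find_uploadable_files(directory, patterns=DEFAULT_PATTERNS, limit=3):
    """Return up to `limit` matching files, sorted by name for a stable order."""
    root = Path(directory)
    if not root.is_dir():
        return []
    # A set avoids duplicates if two patterns ever match the same file.
    matches = {
        path
        for pattern in patterns
        for path in root.glob(pattern)
        if path.is_file()
    }
    return sorted(matches, key=lambda path: path.name)[:limit]


for path in find_uploadable_files("./resources"):
    print(f"Would upload: {path.name}")
```

Returning an empty list for a missing directory lets the caller reuse the existing "no suitable files found" branch unchanged.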

In [None]:
# Create a simple text-based sample for immediate testing
def create_working_sample_file():
    """Create a simple unprotected text file that can be uploaded and searched."""
    from pathlib import Path
    
    # Ensure resources directory exists
    resources_dir = Path("./resources")
    resources_dir.mkdir(exist_ok=True)
    
    # Create sample content
    sample_content = """EMPLOYEE DIRECTORY - COMPANY ABC

SALES DEPARTMENT:
John Smith - Age: 28 - Position: Sales Representative
Contact: john.smith@company.com - Phone: (555) 123-4567
Skills: Customer relations, Product demos, Account management

Mike Davis - Age: 35 - Position: Sales Manager  
Contact: mike.davis@company.com - Phone: (555) 345-6789
Skills: Team leadership, Strategic planning, Key account management

MARKETING DEPARTMENT:
Sarah Johnson - Age: 24 - Position: Marketing Coordinator
Contact: sarah.johnson@company.com - Phone: (555) 234-5678
Skills: Social media, Content creation, Campaign management

CUSTOMER SERVICE DEPARTMENT:
Emily Brown - Age: 22 - Position: Support Specialist
Contact: emily.brown@company.com - Phone: (555) 456-7890
Skills: Problem solving, Customer communication, Ticket management

IT DEPARTMENT:
David Wilson - Age: 31 - Position: Software Developer
Contact: david.wilson@company.com - Phone: (555) 567-8901
Skills: Python, JavaScript, Database management

HR DEPARTMENT:
Lisa Garcia - Age: 29 - Position: HR Specialist
Contact: lisa.garcia@company.com - Phone: (555) 678-9012
Skills: Recruitment, Employee relations, Policy development

TOTAL EMPLOYEES: 6
LAST UPDATED: October 2024
"""
    
    # Save as text file (which can be uploaded and indexed)
    file_path = resources_dir / "employees.txt"
    with open(file_path, "w", encoding='utf-8') as f:
        f.write(sample_content)
    
    print(f"✅ Created working sample file: {file_path}")
    print("📝 This file can be uploaded and searched by the Azure AI agent")
    print("🔧 File format: Plain text (no protection issues)")
    
    return str(file_path)

# Create the working sample
sample_file_path = create_working_sample_file()

In [None]:
async def test_with_working_file() -> None:
    """Test the file search functionality with an unprotected text file."""
    file: FileInfo | None = None
    vector_store: VectorStore | None = None

    credential = AzureCliCredential()
    
    # Use AzureAIAgentClient as async context manager to ensure proper lifecycle
    async with AzureAIAgentClient(
        project_endpoint=conn_string,
        model_deployment_name=model_deployment,
        async_credential=credential
    ) as client:
        try:
            # Use the text file we just created
            text_file_path = Path("./resources") / "employees.txt"
            print(f"Looking for file at: {text_file_path.absolute()}")
            
            if not text_file_path.exists():
                print("❌ Text file not found. Please run the create_working_sample_file() function first.")
                return
            
            print(f"Uploading file from: {text_file_path}")

            file = await client.project_client.agents.files.upload_and_poll(
                file_path=str(text_file_path), purpose="assistants"
            )
            print(f"✅ Uploaded file, file ID: {file.id}")

            vector_store = await client.project_client.agents.vector_stores.create_and_poll(
                file_ids=[file.id], name="text_employees_vectorstore"
            )
            print(f"✅ Created vector store, vector store ID: {vector_store.id}")

            # Create file search tool with uploaded resources
            file_search_tool = HostedFileSearchTool(
                inputs=[HostedVectorStoreContent(vector_store_id=vector_store.id)]
            )
            
            # Create an agent with file search capabilities
            async with ChatAgent(
                chat_client=client,
                name="EmployeeSearchAgent",
                instructions=(
                    "You are a helpful assistant that can search through uploaded employee files "
                    "to answer questions about employees. Provide specific information from the document."
                ),
                tools=file_search_tool,
            ) as agent:
                # Test with our sample queries
                print("\n=== Querying the Text Document ===")
                for user_input in USER_INPUTS[:3]:  # Test first 3 queries
                    print(f"\n🤔 User: '{user_input}'")
                    response = await agent.run(user_input)
                    print(f"🤖 Agent: {response.text}")

            # Resource Information
            print("\n=== Resource Information ===")
            if vector_store is not None:
                print(f"📋 Vector store created: {vector_store.id}")
            if file is not None:
                print(f"📄 File uploaded: {file.id}")
            
            print("💡 Note: Resources are left for reuse.")

        except Exception as e:
            print(f"❌ Error in text file test: {e}")
            # Resource info still provided on error
            if vector_store is not None:
                print(f"📋 Vector store that may need cleanup: {vector_store.id}")
            if file is not None:
                print(f"📄 File that may need cleanup: {file.id}")

# Run the test with working file
await test_with_working_file()