# NL2SQL Pipeline Demo - Azure AI Agent Service

## Overview
This notebook demonstrates the **Azure AI Agent Service** implementation of our Natural Language to SQL pipeline.

### Key Features
- 🤖 **Persistent AI Agents** - Agents remain in Azure AI Foundry (not deleted after use)
- 🔐 **Enterprise Authentication** - Uses DefaultAzureCredential (Azure CLI, Managed Identity)
- 📊 **Built-in Observability** - Native Azure AI Foundry tracing
- ⚡ **Performance** - Agent reuse eliminates creation overhead
- 🎯 **2-Agent Architecture** - Intent extraction + SQL generation

### Pipeline Flow
```
Natural Language Query
         ↓
    [Intent Agent] ← Persistent agent in Azure AI Foundry
         ↓
    Intent JSON (entities, filters, metrics)
         ↓
    [SQL Agent] ← Persistent agent with schema context
         ↓
    Generated T-SQL
         ↓
    SQL Execution (Azure SQL)
         ↓
    Formatted Results + Token Usage + Cost
```
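The flow above can be read as three functions chained together. The following is a minimal sketch of that shape, not the notebook's actual implementation: `run_pipeline`, `PipelineResult`, and the stub lambdas are hypothetical names introduced here, and token usage and cost tracking are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PipelineResult:
    question: str
    intent: str
    sql: str
    rows: list[dict[str, Any]]


def run_pipeline(
    question: str,
    extract_intent: Callable[[str], str],
    generate_sql: Callable[[str], str],
    execute_sql: Callable[[str], list[dict[str, Any]]],
) -> PipelineResult:
    """Chain the three stages; each stage is injected so it can be stubbed."""
    intent = extract_intent(question)  # stage 1: Intent Agent
    sql = generate_sql(intent)         # stage 2: SQL Agent
    rows = execute_sql(sql)            # stage 3: SQL execution
    return PipelineResult(question, intent, sql, rows)


# Stub stages standing in for the real agents and database (illustrative values only).
result = run_pipeline(
    "How many loans are there?",
    extract_intent=lambda q: '{"intent": "count", "entity": ["loans"]}',
    generate_sql=lambda intent: "SELECT COUNT(*) AS LoanCount FROM dbo.Loan",
    execute_sql=lambda sql: [{"LoanCount": 42}],
)
print(result.sql)
```

Passing the stages in as callables keeps the orchestration testable without Azure access; the real agents and the SQL executor can be swapped in later.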

### Comparison with LangChain
| Feature | LangChain | Azure AI Agent Service |
|---------|-----------|------------------------|
| Orchestration | Prompt chains | Agents + Threads |
| Authentication | API Key | DefaultAzureCredential |
| State | Stateless | Stateful threads |
| Persistence | No | Yes (Azure AI Foundry) |
| Dependencies | 3 packages | 2 packages |

---

**Let's explore the pipeline step by step!**

## Step 1: Import Required Libraries

We'll import the Azure AI Agent Service SDK and other necessary modules.

In [1]:
import os
import sys
import json
import time
from datetime import datetime
from dotenv import load_dotenv

# Azure AI Agent Service
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Add current directory to path for local imports
# (notebooks do not define __file__, so use the working directory directly)
sys.path.insert(0, os.getcwd())

print("✓ All libraries imported successfully!")
print(f"  - azure-ai-projects: Available")
print(f"  - azure-identity: Available")
print(f"  - Python version: {sys.version.split()[0]}")

✓ All libraries imported successfully!
  - azure-ai-projects: Available
  - azure-identity: Available
  - Python version: 3.13.7


## Step 2: Load Environment Configuration

Load Azure AI Foundry project configuration and verify settings.

In [2]:
# Load environment variables
load_dotenv()

# Azure AI Foundry configuration
PROJECT_ENDPOINT = os.getenv("PROJECT_ENDPOINT")
MODEL_DEPLOYMENT_NAME = os.getenv("MODEL_DEPLOYMENT_NAME")

# Azure SQL configuration
AZURE_SQL_SERVER = os.getenv("AZURE_SQL_SERVER")
AZURE_SQL_DB = os.getenv("AZURE_SQL_DB")

print("Environment Configuration:")
print("=" * 60)
print(f"✓ Azure AI Project: {PROJECT_ENDPOINT}")
print(f"✓ Model Deployment: {MODEL_DEPLOYMENT_NAME}")
print(f"✓ Azure SQL Server: {AZURE_SQL_SERVER}")
print(f"✓ Database: {AZURE_SQL_DB}")
print("=" * 60)

# Verify required variables
if not PROJECT_ENDPOINT or not MODEL_DEPLOYMENT_NAME:
    raise ValueError("Missing required environment variables. Check .env file!")

Environment Configuration:
✓ Azure AI Project: https://aq-ai-foundry-sweden-central.services.ai.azure.com/api/projects/firstProject
✓ Model Deployment: gpt-4.1
✓ Azure SQL Server: aqsqlserver001.database.windows.net
✓ Database: CONTOSO-FI
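The check in the cell above covers only two of the four settings and does not say which one is missing. A small helper could report the missing names explicitly. This is a sketch under assumptions: `missing_settings` and `REQUIRED` are hypothetical names, and treating all four variables as required is a choice made here, not something the notebook states.

```python
import os


def missing_settings(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


REQUIRED = ["PROJECT_ENDPOINT", "MODEL_DEPLOYMENT_NAME", "AZURE_SQL_SERVER", "AZURE_SQL_DB"]

# Example with an explicit mapping instead of the real environment.
sample_env = {
    "PROJECT_ENDPOINT": "https://example.invalid/api/projects/demo",
    "MODEL_DEPLOYMENT_NAME": "",
}
print(missing_settings(REQUIRED, sample_env))
```

In the notebook, the result could feed the existing `ValueError` so that the message names the missing variables.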


## Step 3: Initialize Azure AI Project Client

Connect to Azure AI Foundry using DefaultAzureCredential, which tries several credential sources in order. In this local run it resolves to the Azure CLI login (`az login`).

In [3]:
# Initialize Azure AI Project Client
project_client = AIProjectClient(
    endpoint=PROJECT_ENDPOINT,
    credential=DefaultAzureCredential()
)

print("✓ Azure AI Project Client initialized successfully!")
print(f"  Endpoint: {PROJECT_ENDPOINT}")
print(f"  Authentication: DefaultAzureCredential (Azure CLI)")
print("\n📌 Note: This uses your Azure CLI login (az login)")
print("   No API keys stored in code!")

✓ Azure AI Project Client initialized successfully!
  Endpoint: https://aq-ai-foundry-sweden-central.services.ai.azure.com/api/projects/firstProject
  Authentication: DefaultAzureCredential (Azure CLI)

📌 Note: This uses your Azure CLI login (az login)
   No API keys stored in code!


## Step 4: Load Database Schema

Load the CONTOSO-FI database schema that will be provided to the SQL generation agent.

In [4]:
from schema_reader import get_sql_database_schema_context

# Load database schema
schema_context = get_sql_database_schema_context()

print("✓ Database schema loaded successfully!")
print(f"  Schema size: {len(schema_context)} characters")
print(f"  Database: {AZURE_SQL_DB}")
print("\n📋 Schema Preview (first 300 chars):")
print("=" * 60)
print(schema_context[:300] + "...")
print("=" * 60)

✓ Database schema loaded successfully!
  Schema size: 4777 characters
  Database: CONTOSO-FI

📋 Schema Preview (first 300 chars):
DATABASE: CONTOSO-FI (Azure SQL)
GUIDELINES
- Prefer dbo.vw_LoanPortfolio for simple portfolio-style questions.
- For hard/complex questions, use the base tables (Loan, Company, Collateral, Covenant, PaymentSchedule, etc.) and generate SQL with multiple joins, subqueries, CTEs, or advanced logic as ...


## Step 5: Create Persistent Agents

Now we'll create our two persistent agents that will remain in Azure AI Foundry.

### Agent 1: Intent Extractor
Analyzes natural language queries to extract:
- Intent (count, list, aggregate, filter)
- Entities (tables, columns)
- Metrics to calculate
- Filters and conditions
- Grouping requirements

### Agent 2: SQL Generator
Generates T-SQL queries using:
- Intent from Agent 1
- Database schema context
- Azure SQL best practices

In [5]:
# Create Intent Extraction Agent
intent_agent = project_client.agents.create_agent(
    model=MODEL_DEPLOYMENT_NAME,
    name="intent-extractor-demo",
    instructions="""You are an AI assistant that extracts the intent and entities from natural language database queries.

Analyze the user's question and provide:
1. The main intent (e.g., count, list, aggregate, filter)
2. Key entities mentioned (tables, columns, metrics)
3. Any filters or conditions
4. Desired aggregations or groupings

Return your analysis in JSON format with keys: intent, entity, metrics, filters, group_by."""
)

print("✓ Intent Extraction Agent created!")
print(f"  Agent ID: {intent_agent.id}")
print(f"  Model: {MODEL_DEPLOYMENT_NAME}")
print(f"  Name: {intent_agent.name}")
print()

# Create SQL Generation Agent
sql_agent = project_client.agents.create_agent(
    model=MODEL_DEPLOYMENT_NAME,
    name="sql-generator-demo",
    instructions=f"""You are an expert SQL query generator for Azure SQL Database.

Given the user's intent and the database schema, generate a valid T-SQL query.

Database Schema:
{schema_context}

Requirements:
- Generate clean, efficient T-SQL
- Use proper JOINs when needed
- Include appropriate WHERE clauses for filters
- Use meaningful column aliases
- Return ONLY the SQL query, no explanations
- Do NOT include markdown code blocks"""
)

print("✓ SQL Generation Agent created!")
print(f"  Agent ID: {sql_agent.id}")
print(f"  Model: {MODEL_DEPLOYMENT_NAME}")
print(f"  Name: {sql_agent.name}")
print(f"  Schema size: {len(schema_context)} characters")
print()
print("=" * 60)
print("🎉 Both agents are now persistent in Azure AI Foundry!")
print("   They will remain even after this notebook session ends.")
print("=" * 60)

✓ Intent Extraction Agent created!
  Agent ID: asst_TOSujAUTZxQxMhgG8C6OE3Gg
  Model: gpt-4.1
  Name: intent-extractor-demo

✓ SQL Generation Agent created!
  Agent ID: asst_DLoXNhObGvykHaUpERhIhHwB
  Model: gpt-4.1
  Name: sql-generator-demo
  Schema size: 4777 characters

🎉 Both agents are now persistent in Azure AI Foundry!
   They will remain even after this notebook session ends.
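Because the agents persist, rerunning the cell above would presumably create another pair with the same names rather than reuse the first pair. One way to get the reuse mentioned under Key Features is a lookup by name before creating. The sketch below is deliberately SDK-agnostic: `get_or_create_agent` is a hypothetical helper, and the listing and creation calls are passed in as callables because this notebook does not show the SDK's listing method.

```python
from types import SimpleNamespace


def get_or_create_agent(name, list_agents, create_agent):
    """Return the first listed agent whose name matches, else create one.

    list_agents:  zero-argument callable returning an iterable of objects with .name
    create_agent: zero-argument callable that creates and returns a new agent
    Returns a tuple (agent, created_flag).
    """
    for agent in list_agents():
        if agent.name == name:
            return agent, False
    return create_agent(), True


# Stand-in objects; a real call would wrap the project client instead.
existing = [SimpleNamespace(id="agent-1", name="intent-extractor-demo")]
agent, created = get_or_create_agent(
    "intent-extractor-demo",
    list_agents=lambda: existing,
    create_agent=lambda: SimpleNamespace(id="agent-2", name="intent-extractor-demo"),
)
print(agent.id, created)
```

One caveat with reuse: a reused SQL agent keeps the instructions it was created with, so a schema change would still require updating or recreating it.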


## Step 6: Test Intent Extraction

Let's test the Intent Extraction agent with a sample query.

In [12]:
# Test query
test_query = "Show the top 10 companies by total principal amount, including their industry, region, and average interest rate across all their loans."

print(f"🔍 Test Query: '{test_query}'")
print("\n" + "=" * 60)

# Create a thread for this conversation
thread = project_client.agents.threads.create()

# Send the query to the intent agent
project_client.agents.messages.create(
    thread_id=thread.id,
    role="user",
    content=test_query
)

# Run the agent
run = project_client.agents.runs.create_and_process(
    thread_id=thread.id,
    agent_id=intent_agent.id
)

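# Get the response (the first assistant message in the listing)
intent_result = None
messages = project_client.agents.messages.list(thread_id=thread.id)
for message in messages:
    if message.role == "assistant" and message.text_messages:
        intent_result = message.text_messages[0].text.value
        break
if intent_result is None:
    raise RuntimeError("Intent agent returned no text response")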

print("✓ Intent Extraction Result:")
print("=" * 60)
print(intent_result)
print("=" * 60)

# Parse JSON for better display
try:
    intent_json = json.loads(intent_result)
    print("\n📊 Structured Intent:")
    for key, value in intent_json.items():
        print(f"  {key}: {value}")
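except json.JSONDecodeError:
    # Catch only parse failures instead of silently swallowing every exception
    print("\n⚠️  Response is not valid JSON; showing raw text only.")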

🔍 Test Query: 'Show the top 10 companies by total principal amount, including their industry, region, and average interest rate across all their loans.'

✓ Intent Extraction Result:
{
  "intent": "list",
  "entity": ["companies", "industry", "region", "loans"],
  "metrics": ["total principal amount", "average interest rate"],
  "filters": null,
  "group_by": ["company"],
  "additional": {
    "sort_by": "total principal amount",
    "order": "desc",
    "limit": 10,
    "include_fields": ["industry", "region"]
  }
}

📊 Structured Intent:
  intent: list
  entity: ['companies', 'industry', 'region', 'loans']
  metrics: ['total principal amount', 'average interest rate']
  filters: None
  group_by: ['company']
  additional: {'sort_by': 'total principal amount', 'order': 'desc', 'limit': 10, 'include_fields': ['industry', 'region']}
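The agent instructions ask for five keys, but the model's output is not guaranteed to match: in the run above it added an `additional` key, and it could also wrap the JSON in a code fence. A tolerant parser might look like the following sketch. `parse_intent` and `EXPECTED_KEYS` are hypothetical names; the key list is taken from the agent instructions in Step 5.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
EXPECTED_KEYS = ("intent", "entity", "metrics", "filters", "group_by")


def parse_intent(text):
    """Parse agent output into a dict, tolerating a surrounding code fence.

    Returns (data, missing_keys). Raises json.JSONDecodeError on invalid JSON.
    """
    cleaned = text.strip()
    # Drop an opening fence (optionally tagged, e.g. "json") and a closing fence.
    cleaned = re.sub(rf"^{FENCE}[a-zA-Z]*\s*\n", "", cleaned)
    cleaned = re.sub(rf"\n{FENCE}\s*$", "", cleaned)
    data = json.loads(cleaned)
    if not isinstance(data, dict):
        raise ValueError("intent must be a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in data]
    return data, missing


sample = FENCE + 'json\n{"intent": "list", "entity": ["companies"], "metrics": [], "filters": null}\n' + FENCE
data, missing = parse_intent(sample)
print(data["intent"], missing)
```

Returning the missing keys instead of raising leaves it to the caller to decide whether an incomplete intent is still usable for SQL generation.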

## Step 7: Test SQL Generation

Now let's use the SQL agent to generate a query based on the extracted intent.

In [13]:
print(f"📝 Generating SQL for intent: {intent_result}")
print("\n" + "=" * 60)

# Create a new thread for SQL generation
sql_thread = project_client.agents.threads.create()

# Send the intent to the SQL agent
project_client.agents.messages.create(
    thread_id=sql_thread.id,
    role="user",
    content=f"Intent: {intent_result}\n\nGenerate the SQL query:"
)

# Run the SQL agent
sql_run = project_client.agents.runs.create_and_process(
    thread_id=sql_thread.id,
    agent_id=sql_agent.id
)

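# Get the SQL response (the first assistant message in the listing)
generated_sql = None
sql_messages = project_client.agents.messages.list(thread_id=sql_thread.id)
for message in sql_messages:
    if message.role == "assistant" and message.text_messages:
        generated_sql = message.text_messages[0].text.value
        break
if generated_sql is None:
    raise RuntimeError("SQL agent returned no text response")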

print("✓ Generated SQL Query:")
print("=" * 60)
print(generated_sql)
print("=" * 60)

📝 Generating SQL for intent: {
  "intent": "list",
  "entity": ["companies", "industry", "region", "loans"],
  "metrics": ["total principal amount", "average interest rate"],
  "filters": null,
  "group_by": ["company"],
  "additional": {
    "sort_by": "total principal amount",
    "order": "desc",
    "limit": 10,
    "include_fields": ["industry", "region"]
  }
}

✓ Generated SQL Query:
SELECT
    CompanyName,
    Industry,
    RegionName,
    SUM(PrincipalAmount) AS TotalPrincipalAmount,
    AVG(InterestRatePct) AS AverageInterestRate
FROM dbo.vw_LoanPortfolio
GROUP BY CompanyName, Industry, RegionName
ORDER BY TotalPrincipalAmount DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

## Step 8: Execute SQL Query

Now let's execute the generated SQL query against the Azure SQL database.

In [14]:
from sql_executor import execute_sql_query
import time

print(f"🚀 Executing SQL Query...")
print("=" * 60)

# Clean up the SQL query (remove markdown code blocks if present)
sql_query = generated_sql.strip()
if sql_query.startswith("```"):
    lines = sql_query.split('\n')
    sql_query = '\n'.join(lines[1:-1]) if len(lines) > 2 else sql_query

print(f"Query to execute:\n{sql_query}\n")
print("=" * 60)

# Execute the query
try:
    start_time = time.time()
    results = execute_sql_query(sql_query)
    execution_time = time.time() - start_time
    
    # Display execution metadata
    print(f"\n✓ Query executed successfully!")
    print(f"  Rows returned: {len(results)}")
    print(f"  Execution time: {execution_time:.3f}s")
    
    # Display column information
    if results and len(results) > 0:
        columns = list(results[0].keys())
        print(f"\n📋 Columns ({len(columns)}):")
        for col in columns:
            print(f"  - {col}")
        
        # Display raw results (first 10 rows if many)
        print(f"\n📊 Results:")
        print("=" * 60)
        max_display = min(10, len(results))
        for i, row in enumerate(results[:max_display], 1):
            print(f"Row {i}: {row}")
        
        if len(results) > max_display:
            print(f"... and {len(results) - max_display} more rows")
    else:
        print("\n⚠️  No results returned")
    
    print("=" * 60)
    
except Exception as e:
    print(f"❌ Error executing query: {str(e)}")
    import traceback
    traceback.print_exc()
    results = None

🚀 Executing SQL Query...
Query to execute:
SELECT
    CompanyName,
    Industry,
    RegionName,
    SUM(PrincipalAmount) AS TotalPrincipalAmount,
    AVG(InterestRatePct) AS AverageInterestRate
FROM dbo.vw_LoanPortfolio
GROUP BY CompanyName, Industry, RegionName
ORDER BY TotalPrincipalAmount DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY


✓ Query executed successfully!
  Rows returned: 10
  Execution time: 1.145s

📋 Columns (5):
  - CompanyName
  - Industry
  - RegionName
  - TotalPrincipalAmount
  - AverageInterestRate

📊 Results:
Row 1: {'CompanyName': 'Blue Ridge Energy Corp', 'Industry': 'Energy', 'RegionName': 'Americas', 'TotalPrincipalAmount': Decimal('25000000.00'), 'AverageInterestRate': Decimal('6.250000')}
Row 2: {'CompanyName': 'Toronto Health Devices', 'Industry': 'Medical Devices', 'RegionName': 'Americas', 'TotalPrincipalAmount': Decimal('25000000.00'), 'AverageInterestRate': Decimal('6.250000')}
Row 3: {'CompanyName': 'Acme Industrial Holdings', 'Industry': 'Manufacturing
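Since the SQL comes from a model, it may be worth cleaning and checking it before execution. The sketch below strips an optional code fence more defensively than the cell above and applies a rough read-only check. `clean_sql`, `is_read_only`, and the keyword list are hypothetical and introduced here for illustration. The check is a keyword heuristic, not a SQL parser, so it is no substitute for running the query under a database login with read-only permissions.

```python
import re

FENCE = "`" * 3  # three backticks
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|EXEC|EXECUTE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def clean_sql(text):
    """Strip an optional surrounding code fence and one trailing semicolon run."""
    sql = text.strip()
    sql = re.sub(rf"^{FENCE}[a-zA-Z]*\s*\n", "", sql)
    sql = re.sub(rf"\n{FENCE}\s*$", "", sql)
    return sql.strip().rstrip(";").strip()


def is_read_only(sql):
    """Heuristic: a single statement that starts with SELECT or WITH and has no write keywords."""
    if ";" in sql:
        return False  # reject anything that looks like multiple statements
    if not re.match(r"(SELECT|WITH)\b", sql, re.IGNORECASE):
        return False
    return FORBIDDEN.search(sql) is None


raw = FENCE + "sql\nSELECT CompanyName FROM dbo.vw_LoanPortfolio;\n" + FENCE
sql = clean_sql(raw)
print(sql)
print(is_read_only(sql), is_read_only("DROP TABLE dbo.Loan"))
```

The heuristic errs on the side of rejecting: a legitimate query with a semicolon inside a string literal would be refused, which seems an acceptable trade-off for a demo.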

## Step 9: Format Results as Table

Display the results in a clean, formatted table for better readability.

In [15]:
import pandas as pd
from IPython.display import display, HTML

if results and len(results) > 0:
    try:
        # Convert results to pandas DataFrame
        df = pd.DataFrame(results)
        
        print("📊 Formatted Results Table")
        print("=" * 80)
        print(f"Total Rows: {len(df)}")
        print(f"Columns: {', '.join(df.columns.tolist())}")
        print("=" * 80)
        print()
        
        # Display as formatted table
        # Set pandas display options for better readability
        pd.set_option('display.max_columns', None)
        pd.set_option('display.width', None)
        pd.set_option('display.max_colwidth', 50)
        
        # Display the DataFrame
        display(df)
        
        # Show basic statistics for numeric columns
        numeric_cols = df.select_dtypes(include=['int64', 'float64']).columns
        if len(numeric_cols) > 0:
            print("\n📈 Quick Statistics (Numeric Columns):")
            print("=" * 80)
            display(df[numeric_cols].describe())
        
        # Show data types
        print("\n🔍 Column Data Types:")
        print("=" * 80)
        for col, dtype in df.dtypes.items():
            print(f"  {col}: {dtype}")
            
    except Exception as e:
        print(f"⚠️  Could not format as table: {str(e)}")
        print("Raw results are available in the 'results' variable")
        import traceback
        traceback.print_exc()
else:
    print("⚠️  No results to format. The query may not have returned any data.")
    print("   Check the previous cell for execution details.")

📊 Formatted Results Table
Total Rows: 10
Columns: CompanyName, Industry, RegionName, TotalPrincipalAmount, AverageInterestRate



| | CompanyName | Industry | RegionName | TotalPrincipalAmount | AverageInterestRate |
|---|-------------|----------|------------|----------------------|---------------------|
| 0 | Blue Ridge Energy Corp | Energy | Americas | 25000000.0 | 6.25 |
| 1 | Toronto Health Devices | Medical Devices | Americas | 25000000.0 | 6.25 |
| 2 | Acme Industrial Holdings | Manufacturing | Americas | 24000000.0 | 6.25 |
| 3 | Mumbai InfraTech Ltd | Infrastructure | Asia | 22000000.0 | 3.9 |
| 4 | Osaka Precision K.K. | Electronics | Asia | 22000000.0 | 3.9 |
| 5 | Shanghai GreenChem Co | Chemicals | Asia | 22000000.0 | 3.9 |
| 6 | Singapore Data Centers | Data Centers | Asia | 22000000.0 | 3.9 |
| 7 | Amsterdam FinTech BV | Technology | Europe | 18000000.0 | 4.75 |
| 8 | Gaulois Pharma SA | Pharmaceuticals | Europe | 18000000.0 | 4.75 |
| 9 | Nordwind Logistics GmbH | Logistics | Europe | 18000000.0 | 4.75 |



🔍 Column Data Types:
  CompanyName: object
  Industry: object
  RegionName: object
  TotalPrincipalAmount: object
  AverageInterestRate: object
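The two numeric columns are reported as `object` because the rows contain `Decimal` values (see the raw rows in Step 8). As a result, `select_dtypes(include=['int64', 'float64'])` matches nothing and the statistics block is skipped. Converting `Decimal` to `float` before building the DataFrame would likely let pandas infer `float64`. A minimal sketch, with `decimals_to_float` as a hypothetical helper name:

```python
from decimal import Decimal


def decimals_to_float(rows):
    """Return a copy of the rows with Decimal values converted to float."""
    return [
        {key: float(value) if isinstance(value, Decimal) else value for key, value in row.items()}
        for row in rows
    ]


# One row taken from the Step 8 output, trimmed to two columns.
rows = [
    {"CompanyName": "Blue Ridge Energy Corp", "TotalPrincipalAmount": Decimal("25000000.00")},
]
converted = decimals_to_float(rows)
print(converted[0]["TotalPrincipalAmount"])
```

In the notebook this would be applied as `pd.DataFrame(decimals_to_float(results))`. Floats give up exact decimal precision, which is usually fine for display and summary statistics but not for accounting-grade arithmetic.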
