# NL2SQL Pipeline Demo - Azure AI Agent Service

## Overview
This notebook demonstrates the **Azure AI Agent Service** implementation of our Natural Language to SQL pipeline.

### Key Features
- 🤖 **Persistent AI Agents** - Agents remain in Azure AI Foundry (not deleted after use)
- 🔐 **Enterprise Authentication** - Uses DefaultAzureCredential (Azure CLI, Managed Identity)
- 📊 **Built-in Observability** - Native Azure AI Foundry tracing
- ⚡ **Performance** - Agent reuse eliminates creation overhead
- 🎯 **2-Agent Architecture** - Intent extraction + SQL generation

### Pipeline Flow
```
Natural Language Query
         ↓
    [Intent Agent] ← Persistent agent in Azure AI Foundry
         ↓
    Intent JSON (entities, filters, metrics)
         ↓
    [SQL Agent] ← Persistent agent with schema context
         ↓
    Generated T-SQL
         ↓
    SQL Execution (Azure SQL)
         ↓
    Formatted Results + Token Usage + Cost
```
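
The flow above is a straight composition of three stages. Here is a minimal sketch of that shape, not the notebook's actual implementation: `run_pipeline` and the three stage callables are hypothetical names, and the lambdas are stand-ins so it runs without Azure access. Token usage and cost tracking are left out.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    question: str,
    extract_intent: Callable[[str], str],
    generate_sql: Callable[[str], str],
    execute_sql: Callable[[str], List[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Chain the three stages and keep every intermediate artifact."""
    intent = extract_intent(question)  # stage 1: Intent Agent
    sql = generate_sql(intent)         # stage 2: SQL Agent
    rows = execute_sql(sql)            # stage 3: Azure SQL
    return {"question": question, "intent": intent, "sql": sql, "rows": rows}


# Stand-in stages (made-up values, not real agent output)
demo = run_pipeline(
    "How many loans are there?",
    extract_intent=lambda q: '{"intent": "count", "entity": ["loans"]}',
    generate_sql=lambda intent: "SELECT COUNT(*) AS LoanCount FROM dbo.Loan",
    execute_sql=lambda sql: [{"LoanCount": 0}],
)
print(demo["sql"])
```

Keeping the intermediate intent and SQL in the result makes a failed query easier to inspect afterwards.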

### Comparison with LangChain
| Feature | LangChain | Azure AI Agent Service |
|---------|-----------|------------------------|
| Orchestration | Prompt chains | Agents + Threads |
| Authentication | API Key | DefaultAzureCredential |
| State | Stateless | Stateful threads |
| Persistence | No | Yes (Azure AI Foundry) |
| Dependencies | 3 packages | 2 packages |

---

**Let's explore the pipeline step by step!**

## Step 1: Import Required Libraries

We'll import the Azure AI Agent Service SDK and other necessary modules.

In [1]:
import os
import sys
import json
import time
from datetime import datetime
from dotenv import load_dotenv

# Azure AI Agent Service
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Add the working directory to the path for local imports
# (__file__ is not defined inside a notebook, so use the current directory)
sys.path.insert(0, os.getcwd())

print("✓ All libraries imported successfully!")
print("  - azure-ai-projects: Available")
print("  - azure-identity: Available")
print(f"  - Python version: {sys.version.split()[0]}")

✓ All libraries imported successfully!
  - azure-ai-projects: Available
  - azure-identity: Available
  - Python version: 3.13.7


## Step 2: Load Environment Configuration

Load Azure AI Foundry project configuration and verify settings.

In [2]:
# Load environment variables
load_dotenv()

# Azure AI Foundry configuration
PROJECT_ENDPOINT = os.getenv("PROJECT_ENDPOINT")
MODEL_DEPLOYMENT_NAME = os.getenv("MODEL_DEPLOYMENT_NAME")

# Azure SQL configuration
AZURE_SQL_SERVER = os.getenv("AZURE_SQL_SERVER")
AZURE_SQL_DB = os.getenv("AZURE_SQL_DB")

print("Environment Configuration:")
print("=" * 60)
print(f"✓ Azure AI Project: {PROJECT_ENDPOINT}")
print(f"✓ Model Deployment: {MODEL_DEPLOYMENT_NAME}")
print(f"✓ Azure SQL Server: {AZURE_SQL_SERVER}")
print(f"✓ Database: {AZURE_SQL_DB}")
print("=" * 60)

# Verify required variables
if not PROJECT_ENDPOINT or not MODEL_DEPLOYMENT_NAME:
    raise ValueError("Missing required environment variables. Check .env file!")

Environment Configuration:
✓ Azure AI Project: https://aq-ai-foundry-sweden-central.services.ai.azure.com/api/projects/firstProject
✓ Model Deployment: gpt-4.1
✓ Azure SQL Server: aqsqlserver001.database.windows.net
✓ Database: CONTOSO-FI
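
The check in the cell above only covers the two Azure AI Foundry variables and does not say which one is missing. A possible variant reports each missing name. This is a sketch: `missing_env_vars` is a hypothetical helper, and treating the two Azure SQL variables as required is an assumption.

```python
from typing import List, Mapping, Sequence

# Assumption: all four variables read in Step 2 are treated as required
REQUIRED_VARS = (
    "PROJECT_ENDPOINT",
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_SQL_SERVER",
    "AZURE_SQL_DB",
)


def missing_env_vars(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not (env.get(name) or "").strip()]


# A plain dict stands in for os.environ; the endpoint is a placeholder
example_env = {
    "PROJECT_ENDPOINT": "https://example.invalid/api/projects/demo",
    "MODEL_DEPLOYMENT_NAME": "",
}
print(missing_env_vars(example_env))
```

In the notebook this could be called as `missing_env_vars(os.environ)` right after `load_dotenv()`, raising `ValueError` when the list is not empty.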


## Step 3: Initialize Azure AI Project Client

Connect to Azure AI Foundry using DefaultAzureCredential, which tries several credential sources in turn; in this local run it resolves to the Azure CLI login.

In [3]:
# Initialize Azure AI Project Client
project_client = AIProjectClient(
    endpoint=PROJECT_ENDPOINT,
    credential=DefaultAzureCredential()
)

print("✓ Azure AI Project Client initialized successfully!")
print(f"  Endpoint: {PROJECT_ENDPOINT}")
print(f"  Authentication: DefaultAzureCredential (Azure CLI)")
print("\n📌 Note: This uses your Azure CLI login (az login)")
print("   No API keys stored in code!")

✓ Azure AI Project Client initialized successfully!
  Endpoint: https://aq-ai-foundry-sweden-central.services.ai.azure.com/api/projects/firstProject
  Authentication: DefaultAzureCredential (Azure CLI)

📌 Note: This uses your Azure CLI login (az login)
   No API keys stored in code!


## Step 4: Load Database Schema

Load the CONTOSO-FI database schema that will be provided to the SQL generation agent.

In [4]:
from schema_reader import get_sql_database_schema_context

# Load database schema
schema_context = get_sql_database_schema_context()

print("✓ Database schema loaded successfully!")
print(f"  Schema size: {len(schema_context)} characters")
print(f"  Database: {AZURE_SQL_DB}")
print("\n📋 Schema Preview (first 300 chars):")
print("=" * 60)
print(schema_context[:300] + "...")
print("=" * 60)

✓ Database schema loaded successfully!
  Schema size: 4777 characters
  Database: CONTOSO-FI

📋 Schema Preview (first 300 chars):
DATABASE: CONTOSO-FI (Azure SQL)
GUIDELINES
- Prefer dbo.vw_LoanPortfolio for simple portfolio-style questions.
- For hard/complex questions, use the base tables (Loan, Company, Collateral, Covenant, PaymentSchedule, etc.) and generate SQL with multiple joins, subqueries, CTEs, or advanced logic as ...


## Step 5: Create Persistent Agents

Now we'll create our two persistent agents that will remain in Azure AI Foundry.

### Agent 1: Intent Extractor
Analyzes natural language queries to extract:
- Intent (count, list, aggregate, filter)
- Entities (tables, columns)
- Metrics to calculate
- Filters and conditions
- Grouping requirements

### Agent 2: SQL Generator
Generates T-SQL queries using:
- Intent from Agent 1
- Database schema context
- Azure SQL best practices

In [5]:
# Create Intent Extraction Agent
intent_agent = project_client.agents.create_agent(
    model=MODEL_DEPLOYMENT_NAME,
    name="intent-extractor-demo",
    instructions="""You are an AI assistant that extracts the intent and entities from natural language database queries.

Analyze the user's question and provide:
1. The main intent (e.g., count, list, aggregate, filter)
2. Key entities mentioned (tables, columns, metrics)
3. Any filters or conditions
4. Desired aggregations or groupings

Return your analysis in JSON format with keys: intent, entity, metrics, filters, group_by."""
)

print("✓ Intent Extraction Agent created!")
print(f"  Agent ID: {intent_agent.id}")
print(f"  Model: {MODEL_DEPLOYMENT_NAME}")
print(f"  Name: {intent_agent.name}")
print()

# Create SQL Generation Agent
sql_agent = project_client.agents.create_agent(
    model=MODEL_DEPLOYMENT_NAME,
    name="sql-generator-demo",
    instructions=f"""You are an expert SQL query generator for Azure SQL Database.

Given the user's intent and the database schema, generate a valid T-SQL query.

Database Schema:
{schema_context}

Requirements:
- Generate clean, efficient T-SQL
- Use proper JOINs when needed
- Include appropriate WHERE clauses for filters
- Use meaningful column aliases
- Return ONLY the SQL query, no explanations
- Do NOT include markdown code blocks"""
)

print("✓ SQL Generation Agent created!")
print(f"  Agent ID: {sql_agent.id}")
print(f"  Model: {MODEL_DEPLOYMENT_NAME}")
print(f"  Name: {sql_agent.name}")
print(f"  Schema size: {len(schema_context)} characters")
print()
print("=" * 60)
print("🎉 Both agents are now persistent in Azure AI Foundry!")
print("   They will remain even after this notebook session ends.")
print("=" * 60)

✓ Intent Extraction Agent created!
  Agent ID: asst_TOSujAUTZxQxMhgG8C6OE3Gg
  Model: gpt-4.1
  Name: intent-extractor-demo

✓ SQL Generation Agent created!
  Agent ID: asst_DLoXNhObGvykHaUpERhIhHwB
  Model: gpt-4.1
  Name: sql-generator-demo
  Schema size: 4777 characters

🎉 Both agents are now persistent in Azure AI Foundry!
   They will remain even after this notebook session ends.
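
Because the agents persist, rerunning the cell above creates a second pair with the same names. One way to get the reuse benefit is to look an agent up by name before creating it. This is a sketch only: `get_or_create_agent` is a hypothetical helper, and it assumes the agents client exposes an iterable `list_agents()` alongside the `create_agent()` call used above. `FakeAgentsClient` is an in-memory stand-in, not part of the SDK.

```python
from types import SimpleNamespace


def get_or_create_agent(agents_client, name, model, instructions):
    """Return the first existing agent with this name, else create one."""
    for agent in agents_client.list_agents():  # assumed to be iterable
        if agent.name == name:
            return agent
    return agents_client.create_agent(
        model=model, name=name, instructions=instructions
    )


class FakeAgentsClient:
    """In-memory stand-in so the sketch runs without Azure access."""

    def __init__(self):
        self._agents = []

    def list_agents(self):
        return list(self._agents)

    def create_agent(self, model, name, instructions):
        agent = SimpleNamespace(id=f"fake_{len(self._agents)}", name=name, model=model)
        self._agents.append(agent)
        return agent


fake = FakeAgentsClient()
first = get_or_create_agent(fake, "intent-extractor-demo", "gpt-4.1", "...")
second = get_or_create_agent(fake, "intent-extractor-demo", "gpt-4.1", "...")
print(first.id == second.id)
```

With the real client the call would be `get_or_create_agent(project_client.agents, ...)`. Note that a name match does not guarantee the stored instructions or schema are still current.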


## Step 6: Test Intent Extraction

Let's test the Intent Extraction agent with a sample query.

In [12]:
# Test query
test_query = "Show the top 10 companies by total principal amount, including their industry, region, and average interest rate across all their loans."

print(f"🔍 Test Query: '{test_query}'")
print("\n" + "=" * 60)

# Create a thread for this conversation
thread = project_client.agents.threads.create()

# Send the query to the intent agent
project_client.agents.messages.create(
    thread_id=thread.id,
    role="user",
    content=test_query
)

# Run the agent
run = project_client.agents.runs.create_and_process(
    thread_id=thread.id,
    agent_id=intent_agent.id
)

# Get the response
messages = project_client.agents.messages.list(thread_id=thread.id)
for message in messages:
    if message.role == "assistant":
        intent_result = message.text_messages[0].text.value
        break

print("✓ Intent Extraction Result:")
print("=" * 60)
print(intent_result)
print("=" * 60)

# Parse JSON for better display
try:
    intent_json = json.loads(intent_result)
    print("\n📊 Structured Intent:")
    for key, value in intent_json.items():
        print(f"  {key}: {value}")
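except json.JSONDecodeError:
    # The agent may wrap the JSON in prose or code fences; the raw text is shown above
    print("\n⚠️  Response is not valid JSON; showing raw text only")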

🔍 Test Query: 'Show the top 10 companies by total principal amount, including their industry, region, and average interest rate across all their loans.'

✓ Intent Extraction Result:
{
  "intent": "list",
  "entity": ["companies", "industry", "region", "loans"],
  "metrics": ["total principal amount", "average interest rate"],
  "filters": null,
  "group_by": ["company"],
  "additional": {
    "sort_by": "total principal amount",
    "order": "desc",
    "limit": 10,
    "include_fields": ["industry", "region"]
  }
}

📊 Structured Intent:
  intent: list
  entity: ['companies', 'industry', 'region', 'loans']
  metrics: ['total principal amount', 'average interest rate']
  filters: None
  group_by: ['company']
  additional: {'sort_by': 'total principal amount', 'order': 'desc', 'limit': 10, 'include_fields': ['industry', 'region']}
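
The agent is asked for JSON but nothing forces it to return only JSON. A tolerant parser can first try the raw text and then fall back to the outermost `{...}` span. This is a sketch: `parse_intent` is a hypothetical helper, and the `wrapped` string is a made-up example of a reply with extra text around the JSON.

```python
import json
import re
from typing import Any, Optional


def parse_intent(text: str) -> Optional[Any]:
    """Best-effort parse: the raw text first, then the outermost {...} span."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None


# Made-up reply with prose around the JSON payload
wrapped = 'Here is the analysis:\n{"intent": "list", "filters": null}\nDone.'
print(parse_intent(wrapped))
```

Returning `None` instead of raising lets the caller decide whether to pass the raw text on to the SQL agent anyway, as this notebook does.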

## Step 7: Test SQL Generation

Now let's use the SQL agent to generate a query based on the extracted intent.

In [13]:
print(f"📝 Generating SQL for intent: {intent_result}")
print("\n" + "=" * 60)

# Create a new thread for SQL generation
sql_thread = project_client.agents.threads.create()

# Send the intent to the SQL agent
project_client.agents.messages.create(
    thread_id=sql_thread.id,
    role="user",
    content=f"Intent: {intent_result}\n\nGenerate the SQL query:"
)

# Run the SQL agent
sql_run = project_client.agents.runs.create_and_process(
    thread_id=sql_thread.id,
    agent_id=sql_agent.id
)

# Get the SQL response
sql_messages = project_client.agents.messages.list(thread_id=sql_thread.id)
for message in sql_messages:
    if message.role == "assistant":
        generated_sql = message.text_messages[0].text.value
        break

print("✓ Generated SQL Query:")
print("=" * 60)
print(generated_sql)
print("=" * 60)

📝 Generating SQL for intent: {
  "intent": "list",
  "entity": ["companies", "industry", "region", "loans"],
  "metrics": ["total principal amount", "average interest rate"],
  "filters": null,
  "group_by": ["company"],
  "additional": {
    "sort_by": "total principal amount",
    "order": "desc",
    "limit": 10,
    "include_fields": ["industry", "region"]
  }
}

✓ Generated SQL Query:
SELECT
    CompanyName,
    Industry,
    RegionName,
    SUM(PrincipalAmount) AS TotalPrincipalAmount,
    AVG(InterestRatePct) AS AverageInterestRate
FROM dbo.vw_LoanPortfolio
GROUP BY CompanyName, Industry, RegionName
ORDER BY TotalPrincipalAmount DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
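
Since the generated text is executed as-is in the next step, a cheap sanity check before execution may be worthwhile. The sketch below is a keyword heuristic, not a SQL parser and not a substitute for a read-only database login. `looks_read_only` and its keyword list are assumptions made for illustration.

```python
import re

# Assumption: these keywords are treated as signs of a write or DDL statement
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|"
    r"EXEC|EXECUTE|GRANT|REVOKE|INTO)\b",
    re.IGNORECASE,
)


def looks_read_only(sql: str) -> bool:
    """Heuristic: one statement, starts with SELECT or WITH, no write keywords."""
    body = sql.strip().rstrip(";").strip()
    if not body or ";" in body:
        return False  # empty, or more than one statement
    if not re.match(r"(SELECT|WITH)\b", body, re.IGNORECASE):
        return False
    return FORBIDDEN.search(body) is None


print(looks_read_only("SELECT CompanyName FROM dbo.vw_LoanPortfolio;"))
print(looks_read_only("SELECT 1; DELETE FROM dbo.Loan"))
```

The check can reject valid queries, for example one with a semicolon or the word "update" inside a string literal, so treat a `False` as "needs review" rather than proof of a problem.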

## Step 8: Execute SQL Query

Now let's execute the generated SQL query against the Azure SQL database.

In [14]:
from sql_executor import execute_sql_query
import time

print(f"🚀 Executing SQL Query...")
print("=" * 60)

# Clean up the SQL query (remove markdown code blocks if present)
sql_query = generated_sql.strip()
if sql_query.startswith("```"):
    lines = sql_query.split('\n')
    sql_query = '\n'.join(lines[1:-1]) if len(lines) > 2 else sql_query

print(f"Query to execute:\n{sql_query}\n")
print("=" * 60)

# Execute the query
try:
    start_time = time.time()
    results = execute_sql_query(sql_query)
    execution_time = time.time() - start_time
    
    # Display execution metadata
    print(f"\n✓ Query executed successfully!")
    print(f"  Rows returned: {len(results)}")
    print(f"  Execution time: {execution_time:.3f}s")
    
    # Display column information
    if results and len(results) > 0:
        columns = list(results[0].keys())
        print(f"\n📋 Columns ({len(columns)}):")
        for col in columns:
            print(f"  - {col}")
        
        # Display raw results (first 10 rows if many)
        print(f"\n📊 Results:")
        print("=" * 60)
        max_display = min(10, len(results))
        for i, row in enumerate(results[:max_display], 1):
            print(f"Row {i}: {row}")
        
        if len(results) > max_display:
            print(f"... and {len(results) - max_display} more rows")
    else:
        print("\n⚠️  No results returned")
    
    print("=" * 60)
    
except Exception as e:
    print(f"❌ Error executing query: {str(e)}")
    import traceback
    traceback.print_exc()
    results = None

🚀 Executing SQL Query...
Query to execute:
SELECT
    CompanyName,
    Industry,
    RegionName,
    SUM(PrincipalAmount) AS TotalPrincipalAmount,
    AVG(InterestRatePct) AS AverageInterestRate
FROM dbo.vw_LoanPortfolio
GROUP BY CompanyName, Industry, RegionName
ORDER BY TotalPrincipalAmount DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY


✓ Query executed successfully!
  Rows returned: 10
  Execution time: 1.145s

📋 Columns (5):
  - CompanyName
  - Industry
  - RegionName
  - TotalPrincipalAmount
  - AverageInterestRate

📊 Results:
Row 1: {'CompanyName': 'Blue Ridge Energy Corp', 'Industry': 'Energy', 'RegionName': 'Americas', 'TotalPrincipalAmount': Decimal('25000000.00'), 'AverageInterestRate': Decimal('6.250000')}
Row 2: {'CompanyName': 'Toronto Health Devices', 'Industry': 'Medical Devices', 'RegionName': 'Americas', 'TotalPrincipalAmount': Decimal('25000000.00'), 'AverageInterestRate': Decimal('6.250000')}
Row 3: {'CompanyName': 'Acme Industrial Holdings', 'Industry': 'Manufacturing
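
The cleanup in the cell above drops the first and last lines whenever the text starts with a fence, so an unclosed fence would cost the final line of SQL. A stricter variant removes the fence only when both ends are present. This is a sketch: `strip_code_fences` is a hypothetical helper, and the fence marker is built with `"`" * 3` purely to keep this example readable.

```python
import re

TICKS = "`" * 3  # the three-backtick fence marker

# Opening fence with optional language tag, body, closing fence at the very end
_FENCED = re.compile(r"^`{3}[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?`{3}\s*$", re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Return the inside of one complete fenced block, else the stripped text."""
    stripped = text.strip()
    match = _FENCED.match(stripped)
    return match.group(1).strip() if match else stripped


print(strip_code_fences(f"{TICKS}sql\nSELECT 1\n{TICKS}"))
print(strip_code_fences("SELECT 1"))
```

An unclosed fence is left untouched on purpose, so the query fails visibly instead of running a truncated statement.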

## Step 9: Format Results as Table

Display the results in a clean, formatted table for better readability.

In [15]:
import pandas as pd
from IPython.display import display, HTML

if results and len(results) > 0:
    try:
        # Convert results to pandas DataFrame
        df = pd.DataFrame(results)
        
        print("📊 Formatted Results Table")
        print("=" * 80)
        print(f"Total Rows: {len(df)}")
        print(f"Columns: {', '.join(df.columns.tolist())}")
        print("=" * 80)
        print()
        
        # Display as formatted table
        # Set pandas display options for better readability
        pd.set_option('display.max_columns', None)
        pd.set_option('display.width', None)
        pd.set_option('display.max_colwidth', 50)
        
        # Display the DataFrame
        display(df)
        
        # Show basic statistics for numeric columns
        numeric_cols = df.select_dtypes(include=['int64', 'float64']).columns
        if len(numeric_cols) > 0:
            print("\n📈 Quick Statistics (Numeric Columns):")
            print("=" * 80)
            display(df[numeric_cols].describe())
        
        # Show data types
        print("\n🔍 Column Data Types:")
        print("=" * 80)
        for col, dtype in df.dtypes.items():
            print(f"  {col}: {dtype}")
            
    except Exception as e:
        print(f"⚠️  Could not format as table: {str(e)}")
        print("Raw results are available in the 'results' variable")
        import traceback
        traceback.print_exc()
else:
    print("⚠️  No results to format. The query may not have returned any data.")
    print("   Check the previous cell for execution details.")

📊 Formatted Results Table
Total Rows: 10
Columns: CompanyName, Industry, RegionName, TotalPrincipalAmount, AverageInterestRate



| | CompanyName | Industry | RegionName | TotalPrincipalAmount | AverageInterestRate |
|---|-------------|----------|------------|----------------------|---------------------|
| 0 | Blue Ridge Energy Corp | Energy | Americas | 25000000.0 | 6.25 |
| 1 | Toronto Health Devices | Medical Devices | Americas | 25000000.0 | 6.25 |
| 2 | Acme Industrial Holdings | Manufacturing | Americas | 24000000.0 | 6.25 |
| 3 | Mumbai InfraTech Ltd | Infrastructure | Asia | 22000000.0 | 3.9 |
| 4 | Osaka Precision K.K. | Electronics | Asia | 22000000.0 | 3.9 |
| 5 | Shanghai GreenChem Co | Chemicals | Asia | 22000000.0 | 3.9 |
| 6 | Singapore Data Centers | Data Centers | Asia | 22000000.0 | 3.9 |
| 7 | Amsterdam FinTech BV | Technology | Europe | 18000000.0 | 4.75 |
| 8 | Gaulois Pharma SA | Pharmaceuticals | Europe | 18000000.0 | 4.75 |
| 9 | Nordwind Logistics GmbH | Logistics | Europe | 18000000.0 | 4.75 |



🔍 Column Data Types:
  CompanyName: object
  Industry: object
  RegionName: object
  TotalPrincipalAmount: object
  AverageInterestRate: object
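
Every column above reports `object`, including the two numeric ones, because the rows carry `decimal.Decimal` values (visible in the Step 8 output). The `select_dtypes(include=['int64', 'float64'])` filter therefore finds nothing and the statistics block is skipped. One option is to convert the values before building the DataFrame. This is a sketch: `decimals_to_float` is a hypothetical helper, and converting to `float` gives up exact decimal precision.

```python
from decimal import Decimal
from typing import Any, Dict, List


def decimals_to_float(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy the rows, converting Decimal values to float and leaving the rest."""
    return [
        {
            key: float(value) if isinstance(value, Decimal) else value
            for key, value in row.items()
        }
        for row in rows
    ]


# One row shaped like the Step 8 output
sample = [
    {
        "CompanyName": "Blue Ridge Energy Corp",
        "TotalPrincipalAmount": Decimal("25000000.00"),
        "AverageInterestRate": Decimal("6.250000"),
    }
]
converted = decimals_to_float(sample)
print(type(converted[0]["TotalPrincipalAmount"]).__name__)
```

Passing `decimals_to_float(results)` to `pd.DataFrame` should yield `float64` columns, which the existing numeric filter then picks up.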


## Step 10: Run All 10 Complex Queries

Now let's run all 10 complex queries from our documentation in sequence to demonstrate the full capabilities of the NL2SQL pipeline.

This will process each question through:
1. Intent extraction
2. SQL generation
3. Query execution
4. Results display

In [16]:
# Define all 10 complex queries
complex_queries = [
    {
        "id": 1,
        "name": "Multi-Table Joins with Aggregations",
        "question": "Show the top 10 companies by total principal amount, including their industry, region, and average interest rate across all their loans."
    },
    {
        "id": 2,
        "name": "Weighted Average Calculations",
        "question": "What is the weighted average interest rate by region and currency, weighted by principal amount?"
    },
    {
        "id": 3,
        "name": "Collateral Coverage Ratio Analysis",
        "question": "Calculate the collateral coverage ratio for each loan (total collateral value divided by principal amount) and show the top 20 loans with coverage below 120%."
    },
    {
        "id": 4,
        "name": "Time-Series Analysis",
        "question": "Show the month-over-month change in ending principal from the payment schedule for the top 20 loans with the largest principal decrease."
    },
    {
        "id": 5,
        "name": "Covenant Compliance Rate",
        "question": "What is the covenant compliance rate by industry and quarter? Show the percentage of covenant tests that passed versus total tests."
    },
    {
        "id": 6,
        "name": "Delinquency Buckets Analysis",
        "question": "Group payment events into delinquency buckets (0-29, 30-59, 60-89, 90+ days) by region and show the percentage distribution for the last 6 months."
    },
    {
        "id": 7,
        "name": "Payment Timing Performance",
        "question": "What is the average number of days between due date and paid date for payments, grouped by company and quarter?"
    },
    {
        "id": 8,
        "name": "Regional Portfolio Distribution",
        "question": "For each region, show the top 3 companies by outstanding balance using the latest ending principal from payment schedules, including each company's percentage share of the regional total."
    },
    {
        "id": 9,
        "name": "Rate Type Mix by Region",
        "question": "Show the distribution of fixed versus variable rate loans by region, including counts, total principal, and percentages."
    },
    {
        "id": 10,
        "name": "Multi-Dimensional Loan Maturity",
        "question": "What is the average loan maturity period in days by industry and loan purpose, for loans originated in the last 2 years?"
    }
]

print("=" * 80)
print("📋 10 COMPLEX QUERIES LOADED")
print("=" * 80)
for q in complex_queries:
    print(f"\n{q['id']}. {q['name']}")
    print(f"   Question: {q['question']}")
print("\n" + "=" * 80)

📋 10 COMPLEX QUERIES LOADED

1. Multi-Table Joins with Aggregations
   Question: Show the top 10 companies by total principal amount, including their industry, region, and average interest rate across all their loans.

2. Weighted Average Calculations
   Question: What is the weighted average interest rate by region and currency, weighted by principal amount?

3. Collateral Coverage Ratio Analysis
   Question: Calculate the collateral coverage ratio for each loan (total collateral value divided by principal amount) and show the top 20 loans with coverage below 120%.

4. Time-Series Analysis
   Question: Show the month-over-month change in ending principal from the payment schedule for the top 20 loans with the largest principal decrease.

5. Covenant Compliance Rate
   Question: What is the covenant compliance rate by industry and quarter? Show the percentage of covenant tests that passed versus total tests.

6. Delinquency Buckets Analysis
   Question: Group payment events into delinq

In [17]:
# Function to process a single query through the pipeline
def process_query(query_info, intent_agent, sql_agent, project_client):
    """Process a single query through intent extraction, SQL generation, and execution."""
    
    query_id = query_info['id']
    query_name = query_info['name']
    question = query_info['question']
    
    print(f"\n{'='*80}")
    print(f"QUERY {query_id}: {query_name}")
    print(f"{'='*80}")
    print(f"Question: {question}\n")
    
    results_dict = {
        'id': query_id,
        'name': query_name,
        'question': question,
        'intent': None,
        'sql': None,
        'execution_time': None,
        'row_count': 0,
        'results': None,
        'error': None
    }
    
    try:
        # Step 1: Intent Extraction
        print("⏳ Step 1: Extracting intent...")
        intent_thread = project_client.agents.threads.create()
        project_client.agents.messages.create(
            thread_id=intent_thread.id,
            role="user",
            content=question
        )
        intent_run = project_client.agents.runs.create_and_process(
            thread_id=intent_thread.id,
            agent_id=intent_agent.id
        )
        intent_messages = project_client.agents.messages.list(thread_id=intent_thread.id)
        for message in intent_messages:
            if message.role == "assistant":
                intent_text = message.text_messages[0].text.value
                results_dict['intent'] = intent_text
                print(f"✓ Intent extracted")
                break
        
        # Step 2: SQL Generation
        print("⏳ Step 2: Generating SQL...")
        sql_thread = project_client.agents.threads.create()
        project_client.agents.messages.create(
            thread_id=sql_thread.id,
            role="user",
            content=f"Intent: {intent_text}\n\nGenerate the SQL query:"
        )
        sql_run = project_client.agents.runs.create_and_process(
            thread_id=sql_thread.id,
            agent_id=sql_agent.id
        )
        sql_messages = project_client.agents.messages.list(thread_id=sql_thread.id)
        for message in sql_messages:
            if message.role == "assistant":
                generated_sql = message.text_messages[0].text.value
                results_dict['sql'] = generated_sql
                print(f"✓ SQL generated")
                break
        
        # Clean SQL
        sql_query = generated_sql.strip()
        if sql_query.startswith("```"):
            lines = sql_query.split('\n')
            sql_query = '\n'.join(lines[1:-1]) if len(lines) > 2 else sql_query
        
        print(f"\nGenerated SQL:")
        print("-" * 80)
        print(sql_query)
        print("-" * 80)
        
        # Step 3: Execute Query
        print("\n⏳ Step 3: Executing query...")
        start_time = time.time()
        query_results = execute_sql_query(sql_query)
        execution_time = time.time() - start_time
        
        results_dict['execution_time'] = execution_time
        results_dict['row_count'] = len(query_results)
        results_dict['results'] = query_results
        
        print(f"✓ Query executed successfully")
        print(f"  Rows returned: {len(query_results)}")
        print(f"  Execution time: {execution_time:.3f}s")
        
        # Display preview of results
        if query_results and len(query_results) > 0:
            print(f"\n📊 Results Preview (first 3 rows):")
            print("-" * 80)
            for i, row in enumerate(query_results[:3], 1):
                print(f"Row {i}: {row}")
            if len(query_results) > 3:
                print(f"... and {len(query_results) - 3} more rows")
        
    except Exception as e:
        results_dict['error'] = str(e)
        print(f"❌ Error: {str(e)}")
        import traceback
        traceback.print_exc()
    
    print(f"\n{'='*80}\n")
    return results_dict

print("✓ Query processing function defined")

✓ Query processing function defined


In [18]:
# Process all 10 queries
all_results = []

print("🚀 STARTING BATCH PROCESSING OF 10 COMPLEX QUERIES")
print("=" * 80)
print(f"Start time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 80)

batch_start_time = time.time()

for query_info in complex_queries:
    result = process_query(query_info, intent_agent, sql_agent, project_client)
    all_results.append(result)
    
    # Small delay to avoid overwhelming the API
    time.sleep(1)

batch_end_time = time.time()
total_time = batch_end_time - batch_start_time

print("\n" + "=" * 80)
print("🎉 BATCH PROCESSING COMPLETE")
print("=" * 80)
print(f"End time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Total processing time: {total_time:.2f}s")
print(f"Average time per query: {total_time/len(complex_queries):.2f}s")
print("=" * 80)

🚀 STARTING BATCH PROCESSING OF 10 COMPLEX QUERIES
Start time: 2025-10-08 11:26:11

QUERY 1: Multi-Table Joins with Aggregations
Question: Show the top 10 companies by total principal amount, including their industry, region, and average interest rate across all their loans.

⏳ Step 1: Extracting intent...
✓ Intent extracted
⏳ Step 2: Generating SQL...
✓ SQL generated

Generated SQL:
--------------------------------------------------------------------------------
SELECT
    CompanyName AS company,
    Industry AS industry,
    RegionName AS region,
    SUM(PrincipalAmount) AS total_principal_amount,
    AVG(InterestRatePct) AS average_interest_rate
FROM dbo.vw_LoanPortfolio
GROUP BY CompanyName, Industry, RegionName
ORDER BY total_principal_amount DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
--------------------------------------------------------------------------------

⏳ Step 3: Executing query...

Traceback (most recent call last):
  File "/var/folders/dj/qp0fwj152ks28q9cn0_rd3fw0000gn/T/ipykernel_35373/2625500214.py", line 81, in process_query
    query_results = execute_sql_query(sql_query)
  File "/Users/arturoquiroga/GITHUB/AQ-NEW-NL2SQL/nl2sql_standalone_AzureAI/sql_executor.py", line 29, in execute_sql_query
    cursor.execute(sql_query)
    ~~~~~~~~~~~~~~^^^^^^^^^^^
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Column 'dbo.PaymentSchedule.DueDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. (8120) (SQLExecDirectW)")



QUERY 5: Covenant Compliance Rate
Question: What is the covenant compliance rate by industry and quarter? Show the percentage of covenant tests that passed versus total tests.

⏳ Step 1: Extracting intent...
✓ Intent extracted
⏳ Step 2: Generating SQL...
✓ SQL generated

Generated SQL:
--------------------------------------------------------------------------------
SELECT
    cpy.Industry AS Industry,
    DATEPART(YEAR, ctr.TestDate) AS Year,
    DATEPART(QUARTER, ctr.TestDate) AS Quarter,
    COUNT(*) AS TotalTests,
    SUM(CASE WHEN ctr.Status = 'Passed' THEN 1 ELSE 0 END) AS TestsPassed,
    CAST(SUM(CASE WHEN ctr.Status = 'Passed' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS DECIMAL(5,2)) AS CovenantComplianceRatePct
FROM
    dbo.CovenantTestResult ctr
    INNER JOIN dbo.Loan ln ON ctr.LoanId = ln.LoanId
    INNER JOIN dbo.Company cpy ON ln.CompanyId = cpy.CompanyId
GROUP BY
    cpy.Industry,
    DATEPART(YEAR, ctr.TestDate),
    DATEPART

Traceback (most recent call last):
  File "/var/folders/dj/qp0fwj152ks28q9cn0_rd3fw0000gn/T/ipykernel_35373/2625500214.py", line 81, in process_query
    query_results = execute_sql_query(sql_query)
  File "/Users/arturoquiroga/GITHUB/AQ-NEW-NL2SQL/nl2sql_standalone_AzureAI/sql_executor.py", line 29, in execute_sql_query
    cursor.execute(sql_query)
    ~~~~~~~~~~~~~~^^^^^^^^^^^
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]The multi-part identifier "pe.Amount" could not be bound. (4104) (SQLExecDirectW)')



QUERY 9: Rate Type Mix by Region
Question: Show the distribution of fixed versus variable rate loans by region, including counts, total principal, and percentages.

⏳ Step 1: Extracting intent...
✓ Intent extracted
⏳ Step 2: Generating SQL...
✓ SQL generated

Generated SQL:
--------------------------------------------------------------------------------
SELECT
    InterestRateType AS rate_type,
    RegionName AS region,
    COUNT(*) AS loan_count,
    SUM(PrincipalAmount) AS total_principal,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS percentage_of_loans
FROM
    dbo.vw_LoanPortfolio
GROUP BY
    InterestRateType,
    RegionName
--------------------------------------------------------------------------------

⏳ Step 3: Executing query...

## Step 11: Summary Report

Generate a summary report of all 10 queries showing success rates, timing, and row counts.

In [19]:
# Create summary DataFrame
summary_data = []

for result in all_results:
    summary_data.append({
        'ID': result['id'],
        'Query Name': result['name'],
        'Status': '✓ Success' if result['error'] is None else '❌ Failed',
        'Rows': result['row_count'],
        'Time (s)': f"{result['execution_time']:.3f}" if result['execution_time'] is not None else 'N/A',
        'Error': result['error'] if result['error'] else ''
    })

summary_df = pd.DataFrame(summary_data)

print("📊 EXECUTION SUMMARY - ALL 10 COMPLEX QUERIES")
print("=" * 100)
display(summary_df)

# Calculate statistics
successful = len([r for r in all_results if r['error'] is None])
failed = len([r for r in all_results if r['error'] is not None])
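total_rows = sum(r['row_count'] for r in all_results)
# Average only over queries that recorded an execution time (a time of 0.0 still counts)
timed = [r['execution_time'] for r in all_results if r['execution_time'] is not None]
avg_time = sum(timed) / len(timed) if timed else 0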

print("\n" + "=" * 100)
print("📈 STATISTICS")
print("=" * 100)
print(f"✓ Successful queries: {successful}/{len(all_results)} ({successful/len(all_results)*100:.1f}%)")
print(f"❌ Failed queries: {failed}/{len(all_results)}")
print(f"📊 Total rows returned: {total_rows:,}")
print(f"⏱️  Average execution time: {avg_time:.3f}s")
print("=" * 100)

📊 EXECUTION SUMMARY - ALL 10 COMPLEX QUERIES


| | ID | Query Name | Status | Rows | Time (s) | Error |
|---|----|------------|--------|------|----------|-------|
| 0 | 1 | Multi-Table Joins with Aggregations | ✓ Success | 10 | 1.11 | |
| 1 | 2 | Weighted Average Calculations | ✓ Success | 11 | 1.053 | |
| 2 | 3 | Collateral Coverage Ratio Analysis | ✓ Success | 9 | 1.046 | |
| 3 | 4 | Time-Series Analysis | ❌ Failed | 0 | | ('42000', "[42000] [Microsoft][ODBC Driver 18 ... |
| 4 | 5 | Covenant Compliance Rate | ✓ Success | 127 | 1.106 | |
| 5 | 6 | Delinquency Buckets Analysis | ✓ Success | 3 | 1.081 | |
| 6 | 7 | Payment Timing Performance | ✓ Success | 0 | 1.086 | |
| 7 | 8 | Regional Portfolio Distribution | ❌ Failed | 0 | | ('42000', '[42000] [Microsoft][ODBC Driver 18 ... |
| 8 | 9 | Rate Type Mix by Region | ✓ Success | 5 | 1.072 | |
| 9 | 10 | Multi-Dimensional Loan Maturity | ✓ Success | 5 | 1.054 | |



📈 STATISTICS
✓ Successful queries: 8/10 (80.0%)
❌ Failed queries: 2/10
📊 Total rows returned: 170
⏱️  Average execution time: 1.076s
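
Both failed queries report SQLSTATE `42000`, which ODBC defines as "syntax error or access violation", so the generated SQL itself is the likely place to look. When more queries are run, grouping failures by SQLSTATE can make such patterns easier to spot. The following is a minimal sketch, assuming each `error` value is the string form of a pyodbc-style error tuple that begins with the five-character SQLSTATE. The helper name `count_sqlstates` and the sample list are hypothetical; the error strings are the truncated ones shown in the table.

```python
import re
from collections import Counter

# SQLSTATE codes are five characters drawn from 0-9 and A-Z.
# Assumption: the error text starts with "('<SQLSTATE>'", as in the table above.
SQLSTATE_PATTERN = re.compile(r"\('([0-9A-Z]{5})'")

def count_sqlstates(results):
    """Count failed results by the SQLSTATE found at the start of their error text."""
    counts = Counter()
    for r in results:
        error = r.get("error")
        if not error:
            continue  # successful query, nothing to classify
        match = SQLSTATE_PATTERN.match(str(error))
        counts[match.group(1) if match else "unknown"] += 1
    return counts

# Hypothetical sample shaped like the entries in all_results
sample_results = [
    {"id": 1, "error": None},
    {"id": 4, "error": "('42000', \"[42000] [Microsoft][ODBC Driver 18 ..."},
    {"id": 8, "error": "('42000', '[42000] [Microsoft][ODBC Driver 18 ..."},
]

for state, n in count_sqlstates(sample_results).items():
    print(f"SQLSTATE {state}: {n} failed queries")
```

In the notebook, `count_sqlstates(all_results)` could be called directly after the statistics cell, provided the stored errors follow the assumed format.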


## Step 12: Detailed Results View

View detailed results for any specific query by its ID.

In [28]:
# Select which query to view in detail (change this number 1-10)
query_to_view = 9

# Find the selected query result
selected_result = next((r for r in all_results if r['id'] == query_to_view), None)

if selected_result:
    print("=" * 100)
    print(f"DETAILED VIEW - QUERY {selected_result['id']}: {selected_result['name']}")
    print("=" * 100)
    
    print(f"\n❓ Question:")
    print(f"   {selected_result['question']}")
    
    if selected_result['intent']:
        print(f"\n🎯 Extracted Intent:")
        print("-" * 100)
        print(selected_result['intent'])
    
    if selected_result['sql']:
        print(f"\n💻 Generated SQL:")
        print("-" * 100)
        sql_display = selected_result['sql'].strip()
        if sql_display.startswith("```"):
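            # Drop the opening fence line; drop the last line only if it is a closing fence
            lines = sql_display.split('\n')
            body = lines[1:-1] if lines[-1].strip() == "```" else lines[1:]
            sql_display = '\n'.join(body).strip() or sql_display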
        print(sql_display)
    
    print(f"\n📊 Execution Results:")
    print("-" * 100)
    print(f"   Status: {'✓ Success' if selected_result['error'] is None else '❌ Failed'}")
    print(f"   Rows returned: {selected_result['row_count']}")
    print(f"   Execution time: {selected_result['execution_time']:.3f}s" if selected_result['execution_time'] else "   Execution time: N/A")
    
    if selected_result['error']:
        print(f"\n❌ Error:")
        print(f"   {selected_result['error']}")
    
    if selected_result['results'] and len(selected_result['results']) > 0:
        print(f"\n📋 Data Preview:")
        print("-" * 100)
        
        # Create DataFrame for better display
        result_df = pd.DataFrame(selected_result['results'])
        
        # Display as table
        pd.set_option('display.max_columns', None)
        pd.set_option('display.width', None)
        pd.set_option('display.max_colwidth', 50)
        
        display(result_df.head(10))
        
        if len(result_df) > 10:
            print(f"\n... showing 10 of {len(result_df)} total rows")
        
        # Show statistics for numeric columns
        numeric_cols = result_df.select_dtypes(include=['int64', 'float64']).columns
        if len(numeric_cols) > 0:
            print(f"\n📈 Quick Statistics:")
            print("-" * 100)
            display(result_df[numeric_cols].describe())
    
    print("\n" + "=" * 100)
else:
    print(f"❌ Query {query_to_view} not found. Please select a number between 1 and 10.")

print(f"\n💡 Tip: Change 'query_to_view' value (1-10) to view different query results")

DETAILED VIEW - QUERY 9: Rate Type Mix by Region

❓ Question:
   Show the distribution of fixed versus variable rate loans by region, including counts, total principal, and percentages.

🎯 Extracted Intent:
----------------------------------------------------------------------------------------------------
{
  "intent": "aggregate",
  "entity": ["loans"],
  "metrics": ["count", "total_principal", "percentage"],
  "filters": [],
  "group_by": ["rate_type", "region"]
}

💻 Generated SQL:
----------------------------------------------------------------------------------------------------
SELECT
    InterestRateType AS rate_type,
    RegionName AS region,
    COUNT(*) AS loan_count,
    SUM(PrincipalAmount) AS total_principal,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER () AS percentage_of_loans
FROM
    dbo.vw_LoanPortfolio
GROUP BY
    InterestRateType,
    RegionName

📊 Execution Results:
----------------------------------------------------------------------------------------------------
 

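| | rate_type | region | loan_count | total_principal | percentage_of_loans |
|---|-----------|--------|------------|-----------------|---------------------|
| 0 | Fixed | Africa | 4 | 48000000.0 | 25.0 |
| 1 | Fixed | Americas | 1 | 14000000.0 | 6.25 |
| 2 | Floating | Americas | 3 | 60000000.0 | 18.75 |
| 3 | Floating | Asia | 4 | 88000000.0 | 25.0 |
| 4 | Floating | Europe | 4 | 72000000.0 | 25.0 |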



📈 Quick Statistics:
----------------------------------------------------------------------------------------------------


| | loan_count |
|---|------------|
| count | 5.0 |
| mean | 3.2 |
| std | 1.30384 |
| min | 1.0 |
| 25% | 3.0 |
| 50% | 4.0 |
| 75% | 4.0 |
| max | 4.0 |
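
Quick Statistics covers only `loan_count`, although `total_principal` and `percentage_of_loans` are numeric as well. One plausible cause, which the notebook does not confirm, is that the database driver returned those columns as `decimal.Decimal`; pandas stores such values with `object` dtype, which `select_dtypes(include=['int64', 'float64'])` skips. The sketch below shows one way to convert them before building the DataFrame. It assumes `selected_result['results']` is a list of row dictionaries; `coerce_decimals` is a hypothetical helper, and the sample row reuses values from the result table.

```python
from decimal import Decimal

def coerce_decimals(rows):
    """Return copies of the row dicts with Decimal values converted to float."""
    return [
        {key: float(value) if isinstance(value, Decimal) else value
         for key, value in row.items()}
        for row in rows
    ]

# Hypothetical row shaped like one entry of selected_result['results']
rows = [{
    "rate_type": "Fixed",
    "region": "Africa",
    "loan_count": 4,
    "total_principal": Decimal("48000000.00"),
    "percentage_of_loans": Decimal("25.000000000000"),
}]

clean_rows = coerce_decimals(rows)
print(clean_rows[0])
```

Passing `clean_rows` to `pd.DataFrame` would then yield `float64` columns that the statistics step picks up. Converting to float gives up exact decimal precision, which is usually acceptable for a preview but not for financial reconciliation.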




💡 Tip: Change 'query_to_view' value (1-10) to view different query results
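
The generated SQL for query 9 divides each group's `COUNT(*)` by `SUM(COUNT(*)) OVER ()`, so `percentage_of_loans` is each group's share of all loans in the portfolio, not its share within a region. The sketch below reproduces that arithmetic in plain Python from the counts in the result table and, for comparison, computes a within-region share, which in T-SQL would correspond to adding `PARTITION BY RegionName` to the window. The function names are hypothetical and only illustrate the calculation.

```python
from collections import defaultdict

# (rate_type, region, loan_count) taken from the query 9 result table
groups = [
    ("Fixed", "Africa", 4),
    ("Fixed", "Americas", 1),
    ("Floating", "Americas", 3),
    ("Floating", "Asia", 4),
    ("Floating", "Europe", 4),
]

def share_of_total(groups):
    """Percentage of all loans per group, like COUNT(*) * 100.0 / SUM(COUNT(*)) OVER ()."""
    total = sum(count for _, _, count in groups)
    return {(rate, region): count * 100.0 / total for rate, region, count in groups}

def share_within_region(groups):
    """Percentage of each region's loans per group, like a window partitioned by region."""
    region_totals = defaultdict(int)
    for _, region, count in groups:
        region_totals[region] += count
    return {
        (rate, region): count * 100.0 / region_totals[region]
        for rate, region, count in groups
    }

overall = share_of_total(groups)
by_region = share_within_region(groups)

for (rate, region), pct in overall.items():
    print(f"{rate:<9}{region:<9}overall={pct:.2f}%  within region={by_region[(rate, region)]:.2f}%")
```

Which of the two readings the original question intends ("percentages" by region) is ambiguous, so it may be worth stating the denominator explicitly in the natural language prompt.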
