# Databricks Genie API Integration Guide

This notebook documents how to connect to and use the Databricks Genie Conversation API, following the [Microsoft documentation](https://learn.microsoft.com/en-us/azure/databricks/genie/conversation-api).

## Key Learnings
- ✅ **Authentication**: Send requests through the Databricks SDK's `client.api_client.do()` method, which applies the profile's credentials
- ✅ **Profile**: Use the `DEFAULT_azure` profile (not `DEFAULT`)
- ✅ **Environment**: Use the `azure_databricks` conda environment
- ✅ **API Flow**: Start conversation → Poll status → Get results


## Prerequisites

1. **Access to Azure Databricks workspace** with Databricks SQL entitlement
2. **CAN USE privileges** on a SQL pro or serverless SQL warehouse
3. **Well-curated Genie space** that can answer user questions accurately
4. **Proper authentication** configured in `~/.databrickscfg`


## Configuration

### Workspace Details
- **Workspace URL**: `https://adb-984752964297111.11.azuredatabricks.net`
- **Genie Space ID**: `01f06a3068a81406a386e8eaefc74545`
- **Genie Space URL**: `https://adb-984752964297111.11.azuredatabricks.net/genie/rooms/01f06a3068a81406a386e8eaefc74545`

### Authentication Profile
- **Profile**: `DEFAULT_azure` (not `DEFAULT`)
- **Auth Type**: Personal Access Token (PAT)
- **Environment**: `azure_databricks` conda environment
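
Before creating the client, it can help to confirm that the profile actually exists in `~/.databrickscfg`. The sketch below uses only the standard library. The helper name `profile_fields` is hypothetical, and it returns key names only so that token values are never printed.

```python
import configparser
from pathlib import Path


def profile_fields(profile, cfg_path=None):
    """Return the sorted key names defined for `profile`, or None if it is absent."""
    cfg_path = Path(cfg_path) if cfg_path else Path.home() / ".databrickscfg"
    # configparser normally treats [DEFAULT] as a fallback whose keys are
    # inherited by every other section. Renaming the fallback section makes
    # [DEFAULT] behave like an ordinary profile here.
    parser = configparser.ConfigParser(default_section="__no_defaults__")
    parser.read(cfg_path)  # a missing file is silently ignored
    if not parser.has_section(profile):
        return None
    return sorted(parser[profile].keys())
```

Calling `profile_fields("DEFAULT_azure")` should list keys such as `host` and `token` for a PAT-based profile; a result of `None` means the profile name is missing or misspelled.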


In [None]:
# Import required libraries
import os
import json
import time
from databricks.sdk import WorkspaceClient

# Set the correct profile
os.environ['DATABRICKS_CONFIG_PROFILE'] = 'DEFAULT_azure'

# Initialize the Databricks client
client = WorkspaceClient()

print("✅ Databricks client initialized successfully")
print(f"Host: {client.config.host}")
print(f"Profile: {client.config.profile}")
print(f"Auth Type: {client.config.auth_type}")


## Step 1: Start a Genie Conversation

The first step is to start a conversation with your Genie space using the `POST /api/2.0/genie/spaces/{space_id}/start-conversation` endpoint.
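
The endpoint paths used in this notebook can be collected in one place so that each cell does not rebuild them by hand. This is a minimal sketch; the function names are hypothetical, and the path shapes are the ones used in the cells of this notebook.

```python
from urllib.parse import quote

API_ROOT = "/api/2.0/genie/spaces"


def _seg(value):
    """Percent-encode one path segment so stray characters cannot alter the path."""
    return quote(str(value), safe="")


def start_conversation_path(space_id):
    return f"{API_ROOT}/{_seg(space_id)}/start-conversation"


def message_path(space_id, conversation_id, message_id):
    return (
        f"{API_ROOT}/{_seg(space_id)}/conversations/{_seg(conversation_id)}"
        f"/messages/{_seg(message_id)}"
    )


def query_result_path(space_id, conversation_id, message_id, attachment_id):
    base = message_path(space_id, conversation_id, message_id)
    return f"{base}/query-result/{_seg(attachment_id)}"
```

With these helpers, the start request becomes `client.api_client.do('POST', start_conversation_path(space_id), body={"content": question})`.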


In [None]:
# Configuration
space_id = "01f06a3068a81406a386e8eaefc74545"
test_question = "What is the distribution of total charges for claims?"

print("🔮 Starting Genie conversation...")
print(f"Question: {test_question}")

# Start conversation using SDK's built-in request method
start_url = f"/api/2.0/genie/spaces/{space_id}/start-conversation"
payload = {"content": test_question}

try:
    response = client.api_client.do('POST', start_url, body=payload)
    print("✅ Conversation started successfully!")
    print(f"Response: {json.dumps(response, indent=2)}")
    
    # Extract conversation and message IDs
    conversation_id = response.get('conversation_id')
    message_id = response.get('message_id')
    
    print(f"\n📋 Conversation ID: {conversation_id}")
    print(f"📋 Message ID: {message_id}")
    
except Exception as e:
    print(f"❌ Failed to start conversation: {e}")
    conversation_id = None
    message_id = None


## Step 2: Poll for Message Completion

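After starting a conversation, poll the message until it reaches a terminal status. The message typically moves through states such as the following (other intermediate statuses, for example `ASKING_AI` or `EXECUTING_QUERY`, may also appear):
- `SUBMITTED` → `FILTERING_CONTEXT` → `COMPLETED`
- Or `FAILED` / `CANCELLED` if the request fails or is cancelled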
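
As a variation on fixed-interval polling, the sketch below waits with exponential backoff and an overall deadline. It is an illustration under stated assumptions, not the API's prescribed approach: `fetch_message` is a hypothetical zero-argument callable that returns the message dict, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time

# Statuses treated as terminal in this notebook.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def poll_until_terminal(fetch_message, timeout_s=120.0, initial_delay_s=1.0,
                        max_delay_s=10.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_message() until the status is terminal or the deadline passes.

    Returns the last message dict, or raises TimeoutError at the deadline.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        message = fetch_message()
        if message.get("status") in TERMINAL_STATES:
            return message
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                f"Genie message still {message.get('status')!r} after {timeout_s}s"
            )
        # Never sleep past the deadline; double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

In this notebook, `fetch_message` could be `lambda: client.api_client.do('GET', get_message_url)`. Short early delays return quick answers sooner, while the cap keeps long-running queries from being polled too often.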


In [None]:
def poll_message_status(space_id, conversation_id, message_id, max_attempts=12, wait_time=10):
    """Poll for message completion status"""
    
    get_message_url = f"/api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}"
    
    print("🔄 Polling message status...")
    
    for attempt in range(max_attempts):
        try:
            print(f"\nPolling attempt {attempt + 1}...")
            response = client.api_client.do('GET', get_message_url)
            status = response.get('status', 'UNKNOWN')
            print(f"Status: {status}")
            
            if status == 'COMPLETED':
                print("✅ Message completed successfully!")
                return response
            elif status in ['FAILED', 'CANCELLED']:
                print(f"❌ Message {status.lower()}")
                return response
            else:
                print(f"⏳ Status: {status}, waiting...")
                time.sleep(wait_time)
                
        except Exception as e:
            print(f"❌ Polling failed: {e}")
            return None
    
    print("⏰ Timeout: Message did not complete within expected time")
    return None

# Poll for completion if we have the IDs
if conversation_id and message_id:
    message_response = poll_message_status(space_id, conversation_id, message_id)
    
    if message_response:
        print(f"\n📊 Full message response: {json.dumps(message_response, indent=2)}")
        
        # Check for attachments (query results)
        attachments = message_response.get('attachments', [])
        if attachments:
            print(f"\n📎 Found {len(attachments)} attachments:")
            for i, attachment in enumerate(attachments):
                print(f"Attachment {i+1}: {json.dumps(attachment, indent=2)}")
        else:
            print("📎 No attachments found")
else:
    print("❌ Cannot poll - missing conversation or message ID")


## Step 3: Retrieve Query Results

Once the message is completed, you can retrieve the actual query results using the `attachment_id` from the response.
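
A message can carry more than one attachment, and the first one is not necessarily the query. The sketch below picks the first attachment that holds a query and turns the result rows into dictionaries. Two assumptions are not confirmed by this notebook: that query attachments carry a `query` field while text-only attachments do not, and that column names are available under `statement_response.manifest.schema.columns`. Both helper names are hypothetical.

```python
def first_query_attachment_id(message):
    """Return the attachment_id of the first attachment carrying a query, else None."""
    for attachment in message.get("attachments") or []:
        # Assumption: SQL attachments expose a non-empty "query" field.
        if attachment.get("query") and attachment.get("attachment_id"):
            return attachment["attachment_id"]
    return None


def rows_as_dicts(query_result):
    """Pair each row of data_array with column names from the manifest."""
    statement = query_result.get("statement_response") or {}
    # Assumption: column metadata lives at manifest.schema.columns[].name.
    schema = (statement.get("manifest") or {}).get("schema") or {}
    columns = [col.get("name") for col in schema.get("columns") or []]
    rows = (statement.get("result") or {}).get("data_array") or []
    if not columns and rows:
        # Fall back to positional names when no column metadata is present.
        columns = [f"col_{i}" for i in range(len(rows[0]))]
    return [dict(zip(columns, row)) for row in rows]
```

For example, `first_query_attachment_id(message_response)` can replace the `attachments[0]` lookup, and `rows_as_dicts(query_results)` gives rows that are easy to load into a DataFrame.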


In [None]:
def get_query_results(space_id, conversation_id, message_id, attachment_id):
    """Get query results for an attachment"""
    
    query_results_url = f"/api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}/query-result/{attachment_id}"
    
    print("🔍 Getting query results...")
    print(f"URL: {query_results_url}")
    
    try:
        response = client.api_client.do('GET', query_results_url)
        print("✅ Query results retrieved successfully!")
        print(f"📊 Results: {json.dumps(response, indent=2)}")
        
        # Extract and display the actual data
        if 'statement_response' in response and 'result' in response['statement_response']:
            data_array = response['statement_response']['result'].get('data_array', [])
            print(f"\n📈 Data rows: {len(data_array)}")
            for i, row in enumerate(data_array):
                print(f"Row {i+1}: {row}")
        
        return response
        
    except Exception as e:
        print(f"❌ Error getting query results: {e}")
        return None

# Get query results if we have the attachment ID
if 'message_response' in locals() and message_response:
    attachments = message_response.get('attachments', [])
    if attachments:
        attachment_id = attachments[0].get('attachment_id')
        if attachment_id:
            query_results = get_query_results(space_id, conversation_id, message_id, attachment_id)
        else:
            print("❌ No attachment_id found in attachments")
    else:
        print("❌ No attachments found in message response")
else:
    print("❌ Cannot get query results - missing message response")


## Complete Genie API Integration Function

Here's a complete function that handles the entire Genie API workflow:


In [None]:
def query_genie_api(question, space_id, max_wait_time=120):
    """Complete Genie API workflow"""
    
    print(f"🔮 Querying Genie: {question}")
    
    try:
        # Step 1: Start conversation
        start_url = f"/api/2.0/genie/spaces/{space_id}/start-conversation"
        payload = {"content": question}
        
        response = client.api_client.do('POST', start_url, body=payload)
        conversation_id = response.get('conversation_id')
        message_id = response.get('message_id')
        
        print(f"✅ Conversation started: {conversation_id}")
        
        # Step 2: Poll for completion
        get_message_url = f"/api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}"
        
        max_attempts = max_wait_time // 10  # Poll every 10 seconds
        for attempt in range(max_attempts):
            response = client.api_client.do('GET', get_message_url)
            status = response.get('status', 'UNKNOWN')
            
            if status == 'COMPLETED':
                print("✅ Message completed!")
                break
            elif status in ['FAILED', 'CANCELLED']:
                print(f"❌ Message {status.lower()}")
                return None
            else:
                print(f"⏳ Status: {status}, waiting...")
                time.sleep(10)
        else:
            print("⏰ Timeout: Message did not complete")
            return None
        
        # Step 3: Get query results
        attachments = response.get('attachments', [])
        if attachments:
            attachment_id = attachments[0].get('attachment_id')
            if attachment_id:
                query_results_url = f"/api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}/query-result/{attachment_id}"
                query_response = client.api_client.do('GET', query_results_url)
                
                print("✅ Query results retrieved!")
                return {
                    'question': question,
                    'conversation_id': conversation_id,
                    'message_id': message_id,
                    'status': status,
                    'attachments': attachments,
                    'query_results': query_response
                }
        
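        # No attachment to fetch: return the same dict shape with no query results,
        # so callers do not have to handle two different return formats.
        return {
            'question': question,
            'conversation_id': conversation_id,
            'message_id': message_id,
            'status': status,
            'attachments': attachments,
            'query_results': None
        }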
        
    except Exception as e:
        print(f"❌ Genie API error: {e}")
        return None

# Test the complete function
test_questions = [
    "What is the distribution of total charges for claims?",
    "Show me the top 5 claims by amount",
    "What is the average claim amount by provider?"
]

for question in test_questions:
    print(f"\n{'='*60}")
    result = query_genie_api(question, space_id)
    if result:
        print(f"✅ Successfully processed: {question}")
    else:
        print(f"❌ Failed to process: {question}")
    print(f"{'='*60}")


## Key Learnings Summary

### ✅ What Works
- **Authentication**: Send requests through the Databricks SDK's `client.api_client.do()` method
- **Profile**: Use the `DEFAULT_azure` profile (not `DEFAULT`)
- **Environment**: Use the `azure_databricks` conda environment
- **API Flow**: Start conversation → Poll status → Get results

### ❌ What Doesn't Work
- Raw HTTP requests with the `requests` library (returned 403 Forbidden in this setup)
- Using the `DEFAULT` profile instead of `DEFAULT_azure`
- Handling tokens manually instead of relying on SDK authentication

### 🔑 Critical Success Factors
1. **Use the SDK's built-in authentication** - it resolves the host and credentials from the configured profile
2. **Correct profile configuration** - in this setup, `DEFAULT_azure` is the profile with the right permissions
3. **Proper polling logic** - wait for the `COMPLETED` status before fetching results
4. **Well-configured Genie space** - the space must be curated well enough to answer the questions asked


## Real Data Example

### Successful Query Results
**Question**: "What is the distribution of total charges for claims?"

**Generated SQL**:
```sql
SELECT FLOOR(try_divide(`total_charge`,50))*50 AS charge_bin_start, COUNT(*) AS claim_count
FROM `my_catalog`.`payer_gold`.`claims_enriched`
GROUP BY charge_bin_start
ORDER BY charge_bin_start ASC
```

**Results**:
- $100-149 range: 1 claim
- $150-199 range: 4 claims  
- $200-249 range: 2 claims
- $300-349 range: 1 claim

This demonstrates that the Genie API successfully:
1. ✅ Interpreted the natural language question
2. ✅ Generated an appropriate SQL query
3. ✅ Executed the query against the data
4. ✅ Returned structured results
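
The binning expression in the generated SQL, `FLOOR(total_charge / 50) * 50`, maps each charge to the lower edge of a 50-wide bin (`try_divide` only differs from plain division when the divisor is zero, which cannot happen with the constant 50). The sketch below reproduces that arithmetic in Python. The sample charges are made up and chosen only so the counts match the results above; they are not the underlying claims data.

```python
import math
from collections import Counter


def bin_start(total_charge, width=50):
    """Lower edge of the bin containing total_charge: FLOOR(x / width) * width."""
    return math.floor(total_charge / width) * width


def histogram(charges, width=50):
    """Return (bin_start, count) pairs ordered by bin_start, like the SQL output."""
    counts = Counter(bin_start(charge, width) for charge in charges)
    return sorted(counts.items())


# Hypothetical charges for illustration only.
sample_charges = [120.0, 155.5, 199.99, 160.0, 175.25, 210.0, 249.0, 300.0]

for start, count in histogram(sample_charges):
    print(f"${start}-{start + 49} range: {count}")
```

Bins with no claims, such as $250-299 here, do not appear at all, which is also why that range is missing from the results listed above.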


## Conclusion

The Databricks Genie API provides natural language querying over your data. The keys to success are:

1. **Proper Authentication**: Use the Databricks SDK with the correct profile
2. **Correct API Flow**: Start conversation → Poll status → Get results  
3. **Error Handling**: Implement proper polling and timeout logic
4. **Well-Curated Space**: Ensure your Genie space is properly configured

This integration enables natural language data exploration and can be embedded in applications, chatbots, and AI agent frameworks.

---

**Last Updated**: September 28, 2025  
**Reference**: [Microsoft Databricks Genie Conversation API](https://learn.microsoft.com/en-us/azure/databricks/genie/conversation-api)
