# RefLex LLM - Complete Azure OpenAI Integration Guide

RefLex LLM is an Azure OpenAI API fallback system that automatically switches between Azure OpenAI, OpenAI, and local AI when endpoints become unavailable. It provides failover while remaining compatible with the Azure OpenAI API. The module is primarily intended for testing and CI runs, where local execution may be slower but is also less expensive. In the future, running a load-balanced RefLex cluster on Kubernetes could turn it into a general failsafe mechanism.

## What is RefLex LLM?

RefLex LLM acts as a middleware layer between your application and several AI providers. When your primary Azure OpenAI endpoint fails because of rate limits, outages, or network issues, RefLex detects the failure and routes your requests to an alternative provider without requiring changes to your code.

## Key Features

- **Automatic Provider Selection**:  
Intelligently chooses between Azure OpenAI, OpenAI, and local Ollama based on availability and your preferences
- **Docker Integration**:  
Automatically manages local AI containers with zero configuration
- **Azure OpenAI Compatibility**:  
Drop-in replacement for the Azure OpenAI Python client with identical API
- **Model Mapping**:  
Automatically maps Azure OpenAI deployment names to equivalent local models
- **Configuration Management**:  
Supports file-based configuration for different Azure environments
- **Health Monitoring**:  
Continuous health checking and automatic recovery
- **Performance Optimization**:  
Caches configurations and maintains persistent connections

## Installation and Setup

RefLex requires Python 3.8+ and, for local AI capabilities, Docker. Installing the package pulls in its dependencies, including the Azure OpenAI client, the Docker SDK, and configuration management tools.

In [None]:
!pip install reflex-llms numpy

## Azure OpenAI Provider Resolution and Basic Usage

RefLex automatically detects which AI providers are available and selects the best option based on your preference order. The system performs intelligent health checks by testing each provider in sequence and uses the first one that responds successfully.

### How Provider Testing Works

The provider resolution process involves the following steps:

1. **Azure OpenAI Testing**: Makes test HTTP requests to your Azure OpenAI endpoint (e.g., `https://your-resource.openai.azure.com/`), checking for valid API responses and proper authentication

2. **OpenAI Fallback**: If Azure is unavailable, tests the standard OpenAI endpoint as a fallback option

3. **RefLex Local**: Automatically starts Docker containers if needed, manages Ollama installation, and verifies local model availability

4. **Caching**: Successful configurations are cached to avoid repeated health checks and improve performance

The system is designed to be resilient and will automatically retry failed providers and handle network timeouts gracefully.
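
The resolution order described above can be illustrated with a minimal sketch. This is not RefLex's internal implementation: `resolve_provider`, the `checks` dictionary, and the module-level cache are hypothetical names chosen for illustration, and the health checks are plain callables standing in for real HTTP probes.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical health checks: each returns True when its provider responds.
HealthCheck = Callable[[], bool]

_cached_provider: Optional[str] = None  # illustrative cache, not RefLex's own


def resolve_provider(
    preference_order: List[str],
    checks: Dict[str, HealthCheck],
    use_cache: bool = True,
) -> Optional[str]:
    """Return the first provider in preference_order whose check passes."""
    global _cached_provider
    if use_cache and _cached_provider is not None:
        return _cached_provider  # skip repeated health checks

    for name in preference_order:
        check = checks.get(name)
        if check is None:
            continue  # no check registered for this provider
        try:
            healthy = check()
        except Exception:
            healthy = False  # a timeout or network error counts as unavailable
        if healthy:
            _cached_provider = name
            return name
    return None


# Example: Azure is down, the local server is up.
checks = {"azure": lambda: False, "reflex": lambda: True}
selected = resolve_provider(["azure", "reflex"], checks)
print(selected)  # reflex
```

Because the result is cached, a later call returns the same provider until the cache is bypassed or cleared, which mirrors why a cache-clearing step is needed to force re-resolution.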

### Azure OpenAI Environment Setup

For Azure OpenAI integration, ensure these environment variables are set:
- `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI resource endpoint
- `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key
- `AZURE_OPENAI_DEPLOY_NAME`: Your deployment name (optional)
- `AZURE_OPENAI_MODEL_NAME`: Your model name (optional)

In [None]:
from reflex_llms import (
    get_openai_client, 
    get_selected_provider,
    get_module_status,
    is_using_reflex
)

# Configure client with Azure preference
client = get_openai_client(
    preference_order=["azure", "reflex"],  # Prefer Azure first
    azure_api_version="2024-02-15-preview",  # Pinned API version; use one your resource supports
    azure_base_url="https://your-resource.openai.azure.com/",  # Your Azure endpoint
    timeout=10.0
)

# Display system status
status = get_module_status()
print(f"Selected provider: {get_selected_provider()}")
print(f"Using local RefLex: {is_using_reflex()}")
print(f"Config cached: {status['has_cached_config']}")
print(f"RefLex server running: {status['reflex_server_running']}")

if get_selected_provider() == "azure":
    print("✅ Successfully connected to Azure OpenAI")
elif get_selected_provider() == "reflex":
    print("⚠️ Using RefLex fallback (Azure OpenAI not available)")
else:
    print(f"ℹ️ Using {get_selected_provider()} provider")

## Azure OpenAI Chat Completions with Automatic Failover

RefLex exposes the same Azure OpenAI API regardless of the underlying provider. Standard parameters such as `temperature` and `max_tokens`, as well as system messages, work the same way with every provider. The client handles provider differences behind the scenes, so your application code remains unchanged.

### Azure Deployment Names and Model Mapping

When using Azure OpenAI, you typically work with deployment names rather than model names. RefLex handles this automatically, mapping your Azure deployment names to appropriate local models when failover occurs. For example, your "gpt-35-turbo" deployment might map to "llama3.2:3b" locally.
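
As a rough illustration of this idea, the sketch below translates a deployment name only when the local provider is active. The function `resolve_model` and its `default` argument are hypothetical and not part of the RefLex API; the mapping entries reuse the example names from this guide.

```python
from typing import Dict

# Illustrative mapping, reusing the example names from this guide.
MODEL_MAPPING: Dict[str, str] = {
    "gpt-35-turbo": "llama3.2:3b",
    "gpt-4": "llama3.1:8b",
    "text-embedding-ada-002": "nomic-embed-text",
}


def resolve_model(name: str, provider: str, default: str = "llama3.2:3b") -> str:
    """Translate a deployment name only when the local provider is active."""
    if provider != "reflex":
        return name  # Azure and OpenAI receive the name unchanged
    # Unknown deployment names fall back to an assumed default local model.
    return MODEL_MAPPING.get(name, default)


print(resolve_model("gpt-35-turbo", "azure"))   # gpt-35-turbo
print(resolve_model("gpt-35-turbo", "reflex"))  # llama3.2:3b
```

Falling back to a default for unknown names is a design assumption here; a stricter variant could raise an error instead so that unmapped deployments are noticed early.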

In [None]:
# Import display utilities
from utils import display_message, display_stream, display_embeddings

# Standard Azure OpenAI chat completion
response = client.chat.completions.create(
    model="gpt-35-turbo",  # Azure deployment name format
    messages=[{"role": "user", "content": "Explain how RefLex LLM works with Azure OpenAI."}],
    max_tokens=150,
    temperature=0.7
)

# Display formatted response
display_message(response, as_markdown=True)

print(f"Model/Deployment used: {response.model}")
print(f"Tokens: {response.usage.total_tokens if response.usage else 'Unknown'}")
print(f"Provider: {get_selected_provider()}")

# Show Azure-specific information if using Azure
if get_selected_provider() == "azure":
    print("🔷 Response served by Azure OpenAI")
    print("API Version: 2024-02-15-preview")

## Azure OpenAI Model Management and Deployment Mapping

Azure OpenAI uses a deployment-based model where you create named deployments of specific models. RefLex intelligently handles the mapping between Azure deployment names and local model alternatives. This ensures seamless failover without requiring changes to your application code.

### Common Azure Deployment Patterns

Azure OpenAI deployments typically follow naming conventions like:
- `gpt-35-turbo` for GPT-3.5 Turbo
- `gpt-4` for GPT-4
- `gpt-4-32k` for GPT-4 with extended context
- `text-embedding-ada-002` for embeddings

RefLex automatically maps these to compatible local models when Azure is unavailable.

In [None]:
# List available models/deployments
models = client.models.list()

azure_deployments = []
chat_models = []
embedding_models = []
gpt4_models = []

for model in models.data:
    model_id = model.id
    if "embedding" in model_id or "ada-002" in model_id:
        embedding_models.append(model_id)
    elif "gpt-4" in model_id:
        gpt4_models.append(model_id)
    elif any(x in model_id for x in ["gpt-35", "gpt-3.5", "turbo"]):
        chat_models.append(model_id)
    
    # Track Azure-style deployments
    if get_selected_provider() == "azure":
        azure_deployments.append(model_id)

print(f"Available models/deployments ({len(models.data)} total):")

if get_selected_provider() == "azure":
    print(f"🔷 Azure OpenAI Deployments: {len(azure_deployments)}")
    print(f"Sample deployments: {sorted(azure_deployments)[:3]}")
else:
    print(f"Chat models: {len(chat_models)}")
    print(f"GPT-4 models: {len(gpt4_models)}")
    print(f"Embedding models: {len(embedding_models)}")
    print(f"Sample chat models: {sorted(chat_models)[:3]}")
    print(f"Sample GPT-4 models: {sorted(gpt4_models)[:3]}")
    print(f"Sample embedding models: {sorted(embedding_models)[:3]}")

## Azure OpenAI Embeddings with Cost Optimization

Azure OpenAI embeddings are particularly valuable for enterprise applications where data sovereignty and regional compliance are important. RefLex provides automatic failover to local embedding models when Azure quotas are reached or during maintenance windows, ensuring continuous operation while maintaining cost efficiency.

### Azure vs Local Embedding Trade-offs

- **Azure Benefits**: Enterprise compliance, regional data residency, SLA guarantees, integration with Azure ecosystem
- **Local Benefits**: No API costs for large volumes, no rate limits, offline capability, no round trips to a remote endpoint for batch operations
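
One practical consequence of embedding failover is worth a small sketch: vectors produced by different models are generally not comparable, and they often differ in length (`text-embedding-ada-002` returns 1536-dimensional vectors, while local models typically use other sizes). The helper below is a hypothetical guard, not part of RefLex, that refuses to compare vectors of different dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors from the same embedding model."""
    if len(a) != len(b):
        raise ValueError(
            f"Dimension mismatch ({len(a)} vs {len(b)}): "
            "the vectors probably come from different models"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cannot compare a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

A length check only catches the obvious case. If stored embeddings must stay comparable, recording which provider and model produced each vector is the safer approach.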

In [None]:
# Create embeddings using Azure OpenAI deployment
embedding_response = client.embeddings.create(
    model="text-embedding-ada-002",  # Azure deployment name
    input="RefLex LLM provides seamless fallback between Azure OpenAI and local AI models for enterprise applications."
)

# Display embedding information
display_embeddings(embedding_response, show_stats=True)

print(f"Embedding provider: {get_selected_provider()}")
if get_selected_provider() == "azure":
    print("🔷 Embeddings generated by Azure OpenAI")
    print("✅ Enterprise-grade compliance and data residency")
else:
    print("⚡ Embeddings generated locally (cost-optimized)")

## Azure OpenAI Streaming with Regional Optimization

Azure OpenAI streaming provides enterprise-grade performance with regional optimization and compliance features. RefLex maintains identical streaming functionality regardless of provider, ensuring consistent user experience whether requests are served by Azure OpenAI or local fallback models.

### Azure Streaming Benefits

- **Regional Performance**: Azure's global infrastructure provides optimized latency
- **Enterprise Features**: Built-in monitoring, logging, and compliance reporting
- **SLA Guarantees**: Enterprise-grade availability and performance commitments
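
For readers who want to consume a stream without a display helper, here is a minimal sketch. It assumes the chunks follow the OpenAI-style shape (`chunk.choices[0].delta.content`); `collect_stream` and `_fake_chunk` are hypothetical names, and the fake chunks exist only so the example runs without any provider.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Concatenate the text deltas of an OpenAI-style chat completion stream."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry metadata only and have no choices
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None)
        if content:  # skip chunks whose delta has no text (content is None or empty)
            parts.append(content)
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in chunk so the sketch runs without any provider."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    SimpleNamespace(choices=[]),  # metadata-only chunk
    _fake_chunk("Hel"),
    _fake_chunk("lo"),
    _fake_chunk(None),  # closing chunk without text
]
print(collect_stream(fake_stream))  # Hello
```

Guarding against empty `choices` and `None` content keeps the same loop usable across providers, since the exact sequence of chunks can differ between them.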

In [None]:
# Create streaming request using Azure deployment
stream = client.chat.completions.create(
    model="gpt-35-turbo",  # Azure deployment name
    messages=[{"role": "user", "content": "Write a brief explanation of Azure OpenAI failover systems and enterprise benefits."}],
    max_tokens=200,
    stream=True,
    temperature=0.7
)

print(f"Streaming from: {get_selected_provider().upper()}")
if get_selected_provider() == "azure":
    print("🔷 Enterprise-grade Azure OpenAI streaming")

# Display streaming response
full_response = display_stream(stream)

## Azure OpenAI Advanced Models and Enterprise Features

Azure OpenAI provides access to the latest models with enterprise-grade features including content filtering, abuse monitoring, and compliance reporting. RefLex seamlessly handles failover to local alternatives when Azure deployments are unavailable or rate-limited.

### Azure OpenAI Enterprise Advantages

- **Content Filtering**: Built-in content safety and compliance filtering
- **Audit Logging**: Comprehensive logging for enterprise compliance
- **Regional Deployment**: Data residency and regional compliance options
- **SLA Support**: Enterprise-grade service level agreements
- **Integration**: Seamless integration with Azure ecosystem and identity management

In [None]:
# Test GPT-4 model with Azure deployment
gpt4_stream = client.chat.completions.create(
    model="gpt-4",  # Azure GPT-4 deployment
    messages=[{"role": "user", "content": "A company migrating to Azure needs to handle 1000 concurrent users with 99.9% uptime. Design a failover strategy using Azure OpenAI and local backup systems."}],
    max_tokens=300,
    stream=True,
    temperature=0.2  # Lower temperature for technical accuracy
)

print(f"GPT-4 Response from: {get_selected_provider().upper()}")
if get_selected_provider() == "azure":
    print("🔷 Enterprise Azure OpenAI GPT-4")
    print("✅ Content filtering and compliance enabled")

# Stream with markdown formatting
gpt4_response = display_stream(gpt4_stream, as_markdown=True)

## Azure OpenAI Resource Management and Monitoring

When using Azure OpenAI as the primary provider, RefLex provides visibility into both Azure resource utilization and local fallback capabilities. This dual-layer monitoring ensures you can optimize costs while maintaining service reliability.

### Azure Resource Monitoring

RefLex integrates with Azure's monitoring capabilities while providing additional insights into failover patterns, local resource usage, and cost optimization opportunities. This comprehensive view helps enterprises balance performance, cost, and compliance requirements.

In [None]:
from reflex_llms import get_reflex_server

# Display provider-specific information
current_provider = get_selected_provider()
print(f"Current Provider: {current_provider.upper()}")
print("=" * 50)

if current_provider == "azure":
    print("🔷 AZURE OPENAI CONFIGURATION")
    print(f"• API Version: 2024-02-15-preview")
    print(f"• Endpoint: [configured from environment]")
    print(f"• Authentication: Azure API Key")
    print(f"• Content Filtering: Enabled")
    print(f"• Enterprise Features: Available")
    print(f"• Regional Compliance: Configured")
    
    # Check if RefLex backup is available
    server = get_reflex_server()
    if server:
        print(f"\n🛡️ BACKUP SYSTEM STATUS")
        print(f"• Local RefLex: Available")
        print(f"• Backup URL: {server.openai_compatible_url}")
        print(f"• Status: {server.is_healthy}")
    else:
        print(f"\n⚠️ No local backup configured")
        print(f"• Consider enabling RefLex for high availability")

else:
    # Access RefLex server instance for non-Azure providers
    server = get_reflex_server()
    
    if server:
        print(f"⚡ LOCAL REFLEX SERVER (Azure Fallback Active)")
        print(f"• API URL: {server.api_url}")
        print(f"• OpenAI Compatible URL: {server.openai_compatible_url}")
        print(f"• Container: {server.container_name}")
        print(f"• Status: {'🟢 Healthy' if server.is_healthy else '🔴 Unhealthy'}")
        
        # Detailed status
        status = server.get_status()
        print(f"• Total models: {status.get('total_models', 0)}")
        print(f"• OpenAI-compatible models: {len(status.get('openai_compatible_models', []))}")
        print(f"• Setup complete: {status.get('setup_complete', False)}")
        
        print(f"\n💡 Azure OpenAI unavailable - using local backup")
    else:
        print(f"Using {current_provider} provider")

## Azure OpenAI Configuration Management

RefLex supports sophisticated configuration management for Azure OpenAI environments, including multi-region deployments, environment-specific settings, and compliance configurations. This enables enterprises to maintain consistent AI capabilities across development, staging, and production environments.

### Azure-Specific Configuration Features

The configuration system provides Azure-specific options including:
- Multiple Azure region configurations for geo-redundancy
- Environment-specific API versions and deployment names
- Content filtering and compliance policy settings
- Cost optimization through intelligent provider selection
- Integration with Azure Key Vault for secure credential management
- Monitoring and alerting configuration for Azure resources

In [None]:
# Azure-optimized configuration example
azure_config = {
    "_comment": "Azure OpenAI Optimized RefLex Configuration",
    "preference_order": ["azure", "reflex"],  # Azure first, local backup
    "azure_api_version": "2024-02-15-preview",  # Pinned preview API version
    "azure_base_url": "https://your-resource.openai.azure.com",  # Your endpoint
    "timeout": 15.0,  # Higher timeout for enterprise reliability
    "reflex_server": {
        "host": "127.0.0.1",
        "port": 11434,
        "auto_setup": True,
        "model_mappings": {
            "minimal_setup": False,  # Full model set for enterprise
            "model_mapping": {
                "gpt-35-turbo": "llama3.2:3b",  # Azure naming convention
                "gpt-4": "llama3.1:8b",
                "gpt-4-32k": "llama3.1:70b",
                "text-embedding-ada-002": "nomic-embed-text"
            }
        }
    }
}

from utils import display_json_inline
print("Azure OpenAI Optimized Configuration:")
display_json_inline(azure_config)

In [None]:
# Environment-specific Azure configuration
import os
environment = os.getenv('ENVIRONMENT', 'development')

# Azure region mapping for different environments
azure_regions = {
    'development': 'eastus',
    'staging': 'westeurope', 
    'production': 'eastus2'
}

# Environment-specific provider preferences
if environment == 'development':
    preference = ["reflex", "azure"]  # Local first for dev
    api_version = "2024-02-15-preview"  # Preview version for testing
elif environment == 'production':
    preference = ["azure", "reflex"]  # Azure first for prod
    api_version = "2024-02-01"  # Stable version for prod
else:
    preference = ["azure", "reflex"]
    api_version = "2024-02-15-preview"

print(f"Environment: {environment}")
print(f"Azure Region: {azure_regions.get(environment, 'eastus')}")
print(f"Provider Preference: {preference}")
print(f"Azure API Version: {api_version}")
print(f"Content Filtering: {'Strict' if environment == 'production' else 'Standard'}")

## Azure OpenAI Enterprise Error Handling and Compliance

RefLex provides enterprise-grade error handling specifically designed for Azure OpenAI deployments, including compliance logging, audit trails, and automated incident response. The system gracefully handles Azure-specific scenarios like quota limits, content filtering blocks, and regional failover.

### Azure-Specific Error Scenarios

- **Azure Quota Exhaustion**: Automatic failover to local models when Azure quotas are reached
- **Content Filter Blocks**: Intelligent retry logic and alternative model selection
- **Regional Outages**: Multi-region failover with automatic recovery
- **Authentication Issues**: Secure credential refresh and backup authentication
- **Compliance Violations**: Audit logging and automated compliance reporting
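
The quota-exhaustion scenario can be sketched at the application level as follows. This is an illustrative pattern under stated assumptions, not RefLex's internal error handling: `QuotaExceeded`, `call_with_fallback`, `azure_call`, and `local_call` are hypothetical names, and a real application would catch the rate-limit exception raised by its client library instead.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a provider's rate-limit or quota error."""


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (QuotaExceeded, TimeoutError),
) -> Tuple[T, str]:
    """Run primary; on a listed error, run fallback instead.

    Returns the result and a label naming which path served it.
    """
    try:
        return primary(), "primary"
    except retry_on:
        # Errors outside retry_on (for example bad credentials) still propagate.
        return fallback(), "fallback"


def azure_call() -> str:
    raise QuotaExceeded("quota reached")  # simulate an exhausted quota


def local_call() -> str:
    return "local answer"


result, served_by = call_with_fallback(azure_call, local_call)
print(served_by)  # fallback
```

Restricting the fallback to an explicit list of error types is a deliberate choice: it keeps configuration mistakes, such as a wrong API key, visible instead of silently masking them with local responses.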

### Enterprise Monitoring and Diagnostics

The diagnostic system provides comprehensive visibility into Azure OpenAI resource utilization, compliance status, and failover patterns. This information is essential for enterprise operations, cost optimization, and regulatory compliance.

In [None]:
from reflex_llms import clear_cache, stop_reflex_server
import os

# Azure OpenAI specific diagnostics
final_status = get_module_status()
current_provider = get_selected_provider()

print("🔷 AZURE OPENAI SYSTEM DIAGNOSTICS")
print("=" * 50)
print(f"Active Provider: {current_provider.upper()}")
print(f"Configuration Cached: {final_status['has_cached_config']}")
print(f"RefLex Backup Available: {final_status['reflex_server_running']}")

# Azure environment validation
print(f"\n🔐 AZURE CREDENTIALS STATUS")
azure_endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
azure_key = os.getenv('AZURE_OPENAI_API_KEY')
azure_deployment = os.getenv('AZURE_OPENAI_DEPLOY_NAME')
azure_model = os.getenv('AZURE_OPENAI_MODEL_NAME')

print(f"• Endpoint: {'✅ Configured' if azure_endpoint else '❌ Missing'}")
print(f"• API Key: {'✅ Configured' if azure_key else '❌ Missing'}")
print(f"• Deployment Name: {'✅ Configured' if azure_deployment else '⚠️ Optional'}")
print(f"• Model Name: {'✅ Configured' if azure_model else '⚠️ Optional'}")

# Provide Azure-specific guidance
print(f"\n💡 OPTIMIZATION RECOMMENDATIONS")
if current_provider == "azure":
    print(f"• Azure OpenAI is active and healthy")
    print(f"• Consider enabling RefLex backup for high availability")
    print(f"• Monitor Azure quotas and usage patterns")
    print(f"• Implement content filtering policies as needed")
elif current_provider == "reflex":
    print(f"• Using local backup - Azure OpenAI may be unavailable")
    print(f"• Check Azure credentials and endpoint connectivity")
    print(f"• Verify Azure resource quotas and billing status")
    print(f"• Consider multi-region Azure deployment for redundancy")
else:
    print(f"• Using {current_provider} - configure Azure for enterprise features")

# Management commands
print(f"\n🛠️ MANAGEMENT COMMANDS")
print(f"• clear_cache() - Force provider re-resolution")
print(f"• stop_reflex_server() - Clean up local resources")
print(f"• Use preference_order=['azure', 'reflex'] for Azure-first setup")