# Amazon Bedrock Model ID Manager - Demonstration Notebook

This notebook demonstrates how to use the `ModelManager` class to download, parse, and manage information about Amazon Bedrock foundation models.

## Features Demonstrated

1. **Basic Model Data Retrieval**: Download and parse the latest model information
2. **Data Exploration**: Analyze model distribution by provider and region
3. **Filtering and Querying**: Find specific models based on criteria
4. **JSON Output Analysis**: Examine the structured output format
5. **Error Handling**: Demonstrate robust error handling
6. **Advanced Configuration**: Custom settings and caching behavior

## Prerequisites

Make sure you have the required dependencies installed:
```bash
pip install requests beautifulsoup4 lxml pandas matplotlib seaborn
```
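To confirm these prerequisites before running any cells, a small check built on `importlib.util.find_spec` can report which modules cannot be imported. This is a sketch: the name mapping is written by hand, because some pip distribution names differ from their import names (`beautifulsoup4` is imported as `bs4`).

```python
import importlib.util

# Map pip distribution names to the module names they are imported as
REQUIRED = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "lxml": "lxml",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages()
if missing:
    print("Missing: " + ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All prerequisites found.")
```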

## 1. Setup and Imports

In [None]:
# Standard library imports
import logging
import json
from pathlib import Path
from datetime import datetime
from collections import Counter, defaultdict

# Third-party imports for data analysis and visualization
try:
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    VISUALIZATION_AVAILABLE = True
except ImportError:
    print("Visualization libraries not available. Install with: pip install pandas matplotlib seaborn")
    VISUALIZATION_AVAILABLE = False

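# Our ModelManager components
# Make the project's src/ directory importable; the relative path is resolved
# against the current working directory (the notebook's folder by default in Jupyter)
import sys
sys.path.append('../src')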

from bedrock.ModelManager import ModelManager, ModelManagerError
from bedrock.models.constants import JSONFields
from bedrock.models.aws_regions import AWSRegions

# Configure logging to see what's happening
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

print("✅ Setup complete!")

## 2. Basic Usage - Download and Parse Model Data

In [None]:
# Initialize ModelManager with custom paths for this demo
manager = ModelManager(
    html_output_path=Path("../docs/demo_bedrock_models.html"),
    json_output_path=Path("../docs/demo_models.json"),
    download_timeout=60  # Longer timeout for demo
)

print(f"ModelManager configuration:")
print(f"  HTML output: {manager.html_output_path}")
print(f"  JSON output: {manager.json_output_path}")
print(f"  Source URL: {manager.documentation_url}")

In [None]:
# Download and parse the latest model data
print("🔄 Refreshing model data from AWS Bedrock documentation...\n")

try:
    catalog = manager.refresh_model_data()
    
    print(f"✅ Successfully retrieved model data!")
    print(f"📊 Found {catalog.model_count} models")
    print(f"🕐 Data retrieved at: {catalog.retrieval_timestamp}")
    
except ModelManagerError as e:
    print(f"❌ Error retrieving model data: {e}")
    # Later cells check whether `catalog` exists and skip their analysis if this step failed

## 3. Exploring the Model Catalog

In [None]:
# Display first few models to understand the data structure
if 'catalog' in locals():
    print("📋 Sample of available models:\n")
    
    sample_models = list(catalog.models.items())[:5]  # First 5 models
    
    for model_name, model_info in sample_models:
        print(f"🔹 {model_name}")
        print(f"   Provider: {model_info.provider}")
        print(f"   Model ID: {model_info.model_id}")
        print(f"   Regions: {', '.join(model_info.regions_supported[:3])}{'...' if len(model_info.regions_supported) > 3 else ''}")
        print(f"   Input: {', '.join(model_info.input_modalities)}")
        print(f"   Output: {', '.join(model_info.output_modalities)}")
        print(f"   Streaming: {'✅' if model_info.streaming_supported else '❌'}")
        print()
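The region line above truncates long lists with an inline conditional. Pulled out into a helper, the same logic is easier to read and test. The `ModelInfo` dataclass here is a hypothetical stand-in that only mirrors the attribute names this notebook reads; it is not the project's actual class, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelInfo:
    """Hypothetical stand-in mirroring the attributes used in this notebook."""
    provider: str
    model_id: str
    regions_supported: List[str] = field(default_factory=list)
    input_modalities: List[str] = field(default_factory=list)
    output_modalities: List[str] = field(default_factory=list)
    streaming_supported: bool = False

def format_regions(regions: List[str], limit: int = 3) -> str:
    """Join up to `limit` regions, appending '...' when more exist."""
    shown = ", ".join(regions[:limit])
    return shown + ("..." if len(regions) > limit else "")

example = ModelInfo(
    provider="ExampleCo",
    model_id="exampleco.demo-v1",
    regions_supported=["us-east-1", "us-west-2", "eu-west-1", "ap-northeast-1"],
)
print(format_regions(example.regions_supported))  # us-east-1, us-west-2, eu-west-1...
```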

## 4. Provider Analysis

In [None]:
# Analyze models by provider
if 'catalog' in locals():
    provider_counts = Counter()
    provider_streaming = defaultdict(list)
    
    for model_name, model_info in catalog.models.items():
        provider_counts[model_info.provider] += 1
        provider_streaming[model_info.provider].append(model_info.streaming_supported)
    
    print("📈 Models by Provider:\n")
    for provider, count in provider_counts.most_common():
        streaming_count = sum(provider_streaming[provider])
        streaming_pct = (streaming_count / count) * 100
        print(f"🏢 {provider}: {count} models ({streaming_count} support streaming - {streaming_pct:.1f}%)")
    
    # Demonstrate provider filtering
    print("\n🔍 Amazon models:")
    amazon_models = manager.get_models_by_provider("Amazon")
    for name in list(amazon_models.keys())[:3]:
        print(f"   • {name}")
    if len(amazon_models) > 3:
        print(f"   ... and {len(amazon_models) - 3} more")

## 5. Regional Analysis

In [None]:
# Analyze models by AWS region
if 'catalog' in locals():
    region_counts = Counter()
    
    for model_name, model_info in catalog.models.items():
        for region in model_info.regions_supported:
            region_counts[region] += 1
    
    print("🌍 Top 10 Regions by Model Availability:\n")
    for region, count in region_counts.most_common(10):
        print(f"📍 {region}: {count} models")
    
    # Demonstrate region filtering
    print("\n🔍 Models available in us-east-1:")
    us_east_models = manager.get_models_by_region("us-east-1")
    print(f"   Total: {len(us_east_models)} models")
    
    # Show a few examples
    for name in list(us_east_models.keys())[:5]:
        print(f"   • {name}")
    if len(us_east_models) > 5:
        print(f"   ... and {len(us_east_models) - 5} more")
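The counting loop above keeps only a total per region. Keeping the model names instead produces an index that answers "which models are in region X?" without rescanning the catalog. This sketch works on a plain `{model_name: regions}` mapping, which could be built with `{n: i.regions_supported for n, i in catalog.models.items()}`; the sample data is invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def build_region_index(model_regions: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert {model: regions} into {region: sorted model names}."""
    index = defaultdict(list)
    for model_name, regions in model_regions.items():
        for region in regions:
            index[region].append(model_name)
    return {region: sorted(names) for region, names in index.items()}

sample = {
    "Model A": ["us-east-1", "us-west-2"],
    "Model B": ["us-east-1"],
    "Model C": ["eu-west-1", "us-east-1"],
}
index = build_region_index(sample)
print(index["us-east-1"])  # ['Model A', 'Model B', 'Model C']
```

Building the index once trades a little memory for constant-time lookups per region, which helps when many regions are queried in a row.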

## 6. Modality Analysis

In [None]:
# Analyze input and output modalities
if 'catalog' in locals():
    input_modalities = Counter()
    output_modalities = Counter()
    
    for model_name, model_info in catalog.models.items():
        for modality in model_info.input_modalities:
            input_modalities[modality] += 1
        for modality in model_info.output_modalities:
            output_modalities[modality] += 1
    
    print("🎯 Input Modalities:\n")
    for modality, count in input_modalities.most_common():
        print(f"   📥 {modality}: {count} models")
    
    print("\n🎯 Output Modalities:\n")
    for modality, count in output_modalities.most_common():
        print(f"   📤 {modality}: {count} models")
    
    # Find multimodal models
    multimodal_input = [name for name, info in catalog.models.items() 
                       if len(info.input_modalities) > 1]
    multimodal_output = [name for name, info in catalog.models.items() 
                        if len(info.output_modalities) > 1]
    
    print(f"\n🔀 Multimodal Models:")
    print(f"   Multiple inputs: {len(multimodal_input)} models")
    print(f"   Multiple outputs: {len(multimodal_output)} models")
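Counting modalities shows what exists; selecting models by modality is a set-containment check: a model qualifies if its inputs include every required input and its outputs include every required output. The modality spellings below (`"Text"`, `"Image"`) are assumptions, so compare them with the counts printed above; the sample models are made up.

```python
from typing import Dict, Iterable, List, Tuple

def models_with_modalities(
    models: Dict[str, Tuple[Iterable[str], Iterable[str]]],
    required_inputs: Iterable[str] = (),
    required_outputs: Iterable[str] = (),
) -> List[str]:
    """Return names whose (inputs, outputs) cover the required modalities."""
    need_in, need_out = set(required_inputs), set(required_outputs)
    return sorted(
        name for name, (inputs, outputs) in models.items()
        if need_in <= set(inputs) and need_out <= set(outputs)
    )

sample = {
    "Vision Chat": (["Text", "Image"], ["Text"]),
    "Text Only": (["Text"], ["Text"]),
    "Image Gen": (["Text"], ["Image"]),
}
print(models_with_modalities(sample, required_inputs=["Image"], required_outputs=["Text"]))
# ['Vision Chat']
```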

## 7. Streaming Support Analysis

In [None]:
# Analyze streaming support
if 'catalog' in locals():
    streaming_models = manager.get_streaming_models()
    total_models = catalog.model_count
    # Guard against division by zero if the catalog is empty
    streaming_percentage = (len(streaming_models) / total_models) * 100 if total_models else 0.0
    
    print(f"🚀 Streaming Support Analysis:\n")
    print(f"   Total models: {total_models}")
    print(f"   Streaming supported: {len(streaming_models)} ({streaming_percentage:.1f}%)")
    print(f"   No streaming: {total_models - len(streaming_models)} ({100 - streaming_percentage:.1f}%)")
    
    print("\n🔍 Sample streaming-enabled models:")
    for name in list(streaming_models.keys())[:5]:
        model_info = streaming_models[name]
        print(f"   • {name} ({model_info.provider})")
    
    if len(streaming_models) > 5:
        print(f"   ... and {len(streaming_models) - 5} more")

## 8. JSON Output Examination

In [None]:
# Examine the JSON output structure
if manager.json_output_path.exists():
    print("📄 JSON Output Structure:\n")
    
    # Load and display JSON structure
    with open(manager.json_output_path, 'r', encoding='utf-8') as f:
        json_data = json.load(f)
    
    print(f"📁 Top-level keys: {list(json_data.keys())}")
    print(f"🕐 Retrieval timestamp: {json_data[JSONFields.RETRIEVAL_TIMESTAMP]}")
    print(f"📊 Number of models: {len(json_data[JSONFields.MODELS])}")
    
    # Show structure of one model
    if json_data[JSONFields.MODELS]:
        sample_model_name = list(json_data[JSONFields.MODELS].keys())[0]
        sample_model = json_data[JSONFields.MODELS][sample_model_name]
        
        print(f"\n🔍 Sample model structure ({sample_model_name}):")
        for field, value in sample_model.items():
            value_type = type(value).__name__
            if isinstance(value, list):
                print(f"   {field}: {value_type} with {len(value)} items")
            elif isinstance(value, str) and len(value) > 50:
                print(f"   {field}: {value_type} ({len(value)} chars)")
            else:
                print(f"   {field}: {value} ({value_type})")
    
    # File size information
    file_size = manager.json_output_path.stat().st_size
    print(f"\n💾 JSON file size: {file_size:,} bytes ({file_size/1024:.1f} KB)")

else:
    print("❌ JSON output file not found")
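The cell above inspects one level of a single model. A small recursive helper can summarize the shape of any value returned by `json.load`, which is useful for spotting format changes between runs. This generic sketch does not depend on `JSONFields`, and the sample document and its key names are invented.

```python
import json
from typing import Any, List

def describe(value: Any, depth: int = 0, max_depth: int = 3) -> List[str]:
    """Return indented lines describing the type structure of a JSON value."""
    pad = "  " * depth
    if isinstance(value, dict):
        lines = [f"{pad}dict[{len(value)}]"]
        if depth < max_depth:
            for key, child in value.items():
                child_lines = describe(child, depth + 1, max_depth)
                # Label the child's first line with its key
                child_lines[0] = f"{'  ' * (depth + 1)}{key}: {child_lines[0].lstrip()}"
                lines.extend(child_lines)
        return lines
    if isinstance(value, list):
        lines = [f"{pad}list[{len(value)}]"]
        if value and depth < max_depth:
            # Treat the first item as representative of the rest
            lines.extend(describe(value[0], depth + 1, max_depth))
        return lines
    return [f"{pad}{type(value).__name__}"]

doc = json.loads(
    '{"timestamp": "2024-01-01T00:00:00Z",'
    ' "models": {"Model A": {"regions": ["us-east-1"], "streaming": true}}}'
)
print("\n".join(describe(doc)))
```

In the notebook, `describe(json_data)` would print the structure of the real output file.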

## 9. Visualization (if libraries available)

In [None]:
# Create visualizations if pandas/matplotlib are available
if VISUALIZATION_AVAILABLE and 'catalog' in locals():
    print("📊 Creating visualizations...\n")
    
    # Prepare data for visualization
    model_data = []
    for name, info in catalog.models.items():
        model_data.append({
            'name': name,
            'provider': info.provider,
            'streaming': info.streaming_supported,
            'num_regions': len(info.regions_supported),
            'num_input_modalities': len(info.input_modalities),
            'num_output_modalities': len(info.output_modalities)
        })
    
    df = pd.DataFrame(model_data)
    
    # Set up the plotting style
    plt.style.use('default')
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Amazon Bedrock Models Analysis', fontsize=16, fontweight='bold')
    
    # Provider distribution
    provider_counts = df['provider'].value_counts()
    axes[0, 0].pie(provider_counts.values, labels=provider_counts.index, autopct='%1.1f%%')
    axes[0, 0].set_title('Models by Provider')
    
    # Streaming support by provider
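    # Count models per (provider, streaming) pair; reindex so both columns always exist
    # in False/True order, keeping the bar colors and legend labels aligned
    streaming_by_provider = (
        df.groupby(['provider', 'streaming']).size()
        .unstack(fill_value=0)
        .reindex(columns=[False, True], fill_value=0)
    )
    streaming_by_provider.plot(kind='bar', stacked=True, ax=axes[0, 1],
                               color=['lightcoral', 'lightgreen'])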
    axes[0, 1].set_title('Streaming Support by Provider')
    axes[0, 1].set_xlabel('Provider')
    axes[0, 1].set_ylabel('Number of Models')
    axes[0, 1].legend(['No Streaming', 'Streaming Supported'])
    axes[0, 1].tick_params(axis='x', rotation=45)
    
    # Region availability distribution
    axes[1, 0].hist(df['num_regions'], bins=10, edgecolor='black', alpha=0.7)
    axes[1, 0].set_title('Distribution of Regional Availability')
    axes[1, 0].set_xlabel('Number of Regions')
    axes[1, 0].set_ylabel('Number of Models')
    
    # Modality complexity
    axes[1, 1].scatter(df['num_input_modalities'], df['num_output_modalities'], 
                      alpha=0.6, s=60)
    axes[1, 1].set_title('Input vs Output Modalities')
    axes[1, 1].set_xlabel('Number of Input Modalities')
    axes[1, 1].set_ylabel('Number of Output Modalities')
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print("✅ Visualizations complete!")
    
else:
    print("📊 Visualization libraries not available or no data to visualize")

## 10. Error Handling Demonstration

In [None]:
# Demonstrate error handling with invalid configurations
print("🧪 Testing error handling scenarios...\n")

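# Test 1: Invalid URL
# Use a separate variable so this call cannot overwrite the `catalog`
# used by the earlier cells
try:
    invalid_manager = ModelManager(
        documentation_url="https://invalid-url-that-does-not-exist.com",
        download_timeout=5
    )
    invalid_catalog = invalid_manager.refresh_model_data()
except ModelManagerError as e:
    print(f"✅ Correctly handled invalid URL: {type(e).__name__}")
else:
    print("⚠️ Expected a ModelManagerError for the invalid URL, but none was raised")

# Test 2: Querying before any data has been loaded
try:
    empty_manager = ModelManager()
    # refresh_model_data() is deliberately not called
    empty_manager.get_models_by_provider("Amazon")
except ModelManagerError as e:
    print(f"✅ Correctly handled missing data: {e}")
else:
    print("⚠️ Expected a ModelManagerError for missing data, but none was raised")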

print("\n🎯 Error handling tests complete!")
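Download failures in real use are often transient, unlike the deliberately invalid URL above. A generic retry wrapper with exponential backoff is one way to make a scheduled refresh more tolerant. This is a sketch and not part of `ModelManager`: in the notebook you might call `call_with_retries(manager.refresh_model_data, retry_on=(ModelManagerError,))`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))  # waits 1x, 2x, 4x, ...
    raise AssertionError("unreachable")

# Usage: a stand-in function that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(call_with_retries(flaky, attempts=3, base_delay=0))  # ok
```

Exceptions not listed in `retry_on` propagate immediately, so programming errors are not retried.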

## 11. Advanced Configuration Example

In [None]:
# Demonstrate advanced configuration options
print("⚙️ Advanced Configuration Example\n")

# Create a manager with custom settings for production use
production_manager = ModelManager(
    html_output_path=Path("../cache/production_bedrock.html"),
    json_output_path=Path("../output/production_models.json"),
    download_timeout=120  # Longer timeout for production
)

print(f"Production configuration:")
print(f"  HTML cache: {production_manager.html_output_path}")
print(f"  JSON output: {production_manager.json_output_path}")
print(f"  Source: {production_manager.documentation_url}")

# Demonstrate caching behavior
# Note: the one-hour threshold below is illustrative; the rule ModelManager
# actually applies when force_download=False is defined in its source code
print("\n🗂️ Caching demonstration:")
if production_manager.html_output_path.exists():
    file_time = datetime.fromtimestamp(production_manager.html_output_path.stat().st_mtime)
    age_minutes = (datetime.now() - file_time).total_seconds() / 60
    print(f"  Cached HTML file age: {age_minutes:.1f} minutes")
    
    if age_minutes < 60:  # Illustrative threshold: less than 1 hour
        print("  ⚡ Cache is recent; a refresh with force_download=False could reuse it")
    else:
        print("  📥 Cache is older than 1 hour; fresh data would be preferable")
else:
    print("  📥 No cached data available - would download fresh")

print(f"\n📋 Manager representation: {production_manager}")
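The cell above hard-codes its freshness check. As a reusable function, the maximum age becomes a parameter, and an optional `now` argument makes the check testable without waiting. `is_cache_fresh` is a hypothetical helper, not part of `ModelManager`.

```python
import tempfile
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

def is_cache_fresh(path: Path, max_age: timedelta = timedelta(hours=1),
                   now: Optional[datetime] = None) -> bool:
    """Return True if `path` exists and was modified less than `max_age` ago."""
    if not path.exists():
        return False
    modified = datetime.fromtimestamp(path.stat().st_mtime)
    now = now or datetime.now()
    return now - modified < max_age

# Usage with a temporary file standing in for the cached HTML
with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "bedrock.html"
    print(is_cache_fresh(cached))             # False: file does not exist yet
    cached.write_text("<html></html>", encoding="utf-8")
    print(is_cache_fresh(cached))             # True: just written
    later = datetime.now() + timedelta(hours=2)
    print(is_cache_fresh(cached, now=later))  # False: older than 1 hour at `later`
```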

## 12. Performance and Usage Summary

In [None]:
# Summarize what we've learned and performance characteristics
print("📈 Performance and Usage Summary\n")

if 'catalog' in locals():
    print(f"✅ Successfully processed {catalog.model_count} models")
    print(f"📅 Data retrieved: {catalog.retrieval_timestamp}")
    
    # File sizes
    html_size = None
    if manager.html_output_path.exists():
        html_size = manager.html_output_path.stat().st_size
        print(f"💾 HTML file size: {html_size:,} bytes ({html_size/1024:.1f} KB)")
    
    if manager.json_output_path.exists():
        json_size = manager.json_output_path.stat().st_size
        print(f"💾 JSON file size: {json_size:,} bytes ({json_size/1024:.1f} KB)")
        
        if html_size:  # Skip when the HTML file is missing or empty
            size_reduction = (1 - json_size / html_size) * 100
            print(f"📦 Size reduction: {size_reduction:.1f}% (structured JSON vs. raw HTML)")

print("\n🎯 Key Takeaways:")
print("   • ModelManager downloads and parses the Bedrock model documentation into a catalog")
print("   • The catalog can be filtered by provider, region, and streaming support")
print("   • Results are written to structured JSON with a retrieval timestamp")
print("   • Failures surface as ModelManagerError with a descriptive message")
print("   • force_download=False lets a refresh reuse a previously downloaded HTML file")
print("   • Output paths and download timeout are configurable per environment")

print("\n🚀 Ready for production use!")

## 13. Next Steps and Integration Examples

Here are some ways you might integrate this ModelManager into your applications:

### Automated Model Discovery
```python
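import logging

from bedrock.ModelManager import ModelManager, ModelManagerError

logger = logging.getLogger(__name__)

# Run on a schedule (e.g. daily) to keep model information current
def update_model_catalog():
    """Refresh the model catalog; return it, or None if the refresh fails."""
    manager = ModelManager()
    try:
        # force_download=False may reuse a previously downloaded HTML file;
        # pass True if every scheduled run must fetch the page again
        catalog = manager.refresh_model_data(force_download=False)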
        logger.info(f"Updated catalog with {catalog.model_count} models")
        return catalog
    except ModelManagerError as e:
        logger.error(f"Failed to update model catalog: {e}")
        return None
```
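For a scheduled refresh, reporting what changed is often more useful than logging a total. A set difference on model names between the previous and current runs does this. This sketch compares two collections of names; loading the previous run's names (for example from the saved JSON) is left out because that file's key names are defined by `JSONFields`. `diff_model_names` and `CatalogDiff` are illustrative names.

```python
from typing import Iterable, NamedTuple, Set

class CatalogDiff(NamedTuple):
    added: Set[str]
    removed: Set[str]

def diff_model_names(previous: Iterable[str], current: Iterable[str]) -> CatalogDiff:
    """Return the model names that appeared or disappeared between two runs."""
    prev, curr = set(previous), set(current)
    return CatalogDiff(added=curr - prev, removed=prev - curr)

diff = diff_model_names(["Model A", "Model B"], ["Model B", "Model C"])
print(sorted(diff.added), sorted(diff.removed))  # ['Model C'] ['Model A']
```

With a refreshed catalog, `diff_model_names(old_names, catalog.models.keys())` would produce the report.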

### Region-Specific Model Selection
```python
from bedrock.ModelManager import ModelManager

# Find models that are available in a region and meet optional
# streaming and provider requirements
def find_suitable_models(region, streaming_required=False, provider_preference=None):
    manager = ModelManager()
    catalog = manager.refresh_model_data(force_download=False)
    
    suitable_models = {}
    for name, info in catalog.models.items():
        if (region in info.regions_supported and 
            (not streaming_required or info.streaming_supported) and
            (not provider_preference or info.provider == provider_preference)):
            suitable_models[name] = info
    
    return suitable_models
```
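Because `find_suitable_models` refreshes the data on every call, it is hard to test in isolation. Moving the selection criteria into a pure function over an existing models mapping makes them testable with fixed data. `select_models` and the `SimpleNamespace` stand-ins are illustrative; they mimic only the attributes the original function reads, and the sample entries are made up.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

def select_models(models: Dict[str, Any], region: str,
                  streaming_required: bool = False,
                  provider_preference: Optional[str] = None) -> Dict[str, Any]:
    """Apply the same criteria as find_suitable_models to an existing mapping."""
    return {
        name: info for name, info in models.items()
        if region in info.regions_supported
        and (not streaming_required or info.streaming_supported)
        and (not provider_preference or info.provider == provider_preference)
    }

models = {
    "Model A": SimpleNamespace(provider="Amazon", regions_supported=["us-east-1"],
                               streaming_supported=True),
    "Model B": SimpleNamespace(provider="Anthropic", regions_supported=["us-east-1", "us-west-2"],
                               streaming_supported=True),
    "Model C": SimpleNamespace(provider="Amazon", regions_supported=["us-west-2"],
                               streaming_supported=False),
}
print(sorted(select_models(models, "us-east-1", streaming_required=True)))       # ['Model A', 'Model B']
print(sorted(select_models(models, "us-west-2", provider_preference="Amazon")))  # ['Model C']
```

`find_suitable_models` could then delegate with `return select_models(catalog.models, region, streaming_required, provider_preference)`.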

### Cost Optimization
```python
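from bedrock.ModelManager import ModelManager

# Gather regional availability for preferred models as input to a cost
# comparison; this function does not look at pricing, and names missing
# from the catalog are skipped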
def analyze_regional_options(preferred_models):
    manager = ModelManager()
    catalog = manager.refresh_model_data(force_download=False)
    
    analysis = {}
    for model_name in preferred_models:
        if model_name in catalog.models:
            model_info = catalog.models[model_name]
            analysis[model_name] = {
                'regions': model_info.regions_supported,
                'streaming': model_info.streaming_supported,
                'provider': model_info.provider
            }
    
    return analysis
```
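If several preferred models should run in the same region, intersecting their region lists narrows the options before any prices are compared. This helper is a sketch that assumes the dictionary shape returned by `analyze_regional_options`; the sample analysis is invented.

```python
from typing import Any, Dict, List

def common_regions(analysis: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the sorted regions where every analyzed model is available."""
    region_sets = [set(entry["regions"]) for entry in analysis.values()]
    if not region_sets:
        return []
    return sorted(set.intersection(*region_sets))

analysis = {
    "Model A": {"regions": ["us-east-1", "us-west-2", "eu-west-1"],
                "streaming": True, "provider": "ExampleCo"},
    "Model B": {"regions": ["us-west-2", "us-east-1"],
                "streaming": False, "provider": "ExampleCo"},
}
print(common_regions(analysis))  # ['us-east-1', 'us-west-2']
```

Since `analyze_regional_options` silently skips unknown names, it is worth checking that every preferred model appears in `analysis` before trusting the intersection.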

## Documentation

For complete documentation, see:
- `docs/ModelManager_Documentation.md` - Comprehensive API documentation
- Source code in `src/bedrock/` - Fully commented implementation
- This notebook - Interactive examples and demonstrations