# Multimodal Automotive Damage Assessment Demo

This notebook demonstrates multimodal indexing patterns using automotive damage assessment data. It shows how to process both text descriptions and images of vehicle damage using different AI processing patterns.

## Features demonstrated:
- **Text Pattern**: Processes written damage descriptions
- **Hybrid Pattern**: Processes photos and text descriptions together, using separate embeddings for each
- **Full Embedding Pattern**: Creates a single unified embedding from images and text
- **Describe Pattern**: Generates text descriptions from photos
- **Summarize Pattern**: Condenses lengthy reports into key points

## Prerequisites:
- AWS credentials configured
- Amazon S3 Vectors permissions
- Damage photos in the images/ folder, one per case, named `<damage_id>.jpeg`
- Sample damage data in JSON format
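
The local prerequisites (photos and JSON data) can be checked before running the rest of the notebook. The following is a minimal sketch, not part of the original setup: `check_prerequisites` is a hypothetical helper, and it assumes the JSON file has a top-level `damage_cases` key, as the loading code later in this notebook expects. It does not check AWS credentials or permissions.

```python
import json
from pathlib import Path


def check_prerequisites(images_dir, data_file, top_level_key="damage_cases"):
    """Return a list of problems found; an empty list means the local files look ready."""
    problems = []

    # The notebook reads photos named <damage_id>.jpeg from the images folder.
    images = Path(images_dir)
    if not images.is_dir():
        problems.append(f"missing images folder: {images}")
    elif not any(images.glob("*.jpeg")):
        problems.append(f"no .jpeg files in {images}")

    # The damage data must be valid JSON containing the expected top-level key.
    data_path = Path(data_file)
    if not data_path.is_file():
        problems.append(f"missing data file: {data_path}")
    else:
        try:
            data = json.loads(data_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{data_path} is not valid JSON: {exc}")
        else:
            if not isinstance(data, dict) or top_level_key not in data:
                problems.append(f"{data_path} has no '{top_level_key}' key")

    return problems
```

Once the setup cell has run, a call such as `check_prerequisites(IMAGES_DIR, MULTIMODAL_DAMAGE_DATA_FILE)` should return an empty list if the files are in place.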

## Setup and Imports

In [None]:
# Run the setup script to configure imports
%run setup_imports.py

In [None]:
# Additional imports for notebook
from typing import Dict, Any, List
import json
import os  # used by get_image_path below
from IPython.display import Image, display, HTML
import matplotlib.pyplot as plt
from PIL import Image as PILImage

print("✅ Notebook setup complete!")

## Load and Display Sample Data

Let's examine the automotive damage data we'll be working with:

In [None]:
def load_damage_data():
    """Load the multimodal damage data from JSON file."""
    data = load_json_data(MULTIMODAL_DAMAGE_DATA_FILE)
    return data['damage_cases']

def load_dealer_escalation_text():
    """Load the dealer escalation text for summarize pattern."""
    return load_text_data(DEALER_ESCALATION_FILE)

def get_image_path(damage_id):
    """Get the path to the damage image."""
    return os.path.join(IMAGES_DIR, f"{damage_id}.jpeg")

# Load the data
damage_cases = load_damage_data()
dealer_escalation_text = load_dealer_escalation_text()

print(f"Loaded {len(damage_cases)} damage cases")
print(f"Dealer escalation text: {len(dealer_escalation_text):,} characters")

# Display first damage case
print("\nFirst damage case:")
case = damage_cases[0]
print(f"ID: {case['id']}")
print(f"Vehicle: {case['metadata']['vehicle_year']} {case['metadata']['vehicle_make']} {case['metadata']['vehicle_model']}")
print(f"Damage Type: {case['metadata']['damage_type']}")
print(f"Estimated Cost: ${case['metadata']['estimated_cost']}")
print(f"Description: {case['damage_text']}")
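
The cells in this notebook read a fixed set of fields from each damage case. The sketch below lists those fields (taken from the code in this notebook) and reports any that are missing from a record. `missing_fields` is a hypothetical helper, and the example record contains invented values for illustration only.

```python
# Fields that the notebook cells access on each damage case.
REQUIRED_TOP_LEVEL = ("id", "damage_text", "image_description", "metadata")
REQUIRED_METADATA = (
    "vehicle_year",
    "vehicle_make",
    "vehicle_model",
    "damage_type",
    "severity",
    "estimated_cost",
)


def missing_fields(case):
    """Return the names of required fields that are absent from one damage case."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in case]
    metadata = case.get("metadata", {})
    missing += [f"metadata.{key}" for key in REQUIRED_METADATA if key not in metadata]
    return missing


# Hypothetical record: every value here is made up for illustration.
example_case = {
    "id": "example_001",
    "damage_text": "Rear bumper cracked after a low-speed impact.",
    "image_description": "Close-up of a cracked rear bumper.",
    "metadata": {
        "vehicle_year": 2020,
        "vehicle_make": "ExampleMake",
        "vehicle_model": "ExampleModel",
        "damage_type": "collision_damage",
        "severity": "moderate",
        "estimated_cost": 1200,
    },
}

print(missing_fields(example_case))  # []
```

Running `missing_fields` over `damage_cases` before ingestion surfaces incomplete records early instead of as a `KeyError` partway through the loop.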

## Display Sample Damage Images

Let's look at the actual damage photos we'll be processing:

In [None]:
# Display the first few damage images
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
fig.suptitle('Sample Automotive Damage Photos', fontsize=16)

for i, case in enumerate(damage_cases[:6]):
    row = i // 3
    col = i % 3
    
    image_path = get_image_path(case['id'])
    
    try:
        # Load and display image
        img = PILImage.open(image_path)
        axes[row, col].imshow(img)
        axes[row, col].set_title(f"{case['id']}\n{case['metadata']['vehicle_make']} {case['metadata']['vehicle_model']}")
        axes[row, col].axis('off')
        
        # Print image description
        print(f"{case['id']}: {case['image_description']}")
        
    except Exception as e:
        axes[row, col].text(0.5, 0.5, f"Image not found\n{case['id']}", 
                           ha='center', va='center', transform=axes[row, col].transAxes)
        axes[row, col].axis('off')
        print(f"{case['id']}: Image not found - {e}")

plt.tight_layout()
plt.show()

## Create MMIngestor with Image Processing

Set up the multimodal ingestor with image resizing for automotive photos:

In [None]:
def create_mm_ingestor():
    """Create MMIngestor with image resizing for automotive photos."""
    vector_bucket_name, object_bucket_name, index_name = get_standard_names()
    
    mm_ingestor = MMIngestor(
        index_name=index_name,
        region_name=REGION_NAME
    )
    
    # Configure image resizer for automotive damage photos
    mm_ingestor.preprocessor_chain.clear()
    image_resizer = ImageResizer(
        target_size=(512, 384),
        preserve_aspect_ratio=True
    )
    mm_ingestor.add_preprocessor(image_resizer)
    
    print(f"MMIngestor created with {len(mm_ingestor.pattern_engine.list_patterns())} patterns: {mm_ingestor.pattern_engine.list_patterns()}")
    
    return mm_ingestor

# Create the ingestor
mm_ingestor = create_mm_ingestor()
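
With `preserve_aspect_ratio=True`, a resizer usually scales the image to fit inside the target box rather than stretching it to the exact size. How `ImageResizer` does this internally is not shown in this notebook, so the following is only a sketch of the common approach; `fit_within` is a hypothetical function.

```python
def fit_within(width, height, target_size=(512, 384)):
    """Scale (width, height) to fit inside target_size while keeping the aspect ratio.

    Assumption: the smaller of the two scale factors is applied to both sides.
    Whether a real resizer also upscales small images is not specified here;
    this sketch scales in either direction.
    """
    max_w, max_h = target_size
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1920, 1080))  # (512, 288)
print(fit_within(1000, 2000))  # (192, 384)
```

A 16:9 photo is limited by width and a portrait photo by height, so neither is distorted to 4:3.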

## Display Dealer Escalation Text for Summarize Pattern

The summarize pattern is intended for longer text documents (the ingestion code later in this notebook assumes at least 1,000 characters). Let's look at the dealer escalation case:

In [None]:
print("=== Dealer Escalation Case (for Summarize Pattern) ===")
print(f"Length: {len(dealer_escalation_text):,} characters")
print("\nContent preview:")
print(dealer_escalation_text[:500] + "...")

# Show the full text in a scrollable area. The text is escaped so that any
# <, > or & characters in the report are displayed literally, not parsed as HTML.
import html

display(HTML(f"""
<div style="height: 300px; overflow-y: scroll; border: 1px solid #ccc; padding: 10px; background-color: #f9f9f9;">
<h4>Full Dealer Escalation Text:</h4>
<pre style="white-space: pre-wrap;">{html.escape(dealer_escalation_text)}</pre>
</div>
"""))

## Ingest Dealer Escalation Document (Summarize Pattern)

In [None]:
def ingest_dealer_escalation():
    """Ingest dealer escalation text using summarize pattern."""
    print("=== Ingesting Dealer Escalation Document ===")
    
    print(f"Loaded dealer escalation text: {len(dealer_escalation_text):,} characters")
    
    # Use summarize pattern for long document
    doc_id = mm_ingestor.ingest(
        content={'text': dealer_escalation_text},
        metadata={
            'document_type': 'dealer_escalation',
            'category': 'customer_service',
            'dealer': 'Lakeside Honda',
            'customer': 'Mrs. Janet T.',
            'vehicle': '2021 Honda Civic EX',
            'case_id': '2024-0934'
        },
        pattern="summarize"
    )
    
    print(f"Ingested dealer escalation document: {doc_id}")
    return doc_id

# Ingest the dealer escalation document
dealer_doc_id = ingest_dealer_escalation()

## Ingest Damage Cases (Multiple Patterns)

Each damage case is assigned one pattern from a fixed list. The image-based patterns (`hybrid`, `full_embedding`, `describe`) also receive the photo:

In [None]:
def ingest_damage_cases():
    """Ingest automotive damage cases using different patterns."""
    print("=== Ingesting Automotive Damage Cases ===")
    
    print(f"Loaded {len(damage_cases)} damage cases")
    
    # Use different patterns for each case
    patterns_to_use = ["text", "hybrid", "full_embedding", "describe", "summarize", "text"]
    ingested_docs = []
    
    for i, case in enumerate(damage_cases):
        # Cycle through the list so more than six cases does not raise IndexError
        pattern = patterns_to_use[i % len(patterns_to_use)]
        image_path = get_image_path(case['id'])
        
        # Extend text for summarize pattern (needs 1000+ characters)
        if pattern == "summarize":
            case_text = case['damage_text'] + " " + """
            Additional detailed assessment: The damage assessment was conducted by certified technicians 
            following industry standard procedures. The inspection revealed multiple areas of concern 
            that require immediate attention. The structural integrity of the vehicle has been evaluated 
            and documented according to insurance guidelines. Repair estimates include both parts and 
            labor costs based on current market rates. The vehicle owner has been notified of all 
            findings and recommended repair procedures. All documentation has been submitted to the 
            insurance carrier for processing and approval. The repair facility has been selected 
            based on certification and availability. Timeline for repairs depends on parts availability 
            and shop scheduling. Customer satisfaction and safety remain our top priorities throughout 
            the entire repair process. Quality assurance checks will be performed at each stage.
            The assessment includes detailed photographic documentation of all damage areas, measurements
            of affected components, and evaluation of potential safety implications. All work will be
            performed according to manufacturer specifications and industry best practices.
            """
        else:
            case_text = case['damage_text']
        
        vehicle = f"{case['metadata']['vehicle_year']} {case['metadata']['vehicle_make']} {case['metadata']['vehicle_model']}"
        
        metadata = {
            'damage_id': case['id'],
            'vehicle_make': case['metadata']['vehicle_make'],
            'vehicle_model': case['metadata']['vehicle_model'],
            'vehicle_year': case['metadata']['vehicle_year'],
            'damage_type': case['metadata']['damage_type'],
            'severity': case['metadata']['severity'],
            'estimated_cost': case['metadata']['estimated_cost']
        }
        
        print(f"\nProcessing {case['id']}: {vehicle} using {pattern} pattern")
        
        # Display the image being processed
        if pattern in ["hybrid", "full_embedding", "describe"]:
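            try:
                # Pass the path as filename= so that a missing file raises an error here
                display(Image(filename=image_path, width=200))
                print(f"Image: {case['image_description']}")
            except Exception as e:
                print(f"Image not available: {image_path} ({e})")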
        
        print(f"Text: {case_text[:100]}...")
        
        try:
            if pattern in ["hybrid", "full_embedding", "describe"]:
                doc_id = mm_ingestor.ingest(content={'text': case_text, 'image': image_path}, metadata=metadata, pattern=pattern)
            else:  # text and summarize patterns
                doc_id = mm_ingestor.ingest(content={'text': case_text}, metadata=metadata, pattern=pattern)
            
            ingested_docs.append(doc_id)
            print(f"✓ Successfully ingested as: {doc_id}")
            
        except Exception as e:
            print(f"✗ Error: {e}")
    
    print(f"\nSuccessfully ingested {len(ingested_docs)} damage cases")
    return ingested_docs

# Ingest all damage cases
damage_docs = ingest_damage_cases()
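
The ingestion loop above sends an image only for the image-based patterns. That branching can be isolated into a small function, sketched below. `build_content` and `IMAGE_PATTERNS` are hypothetical names, not part of `MMIngestor`; the dictionary keys mirror the `content` argument used in the cell above.

```python
# Patterns that the ingestion cell calls with both text and an image.
IMAGE_PATTERNS = {"hybrid", "full_embedding", "describe"}


def build_content(pattern, text, image_path=None):
    """Build the content dict for one ingest call, adding the image only when needed."""
    content = {"text": text}
    if pattern in IMAGE_PATTERNS:
        if image_path is None:
            raise ValueError(f"pattern '{pattern}' needs an image path")
        content["image"] = image_path
    return content


print(build_content("text", "Scratched door", "images/case.jpeg"))
print(build_content("hybrid", "Scratched door", "images/case.jpeg"))
```

The result could then be passed as `content=build_content(pattern, case_text, image_path)`, which keeps the loop body to a single `ingest` call.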

## Search Examples and Results Analysis

Now let's demonstrate search functionality and analyze the results:

In [None]:
def print_search_results(results: List[Dict[str, Any]], query: str):
    """Print search results in a clean format with detailed analysis."""
    print(f"\n🔍 Query: '{query}'")
    print(f"Found {len(results)} results:")
    print("=" * 80)
    
    for i, result in enumerate(results, 1):
        metadata = result.get('metadata', {})
        pattern = metadata.get('pattern', 'unknown')
        score = result['similarity_score']
        
        print(f"\n{i}. Score: {score:.3f} | Pattern: {pattern}")
        
        if 'vehicle_make' in metadata and 'vehicle_model' in metadata:
            vehicle = f"{metadata.get('vehicle_year', '')} {metadata['vehicle_make']} {metadata['vehicle_model']}"
            damage_type = metadata.get('damage_type', '').replace('_', ' ').title()
            cost = metadata.get('estimated_cost', 'N/A')
            print(f"   🚗 Vehicle: {vehicle}")
            print(f"   💥 Damage: {damage_type}")
            print(f"   💰 Cost: ${cost}")
        
        if 'document_type' in metadata:
            doc_type = metadata['document_type'].replace('_', ' ').title()
            print(f"   📄 Document: {doc_type}")
            if 'dealer' in metadata:
                print(f"   🏢 Dealer: {metadata['dealer']}")
            if 'case_id' in metadata:
                print(f"   🔢 Case ID: {metadata['case_id']}")
        
        print("-" * 40)

# Search Example 1: Honda damage
print("=== Search Example 1: Honda Damage ===")
query1 = "Honda bumper damage"
results1 = mm_ingestor.search(query={'text': query1}, top_k=3)
print_search_results(results1, query1)
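
The `similarity_score` in each result reflects how close the query embedding is to a stored embedding. The metric used by this index is not shown in the notebook. The sketch below assumes cosine similarity and uses toy two-dimensional vectors instead of real embeddings; `cosine_similarity` and `top_k_matches` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, indexed, k=3):
    """Score every stored vector against the query and return the k best matches."""
    scored = [
        {"key": key, "similarity_score": cosine_similarity(query_vec, vec)}
        for key, vec in indexed.items()
    ]
    return sorted(scored, key=lambda r: r["similarity_score"], reverse=True)[:k]


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
matches = top_k_matches([1.0, 0.0], toy_index, k=2)
print([m["key"] for m in matches])  # ['a', 'c']
```

A vector store performs this ranking with an index rather than a full scan, but the ordering principle is the same: a higher score means a closer match.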

In [None]:
# Search Example 2: Dashboard warning lights (should find dealer escalation)
print("=== Search Example 2: Dashboard Warning Lights ===")
query2 = "dashboard warning lights"
results2 = mm_ingestor.search(query={'text': query2}, top_k=3)
print_search_results(results2, query2)

In [None]:
# Search Example 3: Collision damage with metadata filter
print("=== Search Example 3: Collision Damage (with metadata filter) ===")
query3 = "collision damage"
results3 = mm_ingestor.search(
    text=query3,
    metadata_filters={'damage_type': 'collision_damage'},
    top_k=3
)
print_search_results(results3, query3)
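
A metadata filter such as `{'damage_type': 'collision_damage'}` restricts results to records whose metadata matches every listed key. In this notebook the filtering is handled by the search call itself. The sketch below only illustrates the assumed equality semantics on the client side, with hypothetical helpers and invented records.

```python
def matches_filters(metadata, filters):
    """True when every filter key is present in metadata with an equal value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def filter_results(results, filters):
    """Keep only the results whose metadata satisfies all filters."""
    return [r for r in results if matches_filters(r.get("metadata", {}), filters)]


# Invented records for illustration.
sample_results = [
    {"similarity_score": 0.91, "metadata": {"damage_type": "collision_damage", "vehicle_make": "Honda"}},
    {"similarity_score": 0.88, "metadata": {"damage_type": "hail_damage", "vehicle_make": "Honda"}},
    {"similarity_score": 0.70, "metadata": {"document_type": "dealer_escalation"}},
]

kept = filter_results(sample_results, {"damage_type": "collision_damage"})
print(len(kept))  # 1
```

Records that lack the filtered key, such as the dealer escalation document, are excluded under this interpretation.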

In [None]:
# Search Example 4: High-cost repairs
print("=== Search Example 4: Expensive Repairs ===")
query4 = "expensive repair high cost damage"
results4 = mm_ingestor.search(query={'text': query4}, top_k=4)
print_search_results(results4, query4)

## Query S3 Vector Metadata and Show Storage Details

Let's examine what's actually stored in the S3 Vector Store:

In [None]:
import boto3
import sys
import os
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath('.'))))
from utils import get_standard_names

def show_s3_vector_metadata():
    """Query and display S3 Vector Store metadata."""
    vector_bucket_name, object_bucket_name, index_name = get_standard_names()
    # Keep only the part after the first "/" (unchanged if there is no "/")
    index_name = index_name.split("/", 1)[-1]
    print(f'vector_bucket_name: {vector_bucket_name}')
    print(f'object_bucket_name: {object_bucket_name}')
    print(f'index_name: {index_name}')
    
    # Initialize S3 Vectors client
    s3_vectors_client = boto3.client('s3vectors', region_name=REGION_NAME)
    
    try:
        # List vectors in the index
        response = s3_vectors_client.list_vectors(
            vectorBucketName=vector_bucket_name,
            indexName=index_name,
            maxResults=20
        )
        
        print("=== S3 Vector Store Contents ===")
        print(f"Bucket: {vector_bucket_name}")
        print(f"Index: {index_name}")
        # maxResults caps this call at 20 vectors, so this is not the index total
        print(f"Vectors returned (up to 20): {len(response.get('vectors', []))}")
        
        # Display metadata for each vector
        for i, vector_info in enumerate(response.get('vectors', []), 1):
            vector_key = vector_info['key']
            
            # Get detailed vector information
            vector_details = s3_vectors_client.get_vectors(
                vectorBucketName=vector_bucket_name,
                indexName=index_name,
                keys=[vector_key],
                returnMetadata=True
            )
            
            if vector_details.get('vectors'):
                vector_data = vector_details['vectors'][0]
                metadata = vector_data.get('metadata', {})
                
                print(f"\n{i}. Vector Key: {vector_key}")
                print(f"   Metadata Tags: {len(metadata)} items")
                
                # Show key metadata
                for key, value in metadata.items():
                    if isinstance(value, str) and len(value) > 50:
                        print(f"   {key}: {value[:50]}...")
                    else:
                        print(f"   {key}: {value}")
                
                # Check for S3 object references
                if 's3_object_key' in metadata:
                    print(f"   📁 S3 Object: s3://{object_bucket_name}/{metadata['s3_object_key']}")
        
    except Exception as e:
        print(f"Error querying S3 Vector Store: {e}")

# Show S3 Vector Store contents
show_s3_vector_metadata()
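
The cell above reads a single page of at most 20 vectors. To walk a larger index, the listing call can be repeated with the `nextToken` value from each response until none is returned. The sketch below takes the listing function as a parameter so it can be tried without AWS access; `iter_all_vectors` is a hypothetical helper, and `fake_list_vectors` stands in for `s3_vectors_client.list_vectors`.

```python
def iter_all_vectors(list_page, page_size=20, **request):
    """Yield every vector entry by following nextToken across pages."""
    token = None
    while True:
        kwargs = dict(request, maxResults=page_size)
        if token:
            kwargs["nextToken"] = token
        response = list_page(**kwargs)
        yield from response.get("vectors", [])
        token = response.get("nextToken")
        if not token:
            break


# Stand-in for the real client call, returning two invented pages.
_FAKE_PAGES = {
    None: {"vectors": [{"key": "doc-1"}, {"key": "doc-2"}], "nextToken": "page-2"},
    "page-2": {"vectors": [{"key": "doc-3"}]},
}


def fake_list_vectors(**kwargs):
    return _FAKE_PAGES[kwargs.get("nextToken")]


keys = [v["key"] for v in iter_all_vectors(
    fake_list_vectors, vectorBucketName="example-bucket", indexName="example-index"
)]
print(keys)  # ['doc-1', 'doc-2', 'doc-3']
```

With real credentials, the same generator could be called as `iter_all_vectors(s3_vectors_client.list_vectors, vectorBucketName=vector_bucket_name, indexName=index_name)`.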

## Pattern Analysis Summary

The following cell summarizes the characteristics and trade-offs of each multimodal pattern:

In [None]:
def analyze_patterns():
    """Print a qualitative summary of each pattern's characteristics and trade-offs."""
    
    print("=== Multimodal Pattern Analysis ===")
    print()
    
    pattern_descriptions = {
        "text": {
            "description": "Processes only text descriptions using standard text embeddings",
            "use_case": "When you have detailed written damage reports",
            "pros": ["Fast processing", "Good for text-heavy documents", "Lower resource usage"],
            "cons": ["Misses visual information", "Limited to text content only"]
        },
        "hybrid": {
            "description": "Combines text and image processing with separate embeddings",
            "use_case": "When you have both photos and descriptions of damage",
            "pros": ["Leverages both text and visual info", "Good search across modalities"],
            "cons": ["More complex processing", "Requires both text and images"]
        },
        "full_embedding": {
            "description": "Creates unified embeddings from both text and images",
            "use_case": "For comprehensive multimodal understanding",
            "pros": ["Single unified representation", "Best cross-modal search"],
            "cons": ["Most resource intensive", "Complex embedding process"]
        },
        "describe": {
            "description": "Generates text descriptions from images, then processes as text",
            "use_case": "When you have images but limited text descriptions",
            "pros": ["Extracts info from images", "Creates searchable text from visuals"],
            "cons": ["Dependent on image description quality", "May miss nuanced details"]
        },
        "summarize": {
            "description": "Condenses long documents into key points before embedding",
            "use_case": "For lengthy reports, case files, or documentation",
            "pros": ["Handles long documents", "Focuses on key information"],
            "cons": ["May lose detailed information", "Requires substantial text input"]
        }
    }
    
    for pattern, info in pattern_descriptions.items():
        print(f"🔧 {pattern.upper()} PATTERN")
        print(f"   Description: {info['description']}")
        print(f"   Best Use Case: {info['use_case']}")
        print(f"   ✅ Pros: {', '.join(info['pros'])}")
        print(f"   ⚠️  Cons: {', '.join(info['cons'])}")
        print()
    
    print("=== Pattern Selection Guidelines ===")
    print("• Use TEXT for pure text documents (reports, descriptions)")
    print("• Use HYBRID when you have both good text and images")
    print("• Use FULL_EMBEDDING for the best cross-modal search experience")
    print("• Use DESCRIBE when images are your primary data source")
    print("• Use SUMMARIZE for long documents that need condensing")

# Run pattern analysis
analyze_patterns()
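
The selection guidelines above can be expressed as a small decision function. This is a sketch of one possible reading of those guidelines, not logic taken from `MMIngestor`. `choose_pattern` and `prefer_unified` are hypothetical names, and the 1,000-character threshold comes from the comment in the ingestion cell.

```python
SUMMARIZE_MIN_CHARS = 1000  # threshold mentioned in the ingestion cell's comment


def choose_pattern(text="", has_image=False, prefer_unified=False):
    """Pick a pattern name from the available content, following the guidelines above."""
    text = text.strip()
    if has_image:
        if not text:
            return "describe"  # images are the primary data source
        return "full_embedding" if prefer_unified else "hybrid"
    if len(text) >= SUMMARIZE_MIN_CHARS:
        return "summarize"  # long documents that need condensing
    return "text"


print(choose_pattern("Small dent on the driver door"))                  # text
print(choose_pattern("Small dent on the driver door", has_image=True))  # hybrid
print(choose_pattern("", has_image=True))                               # describe
```

Long text that comes with an image still maps to `hybrid` or `full_embedding` here; whether such a case should be summarized first is a design decision that the guidelines leave open.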

In [None]:
# Interactive search - modify this cell to try your own queries
custom_query = "severe damage expensive repair"
custom_results = mm_ingestor.search(query={'text': custom_query}, top_k=5)
print_search_results(custom_results, custom_query)

In [None]:
# Try a search with metadata filtering
filtered_query = "vehicle damage"
filtered_results = mm_ingestor.search(
    text=filtered_query, 
    metadata_filters={'vehicle_make': 'Honda'},
    top_k=3
)
print_search_results(filtered_results, f"{filtered_query} (Honda only)")

## Summary and Key Takeaways

This notebook demonstrated multimodal indexing and search for automotive damage assessment:

### What We Accomplished:
1. **Processed Real Damage Photos**: Combined visual and textual information from actual automotive damage cases
2. **Multiple AI Patterns**: Demonstrated 5 different processing approaches for various use cases
3. **Semantic Search**: Found relevant damage cases using natural language queries
4. **Metadata Integration**: Combined AI-powered search with structured data filtering
5. **Document Summarization**: Processed lengthy dealer escalation cases into searchable summaries

### Real-World Applications:
- **Insurance Claims Processing**: Automatically categorize and route damage claims
- **Repair Cost Estimation**: Find similar historical cases for accurate pricing
- **Quality Control**: Identify patterns in damage types and repair outcomes
- **Customer Service**: Quickly find relevant cases and solutions for customer inquiries
- **Training Data**: Build comprehensive databases for training adjusters and technicians

### Technical Benefits:
- **Scalable Processing**: The same ingest-and-search workflow is designed to extend from these sample cases to much larger collections
- **Flexible Search**: Natural language queries match on meaning, not only on exact keywords
- **Rich Metadata**: Combine AI insights with structured business data
- **Multiple Modalities**: Process text, images, and documents in a unified system

### Next Steps:
- Integrate with claims management systems
- Add more sophisticated image analysis
- Implement automated damage severity scoring
- Build conversational interfaces for adjusters and customers
- Create automated reporting and analytics dashboards

This multimodal approach lets automotive businesses index and search damage-related text, photos, and documents in one place, which can make claims handling more efficient and customer responses faster.