# Lab 08: Advanced Generative AI with Vision

## Overview
This notebook explores multimodal techniques with Azure OpenAI's GPT-4 Vision. You'll practice prompt engineering for images, multi-step image reasoning, batch processing, and error-handling patterns suited to production use.

## Advanced Topics Covered
- Complex image reasoning and analysis
- Image comparison and visual question answering
- Chain-of-thought reasoning with images
- Few-shot learning for vision tasks
- Custom image captioning styles
- Content moderation and safety
- Batch image analysis
- Advanced prompt engineering

## Setup

In [None]:
!pip install azure-ai-projects azure-ai-inference azure-identity python-dotenv pillow matplotlib -q

In [None]:
import os
import base64
import json
from pathlib import Path
from typing import List, Dict
from dotenv import load_dotenv
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
# Chat message and content types come from the azure-ai-inference package
from azure.ai.inference.models import UserMessage, ImageContentItem, ImageUrl, TextContentItem
from PIL import Image
from IPython.display import display, HTML, Markdown
import matplotlib.pyplot as plt

# Load configuration
load_dotenv('python/.env')
project_endpoint = os.getenv("PROJECT_CONNECTION")
model_deployment = os.getenv("MODEL_DEPLOYMENT")

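# Initialize client (from_connection_string exists in the preview 1.0.0b releases
# of azure-ai-projects; check the version you have installed)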
project_client = AIProjectClient.from_connection_string(
    conn_str=project_endpoint,
    credential=DefaultAzureCredential()
)
chat_client = project_client.inference.get_chat_completions_client()

print("✓ Environment initialized")

## 1. Complex Image Reasoning

GPT-4 Vision can perform complex reasoning tasks including counting, spatial relationships, and logical analysis.
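
When the model is asked to reason step by step, the final answer ends up buried in free text. One way to keep it machine-readable is to ask for a marked final line and split on it. The sketch below is a minimal illustration; the `Final answer:` marker and both helper names are assumptions made for this example and are not part of the lab code or any SDK.

```python
import re
from typing import Optional, Tuple

FINAL_MARKER = "Final answer:"  # hypothetical marker chosen for this sketch


def build_reasoning_prompt(question: str) -> str:
    """Ask for step-by-step reasoning followed by one clearly marked final line."""
    return (
        f"{question}\n\n"
        "Explain your reasoning step by step, then finish with a single line "
        f"that starts with '{FINAL_MARKER}'."
    )


def split_reasoning(response_text: str) -> Tuple[str, Optional[str]]:
    """Return (reasoning, final_answer); final_answer is None if the marker is absent."""
    match = re.search(
        rf"^{re.escape(FINAL_MARKER)}\s*(.+)$",
        response_text,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        return response_text.strip(), None
    return response_text[: match.start()].strip(), match.group(1).strip()


# A hand-written sample response, not real model output
sample = "The skin is mostly green with a yellow blush.\nFinal answer: about 3 days"
reasoning, answer = split_reasoning(sample)
print(answer)  # prints "about 3 days"
```

The prompt returned by `build_reasoning_prompt` could be passed as the text part of the message, with `split_reasoning` applied to the returned content.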

In [None]:
def analyze_with_reasoning(image_path, prompt, show_reasoning=True):
    """Analyze image with chain-of-thought reasoning."""
    with open(image_path, "rb") as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')
    
    # Append the step-by-step instruction only when reasoning is requested
    reasoning_prompt = prompt
    if show_reasoning:
        reasoning_prompt += "\n\nPlease explain your reasoning step-by-step before giving your final answer."
    
    messages = [UserMessage(
        content=[
            TextContentItem(text=reasoning_prompt),
            ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
        ]
    )]
    
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=800
    )
    
    return response.choices[0].message.content

# Example: Complex reasoning about produce
print("🧠 Complex Reasoning Example:\n")
result = analyze_with_reasoning(
    "mango.jpeg",
    "Based on the visual characteristics, estimate how many days until this fruit reaches peak ripeness.",
    show_reasoning=True
)
print(result)

## 2. Image Comparison and Analysis

Compare multiple images to identify differences, similarities, or make recommendations.
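
The cells in this lab hard-code `image/jpeg` in every data URL, which is fine for the `.jpeg` samples but mislabels other formats once several images are mixed. A small helper can derive the MIME type from the file extension instead. This is a sketch using only the standard library; `image_to_data_url` is a hypothetical name introduced here.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a base64 data URL, guessing the MIME type from its extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Unsupported or unknown image type: {image_path}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could be passed wherever the lab builds `ImageUrl(url=...)`. Note that the guess is based on the extension only, so a misnamed file would still be mislabeled.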

In [None]:
def compare_images(image_paths: List[str], comparison_prompt: str):
    """Compare multiple images with a custom prompt."""
    content = [TextContentItem(text=comparison_prompt)]
    
    # Add all images to the message, in the order given
    for image_path in image_paths:
        with open(image_path, "rb") as img_file:
            base64_image = base64.b64encode(img_file.read()).decode('utf-8')
        content.append(ImageContentItem(
            image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}")
        ))
    
    messages = [UserMessage(content=content)]
    
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=800
    )
    
    return response.choices[0].message.content

# Example: Compare mango and orange
print("🔍 Comparing Images:\n")
comparison = compare_images(
    ["mango.jpeg", "orange.jpeg"],
    """I've provided two fruit images. Please:
    1. Identify each fruit
    2. Compare their nutritional profiles
    3. Compare their shelf life
    4. Recommend which one is better for a smoothie and why
    """
)
print(comparison)

## 3. Few-Shot Learning with Vision

Teach the model specific patterns or styles by providing examples.
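
Few-shot prompting with images comes down to message ordering: a system message, then alternating user (image) and assistant (expected output) turns, then the new image as the last user turn. The sketch below builds that sequence as plain dictionaries so the shape is easy to inspect. The dictionary layout follows the common chat-completions wire format and is an assumption here; confirm it against the SDK version you use. `build_few_shot_messages` is a hypothetical helper.

```python
from typing import Dict, List


def build_few_shot_messages(
    system_prompt: str,
    examples: List[Dict[str, str]],
    test_image_url: str,
    instruction: str = "Analyze this produce:",
) -> List[dict]:
    """Build system + (user, assistant) example pairs + final user turn."""

    def user_turn(image_url: str) -> dict:
        return {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }

    messages: List[dict] = [{"role": "system", "content": system_prompt}]
    for example in examples:
        # Each example is a demonstration: the image, then the answer we want imitated
        messages.append(user_turn(example["image_url"]))
        messages.append({"role": "assistant", "content": example["output"]})
    # The image to grade goes last, with the same instruction as the examples
    messages.append(user_turn(test_image_url))
    return messages


demo = build_few_shot_messages(
    "You are a produce quality inspector.",
    [{"image_url": "data:image/jpeg;base64,AAAA", "output": "Quality Grade: A"}],
    "data:image/jpeg;base64,BBBB",
)
print([m["role"] for m in demo])  # prints ['system', 'user', 'assistant', 'user']
```

Using the same instruction text for the examples and the test turn keeps the demonstration and the query as similar as possible, which is the point of the technique.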

In [None]:
def few_shot_vision_learning(examples: List[Dict], test_image: str):
    """Use few-shot learning to teach the model a specific task."""
    
    messages = [
        {"role": "system", "content": "You are a produce quality inspector. Learn from the examples provided."}
    ]
    
    # Add example pairs (image + expected output)
    for example in examples:
        with open(example['image'], "rb") as img_file:
            base64_image = base64.b64encode(img_file.read()).decode('utf-8')
        
        messages.append(UserMessage(
            content=[
                TextContentItem(text="Analyze this produce:"),
                ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
            ]
        ))
        messages.append({"role": "assistant", "content": example['output']})
    
    # Now test on new image
    with open(test_image, "rb") as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')
    
    messages.append(UserMessage(
        content=[
            TextContentItem(text="Analyze this produce:"),
            ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
        ]
    ))
    
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=500
    )
    
    return response.choices[0].message.content

# Example: Teach a specific quality grading format
print("📚 Few-Shot Learning Example:\n")
examples = [
    {
        'image': 'mango.jpeg',
        'output': '''Quality Grade: A
Ripeness: 85%
Shelf Life: 3-4 days
Visual Quality: Excellent color, no blemishes
Recommendation: Ready for immediate sale'''
    }
]

result = few_shot_vision_learning(examples, "orange.jpeg")
print(result)

## 4. Custom Image Captioning Styles

Generate image captions in different styles or formats for various use cases.
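
The caption function in this section looks up the style with `style_prompts.get(style, style)`, so a misspelled style name is silently sent to the model as a free-form instruction. If that is not wanted, the lookup can fail loudly and custom instructions can be passed explicitly. A minimal sketch, with `build_caption_prompt` and its `custom_instruction` parameter introduced here as assumptions:

```python
from typing import Optional

# A subset of the lab's styles, repeated here so the sketch is self-contained
STYLE_PROMPTS = {
    "technical": "Provide a technical, detailed description suitable for a product catalog.",
    "accessibility": "Write an accessibility-friendly alt text description.",
}


def build_caption_prompt(style: str, custom_instruction: Optional[str] = None) -> str:
    """Return the caption prompt for a known style, or for an explicit custom instruction."""
    if custom_instruction is not None:
        return f"Describe this image. Style: {custom_instruction}"
    if style not in STYLE_PROMPTS:
        known = ", ".join(sorted(STYLE_PROMPTS))
        raise ValueError(f"Unknown style '{style}'. Known styles: {known}")
    return f"Describe this image. Style: {STYLE_PROMPTS[style]}"


print(build_caption_prompt("accessibility"))
```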

In [None]:
def generate_custom_caption(image_path, style):
    """Generate image captions in different styles."""
    
    style_prompts = {
        'technical': 'Provide a technical, detailed description suitable for a product catalog.',
        'poetic': 'Write a poetic, artistic description of the image.',
        'marketing': 'Write compelling marketing copy to sell this product.',
        'scientific': 'Describe from a botanical/scientific perspective.',
        'social_media': 'Write an engaging social media post with emojis.',
        'accessibility': 'Write an accessibility-friendly alt text description.'
    }
    
    with open(image_path, "rb") as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')
    
    prompt = f"Describe this image. Style: {style_prompts.get(style, style)}"
    
    messages = [UserMessage(
        content=[
            TextContentItem(text=prompt),
            ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
        ]
    )]
    
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=300
    )
    
    return response.choices[0].message.content

# Generate captions in multiple styles
print("🎨 Custom Caption Styles:\n")
styles = ['technical', 'poetic', 'marketing', 'social_media']

for style in styles:
    print(f"\n{'='*60}")
    print(f"Style: {style.upper()}")
    print('='*60)
    caption = generate_custom_caption("mango.jpeg", style)
    print(caption)

## 5. Content Moderation with Vision

Use GPT-4 Vision to analyze images for quality, safety, and appropriateness.
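
The moderation prompt in this section asks for JSON, but the reply arrives as a string and may include surrounding text or code fences. Before acting on it, it helps to extract the JSON object and check the expected fields. The sketch below assumes the four field names requested in the prompt; `parse_moderation_response` is a hypothetical helper and the sample string is hand-written, not model output.

```python
import json

REQUIRED_KEYS = {"overall_status", "confidence", "issues", "recommendations"}


def parse_moderation_response(response_text: str) -> dict:
    """Extract the first-to-last brace span, parse it as JSON, and check the expected fields."""
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in response")
    data = json.loads(response_text[start : end + 1])  # raises ValueError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    if data["overall_status"] not in {"approved", "rejected"}:
        raise ValueError(f"Unexpected overall_status: {data['overall_status']!r}")
    return data


sample = (
    "Here is the assessment:\n"
    '{"overall_status": "approved", "confidence": 90, '
    '"issues": [], "recommendations": ["Use a neutral background"]}'
)
parsed = parse_moderation_response(sample)
print(parsed["overall_status"], parsed["confidence"])  # prints "approved 90"
```

Treating a parse failure as "needs manual review" rather than as an approval is the safer default for a moderation workflow.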

In [None]:
def moderate_image_content(image_path, criteria):
    """Analyze image against specific content criteria."""
    
    with open(image_path, "rb") as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')
    
    prompt = f"""Analyze this image for the following criteria:
    
    {chr(10).join(f'- {criterion}' for criterion in criteria)}
    
    Provide a JSON response with:
    - overall_status: "approved" or "rejected"
    - confidence: 0-100
    - issues: list of any concerns
    - recommendations: list of suggestions
    """
    
    messages = [UserMessage(
        content=[
            TextContentItem(text=prompt),
            ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
        ]
    )]
    
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=500
    )
    
    return response.choices[0].message.content

# Example: Product quality check
print("🛡️ Content Moderation Example:\n")
quality_criteria = [
    "Product is clearly visible and in focus",
    "No visible damage or defects",
    "Appropriate for retail display",
    "Professional product photography standards",
    "Accurate color representation"
]

moderation_result = moderate_image_content("mango.jpeg", quality_criteria)
print(moderation_result)

## 6. Batch Image Analysis

Process multiple images with one shared prompt template so the outputs are structured consistently and easy to compare.
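
The batch function in this section sends requests one at a time. For larger batches, a bounded thread pool is one option, since each request spends most of its time waiting on the network. The sketch below is an illustration only: `batch_analyze_concurrently` is a hypothetical helper, `analyze_fn` stands for any function that takes an image path and returns text, and `fake_analyze` replaces the real API call so the example runs offline. Keep `max_workers` small enough to stay within your deployment's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def batch_analyze_concurrently(
    image_paths: List[str],
    analyze_fn: Callable[[str], str],
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Run analyze_fn over all paths with a bounded thread pool, keeping input order."""

    def safe_analyze(path: str) -> Dict[str, str]:
        # Capture per-image failures so one bad image does not abort the batch
        try:
            return {"image": path, "analysis": analyze_fn(path), "status": "success"}
        except Exception as exc:
            return {"image": path, "error": str(exc), "status": "failed"}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs
        return list(pool.map(safe_analyze, image_paths))


def fake_analyze(path: str) -> str:
    """Stand-in for a real vision call, used only to make the sketch runnable."""
    if path == "missing.jpeg":
        raise FileNotFoundError(path)
    return f"analysis of {path}"


results = batch_analyze_concurrently(["mango.jpeg", "missing.jpeg", "orange.jpeg"], fake_analyze)
print([r["status"] for r in results])  # prints ['success', 'failed', 'success']
```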

In [None]:
def batch_analyze_images(image_paths, analysis_template):
    """Analyze multiple images with a consistent template."""
    results = []
    
    for image_path in image_paths:
        print(f"Analyzing: {image_path}...")
        
        with open(image_path, "rb") as img_file:
            base64_image = base64.b64encode(img_file.read()).decode('utf-8')
        
        messages = [UserMessage(
            content=[
                TextContentItem(text=analysis_template),
                ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
            ]
        )]
        
        response = chat_client.complete(
            model=model_deployment,
            messages=messages,
            max_tokens=400
        )
        
        results.append({
            'image': image_path,
            'analysis': response.choices[0].message.content
        })
    
    return results

# Batch process available images
print("📊 Batch Analysis:\n")
images_to_analyze = ["mango.jpeg", "orange.jpeg"]

template = """Provide a structured analysis:
1. Fruit Type:
2. Estimated Weight:
3. Ripeness (1-10):
4. Price Suggestion:
5. Marketing Angle:
"""

batch_results = batch_analyze_images(images_to_analyze, template)

for result in batch_results:
    print(f"\n{'='*60}")
    print(f"Image: {result['image']}")
    print('='*60)
    print(result['analysis'])

## 7. Advanced Prompt Engineering Techniques

Ask for output in a fixed JSON structure so that responses can be parsed programmatically instead of read as free text.
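
Asking for "ONLY valid JSON" does not guarantee that every field of the requested structure comes back. A recursive key check against the template catches omissions before downstream code relies on them. This is a sketch; `find_missing_keys` is a hypothetical helper that checks key presence only, not value types.

```python
from typing import List


def find_missing_keys(template: dict, candidate: object, prefix: str = "") -> List[str]:
    """Return dotted paths of template keys that are absent from candidate."""
    missing: List[str] = []
    for key, expected in template.items():
        path = f"{prefix}{key}"
        if not isinstance(candidate, dict) or key not in candidate:
            missing.append(path)
        elif isinstance(expected, dict):
            # Nested object in the template: check its keys too
            missing.extend(find_missing_keys(expected, candidate[key], prefix=f"{path}."))
    return missing


template = {
    "product_name": "string",
    "attributes": {"size": "small/medium/large", "ripeness": "percentage"},
}
# Hand-written example of an incomplete reply
reply = {"product_name": "Mango", "attributes": {"size": "medium"}}
print(find_missing_keys(template, reply))  # prints ['attributes.ripeness']
```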

In [None]:
def structured_analysis(image_path, structure):
    """Get structured, parseable output from vision analysis."""
    
    with open(image_path, "rb") as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')
    
    prompt = f"""Analyze this image and provide a response in the following JSON structure:
    
    {json.dumps(structure, indent=2)}
    
    Provide ONLY valid JSON, no additional text.
    """
    
    messages = [UserMessage(
        content=[
            TextContentItem(text=prompt),
            ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
        ]
    )]
    
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=600
    )
    
    return response.choices[0].message.content

# Example: Get structured product data
print("🏗️ Structured Analysis Example:\n")
structure_template = {
    "product_name": "string",
    "category": "string",
    "color": ["primary_color", "secondary_color"],
    "quality_score": "1-10",
    "attributes": {
        "size": "small/medium/large",
        "ripeness": "percentage",
        "condition": "description"
    },
    "recommended_use": "string",
    "storage_instructions": "string"
}

structured_result = structured_analysis("mango.jpeg", structure_template)
print(structured_result)

## 8. Visual Question Answering with Confidence Scores

Get answers with confidence levels for decision-making.
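
To use the confidence value in a decision, the labeled reply has to be parsed first. The sketch below reads the `Answer`, `Confidence`, `Reasoning`, and `Alternative` fields used in this section and flags low-confidence answers for review. Both function names and the threshold of 70 are assumptions for illustration. A self-reported confidence is a rough signal, not a calibrated probability, so the threshold should be tuned on your own data.

```python
import re
from typing import Dict, Optional, Union

FieldValue = Optional[Union[str, float]]


def parse_vqa_response(text: str) -> Dict[str, FieldValue]:
    """Parse 'Label: value' lines into a dict; missing fields map to None."""
    fields: Dict[str, FieldValue] = {}
    for label in ("Answer", "Confidence", "Reasoning", "Alternative"):
        match = re.search(rf"^\s*{label}:\s*(.+)$", text, flags=re.MULTILINE)
        fields[label.lower()] = match.group(1).strip() if match else None
    if fields["confidence"] is not None:
        # Pull the first number out of values such as "35%" or "about 35 %"
        number = re.search(r"\d+(?:\.\d+)?", str(fields["confidence"]))
        fields["confidence"] = float(number.group()) if number else None
    return fields


def needs_human_review(parsed: Dict[str, FieldValue], threshold: float = 70.0) -> bool:
    """Flag answers whose confidence is missing or below the threshold."""
    confidence = parsed["confidence"]
    return confidence is None or confidence < threshold


# Hand-written sample in the requested format, not real model output
sample = (
    "Answer: Cannot be determined from the image\n"
    "Confidence: 35%\n"
    "Reasoning: Organic status is not a visible property."
)
parsed = parse_vqa_response(sample)
print(parsed["confidence"], needs_human_review(parsed))  # prints "35.0 True"
```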

In [None]:
def vqa_with_confidence(image_path, question):
    """Visual Question Answering with confidence scoring."""
    
    with open(image_path, "rb") as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')
    
    prompt = f"""{question}
    
    Provide your answer in this format:
    Answer: [your answer]
    Confidence: [0-100]%
    Reasoning: [brief explanation]
    Alternative: [if applicable, alternative interpretation]
    """
    
    messages = [UserMessage(
        content=[
            TextContentItem(text=prompt),
            ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
        ]
    )]
    
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=400
    )
    
    return response.choices[0].message.content

# Example questions
print("❓ VQA with Confidence:\n")
questions = [
    "Is this fruit organic?",
    "What is the estimated ripening date?",
    "Would this fruit be suitable for making juice?"
]

for q in questions:
    print(f"\nQuestion: {q}")
    print("-" * 60)
    answer = vqa_with_confidence("mango.jpeg", q)
    print(answer)
    print()

## 9. Error Handling and Best Practices

In [None]:
import time

def robust_vision_analysis(image_path, prompt, max_retries=3):
    """Robust vision analysis with input validation and retries."""
    
    # Validate once, up front: these failures are not transient, so retrying cannot help
    if not os.path.exists(image_path):
        return {"error": "Image file not found", "status": "failed"}
    
    # Check file size (GPT-4 Vision has limits)
    file_size_mb = os.path.getsize(image_path) / (1024 * 1024)
    if file_size_mb > 20:
        return {"error": "Image too large (>20MB)", "status": "failed"}
    
    # Encode the image and build the request once; only the API call is retried
    with open(image_path, "rb") as img_file:
        base64_image = base64.b64encode(img_file.read()).decode('utf-8')
    
    messages = [UserMessage(
        content=[
            TextContentItem(text=prompt),
            ImageContentItem(image_url=ImageUrl(url=f"data:image/jpeg;base64,{base64_image}"))
        ]
    )]
    
    for attempt in range(max_retries):
        try:
            response = chat_client.complete(
                model=model_deployment,
                messages=messages,
                max_tokens=500,
                temperature=0.7
            )
            
            return {
                "result": response.choices[0].message.content,
                "status": "success",
                "attempt": attempt + 1
            }
            
        except Exception as e:
            if attempt == max_retries - 1:
                return {
                    "error": str(e),
                    "status": "failed",
                    "attempts": max_retries
                }
            print(f"Attempt {attempt + 1} failed ({e}), retrying...")
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    
    # Reached only when max_retries is 0 or negative
    return {"error": "Max retries exceeded", "status": "failed"}

# Test error handling
print("🛠️ Testing Robust Analysis:\n")
result = robust_vision_analysis("mango.jpeg", "Describe this image briefly.")
print(f"Status: {result['status']}")
if result['status'] == 'success':
    print(f"Result: {result['result']}")
else:
    print(f"Error: {result['error']}")

## Summary

In this advanced lab, you explored:

✅ **Complex reasoning** - Chain-of-thought and logical analysis  
✅ **Image comparison** - Multi-image analysis and recommendations  
✅ **Few-shot learning** - Teaching custom patterns to the model  
✅ **Custom captioning** - Style-specific descriptions  
✅ **Content moderation** - Quality and safety checking  
✅ **Batch processing** - Efficient multi-image workflows  
✅ **Structured output** - JSON and parseable responses  
✅ **Confidence scoring** - Decision-making support  
✅ **Error handling** - Production-ready patterns  

## Best Practices

1. **Be specific in prompts** - Detailed instructions yield better results
2. **Use structured output** - Request JSON for parseable responses
3. **Implement retries** - Handle transient failures gracefully
4. **Validate images** - Check size and format before processing
5. **Use few-shot examples** - Guide the model with demonstrations
6. **Request confidence scores** - Make informed decisions
7. **Batch efficiently** - Process multiple images systematically
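
As an illustration of practice 4, the check below validates size and format using only the standard library, by comparing the first bytes of the file against well-known signatures. `validate_image_file` is a hypothetical helper; the 20 MB ceiling mirrors the limit used earlier in this lab, and the accepted formats listed here are an assumption that should be matched to what your deployment supports.

```python
import os

# File signatures (magic bytes) for a few common image formats
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}
MAX_BYTES = 20 * 1024 * 1024  # same 20 MB ceiling as the lab's size check


def validate_image_file(path: str, max_bytes: int = MAX_BYTES) -> str:
    """Return the detected MIME type, or raise if the file is missing, empty, too large, or unrecognized."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError(f"Empty file: {path}")
    if size > max_bytes:
        raise ValueError(f"File too large: {size} bytes (limit {max_bytes})")
    with open(path, "rb") as fh:
        header = fh.read(16)
    for signature, mime_type in SIGNATURES.items():
        if header.startswith(signature):
            return mime_type
    raise ValueError(f"Unrecognized image format: {path}")
```

Checking the content rather than the extension also yields the correct MIME type for the data URL when a file is misnamed.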

## Production Considerations

- **Rate limiting**: Implement throttling for API calls
- **Caching**: Cache results for repeated queries
- **Cost optimization**: Monitor token usage
- **Security**: Validate and sanitize image inputs
- **Monitoring**: Track success rates and errors
- **Fallbacks**: Have backup strategies for failures
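
As one illustration of the caching point, results can be keyed on a hash of the image bytes together with the prompt and model name, so that an identical request is served from memory instead of triggering a new API call. The sketch below is an in-memory example under those assumptions; `AnalysisCache` is a hypothetical class, and `compute_fn` stands for whatever function performs the real request.

```python
import hashlib
from typing import Callable, Dict


class AnalysisCache:
    """In-memory cache keyed on image bytes, prompt, and model name."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(image_bytes: bytes, prompt: str, model: str) -> str:
        digest = hashlib.sha256()
        for part in (image_bytes, prompt.encode("utf-8"), model.encode("utf-8")):
            # Length prefix keeps ("ab", "c") and ("a", "bc") from hashing alike
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
        return digest.hexdigest()

    def get_or_compute(
        self, image_bytes: bytes, prompt: str, model: str, compute_fn: Callable[[], str]
    ) -> str:
        key = self.make_key(image_bytes, prompt, model)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute_fn()  # only called on a cache miss
        self._store[key] = result
        return result


cache = AnalysisCache()
first = cache.get_or_compute(b"image-bytes", "Describe this image.", "demo-model", lambda: "a ripe mango")
second = cache.get_or_compute(b"image-bytes", "Describe this image.", "demo-model", lambda: "not called")
print(first == second, cache.hits, cache.misses)  # prints "True 1 1"
```

Because sampling with a non-zero temperature can return different text for the same input, caching fixes whichever response arrived first. That is usually acceptable for repeated catalog-style queries but worth deciding deliberately.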