# Lab 08: Generative AI with Vision

## Overview
In this lab, you'll use Azure OpenAI's GPT-4 Vision model to analyze images. You'll build an interactive application that answers questions about images, modeled on a grocery store assistant that identifies produce and provides information about it.

## Learning Objectives
- Connect to Azure OpenAI GPT-4 Vision service
- Analyze images with natural language queries
- Build multi-turn conversations with image context
- Understand prompt engineering for vision tasks

## Prerequisites
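- Azure OpenAI resource with GPT-4 Vision deployment
- Project connection string and model deployment name configured in the `python/.env` file
- An Azure sign-in that `DefaultAzureCredential` can find (for example, `az login`)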

## Step 1: Setup and Configuration

First, let's install the required packages and import dependencies.

In [None]:
# Install required packages
!pip install azure-ai-projects azure-ai-inference azure-identity python-dotenv pillow -q

In [None]:
import os
import base64
from pathlib import Path
from dotenv import load_dotenv
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
# The chat message types live in azure-ai-inference, not azure-ai-projects
from azure.ai.inference.models import UserMessage, ImageContentItem, ImageUrl, TextContentItem
from PIL import Image
from IPython.display import display, HTML

print("✓ Packages imported successfully")

## Step 2: Load Configuration

Load the project connection string and model deployment name from the `python/.env` file.

In [None]:
# Load environment variables from the python subfolder
load_dotenv('python/.env')

project_endpoint = os.getenv("PROJECT_CONNECTION")
model_deployment = os.getenv("MODEL_DEPLOYMENT")

if not project_endpoint or not model_deployment:
    print("⚠️  Please configure PROJECT_CONNECTION and MODEL_DEPLOYMENT in python/.env file")
else:
    print("✓ Configuration loaded")
    print(f"  Endpoint: {project_endpoint[:50]}...")
    print(f"  Model Deployment: {model_deployment}")
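
The cell above only prints a warning when a setting is missing, so later cells fail with less obvious errors. A minimal sketch of a fail-fast check follows. The helper name `require_env` and the example values are hypothetical and not part of the lab files; the `env` parameter exists only so the sketch can run without a real `.env` file.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for every name, or raise listing all that are missing."""
    env = os.environ if env is None else env
    # Treat unset and whitespace-only values the same way.
    values = {name: (env.get(name) or "").strip() for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing settings in python/.env: " + ", ".join(missing))
    return values


# An in-memory mapping stands in for the real environment here.
settings = require_env(
    ["PROJECT_CONNECTION", "MODEL_DEPLOYMENT"],
    env={
        "PROJECT_CONNECTION": "example-connection",  # placeholder value
        "MODEL_DEPLOYMENT": "example-deployment",    # placeholder value
    },
)
print(settings["MODEL_DEPLOYMENT"])
```

In the notebook, calling `require_env(["PROJECT_CONNECTION", "MODEL_DEPLOYMENT"])` after `load_dotenv('python/.env')` would read from the real environment instead.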

## Step 3: Initialize Azure OpenAI Client

Connect to the Azure AI project and get a chat completions client for the GPT-4 Vision deployment.

In [None]:
# Initialize the project client
project_client = AIProjectClient.from_connection_string(
    conn_str=project_endpoint,
    credential=DefaultAzureCredential()
)

# Get a chat client
chat_client = project_client.inference.get_chat_completions_client()

print("✓ Azure OpenAI client initialized successfully")

## Step 4: Helper Functions

Create utility functions to encode images and display them in the notebook.

In [None]:
def encode_image_to_base64(image_path):
    """Encode image file to base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def display_image(image_path, width=400):
    """Display an image in the notebook."""
    img = Image.open(image_path)
    display(img.resize((width, int(img.height * width / img.width))))

def get_vision_response(image_path, user_prompt, system_prompt=None):
    """Get a response from GPT-4 Vision for an image and prompt."""
    
    # Encode image to base64 (the data URL below assumes a JPEG file)
    base64_image = encode_image_to_base64(image_path)
    image_url = f"data:image/jpeg;base64,{base64_image}"
    
    # Prepare messages
    messages = []
    
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    
    # Create user message with image and text
    messages.append(UserMessage(
        content=[
            TextContentItem(text=user_prompt),
            ImageContentItem(image_url=ImageUrl(url=image_url))
        ]
    ))
    
    # Get response
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=500
    )
    
    return response.choices[0].message.content

print("✓ Helper functions defined")
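
`get_vision_response` mixes a plain-dict system message with a typed `UserMessage`. It can help to see the whole request written as plain dictionaries. The sketch below assumes the service accepts the OpenAI-style chat-completions message format; `build_vision_messages` is a hypothetical name and the data URL is a placeholder, not real image data.

```python
import json


def build_vision_messages(user_prompt, image_url, system_prompt=None):
    """Build a chat message list in which the user turn carries text plus one image."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        # A multimodal turn is a list of typed parts rather than a single string.
        "content": [
            {"type": "text", "text": user_prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    })
    return messages


example = build_vision_messages(
    "What fruit is this?",
    "data:image/jpeg;base64,<base64 data>",  # placeholder, not a real image
    system_prompt="You are an AI assistant in a grocery store that sells fruit.",
)
print(json.dumps(example, indent=2))
```

Printing the structure this way is mainly a debugging aid: it shows which turn carries the image and how large the payload is before any request is sent.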

## Step 5: Analyze Your First Image

Let's analyze an image of a mango and ask questions about it. This demonstrates the basic image understanding capabilities of GPT-4 Vision.

In [None]:
# Path to the mango image
image_path = "mango.jpeg"

# Display the image
print("🖼️  Analyzing image:")
display_image(image_path)

# Define system message
system_message = "You are an AI assistant in a grocery store that sells fruit. You provide detailed answers to questions about produce."

# Ask a question about the image
question = "What fruit is this?"
print(f"\n❓ Question: {question}")
print("\n🤖 Response:")

response = get_vision_response(image_path, question, system_message)
print(response)
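
Because `get_vision_response` caps replies at `max_tokens=500`, a long answer can stop mid-sentence. A minimal sketch for detecting that follows. It assumes the response follows the chat-completions convention of reporting `finish_reason == "length"` when the token limit is hit; `extract_answer` is a hypothetical helper, and the `SimpleNamespace` object only imitates the shape of a real response.

```python
from types import SimpleNamespace


def extract_answer(response):
    """Return (text, truncated) from a chat-completions style response."""
    choice = response.choices[0]
    # Assumption: "length" marks a reply that was cut off by the token limit.
    truncated = getattr(choice, "finish_reason", None) == "length"
    return choice.message.content, truncated


# Stand-in for a real response so the sketch runs offline.
fake = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="A mango is"),
        )
    ]
)

text, truncated = extract_answer(fake)
if truncated:
    print("Reply was cut off; consider raising max_tokens.")
print(text)
```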

## Step 6: Ask More Detailed Questions

GPT-4 Vision can answer more complex questions about the image, such as ripeness, nutritional information, and usage suggestions.

In [None]:
# Question about ripeness
question = "Is this mango ripe? How can you tell?"
print(f"❓ Question: {question}\n")

response = get_vision_response(image_path, question, system_message)
print(f"🤖 Response:\n{response}\n")
print("-" * 80)

In [None]:
# Question about nutritional value
question = "What are the nutritional benefits of this fruit?"
print(f"❓ Question: {question}\n")

response = get_vision_response(image_path, question, system_message)
print(f"🤖 Response:\n{response}\n")
print("-" * 80)

In [None]:
# Question about usage
question = "What are some ways to use this fruit in cooking or recipes?"
print(f"❓ Question: {question}\n")

response = get_vision_response(image_path, question, system_message)
print(f"🤖 Response:\n{response}\n")

## Step 7: Analyze Another Image (Orange)

Let's try with a different fruit to see how the model adapts.

In [None]:
# Path to the orange image
orange_image = "orange.jpeg"

# Display the image
print("🖼️  Analyzing image:")
display_image(orange_image)

# Ask questions
question = "What type of fruit is this and what are its characteristics?"
print(f"\n❓ Question: {question}\n")

response = get_vision_response(orange_image, question, system_message)
print(f"🤖 Response:\n{response}")

## Step 8: Multi-Turn Conversation (Advanced)

For multi-turn conversations where you want to maintain context across multiple questions about the same image, you can build a conversation history.

In [None]:
def multi_turn_conversation(image_path, questions, system_prompt=None):
    """Have a multi-turn conversation about an image."""
    
    # Encode image once
    base64_image = encode_image_to_base64(image_path)
    image_url = f"data:image/jpeg;base64,{base64_image}"
    
    # Initialize conversation history
    messages = []
    
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    
    # Add first message with image
    messages.append(UserMessage(
        content=[
            TextContentItem(text=questions[0]),
            ImageContentItem(image_url=ImageUrl(url=image_url))
        ]
    ))
    
    # Get first response
    response = chat_client.complete(
        model=model_deployment,
        messages=messages,
        max_tokens=500
    )
    
    print(f"❓ Q1: {questions[0]}")
    print(f"🤖 A1: {response.choices[0].message.content}\n")
    
    # Add assistant's response to history
    messages.append({"role": "assistant", "content": response.choices[0].message.content})
    
    # Continue conversation with follow-up questions
    for i, question in enumerate(questions[1:], start=2):
        messages.append({"role": "user", "content": question})
        
        response = chat_client.complete(
            model=model_deployment,
            messages=messages,
            max_tokens=500
        )
        
        print(f"❓ Q{i}: {question}")
        print(f"🤖 A{i}: {response.choices[0].message.content}\n")
        
        messages.append({"role": "assistant", "content": response.choices[0].message.content})

print("✓ Multi-turn conversation function defined")

In [None]:
# Display the mystery fruit image
mystery_image = "python/mystery-fruit.jpeg"
print("🖼️  Mystery Fruit:")
display_image(mystery_image)
print("\n💬 Starting conversation...\n")

# Define a series of related questions
conversation_questions = [
    "What fruit is in this image?",
    "How can I tell when it's ripe?",
    "What's the best way to store it?",
    "Can you suggest a simple recipe using it?"
]

# Have the conversation
multi_turn_conversation(mystery_image, conversation_questions, system_message)
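
The model keeps context only because every request resends the full message list: the image goes in the first user turn, and each answer is appended before the next question. The sketch below isolates that bookkeeping. `ImageConversation` and `fake_complete` are hypothetical names, the messages use the plain-dict chat format rather than the typed classes, and the model call is replaced by a stand-in so the growth of the history is visible offline.

```python
class ImageConversation:
    """Keeps the running message list for a conversation about one image."""

    def __init__(self, image_url, complete_fn, system_prompt=None):
        self.image_url = image_url
        self.complete_fn = complete_fn  # callable: list of messages -> reply text
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, question):
        first_turn = not any(m["role"] == "user" for m in self.messages)
        if first_turn:
            # Only the first user turn carries the image.
            content = [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": self.image_url}},
            ]
        else:
            content = question
        self.messages.append({"role": "user", "content": content})
        answer = self.complete_fn(self.messages)
        # Store the answer so the next request includes it as context.
        self.messages.append({"role": "assistant", "content": answer})
        return answer


def fake_complete(messages):
    """Stand-in for the model call: reports how many messages it was sent."""
    return f"(reply after seeing {len(messages)} messages)"


chat = ImageConversation("data:image/jpeg;base64,<base64 data>", fake_complete)
for q in ["What fruit is in this image?", "How can I tell when it's ripe?"]:
    print(chat.ask(q))
```

Since the history grows with every turn, long sessions send more tokens per request; trimming older turns is one option, though the first user turn has to stay because it holds the image.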

## Step 9: Interactive Q&A Session

Create an interactive loop where you can type your own questions about an image.

In [None]:
def interactive_qa(image_path):
    """Interactive Q&A session for an image."""
    print("🖼️  Current Image:")
    display_image(image_path)
    print("\n" + "="*80)
    print("💡 Ask questions about the image (type 'quit' to exit)")
    print("="*80 + "\n")
    
    system_prompt = "You are an AI assistant in a grocery store that sells fruit. You provide detailed answers to questions about produce."
    
    while True:
        question = input("\n❓ Your question: ").strip()
        
        if question.lower() == 'quit':
            print("\n👋 Thank you for using the image Q&A assistant!")
            break
        
        if not question:
            print("⚠️  Please enter a question.")
            continue
        
        print("\n🤔 Analyzing...")
        response = get_vision_response(image_path, question, system_prompt)
        print(f"\n🤖 Response:\n{response}")
        print("\n" + "-"*80)

# Uncomment the line below to start an interactive session
# interactive_qa("mango.jpeg")
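
The loop in `interactive_qa` is hard to try without a keyboard and a live model. One possible variation, sketched below, passes the input, output, and model call in as functions so the same control flow can be driven by a scripted list. `run_qa_loop` and its parameters are hypothetical names, not part of the lab.

```python
def run_qa_loop(ask, read_line, write_line):
    """Read questions until 'quit'; return the number of questions answered."""
    answered = 0
    while True:
        question = read_line().strip()
        if question.lower() == "quit":
            break
        if not question:
            write_line("Please enter a question.")
            continue
        write_line(ask(question))
        answered += 1
    return answered


# Scripted input and a fake model call stand in for input() and the service.
scripted = iter(["What fruit is this?", "", "quit"])
outputs = []
count = run_qa_loop(
    ask=lambda q: f"(answer to: {q})",
    read_line=lambda: next(scripted),
    write_line=outputs.append,
)
print(count, outputs)
```

In the notebook, the same function could be called with `read_line=lambda: input("Your question: ")`, `write_line=print`, and an `ask` function that wraps `get_vision_response`.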

## Step 10: Experiment with Your Own Images

Try analyzing your own images! Place an image file in the current directory and update the path below.

In [None]:
# Upload your own image and specify the path
# your_image = "path/to/your/image.jpg"
# display_image(your_image)
# 
# your_question = "Describe what you see in this image."
# response = get_vision_response(your_image, your_question)
# print(f"Response: {response}")
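
`get_vision_response` labels every file as `image/jpeg`, so a PNG or WebP of your own would be sent with the wrong type. A minimal sketch that derives the type from the file extension, and passes web URLs through unchanged, could look like this. `to_image_url` is a hypothetical helper and the example URL is a placeholder.

```python
import base64
import mimetypes
from pathlib import Path


def to_image_url(source):
    """Return a URL for a chat image part: web URLs as-is, local files as data URLs."""
    source = str(source)
    if source.startswith(("http://", "https://")):
        return source
    # Guess the MIME type from the file extension (for example .png -> image/png).
    mime_type, _ = mimetypes.guess_type(source)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {source!r}")
    encoded = base64.b64encode(Path(source).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder URL: web addresses are returned unchanged.
print(to_image_url("https://example.com/fruit.png"))
```

The guess relies on the extension only, so a file with a misleading extension would still be mislabeled; inspecting the file contents would be more robust.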

## Summary

In this lab, you learned how to:

✅ **Connect to Azure OpenAI GPT-4 Vision** - Initialize and authenticate with the service  
✅ **Analyze images** - Ask questions and get detailed responses about image content  
✅ **Build multi-turn conversations** - Maintain context across multiple questions  
✅ **Create interactive applications** - Build practical Q&A systems with vision capabilities  

## Key Takeaways

- GPT-4 Vision can understand and analyze images with natural language
- Images are sent either as base64-encoded data URLs or as URLs the service can reach
- System prompts help guide the model's behavior and responses
- Multi-turn conversations allow for deeper, context-aware interactions

## Next Steps

- Explore the **Advanced Notebook** (08-gen-ai-vision-advanced.ipynb) for more complex vision AI scenarios
- Try different types of images (products, scenes, documents)
- Experiment with different system prompts to customize behavior
- Combine vision with other AI capabilities for richer applications