# Lab 05: Object Detection with Azure Custom Vision

In this lab, you'll learn how to train a custom object detection model using Azure Custom Vision. Unlike image classification, which assigns labels to entire images, object detection identifies and locates specific objects within images using bounding boxes.

## Learning Objectives

By the end of this lab, you will be able to:
- Understand the difference between classification and object detection
- Create and configure an Azure Custom Vision object detection project
- Upload images with bounding box annotations
- Train a custom object detection model
- Detect objects in images and visualize results with bounding boxes
- Evaluate model performance using confidence thresholds

## Prerequisites

- Azure subscription with Custom Vision resource created
- Training and prediction keys from Azure portal
- Images with bounding box annotations (JSON format)

## 1. Setup and Installation

In [None]:
# Install required packages
!pip install azure-cognitiveservices-vision-customvision python-dotenv pillow matplotlib numpy

## 2. Import Required Libraries

In [None]:
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from azure.cognitiveservices.vision.customvision.training.models import (
    ImageFileCreateBatch, ImageFileCreateEntry, Region
)
from msrest.authentication import ApiKeyCredentials
from dotenv import load_dotenv
from PIL import Image, ImageDraw, ImageFont
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
import time
import os
import json
import uuid

print("Libraries imported successfully!")

## 3. Understanding Object Detection

### Classification vs Object Detection:

**Image Classification:**
- Assigns a single label to an entire image
- Output: "This image contains an apple"

**Object Detection:**
- Identifies and locates multiple objects within an image
- Output: "Apple at coordinates (x=0.2, y=0.3, width=0.15, height=0.2) with 95% confidence"

### Bounding Box Coordinates:
- Coordinates are normalized (0-1 range)
- **left**: Distance from left edge (0 = left, 1 = right)
- **top**: Distance from top edge (0 = top, 1 = bottom)
- **width**: Width of box as fraction of image width
- **height**: Height of box as fraction of image height
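
To draw or crop a box, the normalized values have to be scaled by the image size. The sketch below shows that conversion for a box stored as a dictionary with `left`, `top`, `width`, and `height` keys; the helper name `to_pixel_box` and the 800x600 image size are assumptions made for illustration.

```python
def to_pixel_box(box, img_width, img_height):
    """Convert a normalized box (values in the 0-1 range) to pixel edges."""
    left = box["left"] * img_width
    top = box["top"] * img_height
    right = left + box["width"] * img_width
    bottom = top + box["height"] * img_height
    # Round to whole pixels: (left, top, right, bottom)
    return round(left), round(top), round(right), round(bottom)


# The example box from the section above, on a hypothetical 800x600 image
example = {"left": 0.2, "top": 0.3, "width": 0.15, "height": 0.2}
print(to_pixel_box(example, img_width=800, img_height=600))  # (160, 180, 280, 300)
```

The same scaling appears later in this lab wherever bounding boxes are drawn on an image.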

## 4. Configure Azure Custom Vision Credentials

In [None]:
# Load environment variables
env_path = 'python/train-detector/.env'
load_dotenv(env_path)

# Get configuration settings
training_endpoint = os.getenv('TrainingEndpoint')
training_key = os.getenv('TrainingKey')
prediction_endpoint = os.getenv('PredictionEndpoint', training_endpoint)
prediction_key = os.getenv('PredictionKey', training_key)
project_id = os.getenv('ProjectID', None)

print(f"Training Endpoint: {training_endpoint}")
print(f"Prediction Endpoint: {prediction_endpoint}")
print(f"Project ID: {project_id if project_id else 'Will create new project'}")
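
`os.getenv` returns `None` for any setting that is absent from the `.env` file, so a missing key only shows up later, when a client call fails. A minimal sketch of an early check is shown below; the helper name `find_missing_settings` and the sample values are hypothetical, and in the notebook you would pass `os.environ` instead of the sample dictionary.

```python
def find_missing_settings(settings, required=("TrainingEndpoint", "TrainingKey")):
    """Return the names of required settings that are absent or empty."""
    return [name for name in required if not settings.get(name)]


# Hypothetical values for illustration only
sample = {"TrainingEndpoint": "https://example.invalid/", "TrainingKey": ""}

missing = find_missing_settings(sample)
if missing:
    print("Missing settings:", ", ".join(missing))
```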

## 5. Authenticate Clients

In [None]:
# Authenticate training client
training_credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
training_client = CustomVisionTrainingClient(training_endpoint, training_credentials)

# Authenticate prediction client
prediction_credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
prediction_client = CustomVisionPredictionClient(prediction_endpoint, prediction_credentials)

print("Clients authenticated successfully!")

## 6. Create Object Detection Project

Create a Custom Vision project specifically for object detection. Note the `domain_id` and project type.

In [None]:
# Get available domains
domains = training_client.get_domains()

# Find an object detection domain
print("Available Object Detection Domains:")
print("=" * 70)

obj_detection_domain = None
for domain in domains:
    if domain.type == 'ObjectDetection':
        print(f"  - {domain.name} (Exportable: {domain.exportable})")
        if obj_detection_domain is None:
            obj_detection_domain = domain

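# Stop early if a new project is needed but no object detection domain was found
if not project_id and obj_detection_domain is None:
    raise RuntimeError("No object detection domain available for this training resource")

# Create or get existing project
if project_id: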
    print(f"\nConnecting to existing project: {project_id}")
    project = training_client.get_project(project_id)
else:
    project_name = f"Fruit Object Detection {uuid.uuid4().hex[:8]}"
    print(f"\nCreating new object detection project: {project_name}")
    project = training_client.create_project(
        name=project_name,
        description="Detect and locate fruits (apple, banana, orange) in images",
        domain_id=obj_detection_domain.id
    )
    project_id = project.id

print(f"\nProject Details:")
print(f"  Name: {project.name}")
print(f"  ID: {project.id}")
print(f"  Type: Object Detection")
print(f"\n⚠️  Save this Project ID: {project.id}")

## 7. Create Object Tags

Create tags for the types of objects you want to detect.

In [None]:
# Define object types to detect
tag_names = ['apple', 'banana', 'orange']

# Get existing tags or create new ones
existing_tags = training_client.get_tags(project.id)
existing_tag_names = [tag.name for tag in existing_tags]

tags = {}
for tag_name in tag_names:
    if tag_name in existing_tag_names:
        tag = next(t for t in existing_tags if t.name == tag_name)
        print(f"Found existing tag: {tag_name}")
    else:
        tag = training_client.create_tag(project.id, tag_name)
        print(f"Created new tag: {tag_name}")
    tags[tag_name] = tag

print(f"\nTotal tags: {len(tags)}")

## 8. Load and Visualize Tagged Images

Load the bounding box annotations from JSON file and visualize some examples.

In [None]:
# Load tagged images data
tagged_images_json = 'python/train-detector/tagged-images.json'

with open(tagged_images_json, 'r') as f:
    tagged_data = json.load(f)

print(f"Loaded {len(tagged_data['files'])} tagged images")
print(f"\nFirst image example:")
print(f"  Filename: {tagged_data['files'][0]['filename']}")
print(f"  Number of objects: {len(tagged_data['files'][0]['tags'])}")
print(f"\nObject details:")
for obj in tagged_data['files'][0]['tags']:
    print(f"  - {obj['tag']}: left={obj['left']:.3f}, top={obj['top']:.3f}, "
          f"width={obj['width']:.3f}, height={obj['height']:.3f}")

In [None]:
def visualize_bounding_boxes(image_path, bounding_boxes, title="Tagged Image"):
    """
    Visualize an image with bounding boxes.
    
    Args:
        image_path: Path to image file
        bounding_boxes: List of dicts with 'tag', 'left', 'top', 'width', 'height'
        title: Plot title
    """
    # Load image
    img = Image.open(image_path)
    img_width, img_height = img.size
    
    # Create figure and axis
    fig, ax = plt.subplots(1, figsize=(12, 8))
    ax.imshow(img)
    
    # Define colors for different object types
    colors = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange'}
    
    # Draw bounding boxes
    for bbox in bounding_boxes:
        # Convert normalized coordinates to pixel coordinates
        left = bbox['left'] * img_width
        top = bbox['top'] * img_height
        width = bbox['width'] * img_width
        height = bbox['height'] * img_height
        
        # Create rectangle patch
        color = colors.get(bbox['tag'], 'cyan')
        rect = patches.Rectangle(
            (left, top), width, height,
            linewidth=3, edgecolor=color, facecolor='none'
        )
        ax.add_patch(rect)
        
        # Add label
        ax.text(
            left, top - 5,
            bbox['tag'],
            color='white',
            fontsize=12,
            fontweight='bold',
            bbox=dict(facecolor=color, alpha=0.7, edgecolor='none', pad=2)
        )
    
    ax.set_title(title, fontsize=14, fontweight='bold')
    ax.axis('off')
    plt.tight_layout()
    plt.show()

# Visualize first few training images with bounding boxes
print("Sample Training Images with Bounding Boxes:\n")

images_folder = 'python/train-detector/images'
for i in range(min(3, len(tagged_data['files']))):
    image_info = tagged_data['files'][i]
    image_path = os.path.join(images_folder, image_info['filename'])
    
    if os.path.exists(image_path):
        visualize_bounding_boxes(
            image_path,
            image_info['tags'],
            f"Training Image {i+1}: {image_info['filename']}"
        )

## 9. Upload Images with Bounding Box Annotations

Upload training images along with their bounding box annotations. Each image can contain multiple objects.

**Note**: Azure Custom Vision requires at least 15 tagged images per tag to train an object detection model; 50 or more per tag are recommended.
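
It can help to check the annotation file against these minimums before uploading anything. The sketch below counts, for each tag, how many images contain it and how many boxes it has in total. It assumes the dictionary layout used by `tagged-images.json` in this lab (`files`, each with `filename` and a `tags` list); the helper name `count_tag_usage` and the two-image sample are hypothetical.

```python
from collections import Counter


def count_tag_usage(tagged_data):
    """Return (images_per_tag, instances_per_tag) for an annotation dictionary."""
    images_per_tag = Counter()
    instances_per_tag = Counter()
    for image_info in tagged_data["files"]:
        tag_names = [box["tag"] for box in image_info["tags"]]
        instances_per_tag.update(tag_names)    # every box counts
        images_per_tag.update(set(tag_names))  # each image counts once per tag
    return images_per_tag, instances_per_tag


# Hypothetical annotation set with two images (box coordinates omitted)
sample = {"files": [
    {"filename": "a.jpg", "tags": [{"tag": "apple"}, {"tag": "apple"}, {"tag": "banana"}]},
    {"filename": "b.jpg", "tags": [{"tag": "apple"}]},
]}

images, instances = count_tag_usage(sample)
for name in sorted(images):
    print(f"{name}: {images[name]} images, {instances[name]} boxes")
```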

In [None]:
def upload_tagged_images(project_id, tagged_data, images_folder, tags_dict, training_client):
    """
    Upload images with bounding box annotations.
    
    Args:
        project_id: Custom Vision project ID
        tagged_data: Dictionary containing image filenames and bounding box data
        images_folder: Path to folder containing images
        tags_dict: Dictionary mapping tag names to tag objects
        training_client: Authenticated training client
    """
    print("Uploading images with bounding box annotations...\n")
    
    tagged_images_with_regions = []
    
    for image_info in tagged_data['files']:
        filename = image_info['filename']
        image_path = os.path.join(images_folder, filename)
        
        if not os.path.exists(image_path):
            print(f"⚠️  Image not found: {filename}")
            continue
        
        # Create regions (bounding boxes) for each object in the image
        regions = []
        for bbox in image_info['tags']:
            tag_name = bbox['tag']
            
            if tag_name not in tags_dict:
                print(f"⚠️  Unknown tag '{tag_name}' in {filename}")
                continue
            
            tag_id = tags_dict[tag_name].id
            
            # Create region with normalized coordinates
            region = Region(
                tag_id=tag_id,
                left=bbox['left'],
                top=bbox['top'],
                width=bbox['width'],
                height=bbox['height']
            )
            regions.append(region)
        
        # Read image data
        with open(image_path, 'rb') as image_file:
            image_data = image_file.read()
        
        # Add to batch
        tagged_images_with_regions.append(
            ImageFileCreateEntry(
                name=filename,
                contents=image_data,
                regions=regions
            )
        )
    
    # Upload in batches (max 64 images per batch)
    batch_size = 64
    total_uploaded = 0
    
    for i in range(0, len(tagged_images_with_regions), batch_size):
        batch = tagged_images_with_regions[i:i + batch_size]
        
        print(f"Uploading batch {i // batch_size + 1} ({len(batch)} images)...")
        
        upload_result = training_client.create_images_from_files(
            project_id,
            ImageFileCreateBatch(images=batch)
        )
        
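        # Count images individually so a partly failed batch still reports its successes
        total_uploaded += sum(1 for image in upload_result.images if image.status == "OK")

        if upload_result.is_batch_successful:
            print(f"  ✓ Batch uploaded successfully")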
        else:
            print(f"  ⚠️  Some images failed to upload")
            for image in upload_result.images:
                if image.status != "OK":
                    print(f"    - {image.source_url}: {image.status}")
    
    print(f"\n✓ Total images uploaded: {total_uploaded}")
    return total_uploaded

# Upload the tagged images
uploaded_count = upload_tagged_images(
    project.id,
    tagged_data,
    images_folder,
    tags,
    training_client
)

## 10. Verify Uploaded Images

Check image counts and ensure sufficient training data.

In [None]:
# Get detailed statistics
print("Training Data Summary:")
print("=" * 70)

# Count images
total_images = training_client.get_tagged_image_count(project.id)
print(f"\nTotal tagged images: {total_images}")

# Count tagged images per tag (image_count is the number of images carrying the tag)
print(f"\nTagged Images by Object Type:")
print("-" * 70)

tag_image_counts = {}
for tag_name, tag in tags.items():
    tag_info = training_client.get_tag(project.id, tag.id)
    tag_image_counts[tag_name] = tag_info.image_count
    print(f"{tag_name:15} : {tag_info.image_count} images")

# Check if we have enough data (object detection needs at least 15 tagged images per tag)
print(f"\n" + "=" * 70)
min_images_per_tag = 15
short_tags = [name for name, count in tag_image_counts.items() if count < min_images_per_tag]
if not short_tags:
    print(f"✓ Every tag has at least {min_images_per_tag} tagged images")
else:
    print(f"⚠️  Warning: Fewer than {min_images_per_tag} tagged images for: {', '.join(short_tags)}")

## 11. Train the Object Detection Model

Train the model to detect and locate objects. Training may take longer than classification due to the complexity of learning object locations.
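
Because training time is hard to predict, it is worth bounding how long a polling loop may run. The sketch below is a generic polling helper with a timeout, not part of the Custom Vision SDK. The name `wait_for_status`, its parameters, and the simulated status sequence are assumptions for illustration; in the notebook, `get_status` would wrap a call that fetches the iteration and returns its `status`.

```python
import time


def wait_for_status(get_status, done=("Completed", "Failed"), interval=10, timeout=1800):
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Simulated status sequence standing in for repeated status lookups
statuses = iter(["Training", "Training", "Completed"])
print(wait_for_status(lambda: next(statuses), interval=0))  # Completed
```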

In [None]:
def train_detector(project_id, training_client):
    """
    Train the object detection model.
    
    Args:
        project_id: Custom Vision project ID
        training_client: Authenticated training client
    
    Returns:
        Completed iteration object
    """
    print("Starting object detection model training...")
    print("This may take 5-10 minutes or longer. Please wait...\n")
    
    # Start training
    iteration = training_client.train_project(project_id)
    
    # Monitor training progress
    while iteration.status not in ["Completed", "Failed"]:
        iteration = training_client.get_iteration(project_id, iteration.id)
        print(f"Training status: {iteration.status}")
        
        if iteration.status == "Failed":
            print("❌ Training failed!")
            return None
        
        time.sleep(10)  # Check every 10 seconds
    
    print(f"\n✓ Model trained successfully!")
    print(f"  Iteration Name: {iteration.name}")
    print(f"  Iteration ID: {iteration.id}")
    
    return iteration

# Train the model
iteration = train_detector(project.id, training_client)

## 12. Evaluate Model Performance

Check the model's performance metrics. For object detection, we look at:
- **Precision**: Of all detected objects, how many were correct?
- **Recall**: Of all actual objects, how many did we detect?
- **mAP (mean Average Precision)**: Overall detection quality
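
Precision and recall follow directly from counts of correct detections (true positives), spurious detections (false positives), and missed objects (false negatives). The sketch below works through the arithmetic with hypothetical counts; the function name `precision_recall` is illustrative, and the service computes these values for you.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from detection counts."""
    detected = true_positives + false_positives  # everything the model reported
    actual = true_positives + false_negatives    # everything that was really there
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 spurious ones, 4 missed objects
precision, recall = precision_recall(8, 2, 4)
print(f"Precision: {precision:.2%}, Recall: {recall:.2%}")
# Precision: 80.00%, Recall: 66.67%
```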

In [None]:
if iteration:
    # Get performance metrics
    performance = training_client.get_iteration_performance(project.id, iteration.id)
    
    print("Object Detection Model Performance:")
    print("=" * 70)
    print(f"\nOverall Metrics:")
    print(f"  Precision:          {performance.precision:.2%}")
    print(f"  Recall:             {performance.recall:.2%}")
    print(f"  Average Precision:  {performance.average_precision:.2%} (mAP)")
    
    print(f"\nPer-Object Performance:")
    print("-" * 70)
    print(f"{'Object Type':<15} {'Precision':<12} {'Recall':<12} {'AP':<12}")
    print("-" * 70)
    
    for tag_perf in performance.per_tag_performance:
        print(f"{tag_perf.name:<15} {tag_perf.precision:<12.2%} "
              f"{tag_perf.recall:<12.2%} {tag_perf.average_precision:<12.2%}")
    
    # Visualize performance
    tag_names_perf = [tp.name for tp in performance.per_tag_performance]
    precisions = [tp.precision for tp in performance.per_tag_performance]
    recalls = [tp.recall for tp in performance.per_tag_performance]
    aps = [tp.average_precision for tp in performance.per_tag_performance]
    
    fig, ax = plt.subplots(figsize=(12, 6))
    x = np.arange(len(tag_names_perf))
    width = 0.25
    
    ax.bar(x - width, precisions, width, label='Precision', color='skyblue')
    ax.bar(x, recalls, width, label='Recall', color='lightcoral')
    ax.bar(x + width, aps, width, label='Average Precision', color='lightgreen')
    
    ax.set_xlabel('Object Types')
    ax.set_ylabel('Score')
    ax.set_title('Object Detection Performance by Object Type')
    ax.set_xticks(x)
    ax.set_xticklabels(tag_names_perf)
    ax.legend()
    ax.set_ylim([0, 1.1])
    ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## 13. Publish the Model

In [None]:
# Publish the trained model
publish_name = "FruitObjectDetector"
prediction_resource_id = os.getenv('PredictionResourceId', None)

if iteration and not prediction_resource_id:
    # publish_iteration requires the Azure resource ID of the prediction resource
    print("⚠️  Set PredictionResourceId in the .env file before publishing")
elif iteration:
    try:
        training_client.publish_iteration(
            project.id,
            iteration.id,
            publish_name,
            prediction_resource_id
        )
        
        print(f"✓ Model published successfully as '{publish_name}'")
        print(f"\n⚠️  Save this publish name for detection: {publish_name}")
    except Exception as e:
        print(f"Publishing failed: {e}")

## 14. Detect Objects in Test Images

Use the trained model to detect objects in new images. The model will return:
- Object type (tag)
- Bounding box coordinates
- Confidence score

In [None]:
def detect_objects(image_path, project_id, publish_name, prediction_client, confidence_threshold=0.5):
    """
    Detect objects in an image and visualize results.
    
    Args:
        image_path: Path to image file
        project_id: Custom Vision project ID
        publish_name: Published model name
        prediction_client: Authenticated prediction client
        confidence_threshold: Minimum confidence to display
    
    Returns:
        Detection results
    """
    # Make prediction
    with open(image_path, 'rb') as image_file:
        results = prediction_client.detect_image(
            project_id,
            publish_name,
            image_file.read()
        )
    
    # Load image
    img = Image.open(image_path)
    img_width, img_height = img.size
    
    # Create figure
    fig, ax = plt.subplots(1, figsize=(12, 8))
    ax.imshow(img)
    
    # Define colors
    colors = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange'}
    
    # Print and visualize detections
    print(f"\nDetections for {os.path.basename(image_path)}:")
    print("-" * 70)
    
    detected_objects = []
    
    for prediction in results.predictions:
        if prediction.probability >= confidence_threshold:
            # Store detection info
            detected_objects.append({
                'tag': prediction.tag_name,
                'confidence': prediction.probability,
                'bbox': prediction.bounding_box
            })
            
            # Print detection
            print(f"{prediction.tag_name:15} : {prediction.probability:.2%} confidence")
            print(f"{'':15}   Location: left={prediction.bounding_box.left:.3f}, "
                  f"top={prediction.bounding_box.top:.3f}, "
                  f"width={prediction.bounding_box.width:.3f}, "
                  f"height={prediction.bounding_box.height:.3f}")
            
            # Convert normalized coordinates to pixels
            left = prediction.bounding_box.left * img_width
            top = prediction.bounding_box.top * img_height
            width = prediction.bounding_box.width * img_width
            height = prediction.bounding_box.height * img_height
            
            # Draw bounding box
            color = colors.get(prediction.tag_name, 'cyan')
            rect = patches.Rectangle(
                (left, top), width, height,
                linewidth=3, edgecolor=color, facecolor='none'
            )
            ax.add_patch(rect)
            
            # Add label with confidence
            label = f"{prediction.tag_name} ({prediction.probability:.0%})"
            ax.text(
                left, top - 5,
                label,
                color='white',
                fontsize=11,
                fontweight='bold',
                bbox=dict(facecolor=color, alpha=0.8, edgecolor='none', pad=3)
            )
    
    if not detected_objects:
        print(f"No objects detected above {confidence_threshold:.0%} confidence threshold")
    
    ax.set_title(f"Object Detection: {os.path.basename(image_path)}", 
                fontsize=14, fontweight='bold')
    ax.axis('off')
    plt.tight_layout()
    plt.show()
    
    return results

# Test on sample images
test_image = 'python/test-detector/produce.jpg'

if os.path.exists(test_image):
    print(f"Testing object detection on: {test_image}")
    print("=" * 70)
    results = detect_objects(
        test_image,
        project.id,
        publish_name,
        prediction_client,
        confidence_threshold=0.5
    )
else:
    print(f"Test image not found: {test_image}")

## 15. Experiment with Confidence Thresholds

The confidence threshold determines which detections to accept. Let's see how different thresholds affect results.

In [None]:
def compare_thresholds(image_path, project_id, publish_name, prediction_client, thresholds=(0.3, 0.5, 0.7, 0.9)):
    """
    Compare detection results at different confidence thresholds.
    """
    # Get predictions once
    with open(image_path, 'rb') as image_file:
        results = prediction_client.detect_image(
            project_id,
            publish_name,
            image_file.read()
        )
    
    print(f"Confidence Threshold Analysis for {os.path.basename(image_path)}")
    print("=" * 70)
    print(f"{'Threshold':<12} {'Detections':<12} {'Objects Found'}")
    print("-" * 70)
    
    for threshold in thresholds:
        detections = [p for p in results.predictions if p.probability >= threshold]
        objects_list = ', '.join([f"{p.tag_name}" for p in detections[:3]])
        if len(detections) > 3:
            objects_list += "..."
        
        print(f"{threshold:<12.1f} {len(detections):<12} {objects_list}")
    
    # List all detections with confidence scores, highest first
    print(f"\nAll Detected Objects (any confidence):")
    print("-" * 70)
    
    for pred in sorted(results.predictions, key=lambda p: p.probability, reverse=True):
        print(f"{pred.tag_name:15} : {pred.probability:.2%}")

if os.path.exists(test_image):
    compare_thresholds(test_image, project.id, publish_name, prediction_client)

## 16. Save Annotated Images

Save detection results with bounding boxes drawn on images for documentation or review.

In [None]:
def save_detection_image(image_path, detections, output_path, confidence_threshold=0.5):
    """
    Save an image with bounding boxes drawn.
    
    Args:
        image_path: Input image path
        detections: Detection results from prediction
        output_path: Where to save annotated image
        confidence_threshold: Minimum confidence to draw
    """
    # Load image (convert to RGB so it can be saved as JPEG even if the source has transparency)
    img = Image.open(image_path).convert('RGB')
    img_width, img_height = img.size
    draw = ImageDraw.Draw(img)
    
    # Define colors
    colors = {'apple': 'red', 'banana': 'yellow', 'orange': 'orange'}
    
    # Draw each detection
    for prediction in detections.predictions:
        if prediction.probability >= confidence_threshold:
            # Convert coordinates
            left = prediction.bounding_box.left * img_width
            top = prediction.bounding_box.top * img_height
            right = left + (prediction.bounding_box.width * img_width)
            bottom = top + (prediction.bounding_box.height * img_height)
            
            # Draw box
            color = colors.get(prediction.tag_name, 'cyan')
            line_width = max(3, int(img_width / 200))
            
            for i in range(line_width):
                draw.rectangle(
                    [(left - i, top - i), (right + i, bottom + i)],
                    outline=color
                )
            
            # Draw label
            label = f"{prediction.tag_name}: {prediction.probability:.0%}"
            draw.text((left, max(0, top - 20)), label, fill=color)  # keep the label inside the image
    
    # Save
    img.save(output_path)
    print(f"✓ Saved annotated image to: {output_path}")

# Save detection results
if os.path.exists(test_image):
    with open(test_image, 'rb') as f:
        results = prediction_client.detect_image(project.id, publish_name, f.read())
    
    output_image = 'detection_output.jpg'
    save_detection_image(test_image, results, output_image, confidence_threshold=0.5)

## 17. Best Practices for Object Detection

### Training Data Quality:
- **Variety**: Include objects at different scales, angles, and lighting
- **Accuracy**: Ensure bounding boxes tightly fit objects
- **Quantity**: Minimum 15 images per tag, 50+ recommended for production
- **Balance**: Similar number of instances for each object type
- **Occlusion**: Include partially hidden objects for robustness

### Bounding Box Guidelines:
- Fit each box closely around the object: include all of it, with as little background as possible
- Consistent tagging approach across images
- Include small objects if they're important
- Avoid overlapping boxes when possible
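
Some annotation problems can be caught mechanically before upload. The sketch below checks that a normalized box stays inside the image and has a non-zero area. It assumes boxes are dictionaries with the `left`, `top`, `width`, and `height` keys used in this lab; the helper name `box_problems` and the small tolerance are illustrative choices.

```python
def box_problems(box, tolerance=1e-9):
    """Return a list of problems found in a normalized left/top/width/height box."""
    problems = []
    for key in ("left", "top", "width", "height"):
        if not 0.0 <= box[key] <= 1.0:
            problems.append(f"{key}={box[key]} is outside the 0-1 range")
    if box["width"] <= 0 or box["height"] <= 0:
        problems.append("box has no area")
    # The tolerance absorbs floating-point rounding in the sums
    if box["left"] + box["width"] > 1.0 + tolerance or box["top"] + box["height"] > 1.0 + tolerance:
        problems.append("box extends past the image edge")
    return problems


# Hypothetical box whose right edge falls outside the image
print(box_problems({"left": 0.8, "top": 0.1, "width": 0.3, "height": 0.2}))
```

An empty list means the box passed all three checks.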

### Model Optimization:
- **Threshold tuning**: Balance false positives vs false negatives
- **IoU threshold**: Adjust for overlapping object handling
- **Retrain regularly**: Add misdetected images to training set
- **Domain selection**: Use appropriate domain for your scenario
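
The IoU (Intersection over Union) mentioned above is the area shared by two boxes divided by the area they cover together. A minimal sketch, assuming boxes are dictionaries with the normalized `left`, `top`, `width`, and `height` keys used in this lab; the `iou` helper is illustrative and not part of the Custom Vision SDK.

```python
def iou(a, b):
    """Intersection over Union of two left/top/width/height boxes."""
    # Edges of the overlapping rectangle
    x1 = max(a["left"], b["left"])
    y1 = max(a["top"], b["top"])
    x2 = min(a["left"] + a["width"], b["left"] + b["width"])
    y2 = min(a["top"] + a["height"], b["top"] + b["height"])
    # Clamp at zero so boxes that do not overlap give no intersection
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a["width"] * a["height"] + b["width"] * b["height"] - intersection
    return intersection / union if union > 0 else 0.0


box_a = {"left": 0.0, "top": 0.0, "width": 0.5, "height": 0.5}
box_b = {"left": 0.25, "top": 0.0, "width": 0.5, "height": 0.5}
print(f"IoU: {iou(box_a, box_b):.3f}")  # IoU: 0.333
```

An IoU of 1 means the boxes coincide, and 0 means they do not overlap at all.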

### Production Deployment:
- Implement confidence threshold filtering
- Handle cases with no detections gracefully
- Monitor detection frequencies and confidence distributions
- Log low-confidence detections for review
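
One way to combine these points is to sort predictions into an accepted group and a review group. The sketch below assumes predictions have been reduced to `(tag_name, probability)` pairs; the function name `triage_detections` and both cut-off values are hypothetical and should be tuned for your own data.

```python
def triage_detections(detections, accept_at=0.5, review_at=0.3):
    """Split (tag, probability) pairs into accepted and needs-review lists."""
    accepted = [d for d in detections if d[1] >= accept_at]
    review = [d for d in detections if review_at <= d[1] < accept_at]
    return accepted, review


# Hypothetical predictions as (tag_name, probability) pairs
sample = [("apple", 0.92), ("banana", 0.41), ("orange", 0.12)]

accepted, review = triage_detections(sample)
if not accepted:
    print("No confident detections")  # handle the empty case explicitly
print("Accepted:", accepted)
print("For review:", review)
```

Predictions below the review cut-off are dropped, which keeps the review queue focused on borderline cases.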

## 18. Summary

### What You've Learned:
✓ Understood the difference between classification and object detection  
✓ Created an object detection project  
✓ Uploaded images with bounding box annotations  
✓ Trained an object detection model  
✓ Evaluated model performance metrics  
✓ Detected objects in test images with confidence scores  
✓ Visualized detections with bounding boxes  
✓ Experimented with confidence thresholds  

### Key Concepts:
- **Bounding boxes**: Normalized coordinates (0-1) define object locations
- **Confidence threshold**: Filters detections by minimum probability
- **mAP**: Mean average precision measures overall detection quality
- **Precision vs Recall**: Tradeoff between false positives and false negatives

### Next Steps:
- Explore the **Advanced Object Detection** notebook for:
  - IoU (Intersection over Union) metrics
  - Non-maximum suppression
  - Multi-object detection strategies
  - Real-time detection optimization

### Important Information:
```text
Project ID: [YOUR_PROJECT_ID]
Published Model: [YOUR_MODEL_NAME]
```

## Cleanup (Optional)

In [None]:
# Uncomment to delete the project
# training_client.delete_project(project.id)
# print("Project deleted")

## Additional Resources

- [Azure Custom Vision Object Detection Documentation](https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/get-started-build-detector)
- [Object Detection Best Practices](https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/suggested-tags)
- [Understanding Object Detection Metrics](https://docs.microsoft.com/azure/cognitive-services/custom-vision-service/probability-threshold)