# 06: Tutorial T13 - Object Detection with Pre-trained YOLO

## Learning Objectives
By the end of this tutorial, you will be able to:
- Use a pre-trained YOLOv8 model for object detection
- Detect objects in images with bounding boxes
- Visualize detection results with confidence scores
- Understand how confidence thresholds affect detections
- Apply object detection to real-world images

**Estimated Time:** 15-20 minutes  
**Note:** This is a PREVIEW of Week 14! You'll get hands-on experience with YOLO before learning the theory.

---

## What is YOLO?

### YOLO = You Only Look Once

**Key Features:**
- **Real-time object detection** - Small models can exceed 30 frames per second on a modern GPU
- **Single CNN forward pass** - Predicts boxes and classes in one pass, unlike two-stage detectors (the R-CNN family) that first propose regions
- **80 object classes** - Trained on the COCO dataset (Common Objects in Context)
- **Strong accuracy** - Good balance of speed and accuracy

### YOLO Evolution
- **YOLOv1 (2016):** Original paper "You Only Look Once"
- **YOLOv3 (2018):** Multi-scale predictions
- **YOLOv5 (2020):** Ultralytics' PyTorch implementation
- **YOLOv8 (2023):** Ultralytics release used in this tutorial (newer YOLO versions have appeared since)

### Real-World Applications
- Autonomous driving (detecting pedestrians, vehicles)
- Surveillance systems (security monitoring)
- Retail analytics (customer tracking, product detection)
- Sports analysis (player tracking)
- Medical imaging (tumor detection)

---

## Setup and Installation

In [None]:
# Install Ultralytics YOLOv8 (official implementation)
!pip install -q ultralytics

# Import required libraries
from ultralytics import YOLO
import cv2
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter
import requests
from io import BytesIO

print("Libraries imported successfully!")
print("Ready to start object detection!")

## Load Pre-trained Model

In [None]:
# Load YOLOv8 nano model (smallest, fastest)
# Model options (size vs accuracy tradeoff):
#   - yolov8n.pt (nano)   - Fastest, smallest
#   - yolov8s.pt (small)  - Good balance
#   - yolov8m.pt (medium) - More accurate
#   - yolov8l.pt (large)  - Very accurate
#   - yolov8x.pt (xlarge) - Most accurate, slowest

model = YOLO('yolov8n.pt')  # Will auto-download (~6MB) on first run

print("\nModel loaded successfully!")
print(f"Total classes: {len(model.names)}")
print(f"\nSample classes: {list(model.names.values())[:10]}")
print("\nModel is ready for detection!")

## Test Image 1 - Simple Scene

Let's start with a simple image and detect objects!

In [None]:
# Download a sample image from the web
# Using a street scene with people and vehicles
url = "https://ultralytics.com/images/bus.jpg"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop early if the download failed
img = Image.open(BytesIO(response.content))
img.save("test_image1.jpg")

# Display original image
plt.figure(figsize=(12, 8))
plt.imshow(img)
plt.axis('off')
plt.title("Original Image")
plt.show()

# Run YOLO detection
results = model("test_image1.jpg")

# Display results with bounding boxes
results[0].show()  # Auto-displays with boxes + labels

# Print detection details
print("\nDetection Results:")
print("-" * 50)
for i, box in enumerate(results[0].boxes):
    class_id = int(box.cls)
    class_name = model.names[class_id]
    confidence = float(box.conf)
    print(f"{i+1}. Class: {class_name:15s} | Confidence: {confidence:.2%}")

## Understanding the Output

### Each Detection Contains:

1. **Class Label** (e.g., "person", "car", "dog")
   - One of 80 COCO classes
   - Predicted by the neural network

2. **Bounding Box Coordinates** `[x1, y1, x2, y2]`
   - `(x1, y1)` = Top-left corner
   - `(x2, y2)` = Bottom-right corner
   - Coordinates in pixels

3. **Confidence Score** `[0.0 - 1.0]`
   - How certain the model is about this detection
   - 0.0 = 0% confident, 1.0 = 100% confident
   - Default threshold: 0.25 (25%)
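
To make the `[x1, y1, x2, y2]` corner format more concrete, here is a minimal sketch that derives width, height, area, and center from the two corners. The `Detection` dataclass and its field names are hypothetical, chosen only for this illustration; they are not part of the Ultralytics API, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """Hypothetical container mirroring the three parts of a detection."""
    class_name: str     # e.g. "person"
    confidence: float   # score in [0.0, 1.0]
    x1: float           # top-left corner, in pixels
    y1: float
    x2: float           # bottom-right corner, in pixels
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


# Made-up example values, not real model output
det = Detection("person", 0.87, x1=50, y1=100, x2=250, y2=500)
print(det.width, det.height, det.area, det.center)
```

Because image coordinates start at the top-left corner, `y2` is larger than `y1` even though it is lower on the screen.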

### Visualizations Show:
- **Colored boxes** - Different color for each object class
- **Labels** - Class name + confidence percentage
- **Filtering** - Only shows detections above confidence threshold

### How It Works:
1. Image is resized so its longer side is 640 pixels by default (aspect ratio preserved, with padding as needed)
2. Single forward pass through the CNN
3. Model outputs candidate boxes with class scores
4. Candidates below the confidence threshold are discarded
5. Non-maximum suppression (NMS) removes overlapping duplicate boxes
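
The confidence filtering and non-maximum suppression steps above can be sketched in a few lines of NumPy. This is a simplified, class-agnostic illustration, not the Ultralytics implementation: the function names `iou` and `greedy_nms` are hypothetical, the `iou_thres=0.5` value is chosen only for the example, and the candidate boxes are made up.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all in [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Negative widths/heights mean no overlap, so clip them to zero
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)


def greedy_nms(boxes, scores, conf_thres=0.25, iou_thres=0.5):
    """Return indices (into the original arrays) of the boxes that survive."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Drop low-confidence candidates, then sort the rest by score (descending)
    candidates = np.flatnonzero(scores >= conf_thres)
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Suppress every remaining box that overlaps the best one too much
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_thres]
    return keep


# Made-up candidates: two near-duplicates, one separate box, one weak box
boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300], [0, 0, 50, 50]]
scores = [0.90, 0.80, 0.70, 0.10]
print(greedy_nms(boxes, scores))  # indices of the surviving boxes
```

In this example the second box overlaps the first almost entirely and is suppressed, while the last box never reaches NMS because its score is below the confidence threshold.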

---

## Test Image 2 - Complex Scene

Now let's try a more challenging image with multiple objects!

In [None]:
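# Download a second sample image (a close-up of two people)
url2 = "https://ultralytics.com/images/zidane.jpg"
response2 = requests.get(url2, timeout=30)
response2.raise_for_status()  # Stop early if the download failed
img2 = Image.open(BytesIO(response2.content))
img2.save("test_image2.jpg")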

# Run detection with higher confidence threshold
results2 = model("test_image2.jpg", conf=0.5)  # Only show detections ≥ 50% confidence

# Show results
results2[0].show()

# Count objects by class
classes = [model.names[int(box.cls)] for box in results2[0].boxes]
counts = Counter(classes)

print("\nObject Count Summary:")
print("-" * 50)
for obj_class, count in counts.items():
    print(f"{obj_class:15s}: {count}")
print("-" * 50)
print(f"Total objects detected: {len(results2[0].boxes)}")

## Custom Visualization Function

Let's create our own visualization with better control!

In [None]:
def visualize_detections(results, image_path):
    """
    Custom visualization with better formatting
    
    Args:
        results: YOLO detection results
        image_path: Path to original image
    """
    # Load original image
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    # Define colors for different classes (RGB format, because img was converted to RGB above)
    colors = [
        (0, 255, 0),    # Green
        (255, 0, 0),    # Red
        (0, 0, 255),    # Blue
        (0, 255, 255),  # Cyan
        (255, 0, 255),  # Magenta
        (255, 255, 0),  # Yellow
    ]
    
    # Draw boxes and labels
    for i, box in enumerate(results[0].boxes):
        # Extract box coordinates
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        
        # Get class and confidence
        cls = int(box.cls)
        conf = float(box.conf)
        label = f"{model.names[cls]}: {conf:.2%}"
        
        # Choose color based on class
        color = colors[cls % len(colors)]
        
        # Draw bounding box
        cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
        
        # Draw label background
        (text_width, text_height), _ = cv2.getTextSize(
            label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2
        )
        cv2.rectangle(
            img, (x1, y1 - text_height - 10), 
            (x1 + text_width, y1), color, -1
        )
        
        # Draw label text
        cv2.putText(
            img, label, (x1, y1 - 5), 
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2
        )
    
    # Display with matplotlib
    plt.figure(figsize=(14, 10))
    plt.imshow(img)
    plt.axis('off')
    plt.title(f"Detected {len(results[0].boxes)} objects", fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.show()

# Test custom visualization
print("Testing custom visualization...\n")
visualize_detections(results2, "test_image2.jpg")
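
The palette in `visualize_detections()` holds six colors, so `cls % len(colors)` gives classes 0 and 6 the same color. One possible alternative, sketched here with a hypothetical helper named `class_color`, spreads hues evenly around the color wheel using the standard-library `colorsys` module. It returns RGB tuples, which suits an image already converted with `cv2.COLOR_BGR2RGB`.

```python
import colorsys


def class_color(class_id, num_classes=80):
    """Map a class id to a distinct (R, G, B) tuple with values in 0-255."""
    hue = (class_id % num_classes) / num_classes   # evenly spaced hues in [0, 1)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)   # full saturation and brightness
    return (int(r * 255), int(g * 255), int(b * 255))


print(class_color(0))    # hue 0 is pure red
print(class_color(40))   # halfway around the color wheel
```

Inside the drawing loop, `color = class_color(cls)` could then replace the palette lookup.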

## Batch Processing

Detect objects in multiple images at once!

In [None]:
# Download multiple test images
image_urls = [
    "https://ultralytics.com/images/bus.jpg",
    "https://ultralytics.com/images/zidane.jpg",
]

image_paths = []
for i, url in enumerate(image_urls):
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Stop early if the download failed
    img = Image.open(BytesIO(response.content))
    path = f"batch_image_{i+1}.jpg"
    img.save(path)
    image_paths.append(path)

# Batch detection
print("Running batch detection...\n")
print("=" * 60)

for i, img_path in enumerate(image_paths):
    results = model(img_path, verbose=False)
    num_objects = len(results[0].boxes)
    
    print(f"\nImage {i+1}: {img_path}")
    print(f"Objects detected: {num_objects}")
    
    # Show class distribution
    classes = [model.names[int(box.cls)] for box in results[0].boxes]
    counts = Counter(classes)
    print(f"Classes found: {dict(counts)}")
    
    # Display results
    results[0].show()

print("\n" + "=" * 60)
print("Batch processing complete!")

## Webcam Detection (Optional)

Real-time object detection from webcam!  
**Note:** This requires a local Python environment with a webcam and a display. `cv2.VideoCapture(0)` and `cv2.imshow` do not work in Google Colab.

In [None]:
# Real-time detection from webcam
# The code is wrapped in triple quotes so it does not run by default.
# Remove the triple quotes to run it (local environment only).

"""
import cv2

# Open webcam
cap = cv2.VideoCapture(0)  # 0 = default webcam

print("Starting webcam detection...")
print("Press 'q' to quit")

while True:
    # Read frame from webcam
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame")
        break
    
    # Run YOLO detection
    results = model(frame, verbose=False)
    
    # Draw results on frame
    annotated_frame = results[0].plot()
    
    # Display frame
    cv2.imshow('YOLO Real-time Detection', annotated_frame)
    
    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup
cap.release()
cv2.destroyAllWindows()
print("Webcam detection stopped")
"""

print("Webcam detection code provided above.")
print("Remove the triple quotes and run it in a local environment.")
print("\nFor Colab users: Upload images or use URLs instead.")
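
Whether detection counts as "real-time" depends on your hardware and the model size, so it can help to measure it. The sketch below times any per-frame function and reports frames per second. The helper `measure_fps` and the stand-in `fake_detector` are hypothetical names used only for this example, so that it runs without a camera or a model.

```python
import time


def measure_fps(process_frame, frames, warmup=2):
    """Return the frames per second achieved by `process_frame` over `frames`."""
    frames = list(frames)
    # Warm-up calls are excluded from timing (first calls are often slower)
    for frame in frames[:warmup]:
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


# Stand-in workload: pretend each frame takes about 10 ms to process
def fake_detector(frame):
    time.sleep(0.01)
    return frame


fps = measure_fps(fake_detector, frames=range(20))
print(f"{fps:.1f} FPS")
```

With the tutorial's model loaded, you could pass `lambda f: model(f, verbose=False)` and a list of captured frames instead of the stand-in.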

## Available Classes

Let's see all 80 object classes that YOLOv8 can detect!

In [None]:
# Display all 80 COCO classes
print("YOLOv8 Pre-trained Model - 80 COCO Classes")
print("=" * 60)

# Organize classes by category (each class id appears in exactly one category)
categories = {
    "People & Animals": [0, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
    "Vehicles": [1, 2, 3, 4, 5, 6, 7, 8],
    "Outdoor Objects": [9, 10, 11, 12, 13],
    "Sports Equipment": [29, 30, 31, 32, 33, 34, 35, 36, 37, 38],
    "Kitchen & Dining": [39, 40, 41, 42, 43, 44, 45],
    "Furniture & Electronics": [56, 57, 58, 59, 60, 61, 62, 63],
    "Food": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
    "Accessories": [24, 25, 26, 27, 28],
    "Indoor Objects": [64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
}

# Print all classes organized by category
for category, class_ids in categories.items():
    print(f"\n{category}:")
    print("-" * 60)
    for idx in class_ids:
        if idx in model.names:
            print(f"  {idx:2d}: {model.names[idx]}")

# Print complete list
print("\n\nComplete List (All 80 Classes):")
print("=" * 60)
for idx in range(len(model.names)):
    name = model.names[idx]
    # Print in columns
    if idx % 4 == 0:
        print()
    print(f"{idx:2d}: {name:15s}", end="  ")

print("\n\n" + "=" * 60)
print(f"Total: {len(model.names)} classes")

## Exercises

### TODO Exercise 1: Upload Your Own Image
Upload an image from your computer and detect objects in it.

```python
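# For Google Colab:
from google.colab import files
uploaded = files.upload()  # Opens a file picker; returns {filename: file contents}
# Then run detection on the uploaded image
image_path = next(iter(uploaded))  # Name of the first uploaded file
results = model(image_path)
results[0].show()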
```

### TODO Exercise 2: Confidence Threshold Experiment
Try different confidence thresholds (0.1, 0.5, 0.9) on the same image.  
**Question:** What changes? Why does a lower threshold show more detections?

### TODO Exercise 3: Count Specific Objects
Find a crowded photo (e.g., stadium, concert, street) and count how many "person" objects are detected.  
**Challenge:** Can you find an image with 10+ people?

### TODO Exercise 4: Diverse Detection
Find or create an image that has at least 5 different object classes detected.  
**Hint:** Try a kitchen scene, park, or busy street.

### Bonus Challenge:
Modify the `visualize_detections()` function to:
- Show only detections of a specific class (e.g., only "person")
- Draw boxes in different colors based on confidence score
- Add a legend showing all detected classes

---

## Confidence Threshold Experiment

Let's visualize how confidence threshold affects detection results!

In [None]:
# Test different confidence thresholds
thresholds = [0.25, 0.5, 0.75, 0.9]
test_image = "test_image1.jpg"

print("Running confidence threshold experiment...\n")

# Create figure with subplots
fig, axes = plt.subplots(2, 2, figsize=(16, 14))
axes = axes.flatten()

for i, conf in enumerate(thresholds):
    # Run detection with specific confidence threshold
    results = model(test_image, conf=conf, verbose=False)
    
    # Get annotated image
    img_annotated = results[0].plot()
    img_rgb = cv2.cvtColor(img_annotated, cv2.COLOR_BGR2RGB)
    
    # Display in subplot
    axes[i].imshow(img_rgb)
    axes[i].set_title(
        f"Confidence ≥ {conf:.0%} ({len(results[0].boxes)} detections)",
        fontsize=14, fontweight='bold'
    )
    axes[i].axis('off')
    
    # Print summary
    print(f"Confidence ≥ {conf:.0%}: {len(results[0].boxes)} objects detected")

plt.tight_layout()
plt.show()

print("\nObservations:")
print("- Lower threshold → More detections (including less certain ones)")
print("- Higher threshold → Fewer detections (only high-confidence ones)")
print("- Default 0.25 is a good balance for most applications")
print("- Adjust based on your use case (precision vs recall tradeoff)")

## What's Next?

### This Week (Week 13): Fundamentals
You learned the **theoretical foundations**:
- **IoU (Intersection over Union)** - How to measure box overlap
- **mAP (mean Average Precision)** - How to evaluate detector performance
- **Precision vs Recall** - Tradeoffs in detection
- **Confidence scores** - How detectors express certainty
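
The precision/recall trade-off listed above can be illustrated with a small worked example. In this sketch, each detection is a made-up `(confidence, is_correct)` pair, where `is_correct` stands in for the outcome of IoU-based matching against ground truth. The function name `precision_recall` is hypothetical, and returning a precision of 1.0 when nothing is detected is a convention chosen for this example.

```python
def precision_recall(detections, num_ground_truth, conf_thres):
    """
    detections: list of (confidence, is_correct) pairs, where is_correct
    says whether the detection matched a ground-truth object.
    """
    kept = [ok for conf, ok in detections if conf >= conf_thres]
    true_pos = sum(kept)
    false_pos = len(kept) - true_pos
    # With no detections kept, precision is set to 1.0 by convention here
    precision = true_pos / (true_pos + false_pos) if kept else 1.0
    recall = true_pos / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Made-up detections for an image containing 4 real objects
detections = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.30, False)]

for thres in (0.25, 0.5, 0.9):
    p, r = precision_recall(detections, num_ground_truth=4, conf_thres=thres)
    print(f"conf >= {thres:.2f}: precision = {p:.2f}, recall = {r:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.9 lifts precision from 0.60 to 1.00 while recall falls from 0.75 to 0.25.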

### Week 14: YOLO Architecture Deep Dive
You'll learn **HOW YOLO works internally**:
- **Grid-based detection** - How the image is divided into cells
- **Anchor boxes** - Predefined box shapes for different objects
- **Multi-scale predictions** - Detecting objects at different sizes
- **Loss function** - How YOLO learns (localization + classification + objectness)
- **Training on custom dataset** - Fine-tuning for your specific objects
- **YOLO variants** - YOLOv3, YOLOv5, YOLOv8 evolution

### Week 15: Alternative Approaches
You'll learn about the **R-CNN family**:
- **R-CNN** - Region-based CNN (two-stage detector)
- **Fast R-CNN** - Improved speed with RoI pooling
- **Faster R-CNN** - Region Proposal Network (RPN)
- **Mask R-CNN** - Instance segmentation
- **Comparison** - YOLO vs R-CNN (speed vs accuracy)
- **When to use which** - Application-specific choices

### Progression:
```
Week 13: Evaluation Metrics (IoU, mAP)
    ↓
Week 14: YOLO Architecture (How it works)
    ↓
Week 15: R-CNN Family (Alternative approach)
```

---

## Summary

### Key Takeaways:

1. **YOLOv8 is Fast and Accurate**
   - Real-time performance (30+ FPS on suitable hardware)
   - Strong accuracy on the COCO dataset
   - Single forward pass through network

2. **Pre-trained Models Work Out-of-the-Box**
   - No training required for 80 COCO classes
   - Download model once, use everywhere
   - Multiple model sizes (nano to xlarge)

3. **80 Object Classes Supported**
   - People, animals, vehicles, sports equipment
   - Kitchen items, furniture, electronics
   - Food, accessories, outdoor objects

4. **Confidence Threshold Controls Detections**
   - Lower threshold → More detections (higher recall, lower precision)
   - Higher threshold → Fewer detections (higher precision, lower recall)
   - Default 0.25 works well for most cases
   - Adjust based on precision/recall needs

5. **Real-World Applications**
   - **Autonomous Driving** - Pedestrian and vehicle detection
   - **Surveillance** - Security monitoring and alerts
   - **Retail Analytics** - Customer tracking, product recognition
   - **Sports Analysis** - Player tracking and performance
   - **Medical Imaging** - Tumor and anomaly detection

### What You Can Do Now:
- Detect objects in any image
- Understand confidence scores
- Adjust detection parameters
- Count and classify objects
- Visualize results with bounding boxes

### Next Steps:
1. **Try on Your Own Images**
   - Upload personal photos
   - Test on different scenes (indoor, outdoor, crowded)
   - Experiment with various object types

2. **Experiment with Different Models**
   - Try YOLOv8s (small) for better accuracy
   - Try YOLOv8m (medium) for even better results
   - Compare speed vs accuracy tradeoffs

3. **Explore Advanced Features**
   - Tracking objects across video frames
   - Custom class training (Week 14!)
   - Multi-camera setups

4. **Preview Week 14 Materials**
   - Read about YOLO architecture
   - Understand grid-based detection
   - Learn about anchor boxes

### Resources:
- **Official Documentation:** https://docs.ultralytics.com/
- **YOLO Paper:** https://arxiv.org/abs/1506.02640
- **COCO Dataset:** https://cocodataset.org/
- **GitHub:** https://github.com/ultralytics/ultralytics

### Congratulations!
You've successfully completed Tutorial T13! You can now:
- Use pre-trained YOLO models
- Detect and visualize objects
- Understand detection outputs
- Apply object detection to real problems

**You're ready for Week 14's deep dive into YOLO architecture!**

---

*Tutorial T13 - Object Detection with Pre-trained YOLO*  
*Course: Deep Neural Network Architectures (21CSE558T)*  
*Week 13 - Module 5: Object Detection*