# Computer Vision Basics with Omni-Dev Agent

This notebook demonstrates the core computer vision capabilities of Omni-Dev Agent, including:

- Object Detection with YOLOv8
- OCR (Optical Character Recognition)
- Face Recognition
- Image Classification
- Batch Processing and Performance Benchmarking

## Prerequisites

Make sure you have the Omni-Dev Agent server running:

```bash
cd src
python main.py
```

In [None]:
# Import required libraries
import requests
import base64
import cv2
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Image, display
import json
from PIL import Image as PILImage
import io

# Configuration
BASE_URL = "http://localhost:5000"

print("Libraries imported successfully!")

## Helper Functions

In [None]:
def encode_image_to_base64(image_path):
    """Convert image file to base64 string."""
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode()
    return f"data:image/jpeg;base64,{image_data}"

def numpy_to_base64(image_array):
    """Convert numpy array to base64 string."""
    _, buffer = cv2.imencode('.jpg', image_array)
    image_base64 = base64.b64encode(buffer).decode()
    return f"data:image/jpeg;base64,{image_base64}"

def display_detection_results(image_path, detections):
    """Display image with detection bounding boxes."""
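    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None rather than raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    for detection in detections:
        bbox = detection['bbox']
        # OpenCV drawing functions require integer pixel coordinates
        x1, y1, x2, y2 = (int(bbox[key]) for key in ('x1', 'y1', 'x2', 'y2'))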
        
        # Draw bounding box
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        
        # Add label
        label = f"{detection['class_name']}: {detection['confidence']:.2f}"
        cv2.putText(image, label, (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    
    plt.figure(figsize=(12, 8))
    plt.imshow(image)
    plt.axis('off')
    plt.title(f"Object Detection Results ({len(detections)} objects found)")
    plt.show()

def test_server_connection():
    """Test if the Omni-Dev Agent server is running."""
    try:
        # Without a timeout, requests can wait indefinitely on an unresponsive host
        response = requests.get(f"{BASE_URL}/", timeout=5)
        if response.status_code == 200:
            print("✅ Server is running!")
            print(f"Response: {response.text}")
            return True
        else:
            print(f"❌ Server returned status code: {response.status_code}")
            return False
    except requests.exceptions.RequestException as e:
        print(f"❌ Server connection failed: {e}")
        print("Make sure the server is running: cd src && python main.py")
        return False

print("Helper functions defined!")
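
`encode_image_to_base64` always labels the payload as `image/jpeg`, which is only accurate for JPEG files. Below is a minimal sketch of a variant that guesses the MIME type from the file extension with the standard-library `mimetypes` module. The name `encode_image_to_data_url` is hypothetical, and whether the server inspects the declared MIME type at all is an assumption.

```python
import base64
import mimetypes


def encode_image_to_data_url(image_path, default_mime="image/jpeg"):
    """Build a data URL whose MIME type matches the file extension.

    Hypothetical helper for illustration; not part of Omni-Dev Agent.
    """
    # guess_type returns (type, encoding); type is None for unknown extensions
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = default_mime

    with open(image_path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"
```

For a `.png` file this produces a `data:image/png;base64,...` prefix, while unknown extensions fall back to `default_mime`.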

## Test Server Connection

In [None]:
test_server_connection()

## 1. Object Detection with YOLOv8

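Let's start with object detection using the YOLOv8 model. We'll run detection on a synthetic image of geometric shapes; detectors trained on photographs of everyday objects may report few or no objects for it, so swap in a real photo to see meaningful results.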

In [None]:
# Create a sample image for testing (you can replace with your own image)
# Let's create a simple test image with geometric shapes
test_image = np.zeros((400, 600, 3), dtype=np.uint8)

# Add some colored rectangles and circles
cv2.rectangle(test_image, (50, 50), (200, 150), (0, 255, 0), -1)  # Green rectangle
cv2.circle(test_image, (400, 200), 80, (0, 0, 255), -1)  # Red circle
cv2.rectangle(test_image, (250, 250), (450, 350), (255, 0, 0), -1)  # Blue rectangle

# Save the test image
cv2.imwrite('test_image.jpg', test_image)

# Display the test image
plt.figure(figsize=(10, 6))
plt.imshow(cv2.cvtColor(test_image, cv2.COLOR_BGR2RGB))
plt.title("Test Image for Object Detection")
plt.axis('off')
plt.show()

print("Test image created!")

In [None]:
# Perform object detection
def detect_objects(image_path, confidence_threshold=0.25):
    """Perform object detection using the API."""
    image_base64 = encode_image_to_base64(image_path)
    
    response = requests.post(
        f"{BASE_URL}/vision/detect",
        json={
            'image': image_base64,
            'confidence': confidence_threshold
        }
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Test object detection
detection_result = detect_objects('test_image.jpg', confidence_threshold=0.1)

if detection_result:
    print(f"\n📊 Object Detection Results:")
    print(f"Timestamp: {detection_result['timestamp']}")
    print(f"Objects found: {detection_result['count']}")
    
    for i, detection in enumerate(detection_result['detections'], 1):
        print(f"\n{i}. {detection['class_name']}")
        print(f"   Confidence: {detection['confidence']:.3f}")
        print(f"   Bounding Box: {detection['bbox']}")
    
    # Display results with bounding boxes
    if detection_result['detections']:
        display_detection_results('test_image.jpg', detection_result['detections'])
else:
    print("Object detection failed!")
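
The response fields read above (`detections`, `class_name`, `confidence`, and a `bbox` with `x1`, `y1`, `x2`, `y2`) can be collected into a small typed structure, which makes filtering and sorting easier. The sketch below assumes exactly those field names; `Detection`, `parse_detections`, and the sample values are hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    class_name: str
    confidence: float
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def area(self):
        """Bounding-box area in pixels (0 for degenerate boxes)."""
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)


def parse_detections(response_json, min_confidence=0.0):
    """Convert a detection response dict into Detection objects.

    Entries below min_confidence are dropped; the rest are returned with the
    highest confidence first.
    """
    parsed = []
    for item in response_json.get("detections", []):
        bbox = item["bbox"]
        det = Detection(
            class_name=str(item["class_name"]),
            confidence=float(item["confidence"]),
            # Round to integer pixels in case the server returns floats
            x1=int(round(bbox["x1"])),
            y1=int(round(bbox["y1"])),
            x2=int(round(bbox["x2"])),
            y2=int(round(bbox["y2"])),
        )
        if det.confidence >= min_confidence:
            parsed.append(det)
    return sorted(parsed, key=lambda d: d.confidence, reverse=True)


# Made-up payload shaped like the fields used in this notebook
sample_response = {
    "count": 2,
    "detections": [
        {"class_name": "cup", "confidence": 0.31,
         "bbox": {"x1": 10.4, "y1": 20.0, "x2": 60.6, "y2": 80.0}},
        {"class_name": "person", "confidence": 0.92,
         "bbox": {"x1": 100, "y1": 50, "x2": 200, "y2": 300}},
    ],
}
confident = parse_detections(sample_response, min_confidence=0.5)
```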

## 2. OCR (Optical Character Recognition)

Let's test text extraction from images using OCR capabilities.

In [None]:
# Create a test image with text
text_image = np.ones((200, 600, 3), dtype=np.uint8) * 255  # White background

# Add text to the image
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(text_image, 'Hello, Omni-Dev Agent!', (50, 80), font, 1, (0, 0, 0), 2)
cv2.putText(text_image, 'OCR Test - 123456', (50, 140), font, 1, (0, 0, 0), 2)

# Save the text image
cv2.imwrite('text_image.jpg', text_image)

# Display the text image
plt.figure(figsize=(10, 4))
plt.imshow(cv2.cvtColor(text_image, cv2.COLOR_BGR2RGB))
plt.title("Test Image for OCR")
plt.axis('off')
plt.show()

print("Text image created!")

In [None]:
# Perform OCR
def extract_text(image_path, language='eng'):
    """Extract text from image using OCR API."""
    image_base64 = encode_image_to_base64(image_path)
    
    response = requests.post(
        f"{BASE_URL}/vision/ocr",
        json={
            'image': image_base64,
            'language': language
        }
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Test OCR
ocr_result = extract_text('text_image.jpg')

if ocr_result:
    print(f"\n📝 OCR Results:")
    print(f"Timestamp: {ocr_result['timestamp']}")
    print(f"Language: {ocr_result['language']}")
    print(f"Confidence: {ocr_result['confidence']:.3f}")
    print(f"\nExtracted Text:")
    print(f'"{ocr_result["text"]}"')
else:
    print("OCR failed!")
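
Raw OCR output often contains stray whitespace and blank lines. The following sketch normalizes the `text` field and discards low-confidence results. It assumes `confidence` is on a 0 to 1 scale, which this notebook does not confirm; `clean_ocr_text` and `min_confidence` are hypothetical names.

```python
import re


def clean_ocr_text(ocr_json, min_confidence=0.5):
    """Return whitespace-normalized lines, or [] when confidence is too low."""
    if float(ocr_json.get("confidence", 0.0)) < min_confidence:
        return []

    lines = []
    for raw_line in ocr_json.get("text", "").splitlines():
        # Collapse runs of whitespace and drop lines that end up empty
        line = re.sub(r"\s+", " ", raw_line).strip()
        if line:
            lines.append(line)
    return lines


# Made-up response shaped like the fields printed above
sample_ocr = {
    "text": "Hello,   Omni-Dev Agent!\n\n OCR Test - 123456 \n",
    "confidence": 0.9,
    "language": "eng",
}
cleaned = clean_ocr_text(sample_ocr)
```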

## 3. Face Recognition

Let's test the face recognition capabilities. Note that this requires pre-trained face encodings.

In [None]:
# Create a simple face-like image (or use a real photo)
# For demonstration, we'll create a simple circular "face"
face_image = np.ones((300, 300, 3), dtype=np.uint8) * 240  # Light gray background

# Draw a simple "face"
cv2.circle(face_image, (150, 150), 80, (177, 220, 255), -1)  # Face circle (skin tone, BGR order)
cv2.circle(face_image, (130, 130), 8, (0, 0, 0), -1)  # Left eye
cv2.circle(face_image, (170, 130), 8, (0, 0, 0), -1)  # Right eye
cv2.ellipse(face_image, (150, 170), (20, 10), 0, 0, 180, (0, 0, 0), 2)  # Smile

# Save the face image
cv2.imwrite('face_image.jpg', face_image)

# Display the face image
plt.figure(figsize=(6, 6))
plt.imshow(cv2.cvtColor(face_image, cv2.COLOR_BGR2RGB))
plt.title("Test Image for Face Recognition")
plt.axis('off')
plt.show()

print("Face image created!")

In [None]:
# Perform face recognition
def recognize_faces(image_path):
    """Recognize faces in image using the API."""
    image_base64 = encode_image_to_base64(image_path)
    
    response = requests.post(
        f"{BASE_URL}/vision/face/identify",
        json={
            'image': image_base64
        }
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Test face recognition
face_result = recognize_faces('face_image.jpg')

if face_result:
    print(f"\n👤 Face Recognition Results:")
    print(f"Timestamp: {face_result['timestamp']}")
    print(f"Faces found: {face_result['count']}")
    
    for i, face in enumerate(face_result['faces'], 1):
        print(f"\n{i}. Face ID: {face['face_id']}")
        print(f"   Name: {face['name']}")
        print(f"   Confidence: {face['confidence']:.3f}")
        print(f"   Bounding Box: {face['bbox']}")
else:
    print("Face recognition failed or no faces detected!")
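
To make the role of face encodings concrete: encoding-based identification commonly compares a query vector against stored vectors and accepts the nearest one only if it is close enough. The sketch below illustrates that general idea with toy 3-dimensional vectors (real encodings are much longer). It is not a description of how Omni-Dev Agent implements `/vision/face/identify`, and `identify_face` and the `max_distance` value are assumptions.

```python
import math


def identify_face(query, known_encodings, max_distance=0.6):
    """Return (name, distance) of the closest known encoding.

    known_encodings maps a name to a vector with the same length as query.
    Returns ("unknown", None) when nothing is within max_distance.
    """
    best_name, best_distance = "unknown", None
    for name, encoding in known_encodings.items():
        distance = math.dist(query, encoding)  # Euclidean distance
        if best_distance is None or distance < best_distance:
            best_name, best_distance = name, distance

    if best_distance is None or best_distance > max_distance:
        return "unknown", None
    return best_name, best_distance


# Toy gallery with made-up vectors
known = {"alice": [0.1, 0.2, 0.3], "bob": [0.9, 0.8, 0.7]}
match_name, match_distance = identify_face([0.1, 0.2, 0.35], known)
```

A stricter `max_distance` reduces false matches at the cost of more faces being reported as unknown.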

## 4. Image Classification

Let's test image classification to categorize entire images.

In [None]:
# Perform image classification
def classify_image(image_path, top_k=5):
    """Classify image using the API."""
    image_base64 = encode_image_to_base64(image_path)
    
    response = requests.post(
        f"{BASE_URL}/vision/classify",
        json={
            'image': image_base64,
            'top_k': top_k
        }
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

# Test image classification
classification_result = classify_image('test_image.jpg', top_k=3)

if classification_result:
    print(f"\n🏷️ Image Classification Results:")
    print(f"Timestamp: {classification_result['timestamp']}")
    print(f"Processing Time: {classification_result['processing_time']:.3f}s")
    print(f"\nTop Predictions:")
    
    for i, prediction in enumerate(classification_result['classifications'], 1):
        print(f"{i}. {prediction['class_name']}")
        print(f"   Confidence: {prediction['confidence']:.3f}")
        print(f"   Class ID: {prediction['class_id']}")
        print()
else:
    print("Image classification failed!")
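
The `top_k` parameter limits the response to the highest-scoring classes. As a rough illustration of what that selection involves, the sketch below turns raw scores into probabilities with a softmax and keeps the `k` best. Whether the server uses softmax scores is an assumption; `top_k_predictions` and the class names are hypothetical.

```python
import math


def top_k_predictions(logits, class_names, k=5):
    """Return the k most probable classes as dicts with the keys used above."""
    # Subtract the largest logit before exponentiating for numerical stability
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices ordered from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [
        {"class_id": i, "class_name": class_names[i], "confidence": probs[i]}
        for i in ranked[:k]
    ]


predictions = top_k_predictions([2.0, 1.0, 0.1], ["cat", "dog", "bird"], k=2)
```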

## 5. Batch Processing Example

Let's demonstrate processing multiple images in a batch.

In [None]:
# Create multiple test images
test_images = []

# OpenCV uses BGR channel order, so these are blue, green, and red
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]

for i in range(len(colors)):
    img = np.zeros((200, 200, 3), dtype=np.uint8)
    cv2.rectangle(img, (50, 50), (150, 150), colors[i], -1)
    
    filename = f'batch_image_{i+1}.jpg'
    cv2.imwrite(filename, img)
    test_images.append(filename)

print(f"Created {len(test_images)} test images for batch processing")

# Display the batch images
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for i, img_path in enumerate(test_images):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    axes[i].imshow(img)
    axes[i].set_title(f"Image {i+1}")
    axes[i].axis('off')

plt.tight_layout()
plt.show()

In [None]:
# Process images in batch
def process_batch_images(image_paths):
    """Process multiple images for object detection."""
    results = []
    
    for i, image_path in enumerate(image_paths):
        print(f"Processing image {i+1}/{len(image_paths)}: {image_path}")
        result = detect_objects(image_path, confidence_threshold=0.1)
        results.append({
            'image_path': image_path,
            'result': result
        })
    
    return results

# Process batch
batch_results = process_batch_images(test_images)

print(f"\n📊 Batch Processing Summary:")
print(f"Total images processed: {len(batch_results)}")

for i, batch_result in enumerate(batch_results):
    result = batch_result['result']
    if result:
        print(f"\nImage {i+1}: {batch_result['image_path']}")
        print(f"  Objects detected: {result['count']}")
        for detection in result['detections']:
            print(f"    - {detection['class_name']}: {detection['confidence']:.3f}")
    else:
        print(f"\nImage {i+1}: Processing failed")
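
`process_batch_images` sends one request at a time. Because each call mostly waits on the network, a thread pool may shorten the total wall-clock time. The sketch below assumes the server can handle concurrent requests, which this notebook does not establish; `process_batch_concurrently` is a hypothetical helper that accepts any detection callable, such as `detect_objects`.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch_concurrently(image_paths, detect_fn, max_workers=4):
    """Run detect_fn over image_paths in a thread pool, preserving input order.

    A call that raises is recorded as result=None instead of aborting the batch.
    """
    def safe_detect(path):
        try:
            return detect_fn(path)
        except Exception as exc:  # keep going when a single image fails
            print(f"{path}: {exc}")
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as image_paths
        results = list(pool.map(safe_detect, image_paths))

    return [
        {"image_path": path, "result": result}
        for path, result in zip(image_paths, results)
    ]
```

With the helpers defined earlier, a call would look like `process_batch_concurrently(test_images, detect_objects)`.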

## 6. Performance Analysis

Let's analyze the performance of different vision tasks.

In [None]:
import time

def benchmark_vision_apis(image_path, num_iterations=5):
    """Benchmark different vision APIs."""
    apis = {
        'Object Detection': lambda: detect_objects(image_path),
        'OCR': lambda: extract_text(image_path),
        'Face Recognition': lambda: recognize_faces(image_path),
        'Image Classification': lambda: classify_image(image_path)
    }
    
    results = {}
    
    for api_name, api_func in apis.items():
        times = []
        success_count = 0
        
        print(f"\nBenchmarking {api_name}...")
        
        for i in range(num_iterations):
            # perf_counter is a monotonic, high-resolution clock meant for timing intervals
            start_time = time.perf_counter()
            result = api_func()
            end_time = time.perf_counter()
            
            processing_time = end_time - start_time
            times.append(processing_time)
            
            if result is not None:
                success_count += 1
            
            print(f"  Iteration {i+1}: {processing_time:.3f}s")
        
        avg_time = np.mean(times)
        std_time = np.std(times)
        success_rate = success_count / num_iterations
        
        results[api_name] = {
            'avg_time': avg_time,
            'std_time': std_time,
            'success_rate': success_rate,
            'times': times
        }
    
    return results

# Run benchmark
print("🚀 Starting Performance Benchmark...")
benchmark_results = benchmark_vision_apis('test_image.jpg', num_iterations=3)

print(f"\n📈 Performance Benchmark Results:")
print(f"{'API':<20} {'Avg Time (s)':<12} {'Std Dev':<10} {'Success Rate':<12}")
print("-" * 60)

for api_name, stats in benchmark_results.items():
    print(f"{api_name:<20} {stats['avg_time']:<12.3f} {stats['std_time']:<10.3f} {stats['success_rate']:<12.1%}")

In [None]:
# Visualize performance results
api_names = list(benchmark_results.keys())
avg_times = [stats['avg_time'] for stats in benchmark_results.values()]
std_times = [stats['std_time'] for stats in benchmark_results.values()]

plt.figure(figsize=(12, 6))

# Create bar plot with error bars
bars = plt.bar(api_names, avg_times, yerr=std_times, capsize=5, alpha=0.7)
plt.xlabel('Vision API')
plt.ylabel('Processing Time (seconds)')
plt.title('Vision API Performance Comparison')
plt.xticks(rotation=45)

# Add value labels on bars
for bar, avg_time in zip(bars, avg_times):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
             f'{avg_time:.3f}s', ha='center', va='bottom')

plt.tight_layout()
plt.grid(axis='y', alpha=0.3)
plt.show()
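
With only a few iterations, the mean is sensitive to a single slow request, for example a first call that may include one-time setup on the server. A hedged alternative is to discard warm-up calls and report the median. The helper `time_calls` below is a hypothetical sketch that works with any zero-argument callable.

```python
import statistics
import time


def time_calls(func, iterations=5, warmup=1):
    """Time func() with a monotonic clock and return summary statistics."""
    for _ in range(warmup):
        func()  # warm-up calls are executed but not timed

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)

    return {
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "min": min(samples),
        "max": max(samples),
        "samples": samples,
    }
```

For example, `time_calls(lambda: detect_objects('test_image.jpg'), iterations=10)` would time the detection helper defined earlier.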

## 7. Advanced Integration Example

Let's create a comprehensive example that combines multiple vision capabilities.

In [None]:
def comprehensive_vision_analysis(image_path):
    """Perform comprehensive vision analysis on an image."""
    print(f"🔍 Performing comprehensive analysis on: {image_path}")
    print("=" * 60)
    
    # Load and display the image
    img = cv2.imread(image_path)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    plt.figure(figsize=(10, 6))
    plt.imshow(img_rgb)
    plt.title(f"Input Image: {image_path}")
    plt.axis('off')
    plt.show()
    
    analysis_results = {}
    
    # 1. Object Detection
    print("\n1️⃣ Running Object Detection...")
    detection_result = detect_objects(image_path)
    if detection_result:
        analysis_results['object_detection'] = detection_result
        print(f"   ✅ Found {detection_result['count']} objects")
        for det in detection_result['detections']:
            print(f"      - {det['class_name']}: {det['confidence']:.3f}")
    else:
        print("   ❌ Object detection failed")
    
    # 2. OCR
    print("\n2️⃣ Running OCR...")
    ocr_result = extract_text(image_path)
    if ocr_result and ocr_result['text'].strip():
        analysis_results['ocr'] = ocr_result
        print(f"   ✅ Extracted text (confidence: {ocr_result['confidence']:.3f})")
        print(f"      '{ocr_result['text'].strip()}'")
    else:
        print("   ℹ️ No text detected")
    
    # 3. Face Recognition
    print("\n3️⃣ Running Face Recognition...")
    face_result = recognize_faces(image_path)
    if face_result and face_result['count'] > 0:
        analysis_results['face_recognition'] = face_result
        print(f"   ✅ Found {face_result['count']} faces")
        for face in face_result['faces']:
            print(f"      - {face['name']}: {face['confidence']:.3f}")
    else:
        print("   ℹ️ No faces detected")
    
    # 4. Image Classification
    print("\n4️⃣ Running Image Classification...")
    classification_result = classify_image(image_path, top_k=3)
    if classification_result:
        analysis_results['classification'] = classification_result
        print(f"   ✅ Top classifications:")
        for cls in classification_result['classifications']:
            print(f"      - {cls['class_name']}: {cls['confidence']:.3f}")
    else:
        print("   ❌ Image classification failed")
    
    print("\n" + "=" * 60)
    print("✨ Comprehensive analysis complete!")
    
    return analysis_results

# Run comprehensive analysis on our test image
comprehensive_results = comprehensive_vision_analysis('test_image.jpg')
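
The dictionary returned by `comprehensive_vision_analysis` nests each raw API response under its own key. A compact summary can be easier to log or compare across images. The sketch below relies only on the keys used in this notebook; `summarize_analysis` and the sample data are hypothetical.

```python
def summarize_analysis(analysis_results):
    """Flatten the nested analysis dict into a few headline fields.

    Missing sections are tolerated because each step only stores its result
    on success.
    """
    detections = analysis_results.get("object_detection", {}).get("detections", [])
    faces = analysis_results.get("face_recognition", {}).get("faces", [])
    classes = analysis_results.get("classification", {}).get("classifications", [])
    text = analysis_results.get("ocr", {}).get("text", "")

    # Pick the best class explicitly rather than assuming the list is sorted
    top = max(classes, key=lambda c: c["confidence"]) if classes else None

    return {
        "objects": sorted({d["class_name"] for d in detections}),
        "text": text.strip(),
        "face_names": [f["name"] for f in faces],
        "top_class": top["class_name"] if top else None,
    }


# Made-up results shaped like the dict built above
sample_results = {
    "object_detection": {"count": 2, "detections": [
        {"class_name": "person", "confidence": 0.9},
        {"class_name": "cup", "confidence": 0.4},
    ]},
    "classification": {"classifications": [
        {"class_name": "kitchen", "confidence": 0.3},
        {"class_name": "office", "confidence": 0.6},
    ]},
}
summary = summarize_analysis(sample_results)
```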

## 8. Error Handling and Best Practices

Let's demonstrate proper error handling and best practices.

In [None]:
def robust_vision_api_call(api_endpoint, payload, max_retries=3, retry_delay=1):
    """Make robust API calls with error handling and retries."""
    import time
    
    for attempt in range(max_retries):
        try:
            response = requests.post(
                f"{BASE_URL}{api_endpoint}",
                json=payload,
                timeout=30  # 30 second timeout
            )
            
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:  # Rate limited
                print(f"Rate limited. Waiting {retry_delay}s before retry...")
                time.sleep(retry_delay)
                continue
            else:
                print(f"API returned status {response.status_code}: {response.text}")
                return None
                
        except requests.exceptions.Timeout:
            print(f"Request timed out. Attempt {attempt + 1}/{max_retries}")
            if attempt < max_retries - 1:
                time.sleep(retry_delay)
                continue
                
        except requests.exceptions.ConnectionError:
            print(f"Connection error. Attempt {attempt + 1}/{max_retries}")
            if attempt < max_retries - 1:
                time.sleep(retry_delay)
                continue
                
        except Exception as e:
            print(f"Unexpected error: {e}")
            return None
    
    print(f"Failed after {max_retries} attempts")
    return None

# Test robust API call
print("Testing robust API call with error handling...")

test_payload = {
    'image': encode_image_to_base64('test_image.jpg'),
    'confidence': 0.25
}

robust_result = robust_vision_api_call('/vision/detect', test_payload)

if robust_result:
    print(f"✅ Robust API call successful: {robust_result['count']} objects detected")
else:
    print("❌ Robust API call failed")
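
`robust_vision_api_call` waits a fixed `retry_delay` between attempts. A common refinement is exponential backoff with jitter, so that repeated failures wait progressively longer and concurrent clients do not retry in lockstep. The sketch below is a generic illustration; `retry_with_backoff` and its parameters are hypothetical, and the `sleep` argument exists only so the wait can be replaced in tests.

```python
import random
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_retries attempts have failed.

    The wait after failed attempt n (counting from 0) is base_delay * 2**n,
    capped at max_delay and scaled by a random factor between 0.5 and 1.0.
    The last failure is re-raised.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

With `requests`, passing `retry_on=(requests.exceptions.Timeout, requests.exceptions.ConnectionError)` restricts retries to transient network failures.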

## 9. Performance Optimization Tips

Let's demonstrate some performance optimization techniques.

In [None]:
def optimize_image_for_api(image_path, max_size=800, quality=85):
    """Optimize image size and quality for API calls."""
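    import os

    img = cv2.imread(image_path)
    if img is None:  # cv2.imread returns None rather than raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")
    
    # Get original dimensions and the size of the file on disk
    h, w = img.shape[:2]
    original_size = os.path.getsize(image_path)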
    
    # Resize if too large
    if max(h, w) > max_size:
        if w > h:
            new_w = max_size
            new_h = int(h * max_size / w)
        else:
            new_h = max_size
            new_w = int(w * max_size / h)
        
        img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)
    
    # Compress with specified quality
    encode_params = [cv2.IMWRITE_JPEG_QUALITY, quality]
    _, buffer = cv2.imencode('.jpg', img, encode_params)
    optimized_size = len(buffer)
    
    # Create base64
    image_base64 = base64.b64encode(buffer).decode()
    data_url = f"data:image/jpeg;base64,{image_base64}"
    
    print(f"Image optimization results:")
    print(f"  Original size: {original_size:,} bytes")
    print(f"  Optimized size: {optimized_size:,} bytes")
    print(f"  Size reduction: {(1 - optimized_size/original_size)*100:.1f}%")
    print(f"  Dimensions: {img.shape[1]}x{img.shape[0]}")
    
    return data_url

# Test image optimization
print("🔧 Testing image optimization...")
optimized_image = optimize_image_for_api('test_image.jpg')

# Test API call with optimized image
optimized_result = requests.post(
    f"{BASE_URL}/vision/detect",
    json={
        'image': optimized_image,
        'confidence': 0.25
    }
)

if optimized_result.status_code == 200:
    result_data = optimized_result.json()
    print(f"✅ Optimized image processing successful: {result_data['count']} objects detected")
else:
    print(f"❌ Optimized image processing failed: {optimized_result.status_code}")
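
The resize step in `optimize_image_for_api` reduces to a small piece of arithmetic: scale both sides by the same factor so the longer side equals `max_size`. Isolating it as a pure function makes it easy to check without loading an image. `fit_within` below is a hypothetical sketch; unlike the cell above it rounds instead of truncating and never returns a zero-length side.

```python
def fit_within(width, height, max_size=800):
    """Return (new_width, new_height) with the longer side at most max_size.

    Images that already fit are returned unchanged; the aspect ratio is kept
    up to integer rounding.
    """
    longest = max(width, height)
    if longest <= max_size:
        return width, height

    scale = max_size / longest
    # Never round a side down to zero for extremely elongated images
    return max(1, round(width * scale)), max(1, round(height * scale))


landscape = fit_within(1600, 1200)
portrait = fit_within(1200, 1600)
unchanged = fit_within(600, 400)
```

The result can be passed to `cv2.resize` as the `(width, height)` target size.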

## 10. Cleanup

Let's clean up the temporary files we created.

In [None]:
import os

# List of files to clean up
temp_files = [
    'test_image.jpg',
    'text_image.jpg',
    'face_image.jpg',
    'batch_image_1.jpg',
    'batch_image_2.jpg',
    'batch_image_3.jpg'
]

print("🧹 Cleaning up temporary files...")

for file in temp_files:
    try:
        if os.path.exists(file):
            os.remove(file)
            print(f"  ✅ Removed {file}")
    except Exception as e:
        print(f"  ❌ Failed to remove {file}: {e}")

print("\n✨ Cleanup complete!")
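
Manual cleanup can be avoided by writing scratch files into a temporary directory that is removed automatically. The sketch below uses the standard-library `tempfile` module; the `omni_vision_` prefix and the placeholder bytes are illustrative, and in the notebook the file would be written with `cv2.imwrite` instead.

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="omni_vision_") as workdir:
    image_path = os.path.join(workdir, "test_image.jpg")
    with open(image_path, "wb") as f:
        f.write(b"placeholder bytes standing in for an encoded image")
    created = os.path.exists(image_path)
    # Call the vision helpers with image_path while the block is open

# The directory and everything in it is deleted when the block exits
removed = not os.path.exists(workdir)
```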

## Summary

In this notebook, we've demonstrated:

1. **Object Detection** - Detecting and localizing objects in images using YOLOv8
2. **OCR** - Extracting text from images with confidence scoring
3. **Face Recognition** - Identifying faces with bounding boxes
4. **Image Classification** - Categorizing entire images with top-k predictions
5. **Batch Processing** - Processing multiple images one after another
6. **Performance Analysis** - Benchmarking different APIs
7. **Comprehensive Analysis** - Combining multiple vision capabilities
8. **Error Handling** - Robust API calls with retries
9. **Optimization** - Image optimization techniques for better performance

### Key Takeaways

- The Omni-Dev Agent provides a unified API for multiple computer vision tasks
- All APIs return consistent JSON responses with timestamps and confidence scores
- Images should be base64-encoded with data URL prefixes
- Error handling and retries are important for production use
- Downscaling and recompressing images shrinks the request payload, which can reduce upload and processing time

### Next Steps

- Explore real-time processing with WebSocket streaming
- Integrate with camera systems for live video analysis
- Implement custom vision pipelines
- Deploy the system in production environments

For more advanced examples and integration patterns, check out the other notebooks in this series!