# Azure Face API & Data Lake Integration

**Project**: FaceGUI Cloud Architecture  
**Technologies**: Azure Blob Storage, Face API, Data Lake Gen2, Python  
**Source**: [https://github.com/anarcoiris/FaceGUI](https://github.com/anarcoiris/FaceGUI)

---

## Executive Summary

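Cloud-native architecture for processing image data at scale: images are stored in Azure Blob Storage, analyzed with the Azure Face API (Azure Cognitive Services), and the extracted metadata is written to Azure Data Lake Storage Gen2 for analytics.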

---


## 1. Environment Setup

```python
import sys
from pathlib import Path

# Add a local clone of FaceGUI to the import path if one exists
# (repository code is for reference only; the cells below are standalone demos)
try:
    repo_path = Path('FaceGUI').resolve()
    if repo_path.exists():
        sys.path.insert(0, str(repo_path))
        print("✓ Repository code loaded")
    else:
        print("ℹ Note: Repository code not found. Using standalone demo implementations.")
except Exception as e:
    print(f"ℹ Note: Repository import skipped - using demo code ({e})")

import json
import pandas as pd
import matplotlib.pyplot as plt

print("✓ Azure integration environment ready")
print("\n📝 Execution Note:")
print("   This notebook demonstrates Azure cloud architecture.")
print("   Full production code available at: https://github.com/anarcoiris/FaceGUI")
```

## 2. Azure Face API Integration

### API Capabilities
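- **Face Detection**: Bounding boxes and landmarks (eyes, nose, mouth)
- **Face Recognition**: Match faces across images (verification and identification are Limited Access features that require approval from Microsoft)
- **Attributes**: Glasses, head pose, blur, occlusion, and other image-quality attributes. Age, gender, emotion, smile, and facial hair are legacy attributes that Microsoft retired (no longer available since June 30, 2023); they appear in the simulated response below only to illustrate the response shape.
- **Face Grouping**: Cluster similar faces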
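Detection results describe each face with a `faceRectangle` of `top`, `left`, `width`, and `height` in pixels. Drawing or cropping usually needs corner coordinates instead; a minimal sketch, where the helper name `rect_to_box` is hypothetical:

```python
def rect_to_box(rect, image_width=None, image_height=None):
    """Convert a faceRectangle dict to (x_min, y_min, x_max, y_max).

    If image dimensions are given, the box is clipped to the image bounds.
    """
    x_min = rect["left"]
    y_min = rect["top"]
    x_max = x_min + rect["width"]
    y_max = y_min + rect["height"]
    if image_width is not None:
        x_min, x_max = max(0, x_min), min(image_width, x_max)
    if image_height is not None:
        y_min, y_max = max(0, y_min), min(image_height, y_max)
    return (x_min, y_min, x_max, y_max)


box = rect_to_box({"top": 120, "left": 180, "width": 150, "height": 150})
print(box)  # (180, 120, 330, 270)
```

The tuple order matches the `[x0, y0, x1, y1]` form accepted by Pillow's `ImageDraw.Draw.rectangle`, so the result can be passed straight to it when annotating images.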

### Authentication
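```python
import os
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

# Placeholder variable names: read the Face resource endpoint and key from the environment
endpoint = os.environ["FACE_ENDPOINT"]
api_key = os.environ["FACE_KEY"]
client = FaceClient(endpoint, CognitiveServicesCredentials(api_key))
```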
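The SDK client wraps the Face REST API. As a standard-library alternative, the sketch below builds (but does not send) a Detect request against the `face/v1.0/detect` endpoint, authenticating with the `Ocp-Apim-Subscription-Key` header. The endpoint, key, image URL, and the choice of `glasses` and `headPose` attributes are placeholders; `returnFaceId` is set to `false` on the assumption that face IDs are only needed for the Limited Access recognition features.

```python
import json
import urllib.parse
import urllib.request


def build_detect_request(endpoint, api_key, image_url, attributes=("glasses", "headPose")):
    """Build a Face API Detect request for an image reachable by URL (not sent)."""
    params = urllib.parse.urlencode({
        "returnFaceId": "false",          # face IDs are only needed for recognition
        "returnFaceLandmarks": "false",
        "returnFaceAttributes": ",".join(attributes),
    })
    url = f"{endpoint.rstrip('/')}/face/v1.0/detect?{params}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, something like `with urllib.request.urlopen(req) as resp: faces = json.load(resp)` returns the JSON array of detected faces.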

```python
# Simulated Face API response: one element of the JSON array returned by a Detect call.
# age, gender, smile, emotion, and facialHair are retired legacy attributes; they are
# kept here only to illustrate the response shape.
sample_face_response = {
    "faceId": "abc123-def456-ghi789",
    "faceRectangle": {
        "top": 120,
        "left": 180,
        "width": 150,
        "height": 150
    },
    "faceAttributes": {
        "age": 32,
        "gender": "male",
        "smile": 0.85,
        "emotion": {
            "happiness": 0.92,
            "neutral": 0.05,
            "sadness": 0.01,
            "surprise": 0.02
        },
        "glasses": "NoGlasses",
        "facialHair": {
            "moustache": 0.1,
            "beard": 0.05
        }
    }
}

print("Face API Response Structure:")
print(json.dumps(sample_face_response, indent=2))
```
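For analytics in the Data Lake, a nested response like the one above is usually flattened into one row per face. A minimal sketch, assuming dotted column names (the helper `flatten_face` is hypothetical; `pandas.json_normalize` offers similar behavior with its default `sep="."`):

```python
def flatten_face(obj, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_face(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


face = {
    "faceId": "abc123",
    "faceRectangle": {"top": 120, "left": 180, "width": 150, "height": 150},
    "faceAttributes": {"glasses": "NoGlasses"},
}
row = flatten_face(face)
print(row["faceRectangle.width"])  # 150
```

A list of such rows can be loaded with `pd.DataFrame(rows)` and written to the lake in a columnar or line-delimited format.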

## 3. Blob Storage & Data Lake Integration

### Storage Strategy
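- **Hot Tier**: Recently uploaded images (highest storage cost, lowest access cost)
- **Cool Tier**: Processed images that are read infrequently (lower storage cost, higher access cost, 30-day minimum retention)
- **Archive Tier**: Historical data (lowest storage cost; blobs are offline and must be rehydrated, which can take hours, before they can be read)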
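Azure Blob Storage lifecycle management can move blobs between these tiers automatically. Its rules are time-based rather than status-based, so the sketch below approximates "processed" images with a path prefix plus a days-since-modification threshold. The rule name, the `faces/processed/` prefix, and the 30/180-day thresholds are illustrative assumptions, not values from the FaceGUI repository.

```python
import json


def build_lifecycle_policy(prefix, cool_after_days=30, archive_after_days=180):
    """Return a Blob lifecycle management policy (as a dict) for one blob prefix."""
    if archive_after_days <= cool_after_days:
        # Sanity check for this sketch: archive should come after cool
        raise ValueError("archive threshold must be later than the cool threshold")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-processed-images",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],  # "<container>/<path prefix>"
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                        }
                    },
                },
            }
        ]
    }


policy = build_lifecycle_policy("faces/processed/")
print(json.dumps(policy, indent=2))
```

Saved to a file, the policy can be applied with, for example, `az storage account management-policy create --account-name <account> --resource-group <group> --policy @policy.json`.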

### Metadata Storage
```python
metadata = {
    "image_id": "img_001.jpg",
    "blob_url": "https://.../container/img_001.jpg",
    "upload_timestamp": "2024-11-28T10:30:00Z",
    "faces_detected": 2,
    "processing_status": "completed",
    "face_data": [...]
}
```
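A common Data Lake layout, assumed here rather than taken from the FaceGUI code, stores such records as JSON Lines files under date-partitioned paths (`year=/month=/day=`), so query engines can skip partitions when filtering by date. The sketch below computes the partition from `upload_timestamp` and groups a batch into one JSON Lines payload per partition; the `face-metadata` root is hypothetical, and uploading the payloads to ADLS Gen2 (for example with the `azure-storage-file-datalake` SDK) is left out.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_path(record, root="face-metadata"):
    """Hive-style date partition for a record (hypothetical layout and root name)."""
    # Assumes ISO 8601 timestamps with an offset or "Z"; fromisoformat() accepts a
    # trailing "Z" only from Python 3.11, so replace it with "+00:00" first
    ts = datetime.fromisoformat(record["upload_timestamp"].replace("Z", "+00:00"))
    ts = ts.astimezone(timezone.utc)
    return f"{root}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}"


def group_by_partition(records):
    """Map each partition path to JSON Lines text (one object per line) for its records."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_path(record)].append(record)
    return {
        path: "".join(json.dumps(r, sort_keys=True) + "\n" for r in rs)
        for path, rs in groups.items()
    }


files = group_by_partition([
    {"image_id": "img_001.jpg", "upload_timestamp": "2024-11-28T10:30:00Z", "faces_detected": 2},
    {"image_id": "img_002.jpg", "upload_timestamp": "2024-11-29T08:15:00Z", "faces_detected": 0},
])
for path in files:
    print(path)
# face-metadata/year=2024/month=11/day=28
# face-metadata/year=2024/month=11/day=29
```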

```python
# Cost calculator for Azure services
class AzureCostCalculator:
    def __init__(self):
        # Approximate list prices (USD); actual prices vary by region, redundancy, and volume
        self.face_api_per_1k = 1.00       # $1 per 1,000 transactions
        self.blob_storage_per_gb = 0.018  # $0.018 per GB/month (hot tier)
        self.data_lake_per_gb = 0.02      # $0.02 per GB/month

    def calculate_face_api_cost(self, num_images):
        # Assumes one billable Detect transaction per image
        return (num_images / 1000) * self.face_api_per_1k

    def calculate_storage_cost(self, gb_images, gb_metadata, months=1):
        blob_cost = gb_images * self.blob_storage_per_gb * months
        datalake_cost = gb_metadata * self.data_lake_per_gb * months
        return blob_cost + datalake_cost

    def estimate_monthly_cost(self, images_per_day, avg_image_size_mb, metadata_size_mb):
        # Face API (30-day month)
        monthly_images = images_per_day * 30
        api_cost = self.calculate_face_api_cost(monthly_images)

        # Storage for this month's uploads only (MB -> GB); data kept from
        # earlier months is not included
        total_images_gb = (monthly_images * avg_image_size_mb) / 1024
        total_metadata_gb = (monthly_images * metadata_size_mb) / 1024
        storage_cost = self.calculate_storage_cost(total_images_gb, total_metadata_gb)

        return {
            'face_api': api_cost,
            'storage': storage_cost,
            'total': api_cost + storage_cost,
            'monthly_images': monthly_images
        }

# Example calculation
calc = AzureCostCalculator()
costs = calc.estimate_monthly_cost(
    images_per_day=1000,
    avg_image_size_mb=2.0,
    metadata_size_mb=0.01
)

print("Monthly Cost Estimate:")
print(f"  Images processed: {costs['monthly_images']:,}")
print(f"  Face API cost: ${costs['face_api']:.2f}")
print(f"  Storage cost: ${costs['storage']:.2f}")
print(f"  TOTAL: ${costs['total']:.2f}/month")
```
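The estimate above covers a single month of uploads. Because images stay in storage, the storage bill grows each month while the Face API charge stays flat. A short projection under simplifying assumptions (same example volumes and hot-tier price, no tiering, no deletion, no partial-month proration); `project_storage_cost` is a hypothetical helper, independent of the class above:

```python
def project_storage_cost(months, monthly_upload_gb, price_per_gb_month):
    """Monthly image-storage cost when every month's uploads are retained.

    Simplification: month k is billed for k full months of uploads.
    """
    return [k * monthly_upload_gb * price_per_gb_month for k in range(1, months + 1)]


monthly_gb = 1000 * 30 * 2.0 / 1024  # 1,000 images/day x 30 days x 2 MB
costs_by_month = project_storage_cost(12, monthly_gb, 0.018)  # hot-tier price
print(f"Month 1: ${costs_by_month[0]:.2f}, month 12: ${costs_by_month[-1]:.2f}")
print(f"First-year storage total: ${sum(costs_by_month):.2f}")
```

With these numbers, image storage reaches roughly $12.66 in month 12 and about $82 over the first year, still small next to $360 of Face API calls at 1,000 images/day; moving older images to the Cool or Archive tier lowers the storage line further.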

## 4. Batch Processing Pipeline

### Workflow
1. **Upload**: Images to Blob Storage, authorized with a SAS token
2. **Trigger**: Azure Function (e.g., a Blob Storage trigger) or a manual batch run
3. **Process**: Call the Face API for each image
4. **Store**: Write metadata to Data Lake
5. **Index**: Update a local SQLite database for quick GUI access
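Step 5 keeps a local SQLite index so the GUI can list results without querying the Data Lake. The table name and columns below are assumptions modeled on the metadata fields shown earlier, not the FaceGUI schema; re-processing an image updates its row through SQLite's `INSERT ... ON CONFLICT DO UPDATE` upsert (SQLite 3.24 or later).

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a local SQLite index with a hypothetical `images` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               image_id          TEXT PRIMARY KEY,
               blob_url          TEXT,
               upload_timestamp  TEXT,
               faces_detected    INTEGER,
               processing_status TEXT
           )"""
    )
    return conn


def upsert_image(conn, meta):
    """Insert one metadata record, or update the row if the image is already indexed.

    Extra keys in `meta` (such as face_data) are ignored by the named placeholders.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(
            """INSERT INTO images
                   (image_id, blob_url, upload_timestamp, faces_detected, processing_status)
               VALUES
                   (:image_id, :blob_url, :upload_timestamp, :faces_detected, :processing_status)
               ON CONFLICT(image_id) DO UPDATE SET
                   blob_url = excluded.blob_url,
                   upload_timestamp = excluded.upload_timestamp,
                   faces_detected = excluded.faces_detected,
                   processing_status = excluded.processing_status""",
            meta,
        )


conn = open_index()
record = {
    "image_id": "img_001.jpg",
    "blob_url": "https://.../container/img_001.jpg",
    "upload_timestamp": "2024-11-28T10:30:00Z",
    "faces_detected": 0,
    "processing_status": "pending",
}
upsert_image(conn, record)
upsert_image(conn, {**record, "faces_detected": 2, "processing_status": "completed"})
print(conn.execute("SELECT faces_detected, processing_status FROM images").fetchall())
# [(2, 'completed')]
```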

### Error Handling
- Retry with exponential backoff
- Dead letter queue for failed images
- Logging to Application Insights
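A production dead letter queue would typically be an Azure Storage queue (Azure Functions moves repeatedly failing queue messages to a `-poison` queue) or a Service Bus dead-letter subqueue. As a local stand-in, the sketch below appends each failed image with its error and attempt count to a JSON Lines file and reads the entries back for replay; the function names and record fields are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def dead_letter(path, image_id, error, attempts):
    """Append one failed image to a JSON Lines dead-letter file."""
    entry = {
        "image_id": image_id,
        "error": str(error),
        "attempts": attempts,
        "failed_at": datetime.now(timezone.utc).isoformat(),
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_dead_letters(path):
    """Read dead-lettered entries back, e.g. to re-submit them after a fix."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending one line per failure keeps writes cheap and lets a replay job process entries in the order they failed.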

```python
# Batch processor simulation with retries and exponential backoff
import time
import random

class BatchProcessor:
    def __init__(self, batch_size=10, retry_limit=3, base_delay=0.01):
        self.batch_size = batch_size
        self.retry_limit = retry_limit  # retries allowed after the first attempt
        self.base_delay = base_delay    # seconds; doubles after each failed attempt
        self.processed = 0
        self.failed = 0
        self.dead_letter = []           # images that failed every attempt

    def process_batch(self, images):
        results = []
        for img in images:
            success = self._process_with_retry(img)
            if success:
                self.processed += 1
                results.append({"image": img, "status": "success"})
            else:
                self.failed += 1
                self.dead_letter.append(img)
                results.append({"image": img, "status": "failed"})
        return results

    def _process_with_retry(self, image):
        # One initial attempt plus up to retry_limit retries, backing off exponentially
        for attempt in range(self.retry_limit + 1):
            if self._process_single_image(image):
                return True
            if attempt < self.retry_limit:
                time.sleep(self.base_delay * 2 ** attempt)
        return False

    def _process_single_image(self, image):
        # Simulate an API call with a 95% success rate per attempt
        return random.random() > 0.05

    def summary(self):
        total = self.processed + self.failed
        success_rate = (self.processed / total * 100) if total > 0 else 0
        return {
            'processed': self.processed,
            'failed': self.failed,
            'success_rate': success_rate
        }

# Demo
processor = BatchProcessor(batch_size=10)
test_images = [f"image_{i:03d}.jpg" for i in range(100)]

# Process in batches
for i in range(0, len(test_images), processor.batch_size):
    batch = test_images[i:i+processor.batch_size]
    processor.process_batch(batch)

summary = processor.summary()
print("Batch Processing Summary:")
print(f"  Total images: {len(test_images)}")
print(f"  Successful: {summary['processed']}")
print(f"  Failed: {summary['failed']}")
print(f"  Success rate: {summary['success_rate']:.1f}%")
print(f"  Dead-lettered: {len(processor.dead_letter)}")
```

---

## Summary

### Technical Achievements
✅ Azure Face API integration with authentication  
✅ Blob Storage for scalable image storage  
✅ Data Lake for metadata analytics  
✅ Batch processing with error handling  
✅ Cost optimization strategies  
✅ Streamlit GUI for monitoring  

### Skills
**Cloud Engineering**: Azure services, authentication, access control  
**Data Engineering**: ETL pipelines, batch processing, data lake design  
**Computer Vision**: Face detection, attribute extraction  
**Cost Optimization**: Tier management, caching, batching

## References
- **Repository**: https://github.com/anarcoiris/FaceGUI
- **Technologies**: Azure (Face API, Blob Storage, Data Lake), Python, Streamlit
