# Test 14: Azure Blob Storage - Image Upload and Download

**Goal**: Test and document blob storage operations for images
1. Initialize BlobServiceClient with Managed Identity
2. Upload test images to blob storage
3. Download images from blob storage
4. Verify image integrity (before/after)
5. Test different image formats (jpg, png, webp)

**Issue**: Data cannot be upserted to blob storage, so upload and download functionality needs to be tested.

**Storage Pattern** (from notebook 12):
- Storage Account: Uses Managed Identity (Shared Key DISABLED)
- Container: `clothesimages`
- Blob naming: `products/{product_id}.{ext}`
- Auth: `DefaultAzureCredential` (not connection string)
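
The blob naming pattern above can be sketched as a small helper. This is a minimal illustration rather than code from notebook 12: the function name `build_blob_name`, the allowed-extension set, and the validation rules are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Hypothetical helper (not from the notebook): builds names that follow
# the `products/{product_id}.{ext}` pattern listed above.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "webp"}


def build_blob_name(product_id: str, ext: str) -> str:
    """Return a blob name of the form products/{product_id}.{ext}."""
    ext = ext.lower().lstrip(".")
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported image extension: {ext!r}")
    if not product_id or "/" in product_id:
        raise ValueError(f"Invalid product id: {product_id!r}")
    # Blob names use forward slashes regardless of the local OS.
    return str(PurePosixPath("products") / f"{product_id}.{ext}")


print(build_blob_name("sku-1001", ".PNG"))  # products/sku-1001.png
```

Normalizing the extension in one place keeps the blob name and the content type derived from it consistent.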

In [1]:
# Setup: Change to repo root and configure Python path
import sys
import os
from pathlib import Path

try:
    os.chdir("../../../")
    target_directory = os.getenv("TARGET_DIRECTORY", os.getcwd())
    if os.path.exists(target_directory):
        os.chdir(target_directory)
        print(f"✅ Changed directory to: {os.getcwd()}")
    else:
        print(f"❌ Directory does not exist: {target_directory}")
except Exception as e:
    print(f"❌ Error changing directory: {e}")

# Add to Python path
backend_path = os.path.join(os.getcwd(), "apps", "rtagent", "backend")
if backend_path not in sys.path:
    sys.path.insert(0, backend_path)

print(f"✅ Python path configured: {backend_path}")

✅ Changed directory to: c:\Users\pablosal\Desktop\art-voice-agent-accelerator
✅ Python path configured: c:\Users\pablosal\Desktop\art-voice-agent-accelerator\apps\rtagent\backend


## Step 1: Initialize Azure Blob Storage with Managed Identity

**CRITICAL**: This storage account has Shared Key authorization DISABLED.
- ✅ Use: `DefaultAzureCredential` (resolves to your `az login` session locally and to Managed Identity when deployed)
- ❌ Do NOT use: Connection strings or account keys

**Prerequisites**:
- Run `az login` before executing this notebook.
- Ensure your account has the `Storage Blob Data Contributor` role. To check: `az role assignment list --assignee your-email@domain.com --query "[?roleDefinitionName=='Storage Blob Data Contributor']"`
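
Before constructing the client, it can help to fail fast on a malformed account name, since a typo otherwise surfaces later as a DNS or authentication error. The sketch below is an illustration only: `build_account_url` is a hypothetical helper, and the `blob.core.windows.net` suffix assumes the public Azure cloud, as used in the cell that follows.

```python
import re

# Storage account names are 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def build_account_url(account_name: str) -> str:
    """Validate the account name and return its blob endpoint URL."""
    if not _ACCOUNT_NAME_RE.fullmatch(account_name):
        raise ValueError(
            f"Invalid storage account name {account_name!r}: "
            "expected 3-24 lowercase letters or digits"
        )
    # Assumption: public Azure cloud endpoint suffix.
    return f"https://{account_name}.blob.core.windows.net"


print(build_account_url("storagefactoryeastus"))
```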

In [5]:
from azure.storage.blob import BlobServiceClient, ContentSettings
from azure.identity import DefaultAzureCredential
from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError
from utils.ml_logging import get_logger

logger = get_logger("test_blob_storage")

# Azure Blob Storage Configuration (from environment or defaults)
AZURE_STORAGE_ACCOUNT_NAME = os.environ.get("AZURE_STORAGE_ACCOUNT_NAME", "storagefactoryeastus")
BLOB_CONTAINER_NAME = os.environ.get("AZURE_BLOB_CONTAINER_PRODUCTS", "clothesimages")

print("🔗 Initializing Azure Blob Storage (Managed Identity)...")
print(f"   Storage Account: {AZURE_STORAGE_ACCOUNT_NAME}")
print(f"   Container: {BLOB_CONTAINER_NAME}")
print(f"   🔐 Auth: Managed Identity (DefaultAzureCredential)")

try:
    # Initialize Blob Service Client with Managed Identity
    account_url = f"https://{AZURE_STORAGE_ACCOUNT_NAME}.blob.core.windows.net"
    credential = DefaultAzureCredential()
    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    
    print(f"✅ BlobServiceClient initialized")
    print(f"   Account URL: {account_url}")
    
    # Test connection by listing containers
    print(f"\n🔍 Testing connection...")
    containers = list(blob_service_client.list_containers())
    print(f"✅ Connection successful! Found {len(containers)} containers")
    for container in containers:
        print(f"   - {container.name}")
    
    # Get container client
    container_client = blob_service_client.get_container_client(BLOB_CONTAINER_NAME)
    
    # Check if container exists, create if not
    try:
        container_properties = container_client.get_container_properties()
        print(f"\n✅ Container '{BLOB_CONTAINER_NAME}' exists")
        print(f"   Created: {container_properties.last_modified}")
    except ResourceNotFoundError:
        print(f"\n⚠️  Container '{BLOB_CONTAINER_NAME}' not found, creating...")
        container_client.create_container()
        print(f"✅ Container created: {BLOB_CONTAINER_NAME}")
    
except Exception as e:
    print(f"❌ Failed to initialize blob storage: {e}")
    print(f"\n💡 Troubleshooting:")
    print(f"   1. Run 'az login' to authenticate")
    print(f"   2. Verify you have 'Storage Blob Data Contributor' role")
    print(f"   3. Check if storage account exists: {AZURE_STORAGE_ACCOUNT_NAME}")
    blob_service_client = None
    container_client = None

🔗 Initializing Azure Blob Storage (Managed Identity)...
   Storage Account: storagefactoryeastus
   Container: clothesimages
   🔐 Auth: Managed Identity (DefaultAzureCredential)
✅ BlobServiceClient initialized
   Account URL: https://storagefactoryeastus.blob.core.windows.net

🔍 Testing connection...
✅ Connection successful! Found 4 containers
   - agentic-samples
   - clothesimages
   - pre-auth-policies
   - pre-auth-policies-2

✅ Container 'clothesimages' exists
   Created: 2025-11-01 15:49:03+00:00


## Step 2: Find Test Images

Locate sample images from the retail dataset to use for testing upload/download
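
A minimal sketch of the image search, offered as an alternative to per-extension glob patterns: it matches suffixes case-insensitively and sorts the result so that "the first three images" is the same set on every run. The name `find_images` is hypothetical and not part of the notebook.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(base_dir: Path) -> List[Path]:
    """Recursively collect image files under base_dir in a stable order."""
    if not base_dir.is_dir():
        return []
    return sorted(
        p
        for p in base_dir.rglob("*")
        # Compare lowercased suffixes so that "photo.PNG" is also found.
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `find_images(Path("utils/data/clothes"))[:3]` would select a reproducible sample, assuming that directory exists relative to the working directory.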

In [6]:
# Find test images in the retail dataset
data_dir = Path(os.getcwd()) / "utils" / "data" / "clothes"

print(f"🔍 Searching for test images...")
print(f"   Base directory: {data_dir}")

if not data_dir.exists():
    print(f"❌ Directory not found: {data_dir}")
    test_images = []
else:
    # Find all image files
    test_images = []
    for ext in ["*.jpg", "*.jpeg", "*.png", "*.webp"]:
        test_images.extend(data_dir.glob(f"**/{ext}"))
    
    print(f"✅ Found {len(test_images)} images")
    
    # Display first 5 images
    print(f"\n📋 Sample images (first 5):")
    for i, img_path in enumerate(test_images[:5], 1):
        # Extract category/gender from path
        parts = img_path.parts
        category = parts[-3] if len(parts) >= 3 else "unknown"
        gender = parts[-2] if len(parts) >= 2 else "unknown"
        file_size_mb = img_path.stat().st_size / (1024 * 1024)
        
        print(f"   {i}. {img_path.name}")
        print(f"      Category: {category} / {gender}")
        print(f"      Size: {file_size_mb:.2f} MB")
        print(f"      Path: {img_path.relative_to(os.getcwd())}")

# Select 3 test images for upload/download testing
if test_images:
    selected_images = test_images[:3]
    print(f"\n✅ Selected {len(selected_images)} images for testing")
else:
    selected_images = []
    print(f"\n⚠️  No images found - nothing to upload")

🔍 Searching for test images...
   Base directory: c:\Users\pablosal\Desktop\art-voice-agent-accelerator\utils\data\clothes
✅ Found 20 images

📋 Sample images (first 5):
   1. Black Lenny Washed Jeans.png
      Category: jeans / men
      Size: 0.36 MB
      Path: utils\data\clothes\jeans\men\Black Lenny Washed Jeans.png
   2. Black Mended Slim Fit Jeans.png
      Category: jeans / men
      Size: 0.23 MB
      Path: utils\data\clothes\jeans\men\Black Mended Slim Fit Jeans.png
   3. Cowboy Cut Original Fit Jeans.png
      Category: jeans / men
      Size: 0.32 MB
      Path: utils\data\clothes\jeans\men\Cowboy Cut Original Fit Jeans.png
   4. Extreme Motion Straight Fit Tapered Leg Jeans.png
      Category: jeans / men
      Size: 0.33 MB
      Path: utils\data\clothes\jeans\men\Extreme Motion Straight Fit Tapered Leg Jeans.png
   5. Hella Pocket Jeans.png
      Category: jeans / men
      Size: 0.63 MB
      Path: utils\data\clothes\jeans\men\Hella Pocket Jeans.png

✅ Selected 3 images for testing

## Step 3: Upload Images to Blob Storage

Test uploading images with proper content types and blob naming
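
One design choice worth considering for the content type lookup is to reject unsupported formats before uploading instead of falling back to a default. The sketch below illustrates that stricter variant; `resolve_content_type` is a hypothetical name, and the mapping mirrors the four formats this notebook targets.

```python
from pathlib import Path

CONTENT_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def resolve_content_type(image_path: Path) -> str:
    """Return the MIME type for a supported image, or raise ValueError."""
    suffix = image_path.suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        # Failing here avoids storing a blob with a misleading Content-Type.
        raise ValueError(
            f"Unsupported image format: {suffix or '(no extension)'}"
        ) from None


print(resolve_content_type(Path("Black Lenny Washed Jeans.png")))  # image/png
```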

In [7]:
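import uuid
from typing import Dict, List, Optional

# Note: the function is async so it can be awaited from the notebook, but the
# BlobServiceClient used here is the synchronous client and blocks while uploading.
async def upload_image_to_blob(image_path: Path, blob_name: Optional[str] = None) -> Optional[Dict[str, object]]: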
    """
    Upload image to Azure Blob Storage with Managed Identity
    
    Args:
        image_path: Path to local image file
        blob_name: Optional custom blob name (default: products/{uuid}.{ext})
    
    Returns:
        Dict with blob_name, blob_url, content_type, size_bytes
    """
    try:
        # Determine file extension and content type
        file_ext = image_path.suffix.lower()
        content_type_map = {
            ".jpg": "image/jpeg",
            ".jpeg": "image/jpeg",
            ".png": "image/png",
            ".webp": "image/webp"
        }
        # Fall back to a generic binary type rather than mislabeling unknown files as JPEG
        content_type = content_type_map.get(file_ext, "application/octet-stream")
        
        # Generate blob name if not provided
        if not blob_name:
            unique_id = str(uuid.uuid4())[:8]
            blob_name = f"products/test-{unique_id}{file_ext}"
        
        print(f"   📤 Uploading: {image_path.name}")
        print(f"      Blob name: {blob_name}")
        print(f"      Content type: {content_type}")
        
        # Get blob client
        blob_client = blob_service_client.get_blob_client(
            container=BLOB_CONTAINER_NAME,
            blob=blob_name
        )
        
        # Read and upload image
        with open(image_path, "rb") as data:
            file_data = data.read()
            size_bytes = len(file_data)
            
            blob_client.upload_blob(
                file_data,
                overwrite=True,
                content_settings=ContentSettings(content_type=content_type)
            )
        
        # Get blob URL
        blob_url = blob_client.url
        
        print(f"      ✅ Uploaded successfully!")
        print(f"      URL: {blob_url}")
        print(f"      Size: {size_bytes / 1024:.2f} KB")
        
        return {
            "blob_name": blob_name,
            "blob_url": blob_url,
            "content_type": content_type,
            "size_bytes": size_bytes,
            "local_path": str(image_path)
        }
        
    except Exception as e:
        print(f"      ❌ Upload failed: {e}")
        return None

# Upload all selected images
print(f"\n{'='*70}")
print(f"📤 UPLOADING {len(selected_images)} IMAGES TO BLOB STORAGE")
print(f"{'='*70}\n")

uploaded_blobs = []

if blob_service_client:
    for i, img_path in enumerate(selected_images, 1):
        print(f"\n🖼️  Image {i}/{len(selected_images)}:")
        result = await upload_image_to_blob(img_path)
        if result:
            uploaded_blobs.append(result)
    
    print(f"\n{'='*70}")
    print(f"✅ UPLOAD COMPLETE: {len(uploaded_blobs)}/{len(selected_images)} successful")
    print(f"{'='*70}")
else:
    print("❌ Blob service client not initialized - cannot upload")


📤 UPLOADING 3 IMAGES TO BLOB STORAGE


🖼️  Image 1/3:
   📤 Uploading: Black Lenny Washed Jeans.png
      Blob name: products/test-50cda871.png
      Content type: image/png
      ✅ Uploaded successfully!
      URL: https://storagefactoryeastus.blob.core.windows.net/clothesimages/products/test-50cda871.png
      Size: 372.04 KB

🖼️  Image 2/3:
   📤 Uploading: Black Mended Slim Fit Jeans.png
      Blob name: products/test-9a9cb974.png
      Content type: image/png
      ✅ Uploaded successfully!
      URL: https://storagefactoryeastus.blob.core.windows.net/clothesimages/products/test-9a9cb974.png
      Size: 239.64 KB

🖼️  Image 3/3:
   📤 Uploading: Cowboy Cut Original Fit Jeans.png
      Blob name: products/test-889b813f.png
      Content type: image/png
      ✅ Uploaded successfully!
      URL: https://storagefactoryeastus.blob.core.windows.net/clothesimages/products/test-889b813f.png
      Size: 331.28 KB

✅ UPLOAD COMPLETE: 3/3 successful


## Step 4: List Uploaded Blobs

Verify that the images were uploaded by listing the blobs in the container.

In [8]:
# List blobs in the container
print(f"\n{'='*70}")
print(f"📋 LISTING BLOBS IN CONTAINER: {BLOB_CONTAINER_NAME}")
print(f"{'='*70}\n")

if container_client:
    try:
        blob_list = container_client.list_blobs(name_starts_with="products/")
        
        blobs = list(blob_list)
        print(f"✅ Found {len(blobs)} blobs in 'products/' prefix\n")
        
        # Show first 10 blobs
        for i, blob in enumerate(blobs[:10], 1):
            size_kb = blob.size / 1024
            print(f"   {i}. {blob.name}")
            print(f"      Size: {size_kb:.2f} KB")
            print(f"      Modified: {blob.last_modified}")
            print(f"      Content Type: {blob.content_settings.content_type if blob.content_settings else 'N/A'}")
        
        if len(blobs) > 10:
            print(f"\n   ... and {len(blobs) - 10} more blobs")
    
    except Exception as e:
        print(f"❌ Failed to list blobs: {e}")
else:
    print("❌ Container client not initialized")


📋 LISTING BLOBS IN CONTAINER: clothesimages

✅ Found 3 blobs in 'products/' prefix

   1. products/test-50cda871.png
      Size: 372.04 KB
      Modified: 2025-11-02 16:56:55+00:00
      Content Type: image/png
   2. products/test-889b813f.png
      Size: 331.28 KB
      Modified: 2025-11-02 16:56:55+00:00
      Content Type: image/png
   3. products/test-9a9cb974.png
      Size: 239.64 KB
      Modified: 2025-11-02 16:56:55+00:00
      Content Type: image/png


## Step 6: Download Images from Blob Storage

Test downloading the uploaded images to verify blob storage read operations
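
Because blob names can contain slashes, it is worth deciding explicitly how a blob name maps to a local file. The following is a minimal sketch under the assumption that every download should land directly inside one directory; `local_path_for_blob` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path, PurePosixPath


def local_path_for_blob(blob_name: str, download_dir: Path) -> Path:
    """Map a blob name to a file located directly inside download_dir."""
    # Keep only the last path segment so "products/x.png" becomes "x.png"
    # and names such as "../../x.png" cannot escape download_dir.
    file_name = PurePosixPath(blob_name).name
    if file_name in ("", ".", "..") or "\\" in file_name:
        raise ValueError(f"Cannot derive a safe file name from {blob_name!r}")
    return download_dir / file_name


print(local_path_for_blob("products/test-50cda871.png", Path("temp") / "downloads"))
```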

In [9]:
from typing import Optional

async def download_image_from_blob(blob_name: str, download_path: Optional[Path] = None) -> Optional[Path]:
    """
    Download image from Azure Blob Storage
    
    Args:
        blob_name: Name of the blob to download
        download_path: Optional custom download path
    
    Returns:
        Path to downloaded file
    """
    try:
        # Create download directory
        if not download_path:
            download_dir = Path(os.getcwd()) / "temp" / "downloads"
            download_dir.mkdir(parents=True, exist_ok=True)
            download_path = download_dir / Path(blob_name).name
        
        print(f"   📥 Downloading: {blob_name}")
        print(f"      To: {download_path}")
        
        # Get blob client
        blob_client = blob_service_client.get_blob_client(
            container=BLOB_CONTAINER_NAME,
            blob=blob_name
        )
        
        # Download blob
        with open(download_path, "wb") as download_file:
            download_data = blob_client.download_blob()
            download_file.write(download_data.readall())
        
        size_kb = download_path.stat().st_size / 1024
        print(f"      ✅ Downloaded successfully!")
        print(f"      Size: {size_kb:.2f} KB")
        
        return download_path
        
    except Exception as e:
        print(f"      ❌ Download failed: {e}")
        return None

# Download all uploaded blobs
print(f"\n{'='*70}")
print(f"📥 DOWNLOADING {len(uploaded_blobs)} IMAGES FROM BLOB STORAGE")
print(f"{'='*70}\n")

downloaded_files = []

if blob_service_client and uploaded_blobs:
    for i, blob_info in enumerate(uploaded_blobs, 1):
        print(f"\n🖼️  Image {i}/{len(uploaded_blobs)}:")
        downloaded_path = await download_image_from_blob(blob_info["blob_name"])
        if downloaded_path:
            downloaded_files.append({
                "blob_name": blob_info["blob_name"],
                "local_path": downloaded_path,
                "original_path": blob_info["local_path"]
            })
    
    print(f"\n{'='*70}")
    print(f"✅ DOWNLOAD COMPLETE: {len(downloaded_files)}/{len(uploaded_blobs)} successful")
    print(f"{'='*70}")
else:
    print("❌ No blobs to download or service client not initialized")


📥 DOWNLOADING 3 IMAGES FROM BLOB STORAGE


🖼️  Image 1/3:
   📥 Downloading: products/test-50cda871.png
      To: c:\Users\pablosal\Desktop\art-voice-agent-accelerator\temp\downloads\test-50cda871.png
      ✅ Downloaded successfully!
      Size: 372.04 KB

🖼️  Image 2/3:
   📥 Downloading: products/test-9a9cb974.png
      To: c:\Users\pablosal\Desktop\art-voice-agent-accelerator\temp\downloads\test-9a9cb974.png
      ✅ Downloaded successfully!
      Size: 239.64 KB

🖼️  Image 3/3:
   📥 Downloading: products/test-889b813f.png
      To: c:\Users\pablosal\Desktop\art-voice-agent-accelerator\temp\downloads\test-889b813f.png
      ✅ Downloaded successfully!
      Size: 331.28 KB

✅ DOWNLOAD COMPLETE: 3/3 successful


## Step 7: Verify Image Integrity

Compare original and downloaded images to ensure blob storage preserves image data
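
As a lightweight complement to hash comparison, the file signature ("magic bytes") can confirm that a downloaded file really is the format its extension claims, without decoding the whole image. This is a sketch under the assumption that only PNG, JPEG, and WebP matter here; `sniff_image_format` is a hypothetical helper.

```python
from typing import Optional


def sniff_image_format(header: bytes) -> Optional[str]:
    """Identify PNG, JPEG, or WebP from the first bytes of a file."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WEBP"
    return None


print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG
```

In practice the header would come from reading the first 12 bytes of the downloaded file in binary mode.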

In [10]:
import hashlib
from PIL import Image

def calculate_file_hash(file_path: Path) -> str:
    """Calculate SHA256 hash of a file"""
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()

def verify_image_integrity(original_path: Path, downloaded_path: Path) -> Dict:
    """
    Verify that downloaded image matches original
    
    Returns:
        Dict with verification results
    """
    results = {
        "original_path": str(original_path),
        "downloaded_path": str(downloaded_path),
        "hash_match": False,
        "size_match": False,
        "image_valid": False
    }
    
    try:
        # Check file sizes
        original_size = Path(original_path).stat().st_size
        downloaded_size = downloaded_path.stat().st_size
        results["size_match"] = (original_size == downloaded_size)
        
        # Calculate hashes
        original_hash = calculate_file_hash(Path(original_path))
        downloaded_hash = calculate_file_hash(downloaded_path)
        results["hash_match"] = (original_hash == downloaded_hash)
        
        results["original_hash"] = original_hash[:16] + "..."
        results["downloaded_hash"] = downloaded_hash[:16] + "..."
        
        # Try to open image with PIL
        try:
            # Image.open is lazy; verify() actually checks the file for corruption
            with Image.open(downloaded_path) as img:
                results["image_size"] = img.size
                results["image_format"] = img.format
                img.verify()
            results["image_valid"] = True
        except Exception as e:
            results["image_error"] = str(e)
        
        return results
        
    except Exception as e:
        results["error"] = str(e)
        return results

# Verify all downloaded images
print(f"\n{'='*70}")
print(f"🔍 VERIFYING IMAGE INTEGRITY")
print(f"{'='*70}\n")

verification_results = []

for i, file_info in enumerate(downloaded_files, 1):
    print(f"\n🖼️  Image {i}/{len(downloaded_files)}: {Path(file_info['blob_name']).name}")
    
    results = verify_image_integrity(
        Path(file_info['original_path']),
        file_info['local_path']
    )
    verification_results.append(results)
    
    print(f"   Hash Match: {'✅' if results['hash_match'] else '❌'} {results.get('original_hash', 'N/A')}")
    print(f"   Size Match: {'✅' if results['size_match'] else '❌'}")
    print(f"   Image Valid: {'✅' if results['image_valid'] else '❌'}")
    
    if results['image_valid']:
        print(f"   Format: {results.get('image_format', 'Unknown')}")
        print(f"   Dimensions: {results.get('image_size', 'Unknown')}")

# Summary
print(f"\n{'='*70}")
# all() is True for an empty list, so also require at least one verified image
all_valid = bool(verification_results) and all(
    r['hash_match'] and r['image_valid'] for r in verification_results
)
if all_valid:
    print(f"✅ ALL IMAGES VERIFIED SUCCESSFULLY")
    print(f"   - All hashes match ✅")
    print(f"   - All images valid ✅")
    print(f"   - Blob storage upload/download working correctly! 🎉")
else:
    print(f"⚠️  SOME IMAGES FAILED VERIFICATION")
    failed = sum(1 for r in verification_results if not (r['hash_match'] and r['image_valid']))
    print(f"   Failed: {failed}/{len(verification_results)}")
print(f"{'='*70}")


🔍 VERIFYING IMAGE INTEGRITY


🖼️  Image 1/3: test-50cda871.png
   Hash Match: ✅ 92ef1f3bbe0fceec...
   Size Match: ✅
   Image Valid: ✅
   Format: PNG
   Dimensions: (686, 936)

🖼️  Image 2/3: test-9a9cb974.png
   Hash Match: ✅ 50412ec8d29f7129...
   Size Match: ✅
   Image Valid: ✅
   Format: PNG
   Dimensions: (694, 941)

🖼️  Image 3/3: test-889b813f.png
   Hash Match: ✅ 87783e40192db803...
   Size Match: ✅
   Image Valid: ✅
   Format: PNG
   Dimensions: (631, 939)

✅ ALL IMAGES VERIFIED SUCCESSFULLY
   - All hashes match ✅
   - All images valid ✅
   - Blob storage upload/download working correctly! 🎉


## Step 8: Test Delete Operation

Clean up test blobs by deleting them (optional)
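
Since the container may also hold real product images, a guard that only allows deletion of names produced by the upload step can reduce the risk of removing the wrong blob. The sketch below assumes the test naming used earlier in this notebook (`products/test-` followed by eight hex characters and an image extension); `is_test_blob` is a hypothetical helper.

```python
import re

# Matches names generated by the upload step: products/test-{8 hex chars}.{ext}
TEST_BLOB_RE = re.compile(r"products/test-[0-9a-f]{8}\.(jpg|jpeg|png|webp)")


def is_test_blob(blob_name: str) -> bool:
    """Return True only for blob names that follow the test naming pattern."""
    # fullmatch rejects names with extra leading or trailing characters.
    return TEST_BLOB_RE.fullmatch(blob_name) is not None


candidates = ["products/test-50cda871.png", "products/sku-1001.png"]
print([name for name in candidates if is_test_blob(name)])
```

A cleanup loop could then skip any blob for which `is_test_blob` returns False.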

In [11]:
async def delete_blob(blob_name: str) -> bool:
    """
    Delete a blob from Azure Blob Storage
    
    Args:
        blob_name: Name of the blob to delete
    
    Returns:
        True if deleted successfully, False otherwise
    """
    try:
        blob_client = blob_service_client.get_blob_client(
            container=BLOB_CONTAINER_NAME,
            blob=blob_name
        )
        
        blob_client.delete_blob()
        print(f"   ✅ Deleted: {blob_name}")
        return True
        
    except Exception as e:
        print(f"   ❌ Failed to delete {blob_name}: {e}")
        return False

# Optional cleanup, controlled by the DELETE_TEST_BLOBS flag below
print(f"\n{'='*70}")
print(f"🗑️  DELETE TEST BLOBS (OPTIONAL)")
print(f"{'='*70}\n")

# Set to True to delete test blobs, False to keep them
DELETE_TEST_BLOBS = False  # Change to True to enable deletion

if DELETE_TEST_BLOBS and blob_service_client and uploaded_blobs:
    print(f"⚠️  Deleting {len(uploaded_blobs)} test blobs...\n")
    
    deleted_count = 0
    for blob_info in uploaded_blobs:
        if await delete_blob(blob_info["blob_name"]):
            deleted_count += 1
    
    print(f"\n✅ Deleted {deleted_count}/{len(uploaded_blobs)} blobs")
else:
    print(f"ℹ️  Skipping deletion (DELETE_TEST_BLOBS = {DELETE_TEST_BLOBS})")
    print(f"   Test blobs remain in storage for manual inspection")
    print(f"   Set DELETE_TEST_BLOBS = True to enable automatic cleanup")


🗑️  DELETE TEST BLOBS (OPTIONAL)

ℹ️  Skipping deletion (DELETE_TEST_BLOBS = False)
   Test blobs remain in storage for manual inspection
   Set DELETE_TEST_BLOBS = True to enable automatic cleanup
