# Microsoft Azure Cloud Tutorial

This tutorial demonstrates how to use Clustrix with Microsoft Azure cloud infrastructure for scalable distributed computing.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ContextLab/clustrix/blob/master/docs/notebooks/azure_cloud_tutorial.ipynb)

## Overview

Azure provides several services that integrate well with Clustrix:

- **Azure Virtual Machines**: Scalable compute instances
- **Azure Batch**: Managed job scheduling service
- **Azure CycleCloud**: HPC cluster orchestration
- **Azure Machine Learning Compute**: ML-optimized infrastructure
- **Azure Container Instances**: Serverless containers
- **Azure Blob Storage**: Object storage for data and results
- **Azure Virtual Network**: Network isolation and security

## Prerequisites

1. Azure subscription with appropriate permissions
2. Azure CLI installed and configured
3. SSH key pair for VM access
4. Basic understanding of Azure services

## Installation and Setup

Install Clustrix with Azure dependencies:

In [None]:
# Install Clustrix with Azure support
!pip install clustrix azure-identity azure-mgmt-compute azure-mgmt-network azure-storage-blob

# Import required libraries
import clustrix
from clustrix import cluster, configure
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient
from azure.storage.blob import BlobServiceClient
import os
import numpy as np
import time
import json

## Azure Authentication Setup

Configure your Azure credentials. You can do this in several ways:

### Option 1: Azure CLI Authentication

In [None]:
# Login with Azure CLI (run this in terminal)
# az login

# Set your subscription
# az account set --subscription "your-subscription-id"

# Verify authentication
!az account show --output table

### Option 2: Service Principal Authentication

In [None]:
# Set Azure credentials as environment variables
# os.environ['AZURE_CLIENT_ID'] = 'your-client-id'
# os.environ['AZURE_CLIENT_SECRET'] = 'your-client-secret'
# os.environ['AZURE_TENANT_ID'] = 'your-tenant-id'

# Test Azure connection
try:
    credential = DefaultAzureCredential()
    subscription_id = 'your-subscription-id'  # Replace with actual ID
    
    compute_client = ComputeManagementClient(credential, subscription_id)
    # Test by listing VM sizes in East US
    vm_sizes = list(compute_client.virtual_machine_sizes.list('eastus'))
    print(f"Successfully connected to Azure. Available VM sizes: {len(vm_sizes)}")
except Exception as e:
    print(f"Azure connection failed: {e}")

**Note**: Make sure you have set up authentication and have the correct subscription ID.
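
Before constructing `DefaultAzureCredential()`, it can help to confirm that the service principal variables from Option 2 are actually set. The helper below is a minimal sketch: `missing_service_principal_vars` is a hypothetical name introduced for illustration and is not part of Clustrix or the Azure SDK.

```python
import os

# Environment variables used by the service principal flow shown in Option 2.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")


def missing_service_principal_vars(env=None):
    """Return the names of service principal variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


missing = missing_service_principal_vars()
if missing:
    # Not necessarily an error: Azure CLI login (Option 1) may still succeed.
    print(f"Not set: {', '.join(missing)}")
else:
    print("All service principal variables are set.")
```

An empty result only shows that the variables exist; it does not verify that the values are valid credentials.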

## Method 1: Azure Virtual Machines Configuration

### Create Azure VM for Clustrix

In [None]:
def create_clustrix_vm(resource_group, vm_name, location='eastus', vm_size='Standard_D4s_v3'):
    """
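    Build the Azure CLI commands and cloud-init script for a Clustrix VM.

    This function does not create any Azure resources itself; it only
    returns the commands and script for you to run.
    
    Args:
        resource_group: Azure resource group name
        vm_name: Name for the VM
        location: Azure region
        vm_size: VM size (CPU/memory configuration)
    
    Returns:
        Dict with the VM settings, the CLI commands, and the cloud-init script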
    """
    # Cloud-init script for VM setup ('#cloud-config' must be the first line of the file)
    cloud_init_script = '''#cloud-config
package_update: true
packages:
  - python3
  - python3-pip
  - git
  - htop

runcmd:
  # Install clustrix and common packages
  - pip3 install clustrix numpy scipy pandas scikit-learn
  
  # Install uv for faster package management
  - curl -LsSf https://astral.sh/uv/install.sh | sh
  
  # Create clustrix user
  - useradd -m -s /bin/bash clustrix
  - usermod -aG sudo clustrix
  - echo "clustrix ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
  
  # Setup SSH for clustrix user
  - mkdir -p /home/clustrix/.ssh
  - cp /home/azureuser/.ssh/authorized_keys /home/clustrix/.ssh/
  - chown -R clustrix:clustrix /home/clustrix/.ssh
  - chmod 700 /home/clustrix/.ssh
  - chmod 600 /home/clustrix/.ssh/authorized_keys
  
  # Create working directory
  - mkdir -p /tmp/clustrix
  - chown clustrix:clustrix /tmp/clustrix
'''
    
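    # Example Azure CLI commands (printed for reference; run them in a terminal).
    # The raw f-string keeps the trailing backslashes so the line breaks survive printing.
    azure_commands = rf"""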
# Create resource group
az group create --name {resource_group} --location {location}

# Create VM with cloud-init
az vm create \
  --resource-group {resource_group} \
  --name {vm_name} \
  --image Ubuntu2204 \
  --size {vm_size} \
  --admin-username azureuser \
  --generate-ssh-keys \
  --custom-data cloud-init.txt \
  --public-ip-sku Standard \
  --tags Purpose=Clustrix Environment=Tutorial

# Get public IP
az vm show \
  --resource-group {resource_group} \
  --name {vm_name} \
  --show-details \
  --query publicIps \
  --output tsv
"""
    
    return {
        'resource_group': resource_group,
        'vm_name': vm_name,
        'location': location,
        'vm_size': vm_size,
        'commands': azure_commands,
        'cloud_init': cloud_init_script
    }

# Example usage
vm_config = create_clustrix_vm(
    resource_group='clustrix-tutorial-rg',
    vm_name='clustrix-vm-01',
    location='eastus',
    vm_size='Standard_D4s_v3'  # 4 vCPUs, 16 GB RAM
)

print("Azure VM Creation Commands:")
print(vm_config['commands'])
print("\nCloud-init script:")
print(vm_config['cloud_init'])

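**Save the cloud-init script as `cloud-init.txt` in your working directory (the `az vm create` command reads it via `--custom-data`), then execute the Azure CLI commands to create your VM. The cloud-init script will automatically configure Clustrix on the new VM.**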
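
The `az vm create` command above reads the script from `cloud-init.txt`, so the string returned by `create_clustrix_vm` has to be written to disk first. The following is a minimal sketch of that step; `write_cloud_init` is a hypothetical helper, and the example writes to a temporary directory only so that it can run anywhere.

```python
import tempfile
from pathlib import Path


def write_cloud_init(script, path="cloud-init.txt"):
    """Write a cloud-init script to disk with '#cloud-config' on the first line."""
    text = script.lstrip()  # drop leading blank lines so the header comes first
    if not text.startswith("#cloud-config"):
        raise ValueError("cloud-init script must start with '#cloud-config'")
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target


# Tiny inline script for demonstration; in the notebook, pass vm_config['cloud_init'].
example_script = """
#cloud-config
packages:
  - python3
"""

with tempfile.TemporaryDirectory() as tmp:
    written = write_cloud_init(example_script, Path(tmp) / "cloud-init.txt")
    first_line = written.read_text(encoding="utf-8").splitlines()[0]

print(first_line)
```

In practice you would call `write_cloud_init(vm_config['cloud_init'])` from the directory where you run the Azure CLI commands.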

### Configure Clustrix for Azure VM

In [None]:
# Configure Clustrix to use your Azure VM
configure(
    cluster_type="ssh",
    cluster_host="your-vm-public-ip",  # Replace with actual IP from az vm show
    username="clustrix",  # or "azureuser" if using default user
    key_file="~/.ssh/id_rsa",  # Azure CLI generated key
    remote_work_dir="/tmp/clustrix",
    package_manager="auto",  # Will use uv if available
    default_cores=4,
    default_memory="8GB",
    default_time="01:00:00"
)

**Replace `your-vm-public-ip` with the actual public IP from your Azure VM.**
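
Copying the IP by hand is error-prone, so it may be worth validating the value before passing it to `configure`. The sketch below builds the same `az vm show` call used earlier and checks its output with the standard `ipaddress` module. Both function names are hypothetical, and splitting on commas assumes that several public IPs would be reported as a comma-separated string.

```python
import ipaddress


def az_public_ip_command(resource_group, vm_name):
    """Build the argument list for the 'az vm show' call used in this tutorial."""
    return [
        "az", "vm", "show",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--show-details",
        "--query", "publicIps",
        "--output", "tsv",
    ]


def parse_public_ip(cli_output):
    """Return the first public IP in the CLI output, validated as an IP address."""
    parts = [p.strip() for p in cli_output.replace(",", "\n").splitlines() if p.strip()]
    if not parts:
        raise ValueError("The VM reported no public IP address")
    # ip_address raises ValueError if the text is not a well-formed address
    return str(ipaddress.ip_address(parts[0]))


# Usage (requires the Azure CLI and an existing VM):
#   import subprocess
#   cmd = az_public_ip_command("clustrix-tutorial-rg", "clustrix-vm-01")
#   output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
#   configure(cluster_type="ssh", cluster_host=parse_public_ip(output), username="clustrix")
print(parse_public_ip("203.0.113.10\n"))
```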

### Example: Remote Computation on Azure VM

In [None]:
@cluster(cores=2, memory="4GB")
def azure_numerical_analysis(matrix_size=1000, iterations=10):
    """Perform numerical analysis on Azure VM."""
    import numpy as np
    import time
    
    results = []
    
    for i in range(iterations):
        # Generate random matrix
        matrix = np.random.rand(matrix_size, matrix_size)
        
        # Perform eigenvalue decomposition
        start_time = time.time()
        eigenvalues = np.linalg.eigvals(matrix)
        computation_time = time.time() - start_time
        
        results.append({
            'iteration': i + 1,
            'max_eigenvalue': float(np.max(eigenvalues.real)),
            'min_eigenvalue': float(np.min(eigenvalues.real)),
            'computation_time': computation_time
        })
    
    return {
        'matrix_size': matrix_size,
        'total_iterations': iterations,
        'average_time': np.mean([r['computation_time'] for r in results]),
        'results': results
    }

# Run computation on Azure VM
# result = azure_numerical_analysis(matrix_size=500, iterations=5)
# print(f"Completed {result['total_iterations']} iterations")
# print(f"Average computation time: {result['average_time']:.3f} seconds")
print("Example function defined. Uncomment the lines above to run on your Azure VM.")

## Method 2: Azure Batch Configuration

Azure Batch provides managed job scheduling for large-scale parallel workloads:

In [None]:
def setup_azure_batch_environment():
    """
    Template for setting up Azure Batch environment.
    This requires manual setup through Azure portal or CLI.
    """
    
    # Raw string: keeps the trailing backslashes when the commands are printed
    batch_setup_commands = r"""
# Create Azure Batch account
az batch account create \
  --name clustrixbatch \
  --resource-group clustrix-tutorial-rg \
  --location eastus

# Create storage account for Batch
az storage account create \
  --name clustrixstorage \
  --resource-group clustrix-tutorial-rg \
  --location eastus \
  --sku Standard_LRS

# Link storage to Batch account
az batch account set \
  --name clustrixbatch \
  --resource-group clustrix-tutorial-rg \
  --storage-account clustrixstorage

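# Log in to the Batch account (required before pool and job commands)
az batch account login \
  --name clustrixbatch \
  --resource-group clustrix-tutorial-rg \
  --shared-key-auth

# Create Batch pool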
az batch pool create \
  --id clustrix-pool \
  --vm-size Standard_D2s_v3 \
  --target-dedicated-nodes 2 \
  --image canonical:0001-com-ubuntu-server-jammy:22_04-lts \
  --node-agent-sku-id "batch.node.ubuntu 22.04"

# Create Batch job
az batch job create \
  --id clustrix-job \
  --pool-id clustrix-pool
"""
    
    batch_config = {
        'account_name': 'clustrixbatch',
        'account_url': 'https://clustrixbatch.eastus.batch.azure.com',
        'resource_group': 'clustrix-tutorial-rg',
        'pool_id': 'clustrix-pool',
        'job_id': 'clustrix-job'
    }
    
    print("Azure Batch Setup Commands:")
    print(batch_setup_commands)
    print("\nBatch Configuration:")
    print(json.dumps(batch_config, indent=2))
    
    return batch_config

batch_config = setup_azure_batch_environment()
print("\nNote: Azure Batch integration with Clustrix would require custom implementation.")
print("Consider using Azure CycleCloud for HPC workloads instead.")
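
Batch and storage account names such as `clustrixbatch` and `clustrixstorage` must be 3 to 24 characters long and contain only lowercase letters and digits, so a quick local check can save a failed CLI call. The sketch below validates a name and builds an endpoint in the same shape as `batch_config['account_url']`. The function names are hypothetical.

```python
import re

# Batch and storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name):
    """Return True if the name satisfies the account naming rule above."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


def batch_account_url(account_name, region):
    """Build the Batch endpoint URL in the form used by batch_config['account_url']."""
    if not is_valid_account_name(account_name):
        raise ValueError(f"Invalid account name: {account_name!r}")
    return f"https://{account_name}.{region}.batch.azure.com"


print(batch_account_url("clustrixbatch", "eastus"))
print(is_valid_account_name("Clustrix-Storage"))  # uppercase and hyphen are rejected
```

This check covers only the syntax of the name. The names must also be unique, which only Azure itself can confirm.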

## Method 3: Azure CycleCloud Integration

Azure CycleCloud is designed for HPC workloads and provides SLURM integration:

In [None]:
# Azure CycleCloud cluster template for Clustrix
cyclecloud_template = """
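# CycleCloud SLURM cluster template (abbreviated)
# Save as clustrix-slurm.txt and import into CycleCloud.
# Parameters referenced below but not defined here (for example $Credentials,
# $SubnetId, $ReturnProxy, $UsePublicNetwork) must be added before import.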

[cluster clustrix-slurm]
FormLayout = selectionpanel
Category = Schedulers
IconUrl = static/cloud/cluster/ui/ClusterIcon/slurm.png

    [[node defaults]]
    UsePublicNetwork = false
    Credentials = $Credentials
    SubnetId = $SubnetId
    Region = $Region
    KeyPairLocation = ~/.ssh/cyclecloud.pem
    
    # Install clustrix on all nodes
    [[[configuration]]]
    clustrix.version = latest
    
    [[[cluster-init clustrix:default:1.0.0]]]
    
    [[node master]]
    MachineType = $MasterMachineType
    IsReturnProxy = $ReturnProxy
    AdditionalClusterInitSpecs = $MasterClusterInitSpecs
    
        [[[configuration]]]
        slurm.version = $configuration_slurm_version
        
        [[[cluster-init slurm:master:2.7.2]]]
        
        [[[network-interface eth0]]]
        AssociatePublicIpAddress = $UsePublicNetwork

    [[nodearray execute]]
    MachineType = $ExecuteMachineType
    MaxCoreCount = $MaxExecuteCoreCount
    Interruptible = $UseLowPrio
    AdditionalClusterInitSpecs = $ExecuteClusterInitSpecs
    
        [[[configuration]]]
        slurm.version = $configuration_slurm_version
        
        [[[cluster-init slurm:execute:2.7.2]]]
        
        [[[network-interface eth0]]]
        AssociatePublicIpAddress = false

[parameters About]
Order = 1

    [[parameters About Clustrix]]
    
        [[[parameter clustrix]]]
        HideLabel = true
        Config.Plugin = pico.widget.HtmlTemplateWidget
        Config.Template = "Clustrix-enabled SLURM cluster for distributed computing"

[parameters Required Settings]
Order = 10

    [[parameters Virtual Machines]]
    Description = "Configure the VM types and sizes"
    Order = 20

        [[[parameter Region]]]
        Label = Region
        Description = Deployment Location
        ParameterType = Cloud.Region
        DefaultValue = eastus

        [[[parameter MasterMachineType]]]
        Label = Master VM Type
        Description = Master node VM type
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_D4s_v3

        [[[parameter ExecuteMachineType]]]
        Label = Execute VM Type
        Description = Execute node VM type
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_H16r

"""

def configure_for_cyclecloud(master_ip, cluster_name="clustrix-slurm"):
    """Configure Clustrix to use Azure CycleCloud SLURM cluster."""
    configure(
        cluster_type="slurm",
        cluster_host=master_ip,
        username="cyclecloud",  # Default CycleCloud user
        key_file="~/.ssh/cyclecloud.pem",
        remote_work_dir="/shared/clustrix",  # Use shared storage
        package_manager="uv",
        module_loads=["python3"],
        environment_variables={
            "CLUSTRIX_CLUSTER": cluster_name
        },
        default_cores=8,
        default_memory="16GB",
        default_time="02:00:00",
        default_partition="hpc"
    )
    print(f"Configured Clustrix for CycleCloud cluster: {cluster_name}")

print("CycleCloud Template:")
print(cyclecloud_template)

# Example configuration
# configure_for_cyclecloud("10.1.0.4", "my-clustrix-cluster")
print("\nCycleCloud provides the best HPC integration for Azure + Clustrix.")

## Data Management with Azure Blob Storage

In [None]:
@cluster(cores=2, memory="4GB")
def process_blob_data(storage_account, container_name, input_blob, output_blob, storage_key=None):
    """Process data from Azure Blob Storage and save results back."""
    from azure.storage.blob import BlobServiceClient
    from azure.identity import DefaultAzureCredential
    import numpy as np
    import pickle
    import io
    import time  # Needed for the processing timestamp below
    
    # Initialize Blob Service Client
    if storage_key:
        account_url = f"https://{storage_account}.blob.core.windows.net"
        blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_key)
    else:
        # Use managed identity or Azure CLI authentication
        account_url = f"https://{storage_account}.blob.core.windows.net"
        credential = DefaultAzureCredential()
        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    
    # Download data from blob storage
    # Note: only unpickle blobs you trust; pickle can execute arbitrary code on load
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=input_blob)
    blob_data = blob_client.download_blob()
    data = pickle.loads(blob_data.readall())
    
    # Process the data
    processed_data = {
        'original_shape': data.shape if hasattr(data, 'shape') else len(data),
        'mean': float(np.mean(data)) if hasattr(data, '__iter__') else float(data),
        'std': float(np.std(data)) if hasattr(data, '__iter__') else 0.0,
        'max': float(np.max(data)) if hasattr(data, '__iter__') else float(data),
        'min': float(np.min(data)) if hasattr(data, '__iter__') else float(data),
        'processing_timestamp': time.time(),
        'processed_on': 'azure-vm'
    }
    
    # Upload results to blob storage
    output_buffer = io.BytesIO()
    pickle.dump(processed_data, output_buffer)
    output_buffer.seek(0)
    
    output_blob_client = blob_service_client.get_blob_client(container=container_name, blob=output_blob)
    output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)
    
    return f"Processed data saved to blob: {output_blob}"

# Utility functions for Azure Blob Storage
# These run locally, so io and pickle must be imported at notebook level
import io
import pickle

def upload_to_blob(data, storage_account, container_name, blob_name, storage_key=None):
    """Upload data to Azure Blob Storage."""
    if storage_key:
        account_url = f"https://{storage_account}.blob.core.windows.net"
        blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_key)
    else:
        account_url = f"https://{storage_account}.blob.core.windows.net"
        credential = DefaultAzureCredential()
        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    
    buffer = io.BytesIO()
    pickle.dump(data, buffer)
    buffer.seek(0)
    
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    blob_client.upload_blob(buffer.getvalue(), overwrite=True)
    print(f"Data uploaded to blob: {blob_name}")

def download_from_blob(storage_account, container_name, blob_name, storage_key=None):
    """Download data from Azure Blob Storage."""
    if storage_key:
        account_url = f"https://{storage_account}.blob.core.windows.net"
        blob_service_client = BlobServiceClient(account_url=account_url, credential=storage_key)
    else:
        account_url = f"https://{storage_account}.blob.core.windows.net"
        credential = DefaultAzureCredential()
        blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    blob_data = blob_client.download_blob()
    return pickle.loads(blob_data.readall())

# Example usage:
# sample_data = np.random.rand(1000, 50)
# upload_to_blob(sample_data, 'yourstorageaccount', 'data', 'input/sample.pkl')
# result = process_blob_data('yourstorageaccount', 'data', 'input/sample.pkl', 'output/results.pkl')
print("Azure Blob Storage integration functions defined.")
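
The functions above use `pickle`, which can execute arbitrary code when loading untrusted data. For results made up of plain numbers, strings, and lists, such as the summary dictionary built in `process_blob_data`, JSON is a safer alternative. The following is a minimal sketch with hypothetical helper names; it assumes tuples such as `data.shape` are converted to lists before encoding.

```python
import json


def results_to_json_bytes(results):
    """Encode a dictionary of plain Python values as UTF-8 JSON bytes."""
    return json.dumps(results, sort_keys=True).encode("utf-8")


def results_from_json_bytes(payload):
    """Decode bytes produced by results_to_json_bytes back into a dictionary."""
    return json.loads(payload.decode("utf-8"))


# Example summary; the shape is stored as a list because JSON has no tuple type.
summary = {"mean": 0.5, "std": 0.1, "original_shape": [1000, 50]}
payload = results_to_json_bytes(summary)
restored = results_from_json_bytes(payload)
print(restored)
```

The resulting bytes can be passed to `upload_blob` in place of the pickled buffer. NumPy arrays would still need a different format, since JSON cannot encode them directly.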

## Azure Machine Learning Compute Integration

In [None]:
def setup_azure_ml_compute():
    """
    Template for setting up Azure ML compute clusters.
    These can be used with Clustrix for ML workloads.
    """
    
    # Raw string: keeps the trailing backslashes when the commands are printed
    aml_setup_commands = r"""
# Create Azure ML workspace
az ml workspace create \
  --name clustrix-ml-workspace \
  --resource-group clustrix-tutorial-rg \
  --location eastus

# Create compute cluster
az ml compute create \
  --name clustrix-compute \
  --type amlcompute \
  --min-instances 0 \
  --max-instances 4 \
  --size Standard_DS3_v2 \
  --workspace-name clustrix-ml-workspace \
  --resource-group clustrix-tutorial-rg

# Create compute instance for development
az ml compute create \
  --name clustrix-dev-instance \
  --type computeinstance \
  --size Standard_DS3_v2 \
  --workspace-name clustrix-ml-workspace \
  --resource-group clustrix-tutorial-rg
"""
    
    print("Azure ML Compute Setup Commands:")
    print(aml_setup_commands)
    
    return {
        'workspace': 'clustrix-ml-workspace',
        'compute_cluster': 'clustrix-compute',
        'compute_instance': 'clustrix-dev-instance'
    }

@cluster(cores=4, memory="8GB")
def azure_ml_training_job(dataset_params, model_params):
    """Example ML training job that could run on Azure ML compute."""
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_classification
    import time
    
    # Generate synthetic dataset (in real scenario, load from Azure ML datasets)
    X, y = make_classification(
        n_samples=dataset_params['n_samples'],
        n_features=dataset_params['n_features'],
        n_classes=dataset_params['n_classes'],
        random_state=42
    )
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
    # Train model
    start_time = time.time()
    model = RandomForestClassifier(**model_params)
    model.fit(X_train, y_train)
    training_time = time.time() - start_time
    
    # Evaluate
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    return {
        'accuracy': accuracy,
        'training_time': training_time,
        'training_samples': len(X_train),
        'test_samples': len(X_test),
        'feature_importance': model.feature_importances_.tolist()[:10],  # First 10 features (not sorted by importance)
        'model_params': model_params,
        'dataset_params': dataset_params
    }

aml_config = setup_azure_ml_compute()

# Example usage:
# dataset_config = {'n_samples': 10000, 'n_features': 20, 'n_classes': 3}
# model_config = {'n_estimators': 100, 'max_depth': 10, 'random_state': 42, 'n_jobs': -1}
# result = azure_ml_training_job(dataset_config, model_config)
# print(f"Model trained with accuracy: {result['accuracy']:.4f}")

print("Azure ML integration example defined.")
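
A common pattern is to submit the same training function with several hyperparameter combinations. The sketch below expands a grid of values into one `model_params` dictionary per combination, using only the standard library. `expand_param_grid` is a hypothetical helper and the grid values are illustrative.

```python
from itertools import product


def expand_param_grid(grid):
    """Return one dictionary per combination of the values in grid."""
    keys = list(grid)
    combinations = product(*(grid[key] for key in keys))
    return [dict(zip(keys, values)) for values in combinations]


grid = {"n_estimators": [50, 100], "max_depth": [5, 10]}
model_configs = expand_param_grid(grid)

for config in model_configs:
    print(config)

# Each entry could then be submitted as a separate job, for example:
#   results = [azure_ml_training_job(dataset_config, cfg) for cfg in model_configs]
```

Whether these calls run in parallel depends on how Clustrix dispatches them. The list comprehension in the comment submits them one after another.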

## Security Best Practices

In [None]:
def setup_azure_security():
    """
    Security configuration templates for Azure resources.
    """
    
    # Raw string: keeps the trailing backslashes when the commands are printed
    security_commands = r"""
# Create Network Security Group with restrictive rules
az network nsg create \
  --name clustrix-nsg \
  --resource-group clustrix-tutorial-rg

# Allow SSH only from your IP
az network nsg rule create \
  --name SSH \
  --nsg-name clustrix-nsg \
  --resource-group clustrix-tutorial-rg \
  --priority 1000 \
  --source-address-prefixes YOUR_IP/32 \
  --source-port-ranges '*' \
  --destination-address-prefixes '*' \
  --destination-port-ranges 22 \
  --access Allow \
  --protocol Tcp

# Create Key Vault for secrets (names are limited to 24 characters and must be globally unique)
az keyvault create \
  --name clustrix-kv-$(date +%s) \
  --resource-group clustrix-tutorial-rg \
  --location eastus \
  --enabled-for-disk-encryption true

# Enable managed identity for VMs
az vm identity assign \
  --name clustrix-vm-01 \
  --resource-group clustrix-tutorial-rg

# Deny public network access to storage by default (pair with network rules or a private endpoint)
az storage account update \
  --name clustrixstorage \
  --resource-group clustrix-tutorial-rg \
  --default-action Deny
"""
    
    security_checklist = """
Azure Security Checklist for Clustrix:

✓ Use Microsoft Entra ID (formerly Azure Active Directory) for authentication
✓ Enable managed identities instead of service principals when possible
✓ Restrict Network Security Groups to your IP address only
✓ Use private endpoints for storage accounts
✓ Enable disk encryption for all VMs
✓ Use Azure Key Vault for secrets and certificates
✓ Review Microsoft Defender for Cloud (formerly Azure Security Center) recommendations
✓ Use Azure Private Link for service connectivity
✓ Enable diagnostic logging and monitoring
✓ Implement Azure Policy for compliance
✓ Enable Microsoft Defender for Cloud workload protection plans (formerly Azure Defender)
✓ Regularly rotate access keys and certificates
✓ Set up cost alerts and spending limits
✓ Tag all resources for governance and cost tracking
"""
    
    print("Azure Security Setup Commands:")
    print(security_commands)
    print("\nSecurity Checklist:")
    print(security_checklist)
    
    return {
        'nsg_name': 'clustrix-nsg',
        'keyvault_name': 'clustrix-keyvault',
        'security_commands': security_commands
    }

security_config = setup_azure_security()
print("Security configuration templates generated.")
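
The NSG rule above expects `YOUR_IP/32` to be replaced with a single-host CIDR block. The sketch below derives that value from an address string with the standard `ipaddress` module, so a typo is caught before the rule is created. `single_host_cidr` is a hypothetical helper name.

```python
import ipaddress


def single_host_cidr(ip):
    """Return a CIDR block that matches exactly one host (/32 for IPv4, /128 for IPv6)."""
    address = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    prefix = 32 if address.version == 4 else 128
    return f"{address}/{prefix}"


# 203.0.113.7 is a documentation-only example address
print(single_host_cidr("203.0.113.7"))
```

The returned string can be substituted for `YOUR_IP/32` in the `--source-address-prefixes` argument.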

## Cost Management and Optimization

In [None]:
def azure_cost_optimization_guide():
    """
    Cost optimization strategies for Azure + Clustrix.
    """
    
    cost_tips = """
Azure Cost Optimization for Clustrix:

1. Compute Optimization:
   - Use Azure Spot VMs for non-critical workloads (up to 90% savings)
   - Choose B-series burstable VMs for variable workloads
   - Use reserved instances for predictable workloads (1-3 year terms)
   - Enable auto-shutdown for dev/test VMs
   - Right-size VMs based on actual usage

2. Storage Optimization:
   - Use appropriate storage tiers (Hot, Cool, Archive)
   - Enable lifecycle management for blob storage
   - Use managed disks with appropriate performance tiers
   - Implement data deduplication and compression

3. Network Optimization:
   - Minimize data transfer between regions
   - Use Azure CDN for static content
   - Optimize data transfer patterns

4. Monitoring and Management:
   - Set up budget alerts and spending limits
   - Use Azure Cost Management + Billing
   - Implement proper resource tagging
   - Regular cost reviews and optimizations

5. Service-Specific:
   - Use Azure Functions for small, event-driven tasks
   - Consider Azure Container Instances for short-running jobs
   - Use Azure Batch for large-scale parallel processing
"""
    
    # Raw string: keeps the trailing backslashes when the commands are printed
    cost_monitoring_commands = r"""
# Set up a monthly cost budget
az consumption budget create \
  --budget-name clustrix-monthly-budget \
  --amount 100 \
  --category cost \
  --time-grain monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31

# Get current costs
az consumption usage list \
  --start-date 2025-01-01 \
  --end-date 2025-01-31

# List costs grouped by resource group (requires the 'costmanagement' CLI extension)
az costmanagement query \
  --type Usage \
  --scope "/subscriptions/your-subscription-id" \
  --timeframe MonthToDate \
  --dataset-aggregation '{"totalCost":{"name":"PreTaxCost","function":"Sum"}}' \
  --dataset-grouping name=ResourceGroup type=Dimension
"""
    
    print(cost_tips)
    print("\nCost Monitoring Commands:")
    print(cost_monitoring_commands)
    
    return {
        'recommendations': [
            'Use Spot VMs for batch processing',
            'Enable auto-shutdown for dev resources',
            'Implement lifecycle policies for storage',
            'Set up budget alerts',
            'Regular cost reviews'
        ]
    }

# Example Spot VM configuration for cost savings
def configure_spot_vm():
    """Example configuration for using Azure Spot VMs."""
    configure(
        cluster_type="ssh",
        cluster_host="your-spot-vm-ip",
        username="azureuser",
        key_file="~/.ssh/id_rsa",
        remote_work_dir="/tmp/clustrix",
        # Spot VMs can be evicted, so use shorter timeouts
        default_time="00:30:00",
        job_poll_interval=60,  # Seconds between status checks; lower this to notice evictions sooner
        cleanup_on_success=True  # Clean up quickly
    )
    print("Configured for Azure Spot VMs with appropriate timeouts.")

cost_guide = azure_cost_optimization_guide()
print("\nCost optimization guide generated.")
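
Because Spot VMs can be evicted mid-job, callers usually need some form of retry. The sketch below is a generic retry wrapper, not a Clustrix feature: `run_with_retries` is a hypothetical helper, and the exception raised on eviction is an assumption, simulated here with `ConnectionError`.

```python
import time


def run_with_retries(task, attempts=3, delay_seconds=0.0, retry_on=(Exception,)):
    """Call task() until it succeeds or the attempt budget is exhausted."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except retry_on as error:
            last_error = error
            print(f"Attempt {attempt} failed: {error}")
            if attempt < attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    raise RuntimeError(f"Task failed after {attempts} attempts") from last_error


# Simulated job that fails twice (as if the VM were evicted) and then succeeds.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated eviction")
    return "done"


outcome = run_with_retries(flaky_task, attempts=5)
print(outcome, "after", calls["count"], "calls")
```

For real workloads, a non-zero `delay_seconds` gives a replacement VM time to start, and the job itself should be safe to rerun from the beginning.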

## Resource Cleanup

In [None]:
def cleanup_azure_resources(resource_group='clustrix-tutorial-rg'):
    """
    Clean up Azure resources to avoid ongoing charges.
    
    Args:
        resource_group: Name of the resource group to clean up
    """
    
    cleanup_commands = f"""
# List all resources in the resource group
az resource list --resource-group {resource_group} --output table

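# Stop (deallocate) the tutorial VM
az vm deallocate --resource-group {resource_group} --name clustrix-vm-01

# Delete specific resources (optional)
az vm delete --resource-group {resource_group} --name clustrix-vm-01 --yes
# Disk names carry a generated suffix and wildcards are not expanded; look the name up first
az disk list --resource-group {resource_group} --query "[].name" --output tsv
az disk delete --resource-group {resource_group} --name <disk-name-from-list> --yes
az network public-ip delete --resource-group {resource_group} --name clustrix-vm-01PublicIP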

# Delete the entire resource group (WARNING: This deletes everything!)
az group delete --name {resource_group} --yes --no-wait

# Verify deletion
az group list --output table
"""
    
    print(f"Azure Resource Cleanup Commands for Resource Group: {resource_group}")
    print(cleanup_commands)
    print("\n⚠️  WARNING: The 'az group delete' command will permanently delete ALL resources in the group!")
    print("Review the resources first with 'az resource list' before proceeding.")
    
    return {
        'resource_group': resource_group,
        'cleanup_commands': cleanup_commands
    }

cleanup_info = cleanup_azure_resources()
print("\nCleanup commands generated. Remember to clean up resources to avoid charges!")
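
Since `az group delete` is irreversible, one option is to require the resource group name to be typed again before the command is even assembled. The following is a minimal sketch of such a guard; `build_group_delete_command` is a hypothetical helper and only builds the argument list without running it.

```python
def build_group_delete_command(resource_group, typed_confirmation):
    """Return the delete command only if the confirmation matches the group name."""
    if typed_confirmation != resource_group:
        raise ValueError("Confirmation does not match the resource group name")
    return ["az", "group", "delete", "--name", resource_group, "--yes", "--no-wait"]


command = build_group_delete_command("clustrix-tutorial-rg", "clustrix-tutorial-rg")
print(" ".join(command))

# To execute it (requires the Azure CLI):
#   import subprocess
#   subprocess.run(command, check=True)
```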

## Advanced Example: Distributed Image Processing

In [None]:
@cluster(cores=4, memory="8GB", time="00:45:00")
def azure_image_processing_pipeline(storage_config, processing_params):
    """
    Distributed image processing pipeline using Azure Blob Storage.
    """
    from azure.storage.blob import BlobServiceClient
    from azure.identity import DefaultAzureCredential
    import numpy as np
    from PIL import Image
    import io
    import time
    
    # Connect to Azure Blob Storage
    account_url = f"https://{storage_config['account_name']}.blob.core.windows.net"
    credential = DefaultAzureCredential()
    blob_service_client = BlobServiceClient(account_url=account_url, credential=credential)
    
    container_client = blob_service_client.get_container_client(storage_config['container'])
    
    processed_images = []
    processing_stats = []
    
    # List images to process
    blob_list = container_client.list_blobs(name_starts_with=storage_config['input_prefix'])
    
    for blob in blob_list:
        if blob.name.lower().endswith(('.png', '.jpg', '.jpeg')):
            start_time = time.time()
            
            try:
                # Download image
                blob_client = blob_service_client.get_blob_client(
                    container=storage_config['container'], blob=blob.name
                )
                image_data = blob_client.download_blob().readall()
                
                # Process image
                image = Image.open(io.BytesIO(image_data))
                original_size = image.size  # Record dimensions before any transformation
                
                # Apply processing operations
                if processing_params.get('resize'):
                    image = image.resize(processing_params['resize'])
                
                if processing_params.get('grayscale'):
                    image = image.convert('L')
                
                if processing_params.get('rotate'):
                    image = image.rotate(processing_params['rotate'])
                
                # Convert back to bytes
                output_buffer = io.BytesIO()
                image.save(output_buffer, format='PNG')
                output_buffer.seek(0)
                
                # Upload processed image (swap only the leading input prefix)
                output_blob_name = blob.name.replace(
                    storage_config['input_prefix'], 
                    storage_config['output_prefix'],
                    1
                )
                
                output_blob_client = blob_service_client.get_blob_client(
                    container=storage_config['container'], blob=output_blob_name
                )
                output_blob_client.upload_blob(output_buffer.getvalue(), overwrite=True)
                
                processing_time = time.time() - start_time
                
                processed_images.append(output_blob_name)
                processing_stats.append({
                    'input_blob': blob.name,
                    'output_blob': output_blob_name,
                    'processing_time': processing_time,
                    'original_size': original_size,
                    'processed_size': image.size
                })
                
            except Exception as e:
                print(f"Error processing {blob.name}: {e}")
    
    return {
        'processed_count': len(processed_images),
        'total_processing_time': sum(stat['processing_time'] for stat in processing_stats),
        'average_processing_time': np.mean([stat['processing_time'] for stat in processing_stats]) if processing_stats else 0,
        'processed_images': processed_images[:10],  # First 10 for brevity
        'processing_stats': processing_stats[:5]  # First 5 for brevity
    }

# Example usage:
# storage_config = {
#     'account_name': 'yourstorageaccount',
#     'container': 'images',
#     'input_prefix': 'raw/',
#     'output_prefix': 'processed/'
# }
# 
# processing_config = {
#     'resize': (800, 600),
#     'grayscale': True,
#     'rotate': 0
# }
# 
# result = azure_image_processing_pipeline(storage_config, processing_config)
# print(f"Processed {result['processed_count']} images in {result['total_processing_time']:.2f} seconds")

print("Advanced image processing pipeline example defined.")
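
The pipeline saves every image as PNG but derives the output name from the input name, so a file such as `raw/photo.jpg` keeps its `.jpg` extension. The sketch below shows one way to map names so that only the leading prefix is swapped and the extension matches the saved format. `output_blob_name` is a hypothetical helper, and the `.png` default is an assumption based on the `format='PNG'` call above.

```python
import posixpath


def output_blob_name(input_name, input_prefix, output_prefix, extension=".png"):
    """Map an input blob name to its output name under a different prefix."""
    if not input_name.startswith(input_prefix):
        raise ValueError(f"{input_name!r} does not start with {input_prefix!r}")
    relative = input_name[len(input_prefix):]  # strip only the leading prefix
    stem, _ = posixpath.splitext(relative)     # drop the original extension
    return f"{output_prefix}{stem}{extension}"


print(output_blob_name("raw/cats/raw/a.jpg", "raw/", "processed/"))
```

Blob names use forward slashes regardless of the local operating system, which is why `posixpath` is used here rather than `os.path`.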

## Summary

This tutorial covered:

1. **Setup**: Azure authentication and Clustrix installation
2. **VM Integration**: Direct Azure VM configuration
3. **Azure Batch**: Managed job scheduling
4. **CycleCloud**: HPC-optimized clusters with SLURM
5. **Blob Storage**: Data storage and retrieval
6. **Azure ML**: Machine learning compute integration
7. **Security**: Best practices for safe deployment
8. **Cost Management**: Strategies to minimize expenses
9. **Resource Management**: Proper cleanup procedures

### Next Steps

- Set up your Azure credentials and test the basic configuration
- Start with a simple VM for initial testing
- Consider CycleCloud for production HPC workloads
- Implement proper monitoring and cost controls
- Explore Azure Spot VMs for cost-effective batch processing

### Azure-Specific Advantages

- **CycleCloud**: Best-in-class HPC cluster management
- **Azure ML**: Integrated machine learning platform
- **Hybrid Cloud**: Seamless integration with on-premises
- **Enterprise Integration**: Active Directory and enterprise tools
- **Compliance**: Strong compliance and security certifications

### Resources

- [Azure CycleCloud Documentation](https://docs.microsoft.com/en-us/azure/cyclecloud/)
- [Azure Batch Documentation](https://docs.microsoft.com/en-us/azure/batch/)
- [Azure Machine Learning Documentation](https://docs.microsoft.com/en-us/azure/machine-learning/)
- [Azure HPC Documentation](https://docs.microsoft.com/en-us/azure/architecture/topics/high-performance-computing/)
- [Clustrix Documentation](https://clustrix.readthedocs.io/)

**Remember**: Always monitor your Azure costs and clean up resources when not in use!