# ClustriX: Distributed Computing Framework Demo

This notebook demonstrates how ClustriX runs Python functions on remote clusters through a single `@cluster` decorator.

## 📋 Overview

ClustriX is a Python framework that allows you to:
- Execute functions on remote clusters (SLURM, PBS, SGE, Kubernetes, SSH)
- Automatically parallelize loops across cluster nodes
- Handle GPU detection and GPU-enabled package installation
- Flatten nested functions for remote serialization
- Manage remote environments automatically

## 🔧 Configuration

First, let's understand the configuration structure:

In [None]:
import yaml
from clustrix.config import ClusterConfig

# Load the actual ndoli configuration files
print("📋 Loading Actual ndoli Configuration Files:")
print("=" * 60)

# Load the standalone ndoli config file
with open('ndoli_config.yml', 'r') as f:
    ndoli_standalone_config = yaml.safe_load(f)

print("🔧 Standalone ndoli_config.yml:")
print(yaml.dump(ndoli_standalone_config, default_flow_style=False, indent=2))

print("\n" + "=" * 60)

# Load the main clustrix.yml file
with open('clustrix.yml', 'r') as f:
    main_config = yaml.safe_load(f)

print("🔧 Main clustrix.yml (Ndoli Cluster section):")
ndoli_main_config = main_config.get('Ndoli Cluster', {})
print(yaml.dump(ndoli_main_config, default_flow_style=False, indent=2))

print("\n" + "=" * 60)

# Create a working configuration combining both approaches
# Use the standalone config as the base since it has more complete settings
ndoli_config = ndoli_standalone_config.copy()

# Add some additional settings for the demo
ndoli_config.update({
    # Advanced features for demo
    "auto_parallel": True,
    "use_two_venv": True,
    "gpu_detection_enabled": True,
    "auto_gpu_packages": True,
    "cleanup_remote_files": True,
    
    # VENV setup
    "venv_setup_timeout": 300,
    "conda_env_name": "clustrix_env",
    "use_conda": True,
    
    # GPU settings
    "cuda_version_preference": "11.8",
    "gpu_memory_fraction": 0.9,
    "prefer_gpu_execution": True
})

print("🚀 Working Configuration for Demo:")
print("(Combined from actual config files with demo enhancements)")
print(yaml.dump(ndoli_config, default_flow_style=False, indent=2))
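
The cell above layers demo settings on top of a loaded config using `dict.copy()` and `update()`. Because `dict.copy()` is shallow, nested values such as lists remain shared with the original dict. The sketch below shows one way to layer settings without that sharing. The helper name `layer_config`, the sample base dict, and the `"cuda"` module name are hypothetical and are not part of ClustriX.

```python
import copy


def layer_config(base, overrides):
    """Return a new dict with `overrides` applied on top of a deep copy of `base`."""
    merged = copy.deepcopy(base)  # nested lists/dicts are no longer shared with `base`
    merged.update(overrides)
    return merged


# Hypothetical base settings standing in for the contents of ndoli_config.yml
base = {"cluster_type": "slurm", "module_loads": ["python"]}
demo = layer_config(base, {"auto_parallel": True, "use_conda": True})

demo["module_loads"].append("cuda")  # hypothetical module; mutates the merged copy only
print(base["module_loads"])  # the base list is left untouched
print(sorted(demo))
```

With a plain `base.copy()`, the `append` call would also have changed `base["module_loads"]`.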

## 🎯 The @cluster Decorator

The `@cluster` decorator is the main interface for remote execution. Here's how it works:

### Basic Usage
```python
@cluster(cores=8, memory="16GB", time="02:00:00")
def my_function(data):
    # Replace this placeholder with your own computation
    result = sum(data)
    return result
```

### What the Decorator Does
1. **Serializes** the function and its dependencies using cloudpickle/dill
2. **Uploads** the serialized data to the remote cluster
3. **Creates** a conda/venv environment matching your local environment
4. **Generates** and submits a job script (SLURM, PBS, etc.)
5. **Monitors** job execution and downloads results
6. **Cleans up** remote files (optional)
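
The serialize, execute, and return steps above can be sketched locally with the standard library alone. This is a simplified illustration, not ClustriX's implementation: it uses `marshal` and `types.FunctionType` in place of cloudpickle/dill, handles only functions without closures or keyword-only defaults, and runs both "sides" in the same process. The names `serialize_call`, `run_payload`, and `sum_of_roots` are hypothetical.

```python
import builtins
import marshal
import pickle
import types


def serialize_call(func, args, kwargs):
    """Pack a function's bytecode, defaults, and call arguments into bytes."""
    payload = {
        "name": func.__name__,
        "code": marshal.dumps(func.__code__),
        "defaults": func.__defaults__,
        "args": args,
        "kwargs": kwargs,
    }
    return pickle.dumps(payload)


def run_payload(blob):
    """Rebuild the function from the payload, call it, and pickle the result.

    On a real cluster this part would run inside the submitted job.
    """
    payload = pickle.loads(blob)
    code = marshal.loads(payload["code"])
    func = types.FunctionType(
        code, {"__builtins__": builtins}, payload["name"], payload["defaults"]
    )
    return pickle.dumps(func(*payload["args"], **payload["kwargs"]))


def sum_of_roots(n=10):
    import math  # imported inside so the rebuilt function needs no outer globals

    return sum(math.sqrt(i) for i in range(n))


blob = serialize_call(sum_of_roots, (5,), {})  # "upload" this payload
result = pickle.loads(run_payload(blob))       # "download" the result
print(result)
```

The rebuilt function receives an almost empty globals dict, which is why the examples in this notebook import their modules inside the function body.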

### Advanced Features
- **Loop Parallelization**: Automatically detects and parallelizes `for` loops
- **Function Flattening**: Converts nested functions to flat code for serialization
- **GPU Detection**: Automatically detects and configures GPU resources
- **Environment Management**: Two-VENV architecture for optimal performance

In [None]:
# Let's create a ClusterConfig object
config = ClusterConfig(**ndoli_config)

print("🔧 ClusterConfig Object:")
print(f"  Cluster Type: {config.cluster_type}")
print(f"  Cluster Host: {config.cluster_host}")
print(f"  Username: {config.username}")
print(f"  Remote Work Dir: {config.remote_work_dir}")
print(f"  Auto Parallel: {config.auto_parallel}")
print(f"  GPU Detection: {config.gpu_detection_enabled}")
print(f"  Two VENV Setup: {config.use_two_venv}")

## 🚀 Example 1: Simple Remote Execution

Let's start with a simple function that executes on the cluster:

In [None]:
from clustrix import cluster
import time

@cluster(**ndoli_config)
def simple_cluster_computation(n=1000):
    """Simple computation that runs on the cluster."""
    import math
    import os
    import socket
    import sys
    import time  # imported inside the function so the remote process does not rely on notebook globals
    
    start_time = time.time()
    
    # Perform some computation
    result = sum(math.sqrt(i) for i in range(n))
    
    end_time = time.time()
    
    return {
        "result": result,
        "computation_time": end_time - start_time,
        "hostname": socket.gethostname(),
        "process_id": os.getpid(),
        "working_directory": os.getcwd(),
        "python_version": sys.version,
        "input_size": n
    }

print("🚀 Running simple computation on ndoli cluster...")
print("This will submit a job to the SLURM scheduler and wait for results.")
print("Please be patient - this may take a few minutes.")

# Execute the function
try:
    result = simple_cluster_computation(500)
    
    print("\n✅ Computation completed successfully!")
    print(f"📊 Result: {result['result']:.2f}")
    print(f"⏱️  Computation time: {result['computation_time']:.4f} seconds")
    print(f"🖥️  Executed on: {result['hostname']}")
    print(f"🔢 Process ID: {result['process_id']}")
    print(f"📁 Working directory: {result['working_directory']}")
    print(f"🐍 Python version: {result['python_version'].split()[0]}")
    
except Exception as e:
    print(f"❌ Error during execution: {e}")
    print("This might be due to SSH connection issues or cluster unavailability.")
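
To make the "submit a job to the SLURM scheduler" step more concrete, here is a rough sketch of how resource settings such as `cores`, `memory`, and `time` could be rendered into an `sbatch` script. The `#SBATCH` options are standard Slurm directives, but the function names, the job name, and the `run_payload.py` command are hypothetical; the script ClustriX actually generates may look different.

```python
def to_slurm_memory(memory):
    """Convert a size such as '16GB' to Slurm's single-letter unit form ('16G')."""
    text = memory.strip().upper()
    return text[:-1] if text.endswith(("KB", "MB", "GB", "TB")) else text


def render_sbatch_script(job_name, cores, memory, time_limit, command, partition=None):
    """Build the text of a minimal Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cores}",
        f"#SBATCH --mem={to_slurm_memory(memory)}",
        f"#SBATCH --time={time_limit}",
    ]
    if partition:
        lines.append(f"#SBATCH --partition={partition}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


script = render_sbatch_script(
    job_name="clustrix_demo",          # hypothetical job name
    cores=8,
    memory="16GB",
    time_limit="02:00:00",
    command="python3 run_payload.py",  # hypothetical entry point
    partition="standard",
)
print(script)
```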

## 🔄 Example 2: Loop Parallelization

ClustriX can automatically detect and parallelize loops across cluster nodes:

In [None]:
@cluster(
    cluster_type="slurm",
    cluster_host="ndoli.dartmouth.edu",
    username="f002d6b",
    cores=8,  # Request 8 cores for parallelization
    memory="16GB",
    time="01:00:00",
    remote_work_dir="/dartfs-hpc/rc/home/b/f002d6b/clustrix_parallel",
    auto_parallel=True,  # Enable automatic loop parallelization
    use_env_password=True,
    password_env_var="CLUSTRIX_PASSWORD"
)
def parallel_computation(data_size=100):
    """Function with a loop that can be parallelized."""
    import math
    import time
    import socket
    
    start_time = time.time()
    
    # This loop will be automatically parallelized across cores
    results = []
    for i in range(data_size):  # ClustriX detects this loop
        # Simulate some computation
        value = math.sqrt(i) * math.sin(i) * math.cos(i)
        results.append(value)
    
    end_time = time.time()
    
    return {
        "results_count": len(results),
        "sum_results": sum(results),
        "mean_result": sum(results) / len(results) if results else 0,
        "computation_time": end_time - start_time,
        "hostname": socket.gethostname(),
        "data_size": data_size,
        "message": "Loop was automatically parallelized!"
    }

print("🔄 Running parallel computation on ndoli cluster...")
print("ClustriX will detect the loop and parallelize it across 8 cores.")

try:
    result = parallel_computation(50)
    
    print("\n✅ Parallel computation completed!")
    print(f"📊 Processed {result['results_count']} items")
    print(f"📈 Sum of results: {result['sum_results']:.4f}")
    print(f"📊 Mean result: {result['mean_result']:.4f}")
    print(f"⏱️  Total time: {result['computation_time']:.4f} seconds")
    print(f"🖥️  Executed on: {result['hostname']}")
    print(f"✨ {result['message']}")
    
except Exception as e:
    print(f"❌ Error during parallel execution: {e}")
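
The idea behind loop parallelization can be illustrated without a cluster: split the iteration range into contiguous chunks, run the loop body on each chunk, and concatenate the partial results in order. The sketch below is an assumption about the general technique, not a description of how ClustriX rewrites loops. It uses a thread pool only to keep the example self-contained; CPU-bound work would need processes or separate jobs to see a real speedup. `split_range`, `process_chunk`, and `chunked_map` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into up to `parts` contiguous, near-equal sub-ranges."""
    step = math.ceil(n / parts) if n else 1
    return [range(start, min(start + step, n)) for start in range(0, n, step)]


def process_chunk(indices):
    """The loop body from parallel_computation, applied to one sub-range."""
    return [math.sqrt(i) * math.sin(i) * math.cos(i) for i in indices]


def chunked_map(n, workers=8):
    """Run the loop body chunk by chunk and stitch results back in order."""
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(process_chunk, chunks))  # map preserves chunk order
    return [value for part in parts for value in part]


serial = [math.sqrt(i) * math.sin(i) * math.cos(i) for i in range(50)]
chunked = chunked_map(50, workers=8)
print(chunked == serial)
```

This works here because each iteration is independent; loops that carry state from one iteration to the next cannot be split this way.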

## 🧠 Example 3: Function Flattening (Complex Functions)

ClustriX can handle complex functions with nested functions by automatically "flattening" them:

In [None]:
from clustrix.function_flattening import analyze_function_complexity

@cluster(**ndoli_config)
def complex_nested_function(data_size=50):
    """Function with nested functions that requires flattening."""
    import random
    import socket
    import time
    
    def generate_random_data(size):
        """Generate random data for processing."""
        return [random.random() for _ in range(size)]
    
    def process_data_chunk(chunk):
        """Process a chunk of data."""
        return sum(x * x for x in chunk)
    
    def analyze_results(processed_chunks):
        """Analyze the processed results."""
        if not processed_chunks:
            return {"count": 0, "sum": 0, "mean": 0}
        
        return {
            "count": len(processed_chunks),
            "sum": sum(processed_chunks),
            "mean": sum(processed_chunks) / len(processed_chunks),
            "max": max(processed_chunks),
            "min": min(processed_chunks)
        }
    
    start_time = time.time()
    
    # Main computation using nested functions
    raw_data = generate_random_data(data_size)
    
    # Split into chunks
    chunk_size = 10
    chunks = [raw_data[i:i+chunk_size] for i in range(0, len(raw_data), chunk_size)]
    
    # Process each chunk
    processed = [process_data_chunk(chunk) for chunk in chunks]
    
    # Analyze results
    analysis = analyze_results(processed)
    
    end_time = time.time()
    
    return {
        "data_size": data_size,
        "chunks_processed": len(chunks),
        "analysis": analysis,
        "computation_time": end_time - start_time,
        "hostname": socket.gethostname(),
        "flattening_message": "Nested functions were automatically flattened for remote execution!"
    }

# First, let's analyze the function complexity
print("🧠 Analyzing function complexity...")
complexity = analyze_function_complexity(complex_nested_function)
print(f"📊 Function complexity analysis: {complexity}")

if complexity.get('is_complex', False):
    print("🔧 This function is complex and will be automatically flattened!")
    print(f"📈 Complexity score: {complexity.get('complexity_score', 0)}")
    print(f"🔢 Nested functions: {complexity.get('nested_functions', 0)}")
    print(f"📏 Line count: {complexity.get('line_count', 0)}")
else:
    print("✅ Function is simple and doesn't require flattening.")

print("\n🚀 Running complex function on ndoli cluster...")
try:
    result = complex_nested_function(30)
    
    print("\n✅ Complex function executed successfully!")
    print(f"📊 Data size: {result['data_size']}")
    print(f"📦 Chunks processed: {result['chunks_processed']}")
    print("📈 Analysis results:")
    for key, value in result['analysis'].items():
        print(f"   {key}: {value}")
    print(f"⏱️  Computation time: {result['computation_time']:.4f} seconds")
    print(f"🖥️  Executed on: {result['hostname']}")
    print(f"✨ {result['flattening_message']}")
    
except Exception as e:
    print(f"❌ Error during complex execution: {e}")
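
One ingredient of a complexity analysis like the one above is finding functions defined inside another function. The sketch below does this with the standard `ast` module on a source string. It is an illustration only and makes no claim about how `analyze_function_complexity` is implemented; `SOURCE` and `nested_function_names` are hypothetical.

```python
import ast
import textwrap

# Hypothetical function source with two nested helpers
SOURCE = textwrap.dedent(
    '''
    def outer(data_size=50):
        def generate(size):
            return list(range(size))

        def process(chunk):
            return sum(x * x for x in chunk)

        return process(generate(data_size))
    '''
)


def nested_function_names(source):
    """Return the names of functions defined inside the first top-level function."""
    tree = ast.parse(source)
    outer = next(node for node in tree.body if isinstance(node, ast.FunctionDef))
    return [
        node.name
        for node in ast.walk(outer)
        if isinstance(node, ast.FunctionDef) and node is not outer
    ]


names = nested_function_names(SOURCE)
print(names)
```

A flattening step could then lift each of these definitions to module level, provided they do not capture variables from the enclosing scope.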

## 🎮 Example 4: GPU Detection and Simulation

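ClustriX can detect GPU capabilities and automatically install GPU-enabled packages. The example below probes the allocated node for `nvidia-smi`, `nvcc`, and common GPU packages, then runs a CPU-only mock in place of real GPU work: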

In [None]:
@cluster(
    cluster_type="slurm",
    cluster_host="ndoli.dartmouth.edu",
    username="f002d6b",
    cores=4,
    memory="8GB",
    gpu_detection_enabled=True,
    auto_gpu_packages=True,
    remote_work_dir="/dartfs-hpc/rc/home/b/f002d6b/clustrix_gpu",
    use_env_password=True,
    password_env_var="CLUSTRIX_PASSWORD"
)
def gpu_detection_demo():
    """Demonstrate GPU detection and simulation."""
    import subprocess
    import socket
    import os
    import importlib.util
    import sys
    
    def check_gpu_availability():
        """Check if GPU is available using multiple methods."""
        gpu_info = {
            "nvidia_smi": False,
            "cuda_available": False,
            "gpu_devices": []
        }
        
        # Check nvidia-smi
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=10
            )
            if result.returncode == 0:
                gpu_info["nvidia_smi"] = True
                gpu_info["gpu_devices"] = result.stdout.strip().splitlines()
        except (OSError, subprocess.SubprocessError):
            # nvidia-smi is missing, failed to start, or timed out
            pass
        
        # Check CUDA
        try:
            result = subprocess.run(
                ["nvcc", "--version"], capture_output=True, text=True, timeout=5
            )
            if result.returncode == 0:
                gpu_info["cuda_available"] = True
        except (OSError, subprocess.SubprocessError):
            # nvcc is missing, failed to start, or timed out
            pass
        
        return gpu_info
    
    def check_gpu_packages():
        """Check for GPU-enabled packages."""
        gpu_packages = ["torch", "tensorflow", "cupy", "jax"]
        package_status = {}
        
        for pkg in gpu_packages:
            try:
                spec = importlib.util.find_spec(pkg)
                package_status[pkg] = spec is not None
            except (ImportError, ValueError):
                package_status[pkg] = False
        
        return package_status
    
    def simulate_gpu_computation():
        """Simulate GPU computation (mock)."""
        import random
        import time
        
        start_time = time.time()
        
        # Simulate matrix multiplication
        matrix_size = 100
        result = 0
        for i in range(matrix_size):
            for j in range(matrix_size):
                result += random.random() * random.random()
        
        end_time = time.time()
        
        return {
            "matrix_size": matrix_size,
            "result": result,
            "computation_time": end_time - start_time
        }
    
    # Main execution
    gpu_info = check_gpu_availability()
    packages = check_gpu_packages()
    computation = simulate_gpu_computation()
    
    return {
        "hostname": socket.gethostname(),
        "python_version": sys.version,
        "gpu_detection": gpu_info,
        "gpu_packages": packages,
        "gpu_computation": computation,
        "system_info": {
            "os": os.name,
            "cwd": os.getcwd(),
            "pid": os.getpid()
        }
    }

print("🎮 Running GPU detection demo on ndoli cluster...")
print("This will test GPU detection and package availability.")

try:
    result = gpu_detection_demo()
    
    print("\n✅ GPU detection demo completed!")
    print(f"🖥️  Executed on: {result['hostname']}")
    print(f"🐍 Python version: {result['python_version'].split()[0]}")
    
    print("\n🎮 GPU Detection Results:")
    gpu_info = result['gpu_detection']
    print(f"   nvidia-smi available: {gpu_info['nvidia_smi']}")
    print(f"   CUDA available: {gpu_info['cuda_available']}")
    if gpu_info['gpu_devices']:
        print(f"   GPU devices: {gpu_info['gpu_devices']}")
    
    print("\n📦 GPU Package Status:")
    for pkg, available in result['gpu_packages'].items():
        status = "✅" if available else "❌"
        print(f"   {pkg}: {status}")
    
    print("\n🚀 GPU Computation Simulation:")
    comp = result['gpu_computation']
    print(f"   Matrix size: {comp['matrix_size']}x{comp['matrix_size']}")
    print(f"   Result: {comp['result']:.2f}")
    print(f"   Computation time: {comp['computation_time']:.4f}s")
    
except Exception as e:
    print(f"❌ Error during GPU detection: {e}")
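
The raw `nvidia-smi` lines returned above are easier to work with once parsed into records. The sketch below assumes each line has the form `name, memory` (matching the `--query-gpu=name,memory.total --format=csv,noheader` query used in the demo) and returns an empty list for empty output. The sample text is made up for illustration and is not output from ndoli; `parse_gpu_lines` is a hypothetical helper.

```python
def parse_gpu_lines(output):
    """Parse 'name, memory' CSV lines into a list of dicts, skipping malformed lines."""
    devices = []
    for line in output.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact
        name, _, memory = line.rpartition(",")
        if not name:
            continue  # no comma found: not a 'name, memory' line
        devices.append({"name": name.strip(), "memory_total": memory.strip()})
    return devices


# Made-up sample output, for illustration only
sample = "Hypothetical GPU A, 16384 MiB\nHypothetical GPU B, 40960 MiB\n"
devices = parse_gpu_lines(sample)
print(devices)
print(parse_gpu_lines(""))  # no GPUs reported
```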

## 📊 Example 5: Comparing Local vs Remote Execution

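Let's compare the same computation running locally and on the cluster. `computation_time` is measured inside the function, so the cluster figure excludes queueing, file transfer, and environment setup: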

In [None]:
import time
import socket

def benchmark_computation(n=10000):
    """Benchmark computation for comparison."""
    import math
    
    start_time = time.time()
    
    # Computational task
    result = 0
    for i in range(n):
        result += math.sqrt(i) * math.sin(i / 100) * math.cos(i / 200)
    
    end_time = time.time()
    
    return {
        "result": result,
        "computation_time": end_time - start_time,
        "hostname": socket.gethostname(),
        "iterations": n
    }

# Create cluster version
@cluster(**ndoli_config)
def benchmark_computation_cluster(n=10000):
    """Same computation but on cluster."""
    import math
    import time
    import socket
    
    start_time = time.time()
    
    # Identical computational task
    result = 0
    for i in range(n):
        result += math.sqrt(i) * math.sin(i / 100) * math.cos(i / 200)
    
    end_time = time.time()
    
    return {
        "result": result,
        "computation_time": end_time - start_time,
        "hostname": socket.gethostname(),
        "iterations": n
    }

# Run local computation
print("🖥️  Running computation locally...")
local_result = benchmark_computation(5000)
print(f"✅ Local computation completed in {local_result['computation_time']:.4f}s")
print(f"   Result: {local_result['result']:.2f}")
print(f"   Hostname: {local_result['hostname']}")

# Run cluster computation
print("\n🚀 Running computation on ndoli cluster...")
try:
    cluster_result = benchmark_computation_cluster(5000)
    print(f"✅ Cluster computation completed in {cluster_result['computation_time']:.4f}s")
    print(f"   Result: {cluster_result['result']:.2f}")
    print(f"   Hostname: {cluster_result['hostname']}")
    
    # Compare results
    print("\n📊 Comparison:")
    print(f"   Local time: {local_result['computation_time']:.4f}s")
    print(f"   Cluster time: {cluster_result['computation_time']:.4f}s")
    
    if abs(local_result['result'] - cluster_result['result']) < 0.01:
        print("   ✅ Results match - computation is consistent!")
    else:
        print("   ⚠️  Results differ - the computation is deterministic, so check for platform or library differences")
        
    if local_result['hostname'] != cluster_result['hostname']:
        print("   ✅ Cluster execution verified - different hostnames!")
    else:
        print("   ⚠️  Both executed on same host - cluster execution may have failed")
        
except Exception as e:
    print(f"❌ Cluster computation failed: {e}")
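
To capture end-to-end latency as well as in-function compute time, the call itself can be wrapped in a timer. The sketch below is a generic helper, not part of ClustriX; `timed_call` and `local_task` are hypothetical names. It uses `time.perf_counter()`, which is intended for measuring elapsed intervals, and `math.isclose` for a tolerance-based comparison of floating-point results.

```python
import math
import time


def timed_call(func, *args, **kwargs):
    """Call `func` and return (result, wall-clock seconds for the whole call)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def local_task(n):
    """Stand-in for a decorated function; runs locally in this sketch."""
    return sum(math.sqrt(i) for i in range(n))


value, wall_seconds = timed_call(local_task, 1000)
reference = local_task(1000)

print(f"wall time: {wall_seconds:.6f}s")
print("results match:", math.isclose(value, reference, rel_tol=1e-9))
```

Wrapping `benchmark_computation_cluster(5000)` the same way would report the total time a user waits, including scheduler queueing.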

## 🔧 Advanced Configuration Options

ClustriX offers many advanced configuration options:

In [None]:
# Advanced configuration example based on actual ndoli setup
advanced_config = {
    # Basic cluster settings (from actual config)
    "cluster_type": "slurm",
    "cluster_host": "ndoli.dartmouth.edu",
    "username": "f002d6b",
    "remote_work_dir": "/dartfs-hpc/rc/home/b/f002d6b/clustrix_advanced",
    "python_executable": "python3",
    
    # Authentication (environment variables only)
    "use_env_password": True,
    "password_env_var": "CLUSTRIX_PASSWORD",
    
    # Resource management (from actual config)
    "default_cores": 2,
    "default_memory": "4GB",
    "default_time": "00:10:00",
    "default_partition": "standard",
    
    # Environment setup (from actual config)
    "module_loads": ["python"],
    "environment_variables": {"OMP_NUM_THREADS": "1"},
    "pre_execution_commands": [
        "export PATH=/usr/bin:$PATH",
        "which python3 || echo 'Python3 not found in PATH'",
        "module list"
    ],
    
    # Advanced features for production
    "auto_parallel": True,
    "auto_gpu_parallel": True,
    "max_parallel_jobs": 10,
    
    # Environment management
    "use_two_venv": True,
    "venv_setup_timeout": 600,
    "use_conda": True,
    "conda_env_name": "clustrix_gpu",
    
    # GPU settings
    "gpu_detection_enabled": True,
    "auto_gpu_packages": True,
    "cuda_version_preference": "11.8",
    "gpu_memory_fraction": 0.8,
    "prefer_gpu_execution": True,
    "rapids_ecosystem": True,
    
    # File management
    "cleanup_remote_files": True,
    "preserve_logs": True,
    "log_level": "INFO",
    
    # SSH settings
    "ssh_timeout": 30,
    "ssh_port": 22,
    
    # Job monitoring
    "job_poll_interval": 5,
    "max_job_runtime": 7200,  # 2 hours
    "retry_on_failure": True,
    "max_retries": 3
}

print("🔧 Advanced Configuration Options (Based on Actual ndoli Setup):")
print("=" * 60)

# Group settings by category
categories = {
    "🏗️  Basic Settings": ["cluster_type", "cluster_host", "username", "remote_work_dir", "python_executable"],
    "🔐 Authentication": ["use_env_password", "password_env_var"],
    "💾 Resource Management": ["default_cores", "default_memory", "default_time", "default_partition"],
    "🌍 Environment Setup": ["module_loads", "environment_variables", "pre_execution_commands"],
    "⚡ Parallelization": ["auto_parallel", "auto_gpu_parallel", "max_parallel_jobs"],
    "🐍 Environment Management": ["use_two_venv", "use_conda", "conda_env_name", "venv_setup_timeout"],
    "🎮 GPU Settings": ["gpu_detection_enabled", "auto_gpu_packages", "cuda_version_preference", "gpu_memory_fraction"],
    "📁 File Management": ["cleanup_remote_files", "preserve_logs", "log_level"],
    "🔒 SSH Settings": ["ssh_timeout", "ssh_port"],
    "⏰ Job Monitoring": ["job_poll_interval", "max_job_runtime", "retry_on_failure", "max_retries"]
}

for category, keys in categories.items():
    print(f"\n{category}:")
    for key in keys:
        if key in advanced_config:
            print(f"  {key}: {advanced_config[key]}")

print("\n💡 Tips for ndoli.dartmouth.edu:")
print("  - Use environment variables for secure authentication")
print("  - Set CLUSTRIX_PASSWORD environment variable for SSH authentication")
print("  - Load 'python' module for Python 3 access")
print("  - Set OMP_NUM_THREADS=1 for optimal performance")
print("  - Use 'standard' partition for regular jobs")
print("  - Default time limit is 10 minutes - adjust as needed")
print("  - Remote work directory uses dartfs-hpc for persistence")
print("  - Set cleanup_remote_files=False for debugging")
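
With this many settings, a quick consistency check before submitting can catch mistakes early. The sketch below converts an `HH:MM:SS` time limit to seconds and compares it with `max_job_runtime`. Treating these two settings as related is an assumption made for illustration, and the helpers `time_limit_to_seconds` and `check_time_limit` are hypothetical. Only the plain `HH:MM:SS` form is handled.

```python
def time_limit_to_seconds(limit):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def check_time_limit(config):
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    requested = time_limit_to_seconds(config["default_time"])
    if requested > config["max_job_runtime"]:
        problems.append(
            f"default_time ({requested}s) exceeds max_job_runtime "
            f"({config['max_job_runtime']}s)"
        )
    return problems


# Values taken from the advanced configuration above
settings = {"default_time": "00:10:00", "max_job_runtime": 7200}
print(time_limit_to_seconds(settings["default_time"]))
print(check_time_limit(settings))
```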

## 🎯 Summary

This notebook demonstrated the key features of ClustriX:

### ✅ **Core Features**
- **Simple `@cluster` decorator** for remote execution
- **Automatic loop parallelization** across cluster nodes
- **Function flattening** for complex nested functions
- **GPU detection** and automatic GPU package installation
- **Environment management** with two-VENV architecture
- **Flexible configuration** for different cluster types

### 🚀 **Benefits**
- **Easy to use**: Just add `@cluster` to any function
- **Automatic optimization**: Loop parallelization and GPU detection
- **Robust**: Handles complex functions and environments
- **Flexible**: Works with SLURM, PBS, SGE, SSH, Kubernetes
- **Efficient**: Two-VENV architecture for optimal performance

### 🔧 **Next Steps**
1. **Set up SSH keys** for passwordless authentication
2. **Configure cluster-specific settings** in `clustrix.yml`
3. **Test simple functions** first, then move to complex ones
4. **Monitor job logs** for debugging and optimization
5. **Experiment with parallelization** for performance gains

Happy cluster computing! 🚀