# CuPy + IPyParallel + MPI Distributed Computing Demo

This notebook demonstrates how to combine three tools for distributed parallel computing:
- **IPyParallel**: For managing parallel computing clusters
- **MPI**: For inter-process communication across nodes
- **CuPy**: For GPU-accelerated computing (optional)

## What This Demo Shows

1. **Cluster Setup**: Creating and managing a parallel computing cluster
2. **Multi-node Execution**: Running code across multiple compute engines
3. **MPI Communication**: Passing data between distributed processes
4. **GPU Integration**: Optional GPU acceleration with CuPy

## Technologies Overview

### IPyParallel
- **Purpose**: Interactive parallel computing in Jupyter
- **Features**: Manages engines, load balancing, and result collection
- **Use Cases**: Distributed computations, parameter sweeps, embarrassingly parallel tasks

### MPI (Message Passing Interface)
- **Purpose**: Standard for parallel computing communication
- **Features**: Point-to-point and collective communication primitives
- **Use Cases**: Scientific computing, distributed simulations, HPC applications

### Integration Benefits
- **Scalability**: From single machine to large clusters
- **Flexibility**: Mix CPU and GPU computing as needed
- **Productivity**: Interactive development with production-scale deployment

## Configuration and Imports

Let's start by configuring our parallel computing environment:

### Cluster Configuration
- **numberOfNodes**: Number of engines (MPI processes) to start; set to 2 for this demo. Despite the name, several engines may share one physical node
- **Engine Type**: MPI engines for inter-process communication
- **Libraries**: Import essential parallel computing libraries

### Required Imports
- `mpi4py`: Python bindings for MPI
- `os`: Operating system interface for process management
- `ipyparallel`: Interactive parallel computing framework

**Note**: The number of engines can be adjusted to match your available resources. On HPC systems it can be scaled to hundreds or thousands of processes.

In [None]:
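import mpi4py  # imported here only to confirm that the MPI bindings are installed
import os
import ipyparallel as ipp

numberOfNodes = 2  # number of IPyParallel engines (MPI processes) to launch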

## Create and Start the Parallel Cluster

Now we'll create an IPyParallel cluster with MPI engines:

### Cluster Parameters
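- **engines="mpi"**: Start the engines through IPyParallel's MPI launcher (`mpiexec` by default), so they all belong to one MPI job and can talk to each other via MPI
- **n=numberOfNodes**: Create the specified number of engines
- **controller_ip='*'**: Make the controller listen on all network interfaces, so engines on other hosts can connect to it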

### What Happens Here
1. **Cluster Creation**: IPyParallel sets up a cluster manager
2. **Engine Startup**: MPI processes are launched on available resources
3. **Connection**: Client connects to the cluster for sending commands
4. **Synchronization**: Wait for all engines to be ready

**Note**: This step may take a few moments as engines are started and MPI communication is established.

In [None]:
cluster = ipp.Cluster(engines="mpi", n=numberOfNodes, controller_ip='*')
rc = cluster.start_and_connect_sync()
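
The cluster started above keeps running until it is stopped explicitly. As a minimal sketch of an alternative, assuming a recent IPyParallel release in which `ipp.Cluster` can be used as a context manager and a working MPI installation, the cluster's lifetime can be tied to a `with` block. The function name `engine_ids_from_temporary_cluster` is illustrative and not part of this notebook.

```python
def engine_ids_from_temporary_cluster(n_engines=2):
    """Start an MPI-backed cluster, return its engine IDs, then shut it down."""
    # Imported lazily so the sketch can be defined even where
    # ipyparallel is not installed.
    import ipyparallel as ipp

    # Entering the block starts the controller and engines and yields a
    # connected Client; leaving the block stops them again.
    with ipp.Cluster(engines="mpi", n=n_engines) as rc:
        rc.wait_for_engines(n=n_engines)
        return list(rc.ids)
```

When the cluster is created explicitly, as in the cell above, calling `cluster.stop_cluster_sync()` at the end of the session should serve the same purpose.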

## Verify Cluster Status

Let's verify that our cluster is properly configured and all engines are running:

### Cluster Verification Steps
1. **Wait for Engines**: Ensure all requested engines are available
2. **Create DirectView**: Get a view object for executing code on all engines
3. **Check Engine IDs**: Display the unique identifiers for each engine

### DirectView (`dview`)
- **Purpose**: Address a fixed set of engines directly; `rc[:]` selects all of them
- **Usage**: Calls made through `dview` are dispatched to every engine in the view, either blocking or non-blocking (for example `apply_sync` versus `apply_async`)
- **Result**: Returns one result per engine

**Expected Output**: You should see the list of engine IDs (`[0, 1]` for two engines), indicating successful cluster setup.

In [None]:
rc.wait_for_engines(n=numberOfNodes)  # block until all engines have registered
dview = rc[:]                         # DirectView over every engine
rc.ids                                # list of engine IDs

## Test Parallel Execution

Let's test our cluster by running code on specific engines and checking their hostnames:

### IPyParallel Magic Commands
- **`%%px`**: Execute cell code on all engines in parallel
- **`--target 0:2`**: Run only on engines 0 and 1 (Python slice semantics: the stop index 2 is excluded). With two engines this selects all of them
- **Alternative**: `--target 0:1` would run only on engine 0
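
The slice notation can be checked without a cluster. The sketch below only illustrates Python slice semantics on a list of engine IDs; `select_engines` is a hypothetical helper, not the way IPyParallel parses the option.

```python
def select_engines(engine_ids, spec):
    """Hypothetical helper: apply a 'start:stop' spec to a list of engine IDs."""
    if ":" not in spec:
        raise ValueError("this sketch only handles 'start:stop' style specs")
    # An empty side of the colon becomes None, exactly as in engine_ids[:2].
    parts = [int(p) if p else None for p in spec.split(":")]
    return engine_ids[slice(*parts)]


ids = [0, 1, 2, 3]
print(select_engines(ids, "0:2"))  # [0, 1]
print(select_engines(ids, "0:1"))  # [0]
```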

### Information Gathering
- **Hostname**: Shows which physical machines the engines are running on
- **Process ID**: Unique identifier for each process (commented out)
- **Distributed Verification**: One line of output per engine confirms that the cell ran on every targeted engine

**Expected Output**: You'll see hostnames printed from each engine, potentially showing different machines if running on a cluster.

In [None]:
%%px --target 0:2
#%%px --target 0:1
import os, socket
#print(os.getpid())
print(socket.gethostname())

## MPI Communication Demo

Now for the main demonstration: MPI communication between processes with optional GPU acceleration.

### Key Components

#### 1. **GPU/CPU Selection**
- **`useGPU=False`**: Switch between NumPy (CPU) and CuPy (GPU)
- **Flexibility**: Same code works with both backends
- **Performance**: Set to `True` for GPU acceleration when available
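
A hard-coded flag fails with an `ImportError` if CuPy is not installed. One possible refinement, sketched here with the standard library only, is to fall back to NumPy when CuPy cannot be found. The names `choose_backend_name` and `load_backend` are assumptions made for this example.

```python
import importlib
import importlib.util


def choose_backend_name(use_gpu):
    """Return 'cupy' if it was requested and is installed, else 'numpy'."""
    if use_gpu and importlib.util.find_spec("cupy") is not None:
        return "cupy"
    return "numpy"


def load_backend(use_gpu=False):
    """Import and return the chosen array module (NumPy must be installed)."""
    return importlib.import_module(choose_backend_name(use_gpu))
```

In a `%%px` cell this could be used as `cp = load_backend(useGPU)`. Note that `find_spec` only checks that the package is installed; it does not check that a usable GPU is present.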

#### 2. **MPI Setup**
- **`MPI.COMM_WORLD`**: Global communicator including all processes
- **`size`**: Total number of processes in the communicator
- **`rank`**: Unique identifier for each process (0, 1, 2, ...)
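
Every MPI process runs the same script, so `rank` and `size` are what let each process decide which part of the work is its own. The following pure-Python sketch simulates that decision without MPI, using a round-robin split; `items_for_rank` is a hypothetical helper and the split strategy is just one common choice.

```python
def items_for_rank(n_items, rank, size):
    """Indices that a given rank would handle under a round-robin split."""
    return list(range(rank, n_items, size))


size = 2  # stands in for comm.Get_size()
for rank in range(size):  # each value stands in for comm.Get_rank()
    print(rank, items_for_rank(5, rank, size))
# 0 [0, 2, 4]
# 1 [1, 3]
```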

#### 3. **Communication Pattern**
- **Rank 0 (Sender)**: Creates random data and sends to rank 1
- **Rank 1 (Receiver)**: Receives data from rank 0
- **Point-to-Point**: Direct communication between specific processes
- **Tagged Messages**: Use tag=42 to identify message type

#### 4. **Data Details**
- **Shape**: 4D tensor (N×N×N×N) where N=2
- **Size**: 16 elements total for this demo
- **Type**: Samples from the standard normal distribution (`randn`)
- **Scalability**: Can easily increase N for larger data transfers
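
Because the tensor has four axes, its size grows as N to the fourth power. The sketch below estimates the raw payload for a few values of N, assuming 8-byte `float64` elements (NumPy's default for `randn`); `tensor_stats` is a hypothetical helper, and the figures exclude any serialization overhead.

```python
def tensor_stats(n, itemsize=8):
    """Return (element count, raw bytes) for an n x n x n x n tensor."""
    elements = n ** 4
    return elements, elements * itemsize


for n in (2, 16, 64):
    elements, nbytes = tensor_stats(n)
    print(f"N={n}: {elements} elements, {nbytes} bytes")
```

For N=2 this gives 16 elements and 128 bytes; at N=64 the payload is already 128 MiB.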

**Expected Output**: You'll see the sender and receiver processes print the data being transferred, demonstrating successful MPI communication.

In [None]:
%%px
from mpi4py import MPI

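useGPU = False  # True selects CuPy (GPU); False selects NumPy (CPU)

if useGPU:
    import cupy as cp
else:
    import numpy as cp  # same alias, so the code below is backend-agnostic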

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

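if rank == 0:
    # Rank 0 creates the data and sends it to rank 1.
    N = 2
    data = cp.random.randn(N, N, N, N)
    # Lowercase send() pickles the object before transmitting it.
    comm.send(data, dest=1, tag=42)
    print('Process {} sent data:'.format(rank), data)

elif rank == 1:
    # Rank 1 blocks until the matching message (source 0, tag 42) arrives.
    data = comm.recv(source=0, tag=42)
    print('Process {} received data:'.format(rank), data)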
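
The cell above uses the lowercase `send`/`recv`, which serialize arbitrary Python objects with pickle. For larger arrays, mpi4py also offers the uppercase, buffer-based `Send`/`Recv`, which transmit the array's memory directly into a preallocated receive buffer. The function below is a sketch under stated assumptions, not the notebook's method: `exchange_buffer` is a hypothetical name, `xp` is expected to be NumPy, and using CuPy arrays this way would additionally require a CUDA-aware MPI build and synchronizing the GPU stream before sending.

```python
def exchange_buffer(comm, xp, n=2, tag=42):
    """Send an n**4 float64 array from rank 0 to rank 1 via the buffer interface."""
    rank = comm.Get_rank()
    if comm.Get_size() < 2:
        raise RuntimeError("this exchange needs at least two MPI processes")
    if rank == 0:
        data = xp.random.randn(n, n, n, n)
        comm.Send(data, dest=1, tag=tag)    # no pickling: raw buffer transfer
        return data
    if rank == 1:
        # The receiver must allocate a buffer of matching shape and dtype.
        data = xp.empty((n, n, n, n), dtype=xp.float64)
        comm.Recv(data, source=0, tag=tag)  # fills the buffer in place
        return data
    return None  # other ranks do not take part
```

Defined inside a `%%px` cell, it could be called on every engine as `exchange_buffer(MPI.COMM_WORLD, cp)`.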

## Key Takeaways and Real-World Applications

This demo showcases the powerful combination of IPyParallel, MPI, and optional GPU computing:

#### **IPyParallel Advantages**
- **Interactive Development**: Test parallel code interactively in Jupyter
- **Easy Scaling**: From laptop to supercomputer with minimal code changes
- **Flexible Execution**: Target specific engines or broadcast to all
- **Result Management**: Automatic collection and synchronization of results

#### **MPI Communication**
- **Standards-Based**: Industry standard for parallel computing
- **Scalable**: From 2 processes to millions of processes
- **Rich Communication**: Point-to-point, collective, and one-sided operations
- **Performance**: Optimized for high-performance networks

#### **GPU Integration**
- **Optional Acceleration**: Same code works with/without GPUs
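- **Memory Efficiency**: With CuPy, computation stays on the GPU. Note that the lowercase `comm.send` used in this demo pickles the array, which copies it through host memory; direct GPU-to-GPU transfer requires a CUDA-aware MPI build and the buffer-based `Send`/`Recv`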
- **Hybrid Computing**: Mix CPU and GPU computation as needed
