# Parallel Computing in PHASTA

This notebook demonstrates how to use parallel computing features in PHASTA, including:
- MPI parallelization
- Domain decomposition
- Load balancing
- Performance scaling analysis

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from phasta import FlowSolver, FlowConfig, Mesh, ParallelConfig
import time

# Set up plotting style
plt.style.use('seaborn-v0_8-whitegrid')

## MPI Configuration

First, let's set up the parallel computing configuration.

In [None]:
# Create parallel configuration
parallel_config = ParallelConfig()

# Configure MPI settings
parallel_config.mpi = {
    'use_mpi': True,
    'num_processes': 4,  # Number of MPI processes
    'threads_per_process': 2  # OpenMP threads per process
}

# Configure domain decomposition
parallel_config.decomposition = {
    'method': 'metis',  # Use METIS for partitioning
    'balance_criterion': 'elements',  # Balance by number of elements
    'overlap_layers': 2  # Number of overlap layers between domains
}

# Configure communication
parallel_config.communication = {
    'use_nonblocking': True,  # Use non-blocking communication
    'buffer_size': 1024,  # Communication buffer size
    'use_shared_memory': True  # Use shared memory when possible
}
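
The MPI settings above request 4 processes with 2 threads each, or 8 cores in total. A quick way to check that a layout fits on the current machine is sketched below. The helper `check_core_budget` is hypothetical and not part of PHASTA. It also assumes a single node, since `os.cpu_count()` only reports the logical CPUs of the local machine.

```python
import os

def check_core_budget(num_processes, threads_per_process, available_cores=None):
    """Return (requested, available, oversubscribed) for an MPI x OpenMP layout.

    Hypothetical helper, not a PHASTA function.
    """
    if available_cores is None:
        # os.cpu_count() counts logical CPUs on this node and may return None
        available_cores = os.cpu_count() or 1
    requested = num_processes * threads_per_process
    return requested, available_cores, requested > available_cores

# Same layout as the MPI settings above: 4 processes x 2 threads
requested, available, oversubscribed = check_core_budget(4, 2)
print(f"Requested {requested} cores, {available} available, "
      f"oversubscribed: {oversubscribed}")
```

If the layout is oversubscribed, reducing `threads_per_process` first is usually the smaller change, because it leaves the domain decomposition untouched.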

## Domain Decomposition

Let's examine how the domain is decomposed for parallel processing.

In [None]:
# Create base configuration
config = FlowConfig()
config.domain = {
    'width': 1.0,
    'height': 1.0,
    'depth': 1.0,
    'mesh_size': 0.005  # 1.0 / 200 cells, consistent with nx, ny, nz below
}

# Create mesh
mesh = Mesh.generate_structured_3d(
    width=config.domain['width'],
    height=config.domain['height'],
    depth=config.domain['depth'],
    nx=200,
    ny=200,
    nz=200
)

# Create solver with parallel configuration
solver = FlowSolver(config, mesh, parallel_config=parallel_config)

# Get domain decomposition information
decomp_info = solver.get_decomposition_info()

# Plot domain decomposition
plt.figure(figsize=(12, 4))
for i in range(3):
    plt.subplot(1, 3, i+1)
    solver.plot_domain_decomposition(plane=i)
    plt.title(f'Domain Decomposition (Plane {i})')
plt.tight_layout()
plt.show()
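
To get a feel for what `overlap_layers` costs, the sketch below counts owned and overlap (ghost) elements per process for the 200 x 200 x 200 mesh. It deliberately swaps METIS graph partitioning for a much simpler one-dimensional slab split, so the numbers are only indicative. `slab_partition` is a hypothetical helper, not a PHASTA function.

```python
def slab_partition(n_cells, n_parts, overlap_layers):
    """Split n_cells along one axis into n_parts contiguous slabs.

    Returns a list of (owned_start, owned_stop, ghost_start, ghost_stop)
    index tuples, one per process. Simplified stand-in for METIS.
    """
    base, extra = divmod(n_cells, n_parts)
    slabs = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` ranks take one additional cell layer
        size = base + (1 if rank < extra else 0)
        stop = start + size
        # Extend by the overlap, clipped at the domain boundaries
        ghost_start = max(0, start - overlap_layers)
        ghost_stop = min(n_cells, stop + overlap_layers)
        slabs.append((start, stop, ghost_start, ghost_stop))
        start = stop
    return slabs

nx = ny = nz = 200
for rank, (s, e, gs, ge) in enumerate(slab_partition(nx, 4, overlap_layers=2)):
    owned = (e - s) * ny * nz
    ghost = ((ge - gs) - (e - s)) * ny * nz
    print(f"rank {rank}: owns {owned} elements, {ghost} overlap elements")
```

Interior slabs carry overlap on both faces, so they hold twice as many overlap elements as the two boundary slabs.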

## Performance Scaling

Let's analyze how performance scales with the number of processes. The mesh stays fixed while the process count grows, so this is a strong-scaling test.

In [None]:
# Test different process counts
process_counts = [1, 2, 4, 8, 16]
times = []

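for num_processes in process_counts:
    # Rebuild the solver so each run uses the requested process count
    parallel_config.mpi['num_processes'] = num_processes
    solver = FlowSolver(config, mesh, parallel_config=parallel_config)

    # perf_counter() is a monotonic, high-resolution clock suited to timing
    start_time = time.perf_counter()
    solver.solve()
    times.append(time.perf_counter() - start_time)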

# Plot scaling results
plt.figure(figsize=(10, 6))
plt.plot(process_counts, times, 'o-', label='Measured')
plt.plot(process_counts, [times[0]/n for n in process_counts], '--', label='Ideal')
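plt.xscale('log', base=2)  # Process counts double at each step
plt.yscale('log')  # Ideal scaling is a straight line on log-log axes
plt.xlabel('Number of Processes')
plt.ylabel('Execution Time (s)')
plt.title('Strong Scaling (Fixed Problem Size)')
plt.legend()
plt.grid(True, which='both')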
plt.show()
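
Raw execution times are easier to interpret as speedup and parallel efficiency. The sketch below computes both relative to the first entry of the timing list. `scaling_metrics` is a hypothetical helper, and the timings in the usage lines are placeholders for illustration, not measurements.

```python
def scaling_metrics(process_counts, times):
    """Return (speedup, efficiency) lists relative to the first run.

    speedup[i]    = times[0] / times[i]
    efficiency[i] = speedup[i] / (process_counts[i] / process_counts[0])
    """
    base_p, base_t = process_counts[0], times[0]
    speedup = [base_t / t for t in times]
    efficiency = [s * base_p / p for s, p in zip(speedup, process_counts)]
    return speedup, efficiency

# Placeholder timings for illustration only; substitute the measured `times`
example_counts = [1, 2, 4, 8, 16]
example_times = [100.0, 52.0, 28.0, 16.0, 10.0]
speedup, efficiency = scaling_metrics(example_counts, example_times)
for p, s, e in zip(example_counts, speedup, efficiency):
    print(f"{p:2d} processes: speedup {s:5.2f}, efficiency {e:6.1%}")
```

An efficiency that falls steadily as processes are added usually points to communication overhead or load imbalance, both of which are examined in the following sections.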

## Load Balancing

Let's examine the load balancing across processes.

In [None]:
# Get load balancing statistics from the most recent run
# (after the scaling loop, `solver` is the 16-process solver)
load_stats = solver.get_load_balancing_stats()

plt.figure(figsize=(12, 5))

# Plot element distribution
plt.subplot(121)
plt.bar(range(len(load_stats['elements'])), load_stats['elements'])
plt.xlabel('Process ID')
plt.ylabel('Number of Elements')
plt.title('Element Distribution')

# Plot computation time
plt.subplot(122)
plt.bar(range(len(load_stats['compute_time'])), load_stats['compute_time'])
plt.xlabel('Process ID')
plt.ylabel('Computation Time (s)')
plt.title('Computation Time Distribution')

plt.tight_layout()
plt.show()
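
The bar charts can be condensed into a single number, the load imbalance factor, defined as the maximum per-process load divided by the mean. A minimal sketch follows. `load_imbalance` is a hypothetical helper, and it assumes that `load_stats['elements']` and `load_stats['compute_time']` are flat sequences with one value per process.

```python
import numpy as np

def load_imbalance(per_process_load):
    """Return max/mean of the per-process load; 1.0 means perfectly balanced."""
    load = np.asarray(per_process_load, dtype=float)
    return float(load.max() / load.mean())

# Illustrative values only: three equal processes and one overloaded process
print(load_imbalance([10, 10, 10, 30]))  # max 30 / mean 15
```

Because the slowest process sets the pace, a factor of 1.25 on compute time means each step takes roughly 25% longer than it would with a perfectly even split.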

## Communication Analysis

Let's analyze the communication patterns between processes.

In [None]:
# Get communication statistics
comm_stats = solver.get_communication_stats()

plt.figure(figsize=(12, 5))

# Plot message sizes
plt.subplot(121)
plt.plot(comm_stats['time'], comm_stats['message_sizes'])
plt.xlabel('Time (s)')
plt.ylabel('Message Size (bytes)')
plt.title('Message Sizes')

# Plot communication time
plt.subplot(122)
plt.plot(comm_stats['time'], comm_stats['comm_time'])
plt.xlabel('Time (s)')
plt.ylabel('Communication Time (s)')
plt.title('Communication Time')

plt.tight_layout()
plt.show()
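
One way to summarize the two curves is to fit the common latency-bandwidth model, `t = latency + size / bandwidth`, to the recorded samples. The sketch below does this with a least-squares line fit. `fit_latency_bandwidth` is a hypothetical helper, and it assumes that each entry of `comm_stats['message_sizes']` pairs with the entry of `comm_stats['comm_time']` at the same index, which the notebook does not confirm.

```python
import numpy as np

def fit_latency_bandwidth(message_sizes, comm_times):
    """Least-squares fit of comm_time = latency + message_size / bandwidth.

    Returns (latency in seconds, bandwidth in bytes per second).
    """
    sizes = np.asarray(message_sizes, dtype=float)
    times = np.asarray(comm_times, dtype=float)
    # polyfit returns coefficients from highest degree down: slope, intercept
    slope, intercept = np.polyfit(sizes, times, deg=1)
    return float(intercept), float(1.0 / slope)

# Synthetic samples generated from the model itself, for illustration only
sizes = np.array([1e3, 1e4, 1e5, 1e6])
times = 2e-6 + sizes / 1e9
latency, bandwidth = fit_latency_bandwidth(sizes, times)
print(f"latency ~ {latency:.2e} s, bandwidth ~ {bandwidth:.2e} bytes/s")
```

When latency dominates, sending fewer and larger messages helps. When bandwidth dominates, reducing the amount of data exchanged matters more.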

## Best Practices

Here are some best practices for parallel computing in PHASTA:

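1. Domain Decomposition:
   - Choose a decomposition method suited to the mesh; this notebook uses METIS graph partitioning
   - Balance the workload across processes, since the slowest process sets the pace of each step
   - Keep `overlap_layers` as small as the discretization allows to limit communication overhead

2. Process Configuration:
   - Keep `num_processes` × `threads_per_process` at or below the number of available cores
   - Check that each process's share of the mesh, including its overlap layers, fits in memory
   - Tune `threads_per_process` together with `num_processes` rather than in isolation

3. Communication:
   - Use non-blocking communication when possible so data exchange can overlap with computation
   - Tune `buffer_size` against the message sizes seen in the communication statistics
   - Minimize data transfer between processes

4. Performance Monitoring:
   - Monitor load balancing (elements and computation time per process)
   - Track communication patterns (message sizes and communication time)
   - Analyze scaling behavior against the ideal curve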
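
For the scaling analysis in item 4, the Karp-Flatt metric gives an experimentally determined serial fraction from a measured speedup. If it grows with the process count, parallel overhead is increasing. If it stays roughly constant, the limit is inherently serial work. The sketch below is a minimal implementation. `karp_flatt` is a hypothetical helper, not part of PHASTA.

```python
def karp_flatt(speedup, num_processes):
    """Experimentally determined serial fraction (Karp-Flatt metric).

    e = (1/S - 1/p) / (1 - 1/p), defined for p >= 2.
    """
    if num_processes < 2:
        raise ValueError("the Karp-Flatt metric needs at least 2 processes")
    inv_p = 1.0 / num_processes
    return (1.0 / speedup - inv_p) / (1.0 - inv_p)

# Illustrative input only: a speedup of 2 on 4 processes
print(karp_flatt(2.0, 4))
```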

## Exercises

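1. Try different domain decomposition methods by changing `decomposition['method']`
2. Experiment with different process counts and compare the measured times against ideal scaling
3. Test different load balancing criteria by changing `decomposition['balance_criterion']`
4. Analyze how communication patterns change as the process count increases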

## Next Steps

- Try the GPU acceleration example
- Explore advanced visualization techniques
- Learn about basic mesh generation