# GPU Acceleration in PHASTA

This notebook demonstrates how to use GPU acceleration features in PHASTA for improved performance. We'll cover:
- GPU device setup and configuration
- Performance comparison between CPU and GPU
- Best practices for GPU usage
- Memory management and optimization

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from phasta import FlowSolver, FlowConfig, Mesh, GPUConfig
import time

# Set up plotting style
plt.style.use('seaborn-v0_8-whitegrid')

## GPU Device Setup

First, let's check available GPU devices and configure PHASTA to use them.

In [None]:
# Create GPU configuration
gpu_config = GPUConfig()

# List available devices
devices = gpu_config.list_devices()
if not devices:
    raise RuntimeError("No GPU devices found; only the CPU parts of this notebook can run.")

print("Available GPU devices:")
for i, device in enumerate(devices):
    print(f"Device {i}: {device['name']}")
    print(f"  Memory: {device['memory']} GB")
    print(f"  Compute Capability: {device['compute_capability']}")
    print(f"  Multi-Processor Count: {device['mp_count']}")

# Configure GPU settings
gpu_config.device_id = 0  # Use the first GPU
gpu_config.memory_fraction = 0.8  # Use up to 80% of the available device memory
gpu_config.enable_tensor_cores = True  # Use tensor cores where the hardware supports them
gpu_config.precision = 'mixed'  # Usually faster, but check accuracy against a double-precision run
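
Setting `precision = 'mixed'` trades some accuracy for speed. The plain-Python illustration below is independent of PHASTA and shows why results should be checked. It advances a simulation clock by the 0.001 time step used later in this notebook for 1000 steps, first in double precision and then rounding to single precision after every step. Single precision is emulated with the standard `struct` module. This notebook does not say which quantities PHASTA keeps in reduced precision under `'mixed'`, so treat this only as an illustration of round-off accumulation.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def advance_clock(dt: float, steps: int, single: bool) -> float:
    """Accumulate t += dt, optionally rounding t to float32 after each step."""
    t = 0.0
    step = to_float32(dt) if single else dt
    for _ in range(steps):
        t = t + step          # exact in double for these magnitudes
        if single:
            t = to_float32(t)  # one rounding per step, like a float32 add
    return t


t64 = advance_clock(0.001, 1000, single=False)
t32 = advance_clock(0.001, 1000, single=True)
print(f"float64 clock: {t64!r}  error: {abs(t64 - 1.0):.1e}")
print(f"float32 clock: {t32!r}  error: {abs(t32 - 1.0):.1e}")
```

The double-precision clock stays at round-off level. The single-precision clock drifts away from 1.0 by several orders of magnitude more, because every addition is rounded to a coarser grid.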

## Performance Comparison

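Let's compare CPU and GPU run times for the same flow simulation: a unit cube with a 100 × 100 × 100 structured mesh, advanced for 1000 time steps (max_time / time_step = 1.0 / 0.001).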

In [None]:
# Create base configuration
config = FlowConfig()
config.domain = {
    'width': 1.0,
    'height': 1.0,
    'depth': 1.0,
    'mesh_size': 0.01
}

config.flow = {
    'time_step': 0.001,
    'max_time': 1.0
}

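# Create mesh; derive the resolution from mesh_size so the mesh matches the config
nx = round(config.domain['width'] / config.domain['mesh_size'])
ny = round(config.domain['height'] / config.domain['mesh_size'])
nz = round(config.domain['depth'] / config.domain['mesh_size'])
mesh = Mesh.generate_structured_3d(
    width=config.domain['width'],
    height=config.domain['height'],
    depth=config.domain['depth'],
    nx=nx,
    ny=ny,
    nz=nz
)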
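
Before timing anything, it helps to know how big the problem is. The sketch below makes a back-of-the-envelope estimate from the configuration values above. It assumes five unknowns per cell (for example pressure, three velocity components and temperature). That number is an assumption, not something this notebook states. The estimate covers only the solution vector; matrices, residuals and geometry need additional memory, so the result is a lower bound.

```python
def problem_size(lengths, mesh_size, time_step, max_time, unknowns_per_cell=5):
    """Rough size of a uniform structured 3D run.

    ASSUMPTION: `unknowns_per_cell` = 5; only the solution vector is counted,
    so the memory figures are lower bounds.
    """
    counts = [round(length / mesh_size) for length in lengths]  # cells per axis
    cells = counts[0] * counts[1] * counts[2]
    steps = round(max_time / time_step)
    state_mb = {
        "float64": cells * unknowns_per_cell * 8 / 1e6,  # 8 bytes per value
        "float32": cells * unknowns_per_cell * 4 / 1e6,  # 4 bytes per value
    }
    return counts, cells, steps, state_mb


counts, cells, steps, state_mb = problem_size((1.0, 1.0, 1.0), 0.01, 0.001, 1.0)
print(counts, cells, steps, state_mb)
```

With the settings above, this gives one million cells and 1000 time steps. The double-precision solution vector alone takes about 40 MB, so most of the memory reported later comes from the solver's other data structures.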

In [None]:
# Run CPU simulation
# Note: both timings include solver construction; the GPU timing may also
# include one-off device initialisation and host-to-device transfers.
print("Running CPU simulation...")
cpu_start = time.perf_counter()
cpu_solver = FlowSolver(config, mesh)
cpu_results = cpu_solver.solve()
cpu_time = time.perf_counter() - cpu_start

# Run GPU simulation
print("\nRunning GPU simulation...")
gpu_start = time.perf_counter()
gpu_solver = FlowSolver(config, mesh, gpu_config=gpu_config)
gpu_results = gpu_solver.solve()
gpu_time = time.perf_counter() - gpu_start

# Print performance comparison
print("\nPerformance Comparison:")
print(f"CPU Time: {cpu_time:.2f} seconds")
print(f"GPU Time: {gpu_time:.2f} seconds")
print(f"Speedup: {cpu_time/gpu_time:.2f}x")
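
A single timed run can be misleading, because the first GPU run often pays one-off startup costs. The generic helper below discards a few warm-up runs and then reports the median of several timed runs. It is a sketch that uses only the standard library. It assumes the function being timed returns only after all of its work is finished. GPU work is often queued asynchronously, so a `solve()` that returned early would make the timing look too short.

```python
import statistics
import time


def benchmark(fn, repeats=3, warmup=1):
    """Median wall-clock time of fn() over `repeats` runs after `warmup` runs.

    Warm-up runs absorb one-off costs (device initialisation, kernel
    compilation, cache warm-up) that would otherwise inflate the first sample.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


calls = []
elapsed = benchmark(lambda: calls.append(sum(range(100_000))), repeats=5, warmup=2)
print(f"{len(calls)} calls, median {elapsed * 1e3:.2f} ms")
```

You could apply the same helper to the solvers above, e.g. `benchmark(lambda: gpu_solver.solve())`. This assumes `solve()` can safely be called more than once on the same solver object, which this notebook does not confirm.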

## Memory Management

Let's monitor how the solver's GPU memory usage evolves during the run. The peak of this curve is what `memory_fraction` has to accommodate.

In [None]:
# Monitor GPU memory usage
memory_stats = gpu_solver.get_memory_stats()

plt.figure(figsize=(10, 6))
plt.plot(memory_stats['time'], memory_stats['used_memory'], label='Used Memory')
plt.plot(memory_stats['time'], memory_stats['total_memory'], label='Total Memory')
plt.xlabel('Time (s)')
plt.ylabel('Memory (GB)')
plt.title('GPU Memory Usage')
plt.legend()
plt.grid(True)
plt.show()
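
The plot is easier to act on once it is reduced to a few numbers. The sketch below assumes the stats object behaves like a dict of equal-length lists under the keys plotted above (`'time'`, `'used_memory'`, `'total_memory'`, in seconds and GB). It reports the peak usage and how much of the `memory_fraction` budget that peak leaves unused. The sample data is hypothetical, not output from a real run.

```python
def memory_summary(stats, memory_fraction):
    """Peak usage and remaining budget from 'time'/'used_memory'/'total_memory'."""
    used = stats["used_memory"]
    peak = max(used)
    i = used.index(peak)                 # first time the peak is reached
    total = stats["total_memory"][i]
    budget = memory_fraction * total     # memory the solver is allowed to use
    return {
        "peak_gb": peak,
        "peak_time_s": stats["time"][i],
        "peak_fraction": peak / total,
        "headroom_gb": budget - peak,    # negative means the budget is too small
    }


# Hypothetical sample data, not measured values.
example = {
    "time": [0.0, 0.25, 0.5, 0.75, 1.0],
    "used_memory": [1.0, 3.5, 4.0, 4.0, 3.0],
    "total_memory": [16.0] * 5,
}
print(memory_summary(example, memory_fraction=0.8))
```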

## Multi-GPU Scaling

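Let's examine strong scaling: the same problem solved on 1, 2 and 4 GPUs. With ideal scaling, the run time on N GPUs would be T(1)/N, shown as the dashed line.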

In [None]:
# Test different GPU configurations (skip counts larger than the number of devices)
gpu_counts = [n for n in (1, 2, 4) if n <= len(devices)]
times = []

for num_gpus in gpu_counts:
    gpu_config.device_ids = list(range(num_gpus))  # devices 0 .. num_gpus-1
    gpu_solver = FlowSolver(config, mesh, gpu_config=gpu_config)

    start_time = time.perf_counter()
    gpu_solver.solve()
    times.append(time.perf_counter() - start_time)

# Plot scaling results
plt.figure(figsize=(10, 6))
plt.plot(gpu_counts, times, 'o-', label='Measured')
plt.plot(gpu_counts, [times[0]/n for n in gpu_counts], '--', label='Ideal')
plt.xlabel('Number of GPUs')
plt.ylabel('Execution Time (s)')
plt.title('Multi-GPU Scaling')
plt.legend()
plt.grid(True)
plt.show()
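
Scaling plots are usually summarized by two numbers. Speedup is S(N) = T(1)/T(N). Parallel efficiency is E(N) = S(N)/N, where 1.0 means ideal scaling. The sketch below computes both from lists shaped like `gpu_counts` and `times` above. It measures everything relative to the first entry, which in the notebook is the 1-GPU run. The timings in the example are hypothetical.

```python
def scaling_table(gpu_counts, times):
    """Strong-scaling metrics relative to the first (smallest) configuration.

    speedup    S(N) = T(N0) / T(N)
    efficiency E(N) = (N0 * T(N0)) / (N * T(N))
    With N0 = 1 these reduce to S = T(1)/T(N) and E = S/N.
    """
    n0, t0 = gpu_counts[0], times[0]
    return [(n, t0 / t, (n0 * t0) / (n * t)) for n, t in zip(gpu_counts, times)]


# Hypothetical timings in seconds, not measured results.
for n, s, e in scaling_table([1, 2, 4], [120.0, 64.0, 36.0]):
    print(f"{n} GPU(s): speedup {s:.2f}x, efficiency {e:.0%}")
```

Efficiency typically falls as GPUs are added, because each GPU has less work while communication between subdomains does not shrink at the same rate.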

## Best Practices

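Here are some best practices for GPU acceleration in PHASTA:

1. Memory Management:
   - Choose a `memory_fraction` that covers the solver's peak usage while leaving headroom for other processes on the same GPU
   - Monitor memory usage (for example with `get_memory_stats()`) to find that peak
   - Release solvers and results you no longer need before starting another run

2. Performance Optimization:
   - Use mixed precision when the loss of accuracy is acceptable, and verify the results against a double-precision run
   - Enable tensor cores; they only speed up operations that map onto their reduced-precision matrix multiply-accumulate units
   - Minimize data transfer between CPU and GPU: keep fields on the GPU between time steps and copy back only what you need for output

3. Multi-GPU Usage:
   - Balance the workload so that each GPU gets a similar number of cells
   - Minimize inter-GPU communication by keeping the interfaces between subdomains small
   - Choose a domain decomposition that achieves both of the above

4. Error Handling:
   - Check for GPU errors such as out-of-memory failures
   - Fall back to the CPU solver if no GPU is available or a GPU run fails
   - Monitor GPU temperature and power usage (for example with `nvidia-smi`) during long runs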
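
To make "balance the workload" and "domain decomposition" concrete, here is a sketch of the simplest possible split. It cuts the 100-layer mesh along z into contiguous slabs whose sizes differ by at most one layer, and counts the halo cells exchanged per step, assuming a one-layer halo. This notebook does not say how PHASTA partitions meshes, so this is an illustration, not its method. The function names are hypothetical. Slabs also show the main weakness of 1D splits: the interface area stays the same as GPUs are added while the work per GPU shrinks.

```python
def slab_decomposition(nz, num_gpus):
    """Split nz cell layers into num_gpus contiguous slabs whose sizes differ
    by at most one layer. Returns half-open (start, stop) layer ranges."""
    if not 1 <= num_gpus <= nz:
        raise ValueError("need 1 <= num_gpus <= nz")
    base, extra = divmod(nz, num_gpus)
    ranges, start = [], 0
    for rank in range(num_gpus):
        stop = start + base + (1 if rank < extra else 0)  # first `extra` slabs get +1
        ranges.append((start, stop))
        start = stop
    return ranges


def halo_cells(nx, ny, num_gpus):
    """Cells exchanged per step: each of the num_gpus - 1 interfaces sends one
    nx*ny layer in each direction (ASSUMPTION: single-layer halo)."""
    return 2 * (num_gpus - 1) * nx * ny


for n in (1, 2, 4):
    print(n, slab_decomposition(100, n), halo_cells(100, 100, n))
```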

## Exercises

1. Try different memory fractions and observe performance
2. Experiment with different precision settings
3. Test performance with different mesh sizes
4. Compare CPU and GPU results for accuracy
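
For exercise 4, you need a metric for "how different" two solutions are. The sketch below returns the maximum absolute difference and the relative L2 difference between two equal-length sequences of floats. This notebook does not show how to extract fields from `cpu_results` and `gpu_results`, so the function takes plain sequences, and that extraction step is left to you.

```python
import math


def compare_fields(reference, candidate):
    """Max absolute difference and relative L2 difference ||c - r|| / ||r||.

    Falls back to the absolute L2 difference if the reference is all zeros.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("fields must be non-empty and of equal length")
    diffs = [c - r for r, c in zip(reference, candidate)]
    max_abs = max(abs(d) for d in diffs)
    diff_norm = math.sqrt(sum(d * d for d in diffs))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    rel_l2 = diff_norm / ref_norm if ref_norm > 0 else diff_norm
    return max_abs, rel_l2


print(compare_fields([3.0, 4.0], [3.0, 4.5]))
```

Even a full double-precision GPU run need not match the CPU bit for bit, because parallel reductions change the order of floating-point operations. Expect the differences to grow when `precision` is set to `'mixed'`.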

## Next Steps

- Try the parallel computing example
- Explore advanced visualization techniques
- Learn about basic mesh generation