# GPU-Accelerated Simulations

[![Brev](https://img.shields.io/badge/Run%20on-Brev-orange)](https://brev.dev)

**Optimized for Brev GPU instances**

This notebook demonstrates GPU-accelerated, batched simulations using PyTorch with CUDA.

## Requirements

- GPU with CUDA support (Tesla T4, A100, etc.)
- PyTorch with CUDA
- At least 8 GB of GPU memory recommended

## What You'll Learn

1. GPU availability detection
2. Vectorized simulation on GPU
3. Batch parameter sweeps with GPU acceleration
4. Performance benchmarking (CPU vs GPU)
5. 3D visualization of the parameter space and GPU memory profiling

## Setup and GPU Detection

In [None]:
import sys
import os
import time

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("⚠ This notebook is optimized for Brev with dedicated GPU.")
    print("It will work on Colab but may be slower.\n")
    !pip install -q torch torchvision
else:
    print("Running on Brev or local GPU environment")

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import pandas as pd

try:
    import torch
    TORCH_AVAILABLE = True
except ImportError:
    print("⚠ PyTorch not found. Installing...")
    !pip install -q torch torchvision
    import torch
    TORCH_AVAILABLE = True

print("✓ Libraries imported")

In [None]:
# GPU Detection and Information
print("="*70)
print("GPU ENVIRONMENT CHECK")
print("="*70)

if torch.cuda.is_available():
    print(f"✓ CUDA Available: YES")
    print(f"  GPU Count:      {torch.cuda.device_count()}")
    print(f"  Current Device: {torch.cuda.current_device()}")
    print(f"  Device Name:    {torch.cuda.get_device_name(0)}")
    print(f"  CUDA Version:   {torch.version.cuda}")
    
    # Memory info
    mem_allocated = torch.cuda.memory_allocated(0) / 1e9
    mem_reserved = torch.cuda.memory_reserved(0) / 1e9
    print(f"  Memory Allocated: {mem_allocated:.2f} GB")
    print(f"  Memory Reserved:  {mem_reserved:.2f} GB")
    
    device = torch.device('cuda')
    USE_GPU = True
else:
    print(f"✗ CUDA Available: NO")
    print(f"  Falling back to CPU")
    device = torch.device('cpu')
    USE_GPU = False

print("="*70)
print(f"\nUsing device: {device}")
print("="*70)

## GPU-Accelerated Primal Logic Simulation

Implement the Primal Logic control law using PyTorch tensors for GPU acceleration.
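
Written out, the loop in the next cell applies the following discrete updates at each step $i$ with step size $\Delta t$ (`dt`). This is a transcription of the code rather than an independent derivation; the subscript notation is introduced here for illustration, with $\gamma$ the error term, $\psi$ the control state, and $E_c$ the accumulated control energy.

```latex
\begin{aligned}
\gamma_i &= \gamma_{i-1}\, e^{-0.1\,\Delta t} \\
\psi_i   &= \psi_{i-1} + \Delta t \left( -\lambda\, \psi_{i-1} + K_E\, \gamma_i \right) \\
E_{c,i}  &= E_{c,i-1} + \psi_i\, \gamma_i\, \Delta t
\end{aligned}
\qquad
\psi_0 = 1,\quad \gamma_0 = 0.01,\quad E_{c,0} = 0,\quad \lambda = 0.16905
```

The $\psi$ update is an explicit (forward) Euler step. For the decay term alone it is stable when $|1 - \lambda\,\Delta t| < 1$, that is $\Delta t < 2/\lambda \approx 11.83$, so the step sizes used in this notebook (0.01 and 0.1) are well inside that bound.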

In [None]:
# Primal Logic constants
D = 149.9992314000
I3 = 6.4939394023
S = D / I3
LAMBDA = 0.16905

def primal_logic_gpu(t_max=100.0, dt=0.01, KE=0.3, batch_size=1, device='cuda'):
    """
    GPU-accelerated Primal Logic simulation with batch processing.
    
    Parameters:
    -----------
    t_max : float
        Maximum simulation time
    dt : float
        Time step
    KE : float or tensor
        Error gain (can be array for batch)
    batch_size : int
        Number of parallel simulations
    device : str
        'cuda' or 'cpu'
    
    Returns:
    --------
    Dictionary with tensors: t, psi, gamma, Ec
    """
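    # Time array: derive n once and build t from it so that t and the state
    # tensors always have the same length (int(t_max / dt) alone can be off
    # by one because of floating-point rounding)
    n = int(round(t_max / dt))
    t = torch.arange(n, device=device, dtype=torch.float32) * dt
    
    # Convert KE to a float tensor of shape (batch_size,)
    if isinstance(KE, (int, float)):
        KE = torch.full((batch_size,), float(KE), device=device)
    else:
        # as_tensor accepts lists, arrays, and existing tensors without the
        # copy-construct warning that torch.tensor(existing_tensor) emits
        KE = torch.as_tensor(KE, dtype=torch.float32, device=device)
    
    # State tensors (batch_size x n)
    psi = torch.zeros(batch_size, n, device=device)
    gamma = torch.zeros(batch_size, n, device=device)
    Ec = torch.zeros(batch_size, n, device=device)
    
    # Initial conditions
    psi[:, 0] = 1.0
    gamma[:, 0] = 0.01
    
    # Per-step decay factor for the error term, computed once outside the loop
    decay = float(np.exp(-0.1 * dt))
    
    # Simulation loop (sequential in time, parallel across the batch)
    for i in range(1, n):
        # Error dynamics: exponential decay at rate 0.1
        gamma[:, i] = gamma[:, i-1] * decay
        
        # Control law: dψ/dt = -λ·ψ + KE·error
        dpsi_dt = -LAMBDA * psi[:, i-1] + KE * gamma[:, i]
        psi[:, i] = psi[:, i-1] + dpsi_dt * dt
        
        # Integrate control energy
        Ec[:, i] = Ec[:, i-1] + psi[:, i] * gamma[:, i] * dt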
    
    return {
        't': t,
        'psi': psi,
        'gamma': gamma,
        'Ec': Ec
    }

print("✓ GPU simulation function defined")
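
One way to spot-check the batched function is a plain-Python reference that runs the same update for a single `KE`. The sketch below is illustrative only: `primal_logic_scalar` is a hypothetical helper, not part of the notebook, and it repeats the notebook's `LAMBDA` value so that it runs on its own.

```python
import math

LAMBDA = 0.16905  # same value as the notebook's LAMBDA constant


def primal_logic_scalar(t_max=1.0, dt=0.01, KE=0.3):
    """Plain-Python reference for one simulation (hypothetical helper)."""
    n = int(round(t_max / dt))
    decay = math.exp(-0.1 * dt)  # per-step decay of the error term
    psi, gamma, Ec = [0.0] * n, [0.0] * n, [0.0] * n
    psi[0], gamma[0] = 1.0, 0.01  # initial conditions used in the notebook
    for i in range(1, n):
        gamma[i] = gamma[i - 1] * decay
        psi[i] = psi[i - 1] + (-LAMBDA * psi[i - 1] + KE * gamma[i]) * dt
        Ec[i] = Ec[i - 1] + psi[i] * gamma[i] * dt
    return psi, gamma, Ec


psi_ref, gamma_ref, Ec_ref = primal_logic_scalar(t_max=1.0, dt=0.01, KE=0.3)

# gamma has a closed form, which gives an independent check on the loop
last = len(gamma_ref) - 1
assert math.isclose(gamma_ref[last], 0.01 * math.exp(-0.1 * 0.01 * last), rel_tol=1e-9)
```

A row of the tensor output, for example `primal_logic_gpu(1.0, 0.01, KE=0.3, device='cpu')['psi'][0]`, should agree with `psi_ref` to within float32 rounding, since the tensors default to single precision while the reference uses Python doubles.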

## Performance Benchmark: CPU vs GPU

Compare execution time for CPU and GPU implementations.
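
Timing GPU code fairly usually needs a warm-up call and a synchronization before the clock is read, because CUDA kernels are launched asynchronously. The helper below is a minimal sketch of that pattern; `time_call` and its `sync` parameter are hypothetical names introduced here, and the demonstration uses a CPU-only stand-in workload so it runs without a GPU.

```python
import time


def time_call(fn, sync=None, warmup=1, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of fn().

    `sync` is an optional zero-argument callable invoked before the clock
    starts and again before it stops. On a CUDA device one would pass
    torch.cuda.synchronize; on CPU it can be left as None.
    """
    for _ in range(warmup):
        fn()  # untimed runs absorb one-time setup costs
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()  # flush pending work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for fn's work to finish before stopping the clock
        best = min(best, time.perf_counter() - start)
    return best


# CPU-only demonstration with a stand-in workload
elapsed = time_call(lambda: sum(x * x for x in range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

Taking the minimum over a few repeats reduces noise from other processes; reporting the mean or median instead is an equally reasonable choice.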

In [None]:
# Benchmark parameters
batch_sizes = [1, 10, 100, 1000]
t_max = 100.0
dt = 0.01

results = {'batch_size': [], 'cpu_time': [], 'gpu_time': [], 'speedup': []}

print("Running benchmarks...\n")
print(f"{'Batch Size':<12} {'CPU Time':<12} {'GPU Time':<12} {'Speedup':<12}")
print("="*50)

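# Warm-up: the first CUDA call pays one-time context and kernel setup costs
# that would otherwise be charged to the first GPU measurement
if USE_GPU:
    primal_logic_gpu(1.0, dt, KE=0.3, batch_size=1, device='cuda')
    torch.cuda.synchronize()

for batch_size in batch_sizes:
    # CPU benchmark (perf_counter is a monotonic, high-resolution clock)
    start = time.perf_counter()
    result_cpu = primal_logic_gpu(t_max, dt, KE=0.3, batch_size=batch_size, device='cpu')
    cpu_time = time.perf_counter() - start
    
    # GPU benchmark (if available)
    if USE_GPU:
        torch.cuda.synchronize()
        start = time.perf_counter()
        result_gpu = primal_logic_gpu(t_max, dt, KE=0.3, batch_size=batch_size, device='cuda')
        torch.cuda.synchronize()
        gpu_time = time.perf_counter() - start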
        speedup = cpu_time / gpu_time
    else:
        gpu_time = None
        speedup = None
    
    # Store results
    results['batch_size'].append(batch_size)
    results['cpu_time'].append(cpu_time)
    results['gpu_time'].append(gpu_time if gpu_time else 0)
    results['speedup'].append(speedup if speedup else 0)
    
    # Print
    if USE_GPU:
        print(f"{batch_size:<12} {cpu_time:<12.4f} {gpu_time:<12.4f} {speedup:<12.2f}x")
    else:
        print(f"{batch_size:<12} {cpu_time:<12.4f} {'N/A':<12} {'N/A':<12}")

print("="*50)
print("\n✓ Benchmark complete")

In [None]:
# Visualize benchmark results
if USE_GPU:
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('Execution Time Comparison', 'GPU Speedup Factor')
    )
    
    # Time comparison
    fig.add_trace(
        go.Scatter(x=results['batch_size'], y=results['cpu_time'], name='CPU', 
                   mode='lines+markers', line=dict(color='blue', width=2)),
        row=1, col=1
    )
    fig.add_trace(
        go.Scatter(x=results['batch_size'], y=results['gpu_time'], name='GPU',
                   mode='lines+markers', line=dict(color='green', width=2)),
        row=1, col=1
    )
    
    # Speedup
    fig.add_trace(
        go.Scatter(x=results['batch_size'], y=results['speedup'], name='Speedup',
                   mode='lines+markers', line=dict(color='red', width=2),
                   fill='tozeroy'),
        row=1, col=2
    )
    
    fig.update_xaxes(title_text="Batch Size", type="log", row=1, col=1)
    fig.update_xaxes(title_text="Batch Size", type="log", row=1, col=2)
    fig.update_yaxes(title_text="Time (s)", type="log", row=1, col=1)
    fig.update_yaxes(title_text="Speedup (x)", row=1, col=2)
    
    fig.update_layout(height=400, title_text="CPU vs GPU Performance")
    fig.show()
    
    best = int(np.argmax(results['speedup']))
    print(f"\nMax speedup: {results['speedup'][best]:.2f}x at batch size {results['batch_size'][best]}")
else:
    print("GPU not available - skipping visualization")

## Massive Parameter Sweep

Run 100 simulations in parallel, one per KE value, to explore the parameter space.
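
If a sweep grows beyond what fits in GPU memory at once, one option is to split the parameter grid into chunks and run them one after another. The helper below is a sketch under that assumption; `chunk_bounds` is a hypothetical name, not something the notebook defines.

```python
def chunk_bounds(total, chunk_size):
    """Yield (start, stop) index pairs that cover range(total) in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


bounds = list(chunk_bounds(total=10, chunk_size=4))
print(bounds)  # [(0, 4), (4, 8), (8, 10)]
```

Each pair would slice the grid, for example `KE_values[start:stop]`, and be passed to `primal_logic_gpu` with `batch_size=stop - start`. Moving each chunk's metrics to the CPU before starting the next chunk keeps only one chunk's state tensors on the device at a time.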

In [None]:
# Define parameter grid
KE_values = torch.linspace(0.0, 1.0, 100, device=device)

print(f"Running {len(KE_values)} simulations in parallel...")

# Run batch simulation
if USE_GPU:
    torch.cuda.synchronize()

start = time.time()
results_sweep = primal_logic_gpu(t_max=50.0, dt=0.01, KE=KE_values, 
                                  batch_size=len(KE_values), device=device)

if USE_GPU:
    torch.cuda.synchronize()

sweep_time = time.time() - start

print(f"✓ Completed {len(KE_values)} simulations in {sweep_time:.3f} seconds")
print(f"  Average time per simulation: {sweep_time/len(KE_values)*1000:.2f} ms")

# Extract metrics
max_psi = torch.max(torch.abs(results_sweep['psi']), dim=1).values.cpu().numpy()
final_Ec = results_sweep['Ec'][:, -1].cpu().numpy()
KE_np = KE_values.cpu().numpy()

In [None]:
# Visualize parameter sweep results
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Max |ψ| vs KE', 'Final Ec vs KE')
)

fig.add_trace(
    go.Scatter(x=KE_np, y=max_psi, mode='lines', line=dict(color='blue', width=2)),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=KE_np, y=final_Ec, mode='lines', line=dict(color='green', width=2)),
    row=1, col=2
)

fig.update_xaxes(title_text="KE (Error Gain)", row=1, col=1)
fig.update_xaxes(title_text="KE (Error Gain)", row=1, col=2)
fig.update_yaxes(title_text="Max |ψ|", row=1, col=1)
fig.update_yaxes(title_text="Final Ec", row=1, col=2)

fig.update_layout(height=400, title_text="Parameter Sweep: KE Sensitivity Analysis", showlegend=False)
fig.show()

# Find the KE with the smallest peak |ψ|
# (np.argmin returns the first index when several values tie)
optimal_idx = np.argmin(max_psi)
print(f"\nKE with minimum peak |ψ|: {KE_np[optimal_idx]:.4f}")

## 3D Parameter Space Exploration

Visualize the relationship between KE, time, and control response in 3D.

In [None]:
# Sample parameter space for 3D visualization
n_samples = 20
KE_3d = torch.linspace(0.1, 0.8, n_samples, device=device)

results_3d = primal_logic_gpu(t_max=30.0, dt=0.1, KE=KE_3d, 
                               batch_size=n_samples, device=device)

# Convert to numpy for plotting
t_3d = results_3d['t'].cpu().numpy()
psi_3d = results_3d['psi'].cpu().numpy()
KE_3d_np = KE_3d.cpu().numpy()

# Create 3D surface
fig = go.Figure(data=[go.Surface(
    x=t_3d,
    y=KE_3d_np,
    z=psi_3d,
    colorscale='Viridis',
    hovertemplate='Time: %{x:.2f}s<br>KE: %{y:.3f}<br>ψ: %{z:.4f}<extra></extra>'
)])

fig.update_layout(
    title='3D Control Response Surface',
    scene=dict(
        xaxis_title='Time (s)',
        yaxis_title='KE',
        zaxis_title='ψ(t)',
        camera=dict(eye=dict(x=1.5, y=-1.5, z=1.2))
    ),
    height=600
)

fig.show()

## GPU Memory Profiling

Monitor GPU memory usage during simulation.
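
Before measuring, a rough estimate of the expected footprint is useful as a sanity check. The simulation function allocates three state tensors (`psi`, `gamma`, `Ec`) of shape `(batch_size, n)`; assuming PyTorch's default float32 dtype, their size can be computed directly. The function name `state_tensor_bytes` is hypothetical.

```python
def state_tensor_bytes(batch_size, t_max, dt, n_states=3, bytes_per_element=4):
    """Estimate bytes held by the (batch_size x n) state tensors.

    Assumes float32 elements (4 bytes) and ignores the much smaller t and KE
    tensors as well as any temporaries created inside the loop.
    """
    n_steps = int(round(t_max / dt))
    return n_states * batch_size * n_steps * bytes_per_element


estimate = state_tensor_bytes(batch_size=5000, t_max=10.0, dt=0.01)
print(f"{estimate / 1e6:.1f} MB")  # 60.0 MB
```

The measured figure may come out somewhat higher, since the estimate leaves out the `t` and `KE` tensors and the allocator may round allocations up.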

In [None]:
if USE_GPU:
    # Clear cache and reset the peak-memory counter
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(0)
    
    print("GPU Memory Profile")
    print("="*50)
    
    mem_before = torch.cuda.memory_allocated(0) / 1e6
    print(f"Memory before: {mem_before:.2f} MB")
    
    # Run large simulation
    large_batch = 5000
    print(f"\nRunning {large_batch} simulations...")
    
    result_large = primal_logic_gpu(
        t_max=10.0, dt=0.01, 
        KE=torch.rand(large_batch, device='cuda') * 0.5,
        batch_size=large_batch,
        device='cuda'
    )
    torch.cuda.synchronize()
    
    # memory_allocated reports what is held right now (results still alive);
    # max_memory_allocated reports the peak since the counter was reset
    mem_held = torch.cuda.memory_allocated(0) / 1e6
    mem_peak = torch.cuda.max_memory_allocated(0) / 1e6
    print(f"Memory held:   {mem_held:.2f} MB")
    print(f"Memory used:   {mem_held - mem_before:.2f} MB")
    
    # Clear results
    del result_large
    torch.cuda.empty_cache()
    
    mem_after = torch.cuda.memory_allocated(0) / 1e6
    print(f"Memory after:  {mem_after:.2f} MB")
    
    print("="*50)
    print(f"\n✓ Peak memory usage: {mem_peak:.2f} MB for {large_batch} parallel simulations")
else:
    print("GPU not available - skipping memory profiling")

## Export Results

Save GPU-accelerated simulation results for further analysis.

In [None]:
# Export parameter sweep results
export_data = pd.DataFrame({
    'KE': KE_np,
    'max_psi': max_psi,
    'final_Ec': final_Ec
})

output_file = 'gpu_parameter_sweep_results.csv'
export_data.to_csv(output_file, index=False)

print(f"✓ Exported results to: {output_file}")
print(f"  Records: {len(export_data)}")
print(f"  Columns: {', '.join(export_data.columns)}")

# Summary statistics
print("\nSummary Statistics:")
print(export_data.describe())

## Conclusion

This notebook demonstrated:

✓ GPU detection and configuration  
✓ GPU-accelerated Primal Logic simulations  
✓ Performance comparison (CPU vs GPU)  
✓ Massive parallel parameter sweeps  
✓ 3D visualization of parameter space  
✓ GPU memory profiling  

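**Performance Summary:**
- GPU speedup depends on batch size: the benchmark above reports the measured factor for each batch size, and small batches may run no faster than on CPU because every time step launches several small kernels
- Best suited to parameter sweeps with hundreds to thousands of runs
- Useful where many candidate parameters must be evaluated quickly, such as mission planning and optimization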

## Next Steps

- Scale to multi-GPU simulation
- Implement distributed parameter optimization
- Real-time mission trajectory optimization

---

**Patent Pending:** U.S. Provisional Patent Application No. 63/842,846  
© 2025 Donte Lightfoot - The Phoney Express LLC / Locked In Safety