# EFM Density States and Clustering Scale Validation

This notebook runs a high-resolution simulation to validate the density states and clustering scales of the Ehokolo Fluxon Model (EFM). It targets a CoCalc server with 4x NVIDIA H100 GPUs (80 GB VRAM each, 320 GB total), 104 vCPUs, 936 GB RAM, and a 1 TB SSD, running the Colab Docker image. The simulation aims to reproduce the BAO scale (~147 Mpc) through solitonic dynamics and lets larger scales (~628 Mpc) emerge naturally. The target runtime is ~1.3-1.5 hours for a 500^3 grid and 50k steps, using 4-GPU execution with mixed precision, large chunks, and minimal synchronization, while keeping total VRAM usage below 200 GB and RAM below 800 GB. Progress is reported with `tqdm` bars; per-chunk processing is silent to reduce screen clutter, and filesystem errors are caught and reported rather than aborting the run.

## Objectives
- Run a simulation on a 500^3 grid (1000 Mpc box) to validate S/T state clustering scales.
- Derive the BAO scale (~147 Mpc) using solitonic dynamics with a tuned mass parameter m.
- Use Gaussian noise as the initial condition to let clustering scales emerge naturally.
- Compute the power spectrum and correlation function to identify clustering scales (147 Mpc, 628 Mpc).
- Validate against DESI BAO data (147.09 ± 0.26 Mpc) and check for the 628 Mpc scale.
- Use local storage (1 TB SSD) for all data.

## Hardware
- GPUs: 4x NVIDIA H100 (80 GB VRAM each, total 320 GB)
- System RAM: 936 GB
- Storage: 1 TB SSD
- Environment: CoCalc with Colab Docker image, PyTorch 2.0+, CUDA 12.x
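
As a rough sanity check on these memory figures, the sketch below estimates the size of a single 500^3 field at a few precisions. It only counts raw array storage; the helper name `field_size_gb` is illustrative, and allocator overhead, RK4 temporaries, and FFT workspaces are not included.

```python
# Rough storage estimate for one N^3 field (raw array bytes only).
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "complex64": 8}


def field_size_gb(n: int, dtype: str) -> float:
    """Size in GB (1e9 bytes) of one n x n x n array of the given dtype."""
    return n**3 * BYTES_PER_ELEMENT[dtype] / 1e9


for dtype in BYTES_PER_ELEMENT:
    print(f"{dtype:>9}: {field_size_gb(500, dtype):.2f} GB per 500^3 field")
```

A single field is small compared with the available VRAM, so the stated limits mostly leave headroom for intermediate tensors rather than for the fields themselves.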

## Setup Instructions
1. Ensure the CoCalc project runs on the specified server (4x H100).
2. Execute all cells sequentially to run the simulation, or skip to the Compute Final Observables cell to analyze an existing checkpoint.
3. Monitor VRAM (<200 GB total), RAM (<800 GB), and free SSD space (>50 GB) in the `tqdm` postfix; the main loop saves a partial checkpoint and stops if a limit is exceeded.
4. Outputs are saved to local directories (`~/EFM_checkpoints/` and `~/EFM_data/`).
5. Expect a single `tqdm` progress bar for the main simulation, with silent chunk processing and filesystem error resilience.

## Environment Setup

Set up environment variables, install dependencies, create local directories, and verify GPUs.

In [None]:
import os

# Set the allocator option before importing torch so it is in place when CUDA initializes
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

import gc
import subprocess
import psutil
import torch
from tqdm import tqdm

# Clear cached GPU memory
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Check whether a package is installed and meets a minimum version
def check_package(package_name, min_version=None):
    from importlib import metadata
    try:
        # Query the running interpreter's environment instead of shelling out to pip
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return False, None
    if min_version:
        # packaging replaces the deprecated pkg_resources for version comparison
        from packaging.version import parse as parse_version
        if parse_version(version) < parse_version(min_version):
            return False, version
    return True, version

# Install required packages if missing
required_packages = [
    ('torch', '2.0.0'),  # Ensure CUDA support for H100
    ('numpy', None),
    ('tqdm', None),
    ('psutil', None),
    ('scipy', None),
    ('matplotlib', None)
]
import sys

for pkg_name, min_version in tqdm(required_packages, desc="Checking packages", unit="package"):
    installed, version = check_package(pkg_name, min_version)
    if installed:
        print(f"{pkg_name} (version {version}) is already installed.")
    else:
        # Install once with the running interpreter's pip, applying the version floor if one is set
        spec = f"{pkg_name}>={min_version}" if min_version else pkg_name
        print(f"Installing {spec}...")
        subprocess.run([sys.executable, '-m', 'pip', 'install', spec, '--quiet'], check=True)

# Import libraries
import torch
import numpy as np
import time
from scipy.fft import fftn, fftfreq, ifftn
import torch.nn.functional as F
import torch.cuda.amp as amp  # For mixed precision

# Check GPUs and memory
devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
primary_device = devices[0] if devices else torch.device('cpu')
print(f"Primary device: {primary_device}")
if devices:
    for i, dev in enumerate(devices):
        props = torch.cuda.get_device_properties(dev)
        print(f"GPU {i}: {props.name}, VRAM: {props.total_memory / 1e9:.2f} GB")
print(f"System RAM: {psutil.virtual_memory().total / 1e9:.2f} GB")
try:
    print(f"SSD Storage Available: {psutil.disk_usage('/home/user').free / 1e9:.2f} GB")
except OSError as e:
    print(f"Warning: Could not check disk usage: {e}. Assuming 1000 GB free.")

# Create local directories
checkpoint_path = os.path.expanduser('~/EFM_checkpoints/')
data_path = os.path.expanduser('~/EFM_data/')
with tqdm(total=2, desc="Creating directories", leave=False) as pbar:
    os.makedirs(checkpoint_path, exist_ok=True)
    pbar.update(1)
    os.makedirs(data_path, exist_ok=True)
    pbar.update(1)
print(f"Checkpoints will be saved to: {checkpoint_path}")
print(f"Data will be saved to: {data_path}")

# Verify GPUs
print("Verifying GPUs...")
subprocess.run(['nvidia-smi'])


## Configuration

Set simulation parameters, tuned for a 500^3 grid to reproduce the BAO scale (~147 Mpc).

In [None]:
# Simulation parameters
config = {}
config['N'] = 500  # Grid size (N x N x N)
config['L'] = 1000.0  # Box size (1000 Mpc)
config['dx'] = config['L'] / config['N']  # Spatial step
config['c_si'] = 3e8  # Speed of light (m/s)
config['mpc_in_m'] = 3.086e22  # Meters per megaparsec
config['c'] = config['c_si'] / config['mpc_in_m']  # Wave speed in Mpc/s, so c and dx share length units
config['dt_cfl_factor'] = 0.000007  # Reduced CFL factor for stability
config['dt'] = config['dt_cfl_factor'] * config['dx'] / config['c']  # Time step (s), ~1.44e9 s for dx = 2 Mpc
config['T'] = 50000  # Total steps (~2.3 Myr of simulated time at this dt)
config['chunk_size'] = 125  # Increased for H100 (must divide N evenly)
config['boundary_width_factor'] = 0.0  # Periodic boundaries

# Physical parameters
config['m'] = 4.16e-16  # Mass term (s^-1, tuned for ~147 Mpc)
config['g'] = 0.01  # Cubic nonlinearity
config['eta'] = 0.001  # Quintic nonlinearity
config['k'] = 0.0  # Density scaling (no gravitational term)
config['G'] = 6.674e-11  # Gravitational constant (m^3 kg^-1 s^-2)

# Assign to local variables
N = config['N']
L = config['L']
dx = config['dx']
c = config['c']
dt = config['dt']
T = config['T']
chunk_size = config['chunk_size']
m = config['m']
g = config['g']
eta = config['eta']
k = config['k']
G = config['G']

# Validate chunk_size
if N % chunk_size != 0:
    raise ValueError(f"chunk_size ({chunk_size}) must divide N ({N}) evenly.")

print(f"Grid size: {N} x {N} x {N}")
print(f"Box size: {L} Mpc")
print(f"Total steps: {T}")
print(f"Chunk size: {chunk_size} z-slices per batch")
print(f"Time step: {dt:.2e} seconds (~{dt / (3.156e7):.2f} years)")
# c is in Mpc/s and m in s^-1, so 2*pi*c/m is already in Mpc
print(f"Soliton wavelength (from m): {2 * np.pi * c / m:.2f} Mpc")


## Simulation Setup

- Grid Size: 500 x 500 x 500
- Box Size: 1000 Mpc
- Spatial Step: dx = 2 Mpc
- Time Step: dt ≈ 1.4e9 seconds (~46 years)
- Steps: 50000 (~2.3 Myr)
- Chunk Size: 125 z-slices (optimized for H100)
- Initial Conditions: Gaussian noise (amplitude 0.01)
- Boundary Condition: Periodic
- Equation: Nonlinear Klein-Gordon, no gravitational term
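
To make the structure of the equation concrete, the sketch below integrates only the spatially homogeneous mode, phi'' = -dV/dphi, with RK4 and checks energy conservation. It assumes the potential V = m^2 phi^2/2 + g phi^4/4 + eta phi^6/6 used in the notebook's energy calculation. The parameter values here are dimensionless and purely illustrative (not the tuned values), and the function names are hypothetical.

```python
# Homogeneous mode of the nonlinear Klein-Gordon equation, integrated with RK4.
# Illustrative dimensionless parameters (assumption), not the notebook's tuned values.
M, G_CUBIC, ETA = 1.0, 0.01, 0.001


def potential(phi):
    """V(phi) = m^2 phi^2 / 2 + g phi^4 / 4 + eta phi^6 / 6."""
    return 0.5 * M**2 * phi**2 + 0.25 * G_CUBIC * phi**4 + ETA * phi**6 / 6.0


def dV_dphi(phi):
    """Derivative of the potential above."""
    return M**2 * phi + G_CUBIC * phi**3 + ETA * phi**5


def derivative(phi, phi_dot):
    """Right-hand side of the first-order system: (d phi/dt, d phi_dot/dt)."""
    return phi_dot, -dV_dphi(phi)


def rk4_step(phi, phi_dot, dt):
    k1_v, k1_a = derivative(phi, phi_dot)
    k2_v, k2_a = derivative(phi + 0.5 * dt * k1_v, phi_dot + 0.5 * dt * k1_a)
    k3_v, k3_a = derivative(phi + 0.5 * dt * k2_v, phi_dot + 0.5 * dt * k2_a)
    k4_v, k4_a = derivative(phi + dt * k3_v, phi_dot + dt * k3_a)
    phi_new = phi + dt / 6.0 * (k1_v + 2 * k2_v + 2 * k3_v + k4_v)
    phi_dot_new = phi_dot + dt / 6.0 * (k1_a + 2 * k2_a + 2 * k3_a + k4_a)
    return phi_new, phi_dot_new


def energy(phi, phi_dot):
    return 0.5 * phi_dot**2 + potential(phi)


phi, phi_dot = 0.5, 0.0
e0 = energy(phi, phi_dot)
for _ in range(10_000):
    phi, phi_dot = rk4_step(phi, phi_dot, dt=0.01)
drift = abs(energy(phi, phi_dot) - e0) / e0
print(f"Relative energy drift after 10,000 steps: {drift:.2e}")
```

Note that each stage advances `phi` with the velocity and `phi_dot` with the acceleration. A small energy drift in this reduced problem is a quick check that the force term matches the potential.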

## Numerical Methods
- Integrator: 4th-order Runge-Kutta (RK4), optimized for 4 GPUs with mixed precision
- Laplacian: 7-point stencil convolution
- Boundary Conditions: Periodic
- Power Spectrum: Full 3D FFT
- Correlation Function: FFT-based
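
The 7-point periodic Laplacian can be checked independently of the GPU code. The sketch below is a NumPy stand-in (using `np.roll` rather than the notebook's `conv3d` stencil) applied to a single sine mode, for which the discrete operator has a known eigenvalue. Grid size and box length are arbitrary illustrative choices.

```python
import numpy as np


def periodic_laplacian(phi, dx):
    """7-point Laplacian with periodic wrap-around along all three axes."""
    lap = -6.0 * phi
    for axis in range(3):
        lap = lap + np.roll(phi, 1, axis=axis) + np.roll(phi, -1, axis=axis)
    return lap / dx**2


n, box = 32, 1.0
dx = box / n
x = np.arange(n) * dx
kx = 2 * np.pi / box  # one full wavelength across the box

# Sine wave along x, constant along y and z
phi = np.sin(kx * x)[:, None, None] * np.ones((n, n, n))
numeric = periodic_laplacian(phi, dx)

# Exact eigenvalue of the discrete operator for this mode
discrete_eigenvalue = -(2.0 / dx * np.sin(kx * dx / 2.0)) ** 2
max_err = np.max(np.abs(numeric - discrete_eigenvalue * phi))

# Relative gap between the discrete eigenvalue and the continuum value -kx^2
continuum_gap = abs(discrete_eigenvalue + kx**2) / kx**2
print(f"max deviation from discrete eigenvalue: {max_err:.2e}")
print(f"relative gap to continuum -k^2: {continuum_gap:.2e}")
```

When a stencil like this is applied to z-slabs separately, each slab needs one layer of neighbouring cells (a halo) from the adjacent slabs; wrapping within a slab is only correct for the full periodic grid.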

## Parameter Justifications
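- m = 4.16e-16 s^-1: Sets the soliton wavelength 2πc/m to ~147 Mpc
- g = 0.01, eta = 0.001: Chosen for soliton stability
- k = 0.0: Disables the density-scaling (gravitational) term, removing a destabilizing contribution
- c = 3e8 m/s: Speed of light, used as the wave speed
- dt_cfl_factor = 0.000007: Keeps the time step far below the CFL limit dx/c for stability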
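
The arithmetic behind these choices can be reproduced in a few lines. The sketch below uses the parameter values listed above and the conversion constants that appear in the notebook's print statements (3.086e22 m per Mpc, 3.156e7 s per year); the variable names are illustrative.

```python
import math

MPC_IN_M = 3.086e22   # meters per megaparsec
YEAR_IN_S = 3.156e7   # seconds per year

c = 3.0e8             # wave speed (m/s)
m = 4.16e-16          # mass term (s^-1)
dx_mpc = 2.0          # grid spacing (Mpc)
cfl = 7e-6            # dt_cfl_factor
steps = 50_000

# Wavelength associated with the mass term: lambda = 2*pi*c/m
wavelength_mpc = 2 * math.pi * c / m / MPC_IN_M

# Time step from the CFL factor, with dx converted to meters
dt_s = cfl * dx_mpc * MPC_IN_M / c

# Total simulated time
total_myr = steps * dt_s / YEAR_IN_S / 1e6

print(f"wavelength: {wavelength_mpc:.1f} Mpc")
print(f"dt: {dt_s:.3e} s ({dt_s / YEAR_IN_S:.1f} years)")
print(f"total simulated time: {total_myr:.2f} Myr")
```

The key point is that `dx` and `c` must be expressed in the same length unit before they are combined, either both in meters or both in Mpc.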

In [None]:
import torch
import numpy as np
from tqdm import tqdm
import torch.nn.functional as F
import torch.cuda.amp as amp

# Parameter set
param_sets = [{"m": m, "g": g, "eta": eta, "k": k, "label": "Baseline"}]
boundary_conditions = ["periodic"]
results = []

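# Potential energy density V(phi); its derivative m^2*phi + g*phi^3 + eta*phi^5 drives the field
def potential(phi, m, g, eta, k):
    return 0.5 * m**2 * phi**2 + 0.25 * g * phi**4 + (eta / 6.0) * phi**6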

# Damping mask (periodic boundaries)
def create_damping_mask(N, boundary_width, damping_factor, device):
    return torch.ones((N, N, N), dtype=torch.float16, device=device)

# Convolution-based Laplacian with mixed precision
def conv_laplacian(phi, dx, device):
    try:
        with amp.autocast():
            stencil = torch.tensor([[[0, 0, 0], [0, 1, 0], [0, 0, 0]],
                                    [[0, 1, 0], [1, -6, 1], [0, 1, 0]],
                                    [[0, 0, 0], [0, 1, 0], [0, 0, 0]]],
                                   dtype=torch.float16, device=device)
            stencil = stencil / (dx**2)
            stencil = stencil.view(1, 1, 3, 3, 3)
            phi = phi.view(1, 1, phi.shape[0], phi.shape[1], phi.shape[2])
            # F.conv3d has no padding_mode argument, so wrap the field explicitly before convolving.
            # Note: when phi is a z-slab chunk, the wrap along the first axis happens within the chunk,
            # not across neighbouring chunks.
            phi_padded = F.pad(phi, (1, 1, 1, 1, 1, 1), mode='circular')
            laplacian = F.conv3d(phi_padded, stencil, padding=0)
            return laplacian.view(phi.shape[2], phi.shape[3], phi.shape[4])
    except Exception as e:
        print(f"Error in conv_laplacian: {e}")
        raise

# NLKG derivative with mixed precision
def nlkg_derivative(phi, phi_dot, m, g, eta, k, damping_mask):
    try:
        with torch.no_grad():
            with amp.autocast():
                laplacian = conv_laplacian(phi, dx, phi.device)
                if phi.shape != damping_mask.shape or laplacian.shape != phi.shape:
                    raise ValueError(f"Shape mismatch: phi {phi.shape}, damping_mask {damping_mask.shape}, laplacian {laplacian.shape}")
                phi.mul_(damping_mask)
                phi_dot.mul_(damping_mask)
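                # Force term dV/dphi for V = m^2 phi^2/2 + g phi^4/4 + eta phi^6/6 (matches compute_energy)
                dV_dphi = m**2 * phi + g * phi**3 + eta * phi**5
                # Note: c**2 and m**2 lie far outside float16's range, so these products may
                # overflow or underflow under mixed precision
                phi_ddot = c**2 * laplacian - dV_dphi
                # Return (d phi/dt, d phi_dot/dt) so the RK4 stages advance phi with the velocity
                return phi_dot, phi_ddot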
    except Exception as e:
        print(f"Error in nlkg_derivative: {e}")
        raise

# RK4 integrator with multi-GPU support and mixed precision
def update_phi_rk4_chunked(phi, phi_dot, dt, m, g, eta, k, damping_mask, chunk_size, devices):
    try:
        with torch.no_grad():
            if phi.shape != phi_dot.shape or phi.shape != damping_mask.shape:
                raise ValueError(f"Shape mismatch: phi {phi.shape}, phi_dot {phi_dot.shape}, damping_mask {damping_mask.shape}")
            phi_new = torch.empty_like(phi, device=phi.device, dtype=torch.float16)
            phi_dot_new = torch.empty_like(phi_dot, device=phi_dot.device, dtype=torch.float16)
            num_gpus = len(devices)
            total_chunks = (phi.shape[0] + chunk_size - 1) // chunk_size
            for i in range(0, phi.shape[0], chunk_size):
                chunk = slice(i, min(i + chunk_size, phi.shape[0]))
                gpu_idx = (i // chunk_size) % num_gpus
                dev = devices[gpu_idx]
                stream = torch.cuda.Stream(device=dev)
                with torch.cuda.stream(stream):
                    with amp.autocast():
                        phi_chunk = phi[chunk].to(dev, non_blocking=True)
                        phi_dot_chunk = phi_dot[chunk].to(dev, non_blocking=True)
                        damping_chunk = damping_mask[chunk].to(dev, non_blocking=True)
                        temp = torch.empty_like(phi_chunk, device=dev, dtype=torch.float16)
                        k1_v, k1_a = nlkg_derivative(phi_chunk, phi_dot_chunk, m, g, eta, k, damping_chunk)
                        temp.copy_(phi_chunk + 0.5 * dt * k1_v)
                        k2_v, k2_a = nlkg_derivative(temp, phi_dot_chunk + 0.5 * dt * k1_a, m, g, eta, k, damping_chunk)
                        temp.copy_(phi_chunk + 0.5 * dt * k2_v)
                        k3_v, k3_a = nlkg_derivative(temp, phi_dot_chunk + 0.5 * dt * k2_a, m, g, eta, k, damping_chunk)
                        temp.copy_(phi_chunk + dt * k3_v)
                        k4_v, k4_a = nlkg_derivative(temp, phi_dot_chunk + dt * k3_a, m, g, eta, k, damping_chunk)
                        phi_new[chunk] = phi_chunk + (dt / 6.0) * (k1_v + 2 * k2_v + 2 * k3_v + k4_v)
                        phi_dot_new[chunk] = phi_dot_chunk + (dt / 6.0) * (k1_a + 2 * k2_a + 2 * k3_a + k4_a)
                        phi_new[chunk].clamp_(-5, 5)
                        phi_dot_new[chunk].clamp_(-5, 5)
                        del phi_chunk, phi_dot_chunk, damping_chunk, temp, k1_v, k1_a, k2_v, k2_a, k3_v, k3_a, k4_v, k4_a
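            # Wait for every side stream to finish before the results are read
            for dev in devices:
                torch.cuda.synchronize(dev)
            return phi_new, phi_dot_new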
    except Exception as e:
        print(f"Error in update_phi_rk4_chunked: {e}")
        raise

# Energy calculation with mixed precision
def compute_energy(phi, phi_dot, m, g, eta, k, chunk_size, dx, c):
    try:
        with torch.no_grad():
            total_energy = 0.0
            kinetic_total = 0.0
            gradient_total = 0.0
            potential_total = 0.0
            num_chunks = 0
            total_chunks = (phi.shape[0] + chunk_size - 1) // chunk_size
            for i in range(0, phi.shape[0], chunk_size):
                chunk = slice(i, min(i + chunk_size, phi.shape[0]))
                phi_chunk = phi[chunk]
                phi_dot_chunk = phi_dot[chunk]
                if torch.any(torch.isinf(phi_chunk)) or torch.any(torch.isnan(phi_chunk)):
                    print(f"Warning: phi contains inf or nan in chunk {i}")
                    return float('inf'), float('inf'), float('inf'), float('inf')
                if torch.any(torch.isinf(phi_dot_chunk)) or torch.any(torch.isnan(phi_dot_chunk)):
                    print(f"Warning: phi_dot contains inf or nan in chunk {i}")
                    return float('inf'), float('inf'), float('inf'), float('inf')
                with amp.autocast():
                    kinetic = 0.5 * phi_dot_chunk**2
                    potential_energy = 0.5 * m**2 * phi_chunk**2 + 0.25 * g * phi_chunk**4 + (eta / 6.0) * phi_chunk**6
                    gradient = torch.zeros_like(phi_chunk, dtype=torch.float16)
                    for d in range(3):
                        grad_d = torch.gradient(phi_chunk, spacing=dx, dim=d)[0]
                        gradient += grad_d**2
                    gradient *= 0.5 * c**2
                    kinetic_mean = torch.mean(kinetic).item() if not torch.isnan(kinetic).any() else 0.0
                    gradient_mean = torch.mean(gradient).item() if not torch.isnan(gradient).any() else 0.0
                    potential_mean = torch.mean(potential_energy).item() if not torch.isnan(potential_energy).any() else 0.0
                    kinetic_total += kinetic_mean
                    gradient_total += gradient_mean
                    potential_total += potential_mean
                    num_chunks += 1
            kinetic_total /= num_chunks
            gradient_total /= num_chunks
            potential_total /= num_chunks
            total_energy = kinetic_total + gradient_total + potential_total
            return total_energy, kinetic_total, gradient_total, potential_total
    except Exception as e:
        print(f"Error in compute_energy: {e}")
        raise

# Power spectrum calculation
def compute_power_spectrum(phi, k_range=[0.005, 0.1], chunk_size=125, dx=1.0, N=500):
    # chunk_size is kept for call compatibility; the FFT itself is not chunked
    try:
        # A 3D FFT cannot be assembled from independent per-slab FFTs, so transform the full field
        phi_np = phi.float().cpu().numpy()
        fft_result = fftn(phi_np).astype(np.complex64)
        del phi_np
        gc.collect()
        # Angular wavenumbers in rad/Mpc, so that wavelength = 2*pi/k
        kx = 2 * np.pi * fftfreq(N, d=dx)
        ky = 2 * np.pi * fftfreq(N, d=dx)
        kz = 2 * np.pi * fftfreq(N, d=dx)
        kx, ky, kz = np.meshgrid(kx, ky, kz, indexing='ij')
        k = np.sqrt(kx**2 + ky**2 + kz**2)
        power = np.abs(fft_result)**2
        k_bins = np.linspace(k_range[0], k_range[1], 50)
        power_binned = np.zeros(len(k_bins) - 1)
        pbar_bin = tqdm(range(len(k_bins) - 1), desc="Binning power spectrum", leave=False, unit="bin")
        for i in pbar_bin:
            mask = (k >= k_bins[i]) & (k < k_bins[i + 1])
            power_binned[i] = np.mean(power[mask]) if np.any(mask) else 0
        pbar_bin.close()
        del fft_result, kx, ky, kz, k, power
        gc.collect()
        return k_bins[:-1], power_binned
    except Exception as e:
        print(f"Error in compute_power_spectrum: {e}")
        raise

# Correlation function
def compute_correlation_function(phi, chunk_size=125, dx=1.0, N=500):
    # chunk_size is kept for call compatibility; the FFT itself is not chunked
    try:
        # Transform the full field: per-slab FFTs do not combine into a 3D FFT
        phi_np = phi.float().cpu().numpy()
        fft_result = fftn(phi_np).astype(np.complex64)
        del phi_np
        gc.collect()
        power = np.abs(fft_result)**2
        corr = ifftn(power).real
        # Signed lags in FFT order (0, 1, ..., -2, -1) so that r lines up with the unshifted corr array
        indices = fftfreq(N, d=1.0 / N)
        x, y, z = np.meshgrid(indices, indices, indices, indexing='ij')
        r = np.sqrt(x**2 + y**2 + z**2) * dx
        r_bins = np.linspace(0, 500, 50)
        corr_binned = np.zeros(len(r_bins) - 1)
        pbar_bin = tqdm(range(len(r_bins) - 1), desc="Binning correlation function", leave=False, unit="bin")
        for i in pbar_bin:
            mask = (r >= r_bins[i]) & (r < r_bins[i + 1])
            corr_binned[i] = np.mean(corr[mask]) if np.any(mask) else 0
        pbar_bin.close()
        del fft_result, power, corr, x, y, z, r
        gc.collect()
        return r_bins[:-1], corr_binned / np.max(corr_binned) if np.max(corr_binned) != 0 else corr_binned
    except Exception as e:
        print(f"Error in compute_correlation_function: {e}")
        raise


## Precompute Initial Conditions

Compute and save initial fields to disk.
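
For reproducible runs it can help to seed the noise explicitly. The sketch below shows one way to do that with NumPy's `default_rng`; the function name `make_initial_conditions`, the `seed` argument, and the small grid size are illustrative assumptions and are not part of the cell below, which draws unseeded noise.

```python
import numpy as np


def make_initial_conditions(n, amplitude=0.01, seed=0):
    """Gaussian-noise field with zero initial velocity, reproducible for a given seed."""
    rng = np.random.default_rng(seed)
    phi = (amplitude * rng.standard_normal((n, n, n))).astype(np.float32)
    phi_dot = np.zeros((n, n, n), dtype=np.float32)
    return phi, phi_dot


# Small grid for illustration; the notebook uses N = 500
phi_a, phi_dot_a = make_initial_conditions(16, seed=42)
phi_b, _ = make_initial_conditions(16, seed=42)
print("identical for the same seed:", np.array_equal(phi_a, phi_b))
print(f"sample standard deviation: {phi_a.std():.4f}")
```

Saving the seed alongside the checkpoint makes it possible to regenerate the same initial field without storing it.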

In [None]:
import numpy as np
import torch
import os
import gc
from tqdm import tqdm

# Precompute initial conditions
init_path = f"{data_path}initial_conditions_N{N}.npz"
if not os.path.exists(init_path):
    with tqdm(total=2, desc="Computing initial conditions", leave=False) as pbar:
        try:
            phi_ST = np.random.normal(0, 1, (N, N, N)).astype(np.float32) * 0.01
            phi_dot_ST = np.zeros((N, N, N), dtype=np.float32)
            pbar.update(1)
            np.savez_compressed(init_path, phi_ST=phi_ST, phi_dot_ST=phi_dot_ST)
            pbar.update(1)
            print("Initial conditions saved.")
        except Exception as e:
            print(f"Error computing initial conditions: {e}")
            raise
else:
    with tqdm(total=1, desc="Loading initial conditions", leave=False) as pbar:
        try:
            init_data = np.load(init_path)
            phi_ST = init_data['phi_ST']
            phi_dot_ST = init_data['phi_dot_ST']
            pbar.update(1)
            print("Initial conditions loaded.")
        except Exception as e:
            print(f"Error loading initial conditions: {e}")
            raise

# Load to GPU
with tqdm(total=3, desc="Preparing tensors", leave=False) as pbar:
    phi_ST_tensor = torch.from_numpy(phi_ST).to(primary_device, dtype=torch.float16, non_blocking=True)
    pbar.update(1)
    phi_dot_ST_tensor = torch.from_numpy(phi_dot_ST).to(primary_device, dtype=torch.float16, non_blocking=True)
    pbar.update(1)
    damping_mask = torch.ones((N, N, N), dtype=torch.float16, device=primary_device)
    pbar.update(1)

# Validate
if phi_ST_tensor.shape != (N, N, N) or phi_dot_ST_tensor.shape != (N, N, N) or damping_mask.shape != (N, N, N):
    raise ValueError(f"Shape mismatch: phi_ST {phi_ST_tensor.shape}, phi_dot_ST {phi_dot_ST_tensor.shape}, damping_mask {damping_mask.shape}")
if torch.any(torch.isnan(phi_ST_tensor)) or torch.any(torch.isinf(phi_ST_tensor)):
    raise ValueError("phi_ST_tensor contains NaN or Inf values")
if torch.any(torch.isnan(phi_dot_ST_tensor)) or torch.any(torch.isinf(phi_dot_ST_tensor)):
    raise ValueError("phi_dot_ST_tensor contains NaN or Inf values")

print("Initial conditions prepared on device:", primary_device)


## Main Simulation Loop

Runs the simulation and saves a checkpoint if completed, with partial checkpointing on interruption.
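
Resuming relies on finding the most recent partial checkpoint by the step number embedded in its filename. The sketch below shows that lookup with a regular expression on a list of example filenames that follow the naming pattern used in this notebook; the helper name `latest_partial_checkpoint` is hypothetical.

```python
import re


def latest_partial_checkpoint(filenames, label, boundary, n):
    """Return (step, filename) for the highest-step partial checkpoint, or None."""
    pattern = re.compile(
        rf"^checkpoint_{re.escape(label)}_{re.escape(boundary)}_partial_step(\d+)_N{n}\.npz$"
    )
    best = None
    for name in filenames:
        match = pattern.match(name)
        if match:
            step = int(match.group(1))
            if best is None or step > best[0]:
                best = (step, name)
    return best


example_files = [
    "checkpoint_Baseline_periodic_partial_step500_N500.npz",
    "checkpoint_Baseline_periodic_partial_step12000_N500.npz",
    "checkpoint_Baseline_periodic_50000_N500.npz",  # final checkpoint, not partial
    "notes.txt",
]
found = latest_partial_checkpoint(example_files, "Baseline", "periodic", 500)
print(found)
```

Anchoring the pattern at both ends avoids picking up final checkpoints or files for a different grid size.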

In [None]:
import torch
import numpy as np
from tqdm import tqdm
import psutil
import time
import os
import gc

# Confirmation prompt
confirm = input(f"Are you sure you want to run the main simulation ({N}^3 grid, {T} steps)? This should take approximately {(T * 0.1) / 3600:.2f} hours. Type 'yes' to proceed: ")
if confirm.lower() != 'yes':
    print("Simulation aborted.")
else:
    for param in param_sets:
        for boundary_type in boundary_conditions:
            print(f"Running simulation: {param['label']}, Boundary: {boundary_type}")
            energy_history = np.zeros(2, dtype=np.float32)
            kinetic_history = np.zeros(2, dtype=np.float32)
            gradient_history = np.zeros(2, dtype=np.float32)
            potential_history = np.zeros(2, dtype=np.float32)
            history_idx = 0
            start_time = time.time()
            last_disk_free = 1000.0  # Default in GB
            start_step = 0

            # Check for partial checkpoint
            partial_checkpoint = max([f for f in os.listdir(checkpoint_path) if f.startswith(f"checkpoint_{param['label']}_{boundary_type}_partial_step") and f.endswith(f"_N{N}.npz")], key=lambda x: int(x.split('_step')[1].split('_')[0]), default=None)
            if partial_checkpoint:
                partial_checkpoint = os.path.join(checkpoint_path, partial_checkpoint)
                with tqdm(total=3, desc="Loading partial checkpoint", leave=False) as pbar:
                    try:
                        checkpoint_data = np.load(partial_checkpoint)
                        phi_ST_tensor = torch.from_numpy(checkpoint_data['phi_ST']).to(primary_device, dtype=torch.float16, non_blocking=True)
                        pbar.update(1)
                        phi_dot_ST_tensor = torch.from_numpy(checkpoint_data['phi_dot_ST']).to(primary_device, dtype=torch.float16, non_blocking=True)
                        pbar.update(1)
                        energy_history = checkpoint_data['energy_history']
                        kinetic_history = checkpoint_data['kinetic_history']
                        gradient_history = checkpoint_data['gradient_history']
                        potential_history = checkpoint_data['potential_history']
                        start_step = int(checkpoint_data['last_step']) + 1
                        history_idx = 1
                        print(f"Resuming from partial checkpoint at step {start_step}")
                        pbar.update(1)
                    except Exception as e:
                        print(f"Error loading partial checkpoint: {e}. Starting from step 0.")
                        start_step = 0

            if start_step == 0:
                with tqdm(total=1, desc="Computing initial energy", leave=False) as pbar_energy:
                    total_energy, kinetic, gradient, pot_energy = compute_energy(phi_ST_tensor, phi_dot_ST_tensor, param['m'], param['g'], param['eta'], param['k'], chunk_size, dx, c)
                    pbar_energy.update(1)
                if np.isinf(total_energy) or np.isnan(total_energy):
                    print("Error: Initial energy is invalid (Inf or NaN). Check input tensors.")
                    break
                energy_history[history_idx] = total_energy
                kinetic_history[history_idx] = kinetic
                gradient_history[history_idx] = gradient
                potential_history[history_idx] = pot_energy
                history_idx += 1

            pbar = tqdm(range(start_step, T), desc=f"Simulation Progress ({param['label']}, {boundary_type})", unit="step")
            for t in pbar:
                try:
                    phi_ST_tensor, phi_dot_ST_tensor = update_phi_rk4_chunked(phi_ST_tensor, phi_dot_ST_tensor, dt, param['m'], param['g'], param['eta'], param['k'], damping_mask, chunk_size, devices)
                    if torch.any(torch.isnan(phi_ST_tensor)) or torch.any(torch.isinf(phi_ST_tensor)):
                        print(f"Error at step {t}: phi_ST_tensor contains NaN or Inf values")
                        break
                    if torch.any(torch.isnan(phi_dot_ST_tensor)) or torch.any(torch.isinf(phi_dot_ST_tensor)):
                        print(f"Error at step {t}: phi_dot_ST_tensor contains NaN or Inf values")
                        break
                except Exception as e:
                    print(f"Error at step {t}: {e}")
                    # Save partial checkpoint
                    try:
                        with tqdm(total=1, desc="Saving partial checkpoint", leave=False) as pbar_save:
                            np.savez_compressed(
                                f"{checkpoint_path}checkpoint_{param['label']}_{boundary_type}_partial_step{t}_N{N}.npz",
                                phi_ST=phi_ST_tensor.cpu().numpy(),
                                phi_dot_ST=phi_dot_ST_tensor.cpu().numpy(),
                                energy_history=energy_history,
                                kinetic_history=kinetic_history,
                                gradient_history=gradient_history,
                                potential_history=potential_history,
                                last_step=t
                            )
                            pbar_save.update(1)
                        print(f"Partial checkpoint saved at step {t}")
                    except Exception as save_e:
                        print(f"Error saving partial checkpoint at step {t}: {save_e}")
                    break

                if t % 500 == 0:  # Check resources less frequently
                    vram_used = sum(torch.cuda.memory_allocated(dev) / 1e9 for dev in devices)
                    vram_reserved = sum(torch.cuda.memory_reserved(dev) / 1e9 for dev in devices)
                    ram_used = psutil.virtual_memory().used / 1e9
                    try:
                        disk_free = psutil.disk_usage('/home/user').free / 1e9
                        last_disk_free = disk_free
                    except OSError as e:
                        print(f"Warning: Disk usage check failed at step {t}: {e}. Using last known value: {last_disk_free:.2f} GB")
                        disk_free = last_disk_free
                    pbar.set_postfix({'VRAM': f'{vram_used:.2f}GB', 'RAM': f'{ram_used:.2f}GB', 'Disk Free': f'{disk_free:.2f}GB'})
                    if vram_used > 200 or vram_reserved > 280 or ram_used > 800 or disk_free < 50:
                        print(f"Warning: Resource usage high at step {t}")
                        # Save partial checkpoint
                        try:
                            with tqdm(total=1, desc="Saving partial checkpoint", leave=False) as pbar_save:
                                np.savez_compressed(
                                    f"{checkpoint_path}checkpoint_{param['label']}_{boundary_type}_partial_step{t}_N{N}.npz",
                                    phi_ST=phi_ST_tensor.cpu().numpy(),
                                    phi_dot_ST=phi_dot_ST_tensor.cpu().numpy(),
                                    energy_history=energy_history,
                                    kinetic_history=kinetic_history,
                                    gradient_history=gradient_history,
                                    potential_history=potential_history,
                                    last_step=t
                                )
                                pbar_save.update(1)
                            print(f"Partial checkpoint saved at step {t}")
                        except Exception as save_e:
                            print(f"Error saving partial checkpoint at step {t}: {save_e}")
                        break

            pbar.close()

            try:
                if t == T - 1:
                    with tqdm(total=1, desc="Computing final energy", leave=False) as pbar_energy:
                        total_energy, kinetic, gradient, pot_energy = compute_energy(phi_ST_tensor, phi_dot_ST_tensor, param['m'], param['g'], param['eta'], param['k'], chunk_size, dx, c)
                        pbar_energy.update(1)
                    energy_history[history_idx] = total_energy
                    kinetic_history[history_idx] = kinetic
                    gradient_history[history_idx] = gradient
                    potential_history[history_idx] = pot_energy

                    with tqdm(total=1, desc="Saving final checkpoint", leave=False) as pbar_save:
                        np.savez_compressed(
                            f"{checkpoint_path}checkpoint_{param['label']}_{boundary_type}_{T}_N{N}.npz",
                            phi_ST=phi_ST_tensor.cpu().numpy(),
                            phi_dot_ST=phi_dot_ST_tensor.cpu().numpy(),
                            energy_history=energy_history,
                            kinetic_history=kinetic_history,
                            gradient_history=gradient_history,
                            potential_history=potential_history
                        )
                        pbar_save.update(1)
                    print(f"Checkpoint saved at step {T}")
                else:
                    print(f"Simulation stopped early at step {t}. No final checkpoint saved.")
            except Exception as e:
                print(f"Error saving final checkpoint: {e}")

            end_time = time.time()
            runtime = end_time - start_time
            print(f"Simulation completed in {runtime:.2f} seconds (~{runtime / 3600:.2f} hours)")

            torch.cuda.empty_cache()
            gc.collect()


## Compute Final Observables

Load checkpoint and compute power spectrum and correlation function.
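
The FFT-based correlation function rests on the Wiener-Khinchin relation: the inverse transform of the power spectrum is the autocorrelation. The sketch below illustrates this on a small synthetic field, a single cosine mode along x, where the expected result is known. It uses NumPy's FFT on the full grid, and the helper names and grid parameters are illustrative assumptions.

```python
import numpy as np


def autocorrelation_grid(field):
    """Periodic autocorrelation of the mean-subtracted field, indexed by lag in FFT order."""
    delta = field - field.mean()
    power = np.abs(np.fft.fftn(delta)) ** 2
    return np.fft.ifftn(power).real / delta.size


def radial_average(corr, dx, n_bins):
    """Average corr in spherical shells of separation r, out to half the box size."""
    n = corr.shape[0]
    # Signed lags in the same unshifted ordering as corr: 0, 1, ..., -2, -1
    lag = np.fft.fftfreq(n, d=1.0 / n) * dx
    lx, ly, lz = np.meshgrid(lag, lag, lag, indexing="ij")
    r = np.sqrt(lx**2 + ly**2 + lz**2).ravel()
    edges = np.linspace(0.0, n * dx / 2, n_bins + 1)
    which = np.digitize(r, edges) - 1
    valid = (which >= 0) & (which < n_bins)
    sums = np.bincount(which[valid], weights=corr.ravel()[valid], minlength=n_bins)
    counts = np.bincount(which[valid], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)


n, dx, wavelength_cells = 32, 2.0, 8
x = np.arange(n)
field = np.cos(2 * np.pi * x / wavelength_cells)[:, None, None] * np.ones((n, n, n))

corr = autocorrelation_grid(field)
centers, xi = radial_average(corr, dx, n_bins=8)
print(f"zero-lag value vs variance: {corr[0, 0, 0]:.4f} vs {field.var():.4f}")
print(f"value one wavelength along x: {corr[wavelength_cells, 0, 0]:.4f}")
```

Two details matter for correctness: the transform has to cover the whole periodic grid at once, and the separation grid has to use the same index ordering as the inverse FFT output.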

In [None]:
import torch
import numpy as np
from tqdm import tqdm
import gc

# Parameters to match checkpoint
N = 500
T = 50000
L = 1000.0
dx = L / N
chunk_size = 125
m = 4.16e-16
g = 0.01
eta = 0.001
k = 0.0
G = 6.674e-11
c = 3e8
label = "Baseline"
boundary_type = "periodic"

# Load checkpoint
checkpoint_file = f"{checkpoint_path}checkpoint_{label}_{boundary_type}_{T}_N{N}.npz"
with tqdm(total=1, desc="Loading checkpoint", leave=False) as pbar:
    try:
        checkpoint_data = np.load(checkpoint_file)
        pbar.update(1)
    except FileNotFoundError:
        print(f"Checkpoint file {checkpoint_file} not found.")
        raise

with tqdm(total=2, desc="Loading tensors", leave=False) as pbar:
    phi_ST = torch.from_numpy(checkpoint_data['phi_ST']).to(primary_device, dtype=torch.float16, non_blocking=True)
    pbar.update(1)
    phi_dot_ST = torch.from_numpy(checkpoint_data['phi_dot_ST']).to(primary_device, dtype=torch.float16, non_blocking=True)
    pbar.update(1)
energy_history = checkpoint_data['energy_history']
kinetic_history = checkpoint_data['kinetic_history']
gradient_history = checkpoint_data['gradient_history']
potential_history = checkpoint_data['potential_history']
print("Checkpoint loaded.")

# Validate tensors
if torch.any(torch.isinf(phi_ST)) or torch.any(torch.isnan(phi_ST)):
    print("Warning: phi_ST contains inf or nan values.")
if torch.any(torch.isinf(phi_dot_ST)) or torch.any(torch.isnan(phi_dot_ST)):
    print("Warning: phi_dot_ST contains inf or nan values.")

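# Compute observables
if 'results' not in globals():
    results = []  # lets this cell run when the main simulation cell was skipped
try:
    # Square in float32 so the float16 sum cannot overflow; with k = 0.0 this norm is zero by construction
    density_norm = torch.sum(phi_ST.float()**2).item() * k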
    if np.isinf(density_norm) or np.isnan(density_norm):
        print("Warning: Density norm is inf or nan.")

    with tqdm(total=1, desc="Computing power spectrum", leave=False) as pbar:
        k_bins, power_spectrum = compute_power_spectrum(phi_ST, k_range=[0.005, 0.1], chunk_size=chunk_size, dx=dx, N=N)
        pbar.update(1)
    with tqdm(total=1, desc="Computing correlation function", leave=False) as pbar:
        r, corr_func = compute_correlation_function(phi_ST, chunk_size=chunk_size, dx=dx, N=N)
        pbar.update(1)

    results.append({
        'params': {"m": m, "g": g, "eta": eta, "k": k, "label": label},
        'boundary': boundary_type,
        'density_norm': density_norm,
        'power_spectrum': (k_bins, power_spectrum),
        'correlation_function': (r, corr_func),
        'energy_history': energy_history,
        'runtime': 5400  # Placeholder: 1.5 hours
    })
    print("Final observables computed.")
except Exception as e:
    print(f"Error computing observables: {e}")

torch.cuda.empty_cache()
gc.collect()
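
For orientation, the two observables computed above are linked by the Wiener-Khinchin relation: the autocorrelation is the inverse Fourier transform of the power spectrum. The sketch below illustrates that relation with plain NumPy on a small periodic grid. It is not the notebook's `compute_power_spectrum` or `compute_correlation_function` (which are chunked and GPU-based); the helper names `shell_average` and `power_and_correlation_sketch`, the shell widths, and the normalization are all assumptions made for illustration.

```python
import numpy as np

def shell_average(values, radii, spacing, n_shells):
    """Average `values` in spherical shells centered on 0, spacing, 2*spacing, ..."""
    idx = np.rint(radii.ravel() / spacing).astype(int)
    keep = idx < n_shells
    sums = np.bincount(idx[keep], weights=values.ravel()[keep], minlength=n_shells)
    counts = np.bincount(idx[keep], minlength=n_shells)
    return sums / np.maximum(counts, 1)

def power_and_correlation_sketch(field, dx):
    """Shell-averaged P(k) and xi(r) of a cubic periodic field (hypothetical helper)."""
    n = field.shape[0]
    box = n * dx
    delta = field - field.mean()
    # Power per mode; dividing by the number of cells is an arbitrary normalization choice
    power_3d = np.abs(np.fft.fftn(delta)) ** 2 / delta.size
    # Wiener-Khinchin: inverse FFT of the power gives the circular autocorrelation
    corr_3d = np.fft.ifftn(power_3d).real

    k_fund = 2 * np.pi / box                      # fundamental wavenumber of the box
    k_1d = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky, kz = np.meshgrid(k_1d, k_1d, k_1d, indexing="ij")
    k_mag = np.sqrt(kx**2 + ky**2 + kz**2)

    r_1d = np.fft.fftfreq(n) * n * dx             # periodic (minimum-image) separations
    rx, ry, rz = np.meshgrid(r_1d, r_1d, r_1d, indexing="ij")
    r_mag = np.sqrt(rx**2 + ry**2 + rz**2)

    n_shells = n // 2
    k_centers = np.arange(n_shells) * k_fund
    r_centers = np.arange(n_shells) * dx
    pk = shell_average(power_3d, k_mag, k_fund, n_shells)
    xi = shell_average(corr_3d, r_mag, dx, n_shells)
    return k_centers, pk, r_centers, xi

# Usage on a synthetic plane wave with three periods across the box
n, dx = 16, 2.0
x = np.arange(n) * dx
wave = np.cos(2 * np.pi * 3 * x / (n * dx))[:, None, None] * np.ones((n, n, n))
k_centers, pk, r_centers, xi = power_and_correlation_sketch(wave, dx)
print("P(k) peaks at k =", k_centers[np.argmax(pk)], "Mpc^-1")
print("xi(0) =", xi[0])
```

The plane wave puts all its power in the shell at three times the fundamental wavenumber, and `xi[0]` equals the field variance. The full 3D arrays held in memory here make this approach suitable only for small test grids.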


## Validation Against Public Datasets

Validate clustering scales against DESI BAO data.

In [None]:
import numpy as np
from tqdm import tqdm

# Validation
for result in tqdm(results, desc="Validating results", unit="result"):
    print(f"\nValidation for {result['params']['label']}, Boundary: {result['boundary']}")
    print(f"Density Norm (S/T): {result['density_norm']}")
    r_peak = result['correlation_function'][0][np.argmax(result['correlation_function'][1])]
    print(f"Clustering Scale (Correlation): {r_peak:.2f} Mpc (DESI BAO: 147.09 ± 0.26 Mpc, EFM Expected: ~147 Mpc, ~628 Mpc)")
    k_peak = result['power_spectrum'][0][np.argmax(result['power_spectrum'][1])]
    lambda_peak = 2 * np.pi / k_peak if k_peak != 0 else float('inf')
    print(f"Clustering Scale (Power Spectrum): {lambda_peak:.2f} Mpc (DESI BAO: 147.09 ± 0.26 Mpc, EFM Expected: ~147 Mpc, ~628 Mpc)")

# Save results
with tqdm(total=1, desc="Saving results", leave=False) as pbar:
    try:
        # results is a list of dicts, so NumPy pickles it; reload with np.load(..., allow_pickle=True)
        np.save(f"{data_path}simulation_results_N{N}.npy", results)
        pbar.update(1)
        print(f"Results saved to {data_path}simulation_results_N{N}.npy")
    except Exception as e:
        print(f"Error saving results: {e}")
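
A global `np.argmax` over a correlation function typically returns the smallest separation, because an autocorrelation is largest at zero lag. One way to isolate a BAO-like feature is to search inside a separation window and express the offset from the DESI value in units of its quoted uncertainty. The sketch below is an assumption-laden illustration: `find_peak_in_window` and `deviation_in_sigma` are hypothetical helpers, the 100-200 Mpc window is an arbitrary choice, and the input is synthetic data rather than simulation output.

```python
import numpy as np

DESI_BAO_MPC = 147.09       # reference value quoted in this notebook
DESI_BAO_ERR_MPC = 0.26     # quoted 1-sigma uncertainty

def find_peak_in_window(r, corr, r_min, r_max):
    """Return (r_peak, corr_peak) for the largest value with r_min <= r <= r_max."""
    r = np.asarray(r, dtype=float)
    corr = np.asarray(corr, dtype=float)
    mask = (r >= r_min) & (r <= r_max)
    if not mask.any():
        raise ValueError("no samples inside the search window")
    i = int(np.argmax(np.where(mask, corr, -np.inf)))
    return r[i], corr[i]

def deviation_in_sigma(r_peak, reference=DESI_BAO_MPC, sigma=DESI_BAO_ERR_MPC):
    """Signed offset from the reference scale in units of its uncertainty."""
    return (r_peak - reference) / sigma

# Synthetic example: a decaying correlation with a small bump placed at 148 Mpc
r = np.linspace(0.0, 500.0, 251)
corr = np.exp(-r / 20.0) + 0.05 * np.exp(-0.5 * ((r - 148.0) / 10.0) ** 2)

r_global = r[np.argmax(corr)]                    # dominated by the zero-lag value
r_peak, _ = find_peak_in_window(r, corr, r_min=100.0, r_max=200.0)
print(f"global argmax: {r_global:.1f} Mpc, windowed peak: {r_peak:.1f} Mpc")
print(f"offset from DESI: {deviation_in_sigma(r_peak):+.1f} sigma")
```

Note that a grid spacing of 2 Mpc (1000 Mpc / 500 cells) is much coarser than the 0.26 Mpc DESI uncertainty, so a peak read directly off the grid cannot resolve agreement at that level without interpolation or a fit.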


## Post-Processing: Generate Plots

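Visualize the field distribution, energy history, power spectrum, and correlation function. The plots use `phi_ST` and the energy histories already in memory from the Compute Final Observables cell; the checkpoint file is only checked for existence.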

In [None]:
import os
import matplotlib.pyplot as plt
import torch
import numpy as np
from tqdm import tqdm

# Parameters
N = 500
T = 50000
L = 1000.0
dx = L / N
chunk_size = 125
label = "Baseline"
boundary_type = "periodic"

# Plot
final_checkpoint = f"{checkpoint_path}checkpoint_{label}_{boundary_type}_{T}_N{N}.npz"
if os.path.exists(final_checkpoint):
    try:
        for plot_type in tqdm(["field", "energy", "power_spectrum", "correlation"], desc="Generating plots", unit="plot"):
            if plot_type == "field":
                with tqdm(total=1, desc="Plotting field", leave=False) as pbar:
                    plt.figure(figsize=(10, 8))
                    plt.imshow(phi_ST[N//2, :, :].cpu().numpy(), extent=[-L/2, L/2, -L/2, L/2], cmap='viridis')
                    plt.colorbar(label='phi_ST')
                    plt.title(f'S/T Field (z=0) at Step {T}')
                    plt.xlabel('x (Mpc)')
                    plt.ylabel('y (Mpc)')
                    plt.savefig(f"{data_path}field_ST_{label}_{boundary_type}_N{N}_final.png")
                    plt.close()
                    pbar.update(1)

            elif plot_type == "energy":
                with tqdm(total=1, desc="Plotting energy", leave=False) as pbar:
                    plt.figure(figsize=(10, 5))
                    plt.plot(energy_history, label='Total Energy')
                    plt.plot(kinetic_history, label='Kinetic', linestyle='--')
                    plt.plot(gradient_history, label='Gradient', linestyle='-.')
                    plt.plot(potential_history, label='Potential', linestyle=':')
                    plt.xlabel('Step (0 and End)')
                    plt.ylabel('Energy')
                    plt.title(f'Energy Evolution ({label}, {boundary_type})')
                    plt.legend()
                    plt.grid()
                    plt.savefig(f"{data_path}energy_{label}_{boundary_type}_N{N}_final.png")
                    plt.close()
                    pbar.update(1)

            elif plot_type == "power_spectrum":
                with tqdm(total=1, desc="Plotting power spectrum", leave=False) as pbar:
                    k_bins, power_spectrum = compute_power_spectrum(phi_ST, chunk_size=chunk_size, dx=dx, N=N)
                    plt.figure(figsize=(10, 5))
                    plt.loglog(k_bins, power_spectrum, label='Power Spectrum')
                    plt.axvline(x=2 * np.pi / 147, color='r', linestyle='--', label='147 Mpc')
                    plt.axvline(x=2 * np.pi / 628, color='g', linestyle='--', label='628 Mpc')
                    plt.xlabel('k (Mpc^-1)')
                    plt.ylabel('P(k)')
                    plt.title(f'Power Spectrum ({label}, {boundary_type})')
                    plt.legend()
                    plt.grid()
                    plt.savefig(f"{data_path}power_spectrum_{label}_{boundary_type}_N{N}_final.png")
                    plt.close()
                    pbar.update(1)

            elif plot_type == "correlation":
                with tqdm(total=1, desc="Plotting correlation", leave=False) as pbar:
                    r, corr_func = compute_correlation_function(phi_ST, chunk_size=chunk_size, dx=dx, N=N)
                    plt.figure(figsize=(10, 5))
                    plt.plot(r, corr_func, label='Correlation Function')
                    plt.axvline(x=147, color='r', linestyle='--', label='147 Mpc')
                    plt.axvline(x=628, color='g', linestyle='--', label='628 Mpc')
                    plt.xlabel('r (Mpc)')
                    plt.ylabel('Correlation')
                    plt.title(f'Correlation Function ({label}, {boundary_type})')
                    plt.legend()
                    plt.grid()
                    plt.savefig(f"{data_path}correlation_{label}_{boundary_type}_N{N}_final.png")
                    plt.close()
                    pbar.update(1)
    except Exception as e:
        print(f"Error in post-processing: {e}")
else:
    print(f"Checkpoint file {final_checkpoint} not found.")


## Parameter Justifications

- m = 4.16e-16 s^-1: Sets ~147 Mpc scale
- g = 0.01, eta = 0.001: Ensures soliton stability
- k = 0.0: Removes destabilizing term
- c = 3e8 m/s: Speed of light in SI units
- dt_cfl_factor = 0.000007: Ensures stability
- Initial Conditions: Gaussian noise
- Boundary Condition: Periodic
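
The link between m and the ~147 Mpc scale can be checked with a few lines of arithmetic. The sketch below assumes the scale is the wavelength 2*pi*c/m, which is not stated explicitly in this notebook, and uses an approximate metres-per-megaparsec conversion; `scale_from_mass_mpc` is a hypothetical helper name.

```python
import math

M_PER_MPC = 3.0857e22   # metres per megaparsec (approximate)

def scale_from_mass_mpc(m, c=3e8):
    """Wavelength 2*pi*c/m in Mpc, for m in s^-1 and c in m/s (assumed mapping)."""
    return 2 * math.pi * c / m / M_PER_MPC

scale_mpc = scale_from_mass_mpc(4.16e-16)
print(f"m = 4.16e-16 s^-1 -> {scale_mpc:.1f} Mpc")
```

Under that assumption the value lands within 1 Mpc of 147 Mpc, which is consistent with the stated role of m.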

## Next Steps

- Run test simulation to verify stability
- Perform full simulation
- Validate against DESI, SDSS datasets
- Draft LaTeX paper

## Test Mode

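Run a small-scale simulation (100^3 grid, at most 10 steps) to debug the pipeline before the full run. This cell reuses `c`, `T`, `param_sets`, `devices`, `primary_device`, and `update_phi_rk4_chunked` from earlier cells.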

In [None]:
import torch
import numpy as np
from tqdm import tqdm
import psutil

# Test mode
N_test = 100
L_test = 1000.0
dx_test = L_test / N_test
dt_cfl_factor = 0.000007
dt_test = dt_cfl_factor * dx_test / c
T_test = min(T, 10)
chunk_size_test = 25

with tqdm(total=3, desc="Preparing test tensors", leave=False) as pbar:
    phi_ST_test = torch.from_numpy(np.random.normal(0, 1, (N_test, N_test, N_test)).astype(np.float32) * 0.01).to(primary_device, dtype=torch.float16, non_blocking=True)
    pbar.update(1)
    phi_dot_ST_test = torch.zeros((N_test, N_test, N_test), device=primary_device, dtype=torch.float16)
    pbar.update(1)
    damping_mask_test = torch.ones((N_test, N_test, N_test), device=primary_device, dtype=torch.float16)
    pbar.update(1)

pbar = tqdm(range(T_test), desc="Test Simulation Progress", unit="step")
for t in pbar:
    try:
        param = param_sets[0]
        phi_ST_test, phi_dot_ST_test = update_phi_rk4_chunked(phi_ST_test, phi_dot_ST_test, dt_test, param['m'], param['g'], param['eta'], param['k'], damping_mask_test, chunk_size_test, devices)
        vram_used = sum(torch.cuda.memory_allocated(dev) / 1e9 for dev in devices)
        ram_used = psutil.virtual_memory().used / 1e9
        try:
            disk_free = psutil.disk_usage('/home/user').free / 1e9
        except OSError as e:
            print(f"Warning: Disk usage check failed at test step {t}: {e}. Assuming 1000 GB free.")
            disk_free = 1000.0
        pbar.set_postfix({'VRAM': f'{vram_used:.2f}GB', 'RAM': f'{ram_used:.2f}GB', 'Disk Free': f'{disk_free:.2f}GB'})
    except Exception as e:
        print(f"Test simulation failed at step {t}: {e}")
        break
pbar.close()
print("Test simulation completed.")
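
The time step used above follows `dt = dt_cfl_factor * dx / c`. The sketch below evaluates it for the test grid and the full grid. It assumes `dx` is in Mpc (as the plot axis labels suggest) while `c` is in m/s, so the value as computed carries mixed units; the second column shows the same expression with `dx` converted to metres. Whether the notebook intends a rescaled unit system is not stated here, and `cfl_time_step` is a hypothetical helper name.

```python
import math

M_PER_MPC = 3.0857e22   # metres per megaparsec (approximate)

def cfl_time_step(dx, c=3e8, cfl_factor=0.000007):
    """Time step from the CFL-style rule dt = cfl_factor * dx / c."""
    return cfl_factor * dx / c

dx_full = 1000.0 / 500   # grid spacing of the 500^3 run
dx_test = 1000.0 / 100   # grid spacing of the 100^3 test run

for name, dx in [("full", dx_full), ("test", dx_test)]:
    dt_mixed = cfl_time_step(dx)             # dx taken as-is, as in the cell above
    dt_si = cfl_time_step(dx * M_PER_MPC)    # dx converted to metres, giving seconds
    print(f"{name}: dt = {dt_mixed:.3e} (mixed units), {dt_si:.3e} s (SI)")
```

Because the rule is linear in `dx`, the test grid takes a time step five times larger than the full grid, so ten test steps cover a different span of simulated time than ten full-resolution steps.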
