Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration by jameslehoux · Pull Request #188 · BASE-Laboratory/OpenImpala

jameslehoux · 2026-03-22T21:05:41Z

Summary

This is a comprehensive modernization of OpenImpala's solver infrastructure, spanning architecture, performance, and extensibility improvements across 80 files.

Architecture & Refactoring

Decompose monolithic Diffusion.cpp into reusable modules (Refactor Deconstruct the Monolithic Diffusion Driver #51)
Extract HypreStructSolver base class for shared HYPRE infrastructure
Extract TortuositySolverBase shared base class for tortuosity solvers
Centralize duplicated solver/direction helpers into SolverConfig.H
Consolidate duplicated SolverType enums into a shared definition (Refactor Consolidate Duplicated SolverType Enums #50)
Add boundary condition abstraction layer for configurable transport BCs

New Solvers

Implement HYPRE PCG solver for TortuosityHypre
Implement TortuosityMLMG: matrix-free AMReX MLMG solver (Epic: Intelligent Solver Routing and Matrix-Free AMReX MLMG Integration #171)

Fortran → C++ Migration

Migrate Fortran compute kernels to native C++ AMReX ParallelFor lambdas (Migrate Fortran Compute Kernels to native C++ AMReX Lambdas #52)
Extract shared GPU-compatible FloodFill utility, deduplicating 4 copies

GPU Acceleration

Enable GPU acceleration for HYPRE linear solvers (Enable and Optimize GPU Acceleration for Solve #25)
GPU-accelerate remaining CPU-only loops across all solvers and utilities
Add CUDA GPU wheel build workflow for PyPI distribution

New Features

Add microstructural parameterization engine with Cauchy-Crofton SSA correction (Feature Request: Expand OpenImpala into a comprehensive Microstructural Parameterization Engine for DFN models #170)
Add interactive profiling and tuning Colab notebook

Bug Fixes & Stability

Fix EffDiff matrix fill: restore Neumann BC diagonal and RHS signs from Fortran
Fix EffDiff solver NaN divergence and PFMG convergence
Fix SSA face counting at box boundaries and ghost cell assertions
Fix FloodFill DeviceScalar build error
Fix implicit widening conversion in ConnectedComponents.cpp
Fix const-correctness, missing includes, and AMReX compatibility across solvers
Multiple rounds of code review fixes (28+ bugs addressed)

Test Coverage

Add test coverage for PCG solver, Diffusion.cpp, and EffDiffusivityHypre (Epic: Improve Code Coverage (Current: 27.31%) #109)

Test plan

CI build passes (CMake + CTest in dependency container)
All existing regression benchmarks pass (uniform block, series layers, parallel layers)
New PCG solver tests pass
EffDiff and tortuosity tests produce correct analytical results
clang-format and clang-tidy checks pass
GPU build compiles with CUDA-enabled AMReX

The solve() method previously only implemented FlexGMRES — all other SolverType enum values (including PCG) hit amrex::Abort(). This adds the PCG path with SMG preconditioner, matching the FlexGMRES pattern. PCG is the optimal Krylov method for symmetric positive-definite systems. The single-phase diffusion Laplacian with harmonic-mean face coefficients is SPD, so PCG converges in fewer iterations than FlexGMRES. This is now the default solver when using the Python API with solver="auto". https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

Three categories of new tests targeting the largest coverage gaps: 1. PCG Solver Regression Benchmarks - tRegressionBenchmark_uniform_PCG: validates PCG on uniform 32^3 domain against tau = (N-1)/N analytical solution - tRegressionBenchmark_series_PCG: validates PCG on heterogeneous series-layer geometry against harmonic-mean analytical bound These exercise the new PCG code path in TortuosityHypre::solve(). 2. Diffusion.cpp Integration Tests (0% -> estimated ~60%) - tDiffusion_dryrun: exercises dry-run mode (file I/O, volume fraction, percolation checks in all 3 directions, no solver) - tDiffusion_tiff: full homogenization pipeline on TIFF data - tDiffusion_hdf5: exercises HDF5 reader branch in Diffusion.cpp These cover the main application entry point (642 uncovered lines). 3. Synthetic EffectiveDiffusivityHypre Benchmark - tSyntheticEffectiveDiffusivity: creates uniform 16^3 domain, solves cell problem in 3 directions, validates chi_k ≈ 0 (analytically correct for uniform medium with periodic BCs), checks active mask completeness and solver convergence. Also corrects issue deprecation note: Tortuosity_filcc.F90 is NOT dead code — tortuosity_remspot is still called by TortuosityHypre. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

Break the 1080-line Diffusion.cpp god-function into focused, reusable components: 1. ImageLoader (src/io/ImageLoader.H/cpp) - Unified image loading: detects format from extension (.tif/.tiff, .h5/.hdf5, .dat), instantiates the correct reader, thresholds, and returns an ImageData struct with geometry, BoxArray, DistributionMapping, and phase field. - Eliminates the duplicated read-threshold-copy boilerplate that was repeated 3 times in Diffusion.cpp (once per file format). 2. DeffTensor (src/props/DeffTensor.H/cpp) - Extracts the D_eff tensor assembly from corrector fields chi_k. - Pure function: calculateDeffTensor(tensor, chi_x, chi_y, chi_z, mask, geom, verbose). - Was previously an anonymous-namespace function duplicated in both Diffusion.cpp and tEffectiveDiffusivity.cpp. 3. REVStudy (src/props/REVStudy.H/cpp) - Encapsulates the entire REV convergence study workflow: sub-volume extraction, cell-problem solves, tensor assembly, and CSV output. - Configuration via REVConfig struct. - Was ~250 lines of deeply nested code in Diffusion.cpp. 4. Diffusion.cpp rewritten as thin orchestrator (1080 → 657 lines) - main() now: parse config → loadImage() → dispatch to runDryRunChecks / runREVStudy / runHomogenization / runFlowThrough. - Each calculation mode is a focused static function. - No physics logic changes — identical output for all modes. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

…51) Extract stringToSolverType() and stringToDirection() helpers duplicated across 5 files into a single shared header (SolverConfig.H). Removes ~144 lines of duplicated code. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

Replace all four Fortran kernel files with pure C++ implementations: - TortuosityHypreFill.F90 → TortuosityHypreFill.H (tortuosityFillMatrix) - EffDiffFillMtx.F90 → EffDiffFillMtx.H (effDiffFillMatrix) - Tortuosity_filcc.F90 → TortuosityKernels.H (removeIsolatedCells) - Tortuosity_poisson_3d.F90 → TortuosityKernels.H (computeFaceFluxes, forwardEulerUpdate, integrateBoundaryFluxes) The new kernels use amrex::ParallelFor with C++ lambdas and Array4 accessors for the flux/update/fio routines, preparing the codebase for GPU offloading. The HYPRE matrix-fill routines remain sequential (they fill flat buffers for HYPRE) but are now native C++. Legacy Fortran-only routines (tortuosity_filct, tortuosity_filbc, tortuosity_filic) that were never called from C++ have been dropped. Build system updated: Fortran removed from CMake LANGUAGES, MPI and OpenMP Fortran components removed, GNUmakefile and CI workflows updated. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

- TortuosityHypreFill.H: add nval consistency check (AMREX_ALWAYS_ASSERT), remove unused len_x/len_y variables that would trigger -Wunused-variable - EffDiffFillMtx.H: include TortuosityHypreFill.H for shared constants (STN_C, CELL_ACTIVE, SMALL_REAL etc.) that were undefined — compilation would have failed. Add AMREX_SPACEDIM==3 guards on Z-face stencil entries matching the original Fortran. Add post-loop m_idx==npts assertion. - TortuosityHypre.cpp: remove unused phase_iab/p_ptr/pbox variables left over from the old Fortran call signature — would fail with -Werror. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

Move the identical SolverType enum from TortuosityHypre and EffectiveDiffusivityHypre into Tortuosity.H as OpenImpala::SolverType. Both solver classes now use a `using SolverType = OpenImpala::SolverType` alias for backward compatibility. The duplicate tortuositySolverType() and effDiffSolverType() functions in SolverConfig.H are consolidated into a single parseSolverType(), with the old names retained as deprecated thin wrappers. Python bindings bind the enum once and alias EffDiffSolverType to SolverType. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

…15) Deduplicate ~165 lines of HYPRE lifecycle code (grid, stencil, matrix/vector creation, solver dispatch, cleanup) shared between TortuosityHypre and EffectiveDiffusivityHypre into a common HypreStructSolver base class. - HypreStructSolver owns HYPRE handles and provides setupGrid(), setupStencil(), createMatrixAndVectors(), assembleSystem(), and runSolver() methods - runSolver() supports FlexGMRES and PCG with configurable SMG/PFMG preconditioner via PrecondType enum, giving EffectiveDiffusivityHypre PCG support it lacked - TortuosityHypre inherits from both Tortuosity and HypreStructSolver - EffectiveDiffusivityHypre inherits from HypreStructSolver - Safe MPI_Finalized guard in base destructor for Python binding shutdown - CMakeLists.txt updated to include HypreStructSolver.cpp https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

…Cs (#16) Introduces a BoundaryCondition strategy pattern that decouples BC logic from the tortuosity fill kernel. Users can now configure inlet/outlet and lateral BCs at runtime via input file parameters (bc.inlet_outlet, bc.sides, bc.value_lo, bc.value_hi). New files: - BoundaryCondition.H: BCType enum, parseBCType(), abstract base class, and four concrete implementations (DirichletExternal, DirichletPhase- Boundary, Neumann, Periodic) with factory function createBC(). Key changes: - TortuosityHypreFill.H: Interior stencil separated from BC application; new BC-aware overload takes BoundaryCondition pointers. - TortuosityHypre: Parses bc.* ParmParse params, creates BC objects, passes them through setupMatrixEquation. Adds failed plotfile output when solver does not converge. - Python bindings: BCType enum exposed; solver BC type getters added. - Tests: Two new CTest cases (explicit DirichletExternal, Periodic sides). https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

Implements six new capabilities for extracting DFN battery model parameters directly from 3D voxel images: Phase 1 - Foundation: - SpecificSurfaceArea: counts interface faces between phases (SSA = faces/volume) - MacroGeometry: extracts electrode thickness, cross-section from AMReX Geometry - Multi-phase volume fractions in ResultsJSON output Phase 2 - Profiling: - ThroughThicknessProfile: per-slice VF profile along any direction Phase 3 - Particle Size Distribution: - ConnectedComponents: parallel flood-fill CCL reusing PercolationCheck pattern - ParticleSizeDistribution: equivalent sphere radii from component volumes Integration: - ResultsJSON extended with microstructure block and BPX DFN parameters - Diffusion.cpp wired with microstructure.* ParmParse namespace - Synthetic test (tSyntheticMicrostructure) with 7 analytical validation cases - Catch2 unit tests for MacroGeometry and ResultsJSON microstructure setters https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

…bias The raw voxel face-counting SSA overestimates true surface area by ~50% for isotropic surfaces due to the staircase/laddering effect on curved interfaces. This commit adds: - value_corrected() method applying the 2/3 Cauchy-Crofton factor - boundary_padding constructor parameter to exclude outermost N voxel layers, eliminating artificial interfaces from truncated features - Both raw and corrected SSA emitted in ResultsJSON - Diffusion.cpp reads microstructure.ssa_boundary_padding from inputs - Tests: corrected vs raw ratio, boundary padding reduction, diagonal plane demonstrating laddering bias https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

Add full GPU support infrastructure for accelerating the computationally intensive linear solve phase via CUDA or HIP backends: Build system: - CMakeLists.txt: GPU_BACKEND option (NONE/CUDA/HIP), auto-enables CUDA/HIP language and defines OPENIMPALA_USE_GPU/OPENIMPALA_USE_CUDA - FindHYPRE.cmake: discovers GPU library variants (libHYPRE_cuda/hip) - Container def: conditional GPU flags for AMReX and HYPRE builds HYPRE device memory: - HypreStructSolver.cpp: HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE) and HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE) when GPU enabled GPU matrix fill kernels: - TortuosityHypreFill.H: tortuosityFillMatrixGpu() using ParallelFor with AMREX_GPU_DEVICE lambdas and Array4<> accessors - EffDiffFillMtx.H: effDiffFillMatrixGpu() same pattern GPU data path in solvers: - TortuosityHypre.cpp: #ifdef GPU path with DeviceVector buffers, GPU kernel dispatch, device-to-host copy for BC application - EffectiveDiffusivityHypre.cpp: same pattern CPU path is preserved unchanged behind #else for non-GPU builds. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu

…als-BFzd5' into claude/review-branch-changes-8fGVe

…MP, GPU, BCs - HypreStructSolver: restore GMRES, BiCGSTAB, SMG, PFMG, Jacobi solver support that was dropped during base class extraction (regression) - REVStudy: use rank-independent RNG seed so all MPI ranks agree on sub-volume coordinates (was producing corrupt data in multi-rank runs) - DeffTensor: replace non-portable OMP reduction on 2D C array with thread-local accumulator + critical section (GCC < 9 compatibility) - HypreStructSolver: remove HYPRE_MEMORY_DEVICE for matrix/vectors since SetBoxValues is called with host pointers after device-to-host copy - BoundaryCondition: fix DirichletPhaseBoundaryBC quarter-domain heuristic (was pinning 25% of interior cells); narrow to 1-cell boundary layer. Also handle ambiguous dual-flag case for very small domains - BoundaryCondition: remove redundant per-cell initial guess computation in both Dirichlet BC classes (interior kernel already sets this) - TortuosityHypre: replace iMultiFab::sum() with manual valid-cell-only loop to avoid potential ghost cell inclusion in volume fraction - ImageLoader: add RawReader include and informative error for .raw files (requires dimensions/type, so directs users to RawReader directly) https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW

Initial header defining the TortuosityMLMG class interface. This solver uses AMReX's native MLABecLaplacian operator and MLMG geometric multigrid instead of HYPRE, eliminating matrix assembly overhead for small grids (e.g., Google Colab tutorials). Implementation pending architectural decision on shared base class extraction. https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW

Extracts solver-independent logic (phase management, diffusion coefficient construction, activity mask generation, flux integration) into a common base class. Subclasses only need to implement solve(). https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW

…hase 2) Adds TortuosityMLMG.cpp implementing the full solver pipeline using AMReX's native MLABecLaplacian + MLMG geometric multigrid instead of HYPRE. No explicit matrix assembly - uses matrix-free operator application for lower memory footprint and faster setup on small grids. - Constructor: phase preconditioning, diffusion coefficient field, activity mask via parallel flood fill (same algorithm as HYPRE solver) - solve(): configures MLABecLaplacian with Dirichlet/Neumann BCs, harmonic-mean face B-coefficients, and runs MLMG V-cycle solver - globalFluxes() + computePlaneFluxes(): identical flux integration and conservation checking as TortuosityHypre - value(): tortuosity calculation with full flux conservation validation Also adds: - CMake build integration (TortuosityMLMG.cpp in openimpala_props) - pybind11 bindings (TortuosityMLMG class with same API as HYPRE) - Python facade: solver="mlmg" routes to the new solver backend https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW

- Replace LoopOnCpu with ParallelFor+AMREX_GPU_DEVICE for GPU acceleration (diffusion coefficients, initial guess, B-coefficients, mask intersection) - Use ReduceOps for GPU-safe reductions (active cell count, boundary fluxes) - Fix convergence: wrap MLMG solve in try/catch instead of arbitrary *100 check - Fix iteration count: use mlmg.getNumIters() instead of hardcoded maxiter - Fix plotfile: convert iMultiFab->MultiFab via ParallelFor (not amrex::Copy) - Clean up B-coefficient loop: remove redundant assignments and confused comments - Add Gpu::streamSynchronize before CPU-side plane flux accumulation https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu

The identical flood fill algorithm was duplicated across TortuosityMLMG, TortuosityHypre, PercolationCheck, and ConnectedComponents (~160 lines each). Extract into FloodFill.H/cpp with: - ParallelFor + AMREX_GPU_DEVICE for GPU acceleration - Gpu::DeviceScalar atomic flag for early termination - Parameterized label value (FLOOD_ACTIVE=1 default, custom for CCL) - Shared collectBoundarySeeds with MPI Allgatherv + dedup Net: -652 lines duplicated code, +210 lines shared utility. https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu

Convert LoopOnCpu patterns to GPU-compatible equivalents throughout the codebase: - DeffTensor: 9-component ReduceOps for D_eff tensor gradient accumulation - TortuosityHypre: ReduceOps for boundary flux integration, ParallelFor with atomic scatter-add for plane flux profiles, ReduceOps for active cell counting, device lookup table for diff coeff map fill, ParallelFor for binary traversable mask - TortuosityMLMG: ParallelFor with atomic scatter-add for plane flux profiles - EffectiveDiffusivityHypre: ReduceOps for ManualSumActiveCells, device lookup table for diff coeff map fill, ParallelFor for plotfile mask copy - Diffusion/REVStudy: ParallelFor for active mask generation - VolumeFraction: ReduceOps for phase cell counting - SpecificSurfaceArea: ReduceOps for interface face counting - ConnectedComponents: ParallelFor with atomic scatter-add for component volume counting - ThroughThicknessProfile: ParallelFor with atomic scatter-add for per-slice phase counting Remaining LoopOnCpu instances are justified: HYPRE host buffer copy-back, sequential seed search, boundary seed collection, and I/O from host memory. https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu

Key fixes: - HYPRE_Init/Finalize ordering (must be inside amrex::Initialize/Finalize) - Remove OMP parallel around HYPRE Set/GetBoxValues (data race) - Replace MPI_COMM_WORLD with amrex::ParallelDescriptor::Communicator() - Fix DeffTensor sign convention (1+dchi, not 1-dchi) - Fix REVStudy ghost cell fill and MPI-incorrect nested MFIter copy - Fix EffDiffFillMtx RHS at pore-solid interfaces (mask-guarded reads) - Fix dangling const& members in solver/utility classes (store by value) - Add GPU stream synchronization before host-side loops - Add integer overflow assertions for HYPRE int casts - Complete TortuosityMLMG refactor to inherit from TortuositySolverBase - Extract shared HypreCheck.H macro, add TortuositySolverBase.cpp - Add input validation, error handling, and HYPRE_ClearAllErrors https://claude.ai/code/session_01QKvcKHR7GgCG2UNapN1GP7

Critical fixes: - DeffTensor.cpp: Fix sign error in tensor assembly (minus -> plus) to match the documented formula D_eff[i][j] = (1/V) * sum{ delta_ij + d(chi_j)/d(x_i) } - EffDiffFillMtx.H: Remove spurious diagonal inflation at internal Neumann BCs (inactive neighbor faces should not contribute to diagonal), fix flux_bc_rhs sign to match correct Neumann condition D*(grad(chi)+e_k).n=0, and use one-sided differences for RHS at active/inactive boundaries instead of central differences that read D=0 from inactive cells - TortuosityMLMG.cpp: Fix bdryLo() returning box outside domain (inlet flux was always 0, producing NaN/inf tortuosity). Use domain.smallEnd() slice instead High-severity fixes: - TortuosityMLMG.cpp: Fix convergence threshold from 100*eps to eps - TortuosityMLMG.cpp: Use mlmg.getNumIters() instead of hardcoded maxiter - tSyntheticEffectiveDiffusivity.cpp: Add missing solver->solve() call before checking getSolverConverged() (m_converged defaults to false) Lower-severity fixes: - TortuosityHypre.cpp: Restore m_write_plotfile guard on failed-solve plotfile - SpecificSurfaceArea: Clamp face-counting box to valid region to prevent out-of-bounds ghost cell access at internal FAB boundaries https://claude.ai/code/session_01NEQqrvJNCrPn4mpyjcMsHv

Merges 9 bug fixes from the CeIgl review branch, with conflict resolution against the PMNDO 28-bug-fix commit already merged. Key unique fixes incorporated from CeIgl: - EffDiffFillMtx.H: Remove spurious diagonal inflation at internal Neumann boundaries (was inflating matrix diagonal incorrectly) - EffDiffFillMtx.H: Fix flux_bc_rhs signs for Neumann BC correction - EffDiffFillMtx.H: Use one-sided differences at pore-solid interfaces instead of mirroring (more accurate for non-uniform D fields) - tSyntheticEffectiveDiffusivity.cpp: Add missing solve() call before checking convergence - TortuositySolverBase.cpp: Fix bdryLo() returning face-centered box outside cell domain (was causing zero inlet flux / NaN tortuosity) https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Consolidate feature branches: GPU acceleration, MLMG solver, microstructural parameterization, and bug fixes

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Fix clang-format violations across 23 source files

AMReX was built with AMReX_FORTRAN=ON, so the project must enable Fortran before find_package(AMReX). Added Fortran to the LANGUAGES list in the project() declaration. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Add Fortran to project languages to fix AMReX CMake error

AMReX 25.03 DeviceScalar lacks setVal(). Use ifdef guards to handle GPU and CPU paths separately — plain int on CPU, reconstructed DeviceScalar on GPU. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Fix FloodFill build error: DeviceScalar has no setVal member

Box::grow() mutates in place and is non-const. Copy the domain box before calling grow() to avoid discarding const qualifiers. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Fix const-correctness error in SpecificSurfaceArea

AMReX does not provide ReduceLongLongSum. Use ReduceLongSum with long types instead. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Fix ReduceLongLongSum not found in AMReX ParallelDescriptor

ResultsJSON.H uses AMREX_SPACEDIM and AMREX_D_DECL but did not include AMReX_SPACE.H, causing build failures when included without other AMReX headers pulling it in transitively. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Add missing AMReX_SPACE.H include in ResultsJSON.H

The unit test compiles without AMReX (OPENIMPALA_UNIT_TEST defined). Guard the AMReX_SPACE.H include and provide fallback definitions for AMREX_SPACEDIM and AMREX_D_DECL, matching the pattern in PhysicsConfig.H. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Guard AMReX includes in ResultsJSON.H for unit test mode

The unit test compiles without AMReX (OPENIMPALA_UNIT_TEST defined). Guard the AMReX_SPACE.H include and provide fallback definitions for AMREX_SPACEDIM and AMREX_D_DECL, matching the pattern in PhysicsConfig.H. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Two test failures fixed: 1. tEffectiveDiffusivity: The periodic cell problem div(D grad(chi)) = -div(D e_k) produces a singular linear system (constant null space). Without pinning, HYPRE FlexGMRES diverges with NaN residuals. Fix: pin the first active cell to chi=0 after matrix fill, removing the null space without affecting D_eff. Uses MPI_MINLOC to coordinate the pinned cell across ranks. 2. tSyntheticMicrostructure: SpecificSurfaceArea requires iMultiFab with >= 1 ghost cell for neighbor access across MFIter box boundaries. Tests 1, 2, 2b, 2c, 2d were allocating with 0 ghost cells. Fix: allocate with 1 ghost cell, initialize with setVal(0), and call FillBoundary() after filling valid data. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Fix EffDiff solver NaN divergence and SSA ghost cell assertion

SSA: Remove overly aggressive vbx.bigEnd(d)-1 clamping that prevented face counting at multi-box boundaries. The interface at the box boundary was missed because the check box was shrunk below the valid cell range. Ghost cells are already guaranteed filled (nGrow>=1 asserted). EffDiff: Switch preconditioner from PFMG to SMG for the periodic cell problem. PFMG diverges (NaN at iteration 1) on the periodic structured grid with a pinned null-space cell. SMG handles periodic grids more robustly, matching the approach used by TortuosityHypre. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Fix SSA face counting at box boundaries and EffDiff PFMG divergence

Cast mpi_size to size_t before multiplication to satisfy bugprone-implicit-widening-of-multiplication-result clang-tidy check. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

…om Fortran The C++ migration of effDiffFillMatrix (commit 2995552) introduced two bugs compared to the original Fortran kernel: 1. Missing diagonal contribution for inactive neighbors: the Fortran adds D_c/dx^2 to the center stencil entry when a face neighbor is inactive (internal Neumann ghost cell). The C++ kernel omitted this, producing near-zero or zero diagonals that made the matrix singular/ill-conditioned, causing HYPRE SMG/PFMG to produce NaN on iteration 1. 2. Reversed RHS flux BC signs: the Fortran uses +D_c/dx for -face and -D_c/dx for +face. The C++ had these swapped. 3. Unnecessary one-sided RHS differences: the Fortran always uses central differences for -div(D e_k) regardless of neighbor activity (ghost cells provide the needed data). The C++ added conditional one-sided stencils that were incorrect. Also reverts the SMG workaround back to PFMG (the proper preconditioner for periodic structured grids), since the root cause was the broken matrix. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Fix EffDiff matrix fill: restore Neumann BC diagonal and RHS signs from Fortran

Add pypi-wheels-gpu.yml that builds GPU-accelerated wheels using NVIDIA's CUDA 12.6 manylinux container (sameli/manylinux_2_28_x86_64_cuda_12.6). Key changes: - HYPRE built with --with-cuda for GPU-accelerated structured solvers - AMReX built with -DAMReX_GPU_BACKEND=CUDA targeting compute capabilities 6.0 through 9.0 (Pascal, Volta, Turing, Ampere, Ada Lovelace, Hopper) - OpenImpala compiled with -DGPU_BACKEND=CUDA via CMAKE_ARGS - CUDA runtime libs excluded from auditwheel (user provides CUDA toolkit) - Dependency cache keyed separately from CPU builds The existing CPU-only workflow is renamed to pypi-wheels-cpu.yml. Both workflows trigger on release publication and manual dispatch, allowing parallel CPU and GPU wheel builds. No changes to pyproject.toml or CMakeLists needed — scikit-build-core picks up GPU_BACKEND from the CMAKE_ARGS environment variable set in CI, and CUDA libraries propagate transitively through AMReX::amrex and HYPRE::HYPRE link targets. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Jupyter notebook (notebooks/profiling_and_tuning.ipynb) for profiling OpenImpala solver performance on Google Colab's free tier (T4 GPU). Includes sections for: - Synthetic dataset generation via PoreSpy - Solver profiling sweep (PCG, FlexGMRES, GMRES, BiCGSTAB, SMG, PFMG, MLMG) - AMReX max_grid_size tuning sweep - Dataset size scaling with power-law fit - Direction anisotropy check - Multi-porosity comparison against Bruggeman relation - Summary with HPC scaling recommendations and template inputs file https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7

- Section 3: Dual-axis bar chart showing wall time (bars) + iteration count (diamond markers) on a second x-axis - Section 4: Solver x grid size heatmap with annotated cell values and green-bordered "BEST" marker on the optimal configuration - Section 5: Filled area under the scaling curve with smooth power-law interpolation and arrow-style annotations - Section 6: Radar/spider chart for directional anisotropy with filled polygon, isotropic reference circle, and LaTeX axis labels https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7

…ing-notebook-JUBs9 Add CUDA GPU wheel build workflow and profiling tutorial

github-actions · 2026-03-22T22:08:07Z

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         135      105    77%   26-27,30,35,92-93,99-100,107-109,135-137,141,144-148,152-155,162,164,208-209,242,245
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,266,266,266,266,266,266,270,270,270,270,270,270,270,274,276,278,280,282,288,290,297,297,297,297,297,297,297,301,301,301,301,301,301,301,305,305,305,305,305,305,305-306,306,306,306,306,306,306,309,309,309,309,309,309,309-310,310,310,310,310,310,310-311,311,311,311,311,311,311,313,313,313,313,313,313,313-314,314,314,314,314,314,314-315,315,315,315,315,315,315,319,319,319,319,319,319,319,324,324,324,324,324,324,324-325,325,325,325,325,325,325-326,326,326,326,326,326,326-327,327,327,327,327,327,327,332,332,332,332,332,332,332,337,337,337,337,337,337,337-338,338,338,338,338,338,338,343,343,343,343,343,343,343,350,350,350,350,350,350,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         266      116    43%   49-50,89-90,111-112,115-117,120-121,140-142,155-157,166-168,174-177,185-186,192-196,200-204,209-212,219-224,231-237,259,263-264,270-271,273-274,276,283-284,301,312,314,318,325,327,331-334,338,346-347,353-355,361-363,365-366,369,372,374,377-380,382-384,386,388-389,391,393-394,396,398-399,401,403-404,406,410-411,413,417-418,420,425,457,463-465,471-472,521-524,526,528-530,532,534-536,538,540-542,544,546-548,550,554-556,558,562-564,566,588
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        384      130    33%   59-65,67-69,71-73,75-77,79-80,82-84,86-88,90-92,94-96,98-99,101-103,106-108,111-112,114-117,119,122,124-127,143-144,148-150,152-158,160,186,210,217,226,228-231,240,242-245,248,255,288-293,306,309-317,319-320,323-327,331-335,338-342,344-348,351-357,359-363,367,369,375-377,379-393,396,398-402,404-409,413-418,420-425,428-429,432-434,555-575,577-578,581-588,590,593-609,612-614,670,673-674,677-683,685,689-700,702-703
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             69       67    97%   94-95
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      245    48%   69,73,75-80,82-87,89-90,93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,205,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,475,482-483,487-488,490-493,496-497,499-504,507-509,513-516,518-522,524-527,529-537,539,543,545,547,551,553-555,559-564,567,571,573-574,576,578-581,584-591,594,597-601,603-607,609-610,614,617-618,620-622,624-625,627-633,636,638,642-644,646,648-649,735-736,739-740,757-760,771-772,774,786-787,789,791-792,794-801,803-804,818-819,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      387      345    89%   213-215,217-221,303,365-368,477,610-613,615-617,619-622,631-634,641,670,682-685,687-689,691,703,714-715
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       84       82    97%   94-95
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,299,309,311,314,346,356,358,361,367-370,372-376,378-379,381-385,388-389,391-392,394,397-398,401-402,404-407,409-413,415-416,418-422,425-426,428-429,431,434-435,438-439,441-443,445-451,453-457,460-461,463-464,466,469-470,473,475-477,479-485,487-491,494-495,497-498,500,503-504,507,509-511,513-516,518-522,525-526,528-529,531,534-535,538,541-542,555
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151        0     0%   27,32,34-40,42-44,47,49,53-56,58,62,64-65,67-72,74-75,79-80,82-91,93-96,100-101,103,106-109,111-114,120-124,127-136,139-141,143,146-147,149,152-156,158-159,161,163,165-173,175,177-186,188-191,195-197,201,203-204,206,209-222,225-231,233-234,236-237,239-241,243,246-247,249
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         35       35   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                779      559    71%   148-149,154-155,276-279,282-284,329-331,334-335,337,347-349,352-354,384-387,567,591,595,616,632-633,635-637,639-648,650,653-657,661-663,666-673,675-679,683-685,687-689,691-700,702-706,708-719,721-724,726,736,742-745,747-749,758-761,763-765,781,784-785,808-813,824-827,829,866,871-874,877-879,883-886,888,890-893,895,900-902,904,953,962,967,970-975,991-994,1008-1012,1017-1022,1032-1036,1041-1046,1051-1055,1058-1061,1068-1071,1082,1091,1093,1097,1099,1121,1152-1153,1239-1241,1367-1370
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                  96        0     0%   28,33,35,37-40,42-43,45-48,50,55,58,63-64,66,71-74,76-77,80,83,85-87,90-93,95,100-108,110,112,115,118,121,124-126,129-135,138,146-151,153,155,158,161-162,165-168,170,172-176,178-179,181-182,185-186,189,191-194,198-199,202-203
src/props/TortuosityMLMG.H                     1        0     0%   67
src/props/TortuositySolverBase.cpp           301        0     0%   40,46-52,54-55,57,61,64,67,69-72,74-75,77,80,85-88,90-91,94-101,104,106-107,110,113,115,122-124,126-127,130,134,136-138,140,142-145,148-152,154,157,159,166-169,171,173,175-182,184-186,188-190,193-195,197,200,203,205,208,210-211,213-217,219-221,223-224,226-232,234-239,241,243-252,255,259-264,266,268-277,280,283-286,288-289,291,293-296,298,302-303,305-306,309,312-318,320,322-325,327,332-334,336-340,342-343,345-355,357-358,360-361,364-365,367-370,372-376,379-380,382-386,388,391,394-396,398,406-409,411-417,420-422,425-429,432,434-436,438-440,443,446-454,458-460,463-465,467,469-471,473-476,478,482,486-490,492,495-502,504-506,508,512-513,518-519
src/props/TortuositySolverBase.H               1        0     0%   69
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5382     3244    60%
------------------------------------------------------------------------------

Generated by CI — coverage data from gcovr

codecov · 2026-03-22T22:08:17Z

Codecov Report

❌ Patch coverage is 36.74419% with 1632 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/props/Diffusion.cpp	12.50%	240 Missing and 117 partials ⚠️
src/props/TortuositySolverBase.cpp	0.00%	289 Missing ⚠️
src/props/HypreStructSolver.cpp	27.51%	133 Missing and 112 partials ⚠️
src/props/REVStudy.cpp	0.00%	142 Missing ⚠️
src/props/TortuosityHypre.cpp	48.70%	40 Missing and 59 partials ⚠️
src/props/TortuosityMLMG.cpp	0.00%	92 Missing ⚠️
src/props/BoundaryCondition.H	51.14%	57 Missing and 7 partials ⚠️
src/props/TortuosityKernels.H	45.36%	44 Missing and 9 partials ⚠️
src/io/ImageLoader.cpp	18.33%	20 Missing and 29 partials ⚠️
src/props/EffectiveDiffusivityHypre.cpp	54.65%	1 Missing and 38 partials ⚠️
... and 15 more

📢 Thoughts on this report? Let us know!

jameslehoux and others added 30 commits March 15, 2026 08:31

Merge remote-tracking branch 'origin/claude/develop-additional-tutori…

a2cfc81

…als-BFzd5' into claude/review-branch-changes-8fGVe

Merge pull request #175 from BASE-Laboratory/claude/merge-branches-jnhJZ

a924404

Consolidate feature branches: GPU acceleration, MLMG solver, microstructural parameterization, and bug fixes

Fix clang-format violations across 23 source files

851a6d1

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Merge pull request #176 from BASE-Laboratory/claude/merge-branches-jnhJZ

d09dd40

Fix clang-format violations across 23 source files

Add Fortran to project languages to fix AMReX CMake error

45b7c0c

AMReX was built with AMReX_FORTRAN=ON, so the project must enable Fortran before find_package(AMReX). Added Fortran to the LANGUAGES list in the project() declaration. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Merge pull request #177 from BASE-Laboratory/claude/merge-branches-jnhJZ

43d8373

Add Fortran to project languages to fix AMReX CMake error

Fix FloodFill build error: DeviceScalar has no setVal member

56ff55c

AMReX 25.03 DeviceScalar lacks setVal(). Use ifdef guards to handle GPU and CPU paths separately — plain int on CPU, reconstructed DeviceScalar on GPU. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Merge pull request #178 from BASE-Laboratory/claude/merge-branches-jnhJZ

4143915

Fix FloodFill build error: DeviceScalar has no setVal member

jameslehoux and others added 20 commits March 22, 2026 11:51

Fix const-correctness error in SpecificSurfaceArea

183d63d

Box::grow() mutates in place and is non-const. Copy the domain box before calling grow() to avoid discarding const qualifiers. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Merge pull request #179 from BASE-Laboratory/claude/merge-branches-jnhJZ

a5db52a

Fix const-correctness error in SpecificSurfaceArea

Fix ReduceLongLongSum not found in AMReX ParallelDescriptor

9391a18

AMReX does not provide ReduceLongLongSum. Use ReduceLongSum with long types instead. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Merge pull request #180 from BASE-Laboratory/claude/merge-branches-jnhJZ

cc375a7

Fix ReduceLongLongSum not found in AMReX ParallelDescriptor

Add missing AMReX_SPACE.H include in ResultsJSON.H

de318b7

ResultsJSON.H uses AMREX_SPACEDIM and AMREX_D_DECL but did not include AMReX_SPACE.H, causing build failures when included without other AMReX headers pulling it in transitively. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Merge pull request #181 from BASE-Laboratory/claude/merge-branches-jnhJZ

3cc1d9d

Add missing AMReX_SPACE.H include in ResultsJSON.H

Merge pull request #182 from BASE-Laboratory/claude/merge-branches-jnhJZ

a53ea0a

Guard AMReX includes in ResultsJSON.H for unit test mode

Merge pull request #183 from BASE-Laboratory/claude/merge-branches-jnhJZ

16dab95

Fix EffDiff solver NaN divergence and SSA ghost cell assertion

Merge pull request #184 from BASE-Laboratory/claude/merge-branches-jnhJZ

9084ce8

Fix SSA face counting at box boundaries and EffDiff PFMG divergence

Fix implicit widening conversion in ConnectedComponents.cpp

7faedf5

Cast mpi_size to size_t before multiplication to satisfy bugprone-implicit-widening-of-multiplication-result clang-tidy check. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC

Merge pull request #185 from BASE-Laboratory/claude/merge-branches-jnhJZ

55fa6a4

Fix EffDiff matrix fill: restore Neumann BC diagonal and RHS signs from Fortran

Merge pull request #187 from BASE-Laboratory/claude/add-profiling-tun…

743de5c

…ing-notebook-JUBs9 Add CUDA GPU wheel build workflow and profiling tutorial

jameslehoux merged commit 395cbbb into master Mar 22, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration#188

Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration#188
jameslehoux merged 50 commits intomasterfrom
working

jameslehoux commented Mar 22, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

codecov bot commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jameslehoux commented Mar 22, 2026

Summary

Architecture & Refactoring

New Solvers

Fortran → C++ Migration

GPU Acceleration

New Features

Bug Fixes & Stability

Test Coverage

Test plan

Uh oh!

Uh oh!

github-actions bot commented Mar 22, 2026

Code Coverage Report

Uh oh!

codecov bot commented Mar 22, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant