Skip to content

Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration#188

Merged
jameslehoux merged 50 commits intomasterfrom
working
Mar 22, 2026
Merged

Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration#188
jameslehoux merged 50 commits intomasterfrom
working

Conversation

@jameslehoux
Copy link
Copy Markdown

Summary

This is a comprehensive modernization of OpenImpala's solver infrastructure, spanning architecture, performance, and extensibility improvements across 80 files.

Architecture & Refactoring

New Solvers

Fortran → C++ Migration

GPU Acceleration

New Features

Bug Fixes & Stability

  • Fix EffDiff matrix fill: restore Neumann BC diagonal and RHS signs from Fortran
  • Fix EffDiff solver NaN divergence and PFMG convergence
  • Fix SSA face counting at box boundaries and ghost cell assertions
  • Fix FloodFill DeviceScalar build error
  • Fix implicit widening conversion in ConnectedComponents.cpp
  • Fix const-correctness, missing includes, and AMReX compatibility across solvers
  • Multiple rounds of code review fixes (28+ bugs addressed)

Test Coverage

Test plan

  • CI build passes (CMake + CTest in dependency container)
  • All existing regression benchmarks pass (uniform block, series layers, parallel layers)
  • New PCG solver tests pass
  • EffDiff and tortuosity tests produce correct analytical results
  • clang-format and clang-tidy checks pass
  • GPU build compiles with CUDA-enabled AMReX

jameslehoux and others added 30 commits March 15, 2026 08:31
The solve() method previously only implemented FlexGMRES — all other
SolverType enum values (including PCG) hit amrex::Abort(). This adds
the PCG path with SMG preconditioner, matching the FlexGMRES pattern.

PCG is the optimal Krylov method for symmetric positive-definite systems.
The single-phase diffusion Laplacian with harmonic-mean face coefficients
is SPD, so PCG converges in fewer iterations than FlexGMRES. This is now
the default solver when using the Python API with solver="auto".

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Three categories of new tests targeting the largest coverage gaps:

1. PCG Solver Regression Benchmarks
   - tRegressionBenchmark_uniform_PCG: validates PCG on uniform 32^3
     domain against tau = (N-1)/N analytical solution
   - tRegressionBenchmark_series_PCG: validates PCG on heterogeneous
     series-layer geometry against harmonic-mean analytical bound
   These exercise the new PCG code path in TortuosityHypre::solve().

2. Diffusion.cpp Integration Tests (0% -> estimated ~60%)
   - tDiffusion_dryrun: exercises dry-run mode (file I/O, volume
     fraction, percolation checks in all 3 directions, no solver)
   - tDiffusion_tiff: full homogenization pipeline on TIFF data
   - tDiffusion_hdf5: exercises HDF5 reader branch in Diffusion.cpp
   These cover the main application entry point (642 uncovered lines).

3. Synthetic EffectiveDiffusivityHypre Benchmark
   - tSyntheticEffectiveDiffusivity: creates uniform 16^3 domain,
     solves cell problem in 3 directions, validates chi_k ≈ 0
     (analytically correct for uniform medium with periodic BCs),
     checks active mask completeness and solver convergence.

Also corrects issue deprecation note: Tortuosity_filcc.F90 is NOT
dead code — tortuosity_remspot is still called by TortuosityHypre.

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Break the 1080-line Diffusion.cpp god-function into focused,
reusable components:

1. ImageLoader (src/io/ImageLoader.H/cpp)
   - Unified image loading: detects format from extension
     (.tif/.tiff, .h5/.hdf5, .dat), instantiates the correct reader,
     thresholds, and returns an ImageData struct with geometry,
     BoxArray, DistributionMapping, and phase field.
   - Eliminates the duplicated read-threshold-copy boilerplate that
     was repeated 3 times in Diffusion.cpp (once per file format).

2. DeffTensor (src/props/DeffTensor.H/cpp)
   - Extracts the D_eff tensor assembly from corrector fields chi_k.
   - Pure function: calculateDeffTensor(tensor, chi_x, chi_y, chi_z,
     mask, geom, verbose).
   - Was previously an anonymous-namespace function duplicated in both
     Diffusion.cpp and tEffectiveDiffusivity.cpp.

3. REVStudy (src/props/REVStudy.H/cpp)
   - Encapsulates the entire REV convergence study workflow:
     sub-volume extraction, cell-problem solves, tensor assembly,
     and CSV output.
   - Configuration via REVConfig struct.
   - Was ~250 lines of deeply nested code in Diffusion.cpp.

4. Diffusion.cpp rewritten as thin orchestrator (1080 → 657 lines)
   - main() now: parse config → loadImage() → dispatch to
     runDryRunChecks / runREVStudy / runHomogenization / runFlowThrough.
   - Each calculation mode is a focused static function.
   - No physics logic changes — identical output for all modes.

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…51)

Extract stringToSolverType() and stringToDirection() helpers duplicated
across 5 files into a single shared header (SolverConfig.H). Removes
~144 lines of duplicated code.

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Replace all four Fortran kernel files with pure C++ implementations:

- TortuosityHypreFill.F90 → TortuosityHypreFill.H (tortuosityFillMatrix)
- EffDiffFillMtx.F90 → EffDiffFillMtx.H (effDiffFillMatrix)
- Tortuosity_filcc.F90 → TortuosityKernels.H (removeIsolatedCells)
- Tortuosity_poisson_3d.F90 → TortuosityKernels.H (computeFaceFluxes,
  forwardEulerUpdate, integrateBoundaryFluxes)

The new kernels use amrex::ParallelFor with C++ lambdas and Array4
accessors for the flux/update/fio routines, preparing the codebase for
GPU offloading. The HYPRE matrix-fill routines remain sequential (they
fill flat buffers for HYPRE) but are now native C++.

Legacy Fortran-only routines (tortuosity_filct, tortuosity_filbc,
tortuosity_filic) that were never called from C++ have been dropped.

Build system updated: Fortran removed from CMake LANGUAGES, MPI and
OpenMP Fortran components removed, GNUmakefile and CI workflows updated.

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
- TortuosityHypreFill.H: add nval consistency check (AMREX_ALWAYS_ASSERT),
  remove unused len_x/len_y variables that would trigger -Wunused-variable
- EffDiffFillMtx.H: include TortuosityHypreFill.H for shared constants
  (STN_C, CELL_ACTIVE, SMALL_REAL etc.) that were undefined — compilation
  would have failed. Add AMREX_SPACEDIM==3 guards on Z-face stencil entries
  matching the original Fortran. Add post-loop m_idx==npts assertion.
- TortuosityHypre.cpp: remove unused phase_iab/p_ptr/pbox variables left
  over from the old Fortran call signature — would fail with -Werror.

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Move the identical SolverType enum from TortuosityHypre and
EffectiveDiffusivityHypre into Tortuosity.H as OpenImpala::SolverType.
Both solver classes now use a `using SolverType = OpenImpala::SolverType`
alias for backward compatibility. The duplicate tortuositySolverType()
and effDiffSolverType() functions in SolverConfig.H are consolidated
into a single parseSolverType(), with the old names retained as
deprecated thin wrappers. Python bindings bind the enum once and alias
EffDiffSolverType to SolverType.

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…15)

Deduplicate ~165 lines of HYPRE lifecycle code (grid, stencil, matrix/vector
creation, solver dispatch, cleanup) shared between TortuosityHypre and
EffectiveDiffusivityHypre into a common HypreStructSolver base class.

- HypreStructSolver owns HYPRE handles and provides setupGrid(), setupStencil(),
  createMatrixAndVectors(), assembleSystem(), and runSolver() methods
- runSolver() supports FlexGMRES and PCG with configurable SMG/PFMG preconditioner
  via PrecondType enum, giving EffectiveDiffusivityHypre PCG support it lacked
- TortuosityHypre inherits from both Tortuosity and HypreStructSolver
- EffectiveDiffusivityHypre inherits from HypreStructSolver
- Safe MPI_Finalized guard in base destructor for Python binding shutdown
- CMakeLists.txt updated to include HypreStructSolver.cpp

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…Cs (#16)

Introduces a BoundaryCondition strategy pattern that decouples BC logic
from the tortuosity fill kernel. Users can now configure inlet/outlet
and lateral BCs at runtime via input file parameters (bc.inlet_outlet,
bc.sides, bc.value_lo, bc.value_hi).

New files:
- BoundaryCondition.H: BCType enum, parseBCType(), abstract base class,
  and four concrete implementations (DirichletExternal, DirichletPhase-
  Boundary, Neumann, Periodic) with factory function createBC().

Key changes:
- TortuosityHypreFill.H: Interior stencil separated from BC application;
  new BC-aware overload takes BoundaryCondition pointers.
- TortuosityHypre: Parses bc.* ParmParse params, creates BC objects,
  passes them through setupMatrixEquation. Adds failed plotfile output
  when solver does not converge.
- Python bindings: BCType enum exposed; solver BC type getters added.
- Tests: Two new CTest cases (explicit DirichletExternal, Periodic sides).

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Implements six new capabilities for extracting DFN battery model parameters
directly from 3D voxel images:

Phase 1 - Foundation:
- SpecificSurfaceArea: counts interface faces between phases (SSA = faces/volume)
- MacroGeometry: extracts electrode thickness, cross-section from AMReX Geometry
- Multi-phase volume fractions in ResultsJSON output

Phase 2 - Profiling:
- ThroughThicknessProfile: per-slice VF profile along any direction

Phase 3 - Particle Size Distribution:
- ConnectedComponents: parallel flood-fill CCL reusing PercolationCheck pattern
- ParticleSizeDistribution: equivalent sphere radii from component volumes

Integration:
- ResultsJSON extended with microstructure block and BPX DFN parameters
- Diffusion.cpp wired with microstructure.* ParmParse namespace
- Synthetic test (tSyntheticMicrostructure) with 7 analytical validation cases
- Catch2 unit tests for MacroGeometry and ResultsJSON microstructure setters

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…bias

The raw voxel face-counting SSA overestimates true surface area by ~50%
for isotropic surfaces due to the staircase/laddering effect on curved
interfaces. This commit adds:

- value_corrected() method applying the 2/3 Cauchy-Crofton factor
- boundary_padding constructor parameter to exclude outermost N voxel
  layers, eliminating artificial interfaces from truncated features
- Both raw and corrected SSA emitted in ResultsJSON
- Diffusion.cpp reads microstructure.ssa_boundary_padding from inputs
- Tests: corrected vs raw ratio, boundary padding reduction, diagonal
  plane demonstrating laddering bias

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Add full GPU support infrastructure for accelerating the computationally
intensive linear solve phase via CUDA or HIP backends:

Build system:
- CMakeLists.txt: GPU_BACKEND option (NONE/CUDA/HIP), auto-enables
  CUDA/HIP language and defines OPENIMPALA_USE_GPU/OPENIMPALA_USE_CUDA
- FindHYPRE.cmake: discovers GPU library variants (libHYPRE_cuda/hip)
- Container def: conditional GPU flags for AMReX and HYPRE builds

HYPRE device memory:
- HypreStructSolver.cpp: HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE)
  and HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE) when GPU enabled

GPU matrix fill kernels:
- TortuosityHypreFill.H: tortuosityFillMatrixGpu() using ParallelFor
  with AMREX_GPU_DEVICE lambdas and Array4<> accessors
- EffDiffFillMtx.H: effDiffFillMatrixGpu() same pattern

GPU data path in solvers:
- TortuosityHypre.cpp: #ifdef GPU path with DeviceVector buffers,
  GPU kernel dispatch, device-to-host copy for BC application
- EffectiveDiffusivityHypre.cpp: same pattern

CPU path is preserved unchanged behind #else for non-GPU builds.

https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…als-BFzd5' into claude/review-branch-changes-8fGVe
…MP, GPU, BCs

- HypreStructSolver: restore GMRES, BiCGSTAB, SMG, PFMG, Jacobi solver
  support that was dropped during base class extraction (regression)
- REVStudy: use rank-independent RNG seed so all MPI ranks agree on
  sub-volume coordinates (was producing corrupt data in multi-rank runs)
- DeffTensor: replace non-portable OMP reduction on 2D C array with
  thread-local accumulator + critical section (GCC < 9 compatibility)
- HypreStructSolver: remove HYPRE_MEMORY_DEVICE for matrix/vectors since
  SetBoxValues is called with host pointers after device-to-host copy
- BoundaryCondition: fix DirichletPhaseBoundaryBC quarter-domain heuristic
  (was pinning 25% of interior cells); narrow to 1-cell boundary layer.
  Also handle ambiguous dual-flag case for very small domains
- BoundaryCondition: remove redundant per-cell initial guess computation
  in both Dirichlet BC classes (interior kernel already sets this)
- TortuosityHypre: replace iMultiFab::sum() with manual valid-cell-only
  loop to avoid potential ghost cell inclusion in volume fraction
- ImageLoader: add RawReader include and informative error for .raw files
  (requires dimensions/type, so directs users to RawReader directly)

https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
Initial header defining the TortuosityMLMG class interface. This solver
uses AMReX's native MLABecLaplacian operator and MLMG geometric multigrid
instead of HYPRE, eliminating matrix assembly overhead for small grids
(e.g., Google Colab tutorials). Implementation pending architectural
decision on shared base class extraction.

https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
Extracts solver-independent logic (phase management, diffusion coefficient
construction, activity mask generation, flux integration) into a common
base class. Subclasses only need to implement solve().

https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
…hase 2)

Adds TortuosityMLMG.cpp implementing the full solver pipeline using
AMReX's native MLABecLaplacian + MLMG geometric multigrid instead of
HYPRE. No explicit matrix assembly - uses matrix-free operator
application for lower memory footprint and faster setup on small grids.

- Constructor: phase preconditioning, diffusion coefficient field,
  activity mask via parallel flood fill (same algorithm as HYPRE solver)
- solve(): configures MLABecLaplacian with Dirichlet/Neumann BCs,
  harmonic-mean face B-coefficients, and runs MLMG V-cycle solver
- globalFluxes() + computePlaneFluxes(): identical flux integration
  and conservation checking as TortuosityHypre
- value(): tortuosity calculation with full flux conservation validation

Also adds:
- CMake build integration (TortuosityMLMG.cpp in openimpala_props)
- pybind11 bindings (TortuosityMLMG class with same API as HYPRE)
- Python facade: solver="mlmg" routes to the new solver backend

https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
- Replace LoopOnCpu with ParallelFor+AMREX_GPU_DEVICE for GPU acceleration
  (diffusion coefficients, initial guess, B-coefficients, mask intersection)
- Use ReduceOps for GPU-safe reductions (active cell count, boundary fluxes)
- Fix convergence: wrap MLMG solve in try/catch instead of arbitrary *100 check
- Fix iteration count: use mlmg.getNumIters() instead of hardcoded maxiter
- Fix plotfile: convert iMultiFab->MultiFab via ParallelFor (not amrex::Copy)
- Clean up B-coefficient loop: remove redundant assignments and confused comments
- Add Gpu::streamSynchronize before CPU-side plane flux accumulation

https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu
The identical flood fill algorithm was duplicated across TortuosityMLMG,
TortuosityHypre, PercolationCheck, and ConnectedComponents (~160 lines
each). Extract into FloodFill.H/cpp with:
- ParallelFor + AMREX_GPU_DEVICE for GPU acceleration
- Gpu::DeviceScalar atomic flag for early termination
- Parameterized label value (FLOOD_ACTIVE=1 default, custom for CCL)
- Shared collectBoundarySeeds with MPI Allgatherv + dedup

Net: -652 lines duplicated code, +210 lines shared utility.

https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu
Convert LoopOnCpu patterns to GPU-compatible equivalents throughout the
codebase:

- DeffTensor: 9-component ReduceOps for D_eff tensor gradient accumulation
- TortuosityHypre: ReduceOps for boundary flux integration, ParallelFor
  with atomic scatter-add for plane flux profiles, ReduceOps for active
  cell counting, device lookup table for diff coeff map fill, ParallelFor
  for binary traversable mask
- TortuosityMLMG: ParallelFor with atomic scatter-add for plane flux
  profiles
- EffectiveDiffusivityHypre: ReduceOps for ManualSumActiveCells, device
  lookup table for diff coeff map fill, ParallelFor for plotfile mask copy
- Diffusion/REVStudy: ParallelFor for active mask generation
- VolumeFraction: ReduceOps for phase cell counting
- SpecificSurfaceArea: ReduceOps for interface face counting
- ConnectedComponents: ParallelFor with atomic scatter-add for component
  volume counting
- ThroughThicknessProfile: ParallelFor with atomic scatter-add for
  per-slice phase counting

Remaining LoopOnCpu instances are justified: HYPRE host buffer copy-back,
sequential seed search, boundary seed collection, and I/O from host memory.

https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu
Key fixes:
- HYPRE_Init/Finalize ordering (must be inside amrex::Initialize/Finalize)
- Remove OMP parallel around HYPRE Set/GetBoxValues (data race)
- Replace MPI_COMM_WORLD with amrex::ParallelDescriptor::Communicator()
- Fix DeffTensor sign convention (1+dchi, not 1-dchi)
- Fix REVStudy ghost cell fill and MPI-incorrect nested MFIter copy
- Fix EffDiffFillMtx RHS at pore-solid interfaces (mask-guarded reads)
- Fix dangling const& members in solver/utility classes (store by value)
- Add GPU stream synchronization before host-side loops
- Add integer overflow assertions for HYPRE int casts
- Complete TortuosityMLMG refactor to inherit from TortuositySolverBase
- Extract shared HypreCheck.H macro, add TortuositySolverBase.cpp
- Add input validation, error handling, and HYPRE_ClearAllErrors

https://claude.ai/code/session_01QKvcKHR7GgCG2UNapN1GP7
Critical fixes:
- DeffTensor.cpp: Fix sign error in tensor assembly (minus -> plus) to match
  the documented formula D_eff[i][j] = (1/V) * sum{ delta_ij + d(chi_j)/d(x_i) }
- EffDiffFillMtx.H: Remove spurious diagonal inflation at internal Neumann BCs
  (inactive neighbor faces should not contribute to diagonal), fix flux_bc_rhs
  sign to match correct Neumann condition D*(grad(chi)+e_k).n=0, and use
  one-sided differences for RHS at active/inactive boundaries instead of
  central differences that read D=0 from inactive cells
- TortuosityMLMG.cpp: Fix bdryLo() returning box outside domain (inlet flux
  was always 0, producing NaN/inf tortuosity). Use domain.smallEnd() slice instead

High-severity fixes:
- TortuosityMLMG.cpp: Fix convergence threshold from 100*eps to eps
- TortuosityMLMG.cpp: Use mlmg.getNumIters() instead of hardcoded maxiter
- tSyntheticEffectiveDiffusivity.cpp: Add missing solver->solve() call before
  checking getSolverConverged() (m_converged defaults to false)

Lower-severity fixes:
- TortuosityHypre.cpp: Restore m_write_plotfile guard on failed-solve plotfile
- SpecificSurfaceArea: Clamp face-counting box to valid region to prevent
  out-of-bounds ghost cell access at internal FAB boundaries

https://claude.ai/code/session_01NEQqrvJNCrPn4mpyjcMsHv
Merges 9 bug fixes from the CeIgl review branch, with conflict
resolution against the PMNDO 28-bug-fix commit already merged.

Key unique fixes incorporated from CeIgl:
- EffDiffFillMtx.H: Remove spurious diagonal inflation at internal
  Neumann boundaries (was inflating matrix diagonal incorrectly)
- EffDiffFillMtx.H: Fix flux_bc_rhs signs for Neumann BC correction
- EffDiffFillMtx.H: Use one-sided differences at pore-solid interfaces
  instead of mirroring (more accurate for non-uniform D fields)
- tSyntheticEffectiveDiffusivity.cpp: Add missing solve() call before
  checking convergence
- TortuositySolverBase.cpp: Fix bdryLo() returning face-centered box
  outside cell domain (was causing zero inlet flux / NaN tortuosity)

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Consolidate feature branches: GPU acceleration, MLMG solver, microstructural parameterization, and bug fixes
Fix clang-format violations across 23 source files
AMReX was built with AMReX_FORTRAN=ON, so the project must enable
Fortran before find_package(AMReX). Added Fortran to the LANGUAGES
list in the project() declaration.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Add Fortran to project languages to fix AMReX CMake error
AMReX 25.03 DeviceScalar lacks setVal(). Use ifdef guards to handle
GPU and CPU paths separately — plain int on CPU, reconstructed
DeviceScalar on GPU.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix FloodFill build error: DeviceScalar has no setVal member
jameslehoux and others added 20 commits March 22, 2026 11:51
Box::grow() mutates in place and is non-const. Copy the domain box
before calling grow() to avoid discarding const qualifiers.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix const-correctness error in SpecificSurfaceArea
AMReX does not provide ReduceLongLongSum. Use ReduceLongSum with
long types instead.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix ReduceLongLongSum not found in AMReX ParallelDescriptor
ResultsJSON.H uses AMREX_SPACEDIM and AMREX_D_DECL but did not
include AMReX_SPACE.H, causing build failures when included without
other AMReX headers pulling it in transitively.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Add missing AMReX_SPACE.H include in ResultsJSON.H
The unit test compiles without AMReX (OPENIMPALA_UNIT_TEST defined).
Guard the AMReX_SPACE.H include and provide fallback definitions for
AMREX_SPACEDIM and AMREX_D_DECL, matching the pattern in
PhysicsConfig.H.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Guard AMReX includes in ResultsJSON.H for unit test mode
The unit test compiles without AMReX (OPENIMPALA_UNIT_TEST defined).
Guard the AMReX_SPACE.H include and provide fallback definitions for
AMREX_SPACEDIM and AMREX_D_DECL, matching the pattern in
PhysicsConfig.H.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Two test failures fixed:

1. tEffectiveDiffusivity: The periodic cell problem div(D grad(chi)) =
   -div(D e_k) produces a singular linear system (constant null space).
   Without pinning, HYPRE FlexGMRES diverges with NaN residuals. Fix:
   pin the first active cell to chi=0 after matrix fill, removing the
   null space without affecting D_eff. Uses MPI_MINLOC to coordinate
   the pinned cell across ranks.

2. tSyntheticMicrostructure: SpecificSurfaceArea requires iMultiFab
   with >= 1 ghost cell for neighbor access across MFIter box boundaries.
   Tests 1, 2, 2b, 2c, 2d were allocating with 0 ghost cells. Fix:
   allocate with 1 ghost cell, initialize with setVal(0), and call
   FillBoundary() after filling valid data.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix EffDiff solver NaN divergence and SSA ghost cell assertion
SSA: Remove overly aggressive vbx.bigEnd(d)-1 clamping that prevented
face counting at multi-box boundaries. The interface at the box boundary
was missed because the check box was shrunk below the valid cell range.
Ghost cells are already guaranteed filled (nGrow>=1 asserted).

EffDiff: Switch preconditioner from PFMG to SMG for the periodic cell
problem. PFMG diverges (NaN at iteration 1) on the periodic structured
grid with a pinned null-space cell. SMG handles periodic grids more
robustly, matching the approach used by TortuosityHypre.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix SSA face counting at box boundaries and EffDiff PFMG divergence
Cast mpi_size to size_t before multiplication to satisfy
bugprone-implicit-widening-of-multiplication-result clang-tidy check.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
…om Fortran

The C++ migration of effDiffFillMatrix (commit 2995552) introduced two bugs
compared to the original Fortran kernel:

1. Missing diagonal contribution for inactive neighbors: the Fortran adds
   D_c/dx^2 to the center stencil entry when a face neighbor is inactive
   (internal Neumann ghost cell). The C++ kernel omitted this, producing
   near-zero or zero diagonals that made the matrix singular/ill-conditioned,
   causing HYPRE SMG/PFMG to produce NaN on iteration 1.

2. Reversed RHS flux BC signs: the Fortran uses +D_c/dx for -face and
   -D_c/dx for +face. The C++ had these swapped.

3. Unnecessary one-sided RHS differences: the Fortran always uses central
   differences for -div(D e_k) regardless of neighbor activity (ghost cells
   provide the needed data). The C++ added conditional one-sided stencils
   that were incorrect.

Also reverts the SMG workaround back to PFMG (the proper preconditioner
for periodic structured grids), since the root cause was the broken matrix.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix EffDiff matrix fill: restore Neumann BC diagonal and RHS signs from Fortran
Add pypi-wheels-gpu.yml that builds GPU-accelerated wheels using NVIDIA's
CUDA 12.6 manylinux container (sameli/manylinux_2_28_x86_64_cuda_12.6).

Key changes:
- HYPRE built with --with-cuda for GPU-accelerated structured solvers
- AMReX built with -DAMReX_GPU_BACKEND=CUDA targeting compute capabilities
  6.0 through 9.0 (Pascal, Volta, Turing, Ampere, Ada Lovelace, Hopper)
- OpenImpala compiled with -DGPU_BACKEND=CUDA via CMAKE_ARGS
- CUDA runtime libs excluded from auditwheel (user provides CUDA toolkit)
- Dependency cache keyed separately from CPU builds

The existing CPU-only workflow is renamed to pypi-wheels-cpu.yml. Both
workflows trigger on release publication and manual dispatch, allowing
parallel CPU and GPU wheel builds.

No changes to pyproject.toml or CMakeLists needed — scikit-build-core
picks up GPU_BACKEND from the CMAKE_ARGS environment variable set in CI,
and CUDA libraries propagate transitively through AMReX::amrex and
HYPRE::HYPRE link targets.

https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Jupyter notebook (notebooks/profiling_and_tuning.ipynb) for profiling
OpenImpala solver performance on Google Colab's free tier (T4 GPU).

Includes sections for:
- Synthetic dataset generation via PoreSpy
- Solver profiling sweep (PCG, FlexGMRES, GMRES, BiCGSTAB, SMG, PFMG, MLMG)
- AMReX max_grid_size tuning sweep
- Dataset size scaling with power-law fit
- Direction anisotropy check
- Multi-porosity comparison against Bruggeman relation
- Summary with HPC scaling recommendations and template inputs file

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
- Section 3: Dual-axis bar chart showing wall time (bars) + iteration
  count (diamond markers) on a second x-axis
- Section 4: Solver x grid size heatmap with annotated cell values and
  green-bordered "BEST" marker on the optimal configuration
- Section 5: Filled area under the scaling curve with smooth power-law
  interpolation and arrow-style annotations
- Section 6: Radar/spider chart for directional anisotropy with filled
  polygon, isotropic reference circle, and LaTeX axis labels

https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9

Add CUDA GPU wheel build workflow and profiling tutorial
@jameslehoux jameslehoux merged commit 395cbbb into master Mar 22, 2026
9 checks passed
@github-actions
Copy link
Copy Markdown

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         135      105    77%   26-27,30,35,92-93,99-100,107-109,135-137,141,144-148,152-155,162,164,208-209,242,245
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,266,266,266,266,266,266,270,270,270,270,270,270,270,274,276,278,280,282,288,290,297,297,297,297,297,297,297,301,301,301,301,301,301,301,305,305,305,305,305,305,305-306,306,306,306,306,306,306,309,309,309,309,309,309,309-310,310,310,310,310,310,310-311,311,311,311,311,311,311,313,313,313,313,313,313,313-314,314,314,314,314,314,314-315,315,315,315,315,315,315,319,319,319,319,319,319,319,324,324,324,324,324,324,324-325,325,325,325,325,325,325-326,326,326,326,326,326,326-327,327,327,327,327,327,327,332,332,332,332,332,332,332,337,337,337,337,337,337,337-338,338,338,338,338,338,338,343,343,343,343,343,343,343,350,350,350,350,350,350,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         266      116    43%   49-50,89-90,111-112,115-117,120-121,140-142,155-157,166-168,174-177,185-186,192-196,200-204,209-212,219-224,231-237,259,263-264,270-271,273-274,276,283-284,301,312,314,318,325,327,331-334,338,346-347,353-355,361-363,365-366,369,372,374,377-380,382-384,386,388-389,391,393-394,396,398-399,401,403-404,406,410-411,413,417-418,420,425,457,463-465,471-472,521-524,526,528-530,532,534-536,538,540-542,544,546-548,550,554-556,558,562-564,566,588
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        384      130    33%   59-65,67-69,71-73,75-77,79-80,82-84,86-88,90-92,94-96,98-99,101-103,106-108,111-112,114-117,119,122,124-127,143-144,148-150,152-158,160,186,210,217,226,228-231,240,242-245,248,255,288-293,306,309-317,319-320,323-327,331-335,338-342,344-348,351-357,359-363,367,369,375-377,379-393,396,398-402,404-409,413-418,420-425,428-429,432-434,555-575,577-578,581-588,590,593-609,612-614,670,673-674,677-683,685,689-700,702-703
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             69       67    97%   94-95
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      245    48%   69,73,75-80,82-87,89-90,93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,205,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,475,482-483,487-488,490-493,496-497,499-504,507-509,513-516,518-522,524-527,529-537,539,543,545,547,551,553-555,559-564,567,571,573-574,576,578-581,584-591,594,597-601,603-607,609-610,614,617-618,620-622,624-625,627-633,636,638,642-644,646,648-649,735-736,739-740,757-760,771-772,774,786-787,789,791-792,794-801,803-804,818-819,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      387      345    89%   213-215,217-221,303,365-368,477,610-613,615-617,619-622,631-634,641,670,682-685,687-689,691,703,714-715
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       84       82    97%   94-95
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,299,309,311,314,346,356,358,361,367-370,372-376,378-379,381-385,388-389,391-392,394,397-398,401-402,404-407,409-413,415-416,418-422,425-426,428-429,431,434-435,438-439,441-443,445-451,453-457,460-461,463-464,466,469-470,473,475-477,479-485,487-491,494-495,497-498,500,503-504,507,509-511,513-516,518-522,525-526,528-529,531,534-535,538,541-542,555
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151        0     0%   27,32,34-40,42-44,47,49,53-56,58,62,64-65,67-72,74-75,79-80,82-91,93-96,100-101,103,106-109,111-114,120-124,127-136,139-141,143,146-147,149,152-156,158-159,161,163,165-173,175,177-186,188-191,195-197,201,203-204,206,209-222,225-231,233-234,236-237,239-241,243,246-247,249
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         35       35   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                779      559    71%   148-149,154-155,276-279,282-284,329-331,334-335,337,347-349,352-354,384-387,567,591,595,616,632-633,635-637,639-648,650,653-657,661-663,666-673,675-679,683-685,687-689,691-700,702-706,708-719,721-724,726,736,742-745,747-749,758-761,763-765,781,784-785,808-813,824-827,829,866,871-874,877-879,883-886,888,890-893,895,900-902,904,953,962,967,970-975,991-994,1008-1012,1017-1022,1032-1036,1041-1046,1051-1055,1058-1061,1068-1071,1082,1091,1093,1097,1099,1121,1152-1153,1239-1241,1367-1370
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                  96        0     0%   28,33,35,37-40,42-43,45-48,50,55,58,63-64,66,71-74,76-77,80,83,85-87,90-93,95,100-108,110,112,115,118,121,124-126,129-135,138,146-151,153,155,158,161-162,165-168,170,172-176,178-179,181-182,185-186,189,191-194,198-199,202-203
src/props/TortuosityMLMG.H                     1        0     0%   67
src/props/TortuositySolverBase.cpp           301        0     0%   40,46-52,54-55,57,61,64,67,69-72,74-75,77,80,85-88,90-91,94-101,104,106-107,110,113,115,122-124,126-127,130,134,136-138,140,142-145,148-152,154,157,159,166-169,171,173,175-182,184-186,188-190,193-195,197,200,203,205,208,210-211,213-217,219-221,223-224,226-232,234-239,241,243-252,255,259-264,266,268-277,280,283-286,288-289,291,293-296,298,302-303,305-306,309,312-318,320,322-325,327,332-334,336-340,342-343,345-355,357-358,360-361,364-365,367-370,372-376,379-380,382-386,388,391,394-396,398,406-409,411-417,420-422,425-429,432,434-436,438-440,443,446-454,458-460,463-465,467,469-471,473-476,478,482,486-490,492,495-502,504-506,508,512-513,518-519
src/props/TortuositySolverBase.H               1        0     0%   69
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5382     3244    60%
------------------------------------------------------------------------------


Generated by CI — coverage data from gcovr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant