Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration#188
Merged
jameslehoux merged 50 commits intomasterfrom Mar 22, 2026
Merged
Major architecture overhaul: modular solvers, Fortran→C++ migration, GPU acceleration#188jameslehoux merged 50 commits intomasterfrom
jameslehoux merged 50 commits intomasterfrom
Conversation
The solve() method previously only implemented FlexGMRES — all other SolverType enum values (including PCG) hit amrex::Abort(). This adds the PCG path with SMG preconditioner, matching the FlexGMRES pattern. PCG is the optimal Krylov method for symmetric positive-definite systems. The single-phase diffusion Laplacian with harmonic-mean face coefficients is SPD, so PCG converges in fewer iterations than FlexGMRES. This is now the default solver when using the Python API with solver="auto". https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Three categories of new tests targeting the largest coverage gaps:
1. PCG Solver Regression Benchmarks
- tRegressionBenchmark_uniform_PCG: validates PCG on uniform 32^3
domain against tau = (N-1)/N analytical solution
- tRegressionBenchmark_series_PCG: validates PCG on heterogeneous
series-layer geometry against harmonic-mean analytical bound
These exercise the new PCG code path in TortuosityHypre::solve().
2. Diffusion.cpp Integration Tests (0% -> estimated ~60%)
- tDiffusion_dryrun: exercises dry-run mode (file I/O, volume
fraction, percolation checks in all 3 directions, no solver)
- tDiffusion_tiff: full homogenization pipeline on TIFF data
- tDiffusion_hdf5: exercises HDF5 reader branch in Diffusion.cpp
These cover the main application entry point (642 uncovered lines).
3. Synthetic EffectiveDiffusivityHypre Benchmark
- tSyntheticEffectiveDiffusivity: creates uniform 16^3 domain,
solves cell problem in 3 directions, validates chi_k ≈ 0
(analytically correct for uniform medium with periodic BCs),
checks active mask completeness and solver convergence.
Also corrects issue deprecation note: Tortuosity_filcc.F90 is NOT
dead code — tortuosity_remspot is still called by TortuosityHypre.
https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Break the 1080-line Diffusion.cpp god-function into focused,
reusable components:
1. ImageLoader (src/io/ImageLoader.H/cpp)
- Unified image loading: detects format from extension
(.tif/.tiff, .h5/.hdf5, .dat), instantiates the correct reader,
thresholds, and returns an ImageData struct with geometry,
BoxArray, DistributionMapping, and phase field.
- Eliminates the duplicated read-threshold-copy boilerplate that
was repeated 3 times in Diffusion.cpp (once per file format).
2. DeffTensor (src/props/DeffTensor.H/cpp)
- Extracts the D_eff tensor assembly from corrector fields chi_k.
- Pure function: calculateDeffTensor(tensor, chi_x, chi_y, chi_z,
mask, geom, verbose).
- Was previously an anonymous-namespace function duplicated in both
Diffusion.cpp and tEffectiveDiffusivity.cpp.
3. REVStudy (src/props/REVStudy.H/cpp)
- Encapsulates the entire REV convergence study workflow:
sub-volume extraction, cell-problem solves, tensor assembly,
and CSV output.
- Configuration via REVConfig struct.
- Was ~250 lines of deeply nested code in Diffusion.cpp.
4. Diffusion.cpp rewritten as thin orchestrator (1080 → 657 lines)
- main() now: parse config → loadImage() → dispatch to
runDryRunChecks / runREVStudy / runHomogenization / runFlowThrough.
- Each calculation mode is a focused static function.
- No physics logic changes — identical output for all modes.
https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…51) Extract stringToSolverType() and stringToDirection() helpers duplicated across 5 files into a single shared header (SolverConfig.H). Removes ~144 lines of duplicated code. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Replace all four Fortran kernel files with pure C++ implementations: - TortuosityHypreFill.F90 → TortuosityHypreFill.H (tortuosityFillMatrix) - EffDiffFillMtx.F90 → EffDiffFillMtx.H (effDiffFillMatrix) - Tortuosity_filcc.F90 → TortuosityKernels.H (removeIsolatedCells) - Tortuosity_poisson_3d.F90 → TortuosityKernels.H (computeFaceFluxes, forwardEulerUpdate, integrateBoundaryFluxes) The new kernels use amrex::ParallelFor with C++ lambdas and Array4 accessors for the flux/update/fio routines, preparing the codebase for GPU offloading. The HYPRE matrix-fill routines remain sequential (they fill flat buffers for HYPRE) but are now native C++. Legacy Fortran-only routines (tortuosity_filct, tortuosity_filbc, tortuosity_filic) that were never called from C++ have been dropped. Build system updated: Fortran removed from CMake LANGUAGES, MPI and OpenMP Fortran components removed, GNUmakefile and CI workflows updated. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
- TortuosityHypreFill.H: add nval consistency check (AMREX_ALWAYS_ASSERT), remove unused len_x/len_y variables that would trigger -Wunused-variable - EffDiffFillMtx.H: include TortuosityHypreFill.H for shared constants (STN_C, CELL_ACTIVE, SMALL_REAL etc.) that were undefined — compilation would have failed. Add AMREX_SPACEDIM==3 guards on Z-face stencil entries matching the original Fortran. Add post-loop m_idx==npts assertion. - TortuosityHypre.cpp: remove unused phase_iab/p_ptr/pbox variables left over from the old Fortran call signature — would fail with -Werror. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Move the identical SolverType enum from TortuosityHypre and EffectiveDiffusivityHypre into Tortuosity.H as OpenImpala::SolverType. Both solver classes now use a `using SolverType = OpenImpala::SolverType` alias for backward compatibility. The duplicate tortuositySolverType() and effDiffSolverType() functions in SolverConfig.H are consolidated into a single parseSolverType(), with the old names retained as deprecated thin wrappers. Python bindings bind the enum once and alias EffDiffSolverType to SolverType. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…15) Deduplicate ~165 lines of HYPRE lifecycle code (grid, stencil, matrix/vector creation, solver dispatch, cleanup) shared between TortuosityHypre and EffectiveDiffusivityHypre into a common HypreStructSolver base class. - HypreStructSolver owns HYPRE handles and provides setupGrid(), setupStencil(), createMatrixAndVectors(), assembleSystem(), and runSolver() methods - runSolver() supports FlexGMRES and PCG with configurable SMG/PFMG preconditioner via PrecondType enum, giving EffectiveDiffusivityHypre PCG support it lacked - TortuosityHypre inherits from both Tortuosity and HypreStructSolver - EffectiveDiffusivityHypre inherits from HypreStructSolver - Safe MPI_Finalized guard in base destructor for Python binding shutdown - CMakeLists.txt updated to include HypreStructSolver.cpp https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…Cs (#16) Introduces a BoundaryCondition strategy pattern that decouples BC logic from the tortuosity fill kernel. Users can now configure inlet/outlet and lateral BCs at runtime via input file parameters (bc.inlet_outlet, bc.sides, bc.value_lo, bc.value_hi). New files: - BoundaryCondition.H: BCType enum, parseBCType(), abstract base class, and four concrete implementations (DirichletExternal, DirichletPhase- Boundary, Neumann, Periodic) with factory function createBC(). Key changes: - TortuosityHypreFill.H: Interior stencil separated from BC application; new BC-aware overload takes BoundaryCondition pointers. - TortuosityHypre: Parses bc.* ParmParse params, creates BC objects, passes them through setupMatrixEquation. Adds failed plotfile output when solver does not converge. - Python bindings: BCType enum exposed; solver BC type getters added. - Tests: Two new CTest cases (explicit DirichletExternal, Periodic sides). https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Implements six new capabilities for extracting DFN battery model parameters directly from 3D voxel images: Phase 1 - Foundation: - SpecificSurfaceArea: counts interface faces between phases (SSA = faces/volume) - MacroGeometry: extracts electrode thickness, cross-section from AMReX Geometry - Multi-phase volume fractions in ResultsJSON output Phase 2 - Profiling: - ThroughThicknessProfile: per-slice VF profile along any direction Phase 3 - Particle Size Distribution: - ConnectedComponents: parallel flood-fill CCL reusing PercolationCheck pattern - ParticleSizeDistribution: equivalent sphere radii from component volumes Integration: - ResultsJSON extended with microstructure block and BPX DFN parameters - Diffusion.cpp wired with microstructure.* ParmParse namespace - Synthetic test (tSyntheticMicrostructure) with 7 analytical validation cases - Catch2 unit tests for MacroGeometry and ResultsJSON microstructure setters https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…bias The raw voxel face-counting SSA overestimates true surface area by ~50% for isotropic surfaces due to the staircase/laddering effect on curved interfaces. This commit adds: - value_corrected() method applying the 2/3 Cauchy-Crofton factor - boundary_padding constructor parameter to exclude outermost N voxel layers, eliminating artificial interfaces from truncated features - Both raw and corrected SSA emitted in ResultsJSON - Diffusion.cpp reads microstructure.ssa_boundary_padding from inputs - Tests: corrected vs raw ratio, boundary padding reduction, diagonal plane demonstrating laddering bias https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
Add full GPU support infrastructure for accelerating the computationally intensive linear solve phase via CUDA or HIP backends: Build system: - CMakeLists.txt: GPU_BACKEND option (NONE/CUDA/HIP), auto-enables CUDA/HIP language and defines OPENIMPALA_USE_GPU/OPENIMPALA_USE_CUDA - FindHYPRE.cmake: discovers GPU library variants (libHYPRE_cuda/hip) - Container def: conditional GPU flags for AMReX and HYPRE builds HYPRE device memory: - HypreStructSolver.cpp: HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE) and HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE) when GPU enabled GPU matrix fill kernels: - TortuosityHypreFill.H: tortuosityFillMatrixGpu() using ParallelFor with AMREX_GPU_DEVICE lambdas and Array4<> accessors - EffDiffFillMtx.H: effDiffFillMatrixGpu() same pattern GPU data path in solvers: - TortuosityHypre.cpp: #ifdef GPU path with DeviceVector buffers, GPU kernel dispatch, device-to-host copy for BC application - EffectiveDiffusivityHypre.cpp: same pattern CPU path is preserved unchanged behind #else for non-GPU builds. https://claude.ai/code/session_01MdecuXryJ5Rza4Yns2P6Tu
…als-BFzd5' into claude/review-branch-changes-8fGVe
…MP, GPU, BCs - HypreStructSolver: restore GMRES, BiCGSTAB, SMG, PFMG, Jacobi solver support that was dropped during base class extraction (regression) - REVStudy: use rank-independent RNG seed so all MPI ranks agree on sub-volume coordinates (was producing corrupt data in multi-rank runs) - DeffTensor: replace non-portable OMP reduction on 2D C array with thread-local accumulator + critical section (GCC < 9 compatibility) - HypreStructSolver: remove HYPRE_MEMORY_DEVICE for matrix/vectors since SetBoxValues is called with host pointers after device-to-host copy - BoundaryCondition: fix DirichletPhaseBoundaryBC quarter-domain heuristic (was pinning 25% of interior cells); narrow to 1-cell boundary layer. Also handle ambiguous dual-flag case for very small domains - BoundaryCondition: remove redundant per-cell initial guess computation in both Dirichlet BC classes (interior kernel already sets this) - TortuosityHypre: replace iMultiFab::sum() with manual valid-cell-only loop to avoid potential ghost cell inclusion in volume fraction - ImageLoader: add RawReader include and informative error for .raw files (requires dimensions/type, so directs users to RawReader directly) https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
Initial header defining the TortuosityMLMG class interface. This solver uses AMReX's native MLABecLaplacian operator and MLMG geometric multigrid instead of HYPRE, eliminating matrix assembly overhead for small grids (e.g., Google Colab tutorials). Implementation pending architectural decision on shared base class extraction. https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
Extracts solver-independent logic (phase management, diffusion coefficient construction, activity mask generation, flux integration) into a common base class. Subclasses only need to implement solve(). https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
…hase 2) Adds TortuosityMLMG.cpp implementing the full solver pipeline using AMReX's native MLABecLaplacian + MLMG geometric multigrid instead of HYPRE. No explicit matrix assembly - uses matrix-free operator application for lower memory footprint and faster setup on small grids. - Constructor: phase preconditioning, diffusion coefficient field, activity mask via parallel flood fill (same algorithm as HYPRE solver) - solve(): configures MLABecLaplacian with Dirichlet/Neumann BCs, harmonic-mean face B-coefficients, and runs MLMG V-cycle solver - globalFluxes() + computePlaneFluxes(): identical flux integration and conservation checking as TortuosityHypre - value(): tortuosity calculation with full flux conservation validation Also adds: - CMake build integration (TortuosityMLMG.cpp in openimpala_props) - pybind11 bindings (TortuosityMLMG class with same API as HYPRE) - Python facade: solver="mlmg" routes to the new solver backend https://claude.ai/code/session_01DYzuFcpfACfpTE4b21DyzW
- Replace LoopOnCpu with ParallelFor+AMREX_GPU_DEVICE for GPU acceleration (diffusion coefficients, initial guess, B-coefficients, mask intersection) - Use ReduceOps for GPU-safe reductions (active cell count, boundary fluxes) - Fix convergence: wrap MLMG solve in try/catch instead of arbitrary *100 check - Fix iteration count: use mlmg.getNumIters() instead of hardcoded maxiter - Fix plotfile: convert iMultiFab->MultiFab via ParallelFor (not amrex::Copy) - Clean up B-coefficient loop: remove redundant assignments and confused comments - Add Gpu::streamSynchronize before CPU-side plane flux accumulation https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu
The identical flood fill algorithm was duplicated across TortuosityMLMG, TortuosityHypre, PercolationCheck, and ConnectedComponents (~160 lines each). Extract into FloodFill.H/cpp with: - ParallelFor + AMREX_GPU_DEVICE for GPU acceleration - Gpu::DeviceScalar atomic flag for early termination - Parameterized label value (FLOOD_ACTIVE=1 default, custom for CCL) - Shared collectBoundarySeeds with MPI Allgatherv + dedup Net: -652 lines duplicated code, +210 lines shared utility. https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu
Convert LoopOnCpu patterns to GPU-compatible equivalents throughout the codebase: - DeffTensor: 9-component ReduceOps for D_eff tensor gradient accumulation - TortuosityHypre: ReduceOps for boundary flux integration, ParallelFor with atomic scatter-add for plane flux profiles, ReduceOps for active cell counting, device lookup table for diff coeff map fill, ParallelFor for binary traversable mask - TortuosityMLMG: ParallelFor with atomic scatter-add for plane flux profiles - EffectiveDiffusivityHypre: ReduceOps for ManualSumActiveCells, device lookup table for diff coeff map fill, ParallelFor for plotfile mask copy - Diffusion/REVStudy: ParallelFor for active mask generation - VolumeFraction: ReduceOps for phase cell counting - SpecificSurfaceArea: ReduceOps for interface face counting - ConnectedComponents: ParallelFor with atomic scatter-add for component volume counting - ThroughThicknessProfile: ParallelFor with atomic scatter-add for per-slice phase counting Remaining LoopOnCpu instances are justified: HYPRE host buffer copy-back, sequential seed search, boundary seed collection, and I/O from host memory. https://claude.ai/code/session_01D7FoBBTjaqNrtStGfbpAZu
Key fixes: - HYPRE_Init/Finalize ordering (must be inside amrex::Initialize/Finalize) - Remove OMP parallel around HYPRE Set/GetBoxValues (data race) - Replace MPI_COMM_WORLD with amrex::ParallelDescriptor::Communicator() - Fix DeffTensor sign convention (1+dchi, not 1-dchi) - Fix REVStudy ghost cell fill and MPI-incorrect nested MFIter copy - Fix EffDiffFillMtx RHS at pore-solid interfaces (mask-guarded reads) - Fix dangling const& members in solver/utility classes (store by value) - Add GPU stream synchronization before host-side loops - Add integer overflow assertions for HYPRE int casts - Complete TortuosityMLMG refactor to inherit from TortuositySolverBase - Extract shared HypreCheck.H macro, add TortuositySolverBase.cpp - Add input validation, error handling, and HYPRE_ClearAllErrors https://claude.ai/code/session_01QKvcKHR7GgCG2UNapN1GP7
Critical fixes:
- DeffTensor.cpp: Fix sign error in tensor assembly (minus -> plus) to match
the documented formula D_eff[i][j] = (1/V) * sum{ delta_ij + d(chi_j)/d(x_i) }
- EffDiffFillMtx.H: Remove spurious diagonal inflation at internal Neumann BCs
(inactive neighbor faces should not contribute to diagonal), fix flux_bc_rhs
sign to match correct Neumann condition D*(grad(chi)+e_k).n=0, and use
one-sided differences for RHS at active/inactive boundaries instead of
central differences that read D=0 from inactive cells
- TortuosityMLMG.cpp: Fix bdryLo() returning box outside domain (inlet flux
was always 0, producing NaN/inf tortuosity). Use domain.smallEnd() slice instead
High-severity fixes:
- TortuosityMLMG.cpp: Fix convergence threshold from 100*eps to eps
- TortuosityMLMG.cpp: Use mlmg.getNumIters() instead of hardcoded maxiter
- tSyntheticEffectiveDiffusivity.cpp: Add missing solver->solve() call before
checking getSolverConverged() (m_converged defaults to false)
Lower-severity fixes:
- TortuosityHypre.cpp: Restore m_write_plotfile guard on failed-solve plotfile
- SpecificSurfaceArea: Clamp face-counting box to valid region to prevent
out-of-bounds ghost cell access at internal FAB boundaries
https://claude.ai/code/session_01NEQqrvJNCrPn4mpyjcMsHv
Merges 9 bug fixes from the CeIgl review branch, with conflict resolution against the PMNDO 28-bug-fix commit already merged. Key unique fixes incorporated from CeIgl: - EffDiffFillMtx.H: Remove spurious diagonal inflation at internal Neumann boundaries (was inflating matrix diagonal incorrectly) - EffDiffFillMtx.H: Fix flux_bc_rhs signs for Neumann BC correction - EffDiffFillMtx.H: Use one-sided differences at pore-solid interfaces instead of mirroring (more accurate for non-uniform D fields) - tSyntheticEffectiveDiffusivity.cpp: Add missing solve() call before checking convergence - TortuositySolverBase.cpp: Fix bdryLo() returning face-centered box outside cell domain (was causing zero inlet flux / NaN tortuosity) https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Consolidate feature branches: GPU acceleration, MLMG solver, microstructural parameterization, and bug fixes
Fix clang-format violations across 23 source files
AMReX was built with AMReX_FORTRAN=ON, so the project must enable Fortran before find_package(AMReX). Added Fortran to the LANGUAGES list in the project() declaration. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Add Fortran to project languages to fix AMReX CMake error
AMReX 25.03 DeviceScalar lacks setVal(). Use ifdef guards to handle GPU and CPU paths separately — plain int on CPU, reconstructed DeviceScalar on GPU. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix FloodFill build error: DeviceScalar has no setVal member
Box::grow() mutates in place and is non-const. Copy the domain box before calling grow() to avoid discarding const qualifiers. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix const-correctness error in SpecificSurfaceArea
AMReX does not provide ReduceLongLongSum. Use ReduceLongSum with long types instead. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix ReduceLongLongSum not found in AMReX ParallelDescriptor
ResultsJSON.H uses AMREX_SPACEDIM and AMREX_D_DECL but did not include AMReX_SPACE.H, causing build failures when included without other AMReX headers pulling it in transitively. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Add missing AMReX_SPACE.H include in ResultsJSON.H
The unit test compiles without AMReX (OPENIMPALA_UNIT_TEST defined). Guard the AMReX_SPACE.H include and provide fallback definitions for AMREX_SPACEDIM and AMREX_D_DECL, matching the pattern in PhysicsConfig.H. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Guard AMReX includes in ResultsJSON.H for unit test mode
The unit test compiles without AMReX (OPENIMPALA_UNIT_TEST defined). Guard the AMReX_SPACE.H include and provide fallback definitions for AMREX_SPACEDIM and AMREX_D_DECL, matching the pattern in PhysicsConfig.H. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Two test failures fixed: 1. tEffectiveDiffusivity: The periodic cell problem div(D grad(chi)) = -div(D e_k) produces a singular linear system (constant null space). Without pinning, HYPRE FlexGMRES diverges with NaN residuals. Fix: pin the first active cell to chi=0 after matrix fill, removing the null space without affecting D_eff. Uses MPI_MINLOC to coordinate the pinned cell across ranks. 2. tSyntheticMicrostructure: SpecificSurfaceArea requires iMultiFab with >= 1 ghost cell for neighbor access across MFIter box boundaries. Tests 1, 2, 2b, 2c, 2d were allocating with 0 ghost cells. Fix: allocate with 1 ghost cell, initialize with setVal(0), and call FillBoundary() after filling valid data. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix EffDiff solver NaN divergence and SSA ghost cell assertion
SSA: Remove overly aggressive vbx.bigEnd(d)-1 clamping that prevented face counting at multi-box boundaries. The interface at the box boundary was missed because the check box was shrunk below the valid cell range. Ghost cells are already guaranteed filled (nGrow>=1 asserted). EffDiff: Switch preconditioner from PFMG to SMG for the periodic cell problem. PFMG diverges (NaN at iteration 1) on the periodic structured grid with a pinned null-space cell. SMG handles periodic grids more robustly, matching the approach used by TortuosityHypre. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix SSA face counting at box boundaries and EffDiff PFMG divergence
Cast mpi_size to size_t before multiplication to satisfy bugprone-implicit-widening-of-multiplication-result clang-tidy check. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
…om Fortran The C++ migration of effDiffFillMatrix (commit 2995552) introduced two bugs compared to the original Fortran kernel: 1. Missing diagonal contribution for inactive neighbors: the Fortran adds D_c/dx^2 to the center stencil entry when a face neighbor is inactive (internal Neumann ghost cell). The C++ kernel omitted this, producing near-zero or zero diagonals that made the matrix singular/ill-conditioned, causing HYPRE SMG/PFMG to produce NaN on iteration 1. 2. Reversed RHS flux BC signs: the Fortran uses +D_c/dx for -face and -D_c/dx for +face. The C++ had these swapped. 3. Unnecessary one-sided RHS differences: the Fortran always uses central differences for -div(D e_k) regardless of neighbor activity (ghost cells provide the needed data). The C++ added conditional one-sided stencils that were incorrect. Also reverts the SMG workaround back to PFMG (the proper preconditioner for periodic structured grids), since the root cause was the broken matrix. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Fix EffDiff matrix fill: restore Neumann BC diagonal and RHS signs from Fortran
Add pypi-wheels-gpu.yml that builds GPU-accelerated wheels using NVIDIA's CUDA 12.6 manylinux container (sameli/manylinux_2_28_x86_64_cuda_12.6). Key changes: - HYPRE built with --with-cuda for GPU-accelerated structured solvers - AMReX built with -DAMReX_GPU_BACKEND=CUDA targeting compute capabilities 6.0 through 9.0 (Pascal, Volta, Turing, Ampere, Ada Lovelace, Hopper) - OpenImpala compiled with -DGPU_BACKEND=CUDA via CMAKE_ARGS - CUDA runtime libs excluded from auditwheel (user provides CUDA toolkit) - Dependency cache keyed separately from CPU builds The existing CPU-only workflow is renamed to pypi-wheels-cpu.yml. Both workflows trigger on release publication and manual dispatch, allowing parallel CPU and GPU wheel builds. No changes to pyproject.toml or CMakeLists needed — scikit-build-core picks up GPU_BACKEND from the CMAKE_ARGS environment variable set in CI, and CUDA libraries propagate transitively through AMReX::amrex and HYPRE::HYPRE link targets. https://claude.ai/code/session_012awTC848VbvuhVZzFhNCDC
Jupyter notebook (notebooks/profiling_and_tuning.ipynb) for profiling OpenImpala solver performance on Google Colab's free tier (T4 GPU). Includes sections for: - Synthetic dataset generation via PoreSpy - Solver profiling sweep (PCG, FlexGMRES, GMRES, BiCGSTAB, SMG, PFMG, MLMG) - AMReX max_grid_size tuning sweep - Dataset size scaling with power-law fit - Direction anisotropy check - Multi-porosity comparison against Bruggeman relation - Summary with HPC scaling recommendations and template inputs file https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
- Section 3: Dual-axis bar chart showing wall time (bars) + iteration count (diamond markers) on a second x-axis - Section 4: Solver x grid size heatmap with annotated cell values and green-bordered "BEST" marker on the optimal configuration - Section 5: Filled area under the scaling curve with smooth power-law interpolation and arrow-style annotations - Section 6: Radar/spider chart for directional anisotropy with filled polygon, isotropic reference circle, and LaTeX axis labels https://claude.ai/code/session_01WR9HkUD95rp3XzZU95j2y7
…ing-notebook-JUBs9 Add CUDA GPU wheel build workflow and profiling tutorial
Code Coverage ReportGenerated by CI — coverage data from gcovr |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This is a comprehensive modernization of OpenImpala's solver infrastructure, spanning architecture, performance, and extensibility improvements across 80 files.
Architecture & Refactoring
Diffusion.cppinto reusable modules (Refactor Deconstruct the Monolithic Diffusion Driver #51)HypreStructSolverbase class for shared HYPRE infrastructureTortuositySolverBaseshared base class for tortuosity solversSolverConfig.HSolverTypeenums into a shared definition (Refactor Consolidate Duplicated SolverType Enums #50)New Solvers
TortuosityHypreTortuosityMLMG: matrix-free AMReX MLMG solver (Epic: Intelligent Solver Routing and Matrix-Free AMReX MLMG Integration #171)Fortran → C++ Migration
FloodFillutility, deduplicating 4 copiesGPU Acceleration
New Features
Bug Fixes & Stability
DeviceScalarbuild errorConnectedComponents.cppTest Coverage
Diffusion.cpp, andEffDiffusivityHypre(Epic: Improve Code Coverage (Current: 27.31%) #109)Test plan