
WIP: PERF: OOC-optimized algorithm variants for 30+ filters #1575

Draft
joeykleingers wants to merge 9 commits into BlueQuartzSoftware:develop from joeykleingers:ooc-filter-optimizations

joeykleingers commented Apr 2, 2026

Summary

Adds out-of-core (OOC) optimized algorithm variants for 30+ filters, using DispatchAlgorithm to select between in-core (Direct/BFS) and OOC (Scanline/CCL) code paths at runtime based on data store type. A preparatory rename commit lets git track the renamed files, so GitHub shows meaningful diffs against the original algorithm code.

This PR contains only the filter optimization layer. The core OOC infrastructure (copyIntoBuffer/copyFromBuffer API, HDF5ChunkedStore, OocDataIOManager, etc.) is in a separate ooc-architecture-rewrite branch that this PR stacks on top of.

Branch Structure

develop
  └── ooc-architecture-rewrite (core OOC architecture, store management, file import, recovery)
       └── ooc-filter-optimizations (this PR — rename + 30+ filter optimizations)

Commit 0 — Rename for Git Tracking

Renames 13 algorithm files to their in-core variant names before any logic changes, so that when dispatch variants are introduced, GitHub shows proper diffs against the original code instead of "new file" with no context.

| Original | Renamed To |
| --- | --- |
| FillBadData | FillBadDataBFS |
| IdentifySample | IdentifySampleBFS |
| ComputeBoundaryCells | ComputeBoundaryCellsDirect |
| ComputeFeatureNeighbors | ComputeFeatureNeighborsDirect |
| ComputeSurfaceAreaToVolume | ComputeSurfaceAreaToVolumeDirect |
| ComputeSurfaceFeatures | ComputeSurfaceFeaturesDirect |
| SurfaceNets | SurfaceNetsDirect |
| QuickSurfaceMesh | QuickSurfaceMeshDirect |
| DBSCAN | DBSCANDirect |
| ComputeKMedoids | ComputeKMedoidsDirect |
| MultiThresholdObjects | MultiThresholdObjectsDirect |
| BadDataNeighborOrientationCheck | BadDataNeighborOrientationCheckWorklist |
| ComputeGBCDPoleFigure | ComputeGBCDPoleFigureDirect |

Bug Fixes

OOC import of legacy SIMPL files with multi-dimensional component arrays

Legacy SIMPL .dream3d files store multi-dimensional component arrays (e.g., GBCD with componentShape [10,10,10,20,20,2]) with HDF5 physical dimensions in reversed order relative to the ComponentDimensions attribute.

Two fixes address this at different layers:

  • AbstractOocStore::readHdf5 (SimplnxOoc): Detects shape mismatch between logical and physical dimensions before the streaming import path. Falls back to flat bulk read (H5S_ALL) when shapes differ, preserving correct byte order.
  • ImportH5ObjectPathsAction::backfillReadOnlyOocStores (simplnx): The read-only reference store optimization creates stores pointing directly at the source file. For mismatched arrays, the N-D hyperslabs would be out-of-bounds. Detects the mismatch and creates a writable OOC store populated via readHdf5 (which triggers the flat-read fallback) instead of a read-only reference.

Filter Optimizations

Group B — Face-Neighbor Filters (5 filters)

Split into Direct (in-core) and Scanline (OOC) algorithm classes using DispatchAlgorithm. Scanline variants use Z-slice rolling windows (prev/cur/next) for cross-slice neighbor access with zero per-element OOC overhead.

Filters: ComputeBoundaryCells, ComputeSurfaceFeatures, ComputeFeatureNeighbors, ComputeSurfaceAreaToVolume, BadDataNeighborOrientationCheck
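
The prev/cur/next rolling window described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SliceReader` and `countBoundaryFaces` are hypothetical names, and the reader callback stands in for whatever bulk slice read the real Scanline classes perform. For brevity the sketch returns the whole result vector; an OOC variant would presumably flush each output slice instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one bulk slice read (for example, a single copyIntoBuffer call).
using SliceReader = std::function<void(std::size_t z, std::vector<std::int32_t>& out)>;

// For every voxel, count the face neighbors whose feature id differs.
// Only three input Z-slices (prev/cur/next) are resident at any time.
std::vector<std::int8_t> countBoundaryFaces(std::size_t nx, std::size_t ny, std::size_t nz, const SliceReader& readSlice)
{
  const std::size_t sliceSize = nx * ny;
  std::vector<std::int8_t> result(sliceSize * nz, 0);
  if(result.empty())
  {
    return result;
  }
  std::vector<std::int32_t> prev(sliceSize), cur(sliceSize), next(sliceSize);
  readSlice(0, cur);
  for(std::size_t z = 0; z < nz; z++)
  {
    const bool hasPrev = z > 0;
    const bool hasNext = z + 1 < nz;
    if(hasNext)
    {
      readSlice(z + 1, next); // one bulk read per slice, no per-voxel store access
    }
    assert(cur.size() == sliceSize);
    for(std::size_t y = 0; y < ny; y++)
    {
      for(std::size_t x = 0; x < nx; x++)
      {
        const std::size_t idx = y * nx + x;
        const std::int32_t id = cur[idx];
        int count = 0;
        count += (x > 0 && cur[idx - 1] != id) ? 1 : 0;
        count += (x + 1 < nx && cur[idx + 1] != id) ? 1 : 0;
        count += (y > 0 && cur[idx - nx] != id) ? 1 : 0;
        count += (y + 1 < ny && cur[idx + nx] != id) ? 1 : 0;
        count += (hasPrev && prev[idx] != id) ? 1 : 0;
        count += (hasNext && next[idx] != id) ? 1 : 0;
        result[z * sliceSize + idx] = static_cast<std::int8_t>(count);
      }
    }
    // Rotate the window: cur becomes prev, next becomes cur.
    std::swap(prev, cur);
    std::swap(cur, next);
  }
  return result;
}
```

Swapping the vectors rather than copying them keeps the rotation O(1), so the per-slice cost is dominated by the single bulk read.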

Group C — Morphological / Neighbor Replacement (5 filters)

Z-slice rolling buffers serve all 6 face-neighbor reads from RAM. SliceBufferedTransfer provides type-dispatched bulk tuple copy.

Filters: ErodeDilateBadData, ErodeDilateCoordinationNumber, ErodeDilateMask, ReplaceElementAttributesWithNeighborValues, NeighborOrientationCorrelation

Group D — CCL Segmentation (5 filters)

Chunk-sequential Connected Component Labeling using UnionFind equivalence tracking, replacing BFS/DFS flood fill for OOC data.

Filters: ScalarSegmentFeatures, EBSDSegmentFeatures, CAxisSegmentFeatures, FillBadData, IdentifySample
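
As a rough illustration of the idea, the sketch below labels face-connected voxels with equal values in a single sequential scan, recording equivalences in a union-find structure and resolving them in a second pass. It is not the PR's implementation: `UnionFind` and `labelComponents` are illustrative names, the sketch holds the whole volume in memory, and the real filters use their own per-filter similarity tests rather than plain equality.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Minimal union-find with path halving; the smaller index always becomes the root.
class UnionFind
{
public:
  explicit UnionFind(std::size_t count)
  : m_Parent(count)
  {
    std::iota(m_Parent.begin(), m_Parent.end(), std::size_t{0});
  }

  std::size_t find(std::size_t x)
  {
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]];
      x = m_Parent[x];
    }
    return x;
  }

  void unite(std::size_t a, std::size_t b)
  {
    a = find(a);
    b = find(b);
    if(a == b)
    {
      return;
    }
    if(a < b)
    {
      m_Parent[b] = a;
    }
    else
    {
      m_Parent[a] = b;
    }
  }

private:
  std::vector<std::size_t> m_Parent;
};

// Pass 1 visits voxels in storage order and only looks at already-visited
// neighbors (x-1, y-1, z-1). Pass 2 maps each root to a compact id starting at 1.
std::vector<std::int32_t> labelComponents(const std::vector<std::int32_t>& values, std::size_t nx, std::size_t ny, std::size_t nz)
{
  assert(values.size() == nx * ny * nz);
  UnionFind equivalences(values.size());
  for(std::size_t z = 0; z < nz; z++)
  {
    for(std::size_t y = 0; y < ny; y++)
    {
      for(std::size_t x = 0; x < nx; x++)
      {
        const std::size_t idx = (z * ny + y) * nx + x;
        if(x > 0 && values[idx - 1] == values[idx])
        {
          equivalences.unite(idx, idx - 1);
        }
        if(y > 0 && values[idx - nx] == values[idx])
        {
          equivalences.unite(idx, idx - nx);
        }
        if(z > 0 && values[idx - nx * ny] == values[idx])
        {
          equivalences.unite(idx, idx - nx * ny);
        }
      }
    }
  }
  std::vector<std::int32_t> rootToId(values.size(), 0);
  std::vector<std::int32_t> featureIds(values.size(), 0);
  std::int32_t nextId = 1;
  for(std::size_t idx = 0; idx < values.size(); idx++)
  {
    const std::size_t root = equivalences.find(idx);
    if(rootToId[root] == 0)
    {
      rootToId[root] = nextId++;
    }
    featureIds[idx] = rootToId[root];
  }
  return featureIds;
}
```

Because both passes touch voxels strictly in storage order, the access pattern is sequential, which is the property that makes this approach friendlier to chunked stores than a flood fill that jumps around the volume.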

Group E — AlignSections Family (4 filters)

Bulk slice read/write via AlignSectionsTransferDataOocImpl. Per-filter OOC findShifts with 2-slice buffers and bulk mask reads.

Filters: AlignSectionsMisorientation, AlignSectionsMutualInformation, AlignSectionsFeatureCentroid, AlignSectionsListFilter

QuickSurfaceMesh

DispatchAlgorithm<QuickSurfaceMeshDirect, QuickSurfaceMeshScanline>. Scanline eliminates the O(volume) nodeIds array (7.5 GB for 1000³) with rolling 2-plane node buffers (16 MB). Two-pass architecture: counting pass + mesh creation pass. All output arrays (triangle connectivity, faceLabels, vertex coordinates, nodeTypes) buffered per z-slice and flushed with copyFromBuffer. Batch quickSurfaceTransferBatch API added to TupleTransfer for bulk source-read/dest-write of cell and feature data.

SurfaceNets

DispatchAlgorithm<SurfaceNetsDirect, SurfaceNetsScanline>. Scanline is a complete reimplementation (881 lines) eliminating the O(n) Cell[] array — uses O(surface) hash map + vertex vectors with slice-by-slice FeatureIds reading. All output arrays (vertices, nodeTypes, triangle connectivity, faceLabels) buffered and flushed with copyFromBuffer. Batch surfaceNetsTransferBatch API added to TupleTransfer for bulk I/O.

Mesh Infrastructure (RepairTriangleWinding + GeometryHelpers)

  • RepairTriangleWinding: Bulk-reads triangle face list and faceLabels into local buffers; all BFS work operates on local memory; modified triangles written back via copyFromBuffer.
  • FindElementsContainingVert / FindElementNeighbors (GeometryHelpers.hpp): Chunked bulk I/O with 65K-element chunks for sequential passes. Random neighbor lookups check if candidate is in the current chunk (cache hit) before falling back to per-element copyIntoBuffer. Together with RepairTriangleWinding buffering, this reduced SurfaceNets Winding from 515s to 2.9s.
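
The "check the current chunk first, fall back to a single-element read" pattern can be sketched as below. `ChunkWindow` and its `BulkRead` callback are hypothetical names for illustration only; the callback stands in for a copyIntoBuffer-style bulk read, and the miss counter exists purely to make the behavior observable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

template <typename T>
class ChunkWindow
{
public:
  // Hypothetical bulk-read callback: copy `count` elements starting at `start` into `dest`.
  using BulkRead = std::function<void(std::size_t start, T* dest, std::size_t count)>;

  ChunkWindow(std::size_t totalSize, std::size_t chunkSize, BulkRead read)
  : m_TotalSize(totalSize)
  , m_Buffer(chunkSize)
  , m_Read(std::move(read))
  {
    assert(chunkSize > 0);
  }

  // Load the chunk that starts at `start` with one bulk read.
  void load(std::size_t start)
  {
    assert(start < m_TotalSize);
    m_Start = start;
    m_Count = std::min(m_Buffer.size(), m_TotalSize - start);
    m_Read(m_Start, m_Buffer.data(), m_Count);
  }

  // Serve from the resident chunk when possible, otherwise read one element.
  T get(std::size_t index)
  {
    if(index >= m_Start && index - m_Start < m_Count)
    {
      return m_Buffer[index - m_Start];
    }
    T value{};
    m_Read(index, &value, 1);
    m_Misses++;
    return value;
  }

  std::size_t misses() const
  {
    return m_Misses;
  }

private:
  std::size_t m_TotalSize;
  std::vector<T> m_Buffer;
  BulkRead m_Read;
  std::size_t m_Start = 0;
  std::size_t m_Count = 0;
  std::size_t m_Misses = 0;
};
```

When neighbor indices are mostly local, as they tend to be for mesh connectivity written in generation order, most lookups hit the resident chunk and only the stragglers pay for an individual read.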

Clustering Filters (3 filters)

  • DBSCAN: DispatchAlgorithm<DBSCANDirect, DBSCANScanline> — chunked grid construction, on-demand per-grid-cell coordinate reads in canMerge. 653s → 12s (54x)
  • ComputeKMedoids: DispatchAlgorithm<Direct, Scanline> — chunked findClusters, per-cluster optimizeClusters with O(max_cluster_size) peak memory. 74s → 13s (5.7x)
  • ComputeFeatureClustering: Single implementation with feature-level array caching. 203s → 77s (2.6x)

Pipeline Prerequisite Filters (2 filters)

  • MultiThresholdObjects: DispatchAlgorithm<Direct, Scanline> — eliminates O(n) tempResultVector in OOC path
  • ConvertOrientations: Single implementation with chunked bulk I/O in macro-generated Convertor classes (4096-tuple chunks)

Together these reduced the AlignSectionsMisorientation pipeline test from 635s to 5.9s (107x).

OrientationAnalysis Misc (10 filters)

  • ComputeTwinBoundaries: Bulk-read all face/feature/ensemble arrays into local vectors. 179s → 44s (4x)
  • ComputeKernelAvgMisorientations: Slab-based bulk I/O with cached CrystalStructures
  • ComputeAvgCAxes: Already OOC-optimized (chunked reads, cached feature output). Compute-bound.
  • ReadH5Ebsd: copyFromBuffer in CopyData template, phase copy, Euler interleaving. 463s → 241s (1.9x)
  • ComputeGBCDPoleFigure: DispatchAlgorithm<Direct, Scanline> — Direct caches full GBCD, Scanline caches only the phase-of-interest slice (bounded by bin resolution, not cell count). 853s → 0.9s (948x)
  • ComputeFeatureReferenceCAxisMisorientations: Z-slice buffered I/O for all cell-level arrays (featureIds, cellPhases, quats, output). Cached ensemble/feature-level arrays (crystalStructures, avgCAxes). 196s → 5.4s (36x)
  • ComputeFeatureNeighborCAxisMisalignments: Bulk-read all feature-level arrays (featurePhases, featureAvgQuat, crystalStructures) and buffered avgCAxisMisalignment output.
  • MergeTwins: Chunked bulk I/O for voxel-level parent ID fill and assignment loop. Feature-level featureParentIds cached locally for lookup. 67s → 1.8s (37x)
  • ReadCtfData: Bulk copyFromBuffer for all cell arrays (phases, euler angles, bands, error, MAD, BC, BS, X, Y). Euler angle interleave uses chunked 64K buffer. Crystal structures cached locally for hex correction. 231s → 0.25s
  • ReadAngData: Same bulk copyFromBuffer pattern. Phase validation done in-place on EbsdLib buffer before single bulk write. Euler interleave chunked.

Pipeline-Critical Filters (6 filters)

Optimizations targeting the filters responsible for OOC pipeline timeouts (4 of 5 timed-out pipelines blocked by ComputeIPFColors):

  • ComputeIPFColors: DispatchAlgorithm<ComputeIPFColorsDirect, ComputeIPFColorsScanline>. Direct keeps parallel ParallelDataAlgorithm for in-core; Scanline uses chunked sequential bulk I/O (65K-tuple chunks) with locally cached crystal structures. ForceOocAlgorithmGuard added to test. 1,937ms → 90ms (21.5x)
  • ComputeFeatureSizes: Chunked copyIntoBuffer for featureIds (ImageGeom path) and featureIds + elemSizes (RectGridGeom path with Kahan summation preserved). 813ms → 28ms (29x)
  • ComputeAvgOrientations: Chunked featureIds/phases/quats reads, locally cached crystal structures and avgQuats (feature-level). Bulk copyFromBuffer for output arrays.
  • ComputeFeatureReferenceMisorientations: Chunked reads and writes for all cell-level arrays (featureIds, phases, quats, GB distances, output misorientations). Locally cached crystal structures, avgQuats, and center quaternions (all feature/ensemble-level). 106ms → 1ms (106x)
  • ComputeFeatureCentroids: Replaced AbstractDataStore intermediate arrays (sum, center, count, rangeX/Y/Z) with plain std::vector — eliminates ~119M virtual dispatch calls per run. Chunked featureIds reads. Inline coordinate computation from spacing/origin. 39,724ms → 25ms (1,589x)
  • RequireMinimumSizeFeatures: Three-part optimization:
    • removeSmallFeatures: Chunked featureIds read/write (65K-tuple batches)
    • assignBadVoxels: 3-slice rolling slab buffer for neighbor voting scan (O(slice) memory), sparse changed-voxel tracking to skip full-volume transfer when few/no voxels changed. 14,592ms → 142ms (103x)
    • RemoveInactiveObjects (shared utility in DataGroupUtilities.cpp): Chunked featureIds renumbering with copyIntoBuffer/copyFromBuffer. 5,573ms → 50ms (111x)
    • Combined: 20,184ms → 210ms (96x)
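
To illustrate the chunked accumulation pattern used by several of these filters, here is a minimal sketch of summing per-feature element sizes with Kahan (compensated) summation, reading the cell-level arrays in fixed-size chunks and keeping the feature-level accumulators in plain `std::vector`. The function name and signature are assumptions made for this sketch; `std::copy_n` from a vector stands in for a copyIntoBuffer call on a data store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum elemSizes per feature id, reading both cell-level arrays chunk by chunk.
std::vector<double> sumSizesPerFeature(const std::vector<std::int32_t>& featureIds, const std::vector<float>& elemSizes, std::size_t numFeatures, std::size_t chunkSize)
{
  assert(featureIds.size() == elemSizes.size());
  assert(chunkSize > 0);
  // Feature-level accumulators are small, so they live in local memory.
  std::vector<double> sum(numFeatures, 0.0);
  std::vector<double> compensation(numFeatures, 0.0);
  std::vector<std::int32_t> idBuffer(chunkSize);
  std::vector<float> sizeBuffer(chunkSize);
  const std::size_t numTuples = featureIds.size();
  for(std::size_t start = 0; start < numTuples; start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, numTuples - start);
    // Stand-ins for two bulk reads from the (possibly out-of-core) stores.
    std::copy_n(featureIds.data() + start, count, idBuffer.data());
    std::copy_n(elemSizes.data() + start, count, sizeBuffer.data());
    for(std::size_t i = 0; i < count; i++)
    {
      const std::size_t feature = static_cast<std::size_t>(idBuffer[i]);
      assert(feature < numFeatures);
      // Kahan summation: carry the rounding error forward per feature.
      const double y = static_cast<double>(sizeBuffer[i]) - compensation[feature];
      const double t = sum[feature] + y;
      compensation[feature] = (t - sum[feature]) - y;
      sum[feature] = t;
    }
  }
  return sum;
}
```

The inner loop only touches local buffers, so the number of store accesses drops from one per element to two per chunk.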

Additional Filters

  • ComputeEuclideanDistMap: Bulk-read featureIds and distance stores into local vectors; flood-fill operates on local memory; bulk-write output. 116s → 1.1s (105x)
  • AppendImageGeometry: Bulk I/O for mirror operations (scanline-based reversal instead of per-tuple swaps). 469s → 113s (4.2x)

GBCD Filter Group (5 filters)

All five GBCD filters optimized for OOC with zero cell-level O(n) allocations, cancel checking, and progress messaging:

  • ComputeGBCDPoleFigure: DispatchAlgorithm<Direct, Scanline> with ForceOocAlgorithmGuard in test. Scanline caches only the phase-of-interest GBCD slice via copyIntoBuffer.
  • WriteGBCDGMTFile: Phase-of-interest GBCD slice cached via copyIntoBuffer; crystal structures cached locally.
  • WriteGBCDTriangleData: Chunked triangle I/O (8K chunks), feature-level euler cache, buffered file output via fmt::format_to + fmt::memory_buffer.
  • ComputeGBCD: Feature-level caching (eulers, phases, crystalStructures), chunked triangle array reads per 50K-triangle iteration, GBCD output accumulated in local buffer (bounded by phases × bins) then written back via copyFromBuffer.
  • ComputeGBCDMetricBased: Eliminated O(n) triIncluded allocation (replaced with per-chunk sequential area accumulation). Feature-level caching (phases, eulers, crystalStructures, featureFaceLabels). Chunked triangle I/O in totalFaceArea scan. Raw pointer access in parallel TrianglesSelector worker.

HDF5 Import + Pole Figure Filters (3 filters)

  • FillOocDataStore (shared infrastructure): Streaming chunked HDF5 hyperslab reads + copyFromBuffer, with zero O(n) temp allocations — batched reads even for partial hyperslabs. Benefits all HDF5 import paths.
  • ReadH5EspritData: copyFromBuffer bulk writes from raw HDF5 reader buffers, replacing 9+ per-element operator[] writes per point.
  • WritePoleFigure: Chunked iteration over eulerAngles/phases/mask per-phase using bounded buffers (no O(n) pre-caching); bulk-write intensity and image outputs via copyFromBuffer.
  • ReadHDF5Dataset: Cancel checking + per-dataset progress messages.
  • Test comparison loops in WritePoleFigureTest and ReadHDF5DatasetTest optimized with copyIntoBuffer.

Core Utilities + Geometry Filters

  • ImportFromBinaryFile: copyFromBuffer instead of per-element writes. ReadRawBinary Case1: 1076s → 29s (37x)
  • CropImageGeometry: Row-based bulk I/O. 27s → 2.6s (10x)
  • RandomizeFeatureIds (ClusteringUtilities): Chunked bulk I/O for both overloads — benefits all callers (segmentation filters, SharedFeatureFace, MergeTwins).
  • AppendData/CopyData/mirror swaps: Runtime OOC check — chunked bulk I/O for OOC, original code for in-core (verified zero in-core regression)
  • TupleTransfer: Added quickSurfaceTransferBatch and surfaceNetsTransferBatch batch APIs with bulk copyIntoBuffer/copyFromBuffer for source reads and destination writes. Used by QuickSurfaceMeshScanline and SurfaceNetsScanline.

Cancel + Progress Messaging

All in-core and OOC algorithm variants now have:

  • m_ShouldCancel checks at the top of major outer loops
  • ThrottledMessenger-based progress reporting with descriptive phase messages and percentage completion
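
A minimal sketch of this loop shape follows. `ThrottledProgress` is a hypothetical stand-in written for this example and is not the ThrottledMessenger class from simplnx; the cancel flag is modeled as a `std::atomic_bool`, which is an assumption about how m_ShouldCancel is typed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical throttled reporter: forwards a message only if enough time
// has passed since the last one that was forwarded.
class ThrottledProgress
{
public:
  using Clock = std::chrono::steady_clock;
  using Sink = std::function<void(const std::string&)>;

  ThrottledProgress(Sink sink, Clock::duration minInterval)
  : m_Sink(std::move(sink))
  , m_MinInterval(minInterval)
  {
    assert(m_Sink);
  }

  void report(const std::string& message)
  {
    const Clock::time_point now = Clock::now();
    if(!m_HasSent || now - m_LastSent >= m_MinInterval)
    {
      m_Sink(message);
      m_LastSent = now;
      m_HasSent = true;
    }
  }

private:
  Sink m_Sink;
  Clock::duration m_MinInterval;
  Clock::time_point m_LastSent{};
  bool m_HasSent = false;
};

// Returns the number of slices processed before completion or cancellation.
std::size_t processSlices(std::size_t numSlices, const std::atomic_bool& shouldCancel, ThrottledProgress& progress)
{
  std::size_t processed = 0;
  for(std::size_t z = 0; z < numSlices; z++)
  {
    // Cancel check at the top of the major outer loop.
    if(shouldCancel.load())
    {
      break;
    }
    // ... per-slice work would go here ...
    processed++;
    progress.report("Processing slices: " + std::to_string(100 * processed / numSlices) + "% complete");
  }
  return processed;
}
```

Checking the flag once per outer iteration keeps the cost negligible while still bounding cancel latency to the time of one slice.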

OOC Performance Results

All benchmarks on arm64 Release build with forceOocData = true.

Mesh Generation Filters (full ctest wall-clock, OOC build)

| Test | Before (s) | After (s) | Speedup |
| --- | --- | --- | --- |
| QuickSurfaceMesh: Base | 11.30 | 0.19 | 59x |
| QuickSurfaceMesh: Winding | 22.70 | 0.22 | 103x |
| QuickSurfaceMesh: Problem Voxels | 11.18 | 0.19 | 59x |
| QuickSurfaceMesh: Winding+PV | 21.96 | 0.22 | 100x |
| SurfaceNets: Default | 176 | 2.40 | 73x |
| SurfaceNets: Smoothing | 224 | 2.62 | 85x |
| SurfaceNets: Winding | 515 | 2.86 | 180x |
| SurfaceNets: Winding Smoothing | 416 | 3.22 | 129x |

Groups B–E (200³ dataset, filter.execute() only)

| Filter | Before (s) | After (s) | Speedup |
| --- | --- | --- | --- |
| ComputeBoundaryCells | 6.69 | 0.25 | 27x |
| ComputeSurfaceFeatures | 4.01 | 0.28 | 14x |
| ComputeFeatureNeighbors | 8.93 | 0.81 | 11x |
| ComputeSurfaceAreaToVolume | 8.59 | 0.24 | 36x |
| BadDataNeighborOrientationCheck | 97.1 | 5.25 | 18x |
| ErodeDilateBadData | 25.09 | 3.80 | 7x |
| ErodeDilateCoordinationNumber | 12.43 | 2.30 | 5x |
| ErodeDilateMask | 6.43 | 0.40 | 16x |
| ReplaceElementAttributesWithNeighborValues | 6.05 | 4.00 | 1.5x |
| NeighborOrientationCorrelation | 67.94 | 5.70 | 12x |
| ScalarSegmentFeatures | 708.3 | 1.77 | 400x |
| EBSDSegmentFeatures | 972.6 | 2.10 | 463x |
| CAxisSegmentFeatures | 824.1 | 1.39 | 593x |
| FillBadData | 8.6 | 2.26 | 4x |
| IdentifySample | 825.0 | 0.27 | 3056x |
| AlignSectionsMisorientation | 32.89 | 0.80 | 41x |
| AlignSectionsMutualInformation | 15.61 | 0.81 | 19x |
| AlignSectionsFeatureCentroid | 8.41 | 0.39 | 22x |
| AlignSectionsListFilter | 7.50 | 0.39 | 19x |

Pipeline-Critical Filters (filter.execute() only, OOC build)

| Filter | Before | After | Speedup |
| --- | --- | --- | --- |
| ComputeFeatureCentroids | 39.7s | 25ms | 1,589x |
| RequireMinimumSizeFeatures | 20.2s | 210ms | 96x |
| ComputeIPFColors | 1.94s | 90ms | 21.5x |
| ComputeFeatureSizes | 813ms | 28ms | 29x |
| ComputeFeatureReferenceMisorientations (AvgOri) | 106ms | 1ms | 106x |
| ComputeFeatureReferenceMisorientations (EuclDist) | 136ms | 1ms | 136x |

OrientationAnalysis Filters (full ctest wall-clock, OOC build)

| Filter | Before (s) | After (s) | Speedup |
| --- | --- | --- | --- |
| ComputeFeatureReferenceCAxisMisorientations | 196 | 5.4 | 36x |
| ComputeEuclideanDistMap | 116 | 1.1 | 105x |

GBCD Filter Group (full ctest wall-clock)

| Filter | Before (s) | After (s) | Speedup |
| --- | --- | --- | --- |
| ComputeGBCDPoleFigure | 833 (fail) | 2.4 | 350x |
| ComputeGBCD | 1500 (timeout) | ~10 | 150x |
| WriteGBCDGMTFile | 162 (fail) | 6.0 | 27x |
| ComputeGBCDMetricBased | 38.1 | 28.9 | 1.3x |
| WriteGBCDTriangleData | 23.5 | 19.2 | 1.2x |

HDF5 Import + Pole Figure Filters (full ctest wall-clock)

| Filter | Before (s) | After (s) | Speedup |
| --- | --- | --- | --- |
| WritePoleFigure (3 tests) | 4500 (timeout) | 11.7 | 385x |
| ReadH5EspritData (3 tests) | 2060 (timeout) | 6.8 | 303x |
| ReadHDF5Dataset | 1500 (timeout) | 6.7 | 224x |

Additional Optimizations (full ctest wall-clock)

| Filter | Before (s) | After (s) | Speedup |
| --- | --- | --- | --- |
| ReadRawBinary (Case1) | 1076 | 29 | 37x |
| ComputeGBCDPoleFigure | 853 | 0.9 | 948x |
| DBSCAN 3D | 653 | 12 | 54x |
| AlignSectionsMisorientation Pipeline | 635 | 5.9 | 107x |
| ReadH5Ebsd | 463 | 2.1 | 220x |
| ReadCtfData | 231 | 0.25 | 924x |
| AppendImageGeometry | 469 | 113 | 4.2x |
| ComputeFeatureClustering | 203 | 77 | 2.6x |
| ComputeTwinBoundaries | 179 | 44 | 4x |
| MergeTwins | 67 | 1.8 | 37x |
| ComputeKMedoids | 74 | 13 | 5.7x |
| CropImageGeometry (X) | 27 | 2.6 | 10x |
| WriteAvizoRectilinear | 22.8 | 2.3 | 10x |
| WriteAvizoUniform | 22.3 | 2.0 | 11x |

Test Infrastructure

Rotation Filter Bulk I/O

  • RotateSampleRefFrame: Slab-based bulk I/O in RotateImageGeometryWithNearestNeighbor — reads source Z-slabs via copyIntoBuffer, processes output slices into local buffers, writes via copyFromBuffer. No O(n) allocation.
  • RotateEulerRefFrame: Chunked copyIntoBuffer/copyFromBuffer (65K tuples per chunk). 19.5s → 4.8s (4x)
  • Together these reduced ReadH5Ebsd from 241s to 2.1s (117x).

Comparison Function Bulk I/O

CompareFloatArraysWithNans, CompareArrays, and CompareDataArraysByComponent in UnitTestCommon.hpp were doing per-element operator[] access, causing extreme slowdowns when comparing OOC-backed arrays. Replaced with chunked copyIntoBuffer reads (40K elements per chunk), matching the existing CompareDataArrays pattern. This alone reduced the ComputeGBCD test from 1500s (timeout) to ~10s — the filter itself runs in ~3s.
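
A chunked comparison in this spirit might look like the sketch below. It is an illustration under stated assumptions, not the code in UnitTestCommon.hpp: the function and callback names are made up for the example, the callbacks stand in for copyIntoBuffer reads, and treating two NaNs at the same index as equal is an assumption inferred from the helper's name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical bulk-read callback standing in for copyIntoBuffer on a float store.
using BulkReadFloat = std::function<void(std::size_t start, float* dest, std::size_t count)>;

// Helper for the sketch: expose a vector through the bulk-read callback.
// The vector must outlive the returned reader.
BulkReadFloat makeReader(const std::vector<float>& data)
{
  return [&data](std::size_t start, float* dest, std::size_t count) { std::copy_n(data.data() + start, count, dest); };
}

// Compare two arrays chunk by chunk. NaN matches NaN (assumption); all other
// values must agree within epsilon.
bool compareFloatsWithNans(const BulkReadFloat& readA, const BulkReadFloat& readB, std::size_t numElements, float epsilon, std::size_t chunkSize = 40000)
{
  assert(chunkSize > 0);
  std::vector<float> bufferA(chunkSize);
  std::vector<float> bufferB(chunkSize);
  for(std::size_t start = 0; start < numElements; start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, numElements - start);
    readA(start, bufferA.data(), count);
    readB(start, bufferB.data(), count);
    for(std::size_t i = 0; i < count; i++)
    {
      const bool nanA = std::isnan(bufferA[i]);
      const bool nanB = std::isnan(bufferB[i]);
      if(nanA || nanB)
      {
        if(nanA != nanB)
        {
          return false;
        }
        continue;
      }
      if(std::fabs(bufferA[i] - bufferB[i]) > epsilon)
      {
        return false;
      }
    }
  }
  return true;
}
```

Each chunk costs two bulk reads regardless of how many elements it holds, which is why the comparison time stops scaling with per-element store overhead.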

  • ForceOocAlgorithmGuard coverage in all optimized filter tests for both algorithm paths
  • SIMPLNX_TEST_ALGORITHM_PATH CMake option (0=Both, 1=OOC-only, 2=InCore-only) for build-specific test path control
  • Programmatic test data builders with Z-slice batched bulk writes for OOC efficiency

Test Plan

  • Tests pass on in-core build (SIMPLNX_TEST_ALGORITHM_PATH=2)
  • Tests pass on out-of-core build (SIMPLNX_TEST_ALGORITHM_PATH=1)
  • Tests pass with both algorithm paths (SIMPLNX_TEST_ALGORITHM_PATH=0)
  • All optimized filters produce correct results on both algorithm paths
  • In-core performance verified: no regression on utility changes

joeykleingers marked this pull request as draft April 2, 2026 00:57
joeykleingers changed the title from "PERF: OOC-optimized algorithm variants for 30+ filters" to "WIP: PERF: OOC-optimized algorithm variants for 30+ filters" Apr 2, 2026
joeykleingers force-pushed the ooc-filter-optimizations branch 7 times, most recently from 838a49f to f145122 April 8, 2026 17:43
joeykleingers force-pushed the ooc-filter-optimizations branch 2 times, most recently from 27a54dd to bf19ea2 April 10, 2026 02:54

Replace the chunk-based DataStore API with a plugin-driven hook
architecture that cleanly separates OOC policy (in the SimplnxOoc
plugin) from mechanism (in the core library). The old API required
every caller to understand chunk geometry; the new design hides OOC
details behind bulk I/O primitives and plugin-registered callbacks.

--- AbstractDataStore / IDataStore API ---

Remove the entire chunk API from AbstractDataStore and IDataStore:
loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds,
getChunkShape, getChunkSize, getChunkTupleShape, getChunkExtents, and
convertChunkToDataStore. Replace with two bulk I/O primitives:
copyIntoBuffer(startIndex, span<T>) and copyFromBuffer(startIndex,
span<const T>), implemented in DataStore (std::copy on raw memory) and
EmptyDataStore (throws). This shifts the abstraction from "load a
chunk, then index into it" to "copy a contiguous range into a caller-
owned buffer," which works identically for in-core and OOC stores.
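
To make the shape of the two primitives concrete, here is a simplified sketch.
The commit describes span-based signatures; this sketch uses pointer plus
count instead so that it compiles without C++20, and the class names
(BulkStore, InMemoryStore, EmptyStore) and the scaleInPlace helper are
hypothetical, not the simplnx classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Simplified, hypothetical interface with only the two bulk primitives.
template <typename T>
class BulkStore
{
public:
  virtual ~BulkStore() = default;
  virtual std::size_t size() const = 0;
  virtual void copyIntoBuffer(std::size_t startIndex, T* dest, std::size_t count) const = 0;
  virtual void copyFromBuffer(std::size_t startIndex, const T* src, std::size_t count) = 0;
};

// In-core store: both primitives are plain copies on contiguous memory.
template <typename T>
class InMemoryStore final : public BulkStore<T>
{
public:
  explicit InMemoryStore(std::vector<T> values)
  : m_Values(std::move(values))
  {
  }
  std::size_t size() const override
  {
    return m_Values.size();
  }
  void copyIntoBuffer(std::size_t startIndex, T* dest, std::size_t count) const override
  {
    checkRange(startIndex, count);
    std::copy_n(m_Values.data() + startIndex, count, dest);
  }
  void copyFromBuffer(std::size_t startIndex, const T* src, std::size_t count) override
  {
    checkRange(startIndex, count);
    std::copy_n(src, count, m_Values.data() + startIndex);
  }

private:
  void checkRange(std::size_t startIndex, std::size_t count) const
  {
    if(startIndex > m_Values.size() || count > m_Values.size() - startIndex)
    {
      throw std::out_of_range("requested range exceeds store size");
    }
  }
  std::vector<T> m_Values;
};

// Placeholder store: knows its size but holds no data, so access throws.
template <typename T>
class EmptyStore final : public BulkStore<T>
{
public:
  explicit EmptyStore(std::size_t size)
  : m_Size(size)
  {
  }
  std::size_t size() const override
  {
    return m_Size;
  }
  void copyIntoBuffer(std::size_t, T*, std::size_t) const override
  {
    throw std::runtime_error("EmptyStore holds no data");
  }
  void copyFromBuffer(std::size_t, const T*, std::size_t) override
  {
    throw std::runtime_error("EmptyStore holds no data");
  }

private:
  std::size_t m_Size;
};

// A store-agnostic algorithm: read a chunk, modify it locally, write it back.
template <typename T>
void scaleInPlace(BulkStore<T>& store, T factor, std::size_t chunkSize)
{
  assert(chunkSize > 0);
  std::vector<T> buffer(chunkSize);
  for(std::size_t start = 0; start < store.size(); start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, store.size() - start);
    store.copyIntoBuffer(start, buffer.data(), count);
    for(std::size_t i = 0; i < count; i++)
    {
      buffer[i] *= factor;
    }
    store.copyFromBuffer(start, buffer.data(), count);
  }
}
```

The caller owns the buffer and picks the chunk size, so the algorithm never
needs to know how (or whether) the store is chunked on disk.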

Simplify StoreType to three values (InMemory, OutOfCore, Empty) by
removing EmptyOutOfCore. IsOutOfCore() now checks StoreType instead
of testing getChunkShape().has_value(). Add getRecoveryMetadata()
virtual to IDataStore for crash-recovery attribute persistence.

--- Plugin Hook System (DataIOCollection / IDataIOManager) ---

Add three plugin-registered callback hooks to DataIOCollection:

  FormatResolverFnc: Decides storage format for a given array based on
    type, shape, and size. Called from DataStoreUtilities::CreateDataStore
    and CreateListStore. Replaces the removed checkStoreDataFormat() and
    TryForceLargeDataFormatFromPrefs — format decisions now live entirely
    in the plugin, with core only calling resolveFormat() when no format
    is already set.

  BackfillHandlerFnc: Post-import callback that lets the plugin finalize
    placeholder stores after all HDF5 objects are read. Called from
    ImportH5ObjectPathsAction after importing all paths. Replaces the
    removed backfillReadOnlyOocStores core implementation.

  WriteArrayOverrideFnc: Intercepts HDF5 writes during recovery file
    creation, allowing the plugin to write lightweight placeholder
    datasets instead of full array data. Activated via RAII
    WriteArrayOverrideGuard, wired into DataStructureWriter.
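
The policy/mechanism split behind the format resolver hook can be sketched
as follows. HookRegistry is a hypothetical stand-in for the relevant part of
DataIOCollection, the resolver signature is deliberately simplified to two
sizes, and "Example-OOC" is a made-up format name. The reserved-name check
mirrors the guard against "Simplnx-Default-In-Memory" mentioned in this
commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>

const char* const k_ReservedInMemoryFormat = "Simplnx-Default-In-Memory";

// Hypothetical registry: core owns the mechanism, a plugin supplies the policy.
class HookRegistry
{
public:
  // Simplified resolver signature (assumption for this sketch).
  using FormatResolverFnc = std::function<std::string(std::size_t numTuples, std::size_t bytesPerTuple)>;

  // Plugins register their format names; the reserved in-memory name is rejected.
  bool registerFormat(const std::string& name)
  {
    if(name.empty() || name == k_ReservedInMemoryFormat)
    {
      return false;
    }
    return m_Formats.insert(name).second;
  }

  void setFormatResolver(FormatResolverFnc resolver)
  {
    m_Resolver = std::move(resolver);
  }

  // Core calls this when no format is already set. With no plugin installed
  // the result is empty, meaning "use the default in-memory store".
  std::string resolveFormat(std::size_t numTuples, std::size_t bytesPerTuple) const
  {
    if(!m_Resolver)
    {
      return {};
    }
    std::string format = m_Resolver(numTuples, bytesPerTuple);
    // Sketch-level sanity check: a resolver should only return registered formats.
    assert(format.empty() || m_Formats.count(format) == 1);
    return format;
  }

private:
  std::set<std::string> m_Formats;
  FormatResolverFnc m_Resolver;
};
```

With this shape, a build without the OOC plugin behaves exactly as before,
because an unset resolver always yields the empty (in-memory) format.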

Add factory registration on IDataIOManager for ListStoreRefCreateFnc,
StringStoreCreateFnc, and FinalizeStoresFnc, with delegating creation
methods on DataIOCollection. Guard against reserved format name
"Simplnx-Default-In-Memory" during IO manager registration.

--- EmptyStringStore Placeholder ---

Add EmptyStringStore, a placeholder class for OOC string array import
that stores only tuple shape metadata. All data access
methods throw std::runtime_error. isPlaceholder() returns true (vs
false for StringStore). StringArrayIO creates EmptyStringStore in OOC mode instead of
allocating numValues empty strings.

--- HDF5 I/O ---

DataStoreIO::ReadDataStore gains two interception paths before the
normal in-core load: (1) recovery file detection via OocBackingFilePath
HDF5 attributes, creating a read-only reference store pointing at the
backing file; (2) OOC format resolution via resolveFormat(), creating a
read-only reference store directly from the source .dream3d file with
no temp copy.

DataArrayIO::writeData always calls WriteDataStore
directly — OOC stores materialize their data through the plugin's
writeHdf5() method; recovery writes use WriteArrayOverrideFnc.

NeighborListIO gains OOC interception: computes total neighbor count,
calls resolveFormat(), and creates a read-only ref list store when an
OOC format is available. Legacy NeighborList reading passes a preflight
flag through the entire call chain (readLegacyNeighborList ->
createLegacyNeighborList -> ReadHdf5Data) so legacy .dream3d imports
create EmptyListStore placeholders instead of eagerly loading per-
element via setList().

DataStructureWriter checks WriteArrayOverrideFnc before normal writes,
giving the registered plugin callback first chance to handle each
data object.

Add explicit template instantiations for DatasetIO::createEmptyDataset
and DatasetIO::writeSpanHyperslab for all numeric types plus bool.
These are needed by the SimplnxOoc plugin's AbstractOocStore::writeHdf5(),
which cannot use writeSpan() because the full array is not in memory.
Instead it creates an empty dataset, then fills it region-by-region
via hyperslab writes as it streams data from the backing file.

--- Preferences ---

Add unified oocMemoryBudgetBytes preference (default 8 GB) that
the ChunkCache, visualization, and stride cache all use. Add k_InMemoryFormat
sentinel constant for explicit in-core format choice. Add migration
logic to erase legacy empty-string and "In-Memory" preference values.
checkUseOoc() now tests against k_InMemoryFormat.
setLargeDataFormat("") removes the key so plugin defaults take effect.

--- Algorithm Infrastructure ---

AlgorithmDispatch: Add ForceInCoreAlgorithm/ForceOocAlgorithm global
flags with RAII guards. Add DispatchAlgorithm template that selects
Direct (in-core) vs Scanline (OOC) algorithm variant based on store
types and force flags. Add SIMPLNX_TEST_ALGORITHM_PATH CMake option
(0=both, 1=OOC-only, 2=InCore-only) for dual-dispatch test control.
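
A minimal sketch of this dispatch follows. It is not the simplnx template:
the flag accessors, the ScopedFlag helper, the ForceInCoreAlgorithmGuard
name, and the rule that a force-OOC flag wins over store types are all
assumptions made for illustration, and the two Sum functors are toy
algorithms that only exist to show which path was taken.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

enum class StoreType { InMemory, OutOfCore, Empty };

// Global force flags, wrapped in functions to avoid static-init order issues.
inline bool& forceOocFlag()
{
  static bool s_Flag = false;
  return s_Flag;
}
inline bool& forceInCoreFlag()
{
  static bool s_Flag = false;
  return s_Flag;
}

// RAII helper: sets a flag for the current scope and restores it on exit.
class ScopedFlag
{
public:
  explicit ScopedFlag(bool& flag)
  : m_Flag(flag)
  , m_Previous(flag)
  {
    m_Flag = true;
  }
  ~ScopedFlag()
  {
    m_Flag = m_Previous;
  }
  ScopedFlag(const ScopedFlag&) = delete;
  ScopedFlag& operator=(const ScopedFlag&) = delete;

private:
  bool& m_Flag;
  bool m_Previous;
};

struct ForceOocAlgorithmGuard : ScopedFlag
{
  ForceOocAlgorithmGuard() : ScopedFlag(forceOocFlag()) {}
};
struct ForceInCoreAlgorithmGuard : ScopedFlag
{
  ForceInCoreAlgorithmGuard() : ScopedFlag(forceInCoreFlag()) {}
};

struct SumResult
{
  long long sum;
  std::string path;
};

// Toy "Direct" variant: one pass over in-core data.
struct SumDirect
{
  SumResult operator()(const std::vector<int>& values) const
  {
    return {std::accumulate(values.begin(), values.end(), 0LL), "Direct"};
  }
};

// Toy "Scanline" variant: same result, computed chunk by chunk.
struct SumScanline
{
  SumResult operator()(const std::vector<int>& values) const
  {
    const std::size_t chunkSize = 2;
    long long sum = 0;
    for(std::size_t start = 0; start < values.size(); start += chunkSize)
    {
      const std::size_t end = std::min(start + chunkSize, values.size());
      for(std::size_t i = start; i < end; i++)
      {
        sum += values[i];
      }
    }
    return {sum, "Scanline"};
  }
};

// Choose the OOC variant if forced, or if any store is out-of-core and
// in-core is not forced. Both variants must return the same type.
template <typename DirectT, typename ScanlineT, typename... ArgsT>
auto dispatchAlgorithm(const std::vector<StoreType>& storeTypes, ArgsT&&... args)
{
  assert(!(forceOocFlag() && forceInCoreFlag()));
  const bool anyOoc = std::any_of(storeTypes.begin(), storeTypes.end(), [](StoreType type) { return type == StoreType::OutOfCore; });
  const bool useOoc = forceOocFlag() || (anyOoc && !forceInCoreFlag());
  if(useOoc)
  {
    return ScanlineT{}(std::forward<ArgsT>(args)...);
  }
  return DirectT{}(std::forward<ArgsT>(args)...);
}
```

In a test, wrapping a filter run in a ForceOocAlgorithmGuard scope exercises
the Scanline path even when every array is in memory, which is how both
paths can be covered from a single in-core dataset.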

IParallelAlgorithm: Remove blanket TBB disabling for OOC data — OOC
stores are now thread-safe via ChunkCache + HDF5 global mutex.
CheckStoresInMemory/CheckArraysInMemory use StoreType instead of
getDataFormat().

VtkUtilities: Rewrite binary write path to read into 4096-element
buffers via copyIntoBuffer, byte-swap in the buffer, and fwrite —
replacing direct DataStore data() pointer access.
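
The buffered write path can be sketched as below. This is an illustration,
not the VtkUtilities code: the function names are made up, std::ostream is
used in place of fwrite so the sketch is easy to test, and std::copy_n from
a vector stands in for copyIntoBuffer. Legacy VTK binary data is big-endian,
so the sketch swaps bytes only on little-endian hosts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Detect host byte order at runtime.
inline bool hostIsLittleEndian()
{
  const std::uint16_t probe = 1;
  unsigned char firstByte = 0;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Reverse the bytes of one value without violating aliasing rules.
template <typename T>
T byteSwap(T value)
{
  static_assert(std::is_trivially_copyable<T>::value, "byteSwap requires a trivially copyable type");
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &value, sizeof(T));
  std::reverse(bytes, bytes + sizeof(T));
  std::memcpy(&value, bytes, sizeof(T));
  return value;
}

// Stream the source to `out` as big-endian, one bounded buffer at a time.
template <typename T>
void writeBigEndian(std::ostream& out, const std::vector<T>& source, std::size_t chunkSize = 4096)
{
  assert(chunkSize > 0);
  const bool swapNeeded = hostIsLittleEndian();
  std::vector<T> buffer(chunkSize);
  for(std::size_t start = 0; start < source.size(); start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, source.size() - start);
    // Stand-in for a copyIntoBuffer call on the data store.
    std::copy_n(source.data() + start, count, buffer.data());
    if(swapNeeded)
    {
      for(std::size_t i = 0; i < count; i++)
      {
        buffer[i] = byteSwap(buffer[i]);
      }
    }
    out.write(reinterpret_cast<const char*>(buffer.data()), static_cast<std::streamsize>(count * sizeof(T)));
  }
}

// Convenience wrapper for the sketch: collect the output bytes in a string.
template <typename T>
std::string toBigEndianBytes(const std::vector<T>& source, std::size_t chunkSize = 4096)
{
  std::ostringstream stream;
  writeBigEndian(stream, source, chunkSize);
  return stream.str();
}
```

Swapping inside the local buffer means the store itself is never modified
and never has to expose a raw data() pointer.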

--- Filter Algorithm Updates ---

FillBadData: Rewrite phaseOneCCL and phaseThreeRelabeling to use
Z-slab buffered I/O via copyIntoBuffer/copyFromBuffer instead of
the removed chunk API (loadChunk, getChunkLowerBounds, etc.).
operator()() scans feature counts in 64K-element chunks via
copyIntoBuffer.

QuickSurfaceMesh: Remove getChunkShape() call in generateTripleLines()
that set ParallelData3DAlgorithm chunk size, as the chunk API no
longer exists on AbstractDataStore.

--- File Import ---

ImportH5ObjectPathsAction: Add deferred-load pattern. When a backfill
handler is registered, pass preflight=true to create placeholder stores
during import, then call runBackfillHandler() after all paths are
imported to let the plugin finalize.

Dream3dIO: Add WriteRecoveryFile() that wraps WriteFile with WriteArrayOverrideGuard.

--- Utility Changes ---

DataStoreUtilities: Remove TryForceLargeDataFormatFromPrefs entirely.
CreateDataStore and CreateListStore call resolveFormat() on the IO
collection. ArrayCreationUtilities: check k_InMemoryFormat sentinel
before skipping memory checks.

ITKArrayHelper/ITKTestBase: OOC checks use getStoreType() instead of
getDataFormat().empty(). IsArrayInMemory simplified from a 40-line
DataType switch to a single StoreType check.

ArraySelectionParameter: Remove EmptyOutOfCore handling; simplify to
just StoreType::Empty.

--- Tests ---

Add EmptyStringStore tests (6 cases: metadata, zero tuples, throwing
access, deep copy placeholder preservation, resize, isPlaceholder).
Add DataIOCollection hooks tests (format resolver, backfill handler).
Add IOFormat tests (7 cases: InMemory sentinel, empty format,
resolveFormat with/without plugin). Add IParallelAlgorithm OOC tests
(8 cases with MockOocDataStore: TBB enablement for in-memory, OOC,
and mixed arrays/stores).

Remove the "Target DataStructure Size" test from IOFormat.cpp — it
was a tautology that re-implemented the same arithmetic as
updateMemoryDefaults() without testing any edge case or behavior.

Fix RodriguesConvertorTest exemplar data: add missing expected values
for the 4th tuple (indices 12-15). The old CompareDataArrays broke
on the first floating-point mismatch regardless of magnitude, masking
this incomplete exemplar. The new chunked comparison correctly
continues past epsilon-close differences, exposing the missing data.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

Add comprehensive documentation to all new methods, type aliases,
classes, and algorithms introduced in the OOC architecture rewrite.
Every new public API now has Doxygen explaining what it does, how it
works, and why it is needed. Algorithm implementations have step-by-
step inline comments explaining the logic.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…ation layer

Move the format resolver call site from the low-level DataStoreUtilities::
CreateDataStore/CreateListStore functions up to the array creation layer
(ArrayCreationUtilities::CreateArray and ImportH5ObjectPathsAction). This
is a prerequisite for the upcoming data store import handler refactor.

Key architectural changes:

1. FormatResolverFnc signature expanded to (DataStructure, DataPath,
   DataType, dataSizeBytes). The resolver can now walk parent objects to
   determine geometry type, enabling it to force in-core for unstructured/
   poly geometry arrays without caller-side checks.

2. Format resolution removed from DataStoreUtilities::CreateDataStore and
   CreateListStore. These are now simple factories that take an already-
   resolved format string. Callers are responsible for calling the resolver.

3. CreateArrayAction no longer carries a dataFormat member or constructor
   parameter. The k_DefaultDataFormat constant is removed. Format is
   resolved at execute time inside ArrayCreationUtilities::CreateArray.

4. ImportH5ObjectPathsAction gains a format-resolver loop that iterates
   Empty-store DataArrays after preflight import, consulting the resolver
   to decide which arrays to eager-load (in-core) vs leave for the
   backfill handler (OOC).

5. DataStoreIO::ReadDataStore and NeighborListIO::finishImportingData lose
   their inline format-resolution and OOC reference-store creation code.
   Format decisions for imported data are now made at the action level,
   not during raw HDF5 I/O.

6. Geometry actions (CreateGeometry1D/2D/3DAction, CreateVertexGeometry,
   CreateRectGridGeometry) lose their createdDataFormat parameter. They
   now materialize OOC topology arrays into in-core stores when the source
   arrays have StoreType::OutOfCore, since unstructured/poly geometry
   topology must be in-core for the visualization layer.

7. CheckMemoryRequirement simplified to a pure RAM check. OOC fallback
   logic removed since the resolver handles format decisions upstream.

All filter callers updated to drop the dataFormat argument from
CreateArrayAction constructors. Python binding updated (data_format
parameter renamed to fill_value). Test files updated for new
resolveFormat signature.
…arden .dream3d import

Rename the "backfill handler" to "data store import handler" and expand
its role to handle ALL data store loading from .dream3d files — in-core
eager loading, OOC reference stores, and recovery reattachment. This
replaces the split decision-making where ImportH5ObjectPathsAction ran
a format-resolver loop and a separate backfill handler.

Key changes:

1. DataIOCollection: Rename BackfillHandlerFnc to
   DataStoreImportHandlerFnc with expanded signature that includes
   importStructure. Rename set/has/runBackfillHandler to
   set/has/runDataStoreImportHandler. Add format display name registry
   (registerFormatDisplayName/getFormatDisplayNames) for human-readable
   format names in the UI dropdown.

2. DataStoreIO: Rename ReadDataStore to ReadDataStoreIntoMemory. Remove
   recovery reattachment code (OOC-specific HDF5 attribute checks moved
   to SimplnxOoc plugin). Add placeholder detection — compares physical
   HDF5 element count against shape attributes, returns Result<> with
   warning when mismatch detected (guards against loading placeholder
   datasets without the OOC plugin). Change return type to
   Result<shared_ptr<AbstractDataStore<T>>> so callers can accumulate
   warnings across arrays.
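
   The placeholder check amounts to comparing two element counts. A minimal
   sketch, with hypothetical names (LoadCheck, checkForPlaceholder) and a
   plain struct standing in for the Result<> type:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Result<> that can carry a warning.
struct LoadCheck
{
  bool ok;
  std::string warning;
};

// Number of elements implied by the tuple and component shape attributes.
std::size_t expectedElementCount(const std::vector<std::size_t>& tupleShape, const std::vector<std::size_t>& componentShape)
{
  assert(!componentShape.empty());
  std::size_t count = 1;
  for(std::size_t dim : tupleShape)
  {
    count *= dim;
  }
  for(std::size_t dim : componentShape)
  {
    count *= dim;
  }
  return count;
}

// A dataset whose physical size disagrees with its shape attributes is
// treated as a placeholder and reported instead of being loaded.
LoadCheck checkForPlaceholder(const std::string& arrayName, std::size_t physicalElements, const std::vector<std::size_t>& tupleShape,
                              const std::vector<std::size_t>& componentShape)
{
  const std::size_t expected = expectedElementCount(tupleShape, componentShape);
  if(physicalElements == expected)
  {
    return {true, std::string()};
  }
  return {false, "Array '" + arrayName + "' has " + std::to_string(physicalElements) + " elements on disk but its shape attributes imply " +
                     std::to_string(expected) + "; it looks like a placeholder dataset"};
}
```

   Returning a warning rather than failing lets the caller keep loading the
   remaining arrays and report every skipped placeholder at once.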

3. ImportH5ObjectPathsAction: Remove the format-resolver loop (79 lines).
   The action now delegates entirely to the registered handler when
   present, or falls back to FinishImportingObject for non-OOC builds.

4. CreateArrayAction: Restore dataFormat parameter for per-filter format
   override. When non-empty, bypasses the format resolver. Dropdown shows
   "Automatic" (resolver decides), "In Memory", and plugin-registered
   formats with display names. Fix 12 filter callers where fillValue was
   being passed as dataFormat after parameter reordering.

5. Dream3dIO: Route DREAM3D::ReadFile through ImportH5ObjectPathsAction
   so recovery and OOC hooks fire. Remove unused ImportDataObjectFromFile
   and ImportSelectDataObjectsFromFile.

6. Application: Add getDataStoreFormatDisplayNames() to expose display
   name registry to DataStoreFormatParameter.

Updated callers: DataArrayIO (2 sites), NeighborListIO (2 sites),
Dream3dIO (2 legacy helpers), DataStructureWriter (comment), 12 filter
files, simplnxpy Python binding, DataIOCollectionHooksTest.

Replace the old Dream3dIO public API (ReadFile, ImportDataStructureFromFile,
FinishImportingObject) with four new purpose-specific functions:

  - LoadDataStructure(path) — full load with OOC handler support
  - LoadDataStructureArrays(path, dataPaths) — selective array load with pruning
  - LoadDataStructureMetadata(path) — metadata-only skeleton (preflight)
  - LoadDataStructureArraysMetadata(path, dataPaths) — pruned metadata skeleton

The new API eliminates the bool preflight parameter in favor of distinct
functions, decouples pipeline loading from DataStructure loading, and
centralizes the OOC handler integration in a single internal
LoadDataStructureWithHandler function.

Key changes:

DataIOCollection: Add EagerLoadFnc typedef and pass it through the
DataStoreImportHandlerFnc signature, replacing the importStructure parameter.
The handler can now eager-load individual arrays via callback without knowing
Dream3dIO internals.

ImportH5ObjectPathsAction: Rewrite to use the new API — preflight calls
LoadDataStructureMetadata, execute calls LoadDataStructure. The action no
longer manages HDF5 file handles or deferred loading directly; it merges
source objects into the pipeline DataStructure via shallow copy.

ReadDREAM3DFilter: Switch preflight from ImportDataStructureFromFile(reader,
true) to LoadDataStructureMetadata(path), removing manual HDF5 file open.

Dream3dIO internals: Move LoadDataObjectFromHDF5, EagerLoadDataFromHDF5,
PruneDataStructure, and LoadDataStructureWithHandler into an anonymous
namespace. LoadDataStructureWithHandler implements the shared logic: build
metadata skeleton, optionally delegate to the OOC import handler, fall back
to eager in-core loading.
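The shared control flow (skeleton first, optional handler, eager fallback) might look something like the sketch below. All types and names here (`ArrayEntry`, `Structure`, `EagerLoadFn`, `ImportHandlerFn`, `loadWithHandler`) are hypothetical stand-ins, and the convention that the handler returns true when it has taken over loading is an assumption, not something the commit states.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for a DataStructure and its arrays.
struct ArrayEntry
{
  std::string path;
  bool loaded = false;    // data resident in memory
  bool outOfCore = false; // data left on disk behind an OOC store
};

struct Structure
{
  std::vector<ArrayEntry> arrays;
};

// Callback the handler may use to eager-load one array without knowing loader internals.
using EagerLoadFn = std::function<void(ArrayEntry&)>;
// Assumed convention: returns true if the handler took responsibility for every array.
using ImportHandlerFn = std::function<bool(Structure&, const EagerLoadFn&)>;

inline Structure loadWithHandler(const std::vector<std::string>& paths, const ImportHandlerFn& handler)
{
  // Step 1: build the metadata-only skeleton.
  Structure structure;
  for(const auto& path : paths)
  {
    structure.arrays.push_back(ArrayEntry{path, false, false});
  }

  EagerLoadFn eagerLoad = [](ArrayEntry& entry) { entry.loaded = true; };

  // Step 2: let a registered handler decide per array, if one exists.
  if(handler && handler(structure, eagerLoad))
  {
    return structure;
  }

  // Step 3: no handler (or it declined), so load everything in-core.
  for(auto& entry : structure.arrays)
  {
    if(!entry.loaded)
    {
      eagerLoad(entry);
    }
  }
  return structure;
}
```

Passing the eager-load step as a callback is what keeps the handler independent of the loader: it only decides which arrays stay on disk.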

Test callers: Switch ComputeIPFColorsTest, RotateSampleRefFrameTest,
DREAM3DFileTest, and H5Test to UnitTest::LoadDataStructure. Add
Dream3dLoadingApiTest with coverage for all four new functions.

UnitTestCommon: Simplify LoadDataStructure/LoadDataStructureMetadata helpers
to delegate directly to the new DREAM3D:: functions.

Add the namespace fs = std::filesystem alias to .cpp files that spell
out std::filesystem, consistent with the existing convention used
throughout the codebase (e.g., AtomicFile.cpp, FileUtilities.cpp,
all ITK test files, UnitTestCommon.hpp).

Files updated: Dream3dIO.cpp, ImportH5ObjectPathsAction.cpp,
DataIOCollection.cpp, H5Test.cpp, UnitTestCommon.cpp,
DREAM3DFileTest.cpp, ComputeIPFColorsTest.cpp.

Previously IDataStore provided a default implementation that returned
an empty map, which silently disabled recovery metadata for any store
subclass that forgot to override it. Make it pure virtual so every
concrete store must explicitly state what (if any) recovery metadata
it produces.

DataStore overrides it to return an empty map (in-memory stores have
no backing file or external state, so the recovery file's HDF5 dataset
contains all the data needed to reconstruct the store).

EmptyDataStore overrides it to throw std::runtime_error, matching the
fail-fast behavior of every other data-access method on this metadata-
only placeholder class. Querying recovery metadata on a placeholder is
a programming error: the real store that replaces the placeholder
during execution is the one responsible for providing recovery info.

MockOocDataStore in IParallelAlgorithmTest.cpp gains a no-op override
returning an empty map so it remains constructible.
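A compact sketch of the three behaviors described above, under stated assumptions: the method name `recoveryMetadata`, the map type, and the class names are hypothetical, since the message does not name them here.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Assumed metadata representation; the real map type may differ.
using RecoveryMetadata = std::map<std::string, std::string>;

class StoreInterfaceSketch
{
public:
  virtual ~StoreInterfaceSketch() = default;
  // Pure virtual: a subclass that forgets to override no longer compiles,
  // instead of silently inheriting "no metadata".
  virtual RecoveryMetadata recoveryMetadata() const = 0;
};

// In-memory store: the recovery dataset already holds all the data.
class InMemoryStoreSketch : public StoreInterfaceSketch
{
public:
  RecoveryMetadata recoveryMetadata() const override
  {
    return {};
  }
};

// Metadata-only placeholder: any query is a programming error, so fail fast.
class PlaceholderStoreSketch : public StoreInterfaceSketch
{
public:
  RecoveryMetadata recoveryMetadata() const override
  {
    throw std::runtime_error("recovery metadata requested from a placeholder store");
  }
};
```

The empty map in the in-memory case is now a deliberate statement by the subclass rather than an inherited default.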

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>

Rename 13 algorithm files to their in-core variant names in preparation
for adding OOC (out-of-core) dispatch alternatives. This enables git
rename tracking so that subsequent optimization commits show proper
diffs against the original algorithm code.

Renames (SimplnxCore):
  FillBadData -> FillBadDataBFS
  IdentifySample -> IdentifySampleBFS
  ComputeBoundaryCells -> ComputeBoundaryCellsDirect
  ComputeFeatureNeighbors -> ComputeFeatureNeighborsDirect
  ComputeSurfaceAreaToVolume -> ComputeSurfaceAreaToVolumeDirect
  ComputeSurfaceFeatures -> ComputeSurfaceFeaturesDirect
  SurfaceNets -> SurfaceNetsDirect
  QuickSurfaceMesh -> QuickSurfaceMeshDirect
  DBSCAN -> DBSCANDirect
  ComputeKMedoids -> ComputeKMedoidsDirect
  MultiThresholdObjects -> MultiThresholdObjectsDirect

Renames (OrientationAnalysis):
  BadDataNeighborOrientationCheck -> BadDataNeighborOrientationCheckWorklist

No logic changes. InputValues structs and filter classes unchanged.
…ntationAnalysis

Replace per-element DataStore access with chunked bulk I/O
(copyIntoBuffer/copyFromBuffer) across 60+ algorithm files to eliminate
virtual dispatch overhead and HDF5 chunk thrashing when arrays are backed
by out-of-core storage.
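To illustrate the conversion, here is a minimal sketch of a chunked read-modify-write loop. `MockStore` is a hypothetical stand-in, and the `copyIntoBuffer`/`copyFromBuffer` signatures shown are assumptions for the example, not the actual simplnx API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical store that counts bulk calls so the access pattern is visible.
class MockStore
{
public:
  explicit MockStore(std::size_t numElements)
  : m_Data(numElements, 0.0f)
  {
  }

  std::size_t size() const { return m_Data.size(); }
  std::size_t bulkCalls() const { return m_BulkCalls; }

  // Assumed signature: copy `count` elements starting at `start` into `dst`.
  void copyIntoBuffer(std::size_t start, float* dst, std::size_t count) const
  {
    std::copy_n(m_Data.data() + start, count, dst);
    ++m_BulkCalls;
  }

  // Assumed signature: copy `count` elements from `src` into the store at `start`.
  void copyFromBuffer(std::size_t start, const float* src, std::size_t count)
  {
    std::copy_n(src, count, m_Data.data() + start);
    ++m_BulkCalls;
  }

private:
  std::vector<float> m_Data;
  mutable std::size_t m_BulkCalls = 0;
};

// Scale every element using one read and one write per chunk
// instead of one virtual call per element.
inline void scaleInPlace(MockStore& store, float factor, std::size_t chunkSize)
{
  assert(chunkSize > 0);
  std::vector<float> buffer(chunkSize);
  for(std::size_t start = 0; start < store.size(); start += chunkSize)
  {
    const std::size_t count = std::min(chunkSize, store.size() - start);
    store.copyIntoBuffer(start, buffer.data(), count);
    for(std::size_t i = 0; i < count; ++i)
    {
      buffer[i] *= factor;
    }
    store.copyFromBuffer(start, buffer.data(), count);
  }
}
```

With 10 elements and a chunk size of 4, the loop issues 6 bulk calls where per-element access would issue 20.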

--- Architecture ---

DispatchAlgorithm pattern (Direct/Scanline):
  11 algorithms gain a base dispatcher class that selects between an
  in-core Direct implementation and an OOC Scanline variant at runtime
  based on IsOutOfCore()/ForceOocAlgorithm():
    SimplnxCore: ComputeBoundaryCells, ComputeFeatureNeighbors,
      ComputeKMedoids, ComputeSurfaceAreaToVolume, ComputeSurfaceFeatures,
      DBSCAN, MultiThresholdObjects, QuickSurfaceMesh, SurfaceNets
    OrientationAnalysis: BadDataNeighborOrientationCheck, ComputeIPFColors
  ComputeGBCDPoleFigure dispatches directly from its filter executeImpl().
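
The runtime selection can be pictured with the sketch below. It is an assumption-laden illustration: `ArrayInfo`, `forceOocFlag`, `dispatchAlgorithmSketch`, and the two variant structs are hypothetical, and the real DispatchAlgorithm signature may differ.

```cpp
#include <string>
#include <vector>

// Hypothetical description of an input array: only the storage kind matters here.
struct ArrayInfo
{
  bool outOfCore;
};

// Hypothetical global override, analogous in spirit to ForceOocAlgorithm().
inline bool& forceOocFlag()
{
  static bool value = false;
  return value;
}

// Two interchangeable variants of the same algorithm.
struct DirectVariant
{
  std::string operator()() const { return "Direct"; }
};

struct ScanlineVariant
{
  std::string operator()() const { return "Scanline"; }
};

// Pick the OOC variant if any array is out-of-core or the override is set;
// otherwise run the in-core variant.
template <class InCoreT, class OocT>
auto dispatchAlgorithmSketch(const std::vector<ArrayInfo>& arrays, InCoreT inCore, OocT ooc) -> decltype(inCore())
{
  bool useOoc = forceOocFlag();
  for(const ArrayInfo& info : arrays)
  {
    useOoc = useOoc || info.outOfCore;
  }
  return useOoc ? ooc() : inCore();
}
```

Keeping the decision in one helper means each filter only supplies its two variants and the list of arrays to inspect.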

Connected Component Labeling (CCL) pattern:
  4 algorithms gain a two-pass CCL variant as an OOC alternative to
  random-access BFS/DFS flood-fill:
    SimplnxCore: FillBadData (BFS/CCL), IdentifySample (BFS/CCL)
    OrientationAnalysis: EBSDSegmentFeatures, CAxisSegmentFeatures
  The CCL engine in SegmentFeatures::executeCCL() scans voxels in Z-Y-X
  order with a 2-slice rolling buffer and UnionFind equivalence tracking,
  giving sequential I/O access patterns. Supports Face and FaceEdgeVertex
  connectivity with optional periodic boundaries.
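
As a rough illustration of the two-pass idea, the sketch below labels a binary mask with Face connectivity only, scanning in Z-Y-X order with two slice buffers. It is not SegmentFeatures::executeCCL(): `LabelSets` and `labelComponents` are hypothetical names, periodic boundaries and FaceEdgeVertex connectivity are omitted, and the union step here has no rank.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <utility>
#include <vector>

// Minimal disjoint set for provisional labels; label 0 is reserved for background.
struct LabelSets
{
  std::vector<std::int32_t> parent{0};

  std::int32_t makeLabel()
  {
    parent.push_back(static_cast<std::int32_t>(parent.size()));
    return parent.back();
  }

  std::int32_t find(std::int32_t x)
  {
    while(parent[x] != x)
    {
      parent[x] = parent[parent[x]]; // path halving
      x = parent[x];
    }
    return x;
  }

  void unite(std::int32_t a, std::int32_t b)
  {
    a = find(a);
    b = find(b);
    if(a != b)
    {
      parent[std::max(a, b)] = std::min(a, b);
    }
  }
};

inline std::vector<std::int32_t> labelComponents(const std::vector<std::uint8_t>& mask, std::size_t nx, std::size_t ny, std::size_t nz)
{
  LabelSets sets;
  const std::size_t sliceSize = nx * ny;
  std::vector<std::int32_t> labels(mask.size(), 0);
  std::vector<std::int32_t> prev(sliceSize, 0); // provisional labels of slice z-1
  std::vector<std::int32_t> curr(sliceSize, 0); // provisional labels of slice z

  // Pass 1: sequential scan; only already-visited neighbors (-X, -Y, -Z) are read.
  for(std::size_t z = 0; z < nz; ++z)
  {
    for(std::size_t y = 0; y < ny; ++y)
    {
      for(std::size_t x = 0; x < nx; ++x)
      {
        const std::size_t idx = y * nx + x;
        if(mask[z * sliceSize + idx] == 0)
        {
          curr[idx] = 0;
          continue;
        }
        const std::int32_t left = x > 0 ? curr[idx - 1] : 0;
        const std::int32_t up = y > 0 ? curr[idx - nx] : 0;
        const std::int32_t below = z > 0 ? prev[idx] : 0;
        std::int32_t label = 0;
        for(const std::int32_t neighbor : {left, up, below})
        {
          if(neighbor == 0)
          {
            continue;
          }
          if(label == 0)
          {
            label = neighbor;
          }
          else
          {
            sets.unite(label, neighbor); // record equivalence
          }
        }
        curr[idx] = label != 0 ? label : sets.makeLabel();
      }
    }
    std::copy(curr.begin(), curr.end(), labels.begin() + static_cast<std::ptrdiff_t>(z * sliceSize));
    std::swap(prev, curr);
  }

  // Pass 2: resolve equivalences and renumber components consecutively from 1.
  std::vector<std::int32_t> remap(sets.parent.size(), 0);
  std::int32_t next = 0;
  for(std::int32_t& label : labels)
  {
    if(label == 0)
    {
      continue;
    }
    const std::int32_t root = sets.find(label);
    if(remap[root] == 0)
    {
      remap[root] = ++next;
    }
    label = remap[root];
  }
  return labels;
}
```

Both passes touch voxels strictly in storage order, which is the property that makes this approach friendlier to chunked storage than a flood fill.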

--- New utility infrastructure ---

- UnionFind (src/simplnx/Utilities/UnionFind.hpp):
  Vector-based disjoint set with union-by-rank and path-halving.

- SliceBufferedTransfer (src/simplnx/Utilities/SliceBufferedTransfer.hpp):
  Z-slice buffered tuple transfer for propagating neighbor voxel data
  used by ErodeDilate, FillBadData, MinNeighbors, and ReplaceElements.

- TupleTransfer batch API (Filters/Algorithms/TupleTransfer.hpp):
  Batch bulk I/O methods for QuickSurfaceMesh and SurfaceNets mesh
  generation attribute transfer.

- SegmentFeaturesTestUtils.hpp:
  Shared test builder functions for segmentation filter test suites.
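
For readers unfamiliar with the disjoint-set structure named in the first bullet, a minimal vector-based version with union-by-rank and path halving could look like this. The class name `UnionFindSketch` and its interface are assumptions; the actual UnionFind.hpp may expose a different API.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

class UnionFindSketch
{
public:
  explicit UnionFindSketch(std::size_t count)
  : m_Parent(count)
  , m_Rank(count, 0)
  {
    // Every element starts as the root of its own set.
    std::iota(m_Parent.begin(), m_Parent.end(), std::size_t{0});
  }

  // Path halving: each visited node is re-pointed at its grandparent.
  std::size_t find(std::size_t x)
  {
    while(m_Parent[x] != x)
    {
      m_Parent[x] = m_Parent[m_Parent[x]];
      x = m_Parent[x];
    }
    return x;
  }

  // Union by rank: attach the shallower tree under the deeper one.
  // Returns false when the two elements were already in the same set.
  bool unite(std::size_t a, std::size_t b)
  {
    a = find(a);
    b = find(b);
    if(a == b)
    {
      return false;
    }
    if(m_Rank[a] < m_Rank[b])
    {
      std::swap(a, b);
    }
    m_Parent[b] = a;
    if(m_Rank[a] == m_Rank[b])
    {
      ++m_Rank[a];
    }
    return true;
  }

  bool connected(std::size_t a, std::size_t b)
  {
    return find(a) == find(b);
  }

private:
  std::vector<std::size_t> m_Parent;
  std::vector<std::uint8_t> m_Rank;
};
```

Storing parents and ranks in flat vectors keeps the structure in memory regardless of where the voxel data lives.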

--- Bulk I/O conversions (existing algorithms) ---

Core utilities:
  DataArrayUtilities (ImportFromBinaryFile, AppendData, CopyData,
    mirror ops), DataGroupUtilities (RemoveInactiveObjects),
  ClusteringUtilities (RandomizeFeatureIds), GeometryHelpers
    (FindElementsContainingVert, FindElementNeighbors),
  AlignSections (Z-slice OOC transfer path),
  ImageRotationUtilities (source slab caching for nearest-neighbor),
  TriangleUtilities (bulk-load triangles/labels for winding repair),
  H5DataStore (streaming row-batch FillOocDataStore replacing full-
    dataset allocation)

SimplnxCore algorithms:
  AlignSectionsFeatureCentroid, ComputeEuclideanDistMap,
  ComputeFeatureCentroids, ComputeFeatureClustering, ComputeFeatureSizes,
  CropImageGeometry, ErodeDilateBadData, ErodeDilateCoordinationNumber,
  ErodeDilateMask, RegularGridSampleSurfaceMesh, RequireMinimumSizeFeatures,
  ReplaceElementAttributesWithNeighborValues, ScalarSegmentFeatures,
  WriteAvizoRectilinearCoordinate, WriteAvizoUniformCoordinate

OrientationAnalysis algorithms:
  AlignSectionsMisorientation, AlignSectionsMutualInformation,
  ComputeAvgCAxes, ComputeAvgOrientations, ComputeCAxisLocations,
  ComputeFeatureNeighborCAxisMisalignments,
  ComputeFeatureReferenceCAxisMisorientations,
  ComputeFeatureReferenceMisorientations, ComputeGBCD,
  ComputeGBCDMetricBased, ComputeKernelAvgMisorientations,
  ComputeTwinBoundaries, ConvertOrientations, MergeTwins,
  NeighborOrientationCorrelation, RotateEulerRefFrame, WriteGBCDGMTFile,
  WriteGBCDTriangleData, WritePoleFigure

EBSD readers:
  ReadAngData, ReadCtfData, ReadH5Ebsd, ReadH5EspritData

--- Test infrastructure ---

- UnitTestCommon: ExpectedStoreType()/RequireExpectedStoreType() helpers,
  TestFileSentinel reference-counted decompression, CompareDataArrays
  rewritten with chunked bulk I/O for OOC-safe comparison.

- 29 test files updated with OOC dual-path testing:
  ForceOocAlgorithmGuard + GENERATE(from_range(k_ForceOocTestValues))
  runs every test case in both in-core and forced-OOC modes.
joeykleingers force-pushed the ooc-filter-optimizations branch from bf19ea2 to e0e7658 on April 10, 2026 15:26