Conversation
The hdf5_v1 printer aborted with "H5Fflush returned error value (-1)... not a file or file object" whenever a scanner explicitly called flush() on an auxiliary print stream. MultiNest hits this on its very first dumper callback, where it flushes the "txt" (Posterior) and "live" (LastLive) RA streams.

Root cause: only the primary HDF5Printer opens the HDF5 file and sets its file_id member; auxiliary printers inherit location_id, RA_location_id, and metadata_location_id from the primary but never initialise their own file_id. HDF5Printer::empty_sync_buffers() called H5Fflush(file_id, ...) unconditionally, so on an aux printer it was flushing an uninitialised hid_t.

Fix: route the flush through primary_printer->file_id. primary_printer defaults to "this" for the primary and is set to the actual primary in the aux constructor, so the same call is correct in both cases. This also makes the dumper's flush() calls actually useful (they now push HDF5's internal buffers to disk) instead of erroring out. Closes #495.
…t final_size on workers

Issue
-----
In HDF5Printer2::finalise() (Printers/src/printers/hdf5printer_v2/hdf5printer_v2.cpp), the local `std::size_t final_size` was declared unconditionally but only assigned inside an `if(myRank==0)` block (via `buffermaster.get_next_free_position()`). Immediately after, the loop that flushes unsynchronised (random-access) aux buffers and calls `(*it)->extend_all_datasets_to(final_size)` ran on **every** rank. Worker ranks therefore passed an uninitialised value into `extend_all_datasets_to`, which is undefined behaviour.

In practice this hit any MPI scan using the v2 hdf5 printer with auxiliary RA streams (i.e. anything beyond a single-rank run): worker ranks with non-empty `aux_buffers` would attempt local writes / dataset extensions that they have no business performing — only rank 0 owns the output file. The matching synchronised-buffer flush block earlier in finalise() is already correctly gated by `if(myRank==0)`, so the fix is to apply the same guard symmetrically.

Fix
---
- Move the `final_size` query and the entire RA-flush / extend_all_datasets_to loop inside `if(myRank==0)`. Worker ranks have already shipped their RA buffer contents to rank 0 via the preceding `gather_and_print(...)` call, which remains unguarded (it is a collective and must be called by all ranks).
- Tighten the scope of the `final_size` declaration so it lives only inside the rank-0 block where it is meaningful.
- Move the `add_aux_buffer(RAbuffer)` call inside the rank-0 block too, since the RAbuffer only contains gathered data on rank 0 and the loop that consumes `aux_buffers` is now rank-0-only. (Note: RAbuffer being a stack local appended to the member `aux_buffers` vector is a separate latent issue, tracked in `hdf5_printer_issues.md`.)
- Added a comment explaining why the block is rank-0-only.

Verification
------------
- Built GAMBIT with `-DWITH_MPI=True`.
- Ran the spartan example single-rank (`OMP_NUM_THREADS=1 ./gambit -rf yaml_files/spartan.yaml`): successful, "Final dataset size is 9003".
- Ran spartan with 2 MPI ranks (`mpiexec -np 2 ./gambit -rf yaml_files/spartan.yaml`): successful, "Final dataset size is 9001". Inspected the output HDF5: all 34 datasets under /data have identical length (9001), confirming sync and RA datasets were extended consistently and no output was corrupted.

https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
Add an "Optional: MPI" subsection to step 1 covering the `libopenmpi-dev openmpi-bin` install and the corresponding `-DWITH_MPI=True` cmake flag, plus an MPI run example in step 6 (including the `--allow-run-as-root --oversubscribe` flags that are typically needed in containerised dev environments). Needed for testing any printer / scanner code paths gated on `#ifdef WITH_MPI` (e.g. the hdf5_v2 finalise rank-0 guard fixed in the preceding commit). https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
CMake writes downloaded_tarball_paths.txt at the repo root listing the paths of scanner / backend tarballs it downloaded into ScannerBit/downloaded/ and Backends/downloaded/ during a build. It is purely a build-time bookkeeping file and should not be tracked. https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
…F5 errors

Issue
-----
In Printers/src/printers/hdf5printer/hdf5tools.cpp, errorsOff() saved the current HDF5 error handler into module-globals old_func / old_client_data and then installed a NULL handler; errorsOn() restored from those globals. This is fine for an isolated pair, but it breaks when errorsOff() is called while errors are already off:

    errorsOff(); // saves real handler -> globals; installs NULL
    ...
    errorsOff(); // saves NULL (current handler) -> globals; installs NULL
    ...
    errorsOn();  // restores globals = NULL -> errors stay off forever

There is exactly one such nested case in the v1 printer code today (hdf5tools.cpp lines 369/377/386 in the dataset corruption-recovery helper), and it permanently disables HDF5 error reporting for the rest of the process whenever it is hit. Any subsequent HDF5 problem then fails silently. There are also ~25 paired call sites across hdf5tools.cpp and hdf5_combine_tools.cpp that work fine in isolation but would corrupt the global state if any of them were ever wrapped in another errorsOff() / errorsOn() block.

Fix
---
Make errorsOff() / errorsOn() idempotent with a bool guard:
- errorsOff(): if errors are already off, do nothing (in particular, do not call H5Eget_auto2, which would overwrite the saved handler with the NULL we installed ourselves).
- errorsOn(): if errors are already on (i.e. unbalanced call), do nothing rather than blindly restoring whatever is in the saved slots.

The bool / handler state is moved into an unnamed namespace at file scope, giving it internal linkage. Functionally equivalent to the existing namespace-scope globals for this codebase (no other TU references them), but it (a) makes intent explicit -- these are strictly the implementation detail of the two functions just below them -- and (b) prevents any future accidental ODR collision on the short generic names.

A counter-based nesting scheme was considered but rejected: no real caller pairs an inner errorsOn() with an inner errorsOff(), so all the counter would do is silently absorb the one buggy nested call we already have. The bool keeps the logic minimal and matches what every existing caller actually wants.

Note: the underlying state is still a module global, so errorsOff / errorsOn remain not thread-safe. That is a separate (lower-priority) issue tracked in hdf5_printer_issues.md.

Verification
------------
- Rebuilt GAMBIT (-DWITH_MPI=True) cleanly.
- Single-rank v1 spartan run: success.
- 2-rank MPI v1 spartan run: success.
- Resume run via combine-routines path through hdf5_combine_tools.cpp (which has the bulk of the errorsOff / errorsOn call sites): success.

No HDF5 warnings or errors in any of the runs.

https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
…r paths

The HDF5Printer class has seven hid_t members for HDF5 file / group / location handles:

    file_id, group_id, RA_group_id, metadata_id,
    location_id, RA_location_id, metadata_location_id

The four file/group handles are only assigned by the *primary* constructor; the aux constructor at hdf5printer.cpp:683-699 only sets the three *_location_id members (inheriting them from primary_printer). With no in-class initialisers, the four file/group handles held indeterminate values on every auxiliary printer instance.

This was the root cause of issue #495 (H5Fflush(file_id, ...) firing on aux printers when MultiNest's dumper called flush() on its 'txt' / 'live' aux streams). That specific call site is now correctly routed through primary_printer->file_id (merge of upstream fix_hdf5v1_and_multinest_issue), but the underlying class invariant remained fragile: any future aux call site that forgets to go through primary_printer would silently read uninitialised memory.

Fix: give all seven hid_t members an in-class default initialiser of -1. This matches the sentinel value the class already treats as "unset HDF5 handle" -- the existing get_location() / get_RA_location() / get_metadata_location() checks at hdf5printer.cpp:1144,1155,1166 already raise a printer_error when they see -1. With this change the class invariant becomes uniform: an unset HDF5 handle in this class is -1, period. Any accidental HDF5 call on an aux's file/group handle now passes H5I_INVALID_HID, which HDF5 reliably rejects -- failures will surface deterministically as printer_errors instead of as indeterminate behaviour or random "lucky" successes.

The bottom three *_location_id members are assigned by both constructors today, so the = -1 there is purely defensive. Including them keeps the rule uniform across all seven handles.
Verification
------------
Rebuilt and ran four scenarios cleanly (no HDF5 errors, all exit 0):
- v1 + Diver, single rank
- v1 + Diver, 2-rank MPI
- v1 + MultiNest, single rank (was the original #495 reproducer)
- v2 + Diver, single rank (regression check)

https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
… leak)
In HDF5DataSet<T>::write_buffer (the v2 hdf5 printer's per-flush write
path for synchronised datasets), the on-disk dtype is queried with
hid_t dtype = H5Dget_type(get_dset_id());
and used only for an H5Tequal sanity check against the in-memory
expected_dtype. The handle was never closed before the function
returned, leaking one HDF5 type handle per call. write_buffer is
invoked once per flush per synchronised dataset, so on long scans with
many datasets and frequent flushes the leak accumulates -- eventually
HDF5's internal table or the OS file-handle limit gives out, depending
on the build/runtime.
The companion write_RA_buffer (lines 438/467 in the same header)
already closes its corresponding H5Dget_type handle, so this is purely
a missing close in the synchronised path.
Fix: H5Tclose(dtype) immediately after the H5Tequal check, where
dtype stops being needed. Mirrors the close site in write_RA_buffer.
A short comment notes why it's there so the next reader doesn't
"clean it up" again.
Verification
------------
- Built cleanly.
- v2 + Diver, single rank: exit 0, no HDF5 errors, output file has
34 datasets all sized 9002.
- v2 + Diver, 2 MPI ranks: exit 0, no HDF5 errors.
(The leak itself is slow-burn -- not visible in a short spartan run --
but the fix is mechanical and the smoke tests confirm it doesn't
disturb the write path.)
https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
…empty
The destructors of DataSetInterfaceBase and VertexBufferNumeric1D_HDF5
each carried a TODO suggesting that one day they ought to close their
HDF5 dataset / flush their pending writes, with the actual cleanup
calls left commented out. The TODOs implied the empty body was a
hack waiting to be fixed, but in fact the empty body is required by
how these objects are stored:
- DataSetInterfaceBase (and DataSetInterfaceScalar) are populated via
the assign-from-temporary pattern, e.g.
_dsetdata = DataSetInterfaceScalar<T,CL>(loc, name, ...);
local_buffers[key] = BuffType(loc, label, vertexID, ...);
With the default copy/move semantics, the temporary and the
surviving slot share the same hid_t dset_id; closing in the
destructor would invalidate the handle that the surviving copy
still relies on.
- The same shape applies to VertexBufferNumeric1D_HDF5 itself, whose
destructor would similarly run on a temporary that shares all the
buffer state (HDF5 handles, sync counters, write queues) with the
container slot. Doing write_to_disk() / RA_write_to_disk() there
would either double-write or write through stale state.
The actual close / flush is driven deterministically from
HDF5Printer::flush() and HDF5Printer::finalise(), iterating over the
surviving buffers in the printer's all_buffers map. If finalise() is
bypassed (exception during scan setup or shutdown), HDF5's own
H5Fclose() at process exit auto-closes any still-open dataset handles
and flushes their pending I/O, so the file on disk remains
well-formed.
This commit replaces the misleading TODOs with comments explaining
the actual ownership invariant, so the next reader doesn't try to
"fix" the empty body and reintroduce the original handle-aliasing
bug. Properly closing in the destructor would require giving these
classes ownership semantics they currently don't have (e.g. wrapping
the hid_t in a shared_ptr with an H5Dclose deleter, or making the
chain non-copyable / move-only) -- a larger change that is out of
scope here.
No behaviour change. Verified by rebuilding and running the spartan
example with both the v1 and v2 hdf5 printer (both exit 0, no HDF5
errors).
https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
…aise()
In HDF5Printer::common_constructor (Printers/src/printers/hdf5printer/
hdf5printer.cpp), the rank-0 setup block at lines 414-472 handles the
case where the requested output file already exists. Inside the
"restart, not resume, not overwrite" branch (lines 448-469), the code
opens the file, checks whether the requested group is already present,
and either:
- group exists -> raise a printer_error with a clear, multi-option
message explaining how to fix the situation (line 465); or
- group missing -> closeFile + bare exit(1) (line 468), with no
message at all.
The exit(1) path was buggy in two ways:
1. exit(1) bypasses MPI shutdown -- no MPI_Finalize, no MPI_Abort.
Worker ranks (which never enter this rank-0-only block) sail on
to the upcoming myComm.Barrier() at line 610 and hang forever.
The user sees a job that never terminates, with no log message
explaining why.
2. The user has no idea what went wrong: the path triggers when
they pointed the printer at a file that already exists from
some unrelated previous run, and the only signal is a non-zero
exit code from rank 0.
Fix: mirror the sibling group-exists case at line 465. Build a
clear printer_error message ("file exists, requested group not in it,
not in resume mode, delete_file_on_restart not set; here are the
three things you can do") and raise it. printer_error().raise()
throws an exception that goes through GAMBIT's normal error-reporting
machinery, which at least tells the user what happened on rank 0
before tearing down -- a strict improvement over silent exit(1).
Note that this commit does *not* address the broader rank-0-error-
deadlocks-workers problem: if any of the four printer_error sites in
this constructor block (lines 438, 445, 465, and now this one) fire
on rank 0, worker ranks still hang at the Barrier downstream. That
deadlock-class issue is the same one tracked under "v1
masterWaitForAll(FINAL_SYNC) deadlocks" and "Same risk on the
resume-mode myComm.Barrier() at line 610" in
hdf5_printer_issues.md, and needs a separate, larger fix using
BarrierWithTimeout. The point of the present commit is just to bring
this one site up to the same error-reporting standard as its three
siblings.
Verification
------------
- Built cleanly.
- Regular v1 spartan run: exit 0, no change to the happy path.
- Reproduced the failing scenario by seeding an output file with
group "/data" via a normal run, then re-running with the same
output_file but group "/data2", restart mode (-rf), no
delete_file_on_restart. Before the fix this would silent-exit(1);
after the fix the job terminates with a FATAL ERROR carrying the
full guidance message and "Calling MPI_Finalize..." in the trailer.
https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
…I_request_buffer_data path
The v2 hdf5 printer carried two parallel sets of MPI machinery for
shipping buffer data from worker ranks to the rank-0 master:
1. The currently-used "gather and Gatherv" path in finalise() and
flush(), built around HDF5MasterBuffer::add_to_buffers and the
HDF5bufferchunk message format. This is what actually runs in
every MPI scan today.
2. A second, unwired Send/Recv path consisting of:
- HDF5MasterBuffer::MPI_flush_to_rank
- HDF5MasterBuffer::MPI_request_buffer_data
- HDF5MasterBuffer::MPI_recv_all_buffers
- HDF5MasterBuffer::MPI_recv_buffer<T>
- HDF5Buffer<T>::MPI_flush_to_rank
- HDF5Buffer<T>::MPI_recv_from_rank
- HDF5BufferBase::MPI_flush_to_rank (pure-virtual decl)
plus eight dedicated MPI tag constants
(h5v2_bufname / h5v2_bufdata_points / h5v2_bufdata_ranks /
h5v2_bufdata_valid / h5v2_bufdata_type / h5v2_bufdata_values /
h5v2_BLOCK / h5v2_BEGIN) and some commented-out
recv_counter / send_counter debug scaffolding.
The path in (2) is internally consistent (the functions only call each
other) but has no external callers anywhere in the codebase, and the
MPI_request_buffer_data side sends an h5v2_BEGIN tag for which there
is no matching Recv -- a strong sign the path was abandoned mid-wiring
rather than being a finished alternative.
This commit deletes (2) outright. Two reasons for choosing delete over
finishing the wiring:
- The currently-used gather_and_print path is correct and tested
(passes our spartan smoke tests in single-rank and 2-rank MPI
modes, including with auxiliary RA streams).
- The send-side comment in MPI_recv_all_buffers explicitly warns of a
deadlock if MPI_request_buffer_data is not called first; revisiting
that ordering and the request/recv handshake to make it deadlock-free
would be a real design effort, not a hygiene fix. The audit
recommended either deleting or wiring up; given there is no concrete
reason to believe the unfinished path would outperform the working
one, deletion is the lower-risk choice.
Net change: -266 lines / +4 lines across hdf5printer_v2.hpp and
hdf5printer_v2.cpp. add_to_buffers, MPI_add_int_block_to_buffer,
MPI_add_float_block_to_buffer (which ARE used by the live path) are
untouched.
Verification
------------
- Built cleanly.
- v2 + Diver, single rank: exit 0, output has 34 datasets all sized
9001 (uniform, as expected).
- v2 + Diver, 2 MPI ranks: exit 0, output has 34 datasets all sized
9001 (uniform).
https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
hdf5_combine_tools.cpp called H5Dget_type at 8 sites to obtain the
HDF5 datatype of an existing dataset (in order to recreate the dataset
in the combined output via setup_hdf5_points), but never released the
returned datatype id. H5Dget_type returns a fresh id that the caller
owns and must close with H5Tclose; otherwise the id (and its underlying
in-memory state) lingers until the file/library is shut down.
The leaks were at three call sites within hdf5_stuff::Combine_HDF5:
- Primary-dataset combine loop (~L949-956): type/type2 obtained per
batch from either the previous combined dataset or a temp-file
dataset, used by setup_hdf5_points only on batch 0, but allocated
on every batch and never freed. Fixed by H5Tclose(type/type2) after
the dataset_out/dataset2_out closes inside the same per-batch
if(valid_dset>=0 or old_dataset>=0) block, so each iteration
balances its allocations.
- Auxiliary-dataset path (~L1144-1145): type/type2 obtained inside
the "param not also a primary" branch, used immediately by
setup_hdf5_points. Fixed by H5Tclose right after that call, guarded
by >=0 since the variables are pre-initialised to -1 above.
- Metadata-dataset combine loop (~L1273, L1278): single 'type' id,
same pattern as the primary loop. Fixed by H5Tclose(type) after
the dataset_out close inside the per-batch valid branch.
These were straightforward resource leaks: setup_hdf5_points only
reads 'type' to pass it to H5Dcreate2 and does not take ownership,
so adding the closes at the end of the using-scope is correct and
safe.
Verification
------------
- Built cleanly (only Printers translation unit recompiled).
- Ran spartan with the v1 (hdf5_v1) printer, 2 MPI ranks, with
disable_combine_routines: false so the combine path actually
exercises all three call sites. Run completed cleanly (exit 0):
* 17 primary parameters combined, no errors
* 51 metadata parameters combined, no errors
* 0 auxiliary parameters in this scan (path not exercised, but
that branch is structurally identical to the primary path)
* combined output gambit_output.hdf5 contains uniform 9501-row
1-D datasets across all primary parameters.
Out of scope (not fixed here)
-----------------------------
- hdf5_combine_tools.hpp Enter_HDF5() at line 377 calls
H5Tget_native_type(dtype, ...) and stores the result in 'type', then
closes only 'dtype' (line 448) before returning. The native-type id
is also a fresh id that needs H5Tclose. This is a separate leak (a
different HDF5 API) and the issue title is scoped specifically to
H5Dget_type; flagging here for follow-up.
https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
…size-trim loops
Two near-identical loops in hdf5_stuff (one in the primary-dataset
scanning pass, one in the auxiliary-dataset Enter_Aux_Parameters pass)
walked a std::vector<bool> backwards to count the number of trailing
invalid entries to trim from a per-file size:
for (auto it = valids.end()-1; size > 0; --it)
{
if (*it) break;
else --size;
}
There were two problems with this idiom:
1. If valids.size() == 0 (which happens cleanly when the temp file's
sync group is empty), valids.end()-1 is undefined behavior. In
practice the loop condition `size > 0` would skip the body, but
the iterator is still constructed.
2. More importantly: when ALL entries in valids are invalid (so the
"break" branch is never taken), the final iteration decrements
size from 1 to 0 and dereferences valids.front() correctly, but
then the for-loop's post-step runs `--it` once more, producing an
iterator one before begin(). That decrement is undefined behavior
on std::vector::iterator -- even though it's typically a pointer
under the hood and "happens to work", a sanitizer or a debug
iterator implementation can flag/abort. The condition `size > 0`
is then evaluated on the already-invalid iterator's surrounding
state, but the UB has already occurred at the --it.
Replaced both loops with a straightforward index-based trim:
while (size > 0 && !valids[size-1])
--size;
Equivalent semantics (trim trailing invalids, keep first valid),
but no iterator arithmetic, no past-the-begin decrement, and no
empty-vector trap. Both call sites already guarantee
size <= valids.size() (size and valids both come from the
pointID_isvalid / RA_pointID_isvalid dataset, with the prior
size != size2 check raising on mismatch), so valids[size-1] is
safely in range whenever size > 0.
Verification
------------
- Built cleanly (Printers TU only).
- Spartan + hdf5_v1 + 2 MPI ranks + disable_combine_routines: false:
exit 0, combined output has uniform 9000-row primary datasets and
1-row metadata datasets. The primary-pass trim loop runs once per
temp file (2x in this scan); the aux-pass trim loop is structurally
identical and exercised by the same Enter_HDF5/read_hdf5 path.
https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
Before this change, HDF5Printer2::flush() only invoked buffermaster.flush() on the calling printer instance. That meant a public flush() on the primary printer left every auxiliary printer's buffermaster -- metadata streams, RA streams, and any other aux streams the scan has registered -- with their points still in RAM until either the per-stream sync auto-flush triggered (when a sync buffer fills) or finalise() ran. The two known external callers of the public API (twalk's out_stream->flush() at the end of a temp-file replay batch and the Python scanner binding in py_module_scanbit.hpp) both expect "persist what I've written so far", so the partial flush was a behavioural mismatch with caller intent and a divergence from the v1 printer (HDF5Printer::flush() iterates all buffers).

The fix: after flushing this->buffermaster, if this is the primary printer, iterate the aux_buffers list and flush each. Aux printers register their own buffermaster into the primary's aux_buffers in their constructor (~hdf5printer_v2.cpp:1342 add_aux_buffer) and have an empty aux_buffers themselves, so guarding the loop on not is_auxilliary_printer() preserves aux-printer semantics (unchanged: still just flushes its own buffermaster). Two passes -- sync streams first, RA streams second -- mirror the ordering already used in finalise() (~:1824-1833 then :1895-1909). Reason: RA writes anchor to positions established by sync writes, so flushing RA before its sync targets are on disk can leave RA points stuck in the postpone queue.

What is intentionally NOT done here:

* No MPI gather. The architecture defers cross-rank gathering to finalise() as a performance optimisation for HPC networked filesystems (see comment block at hdf5printer_v2.cpp:1786-1793). Mid-scan, every rank just writes its own buffered points to the shared file under the existing cross-rank Utils::FileLock taken inside HDF5MasterBuffer::flush() (lock_and_open_file -> hdf5out.get_lock() at :741). Adding an MPI collective inside the public flush() would deadlock for non-collective callers (twalk's per-rank flush, Python scanners) and is unnecessary for "persist my data" semantics.

* No explicit H5Fflush. v1 keeps the file open between buffer flushes and ends flush() with H5Fflush(file_id, H5F_SCOPE_GLOBAL). v2's HDF5MasterBuffer::flush() always closes the file at the end via close_and_unlock_file -> H5Fclose, which already drains the HDF5 internal cache to disk, making H5Fflush redundant.

* Auto-flush path unchanged. add_buffered_point's "buffer full" auto-flush at hdf5printer_v2.hpp:1128 / 1137 calls HDF5MasterBuffer::flush() (same class), not HDF5Printer2::flush(). That path is untouched.

* finalise() unchanged. It still does its own MPI gather and per-buffer flush logic without going through the public flush().

Cost: the new loop incurs N additional lock_and_open_file -> close_and_unlock_file cycles per public flush() call (N = number of registered aux buffers), but HDF5MasterBuffer::flush() short-circuits on get_Npoints() == 0 (:479) without taking the lock, so empty aux buffers cost essentially zero. flush() is rare on the hot path; this is fine.

Verification
------------
- Built cleanly (Printers TU only).
- Spartan + v2 + single rank: exit 0, 34 primary datasets all sized 9502 (uniform).
- Spartan + v2 + 2 MPI ranks: exit 0, 34 primary datasets all sized 9003 (uniform).

Both runs produce well-formed output; the change is additive and preserves existing on-disk layout because finalise() already flushes everything at end-of-scan -- this commit just makes the same flushing happen on demand when flush() is called.

https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
…buffer fast_forward
HDF5Printer's sync_pos member is unsigned long, initialised to 0
(hdf5printer.hpp:416), and only ever incremented (in
increment_sync_pos at hdf5printer.cpp:1650). Two call sites compute
get_sync_pos()-1 without first checking that sync_pos>0, which
would underflow to ULONG_MAX when sync_pos==0:
1. HDF5Printer::synchronise_buffers (hdf5printer.cpp:1320):
const unsigned long sync_pos = get_sync_pos()-1;
...
it->second->synchronise_output_to_position(sync_pos);
The underflowed ULONG_MAX would be stored into each buffer's
target_sync_pos. For RA buffers this later feeds extend_dset()
in RA_write_to_disk, which would attempt to extend the dataset
to ULONG_MAX rows. For sync buffers, the movediff calculation
in synchronise_output_to_position relies on sync_pos+1
(ulong), which then wraps to 0 -- harmless only when
dset_head_pos() also happens to be 0.
2. H5P_LocalBufferManager::get_buffer (hdf5printer.hpp:558):
if(synchronised) it->second.fast_forward(
printer->get_sync_pos()-1);
fast_forward takes a signed long target_pos. With sync_pos==0,
get_sync_pos()-1 == ULONG_MAX (ulong), implicitly converted to
long it becomes -1 on two's-complement platforms. fast_forward
then computes needed_steps = -1 - 0 = -1 (negative) and raises
the "would need to move backwards" error
(VertexBufferBase.hpp:218).
In normal scans both bugs are dormant because of call ordering:
- synchronise_buffers' first invocation (from the first
check_for_new_point at :1632) happens before any buffer has been
added to all_buffers, so the for-loop iterates over nothing and
the underflowed value is never delivered.
- fast_forward via get_buffer is reached only after
check_for_new_point has run increment_sync_pos at :1650, taking
sync_pos from 0 to 1, so by the time fast_forward executes
sync_pos>=1 and target_pos>=0.
The underflow is therefore real but masked by today's lifecycle.
Adding a buffer earlier than the first check_for_new_point, or
calling synchronise_buffers from a new code path before any
pointID arrives, would surface either bug.
Fix: guard each call site with sync_pos>0. Semantically, sync_pos==0
means "no point has yet been registered as the current sync
position", so there is nothing to synchronise to or catch up to;
returning early / skipping the call is the correct no-op behaviour
and is consistent with buffers being at dset_head_pos==0 from
construction.
This is a surgical fix that preserves all existing behaviour: the
two guarded paths were never executed with sync_pos==0 in practice,
so guarding them doesn't change any observable behaviour today --
it only removes the latent underflow.
Verification
------------
- Built cleanly.
- Spartan + hdf5_v1 + 1 rank: exit 0, 34 primary datasets all
sized 9000 (uniform).
- Spartan + hdf5_v1 + 2 MPI ranks (with combine routines enabled):
exit 0, 34 primary datasets all sized 9501 (uniform).
https://claude.ai/code/session_01QZuy7gyVUMZa3FCXShjj4X
Collaborator (Author):
Sorry for requesting one more review from you, @ChrisJChang, feel free to forward it to someone else. Or I can just take care of it myself – I've already gone over the changes once to check that the Claude-suggested changes make sense to me, and to tweak some comments and code snippets.
Collaborator (Author):
@carlmfe volunteered to review this one (thanks!), so you can forget about it @ChrisJChang.
carlmfe reviewed May 7, 2026
carlmfe approved these changes May 7, 2026
Contributor (carlmfe) left a comment:
Code is good. Comments are generally on the verbose side, but do convey clearly why the changes are as they are. Other than a specific comment referencing old code, I approve of the rest.
Collaborator (Author):
Thanks for reviewing – and I agree, I'll revise some of the comments before I merge.
A series of targeted fixes to the v1 (`HDF5Printer`) and v2 (`HDF5Printer2`) HDF5 printers, surfaced by a code audit of the two printer source trees plus their combine tools. Each commit is independently small, locally verified against the spartan example (1 and 2 MPI processes, where applicable), and behaviourally additive — no on-disk format changes, no API changes for callers.

- [v1] Default-initialise `hid_t` members to `-1` to harden aux-printer paths. `HDF5Printer`'s seven `hid_t` file/group/location handles had no in-class initialisers; the aux constructor only assigned the three `*_location_id` members, leaving the four file/group handles indeterminate on every aux instance. This was the root cause of #495 ("Multi-nest fails with hdf5 version 1"), resolved at the call site by using the file handle through `primary_printer`; this commit fixes the underlying class invariant so any future aux call site that forgets to route through `primary_printer` fails deterministically (`-1` is the sentinel the existing checks already treat as "unset").
- [v2] Close HDF5 type handle leak in `HDF5DataSet::write_buffer`. The synchronised-write path called `H5Dget_type` for an `H5Tequal` sanity check but never closed the returned id, leaking one type handle per flush per dataset. Mirrors the close that already exists in the sibling `write_RA_buffer`. Slow-burn leak, mechanical fix.
- [v1] Clarify why buffer / dataset destructors are intentionally empty. Replaces misleading `TODO: close on destruct` comments in `DataSetInterfaceBase` and `VertexBufferNumeric1D_HDF5` with comments documenting the real reason: these objects are populated via assign-from-temporary, so a closing destructor would invalidate the surviving copy's `hid_t`. Cleanup is correctly driven from `HDF5Printer::flush()` / `finalise()`. Pure documentation change to prevent a future "fix" from reintroducing handle aliasing.
- [v1] Replace bare `exit(1)` in printer constructor with `printer_error().raise()`. The "file exists, group missing, not in resume mode" branch in `HDF5Printer::common_constructor` called `exit(1)` with no message — bypassing MPI shutdown (worker ranks then hung at the downstream `Barrier`) and giving the user no diagnostic. Replaced with the same `printer_error` pattern its three sibling branches already use. (Does not address the broader rank-0-error-deadlocks-workers class of issues, which need `BarrierWithTimeout`.)
- [v2] Delete unused `MPI_flush_to_rank` / `MPI_recv_all_buffers` / `MPI_request_buffer_data` path. The v2 printer carried a complete second MPI shipping mechanism (seven functions, eight dedicated MPI tag constants, plus `recv_counter` / `send_counter` debug scaffolding) with no external callers and a missing `Recv` for the `h5v2_BEGIN` `Send` — abandoned mid-wiring. The currently-used `gather_and_print` path is correct and tested. Net -262 lines.
- [v1] Close `H5Dget_type` ids leaked in combine_tools. Eight call sites in `hdf5_combine_tools.cpp` (4 in the primary loop, 2 in the auxiliary branch, 2 in the metadata loop) obtained datatype ids via `H5Dget_type` for `setup_hdf5_points` and never released them. Fixed by adding `H5Tclose` after each datatype's last use.
- [v1] Avoid undefined iterator decrement past `begin()` in combine size-trim loops. Two near-identical `for (auto it = valids.end()-1; size > 0; --it)` loops in `hdf5_stuff` incurred UB on empty input (`end()-1` is undefined) and again on the final post-step decrement when all entries are invalid (`--it` to `begin()-1`). Replaced with the equivalent `while (size > 0 && !valids[size-1]) --size;`, which carries no iterator arithmetic.
- [v2] `HDF5Printer2::flush()` now also flushes registered aux buffermasters. Previously `flush()` invoked only `buffermaster.flush()`, leaving every aux printer's metadata / RA / sync stream still in RAM until either an auto-flush or `finalise()` ran. Now flushes the primary's own `buffermaster` plus each entry in `aux_buffers` (sync streams first, then RA, matching `finalise()`'s ordering). No MPI gather added — each rank still writes its own data under the existing cross-rank `Utils::FileLock`, consistent with how the auto-flush path works.
- [v1] Guard `sync_pos == 0` underflow in `synchronise_buffers` and the `get_buffer` fast-forward. Both sites computed `get_sync_pos() - 1` on an unsigned member that's zero-initialised, underflowing to `ULONG_MAX`. The bug was dormant in practice (call ordering ensured the underflow value was never delivered to a buffer with non-zero state), but fragile. Adds explicit `if (sync_pos > 0)` guards at both sites; `sync_pos == 0` is now a no-op, which matches the semantics ("no previous slot to catch up to").

Tests: ran the `exit(1)` deadlock scenario for the printer-ctor commit and confirmed it now terminates with a `FATAL ERROR` and `MPI_Finalize`.