Release partial mmaps when MemoryMappedDataset init fails #56
MemoryMappedDataset.__init__ opens one mmap per file in a loop and appends to self._mmaps. If any open raised partway through, the prior opens stayed reachable through the exception traceback (pytest frames, logger.exception, post-mortem debuggers), leaving their virtual-memory mappings pinned for the lifetime of the traceback. Under retry loops or long-lived test sessions on Lustre/NFS, this accumulates. Wrap the open loop in try/except that calls _close_mmaps() on failure and re-raises. Add close() for explicit release and __del__ for GC fallback. _close_mmaps uses contextlib.suppress(BufferError, ValueError) because an inner close can fail when views into the mapping are still live or the mmap was already closed by another path.
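For readers without the diff at hand, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the constructor signature (a list of paths), the read-only access mode, and the file names in the usage section are assumptions made for the example.

```python
import contextlib
import mmap
import os
import tempfile


class MemoryMappedDataset:
    """Illustrative stand-in; the real class in this PR may differ."""

    def __init__(self, paths):
        self._mmaps = []  # set first so cleanup is safe at any point
        try:
            for path in paths:
                with open(path, "rb") as f:
                    # A mapping remains valid after its file object is closed.
                    self._mmaps.append(
                        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                    )
        except Exception:
            # Release whatever was opened before the failure, then re-raise.
            self._close_mmaps()
            raise

    def _close_mmaps(self):
        for m in getattr(self, "_mmaps", []):
            # BufferError: views into the mapping are still live.
            # ValueError: the mapping was already invalidated elsewhere.
            with contextlib.suppress(BufferError, ValueError):
                m.close()
        self._mmaps = []

    def close(self):
        """Preferred, explicit release path."""
        self._close_mmaps()

    def __del__(self):
        """GC safety net only; callers should use close()."""
        self._close_mmaps()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        good = os.path.join(d, "shard0.bin")  # hypothetical file name
        with open(good, "wb") as f:
            f.write(b"payload")
        try:
            MemoryMappedDataset([good, os.path.join(d, "missing.bin")])
        except FileNotFoundError:
            print("init failed; the first mapping was closed before re-raise")
        ds = MemoryMappedDataset([good])
        ds.close()
        ds.close()  # safe to call twice
```

Assigning `self._mmaps` before the loop keeps `_close_mmaps()` safe to call from `__del__` even when `__init__` fails on the very first file.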
Naeemkh
left a comment
Relying on __del__ for cleanup is generally discouraged (interpreter shutdown ordering, reference cycles, etc.). The contextlib.suppress(Exception) is the right defensive move, but consider documenting that close() is the preferred path and __del__ is a safety net only.
Good catch! Pushing a follow-up on this branch: docstring on |
close() is the explicit API; __del__ is a GC safety net. Empty __del__ docstring previously left the intent implicit. Per review feedback.
Summary
- Wrap MemoryMappedDataset.__init__'s file-open loop in try/except that calls _close_mmaps() on failure and re-raises, so partial state does not leak through the exception traceback (pytest frames, logger.exception, post-mortem debuggers).
- Add close() for explicit release and __del__ for GC fallback.
- _close_mmaps suppresses BufferError (live views into the mapping) and ValueError (already closed by another path) so cleanup is always idempotent.
Closes #55
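The BufferError case from the summary can be reproduced with the standard library alone. The sketch below uses an anonymous mapping so no file is needed; `close_quietly` is a hypothetical helper written for this example, not a function from the PR, and the ValueError path is not reproduced here.

```python
import contextlib
import mmap


def close_quietly(m):
    """Try to close a mapping, swallowing the errors _close_mmaps suppresses.

    Returns True if the mapping is closed afterwards, False otherwise.
    """
    with contextlib.suppress(BufferError, ValueError):
        m.close()
    return m.closed


if __name__ == "__main__":
    m = mmap.mmap(-1, 16)      # anonymous 16-byte mapping
    view = memoryview(m)       # a live view pins the mapping

    # While the view is exported, close() raises BufferError, which is
    # suppressed here; the mapping stays open.
    print("closed with live view:", close_quietly(m))

    view.release()             # drop the export
    print("closed after release:", close_quietly(m))
```

One consequence worth noting: a suppressed BufferError means the mapping is still open, so it is only released once the remaining views go away.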
Test plan
uv run pytest tests/unit/test_dataset_mmap_cleanup.py -v (4 tests).