fix(io): close aio_fd in FastFileWriter._fini to prevent fd leak#8005
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 44d545ee3d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| try: | ||
| os.fsync(self._aio_fd) | ||
| finally: | ||
| os.close(self._aio_fd) | ||
| self._aio_fd = INVALID_FD |
There was a problem hiding this comment.
Reset aio fd state even when fsync/close raises
If os.fsync() or os.close() throws (e.g., ENOSPC/EIO reported during close on Linux), execution exits _fini() before self._aio_fd is set to INVALID_FD. That leaves a stale integer in _aio_fd; later __del__() can call _fini() again and attempt to close that stale descriptor, which may already have been reused for an unrelated file descriptor in the same process. Move the state reset into a finally that always runs so the object never retains a potentially reused fd value after a close-path exception.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed in the latest push. The new _fini() moves the fd into a local and resets self._aio_fd = INVALID_FD before calling os.fsync() / os.close(), so any close-path exception leaves the object in a clean state and del won't reuse a potentially-reassigned descriptor.
Without explicit os.close(), every save() leaked one fd pointing at the just-written file. Combined with unlink-based rotation, the leaked fd held the unlinked inode in the ext4 orphan list, so its blocks were never returned to the filesystem's free pool. Long-running checkpoint workloads exhausted filesystem space within tens of iterations even when only N files were visible on disk; the userland symptom was OSError: [Errno 28] No space left on device, while NVMe NUSE / device- side capacity showed terabytes free. The leak existed on both _fini() paths: the originally-opened fd from __init__, and the re-opened fd from _unaligned_drain(). __del__'s assert (self._aio_fd == INVALID_FD) did not detect this because _fini() unconditionally overwrites the Python attribute regardless of whether the OS-level fd is closed. This change moves the fd into a local variable and resets self._aio_fd to INVALID_FD *before* calling os.fsync() / os.close(), so that if either of them raises (e.g. EIO/ENOSPC reported on close), the object state is already cleared and a subsequent __del__() call will not attempt to re-close a stale descriptor number that the kernel may have reassigned to an unrelated file in the meantime. os.fsync() is kept before close() to make the post-close durability guarantee match what callers of a checkpoint writer typically expect; dropping it would still fix the leak. Measured overhead on a 60 GB/ iter workload: ~5% wall time. Tested with a 700-iter / 42 TB / 60 h endurance run on ext4/NVMe: df_used stable at 736 GB (+281 MB drift over 697 rotations) vs. prior 60 GB/iter leak that hit ENOSPC at ~60 iterations. Signed-off-by: jg-heo <csjg.heo@gmail.com>
44d545e to
1693759
Compare
Fixes #8003
Summary
FastFileWriter._fini()overwroteself._aio_fd = INVALID_FDwithoutcalling
os.close(), leaking one fd per save. With unlink-basedcheckpoint rotation this stranded the unlinked inode in the ext4
orphan list, fs blocks were never reclaimed, and long-running save
loops hit ENOSPC at iter ~60 (60 GB/iter on a 4 TB partition).
This PR adds explicit
os.fsync()+os.close()in_fini()and aregression test that asserts no
/proc/self/fdentry points at adeleted file after a save+close+unlink cycle.
Verification
save() / close() / unlink()leaked 20 fdsbefore the fix, 0 after.
df_usedstable at 736 GB (drift +281 MB / 697 rotations) with the fix;
same workload hit ENOSPC at iter ~60 without it.
os.fsync()at ~10 GB/s peak.Test plan
tests/unit/ops/aio/test_fast_file_writer_fd_close.pyverifies fdcleanup after a single save and after 5/20-iter rotation loops via
/proc/self/fdscoped totmp_path.async_iocompatibility, Linux, and CUDA acceleratorso unsupported CI matrix entries skip cleanly.
_fini()change andPASSES with it.
pre-commit run --files <changed files>clean.Notes
__del__assertionassert self._aio_fd == INVALID_FDpasseseven with the bug because it checks the Python attribute that
_finiitself sets. The new test checks OS-level state via/proc/self/fd.os.fsync()is included for post-close durability — required forcorrectness on the unaligned-tail path that re-opens the file as
buffered I/O. If maintainers prefer to drop it for performance,
removing only the
os.fsync(...)line still fixes the leak.Happy to adjust shape, naming, or test placement to fit project
conventions. Thanks for the review.