enhance(storage): Use atomic writes for conversation persistence#517

Merged
JeanMertz merged 3 commits into main from prr86 on Apr 5, 2026

@JeanMertz
Collaborator

Previously, `persist_conversation` wrote files directly into the
conversation directory and `write_json` wrote directly to the target
path. A process crash at any point during a persist could leave a
partially written conversation on disk with no way to recover.

This change introduces two levels of atomicity:

At the file level, `write_json` now writes to a sibling `.tmp` file,
flushes, then renames over the target. If anything fails before the
rename, the original file is untouched and the temp file is removed.
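The file-level pattern described above can be sketched roughly as follows.
This is a minimal illustration using only the standard library, not the
actual `write_json` code: the names `tmp_path`, `write_then_rename` and
`write_atomic` are hypothetical, and the sketch takes raw bytes instead of
serializing JSON.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: sibling temp path, e.g. `meta.json` -> `meta.json.tmp`.
fn tmp_path(target: &Path) -> PathBuf {
    let mut name = target.file_name().unwrap_or_default().to_os_string();
    name.push(".tmp");
    target.with_file_name(name)
}

/// Write to the temp file, flush it to disk, then rename over the target.
fn write_then_rename(tmp: &Path, target: &Path, bytes: &[u8]) -> io::Result<()> {
    {
        let mut file = File::create(tmp)?;
        file.write_all(bytes)?;
        // Make sure the data is on disk before the rename makes it visible.
        file.sync_all()?;
    } // The file handle is closed here, before the rename.
    fs::rename(tmp, target)
}

/// Hypothetical stand-in for `write_json`: readers see either the old
/// content or the new content, never a partial write.
fn write_atomic(target: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = tmp_path(target);
    let result = write_then_rename(&tmp, target, bytes);
    if result.is_err() {
        // Best-effort cleanup; the original target was never touched.
        let _ = fs::remove_file(&tmp);
    }
    result
}
```

The temp file lives next to the target so that the rename stays within one
filesystem, which is what lets it replace the target in a single step.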

At the directory level, `persist_conversation` now performs these steps:

1. Write all managed files into a `.staging-{name}` directory.
2. Copy non-managed files (e.g. `QUERY_MESSAGE.md`) from the existing
   conversation dir into the staging dir.
3. Rename the existing dir to `.old-{name}`.
4. Rename the staging dir to the final name in a single syscall.
5. Remove the `.old-{name}` backup.

Readers never see a partially written directory.
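The directory swap described above might look roughly like this. It is a
sketch under assumptions, not the real `persist_conversation`: the function
name `persist_dir`, its signature, and passing managed files as
`(name, contents)` pairs are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch: persist `managed` files as conversation `name` under `root`.
fn persist_dir(root: &Path, name: &str, managed: &[(&str, &str)]) -> io::Result<()> {
    let final_dir = root.join(name);
    let staging = root.join(format!(".staging-{name}"));
    let old = root.join(format!(".old-{name}"));

    // Write managed files into a fresh staging dir.
    if staging.exists() {
        fs::remove_dir_all(&staging)?;
    }
    fs::create_dir_all(&staging)?;
    for (file, contents) in managed {
        fs::write(staging.join(file), contents)?;
    }

    if final_dir.exists() {
        // Carry over top-level files that the persist step does not manage.
        for entry in fs::read_dir(&final_dir)? {
            let entry = entry?;
            let dest = staging.join(entry.file_name());
            if entry.file_type()?.is_file() && !dest.exists() {
                fs::copy(entry.path(), dest)?;
            }
        }
        // Move the current dir out of the way.
        fs::rename(&final_dir, &old)?;
    }

    // Publish the staging dir under the final name with one rename.
    fs::rename(&staging, &final_dir)?;

    // Drop the backup as the final step.
    if old.exists() {
        fs::remove_dir_all(&old)?;
    }
    Ok(())
}
```

The sketch assumes any leftover `.old-{name}` from an earlier crash has
already been handled by the startup cleanup; otherwise the rename to
`.old-{name}` could fail.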

On the next startup, the validation pass calls `cleanup_staging_dirs` to
detect and handle any crash remnants: orphaned staging dirs are removed,
orphaned `.old-` backups are removed, and a crash that occurred between
the two directory renames, steps 3 and 4 of the persist sequence (both
`.old-X` and `.staging-X` exist, `X` missing), is rolled back by renaming
`.old-X` back to `X`. Orphaned `.tmp` files inside conversation dirs are
also cleaned up.
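A recovery pass along these lines could be sketched as below. This is an
illustration only, not the actual `cleanup_staging_dirs`: the name
`cleanup_remnants` is hypothetical, and the sketch assumes a backup is
rolled back whenever the final dir is missing, without also checking that
the staging dir exists.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch of a startup recovery pass over the conversations `root`.
fn cleanup_remnants(root: &Path) -> io::Result<()> {
    // Collect names first so the directory is not mutated while iterating it.
    let names: Vec<String> = fs::read_dir(root)?
        .filter_map(|entry| entry.ok())
        .filter_map(|entry| entry.file_name().into_string().ok())
        .collect();

    // Backups first: roll back if the final dir is missing, else discard.
    for name in &names {
        if let Some(conv) = name.strip_prefix(".old-") {
            let old = root.join(name);
            let final_dir = root.join(conv);
            if final_dir.exists() {
                fs::remove_dir_all(&old)?;
            } else {
                fs::rename(&old, &final_dir)?;
            }
        }
    }

    // Staging dirs were never published, so they can always be removed.
    for name in &names {
        if name.starts_with(".staging-") {
            fs::remove_dir_all(root.join(name))?;
        }
    }

    // Remove orphaned `.tmp` files inside the remaining conversation dirs.
    for entry in fs::read_dir(root)? {
        let dir = entry?.path();
        if !dir.is_dir() {
            continue;
        }
        for file in fs::read_dir(&dir)? {
            let path = file?.path();
            if path.is_file() && path.extension().is_some_and(|ext| ext == "tmp") {
                fs::remove_file(&path)?;
            }
        }
    }
    Ok(())
}
```

Handling `.old-` backups before staging dirs means the rollback decision is
made while both remnants of an interrupted swap are still on disk.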

Signed-off-by: Jean Mertz <git@jeanmertz.com>
@JeanMertz JeanMertz merged commit d300cdb into main Apr 5, 2026
13 checks passed
@JeanMertz JeanMertz deleted the prr86 branch April 5, 2026 12:55