Skip to content

Separate history files per user with per-user subdirectories#169

Merged
dcellison merged 3 commits intomainfrom
fix/history-per-user
Mar 26, 2026
Merged

Separate history files per user with per-user subdirectories#169
dcellison merged 3 commits intomainfrom
fix/history-per-user

Conversation

@dcellison
Copy link
Copy Markdown
Owner

Summary

Write conversation history to per-user subdirectories (DATA_DIR/history/<chat_id>/) instead of shared flat files. This physically isolates each user's messages on disk, adding defense-in-depth to the existing read-side chat_id filter.

Before

DATA_DIR/history/
  2026-03-25.jsonl    # all users mixed together

After

DATA_DIR/history/
  <chat_id>/
    2026-03-25.jsonl  # only this user's messages

Changes

File Change
src/kai/history.py log_message() writes to _LOG_DIR/<chat_id>/; get_recent_history() scans per-user dirs with legacy flat file fallback
src/kai/claude.py Injected history path scoped to DATA_DIR/history/<chat_id>/

Migration

No migration step needed. Legacy flat files at _LOG_DIR/*.jsonl are included in reads transparently via the existing chat_id filter in the record-parsing loop. They age out naturally as new writes go to per-user directories.

Test plan

  • test_creates_per_user_directory - log_message creates <chat_id>/ subdirectory
  • test_different_users_get_separate_directories - users 111 and 222 get separate dirs
  • test_no_flat_file_created - new writes don't go to the flat root
  • test_reads_only_target_user - get_recent_history(chat_id=111) returns only 111's messages
  • test_none_chat_id_returns_all - get_recent_history(chat_id=None) returns all users
  • test_legacy_flat_files_included - pre-migration flat files are included in reads
  • test_new_user_empty_history - user with no directory gets empty string
  • All 1113 tests pass, make check clean

Fixes #157

Write conversation logs to DATA_DIR/history/<chat_id>/YYYY-MM-DD.jsonl
instead of a shared flat file. Each user's messages are physically
isolated on disk, so grep/jq searches are naturally scoped and one
user's history can be managed independently.

Legacy flat files (pre-migration) are included in reads transparently
via the existing chat_id filter. They age out naturally as new writes
go to per-user directories. No migration step needed.

Inner Claude's injected history path is scoped to the user's
subdirectory so searches target the right directory automatically.

Fixes #157
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: fix/history-per-user

The core change is correct and the isolation logic is sound. A few issues worth flagging:


Warning — test_excludes_other_users is a tautological no-op (tests/test_history.py:+63)

assert "private" not in result or "also private" in result

User 222's message is "also private", which contains "private" as a substring. So "private" not in result is always False (since "also private" is in the result). The or clause then makes the entire assertion vacuously true — it passes even if user 111's "private" leaks into user 222's history. The test provides zero isolation coverage for user 222.

Fix: use non-overlapping strings, e.g. "alice" / "bob", matching the pattern in test_reads_only_target_user.


Warning — Falsy check in claude.py should be is not None (claude.py:+419)

history_dir = str(DATA_DIR / "history" / str(chat_id)) if chat_id else str(DATA_DIR / "history")

if chat_id is False for both None and 0. Telegram doesn't assign chat ID 0, but the type annotation is int | None and the rest of the file consistently uses if chat_id is not None. This is inconsistent and fragile — use if chat_id is not None to match the type contract.


Suggestion — log_message() has no guard for chat_id=None (history.py:+62)

user_dir = _LOG_DIR / str(chat_id)   # creates _LOG_DIR/None/ if chat_id is None

The type signature allows chat_id: int | None, so a None call silently creates a _LOG_DIR/None/ directory. This then shows up in rglob scans for the admin view but not in per-user reads (since _LOG_DIR/None/ won't match any integer chat_id). A guard or assertion at the top of log_message would make the contract explicit.


Clean areas

  • The set-union deduplication for legacy + per-user files is correct.
  • The if f.is_file() filter on the legacy glob is correct.
  • Path traversal is not a concern since chat_id is typed/enforced as int.
  • The rglob fallback for chat_id=None and the legacy read-through are well-reasoned and properly documented.

Test used overlapping strings ('private' / 'also private') making the
assertion vacuously true. Use distinct strings ('from alice' / 'from
bob') for real isolation coverage.

Also fix falsy check to 'is not None' in claude.py history path
injection, consistent with the rest of the codebase.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: fix/history-per-user (updated)

The two prior warnings have been addressed. One minor new finding.


Fixed since last review

  • test_excludes_other_users: Now uses "from alice" / "from bob" — non-overlapping strings, assertion is no longer vacuous. Correct.
  • claude.py:+419: Now uses if chat_id is not None. Correct.

Suggestion — Legacy record in test is missing media field (tests/test_history.py:+237)

legacy_record = {
    "ts": datetime.now(UTC).isoformat(),
    "dir": "user",
    "chat_id": 111,
    "text": "old message",
    # "media" key absent
}

The log_message() function always writes a media field. This synthetic record omits it. If the parsing loop in get_recent_history ever accesses record["media"] rather than record.get("media"), the test would fail with a KeyError on this record rather than exercising the legacy read-through path. Since all tests pass, the parser uses .get(), but the test record should match the actual on-disk format to avoid brittle future breakage:

legacy_record = {
    "ts": datetime.now(UTC).isoformat(),
    "dir": "user",
    "chat_id": 111,
    "text": "old message",
    "media": None,
}

Clean areas

  • Set-union deduplication for legacy + per-user files is correct (different Path objects even with same p.name).
  • rglob for the chat_id=None branch correctly picks up both legacy flat files and per-user subdirectory files.
  • Sort key p.name (date string) interleaves same-date files from different users chronologically — correct for the admin view.
  • test_no_flat_file_created and test_different_users_get_separate_directories close the coverage gaps from the old tests cleanly.

The PR is ready to merge once the test record is patched (or confirmed a non-issue by reading the parsing loop).

@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: fix/history-per-user (second update)

No new issues found. The diff is clean relative to prior review state.

The one open item from the last review — the missing media field in the test_legacy_flat_files_included synthetic record (tests/test_history.py, legacy_record) — is still unaddressed, but you've seen it and made your call. Since the parsing loop uses .get(), it's not a correctness risk.

Everything else checks out:

  • chat_id is not None guard in claude.py — correct
  • Non-overlapping strings in test_excludes_other_users — correct
  • set-union deduplication for legacy + per-user files — correct; the paths are always distinct since legacy files are directly under _LOG_DIR and per-user files are in subdirectories
  • _LOG_DIR.glob("*.jsonl") (non-recursive) for legacy scan — correct
  • rglob for chat_id=None admin view — correct
  • No path traversal risk (chat_id is typed int)

Ready to merge.

@dcellison dcellison merged commit 961aa3e into main Mar 26, 2026
1 check passed
@dcellison dcellison deleted the fix/history-per-user branch March 26, 2026 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

History files not separated per-user on disk

2 participants