Skip to content

Add age-based cleanup for uploaded files#188

Merged
dcellison merged 5 commits intomainfrom
feature/file-cleanup
Mar 26, 2026
Merged

Add age-based cleanup for uploaded files#188
dcellison merged 5 commits intomainfrom
feature/file-cleanup

Conversation

@dcellison
Copy link
Copy Markdown
Owner

Summary

Add age-based cleanup for uploaded files. New FILE_RETENTION_DAYS env var (default 0 = disabled) enables periodic deletion of files older than N days. Cleanup runs once every 24 hours as a background task.

How it works

  • File age is determined from the YYYYMMDD_HHMMSS timestamp prefix that _save_upload() prepends to every filename
  • Files without a recognized prefix are never deleted (protects manually-placed files)
  • Empty per-user directories are removed after cleanup
  • The feature is opt-in with a conservative default (0 = keep everything forever)

Configuration

FILE_RETENTION_DAYS=30  # delete files older than 30 days
# FILE_RETENTION_DAYS=0  # or unset = keep everything (default)

Changes

File Change
src/kai/config.py file_retention_days field, parsed from FILE_RETENTION_DAYS
src/kai/main.py _file_age() timestamp parser, _file_cleanup_loop() background task
.env.example FILE_RETENTION_DAYS entry with comment

Test plan

  • test_parses_valid_timestamp - extracts datetime from filename prefix
  • test_returns_none_for_no_prefix - non-timestamped files return None
  • test_returns_none_for_malformed_timestamp - invalid dates return None
  • test_returns_none_for_partial_match - incomplete patterns return None
  • test_deletes_old_files - files past retention cutoff are deleted
  • test_preserves_files_without_timestamp - non-timestamped files survive
  • test_removes_empty_user_directories - empty per-user dirs cleaned up
  • test_leaves_nonempty_user_directories - non-empty dirs left intact
  • test_handles_missing_files_directory - no crash when files/ doesn't exist
  • test_handles_unlink_oserror - permission errors counted, not propagated
  • All 1172 tests pass, make check clean

Fixes #151

New FILE_RETENTION_DAYS env var (default 0 = disabled) enables
periodic deletion of uploaded files older than N days. Cleanup runs
once every 24 hours as a background task.

File age is determined from the YYYYMMDD_HHMMSS timestamp prefix
that _save_upload() prepends to every filename. Files without a
recognized prefix are never deleted.

Empty per-user directories are removed after cleanup. The feature
is opt-in with a conservative default so existing deployments are
unaffected.

Fixes #151
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Overall: Clean PR with two findings worth addressing.


Warning

FILE_RETENTION_DAYS with a non-integer value crashes at startup (config.py:915)

file_retention_days=int(os.environ.get("FILE_RETENTION_DAYS", "0")),

FILE_RETENTION_DAYS=thirty raises an unguarded ValueError during load_config(), crashing the bot with a confusing traceback. A misconfigured env var should produce a clear error, not an uncaught exception. Wrap it or validate before the int() call:

raw = os.environ.get("FILE_RETENTION_DAYS", "0")
try:
    file_retention_days = int(raw)
except ValueError:
    raise ValueError(f"FILE_RETENTION_DAYS must be an integer, got: {raw!r}")

Suggestion

Cleanup task is never explicitly cancelled on shutdown (main.py:335-340)

cleanup_task = asyncio.create_task(_file_cleanup_loop(...))
app.bot_data["cleanup_task"] = cleanup_task

The comment says "task self-cancels on loop shutdown" — that's not accurate. asyncio doesn't automatically cancel tasks on loop close; it just destroys them with a Task was destroyed but it is pending! warning. PTB likely handles this for its own tasks, but this one is external. Consider cancelling it in a shutdown hook or post-stop callback:

async def _on_stop(app):
    task = app.bot_data.get("cleanup_task")
    if task:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)

Clean

  • Regex + strptime for timestamp parsing is correct and handles all edge cases (malformed dates, partial matches, missing prefix) — the tests confirm this.
  • OSError on unlink() is properly counted and swallowed; the loop won't crash on permission errors.
  • subdir.rmdir() silently skipping non-empty dirs is the right behaviour.
  • The rglob("*") → file deletion → iterdir() → dir cleanup ordering is correct (dirs are pruned after their contents are gone).
  • Feature is safely opt-in (default 0 = disabled).
  • Test coverage is thorough.

Non-integer values now produce a clear SystemExit message instead of
an unguarded ValueError traceback. Matches the existing validation
pattern for WEBHOOK_PORT, CLAUDE_IDLE_TIMEOUT, etc.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Overall: Prior warning resolved. One new minor finding.


Resolved from prior review

The FILE_RETENTION_DAYS ValueError is now handled correctly (config.py:825-828), consistent with the WEBHOOK_PORT pattern above it.


Suggestion

Unhandled exception from rglob can permanently kill the cleanup task (main.py, _file_cleanup_loop)

for path in files_dir.rglob("*"):

files_dir.is_dir() passes, but rglob is a lazy generator — the first PermissionError (e.g., permissions changed at runtime) propagates uncaught out of the for loop, terminating _file_cleanup_loop with an exception. The task stored in bot_data["cleanup_task"] becomes done-with-exception and never runs again, with no log message indicating why cleanup stopped.

Wrapping the cleanup body in a broad try/except Exception with a logging.exception(...) call would keep the loop alive and make failures observable:

while True:
    if not files_dir.is_dir():
        await asyncio.sleep(_CLEANUP_INTERVAL)
        continue
    try:
        # ... cutoff / rglob / rmdir block ...
    except Exception:
        logging.exception("File cleanup error (will retry in %ds)", _CLEANUP_INTERVAL)
    await asyncio.sleep(_CLEANUP_INTERVAL)

Clean

  • config.py negative-value edge case is safe: the > 0 guard in _init_and_run silently treats negatives as disabled.
  • _file_age correctly operates on path.name only; no traversal risk.
  • subdir.rmdir() on a symlink-to-directory fails with OSError, caught and skipped — safe.
  • Test coverage is solid; the hardcoded 20260101 date is well outside the 30-day retention window.

…rrors

An unhandled PermissionError from rglob or iterdir would kill the
cleanup task permanently. Now logs the error and retries next cycle,
matching the webhook health monitor pattern.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review

Both prior findings have been addressed: the ValueErrorSystemExit path was fixed in the previous revision, and the try/except Exception block around the cleanup body (main concern from the second review) is now present and correctly structured.

One new finding:

Suggestion

The except Exception fix has no test coverage (main.py, _file_cleanup_loop)

The second review's specific concern was that a PermissionError from rglob would silently kill the task. The fix was added, but no test exercises that path. A future refactor could remove or narrow the except clause without any test failing.

A test along these lines would close the gap:

@pytest.mark.asyncio
async def test_rglob_exception_does_not_kill_loop(self, tmp_path, monkeypatch):
    """PermissionError from rglob is logged and the loop continues."""
    monkeypatch.setattr("kai.main.DATA_DIR", tmp_path)
    monkeypatch.setattr("kai.main._CLEANUP_STARTUP_DELAY", 0)
    monkeypatch.setattr("kai.main._CLEANUP_INTERVAL", 0)
    (tmp_path / "files").mkdir()

    call_count = 0
    async def mock_sleep(duration):
        nonlocal call_count
        call_count += 1
        if call_count > 2:
            raise asyncio.CancelledError

    with (
        patch("kai.main.asyncio.sleep", side_effect=mock_sleep),
        patch("pathlib.Path.rglob", side_effect=PermissionError("denied")),
        patch("kai.main.logging.exception") as mock_log,
    ):
        try:
            await _file_cleanup_loop(30)
        except asyncio.CancelledError:
            pass

    # Loop ran twice (not terminated after first exception)
    assert call_count == 3
    mock_log.assert_called()

Clean

  • except Exception correctly excludes asyncio.CancelledError (which is BaseException in Python 3.8+), so task cancellation propagates cleanly.
  • await asyncio.sleep(_CLEANUP_INTERVAL) sits outside the try/except, so both the success and exception paths sleep before the next iteration — correct.
  • _file_age operates on path.name only — no traversal risk.
  • subdir.rmdir() failing silently on non-empty dirs and symlink-to-dir is correct.
  • files_dir is resolved inside the function body (not at module level), so monkeypatch.setattr("kai.main.DATA_DIR", ...) in tests works correctly.

Verifies that a PermissionError from rglob does not kill the cleanup
task. The loop runs twice (call_count == 3 including startup delay),
proving the exception was caught and the loop continued.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review

Overall: Clean PR. One new finding, one compatibility note.


Suggestion

test_rglob_exception_does_not_kill_loop doesn't assert the error was logged (tests/test_main.py, bottom of diff)

The test correctly verifies the loop survives (call_count == 3), but the logging.exception call is never asserted. The prior review's suggested test included mock_log.assert_called() — the implementation dropped it. A future refactor that silently removes the logging.exception call would pass all tests, making rglob failures completely invisible in production.

Fix: add patch("kai.main.logging.exception") as mock_log to the context manager and assert mock_log.assert_called() at the end.


Warning (verify)

from datetime import UTC requires Python 3.11+ (main.py:37)

datetime.UTC was added in Python 3.11. On 3.10 and earlier this raises ImportError: cannot import name 'UTC' from 'datetime' at startup. The tests pass, so the test environment is at least 3.11 — but if the project publishes a minimum Python version lower than 3.11, this is a hard startup crash. Confirm the minimum supported version; if it's 3.11+, this is fine. If not, use timezone.utc from from datetime import datetime, timedelta, timezone.


Clean

  • All three prior findings resolved: SystemExit on bad FILE_RETENTION_DAYS, try/except Exception guarding rglob, and the rglob-exception test now present.
  • except Exception correctly excludes CancelledError (which is BaseException in Python 3.8+) — cancellation propagates cleanly.
  • await asyncio.sleep sits outside the try/except, so both success and exception paths sleep before the next iteration.
  • _file_age operates on path.name only — no traversal risk.
  • subdir.rmdir() silently skips non-empty dirs and symlink-to-dir — correct.
  • files_dir resolved inside the loop body — monkeypatching DATA_DIR in tests works correctly.
  • Test coverage is thorough across all the meaningful paths.

@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review

All four prior findings have been addressed. The PR is clean — no new bugs, security issues, or style violations.

Verified clean:

  • mock_log.assert_called() is present in test_rglob_exception_does_not_kill_loop — the final gap from the last review is closed.
  • except Exception correctly excludes CancelledError (BaseException in Python 3.8+); task cancellation propagates cleanly.
  • await asyncio.sleep is outside the try/except, so both success and error paths sleep before the next iteration — correct.
  • _file_age operates on path.name only; no path traversal risk.
  • path.unlink() removes the symlink itself, not any target — safe against symlink attacks.
  • subdir.rmdir() on a symlink-to-directory raises OSError, caught and skipped — safe.
  • files_dir is resolved after the startup delay but before the loop; monkeypatch.setattr("kai.main.DATA_DIR", ...) in tests works correctly.
  • rglob iteration count in test_rglob_exception_does_not_kill_loop is correct: 1 startup sleep + 2 loop sleeps = call_count == 3, matching "loop ran twice."

The two open items from prior reviews (task cancellation on shutdown, datetime.UTC Python 3.11+ dependency) remain unaddressed but are the author's call per the review history.

This PR is ready to merge.

@dcellison dcellison merged commit b27c962 into main Mar 26, 2026
1 check passed
@dcellison dcellison deleted the feature/file-cleanup branch March 26, 2026 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

No cleanup policy for uploaded files

2 participants