
fix(cron): schedule missing GDPR retention and orphan cleanup jobs#863

Merged
2witstudios merged 5 commits into master from pu/cron-gdpr-fix
Apr 9, 2026

Conversation

@2witstudios
Owner

@2witstudios 2witstudios commented Apr 9, 2026

Summary

  • Two cron endpoints (/api/cron/retention-cleanup and /api/cron/cleanup-orphaned-files) were fully implemented and authenticated but never scheduled in the Docker crontab — expired personal data and orphaned files accumulated indefinitely
  • Adds retention-cleanup at 1 AM UTC daily: cleans expired sessions, tokens, page versions, drive backups, permissions, and monitoring data
  • Adds orphaned file cleanup at 5 AM UTC Sundays: detects file records with zero references and deletes physical files via processor service
  • Removes security_audit_log from retention cleanup: the tamper-evident SHA-256 hash chain must be retained indefinitely, because the chain verifier has no gap-aware mode and treats any missing entry as tampering
  • Fixes non-atomic orphan delete path: DB records are now only removed for orphans whose physical files were successfully deleted; failed ones retry on the next weekly run
  • Updates endgame prototype: DatabasePane shows 10 cron jobs, CompliancePane promotes orphan cleanup from amber to green
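A minimal sketch of why retention cleanup and a strict hash chain cannot coexist, assuming a simple scheme in which each entry stores the SHA-256 hash of its predecessor. `AuditEntry`, `hashEntry`, `appendEntry`, and `verifyChain` are hypothetical illustrations, not the actual security_audit_log schema or verifier. A verifier that walks forward from a fixed genesis value cannot tell "old rows were purged" apart from "rows were removed by an attacker".

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry shape; the real security_audit_log columns are not shown in this PR.
interface AuditEntry {
  seq: number;
  payload: string;
  prevHash: string;
  hash: string;
}

// Fixed starting value that the first entry links to.
const GENESIS = "0".repeat(64);

function hashEntry(seq: number, payload: string, prevHash: string): string {
  return createHash("sha256").update(`${seq}|${payload}|${prevHash}`).digest("hex");
}

// Appends an entry whose prevHash points at the current tail of the chain.
function appendEntry(chain: AuditEntry[], payload: string): AuditEntry[] {
  const tail = chain.length > 0 ? chain[chain.length - 1] : undefined;
  const seq = tail ? tail.seq + 1 : 1;
  const prevHash = tail ? tail.hash : GENESIS;
  return [...chain, { seq, payload, prevHash, hash: hashEntry(seq, payload, prevHash) }];
}

// Strict verifier with no gap-aware mode: each entry must link to the one before it,
// starting from GENESIS, and its stored hash must match a recomputation.
function verifyChain(chain: AuditEntry[]): boolean {
  let expectedPrev = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== expectedPrev) return false; // missing (or reordered) predecessor
    if (entry.hash !== hashEntry(entry.seq, entry.payload, entry.prevHash)) return false; // altered entry
    expectedPrev = entry.hash;
  }
  return true;
}
```

Building a three-entry chain and then dropping its oldest entry (what an age-based retention job would do) makes `verifyChain` return `false`, the same signal it gives for genuine tampering.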

Changes

| File | Change |
|---|---|
| docker/cron/crontab | 2 new cron entries |
| packages/lib/src/compliance/retention/monitoring-retention.ts | Remove cleanupSecurityAuditLog() for infinite hash chain retention |
| packages/lib/src/compliance/retention/monitoring-retention.test.ts | Remove security audit log test cases |
| packages/lib/src/compliance/retention/retention-engine.test.ts | Update expected table counts (12→11) |
| apps/web/src/app/api/cron/cleanup-orphaned-files/route.ts | Gate DB deletion on successful physical delete |
| prototypes/pagespace-endgame/src/components/panes/DatabasePane.tsx | Add 2 cron jobs, update count 8→10 |
| prototypes/pagespace-endgame/src/components/panes/CompliancePane.tsx | Split PII/orphan card, orphan now green |

Test plan

  • Retention and monitoring-retention tests pass (31/31)
  • TypeScript check passes for packages/lib
  • Verify crontab syntax is valid
  • Deploy to staging and confirm both endpoints respond to cron-curl with 200
  • Check /var/log/cron/retention-cleanup.log and /var/log/cron/orphan-cleanup.log populate after schedule fires
  • Confirm existing cron jobs are unaffected
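To sanity-check the two schedules described above, here is a minimal TypeScript sketch of a 5-field cron matcher. It assumes the new entries use the expressions `0 1 * * *` (daily at 01:00) and `0 5 * * 0` (Sundays at 05:00); the actual crontab lines are not quoted in this PR. It only understands `*` or a single integer per field, which is enough for these two schedules but is no substitute for loading the real crontab into cron.

```typescript
// Minimal 5-field cron matcher that supports only "*" or a single integer per field.
// Real cron also accepts lists, ranges and steps, and ORs day-of-month with
// day-of-week when both are restricted; this sketch ANDs them, which gives the same
// result for schedules that restrict at most one of the two.

type Field = number | "*";

// Allowed [min, max] per field: minute, hour, day of month, month, day of week.
const RANGES: ReadonlyArray<[number, number]> = [
  [0, 59],
  [0, 23],
  [1, 31],
  [1, 12],
  [0, 7], // 0 and 7 both mean Sunday
];

function parseCron(expr: string): Field[] {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) throw new Error(`expected 5 fields, got ${parts.length}`);
  return parts.map((part, i): Field => {
    if (part === "*") return "*";
    if (!/^\d+$/.test(part)) throw new Error(`unsupported field "${part}"`);
    const value = Number(part);
    const [min, max] = RANGES[i];
    if (value < min || value > max) throw new Error(`field ${i} out of range: ${value}`);
    return value;
  });
}

// Evaluates the schedule against UTC clock fields (assumes the cron container runs in UTC).
function matchesUtc(expr: string, date: Date): boolean {
  const [minute, hour, dom, month, dow] = parseCron(expr);
  const ok = (field: Field, actual: number): boolean => field === "*" || field === actual;
  const weekday = date.getUTCDay(); // 0 = Sunday
  return (
    ok(minute, date.getUTCMinutes()) &&
    ok(hour, date.getUTCHours()) &&
    ok(dom, date.getUTCDate()) &&
    ok(month, date.getUTCMonth() + 1) &&
    (ok(dow, weekday) || (dow === 7 && weekday === 0))
  );
}
```

For example, `matchesUtc("0 5 * * 0", new Date(Date.UTC(2026, 3, 12, 5, 0)))` is `true` because April 12, 2026 is a Sunday, while the same time on Thursday, April 9 does not match.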

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added automated daily retention cleanup (1:00am UTC) and weekly orphaned-file cleanup (Sunday 5:00am UTC) cron tasks; UI updated to show both jobs and a new "Orphaned file cleanup" card.
  • Bug Fixes / Behavior Changes

    • Orphaned-file cleanup now only removes database records when physical deletions succeed.
    • Security audit log removed from automated retention cleanup.

Two cron endpoints were fully implemented but never scheduled, causing
expired personal data and orphaned files to accumulate indefinitely.

- Add retention-cleanup at 1 AM UTC daily (sessions, tokens, versions, backups, permissions, monitoring)
- Add orphaned file cleanup at 5 AM UTC Sundays (zero-reference file records + physical files)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

Warning

Rate limit exceeded

@2witstudios has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 57 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cbe40390-6f8c-41bd-9585-dd2790fa7686

📥 Commits

Reviewing files that changed from the base of the PR and between a8489b5 and b58db10.

📒 Files selected for processing (2)
  • prototypes/pagespace-endgame/src/components/panes/CompliancePane.tsx
  • prototypes/pagespace-endgame/src/components/panes/GdprPane.tsx
📝 Walkthrough

Walkthrough

Added two UTC cron jobs (daily retention cleanup and weekly orphaned-files cleanup); orphaned-files cleanup now only deletes DB records for files successfully physically deleted; removed security_audit_log retention logic and related tests.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Cron Job Configuration | docker/cron/crontab | Added two scheduled maintenance tasks: daily retention-cleanup at 01:00 UTC and weekly cleanup-orphaned-files on Sundays at 05:00 UTC, each logging to a separate file. |
| Orphan cleanup API | apps/web/src/app/api/cron/cleanup-orphaned-files/route.ts | Tracks the outcome of each physical delete and deletes DB records only for safeToDelete IDs; DB records are no longer deleted unconditionally when a physical deletion fails or the storage path is malformed. |
| Retention monitoring core | packages/lib/src/compliance/retention/monitoring-retention.ts, packages/lib/src/compliance/retention/monitoring-retention.test.ts | Removed security_audit_log cleanup: deleted the import and function, removed securityAuditDays from RetentionConfig and getRetentionConfig, and updated tests to no longer expect security audit cleanup. |
| Retention engine tests | packages/lib/src/compliance/retention/retention-engine.test.ts | Updated expectations and mocks to reflect the removal of security_audit_log from the retention cleanup result set (expected count 12→11). |
| UI / Prototypes | prototypes/pagespace-endgame/src/components/panes/CompliancePane.tsx, prototypes/pagespace-endgame/src/components/panes/DatabasePane.tsx | Split the PII/orphan card into separate “PII scrubber” and new “Orphaned file cleanup” cards; increased the displayed cron job count from 8 to 10 and added retention and orphan cron entries to the Database pane. |
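The 12→11 change in the retention engine tests follows from the engine simply no longer covering security_audit_log. A minimal sketch, assuming a per-table policy list with a retention window in days; `TablePolicy`, `Row`, `countExpired`, and the table names are hypothetical and do not reflect the real RetentionConfig fields.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical policy entry: table names and day counts are illustrative only.
interface TablePolicy {
  table: string;
  retentionDays: number;
}

interface Row {
  table: string;
  createdAt: Date;
}

// For each registered table, counts rows created before that table's cutoff.
// A DB-backed engine would more likely issue one DELETE per table with the same cutoff.
function countExpired(policies: TablePolicy[], rows: Row[], now: Date): Map<string, number> {
  const result = new Map<string, number>();
  for (const { table, retentionDays } of policies) {
    const cutoff = now.getTime() - retentionDays * DAY_MS;
    const expired = rows.filter((r) => r.table === table && r.createdAt.getTime() < cutoff).length;
    result.set(table, expired);
  }
  return result;
}
```

A table that is never registered, as security_audit_log is after this change, is never counted or touched, so the result set shrinks by one entry instead of needing a special "keep forever" retention value.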

Sequence Diagram(s)

sequenceDiagram
  participant Cron as Cron (container)
  participant Web as Web API (web:3000)
  participant Processor as Processor Service
  participant DB as Database

  Cron->>Web: GET /api/cron/cleanup-orphaned-files
  Web->>DB: query orphaned file records -> orphanIds
  loop for each orphanId
    Web->>Processor: request physical delete (storagePath)
    alt Processor responds OK
      Processor-->>Web: 200 OK
      Web->>Web: add id to safeToDelete (collected in memory)
    else Processor fails / error / malformed path
      Processor-->>Web: non-OK / exception
      Web-->>Web: record failure (exclude from safeToDelete)
    end
  end
  alt safeToDelete not empty
    Web->>DB: delete file records WHERE id IN safeToDelete
    DB-->>Web: returning deleted ids
  else no safeToDelete
    Web->>Web: skip DB deletion (dbDeleted = 0)
  end
  Web-->>Cron: HTTP response (summary/log)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: scheduling two missing GDPR-related cron jobs (retention cleanup and orphan cleanup) that were previously implemented but not scheduled in the crontab. |


@vercel

vercel Bot commented Apr 9, 2026

The latest updates on your projects.

| Project | Deployment | Updated (UTC) |
|---|---|---|
| pagespace-master-plan | Ready | Apr 9, 2026 4:53pm |

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker/cron/crontab`:
- Around line 42-44: The scheduled endpoint /api/cron/cleanup-orphaned-files
currently removes DB records even when physical file deletion fails; open the
route handler in cleanup-orphaned-files/route.ts and change the flow so DB
deletions are conditional on successful physical deletes: for each orphaned
record attempt the physical delete with retries/backoff (or mark and enqueue for
retry), collect any failures and abort DB row removal when any physical delete
fails, or mark failed records with a persistent "delete_failed" state instead of
removing them; additionally return a non-2xx/error status when any physical
deletes failed so the cron caller can see the failure.
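The merged fix takes the simpler route of leaving failed orphans in place for the next weekly run, but the reviewer's alternative (retry with backoff, plus a non-2xx status when anything still fails) could look roughly like this. `withRetry` and `statusFor` are hypothetical helpers, the delays and the choice of 500 are arbitrary, and the injectable `sleep` parameter exists only to keep the sketch testable.

```typescript
// Retries `attempt` with exponential backoff: waits baseDelayMs, then 2x, 4x, ...
// between attempts, and rethrows the last error once maxAttempts is exhausted.
async function withRetry<T>(
  attempt: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i < maxAttempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}

// Maps the number of orphans whose physical delete still failed to an HTTP status,
// so the cron caller can see the failure instead of a plain 200.
function statusFor(failedPhysicalDeletes: number): number {
  return failedPhysicalDeletes === 0 ? 200 : 500;
}
```

With three attempts and a 200 ms base delay, a call that fails twice and then succeeds waits 200 ms and then 400 ms before its third attempt.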

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 75ac8a04-c475-4f4a-b43f-7ac68ce7481e

📥 Commits

Reviewing files that changed from the base of the PR and between 3817aab and ed033f0.

📒 Files selected for processing (1)
  • docker/cron/crontab

Comment thread docker/cron/crontab
…type

Remove security_audit_log from retention cleanup — the tamper-evident
hash chain requires infinite retention since the chain verifier has no
gap handling and treats missing entries as tampering. Update endgame
prototype to reflect the two newly scheduled cron jobs (retention-cleanup
at 1am, orphan-cleanup Sundays 5am) and promote orphan cleanup status
from amber to green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address CodeRabbit review: if processor is down and physical file
deletion fails, the DB record is now preserved so it retries on the
next weekly run instead of creating a permanent storage leak.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@2witstudios
Owner Author

Re: CodeRabbit review — non-atomic orphan delete path

Good catch. Fixed in a8489b5 — orphans whose physical file deletion fails are now excluded from the DB deletion batch. They'll be rediscovered as orphans on the next weekly run and retried automatically.

Specifically in apps/web/src/app/api/cron/cleanup-orphaned-files/route.ts:

  • Added a safeToDelete filter that excludes IDs in failedPhysicalDeletes
  • Only passes successfully-cleaned orphan IDs to deleteFileRecords()
  • If all physical deletes fail, zero DB records are removed

This prevents a permanent storage leak: previously, a transient processor failure followed by DB record deletion left physical files that no record referenced anymore.
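The gating described above can be sketched as follows. `safeToDelete`, `failedPhysicalDeletes`, and `deleteFileRecords()` are names taken from the comment; the `Orphan` type, the `deletePhysical` callback standing in for the processor service, the summary shape, and the function signature are assumptions, not the route's actual code.

```typescript
// Hypothetical types; the real route's orphan shape and helpers may differ.
interface Orphan {
  id: string;
  storagePath: string;
}

interface CleanupSummary {
  physicalDeleted: number;
  failedPhysicalDeletes: string[];
  dbDeleted: number;
}

async function cleanupOrphans(
  orphans: Orphan[],
  deletePhysical: (storagePath: string) => Promise<boolean>, // stands in for the processor service call
  deleteFileRecords: (ids: string[]) => Promise<number>,
): Promise<CleanupSummary> {
  const failedPhysicalDeletes: string[] = [];
  for (const orphan of orphans) {
    try {
      if (!(await deletePhysical(orphan.storagePath))) failedPhysicalDeletes.push(orphan.id);
    } catch {
      failedPhysicalDeletes.push(orphan.id); // processor unreachable or malformed path
    }
  }
  const failed = new Set(failedPhysicalDeletes);
  const safeToDelete = orphans.map((o) => o.id).filter((id) => !failed.has(id));
  // Failed orphans keep their DB rows, so the next weekly run rediscovers and retries them.
  const dbDeleted = safeToDelete.length > 0 ? await deleteFileRecords(safeToDelete) : 0;
  return {
    physicalDeleted: orphans.length - failedPhysicalDeletes.length,
    failedPhysicalDeletes,
    dbDeleted,
  };
}
```

If every physical delete fails, `safeToDelete` is empty and `deleteFileRecords` is never called, which matches the "zero DB records are removed" case.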

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@prototypes/pagespace-endgame/src/components/panes/CompliancePane.tsx`:
- Around line 175-180: Update the text in the CompliancePane component (the
"Orphaned file cleanup" paragraph) to clarify that DB records are removed only
when the physical file deletion succeeds; replace the phrase "then removes DB
records" with wording such as "and, if the physical deletion succeeds, removes
the corresponding DB records" so the description matches the cron route's
conditional DB deletion behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ba86b7cf-32b1-46ee-a1dc-c673af0d01c4

📥 Commits

Reviewing files that changed from the base of the PR and between ed033f0 and a8489b5.

📒 Files selected for processing (6)
  • apps/web/src/app/api/cron/cleanup-orphaned-files/route.ts
  • packages/lib/src/compliance/retention/monitoring-retention.test.ts
  • packages/lib/src/compliance/retention/monitoring-retention.ts
  • packages/lib/src/compliance/retention/retention-engine.test.ts
  • prototypes/pagespace-endgame/src/components/panes/CompliancePane.tsx
  • prototypes/pagespace-endgame/src/components/panes/DatabasePane.tsx

Update CompliancePane wording to reflect that DB records are only
removed for orphans whose physical file deletion succeeded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@2witstudios
Owner Author

Re: CodeRabbit review — orphan cleanup wording

Fixed in 70641cd — updated the CompliancePane prototype text to say "attempts physical deletion via the processor service, then removes DB records only for files whose physical deletion succeeded" to match the conditional behavior.

…l behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@2witstudios 2witstudios merged commit 4ae850d into master Apr 9, 2026
5 checks passed
2witstudios added a commit that referenced this pull request Apr 9, 2026
Merge origin/master — both the calendar-trigger cron (this branch)
and the orphaned-file-cleanup cron (#863) are kept.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@2witstudios 2witstudios deleted the pu/cron-gdpr-fix branch April 9, 2026 19:02
2witstudios added a commit that referenced this pull request Apr 10, 2026
Update for #865 (Redis export rate limit), #866 (activity log PII
exclusion), #867 (activity chain serialization), #868-870 (audit
service wiring), #863 (GDPR cron jobs), #861 (password auth removed).

Fix code review comments: AI usage log deletion is explicit call not
FK cascade; note shared-page assistant messages survive account
deletion; resolve P2 #8 and #9 contradictions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2witstudios added a commit that referenced this pull request Apr 10, 2026
* docs: update compliance doc and prototype panes to reflect implemented fixes

DSAR export, message hard-delete, AI log purge on account deletion, and
audit chain verification are now implemented — update stale gap claims.
Note in-progress work on pu/hash-chain-pii and pu/export-rate-limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: reflect merged PRs and fix stale review comments

Update for #865 (Redis export rate limit), #866 (activity log PII
exclusion), #867 (activity chain serialization), #868-870 (audit
service wiring), #863 (GDPR cron jobs), #861 (password auth removed).

Fix code review comments: AI usage log deletion is explicit call not
FK cascade; note shared-page assistant messages survive account
deletion; resolve P2 #8 and #9 contradictions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix stale password auth reference, add userId:null caveat

Section 7.1 referenced "local email+password auth" — password auth was
removed in #861; on-prem now uses magic links + passkeys. GdprPane
message deletion cards now note shared-page assistant messages with
userId: null may survive account deletion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(prototype): restructure panes so resolved items live in Current

Resolved items were sitting in Gaps/End Game with "Fixed"/"Done" labels,
breaking the narrative flow. Now:
- Current: hash chain integrity, distributed rate limit, message
  hard-delete all live in their natural subsections
- Gaps: only genuine gaps remain (cookie consent, data residency,
  SIEM, audit coverage, agent trails)
- End Game: only future work (no "Done" items cluttering the roadmap)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet


1 participant