fix(kilobase): fix hourly backup bug and add WAL retention cleanup#7757

Merged
h0lybyte merged 1 commit into dev from trunk/s3-backup-cleanup-1772923406
Mar 7, 2026

@h0lybyte h0lybyte commented Mar 7, 2026

Summary

  • Fixes backup running every hour instead of daily — CNPG uses 6-field cron (with seconds), so 0 2 * * * was parsed as sec=0 min=2 hour=* (hourly at :02). Fixed to 0 0 2 * * * (daily at 2:00 AM UTC).
  • Adds WAL retention cleanup: instanceSidecarConfiguration.retentionPolicyIntervalSeconds: 1800 runs barman cleanup every 30 minutes, removing WAL files and base backups beyond the retention window.
  • Increases retention from 3d to 7d — keeps a full week of backups for safety.
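
The cron mix-up in the first bullet can be illustrated with a small sketch. It is not CNPG's parser: the names `read_as_six_field` and `runs_per_day` are hypothetical, only `*` and single integers are handled, and padding a missing trailing field with `*` is an assumption that matches the behavior described above.

```python
# Illustrative only: positional reading of a cron expression as 6 fields
# (seconds first). Function names are hypothetical, not part of CNPG.
FIELDS_6 = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def read_as_six_field(expr: str) -> dict[str, str]:
    """Assign the whitespace-separated fields to 6-field names, left to right."""
    parts = expr.split()
    if len(parts) > len(FIELDS_6):
        raise ValueError(f"too many fields in {expr!r}")
    # Assumption: missing trailing fields behave like '*'.
    parts += ["*"] * (len(FIELDS_6) - len(parts))
    return dict(zip(FIELDS_6, parts))


def runs_per_day(expr: str) -> int:
    """Count firings per day; supports only '*' or a single integer per field."""
    fields = read_as_six_field(expr)

    def count(value: str, size: int) -> int:
        if value == "*":
            return size
        int(value)  # raises ValueError for anything this sketch does not support
        return 1

    return (
        count(fields["second"], 60)
        * count(fields["minute"], 60)
        * count(fields["hour"], 24)
    )


old = read_as_six_field("0 2 * * *")    # second=0, minute=2, hour=*
new = read_as_six_field("0 0 2 * * *")  # second=0, minute=0, hour=2
```

Under these assumptions, `runs_per_day("0 2 * * *")` gives 24 and `runs_per_day("0 0 2 * * *")` gives 1, which is the hourly versus daily difference the fix addresses.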

Root Cause of S3 Cost Increase

The 5-field cron 0 2 * * * was creating 24 base backups per day instead of 1. Combined with no WAL cleanup, S3 storage was growing unbounded. This has been running since the ScheduledBackup was created (~101 days ago).
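
As a rough worked estimate using only the figures quoted above (24 runs per day instead of 1, about 101 days), the number of scheduled backup attempts can be compared as follows. This counts schedule firings, not objects actually stored in S3, since some backups failed and the day count is approximate.

```python
# Back-of-the-envelope count of scheduled backup attempts.
# All inputs come from the PR description; DAYS_RUNNING is approximate.
DAYS_RUNNING = 101
INTENDED_PER_DAY = 1
ACTUAL_PER_DAY = 24


def scheduled_attempts(per_day: int, days: int) -> int:
    """Total schedule firings over the given number of days."""
    return per_day * days


intended = scheduled_attempts(INTENDED_PER_DAY, DAYS_RUNNING)  # 101
actual = scheduled_attempts(ACTUAL_PER_DAY, DAYS_RUNNING)      # 2424
extra = actual - intended                                      # 2323
```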

Post-merge Manual Steps

After ArgoCD syncs, delete the 29 failed Backup custom resources:

kubectl get backups.postgresql.cnpg.io -n kilobase --no-headers | grep failed | awk '{print $1}' | xargs kubectl delete backups.postgresql.cnpg.io -n kilobase

The WAL cleanup sidecar will automatically start purging old WAL files from S3 within 30 minutes of deployment.
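
The `grep failed` filter in the cleanup command matches any line containing that word, including a backup whose name happens to contain it. A stricter alternative, sketched here under the assumption that failed Backup objects report `status.phase` as `failed`, filters on the JSON output instead. The helper names `failed_backup_names` and `delete_failed_backups` are hypothetical, and the delete step defaults to a dry run.

```python
import json
import subprocess

RESOURCE = "backups.postgresql.cnpg.io"


def failed_backup_names(backup_list: dict) -> list[str]:
    """Return names of items whose status.phase is exactly 'failed'."""
    return [
        item["metadata"]["name"]
        for item in backup_list.get("items", [])
        if item.get("status", {}).get("phase") == "failed"
    ]


def delete_failed_backups(namespace: str = "kilobase", dry_run: bool = True) -> list[str]:
    """List Backup objects with kubectl, then delete (or just print) the failed ones."""
    raw = subprocess.run(
        ["kubectl", "get", RESOURCE, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    names = failed_backup_names(json.loads(raw))
    for name in names:
        cmd = ["kubectl", "delete", RESOURCE, name, "-n", namespace]
        if dry_run:
            print(" ".join(cmd))  # show what would be deleted
        else:
            subprocess.run(cmd, check=True)
    return names
```

Calling `delete_failed_backups()` prints the delete commands without running them; passing `dry_run=False` performs the deletion.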

Test plan

  • ScheduledBackup shows nextScheduleTime ~24h from now (not ~1h)
  • No new backup created at the next hour mark
  • WAL cleanup runs within 30 minutes (check barman-cloud sidecar logs)
  • S3 bucket size starts decreasing over the next few hours
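
For the first check, it can help to know what value to expect in nextScheduleTime. The sketch below computes the next firing for the corrected schedule (daily at 02:00:00 UTC) and, for comparison, the old behavior (every hour at :02). The function names are hypothetical and the calculation is plain datetime arithmetic, not CNPG's scheduler.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 2) -> datetime:
    """Next occurrence of hour:00:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return candidate if candidate > now else candidate + timedelta(days=1)


def next_hourly_run(now: datetime, minute: int = 2) -> datetime:
    """Next occurrence of xx:minute:00 strictly after `now` (the old behavior)."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    return candidate if candidate > now else candidate + timedelta(hours=1)


# Example: shortly after the merge, at 00:30 UTC on Mar 8, 2026.
now = datetime(2026, 3, 8, 0, 30, tzinfo=timezone.utc)
expected_new = next_daily_run(now)   # 2026-03-08 02:00:00+00:00
expected_old = next_hourly_run(now)  # 2026-03-08 01:02:00+00:00
```

After the first 02:00 run, the next value should be a full day later, which is the "~24h from now" the check refers to.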

🤖 Generated with Claude Code

The ScheduledBackup cron '0 2 * * *' was a 5-field format being
parsed as 6-field by CNPG (sec=0 min=2 hour=* → every hour at :02).
Fix to '0 0 2 * * *' for daily at 2 AM UTC. This was causing 24x
more base backups per day and inflating S3 costs.

Also adds instanceSidecarConfiguration with retentionPolicyIntervalSeconds
to clean up WAL files beyond the retention window. Increased retention
from 3d to 7d for safety.

github-actions bot commented Mar 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@h0lybyte h0lybyte merged commit 590419a into dev Mar 7, 2026
5 checks passed
@h0lybyte h0lybyte deleted the trunk/s3-backup-cleanup-1772923406 branch March 7, 2026 23:58