feat(analytics): make retention windows configurable #719
Add analytics.rawRetentionMs and analytics.aggregateRetentionMs config params to let operators age out hdb_analytics data faster than the hardcoded defaults (1h raw / 1y aggregate). Defaults are unchanged, so existing deployments are unaffected.

Raw retention is clamped to at least the aggregation period so that raw records are not deleted before they can be rolled up (this avoids silent data loss when the value is set below the aggregation cadence).

Closes: #566
Jira: CORE-3074
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
// Clamp raw retention to at least one full aggregation period so raw records
// are never deleted before they can be rolled up.
const rawRetentionMs = Math.max(envGet(CONFIG_PARAMS.ANALYTICS_RAWRETENTIONMS) ?? RAW_EXPIRATION, AGGREGATE_PERIOD);
const aggregateRetentionMs = envGet(CONFIG_PARAMS.ANALYTICS_AGGREGATERETENTIONMS) ?? AGGREGATE_EXPIRATION;
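To make the clamp concrete, here is a minimal sketch of how the raw retention value resolves for a few inputs. The constant values and the helper name resolveRawRetentionMs are assumptions for illustration only: RAW_EXPIRATION follows the 1h default stated in the description, while the AGGREGATE_PERIOD value is a placeholder and may not match the real constant.

```typescript
// Stand-in constants: RAW_EXPIRATION follows the 1h default from the PR
// description; AGGREGATE_PERIOD is an assumed placeholder value.
const RAW_EXPIRATION = 60 * 60 * 1000;
const AGGREGATE_PERIOD = 60 * 1000;

// Hypothetical helper that mirrors the expression in the diff above.
function resolveRawRetentionMs(configured: number | undefined): number {
  // Fall back to the default when unset, then clamp up to one aggregation period.
  return Math.max(configured ?? RAW_EXPIRATION, AGGREGATE_PERIOD);
}

const unsetValue = resolveRawRetentionMs(undefined); // default applies
const tooShort = resolveRawRetentionMs(5_000); // raised to AGGREGATE_PERIOD
const longer = resolveRawRetentionMs(2 * 60 * 60 * 1000); // left unchanged
```

Under these assumed values, a configured 5 s retention is raised to one aggregation period, while unset and larger values pass through as expected.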
aggregateRetentionMs: 0 silently deletes all aggregate data.
cleanup() computes end = Date.now() - expiration. If aggregateRetentionMs is 0, end equals Date.now(), so every key qualifies and each cleanup cycle removes every aggregate record. The schema allows minimum: 0 without documenting these semantics, so an operator who sets 0 expecting "disabled" instead loses all historical aggregates.
rawRetentionMs is protected by the Math.max(…, AGGREGATE_PERIOD) clamp, but aggregateRetentionMs has no guard.
Suggested fix — skip cleanup when the configured value is zero:
- const aggregateRetentionMs = envGet(CONFIG_PARAMS.ANALYTICS_AGGREGATERETENTIONMS) ?? AGGREGATE_EXPIRATION;
+ const aggregateRetentionMs = envGet(CONFIG_PARAMS.ANALYTICS_AGGREGATERETENTIONMS) ?? AGGREGATE_EXPIRATION;
+ if (!aggregateRetentionMs) return; // 0 = disabled
Or, if full deletion is the intended "purge" semantics for 0, document it explicitly in the schema description and raise the minimum to a meaningful floor (e.g., 1).
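The cutoff arithmetic behind this concern can be sketched in isolation. This is a simplified model, not the actual cleanup() implementation: the function name keysToDelete, the timestamp-keyed records, and the strict less-than comparison are all assumptions.

```typescript
// Simplified model of the cutoff check. The real cleanup() is not shown in
// this thread, so the comparison and record shape here are assumptions.
function keysToDelete(recordTimes: number[], expirationMs: number, nowMs: number): number[] {
  const end = nowMs - expirationMs; // mirrors end = Date.now() - expiration
  return recordTimes.filter((t) => t < end);
}

const nowMs = 1_000_000;
const recordTimes = [nowMs - 100_000, nowMs - 1_000, nowMs - 10];

// With a 5 s window only the oldest record is past the cutoff.
const withFiveSeconds = keysToDelete(recordTimes, 5_000, nowMs);

// With a 0 ms window the cutoff is "now", so every existing record qualifies.
const withZero = keysToDelete(recordTimes, 0, nowMs);
```

In this model a retention of 0 selects all three records, which is the mass-delete behavior described above.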
Reviewed; no blockers found. Prior blocker (aggregateRetentionMs=0 mass-delete) addressed by commit fe8f8b0.
…lete

Setting aggregateRetentionMs to 0 would pass Date.now() - 0 = Date.now() to cleanup(), deleting every aggregate record. Treat 0 as "keep forever" to match the storageInterval: 0 = disabled convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
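One way to express the "0 = keep forever" rule is to resolve the cutoff up front and return nothing when cleanup should be skipped. The sketch below is an assumption about the shape of such a guard, not the code in commit fe8f8b0. The helper name aggregateCleanupCutoff is hypothetical, and the 365-day value stands in for the 1y default.

```typescript
// Assumed stand-in for the 1y aggregate default mentioned in the PR description.
const AGGREGATE_EXPIRATION = 365 * 24 * 60 * 60 * 1000;

// Hypothetical helper: returns the cleanup cutoff timestamp, or null when
// aggregate cleanup should be skipped entirely.
function aggregateCleanupCutoff(configured: number | undefined, nowMs: number): number | null {
  const retentionMs = configured ?? AGGREGATE_EXPIRATION;
  if (retentionMs === 0) return null; // 0 = keep forever, so no cleanup pass
  return nowMs - retentionMs;
}

const keepForever = aggregateCleanupCutoff(0, 1_000); // null: nothing is deleted
const customWindow = aggregateCleanupCutoff(100, 1_000); // cutoff 100 ms before "now"
const defaultWindow = aggregateCleanupCutoff(undefined, AGGREGATE_EXPIRATION + 5);
```

Returning null instead of a cutoff makes the "skip" case explicit to the caller, rather than relying on a cutoff value that happens to match no records.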
Fixed the blocker in the follow-up commit: The Windows 🤖 — Claude |
Summary
- Add analytics.rawRetentionMs and analytics.aggregateRetentionMs config params
- Add them to CONFIG_PARAMS (hdbTerms.ts) and document in config-root.schema.json
- startScheduledTasks() in resources/analytics/write.ts reads configured values (with existing constants as defaults)

Purpose
Closes #566 / Jira CORE-3074. A customer's system.mdb reached ~5 GB of analytics data (31M rows / 1 year of aggregates). Operators previously couldn't shorten the window without a code change.

Notable design decision
Raw retention is clamped to Math.max(configured, AGGREGATE_PERIOD) at startup. If the configured value is shorter than the aggregation cadence, raw records could be deleted before aggregation() rolls them up, a silent data loss path flagged in cross-model review. The schema description notes this behavior.

🤖 Generated by Claude on behalf of Kris.