Skip to content

Make lineage retention periods configurable via table config#18282

Merged
swaminathanmanish merged 3 commits into
apache:masterfrom
abhishekbafna:configurable-lineage-retention
Apr 29, 2026
Merged

Make lineage retention periods configurable via table config#18282
swaminathanmanish merged 3 commits into
apache:masterfrom
abhishekbafna:configurable-lineage-retention

Conversation

@abhishekbafna
Copy link
Copy Markdown
Collaborator

@abhishekbafna abhishekbafna commented Apr 22, 2026

Summary

Currently, DefaultLineageManager has two hardcoded 1-day wait periods that prevent rapid segment cleanup in the lineage path:

  1. REFRESH table source segments are kept for 1 day after lineage is COMPLETED before being scheduled for deletion.
  2. Stale IN_PROGRESS/REVERTED lineage entries are not cleaned up until they are 1 day old.

Combined with the default 7-day deep-store tombstone (deletedSegmentsRetentionPeriod), segments can linger in deep store for up to 8 days with defaults. This is particularly painful for use cases that frequently swap (REFRESH) tables, causing significant storage pressure.

Changes

  • Adds two new optional period-string fields to SegmentsValidationAndRetentionConfig (same format as the existing deletedSegmentsRetentionPeriod, e.g. "1d", "12h", "0d"):
    • replacedSegmentsRetentionPeriod: controls how long REFRESH source segments are preserved after lineage is COMPLETED before being scheduled for deletion (default: 1d). Only applies to REFRESH ingestion; APPEND tables ignore this setting.
    • lineageEntryCleanupRetentionPeriod: controls how long stale IN_PROGRESS/REVERTED lineage entries wait before cleanup (default: 1d)
  • Both fields default to the existing 1-day constants when unset — no behaviour change for existing tables.
  • A DEBUG log is emitted when either value resolves to <= 0ms (immediate deletion, no rollback window) — kept at DEBUG to avoid log spam since operators deliberately opt in.
  • Unparseable values fall back to the 1-day default with a WARN log (consistent with existing deletedSegmentsRetentionPeriod behavior).
  • Log messages include both the table name and config field name for easier debugging.
  • Both retention values are parsed once per retention pass (not per lineage entry).
  • Corresponding fluent builder methods added to TableConfigBuilder.
  • No upfront validation added — consistent with the existing deletedSegmentsRetentionPeriod field which also relies on parse-with-warn-and-fallback at use-time.

Example usage — immediate cleanup for a high-frequency REFRESH table

{
  "segmentsConfig": {
    "replacedSegmentsRetentionPeriod": "0d",
    "lineageEntryCleanupRetentionPeriod": "0d",
    "deletedSegmentsRetentionPeriod": "0d"
  }
}

Test plan

  • New unit test class DefaultLineageManagerTest (11 tests) covering:
    • JSON round-trip serialization of both new fields
    • Default 1-day behaviour preserved when fields are unset
    • "0d" causes immediate deletion for both REFRESH replaced segments and stale IN_PROGRESS lineage entries
    • APPEND tables are unaffected by replacedSegmentsRetentionPeriod
    • Unparseable period strings (e.g. "foo") fall back to 1-day default for both fields

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes segment lineage cleanup behavior configurable per-table by adding two optional retention period strings to the table’s segmentsConfig (validation/retention config) and wiring them into DefaultLineageManager’s retention pass logic.

Changes:

  • Add replacedSegmentsRetentionPeriod and lineageEntryCleanupRetentionPeriod to SegmentsValidationAndRetentionConfig (+ fluent setters in TableConfigBuilder).
  • Update DefaultLineageManager.updateLineageForRetention() to parse these periods once per retention pass (with default fallback).
  • Add DefaultLineageManagerTest covering JSON round-trip and retention behavior for REFRESH vs APPEND tables.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
pinot-spi/src/main/java/org/apache/pinot/spi/utils/builder/TableConfigBuilder.java Adds builder setters and propagates the new retention fields into segmentsConfig.
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/SegmentsValidationAndRetentionConfig.java Introduces the two new optional config fields with getters/setters and documentation.
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/lineage/DefaultLineageManager.java Uses table-configurable retention periods instead of hardcoded 1-day constants and adds parsing/warn logging.
pinot-controller/src/test/java/org/apache/pinot/controller/helix/core/lineage/DefaultLineageManagerTest.java New unit tests validating defaults, 0d behavior, and APPEND unaffectedness.

Copy link
Copy Markdown
Contributor

@deeppatel710 deeppatel710 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement — the 8-day worst-case retention has definitely been a pain for frequently-refreshed tables. A few comments:

1. getRetentionMsFromConfig warn-log placement (minor noisy-log concern)

The 0d warn is emitted inside updateLineageForRetention, which runs once per retention pass per table. A REFRESH table configured with 0d and hourly retention will produce 24 warns/day/table indefinitely. Consider either:

  • Logging once per controller lifetime per (table, field) pair (e.g. via a guarded set), or
  • Dropping the warn to DEBUG — operators deliberately set 0d to opt out of the rollback window, so re-warning on every pass adds little value.

2. Strict < boundary with 0d is subtle

return lineageEntry.getTimestamp() < (System.currentTimeMillis() - replacedSegmentsRetentionMs);

With retentionMs = 0 and timestamp == now the condition is now < now → false, i.e. the segments are not deleted on the same pass they completed. Your unit tests correctly work around this with now - 1. Arguably <= would be more intuitive for 0d (delete immediately without waiting for the clock to tick), but changing it is a subtle semantic shift for the existing default path, so probably fine to leave — worth a one-line code comment though.

3. shouldDeleteReplacedSegments for APPEND + custom retention

With APPEND tables, replacedSegmentsRetentionPeriod is silently ignored (method short-circuits on non-REFRESH). The javadoc correctly reflects this, but operators who set the field on an APPEND table will get no feedback. Given the existing TODO ("Once we support data rollback for APPEND tables, we should remove this check"), this is probably the right tradeoff — but one line in the getReplacedSegmentsRetentionPeriod SPI javadoc noting "only applies to REFRESH ingestion" would save confused users a trip to the code.

4. Default constant vs. default field value

TableConfigBuilder gives _deletedSegmentsRetentionPeriod a static default (DEFAULT_DELETED_SEGMENTS_RETENTION_PERIOD), but the two new fields default to null and rely on DefaultLineageManager to fill in 1d. Minor consistency nit — either approach works, but having the default live in one place (the manager) is cleaner because null is a clear "use default" signal and the value is versioned with the consumer.

5. Unparseable value fallback

getRetentionMsFromConfig silently falls back to the default on parse failure with a warn. I think this is the right call (matches existing deletedSegmentsRetentionPeriod behavior) but the default behavior is worth documenting on the SPI javadoc so operators don't assume the controller will refuse the config.

6. Tests look solid

  • JSON round-trip ✓
  • Default preserved when unset ✓
  • 0d immediate deletion for both paths ✓
  • APPEND unaffected ✓

One gap: no test for the unparseable path (e.g. setReplacedSegmentsRetentionPeriod("foo") → falls back to 1 day). Would be worth adding to prevent silent regressions on the fallback path.

Overall LGTM once the log spam point is addressed.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.63%. Comparing base (ab9ac39) to head (1fa5d63).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18282      +/-   ##
============================================
+ Coverage     63.61%   63.63%   +0.01%     
  Complexity     1659     1659              
============================================
  Files          3246     3246              
  Lines        197514   197541      +27     
  Branches      30578    30579       +1     
============================================
+ Hits         125656   125701      +45     
+ Misses        61813    61801      -12     
+ Partials      10045    10039       -6     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.61% <100.00%> (+0.01%) ⬆️
java-21 63.60% <100.00%> (+0.02%) ⬆️
temurin 63.63% <100.00%> (+0.01%) ⬆️
unittests 63.62% <100.00%> (+0.01%) ⬆️
unittests1 55.59% <66.66%> (+<0.01%) ⬆️
unittests2 35.07% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

abhishekbafna and others added 3 commits April 24, 2026 10:49
Hardcoded 1-day waits in DefaultLineageManager prevent rapid segment
cleanup for use cases that frequently swap (REFRESH) tables. Combined
with the default 7-day deep-store tombstone, segments could linger for
up to 8 days.

Adds two new optional period-string fields to SegmentsValidationAndRetentionConfig:
- replacedSegmentsRetentionPeriod: how long REFRESH source segments are
  preserved after lineage is COMPLETED (default: 1 day)
- lineageEntryCleanupRetentionPeriod: how long stale IN_PROGRESS/REVERTED
  lineage entries wait before cleanup (default: 1 day)

Both default to the existing 1-day constants when unset, so behaviour is
unchanged for existing tables.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Downgrade 0d/negative retention log from WARN to DEBUG since operators
  deliberately opt in; avoids log spam on every retention pass
- Add code comment explaining the strict < boundary with 0ms retention
- Add "only applies to REFRESH ingestion" note to SPI Javadoc for
  replacedSegmentsRetentionPeriod
- Update shouldDeleteReplacedSegments Javadoc to reflect configurable
  retention behavior
- Pass config field name into getRetentionMsFromConfig for clearer logs;
  warn on ms <= 0 (not just == 0) to cover negative periods
- Clarify SPI Javadoc: defaults are applied by the consumer, not the getter
- Add tests for unparseable retention period falling back to 1-day default

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@abhishekbafna abhishekbafna force-pushed the configurable-lineage-retention branch from 3ebb54f to 1fa5d63 Compare April 24, 2026 05:19
@abhishekbafna
Copy link
Copy Markdown
Collaborator Author

@deeppatel710 addressed your review comments.

Copy link
Copy Markdown
Collaborator

@shounakmk219 shounakmk219 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please update the doc page with these new params as well https://docs.pinot.apache.org/operate-pinot/segment-management/consistent-push-and-rollback

@shounakmk219 shounakmk219 added the configuration Config changes (addition/deletion/change in behavior) label Apr 28, 2026
Copy link
Copy Markdown
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@abhishekbafna
Copy link
Copy Markdown
Collaborator Author

LGTM, please update the doc page with these new params as well https://docs.pinot.apache.org/operate-pinot/segment-management/consistent-push-and-rollback

I will do that. Thank you.

@swaminathanmanish swaminathanmanish merged commit 05de2a8 into apache:master Apr 29, 2026
31 of 32 checks passed
xiangfu0 pushed a commit to pinot-contrib/pinot-docs that referenced this pull request Apr 30, 2026
## Summary

- Documents the new `replacedSegmentsRetentionPeriod` and
`lineageEntryCleanupRetentionPeriod` table config properties on the
[Consistent Push and
Rollback](https://docs.pinot.apache.org/operate-pinot/segment-management/consistent-push-and-rollback)
page
- Adds a config reference table, example JSON snippet, and updates
existing cleanup/implication text to reference the configurable periods
instead of the hardcoded 1-day defaults

Related Pinot PR: apache/pinot#18282

## Test plan
- [x] Verify rendered page on GitBook preview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: abhishekbafna <abhishek.bafna@startree.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

configuration Config changes (addition/deletion/change in behavior)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants