Skip to content

Fix false-positive "column deleted" stale check for timestamp-index derived columns#18294

Merged
gortiz merged 1 commit into
apache:masterfrom
shounakmk219:fix-is-stale-for-timestamp-index-cols
Apr 23, 2026
Merged

Fix false-positive "column deleted" stale check for timestamp-index derived columns#18294
gortiz merged 1 commit into
apache:masterfrom
shounakmk219:fix-is-stale-for-timestamp-index-cols

Conversation

@shounakmk219
Copy link
Copy Markdown
Collaborator

Summary

  • Timestamp-index derived columns ($col$GRANULARITY) are materialized as physical segment columns when a TimestampConfig is active, but are not present in the user-facing schema. When the timestamp index is later removed from the table config, applyTimestampIndex no longer injects the derived column into the IndexLoadingConfig schema, causing isSegmentStale to fire "column deleted: $col$GRANULARITY" and trigger an unnecessary segment reload.
  • The fix skips the "column deleted" verdict for $col$GRANULARITY columns when the base column is still in the schema (confirming this is a timestamp-derived orphan, not a genuine user-schema deletion). If the base column is also absent the guard does not activate, so user-defined columns whose names happen to match the pattern are still correctly caught.
  • Six tests added: no-change, index added (stale), index removed (not stale), granularity added (stale), granularity removed (not stale), and a boundary test for a user-defined $col$GRANULARITY-named column that is genuinely deleted.

Root cause

FieldSpec.isVirtualColumn() returns true only when _virtualColumnProvider is set — a transformFunction alone does not make a column virtual. So $col$DAY added by applyTimestampIndex lands in schema.getPhysicalColumnNames(), and the segment also stores it as a physical column (with isAutoGenerated=false). When the TimestampConfig is removed, the schema no longer contains the derived column but the segment still does, triggering the false stale verdict.

Test plan

  • BaseTableDataManagerNeedRefreshTest — all 43 tests pass (./mvnw -pl pinot-core -am -Dtest=BaseTableDataManagerNeedRefreshTest -Dsurefire.failIfNoSpecifiedTests=false test)
  • spotless:apply, license:format, checkstyle:check, license:check all pass on pinot-core

🤖 Generated with Claude Code

…erived columns

Timestamp-index derived columns ($col$GRANULARITY) are physically
materialized in segment files when a TimestampConfig is active, but are
not first-class columns in the user-facing schema. When the timestamp
index is later removed from the table config, applyTimestampIndex no
longer injects the derived column into the IndexLoadingConfig schema,
causing isSegmentStale to incorrectly report "column deleted:
$col$GRANULARITY" and trigger an unnecessary segment reload.

The fix skips the "column deleted" verdict for $col$GRANULARITY columns
when the base column is still present in the schema (confirming the
column is a timestamp-derived orphan rather than a genuine user-schema
deletion). When the base column is also absent, the guard does not
activate, preserving correct behavior for user-defined columns whose
names happen to match the pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.61%. Comparing base (78e38eb) to head (ede7d19).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18294      +/-   ##
============================================
+ Coverage     63.58%   63.61%   +0.03%     
  Complexity     1659     1659              
============================================
  Files          3245     3245              
  Lines        197441   197445       +4     
  Branches      30564    30566       +2     
============================================
+ Hits         125536   125599      +63     
+ Misses        61856    61809      -47     
+ Partials      10049    10037      -12     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.58% <100.00%> (+8.07%) ⬆️
java-21 63.57% <100.00%> (+0.01%) ⬆️
temurin 63.61% <100.00%> (+0.03%) ⬆️
unittests 63.60% <100.00%> (+0.03%) ⬆️
unittests1 55.57% <100.00%> (+0.02%) ⬆️
unittests2 35.06% <0.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

// We only skip when the base column is still present in the schema; if the base column was also removed,
// the column is not a timestamp-derived orphan and must still trigger "column deleted".
if (TimestampIndexUtils.isValidColumnWithGranularity(columnName)) {
String baseColumn = columnName.substring(1, columnName.indexOf('$', 1));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There must be util for this? If not let's add.

@Jackie-Jiang
Copy link
Copy Markdown
Contributor

At high level, when a timestamp index is removed, the segment should be count as stale right? Then a reload can clean up the old columns. This is similar to when we remove an inverted index

@Jackie-Jiang Jackie-Jiang added ingestion Related to data ingestion pipeline bug Something is not working as expected labels Apr 23, 2026
@noob-se7en
Copy link
Copy Markdown
Contributor

Yes, I agree we should still mark this segment as stale.

@shounakmk219
Copy link
Copy Markdown
Collaborator Author

@Jackie-Jiang @noob-se7en that makes sense. Let me take another look. My initial take was that the stale reason should be flagged by the timestamp index check and not come up as a column removed operation. Its confusing as the schema will not have the column. Also I ran few flows around the timestamp index and the stale reason is hard to understand as well as for few cases it is not fixed even after a segment reload.

Steps:

  1. Add a timestamp index
  2. check for stale API -> fails with reason "column added"
  3. reload segments
  4. check for stake API -> fails with reason "range index changed: $mtime$HOUR"
  5. add another granularity MINUTE to the index
  6. check for stale API -> fails with reason "column added"
  7. reload segments
  8. check for stake API -> still fails with reason "range index changed: $mtime$HOUR"
  9. remove the HOUR granularity
  10. check for stake API -> fails with reason "column deleted: $mtime$HOUR"
  11. reload segments
  12. check for stake API -> fails with reason "range index changed: $mtime$MINUTE"

Looks like the right fix would be to add a dedicated timestamp index validation and emit the right reason of staleness
But its also worth looking into why the check complains about range index even after we reload the segments.

Goal is to make the stale API meaningful as well as actionable through something like segment reload. If a stale reason cannot be addressed by reload we should have a way to skip these as typical usecase involve hitting this stale check and proceed to run reload if needed to handle things like preventing server restart time, etc.

@gortiz gortiz merged commit 6da39b4 into apache:master Apr 23, 2026
31 of 32 checks passed
@noob-se7en
Copy link
Copy Markdown
Contributor

The comments were still open 🥲 @gortiz

@shounakmk219
Copy link
Copy Markdown
Collaborator Author

We can move the discussion to #18310 @noob-se7en @Jackie-Jiang

gortiz pushed a commit that referenced this pull request Apr 28, 2026
…n isSegmentStale (#18310)

The previous fix (#18294) added a narrow guard inside the "column deleted" branch
to suppress false positives when a timestamp index is removed. This was incomplete:
- Removing a timestamp index should mark the segment stale (so SegmentPreProcessor
  can clean up the orphaned $col$GRANULARITY columns), but the old guard silently
  skipped them.
- Adding a new granularity or configuring a timestamp index for the first time was
  misreported as "column added" instead of a dedicated timestamp index reason.
- Existing derived columns in the segment could trigger spurious "range index
  changed" verdicts on every reload because applyTimestampIndex injects them into
  the range index config.

This commit replaces the guard with a dedicated feature-level check (following the
StarTree and MultiColumnText pattern). Before the generic per-column loop:
1. Compare the set of $col$GRANULARITY columns expected by the current table config
   against those physically present in the segment. Any mismatch returns
   "timestamp index changed", correctly covering add/remove/granularity changes.
2. The generic "column added" check now excludes $col$GRANULARITY columns (handled
   by step 1).
3. The per-column loop skips all $col$GRANULARITY columns entirely, preventing
   false positives on "column deleted" and "range index changed".

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is not working as expected ingestion Related to data ingestion pipeline

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants