Skip to content

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Feb 6, 2026

Problem

Two flaky test failures were observed in CI:

  1. PinotTableRestletResourceTest#testTableTasksCleanupWithNonActiveTasks
  • Intermittent 500 with Failed to delete job ... from queue ... during table deletion.
  1. TruncateDecimalTransformFunctionTest#testTruncateDecimalTransformFunction
  • Intermittent assertion mismatch expected [0.0] but found [-0.0].

Changes

1) Controller test race fix

  • In testTableTasksCleanupWithNonActiveTasks, keep the minion task queue stopped while table deletion runs.
  • Resume queue in a finally block to avoid leaking queue state to other tests.

2) Truncate signed-zero fix

  • In TruncateDecimalTransformFunction, changed the single-argument path (truncate(value)) to use the same truncation logic as truncate(value, 0) via BigDecimal.setScale(0, RoundingMode.DOWN).
  • Added explicit regression coverage in TruncateDecimalTransformFunctionTest for truncate(-0.4) to guarantee canonical 0.0 output (not -0.0).

Why this fixes flakiness

  • The first issue removes a task-state transition race during cleanup.
  • The second issue removes a signed-zero inconsistency in output formatting and aligns truncate(value) with truncate(value, 0) semantics.

Validation

  • ./mvnw -pl pinot-controller -Dtest=PinotTableRestletResourceTest#testTableTasksCleanupWithNonActiveTasks -Dsurefire.failIfNoSpecifiedTests=false test -DskipITs -DskipIntegrationTests
  • Repeated runs of the same controller test (9 consecutive successful loop iterations + 1 additional pass)
  • ./mvnw -pl pinot-controller -Dtest=PinotTableRestletResourceTest#testTableTasksCleanupWithActiveTasks -Dsurefire.failIfNoSpecifiedTests=false test -DskipITs -DskipIntegrationTests
  • ./mvnw -pl pinot-controller -Dtest=PinotTableRestletResourceTest -Dsurefire.failIfNoSpecifiedTests=false test -DskipITs -DskipIntegrationTests
  • ./mvnw -pl pinot-core -Dtest=TruncateDecimalTransformFunctionTest -Dsurefire.failIfNoSpecifiedTests=false test -DskipITs -DskipIntegrationTests
  • Repeated runs of TruncateDecimalTransformFunctionTest#testTruncateDecimalTransformFunction (5 consecutive successful iterations)

@xiangfu0 xiangfu0 requested a review from Copilot February 6, 2026 11:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a race condition in PinotTableRestletResourceTest#testTableTasksCleanupWithNonActiveTasks that caused flaky test failures. The issue occurred when the minion task queue was resumed before table deletion completed, allowing concurrent task transitions to interfere with cleanup operations.

Changes:

  • Move task queue resume logic from before table deletion to after deletion in a finally block
  • Add explanatory comments describing why the queue must remain stopped during deletion
  • Ensure queue resume happens even if deletion fails, preventing test interference

@codecov-commenter
Copy link

codecov-commenter commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.24%. Comparing base (a5eae57) to head (e3b021a).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...orm/function/TruncateDecimalTransformFunction.java 50.00% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17652      +/-   ##
============================================
+ Coverage     63.22%   63.24%   +0.01%     
  Complexity     1499     1499              
============================================
  Files          3174     3174              
  Lines        190319   190332      +13     
  Branches      29080    29084       +4     
============================================
+ Hits         120338   120368      +30     
+ Misses        60643    60629      -14     
+ Partials       9338     9335       -3     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.22% <50.00%> (+7.63%) ⬆️
java-21 63.19% <50.00%> (-0.02%) ⬇️
temurin 63.24% <50.00%> (+0.01%) ⬆️
unittests 63.23% <50.00%> (+0.01%) ⬆️
unittests1 55.63% <50.00%> (+0.02%) ⬆️
unittests2 34.05% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 changed the title [controller] Fix flaky PinotTableRestletResourceTest cleanup race [test] Fix flaky table cleanup and truncate signed-zero tests Feb 6, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

@xiangfu0 xiangfu0 force-pushed the codex/fix-table-tasks-cleanup-flaky branch from 3d37641 to e3b021a Compare February 7, 2026 21:13
@xiangfu0
Copy link
Contributor Author

xiangfu0 commented Feb 7, 2026

Addressed another flake from CI in TableRebalancePauselessIntegrationTest: avoid reusing extra server IDs/data directories across data-provider invocations (and avoid overlapping the baseline server ID).

Validation:

  • ./mvnw -Ppinot-fastdev -pl pinot-integration-tests -am -Dtest=TableRebalancePauselessIntegrationTest -Dsurefire.failIfNoSpecifiedTests=false test
  • Result: 2 tests run, 0 failures, 0 errors.

waitForTaskState(taskName, TaskState.IN_PROGRESS);

// stop the task queue to abort the task
// Stop the task queue to abort the task. Keep it stopped until table deletion is complete to avoid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test capturing a race condition?

} else {
truncated = 0.0d;
}
// Normalize -0.0 to +0.0 for deterministic output and test stability.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? This is an actual production fix, suggest putting it in a separate PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will separate

xiangfu0 added a commit to xiangfu0/pinot that referenced this pull request Feb 11, 2026
Extracted from apache#17652 as a standalone production fix.

- Use allocation-free math truncation in no-scale path
- Preserve NaN/+Inf/-Inf behavior for no-scale truncate
- Normalize -0.0 to canonical +0.0 for deterministic output
- Add regression coverage for signed-zero bit pattern and non-finite values
@xiangfu0
Copy link
Contributor Author

Split out the production truncate(value) fix into a dedicated PR: #17677.

That PR includes the non-finite handling guard, allocation-free no-scale truncation path, signed-zero normalization, and expanded unit coverage.

Keep apache#17652 focused on flaky-test fixes only; production truncate(value) fix is now tracked in apache#17677.
@xiangfu0
Copy link
Contributor Author

Updated this PR to remove all net pinot-core truncate changes; it now only contains flaky-test fixes in controller/integration tests.

The production truncate(value) fix remains in dedicated PR #17677.

Validation rerun:

  • ./mvnw -pl pinot-controller -am -Dtest=PinotTableRestletResourceTest#testTableTasksCleanupWithNonActiveTasks -Dsurefire.failIfNoSpecifiedTests=false test
  • ./mvnw -Ppinot-fastdev -pl pinot-integration-tests -am -Dtest=TableRebalancePauselessIntegrationTest -Dsurefire.failIfNoSpecifiedTests=false test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants