
Fix compatibility stream count retries #18532

Merged: xiangfu0 merged 1 commit into apache:master from xiangfu0:codex/fix-compat-stream-retry-flake on May 20, 2026


@xiangfu0
Contributor

Summary

  • Retry transient count query failures in StreamOp while compatibility tests roll brokers and servers.
  • Handle Pinot broker responses whose exceptions field is an array instead of assuming it is an object.
  • Add focused coverage for retryable numeric and named query error codes.
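The third bullet distinguishes numeric and named error codes. As a hedged illustration only, a classifier that accepts either form might look like the sketch below. `RetryableErrors`, `isRetryable`, and every code and name in the two sets are hypothetical placeholders, not Pinot's actual retryable error list or the PR's code.

```java
import java.util.Set;

// Hypothetical helper: the codes and names below are PLACEHOLDERS,
// not Pinot's real set of retryable query errors.
final class RetryableErrors {
  // Placeholder numeric codes assumed to be transient during a rolling restart.
  private static final Set<Integer> RETRYABLE_CODES = Set.of(210, 410, 427);
  // Placeholder symbolic names assumed to be transient.
  private static final Set<String> RETRYABLE_NAMES =
      Set.of("SERVER_NOT_RESPONDING", "SERVER_SHUTTING_DOWN", "BROKER_SEGMENT_UNAVAILABLE");

  private RetryableErrors() {
  }

  /** Accepts a raw errorCode value that may arrive as a number, a numeric string, or a name. */
  static boolean isRetryable(Object errorCode) {
    if (errorCode instanceof Number) {
      return RETRYABLE_CODES.contains(((Number) errorCode).intValue());
    }
    if (errorCode instanceof String) {
      String s = ((String) errorCode).trim();
      try {
        // Numeric code serialized as a string, e.g. "427".
        return RETRYABLE_CODES.contains(Integer.parseInt(s));
      } catch (NumberFormatException notNumeric) {
        // Otherwise treat it as a symbolic error name.
        return RETRYABLE_NAMES.contains(s);
      }
    }
    // null or any other shape is treated as non-retryable.
    return false;
  }
}
```

Anything not recognized falls through to `false`, so unknown errors stay fail-fast by default.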

Root cause

During MSQE compatibility runs, the stream verifier issues SELECT count(*) FROM <table> while components are being restarted or downgraded. The broker response can carry an exceptions array. StreamOp treated that field as an object and dereferenced errorCode on it directly, which threw a NullPointerException instead of retrying the transient routing or server error.
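A minimal sketch of the shape-tolerant read described above, assuming the response has already been parsed into a generic JSON tree (`Map` for objects, `List` for arrays). `BrokerExceptions` and `entries` are hypothetical names; StreamOp's real parsing code is not shown in this PR page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: JSON modeled as Map (object) / List (array), not StreamOp's real types.
final class BrokerExceptions {
  private BrokerExceptions() {
  }

  /** Returns every exception entry whether the field is an array, a single object, or absent. */
  static List<Map<?, ?>> entries(Map<String, Object> response) {
    Object field = response.get("exceptions");
    if (field instanceof List) {
      List<Map<?, ?>> out = new ArrayList<>();
      for (Object item : (List<?>) field) {
        if (item instanceof Map) {
          out.add((Map<?, ?>) item);
        }
      }
      return out;
    }
    if (field instanceof Map) {
      // Also tolerate a single object rather than assuming one shape.
      return Collections.singletonList((Map<?, ?>) field);
    }
    // Missing or null field: no exceptions reported.
    return Collections.emptyList();
  }
}
```

A caller would then read `errorCode` from each returned entry instead of from the field itself.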

User manual

No table config changes are required. Compatibility suites keep using the same stream op config; the verifier now retries the count query for up to 60 seconds, until it returns a stable result, before failing.
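The bounded wait can be sketched as a deadline loop. This is an illustration under assumptions, not StreamOp's implementation: `CountPoller`, `Attempt`, and the injected clock and sleeper are hypothetical, introduced so the loop can be exercised without real sleeps. Only the 60-second window comes from the text; the poll interval is the caller's choice.

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical polling helper; names and structure are illustrative only.
final class CountPoller {
  /** Outcome of a single count-query attempt (assumed shape). */
  enum Attempt { DONE, RETRY, FAIL }

  private CountPoller() {
  }

  /**
   * Re-runs the attempt until it reports DONE (returns true), throws on FAIL (fail fast),
   * and returns false once timeoutMillis has elapsed on the supplied clock.
   */
  static boolean pollUntilStable(Supplier<Attempt> attempt, long timeoutMillis, long intervalMillis,
      LongSupplier clockMillis, LongConsumer sleeper) {
    long deadline = clockMillis.getAsLong() + timeoutMillis;
    while (true) {
      Attempt result = attempt.get();
      if (result == Attempt.DONE) {
        return true;
      }
      if (result == Attempt.FAIL) {
        throw new IllegalStateException("non-retryable count query failure");
      }
      if (clockMillis.getAsLong() >= deadline) {
        return false; // still not stable when the window closed
      }
      sleeper.accept(intervalMillis);
    }
  }
}
```

In a real run the clock could be `System::currentTimeMillis` and the sleeper a wrapper around `Thread.sleep` that restores the interrupt flag; injecting both keeps the retry loop deterministic in unit tests.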

Sample verifier query:

SELECT count(*) FROM <table>
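Checking the result of that query means pulling one numeric cell out of the broker response. The sketch below assumes the usual Pinot broker JSON layout, `resultTable.rows[0][0]`, already parsed into `Map`/`List` values; `CountExtractor` is a hypothetical name, and returning empty (rather than throwing) for a missing or malformed result is a design assumption that lets the caller treat it as "not ready yet".

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch: assumes resultTable.rows[0][0] holds the count(*) value.
final class CountExtractor {
  private CountExtractor() {
  }

  /** Returns the count, or empty if the response does not (yet) carry a one-cell numeric result. */
  static OptionalLong count(Map<String, Object> response) {
    Object table = response.get("resultTable");
    if (!(table instanceof Map)) {
      return OptionalLong.empty();
    }
    Object rows = ((Map<?, ?>) table).get("rows");
    if (!(rows instanceof List) || ((List<?>) rows).isEmpty()) {
      return OptionalLong.empty();
    }
    Object firstRow = ((List<?>) rows).get(0);
    if (!(firstRow instanceof List) || ((List<?>) firstRow).isEmpty()) {
      return OptionalLong.empty();
    }
    Object cell = ((List<?>) firstRow).get(0);
    return cell instanceof Number ? OptionalLong.of(((Number) cell).longValue()) : OptionalLong.empty();
  }
}
```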

Testing

  • ./mvnw spotless:apply -pl pinot-compatibility-verifier
  • ./mvnw license:format -pl pinot-compatibility-verifier
  • ./mvnw checkstyle:check license:check -pl pinot-compatibility-verifier
  • ./mvnw -pl pinot-compatibility-verifier -Dtest=StreamOpTest -Dsurefire.failIfNoSpecifiedTests=false test

xiangfu0 force-pushed the codex/fix-compat-stream-retry-flake branch from 706e1b9 to dd8e48f on May 19, 2026 15:47
xiangfu0 marked this pull request as ready for review on May 19, 2026 15:52
xiangfu0 force-pushed the codex/fix-compat-stream-retry-flake branch from dd8e48f to 28d97da on May 19, 2026 17:41
Jackie-Jiang added the testing label (Related to tests or test infrastructure) on May 19, 2026
Review comment thread on pinot-compatibility-verifier/src/main/java/org/apache/pinot/compat/StreamOp.java (outdated)
xiangfu0 force-pushed the codex/fix-compat-stream-retry-flake branch 2 times, most recently from 88b3f1d to 5c8a2a4 on May 19, 2026 18:14
Review comment thread on pinot-compatibility-verifier/src/main/java/org/apache/pinot/compat/StreamOp.java (outdated)
Review comment thread on pinot-compatibility-verifier/src/main/java/org/apache/pinot/compat/StreamOp.java (outdated)
xiangfu0 force-pushed the codex/fix-compat-stream-retry-flake branch from 5c8a2a4 to 06ffcd3 on May 19, 2026 18:34

codecov-commenter commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.72%. Comparing base (9471163) to head (25c24a4).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18532      +/-   ##
============================================
+ Coverage     55.76%   63.72%   +7.96%     
- Complexity      815     1932    +1117     
============================================
  Files          2561     3292     +731     
  Lines        148577   201519   +52942     
  Branches      24019    31322    +7303     
============================================
+ Hits          82850   128415   +45565     
- Misses        58624    62816    +4192     
- Partials       7103    10288    +3185     
Flag Coverage Δ
custom-integration1 100.00% <ø> (?)
integration 100.00% <ø> (?)
integration1 100.00% <ø> (?)
integration2 0.00% <ø> (?)
java-21 63.72% <ø> (+7.96%) ⬆️
temurin 63.72% <ø> (+7.96%) ⬆️
unittests 63.72% <ø> (+7.95%) ⬆️
unittests1 55.76% <ø> (+<0.01%) ⬆️
unittests2 35.25% <ø> (?)

Flags with carried forward coverage won't be shown.


xiangfu0 force-pushed the codex/fix-compat-stream-retry-flake branch from 06ffcd3 to b44fa69 on May 19, 2026 23:16
The compatibility verifier polls SELECT count(*) while rolling brokers and servers between versions. During those transitions Pinot can return transient broker/server errors or partial count responses before routing and query scheduling stabilize.

Retry only these verifier polling failures, keep failing fast on non-retryable query errors, and cover the retry loop with focused StreamOp tests.
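The per-poll decision described in that commit message can be sketched as a small pure function. `PollDecision`, its parameters, and the `retryable` predicate are hypothetical; in particular, retrying on any count mismatch (not only a lower, partial count) is an assumption of this sketch rather than something the PR states.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.function.Predicate;

// Hypothetical decision step for one count poll; not StreamOp's actual logic.
final class PollDecision {
  enum Outcome { DONE, RETRY, FAIL }

  private PollDecision() {
  }

  /**
   * @param errorCodes errorCode values taken from the response's exceptions array (may be empty)
   * @param count      the count(*) value, if the response carried one
   * @param expected   number of rows the verifier expects to see
   * @param retryable  assumed classifier for transient error codes
   */
  static Outcome decide(List<?> errorCodes, OptionalLong count, long expected, Predicate<Object> retryable) {
    for (Object code : errorCodes) {
      if (!retryable.test(code)) {
        return Outcome.FAIL; // non-retryable query error: fail fast
      }
    }
    if (!errorCodes.isEmpty() || !count.isPresent()) {
      return Outcome.RETRY; // only transient errors, or no result yet
    }
    // Assumption: a mismatched (e.g. partial) count is retried until the deadline.
    return count.getAsLong() == expected ? Outcome.DONE : Outcome.RETRY;
  }
}
```

Keeping the decision separate from the wait loop makes each case (fail fast, retry, done) testable without a broker.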
xiangfu0 force-pushed the codex/fix-compat-stream-retry-flake branch from b44fa69 to 25c24a4 on May 19, 2026 23:28
xiangfu0 merged commit ca0dfc1 into apache:master on May 20, 2026 (11 checks passed)

Labels

testing Related to tests or test infrastructure

3 participants