Fix compatibility stream count retries #18532
Merged
xiangfu0 merged 1 commit into master on May 20, 2026
706e1b9 to dd8e48f
dd8e48f to 28d97da
88b3f1d to 5c8a2a4
5c8a2a4 to 06ffcd3
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #18532 +/- ##
============================================
+ Coverage 55.76% 63.72% +7.96%
- Complexity 815 1932 +1117
============================================
Files 2561 3292 +731
Lines 148577 201519 +52942
Branches 24019 31322 +7303
============================================
+ Hits 82850 128415 +45565
- Misses 58624 62816 +4192
- Partials 7103 10288 +3185
Flags with carried forward coverage won't be shown.
06ffcd3 to b44fa69
Jackie-Jiang approved these changes on May 19, 2026
The compatibility verifier polls SELECT count(*) while rolling brokers and servers between versions. During those transitions Pinot can return transient broker/server errors or partial count responses before routing and query scheduling stabilize. Retry only those verifier polling failures, keep non-retryable query errors fail-fast, and cover the retry loop with focused StreamOp tests.
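A minimal sketch of the retry-or-fail decision for a single count(*) poll follows. It assumes the broker response has already been parsed into java.util.Map and java.util.List (a stand-in for whatever JSON type the verifier really uses) and that the caller knows the expected row count. CountPollClassifier, its Decision values, and the caller-supplied set of transient error codes are hypothetical; the sketch deliberately does not say which Pinot error codes count as transient. It accepts the exceptions field as an array, a single object, or absent, retries when every reported error is transient or when the count is still short, and otherwise fails fast. Treating a count above the expected value as a failure is also an assumption. It requires Java 16+ for instanceof patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not the actual StreamOp code): classify one count(*) poll.
 * The response is assumed to be pre-parsed into Map/List; transient codes come from the caller.
 */
final class CountPollClassifier {

  enum Decision { SUCCESS, RETRY, FAIL }

  private final Set<Integer> transientErrorCodes;

  CountPollClassifier(Set<Integer> transientErrorCodes) {
    this.transientErrorCodes = Set.copyOf(transientErrorCodes);
  }

  /** Collects errorCode values whether "exceptions" is an array, a single object, or absent. */
  static List<Integer> errorCodes(Map<String, ?> response) {
    Object exceptions = response.get("exceptions");
    List<Integer> codes = new ArrayList<>();
    if (exceptions instanceof List<?> entries) {
      for (Object entry : entries) {
        addCode(entry, codes);
      }
    } else {
      // A single object is handled the same way; null (field absent) adds nothing.
      addCode(exceptions, codes);
    }
    return codes;
  }

  private static void addCode(Object entry, List<Integer> codes) {
    if (entry instanceof Map<?, ?> map) {
      // An entry without a numeric errorCode is recorded as -1 instead of dereferencing null,
      // so it still counts as an error rather than being mistaken for success.
      codes.add(map.get("errorCode") instanceof Number n ? n.intValue() : -1);
    }
  }

  /**
   * @param observedCount count(*) value extracted by the caller, or -1 if no row came back
   * @param expectedCount rows the verifier expects to see (assumed known to the caller)
   */
  Decision classify(Map<String, ?> response, long observedCount, long expectedCount) {
    List<Integer> codes = errorCodes(response);
    if (!codes.isEmpty()) {
      // Retry only if every reported error is transient; anything else fails fast.
      return transientErrorCodes.containsAll(codes) ? Decision.RETRY : Decision.FAIL;
    }
    if (observedCount < expectedCount) {
      // Partial count while routing and query scheduling stabilize.
      return Decision.RETRY;
    }
    return observedCount == expectedCount ? Decision.SUCCESS : Decision.FAIL;
  }
}
```

For example, with a hypothetical transient-code set of {1001}, a response whose exceptions array holds only errorCode 1001 classifies as RETRY, while a single exceptions object carrying errorCode 2002 classifies as FAIL.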
b44fa69 to 25c24a4
Summary
- … `StreamOp` while compatibility tests roll brokers and servers.
- … `exceptions` responses represented as arrays instead of assuming an object.

Root cause
During MSQE compatibility runs, the stream verifier calls `SELECT count(*) FROM <table>` while components are being restarted or downgraded. The broker response can contain an `exceptions` array. `StreamOp` treated it as an object and dereferenced `errorCode`, producing an NPE instead of retrying transient routing or server errors.

User manual
No table config changes are required. Compatibility suites continue using the same stream op config; the verifier now waits up to 60 seconds for the count query to become stable before failing.
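The 60-second wait can be pictured as a deadline-bounded polling loop. The sketch below is hypothetical (DeadlinePoller is not the verifier's class); the injectable nanosecond clock and Sleeper hook are assumptions added so the loop can be tested without real sleeping, and the poll interval is left to the caller. Non-retryable errors are expected to be thrown from the poll callback, so they propagate immediately rather than consuming the timeout.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch: poll until success or until the timeout elapses. */
final class DeadlinePoller {

  /** Sleep hook: production code wraps Thread.sleep, tests can advance a fake clock. */
  interface Sleeper {
    void sleep(long millis);
  }

  private final LongSupplier nanoClock;
  private final Sleeper sleeper;

  DeadlinePoller(LongSupplier nanoClock, Sleeper sleeper) {
    this.nanoClock = nanoClock;
    this.sleeper = sleeper;
  }

  /** Uses the monotonic System.nanoTime clock and a real Thread.sleep. */
  static DeadlinePoller systemDefault() {
    return new DeadlinePoller(System::nanoTime, millis -> {
      try {
        Thread.sleep(millis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for the count query", e);
      }
    });
  }

  /**
   * Calls poll until it returns true (returns true) or the timeout elapses (returns false).
   * Exceptions thrown by poll are not caught, so non-retryable errors fail fast.
   */
  boolean await(BooleanSupplier poll, Duration timeout, Duration interval) {
    long deadline = nanoClock.getAsLong() + timeout.toNanos();
    while (true) {
      if (poll.getAsBoolean()) {
        return true;
      }
      long remainingNanos = deadline - nanoClock.getAsLong();
      if (remainingNanos <= 0) {
        return false;
      }
      // Sleep at most until the deadline, and at least 1 ms to avoid spinning.
      long sleepMillis = Math.min(interval.toMillis(), TimeUnit.NANOSECONDS.toMillis(remainingNanos));
      sleeper.sleep(Math.max(1L, sleepMillis));
    }
  }
}
```

A caller might use `DeadlinePoller.systemDefault().await(this::pollOnce, Duration.ofSeconds(60), Duration.ofSeconds(1))`, where `pollOnce` is a hypothetical method that returns true once the count is stable and throws on a non-retryable error.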
Sample verifier query:
Testing
- `./mvnw spotless:apply -pl pinot-compatibility-verifier`
- `./mvnw license:format -pl pinot-compatibility-verifier`
- `./mvnw checkstyle:check license:check -pl pinot-compatibility-verifier`
- `./mvnw -pl pinot-compatibility-verifier -Dtest=StreamOpTest -Dsurefire.failIfNoSpecifiedTests=false test`