fix pulsar test instability by walterddr · Pull Request #8538 · apache/pinot

walterddr · 2022-04-13T21:47:32Z

This fixes #8537

xiangfu0 · 2022-04-13T22:33:34Z

...ion/pinot-pulsar/src/test/java/org/apache/pinot/plugin/stream/pulsar/PulsarConsumerTest.java

Do we know why this call may return fewer records? Is it a bug, are messages not fully flushed and available to consume yet, or are some messages actually missing?

Shall we run a consumer in the setup() method that consumes exactly the same number of messages and only then exits?

I don't know why this fixes the bug. My guess is that the Pulsar flush is async, but I am not sure.

GitHub Actions sometimes catches interesting bugs because the machines are much less powerful.

I saw the publishRecordBatch was changed in #8017 - any chance @KKcorps can give some insights?

This method was added to handle scenarios where messages are published in batches, which changes the format of MessageIds in Pulsar.

The only way to publish in batches is to enable the async producer. In sync mode, all messages are flushed as soon as they are received.

@xiangfu0 your suggestion also works. I should actually add a method like waitForAllRecordsToFlush and only then exit the setup.
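
A minimal sketch of what a helper like waitForAllRecordsToFlush could look like on the producer side, assuming the async send call returns a CompletableFuture per record (as Pulsar's Producer.sendAsync does). The class name, the sendAllAndWait helper, and the in-memory "broker" executor are hypothetical stand-ins so that the example runs without a Pulsar cluster; this is not the code from the PR. It only confirms producer-side acknowledgement and does not guarantee that a consumer will see every record on its first fetch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class FlushWaitSketch {

  // Hypothetical helper: sends every record through an async send function and
  // blocks until all acknowledgements arrive. Throws TimeoutException if the
  // acknowledgements do not all arrive within timeoutMs.
  static <T, R> List<R> sendAllAndWait(List<T> records,
      Function<T, CompletableFuture<R>> sendAsync, long timeoutMs) throws Exception {
    List<CompletableFuture<R>> pending = new ArrayList<>();
    for (T record : records) {
      pending.add(sendAsync.apply(record));
    }
    // allOf completes only after every individual send future has completed.
    CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]))
        .get(timeoutMs, TimeUnit.MILLISECONDS);
    List<R> acks = new ArrayList<>();
    for (CompletableFuture<R> future : pending) {
      acks.add(future.join()); // already completed, so this does not block
    }
    return acks;
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for a broker: acknowledges each record on another thread.
    ExecutorService broker = Executors.newSingleThreadExecutor();
    try {
      List<String> acks = sendAllAndWait(Arrays.asList("a", "b", "c"),
          record -> CompletableFuture.supplyAsync(() -> "ack-" + record, broker), 5_000L);
      System.out.println(acks); // prints [ack-a, ack-b, ack-c]
    } finally {
      broker.shutdown();
    }
  }
}
```

With a real producer, the lambda would be replaced by the client's async send call, and setup() would return only after sendAllAndWait returns.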

According to the documentation at https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Producer.html,

producer.flush() should wait until all messages have finished publishing before returning (or throw an exception), so we can't check this on the producer side.

The only way to check and wait until all records have reached Pulsar and are available for consumption is to actually consume them, which is exactly what this test is doing.

So I guess what we should do is wrap the test with

TestUtils.waitForCondition(aVoid -> { // consume and assert }, timeoutMs, "")

sounds good?
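
The wrapping idea above can be sketched with a small polling helper. This is a self-contained stand-in written for illustration, not Pinot's TestUtils.waitForCondition (whose exact signature is not shown in this thread); the class name, the checkIntervalMs parameter, and the simulated consumer that returns records in partial batches are all assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class WaitForConditionSketch {

  // Hypothetical helper: re-evaluates the condition until it holds or the
  // timeout expires, then fails with the given message.
  static void waitForCondition(BooleanSupplier condition, long checkIntervalMs,
      long timeoutMs, String errorMessage) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (System.nanoTime() < deadline) {
      if (condition.getAsBoolean()) {
        return;
      }
      Thread.sleep(checkIntervalMs);
    }
    throw new AssertionError("Timed out waiting for condition: " + errorMessage);
  }

  public static void main(String[] args) throws InterruptedException {
    final int expected = 1000;
    final int maxPerFetch = 300; // simulated: each fetch returns only part of the data
    AtomicInteger consumed = new AtomicInteger();

    // "Consume and assert" inside the condition: keep fetching until the
    // expected number of records has been seen.
    waitForCondition(() -> {
      int remaining = expected - consumed.get();
      consumed.addAndGet(Math.min(maxPerFetch, remaining));
      return consumed.get() == expected;
    }, 10L, 5_000L, "did not consume " + expected + " records");

    System.out.println("consumed " + consumed.get() + " records"); // prints consumed 1000 records
  }
}
```

The trade-off is that a retry loop like this hides slow delivery but not lost messages: if records never arrive, the check still fails once the timeout expires.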

There is another approach: just get the topic stats. Since we do not produce messages to the topics in any other method, it works for us.
I have implemented it here - #8554

Once this PR is merged, I'll pull the changes into my PR and we can merge that as well.

walterddr · 2022-04-15T21:59:57Z

It still fails. I wonder if there's anything wrong with the consumer itself such that it didn't pull enough data out.

Alternatively, I can still add the wait-for-condition checker in the test function too, so that we at least become more robust against the flakiness.

codecov-commenter · 2022-04-16T16:20:18Z

Codecov Report

Merging #8538 (4e23c04) into master (2704d88) will decrease coverage by 3.78%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #8538      +/-   ##
==========================================
- Coverage   29.52%   25.73%   -3.79%     
==========================================
  Files        1674     1674              
  Lines       87872    87872              
  Branches    13313    13313              
==========================================
- Hits        25942    22615    -3327     
- Misses      59551    63102    +3551     
+ Partials     2379     2155     -224

| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `?` | |
| integration2 | `25.73% <ø> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...g/apache/pinot/server/api/resources/ErrorInfo.java | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| ...apache/pinot/common/lineage/LineageEntryState.java | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| ...pinot/minion/exception/TaskCancelledException.java | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| ...ker/failuredetector/ConnectionFailureDetector.java | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| ...minion/tasks/mergerollup/MergeRollupTaskUtils.java | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| ...ion/tasks/mergerollup/MergeRollupTaskExecutor.java | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| ...nverttorawindex/ConvertToRawIndexTaskExecutor.java | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| ...e/pinot/common/minion/MergeRollupTaskMetadata.java | `0.00% <0.00%> (-94.74%)` | ⬇️ |
| ...rg/apache/pinot/common/lineage/SegmentLineage.java | `0.00% <0.00%> (-91.31%)` | ⬇️ |
| ...ache/pinot/common/lineage/SegmentLineageUtils.java | `11.11% <0.00%> (-88.89%)` | ⬇️ |
| ... and 191 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2704d88...4e23c04.

walterddr · 2022-04-16T16:25:12Z

This doesn't work. It seems the flakiness is not due to delayed async produce.

walterddr force-pushed the fix_pulsar_test branch from f139e59 to 7000f35 Compare April 13, 2022 21:48

xiangfu0 requested a review from KKcorps April 13, 2022 22:00

xiangfu0 reviewed Apr 13, 2022

richardstartin added the flaky-test label (Tracks a test that intermittently fails) Apr 14, 2022

KKcorps approved these changes Apr 15, 2022

walterddr force-pushed the fix_pulsar_test branch from 7000f35 to 62bde1e Compare April 15, 2022 18:29

Rong Rong and others added 3 commits April 16, 2022 08:32

add retry

baf2d98

use wait for condition in set up

e966b77

also add waitForCondition in test checkers

4e23c04

walterddr force-pushed the fix_pulsar_test branch from 62bde1e to 4e23c04 Compare April 16, 2022 15:36

walterddr closed this Apr 16, 2022

walterddr deleted the fix_pulsar_test branch December 6, 2023 16:18

walterddr wants to merge 3 commits into apache:master from walterddr:fix_pulsar_test

5 participants
