f139e59 to 7000f35
Do we know why this call may return fewer records? Is it a bug, are the messages not yet fully flushed and available to consume, or are some messages actually missing?
Should we run a consumer in the setup() method that consumes exactly the same number of messages, and only then exit?
I don't know why this fixes the bug. My guess is that the Pulsar flush is async, but I am not sure.
GitHub Actions sometimes catches interesting bugs because the machines are much less powerful.
This method was added to handle scenarios where messages are published in batches, which changes the format of MessageIds in Pulsar.
The only way to publish in batches is to enable the async producer. In sync mode, all messages are flushed as soon as they are received.
@xiangfu0 your suggestion also works. I should actually add a method like waitForAllRecordsToFlush and only then exit the setup.
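A minimal sketch of what such a waitForAllRecordsToFlush helper could look like, assuming setup() keeps the CompletableFuture returned by each async send (Pulsar's sendAsync returns a CompletableFuture<MessageId>). The class name FlushWaiter and the method signature are hypothetical and not taken from this PR. Note that this only confirms the sends were acknowledged on the producer side; it does not prove that a consumer can already read every record.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the PR: blocks until every pending async send has completed.
final class FlushWaiter {
  private FlushWaiter() {
  }

  /**
   * Returns true if all pending sends completed successfully within the timeout,
   * false if the timeout expired first, a send failed, or the thread was interrupted.
   */
  static boolean waitForAllRecordsToFlush(List<? extends CompletableFuture<?>> pendingSends, long timeoutMs) {
    // allOf completes once every future in the array has completed.
    CompletableFuture<Void> all = CompletableFuture.allOf(pendingSends.toArray(new CompletableFuture<?>[0]));
    try {
      all.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException | ExecutionException e) {
      return false;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

In setup(), each future returned by the async producer would be added to a list, and the method would return only after waitForAllRecordsToFlush(list, timeoutMs) reports true.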
According to the documentation https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Producer.html
- producer.flush() should wait until all messages have finished publishing before returning (or throw an exception), so we can't check on the producer side
- the only way to check and wait for all records to reach Pulsar for consumption is to actually consume them, which is exactly what this test is doing.
So I guess what we should do is wrap the test with
TestUtils.waitForCondition(aVoid -> {
// consume and assert
}, timeoutMs, "")
sounds good?
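For illustration, here is a standalone sketch of the polling pattern behind a wait-for-condition helper. It is a stand-in written with the JDK only, not the project's TestUtils implementation; the class name ConditionWaiter, the BooleanSupplier parameter, and the checkIntervalMs argument are assumptions made for this example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a test utility: re-evaluates a condition until it holds or a timeout expires.
final class ConditionWaiter {
  private ConditionWaiter() {
  }

  static void waitForCondition(BooleanSupplier condition, long checkIntervalMs, long timeoutMs,
      String errorMessage) {
    // Use the monotonic clock so wall-clock adjustments cannot shorten or extend the wait.
    long deadlineNanos = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadlineNanos >= 0) {
        throw new IllegalStateException("Timed out after " + timeoutMs + " ms: " + errorMessage);
      }
      try {
        Thread.sleep(checkIntervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag before giving up.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting: " + errorMessage, e);
      }
    }
  }
}
```

In the test, the condition would consume from the topic and return true once the number of records read matches the number produced, so a slow machine gets several attempts instead of a single one.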
Another approach is to just get the topic stats. Since no other method produces messages to these topics, it works for us.
I have implemented it here - #8554
Once this PR is merged, I'll pull the changes into my PR and we can merge that as well.
7000f35 to 62bde1e
It still fails. I wonder if there's anything wrong with the consumer itself, such that it didn't pull enough data out. Alternatively, I can still add the wait-for-condition check in the test function so that we are at least more robust against the flakiness.
62bde1e to 4e23c04
Codecov Report
@@ Coverage Diff @@
## master #8538 +/- ##
==========================================
- Coverage 29.52% 25.73% -3.79%
==========================================
Files 1674 1674
Lines 87872 87872
Branches 13313 13313
==========================================
- Hits 25942 22615 -3327
- Misses 59551 63102 +3551
+ Partials 2379 2155 -224
This doesn't work. It doesn't seem to be flakiness caused by delayed async produce.
This fixes #8537