
Fetch Pulsar offsets from Consumer interface instead of Reader #8017

Merged — 13 commits, Apr 6, 2022

Conversation

KKcorps
Contributor

@KKcorps KKcorps commented Jan 13, 2022

The Pulsar plugin fails to consume data when the auto.offset.reset property is set to largest, because the Reader interface always resets to the position after the last message in the topic. There are two possible fixes: use large fetch timeouts so that newly pushed records are consumed before a new Pulsar consumer is created,
OR
Drop the Pulsar Reader interface and use the Consumer interface instead. The Consumer interface can return the last valid message id in the topic, so the PulsarConsumer can begin consumption after that point.
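As a rough sketch of the Consumer-based idea (illustrative only, not the plugin's actual code: the service URL, topic, and subscription name are placeholders, and running it requires a live Pulsar broker):

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ConsumerOffsetSketch {
  public static void main(String[] args) throws PulsarClientException {
    PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
        .build();

    Consumer<byte[]> consumer = client.newConsumer()
        .topic("my-topic")                       // placeholder topic
        .subscriptionName("pinot-sub")           // placeholder subscription
        .subscribe();

    // Unlike Reader, Consumer exposes the last valid message id in the topic,
    // so consumption can be positioned relative to a real message id instead
    // of the Reader's "after the latest message" reset behavior.
    MessageId lastId = consumer.getLastMessageId();
    consumer.seek(lastId);

    consumer.close();
    client.close();
  }
}
```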

@codecov-commenter

codecov-commenter commented Jan 13, 2022

Codecov Report

Merging #8017 (d0fb9a4) into master (35cef48) will decrease coverage by 1.78%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master    #8017      +/-   ##
============================================
- Coverage     71.34%   69.56%   -1.79%     
- Complexity     4215     4279      +64     
============================================
  Files          1596     1664      +68     
  Lines         82778    87342    +4564     
  Branches      12348    13227     +879     
============================================
+ Hits          59062    60758    +1696     
- Misses        19728    22299    +2571     
- Partials       3988     4285     +297     
Flag Coverage Δ
integration1 27.07% <ø> (-1.89%) ⬇️
integration2 ?
unittests1 67.02% <ø> (-1.09%) ⬇️
unittests2 14.13% <ø> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...t/core/plan/StreamingInstanceResponsePlanNode.java 0.00% <0.00%> (-100.00%) ⬇️
...ore/operator/streaming/StreamingResponseUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...ager/realtime/PeerSchemeSplitSegmentCommitter.java 0.00% <0.00%> (-100.00%) ⬇️
...pache/pinot/common/utils/grpc/GrpcQueryClient.java 0.00% <0.00%> (-94.74%) ⬇️
...he/pinot/core/plan/StreamingSelectionPlanNode.java 0.00% <0.00%> (-88.89%) ⬇️
...ator/streaming/StreamingSelectionOnlyOperator.java 0.00% <0.00%> (-87.81%) ⬇️
...re/query/reduce/SelectionOnlyStreamingReducer.java 0.00% <0.00%> (-85.72%) ⬇️
...data/manager/realtime/DefaultSegmentCommitter.java 0.00% <0.00%> (-80.00%) ⬇️
...ller/api/access/BasicAuthAccessControlFactory.java 0.00% <0.00%> (-80.00%) ⬇️
...ctionaryBasedSingleColumnDistinctOnlyExecutor.java 0.00% <0.00%> (-80.00%) ⬇️
... and 523 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 35cef48...d0fb9a4.

@KKcorps KKcorps marked this pull request as ready for review January 16, 2022 21:05
@mathieudruart
Contributor

Hi @KKcorps

We have tested your PR and it seems to miss messages. There appears to be an issue in the method getNextStreamParitionMsgOffsetAtIndex: if the message is part of a Pulsar batch (BatchMessageIdImpl), you add +1 to the entry id every time. That doesn't seem correct, because the next message id should only have the batch index incremented, with the same entry id (all messages inside a Pulsar batch share the same entry id).
We tried this version of the method, and it seems to get all messages correctly:

  @Override
  public StreamPartitionMsgOffset getNextStreamParitionMsgOffsetAtIndex(int index) {
    MessageIdImpl currentMessageId = MessageIdImpl.convertToMessageIdImpl(_messageList.get(index).getMessageId());
    MessageId nextMessageId;
    
    long currentLedgerId = currentMessageId.getLedgerId();
    long currentEntryId = currentMessageId.getEntryId();
    int currentPartitionIndex = currentMessageId.getPartitionIndex();
    
    if (currentMessageId instanceof BatchMessageIdImpl) {
      int currentBatchIndex = ((BatchMessageIdImpl) currentMessageId).getBatchIndex();
      int currentBatchSize = ((BatchMessageIdImpl) currentMessageId).getBatchSize();
      
      if (currentBatchIndex < currentBatchSize - 1) {
        nextMessageId = new BatchMessageIdImpl(currentLedgerId, currentEntryId,
                currentPartitionIndex, currentBatchIndex + 1, currentBatchSize, 
                ((BatchMessageIdImpl) currentMessageId).getAcker());
      } else {
        nextMessageId = new BatchMessageIdImpl(currentLedgerId, currentEntryId + 1,
                currentPartitionIndex, 0, currentBatchSize, ((BatchMessageIdImpl) currentMessageId).getAcker());
      }
    } else {
      nextMessageId =
              DefaultImplementation.newMessageId(currentLedgerId, currentEntryId + 1,
                      currentPartitionIndex);
    }
    return new MessageIdStreamOffset(nextMessageId);
  }
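To make the batch-boundary rule above concrete, here is a minimal, self-contained model of the increment arithmetic (a plain record standing in for Pulsar's BatchMessageIdImpl; the class and field names are illustrative, and no Pulsar dependency is needed):

```java
public class NextOffsetDemo {
  // ledgerId/entryId identify the BookKeeper entry; batchIndex addresses one
  // message inside a batched entry; batchSize is the message count of the batch.
  public record BatchId(long ledgerId, long entryId, int batchIndex, int batchSize) {}

  public static BatchId next(BatchId id) {
    if (id.batchIndex() < id.batchSize() - 1) {
      // Still inside the batch: only the batch index advances, entry id is unchanged.
      return new BatchId(id.ledgerId(), id.entryId(), id.batchIndex() + 1, id.batchSize());
    }
    // Last message of the batch: move to the next entry, batch index resets to 0.
    return new BatchId(id.ledgerId(), id.entryId() + 1, 0, id.batchSize());
  }

  public static void main(String[] args) {
    BatchId mid = next(new BatchId(5, 10, 0, 3)); // within the batch
    BatchId end = next(new BatchId(5, 10, 2, 3)); // at the batch boundary
    System.out.println(mid.entryId() + "," + mid.batchIndex()); // prints 10,1
    System.out.println(end.entryId() + "," + end.batchIndex()); // prints 11,0
  }
}
```

Note how only the boundary step advances the entry id; the original buggy code advanced it on every message, which is exactly what skipped the head of each new batch.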

@KKcorps
Contributor Author

KKcorps commented Feb 3, 2022

Hi,
thanks a lot for pointing it out. I am also not a fan of the increment methodology, but Pulsar doesn't offer a neat interface for getting the next offset the way Kafka does.
I will test and incorporate your patch. I think our tests currently cover only non-batch use cases, which is why they missed this.

@KKcorps
Contributor Author

KKcorps commented Feb 4, 2022

@mathieudruart can you send me some docs/examples to produce batch message scenarios where the code fails?
I need them to fix the test cases.

@mathieudruart
Contributor

@mathieudruart can you send me some docs/examples to produce batch message scenarios where the code fails? I need them to fix the test cases.

Hi @KKcorps, here is an example scenario that reproduces the problem:

  • activate an infinite retention policy on a Pulsar namespace (this makes the problem easier to reproduce): pulsar-admin namespaces set-retention my-tenant/my-ns --size -1 --time -1
  • use a Pulsar client to push messages with batching enabled into a topic of that namespace (the Pulsar Java client enables batching by default)
  • create a table in Pinot linked to that topic (all messages will be loaded correctly)
  • send new batched messages into the topic => the first messages will be skipped (the number of skipped messages depends on the size of the last correctly consumed batch)

Don't hesitate to ask me for clarification.
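The producer side of the steps above could be sketched as follows (illustrative only: the broker URL, topic, and batch settings are placeholders, and this requires a running Pulsar broker; batching is already on by default in the Java client, the explicit settings just make batch boundaries predictable):

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.Schema;

public class BatchProducerSketch {
  public static void main(String[] args) throws PulsarClientException {
    PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")            // placeholder broker URL
        .build();

    Producer<String> producer = client.newProducer(Schema.STRING)
        .topic("persistent://my-tenant/my-ns/my-topic")   // placeholder topic
        .enableBatching(true)
        .batchingMaxMessages(10)                          // up to 10 messages per batch
        .batchingMaxPublishDelay(1, TimeUnit.SECONDS)
        .create();

    // Async sends let the client accumulate messages into batches;
    // synchronous send() would flush a batch per message.
    for (int i = 0; i < 25; i++) {
      producer.sendAsync("msg-" + i);
    }
    producer.flush();
    producer.close();
    client.close();
  }
}
```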

@KKcorps
Contributor Author

KKcorps commented Feb 20, 2022

@mathieudruart Thanks for the help. I reproduced the scenario and it works fine with the new code. I have also added a few unit test cases for batch ids. The test cases still don't cover your scenario since it falls under integration tests. I have added that as well, but need to sort out a few dependency conflicts in the test package before raising a PR. You can find the code here - https://github.com/KKcorps/incubator-pinot/blob/pulsar_integration_test/pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/RealtimePulsarIntegrationTest.java

@mayankshriv
Contributor

@npawar @navina please take a look.

@mathieudruart
Contributor

With the latest version of this patch, we have no more issues.

@KKcorps KKcorps requested a review from npawar April 4, 2022 20:48
Contributor

@npawar npawar left a comment


lgtm!
Only 1 optional comment, feel free to merge with or without it

@KKcorps KKcorps merged commit d7be2ef into apache:master Apr 6, 2022
public int choosePartition(Message<?> msg, TopicMetadata metadata) {
return partition;
}
}).batchingMaxMessages(BATCH_SIZE).batchingMaxPublishDelay(1, TimeUnit.SECONDS).create();
Contributor


Any chance this is related to the recent flakiness in #8537?


Successfully merging this pull request may close these issues.

Fix the issue with "pinot-pulsar" module (potentially library conflicts)
6 participants