Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue 918] [Refactor] Remove the clearMessageQueuesCh in partitionConsumer.dispatcher() #921

Merged
merged 1 commit into from
Jan 3, 2023

Conversation

Gleiphir2769
Copy link
Contributor

@Gleiphir2769 Gleiphir2769 commented Dec 21, 2022

Master Issue: #918

Motivation

The two chanel clearMessageQueuesCh and clearQueueCb just need to keep one.

For more details please check #918.

This PR does not ony aim to clean up, but also fix the potential bug in clearMessageQueuesCh.

For example, the clearMessageQueuesCh do the jod including clearing messageCh. But it may cause problem when SeekByTime invoked on partition topic.

func (c *consumer) SeekByTime(time time.Time) error {
c.Lock()
defer c.Unlock()
var errs error
// run SeekByTime on every partition of topic
for _, cons := range c.consumers {
if err := cons.SeekByTime(time); err != nil {
msg := fmt.Sprintf("unable to SeekByTime for topic=%s subscription=%s", c.topic, c.Subscription())
errs = pkgerrors.Wrap(newError(SeekFailed, err.Error()), msg)
}
}
return errs
}

case doneCh := <-pc.clearMessageQueuesCh:
for len(pc.queueCh) > 0 {
<-pc.queueCh
}
for len(pc.messageCh) > 0 {
<-pc.messageCh
}
messages = nil

When consume the partition topic, all the partitionConsumer share the same messageCh. After SeekByTime on partitioned topic, the messageCh may be cleared more than one time which will cause the messages losing. Suppose there is such a situation, partitionConsumer-1 has cleared its messageCh and queueCh. When partitionConsumer-2 do the clear job, it can also exec this logic.

for len(pc.messageCh) > 0 {
<-pc.messageCh
}

But messageCh is a share chan, partitionConsumer-1 may received new messages and put them to messageCh at this moment. There is such a possibility that partitionConsumer-2 cleared the new messages from messageCh.

Modifications

  • Remove the clearMessageQueuesCh in partitionConsumer.dispatcher()
  • Modify the clearQueueCb in partitionConsumer.dispatcher()

Verifying this change

  • Make sure that the change passes the CI checks.

@Gleiphir2769 Gleiphir2769 changed the title [Issue 918] [Refactor] Remove the clearMessageQueuesCh in partitionConsumer.dispatcher() [Issue 918] [Refactor] Remove the clearMessageQueuesCh in partitionConsumer.dispatcher() Dec 21, 2022
}

close(doneCh)
clearQueueCb(nextMessageInQueue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you forget the reset available permits?

Copy link
Contributor Author

@Gleiphir2769 Gleiphir2769 Dec 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you forget the reset available permits?

Hi. @nodece, I don't think it's necessary to reset available permits. In the user side, if permits over the threshold, internalFlow will be invoked.

if ap >= flowThreshold {
availablePermits := ap
requestedPermits := ap
// check if permits changed
if !atomic.CompareAndSwapInt32(&p.permits, ap, 0) {
return
}
p.pc.log.Debugf("requesting more permits=%d available=%d", requestedPermits, availablePermits)
if err := p.pc.internalFlow(uint32(requestedPermits)); err != nil {
p.pc.log.WithError(err).Error("unable to send permits")
}
}

In the broker side, I checked the relative code and doesn't find any need to reset it. The Java Client does not reset it too.

So I think it's no need to reset the available permits. I don't understand why should reset it in the legacy code and I guess it's just a mearsure taken for safe.

@@ -138,15 +138,14 @@ type partitionConsumer struct {
// the size of the queue channel for buffering messages
queueSize int32
queueCh chan []*message
startMessageID trackingMessageID
startMessageID atomicMessageID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could atomic.Value{} instead of this?

Copy link
Contributor Author

@Gleiphir2769 Gleiphir2769 Dec 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could atomic.Value{} instead of this?

I think atomicMessageID is better than atomic.Value{} because it's more simple and clearly.

For example, if we use atomic.Value as the startMessageID type, the original using of startMessageID will be like this.

// original L985
return pc.startMessageID.greater(msgID.messageID)

// atomicMessageID
return pc.startMessageID.get().greater(msgID.messageID)

// atomic.Value{}
return pc.startMessageID.Load().(trackingMessageID).greater(msgID.messageID)

atomic.Value{} will need one more time type assertion.

But atomicMessageID may need a better name and declear position. Do you have more idea? Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -3177,6 +3177,7 @@ func TestConsumerSeekByTimeOnPartitionedTopic(t *testing.T) {
// should be able to consume all messages once again
for i := 0; i < N; i++ {
msg, err := consumer.Receive(ctx)
fmt.Println(string(msg.Payload()) + "-" + strconv.Itoa(i))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt.Println(string(msg.Payload()) + "-" + strconv.Itoa(i))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

Copy link
Member

@nodece nodece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants