
Inconsistent behavior when using Kafka API to consume event hub #339

Closed
warrenzhu25 opened this issue Jun 15, 2018 · 6 comments

@warrenzhu25 commented Jun 15, 2018

Actual Behavior

  1. The Kafka consumer group is not listed in the Azure portal.
  2. Offset commits for this consumer group work as expected. Where are committed offsets stored for the Kafka API?
  3. Consumers in the same consumer group receive duplicate messages.

Expected Behavior

  1. A consumer group should behave the same whether viewed as a Kafka concept or an Event Hub concept.
  2. Consumers in the same consumer group should each receive only one copy of a message.

More details here: https://seroter.wordpress.com/2018/05/29/how-to-use-the-kafka-interface-of-azure-event-hubs-with-spring-cloud-stream/

@hmlam (Member) commented Jun 16, 2018

Thanks for your interest in the Kafka feature. To answer your questions:

  1. This is a known issue we are tracking internally. A Kafka consumer group is technically slightly different from an Event Hub consumer group, but we are trying to design our system in a way that makes this seamless to customers.
  2. Commit data is stored on the service side. Currently you can query it with Kafka's kafka-consumer-groups.sh tool (see the sketch after this list).
  3. I will try to get a repro on this one, but if you have a namespace/entity/group/time-range combo that reproduces it, I can track that down and take a look.
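
For reference, a query against a Kafka-enabled namespace looks roughly like this (a minimal sketch; the namespace, group name, and connection string are placeholders):

```sh
# client.properties -- standard SASL settings for the Event Hubs Kafka endpoint
#   bootstrap.servers=mynamespace.servicebus.windows.net:9093
#   security.protocol=SASL_SSL
#   sasl.mechanism=PLAIN
#   sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
#     username="$ConnectionString" password="<event hubs connection string>";

# Describe the group: prints per-partition committed offsets, lag, and members
bin/kafka-consumer-groups.sh \
  --bootstrap-server mynamespace.servicebus.windows.net:9093 \
  --command-config client.properties \
  --describe --group my-consumer-group
```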

@JamesBirdsall (Contributor)

No activity on this issue for four months. If you have any additional info or comments, feel free to reopen, but for now I am closing.

@fpandyz commented Nov 6, 2018

This is a highly critical issue for our team; Azure Event Hubs' Kafka interface is unusable for us because of it.
It makes no sense if you cannot run several instances of competing consumers from the same group, because then you cannot parallelize consumption within a group.
In our case we commit manually after processing each message, but with two competing consumers from the same consumer group, the second one fails on the offset commit. The pattern is essentially the standard manual-commit loop, sketched below.
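
A simplified sketch of that loop (using the plain Kafka Java consumer rather than our actual service code; names are placeholders and the Event Hubs SASL settings are omitted):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "mynamespace.servicebus.windows.net:9093");
        props.put("group.id", "my-consumer-group");  // same group for both instances
        props.put("enable.auto.commit", "false");    // we commit manually
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Event Hubs SASL settings omitted for brevity

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);  // application-specific processing
                }
                if (!records.isEmpty()) {
                    consumer.commitSync();  // the commit that fails on the second consumer
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```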

@hmlam (Member) commented Nov 6, 2018

If I understand your description correctly, you are saying that two consumers in the same group processed the same message. Just to clarify, do you know whether this happened before or after the offset was checkpointed? If the offset is within the checkpointed range, the message should not have been returned to you. However, this all depends on how often you are checkpointing.

If you have traces/logs showing that the offset was checkpointed and you are still getting the duplicate message, do you have a repro with the following information so we can look into this more?

  1. Your EventHub namespace name.
  2. The UTC time range of your repro where you received the same message twice (I expect you would have two timestamps in this case).
  3. Your client or member ID(s) where your two consumers are getting the same messages.
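
If it helps with collecting that evidence, you can read back the committed (checkpointed) offset for a group from the service side; a minimal sketch with the Kafka Java consumer (names are placeholders, SASL settings omitted):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CheckCommittedOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "mynamespace.servicebus.windows.net:9093");
        props.put("group.id", "my-consumer-group");  // the group whose checkpoint we inspect
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Event Hubs SASL settings omitted for brevity

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            // Last committed offset for this group/partition, or null if none yet
            OffsetAndMetadata committed = consumer.committed(tp);
            System.out.println("committed offset for " + tp + ": " + committed);
        }
    }
}
```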

@hmlam hmlam reopened this Nov 6, 2018
@sjkwak (Member) commented Jan 16, 2019

Please reopen this if the issue persists and any help is needed.

@sjkwak sjkwak closed this as completed Jan 16, 2019
@deb-amit-84

Hi all,
I am not sure whether this issue was resolved, but I am seeing duplicate messages while consuming from a Kafka-enabled Azure Event Hub.
We are doing load testing with around 30k messages. All of these messages are first consumed on our side from an external source system on Confluent Kafka; we then do data processing/transformation and send them to our internal Kafka topics on Azure (Kafka-enabled Azure Event Hub), from where our downstream applications consume them.
We are able to consume the messages (>30k) with no issue from Confluent Kafka, but once the data transformation is done and the messages are sent to the Event Hub topic for downstream applications to consume, we observe that at volumes above roughly 400 incoming messages, messages are getting reprocessed (duplicated) even after manual acknowledgement of each record on the consumer side.
Our Java services use Spring Cloud Stream Kafka (Hoxton.RELEASE) for consuming messages. As mentioned in earlier posts, the issue happened with two consumers in a consumer group, but in our case it is one topic and one consumer in a consumer group, and we still get duplicate messages. Our acknowledgement pattern is sketched below.
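
Roughly how we acknowledge each record (a simplified sketch; binding and class names are illustrative, and the binder property disables auto-commit so we can ack manually):

```java
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.Message;

// application.properties:
//   spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOffset=false

@EnableBinding(Sink.class)
public class EventConsumer {

    @StreamListener(Sink.INPUT)
    public void handle(Message<String> message) {
        process(message.getPayload());  // data processing/transformation

        // Manual acknowledgement of each record after processing
        Acknowledgment ack =
                message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
        if (ack != null) {
            ack.acknowledge();
        }
    }

    private void process(String payload) {
        System.out.println("processed: " + payload);
    }
}
```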

I would appreciate it if someone could help.
