fix: kafka consumer data loss problem #1629
Conversation
Hi @luoluoyuyu thanks for the PR!
Hi @dominikriemer you can find more related information in the following resources: I'm happy to contribute to the StreamPipes community.
So to be on the safe side, we should check how we commit our offsets, right?
With the current default behavior, the Kafka consumer will auto-commit offsets every 5 seconds: https://kafka.apache.org/documentation/#consumerconfigs_auto.offset.reset

```java
while (isRunning) {
    ConsumerRecords<byte[], byte[]> records = consumer.poll(duration);
    records.forEach(record -> eventProcessor.onEvent(record.value()));
    if (!enableAutoCommit) {
        // Auto-commit is disabled, so commit offsets manually after processing.
        consumer.commitSync();
    }
}
```

Otherwise, we still have a high probability of consuming duplicate records within those five seconds. For this PR, I propose exposing this config to users and letting them choose what they want. We could keep the same default behavior as before (use the latest offset).
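To make the proposal above concrete, here is a minimal sketch of how the commit mode could be exposed as a consumer option. The class and method names (`KafkaConsumerConfigSketch`, `consumerProperties`) are illustrative, not part of the actual PR; only the property keys are standard Kafka consumer configs.

```java
import java.util.Properties;

public class KafkaConsumerConfigSketch {
    // Hypothetical helper: builds consumer properties, letting the caller
    // decide whether offsets are committed automatically or manually.
    public static Properties consumerProperties(String bootstrapServers,
                                                String groupId,
                                                boolean enableAutoCommit) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // When false, the poll loop must call consumer.commitSync() itself.
        props.setProperty("enable.auto.commit", String.valueOf(enableAutoCommit));
        // "latest" keeps the previous default behavior for new consumer groups.
        props.setProperty("auto.offset.reset", "latest");
        return props;
    }
}
```

With `enableAutoCommit = false`, the poll loop commits after each batch is processed, trading some throughput for at-least-once delivery.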
Yes, I think this method works. Kafka's default offsets expiration time is 7 days (https://kafka.apache.org/documentation/#brokerconfigs_offsets.retention.minutes), which I think is long enough unless the user modifies the offsets expiration time. We can expose this configuration to users and let them choose what they want.
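For reference, the broker-side setting discussed above is `offsets.retention.minutes`; a sketch of what the 7-day default looks like in the broker configuration (shown only for illustration):

```properties
# Broker config: how long committed offsets are retained after a
# consumer group becomes empty. 10080 minutes = 7 days (Kafka default).
offsets.retention.minutes=10080
```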
Should we put this change in another PR to complete it?
Hi @luoluoyuyu @RobertIndie thanks for the explanation! Is this correct? Should we use
I think we could just expose these configs on the Kafka adapter first to solve this adapter test issue. There are too many Kafka configurations; exposing them all as environment variables would make it too complicated. If we subsequently need to allow users to configure more parameters for the Kafka messaging protocol, we can write these configurations to a file and expose an environment variable.
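The file-plus-environment-variable idea above could be sketched as follows. The class name, environment variable name, and fallback behavior are all assumptions for illustration; nothing here is taken from the actual PR.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class ExtraKafkaConfig {
    // Hypothetical helper: load additional Kafka properties from a file
    // whose path is given by an environment variable, as proposed above.
    public static Properties load(String envVar) {
        Properties extra = new Properties();
        String path = System.getenv(envVar);
        if (path == null) {
            // Variable not set: fall back to an empty property set.
            return extra;
        }
        try (FileInputStream in = new FileInputStream(path)) {
            extra.load(in);
        } catch (IOException e) {
            // Assumption: an unreadable file also falls back to empty properties.
        }
        return extra;
    }
}
```

This keeps the adapter UI small while still giving power users a single escape hatch for the long tail of Kafka settings.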
Yes. That's correct. If we don't want to replay any messages produced before the consumer was created and can tolerate message loss, then we can use this approach to set the retention, or use a unique consumer group each time a consumer is created. I think the non-persistent topic feature from Pulsar would be a better choice for this case.
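The "unique consumer group each time the consumer is created" idea can be sketched in a few lines. The prefix and helper name are illustrative; the point is simply that a fresh `group.id` has no committed offsets, so the consumer always starts according to `auto.offset.reset`.

```java
import java.util.UUID;

public class UniqueGroupId {
    // Hypothetical helper: generates a fresh consumer group id per consumer
    // instance, so the consumer never resumes from a previously committed
    // offset and instead starts wherever auto.offset.reset points.
    public static String newGroupId(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }
}
```

A side effect worth noting: every such group leaves committed offsets behind until the broker's offset retention expires, which is why the retention discussion above matters.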
Yes. That's a different topic. Let's put it in another PR.
Thanks for the explanation @RobertIndie! :-)
Hi @luoluoyuyu, Thanks a lot for providing the PR. It looks good, and I see the problem you are solving. However, I agree with @dominikriemer that merging this PR as it is will break the behavior of running pipelines. Please correct me if I am wrong, but I would say that when a user does the following:
Expected behavior: The events between
My suggestion would be to leave the consumer config
Is there anything I'm not thinking about?
Cheers,
Hi @tenthe, what do you think of this proposal?
Can we pre-select the default option in the UI?
Hello @luoluoyuyu,
Hi, @bossenti
Thank you for your suggestion. While making the change I found that we should add the default configuration: StreamPipes needs to iterate through the configuration items when creating an adapter, and if there is an unconfigured option it may throw an exception, causing the adapter creation to fail.
Okay then I'd propose to remove the option to set
Another aspect I'd like to discuss, although I'm a bit late to the party here:
I'm very sorry to be so late in bringing up this discussion, @loststar
Hey @bossenti,
Hi,
Hi @bossenti |
Hi @loststar
This is a good idea, but I did not find the configuration and description of the advanced settings in the code. I think if we want to implement "advanced configuration", we may need to create an issue to complete the "Advanced Configuration" feature.
Okay, then I'm fine with it :)
I'm sorry @luoluoyuyu, this was addressed to you 🙂
Hi @bossenti
Sorry, I'll try to do better
@luoluoyuyu don't worry, Tim only apologized for addressing the wrong username in his comment ;-)
@luoluoyuyu wait wait...
Ohhh 😁
Thank you very much @luoluoyuyu and sorry that this took so long to merge!
See related discussion #1626