Parallel consumer stops processing data sometimes #606

Closed
bdeneuter opened this issue Jul 19, 2023 · 7 comments

Comments

@bdeneuter

Hi,

In production we noticed that the parallel consumer (0.5.2.5) sometimes stops processing data. The problem looks similar to #547.
We cannot reproduce the problem, but we noticed the following:

  • the number of fetches goes to 0 or close to 0
  • a restart fixes the problem

While looking into the fix for #547, I wondered whether the fix is complete.
The fix checks whether a WorkContainer is stale; if so, it ends the flight of the WorkContainer and adjusts numberRecordsOutForProcessing:
https://github.com/confluentinc/parallel-consumer/blob/master/parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/state/WorkManager.java#L251

Are those WorkContainers also removed from the ProcessingShard, which keeps a set of WorkContainers and is used to fetch work?
https://github.com/confluentinc/parallel-consumer/blob/master/parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/state/ProcessingShard.java#L44

For the success and failure cases, the ShardManager is called, whereas for the stale case it is not:
https://github.com/confluentinc/parallel-consumer/blob/master/parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/state/WorkManager.java#L152
https://github.com/confluentinc/parallel-consumer/blob/master/parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/state/WorkManager.java#L173
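
The concern can be illustrated with a minimal, hypothetical model of the bookkeeping. This is a sketch under stated assumptions, not the parallel-consumer source: the classes `Work`, `Shard`, and `Manager` and all of their methods are invented stand-ins for WorkContainer, ProcessingShard, and WorkManager, and "adjusting" the counter is assumed to mean decrementing it. The sketch only shows that a stale path which skips the shard leaves an entry behind; whether such a leftover entry is what actually stalls the consumer is not established here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of the bookkeeping discussed above.
// None of these classes or methods are the real parallel-consumer code.
class StaleWorkSketch {

    // Stand-in for a WorkContainer: a record offset plus an in-flight flag.
    static final class Work {
        final long offset;
        boolean inFlight;

        Work(long offset) {
            this.offset = offset;
        }
    }

    // Stand-in for a ProcessingShard: the collection that work is fetched from.
    static final class Shard {
        private final Map<Long, Work> entries = new TreeMap<>();

        void add(Work work) {
            entries.put(work.offset, work);
        }

        void remove(Work work) {
            entries.remove(work.offset);
        }

        int size() {
            return entries.size();
        }
    }

    // Stand-in for the WorkManager's bookkeeping.
    static final class Manager {
        final Shard shard = new Shard();
        int recordsOutForProcessing = 0;

        // Registers a record in the shard and marks it as in flight.
        Work dispatch(long offset) {
            Work work = new Work(offset);
            shard.add(work);
            work.inFlight = true;
            recordsOutForProcessing++;
            return work;
        }

        // Success path: the counter and the shard are both updated.
        void onSuccess(Work work) {
            work.inFlight = false;
            recordsOutForProcessing--;
            shard.remove(work);
        }

        // Stale path as described in the question: the flight ends and the
        // counter is decremented, but the shard is never told.
        void onStaleWithoutShardCleanup(Work work) {
            work.inFlight = false;
            recordsOutForProcessing--;
        }

        // Assumed alternative: the stale path also removes the shard entry.
        void onStaleWithShardCleanup(Work work) {
            onStaleWithoutShardCleanup(work);
            shard.remove(work);
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        manager.onSuccess(manager.dispatch(1L));
        manager.onStaleWithoutShardCleanup(manager.dispatch(2L));
        // prints "in flight: 0, entries left in shard: 1"
        System.out.println("in flight: " + manager.recordsOutForProcessing
                + ", entries left in shard: " + manager.shard.size());
    }
}
```

In this model the counter returns to zero in both cases, so it alone would not reveal the leftover entry; only the shard size differs between the two stale handlers.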

Kind regards,
Bart

@johnbyrnejb
Contributor

We will investigate but we are planning to release a new version this week which will include metrics. Can you upgrade your version at that point and see if you can capture any more information that may be useful to triage?

Thanks

@bdeneuter
Author

Looking forward to the new version. Metrics would definitely help us investigate this.

@krvajal

krvajal commented Jul 31, 2023

I am experiencing the same issue. Looking forward to the metrics update.

@acktsap
Contributor

acktsap commented Aug 6, 2023

Hi, I am also experiencing the same issue.

@sangreal
Contributor

sangreal commented Aug 7, 2023

I am also experiencing this issue, and after some digging I found the root cause. I am trying to fix it on my end and may create a PR for it later.

@10000-ki

10000-ki commented Aug 7, 2023

Hi, I am also experiencing the same issue.

@eddyv
Member

eddyv commented Aug 25, 2023

Closed by #623


7 participants