ERROR Uncaught exception in herder work thread in Distributed mode #189
Comments
What version are you on, and did you resolve this? We used to see this occasionally, and I think we tried to add some logging to give more information if it happened again, because it was difficult to tell from the code how things had gotten out of sync.
I couldn't resolve it; I just reset everything to defaults following the documentation and tested again, and the error disappeared.
Reproducing this consistently with Kafka Connect 1.1.0 against a 1.0.0 cluster. Repro steps are:
Symptoms: you'll get a timeout on the GET request and an exception like this:
Interestingly, it will succeed in creating the
Resetting the cluster repeatedly (by deleting the Kafka Connect topics) does not resolve it. This happened immediately after upgrading Kafka Connect from
I may have just figured this out... it looks like the status topic has been eternally marked for deletion (confirmed from within ZK). It's no longer listed in the broker metadata as an existing topic, but it appears ZK never completed the delete. I think this is causing a strange state where, when Kafka Connect then tries to recreate it, Kafka can't service the request because ZK disagrees. Not sure if this is the same root cause as the originally reported issue, but it's something.
We ended up forcing the controllers to reset their "deleting state" of the Kafka Connect topics by:
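The exact commands weren't captured in the comment above. One common way to clear a stuck "marked for deletion" state is to remove the deletion markers in ZooKeeper and force a controller re-election; this is an assumption about the poster's procedure, and the ZooKeeper address and topic name below are placeholders:

```shell
# Assumption: ZooKeeper at localhost:2181; connect-status is one of the
# Kafka Connect internal topics stuck in the deletion state.

# 1. Inspect which topics are stuck in the "marked for deletion" state
zookeeper-shell localhost:2181 ls /admin/delete_topics

# 2. Remove the stuck deletion marker for each affected Connect topic
zookeeper-shell localhost:2181 delete /admin/delete_topics/connect-status

# 3. Delete the controller znode to force a controller re-election,
#    so the new controller re-reads the (now clean) deletion state
zookeeper-shell localhost:2181 delete /controller
```

Note that deleting `/controller` briefly disrupts the cluster while a new controller is elected, so this is best done in a maintenance window.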
Had the same bug: kafka-connect was creating the topics with 3 partitions. As in the above solution, I ended up removing the topics, recreating them with 1 partition, and restarting kafka-connect.
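For reference, a sketch of that delete-and-recreate step. The broker address and topic name are assumptions (defaults vary per deployment); what is firm is that the config storage topic must have exactly one partition, and all three internal topics should be log-compacted:

```shell
# Assumption: broker at localhost:9092; connect-configs is the worker's
# config.storage.topic. Repeat for the offset and status topics as needed.

kafka-topics --bootstrap-server localhost:9092 --delete --topic connect-configs

# The config storage topic must have exactly 1 partition and be compacted
kafka-topics --bootstrap-server localhost:9092 --create --topic connect-configs \
  --partitions 1 --replication-factor 3 --config cleanup.policy=compact
```

On older broker versions the tool takes `--zookeeper` instead of `--bootstrap-server`.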
I am unable to start Kafka Connect:
Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:253)
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
Full logs:
[2019-12-25 11:36:31,409] INFO ConsumerConfig values:
[2019-12-25 11:36:32,532] INFO Started o.e.j.s.ServletContextHandler@43045f9f{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:855)
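A `TimeoutException: Timeout expired while fetching topic metadata` at worker startup usually means the worker cannot reach (or authenticate to) the brokers listed in `bootstrap.servers` before the metadata timeout expires, rather than anything topic-specific. A quick connectivity check from the Connect host (broker address below is an assumption):

```shell
# Verify the broker is reachable and speaking the Kafka protocol;
# this prints the API versions the broker supports if the connection works.
kafka-broker-api-versions --bootstrap-server my-kafka:9092
```

If this hangs or fails, fix the network path, advertised listeners, or security settings before looking at the Connect topics themselves.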
I had the same issue. Resolved by deleting the existing topics.
Apologies if this is obvious, but for larger or production-grade workloads you will need more than 1 partition on some of those topics. The deletion-state issue notwithstanding, I just thought I should mention that.
I deployed kafka, zookeeper, and kafka-connect using Strimzi in Kubernetes. I updated the kafka-connect image in Kubernetes to add the RabbitMQSourceConnector plugin, and it was working initially, but after restarting Kafka Connect we get the following error. May I know the reason it gives this error?
2021-10-08 06:07:13,122 ERROR [Worker clientId=connect-1, groupId=connect-cluster] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1]
This worked for me.
Hi.
I am trying to start Kafka connect (Confluent 3.1.1) in distributed mode but I am having some problems:
[2017-02-01 11:08:51,893] INFO Discovered coordinator inaki-P552LA:9092 (id: 2147483647 rack: null) for group test. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:555)
[2017-02-01 11:08:52,803] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:181)
java.lang.NullPointerException
at org.apache.kafka.connect.storage.KafkaConfigBackingStore$ConsumeCallback.onCompletion(KafkaConfigBackingStore.java:444)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore$ConsumeCallback.onCompletion(KafkaConfigBackingStore.java:424)
at org.apache.kafka.connect.util.KafkaBasedLog.poll(KafkaBasedLog.java:253)
at org.apache.kafka.connect.util.KafkaBasedLog.readToLogEnd(KafkaBasedLog.java:293)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:143)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore.start(KafkaConfigBackingStore.java:259)
at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:114)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:169)
at java.lang.Thread.run(Thread.java:745)
[2017-02-01 11:08:52,804] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:68)
[2017-02-01 11:08:52,804] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:154)
[2017-02-01 11:08:52,807] INFO Stopped ServerConnector@e3cee7b{HTTP/1.1}{0.0.0.0:8083} (org.eclipse.jetty.server.ServerConnector:306)
[2017-02-01 11:08:52,813] INFO Stopped o.e.j.s.ServletContextHandler@120f38e6{/,null,UNAVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:865)
It looks similar to
confluentinc/kafka-connect-hdfs#85
I tried some of the steps mentioned there, and if I start the workers before Connect it works without the NullPointerException, but when I send messages it throws timeout errors.
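For anyone hitting the same NullPointerException in `KafkaConfigBackingStore`: the worker reads its configuration from the internal topic named by `config.storage.topic`, and that topic must be a single-partition, compacted topic. Pre-creating it with those settings (rather than relying on broker auto-creation, which may apply different defaults) avoids a class of out-of-order reads. A minimal fragment of the worker config, where the names and addresses are assumptions:

```properties
# connect-distributed.properties (fragment; names/addresses are examples)
bootstrap.servers=localhost:9092
group.id=connect-cluster

# Internal storage topics; config.storage.topic must have exactly 1 partition
# and cleanup.policy=compact on the broker side.
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
```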