
ERROR Uncaught exception in herder work thread in Distributed mode #189

Closed

aldobd opened this issue Feb 1, 2017 · 11 comments

aldobd commented Feb 1, 2017

Hi,

I am trying to start Kafka Connect (Confluent 3.1.1) in distributed mode, but I am running into a problem:

[2017-02-01 11:08:51,893] INFO Discovered coordinator inaki-P552LA:9092 (id: 2147483647 rack: null) for group test. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:555)
[2017-02-01 11:08:52,803] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:181)
java.lang.NullPointerException
at org.apache.kafka.connect.storage.KafkaConfigBackingStore$ConsumeCallback.onCompletion(KafkaConfigBackingStore.java:444)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore$ConsumeCallback.onCompletion(KafkaConfigBackingStore.java:424)
at org.apache.kafka.connect.util.KafkaBasedLog.poll(KafkaBasedLog.java:253)
at org.apache.kafka.connect.util.KafkaBasedLog.readToLogEnd(KafkaBasedLog.java:293)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:143)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore.start(KafkaConfigBackingStore.java:259)
at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:114)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:169)
at java.lang.Thread.run(Thread.java:745)
[2017-02-01 11:08:52,804] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:68)
[2017-02-01 11:08:52,804] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:154)
[2017-02-01 11:08:52,807] INFO Stopped ServerConnector@e3cee7b{HTTP/1.1}{0.0.0.0:8083} (org.eclipse.jetty.server.ServerConnector:306)
[2017-02-01 11:08:52,813] INFO Stopped o.e.j.s.ServletContextHandler@120f38e6{/,null,UNAVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:865)

This looks similar to confluentinc/kafka-connect-hdfs#85.

I followed some of the steps mentioned there, and if I start the workers before Connect it runs without the NullPointerException, but when I send messages I get timeout errors.

@aldobd aldobd closed this as completed Feb 2, 2017
ewencp (Contributor) commented Feb 3, 2017

What version are you on, and did you resolve this? We used to see this occasionally, and I think we added some logging to give more information if it happened again, because it was difficult to tell from the code how things had gotten out of sync.

aldobd (Author) commented Feb 23, 2017

I couldn't resolve it. I just reset everything to the defaults by following the documentation and tested again, and the error disappeared.

txbm commented Nov 2, 2018

Reproducing this consistently with Kafka Connect 1.1.0 against a 1.0.0 cluster. Repro steps:

  1. Shut down Kafka Connect
  2. Delete the Kafka Connect internal topics (offsets, status, configs)
  3. Start Kafka Connect back up (assuming topic auto-creation is enabled in the cluster)
  4. Send a GET request to https://my-kafka-connect/connectors
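For reference, steps 2–4 above look roughly like this on the command line (a sketch only: Kafka 1.x used the `--zookeeper` flag for `kafka-topics.sh`, and the hostnames and default topic names here are assumptions about the environment):

```shell
# Step 2: delete the Kafka Connect internal topics (default names assumed)
for t in connect-offsets connect-status connect-configs; do
  kafka-topics.sh --zookeeper localhost:2181 --delete --topic "$t"
done

# Step 3: restart the Kafka Connect worker with topic auto-creation enabled, then...

# Step 4: hit the Connect REST API; this is the request that times out
curl http://localhost:8083/connectors
```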

Symptoms: you'll get a timeout on the GET request and an exception like this:

Uncaught exception in herder work thread, exiting:
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Interestingly, it succeeds in creating the offsets topic, but not the config or status topics.

Resetting the cluster repeatedly (by deleting the Kafka Connect topics) does not resolve the issue.

This happened immediately after upgrading Kafka Connect from 1.0 to 1.1, with no other changes to the cluster or deployment environment. @ewencp

txbm commented Nov 2, 2018

I may have just figured this out... it looks like the status topic has been eternally marked for deletion (confirmed from within ZK). It's no longer listed in the broker metadata as an existing topic, but it appears ZK never completed the delete. I think this causes a strange state: when Kafka Connect then tries to recreate the topic, Kafka can't service the request because ZK disagrees. Not sure if this is the same root cause as the originally reported issue, but it's something.

txbm commented Nov 2, 2018

We ended up forcing the controllers to reset their "deleting" state for the Kafka Connect topics by:

  1. Shutting down Kafka Connect
  2. Running deletes from kafka-topics.sh
  3. Deleting the ZK entries
  4. Creating the topics (this will appear to fail if the controllers are still in the "deleting" state, but that's OK)
  5. Running the deletes again from kafka-topics.sh (this resets the controllers' "deleting" state)
  6. Restarting Kafka Connect so it auto-creates the topics for real
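A sketch of that sequence, assuming default topic names, a local ZooKeeper at localhost:2181, and an older ZooKeeper CLI that supports `rmr` (everything here is environment-specific and should be adapted, not run verbatim):

```shell
# 2. Run the topic deletes (may leave the controller stuck in "deleting")
for t in connect-offsets connect-status connect-configs; do
  kafka-topics.sh --zookeeper localhost:2181 --delete --topic "$t"
done

# 3. Remove the leftover ZK entries, including the stuck deletion markers
for t in connect-offsets connect-status connect-configs; do
  zookeeper-shell.sh localhost:2181 rmr "/brokers/topics/$t"
  zookeeper-shell.sh localhost:2181 rmr "/admin/delete_topics/$t"
done

# 4. Recreate the topics (may appear to fail -- that's OK)
# 5. Run the deletes above again to reset the controllers' "deleting" state
# 6. Restart Kafka Connect so it auto-creates the topics for real
```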

@gmanolache

Had the same bug: Kafka Connect was creating the three topics connect-configs, connect-offsets, and connect-status with the default number of partitions (in my case 64, as defined in the Kafka server properties), which was causing this issue.

As in the solution above, I ended up removing the topics, recreating them with 1 partition each, and restarting Kafka Connect.
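One way to apply that fix (the broker address and replication factor are assumptions, and the `--bootstrap-server` flag requires a newer Kafka CLI; note that the Connect internal topics must use log compaction):

```shell
# Delete and recreate each Connect internal topic with a single partition
for t in connect-configs connect-offsets connect-status; do
  kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic "$t"
  kafka-topics.sh --bootstrap-server localhost:9092 --create --topic "$t" \
    --partitions 1 --replication-factor 1 --config cleanup.policy=compact
done
```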

liranshaked commented Dec 25, 2019

I am unable to start Kafka Connect. I see the following in the Kafka Connect logs:

Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:253)

org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata

Full logs:

[2019-12-25 11:36:31,409] INFO ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = connect-cluster
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
(org.apache.kafka.clients.consumer.ConsumerConfig:347)
[2019-12-25 11:36:31,448] WARN The configuration 'config.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,448] WARN The configuration 'status.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,448] WARN The configuration 'plugin.path' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'internal.key.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'config.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'value.converter.schema.registry.url' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'internal.key.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'status.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'internal.value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'internal.value.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,449] WARN The configuration 'offset.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,450] WARN The configuration 'offset.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,450] WARN The configuration 'value.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,450] WARN The configuration 'key.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,450] WARN The configuration 'key.converter.schema.registry.url' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:355)
[2019-12-25 11:36:31,450] INFO Kafka version: 5.3.0-ccs (org.apache.kafka.common.utils.AppInfoParser:117)
[2019-12-25 11:36:31,450] INFO Kafka commitId: 6481debc2be778ee (org.apache.kafka.common.utils.AppInfoParser:118)
[2019-12-25 11:36:31,451] INFO Kafka startTimeMs: 1577273791450 (org.apache.kafka.common.utils.AppInfoParser:119)
[2019-12-25 11:36:31,533] INFO [Producer clientId=producer-1] Cluster ID: 3gSpLKOtT8y-axt0zWl0oA (org.apache.kafka.clients.Metadata:261)
[2019-12-25 11:36:31,634] INFO [Consumer clientId=consumer-1, groupId=connect-cluster] Cluster ID: 3gSpLKOtT8y-axt0zWl0oA (org.apache.kafka.clients.Metadata:261)
Dec 25, 2019 11:36:32 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.RootResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.RootResource will be ignored.
Dec 25, 2019 11:36:32 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will be ignored.
Dec 25, 2019 11:36:32 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
Dec 25, 2019 11:36:32 AM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: The (sub)resource method listConnectors in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method createConnector in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource contains empty path annotation.
WARNING: The (sub)resource method serverInfo in org.apache.kafka.connect.runtime.rest.resources.RootResource contains empty path annotation.

[2019-12-25 11:36:32,532] INFO Started o.e.j.s.ServletContextHandler@43045f9f{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:855)
[2019-12-25 11:36:32,532] INFO REST resources initialized; server is started and ready to handle requests (org.apache.kafka.connect.runtime.rest.RestServer:233)
[2019-12-25 11:36:32,532] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect:56)
[2019-12-25 11:37:31,457] ERROR [Worker clientId=connect-1, groupId=connect-cluster] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:253)
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
[2019-12-25 11:37:31,504] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:66)
[2019-12-25 11:37:31,519] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:241)
[2019-12-25 11:37:31,552] INFO Stopped http_8083@24361cfc{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:341)
[2019-12-25 11:37:31,571] INFO node0 Stopped scavenging (org.eclipse.jetty.server.session:167)
[2019-12-25 11:37:31,706] INFO REST server stopped (org.apache.kafka.connect.runtime.rest.RestServer:258)
[2019-12-25 11:37:31,706] INFO [Worker clientId=connect-1, groupId=connect-cluster] Herder stopping (org.apache.kafka.connect.runtime.distributed.DistributedHerder:533)
[2019-12-25 11:37:36,749] INFO [Worker clientId=connect-1, groupId=connect-cluster] Herder stopped (org.apache.kafka.connect.runtime.distributed.DistributedHerder:553)
[2019-12-25 11:37:36,749] INFO Kafka Connect stopped (org.apache.kafka.connect.runtime.Connect:71)

@ishanivij

I had the same issue. Resolved by deleting the existing topics connect-configs, connect-status, and connect-offsets and recreating them with 1 partition each (as suggested by @gmanolache).

txbm commented Sep 17, 2020

> I had the same issue. Resolved by deleting the existing topics connect-configs, connect-status, connect-offsets and recreating with number of partitions =1 for all. (As suggested by @gmanolache )

Apologies if this is obvious, but for larger or production-grade workloads you will need more than 1 partition on some of those topics. This deletion-state issue notwithstanding, I just thought I should mention that.
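To make this concrete: the config topic must always have exactly one partition, while the offset and status topics can (and in production usually should) have more; the Kafka Connect documentation suggests 25 and 5 as reasonable defaults. A sketch with production-leaning settings (the broker address and exact counts are illustrative, not prescriptive):

```shell
# Config topic: MUST be a single partition, compacted
kafka-topics.sh --bootstrap-server broker:9092 --create --topic connect-configs \
  --partitions 1 --replication-factor 3 --config cleanup.policy=compact

# Offset topic: many partitions for parallelism, compacted
kafka-topics.sh --bootstrap-server broker:9092 --create --topic connect-offsets \
  --partitions 25 --replication-factor 3 --config cleanup.policy=compact

# Status topic: a few partitions, compacted
kafka-topics.sh --bootstrap-server broker:9092 --create --topic connect-status \
  --partitions 5 --replication-factor 3 --config cleanup.policy=compact
```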

shivusajjan commented Oct 8, 2021

I deployed Kafka, ZooKeeper, and Kafka Connect using Strimzi in Kubernetes. I updated the Kafka Connect image in Kubernetes to add the RabbitMQSourceConnector plugin, and it worked initially, but after restarting Kafka Connect we get the following error. Does anyone know why this error occurs?

2021-10-08 06:07:13,122 ERROR [Worker clientId=connect-1, groupId=connect-cluster] Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1]
java.lang.NullPointerException
at org.apache.kafka.connect.storage.KafkaConfigBackingStore$ConsumeCallback.onCompletion(KafkaConfigBackingStore.java:685)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore$ConsumeCallback.onCompletion(KafkaConfigBackingStore.java:481)
at org.apache.kafka.connect.util.KafkaBasedLog.poll(KafkaBasedLog.java:264)
at org.apache.kafka.connect.util.KafkaBasedLog.readToLogEnd(KafkaBasedLog.java:287)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:154)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore.start(KafkaConfigBackingStore.java:265)
at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:123)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:277)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-10-08 06:07:13,123 INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect) [Thread-10]
2021-10-08 06:07:13,125 INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer) [Thread-10]
2021-10-08 06:07:13,130 INFO Stopped http_8083@39c11e6c{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector) [Thread-10]
2021-10-08 06:07:13,130 INFO node0 Stopped scavenging (org.eclipse.jetty.server.session) [Thread-10]
2021-10-08 06:07:13,131 INFO REST server stopped (org.apache.kafka.connect.runtime.rest.RestServer) [Thread-10]
2021-10-08 06:07:13,131 INFO [Worker clientId=connect-1, groupId=connect-cluster] Herder stopping (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [Thread-10]
2021-10-08 06:07:18,132 INFO [Worker clientId=connect-1, groupId=connect-cluster] Herder stopped (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [Thread-10]
2021-10-08 06:07:18,132 INFO Kafka Connect stopped (org.apache.kafka.connect.runtime.Connect) [Thread-10]

@AllanNyasha

> (quoting @txbm's Nov 2, 2018 repro steps above)

this worked for me
