Skip to content
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

JoinGroup failed: Coordinator 162.161.161.165:9092 (id: 466675684 rack: null) is loading the group. #1550

Open
LHZ-Y opened this issue Oct 28, 2022 · 7 comments
Labels

Comments

@LHZ-Y
Copy link

LHZ-Y commented Oct 28, 2022

(1)while performing the performance test, the consumer client appears:
10:01:08.001 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-mcPyxmw-1, groupId=sub-000-mcPyxmw] (Re-)joining group
10:01:08.003 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-mcPyxmw-1, groupId=sub-000-mcPyxmw] JoinGroup failed: Coordinator 162.161.161.165:9092 (id: 466675684 rack: null) is loading the group.

(2)Building a cluster:pulsar2.3.9+kop2.3.9.11,zookeeper uses three machines ,The broker and bookie are on three other machines together.

(3)The producer client and the consumer client are on one machine,tool for testing is openmessaging-benchmark ,The command used is:bin/benchmark -d driver-kop/kafka_to_kafka.yaml workloads/1-topic-1-partition-1kb.yaml

(4)The following logs are generated on the client:
08:23:02.033 [main] INFO - Waiting for consumers to be ready
08:23:02.039 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] Cluster ID: cluster-kop
08:23:02.050 [kafka-producer-network-thread | producer-1] INFO - [Producer clientId=producer-1] Cluster ID: cluster-kop
08:23:02.051 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] Discovered group coordinator 162.161.161.165:9092 (id: 1341109334 rack: null)
08:23:02.054 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] (Re-)joining group
08:23:02.075 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] JoinGroup failed: Coordinator 162.161.161.165:9092 (id: 1341109334 rack: null) is loading the group.
08:23:02.076 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] (Re-)joining group
08:23:02.078 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] JoinGroup failed: Coordinator 162.161.161.165:9092 (id: 1341109334 rack: null) is loading the group.
08:23:02.078 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] (Re-)joining group
08:23:02.079 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] JoinGroup failed: Coordinator 162.161.161.165:9092 (id: 1341109334 rack: null) is loading the group.
08:23:02.080 [pool-2-thread-1] INFO - [Consumer clientId=consumer-sub-000-e0wnzzg-1, groupId=sub-000-e0wnzzg] (Re-)joining group
。。。。。。。

@LHZ-Y LHZ-Y added the type/bug label Oct 28, 2022
@BewareMyPower
Copy link
Collaborator

Could you upload the broker's logs?

@LHZ-Y
Copy link
Author

LHZ-Y commented Oct 29, 2022

12:29:34.818 [pulsar-io-4-5] INFO io.streamnative.pulsar.handlers.kop.KafkaCommandDecoder - close channel [id: 0x24b76ec4, L:/10.1.0.216:9092 ! R:/162.161.161.163:35882] with 0 pending responses
12:29:34.818 [pulsar-io-4-6] ERROR io.streamnative.pulsar.handlers.kop.KafkaRequestHandler - Caught error in handler, closing channel
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
12:29:34.819 [pulsar-io-4-6] INFO io.streamnative.pulsar.handlers.kop.KafkaCommandDecoder - close channel [id: 0x9c135745, L:/10.1.0.216:9092 - R:/162.161.161.163:20303] with 0 pending responses
12:29:34.818 [pulsar-io-4-7] ERROR io.streamnative.pulsar.handlers.kop.KafkaRequestHandler - Caught error in handler, closing channel
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
12:29:34.819 [pulsar-io-4-7] INFO io.streamnative.pulsar.handlers.kop.KafkaCommandDecoder - close channel [id: 0xd63f63bb, L:/10.1.0.216:9092 - R:/162.161.161.163:4686] with 0 pending responses
12:29:34.823 [pulsar-io-4-7] WARN io.streamnative.pulsar.handlers.kop.KafkaTopicConsumerManager - [[id: 0xd63f63bb, L:/10.1.0.216:9092 ! R:/162.161.161.163:4686]] Error deleting cursor kop-consumer-cursor-persistent://kafka-data/default/test-topic-Ue3R_Ng-0000-partition-0-69-0-6ea02fcdb7 for topic persistent://kafka-data/default/test-topic-Ue3R_Ng-0000-partition-0 for reason: TopicConsumerManager close but cursor is still outstanding.
org.apache.bookkeeper.mledger.ManagedLedgerException$CursorNotFoundException: ManagedCursor not found: kop-consumer-cursor-persistent://kafka-data/default/test-topic-Ue3R_Ng-0000-partition-0-69-0-6ea02fcdb7

@LHZ-Y
Copy link
Author

LHZ-Y commented Oct 29, 2022

10.1.0.216 is the intranet IP of 162.161.161.165

@BewareMyPower
Copy link
Collaborator

These logs show the Kafka client closed the connection actively. The WARN logs Error deleting cursor can be ignored. Not much info can be inferred.

Could you upload the complete logs?

image

@LHZ-Y
Copy link
Author

LHZ-Y commented Nov 2, 2022

The same problem occurred just now. I checked the broker log and found a large number of these errors :

08:09:54.305 [pulsar-io-4-3] WARN  org.apache.pulsar.broker.service.BrokerService - Namespace bundle for topic (persistent://kafka-data/default/test-topic-ItHZCWM-0016-partition-0) not served by this instance. Please redo the lookup. Request is denied: namespace=kafka-data/default
08:09:54.305 [pulsar-io-4-3] WARN  io.streamnative.pulsar.handlers.kop.KafkaTopicManager - Get partition-0 error [Namespace bundle for topic (persistent://kafka-data/default/test-topic-ItHZCWM-0016-partition-0) not served by this instance. Please redo the lookup. Request is denied: namespace=kafka-data/default].
08:09:54.305 [pulsar-io-4-3] WARN  org.apache.pulsar.broker.service.BrokerService - Namespace bundle for topic (persistent://kafka-data/default/test-topic-ItHZCWM-0016) not served by this instance. Please redo the lookup. Request is denied: namespace=kafka-data/default
08:15:58.131 [ForkJoinPool.commonPool-worker-11] ERROR io.streamnative.pulsar.handlers.kop.coordinator.group.GroupMetadataManager - Offset commit {persistent://kafka-data/def             ault/test-topic-x8buIe0-0015-0=[OffsetMetadata[235671,NO_METADATA],CommitTime 1667290557615,ExpirationTime 1667549757615]} from group sub-000-XJpk6tQ, consumer consumer-sub-000-XJpk6tQ-13-b4697a31-8317-4701-9e46-9cc0c4bd3cc9 with generation 1 failed when appending to log due to
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$AlreadyClosedException: The producer admq-cluster-xop-12-0 of the topic kafka-meta/__kafka/__consumer_offsets-partition-34 was already closed when closing the producers
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
        at org.apache.pulsar.client.impl.ProducerImpl$1.sendComplete(ProducerImpl.java:366) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        at org.apache.pulsar.client.impl.ProducerImpl$OpSendMsg.sendComplete(ProducerImpl.java:1294) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        at org.apache.pulsar.client.impl.ProducerImpl.lambda$failPendingMessages$18(ProducerImpl.java:1830) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        at java.util.ArrayDeque.forEach(ArrayDeque.java:889) ~[?:?]
        at org.apache.pulsar.client.impl.ProducerImpl$OpSendMsgQueue.forEach(ProducerImpl.java:1373) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        at org.apache.pulsar.client.impl.ProducerImpl.failPendingMessages(ProducerImpl.java:1820) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        at org.apache.pulsar.client.impl.ProducerImpl.closeAndClearPendingMessages(ProducerImpl.java:969) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        at org.apache.pulsar.client.impl.ProducerImpl.closeAsync(ProducerImpl.java:936) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) ~[?:?]
        at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) ~[?:?]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) ~[?:?]
        at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) ~[?:?]
        at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) ~[?:?]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) ~[?:?]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) ~[?:?]
Caused by: org.apache.pulsar.client.api.PulsarClientException$AlreadyClosedException: The producer admq-cluster-xop-12-0 of the topic kafka-meta/__kafka/__consumer_offsets-partition-34 was already closed when closing the producers
        at org.apache.pulsar.client.impl.ProducerImpl.closeAndClearPendingMessages(ProducerImpl.java:966) ~[org.apache.pulsar-pulsar-client-original-2.9.3.jar:2.9.3]
        ... 8 more
08:15:57.913 [pulsar-io-4-6] ERROR io.streamnative.pulsar.handlers.kop.KafkaCommandDecoder - [[id: 0xa48a33ed, L:/10.1.0.216:9092 ! R:/162.161.161.163:25260]] Failed to write RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=4, clientId=consumer-sub-000-XJpk6tQ-13, correlationId=6877)
io.netty.channel.StacklessClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source) ~[io.netty-netty-transport-4.1.77.Final.jar:4.1.77.Final]
08:15:57.913 [pulsar-io-4-6] ERROR io.streamnative.pulsar.handlers.kop.KafkaCommandDecoder - [[id: 0xa48a33ed, L:/10.1.0.216:9092 ! R:/162.161.161.163:25260]] Failed to write RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=4, clientId=consumer-sub-000-XJpk6tQ-13, correlationId=6879)
io.netty.channel.StacklessClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source) ~[io.netty-netty-transport-4.1.77.Final.jar:4.1.77.Final]
08:15:57.913 [pulsar-io-4-6] ERROR io.streamnative.pulsar.handlers.kop.KafkaCommandDecoder - [[id: 0xa48a33ed, L:/10.1.0.216:9092 ! R:/162.161.161.163:25260]] Failed to write RequestHeader(apiKey=OFFSET_COMMIT, apiVersion=4, clientId=consumer-sub-000-XJpk6tQ-13, correlationId=6881)
io.netty.channel.StacklessClosedChannelException: null

@Demogorgon314
Copy link
Member

@LHZ-Y Can you upload the full broker log? We need more context to investigate.

@LHZ-Y
Copy link
Author

LHZ-Y commented Nov 2, 2022

ok, but next time, because this cluster I deleted.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants