This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

[BUG] upgrade to kop 2.9.3+ will make kafkaexporter cannot get metrics #1584

Open
hzluyang opened this issue Nov 22, 2022 · 5 comments

@hzluyang

hzluyang commented Nov 22, 2022

KOP config:

protocolHandlerDirectory=./protocols
allowAutoTopicCreationType=partitioned
kafkaListeners=PLAINTEXT://ipaddr:9092
kafkaAdvertisedListeners=PLAINTEXT://ipaddr:9092
brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor
brokerDeleteInactiveTopicsEnabled=false
messagingProtocols=kafka
entryFormat=kafka

Protocol handlers tested:
1. pulsar-protocol-handler-kafka-2.9.3.1.nar: OK
2. pulsar-protocol-handler-kafka-2.9.3.16.nar: not OK
3. pulsar-protocol-handler-kafka-2.10.2.1.nar: not OK

It can be reproduced in standalone mode.

kafka_exporter: kafka_exporter-1.6.0 (https://github.com/danielqsj/kafka_exporter)

With pulsar-protocol-handler-kafka-2.9.3.1.nar it works:
[image]

With pulsar-protocol-handler-kafka-2.9.3.16.nar, or even after upgrading to 2.10.2, it does not work:
[image]

I need kafka_exporter to get lag info. This problem may affect not only kafka_exporter's metrics but also Kafka clients consuming topics.

Questions:
1. Is my KoP config wrong?
2. Is there a way to get consumer group lag info with 2.9.3+?

@BewareMyPower
Collaborator

Yeah, it can be reproduced in a standalone. I will look deeper into this issue.

@BewareMyPower
Collaborator

@hzluyang Could you try this NAR file? https://github.com/streamnative/kop/releases/download/v2.9.3.16/pulsar-protocol-handler-kafka-2.9.3.16-jdk8.nar

I have tested various releases and found that releases 2.9.3.1 through 2.9.3.5 work, but 2.9.3.6 and later don't. When I built 2.9.3.5 in my local env (JDK 17), it also didn't work. But after I built it with JDK 8, it finally worked. Therefore, it seems that we need to compile KoP with JDK 8.

It might be caused by an environment change in the release CI. Maybe a higher JDK has been used since 2.9.3.6. @yaalsn Do you have any idea about it?


How to reproduce

Environment:

  • Ubuntu 20.04 (WSL2)
  • Java 17.0.4

Run Pulsar 2.9.3 standalone with the following extra configs and a KoP 2.9.3.x release.

protocolHandlerDirectory=./protocols
allowAutoTopicCreationType=partitioned
kafkaListeners=PLAINTEXT://0.0.0.0:9092
kafkaAdvertisedListeners=PLAINTEXT://127.0.0.1:9092
brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor
brokerDeleteInactiveTopicsEnabled=false
messagingProtocols=kafka
entryFormat=kafka

Not only does kafka_exporter fail, but basic end-to-end tests are broken as well.

$ ./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-topic
>1
[2022-11-24 19:36:31,395] WARN [Producer clientId=console-producer] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2022-11-24 19:36:32,992] WARN [Producer clientId=console-producer] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2022-11-24 19:36:34,597] WARN [Producer clientId=console-producer] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2022-11-24 19:36:36,202] WARN [Producer clientId=console-producer] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
^C[2022-11-24 19:36:38,141] WARN [Producer clientId=console-producer] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
org.apache.kafka.common.KafkaException: Producer closed while send in progress
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:909)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:885)
        at kafka.tools.ConsoleProducer$.send(ConsoleProducer.scala:71)
        at kafka.tools.ConsoleProducer$.main(ConsoleProducer.scala:53)
        at kafka.tools.ConsoleProducer.main(ConsoleProducer.scala)
Caused by: org.apache.kafka.common.KafkaException: Requested metadata update after close

Possible cause

After digging into this issue, it seems that KoP itself works well, but responses might be left pending in the network layer.

channel.writeAndFlush(result).addListener(future -> {
    if (response instanceof ResponseCallbackWrapper) {
        ((ResponseCallbackWrapper) response).responseComplete();
    }
    if (!future.isSuccess()) {
        log.error("[{}] Failed to write {}", channel, request.getHeader(), future.cause());
    } else {
        requestStats.getNetworkTotalBytesOut().add(resultSize);
    }
});

As the code above shows, writeAndFlush is called for METADATA responses. But these responses were never delivered to the Kafka client, i.e. the listener was never called. So the cause might be related to the underlying Netty, but I'm not sure.

Workaround

If the JDK 8 NAR works, I think we have to build the NAR packages with JDK 8. I didn't test branch-2.10, but 2.11 works well with JDK 17.

@Demogorgon314
Member

Demogorgon314 commented Nov 24, 2022

@hzluyang Which JDK version did you use? I guess you used JDK 8, but the KoP NAR is built with a higher JDK, so it might not work when you run it on JDK 8. You can try pulsar-protocol-handler-kafka-2.9.3.16-jdk8.nar in a JDK 8 environment, or you can upgrade to JDK 11+.
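
One way to check which Java release a class was compiled for is to read the major version from its class-file header (52 = Java 8, 55 = Java 11, 61 = Java 17). The sketch below is only an illustration: the class name ClassFileVersion and the idea of first extracting a .class file from the NAR are assumptions, not part of KoP. It also detects only a class-file target mismatch, and nothing in this thread confirms that this is the actual failure mode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reports the Java release a compiled class targets. */
final class ClassFileVersion {

    /** Returns the major version from the first 8 bytes of a class file. */
    static int majorVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {         // class-file magic number
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                           // minor version, unused here
        return buf.getShort() & 0xFFFF;           // major version, unsigned
    }

    /** Maps a major version to a Java release number (valid for Java 5 and later). */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .class file extracted from the NAR (assumed input).
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int major = majorVersion(bytes);
        System.out.println("major=" + major + " -> Java " + javaRelease(major));
    }
}
```

If the reported release is higher than the JVM that runs the broker, that JVM cannot load the class.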

@BewareMyPower
Collaborator

A correction to my earlier comments: I may have been running JDK 8 while I thought I was using JDK 17.

$ echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64
$ which java
/usr/bin/java
$ java -version
openjdk version "17.0.4" 2022-07-19
OpenJDK Runtime Environment (build 17.0.4+8-Ubuntu-120.04)
OpenJDK 64-Bit Server VM (build 17.0.4+8-Ubuntu-120.04, mixed mode, sharing)

That is because I had earlier switched to JDK 8 by changing the JAVA_HOME env variable in order to run Pulsar 2.7.5. The env variable was never changed back, and java -version does not print the actual Java used by the ./bin/pulsar script.
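
The mismatch above can be sketched in a few lines. The example assumes a launcher rule of "use $JAVA_HOME/bin/java when JAVA_HOME is set, otherwise use java from PATH". That rule matches the behaviour described in this comment, but it is an assumption here and was not checked against the ./bin/pulsar script. The class name JavaLauncherResolver is hypothetical.

```java
import java.util.Map;

/** Hypothetical helper: shows which java binary a JAVA_HOME-aware launcher would pick. */
final class JavaLauncherResolver {

    /** Assumed rule: prefer $JAVA_HOME/bin/java, otherwise fall back to "java" on PATH. */
    static String resolve(Map<String, String> env) {
        String home = env.get("JAVA_HOME");
        if (home == null || home.isEmpty()) {
            return "java";
        }
        return home.endsWith("/") ? home + "bin/java" : home + "/bin/java";
    }

    public static void main(String[] args) {
        // The binary a launcher following the assumed rule would start.
        System.out.println("launcher would run: " + resolve(System.getenv()));
        // The JVM executing this program, i.e. the one `java` on PATH resolved to.
        System.out.println("this JVM: " + System.getProperty("java.version")
                + " at " + System.getProperty("java.home"));
    }
}
```

When the two printed locations differ, java -version says nothing about the runtime the broker actually uses.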

@hzluyang
Author

> @hzluyang Which JDK version you used? I guess you used JDK 8, but the KoP NAR is using higher JDK to build, so it might not work when you use JDK 8 to run. You can try to use pulsar-protocol-handler-kafka-2.9.3.16-jdk8.nar in JDK 8 env, or you can try to upgrade the JDK version to JDK 11+.

@Demogorgon314 Yes, I use JDK 8, and that turned out to be the problem. I will test the jdk8 NAR (https://github.com/streamnative/kop/releases/download/v2.9.3.16/pulsar-protocol-handler-kafka-2.9.3.16-jdk8.nar) on JDK 8 later, and I will upgrade to JDK 17 to avoid further problems.
