Introduce Kafka event logger for server events #4733
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4733 +/- ##
============================================
+ Coverage 57.98% 58.05% +0.06%
Complexity 13 13
============================================
Files 581 583 +2
Lines 32491 32548 +57
Branches 4315 4317 +2
============================================
+ Hits 18841 18896 +55
+ Misses 11836 11834 -2
- Partials 1814 1818 +4
... and 5 files with indirect coverage changes
We can use Testcontainers to address the Kafka server compatibility issues.
val SERVER_EVENT_LOGGERS: ConfigEntry[Seq[String]] =
  buildConf("kyuubi.backend.server.event.loggers")
    .doc("A comma-separated list of server history loggers, where session/operation etc" +
      " events go.<ul>" +
      s" <li>JSON: the events will be written to the location of" +
      s" ${SERVER_EVENT_JSON_LOG_PATH.key}</li>" +
      s" <li>" +
      s"KAFKA: the events sent to topic of `${SERVER_EVENT_KAFKA_TOPIC.key}`, " +
unify the format
It's unclear what the content of the Kafka record is.
OK, changed to mention the format: `KAFKA: the events serialized in JSON format`.
dev/dependencyList
@@ -181,6 +183,7 @@
simpleclient_tracer_otel/0.16.0//simpleclient_tracer_otel-0.16.0.jar
simpleclient_tracer_otel_agent/0.16.0//simpleclient_tracer_otel_agent-0.16.0.jar
slf4j-api/1.7.36//slf4j-api-1.7.36.jar
snakeyaml/1.33//snakeyaml-1.33.jar
snappy-java/1.1.7.3//snappy-java-1.1.7.3.jar
does it support arm?
When introducing new artifacts into the release tarball, don't forget to update LICENSE/NOTICE.
"snappy-java supports Windows, Mac OS X, Linux (x86, x86_64, arm, etc...). If your platform is not supported, you need to build native libraries by yourself."
According to snappy-java's doc here (https://github.com/xerial/snappy-java/blob/master/BUILD.md), the arm architecture should be supported.
override def apply(event: KyuubiEvent): Unit = {
  try {
    val record = new ProducerRecord[String, String](topic, event.eventType, event.toJson)
I remember that Kafka allows setting the event timestamp when constructing a ProducerRecord; it may be useful in some cases.
Good suggestion. Setting the timestamp with System.currentTimeMillis — is that good enough? Or use Java 8's Instant.now.toEpochMilli (with a UTC clock)?
When the timestamp is not set, the producer will assign it using System.currentTimeMillis().
Usually it should be the event time, but I just noticed that KyuubiEvent does not have an event time ...
Yes, there is no unified event time available in KyuubiEvent right now.
Let's leave a TODO here: we should add an `event_time` in KyuubiEvent and propagate it to Kafka, but it may break backward compatibility, so better to do it later separately.
I am looking for the proper timing to close event handlers for the Kyuubi Server, but repeated errors come up in irrelevant places as I tried to close the event handlers. If you have any ideas for this, feel free to ping me any time. Thanks.
See: https://github.com/apache/kyuubi/actions/runs/4795357563/jobs/8529805995?pr=4733#step:8:1778
override def close(): Unit = {
  kafkaProducer.flush()
  kafkaProducer.close()
I suggest using the close method with a timeout: org.apache.kafka.clients.producer.Producer#close(long, java.util.concurrent.TimeUnit).
Do you have any suggested conf for the timeout here — scoped to loggers, to servers, or to server event loggers? Or should we introduce a proper config next time?
It makes sense to introduce a new configuration for it, for the server's Kafka logger.
OK. Will do it in two steps.
- introduce a new config `kyuubi.backend.server.event.kafka.close.timeout`, with a default value of 60 secs.
- pass the config in ServerEventHandlerRegister#createKafkaEventHandler to construct the KafkaLoggingEventHandler instance.
60s is too long; it will block other shutdown procedures. Likely 3~5s.
OK, let's make it 5s by default.
Btw, the flush() call is removed, for these reasons: closing the producer already wakes up the internal sender to send pending requests (exactly the way flushing does), and the flush method does not accept a timeout.
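A minimal sketch of the agreed change — the config name `kyuubi.backend.server.event.kafka.close.timeout` and the 5s default come from this thread, while the surrounding class shape is illustrative:

```scala
import java.time.Duration
import org.apache.kafka.clients.producer.KafkaProducer

// Illustrative handler shape; only the close() body reflects the agreed change.
class KafkaEventHandlerSketch(
    kafkaProducer: KafkaProducer[String, String],
    closeTimeoutMs: Long = 5000L) extends AutoCloseable {

  override def close(): Unit = {
    // Producer#close(Duration) wakes up the internal sender and sends pending
    // requests (the same effect as flush()), but bounded by the timeout, so a
    // separate flush() call beforehand is unnecessary.
    kafkaProducer.close(Duration.ofMillis(closeTimeoutMs))
  }
}
```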
@@ -22,5 +22,5 @@ object EventLoggerType extends Enumeration {
   type EventLoggerType = Value

   // TODO: Only SPARK is done now
-  val SPARK, JSON, JDBC, CUSTOM = Value
+  val SPARK, JSON, JDBC, KAFKA, CUSTOM = Value
Should we add enumeration values instead of modifying the order?
The KAFKA value is added before CUSTOM here, in order to:
- follow the same order as in docs
- keep CUSTOM as an extra option, which could be listed behind the options with embedded support

I'm open to your suggestion. WDYT @pan3793?
Let's add the new one at the tail, to suppress the potential serialization issue. And the TODO comment is outdated.
SGTM. Let me move it to the tail, given the serialization problem you mentioned.
kyuubi-server/pom.xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_${scala.binary.version}</artifactId>
I think we are using Testcontainers for integration testing; why should we depend on the Kafka server?
And on second thought, I prefer to use MockProducer rather than launching a real Kafka broker, but the latter is also fine for me.
MockProducer is lightweight for simulating the producer-consumer situation, but we need a broker bootstrap server for producer-broker-consumer here. And I don't know how to get a bootstrap server from a MockProducer.
The purpose of the consumer is to verify the events have been successfully delivered to Kafka; we can simply check the buffer of MockProducer instead.
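A sketch of that test approach with an auto-completing MockProducer; the topic, key, and payload values below are made up for illustration:

```scala
import org.apache.kafka.clients.producer.{MockProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object MockProducerCheck {
  def main(args: Array[String]): Unit = {
    // autoComplete = true: every send() is immediately "acknowledged".
    val producer = new MockProducer[String, String](
      true, new StringSerializer, new StringSerializer)

    // Simulate what the event handler would do for one event.
    producer.send(new ProducerRecord(
      "kyuubi_server_events", "kyuubi_session_event", """{"eventType":"session"}"""))

    // history() buffers every record sent, so no consumer (and no broker,
    // hence no bootstrap server) is needed to verify delivery.
    val sent = producer.history()
    assert(sent.size() == 1)
    assert(sent.get(0).topic() == "kyuubi_server_events")
  }
}
```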
pom.xml
<exclusions>
  <exclusion>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
  </exclusion>
</exclusions>
unnecessary exclusion
Why should we skip this exclusion? kafka-clients depends on slf4j-api.
We should respect the Maven transitive dependencies in most cases: exclude only the deps we truly don't need, and override versions in dependencyManagement (just overriding, without excluding).
But that's not suitable for components whose dependency management is disastrous, like Hive and Hudi.
SGTM. Removed this exclusion.
@@ -1236,7 +1236,7 @@
This product optionally depends on 'zstd-jni', a zstd-jni Java compression
and decompression library, which can be obtained at:

* LICENSE:
* license/LICENSE.zstd-jni.txt (Apache License 2.0)
Thanks for fixing this license issue.
🍻 No problem. And we could move this change to a separate PR if you want.
@@ -21,6 +21,5 @@ object EventLoggerType extends Enumeration {

   type EventLoggerType = Value

-  // TODO: Only SPARK is done now
-  val SPARK, JSON, JDBC, CUSTOM = Value
+  val SPARK, JSON, JDBC, CUSTOM, KAFKA = Value
Are all the others supported now?
Seems JDBC is not supported.
I think this TODO can be removed. Not all the event logger types apply to both the server and the engine; it's a legacy TODO hint.
s" <li>KAFKA: the events serialized in JSON format" +
s" and sent to topic of `${SERVER_EVENT_KAFKA_TOPIC.key}`. " +
s" Note: For the configs of Kafka producer," +
s" please specify them with the prefix: `kyuubi.backend.server.event.kafka.`." +
s" For example, kyuubi.backend.server.event.kafka.bootstrap.servers=127.0.0.1:9092" +
Suggested change:

s" <li>KAFKA: the events will be serialized in JSON format" +
s" and sent to topic of `${SERVER_EVENT_KAFKA_TOPIC.key}`." +
s" Note: For the configs of Kafka producer," +
s" please specify them with the prefix: `kyuubi.backend.server.event.kafka.`." +
s" For example, `kyuubi.backend.server.event.kafka.bootstrap.servers=127.0.0.1:9092`" +
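Put together, the documented prefix convention would look roughly like this in kyuubi-defaults.conf — only the `kyuubi.backend.server.event.loggers` key and the bootstrap.servers example come from this thread; the host value is illustrative:

```properties
# Illustrative kyuubi-defaults.conf fragment; values are made up.
kyuubi.backend.server.event.loggers=JSON,KAFKA

# Anything after the `kyuubi.backend.server.event.kafka.` prefix is handed to
# the Kafka producer as-is:
kyuubi.backend.server.event.kafka.bootstrap.servers=127.0.0.1:9092
```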
Thanks, LGTM, only minor things of docs
LGTM
Thanks, merged to master
Why are the changes needed?

KAFKA

How was this patch tested?
- Add some test cases that check the changes thoroughly, including negative and positive cases if possible
- Add screenshots for manual tests if appropriate
- Run tests locally before making a pull request