
KAFKA-6311: Expose Kafka cluster ID in Connect REST API (KIP-238) #4314

Merged

Conversation

ewencp
Contributor

@ewencp ewencp commented Dec 11, 2017


Simple unit tests sufficiently exercise the behavior. In fact, this addition increases coverage since RootResource was not previously unit tested.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@ewencp ewencp force-pushed the expose-kafka-cluster-id-in-connect-api branch from b657438 to d11af89 Compare December 11, 2017 23:41
@ewencp ewencp changed the title KAFKA-6331: Expose Kafka cluster ID in Connect REST API (KIP-238) KAFKA-6311: Expose Kafka cluster ID in Connect REST API (KIP-238) Dec 11, 2017
@wicknicks
Contributor

@ewencp: this is very useful. do you think we can include some additional metadata besides just the cluster id (for instance, bootstrap.servers)? since the cluster_id may be a generated value, it might not be immediately recognizable. bootstrap.servers would be a more readable property.

@ewencp
Contributor Author

ewencp commented Dec 15, 2017

@wicknicks agreed that cluster ID isn't necessarily the most useful human-readable property. my motivation in this case is actually pretty much not at all about humans, so i'm not too worried about that :) i'm more interested in just being able to get the id from the kafka cluster and correlate it w/ other info i have (for monitoring, config, mgmt, etc).

the comment about bootstrap.servers is interesting, though, because for a lot of folks that would be a sufficiently unique cluster identifier. that tends to be the case when they just put the cluster behind a load balancer and bootstrap.servers is simply their load balancer/round-robin DNS/etc. but i'm skeptical of exposing configs like this in an endpoint like this, because some folks don't deploy that way, and for them bootstrap.servers isn't fixed & unique.

i think exposing the worker config properties could be a pretty interesting feature, especially for tools designed to manage both connect & kafka. i'm not sure i'd be in favor of exposing them all on the GET / resource, and there are definitely security concerns involved, but exposing that info would make some things way simpler. e.g. giving a list of recommended topics for a connector makes a lot more sense if you have enough config information available to understand what topics the Connect worker even has access to.

I'd prefer to keep those extensions to a separate KIP, but if there's something concrete you really want to see included here we could discuss further. And if you have ideas about how those other worker configs could be cleanly exposed (and securely, and in a way where we don't need to worry about compatibility), I would love to see a proposal. We're kind of stingy in what configuration information we expose programmatically across core Kafka, Connect, and Streams; exposing a bit more, but in a conservative way, might make building tooling around these components much simpler.

@ewencp
Contributor Author

ewencp commented Dec 19, 2017

@kkonstantine @rhauch @wicknicks KIP vote passed, any comments before I drag in another reviewer to commit?

Contributor

@kkonstantine kkonstantine left a comment

Two minor comments, but looks good to me! Thanks, lgtm

public void testLookupKafkaClusterId() {
final Node broker1 = new Node(0, "dummyHost-1", 1234);
final Node broker2 = new Node(1, "dummyHost-2", 1234);
List<Node> cluster = new ArrayList<Node>(2) {
Contributor

I have to admit, I'm not a big fan of the trick with the anonymous class on collections.

MockAdminClient doesn't seem to mutate this argument, and it should probably create a copy if it did anyway.

Given that, List<Node> cluster = Arrays.asList(broker1, broker2) would be my suggestion, but I also like to leave plenty of room for different styles, so it's not a strong suggestion.

Contributor

I agree that Arrays.asList(broker1, broker2) is more readable and more concise.

Contributor Author

was just copy-paste from another test; cleaned up.
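
For illustration only, a minimal before/after sketch of the change discussed in this thread (the body of the anonymous class is an assumption, since the diff above only shows its first line; imports such as java.util.Arrays are assumed from the surrounding test class):

    final Node broker1 = new Node(0, "dummyHost-1", 1234);
    final Node broker2 = new Node(1, "dummyHost-2", 1234);

    // Before: an anonymous ArrayList subclass with an instance initializer.
    List<Node> clusterBefore = new ArrayList<Node>(2) {
        {
            add(broker1);
            add(broker2);
        }
    };

    // After: MockAdminClient never mutates the argument, so a fixed-size list is enough.
    List<Node> clusterAfter = Arrays.asList(broker1, broker2);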

public void testLookupKafkaClusterIdTimeout() {
final Node broker1 = new Node(0, "dummyHost-1", 1234);
final Node broker2 = new Node(1, "dummyHost-2", 1234);
List<Node> cluster = new ArrayList<Node>(2) {
Contributor

If you change it above, you might want to change it here too.

@@ -22,10 +22,12 @@
public class ServerInfo {
private String version;
private String commit;
private String kafkaClusterId;
Contributor

Seems like all the member fields could be final.

Contributor

@rhauch rhauch left a comment

I agree with one of @kkonstantine's preferences, but nothing big so +1 from me.

}

static String lookupKafkaClusterId(AdminClient adminClient) {
log.debug("Looking up Kafka cluster ID");
Contributor

Is there any benefit to logging (in debug) the cluster ID once we get it?

Contributor Author

mostly i logged this statement so it would be obvious what was going wrong if something hangs or you see connection errors logged, since this is all blocking full startup. but there's no harm in also logging the result. i also realized i missed a case where the future can be null if the broker version is < 0.10.1.0; i've handled that now too.
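
For reference, a sketch of the method reconstructed around the fragments quoted in this review (the rethrown exception type, log wording, and overall shape are assumptions, not necessarily the exact code in this PR; it assumes the usual slf4j Logger field and org.apache.kafka.connect.errors.ConnectException):

    static String lookupKafkaClusterId(AdminClient adminClient) {
        log.debug("Looking up Kafka cluster ID");
        try {
            KafkaFuture<String> clusterIdFuture = adminClient.describeCluster().clusterId();
            if (clusterIdFuture == null) {
                // Brokers older than 0.10.1.0 do not expose a cluster ID.
                log.info("Kafka cluster ID not available; the broker version is likely older than 0.10.1.0");
                return null;
            }
            String kafkaClusterId = clusterIdFuture.get();
            log.debug("Kafka cluster ID: {}", kafkaClusterId);
            return kafkaClusterId;
        } catch (InterruptedException e) {
            final String msg = "Unexpectedly interrupted when looking up Kafka cluster info";
            log.error(msg, e);
            throw new ConnectException(msg, e);
        } catch (ExecutionException e) {
            final String msg = "Failed to look up Kafka cluster ID";
            log.error(msg, e);
            throw new ConnectException(msg, e);
        }
    }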

@ewencp
Contributor Author

ewencp commented Dec 19, 2017

@hachikuji @junrao Just waiting on tests to come back after addressing a few minor comments, either one of you care to review?

Contributor

@hachikuji hachikuji left a comment

Thanks, left a few minor comments/questions.

this.kafkaClusterId = kafkaClusterId;
} else {
try (AdminClient adminClient = AdminClient.create(config.originals())) {
this.kafkaClusterId = lookupKafkaClusterId(adminClient);
Contributor

Would it be reasonable to move this lookup to the caller (i.e., ConnectDistributed and ConnectStandalone)? That makes the intent to kill the process if the cluster cannot be found a little clearer. Also, why not move the try into lookupKafkaClusterId?

Contributor Author

i've moved the calls up as suggested and just pass the info into the herders. the actual contents are now in ConnectUtils because I couldn't find a better home for them.

the try is separated so the method is properly unit testable. i would just mock out AdminClient.create, but people seem to prefer doing this to using PowerMock.
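
The split described here would look roughly like the following (a sketch; the method's home in ConnectUtils comes from this comment, while the exact name and signature are assumptions):

    // The config-taking helper owns the try-with-resources, while the AdminClient-taking
    // overload sketched earlier stays directly unit-testable with MockAdminClient,
    // so there is no need for PowerMock to intercept AdminClient.create.
    static String lookupKafkaClusterId(WorkerConfig config) {
        try (AdminClient adminClient = AdminClient.create(config.originals())) {
            return lookupKafkaClusterId(adminClient);
        }
    }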

version = AppInfoParser.getVersion();
commit = AppInfoParser.getCommitId();
this.kafkaClusterId = kafkaClusterId;
Contributor

nit: it looks nicer if we use the this. prefix on all of the fields being initialized
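
As a sketch of this nit, together with the earlier suggestion to make the fields final (assuming the class otherwise matches the diff context above and imports org.apache.kafka.common.utils.AppInfoParser):

    public class ServerInfo {
        private final String version;
        private final String commit;
        private final String kafkaClusterId;

        public ServerInfo(String kafkaClusterId) {
            // Consistent this. prefix on every field being initialized.
            this.version = AppInfoParser.getVersion();
            this.commit = AppInfoParser.getCommitId();
            this.kafkaClusterId = kafkaClusterId;
        }
    }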

return kafkaClusterId;
} catch (InterruptedException e) {
final String msg = "Unexpectedly interrupted when looking up Kafka cluster info";
log.error(msg, e);
Contributor

The log and rethrow pattern tends to lead to duplicate logging of errors. I would have expected that something higher up would catch this exception and log it.
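
In other words, the alternative suggested here is roughly the following fragment, mirroring the quoted lines (the wrapping exception type is an assumption): drop the log.error and let a single caller near the top of startup log the failure once.

    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        // No log.error here; wrap and rethrow so the top-level caller logs the failure exactly once.
        throw new ConnectException("Unexpectedly interrupted when looking up Kafka cluster info", e);
    }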

@@ -37,4 +39,9 @@ public String version() {
public String commit() {
return commit;
}

@JsonProperty("kafka_cluster_id")
Contributor

If the clusterId is null, will the field still show up? I am wondering if there should be a descriptive sentinel instead?

Contributor Author

you have to change ObjectMapper settings to omit null values, so it'll be there and null. not sure a sentinel is much better than a null, and actually seems more likely to cause problems since it could incorrectly be interpreted as a valid id.

Contributor

Makes sense.
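
A small standalone illustration of the Jackson behavior described above (not code from this PR; it only assumes a jackson-databind dependency):

    import com.fasterxml.jackson.annotation.JsonProperty;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class NullFieldDemo {
        static class Info {
            @JsonProperty("kafka_cluster_id")
            public String kafkaClusterId; // deliberately left null
        }

        public static void main(String[] args) throws Exception {
            // With default ObjectMapper settings the property is kept and serialized as null:
            // prints {"kafka_cluster_id":null}
            System.out.println(new ObjectMapper().writeValueAsString(new Info()));
            // Omitting nulls would require opting in, e.g.
            // mapper.setSerializationInclusion(JsonInclude.Include.NON_NULL).
        }
    }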

private int timeoutNextRequests = 0;

/**
* Creates MockAdminClient for a cluster with the given brokers. By default the first broker in the list is the controller
Contributor

nit: It would be a little clearer to use a second argument for the controller.

Contributor Author

hah, you caught me. i was just being lazy because i thought i'd have to update a bunch of places where this is used. turns out it's only used in one other place in streams... changed the signature and updated the comment.
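
The updated constructor shape would then be roughly (a sketch; field and parameter names are assumptions):

    // The controller is passed explicitly instead of being implied by list order.
    public MockAdminClient(List<Node> brokers, Node controller) {
        this.brokers = brokers;
        this.controller = controller;
    }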

this.controller = brokers.get(0);
}

public void controller(Node controller) {
Contributor

Seems we don't use this anywhere, but I guess there's no harm in having it. Not sure it matters, but maybe we should have a sanity check that the controller is included in the broker list?

Contributor Author

yeah, i mainly added it because i was adding enough to make describeCluster work and it seemed like it could potentially be useful.
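
If that sanity check were added, it could be as small as this (a sketch of the suggestion above, not necessarily what was committed):

    public void controller(Node controller) {
        // Suggested sanity check: the controller must be one of the configured brokers.
        if (!brokers.contains(controller))
            throw new IllegalArgumentException("The controller node must be in the list of brokers");
        this.controller = controller;
    }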

import static org.junit.Assert.assertEquals;

@RunWith(EasyMockRunner.class)
public class RootResourceTest extends EasyMockSupport {
Contributor

Perhaps not too interesting, but maybe we should cover the case when the clusterId is null as well?

Contributor Author

doesn't really add much to this test since we're already verifying it passes the value through, and this code isn't expected to do anything with the value. however, I did add a test for the supporting code for when the AdminClient returns a null value.
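
A hypothetical shape for such a supporting-code test (the test name, the ConnectUtils home of the helper, and the use of EasyMock class mocks are all assumptions; it assumes the usual imports of org.junit.Test, EasyMock, KafkaFuture, and DescribeClusterResult plus a static import of org.junit.Assert.assertNull):

    @Test
    public void testLookupKafkaClusterIdReturnsNull() throws Exception {
        AdminClient adminClient = EasyMock.createMock(AdminClient.class);
        DescribeClusterResult result = EasyMock.createMock(DescribeClusterResult.class);
        // Simulate an AdminClient whose cluster ID future completes with null.
        EasyMock.expect(adminClient.describeCluster()).andReturn(result);
        EasyMock.expect(result.clusterId()).andReturn(KafkaFuture.<String>completedFuture(null));
        EasyMock.replay(adminClient, result);

        assertNull(ConnectUtils.lookupKafkaClusterId(adminClient));
        EasyMock.verify(adminClient, result);
    }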

@ewencp
Contributor Author

ewencp commented Jan 4, 2018

@hachikuji Mind taking another look? I've addressed the last round of comments, hopefully this is ready to go now.

Contributor

@hachikuji hachikuji left a comment

Thanks for the updates, LGTM.

@hachikuji hachikuji merged commit bfb272c into apache:trunk Jan 5, 2018