
STORM-1694: Kafka Spout Trident Implementation Using New Kafka Consumer API #1687

Closed

Conversation

hmcl
Contributor

@hmcl commented Sep 16, 2016

This Kafka Trident implementation is built on top of the Trident logs improvement patch because the two are related, which makes the patches easier to merge. There is already a separate PR for STORM-2097.

@HeartSaVioR
Contributor

This loads both modules, storm-kafka and storm-kafka-client, but only storm-kafka-client is loaded with ${storm.kafka.client.version}. Does storm-starter work with both storm-kafka and storm-kafka-client? If not, I guess we need to move storm-kafka and storm-kafka-client out into separate modules, like the recent examples modules.

    }

    public static void main(String[] args) throws Exception {
        final String[] zkBrokerUrl = parseUrl(args);
Contributor

Do we still need the zookeeper config? This topology is using the new KafkaSpout, right?

Contributor Author

Agreed. Refactored the examples in this PR.
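For context on the exchange above: the new Kafka consumer API connects directly to the brokers, so no ZooKeeper URL is required. A minimal sketch of a ZooKeeper-free consumer configuration (the broker address, group id, and helper name below are illustrative, not taken from the PR):

```java
import java.util.Properties;

// Sketch: with the new Kafka consumer API the spout talks directly to the
// brokers, so only bootstrap.servers is needed -- there is no
// zookeeper.connect entry at all. All values here are placeholders.
public class KafkaConsumerProps {
    public static Properties newConsumerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // replaces the old zookeeper.connect
        props.put("group.id", "trident-example-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = newConsumerProps("localhost:9092");
        System.out.println(props.getProperty("bootstrap.servers")); // localhost:9092
    }
}
```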

    for (int i = 0; i < keys.size(); i++) {
        ValueUpdater valueUpdater = updaters.get(i);
        Object arg = ((CombinerValueUpdater) valueUpdater).getArg();
        LOG.debug("updateCount = {}, keys = {} => updaterArgs = {}", updateCount, keys.get(i), arg);
Contributor

Should this just print at INFO level? Since this is a debug state, why make another hop to enable DEBUG logging for this topology?

Contributor Author

Done. Refactored the examples in this PR.

    }

    private KafkaTridentSpoutOpaque<String, String> createOpaqueKafkaSpoutNew() {
        return new KafkaTridentSpoutOpaque<String, String>(getKafkaTridentManager());
Contributor

Can we merge this into a single method? That way it shows the series of steps for creating a KafkaTrident topology. It has a few redirections, with one method calling another, which can be confusing for users looking for an example.

Contributor Author

Partially done in the refactored examples in this PR.

There were some redundant "factory methods" that I removed. However, the code that creates the "dependency" objects to be passed in is not one or two lines. I believe that a method with a meaningful name that creates and initializes these "dependency" objects makes the code much more cohesive and easier to read. Furthermore, this class is extended for wildcard topics, and some of these methods are overridden.

I will be happy to write a more "copy and paste" style example in the docs if you feel it's appropriate. Please let me know.

    return new KafkaSpoutStreamsNamedTopics.Builder(outputFields, new String[]{"test-trident", "test-trident-1"}).build();
}

protected static class TopicsTupleBuilder<K, V> extends KafkaSpoutTupleBuilder<K, V> {
Contributor

Can we not provide a default implementation for KafkaSpoutTupleBuilder? And why are we not using the Deserializer interface for this? Similar to Scheme, we can provide a StringDeserializer, and users can implement their own Deserializer to parse and return the values.

Contributor Author

I am not sure I completely follow this observation. This same object exists for the Kafka Spout. It is only used so the user can provide a custom implementation of how they wish to create (build) a tuple from the ConsumerRecord object. Users are free, when creating their KafkaSpoutTupleBuilder implementation, to specify which Deserializer they want to use.

As for a default implementation, I am not sure I follow. I don't think we are providing any default implementation. The class KafkaSpoutTupleBuilder is abstract; the implementation here is for the sake of the example.
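The tuple-builder pattern under discussion can be sketched as follows. This is a hypothetical, stripped-down mirror of the KafkaSpoutTupleBuilder idea; the Record class stands in for Kafka's ConsumerRecord, and none of these names are the actual storm-kafka-client types:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the tuple-builder pattern: an abstract builder that
// turns a consumed record into a tuple (a list of values). Record is a
// stand-in for Kafka's ConsumerRecord<K, V>.
public class TupleBuilderSketch {
    static class Record<K, V> {
        final String topic;
        final K key;
        final V value;
        Record(String topic, K key, V value) {
            this.topic = topic;
            this.key = key;
            this.value = value;
        }
    }

    // Mirrors the abstract KafkaSpoutTupleBuilder: users subclass it and
    // decide how a record maps to output fields.
    abstract static class TupleBuilder<K, V> {
        abstract List<Object> buildTuple(Record<K, V> record);
    }

    // Example implementation, analogous to the TopicsTupleBuilder in the PR:
    // emits (topic, key, value).
    static class TopicsTupleBuilder<K, V> extends TupleBuilder<K, V> {
        @Override
        List<Object> buildTuple(Record<K, V> record) {
            return Arrays.asList(record.topic, record.key, record.value);
        }
    }

    public static void main(String[] args) {
        Record<String, String> r = new Record<>("test-trident", "k1", "v1");
        System.out.println(new TopicsTupleBuilder<String, String>().buildTuple(r)); // [test-trident, k1, v1]
    }
}
```

Deserialization of the raw bytes happens before this step, in the consumer's configured Deserializer, which is why the builder and the deserializer are separate concerns.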


    @Override
    public void close() {
        LOG.debug("Closed");
Contributor

We should call kafkaManager.kafkaConsumer.close() here.

Contributor Author

Done
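The fix agreed on above follows the usual resource-cleanup pattern: the spout's close() should release the underlying consumer. A hypothetical sketch (FakeConsumer stands in for the real KafkaConsumer; the field names are illustrative):

```java
// Hypothetical sketch of closing the underlying consumer from the spout's
// close() method, so broker connections and buffers are released.
public class CloseSketch {
    static class FakeConsumer implements AutoCloseable {
        boolean closed = false;
        @Override
        public void close() {
            closed = true;
        }
    }

    final FakeConsumer kafkaConsumer = new FakeConsumer();

    // Analogous to the spout emitter's close(): delegate to the consumer.
    public void close() {
        if (kafkaConsumer != null) {
            kafkaConsumer.close();
        }
    }

    static boolean demo() {
        CloseSketch s = new CloseSketch();
        s.close();
        return s.kafkaConsumer.closed;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```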

lastOffset = lastBatch.lastOffset;
}
}
LOG.debug("Created {}", this);
Contributor

It would probably be useful to log the first and last offsets of the batch.

Contributor Author

Logging "this" will call the overridden toString() method of this class, which prints the first and last offsets.

}
}

private long seek(TopicPartition tp, KafkaTridentSpoutBatchMetadata<K, V> lastBatchMeta) {
Contributor

It would be good to have some documentation on this method to explain the seek logic.

Contributor Author

Done
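The seek logic being documented typically answers one question per partition: if a previous batch left metadata, resume right after its last offset; otherwise fall back to the configured first-poll strategy. A hypothetical, stripped-down sketch of that decision (LastBatchMeta stands in for KafkaTridentSpoutBatchMetadata; none of these names are the actual storm-kafka-client API):

```java
// Hypothetical sketch of Trident-spout seek logic. The strategy enum mirrors
// the usual earliest/latest first-poll options.
public class SeekSketch {
    enum FirstPollOffsetStrategy { EARLIEST, LATEST }

    static class LastBatchMeta {
        final long lastOffset;
        LastBatchMeta(long lastOffset) {
            this.lastOffset = lastOffset;
        }
    }

    /**
     * Returns the offset to seek to for a partition:
     * - if a previous batch recorded metadata, resume at lastOffset + 1 so no
     *   record is re-emitted or skipped;
     * - otherwise start at the partition's beginning or end, per the strategy.
     */
    static long seekOffset(LastBatchMeta lastBatchMeta,
                           FirstPollOffsetStrategy strategy,
                           long beginningOffset, long endOffset) {
        if (lastBatchMeta != null) {
            return lastBatchMeta.lastOffset + 1;
        }
        return strategy == FirstPollOffsetStrategy.EARLIEST ? beginningOffset : endOffset;
    }

    public static void main(String[] args) {
        // Previous batch ended at offset 41, so the next batch starts at 42.
        System.out.println(seekOffset(new LastBatchMeta(41), FirstPollOffsetStrategy.EARLIEST, 0, 100)); // 42
    }
}
```

In the real spout the chosen offset would then be applied with the consumer's seek call for that TopicPartition.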

…er API

 - Kafka New Client - Opaque Transactional Trident Spout Implementation
 - Implementation supporting multiple named topics and wildcard topics
@hmcl force-pushed the Apache_master_STORM-1694_top_STORM-2097 branch from a2d678d to 76d6e7e on November 1, 2016 at 17:27
@hmcl
Contributor Author

hmcl commented Nov 1, 2016

@HeartSaVioR I believe I addressed your valid concerns about the Kafka library versions in this PR.

@HeartSaVioR
Contributor

@hmcl OK thanks for following up.
@harshach Please treat my comment as addressed, and also feel free to merge when you think it's OK to merge.

Contributor

@harshach left a comment

+1.

@harshach
Contributor

@hmcl these PR titles are very confusing. Can you fix that in both PRs? I'm not sure what "Apache master storm 1694 top storm 2097" means. Just use the JIRA title as the PR title.

@harshach
Contributor

@hmcl also, is this PR the same as https://github.com/apache/storm/pull/1757/commits with one more commit? If so, can you close this one?

@hmcl hmcl changed the title Apache master storm 1694 top storm 2097 STORM-1694: Kafka Spout Trident Implementation Using New Kafka Consumer API Nov 15, 2016
@hmcl
Contributor Author

hmcl commented Nov 15, 2016

@harshach I have changed the titles of the PRs; however, what is really important is the git commit messages, and those were correct. Once the patch is merged, no one will ever look at the PR titles again, only at the git commit logs.

I don't think we should close this PR, in order to keep the history of the review comments. Also, the title of this PR reflects this patch; the title of the other PR reflects the other patch that sits on top of this one.

There is no extra work in merging the two PRs, and the git commit history is the same.

@harshach
Contributor

@hmcl This PR here has two commits, right? https://github.com/apache/storm/pull/1757/commits. Why do we want to merge this PR again? Closing the PR doesn't mean you'll lose the comments.
I am not sure if I am missing anything. If I merge https://github.com/apache/storm/pull/1757/commits then I'll get both of the necessary commits, which means this current PR is not needed.

@hmcl hmcl closed this Nov 15, 2016
@hmcl
Contributor Author

hmcl commented Nov 15, 2016

@harshach done. The other PR should suffice.
