-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STORM-2658: Extract storm-kafka-client examples to storm-kafka-client-examples #2243
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good. Left comments on guiding how users provide the dependencies manually.
``` | ||
will submit the topologies set up by KafkaSpoutTopologyMainNamedTopics to Storm. | ||
|
||
Note that this example produces a jar containing all dependencies for ease of use. In a production environment you may want to reduce the jar size by extracting some dependencies (e.g. org.apache.kafka:kafka-clients) from the jar. You can do this by setting the dependencies you don't want to include in the jars to `provided` scope, and then manually copying the dependencies to your Storm extlib directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of copying dependencies to the extlib, you can achieve the same thing (or more) via using --artifacts
to add dependencies for specific topology while submitting. I think this is simpler and topology-wide, so would love to guide both, or only --artifacts
. (We already replaced the guide for how to add dependencies from Storm SQL.)
Please refer https://github.com/apache/storm/blob/master/docs/Command-line-client.md#jar for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I didn't know about this flag. It's much better, will replace references to extlib.
@@ -42,7 +42,9 @@ | |||
<groupId>org.apache.storm</groupId> | |||
<artifactId>storm-kafka</artifactId> | |||
<version>${project.version}</version> | |||
<!-- You can reduce jar size by uncommenting this and putting dependencies in $STORM-HOME/extlib instead of including them in the jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same above. Maybe we can let users choose how they provide dependencies, change the sentence to ...uncommenting this and providing dependencies manually.
or so. My intention is that we don't recommend putting dependencies to extlib directory, unless they know what they're doing (affecting whole topologies' dependencies)
@@ -73,19 +75,27 @@ | |||
<groupId>org.apache.storm</groupId> | |||
<artifactId>storm-kafka-client</artifactId> | |||
<version>${project.version}</version> | |||
<!-- You can reduce jar size by uncommenting this and putting dependencies in $STORM-HOME/extlib instead of including them in the jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
</dependency> | ||
<dependency> | ||
<groupId>org.apache.kafka</groupId> | ||
<artifactId>${storm.kafka.artifact.id}</artifactId> | ||
<version>${storm.kafka.client.version}</version> | ||
<scope>compile</scope> | ||
<!-- You can reduce jar size by uncommenting this and putting dependencies in $STORM-HOME/extlib instead of including them in the jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
</dependency> | ||
<dependency> | ||
<groupId>org.apache.kafka</groupId> | ||
<artifactId>kafka-clients</artifactId> | ||
<version>${storm.kafka.client.version}</version> | ||
<scope>compile</scope> | ||
<!-- You can reduce jar size by uncommenting this and putting dependencies in $STORM-HOME/extlib instead of including them in the jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
+1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of the changes proposed in this JIRA. However, my understanding is that this patch is removing the ability to specify the topic names from the CLI, as well as running the topology in LocalCluster. I think these are valid options that we should keep.
Since we are already refactoring, I am also suggesting a few small name changes.
Once we have all the +1s and are in agreement, let's squash all the commits into one.
## Usage | ||
This module contains example topologies demonstrating storm-kafka-client spout and Trident usage. | ||
|
||
The module is built by `mvn clean package`. This will generate the `target/storm-kafka-client-examples-VERSION.jar` file. The jar contains all dependencies and can be submitted to Storm via the Storm CLI. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... built running ... the Storm CLI, e.g.:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix
@@ -42,7 +42,9 @@ | |||
<groupId>org.apache.storm</groupId> | |||
<artifactId>storm-kafka</artifactId> | |||
<version>${project.version}</version> | |||
<!-- You can reduce jar size by uncommenting this and providing the dependencies manually. See the README for details. | |||
<scope>${provided.scope}</scope> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the goal of ${provided.scope} to handle the proper scope according to the profile, e.g. just like it is done with the Intellij profile. I am not quite following why the user has to comment/uncomment the scope configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I somehow missed the dollar. I'll try reverting this and update the readme to set the scope to compile
@@ -73,19 +75,27 @@ | |||
<groupId>org.apache.storm</groupId> | |||
<artifactId>storm-kafka-client</artifactId> | |||
<version>${project.version}</version> | |||
<!-- You can reduce jar size by uncommenting this and providing the dependencies manually. See the README for details. | |||
<scope>${provided.scope}</scope> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the goal of ${provided.scope} to handle the proper scope according to the profile, e.g. just like it is done with the Intellij profile. I am not quite following why the user has to comment/uncomment the scope configuration.
Thread.sleep(2000); | ||
DrpcResultsPrinter.remoteClient().printResults(60, 1, TimeUnit.SECONDS); | ||
} | ||
protected void run(String[] args) throws AlreadyAliveException, InvalidTopologyException, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove the ability to specify the topic name from the command line ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The topic name wasn't being passed to the consumer before, only the producers as far as I could tell, so if you used the parameters the example didn't work. Fixing it caused a conflict with the wildcard example, because I'd have to change the newKafkaSpoutConfig signature to take a list of topics. That won't work with the Pattern required by the wildcard example. It seemed easier to just remove the option.
@@ -21,15 +21,10 @@ | |||
import static org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy.EARLIEST; | |||
|
|||
import org.apache.kafka.clients.consumer.ConsumerConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call package where this class lives config.builder
instead of builders
, which is a bit misleading since this is really a configuration class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also would call the two getXyz methods in this class createXyz, as they are static factory methods. I know that the name was already like that, but since we are changing it, we should just make it more conventional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will rename
new KafkaSpoutTopologyMainNamedTopics().runMain(args); | ||
} | ||
|
||
protected void runMain(String[] args) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this change removing the ability to run this code in LocalCluster mode? I think it is very useful. For example, I use it all the time to run these simple test examples from IntelliJ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I'll restore that bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remembered why I removed it. LocalCluster is in the storm-server jar, which isn't included by the example projects. I think including it would cause conflict when the jar is deployed to a real cluster. How about I move the ability to run this from a local cluster to a test class? That should still leave people able to run on a local cluster from an IDE, but doesn't interfere with the generated jar.
</dependency> | ||
<dependency> | ||
<groupId>org.apache.kafka</groupId> | ||
<artifactId>kafka-clients</artifactId> | ||
<version>${storm.kafka.client.version}</version> | ||
<scope>compile</scope> | ||
<!-- You can reduce jar size by uncommenting this and providing the dependencies manually. See the README for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the goal of ${provided.scope} to handle the proper scope according to the profile, e.g. just like it is done with the Intellij profile. I am not quite following why the user has to comment/uncomment the scope configuration.
</dependency> | ||
<dependency> | ||
<groupId>org.apache.kafka</groupId> | ||
<artifactId>${storm.kafka.artifact.id}</artifactId> | ||
<version>${storm.kafka.client.version}</version> | ||
<scope>compile</scope> | ||
<!-- You can reduce jar size by uncommenting this and providing the dependencies manually. See the README for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the goal of ${provided.scope} to handle the proper scope according to the profile, e.g. just like it is done with the Intellij profile. I am not quite following why the user has to comment/uncomment the scope configuration.
@hmcl I think I addressed everything. Please look again. Thanks. |
+1. Thanks @srdo |
@srdo |
Will do, thanks @HeartSaVioR :) |
See https://issues.apache.org/jira/browse/STORM-2658.
The changes to the Trident examples mainly have to do with the command line arguments not being consistently passed to the topologies, e.g. the broker url was passed to producing topologies but not the consuming topology. The extra parameters have been removed, I think the example code is fine with hard coded topic names.