STORM-2349: Add one RocketMQ plugin for the Apache Storm #2024

Merged
merged 3 commits into from Apr 24, 2017

5 participants
@vesense
Member

vesense commented Mar 22, 2017

https://issues.apache.org/jira/browse/STORM-2349

This is the initial version for code review.
Current phase (features included in this PR):

  • RocketMQ Bolt
  • RocketMQ Spout
  • RocketMQ Trident State
  • documents and examples

Local tests passed.

Next phase (planned):
RocketMQ Trident Spout.

Member

vesense commented Mar 22, 2017

I will update the POM files later, based on STORM-2416 (Reduce release package size).

Member

vesense commented Mar 31, 2017

Any comments are welcome.

external/storm-rocketmq/pom.xml
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>

@vongosling

vongosling Apr 2, 2017

Member

No need. The RocketMQ client already depends on commons-lang3, which is the next generation of the commons-lang package.

@vesense

vesense Apr 10, 2017

Member

This is used for validating parameters. Without the dependency, it fails with a ClassNotFound exception.

external/storm-rocketmq/pom.xml
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
<scope>test</scope>

@vongosling

vongosling Apr 2, 2017

Member

Does the latest storm-core still depend on log4j?

@vesense

vesense Apr 10, 2017

Member

This is not used. Will remove.

} else {
return ConsumeOrderlyStatus.SUSPEND_CURRENT_QUEUE_A_MOMENT;
}
}

@vongosling

vongosling Apr 2, 2017

Member

Currently, RocketMQ does not support multiple instances on one machine. If we want that, we must set the instanceID, as on line 75 of https://github.com/rocketmq/rocketmq-storm/blob/master/src/main/java/org/apache/rocketmq/integration/storm/spout/SimpleMessageSpout.java.

@vesense

vesense Apr 10, 2017

Member

Yes, I noticed this. Please see lines 156-162 in RocketMQConfig.

@hustfxj

hustfxj Apr 13, 2017

Contributor

@vesense In fact, I don't suggest giving each task its own consumer by setting the instanceId, because each worker may then end up with many consumers. Why not have only one consumer per worker, using the singleton pattern?

@vesense

vesense Apr 13, 2017

Member

Yes. Since the RocketMQ Consumer is thread-safe, sharing a single instance across threads should generally be faster than having multiple instances. The same goes for the Producer. I will refactor both.
@hustfxj Thanks for your suggestion.
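
A minimal sketch of the one-consumer-per-worker idea discussed above, assuming a reference-counted holder. `FakeConsumer` and `SharedConsumerHolder` are hypothetical names used only for illustration; they are not classes from this PR or from RocketMQ, and a real implementation would hold the actual RocketMQ consumer instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a thread-safe MQ consumer; not the RocketMQ API. */
final class FakeConsumer {
    private final AtomicInteger starts = new AtomicInteger();
    private volatile boolean running;

    void start() { starts.incrementAndGet(); running = true; }
    void shutdown() { running = false; }
    boolean isRunning() { return running; }
    int startCount() { return starts.get(); }
}

/** Hands out one shared consumer per JVM, i.e. per Storm worker process. */
final class SharedConsumerHolder {
    private static FakeConsumer consumer; // guarded by the class lock
    private static int users;             // number of tasks currently holding it

    private SharedConsumerHolder() { }

    /** The first caller creates and starts the consumer; later callers reuse it. */
    static synchronized FakeConsumer acquire() {
        if (consumer == null) {
            consumer = new FakeConsumer();
            consumer.start();
        }
        users++;
        return consumer;
    }

    /** The last caller to release shuts the consumer down. */
    static synchronized void release() {
        if (users == 0) {
            return; // unbalanced release: nothing to do
        }
        if (--users == 0) {
            consumer.shutdown();
            consumer = null;
        }
    }
}
```

Each task would call `acquire()` when it opens and `release()` when it closes, so only the first task starts the consumer and only the last one shuts it down.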

...orm-rocketmq/src/main/java/org/apache/storm/rocketmq/RocketMQConfig.java
// use taskID/UUID for client name by default
String defaultClientName;
if (context != null) {
defaultClientName = String.valueOf(context.getThisTaskId());

@vongosling

vongosling Apr 10, 2017

Member

Cool~

Member

vesense commented Apr 10, 2017

@vongosling Thanks for your comments. And I will rebase the code on master.

Member

vesense commented Apr 10, 2017

Hi @hustfxj, I believe you know Apache RocketMQ well. Please take a look if you have time.

Member

vesense commented Apr 11, 2017

POM files updated & Rebased.

if (process(msgs)) {
return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
} else {
return ConsumeConcurrentlyStatus.RECONSUME_LATER;

@hustfxj

hustfxj Apr 13, 2017

Contributor

Messages may be lost because the consumer here relies on RocketMQ's automatic commit mode. It would be better not to commit a message until Storm has handled it successfully. Of course, this is only one consumption strategy.

@vesense

vesense Apr 13, 2017

Member

Currently, only push mode is supported, and it auto-commits. We can add pull mode in the next stage.

@vesense

vesense Apr 13, 2017

Member

@hustfxj Thanks for the reminder.

@hustfxj

hustfxj Apr 17, 2017

Contributor

@vesense Push mode can also deliver messages at-least-once. You can commit the message in "ack(Object msgId)", which means the spout doesn't commit the message until Storm has handled it successfully.
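
A minimal sketch of the bookkeeping this suggestion implies, assuming the spout keeps every message until it is acked and re-queues it on failure. `AckTrackingQueue` and its methods are hypothetical names for illustration; this is not the code in this PR, and it does not show how the commit itself would be deferred on the RocketMQ side.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical tracker for emitted-but-unacked messages; not Storm's or RocketMQ's API. */
final class AckTrackingQueue {
    private final Map<String, String> bodies = new HashMap<>(); // id -> body, kept until acked
    private final Queue<String> ready = new ArrayDeque<>();     // ids waiting to be emitted
    private final Set<String> inFlight = new HashSet<>();       // ids emitted but not yet acked

    /** Listener side: a message arrived from the broker. */
    synchronized void offer(String id, String body) {
        bodies.put(id, body);
        ready.add(id);
    }

    /** nextTuple() side: returns the next id to emit, or null when nothing is ready. */
    synchronized String poll() {
        String id = ready.poll();
        if (id != null) {
            inFlight.add(id);
        }
        return id;
    }

    /** Returns the body for an id that has not been acked yet, or null. */
    synchronized String body(String id) {
        return bodies.get(id);
    }

    /** ack(msgId) side: only now is the message treated as done. */
    synchronized void ack(String id) {
        if (inFlight.remove(id)) {
            bodies.remove(id);
        }
    }

    /** fail(msgId) side: put the message back so it is emitted again. */
    synchronized void fail(String id) {
        if (inFlight.remove(id)) {
            ready.add(id);
        }
    }

    /** Number of messages received but not yet acked. */
    synchronized int uncommitted() {
        return bodies.size();
    }
}
```

A failed message goes back to the ready queue and is emitted again, which is what makes the delivery at-least-once rather than at-most-once.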

@vesense

vesense Apr 18, 2017

Member

@hustfxj I updated the code. Now all consumed messages are put into a queue and managed by MessageRetryManager. Of course, this implements "at-least-once" on the Storm side.

BTW, I'm not sure whether I understand ConsumeConcurrentlyStatus correctly:

public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs,
                                                ConsumeConcurrentlyContext context) {
    if (process(msgs)) {
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    } else {
        return ConsumeConcurrentlyStatus.RECONSUME_LATER;
    }
}

When CONSUME_SUCCESS is returned, the consumer commits the offset automatically. And when RECONSUME_LATER is returned, will the consumer consume the failed messages again? Could you explain what the consumer does after receiving ConsumeConcurrentlyStatus.RECONSUME_LATER?

Please correct me if I misunderstand. Thanks.

Member

vesense commented Apr 13, 2017

@vongosling @hustfxj Updated. Can you take a look again?

@harshach

Overall LGTM. Added a few questions.

...ocketmq/src/main/java/org/apache/storm/rocketmq/spout/RocketMQSpout.java
// Since RocketMQ Consumer is thread-safe, RocketMQSpout uses a single
// consumer instance across threads to improve the performance.
synchronized (RocketMQSpout.class) {

@harshach

harshach Apr 14, 2017

Contributor

Even if it's thread-safe, shouldn't we consider giving each spout instance its own consumer? That way it would be more performant than having one consumer make all the calls to the RocketMQ servers.

@vesense

vesense Apr 14, 2017

Member

Maybe my code comment is not clear enough. Thread safety is just a precondition. The important point is that this relates to the RocketMQ internal implementation (shared queues, threads, etc.). "Consumer concurrency / Only one consumer instance per process" is the officially recommended approach.

...ocketmq/src/main/java/org/apache/storm/rocketmq/spout/RocketMQSpout.java
RocketMQConfig.buildConsumerConfigs(properties, (DefaultMQPushConsumer)consumer);
if (ordered) {
consumer.registerMessageListener(new MessageListenerOrderly() {

@harshach

harshach Apr 14, 2017

Contributor

Is this a push model from the server, rather than the spout polling?

@vesense

vesense Apr 14, 2017

Member

In fact, the RocketMQ "push" mode still pulls data from the broker. PushConsumer is a high-level consumer API that wraps the pulling details, so it looks as if the broker pushes messages to the consumer.
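
A minimal sketch of how a push-style API can sit on top of pulling, assuming a generic pull function and a listener callback. `PushOverPull` is a hypothetical class for illustration only; it is not the RocketMQ PushConsumer and leaves out threading, offsets, and retries.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical push-style wrapper around a pull function; not the RocketMQ API. */
final class PushOverPull<T> {
    private final Supplier<List<T>> puller;   // fetches the next batch; empty when none is available
    private final Consumer<List<T>> listener; // user callback, as registered in a "push" API

    PushOverPull(Supplier<List<T>> puller, Consumer<List<T>> listener) {
        this.puller = puller;
        this.listener = listener;
    }

    /** One loop iteration: pull a batch and hand it to the listener. Returns the batch size. */
    int pollOnce() {
        List<T> batch = puller.get();
        if (!batch.isEmpty()) {
            listener.accept(batch);
        }
        return batch.size();
    }

    /** Keeps pulling until a pull comes back empty. Returns the number of messages delivered. */
    int runUntilEmpty() {
        int total = 0;
        int n;
        while ((n = pollOnce()) > 0) {
            total += n;
        }
        return total;
    }
}
```

From the caller's point of view only the listener is visible, so messages appear to be pushed even though every batch is fetched by an explicit pull.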

...ocketmq/src/main/java/org/apache/storm/rocketmq/spout/RocketMQSpout.java
if (msgs.isEmpty()) {
return true;
}
MessageSet messageSet = new MessageSet(msgs);

@harshach

harshach Apr 14, 2017

Contributor

Any plans to offer a TupleMapper to flatten the schema, or do you want to preserve the data as it is in RocketMQ and send it downstream?

@vesense

vesense Apr 14, 2017

Member

Yes, the work is in progress. I will update the PR later.

@vesense

vesense Apr 18, 2017

Member

Updated.
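
A minimal sketch of what a mapper that flattens a message into tuple fields could look like. Everything here is an assumption for illustration: `RawMessage`, `MessageTupleMapper`, and `KeyBodyMapper` are hypothetical names, and the thread does not show the interface that was actually added to the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical message shape used only in this sketch. */
final class RawMessage {
    final String topic;
    final String key;
    final byte[] body;

    RawMessage(String topic, String key, byte[] body) {
        this.topic = topic;
        this.key = key;
        this.body = body;
    }
}

/** Hypothetical mapper from one message to a flat list of tuple values. */
interface MessageTupleMapper {
    /** Names of the output fields, in the same order as toValues(). */
    List<String> fieldNames();

    /** Flattens one message into the values of a single tuple. */
    List<Object> toValues(RawMessage msg);
}

/** Emits the message key and the UTF-8 decoded body as two separate fields. */
final class KeyBodyMapper implements MessageTupleMapper {
    @Override
    public List<String> fieldNames() {
        return Arrays.asList("key", "body");
    }

    @Override
    public List<Object> toValues(RawMessage msg) {
        return Arrays.<Object>asList(msg.key, new String(msg.body, StandardCharsets.UTF_8));
    }
}
```

Downstream bolts could then read `key` and `body` by field name instead of unpacking a whole message object.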

Member

vesense commented Apr 18, 2017

@vongosling @hustfxj @harshach Updated. Can you take a look again?

Contributor

hustfxj commented Apr 21, 2017

@vesense thank you. +1

Contributor

harshach commented Apr 24, 2017

LGTM @vesense . +1

@asfgit asfgit merged commit ded7a1e into apache:master Apr 24, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

Member

vongosling commented Apr 24, 2017

Great ~
