@shenhui0509 shenhui0509 commented Oct 10, 2019

This commit resolves #1515: change the SYNC_MASTER flushSlave/flushDisk steps to run in a pipelined manner.

What is the purpose of the change

Performance improvement for SYNC_MASTER:

  • reduce produce latency
  • improve produce throughput

Brief changelog

  1. Add an asyncProcessRequest method to remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRequestProcessor.java so that processors can handle requests asynchronously; by default it wraps the result of processRequest in a CompletableFuture.
  2. Change remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingAbstract.java so that the response is written to the client once the future completes, rather than after a blocking call.
  3. Add an asyncPutMessage method to the MessageStore interface.
  4. Change store/src/main/java/org/apache/rocketmq/store/CommitLog.java: the asyncPutMessage method returns a CompletableFuture that is completed by the flushSlave/flushDisk thread instead of the processor thread. The processor can therefore handle the next request right after appending to the local commit log, and the flushSlave/flushDisk thread responds to the client.
  5. Eliminate some duplicated code.
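
To illustrate items 1 and 2, here is a minimal sketch rather than the code in this PR. `RequestProcessor`, `DeferredProcessor`, and `dispatch` are hypothetical stand-ins for NettyRequestProcessor and the dispatch logic in NettyRemotingAbstract, and requests and responses are plain strings. It only shows the shape of the idea: a default asynchronous method that wraps the synchronous result, and a response written from a callback instead of after a blocking call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AsyncProcessorSketch {
    /** Hypothetical stand-in for NettyRequestProcessor (not the real interface). */
    interface RequestProcessor {
        String processRequest(String request) throws Exception;

        // Default: run the synchronous path and wrap its outcome in an already-completed future.
        default CompletableFuture<String> asyncProcessRequest(String request) {
            CompletableFuture<String> future = new CompletableFuture<>();
            try {
                future.complete(processRequest(request));
            } catch (Exception e) {
                future.completeExceptionally(e);
            }
            return future;
        }
    }

    /** A processor whose response is completed later, e.g. by a flush thread. */
    static final class DeferredProcessor implements RequestProcessor {
        final CompletableFuture<String> pending = new CompletableFuture<>();

        @Override
        public String processRequest(String request) {
            throw new UnsupportedOperationException("async only");
        }

        @Override
        public CompletableFuture<String> asyncProcessRequest(String request) {
            return pending;
        }
    }

    /** Registers the "write response" step as a callback; the caller never blocks. */
    static void dispatch(RequestProcessor processor, String request, List<String> wire) {
        processor.asyncProcessRequest(request).thenAccept(response -> wire.add(response));
    }

    static List<String> demo() {
        List<String> wire = new ArrayList<>();
        dispatch(request -> "sync:" + request, "a", wire); // completes immediately
        DeferredProcessor deferred = new DeferredProcessor();
        dispatch(deferred, "b", wire);                     // returns before any response exists
        wire.add("dispatch-returned");
        deferred.pending.complete("async:b");              // response is written only now
        return wire;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the thread that calls `dispatch` is free as soon as the callback is registered, which is the property the changelog relies on.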

Verifying this change

Functional

  1. Add a unit test in store/src/test/java/org/apache/rocketmq/store/HATest.java.
  2. Shut down the slave while sending messages; the producer receives the FLUSH_SLAVE_TIMEOUT response code.
    image
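
The behaviour checked above can be pictured with a small sketch. This is an assumption-laden illustration, not the CommitLog implementation: `PipelinedPutSketch`, its `PutStatus` enum, and the single `replicator` thread are hypothetical stand-ins for the store and the flushSlave service. The caller gets a future back right after the (omitted) local append; the replication thread completes it with success, or with a timeout status when the slave never acknowledges.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class PipelinedPutSketch {
    /** Stand-in for the store's put status; only the two values needed here. */
    enum PutStatus { PUT_OK, FLUSH_SLAVE_TIMEOUT }

    // Hypothetical single thread playing the role of the flushSlave service.
    private final ScheduledExecutorService replicator =
            Executors.newSingleThreadScheduledExecutor();
    private final boolean slaveAlive;

    PipelinedPutSketch(boolean slaveAlive) {
        this.slaveAlive = slaveAlive;
    }

    CompletableFuture<PutStatus> asyncPutMessage(String message, long timeoutMillis) {
        // The append to the local commit log would happen here, on the caller's thread.
        CompletableFuture<PutStatus> result = new CompletableFuture<>();
        if (slaveAlive) {
            // The replication thread, not the caller, completes the future.
            replicator.execute(() -> { result.complete(PutStatus.PUT_OK); });
        }
        // If no acknowledgement arrives in time, complete with a timeout status.
        // complete() is a no-op when the future is already done.
        replicator.schedule(
                () -> { result.complete(PutStatus.FLUSH_SLAVE_TIMEOUT); },
                timeoutMillis, TimeUnit.MILLISECONDS);
        return result; // the caller is free to handle the next request
    }

    void shutdown() {
        replicator.shutdownNow();
    }

    /** Sends one message and waits for the outcome, for demonstration only. */
    static PutStatus run(boolean slaveAlive) {
        PipelinedPutSketch store = new PipelinedPutSketch(slaveAlive);
        try {
            return store.asyncPutMessage("hello", 50).join();
        } finally {
            store.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

The `join()` in `run` exists only to make the demo observable; in the pipelined design the response would instead be written from a callback on the returned future.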

Performance

benchmark config:
2 * broker: 48C, 512G mem, 4 * 2T SSD; 1 master, 1 slave
namesrv is co-deployed with the brokers
3 * client: 40C, 64G mem
message size: 1024B
write threads: 64

benchmark result
sync_pipeline is the optimized version configured with SYNC_MASTER
wait is the original version configured with SYNC_MASTER
async is the original version configured with ASYNC_MASTER

TPS
image
image
Latency
image
image

The results show that the optimized version's performance is close to that of ASYNC_MASTER.

Follow this checklist to help us incorporate your contribution quickly and easily. Note that it would be helpful if you could finish the following 5 checklist items (the last one is not necessary) before requesting the community to review your PR.

  • Make sure there is a Github issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a Github issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write the necessary unit tests (over 80% coverage) to verify the correctness of your logic; prefer mocking when there are cross-module dependencies. If a new feature or significant change is committed, please remember to add an integration test in the test module.
  • Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit-test pass. Run mvn clean test-compile failsafe:integration-test to make sure integration-test pass.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

… reduce produce latency and improve throughput
@shenhui0509 shenhui0509 changed the title [improvement] SYNC_MASTER could be change into pipeline manner #1515 [improvement] SYNC_MASTER could be change into pipeline manner Oct 10, 2019
@shenhui0509
Author

ping @duhenglucky @vongosling

@duhenglucky
Contributor

@shenhui0509 good job, thanks for your contribution, we will review this PR ASAP.

@duhenglucky duhenglucky added this to the 4.7.0 milestone Oct 10, 2019
@duhenglucky
Contributor

@shenhui0509 and it would be nice if you can pull this request to develop branch

@shenhui0509 shenhui0509 changed the base branch from master to develop October 10, 2019 11:16
@shenhui0509
Author

> @shenhui0509 and it would be nice if you can pull this request to develop branch

done

@shenhui0509
Author

Hi, @duhenglucky, is it necessary to write a RIP? If needed, I'd like to write one.

@xujianhai666
Member

> Hi, @duhenglucky, is it necessary to write a RIP? If needed, I'd like to write one.

@shenhui0509 @duhenglucky I think this PR is important enough that a RIP is necessary.

@duhenglucky duhenglucky changed the title [improvement] SYNC_MASTER could be change into pipeline manner [ISSUE #1515] SYNC_MASTER could be change into pipeline manner Oct 12, 2019
@duhenglucky
Contributor

duhenglucky commented Oct 12, 2019

@xujianhai666 good suggestion. @shenhui0509 this PR is indeed a relatively big change and is on the critical path, so it is best to write a RIP describing the scope of the relevant changes as well as the key design in this PR. If you have any other questions, please feel free to ask me directly.

@coveralls

coveralls commented Oct 14, 2019

Coverage Status

Coverage increased (+0.05%) to 50.534% when pulling b143ff6 on shenhui0509:sync_pipeline into 44569e4 on apache:develop.

Contributor

@RongtongJin RongtongJin left a comment

This PR does improve TPS and reduce latency under synchronous replication (verified by benchmark). There are some issues:

  1. Transaction messages do not seem to work after the modification.
  2. [DISCUSS] When the slave is down, the master returns FLUSH_SLAVE_TIMEOUT instead of SLAVE_NOT_AVAILABLE.

shenhui.backend added 2 commits October 16, 2019 14:01
2. fix properties when store message
3. add IT for transaction
4. correct resonse code when slave down
@shenhui0509
Author

> This PR does improve TPS and reduce latency under synchronous replication (verified by benchmark). There are some issues:
>
>   1. Transaction messages do not seem to work after the modification.
>   2. [DISCUSS] When the slave is down, the master returns FLUSH_SLAVE_TIMEOUT instead of SLAVE_NOT_AVAILABLE.

  1. I missed the message properties when storing the message.
  2. I agree to return SLAVE_NOT_AVAILABLE.

Both 1 and 2 are fixed, and I added some UT and IT for transaction messages. Please take a look at the new commit.

final RemotingCommand response = pair.getObject1().processRequest(ctx, cmd);
doAfterRpcHooks(RemotingHelper.parseChannelRemoteAddr(ctx.channel()), cmd, response);
CompletableFuture<RemotingCommand> responseFuture = pair.getObject1().asyncProcessRequest(ctx, cmd);
responseFuture.thenAccept((r) -> {
Contributor

I think the lambda parameter r should be renamed to response for readability.

String originMsgId = MessageAccessor.getOriginMessageId(msgExt);
MessageAccessor.setOriginMessageId(msgInner, UtilAll.isBlank(originMsgId) ? msgExt.getMsgId() : originMsgId);
CompletableFuture<PutMessageResult> putMessageResult = this.brokerController.getMessageStore().asyncPutMessage(msgInner);
return putMessageResult.thenApply((r) -> {
Contributor

I think the lambda parameter r should be renamed to response for readability.

SendMessageContext sendMessageContext,
ChannelHandlerContext ctx,
int queueIdInt) {
return putMessageResult.thenApply((r) ->
Contributor

I think the lambda parameter r should be renamed to putMessageResult for readability.

@vongosling
Member

Great optimization. I will find some time to work on this PR next week.

@duhenglucky duhenglucky modified the milestones: 4.7.0, 4.6.2 Feb 28, 2020
@duhenglucky duhenglucky merged commit 64e4ca7 into apache:develop Feb 28, 2020
JiaMingLiu93 pushed a commit to JiaMingLiu93/rocketmq that referenced this pull request May 28, 2020
[ISSUE apache#1515] SYNC_MASTER could be change into pipeline manner
GenerousMan pushed a commit to GenerousMan/rocketmq that referenced this pull request Aug 12, 2022
[ISSUE apache#1515] SYNC_MASTER could be change into pipeline manner
pulllock pushed a commit to pulllock/rocketmq that referenced this pull request Oct 19, 2023
[ISSUE apache#1515] SYNC_MASTER could be change into pipeline manner