[Summer of Code] Support switch role for ha service #4236

@hzh0425 hzh0425 commented May 2, 2022

What is the purpose of the change

tracking issue: #4330

We want unified log replication that uses RocketMQ's original HAService instead of DLedger mode.
Previously, I completed the following work with @RongtongJin:

  1. Add statemachine mode for dledger: Feature: add statemachine for dledger openmessaging/dledger#128
  2. Embed a strongly consistent controller based on dledger implementation in name-srv: [Summer of Code] Dledger controller #4195

In this PR, I added AutoSwitchHAService, an HA service that can switch roles. It cooperates with the controller to achieve master-slave switching.

In follow-up work, I will modify the broker code to fully implement master-slave switching.
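
As a rough illustration of the state a role-switching HA service has to track, here is a minimal sketch. The class `RoleSwitchSketch`, its method names, and the rule that an instruction carrying a stale epoch is ignored are all assumptions made for illustration; none of them is taken from the PR's `AutoSwitchHAService`.

```java
/** Hypothetical role holder; NOT the PR's AutoSwitchHAService. */
final class RoleSwitchSketch {
    enum Role { MASTER, SLAVE }

    private Role role = Role.SLAVE;
    private int epoch = 0;
    private String masterAddress = null;

    /** Promote to master. Assumption: an epoch that is not newer is treated as stale. */
    synchronized boolean changeToMaster(int newEpoch) {
        if (newEpoch <= epoch) {
            return false;
        }
        role = Role.MASTER;
        epoch = newEpoch;
        masterAddress = null; // a master does not replicate from anyone
        return true;
    }

    /** Demote to slave and remember which master to replicate from. */
    synchronized boolean changeToSlave(String newMasterAddress, int newEpoch) {
        if (newEpoch <= epoch) {
            return false;
        }
        role = Role.SLAVE;
        epoch = newEpoch;
        masterAddress = newMasterAddress;
        return true;
    }

    synchronized Role role() { return role; }

    synchronized int epoch() { return epoch; }

    synchronized String masterAddress() { return masterAddress; }
}
```

In this sketch the controller would be the only caller of `changeToMaster` and `changeToSlave`, so the epoch check gives a simple way to drop instructions that arrive late or out of order.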

Brief changelog

The HA service protocol is:
[Image: master-slave replication]

Verifying this change

Follow this checklist to help us incorporate your contribution quickly and easily. Note: it would be helpful if you could complete the following 5 checklist items (the last one is not necessary) before requesting that the community review your PR.

  • Make sure there is a GitHub issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a GitHub issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write the necessary unit tests (over 80% coverage) to verify your logic; prefer mocks when cross-module dependencies exist. If a new feature or significant change is committed, please remember to add an integration test in the test module.
  • Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit tests pass. Run mvn clean test-compile failsafe:integration-test to make sure integration tests pass.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

coveralls commented May 2, 2022

Coverage Status

Coverage increased (+0.7%) to 47.994% when pulling b23e975 on hzh0425:feature/auto-switch-ha into d26773a on apache:5.0.0-beta-dledger-controller.

codecov-commenter commented May 2, 2022

Codecov Report

Merging #4236 (b23e975) into 5.0.0-beta-dledger-controller (d26773a) will increase coverage by 0.55%.
The diff coverage is 71.42%.

@@                         Coverage Diff                         @@
##             5.0.0-beta-dledger-controller    #4236      +/-   ##
===================================================================
+ Coverage                            43.27%   43.82%   +0.55%     
- Complexity                            6137     6277     +140     
===================================================================
  Files                                  818      826       +8     
  Lines                                57559    58520     +961     
  Branches                              7852     7993     +141     
===================================================================
+ Hits                                 24910    25648     +738     
- Misses                               29412    29586     +174     
- Partials                              3237     3286      +49     
Impacted Files                                         Coverage Δ
...in/java/org/apache/rocketmq/common/EpochEntry.java  0.00% <0.00%> (ø)
...g/apache/rocketmq/common/utils/CheckpointFile.java  0.00% <0.00%> (ø)
...e/rocketmq/namesrv/routeinfo/RouteInfoManager.java  69.16% <ø> (+0.09%) ⬆️
...org/apache/rocketmq/store/DefaultMessageStore.java  53.78% <50.00%> (+1.43%) ⬆️
...ketmq/store/ha/autoswitch/AutoSwitchHAService.java  50.00% <50.00%> (ø)
...java/org/apache/rocketmq/store/ha/io/HAWriter.java  54.16% <54.16%> (ø)
...mq/store/ha/autoswitch/AutoSwitchHAConnection.java  76.21% <76.21%> (ø)
...cketmq/store/ha/autoswitch/AutoSwitchHAClient.java  79.84% <79.84%> (ø)
...org/apache/rocketmq/store/ha/DefaultHAService.java  67.46% <81.25%> (+1.43%) ⬆️
...e/rocketmq/store/ha/autoswitch/EpochFileCache.java  86.80% <86.80%> (ø)
... and 22 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@RongtongJin added the module/ha (high availability related) and soc (Summer of Code, hosted by Google, Alibaba, Chinese Academy of Sciences and so on) labels May 2, 2022
1. Add EpochStartOffset to the HA protocol
2. Notify AutoSwitchHAService when expired files are deleted
3. Add more tests for AutoSwitchHAService
localEpochCache.initCacheFromEntries(this.epochCache.getAllEntries());
localEpochCache.setLastEpochEntryEndOffset(this.messageStore.getMaxPhyOffset());

final long truncateOffset = localEpochCache.findConsistentPoint(masterEpochCache);

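1. If, for some reason (for example, the log has been deleted), no consistent point between master and slave can be found, the slave should wait for manual intervention rather than continue.
2. If the slave is empty, could it skip the truncation flow entirely? The current behavior is only correct because no consistent point is found, so currentReportedOffset = -1, and reportSlaveMaxOffset then corrects it to 0.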
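
The two points in this review comment can be sketched as follows. This is a minimal illustration, not the PR's `EpochFileCache.findConsistentPoint`: the types `EpochRange` and `ConsistentPointSketch`, the `NOT_FOUND` constant, and the matching rule (same epoch and same start offset, scanning the slave's epochs from newest to oldest) are assumptions chosen to make the idea concrete. The slave list is assumed to be sorted by epoch in ascending order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical epoch record: the log range [startOffset, endOffset) written under one master term. */
final class EpochRange {
    final int epoch;
    final long startOffset;
    final long endOffset;

    EpochRange(int epoch, long startOffset, long endOffset) {
        this.epoch = epoch;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class ConsistentPointSketch {
    static final long NOT_FOUND = -1L;

    /**
     * Scans the slave's epochs from newest to oldest and returns the offset
     * up to which both logs agree, or NOT_FOUND if no epoch matches.
     */
    static long findConsistentPoint(List<EpochRange> slave, List<EpochRange> master) {
        Map<Integer, EpochRange> masterByEpoch = new HashMap<>();
        for (EpochRange e : master) {
            masterByEpoch.put(e.epoch, e);
        }
        for (int i = slave.size() - 1; i >= 0; i--) {
            EpochRange local = slave.get(i);
            EpochRange remote = masterByEpoch.get(local.epoch);
            if (remote != null && remote.startOffset == local.startOffset) {
                // Both sides wrote this epoch from the same start; they agree up to the shorter end.
                return Math.min(local.endOffset, remote.endOffset);
            }
        }
        return NOT_FOUND;
    }

    /** Returns true if replication may proceed; false means stop and wait for an operator. */
    static boolean canProceed(List<EpochRange> slave, List<EpochRange> master) {
        if (slave.isEmpty()) {
            return true; // point 2: an empty slave has nothing to truncate
        }
        return findConsistentPoint(slave, master) != NOT_FOUND; // point 1
    }
}
```

Handling the empty slave explicitly means the "no consistent point" result is reserved for the case that needs manual handling, instead of relying on a later correction from -1 to 0.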

1. Transfer syncFromLastFile from slave to master in the handshake state
2. Return false if finding the consistent point fails
Comment on lines +190 to +196
public long getConfirmOffset() {
    long confirmOffset = this.defaultMessageStore.getMaxPhyOffset();
    for (HAConnection connection : this.connectionList) {
        confirmOffset = Math.min(confirmOffset, connection.getSlaveAckOffset());
    }
    return confirmOffset;
}

This will need to change later: it should select only the connections that are in the SyncStateSet and then compare their offsets.
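
A minimal sketch of the change this comment asks for, under stated assumptions: `SlaveAck` and `ConfirmOffsetSketch` are hypothetical stand-ins for the connection list, and identifying SyncStateSet members by slave address is an assumption, not something the PR confirms.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical view of one slave connection: its address and last acknowledged offset. */
final class SlaveAck {
    final String slaveAddress;
    final long ackOffset;

    SlaveAck(String slaveAddress, long ackOffset) {
        this.slaveAddress = slaveAddress;
        this.ackOffset = ackOffset;
    }
}

final class ConfirmOffsetSketch {
    /**
     * Returns the minimum of the master's max physical offset and the ack
     * offsets of slaves that are members of the SyncStateSet. Slaves outside
     * the set are skipped, so a lagging out-of-sync slave cannot hold the
     * confirm offset back.
     */
    static long confirmOffset(long masterMaxPhyOffset, List<SlaveAck> connections, Set<String> syncStateSet) {
        long confirm = masterMaxPhyOffset;
        for (SlaveAck connection : connections) {
            if (syncStateSet.contains(connection.slaveAddress)) {
                confirm = Math.min(confirm, connection.ackOffset);
            }
        }
        return confirm;
    }
}
```

With no in-sync slaves, this sketch falls back to the master's own max offset, which matches the starting value in the quoted `getConfirmOffset`.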

@RongtongJin RongtongJin merged commit 911ee34 into apache:5.0.0-beta-dledger-controller May 6, 2022