HBASE-27463 Reset sizeOfLogQueue when refresh replication source #4863

Merged: 2 commits, Nov 27, 2022

Conversation

frostruan
Contributor

No description provided.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 25s master passed
+1 💚 compile 2m 14s master passed
+1 💚 checkstyle 0m 28s master passed
+1 💚 spotless 0m 38s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 12s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 0s the patch passed
+1 💚 compile 2m 9s the patch passed
+1 💚 javac 2m 9s the patch passed
+1 💚 checkstyle 0m 27s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 7m 47s Patch does not cause any errors with Hadoop 3.2.4 3.3.4.
+1 💚 spotless 0m 35s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 18s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
27m 11s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4863
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux a74866d7b672 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 655f19c
Default Java Temurin-1.8.0_352-b08
Max. process+thread count 64 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 3m 15s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 39s master passed
+1 💚 compile 0m 41s master passed
+1 💚 shadedjars 3m 55s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 29s the patch passed
+1 💚 compile 0m 42s the patch passed
+1 💚 javac 0m 42s the patch passed
+1 💚 shadedjars 3m 54s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 22s the patch passed
_ Other Tests _
+1 💚 unit 197m 16s hbase-server in the patch passed.
221m 0s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4863
Optional Tests javac javadoc unit shadedjars compile
uname Linux dbc7506af0b0 5.4.0-1085-aws #92~18.04.1-Ubuntu SMP Wed Aug 31 17:21:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 655f19c
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/testReport/
Max. process+thread count 3018 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 2m 29s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 40s master passed
+1 💚 compile 0m 39s master passed
+1 💚 shadedjars 3m 45s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 13s the patch passed
+1 💚 compile 0m 39s the patch passed
+1 💚 javac 0m 39s the patch passed
+1 💚 shadedjars 3m 46s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s the patch passed
_ Other Tests _
+1 💚 unit 217m 31s hbase-server in the patch passed.
238m 44s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4863
Optional Tests javac javadoc unit shadedjars compile
uname Linux 7596ecd6e947 5.4.0-124-generic #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 655f19c
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/testReport/
Max. process+thread count 2558 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@@ -467,6 +467,9 @@ public void refreshSources(String peerId) throws IOException {
ReplicationSourceInterface toRemove = this.sources.put(peerId, src);
if (toRemove != null) {
LOG.info("Terminate replication source for " + toRemove.getPeerId());
// Reset sizeOfLogQueue, log will re enqueue to the created new source.
toRemove.getSourceMetrics()
Contributor

@frostruan Do you think we need to reset only the size-of-log-queue metric, or the other source metrics as well? IMO we should reset all the metrics and start from a clean state.

Contributor Author

Thanks for reviewing @shahrs87

Currently we monitor replication status through the sizeOfLogQueue metric, so I only reset sizeOfLogQueue here. I'll check whether any other metrics need to be reset. Thanks for your suggestion.

Contributor

Are there any rules about which metrics should be reset and which should not?

Contributor Author

@frostruan frostruan Nov 6, 2022

Thanks for the review, Duo. @Apache9

Sorry, I don't currently know of such a rule; it may be a little complicated.

I think what makes it complicated is the order of operations: we first create the new ReplicationSource, then replace the old ReplicationSource with the new one, and then, if an old ReplicationSource existed, terminate it. Since HBASE-23231, we do not clear the old metrics when terminating the old ReplicationSource, so that the metrics for the new ReplicationSource are not cleared. That makes it hard to keep the new ReplicationSourceMetric and the GlobalReplicationSourceMetric correct and consistent.

If we change the order, first terminating the old ReplicationSource if it exists and then creating and registering the new ReplicationSource, the metric logic here would probably be much simpler, and we could clear the metrics when terminating the old ReplicationSource. I'm not sure yet, though; I still need to confirm.

Contributor

This change will effectively undo HBASE-23231. @Apache9, since you were one of the reviewers of HBASE-23231, do you think it is safe to do this?

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 7s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 4m 2s master passed
+1 💚 compile 3m 2s master passed
+1 💚 checkstyle 0m 36s master passed
+1 💚 spotless 0m 48s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 48s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 29s the patch passed
+1 💚 compile 2m 44s the patch passed
+1 💚 javac 2m 44s the patch passed
+1 💚 checkstyle 0m 37s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 13m 33s Patch does not cause any errors with Hadoop 3.2.4 3.3.4.
+1 💚 spotless 0m 58s patch has no errors when running spotless:check.
+1 💚 spotbugs 2m 25s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 18s The patch does not generate ASF License warnings.
43m 11s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4863
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 64ce1c7fca40 5.4.0-1085-aws #92~18.04.1-Ubuntu SMP Wed Aug 31 17:21:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 41c7bd3
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 77 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 12s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 1s master passed
+1 💚 compile 1m 1s master passed
+1 💚 shadedjars 4m 32s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 31s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 24s the patch passed
+1 💚 compile 0m 50s the patch passed
+1 💚 javac 0m 50s the patch passed
+1 💚 shadedjars 4m 30s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 29s the patch passed
_ Other Tests _
-1 ❌ unit 214m 29s hbase-server in the patch failed.
239m 51s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4863
Optional Tests javac javadoc unit shadedjars compile
uname Linux f9656020f7e7 5.4.0-1085-aws #92~18.04.1-Ubuntu SMP Wed Aug 31 17:21:08 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 41c7bd3
Default Java Eclipse Adoptium-11.0.17+8
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/testReport/
Max. process+thread count 2434 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 20s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 10s master passed
+1 💚 compile 0m 39s master passed
+1 💚 shadedjars 3m 45s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 11s the patch passed
+1 💚 compile 0m 39s the patch passed
+1 💚 javac 0m 39s the patch passed
+1 💚 shadedjars 3m 45s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 23s the patch passed
_ Other Tests _
-1 ❌ unit 229m 43s hbase-server in the patch failed.
247m 56s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4863
Optional Tests javac javadoc unit shadedjars compile
uname Linux 2dc3b709f90c 5.4.0-124-generic #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 41c7bd3
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/testReport/
Max. process+thread count 2416 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4863/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache9
Contributor

Apache9 commented Nov 17, 2022

So the fix here is to create the new replication source after terminating the old replication source.

First, how could this fix the problem?
Second, I guess the reason we create the replication source outside the lock is to reduce the time the lock is held, but IIRC the startup of a replication source is asynchronous, so it is probably OK to move it under the lock.
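
To make the locking question concrete, here is a minimal sketch, not the actual ReplicationSourceManager code. It assumes a hypothetical `Source` class whose `startup()` only hands the slow initialization to another thread, and a hypothetical `lock` object guarding a `sources` map; all of these names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

/** Hypothetical, simplified manager; not the real ReplicationSourceManager. */
final class RefreshUnderLockSketch {
  /** Stand-in for a replication source whose startup runs on its own thread. */
  static final class Source {
    final String peerId;
    final CountDownLatch started = new CountDownLatch(1);
    volatile boolean terminated;

    Source(String peerId) {
      this.peerId = peerId;
    }

    /** Returns immediately; the (possibly slow) initialization happens off-thread. */
    void startup() {
      Thread t = new Thread(started::countDown, "source-init-" + peerId);
      t.setDaemon(true);
      t.start();
    }

    void terminate() {
      terminated = true;
    }
  }

  private final Object lock = new Object();
  private final Map<String, Source> sources = new HashMap<>();

  /** Terminates the old source first, then creates and registers the new one, all under the lock. */
  Source refresh(String peerId) {
    synchronized (lock) {
      Source old = sources.remove(peerId);
      if (old != null) {
        old.terminate();
      }
      Source fresh = new Source(peerId);
      sources.put(peerId, fresh);
      fresh.startup(); // non-blocking, so the lock is not held during initialization
      return fresh;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    RefreshUnderLockSketch manager = new RefreshUnderLockSketch();
    Source first = manager.refresh("peer1");
    Source second = manager.refresh("peer1");
    second.started.await();
    System.out.println("old terminated: " + first.terminated);
    System.out.println("new terminated: " + second.terminated);
  }
}
```

If startup really is asynchronous, as assumed here, creating the source inside the critical section adds only the cost of constructing the object and registering it in the map.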

@frostruan
Contributor Author

Thanks for reviewing Duo @Apache9

The problem arises like this:

  1. Imagine we have a replication source A whose sizeOfLogQueue is x, and the global sizeOfLogQueue is y.
  2. When the peer is disabled, we create a new replication source A' and enqueue the logs to it, so the sizeOfLogQueue of A' also grows to x.
    https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java#L473
  3. To avoid the metrics being cleared, a flag was introduced to keep the old metrics. See HBASE-23231 for more details.
    As a result, the global sizeOfLogQueue becomes x+y, when it should still be y.
  4. In the first commit, I only decreased the global sizeOfLogQueue when terminating the old replication source, but @shahrs87 pointed out that other metrics may also need to be reset, for example sizeOfHFileRefsQueue.

So I think we can change the order: terminate the old replication source first and then create the new one. That makes the problem less complicated.
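
The accounting in the steps above can be sketched as follows. This is only an illustration under stated assumptions: `SourceGauge` and the shared `AtomicLong` are hypothetical stand-ins for the per-source and global sizeOfLogQueue metrics, not the real HBase metrics classes, and `others` is an invented parameter for the logs queued by sources other than A (so y = x + others).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical gauges; names follow the discussion, not the real HBase metrics classes. */
final class LogQueueGaugeSketch {
  /** Per-source gauge that mirrors every change into a shared global gauge. */
  static final class SourceGauge {
    private final AtomicLong global;
    private long sizeOfLogQueue;

    SourceGauge(AtomicLong global) {
      this.global = global;
    }

    void enqueue(int logs) {
      sizeOfLogQueue += logs;
      global.addAndGet(logs);
    }

    /** Gives this source's contribution back to the global gauge. */
    void clear() {
      global.addAndGet(-sizeOfLogQueue);
      sizeOfLogQueue = 0;
    }
  }

  /** Old order: create A' and re-enqueue, then terminate A while keeping A's metrics. */
  static long createThenTerminate(int x, int others) {
    AtomicLong global = new AtomicLong();
    new SourceGauge(global).enqueue(others); // sources other than A
    SourceGauge a = new SourceGauge(global);
    a.enqueue(x); // global is now y = x + others
    SourceGauge aPrime = new SourceGauge(global);
    aPrime.enqueue(x); // the same logs are enqueued again for A'
    // A is terminated here, but its gauge is deliberately left uncleared.
    return global.get();
  }

  /** New order: terminate A and clear its metrics, then create A' and re-enqueue. */
  static long terminateThenCreate(int x, int others) {
    AtomicLong global = new AtomicLong();
    new SourceGauge(global).enqueue(others);
    SourceGauge a = new SourceGauge(global);
    a.enqueue(x); // global is now y = x + others
    a.clear(); // global drops to y - x
    SourceGauge aPrime = new SourceGauge(global);
    aPrime.enqueue(x); // global returns to y
    return global.get();
  }

  public static void main(String[] args) {
    System.out.println(createThenTerminate(3, 2)); // prints 8, i.e. x + y
    System.out.println(terminateThenCreate(3, 2)); // prints 5, i.e. y
  }
}
```

With x = 3 and y = 5, the old order leaves the global gauge at 8, while terminating first brings it back to 5 once A' has re-enqueued its logs.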

@Apache9
Contributor

Apache9 commented Nov 23, 2022

OK, so if we terminate the replication source first, we do not need to keep the old metrics: we just let them decrease, and after we create the new replication source the metrics will be restored?

@frostruan
Contributor Author

Yes, I think so.

@Apache9
Contributor

Apache9 commented Nov 23, 2022

@shahrs87 Do you have any other concerns?

@shahrs87
Contributor

@frostruan Thank you for the update. I think this is a cleaner way. +1.

@Apache9 Apache9 merged commit bb9f43c into apache:master Nov 27, 2022
Apache9 pushed a commit that referenced this pull request Nov 27, 2022
Co-authored-by: huiruan <huiruan@tencent.com>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Reviewed-by: Rushabh Shah <shahrs87@gmail.com>
(cherry picked from commit bb9f43c)
Apache9 pushed a commit that referenced this pull request Nov 27, 2022
Co-authored-by: huiruan <huiruan@tencent.com>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Reviewed-by: Rushabh Shah <shahrs87@gmail.com>
(cherry picked from commit bb9f43c)
Apache9 pushed a commit that referenced this pull request Nov 27, 2022
Co-authored-by: huiruan <huiruan@tencent.com>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Reviewed-by: Rushabh Shah <shahrs87@gmail.com>
(cherry picked from commit bb9f43c)
vinayakphegde pushed a commit to vinayakphegde/hbase that referenced this pull request Apr 4, 2024
…che#4863)

Co-authored-by: huiruan <huiruan@tencent.com>
Signed-off-by: Duo Zhang <zhangduo@apache.org>
Reviewed-by: Rushabh Shah <shahrs87@gmail.com>
(cherry picked from commit bb9f43c)
(cherry picked from commit 5ce1d8f)
Change-Id: I9f8a19286df32d68de472e8f4b3f8f7926551178
4 participants