HBASE-27149 Server should close scanner if client times out before results are ready #4604

Merged: 1 commit, Jul 12, 2022


bbeaudreault
Contributor

No description provided.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 7s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 2m 31s master passed
+1 💚 compile 2m 20s master passed
+1 💚 checkstyle 0m 31s master passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
+1 💚 spotbugs 1m 22s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 25s the patch passed
+1 💚 compile 2m 32s the patch passed
+1 💚 javac 2m 32s the patch passed
+1 💚 checkstyle 0m 32s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 12m 42s Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.
+1 💚 spotless 0m 51s patch has no errors when running spotless:check.
+1 💚 spotbugs 1m 50s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 12s The patch does not generate ASF License warnings.
35m 39s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #4604
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 50fd653b45ad 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 2197b38
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 64 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 20s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 26s master passed
+1 💚 compile 0m 39s master passed
+1 💚 shadedjars 3m 43s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 25s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 15s the patch passed
+1 💚 compile 0m 39s the patch passed
+1 💚 javac 0m 39s the patch passed
+1 💚 shadedjars 3m 42s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 24s the patch passed
_ Other Tests _
+1 💚 unit 207m 17s hbase-server in the patch passed.
223m 35s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #4604
Optional Tests javac javadoc unit shadedjars compile
uname Linux 25c40a3fda46 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 2197b38
Default Java AdoptOpenJDK-1.8.0_282-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/testReport/
Max. process+thread count 2872 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 10s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 2s master passed
+1 💚 compile 0m 51s master passed
+1 💚 shadedjars 3m 54s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 29s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 54s the patch passed
+1 💚 compile 0m 50s the patch passed
+1 💚 javac 0m 50s the patch passed
+1 💚 shadedjars 3m 56s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 26s the patch passed
_ Other Tests _
+1 💚 unit 214m 34s hbase-server in the patch passed.
233m 15s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #4604
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6e6adc438ecf 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 2197b38
Default Java AdoptOpenJDK-11.0.10+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/testReport/
Max. process+thread count 2526 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4604/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@apurtell apurtell left a comment

lgtm

@bbeaudreault bbeaudreault merged commit 886e319 into apache:master Jul 12, 2022
@bbeaudreault bbeaudreault deleted the HBASE-27149 branch July 12, 2022 02:22
bbeaudreault added a commit that referenced this pull request Jul 12, 2022
…sults are ready (#4604)

Signed-off-by: Andrew Purtell <apurtell@apache.org>
bbeaudreault added a commit to HubSpot/hbase that referenced this pull request Jul 12, 2022
…imes out before results are ready (apache#4604)

Signed-off-by: Andrew Purtell <apurtell@apache.org>
@fanweneddie

I have a question about this sentence from issue HBASE-27149:

> In my experience, if a server becomes overwhelmed, client scanners can start to time out. Those scanners live on on the server, contributing to memory and resource pressure. That further slows down the server, etc.

Could anyone share evidence showing that the server is slowed down? I am trying to reproduce this problem, but I find that scan times do not change after earlier scans have timed out.

@bbeaudreault
Contributor Author

This issue has been fixed for about 6 months, so I don't have any evidence to hand anymore. You won't be able to reproduce it if you have the patch.

Basically, prior to this patch the scanner holds resources on the server side even if the client times out. Ideally, if the client times out, it will try to close the scanner with another RPC. But if the server is overloaded, that close RPC won't even make it to the server. In that case the scan lives on. It's not actively scanning data, just holding memory. There is an activeScanners metric you can inspect in these cases.

Anyway, no, I can't provide any hard data at this point, but hopefully that gives you some pointers for what to look for, as long as you don't have this patch.

It's also possible to mitigate this by setting the server-side lease period low. But this is hard to do in a multi-tenant environment, where one timeout does not fit all cases.
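As a rough illustration of the behavior described above, here is a minimal toy model. It is not the actual patch and does not use any HBase classes. `ToyServer`, `ToyScanner`, `finishScanCall`, and the `closeOnClientTimeout` flag are hypothetical names invented for this sketch. It assumes the server knows the client's deadline and checks it when results become ready.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model only: all names are hypothetical; this is not HBase code. */
final class ScannerTimeoutSketch {

    /** Stand-in for a server-side scanner that holds memory until closed. */
    static final class ToyScanner {
        private boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Stand-in for a server's table of open scanners. */
    static final class ToyServer {
        private final Map<Long, ToyScanner> scanners = new ConcurrentHashMap<>();
        private final boolean closeOnClientTimeout;

        ToyServer(boolean closeOnClientTimeout) {
            this.closeOnClientTimeout = closeOnClientTimeout;
        }

        long openScanner(long id) {
            scanners.put(id, new ToyScanner());
            return id;
        }

        /**
         * Called when results are ready. Returns true if the client is still
         * waiting, so the results are worth sending. If the client's deadline
         * has already passed and the check is enabled, the scanner is closed
         * right away instead of lingering until its lease expires.
         */
        boolean finishScanCall(long scannerId, long clientDeadlineMs, long nowMs) {
            boolean clientGaveUp = nowMs > clientDeadlineMs;
            if (clientGaveUp && closeOnClientTimeout) {
                ToyScanner scanner = scanners.remove(scannerId);
                if (scanner != null) {
                    scanner.close();
                }
            }
            return !clientGaveUp;
        }

        /** Loosely analogous to watching an active-scanner count. */
        int activeScanners() {
            return scanners.size();
        }
    }

    public static void main(String[] args) {
        for (boolean enabled : new boolean[] {false, true}) {
            ToyServer server = new ToyServer(enabled);
            long id = server.openScanner(1L);
            // Client deadline is t=100 ms, but results are only ready at t=250 ms.
            boolean sent = server.finishScanCall(id, 100L, 250L);
            System.out.println("closeOnClientTimeout=" + enabled
                + " sent=" + sent
                + " activeScanners=" + server.activeScanners());
        }
    }
}
```

With the check disabled, the late scanner stays in the map (one active scanner). With it enabled, the count drops to zero as soon as the late result is detected. In this model, the lingering entry stands for the memory that is held when the client's close RPC never arrives.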

@bbeaudreault
Contributor Author

As for how to tell whether the server is slowed down enough: look at queue times and response times, which might both be in the seconds. You'd expect to see similar saturation in CPU and/or disk metrics.

@fanweneddie

Thank you for your quick reply!

> But if the server is overloaded that close RPC won't even make it to the server. In this case the scan lives on

Yes, I find that if I set HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD to a large value (say 60000 ms), those scanners are not removed by the leaseExpired() method.

> Look at queue times and response times, which might both be in the seconds.

OK, I will try it by running on a slow container and disabling the block cache (since the scanning is fast, I have to make it slow for a better observation), and then check the response time. Thank you for your guidance.
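The lease observation above (a large timeout keeps abandoned scanners from being reaped) can be sketched with a toy lease tracker. This is only an illustrative model under stated assumptions, not HBase's lease implementation. `ScannerLeaseSketch`, `touch`, and `expire` are hypothetical names, and time is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy lease tracker: hypothetical names, not HBase's lease implementation. */
final class ScannerLeaseSketch {
    private final long leasePeriodMs;
    // Scanner name -> time (ms) the client last touched it.
    private final Map<String, Long> lastTouchedMs = new HashMap<>();

    ScannerLeaseSketch(long leasePeriodMs) {
        this.leasePeriodMs = leasePeriodMs;
    }

    /** Record client activity on a scanner, renewing its lease. */
    void touch(String scannerName, long nowMs) {
        lastTouchedMs.put(scannerName, nowMs);
    }

    /** Remove and return every scanner idle for at least the lease period. */
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastTouchedMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= leasePeriodMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    int active() {
        return lastTouchedMs.size();
    }

    public static void main(String[] args) {
        for (long periodMs : new long[] {60_000L, 5_000L}) {
            ScannerLeaseSketch leases = new ScannerLeaseSketch(periodMs);
            leases.touch("scan-1", 0L);
            // The client abandoned the scan; check for expiry 10 s later.
            List<String> expired = leases.expire(10_000L);
            System.out.println("leasePeriodMs=" + periodMs
                + " expired=" + expired
                + " stillActive=" + leases.active());
        }
    }
}
```

In this model, a 60000 ms lease leaves the abandoned scanner in place at the 10 s mark, while a 5000 ms lease reaps it. That is the trade-off behind lowering the server-side lease period: abandoned scanners are cleaned up sooner, but legitimately slow clients lose their scanners sooner too.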
