HBASE-27710 ByteBuff ref counting is too expensive for on-heap buffers #5104

Merged
merged 4 commits into from
Mar 17, 2023

bbeaudreault
Contributor

This felt like the cleanest way to solve this case, but I'm open to other opinions. Whether we need to call checkRefCount is directly tied to whether we use the NONE recycler.

A couple of other options I considered:

  • Create a new inheritance hierarchy, e.g. OnHeapSingleByteBuff and similar classes. This felt like it'd only complicate an already complex system.
  • Update the SingleByteBuff and MultiByteBuff constructors to take a new boolean onHeap or boolean shouldCheckRefCount. This felt more error-prone because it's too easy for someone to forget to pass the correct boolean value for the corresponding recycler.

I added a basic test to validate that we only call checkRefCount for non-NONE recyclers. Beyond that, I think our existing ample coverage should suffice? Let me know if you'd like to see a particular test.
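
For readers who want to see the shape of the idea, here is a minimal sketch, not the actual patch. All names (`SketchRecycler`, `SketchBuff`, `checksPerformed`) are hypothetical stand-ins, a plain `AtomicInteger` replaces the real reference counter, and it assumes the NONE recycler is a shared no-op instance. The check is skipped whenever the buffer was created with that instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a recycler callback; not the HBase interface.
interface SketchRecycler {
  void free();
}

// Hypothetical simplified buffer; illustrates skipping the check for NONE.
final class SketchBuff {
  // Assumed shared no-op recycler for buffers that are never pooled.
  static final SketchRecycler NONE = () -> { };

  private final SketchRecycler recycler;
  private final AtomicInteger refCnt = new AtomicInteger(1);
  private final byte[] data;
  // Counts how often the reference-count check actually ran (demo only).
  int checksPerformed = 0;

  SketchBuff(byte[] data, SketchRecycler recycler) {
    this.data = data;
    this.recycler = recycler;
  }

  private void checkRefCount() {
    // Identity comparison against the shared NONE instance: buffers that
    // are never recycled do not need the guard.
    if (recycler == NONE) {
      return;
    }
    checksPerformed++;
    if (refCnt.get() <= 0) {
      throw new IllegalStateException("buffer has been released");
    }
  }

  byte get(int index) {
    checkRefCount();
    return data[index];
  }

  void release() {
    // Only the release that drops the count to zero invokes the recycler.
    if (refCnt.decrementAndGet() == 0) {
      recycler.free();
    }
  }
}
```

Because the decision hangs off the recycler identity, callers cannot pass a mismatched flag, which is the error the constructor-boolean option above would allow.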

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 24s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for branch
+1 💚 mvninstall 3m 53s master passed
+1 💚 compile 3m 6s master passed
+1 💚 checkstyle 0m 48s master passed
+1 💚 spotless 0m 45s branch has no errors when running spotless:check.
+1 💚 spotbugs 2m 2s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 3m 36s the patch passed
+1 💚 compile 3m 8s the patch passed
+1 💚 javac 3m 8s the patch passed
+1 💚 checkstyle 0m 46s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 1s The patch has no ill-formed XML file.
+1 💚 hadoopcheck 14m 2s Patch does not cause any errors with Hadoop 3.2.4 3.3.4.
+1 💚 spotless 0m 42s patch has no errors when running spotless:check.
+1 💚 spotbugs 2m 13s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
44m 34s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5104
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile xml
uname Linux ea0ff56bbf26 5.4.0-1097-aws #105~18.04.1-Ubuntu SMP Mon Feb 13 17:50:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / c2b64e7
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-common hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for branch
+1 💚 mvninstall 3m 22s master passed
+1 💚 compile 1m 24s master passed
+1 💚 shadedjars 5m 8s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 40s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 3m 52s the patch passed
+1 💚 compile 1m 4s the patch passed
+1 💚 javac 1m 4s the patch passed
+1 💚 shadedjars 5m 6s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s the patch passed
_ Other Tests _
+1 💚 unit 2m 28s hbase-common in the patch passed.
+1 💚 unit 206m 12s hbase-server in the patch passed.
234m 54s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5104
Optional Tests javac javadoc unit shadedjars compile
uname Linux 178808e9bed2 5.4.0-1097-aws #105~18.04.1-Ubuntu SMP Mon Feb 13 17:50:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / c2b64e7
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/testReport/
Max. process+thread count 2408 (vs. ulimit of 30000)
modules C: hbase-common hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 2s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for branch
+1 💚 mvninstall 2m 48s master passed
+1 💚 compile 0m 58s master passed
+1 💚 shadedjars 4m 17s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 2m 50s the patch passed
+1 💚 compile 0m 58s the patch passed
+1 💚 javac 0m 58s the patch passed
+1 💚 shadedjars 4m 17s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 38s the patch passed
_ Other Tests _
+1 💚 unit 1m 53s hbase-common in the patch passed.
+1 💚 unit 224m 18s hbase-server in the patch passed.
249m 46s
Subsystem Report/Notes
Docker ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #5104
Optional Tests javac javadoc unit shadedjars compile
uname Linux 1c8c92624138 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / c2b64e7
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/testReport/
Max. process+thread count 2405 (vs. ulimit of 30000)
modules C: hbase-common hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@bbeaudreault
Contributor Author

@Apache9 thank you very much for the review. I had an idea this morning, and wonder if you have any opinion.

Currently we do this:

protected void checkRefCount() {
  ObjectUtil.checkPositive(refCnt(), REFERENCE_COUNT_NAME);
}

Calling refCnt() goes down the expensive path of getting the real refCnt numeric value.

I think what we really care about is "has this buffer been recycled". In which case, what if we added a volatile boolean to our RefCnt class which gets set to true when the Recycler is called? We don't care about synchronization since it always goes from false to true. The above method could become:

//
// in RefCnt.java
//
private volatile boolean recycled;

public boolean isRecycled() {
  return recycled;
}

@Override
protected final void deallocate() {
  this.recycler.free();
  this.recycled = true; // of note
  if (leak != null) {
    this.leak.close(this);
  }
}

//
// In ByteBuff.java
//
protected void checkRefCount() {
  Preconditions.checkState(!refCnt.isRecycled(), "ByteBuff has been recycled");
}

Of course we'd also rename the method. I plugged this into our test case and it performs similarly to this PR. The benefit of this approach is it might also speed up off-heap usages while still providing protection.

@Apache9
Contributor

Apache9 commented Mar 16, 2023

What you proposed is the trick in netty's CompositeByteBuf, where they introduce a freed flag to indicate whether the ByteBuf is still valid.

And for AbstractReferenceCountedByteBuf, the code looks like this:

    @Override
    boolean isAccessible() {
        // Try to do non-volatile read for performance as the ensureAccessible() is racy anyway and only provide
        // a best-effort guard.
        return updater.isLiveNonVolatile(this);
    }

But it seems we do not have access to the updater field, so I think we could go with your current approach. The downside is that we will add one more boolean to each ByteBuff, but that should be OK?

@bbeaudreault
Contributor Author

bbeaudreault commented Mar 16, 2023

I downloaded jol-core and ran ClassLayout on RefCnt... On my platform, 24 bytes without the boolean, 32 bytes with. Not insubstantial. Despite a boolean being just 1 byte, we lose 3 bytes on internal alignment and then another 4 bytes on external/class alignment.

So it's effectively like adding a long... I guess most ByteBufferAllocators are configured in the tens of thousands, so not a huge issue there. RefCnt is also used in BucketCache, where I imagine this will only matter for very large bucket cache sizes. We give 75 GB to bucket cache in some cases, which equals 2-5M blocks. At 8 extra bytes per block, that'd be up to 40 MB of space for us, which might be worth the performance tradeoff. If someone uses TBs of file cache (e.g. when using an object store like S3 for main storage), then it might be a lot more.
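
The arithmetic above can be reproduced with a small sketch. The 24 and 32 byte figures are the platform-specific measurements quoted in this comment; the class and method names are hypothetical, and `alignUp` is a simplification that folds the internal and external padding into a single rounding step, assuming 8-byte object alignment.

```java
// Back-of-the-envelope estimate using the object sizes reported above.
final class RefCntOverheadEstimate {
  // Sizes quoted from the JOL measurement above; they vary by platform.
  static final long BYTES_WITHOUT_FLAG = 24;
  static final long BYTES_WITH_FLAG = 32;

  // Rounds a raw size up to the next multiple of the object alignment.
  static long alignUp(long rawSize, long alignment) {
    return (rawSize + alignment - 1) / alignment * alignment;
  }

  // Extra heap consumed when every instance grows by the measured delta.
  static long extraBytes(long instances) {
    return instances * (BYTES_WITH_FLAG - BYTES_WITHOUT_FLAG);
  }
}
```

With these numbers, `extraBytes(2_000_000L)` is 16,000,000 bytes and `extraBytes(5_000_000L)` is 40,000,000 bytes, matching the 40 MB upper estimate. `alignUp(25, 8)` yields 32, which shows how a single extra byte can cost eight.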

This solution is equivalent in performance to my original memory-free solution for on-heap, which is where we noticed the regression. The potential benefit is for off-heap, which I don't have performance numbers on.

For my company's case, I'd be fine adding the boolean. For the more general case, it might make sense to add the boolean only if we can back it up with benchmarks. In that case, it might make sense to do that in a separate jira so we can solve the specific regression here first.

Let me know if that changes your opinion at all before I merge this as-is.

@Apache9
Contributor

Apache9 commented Mar 17, 2023

Just commit it as-is for now; we can open another issue to improve performance for off-heap ByteBuff.

@bbeaudreault
Contributor Author

Agreed, thanks for the discussion and review.

I had another idea: we could null out the recycler after calling it, so isAccessible would be a null check on that existing reference rather than a check on a new boolean field. This would not take any additional memory. But we can investigate these options in another issue.
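
A rough sketch of the null-out idea, purely illustrative and untested against HBase. `NullingRefCnt` is a hypothetical class, `Runnable` stands in for the recycler type, and making the field `volatile` is an assumption of this sketch rather than something settled in this thread.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simplified ref-counted handle; not the HBase RefCnt class.
final class NullingRefCnt {
  private final AtomicInteger count = new AtomicInteger(1);
  // Cleared after the final release so it doubles as the "freed" marker.
  private volatile Runnable recycler;

  NullingRefCnt(Runnable recycler) {
    // A null recycler would be indistinguishable from "already freed".
    this.recycler = Objects.requireNonNull(recycler);
  }

  // Best-effort guard: one reference read, no extra field required.
  boolean isAccessible() {
    return recycler != null;
  }

  void retain() {
    count.incrementAndGet();
  }

  // Returns true if this call released the last reference.
  boolean release() {
    if (count.decrementAndGet() == 0) {
      recycler.run();
      recycler = null; // reuse the existing reference as the flag
      return true;
    }
    return false;
  }
}
```

One consequence of this design is that a no-op recycler still has to be a real, non-null object, otherwise a never-pooled buffer would look freed from the start.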

This reverts commit 9a727d1.
@bbeaudreault bbeaudreault merged commit 1673762 into apache:master Mar 17, 2023
@bbeaudreault bbeaudreault deleted the HBASE-27710 branch March 17, 2023 12:05
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 3s #5104 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.
Subsystem Report/Notes
GITHUB PR #5104
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/3/console
versions git=2.17.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 2s #5104 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.
Subsystem Report/Notes
GITHUB PR #5104
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/3/console
versions git=2.25.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ patch 0m 3s #5104 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.
Subsystem Report/Notes
GITHUB PR #5104
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5104/3/console
versions git=2.25.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

bbeaudreault added a commit to HubSpot/hbase that referenced this pull request Mar 17, 2023