
HBASE-24850 CellComparator perf improvement #2747

Closed


ramkrish86
Contributor

I closed the original PR because of problems caused by switching between my Linux and Windows environments, and created this new PR, which also compiles.
This version of the patch introduces an interface, ContiguousCellFormat, for cells whose data is laid out in the KV serialization format.
It tries to minimize branching when the cells are pure KV or pure ByteBufferKV. With this patch, a JMH-like test that adds more than 100 MB of data to a Memstore-like CSLM shows a >50% improvement when all the cells are pure KVs.
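
A JMH-like insert test of the kind described above could look roughly like the sketch below. It is not the benchmark that was actually run: the class name, the synthetic keys, and the plain unsigned byte-array comparator are all assumptions standing in for real cells and the real CellComparator, and a serious measurement would use JMH with warm-up instead of a single System.nanoTime() interval.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical harness: times inserts into a ConcurrentSkipListMap (CSLM). */
final class CslmInsertSketch {
    /** Stand-in for a cell comparator: unsigned lexicographic order over byte arrays. */
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    };

    /** Inserts count synthetic keys and returns the elapsed time in nanoseconds. */
    static long timeInserts(ConcurrentSkipListMap<byte[], byte[]> map, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            byte[] key = ("row-" + i).getBytes(StandardCharsets.UTF_8);
            map.put(key, key); // every put walks the skip list and calls the comparator
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<byte[], byte[]> map = new ConcurrentSkipListMap<>(UNSIGNED_BYTES);
        long nanos = timeInserts(map, 100_000);
        System.out.println(map.size() + " entries inserted in " + (nanos / 1_000_000) + " ms");
    }
}
```

Swapping the comparator passed to the map is the only change needed to compare two implementations under the same insert load.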

We did some cluster testing with KV as the only cell type and with no DBEs. We might need more tests to ensure we don't break anything.
Apart from adding the ContiguousCellComparator, we also found in this commit that bulk load was slower even though overall comparator performance improved by more than 15%.
The reason is that PutSortReducer gets a given row with all the cells for that row, and that is what gets written to the HFile. So effectively only one row is added to the map at a time. Even when a row has 300 cells, the optimization we expect from the ContiguousCellComparator changes does not kick in. That is because of the various branches we still have in the code, and because the number of cells is still too small for the optimization to kick in.
For those cases, if we bring back the KVComparator (currently deprecated; see the PutSortReducer changes in the patch) and use it specifically for these bulk-load cases, we perform 15% faster than the 1.3 branch. This is in line with what we are trying to do in https://issues.apache.org/jira/browse/HBASE-24754.
I can open a discussion thread on dev@ with all the details for others to chime in.
@anoopsjohn , @saintstack - FYI.

@ramkrish86 ramkrish86 changed the title Branch 2.3 hbase 24850 HBASE-24850 CellComparator perf improvement Dec 8, 2020
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 2m 18s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2.3 Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 4m 23s branch-2.3 passed
+1 💚 checkstyle 0m 52s branch-2.3 passed
+1 💚 spotbugs 1m 46s branch-2.3 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 4m 31s the patch passed
-0 ⚠️ checkstyle 0m 32s hbase-common: The patch generated 2 new + 192 unchanged - 1 fixed = 194 total (was 193)
-0 ⚠️ whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 22m 31s Patch does not cause any errors with Hadoop 2.10.0 or 3.1.2 3.2.1.
+1 💚 spotbugs 1m 48s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 22s The patch does not generate ASF License warnings.
48m 14s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2747
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 27e8f81d2918 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.3 / 7b1e3e9
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 84 (vs. ulimit of 12500)
modules C: hbase-common hbase-mapreduce U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=3.1.12
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 42s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.3 Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for branch
+1 💚 mvninstall 5m 38s branch-2.3 passed
+1 💚 compile 1m 5s branch-2.3 passed
+1 💚 shadedjars 7m 57s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 19s hbase-common in branch-2.3 failed.
-0 ⚠️ javadoc 0m 25s hbase-mapreduce in branch-2.3 failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for patch
+1 💚 mvninstall 5m 1s the patch passed
+1 💚 compile 1m 4s the patch passed
+1 💚 javac 1m 4s the patch passed
+1 💚 shadedjars 7m 28s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 19s hbase-common in the patch failed.
-0 ⚠️ javadoc 0m 23s hbase-mapreduce in the patch failed.
_ Other Tests _
+1 💚 unit 2m 0s hbase-common in the patch passed.
-1 ❌ unit 64m 33s hbase-mapreduce in the patch failed.
99m 8s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2747
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4476f2344a07 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.3 / 7b1e3e9
Default Java AdoptOpenJDK-11.0.6+10
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-mapreduce.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-mapreduce.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/testReport/
Max. process+thread count 3261 (vs. ulimit of 12500)
modules C: hbase-common hbase-mapreduce U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 8s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.3 Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 30s branch-2.3 passed
+1 💚 compile 0m 48s branch-2.3 passed
+1 💚 shadedjars 5m 3s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 42s branch-2.3 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 3m 16s the patch passed
+1 💚 compile 0m 50s the patch passed
+1 💚 javac 0m 50s the patch passed
+1 💚 shadedjars 5m 4s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 22s hbase-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
_ Other Tests _
+1 💚 unit 1m 33s hbase-common in the patch passed.
-1 ❌ unit 75m 11s hbase-mapreduce in the patch failed.
99m 18s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2747
Optional Tests javac javadoc unit shadedjars compile
uname Linux 006df8f5ceeb 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.3 / 7b1e3e9
Default Java AdoptOpenJDK-1.8.0_232-b09
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk8-hadoop2-check/output/diff-javadoc-javadoc-hbase-common.txt
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/testReport/
Max. process+thread count 3538 (vs. ulimit of 12500)
modules C: hbase-common hbase-mapreduce U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 17s Docker mode activated.
-0 ⚠️ yetus 0m 6s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.3 Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 31s branch-2.3 passed
+1 💚 compile 0m 48s branch-2.3 passed
+1 💚 shadedjars 5m 4s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s branch-2.3 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 17s Maven dependency ordering for patch
+1 💚 mvninstall 3m 13s the patch passed
+1 💚 compile 0m 50s the patch passed
+1 💚 javac 0m 50s the patch passed
+1 💚 shadedjars 5m 4s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 21s hbase-common generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
_ Other Tests _
+1 💚 unit 1m 28s hbase-common in the patch passed.
+1 💚 unit 11m 46s hbase-mapreduce in the patch passed.
36m 20s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile
GITHUB PR #2747
Optional Tests javac javadoc unit shadedjars compile
uname Linux e7d5a2c89ada 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.3 / 7b1e3e9
Default Java AdoptOpenJDK-1.8.0_232-b09
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-jdk8-hadoop2-check/output/diff-javadoc-javadoc-hbase-common.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/testReport/
Max. process+thread count 3551 (vs. ulimit of 12500)
modules C: hbase-common hbase-mapreduce U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 32s Docker mode activated.
-0 ⚠️ yetus 0m 7s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ branch-2.3 Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for branch
+1 💚 mvninstall 3m 55s branch-2.3 passed
+1 💚 compile 0m 53s branch-2.3 passed
+1 💚 shadedjars 5m 49s branch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 17s hbase-common in branch-2.3 failed.
-0 ⚠️ javadoc 0m 21s hbase-mapreduce in branch-2.3 failed.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 3m 55s the patch passed
+1 💚 compile 0m 55s the patch passed
+1 💚 javac 0m 55s the patch passed
+1 💚 shadedjars 5m 48s patch has no errors when building our shaded downstream artifacts.
-0 ⚠️ javadoc 0m 17s hbase-common in the patch failed.
-0 ⚠️ javadoc 0m 21s hbase-mapreduce in the patch failed.
_ Other Tests _
+1 💚 unit 1m 46s hbase-common in the patch passed.
+1 💚 unit 9m 48s hbase-mapreduce in the patch passed.
36m 31s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2747
Optional Tests javac javadoc unit shadedjars compile
uname Linux a92c97eeedfc 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.3 / 7b1e3e9
Default Java AdoptOpenJDK-11.0.6+10
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-mapreduce.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-common.txt
javadoc https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/testReport/
Max. process+thread count 4953 (vs. ulimit of 12500)
modules C: hbase-common hbase-mapreduce U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ branch-2.3 Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 5m 7s branch-2.3 passed
+1 💚 checkstyle 0m 49s branch-2.3 passed
+1 💚 spotbugs 1m 38s branch-2.3 passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 4m 23s the patch passed
-0 ⚠️ checkstyle 0m 31s hbase-common: The patch generated 2 new + 192 unchanged - 1 fixed = 194 total (was 193)
-0 ⚠️ whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 hadoopcheck 23m 58s Patch does not cause any errors with Hadoop 2.10.0 or 3.1.2 3.2.1.
+1 💚 spotbugs 2m 16s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 26s The patch does not generate ASF License warnings.
51m 45s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2747
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 2db23ca39cc0 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision branch-2.3 / 7b1e3e9
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
whitespace https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count 84 (vs. ulimit of 12500)
modules C: hbase-common hbase-mapreduce U: .
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/2/console
versions git=2.17.1 maven=3.6.3 spotbugs=3.1.12
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@saintstack saintstack left a comment

I see you are just generalizing the BBKVComparator trick so it works for KeyValue as well as the BBKV. I like the speedup. I don't like the proliferation of branches: if BBKV, if unsafe, if onheap, if tags, if seqid, if extended cell.

We should fix this. A Cell implementation that does lazy length caching internally? Then we'd not need comparator knowing about implementation? What you think Ram?
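
One way to picture the lazy length caching idea is the sketch below. It is an illustration under stated assumptions, not code from the patch: `LazyLengthKey` and `buildKey` are hypothetical names, and the class only handles a key laid out as row length (2 bytes), row, family length (1 byte), family, qualifier, timestamp (8 bytes), type (1 byte).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical key-only cell that decodes its lengths on first use and caches them. */
final class LazyLengthKey {
    private static final int UNSET = -1;

    private final byte[] key;
    private int rowLength = UNSET;    // cached after the first getRowLength() call
    private int familyLength = UNSET; // cached after the first getFamilyLength() call

    LazyLengthKey(byte[] key) {
        this.key = key;
    }

    int getRowLength() {
        if (rowLength == UNSET) {
            rowLength = ((key[0] & 0xff) << 8) | (key[1] & 0xff); // big-endian short
        }
        return rowLength;
    }

    int getFamilyLength() {
        if (familyLength == UNSET) {
            // The family length byte sits right after the row bytes.
            familyLength = key[2 + getRowLength()] & 0xff;
        }
        return familyLength;
    }

    int getQualifierLength() {
        // Whatever is left after the two length fields, row, family, timestamp and type.
        return key.length - (2 + getRowLength() + 1 + getFamilyLength() + 8 + 1);
    }

    /** Test helper (an assumption of this sketch): serializes a key in the layout above. */
    static byte[] buildKey(String row, String family, String qualifier, long ts, byte type) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        ByteBuffer bb = ByteBuffer.allocate(2 + r.length + 1 + f.length + q.length + 8 + 1);
        bb.putShort((short) r.length).put(r).put((byte) f.length).put(f).put(q).putLong(ts).put(type);
        return bb.array();
    }

    public static void main(String[] args) {
        LazyLengthKey cell = new LazyLengthKey(buildKey("r1", "cf", "qual", 42L, (byte) 4));
        // The qualifier length can be asked for first; the getters fill in what they need.
        System.out.println(cell.getQualifierLength() + " " + cell.getFamilyLength() + " " + cell.getRowLength());
    }
}
```

Because each getter fills in whatever it depends on, the qualifier can be read without reading the row first. The cost is two extra int fields per cell instance, and that per-cell overhead has to be weighed against the decoding it saves.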

@@ -161,11 +161,11 @@ private int getTimestampOffset() {

@Override
public byte getTypeByte() {
- return ByteBufferUtils.toByte(this.buf, this.offset + this.length - 1);
+ return getTypeByte(this.length);
Contributor

this.length is updated as we parse? Presumes we read Cell parts in order... first the row and then the CF and then the qualifier? We are not allowed to just go and read the qualifier w/o first reading the row?

Contributor

But maybe this.length doesn't change? It's the key length?

Contributor Author

In this case it is only the key length, because this is BbKeyOnlyKV.

Contributor

Ok. Maybe if a new patch, rename this data member to keyLength?

}

@Override
- public void setSequenceId(long seqId) throws IOException {
+ public void setSequenceId(long seqId) {
throw new IllegalArgumentException("This is a key only Cell");
Contributor

Good.

@@ -292,4 +287,49 @@ public long heapSize() {
}
return ClassSize.align(FIXED_OVERHEAD);
}

@Override
public int getFamilyLengthPosition(int rowLen) {
Contributor

This stuff used to be private. Making it public exposes the implementation. You fellows did a mountain of work making it so we could CHANGE the implementation. This goes back on that work? At least exposing this stuff as public methods?

Contributor

For instance, what if a Cell implementation had data members that cached all lengths... a column family length data member and a row length data member, etc. These methods wouldn't make sense to it?

Contributor

What if this stuff just stayed internal? I think you said, I did the KV one and here you are doing BB. Do we have to do more? There'd be duplication... or we could call out to a utility class of statics so the offset math could be shared....
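
The idea of a utility class of statics for the shared offset math might look something like the following. This is a hedged sketch, not the patch's code: `KeyOffsets` and all of its method names are hypothetical, and it assumes the key layout row length (2 bytes), row, family length (1 byte), family, qualifier, timestamp (8 bytes), type (1 byte). The methods work purely on ints, so an on-heap and an off-heap cell could call the same code and differ only in how they read a byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical shared offset math for a serialized key; works on positions only. */
final class KeyOffsets {
    static final int ROW_LENGTH_SIZE = 2;         // short holding the row length
    static final int FAMILY_LENGTH_SIZE = 1;      // byte holding the family length
    static final int TIMESTAMP_TYPE_SIZE = 8 + 1; // long timestamp plus type byte

    private KeyOffsets() {
    }

    static int familyLengthPosition(int keyOffset, int rowLen) {
        return keyOffset + ROW_LENGTH_SIZE + rowLen;
    }

    static int familyOffset(int familyLengthPosition) {
        return familyLengthPosition + FAMILY_LENGTH_SIZE;
    }

    static int qualifierOffset(int familyOffset, int famLen) {
        return familyOffset + famLen;
    }

    static int qualifierLength(int keyLen, int rowLen, int famLen) {
        return keyLen - (ROW_LENGTH_SIZE + rowLen + FAMILY_LENGTH_SIZE + famLen + TIMESTAMP_TYPE_SIZE);
    }

    static int typePosition(int keyOffset, int keyLen) {
        return keyOffset + keyLen - 1; // the type is the last byte of the key
    }

    /** Test helper (an assumption of this sketch): serializes a key in the layout above. */
    static byte[] buildKey(String row, String family, String qualifier, long ts, byte type) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        ByteBuffer bb = ByteBuffer.allocate(2 + r.length + 1 + f.length + q.length + 8 + 1);
        bb.putShort((short) r.length).put(r).put((byte) f.length).put(f).put(q).putLong(ts).put(type);
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] onHeap = buildKey("row1", "cf", "q1", 1L, (byte) 4);
        ByteBuffer offHeap = ByteBuffer.allocateDirect(onHeap.length);
        offHeap.put(onHeap);
        offHeap.flip();

        int rowLen = ((onHeap[0] & 0xff) << 8) | (onHeap[1] & 0xff);
        int famPos = familyLengthPosition(0, rowLen);
        // Same position for both backings; only the read call differs.
        int famLenHeap = onHeap[famPos] & 0xff;
        int famLenBuffer = offHeap.get(famPos) & 0xff;
        System.out.println(famLenHeap + " " + famLenBuffer + " "
            + qualifierLength(onHeap.length, rowLen, famLenHeap));
    }
}
```

Keeping the math in package-level statics like this would avoid duplicating it per implementation without putting the methods on a public interface, although whether that is fast enough on the compare path would need measuring.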

Contributor Author

I thought about this static utility class. So something like moving this entire code into a Util class?

> For instance, what if a Cell implementation had data members that cached all lengths.

We did not do this, and we have always refrained from it, because if we add such cells to the memstore, much of the size goes to cell overhead. Say, in a 128 MB flush, maybe only 80 MB is data and the rest is just overhead.

Contributor

Suggestion.

Main point is worry about exposing implementation detail especially after all the work you lot did to change the KV so we could put another impl in place (for example, one that caches sizes as they are calculated?)

}

@Override
public int hashCode() {
Contributor

Can remove these?

Contributor Author

That is because of a SpotBugs inference.

Contributor

ok


@Override
public boolean equals(Object other) {
return super.equals(other);
Contributor

Remove?


protected final ByteBuffer buf;
Contributor

Why take away the final?

If for default constructor, pass nulls to the override?

@@ -51,35 +53,38 @@
*/
public static final CellComparatorImpl COMPARATOR = new CellComparatorImpl();

private static final ContiguousCellFormatComparator contiguousCellComparator =
Contributor

Hmm... when would a comparator NOT do left-to-right?

Contributor

Won't it always be a 'contiguous' left-to-right compare?

Contributor Author

The term 'contiguous' was added with reference to the KV format. For cells that are encoder based, the compare is always left to right, but the format may not be KV based. Hence I went with the word 'contiguous'.

Contributor

Haven't checked but might be worth adding this note on what contiguous means to your marker interface....

}
// "Peeling off" the most common cases where the Cells backed by KV format either onheap or
// offheap
if (a instanceof ContiguousCellFormat && b instanceof ContiguousCellFormat
Contributor

I see. You want to use a comparator that has expectations about internals... that there are methods it can call to speed up the compare.

Man. We have too many if/else's in the path. if BB, if tags, if sequenceid, if offheap.... if unsafe. If ByteBufferExtendedCell...

We don't yet have an implementation that is not contiguous?

Contributor Author

> We have too many if/else's in the path. if BB, if tags, if sequenceid, if offheap.... if unsafe. If ByteBufferExtendedCell...

Previously we had all these branches for ByteBufferExtendedCell: left being a normal cell, right being a normal cell, and so on. But we did not have the optimization of knowing the internals. Now the branching is fixed: the branches are only at this place, and there are none internally. Previously compareRows() had 4 branches, compareFamilies() had 4 branches, compareQualifiers() had 4 branches, and then there were the tags and sequenceId. This patch reduces those branches. I strongly favour less branching at the cost of more duplicate code.
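
The trade-off described here, one type check at the top and no branching inside, can be sketched in isolation as follows. Everything in this example is hypothetical (`PeelingSketch`, `Key`, `ArrayBacked`, `HeapKey`, `BufferKey`); it compares whole keys as unsigned bytes and does not reproduce the row/family/qualifier/timestamp logic of the real comparator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of peeling off a fast path with a single type check. */
final class PeelingSketch {
    /** Generic view of a key: every read goes through an interface call. */
    interface Key {
        int length();
        byte byteAt(int index);
    }

    /** Marker for keys that let the comparator read the backing array directly. */
    interface ArrayBacked extends Key {
        byte[] array();
        int offset();
    }

    static final class HeapKey implements ArrayBacked {
        private final byte[] bytes;
        HeapKey(byte[] bytes) { this.bytes = bytes; }
        public int length() { return bytes.length; }
        public byte byteAt(int index) { return bytes[index]; }
        public byte[] array() { return bytes; }
        public int offset() { return 0; }
    }

    static final class BufferKey implements Key {
        private final ByteBuffer buf;
        BufferKey(ByteBuffer buf) { this.buf = buf; }
        public int length() { return buf.remaining(); }
        public byte byteAt(int index) { return buf.get(buf.position() + index); }
    }

    static int compare(Key a, Key b) {
        // The only type check: once inside a path there are no further branches on type.
        if (a instanceof ArrayBacked && b instanceof ArrayBacked) {
            return compareArrays((ArrayBacked) a, (ArrayBacked) b);
        }
        return compareGeneric(a, b);
    }

    private static int compareArrays(ArrayBacked a, ArrayBacked b) {
        byte[] x = a.array();
        byte[] y = b.array();
        int xo = a.offset();
        int yo = b.offset();
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = (x[xo + i] & 0xff) - (y[yo + i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length() - b.length();
    }

    private static int compareGeneric(Key a, Key b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = (a.byteAt(i) & 0xff) - (b.byteAt(i) & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        Key heap = new HeapKey("abc".getBytes(StandardCharsets.UTF_8));
        Key buffer = new BufferKey(ByteBuffer.wrap("abd".getBytes(StandardCharsets.UTF_8)));
        System.out.println(Integer.signum(compare(heap, buffer)));
    }
}
```

The two loops are near duplicates, which is the cost being accepted: the duplicated fast path reads the arrays directly instead of dispatching through the interface for every byte.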

* @return the key length
*/
int getKeyLength();
}
Contributor

Could this not just be internal to the Cell implementation? Does it have to be exposed like this in an Interface with special comparator?

It doesn't look like it. It has to be exposed so the Comparator can make use of these methods. I see that this patch is just generalizing the behavior added with BBKVComparator. Argh.

Contributor Author

> It doesn't look like it. Has to be exposed so Comparator can make use of these methods. I see that this patch is just generalizing the behavior done when we added BBKVComparator. Argh.

Yes, I agree. I wanted to make it generic. Also, if we have both KV and BBKV among the cells, I can get similar performance with this patch because we have the internals available.

Contributor

nod

@@ -76,7 +77,7 @@
* length and actual tag bytes length.
*/
@InterfaceAudience.Private
- public class KeyValue implements ExtendedCell, Cloneable {
+ public class KeyValue implements ExtendedCell, ContiguousCellFormat, Cloneable {
Contributor

Could the methods be added to ExtendedCell or is that different concerns?

Contributor Author

I think we need not add them to ExtendedCell. ExtendedCell is for the server to create cells. This new interface is something only the comparator needs to understand.

@ramkrish86
Contributor Author

> A Cell implementation that does lazy length caching internally? Then we'd not need comparator knowing about implementation? What you think Ram?

Caching everything internally will add to the overhead of the cell. We have always been restrictive about that, and I think it is better to stay so. We still do some of it in the SizeCachedKV implementation of the cell. But for KVs, which live mostly at the memstore layer, it is going to cost us in how much we flush.

@saintstack
Contributor

@ramkrish86 Should we close this in favor of #2776 ?

@ramkrish86
Contributor Author

Sure Stack let me close this. But before that I will ensure I read all your comments and address them in the other PR including some profiling.

@saintstack
Contributor

@ramkrish86 Did you get a chance to read through the above (and then close this)? Thanks.

@saintstack
Contributor

Closing. You can read the comments here even if closed @ramkrish86

@saintstack saintstack closed this Mar 15, 2021