HBASE-24850 CellComparator perf improvement #2747

ramkrish86 · 2020-12-08T10:14:57Z

I closed the original PR because of problems caused by switching between my Linux and Windows environments, and created this new PR, which compiles.
This version of the patch introduces an interface, ContiguousCellFormat, for cells whose data is laid out contiguously in the KV serialization format.
It tries to minimize branching when the cells are pure KV or pure ByteBufferKV. With this patch, a JMH-like test that adds >100MB of data to a Memstore-like CSLM shows a >50% improvement when all the cells are pure KVs.
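
To illustrate the idea, here is a minimal, self-contained sketch and not the code in the patch. The method names on `ContiguousCellFormat` (`getBackingArray`, `getOffset`), the stand-in `Cell` interface, and the `SimpleKv` and `DetachedCell` classes are assumptions made up for this example. It relies only on the KeyValue layout, where a 4-byte key length and a 4-byte value length precede the 2-byte row length and the row bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ContiguousComparatorSketch {

  /** Minimal stand-in for a cell: only the row accessors this sketch needs. */
  interface Cell {
    byte[] getRowArray();
    int getRowOffset();
    short getRowLength();
  }

  /** Hypothetical marker: the whole cell sits in one byte[] in KeyValue layout. */
  interface ContiguousCellFormat extends Cell {
    byte[] getBackingArray();
    int getOffset();
  }

  // KeyValue layout: 4-byte key length, 4-byte value length, 2-byte row length, row bytes, ...
  static final int ROW_LENGTH_OFFSET = 8;
  static final int ROW_OFFSET = 10;

  /** Simplified on-heap cell that serializes only the fields up to the row. */
  static final class SimpleKv implements ContiguousCellFormat {
    private final byte[] buf;
    SimpleKv(String row) {
      byte[] r = row.getBytes(StandardCharsets.UTF_8);
      buf = ByteBuffer.allocate(ROW_OFFSET + r.length)
          .putInt(2 + r.length)        // key length (row part only in this sketch)
          .putInt(0)                   // value length
          .putShort((short) r.length)  // row length
          .put(r)
          .array();
    }
    public byte[] getBackingArray() { return buf; }
    public int getOffset() { return 0; }
    public byte[] getRowArray() { return buf; }
    public int getRowOffset() { return ROW_OFFSET; }
    public short getRowLength() { return readShort(buf, ROW_LENGTH_OFFSET); }
  }

  /** A cell that is not contiguous; it forces the generic path. */
  static final class DetachedCell implements Cell {
    private final byte[] row;
    DetachedCell(String row) { this.row = row.getBytes(StandardCharsets.UTF_8); }
    public byte[] getRowArray() { return row; }
    public int getRowOffset() { return 0; }
    public short getRowLength() { return (short) row.length; }
  }

  /** Reads a big-endian short from the array. */
  static short readShort(byte[] b, int off) {
    return (short) (((b[off] & 0xff) << 8) | (b[off + 1] & 0xff));
  }

  /** Unsigned lexicographic comparison of two byte ranges. */
  static int compareRange(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int n = Math.min(aLen, bLen);
    for (int i = 0; i < n; i++) {
      int d = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return aLen - bLen;
  }

  static int compareRows(Cell l, Cell r) {
    if (l instanceof ContiguousCellFormat && r instanceof ContiguousCellFormat) {
      // Fast path: one type check, then fixed offsets into the backing arrays.
      ContiguousCellFormat cl = (ContiguousCellFormat) l;
      ContiguousCellFormat cr = (ContiguousCellFormat) r;
      byte[] la = cl.getBackingArray();
      byte[] ra = cr.getBackingArray();
      int lo = cl.getOffset();
      int ro = cr.getOffset();
      return compareRange(
          la, lo + ROW_OFFSET, readShort(la, lo + ROW_LENGTH_OFFSET),
          ra, ro + ROW_OFFSET, readShort(ra, ro + ROW_LENGTH_OFFSET));
    }
    // Generic path: go through the per-field accessors of each cell.
    return compareRange(
        l.getRowArray(), l.getRowOffset(), l.getRowLength(),
        r.getRowArray(), r.getRowOffset(), r.getRowLength());
  }
}
```

The point of the design is that when both cells are known to be contiguous, the comparator can compute field positions itself instead of branching on cell type for every field.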

We did some cluster testing with only KV as the cell type and also with no DBEs. We might need some more tests to ensure we don't break anything.
Apart from adding the ContiguousCellComparator in this commit, we also found that bulk load performance was slower, in spite of the comparator performance improving by more than 15% overall.
The reason is that PutSortReducer gets all the cells for a given row and writes them to the HFile, so effectively only one row at a time is added to the map. Even in cases where there are 300 cells in a row, the optimization that we expect from the ContiguousCellComparator changes does not kick in. That is due to the various branches we still have in the code, and because the number of cells is still too small for the optimization to kick in.
For those cases, if we bring back the KVComparator (it is currently deprecated; see the PutSortReducer changes in the patch) and use it specifically for these bulk-load cases, then we perform 15% faster than the 1.3 branch. This is in line with what we are trying to do in https://issues.apache.org/jira/browse/HBASE-24754.
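
As a rough illustration of the per-row sorting pattern described above, and not the actual PutSortReducer code, the sketch below sorts the cells of a single row in a `TreeSet` using a comparator that handles exactly one cell type. `SimpleKv`, `KV_ONLY`, and `sortRow` are hypothetical names. Since all cells share the row, only the qualifier (ascending) and timestamp (newest first) are compared here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class PerRowSortSketch {

  /** Hypothetical, simplified cell: every cell in one reduce call shares the same row. */
  static final class SimpleKv {
    final byte[] qualifier;
    final long timestamp;
    SimpleKv(String qualifier, long timestamp) {
      this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
      this.timestamp = timestamp;
    }
  }

  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /** Single-type comparator: no per-call checks on the cell implementation. */
  static final Comparator<SimpleKv> KV_ONLY = (l, r) -> {
    int c = compareBytes(l.qualifier, r.qualifier);
    // Newer timestamps sort first, as in HBase's cell ordering.
    return c != 0 ? c : Long.compare(r.timestamp, l.timestamp);
  };

  /** Sorts the cells of one row and returns them as "qualifier@timestamp" strings. */
  static List<String> sortRow(List<SimpleKv> cells) {
    TreeSet<SimpleKv> sorted = new TreeSet<>(KV_ONLY);
    sorted.addAll(cells);
    List<String> out = new ArrayList<>();
    for (SimpleKv kv : sorted) {
      out.add(new String(kv.qualifier, StandardCharsets.UTF_8) + "@" + kv.timestamp);
    }
    return out;
  }
}
```

With only a few hundred cells per sorted set, each comparison's fixed overhead matters more than any warm-up benefit, which is why a comparator with fewer branches helps in this path.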
I can open a discussion thread with all the details on the dev@ list for others to chime in.
@anoopsjohn , @saintstack - FYI.

Apache-HBase · 2020-12-08T11:04:05Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	2m 18s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
		_ branch-2.3 Compile Tests _
+0 🆗	mvndep	0m 17s	Maven dependency ordering for branch
+1 💚	mvninstall	4m 23s	branch-2.3 passed
+1 💚	checkstyle	0m 52s	branch-2.3 passed
+1 💚	spotbugs	1m 46s	branch-2.3 passed
		_ Patch Compile Tests _
+0 🆗	mvndep	0m 14s	Maven dependency ordering for patch
+1 💚	mvninstall	4m 31s	the patch passed
-0 ⚠️	checkstyle	0m 32s	hbase-common: The patch generated 2 new + 192 unchanged - 1 fixed = 194 total (was 193)
-0 ⚠️	whitespace	0m 0s	The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚	hadoopcheck	22m 31s	Patch does not cause any errors with Hadoop 2.10.0 or 3.1.2 3.2.1.
+1 💚	spotbugs	1m 48s	the patch passed
		_ Other Tests _
+1 💚	asflicense	0m 22s	The patch does not generate ASF License warnings.
		48m 14s

Subsystem	Report/Notes
Docker	ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR	#2747
Optional Tests	dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname	Linux 27e8f81d2918 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/hbase-personality.sh
git revision	branch-2.3 / `7b1e3e9`
checkstyle	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-common.txt
whitespace	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/artifact/yetus-general-check/output/whitespace-eol.txt
Max. process+thread count	84 (vs. ulimit of 12500)
modules	C: hbase-common hbase-mapreduce U: .
Console output	https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2747/1/console
versions	git=2.17.1 maven=3.6.3 spotbugs=3.1.12
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org