New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-24850 CellComparator perf improvement #2776
HBASE-24850 CellComparator perf improvement #2776
Conversation
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion of another simplification. See what you think
import org.apache.yetus.audience.InterfaceAudience; | ||
|
||
/** | ||
* A marker interface that indicates that the cells follow the KV serialization pattern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say a bit more what you mean here... if you make a new patch (nit)
/** | ||
* @return Qualifier offset | ||
*/ | ||
int getQualifierOffset(int foffset, int flength) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, these methods are not part of the Cell Interface. They are package private. The CellComparator is in same package. It makes use of these package private methods. I think that is fine. I don't think we need the marker interface in this case given all current Cell implementations are 'contiguous'? I say this because there are already too many Cell Interfaces. Lets worry about adding a Marker interface later, when we are adding a new Cell type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Be careful @ramkrish86 .. you are missing the JIRA from this PR summary. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
matcher where we have more cols
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
*/ | ||
public final static int compare(final ContiguousCellFormat l, final ContiguousCellFormat r, boolean ignoreSequenceid) { | ||
int diff = 0; | ||
if (l instanceof KeyValue && r instanceof KeyValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be abstract this into a separate method, say KVCompare? Per discussion in the jira, this size of this method is too big to be inlined.
// everything. | ||
return (0xff & rightType) - (0xff & leftType); | ||
|
||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here.
// appears ahead of everything, and minimum (0) appears after | ||
// everything. | ||
return (0xff & rightType) - (0xff & leftType); | ||
} else if (l instanceof KeyValue && r instanceof ByteBufferKeyValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For } else if (l instanceof ByteBufferKeyValue && r instanceof KeyValue) {
and } else if (l instanceof KeyValue && r instanceof ByteBufferKeyValue) { cases, essentially, there are same.
Can we have a method for BBKVCompare() and swap parameters for these two cases?
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
@huaxiangsun , @saintstack |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Added some comments. If required we can do those in followup isssues also.
int lqlength = left.getQualifierLength(); | ||
int rqlength = right.getQualifierLength(); | ||
// match length | ||
if ((lrowlength + lfamlength + lqlength) != (rrowlength + rfamlength + rqlength)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More efficient way is ((lrowlength == rrowlength) && (lfamlength == rfamlength) && (lqlength == rqlength) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the @anoopsjohn suggestion
int diff = 0; | ||
// "Peel off" the most common path. | ||
if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) { | ||
diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b, ignoreSequenceid); | ||
if (l instanceof KeyValue && r instanceof KeyValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In read path mostly both cells will be either KV or BBKV. In write path we have BBKV now? I think some patches did that for master branch already. we create the ChunkCell or so. Is that reverted? Pls check.
So better to do both KV check and then both BBKV followed by other 2 options?
Te perf tests u did now was giving all cells are KVs right? is it possible to test for both cells as BBKV also? That will be better. May be a JMH test?
The split of private methods came good now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The perf tests were PE @ramkrish86 ? Yeah, we need a JMH test so can do the combinations @anoopsjohn suggests. I can work on that...
return -diff; | ||
} | ||
// if ignoreSequenceid then the diff will be 0. | ||
return ignoreSequenceid ? diff : Long.compare(r.getSequenceId(), l.getSequenceId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this line needed here? All other branches fall back to the last line in this method. Any specific reason for doing it here?
if (diff != 0) { | ||
return diff; | ||
} | ||
} else { | ||
diff = compareRows(a, b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extract this also to a private method pls?
// No need of left row length below here. | ||
|
||
byte leftType = left.getTypeByte(leftKeyLength); | ||
if (leftFamilyLength + leftQualifierLength == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can reverse the check. Mostly leftType wont be Minimum type and so we can skip the '+' operator eval. These were like this before.. But can we focus on these small small optimization also now? That would be good.
// No need of right row length below here. | ||
|
||
byte rightType = right.getTypeByte(rightKeyLength); | ||
if (rightFamilyLength + rightQualifierLength == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here also
|
||
// Timestamps. | ||
// Swap order we pass into compare so we get DESCENDING order. | ||
diff = Long.compare(right.getTimestamp(rightKeyLength), left.getTimestamp(leftKeyLength)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not suggesting for doing in this patch.
But may be we can avoid the long decodes and do compare but instead bytes level compare we can do. We are not interested in the exact diff here anyways. A compare value of 0 or +ve number of -ve number. Add a TODO so that we can come back and do some JMH tests and if that helps, we can incorporate in all places. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea. Wonder if we can do this elsewhere?
} else if ((left instanceof ByteBufferKeyValue) && (right instanceof KeyValue)) { | ||
return compareQualifiers((ByteBufferKeyValue) left, (KeyValue) right); | ||
} else { | ||
if (left instanceof ByteBufferExtendedCell && right instanceof ByteBufferExtendedCell) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have ByteBufferKeyValue which extends ByteBufferExtendedCell and other sub types also. May be its time to reduce these types. I see we have KeyOnlyByteBufferExtendedCell.. But there is some ways.. Same way will see how we can have the DBE seekers also returning Cells of type KV or BBKV. So the optimizations will be used in case of scan over DBEed table:cf. Will do it in a followup issue. I can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Not in this issue but file a follow-on.
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
All clear in this now. I will push this by EOD IST. |
@@ -920,6 +930,14 @@ public static boolean matchingColumn(final Cell left, final Cell right) { | |||
return matchingQualifier(left, right); | |||
} | |||
|
|||
public static boolean matchingColumn(final Cell left, final byte lFamLen, final int lQualLength, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have to be public? They are in same package as comparator and Cell types.
byte lfamlength = left.getFamilyLength(); | ||
byte rfamlength = right.getFamilyLength(); | ||
int lqlength = left.getQualifierLength(); | ||
int rqlength = right.getQualifierLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are decoding ints, bytes and shorts that we might end up not using at all.
/** | ||
* @return Qualifier offset | ||
*/ | ||
int getQualifierOffset(int foffset, int flength) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
int diff = 0; | ||
// "Peel off" the most common path. | ||
if (a instanceof ByteBufferKeyValue && b instanceof ByteBufferKeyValue) { | ||
diff = BBKVComparator.compare((ByteBufferKeyValue)a, (ByteBufferKeyValue)b, ignoreSequenceid); | ||
if (l instanceof KeyValue && r instanceof KeyValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The perf tests were PE @ramkrish86 ? Yeah, we need a JMH test so can do the combinations @anoopsjohn suggests. I can work on that...
// negate- Findbugs will complain? | ||
return -diff; | ||
} | ||
} else if (l instanceof ByteBufferKeyValue && r instanceof ByteBufferKeyValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if/else is a lot of code. Would be good to break it out... check for it at top of the compare.... For a follow-on.
// appears ahead of everything, and minimum (0) appears after | ||
// everything. | ||
return (0xff & rightType) - (0xff & leftType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In follow-on, we should look at produced byte code. Might get ideas on how to make this go faster.
@saintstack - finished all comments. Will push this unless objections. |
@ramkrish86 I think there is an @anoopsjohn ask in the above and I have a question on how the test was run but otherwise, looks good to me. |
@saintstack - The test I used to verify this was a standalone test (not JMH) where I measure the E2E time for adding millions of cells to the tree map. Some thing that you see in https://issues.apache.org/jira/browse/HBASE-24754. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is at least a
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
* Using ContiguousCellFormat as a marker alone * Commit the new file * Fix the comparator logic that was an oversight * Fix the sequenceId check order * Adding few more static methods that helps in scan flow like query matcher where we have more cols * Remove ContiguousCellFormat and ensure compare() can be inlined * applying negation as per review comment * Fix checkstyle comments * fix review comments * Address review comments Signed-off-by: stack <stack@apache.org> Signed-off-by: AnoopSamJohn <anoopsamjohn@apache.org> Signed-off-by: huaxiangsun <huaxiangsun@apache.org>
* Using ContiguousCellFormat as a marker alone * Commit the new file * Fix the comparator logic that was an oversight * Fix the sequenceId check order * Adding few more static methods that helps in scan flow like query matcher where we have more cols * Remove ContiguousCellFormat and ensure compare() can be inlined * applying negation as per review comment * Fix checkstyle comments * fix review comments * Address review comments Signed-off-by: stack <stack@apache.org> Signed-off-by: AnoopSamJohn <anoopsamjohn@apache.org> Signed-off-by: huaxiangsun <huaxiangsun@apache.org>
* Using ContiguousCellFormat as a marker alone * Commit the new file * Fix the comparator logic that was an oversight * Fix the sequenceId check order * Adding few more static methods that helps in scan flow like query matcher where we have more cols * Remove ContiguousCellFormat and ensure compare() can be inlined * applying negation as per review comment * Fix checkstyle comments * fix review comments * Address review comments Signed-off-by: stack <stack@apache.org> Signed-off-by: AnoopSamJohn <anoopsamjohn@apache.org> Signed-off-by: huaxiangsun <huaxiangsun@apache.org>
* Using ContiguousCellFormat as a marker alone * Commit the new file * Fix the comparator logic that was an oversight * Fix the sequenceId check order * Adding few more static methods that helps in scan flow like query matcher where we have more cols * Remove ContiguousCellFormat and ensure compare() can be inlined * applying negation as per review comment * Fix checkstyle comments * fix review comments * Address review comments Signed-off-by: stack <stack@apache.org> Signed-off-by: AnoopSamJohn <anoopsamjohn@apache.org> Signed-off-by: huaxiangsun <huaxiangsun@apache.org> (cherry picked from commit 5da3cde) Change-Id: I84b53eb14c277f387869f03b8ca2ff83d103411d
This is much simpler version. Instead of adding the APIs to ContiguousCellFormat we are just using as a marker. If so we directly use the package private methods of those cells - currently Only KV and BBKV are marked as ContiguousCellFormat . So this is simple and much easier to understand.