Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-24850 CellComparator perf improvement #2747

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
173 changes: 0 additions & 173 deletions hbase-common/src/main/java/org/apache/hadoop/hbase/BBKVComparator.java

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import java.util.Collections;
import java.util.Iterator;
import java.util.Optional;

import org.apache.hadoop.hbase.util.ByteBufferUtils;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.ClassSize;
Expand All @@ -34,12 +35,9 @@
* (onheap and offheap).
*/
@InterfaceAudience.Private
public class ByteBufferKeyOnlyKeyValue extends ByteBufferExtendedCell {
public static final int FIXED_OVERHEAD = ClassSize.OBJECT + ClassSize.REFERENCE
+ (2 * Bytes.SIZEOF_INT) + Bytes.SIZEOF_SHORT;
private ByteBuffer buf;
private int offset = 0; // offset into buffer where key starts at
private int length = 0; // length of this.
public class ByteBufferKeyOnlyKeyValue extends ByteBufferKeyValue {
public static final int FIXED_OVERHEAD =
ByteBufferKeyOnlyKeyValue.FIXED_OVERHEAD + Bytes.SIZEOF_SHORT;
private short rowLen;

/**
Expand All @@ -54,8 +52,8 @@ public ByteBufferKeyOnlyKeyValue(ByteBuffer buf, int offset, int length) {
}

/**
* A setter that helps to avoid object creation every time and whenever
* there is a need to create new OffheapKeyOnlyKeyValue.
* A setter that helps to avoid object creation every time and whenever there is a need to create
* new OffheapKeyOnlyKeyValue.
* @param key
* @param offset
* @param length
Expand Down Expand Up @@ -121,7 +119,8 @@ public byte getFamilyLength() {
return getFamilyLength(getFamilyLengthPosition());
}

private byte getFamilyLength(int famLenPos) {
@Override
public byte getFamilyLength(int famLenPos) {
return ByteBufferUtils.toByte(this.buf, famLenPos);
}

Expand All @@ -143,11 +142,12 @@ public int getQualifierOffset() {

@Override
public int getQualifierLength() {
return getQualifierLength(getRowLength(), getFamilyLength());
return getQualifierLength(this.length, getRowLength(), getFamilyLength());
}

private int getQualifierLength(int rlength, int flength) {
return this.length - (int) KeyValue.getKeyDataStructureSize(rlength, flength, 0);
@Override
public int getQualifierLength(int keyLength, int rlength, int flength) {
return keyLength - (int) KeyValue.getKeyDataStructureSize(rlength, flength, 0);
}

@Override
Expand All @@ -161,11 +161,11 @@ private int getTimestampOffset() {

@Override
public byte getTypeByte() {
return ByteBufferUtils.toByte(this.buf, this.offset + this.length - 1);
return getTypeByte(this.length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.length is updated as we parse? Presumes we read Cell parts in order... first the row and then the CF and then the qualifier? We are not allowed to just and read the qualifier w/o first reading row?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe this.length doesn't change? Its the key length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it is the keylength only because it is BbKeyOnlyKV.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Maybe if a new patch, rename this data member to keyLength?

}

@Override
public void setSequenceId(long seqId) throws IOException {
public void setSequenceId(long seqId) {
throw new IllegalArgumentException("This is a key only Cell");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

}

Expand Down Expand Up @@ -234,11 +234,6 @@ public int getFamilyPosition() {
return getFamilyLengthPosition() + Bytes.SIZEOF_BYTE;
}

// The position in BB where the family length is added.
private int getFamilyLengthPosition() {
return this.offset + Bytes.SIZEOF_SHORT + getRowLength();
}

@Override
public ByteBuffer getQualifierByteBuffer() {
return this.buf;
Expand Down Expand Up @@ -292,4 +287,49 @@ public long heapSize() {
}
return ClassSize.align(FIXED_OVERHEAD);
}

@Override
public int getFamilyLengthPosition(int rowLen) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stuff used to be private. Making it public exposes the implementation. You fellows did a mountain of work making it so we could CHANGE the implementation. This goes back on that work? At least exposing this stuff as public methods?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, what if a Cell implementation had data members that cached all lengths... a column family length data member and a row length data member, etc. These methods wouldn't make sense to it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this stuff just stayed internal? I think you said, I did the KV one and here you are doing BB. Do we have to do more? There'd be duplication... or we could call out to a utility class of statics so the offset math could be shared....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of this static utility class . So something like moving this entire code into a Util class?

For instance, what if a Cell implementation had data members that cached all lengths.
This we did not do and always we restrain from doing this because say if we add those cells to memstore most of the size will go for the cell overhead. Say in a 128M flush may be 80M only is data remaining may become only overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion.

Main point is worry about exposing implementation detail especially after all the work you lot did to change the KV so we could put another impl in place (for example, one that caches sizes as they are calculated?)

return this.offset + Bytes.SIZEOF_SHORT + rowLen;
}

@Override
public int getFamilyInternalPosition(int familyLengthPosition) {
return familyLengthPosition + Bytes.SIZEOF_BYTE;
}

@Override
public int getQualifierInternalPosition(int famOffset, byte famLength) {
return famOffset + famLength;
}

private int getTimestampOffset(final int keylength) {
return this.offset + keylength - KeyValue.TIMESTAMP_TYPE_SIZE;
}

@Override
public long getTimestamp(int keyLength) {
int tsOffset = getTimestampOffset(keyLength);
return ByteBufferUtils.toLong(this.buf, tsOffset);
}

@Override
public byte getTypeByte(int keyLength) {
return ByteBufferUtils.toByte(this.buf, this.offset + keyLength - 1);
}

@Override
public int getKeyLength() {
return this.length;
}

@Override
public boolean equals(Object other) {
return super.equals(other);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove?

}

@Override
public int hashCode() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a spot bug inference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

return super.hashCode();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you run w/ the jdk compile flags on to make sure all methods doing compare get inlined? I found it informative... sometimes methods would not inline or compile because too big.... Sometimes fixing this w/ refactor helped improve perf.

What about that nice stack trace you showed in the issue where you showed deep trace for hbase2 in compare but a shallow one for hbase1. As you suggested in the JIRA, yes, a deep stack trace costs... trimming it would help perf too.