[SPARK-9368][SQL] Support get(ordinal, dataType) generic getter in UnsafeRow. #7682
@@ -24,7 +24,7 @@
 import java.util.HashSet;
 import java.util.Set;
 
-import org.apache.spark.sql.types.DataType;
+import org.apache.spark.sql.types.*;
 import org.apache.spark.unsafe.PlatformDependent;
 import org.apache.spark.unsafe.array.ByteArrayMethods;
 import org.apache.spark.unsafe.bitset.BitSetMethods;
@@ -235,6 +235,41 @@ public Object get(int ordinal) {
     throw new UnsupportedOperationException();
   }
 
+  @Override
+  public Object get(int ordinal, DataType dataType) {
+    if (dataType instanceof NullType) {
+      return null;
+    } else if (dataType instanceof BooleanType) {
+      return getBoolean(ordinal);
+    } else if (dataType instanceof ByteType) {
+      return getByte(ordinal);
+    } else if (dataType instanceof ShortType) {
+      return getShort(ordinal);
+    } else if (dataType instanceof IntegerType) {
+      return getInt(ordinal);
+    } else if (dataType instanceof LongType) {
+      return getLong(ordinal);
+    } else if (dataType instanceof FloatType) {
+      return getFloat(ordinal);
+    } else if (dataType instanceof DoubleType) {
+      return getDouble(ordinal);
+    } else if (dataType instanceof DecimalType) {
+      return getDecimal(ordinal);
+    } else if (dataType instanceof DateType) {
+      return getInt(ordinal);
+    } else if (dataType instanceof TimestampType) {
+      return getLong(ordinal);
+    } else if (dataType instanceof BinaryType) {
+      return getBinary(ordinal);
+    } else if (dataType instanceof StringType) {
+      return getUTF8String(ordinal);
+    } else if (dataType instanceof StructType) {
+      return getStruct(ordinal, ((StructType) dataType).size());
+    } else {
+      throw new UnsupportedOperationException("Unsupported data type " + dataType.simpleString());
+    }
+  }
+
   @Override
   public boolean isNullAt(int ordinal) {
     assertIndexIsValid(ordinal);
@@ -436,4 +471,19 @@ public String toString() {
   public boolean anyNull() {
     return BitSetMethods.anySet(baseObject, baseOffset, bitSetWidthInBytes / 8);
   }
+
+  /**
+   * Writes the content of this row into a memory address, identified by an object and an offset.
+   * The target memory address must already been allocated, and have enough space to hold all the
+   * bytes in this string.
+   */
+  public void writeToMemory(Object target, long targetOffset) {
Review comment: It looks like this method is unused? Are there other outstanding changes that you need to push?
Reply: Not for this PR, but I'm using this in my struct type support PR.
+    PlatformDependent.copyMemory(
+      baseObject,
+      baseOffset,
+      target,
+      targetOffset,
+      sizeInBytes
+    );
+  }
 }
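The contract described in the `writeToMemory` Javadoc (the caller pre-allocates the target, and the row copies `sizeInBytes` bytes into it starting at the given offset) can be illustrated with a minimal sketch. This is not Spark code: `WriteToMemorySketch` is a hypothetical stand-in that uses a plain `byte[]` and `System.arraycopy` in place of `PlatformDependent.copyMemory`, and the bounds check is an assumption added for illustration, since the method in the diff performs no such check.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a row backed by a byte array; not Spark's UnsafeRow. */
final class WriteToMemorySketch {
  private final byte[] base;      // backing storage (assumed on-heap for this sketch)
  private final int baseOffset;   // where this row's bytes start inside `base`
  private final int sizeInBytes;  // how many bytes belong to this row

  WriteToMemorySketch(byte[] base, int baseOffset, int sizeInBytes) {
    this.base = base;
    this.baseOffset = baseOffset;
    this.sizeInBytes = sizeInBytes;
  }

  /** Copies the row's bytes into a target that the caller has already allocated. */
  void writeToMemory(byte[] target, int targetOffset) {
    // Assumption for illustration: fail fast if the target is too small.
    if (targetOffset < 0 || target.length - targetOffset < sizeInBytes) {
      throw new IllegalArgumentException("target has too little space for " + sizeInBytes + " bytes");
    }
    System.arraycopy(base, baseOffset, target, targetOffset, sizeInBytes);
  }

  public static void main(String[] args) {
    byte[] backing = {9, 1, 2, 3, 9};           // the row occupies bytes 1..3
    WriteToMemorySketch row = new WriteToMemorySketch(backing, 1, 3);
    byte[] target = new byte[5];                // pre-allocated by the caller
    row.writeToMemory(target, 2);
    System.out.println(Arrays.toString(target)); // prints [0, 0, 1, 2, 3]
  }
}
```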
I found a bug here: we should also check whether the row isNullAt(ordinal), in which case we should return null.
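The fix suggested in the comment above, checking `isNullAt(ordinal)` before dispatching on the type, can be sketched roughly as follows. This is a self-contained illustration and not the patch that was eventually merged: `NullAwareGetterSketch`, its `FieldType` enum, and the `long[]` plus `BitSet` layout are hypothetical stand-ins for `UnsafeRow`, Spark's `DataType` classes, and the row's off-heap storage.

```java
import java.util.BitSet;

/** Hypothetical miniature row with a null bitset and 8-byte slots; not Spark's UnsafeRow. */
final class NullAwareGetterSketch {
  /** Hypothetical type tags; the PR dispatches on Spark's DataType subclasses instead. */
  enum FieldType { INT, LONG, DOUBLE }

  private final long[] slots;
  private final BitSet nullBits;

  NullAwareGetterSketch(int numFields) {
    slots = new long[numFields];
    nullBits = new BitSet(numFields);
  }

  void setNullAt(int ordinal) { nullBits.set(ordinal); slots[ordinal] = 0L; }
  void setInt(int ordinal, int v) { nullBits.clear(ordinal); slots[ordinal] = v; }
  void setLong(int ordinal, long v) { nullBits.clear(ordinal); slots[ordinal] = v; }
  void setDouble(int ordinal, double v) {
    nullBits.clear(ordinal);
    slots[ordinal] = Double.doubleToLongBits(v);
  }

  boolean isNullAt(int ordinal) { return nullBits.get(ordinal); }
  int getInt(int ordinal) { return (int) slots[ordinal]; }
  long getLong(int ordinal) { return slots[ordinal]; }
  double getDouble(int ordinal) { return Double.longBitsToDouble(slots[ordinal]); }

  /** Generic getter: consult the null bit first, then dispatch on the requested type. */
  Object get(int ordinal, FieldType type) {
    if (isNullAt(ordinal)) {
      return null; // the check the review comment asks for
    }
    switch (type) {
      case INT: return getInt(ordinal);
      case LONG: return getLong(ordinal);
      case DOUBLE: return getDouble(ordinal);
      default: throw new UnsupportedOperationException("Unsupported data type " + type);
    }
  }

  public static void main(String[] args) {
    NullAwareGetterSketch row = new NullAwareGetterSketch(2);
    row.setInt(0, 42);
    row.setNullAt(1);
    System.out.println(row.get(0, FieldType.INT));  // prints 42
    System.out.println(row.get(1, FieldType.LONG)); // prints null
  }
}
```

Without the leading check, the second call would box whatever happens to be stored in the slot and return it as if it were a real value.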
This reminds me: there's a potentially subtle pitfall with UnsafeRow if someone calls a primitive type-specific accessor without first checking the nullability: in that case, we'll currently return 0 instead of throwing an error.
I'm going to add some assertions to try to see if there are any places where we make this mistake.
Actually, it looks like our existing row behavior is to just return the zero-value of the given type for null inputs (e.g. getFloat on a null column returns 0.0f whereas the generic getter returns null). For some reason, it looks like UnsafeRow was returning NaN instead of 0 in those cases, leading to a confusing bug. I'm going to fix this inconsistency in a separate patch.
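The pitfall discussed in this thread, a primitive accessor silently returning the type's zero value for a null column, can be made concrete with a small sketch. It assumes only the behavior described above (zero value from the primitive accessor on a null field); `PrimitiveAccessorPitfallSketch` and its two counting helpers are hypothetical and do not correspond to any Spark class.

```java
/** Hypothetical row of nullable floats; not Spark's UnsafeRow or any existing row class. */
final class PrimitiveAccessorPitfallSketch {
  private final float[] values;
  private final boolean[] nulls;

  PrimitiveAccessorPitfallSketch(Float... fields) {
    values = new float[fields.length];
    nulls = new boolean[fields.length];
    for (int i = 0; i < fields.length; i++) {
      nulls[i] = (fields[i] == null);
      values[i] = nulls[i] ? 0.0f : fields[i]; // null slots hold the zero value
    }
  }

  int numFields() { return values.length; }
  boolean isNullAt(int ordinal) { return nulls[ordinal]; }

  /** Primitive accessor: returns 0.0f for a null field instead of failing. */
  float getFloat(int ordinal) { return values[ordinal]; }

  /** Buggy caller: a null field is indistinguishable from a real 0.0f. */
  static int countZerosNaive(PrimitiveAccessorPitfallSketch row) {
    int count = 0;
    for (int i = 0; i < row.numFields(); i++) {
      if (row.getFloat(i) == 0.0f) count++;
    }
    return count;
  }

  /** Correct caller: checks nullability before using the primitive accessor. */
  static int countZerosGuarded(PrimitiveAccessorPitfallSketch row) {
    int count = 0;
    for (int i = 0; i < row.numFields(); i++) {
      if (!row.isNullAt(i) && row.getFloat(i) == 0.0f) count++;
    }
    return count;
  }

  public static void main(String[] args) {
    PrimitiveAccessorPitfallSketch row = new PrimitiveAccessorPitfallSketch(0.0f, null, 1.5f);
    System.out.println(countZerosNaive(row));   // prints 2 (the null is miscounted)
    System.out.println(countZerosGuarded(row)); // prints 1
  }
}
```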