Skip to content

Commit

Permalink
[SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays …
Browse files Browse the repository at this point in the history
…when nulls exist

### What changes were proposed in this pull request?

This is a followup to #46254 . Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy.

Primitive arrays are faster than object arrays and won't create unnecessary objects.

### Why are the changes needed?

This will improve performance and memory usage, when nulls are present in the `ColumnarArray`.

### Does this PR introduce _any_ user-facing change?

This is expected to be faster when copying `ColumnarArray`.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46372 from gene-db/primitive-nulls.

Authored-by: Gene Pang <gene.pang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
gene-db authored and cloud-fan committed May 5, 2024
1 parent a0f6239 commit bf2e254
Showing 1 changed file with 24 additions and 12 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -49,31 +49,43 @@ public int numElements() {
return length;
}

/**
* Sets all the appropriate null bits in the input UnsafeArrayData.
*
* @param arrayData The UnsafeArrayData to set the null bits for
* @return The UnsafeArrayData with the null bits set
*/
private UnsafeArrayData setNullBits(UnsafeArrayData arrayData) {
if (data.hasNull()) {
for (int i = 0; i < length; i++) {
if (data.isNullAt(i)) {
arrayData.setNullAt(i);
}
}
}
return arrayData;
}

@Override
public ArrayData copy() {
DataType dt = data.dataType();

if (data.hasNull()) {
// UnsafeArrayData cannot be used if there are any nulls.
return new GenericArrayData(toObjectArray(dt)).copy();
}

if (dt instanceof BooleanType) {
return UnsafeArrayData.fromPrimitiveArray(toBooleanArray());
return setNullBits(UnsafeArrayData.fromPrimitiveArray(toBooleanArray()));
} else if (dt instanceof ByteType) {
return UnsafeArrayData.fromPrimitiveArray(toByteArray());
return setNullBits(UnsafeArrayData.fromPrimitiveArray(toByteArray()));
} else if (dt instanceof ShortType) {
return UnsafeArrayData.fromPrimitiveArray(toShortArray());
return setNullBits(UnsafeArrayData.fromPrimitiveArray(toShortArray()));
} else if (dt instanceof IntegerType || dt instanceof DateType
|| dt instanceof YearMonthIntervalType) {
return UnsafeArrayData.fromPrimitiveArray(toIntArray());
return setNullBits(UnsafeArrayData.fromPrimitiveArray(toIntArray()));
} else if (dt instanceof LongType || dt instanceof TimestampType
|| dt instanceof DayTimeIntervalType) {
return UnsafeArrayData.fromPrimitiveArray(toLongArray());
return setNullBits(UnsafeArrayData.fromPrimitiveArray(toLongArray()));
} else if (dt instanceof FloatType) {
return UnsafeArrayData.fromPrimitiveArray(toFloatArray());
return setNullBits(UnsafeArrayData.fromPrimitiveArray(toFloatArray()));
} else if (dt instanceof DoubleType) {
return UnsafeArrayData.fromPrimitiveArray(toDoubleArray());
return setNullBits(UnsafeArrayData.fromPrimitiveArray(toDoubleArray()));
} else {
return new GenericArrayData(toObjectArray(dt)).copy(); // ensure the elements are copied.
}
Expand Down

0 comments on commit bf2e254

Please sign in to comment.