[SPARK-42384][SQL] Check for null input in generated code for mask function#39945
[SPARK-42384][SQL] Check for null input in generated code for mask function#39945bersprockets wants to merge 4 commits intoapache:masterfrom
Conversation
|
cc @gengliangwang FYI |
There was a problem hiding this comment.
Btw, the new generated code will look like this (if the first child is null, then the result is null):
/* 031 */ boolean isNull_1 = i.isNullAt(0);
/* 032 */ UTF8String value_1 = isNull_1 ?
/* 033 */ null : (i.getUTF8String(0));
/* 034 */
/* 035 */
/* 036 */
/* 037 */
/* 038 */ boolean isNull_0 = isNull_1;
/* 039 */ UTF8String value_0 = null;
/* 040 */
/* 041 */ if (!isNull_1) {
/* 042 */ value_0 = org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) references[3] /* literal */));;
/* 043 */ }
/* 044 */ if (isNull_0) {
/* 045 */ mutableStateArray_0[0].setNullAt(0);
/* 046 */ } else {
/* 047 */ mutableStateArray_0[0].write(0, value_0);
/* 048 */ }
/* 049 */ return (mutableStateArray_0[0].getRow());
Versus the old generated code, which looks like this (call Mask.transformInput even when input is null, and call UnsafeWriter.write(0, value_0) even if value_0 is null):
/* 031 */ boolean isNull_1 = i.isNullAt(0);
/* 032 */ UTF8String value_1 = isNull_1 ?
/* 033 */ null : (i.getUTF8String(0));
/* 034 */
/* 035 */
/* 036 */
/* 037 */
/* 038 */ UTF8String value_0 = null;
/* 039 */ value_0 = org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) references[3] /* literal */));;
/* 040 */ if (false) {
/* 041 */ mutableStateArray_0[0].setNullAt(0);
/* 042 */ } else {
/* 043 */ mutableStateArray_0[0].write(0, value_0);
/* 044 */ }
/* 045 */ return (mutableStateArray_0[0].getRow());
/* 046 */ }
25fd835 to
49c743a
Compare
49c743a to
ef49f02
Compare
| f(firstGen.value, secondGen.value, thirdGen.value, fourthGen.value, fifthGen.value) | ||
| ev.copy( | ||
| code = code""" | ||
| if (nullable) { |
There was a problem hiding this comment.
@bersprockets Have you tried using the NullIntolerant trait on the expression Mask instead?
There was a problem hiding this comment.
@gengliangwang With this function, only the first parameter affects whether NULL is returned. For the other parameters, NULL input has special meaning. An example from the doc:
> SELECT _FUNC_('AbCD123-@$#', NULL, NULL, NULL, 'o');
AbCD123oooo
Mixing in NullIntolerant causes the above query to return NULL:
spark-sql> SELECT MASK('AbCD123-@$#', NULL, NULL, NULL, 'o');
NULL
Time taken: 0.063 seconds, Fetched 1 row(s)
There was a problem hiding this comment.
I see. Thanks for the explanation!
|
@bersprockets Thanks for the work. Merging to master/3.4 |
…nction
### What changes were proposed in this pull request?
When generating code for the mask function, call `ctx.nullSafeExec` to produce null safe code.
This change assumes that the mask function returns null only when the input is null (which appears to be the case, from reading the code of `Mask.transformInput`).
### Why are the changes needed?
The following query fails with a `NullPointerException`:
```
create or replace temp view v1 as
select * from values
(null),
('AbCD123-$#')
as data(col1);
cache table v1;
select mask(col1) from v1;
23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
```
The generated code calls `UnsafeWriter.write(0, value_0)` regardless of whether `Mask.transformInput` returns null or not. The `UnsafeWriter.write` method for `UTF8String` does not expect a null pointer.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests.
Closes #39945 from bersprockets/mask_npe_issue.
Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 7ff8ba2)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
…nction
### What changes were proposed in this pull request?
When generating code for the mask function, call `ctx.nullSafeExec` to produce null safe code.
This change assumes that the mask function returns null only when the input is null (which appears to be the case, from reading the code of `Mask.transformInput`).
### Why are the changes needed?
The following query fails with a `NullPointerException`:
```
create or replace temp view v1 as
select * from values
(null),
('AbCD123-$#')
as data(col1);
cache table v1;
select mask(col1) from v1;
23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
```
The generated code calls `UnsafeWriter.write(0, value_0)` regardless of whether `Mask.transformInput` returns null or not. The `UnsafeWriter.write` method for `UTF8String` does not expect a null pointer.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests.
Closes apache#39945 from bersprockets/mask_npe_issue.
Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit 7ff8ba2)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
When generating code for the mask function, call
ctx.nullSafeExecto produce null safe code.This change assumes that the mask function returns null only when the input is null (which appears to be the case, from reading the code of
Mask.transformInput).Why are the changes needed?
The following query fails with a
NullPointerException:The generated code calls
UnsafeWriter.write(0, value_0)regardless of whetherMask.transformInputreturns null or not. TheUnsafeWriter.writemethod forUTF8Stringdoes not expect a null pointer.Does this PR introduce any user-facing change?
No.
How was this patch tested?
New unit tests.