Description
Almost all Spark expressions report nullability incorrectly; they should implement return_field_from_args rather than return_type.
In the Spark source code, some expressions are marked as null intolerant, which means that if any of the arguments is null, then the result is null.
In earlier Spark versions this was a trait; in later versions it became a field. On its own, this marker does not mean that the expression's nullability depends on its children.
For example, bitwise_not is marked as nullIntolerant: true, which means that if the child is null, then the output is null as well:
https://github.com/apache/spark/blob/98058da21e8a341eca10207f0ca458671220ca94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala#L186
Other expressions set nullability explicitly (make sure to check whether they are marked as null intolerant as well).
For example, the isNull expression is always marked as non-nullable:
https://github.com/apache/spark/blob/98058da21e8a341eca10207f0ca458671220ca94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L411
For expressions that do not specify nullability but extend UnaryExpression, the nullability is by default based on the child's nullability:
https://github.com/apache/spark/blob/98058da21e8a341eca10207f0ca458671220ca94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L590
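The cases above can be sketched as a small, self-contained Rust model. This is only an illustration, not DataFusion's API: NullabilityRule, SimpleField, output_nullable and return_field are hypothetical names, and the sketch assumes that an expression deriving nullability from its arguments has no other source of nulls. In a real implementation, logic like this would presumably live inside return_field_from_args.

```rust
/// Hypothetical model of how an expression decides its output nullability.
/// None of these names come from DataFusion or Spark.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NullabilityRule {
    /// The expression sets nullability explicitly
    /// (e.g. isNull is always non-nullable).
    Fixed(bool),
    /// Nullability is derived from the arguments: the output is nullable
    /// if any argument is nullable (e.g. the UnaryExpression default).
    /// Assumption: the expression produces no nulls of its own.
    DerivedFromArgs,
}

/// Minimal stand-in for a field: a name plus a nullability flag.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SimpleField {
    name: String,
    nullable: bool,
}

/// Compute the output nullability from the rule and the argument fields.
fn output_nullable(rule: NullabilityRule, args: &[SimpleField]) -> bool {
    match rule {
        NullabilityRule::Fixed(nullable) => nullable,
        NullabilityRule::DerivedFromArgs => args.iter().any(|f| f.nullable),
    }
}

/// Build the output field, which is what a nullability-aware
/// implementation would return instead of a bare data type.
fn return_field(name: &str, rule: NullabilityRule, args: &[SimpleField]) -> SimpleField {
    SimpleField {
        name: name.to_string(),
        nullable: output_nullable(rule, args),
    }
}
```

With this model, an isNull-style expression would use Fixed(false) regardless of its input, while a bitwise_not-style expression would use DerivedFromArgs and so be nullable only when its child is nullable.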