EPIC: fix nullability report for spark expression #19144

@rluvaton

Description

Almost all Spark expressions report nullability incorrectly. They should implement return_field_from_args rather than return_type.

In the Spark source code, some expressions are marked as null intolerant, which means that if any of the arguments is null, then the result is null.
In previous Spark versions this was a trait; in later versions it became a field. Being null intolerant does not by itself mean that the expression's nullability depends on its children.

Here bitwise_not is marked as nullIntolerant: true, which means that if the child is null then the output is null as well:
https://github.com/apache/spark/blob/98058da21e8a341eca10207f0ca458671220ca94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala#L186
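
To make the null-intolerant behaviour concrete, here is a minimal Rust sketch of the value semantics only. The function name and the use of `Option<i32>` to stand in for a nullable value are assumptions for illustration; this is neither Spark's nor DataFusion's actual implementation.

```rust
/// Hypothetical model of a null-intolerant unary expression:
/// a null input (None) always produces a null output (None),
/// and a non-null input produces its bitwise complement.
fn bitwise_not(value: Option<i32>) -> Option<i32> {
    // `map` leaves None untouched, which is exactly the
    // "any null argument makes the result null" behaviour.
    value.map(|v| !v)
}
```

With this model, `bitwise_not(None)` stays `None`, while `bitwise_not(Some(0))` yields `Some(-1)`.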

In other cases, the expression explicitly sets its nullability (make sure to check whether it is marked as null intolerant as well).

For example, the isNull expression is always marked as non-nullable:
https://github.com/apache/spark/blob/98058da21e8a341eca10207f0ca458671220ca94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L411

For expressions that don't specify nullability but extend UnaryExpression, the nullability is by default based on the child's nullability:
https://github.com/apache/spark/blob/98058da21e8a341eca10207f0ca458671220ca94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L590
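
The cases above can be summarised in a small Rust sketch of how a returned field could carry nullability. Everything here is hypothetical: `Field`, `NullabilityRule`, and `return_field` are stand-ins invented for illustration, not the real Arrow or DataFusion types, and applying the child-based rule to several arguments is an assumption that goes beyond the single-child UnaryExpression case.

```rust
/// Hypothetical stand-in for an Arrow-style field (not the real `arrow` type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Hypothetical classification of how an expression reports nullability.
#[derive(Debug, Clone, Copy)]
enum NullabilityRule {
    /// The expression explicitly declares itself non-nullable (e.g. isNull).
    AlwaysNonNullable,
    /// The output is nullable if any child is nullable
    /// (the UnaryExpression default, generalised to many children as an assumption).
    FromChildren,
}

/// Builds the output field for an expression from its argument fields,
/// deriving the nullable flag from the chosen rule.
fn return_field(name: &str, rule: NullabilityRule, args: &[Field]) -> Field {
    let nullable = match rule {
        NullabilityRule::AlwaysNonNullable => false,
        NullabilityRule::FromChildren => args.iter().any(|f| f.nullable),
    };
    Field { name: name.to_string(), nullable }
}
```

Under these assumptions, an isNull-style expression over a nullable column still reports a non-nullable result, while a bitwise_not-style expression reports a nullable result only when its input column is nullable.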

Labels: EPIC (a larger project, actively underway, with sub tasks)