Restore nullability support for consuming Substrait fields. #12727

@westonpace

Is your feature request related to a problem or challenge?

In #11130 the Substrait consumer was changed to always produce nullable fields.

However, I'm not entirely sure of the rationale. That PR states:

"Arrow requires schema to match exactly to literals. Substrait contains nullability separate in literals and types, and the nullability of a literal may vary row-by-row."

What does "Arrow requires schema to match exactly to literals" mean? Arrow doesn't really have a concept of literals that I'm aware of (neither does arrow-rs). So Arrow shouldn't really care if literals are nullable or not.

It appears that DataFusion maps literals to ScalarValue, and there is no such thing in DataFusion as a non-nullable literal. So it makes sense to me that nullability is ignored when consuming literals.

However, arrow-rs and Substrait both have a concept of nullable fields and, as far as I can tell, those concepts are compatible. Can we map Substrait nullability to Arrow field nullability?
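To make the proposed mapping concrete, here is a minimal sketch. The `Nullability` enum below is a local stand-in that mirrors the three values of Substrait's type nullability (unspecified, nullable, required); it is not the type from the `substrait` crate, and `is_nullable` / `to_nullability` are hypothetical helper names. Treating "unspecified" as nullable is an assumption, chosen because it is the conservative option.

```rust
/// Local stand-in for Substrait's type nullability (hypothetical copy,
/// not the generated type from the `substrait` crate).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Nullability {
    Unspecified,
    Nullable,
    Required,
}

/// Substrait -> Arrow: only `Required` becomes a non-nullable field.
/// Assumption: `Unspecified` is treated as nullable to stay conservative.
pub fn is_nullable(n: Nullability) -> bool {
    !matches!(n, Nullability::Required)
}

/// Arrow -> Substrait: the inverse mapping used when producing a plan.
pub fn to_nullability(nullable: bool) -> Nullability {
    if nullable {
        Nullability::Nullable
    } else {
        Nullability::Required
    }
}
```

With this mapping, `Required` and `Nullable` survive a conversion in both directions; only `Unspecified` is lossy, since it comes back as `Nullable`.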

Describe the solution you'd like

The method from_substrait_type should either return (DataType, bool) or Field (with an empty name).
The method from_substrait_named_struct should return a schema that has the nullability set appropriately in fields.

Non-nullable fields should round-trip from DataFusion -> Substrait -> DataFusion without losing their non-nullability.
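A rough sketch of what the requested behavior could look like, using only standard-library Rust. All types here (`DataType`, `SubstraitKind`, `SubstraitType`, `NamedStruct`, `Field`) are simplified, hypothetical stand-ins for the real `arrow` and `substrait` types, and `to_substrait_named_struct` is a made-up name for the producer side. The function names `from_substrait_type` and `from_substrait_named_struct` come from this issue, but the signatures are illustrative only; the real `NamedStruct` is also more involved than a flat list of names and types.

```rust
// Hypothetical stand-ins; the real types live in the `arrow` and `substrait` crates.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub enum SubstraitKind {
    I32,
    String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SubstraitType {
    pub kind: SubstraitKind,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct NamedStruct {
    pub names: Vec<String>,
    pub types: Vec<SubstraitType>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

/// Returns the data type together with its nullability instead of dropping it.
pub fn from_substrait_type(t: &SubstraitType) -> (DataType, bool) {
    let data_type = match t.kind {
        SubstraitKind::I32 => DataType::Int32,
        SubstraitKind::String => DataType::Utf8,
    };
    (data_type, t.nullable)
}

/// Builds schema fields that carry the nullability of each Substrait type.
pub fn from_substrait_named_struct(s: &NamedStruct) -> Vec<Field> {
    s.names
        .iter()
        .zip(s.types.iter())
        .map(|(name, t)| {
            let (data_type, nullable) = from_substrait_type(t);
            Field { name: name.clone(), data_type, nullable }
        })
        .collect()
}

/// Producer side (hypothetical name): writes each field's nullability back out.
pub fn to_substrait_named_struct(fields: &[Field]) -> NamedStruct {
    NamedStruct {
        names: fields.iter().map(|f| f.name.clone()).collect(),
        types: fields
            .iter()
            .map(|f| SubstraitType {
                kind: match f.data_type {
                    DataType::Int32 => SubstraitKind::I32,
                    DataType::Utf8 => SubstraitKind::String,
                },
                nullable: f.nullable,
            })
            .collect(),
    }
}
```

Under these assumptions, a schema with a non-nullable `id` field and a nullable `name` field comes back unchanged after `from_substrait_named_struct(&to_substrait_named_struct(&fields))`, which is the round-trip property asked for above.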

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement