-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge?
In #11130 the Substrait consumer was changed to always produce nullable fields.
However, I'm not entirely sure of the rationale. From that PR it states:
Arrow requires schema to match exactly to literals. Substrait contains nullability separate in literals and types, and the nullability of a literal may vary
row-by-row.
What does "Arrow requires schema to match exactly to literals" mean? Arrow doesn't really have a concept of literals that I'm aware of (neither does arrow-rs). So Arrow shouldn't really care if literals are nullable or not.
It appears that Datafusion maps literals to ScalarValue and there is no such thing in Datafusion as a non-nullable literal. So it makes sense to me that nullability is ignored when consuming literals.
However, arrow-rs and Substrait both have a concept of nullable fields and, as far as I can tell, those concepts are compatible. Can we map Substrait nullability to Arrow field nullability?
Describe the solution you'd like
The method from_substrait_type should either return (DataType, bool) or Field (with an empty name).
The method from_substrait_named_struct should return a schema that has the nullability set appropriately in fields.
Non-nullable fields can round-trip from DF->Substrait->DF without losing their non-nullability.
Describe alternatives you've considered
No response
Additional context
No response