Protocol Change Request
Description of the protocol change
The protocol currently says nothing about whether writers must verify that non-nullable fields are in fact non-null. Delta Spark treats non-nullable columns as a form of invariant: if a table has non-nullable fields, it automatically gets either a min writer version of 2, or a min writer version of 7 with the invariants feature enabled. From what I've seen, the Java kernel, the Rust kernel, and delta-rs do not treat non-nullable fields as invariants and do not check that non-nullable fields are actually non-null.
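As a rough illustration of the Delta Spark behavior described above, here is a minimal Python sketch. The `Field` class and the `nullability_writer_requirement` helper are hypothetical and are not part of any Delta library. Whether nested non-nullable fields also trigger the requirement is an assumption made for this sketch, as is the `"invariants"` feature name in the returned dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Field:
    """Hypothetical, simplified schema field (not a real Delta class)."""
    name: str
    nullable: bool = True
    children: List["Field"] = field(default_factory=list)  # non-empty for structs


def has_non_nullable(fields: List[Field]) -> bool:
    """True if any field is declared non-nullable.

    Assumption: nested fields count as well as top-level ones.
    """
    return any((not f.nullable) or has_non_nullable(f.children) for f in fields)


def nullability_writer_requirement(
    fields: List[Field], table_features: bool = False
) -> Optional[Dict]:
    """Writer requirement implied by nullability alone, per the description above.

    Returns None when the schema has no non-nullable fields.
    """
    if not has_non_nullable(fields):
        return None
    if table_features:
        # Feature name is an assumption of this sketch.
        return {"minWriterVersion": 7, "writerFeatures": ["invariants"]}
    return {"minWriterVersion": 2}
```

A writer that ignores this requirement can still append data to such a table, which is how the implementations end up disagreeing.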
Additionally, there have been a large number of complaints about the way Delta Spark handles non-nullability, specifically non-nullable fields inside nullable structs:
- Not null invariant incorrectly fails on non-nullable field inside a nullable struct #860
- [BUG] Non-Null Columns Implicitly Interpreted As Invariant #2006
- [Protocol] Column Invariants definition clarification #3471
- [Feature Request][Spark] Support adding nullable struct with non-nullable fields to existing table #4606
- [Delta] When checks schema for writing, Delta enforces not null on a Nested Field only when its parent is not null #4121
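To make the ambiguity in these issues concrete, here is a minimal Python sketch of two possible readings of "non-nullable field inside a nullable struct". The `Field` class, the `violations` function, and the `strict` flag are all hypothetical names for illustration. Neither reading is presented as what any implementation does today or what the protocol should require.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Field:
    """Hypothetical, simplified schema field (not a real Delta class)."""
    name: str
    nullable: bool = True
    children: List["Field"] = field(default_factory=list)  # non-empty for structs


def violations(
    fields: List[Field],
    row: Optional[Dict[str, Any]],
    strict: bool,
    path: str = "",
) -> List[str]:
    """Return dotted paths of non-nullable fields that are null in `row`.

    strict=True:  a nested non-nullable field is reported even when its
                  parent struct is itself null.
    strict=False: fields under a null parent struct are skipped.
    """
    found: List[str] = []
    for f in fields:
        full = f"{path}{f.name}"
        value = None if row is None else row.get(f.name)
        if value is None:
            if not f.nullable:
                found.append(full)
            if strict and f.children:
                # Descend into the children even though the parent is null.
                found.extend(violations(f.children, None, strict, full + "."))
        elif f.children:
            found.extend(violations(f.children, value, strict, full + "."))
    return found


schema = [
    Field("id", nullable=False),
    Field("address", nullable=True, children=[Field("zip", nullable=False)]),
]
row = {"id": 1, "address": None}

print(violations(schema, row, strict=True))   # ['address.zip']
print(violations(schema, row, strict=False))  # []
```

The same row is rejected under one reading and accepted under the other, which is the kind of divergence a protocol-level definition would remove.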
It would be great to fix this and codify it in the protocol so all the implementations properly align.
Willingness to contribute
The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?
- Yes. I can contribute.
- Yes. I would be willing to contribute with guidance from the Delta Lake community.
- No. I cannot contribute at this time.