fix(proto): correctly serialize FilterExec empty projection#21885
fix(proto): correctly serialize FilterExec empty projection#21885Adez017 wants to merge 10 commits intoapache:mainfrom
Conversation
|
@askalt , can you look into this ? |
|
@Adez017 Thank you for the patch! |
| // Determine if the projection is full to optimize used memory, | ||
| // storing `None` in this case. |
There was a problem hiding this comment.
"optimize used memory" seems the wrong intention here? Isn't it more to try to accurately recreate a None state which the proto definition can't encode?
There was a problem hiding this comment.
We can store Some(vec![0,1,2...n-1]) to represent a full projection, so I thought to highlight why we focus on None usage.
There was a problem hiding this comment.
I feel we should focus more on the fact this logic is trying to preserve a None across proto boundaries because of the limitations we face (with how its encoded the same as Some(vec![])) instead of explaining it only as a memory efficiency gain
| None, | ||
| "None projection must stay None after roundtrip" | ||
| ); | ||
|
|
There was a problem hiding this comment.
I feel like we can use roundtrip_test here as others do above, e.g.
datafusion/datafusion/proto/tests/cases/roundtrip_physical_plan.rs
Lines 3504 to 3507 in 22bb4e6
There was a problem hiding this comment.
absolutely , i just added a different test to use it in a better way .
There was a problem hiding this comment.
The issue occurs because when a FilterExec has an empty projection, the previous proto serialization didn't explicitly encode the 'empty' state. This caused the physical plan to default back to a full projection or an invalid state upon deserialization. By explicitly handling the projection field even when empty, we ensure that the execution plan remains consistent across the network/serde boundary, which is critical for count-only queries.
There was a problem hiding this comment.
My comment is suggesting we use roundtrip_test() function to streamline these tests. See the code snippet I linked
There was a problem hiding this comment.
My comment is suggesting we use
roundtrip_test()function to streamline these tests. See the code snippet I linked
should i remove the additional test i had added ? @Jefffrey
There was a problem hiding this comment.
No I am saying we should refactor these tests to use roundtrip_test() instead of manually asserting a specific property of the roundtripped struct
There was a problem hiding this comment.
No I am saying we should refactor these tests to use
roundtrip_test()instead of manually asserting a specific property of the roundtripped struct
sounds good . i'll try producing it
Which issue does this PR close?
FilterExecempty projection is changed to full projection after serde #21871Rationale for this change
FilterExec supports two semantically different projection states:
However, both cases were being serialized identically as an empty vector in the proto representation. During deserialization, an empty vector was always mapped back to None, meaning an empty projection would silently become a full projection after a serde round-trip.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
No