
fix(proto): correctly serialize FilterExec empty projection#21885

Open
Adez017 wants to merge 10 commits into apache:main from Adez017:proto-fix

Adez017 (Contributor) commented Apr 28, 2026

Which issue does this PR close?

Rationale for this change

FilterExec supports two semantically different projection states:

  • None → return all columns (full projection)
  • Some(vec![]) → return no columns (empty projection)

However, both cases were serialized identically, as an empty vector, in the proto representation. During deserialization, an empty vector was always mapped back to None, so an empty projection silently became a full projection after a serde round-trip.
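To make the ambiguity concrete, the sketch below models the old behavior. It assumes the projection travels as a repeated integer field, modeled here as a Vec<u32>; protobuf repeated fields carry no presence information, so "no list" and "empty list" decode identically. The function names are hypothetical and are not the actual DataFusion serde code.

```rust
/// Hypothetical model of the old (lossy) encoding: both None and
/// Some(vec![]) are written as an empty list.
fn encode_projection_old(projection: &Option<Vec<usize>>) -> Vec<u32> {
    match projection {
        Some(indices) => indices.iter().map(|&i| i as u32).collect(),
        None => Vec::new(),
    }
}

/// Hypothetical model of the old decoding: an empty list is always read
/// back as None, i.e. a full projection.
fn decode_projection_old(encoded: &[u32]) -> Option<Vec<usize>> {
    if encoded.is_empty() {
        None
    } else {
        Some(encoded.iter().map(|&i| i as usize).collect())
    }
}

fn main() {
    let full: Option<Vec<usize>> = None;
    let empty: Option<Vec<usize>> = Some(vec![]);

    // Both projection states collapse to the same encoded value...
    assert_eq!(encode_projection_old(&full), encode_projection_old(&empty));

    // ...so the empty projection comes back as a full projection.
    let round_tripped = decode_projection_old(&encode_projection_old(&empty));
    assert_eq!(round_tripped, None);
    println!("Some(vec![]) round-tripped to {:?}", round_tripped);
}
```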

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

No

github-actions bot added the proto label (Related to proto crate) Apr 28, 2026
Adez017 (Contributor Author) commented Apr 28, 2026

@askalt, can you look into this?

Adez017 (Contributor Author) commented Apr 28, 2026

Comment thread datafusion/proto/src/physical_plan/mod.rs
askalt (Contributor) commented Apr 28, 2026

@Adez017 Thank you for the patch!

Comment on lines +719 to +720
// Determine if the projection is full to optimize used memory,
// storing `None` in this case.
Contributor

"optimize used memory" seems like the wrong intention here. Isn't the point rather to accurately recreate a None state, which the proto definition can't encode?

Contributor Author

Seems valid!

Contributor

We could also store Some(vec![0, 1, 2, ..., n-1]) to represent a full projection, so I wanted to highlight why we use None instead.

Contributor

I feel the comment should focus on the fact that this logic preserves a None across the proto boundary, working around a limitation of the encoding (None is encoded the same way as Some(vec![])), rather than explaining it only as a memory-efficiency gain.
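The approach discussed in this thread (write a full projection out explicitly as 0..n-1, and restore None when the decoded list is exactly that identity) can be sketched as follows. This is a minimal illustration under assumptions, not the PR's actual code: encode_projection, decode_projection, the num_input_columns parameter, and the u32 wire type are all hypothetical.

```rust
/// Hypothetical encoding: a full projection (None) is written explicitly as
/// 0..n-1, so an empty list on the wire can only mean Some(vec![]).
fn encode_projection(projection: &Option<Vec<usize>>, num_input_columns: usize) -> Vec<u32> {
    match projection {
        Some(indices) => indices.iter().map(|&i| i as u32).collect(),
        None => (0..num_input_columns as u32).collect(),
    }
}

/// Hypothetical decoding: a list that is exactly 0..n-1 selects every column
/// in order, so it is stored as None; anything else, including an empty
/// list, stays Some.
fn decode_projection(encoded: &[u32], num_input_columns: usize) -> Option<Vec<usize>> {
    let indices: Vec<usize> = encoded.iter().map(|&i| i as usize).collect();
    let is_full = indices.len() == num_input_columns
        && indices.iter().enumerate().all(|(pos, &idx)| pos == idx);
    if is_full {
        None
    } else {
        Some(indices)
    }
}

fn main() {
    let n = 3;
    // None survives the round trip.
    assert_eq!(decode_projection(&encode_projection(&None, n), n), None);
    // The empty projection no longer turns into a full projection.
    assert_eq!(decode_projection(&encode_projection(&Some(vec![]), n), n), Some(vec![]));
    // An arbitrary projection is preserved as-is.
    assert_eq!(decode_projection(&encode_projection(&Some(vec![2, 0]), n), n), Some(vec![2, 0]));
    // An explicit identity projection is normalized to the equivalent None.
    assert_eq!(decode_projection(&encode_projection(&Some(vec![0, 1, 2]), n), n), None);
    println!("all projection round-trips behave as expected");
}
```

The last assertion shows the trade-off raised above: an explicit Some(vec![0, 1, 2]) comes back as None. Both select the same columns, so the meaning is preserved even though the representation is normalized. With zero input columns, None and Some(vec![]) are also equivalent, and this sketch decodes both to None.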

None,
"None projection must stay None after roundtrip"
);

Contributor

I feel like we can use roundtrip_test here as others do above, e.g.

roundtrip_test(Arc::new(FilterExec::try_new(
    expr,
    Arc::new(EmptyExec::new(schema)),
)?))
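The suggestion relies on roundtrip_test checking the whole plan rather than one field. As far as we understand it, DataFusion's roundtrip_test serializes the plan to proto and back and compares the Debug output of the original and restored plans. The generic sketch below mimics that idea with a toy struct; ToyFilter, roundtrip_check, and lossy_serde are hypothetical and are not DataFusion APIs.

```rust
use std::fmt::Debug;

/// Toy stand-in for a FilterExec, keeping only the fields relevant here.
#[derive(Debug)]
struct ToyFilter {
    predicate: String,
    projection: Option<Vec<usize>>,
}

/// Hypothetical generic round-trip check: run the value through `serde`
/// and compare the Debug output before and after.
fn roundtrip_check<T: Debug>(original: &T, serde: impl Fn(&T) -> T) -> Result<(), String> {
    let restored = serde(original);
    let before = format!("{original:?}");
    let after = format!("{restored:?}");
    if before == after {
        Ok(())
    } else {
        Err(format!("round-trip mismatch:\n  before: {before}\n  after:  {after}"))
    }
}

/// Simulates the old lossy serde: an empty projection comes back as None.
fn lossy_serde(f: &ToyFilter) -> ToyFilter {
    let projection = match &f.projection {
        Some(p) if p.is_empty() => None,
        other => other.clone(),
    };
    ToyFilter { predicate: f.predicate.clone(), projection }
}

fn main() {
    let full = ToyFilter { predicate: "a > 5".to_string(), projection: None };
    let empty = ToyFilter { predicate: "a > 5".to_string(), projection: Some(vec![]) };

    // A full projection survives even the lossy serde.
    assert!(roundtrip_check(&full, lossy_serde).is_ok());

    // Comparing Debug output catches Some([]) silently turning into None,
    // without a hand-written assertion on the projection field.
    let err = roundtrip_check(&empty, lossy_serde).unwrap_err();
    println!("{err}");
}
```

Because the comparison covers every field the Debug output prints, it also guards against regressions in fields a targeted assertion would not mention.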

Contributor Author

Absolutely, I just added a different test to use it in a better way.

Contributor

What does this mean?

Adez017 (Contributor Author), Apr 28, 2026

The issue occurs because, when a FilterExec has an empty projection, the previous proto serialization didn't explicitly encode the "empty" state, so the physical plan fell back to a full projection (or an invalid state) on deserialization. By explicitly handling the projection field even when it is empty, we ensure that the execution plan stays consistent across the network/serde boundary, which is critical for count-only queries.

Contributor

My comment suggests using the roundtrip_test() function to streamline these tests. See the code snippet I linked.

Contributor Author

> My comment is suggesting we use roundtrip_test() function to streamline these tests. See the code snippet I linked

Should I remove the additional test I added? @Jefffrey

Contributor

No, I'm saying we should refactor these tests to use roundtrip_test() instead of manually asserting a specific property of the round-tripped struct.

Contributor Author

> No I am saying we should refactor these tests to use roundtrip_test() instead of manually asserting a specific property of the roundtripped struct

Sounds good, I'll give it a try.


Labels

proto Related to proto crate

Development

Successfully merging this pull request may close these issues.

FilterExec empty projection is changed to full projection after serde

3 participants