-
Notifications
You must be signed in to change notification settings - Fork 2.1k
fix(proto): correctly serialize FilterExec empty projection #21885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
8562f72
e890283
89c4328
9951c16
896120b
2ba3a97
6847a16
f702ce8
6059237
49a58ef
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3681,3 +3681,63 @@ async fn roundtrip_issue_18602_complex_filter_decode_recursion() -> Result<()> { | |||||||||
|
|
||||||||||
| roundtrip_test_sql_with_context(sql, &ctx).await | ||||||||||
| } | ||||||||||
|
|
||||||||||
| #[tokio::test] | ||||||||||
| async fn test_filter_exec_projection_serde_roundtrip() -> Result<()> { | ||||||||||
| let ctx = SessionContext::new(); | ||||||||||
| let codec = DefaultPhysicalExtensionCodec {}; | ||||||||||
|
|
||||||||||
| let schema = Arc::new(Schema::new(vec![ | ||||||||||
| Field::new("a", DataType::Int32, false), | ||||||||||
| Field::new("b", DataType::Int32, false), | ||||||||||
| Field::new("c", DataType::Int32, false), | ||||||||||
| ])); | ||||||||||
|
|
||||||||||
| let input: Arc<dyn ExecutionPlan> = Arc::new(EmptyExec::new(Arc::clone(&schema))); | ||||||||||
|
|
||||||||||
| let predicate: Arc<dyn PhysicalExpr> = Arc::new(BinaryExpr::new( | ||||||||||
| Arc::new(Column::new("a", 0)), | ||||||||||
| Operator::Gt, | ||||||||||
| Arc::new(Literal::new(ScalarValue::Int32(Some(0)))), | ||||||||||
| )); | ||||||||||
|
|
||||||||||
| // Case 1: None -> should round-trip as None (return all columns) | ||||||||||
| let filter = | ||||||||||
| FilterExecBuilder::new(Arc::clone(&predicate), Arc::clone(&input)).build()?; | ||||||||||
| let proto = PhysicalPlanNode::try_from_physical_plan(Arc::new(filter) as _, &codec)?; | ||||||||||
| let roundtripped = proto.try_into_physical_plan(ctx.task_ctx().as_ref(), &codec)?; | ||||||||||
| let rt = roundtripped.as_ref().downcast_ref::<FilterExec>().unwrap(); | ||||||||||
| assert_eq!( | ||||||||||
| rt.projection().as_deref(), | ||||||||||
| None, | ||||||||||
| "None projection must stay None after roundtrip" | ||||||||||
| ); | ||||||||||
|
|
||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel like we can use datafusion/datafusion/proto/tests/cases/roundtrip_physical_plan.rs Lines 3504 to 3507 in 22bb4e6
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. absolutely , i just added a different test to use it in a better way .
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does this mean?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The issue occurs because when a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My comment is suggesting we use
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
should i remove the additional test i had added ? @Jefffrey
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No I am saying we should refactor these tests to use
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
sounds good . i'll try producing it |
||||||||||
| // Case 2: Some(vec![]) -> must survive as Some([]), NOT silently become None | ||||||||||
| let filter = FilterExecBuilder::new(Arc::clone(&predicate), Arc::clone(&input)) | ||||||||||
| .apply_projection(Some(vec![]))? | ||||||||||
| .build()?; | ||||||||||
| let proto = PhysicalPlanNode::try_from_physical_plan(Arc::new(filter) as _, &codec)?; | ||||||||||
| let roundtripped = proto.try_into_physical_plan(ctx.task_ctx().as_ref(), &codec)?; | ||||||||||
| let rt = roundtripped.as_ref().downcast_ref::<FilterExec>().unwrap(); | ||||||||||
| assert_eq!( | ||||||||||
| rt.projection().as_deref(), | ||||||||||
| Some(&[][..]), | ||||||||||
| "Empty projection Some([]) must survive roundtrip, not become None" | ||||||||||
| ); | ||||||||||
|
|
||||||||||
| // Case 3: Some(vec![2, 0]) -> partial projection must survive | ||||||||||
| let filter = FilterExecBuilder::new(Arc::clone(&predicate), Arc::clone(&input)) | ||||||||||
| .apply_projection(Some(vec![2, 0]))? | ||||||||||
| .build()?; | ||||||||||
| let proto = PhysicalPlanNode::try_from_physical_plan(Arc::new(filter) as _, &codec)?; | ||||||||||
| let roundtripped = proto.try_into_physical_plan(ctx.task_ctx().as_ref(), &codec)?; | ||||||||||
| let rt = roundtripped.as_ref().downcast_ref::<FilterExec>().unwrap(); | ||||||||||
| assert_eq!( | ||||||||||
| rt.projection().as_deref(), | ||||||||||
| Some(&[2_usize, 0_usize][..]), | ||||||||||
| "Partial projection must survive roundtrip" | ||||||||||
| ); | ||||||||||
|
|
||||||||||
| Ok(()) | ||||||||||
| } | ||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"optimize used memory" seems the wrong intention here? Isn't it more to try to accurately recreate a
Nonestate which the proto definition can't encode?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems valid!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can store
Some(vec![0,1,2...n-1])to represent a full projection, so I thought to highlight why we focus onNoneusage.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should focus more on the fact this logic is trying to preserve a
Noneacross proto boundaries because of the limitations we face (with how its encoded the same asSome(vec![])) instead of explaining it only as a memory efficiency gain