Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix window expr deserialization #10506

Merged
merged 3 commits into from
May 15, 2024

Conversation

lewiszlw
Copy link
Contributor

@lewiszlw lewiszlw commented May 14, 2024

Which issue does this PR close?

Fix window expr deserialization bug which introduced in #9686.

We encountered the error like this when deserializing window plan from proto.

from proto: ArrowError(SchemaError("Unable to get field named \"FIRST_VALUE(a) PARTITION BY [b] ORDER BY [a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW\". Valid fields: [\"a\", \"b\"]"), None)

The error thrown at https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/windows/mod.rs#L188.

If I understand correctly, now create_window_expr function needs output schema of window plan instead of the schema of window plan input. So we should build extended schema which contains input schema and window fields.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core datafusion crate label May 14, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @lewiszlw -- I can't quite understand what it is fixing. Is there any way you can add a unit test to the round trip serialization tests that fails prior to the code change but passes afterwards?

I think that would make it easier to understand what bug was fixed as well as ensure we don't accidentally break it again

Somewere in https://github.com/apache/datafusion/blob/main/datafusion/proto/tests/proto_integration.rs perhaps?

@lewiszlw
Copy link
Contributor Author

Thank you for this PR @lewiszlw -- I can't quite understand what it is fixing. Is there any way you can add a unit test to the round trip serialization tests that fails prior to the code change but passes afterwards?

I think that would make it easier to understand what bug was fixed as well as ensure we don't accidentally break it again

Somewere in https://github.com/apache/datafusion/blob/main/datafusion/proto/tests/proto_integration.rs perhaps?

I updated desc and physical window plan proto test.

@@ -253,8 +253,7 @@ fn roundtrip_nested_loop_join() -> Result<()> {
fn roundtrip_window() -> Result<()> {
let field_a = Field::new("a", DataType::Int64, false);
let field_b = Field::new("b", DataType::Int64, false);
let field_c = Field::new("FIRST_VALUE(a) PARTITION BY [b] ORDER BY [a ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", DataType::Int64, false);
let schema = Arc::new(Schema::new(vec![field_a, field_b, field_c]));
let schema = Arc::new(Schema::new(vec![field_a, field_b]));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this schema will be used as input schema, see line 310. I don't think the input schema should contains window field.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @lewiszlw

@alamb alamb merged commit 8199e9e into apache:main May 15, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants