Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix dml logical plan output schema #10394

Merged
merged 3 commits into from
May 7, 2024
Merged

Conversation

leoyvens
Copy link
Contributor

@leoyvens leoyvens commented May 6, 2024

Previously, LogicalPlan::schema would return the input schema for DML plans, rather than the expected output schema. It is typical for the output to be the count of rows affected by the DML statement, so the code assumes that.

See fn dml_output_schema for a test.

Which issue does this PR close?

Closes #10393.

Rationale for this change

Current behaviour is wrong.

Are these changes tested?

Yes there is a test fn dml_output_schema.

Are there any user-facing changes?

The bug being fixed is visible to users of the DataFrame API.

Previously, `LogicalPlan::schema` would return the
input schema for Dml plans, rather than the expected
output schema. This is an unusal case since Dmls are
typically not run for their output, but it is typical
for the output to be the `count` of rows affected by
the DML statement.

See `fn dml_output_schema` for a test.
@github-actions github-actions bot added sql logical-expr Logical plan and expressions core Core datafusion crate labels May 6, 2024
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @leoyvens for your contribution, please add more details to the issue description

@@ -58,6 +58,19 @@ async fn unsupported_dml_returns_error() {
ctx.sql_with_options(sql, options).await.unwrap();
}

#[tokio::test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its better to have this test in one of .slt files which runs end to end tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this might not be caught in a .slt test as this issue is not visible in the pretty-printed logical plan, and doesn't affect the physical plan or execution at all, afaict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I see some slt tests failing on CI, I should look into that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously, this was in fact tested in many sqllogictests. Because the tests do in fact cover checking that the physical output schema matches the logical output schema, which is pretty cool.

Let me know if you want to keep this Rust test as well, or if you rather that I remove it. It does test more behaviour than what the slts can test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the rust level test is a good idea as it makes explicit the expected output schema

@comphead
Copy link
Contributor

comphead commented May 6, 2024

I checked current behavior

> CREATE TABLE test (x int)
;
0 row(s) fetched. 
Elapsed 0.017 seconds.

> INSERT INTO test VALUES (1);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched. 
Elapsed 0.015 seconds.

> INSERT INTO test VALUES (3);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched. 
Elapsed 0.005 seconds.

I'm not even sure why we output anything after DDL/DML... I'd probably prefer no output like in duckdb

D create table x (id int);
D insert into x values(1);

@leoyvens
Copy link
Contributor Author

leoyvens commented May 6, 2024

@comphead Thank you for your review, I have addressed the outstanding comments. I have no opinion on desired behaviour, this PR is just trying to make things consistent.

@@ -259,7 +259,7 @@ statement error Error during planning: Column count doesn't match insert query!
insert into table_without_values(id) values(4, 'zoo');

# insert NULL values for the missing column (name)
query IT
query I
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it fail with old value?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @leoyvens and @comphead -- this is a very nice contribution 🏆

@@ -58,6 +58,19 @@ async fn unsupported_dml_returns_error() {
ctx.sql_with_options(sql, options).await.unwrap();
}

#[tokio::test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping the rust level test is a good idea as it makes explicit the expected output schema

@@ -3260,7 +3260,7 @@ SELECT STRING_AGG(column1, '|') FROM (values (''), (null), (''));
statement ok
CREATE TABLE strings(g INTEGER, x VARCHAR, y VARCHAR)

query ITT
query I
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 161d0f2 into apache:main May 7, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions sql sqllogictest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

For DML plans, LogicalPlan::schema returns the input schema instead of output schema
3 participants