fix dml logical plan output schema #10394

leoyvens · 2024-05-06T15:02:55Z

Previously, LogicalPlan::schema would return the input schema for DML plans, rather than the expected output schema. It is typical for the output to be the count of rows affected by the DML statement, so the code assumes that.

See fn dml_output_schema for a test.
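To make the distinction concrete, here is a minimal sketch of the intended behaviour using simplified stand-in types. None of these definitions are DataFusion's real ones: `Field`, `DataType`, the two-variant `LogicalPlan`, `dml_input_schema`, and `insert_plan` are hypothetical, and `UInt64` as the type of `count` is an assumption. Only the idea that a DML node reports a single `count` column comes from this PR.

```rust
// Simplified stand-in types for illustration; NOT DataFusion's real definitions.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    UInt64, // assumed type of the `count` column
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

enum LogicalPlan {
    /// Produces rows with the given schema (for example a VALUES list).
    Values { schema: Vec<Field> },
    /// Writes the rows produced by `input` into a table.
    Dml { input: Box<LogicalPlan> },
}

impl LogicalPlan {
    /// Schema of the rows this plan produces when executed.
    fn schema(&self) -> Vec<Field> {
        match self {
            LogicalPlan::Values { schema } => schema.clone(),
            // A DML node produces one row holding the affected-row count,
            // regardless of what its input looks like.
            LogicalPlan::Dml { .. } => vec![field("count", DataType::UInt64)],
        }
    }

    /// Schema of the rows a DML node consumes; `None` for other nodes.
    /// Returning this from `schema()` is the behaviour the PR describes as wrong.
    fn dml_input_schema(&self) -> Option<Vec<Field>> {
        match self {
            LogicalPlan::Dml { input } => Some(input.schema()),
            _ => None,
        }
    }
}

/// Hypothetical plan for `INSERT INTO t (id, name) VALUES (...)`.
fn insert_plan() -> LogicalPlan {
    LogicalPlan::Dml {
        input: Box::new(LogicalPlan::Values {
            schema: vec![field("id", DataType::Int32), field("name", DataType::Utf8)],
        }),
    }
}
```

Calling `insert_plan().schema()` in this sketch yields the single `count` field, while `dml_input_schema()` still exposes the `(id, name)` columns that are being written.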

Which issue does this PR close?

Closes #10393.

Rationale for this change

The current behaviour is wrong: for DML plans, `LogicalPlan::schema` returns the input schema instead of the output schema.

Are these changes tested?

Yes, there is a test: `fn dml_output_schema`.

Are there any user-facing changes?

The bug being fixed is visible to users of the DataFrame API.

Previously, `LogicalPlan::schema` would return the input schema for DML plans, rather than the expected output schema. This is an unusual case, since DML statements are typically not run for their output, but it is typical for the output to be the `count` of rows affected by the DML statement. See `fn dml_output_schema` for a test.

comphead

Thanks @leoyvens for your contribution, please add more details to the issue description

comphead · 2024-05-06T16:01:44Z

datafusion/core/tests/sql/sql_api.rs

@@ -58,6 +58,19 @@ async fn unsupported_dml_returns_error() {
    ctx.sql_with_options(sql, options).await.unwrap();
 }

+#[tokio::test]


It's better to have this test in one of the .slt files, which run end-to-end tests.

I suspect this might not be caught in a .slt test as this issue is not visible in the pretty-printed logical plan, and doesn't affect the physical plan or execution at all, afaict.

Actually I see some slt tests failing on CI, I should look into that.

Curiously, this was in fact tested in many sqllogictests, because the tests check that the physical output schema matches the logical output schema, which is pretty cool.

Let me know if you want to keep this Rust test as well, or if you would rather I remove it. It does test more behaviour than the slts can.
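As a rough illustration of the kind of consistency check described above, the sketch below compares the schema a logical plan reports with the schema execution actually produced. `Column` and `check_schemas_match` are hypothetical names invented for this example; they are not the sqllogictest runner's real API, and the type names are plain strings chosen for brevity.

```rust
/// A column described by (name, type name); a stand-in for a real schema field.
type Column = (&'static str, &'static str);

/// Returns `Ok(())` when the logical plan's reported schema equals the schema
/// of the batches produced by execution, and a descriptive error otherwise.
fn check_schemas_match(logical: &[Column], physical: &[Column]) -> Result<(), String> {
    if logical == physical {
        Ok(())
    } else {
        Err(format!(
            "schema mismatch: logical plan reports {:?}, execution produced {:?}",
            logical, physical
        ))
    }
}
```

In this sketch, a logical schema of `[("count", "UInt64")]` checked against the same physical schema passes, while a logical schema of `[("id", "Int32"), ("name", "Utf8")]` checked against it returns an error.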

I think keeping the rust level test is a good idea as it makes explicit the expected output schema

datafusion/expr/src/logical_plan/dml.rs

comphead · 2024-05-06T16:07:48Z

I checked current behavior

> CREATE TABLE test (x int);
0 row(s) fetched. 
Elapsed 0.017 seconds.

> INSERT INTO test VALUES (1);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched. 
Elapsed 0.015 seconds.

> INSERT INTO test VALUES (3);
+-------+
| count |
+-------+
| 1     |
+-------+
1 row(s) fetched. 
Elapsed 0.005 seconds.

I'm not even sure why we output anything after DDL/DML... I'd probably prefer no output like in duckdb

D create table x (id int);
D insert into x values(1);

leoyvens · 2024-05-06T18:12:50Z

@comphead Thank you for your review, I have addressed the outstanding comments. I have no opinion on desired behaviour, this PR is just trying to make things consistent.

comphead · 2024-05-07T17:22:32Z

datafusion/sqllogictest/test_files/insert.slt

@@ -259,7 +259,7 @@ statement error Error during planning: Column count doesn't match insert query!
 insert into table_without_values(id) values(4, 'zoo');

 # insert NULL values for the missing column (name)
-query IT
+query I


Does it fail with the old value?
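For context on the `query IT` to `query I` change: in sqllogictest files the word after `query` is a type string with one character per result column (`I` for integer, `T` for text), so `IT` declares two columns and `I` declares one, matching the single `count` column an insert returns. The helper below is a hypothetical sketch of reading that declaration; `declared_column_count` is not part of any real test runner.

```rust
/// Number of result columns declared by a sqllogictest `query` directive.
/// Returns `None` for lines that are not `query` directives, that have no
/// type string, or that use the `query error ...` form.
fn declared_column_count(directive: &str) -> Option<usize> {
    let mut parts = directive.split_whitespace();
    if parts.next()? != "query" {
        return None;
    }
    let types = parts.next()?;
    if types == "error" {
        // `query error <pattern>` expects a failure, not result columns.
        return None;
    }
    // One type character per expected result column.
    Some(types.chars().count())
}
```

With this sketch, `declared_column_count("query IT")` gives `Some(2)` and `declared_column_count("query I")` gives `Some(1)`.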

alamb

Thank you @leoyvens and @comphead -- this is a very nice contribution 🏆

alamb · 2024-05-07T17:22:49Z

datafusion/core/tests/sql/sql_api.rs

@@ -58,6 +58,19 @@ async fn unsupported_dml_returns_error() {
    ctx.sql_with_options(sql, options).await.unwrap();
 }

+#[tokio::test]


I think keeping the rust level test is a good idea as it makes explicit the expected output schema

alamb · 2024-05-07T17:23:29Z

datafusion/sqllogictest/test_files/aggregate.slt

@@ -3260,7 +3260,7 @@ SELECT STRING_AGG(column1, '|') FROM (values (''), (null), (''));
 statement ok
 CREATE TABLE strings(g INTEGER, x VARCHAR, y VARCHAR)

-query ITT
+query I


github-actions bot added the sql, logical-expr (Logical plan and expressions), and core (Core datafusion crate) labels May 6, 2024

comphead reviewed May 6, 2024

datafusion/expr/src/logical_plan/dml.rs

leoyvens added 2 commits May 6, 2024 20:03

document DmlStatement::new

3b607b1

Fix expected logical schema of 'insert into' in sqllogictests

0f76fbd

github-actions bot added the sqllogictest label May 6, 2024

leoyvens mentioned this pull request May 6, 2024

For DML plans, LogicalPlan::schema returns the input schema instead of output schema #10393

Closed

comphead reviewed May 7, 2024

alamb approved these changes May 7, 2024

alamb merged commit 161d0f2 into apache:main May 7, 2024
23 checks passed
