[SPARK-40000][SQL] Update INSERTs without user-specified fields to not automatically add default values #37430
Conversation
Hi @gengliangwang this change will be helpful to let SQL engines toggle whether to return an error if an INSERT command provides fewer values than the target table has columns.
Can one of the admins verify this patch?
Thanks for your review!
Hi @gengliangwang, thanks for your review, this is ready again :)
Had an offline discussion with @dtenedor. Let's make it simple and strict for now: throw an exception if a table insert contains fewer columns than the table and there is no insert column list.
fetch latest from master
Thanks for your review! The unit tests had to change a bit, but the behavior is simpler and more consistent now.
Hi @gengliangwang, responded to comments; this is ready for another round when ready :)
Hi @gengliangwang, per offline discussion, I have added the new flag `spark.sql.defaultColumn.addMissingValuesForInsertsWithExplicitColumns`.
@dtenedor let's update the migration guide https://github.com/apache/spark/blob/master/docs/sql-migration-guide.md since this changes the default behavior. Also, please update the PR description as per the latest code changes.
@dtenedor FYI there is a conflict with the master branch now.
OK @gengliangwang all the conflicts should be resolved now. 👍
…ysis/ResolveDefaultColumns.scala Co-authored-by: Gengliang Wang <ltnwgl@gmail.com>
…ysis/ResolveDefaultColumns.scala Co-authored-by: Gengliang Wang <ltnwgl@gmail.com>
…ysis/ResolveDefaultColumns.scala Co-authored-by: Gengliang Wang <ltnwgl@gmail.com>
…ysis/ResolveDefaultColumns.scala Co-authored-by: Gengliang Wang <ltnwgl@gmail.com>
Thanks @gengliangwang for the reviews, applied suggestions and updated the PR description 👍
Thanks, merging to master |
This is reverted via 50c1635 |
@dongjoon-hyun Thanks for the note! |
What changes were proposed in this pull request?
Update INSERTs without user-specified fields to not automatically add default values.
For example, with the new behavior, an `INSERT INTO` command that supplies one value for a two-column table will fail with an error message reporting that the table has two columns but the command only inserted one (see the sketch after the next paragraph).

For INSERTs with user-specified fields, these commands may now specify fewer field/value pairs than the number of columns in the target table. The analyzer will assign the default value for each remaining column (either NULL, or else the explicit DEFAULT value assigned to the column from a previous command).
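A minimal sketch of the failing case (the PR's original code sample was not preserved in this extract, so the table and column names here are hypothetical):

```sql
-- Hypothetical two-column table; the second column has an explicit DEFAULT.
CREATE TABLE t (i INT, s INT DEFAULT 42) USING PARQUET;

-- Fails under the new behavior: the table has two columns, but the command
-- supplies only one value and no column list, so no default is auto-added.
INSERT INTO t VALUES (1);
```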
For example, with the new behavior, this kind of `INSERT INTO` command will succeed, assigning the new row `(42, 2)` to the target table (see the sketch after the config note below).

To implement this behavior, this PR creates the following config, which is true by default:
`spark.sql.defaultColumn.addMissingValuesForInsertsWithExplicitColumns`
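A sketch of the succeeding case, again with hypothetical table and column names; the final `SET` statement shows how the new config would be flipped back:

```sql
-- Hypothetical table whose first column carries an explicit DEFAULT.
CREATE TABLE t (i INT DEFAULT 42, s INT) USING PARQUET;

-- Succeeds under the new behavior: the user-specified column list names only `s`,
-- so the analyzer fills `i` with its default, yielding the row (42, 2).
INSERT INTO t (s) VALUES (2);

-- Restore the previous behavior (error out instead of adding missing values):
SET spark.sql.defaultColumn.addMissingValuesForInsertsWithExplicitColumns=false;
```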
To switch back to the previous behavior of returning errors for `INSERT INTO` commands with fewer user-specified fields than the number of columns in the target table, set this new config to false.

Why are the changes needed?
After reviewing the desired SQL semantics, it is preferable to be strict and require that the number of inserted columns exactly match the number of columns in the target table, guarding against accidental mistakes.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
Updated unit test coverage.