[SPARK-36241][SQL] Support creating tables with null column #33488
"CREATE TABLE t2 AS SELECT null as null_col",
"Cannot create tables with null type")
"CREATE TABLE t2 STORED AS PARQUET AS SELECT null as null_col",
"Unknown field type: void")
Parquet doesn't support the null (Spark) / void (Hive) type.
cc @cloud-fan
thanks, merging to master/3.2! (since it removes the constraint added in 3.2)
### What changes were proposed in this pull request?

Previously, we blocked creating tables with a null column to follow the Hive behavior in PR #28833. In this PR, I propose to restore the previous behavior and support null columns in a table.

### Why are the changes needed?

For a complex query, it's possible to generate a column with null type. If this happens in the input query of a CTAS, the query fails because Spark doesn't allow creating a table with null type. From the user's perspective, it's hard to figure out why the null type column is produced in a complicated query and how to fix it. So removing this constraint is more friendly to users.

### Does this PR introduce _any_ user-facing change?

Yes, this reverts the previous behavior change in #28833. For example, the command below will succeed after this PR:

```sql
CREATE TABLE t (col_1 void, col_2 int)
```

### How was this patch tested?

Newly added and existing test cases.

Closes #33488 from linhongliu-db/SPARK-36241-support-void-column.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8e7e14d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
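As a rough illustration of the behavior change described in this PR, the following Python sketch models the two separate checks involved: the table-level constraint that this PR removes, and the format-level limitation that remains for Parquet. It is a toy model and not Spark's implementation. All names (`Column`, `TableError`, `infer_type`, `schema_of`, `check_create_table`, `check_storage_format`, `allow_null_type`) are hypothetical; only the two error strings are taken from the test assertions quoted in this PR.

```python
from dataclasses import dataclass

# Spark's NullType is spelled "void" in SQL DDL, as in the CREATE TABLE example.
NULL_TYPE = "void"


class TableError(Exception):
    """Hypothetical stand-in for the error a failed CREATE TABLE surfaces."""


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str


def infer_type(value):
    """Toy inference: a bare NULL literal carries no type information."""
    if value is None:
        return NULL_TYPE
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported literal: {value!r}")


def schema_of(select_list):
    """Build a schema from a mapping of column alias -> literal value."""
    return [Column(name, infer_type(value)) for name, value in select_list.items()]


def check_create_table(schema, allow_null_type):
    """Table-level check: models the constraint that this PR removes."""
    if not allow_null_type:
        for col in schema:
            if col.data_type == NULL_TYPE:
                raise TableError("Cannot create tables with null type")
    return schema


def check_storage_format(schema, storage_format):
    """Format-level check: models Parquet having no representation for void."""
    if storage_format == "parquet":
        for col in schema:
            if col.data_type == NULL_TYPE:
                raise TableError("Unknown field type: " + col.data_type)
    return schema


# Models: CREATE TABLE t2 AS SELECT 1 AS id, null AS null_col
schema = schema_of({"id": 1, "null_col": None})
```

In this model, `check_create_table(schema, allow_null_type=False)` raises, which corresponds to the behavior before this PR, while `allow_null_type=True` lets the schema through. `check_storage_format(schema, "parquet")` still raises in both cases, which mirrors the review note that Parquet does not support the null/void type.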