[SPARK-36241][SQL] Support creating tables with null column #33488

linhongliu-db · 2021-07-23T03:14:25Z

What changes were proposed in this pull request?

Previously, we blocked creating tables with a null column to follow the Hive behavior (PR #28833).
In this PR, I propose to restore the previous behavior and support null columns in tables.

Why are the changes needed?

For a complex query, it's possible to generate a column with null type. If this happens in the input query of
CTAS, the query fails because Spark doesn't allow creating a table with a null type column. From the user's perspective,
it's hard to figure out why the null type column is produced in a complicated query and how to fix it.
So removing this constraint is friendlier to users.
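
To make the failure mode concrete, here is a minimal sketch of how a null type column can reach CTAS. The table `src` and its columns are hypothetical and only for illustration; the rejected statement mirrors the one in the test changed by this PR. The `CAST` at the end is a possible workaround for cases where a typed column is wanted, not something this PR introduces.

```sql
-- Hypothetical source table, for illustration only.
CREATE TABLE src (id INT, name STRING);

-- An untyped NULL literal has the null type (void in Hive terms).
SELECT typeof(NULL);

-- null_col therefore has null type. Before this PR, this CTAS was rejected
-- with "Cannot create tables with null type". With the constraint removed,
-- whether it succeeds depends on whether the table's storage format
-- supports the null type.
CREATE TABLE t2 AS SELECT id, NULL AS null_col FROM src;

-- Possible workaround (an assumption, not part of this PR): cast the literal
-- so the column gets a concrete type.
CREATE TABLE t3 AS SELECT id, CAST(NULL AS STRING) AS null_col FROM src;
```

In a large query the untyped NULL is usually buried several subqueries deep, which is why the original error was hard to act on.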

Does this PR introduce any user-facing change?

Yes, this reverts the previous behavior change in #28833. For example, the command below will succeed after this PR:

CREATE TABLE t (col_1 void, col_2 int)

How was this patch tested?

newly added and existing test cases

SparkQA · 2021-07-23T03:21:55Z

Test build #141535 has finished for PR 33488 at commit 674d3f3.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-23T04:49:56Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46053/

SparkQA · 2021-07-23T05:21:21Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46053/

SparkQA · 2021-07-26T08:03:41Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46134/

SparkQA · 2021-07-26T08:24:51Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46137/

SparkQA · 2021-07-26T08:36:45Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46134/

SparkQA · 2021-07-26T08:59:16Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46139/

SparkQA · 2021-07-26T09:32:01Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46139/

SparkQA · 2021-07-26T10:28:57Z

Test build #141619 has finished for PR 33488 at commit 27eb8e1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-26T12:04:31Z

Test build #141620 has finished for PR 33488 at commit 3e4cd4b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-07-26T13:58:36Z

Test build #141622 has finished for PR 33488 at commit c833c86.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

linhongliu-db · 2021-07-26T14:09:57Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala

-        "CREATE TABLE t2 AS SELECT null as null_col",
-        "Cannot create tables with null type")
+        "CREATE TABLE t2 STORED AS PARQUET AS SELECT null as null_col",
+        "Unknown field type: void")


Parquet doesn't support the null (Spark) / void (Hive) type.

linhongliu-db · 2021-07-26T14:10:12Z

cc @cloud-fan

cloud-fan · 2021-07-27T09:31:51Z

thanks, merging to master/3.2! (since it removes the constraint added in 3.2)

### What changes were proposed in this pull request?

Previously, we blocked creating tables with a null column to follow the Hive behavior (PR #28833). In this PR, I propose to restore the previous behavior and support null columns in tables.

### Why are the changes needed?

For a complex query, it's possible to generate a column with null type. If this happens in the input query of CTAS, the query fails because Spark doesn't allow creating a table with a null type column. From the user's perspective, it's hard to figure out why the null type column is produced in a complicated query and how to fix it. So removing this constraint is friendlier to users.

### Does this PR introduce _any_ user-facing change?

Yes, this reverts the previous behavior change in #28833. For example, the command below will succeed after this PR:

```sql
CREATE TABLE t (col_1 void, col_2 int)
```

### How was this patch tested?

newly added and existing test cases

Closes #33488 from linhongliu-db/SPARK-36241-support-void-column.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8e7e14d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

github-actions bot added CORE PYTHON SQL labels Jul 23, 2021

linhongliu-db changed the title ~~[SPARK-36241][SQL] Support creating tables with void column~~ [WIP][SPARK-36241][SQL] Support creating tables with void column Jul 23, 2021

linhongliu-db marked this pull request as draft July 23, 2021 03:14

linhongliu-db added 2 commits July 26, 2021 15:17

support creating null columns

c5df65c

test case

7dbb16c

linhongliu-db force-pushed the SPARK-36241-support-void-column branch from 674d3f3 to 27eb8e1 July 26, 2021 07:17

code clean

3e4cd4b

linhongliu-db force-pushed the SPARK-36241-support-void-column branch from 27eb8e1 to 3e4cd4b July 26, 2021 07:23

fix test

c833c86

linhongliu-db marked this pull request as ready for review July 26, 2021 07:46

linhongliu-db changed the title ~~[WIP][SPARK-36241][SQL] Support creating tables with void column~~ [SPARK-36241][SQL] Support creating tables with void column Jul 26, 2021

linhongliu-db commented Jul 26, 2021

cloud-fan approved these changes Jul 27, 2021

linhongliu-db changed the title ~~[SPARK-36241][SQL] Support creating tables with void column~~ [SPARK-36241][SQL] Support creating tables with null column Jul 27, 2021

cloud-fan closed this in 8e7e14d Jul 27, 2021
