@linhongliu-db linhongliu-db commented Jul 23, 2021

What changes were proposed in this pull request?

Previously, we blocked creating tables with a null-type column to follow the Hive behavior (PR #28833).
In this PR, I propose to restore the earlier behavior and support null-type columns in tables.

Why are the changes needed?

A complex query can produce a column of null type. If such a query is the input of a
CTAS, the statement fails because Spark doesn't allow creating a table with a null-type column. From the user's perspective,
it's hard to figure out why the complicated query produced a null-type column and how to fix it,
so removing this constraint is friendlier to users.
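
As an illustration of how this can happen, here is a minimal sketch. The table and column names (`events`, `events_copy`, `events_copy_typed`, `extra`) are hypothetical and not taken from this PR. An untyped `NULL` literal has the null type (`void`), so selecting it in a CTAS yields a null-type column. Whether the resulting table can actually be written still depends on the storage format; Parquet, for example, does not support this type.

```sql
-- Hypothetical source table, used only for illustration.
CREATE TABLE events (id INT, payload STRING);

-- The untyped NULL literal gives `extra` the null (void) type.
-- With the constraint from #28833 in place, this CTAS is rejected;
-- with the constraint removed, it is expected to succeed for formats
-- that can store a null-type column.
CREATE TABLE events_copy AS
SELECT id, payload, NULL AS extra
FROM events;

-- A possible workaround when the null-type column can be located:
-- give the literal a concrete type with an explicit cast.
CREATE TABLE events_copy_typed AS
SELECT id, payload, CAST(NULL AS STRING) AS extra
FROM events;
```

In a real query the `NULL` is often buried in a nested subquery or view, which is why the explicit cast is not always easy to apply.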

Does this PR introduce any user-facing change?

Yes. This reverts the behavior change made in #28833. For example, the command below succeeds after this PR:

CREATE TABLE t (col_1 void, col_2 int)

How was this patch tested?

Newly added and existing test cases.

@linhongliu-db linhongliu-db changed the title [SPARK-36241][SQL] Support creating tables with void column [WIP][SPARK-36241][SQL] Support creating tables with void column Jul 23, 2021
@linhongliu-db linhongliu-db marked this pull request as draft July 23, 2021 03:14

SparkQA commented Jul 23, 2021

Test build #141535 has finished for PR 33488 at commit 674d3f3.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46053/

SparkQA commented Jul 23, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46053/

@linhongliu-db linhongliu-db force-pushed the SPARK-36241-support-void-column branch from 674d3f3 to 27eb8e1 Compare July 26, 2021 07:17
@linhongliu-db linhongliu-db force-pushed the SPARK-36241-support-void-column branch from 27eb8e1 to 3e4cd4b Compare July 26, 2021 07:23
@linhongliu-db linhongliu-db marked this pull request as ready for review July 26, 2021 07:46
@linhongliu-db linhongliu-db changed the title [WIP][SPARK-36241][SQL] Support creating tables with void column [SPARK-36241][SQL] Support creating tables with void column Jul 26, 2021

SparkQA commented Jul 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46134/

SparkQA commented Jul 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46137/

SparkQA commented Jul 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46134/

SparkQA commented Jul 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46139/

SparkQA commented Jul 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46139/

SparkQA commented Jul 26, 2021

Test build #141619 has finished for PR 33488 at commit 27eb8e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 26, 2021

Test build #141620 has finished for PR 33488 at commit 3e4cd4b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 26, 2021

Test build #141622 has finished for PR 33488 at commit c833c86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"CREATE TABLE t2 AS SELECT null as null_col",
"Cannot create tables with null type")
"CREATE TABLE t2 STORED AS PARQUET AS SELECT null as null_col",
"Unknown field type: void")
Contributor Author

Parquet doesn't support the null (Spark) / void (Hive) type.

@linhongliu-db
Contributor Author

cc @cloud-fan

@linhongliu-db linhongliu-db changed the title [SPARK-36241][SQL] Support creating tables with void column [SPARK-36241][SQL] Support creating tables with null column Jul 27, 2021
@cloud-fan
Contributor

thanks, merging to master/3.2! (since it removes the constraint added in 3.2)

@cloud-fan cloud-fan closed this in 8e7e14d Jul 27, 2021
cloud-fan pushed a commit that referenced this pull request Jul 27, 2021
### What changes were proposed in this pull request?
Previously, we blocked creating tables with a null-type column to follow the Hive behavior (PR #28833).
In this PR, I propose to restore the earlier behavior and support null-type columns in tables.

### Why are the changes needed?
A complex query can produce a column of null type. If such a query is the input of a
CTAS, the statement fails because Spark doesn't allow creating a table with a null-type column. From the user's perspective,
it's hard to figure out why the complicated query produced a null-type column and how to fix it, so removing
this constraint is friendlier to users.

### Does this PR introduce _any_ user-facing change?
Yes. This reverts the behavior change made in #28833. For example, the command below succeeds after this PR:
```sql
CREATE TABLE t (col_1 void, col_2 int)
```

### How was this patch tested?
Newly added and existing test cases.

Closes #33488 from linhongliu-db/SPARK-36241-support-void-column.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8e7e14d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>