Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32356][SQL] Forbid create view with null type #29152

Closed
wants to merge 11 commits into from

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Jul 18, 2020

What changes were proposed in this pull request?

After #28833, we forbid create table with null type, but not include the view. This pr aims to forbid creat view with null type.

Why are the changes needed?

Currently, we support create view v as select null with in-memory but failed in hive. It's better to have same behavior of them.

inclued 4 command:

  • create view v as select null
  • alter view v as select null
  • create temporary view v as select null
  • create global temporary view v as select null

Does this PR introduce any user-facing change?

Yes, user will get exception when create view with null type .

How was this patch tested?

Add ut.

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126106 has finished for PR 29152 at commit d5a5575.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you, @ulysses-you .
cc @cloud-fan

@@ -1557,6 +1557,17 @@ class PlanResolutionSuite extends AnalysisTest {
checkFailure("testcat.tab", "foo")
}

test("SPARK-32356: forbid null type in create view") {
val sql1 = "create view v as select null as c"
val sql2 = "alter view v as select null as c"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test temp view as well? also the df api df.createTempView

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

val msg = intercept[AnalysisException] {
parseAndResolve(sql)
}.getMessage
assert(msg.contains(s"Cannot create tables with ${NullType.simpleString} type."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we update the error message to be tables/views?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@SparkQA
Copy link

SparkQA commented Jul 20, 2020

Test build #126175 has finished for PR 29152 at commit 2e8d9f3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2540,6 +2540,17 @@ class DataFrameSuite extends QueryTest
assert(df.schema === new StructType().add(StructField("d", DecimalType(38, 0))))
}
}

test("SPARK-32356: forbid null type in create view") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put test in one place SQLViewSuite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this

@@ -2314,6 +2314,7 @@ class HiveDDLSuite
}

test("SPARK-20680: do not support for null column datatype") {
val errMsg = s"Cannot create tables/views with ${NullType.simpleString} type."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed old error msg tables -> tables/views.

@SparkQA
Copy link

SparkQA commented Jul 20, 2020

Test build #126181 has finished for PR 29152 at commit e1d92ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1557,6 +1557,19 @@ class PlanResolutionSuite extends AnalysisTest {
checkFailure("testcat.tab", "foo")
}

test("SPARK-32356: forbid null type in create view") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this to SQLViewSuite as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

@@ -96,6 +97,7 @@ case class CreateViewCommand(
qe.assertAnalyzed()
val analyzedPlan = qe.analyzed

assertNoNullTypeInSchema(analyzedPlan.schema)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is for df.createTempView. Note that, we support select null as c but fail in create view with null type plan.

@SparkQA
Copy link

SparkQA commented Jul 21, 2020

Test build #126208 has finished for PR 29152 at commit 94fb546.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 21, 2020

Test build #126211 has finished for PR 29152 at commit 4d8d088.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM if tests pass.

One last question: Hive also fails to create view if the column is null type?

@ulysses-you
Copy link
Contributor Author

@cloud-fan Hive support this

create view v as select null as c from (select * from t1)t1

That said, we will get c null if run desc v.

@cloud-fan
Copy link
Contributor

Currently, we support create view v as select null with in-memory but failed in hive.

how does this happen if Hive supports it?

@ulysses-you
Copy link
Contributor Author

@cloud-fan
The reason is NullType simpleString is null but hive need a void, after #28833 we only support parse void -> NullType.
So when we use NullType to create a hive view, it failed.

@cloud-fan
Copy link
Contributor

And it works if we change NullType.toString to void?

To clarify, this case never works in Spark?

@ulysses-you
Copy link
Contributor Author

NullType.toString you mean NullType.simpleString ? yes it works if return void.

To clarify, this case never works in Spark?

Yes, never works.

@cloud-fan
Copy link
Contributor

cc @maropu @viirya do you have any opinions? This PR looks fine if it never worked and this just produces a better error message. But I'm wondering if we should support it to follow hive.

@maropu
Copy link
Member

maropu commented Jul 21, 2020

hm, at least, most databases can accpet it create view v as select null, but null is regared as another type, e.g., a text in postgresql;

postgres=# create view v as select null;
CREATE VIEW
postgres=# \d v
                 View "public.v"
  Column  | Type | Collation | Nullable | Default 
----------+------+-----------+----------+---------
 ?column? | text |           |          | 

MySQL: http://sqlfiddle.com/#!9/42febe/1/0
SQLite: http://sqlfiddle.com/#!7/42feb/1/0

So, IMO accepting it looks somewhat meaningful to follow other databases, but we don't need to handle it as a null type. Either way, I think its better to update the migration guide if we change thsi behaivour.

@viirya
Copy link
Member

viirya commented Jul 21, 2020

But create view v as select null with in-memory works before, right, according to the description? This changes the behavior.

@cloud-fan
Copy link
Contributor

cloud-fan commented Jul 22, 2020

I'd treat it as not-working if it ONLY works with in-memory catalog. In production no one use it. The official release of Spark always include the hive dependencies.

@ulysses-you
Copy link
Contributor Author

How about give user a config that convertNullAsString, then we convert null as StringType if the config is enable ?

In my practice, most time select null as c means c is a meaningless string. Just like Java String str = null.

@viirya
Copy link
Member

viirya commented Jul 23, 2020

I'd treat it as not-working if it ONLY works with in-memory catalog. In production no one use it. The official release of Spark always include the hive dependencies.

Oh ok, it makes sense. For a better error message, this PR is fine.

@maropu
Copy link
Member

maropu commented Jul 23, 2020

Looks okay to me, but (this is trivial though) we might need to choose either action?;

  • jira type: improvement -> bug, then backport this to branch-3.0/branch-2.4, or
  • update the migration guide for the behaviour change

cc: @dongjoon-hyun

@cloud-fan
Copy link
Contributor

If it just improves error message, I think we don't need to backport. My question is if we want to support it. It seems it's easy to support it, and hive supports it as well.

@maropu
Copy link
Member

maropu commented Jul 23, 2020

Forbitting it in in-memory catalog, too, looks ok to me. If we get a user's request about the behaivour, we can easily fix it.

@ulysses-you
Copy link
Contributor Author

It seems df.createTempView is a use case of NullType with in-memory.

During resolve failed test, I find many place use create temp view as select null or df.createTempView. The result is, we can't or hard to test anything about NullType if we forbid create view with it.

e.g.,

  • test function/udf that accept null
  • test sizeInBytes in memory that include null column

I'm not confident to remove all that test.

@SparkQA
Copy link

SparkQA commented Jul 24, 2020

Test build #126459 has finished for PR 29152 at commit 3cb224f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

It seems df.createTempView is a use case of NullType with in-memory.

That's a good point. Then I think we should support NullType in permanent views as well. @ulysses-you can you open a PR to do it?

@ulysses-you
Copy link
Contributor Author

@cloud-fan ok may be tomorrow.

@SparkQA
Copy link

SparkQA commented Jul 24, 2020

Test build #126471 has finished for PR 29152 at commit d02986c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jul 24, 2020

@cloud-fan . Just for my understanding, do you mean to reverting [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype, too? Or, it's only limited on Views (temp/permanant/global_temp)?

Then I think we should support NullType in permanent views as well.

cc @LantaoJin

@cloud-fan
Copy link
Contributor

do you mean to reverting [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype, too?

I mean to override NullType.simpleString as void, which was reverted before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants