[SPARK-35685][SQL] Prompt recreating the view when there is an incompatible schema issue #32831

Closed

Conversation

linhongliu-db
Contributor

@linhongliu-db linhongliu-db commented Jun 9, 2021

What changes were proposed in this pull request?

If a user creates a view in Spark 2.4 and reads it in 3.1/3.2, the query can fail with an incompatible schema error.
This PR adds the view's DDL to the error message, prompting the user to recreate the view and fix the
incompatibility.
For example:

-- create view in 2.4
CREATE TABLE IF NOT EXISTS t USING parquet AS SELECT '1' as a, '20210420' as b
CREATE OR REPLACE VIEW v AS SELECT CAST(t.a AS INT), to_date(t.b, 'yyyyMMdd') FROM t
-- select view in master
SELECT * FROM v

The query then fails with the following error:

cannot resolve '`to_date(spark_catalog.default.t.b, 'yyyyMMdd')`' given input columns: [a, to_date(b, yyyyMMdd)];
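
The mismatch behind this error can be sketched in plain Scala, without Spark. This is an illustrative model only: it assumes the unresolved name is the one recorded in the view's stored schema, and that the bracketed list holds the column names the re-analyzed view query produces. `ViewSchemaCheck` and `unresolved` are hypothetical names, not Spark APIs.

```scala
object ViewSchemaCheck {
  // Returns the stored view column names that the re-analyzed
  // view query no longer produces under the same name.
  def unresolved(stored: Seq[String], produced: Seq[String]): Seq[String] =
    stored.filterNot(name => produced.contains(name))

  // Names taken from the error message above (assumed roles, see lead-in).
  val storedColumns: Seq[String] =
    Seq("a", "to_date(spark_catalog.default.t.b, 'yyyyMMdd')")
  val producedColumns: Seq[String] =
    Seq("a", "to_date(b, yyyyMMdd)")

  // Only the auto-aliased to_date column fails to resolve by name.
  val missing: Seq[String] = unresolved(storedColumns, producedColumns)
}
```

Column `a` still matches by name, so only the auto-generated alias breaks the view.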

Why are the changes needed?

To improve the error message so that it tells the user how to fix the view.

Does this PR introduce any user-facing change?

Yes, the error message changes.

How was this patch tested?

With a newly added test case.

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44050/

@SparkQA

SparkQA commented Jun 9, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44050/

@SparkQA

SparkQA commented Jun 9, 2021

Test build #139525 has finished for PR 32831 at commit 1cda453.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val viewName = metadata.identifier.toString
val viewText = metadata.viewText
val viewColumns = metadata.schema.fieldNames.mkString(", ")
s"CREATE OR REPLACE $temp VIEW $viewName ($viewColumns) AS $viewText"
Contributor Author

Changing this to ALTER VIEW would be better.

val viewName = metadata.identifier.toString
val viewText = metadata.viewText
val viewColumns = metadata.schema.fieldNames.mkString(", ")
s"CREATE OR REPLACE $temp VIEW $viewName ($viewColumns) AS $viewText"
Contributor

We can generate ALTER VIEW instead.
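
A minimal sketch of what generating an ALTER VIEW statement could look like, written without Spark dependencies. `ViewMeta` is a hypothetical stand-in for the few `CatalogTable` fields used in the snippet above, and `alterViewDDL` is an illustrative name, not the helper that was merged. It assumes `viewText` is optional, so it avoids interpolating an `Option` directly into the string.

```scala
// Hypothetical stand-in for the CatalogTable fields used here.
final case class ViewMeta(identifier: String, viewText: Option[String])

object ViewDDL {
  // Builds "ALTER VIEW <name> AS <query>" when the view text is known.
  // ALTER VIEW ... AS takes no column list, so none is emitted.
  def alterViewDDL(meta: ViewMeta): Option[String] =
    meta.viewText.map(text => s"ALTER VIEW ${meta.identifier} AS $text")
}
```

For example, `ViewDDL.alterViewDDL(ViewMeta("default.v", Some("SELECT * FROM t")))` yields `Some("ALTER VIEW default.v AS SELECT * FROM t")`. Identifier quoting is left out of this sketch.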

@SparkQA

SparkQA commented Jun 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44762/

@SparkQA

SparkQA commented Jun 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44762/

@SparkQA

SparkQA commented Jun 24, 2021

Test build #140234 has finished for PR 32831 at commit f9c4e9d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sql("SELECT * FROM v").show()
sql("DROP TABLE t")
sql("CREATE TABLE t(a INT, b INT) USING json")
sql("ALTER VIEW `v` AS SELECT * FROM t")
Contributor

@cloud-fan cloud-fan Jun 24, 2021

The test should check the error message before altering the view.

incompatible schema issue

address comments

code clean

test example

code clean
@linhongliu-db linhongliu-db changed the title [WIP][SPARK-35685][SQL] Prompt recreating the view when there is an incompatible schema issue [SPARK-35685][SQL] Prompt recreating the view when there is an incompatible schema issue Jun 30, 2021
@linhongliu-db linhongliu-db marked this pull request as ready for review June 30, 2021 16:54
@SparkQA

SparkQA commented Jun 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44970/

@SparkQA

SparkQA commented Jun 30, 2021

Test build #140455 has finished for PR 32831 at commit cae2be9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44970/

@@ -846,6 +846,30 @@ class SessionCatalog(
case None => fromCatalogTable(viewInfo.tableMeta, isTempView = true)
}

private def buildViewDDL(metadata: CatalogTable, isTempView: Boolean): String = {
val isGlobalTemp = metadata.identifier.database.exists(_ == globalTempViewManager.database)
val viewType = if (isTempView && isGlobalTemp) {
Contributor

I think this only applies to permanent views? We don't store temp views in the catalog.

Contributor Author

Ah, sounds good. This will make the code simpler.

@SparkQA

SparkQA commented Jul 1, 2021

Test build #140500 has finished for PR 32831 at commit 2fede50.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45010/

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 0c34b96 Jul 1, 2021
@@ -910,4 +911,27 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
}
}
}

test("SPARK-35685: Prompt recreating view message for schema mismatch") {
Contributor

Oh, I missed it. It's better to move this test to PersistedViewTestSuite.

You can fix it in #32832

@SparkQA

SparkQA commented Jul 1, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45010/

cloud-fan pushed a commit that referenced this pull request Jul 1, 2021
… view

### What changes were proposed in this pull request?
As described in #32831, Spark has compatibility issues when querying a view created by an
older version. The root cause is that Spark changed the auto-generated alias names. To avoid
this in the future, we could ask the user to specify explicit column names when creating
a view.

### Why are the changes needed?
Avoid compatibility issues when querying a view.

### Does this PR introduce _any_ user-facing change?
Yes. Users will get an error when running the query below after this change:
```
CREATE OR REPLACE VIEW v AS SELECT CAST(t.a AS INT), to_date(t.b, 'yyyyMMdd') FROM t
```
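
For illustration, here are two ways a user might give the view explicit column names, held as Scala strings such as one might pass to `sql(...)`. The names `a` and `b` are arbitrary, and whether the new check accepts both forms is not stated in this description, so treat this as a sketch rather than the exact behavior.

```scala
object ExplicitViewColumns {
  // Option 1: alias every expression in the SELECT list.
  val withAliases: String =
    "CREATE OR REPLACE VIEW v AS " +
      "SELECT CAST(t.a AS INT) AS a, to_date(t.b, 'yyyyMMdd') AS b FROM t"

  // Option 2: name the columns in the view's own column list.
  val withColumnList: String =
    "CREATE OR REPLACE VIEW v (a, b) AS " +
      "SELECT CAST(t.a AS INT), to_date(t.b, 'yyyyMMdd') FROM t"
}
```

In both forms the stored column names no longer depend on how Spark auto-generates aliases.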

### How was this patch tested?
not yet

Closes #32832 from linhongliu-db/SPARK-35686-no-auto-alias.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
3 participants