
[SPARK-32402][SQL][FOLLOW-UP] Use quoted column name for JDBCTableCatalog.alterTable #30041

Closed
huaxingao wants to merge 2 commits

Conversation

huaxingao (Contributor)

What changes were proposed in this pull request?

Currently we generate unquoted column names in ALTER TABLE statements, e.g. ALTER TABLE "test"."alt_table" DROP COLUMN c1.
This should change to use a quoted column name: ALTER TABLE "test"."alt_table" DROP COLUMN "c1".

Why are the changes needed?

We should always use quoted identifiers in the SQL statements we send over JDBC, e.g. CREATE TABLE "test"."abc" ("col" INTEGER) or INSERT INTO "test"."abc" ("col") VALUES (?). Using unquoted column names in alterTable causes problems, for example:

sql("CREATE TABLE h2.test.alt_table (c1 INTEGER, c2 INTEGER) USING _")
sql("ALTER TABLE h2.test.alt_table DROP COLUMN c1")

org.apache.spark.sql.AnalysisException: Failed table altering: test.alt_table;
......

Caused by: org.h2.jdbc.JdbcSQLException: Column "C1" not found; SQL statement:
ALTER TABLE "test"."alt_table" DROP COLUMN c1 [42122-195]
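The fix is to run every column name through the dialect's identifier quoting before embedding it in the generated SQL. A minimal, self-contained sketch of that behavior (illustrative helper names, not Spark's actual JdbcDialect code):

```scala
// Sketch of JDBC identifier quoting: wrap the name in double quotes,
// doubling any embedded double quote per the ANSI SQL convention.
object QuotingSketch {
  def quoteIdentifier(colName: String): String =
    "\"" + colName.replace("\"", "\"\"") + "\""

  // With the quoted name, H2 matches the column exactly as written,
  // so a lower-case c1 is found instead of being upper-cased to C1.
  def dropColumnSql(quotedTable: String, col: String): String =
    s"ALTER TABLE $quotedTable DROP COLUMN ${quoteIdentifier(col)}"

  def main(args: Array[String]): Unit =
    println(dropColumnSql("\"test\".\"alt_table\"", "c1"))
}
```

Without the quoting step, H2 normalizes the bare c1 to C1 and reports Column "C1" not found, as in the stack trace above.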


Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests

@SparkQA commented Oct 15, 2020

Test build #129744 has finished for PR 30041 at commit cb0de33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@huaxingao (Contributor, Author)

@cloud-fan @MaxGekk Could you please take a look? Thanks!

@cloud-fan (Contributor)

can we add a test?

t = spark.table("h2.test.alt_table")
expectedSchema = expectedSchema.add("C3", DoubleType)
expectedSchema = expectedSchema.add("c3", DoubleType)
huaxingao (Contributor, Author):

Without the fix (i.e. with an unquoted column name), assert(t.schema === expectedSchema) throws the following exception:

org.scalatest.exceptions.TestFailedException: StructType(StructField(ID,IntegerType,true), StructField(C1,IntegerType,true), StructField(C2,StringType,true), StructField(C3,DoubleType,true)) did not equal StructType(StructField(ID,IntegerType,true), StructField(C1,IntegerType,true), StructField(C2,StringType,true), StructField(c3,DoubleType,true))
	at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)

sql("CREATE TABLE h2.test.alt_table (ID INTEGER, C0 INTEGER) USING _")
sql("ALTER TABLE h2.test.alt_table RENAME COLUMN ID TO C")
sql("CREATE TABLE h2.test.alt_table (id INTEGER, C0 INTEGER) USING _")
sql("ALTER TABLE h2.test.alt_table RENAME COLUMN id TO C")
huaxingao (Contributor, Author):

Without the fix (i.e. with an unquoted column name), this throws the following exception:

Failed table altering: test.alt_table;
org.apache.spark.sql.AnalysisException: Failed table altering: test.alt_table;
	at org.apache.spark.sql.jdbc.JdbcDialect.classifyException(JdbcDialects.scala:268)

Caused by: org.h2.jdbc.JdbcSQLException: Column "ID" not found; SQL statement:
ALTER TABLE "test"."alt_table" RENAME COLUMN id TO "C" [42122-195]

Contributor (reviewer):

It should still work if we write ID, right? Spark normalizes the columns according to the table schema, so the case sensitivity flag can still apply. See ResolveAlterTableChanges.

It would be better if we could add tests for case-insensitive column resolution.
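The case-insensitive resolution being discussed amounts to normalizing a user-supplied name to the spelling stored in the table schema. A hypothetical standalone sketch of that normalization (illustrative names, not Spark's actual ResolveAlterTableChanges API):

```scala
// Sketch: resolve a user-supplied column name against the names stored
// in the table schema. Under spark.sql.caseSensitive=false the lookup is
// case-insensitive and returns the stored spelling, so the generated
// ALTER TABLE uses the exact name the database knows.
object ResolutionSketch {
  def resolveColumn(schemaFields: Seq[String],
                    userName: String,
                    caseSensitive: Boolean): Option[String] =
    if (caseSensitive) schemaFields.find(_ == userName)
    else schemaFields.find(_.equalsIgnoreCase(userName))
}
```

Here resolveColumn(Seq("ID", "C0"), "id", caseSensitive = false) yields Some("ID"), while the case-sensitive lookup yields None.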

huaxingao (Contributor, Author):

Yes, the upper-case ID still works here. I will add tests for case-insensitive column resolution.
Thanks for reviewing and merging my PR!

sql("ALTER TABLE h2.test.alt_table DROP COLUMN C1")
sql("ALTER TABLE h2.test.alt_table DROP COLUMN c3")
huaxingao (Contributor, Author):

Without the fix (i.e. with an unquoted column name), this throws the following exception:

Failed table altering: test.alt_table;
org.apache.spark.sql.AnalysisException: Failed table altering: test.alt_table;
	at org.apache.spark.sql.jdbc.JdbcDialect.classifyException(JdbcDialects.scala:268)
Caused by: org.h2.jdbc.JdbcSQLException: Column "C3" not found; SQL statement:
ALTER TABLE "test"."alt_table" DROP COLUMN c3 [42122-195]

sql("ALTER TABLE h2.test.alt_table ALTER COLUMN id TYPE DOUBLE")
sql("ALTER TABLE h2.test.alt_table ALTER COLUMN deptno TYPE DOUBLE")
huaxingao (Contributor, Author):

Without the fix (i.e. with an unquoted column name), this throws the following exception:

Failed table altering: test.alt_table;
org.apache.spark.sql.AnalysisException: Failed table altering: test.alt_table;
	at org.apache.spark.sql.jdbc.JdbcDialect.classifyException(JdbcDialects.scala:268)
  Caused by: org.h2.jdbc.JdbcSQLException: Column "DEPTNO" not found; SQL statement:
ALTER TABLE "test"."alt_table" ALTER COLUMN deptno DOUBLE PRECISION [42122-195]

@@ -272,10 +274,12 @@ class JDBCTableCatalogSuite extends QueryTest with SharedSparkSession {

test("alter table ... update column nullability") {
withTable("h2.test.alt_table") {
sql("CREATE TABLE h2.test.alt_table (ID INTEGER NOT NULL) USING _")
sql("CREATE TABLE h2.test.alt_table (ID INTEGER NOT NULL, deptno INTEGER NOT NULL) USING _")
huaxingao (Contributor, Author):

Without the fix (i.e. with an unquoted column name), this throws the following exception:

Failed table altering: test.alt_table;
org.apache.spark.sql.AnalysisException: Failed table altering: test.alt_table;
	at org.apache.spark.sql.jdbc.JdbcDialect.classifyException(JdbcDialects.scala:268)
  Caused by: org.h2.jdbc.JdbcSQLException: Column "DEPTNO" not found; SQL statement:
ALTER TABLE "test"."alt_table" ALTER COLUMN deptno SET NULL [42122-195]

@huaxingao (Contributor, Author)

I'm actually a little confused now: do I also need to take spark.sql.caseSensitive into consideration?
I will take a look at this tomorrow.

@cloud-fan (Contributor)

> I'm actually a little confused now: do I also need to take spark.sql.caseSensitive into consideration?

I think there was a discussion about this a long time ago. For table/column lookups that happen inside custom catalogs, Spark can't control the behavior. spark.sql.caseSensitive only controls lookup behavior inside Spark itself.

@SparkQA commented Oct 15, 2020

Test build #129808 has finished for PR 30041 at commit 7e8098c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34413/

@SparkQA commented Oct 15, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34413/

@cloud-fan (Contributor)

thanks, merging to master!
