
[SPARK-9078] [SQL] Allow jdbc dialects to override the query used to check the table. #8676

Closed

Conversation

sureshthalamati
Contributor

The current implementation uses a query with a LIMIT clause to check whether a table already exists. That syntax works only in some database systems. This patch changes the default query to one that is likely to work on most databases, and adds a new method to the JdbcDialect abstract class so that dialects can override the default query.

I looked at using the JDBC metadata calls; it turns out there is no common way to find the current schema, catalog, etc. There is a new method, Connection.getSchema(), but it is only available starting with JDK 1.7, and existing JDBC drivers may not have implemented it. Another option was to use the JDBC escape-syntax clause for LIMIT, but it is not clear how well that is supported across databases either. After looking at all the JDBC metadata options, my conclusion was that the most common approach is a simple SELECT query with 'WHERE 1=0', and to allow dialects to customize it as needed.
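To make the proposed extension point concrete, here is a minimal sketch of a dialect overriding the new method. The dialect name and URL prefix are invented for illustration; this is not the exact patch.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect overriding the table-existence probe. The default in
// JdbcDialect stays s"SELECT * FROM $table WHERE 1=0"; a dialect that prefers
// a LIMIT-based probe can swap it out like this.
case object HypotheticalLimitDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hypothetical:")

  override def getTableExistsQuery(table: String): String =
    s"SELECT 1 FROM $table LIMIT 1"
}

// Registering the dialect lets JdbcDialects.get(url) return it for matching URLs.
JdbcDialects.registerDialect(HypotheticalLimitDialect)
```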

@rxin
Contributor

rxin commented Sep 10, 2015

FWIW, we dropped JVM 1.6 support in Spark 1.5. Would that make this easier?

@sureshthalamati
Contributor Author

@rxin

Even if Spark is running on JDK 1.7, customers using older versions of the drivers will run into an AbstractMethodError. I think requiring customers to use new drivers that implement the getSchema() method would be unnecessary.

After implementing the current approach I got curious about how the JDBC read path finds the metadata, and learned that org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.resolveTable also uses s"SELECT * FROM $table WHERE 1=0" to get column information.

An alternative approach is to add getMetadataQuery(table: String) to the JdbcDialect interface, to be used both to determine whether a table exists in the write case and to fetch column type information in the read case, instead of the getTableExistsQuery() implemented in the current pull request. It might be a millisecond slower on the write path for dialects that would otherwise specify "SELECT 1 FROM $table LIMIT 1" instead of "SELECT * FROM $table LIMIT 1". The advantage is that a single method on the interface would address both cases.

Any comments?
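A minimal sketch of that alternative, purely illustrative since the pull request ultimately kept getTableExistsQuery() rather than adding this method:

```scala
import org.apache.spark.sql.jdbc.JdbcDialect

// Hypothetical alternative (never merged): one method used both by the write
// path (table-exists probe) and the read path (column metadata resolution).
case object HypotheticalMetadataDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hypothetical:")

  // Zero-row query: cheap to execute, yet still exposes result-set metadata.
  def getMetadataQuery(table: String): String =
    s"SELECT * FROM $table WHERE 1=0"
}
```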

 * @return The SQL query to use for checking the table.
 */
def getTableExistsQuery(table: String): String = {
  s"SELECT * FROM $table WHERE 1=0"
}
Contributor

maybe we should quote the table here actually

Contributor

actually never mind we cannot quote it.

@toddleo toddleo Sep 7, 2017

@rxin What's the specific reason the table name cannot be quoted? We happen to have a table with dots and parentheses in its name, and are planning to add surrounding backticks before passing it to Spark.
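For context, this is roughly what the commenter describes, as a hedged sketch: the URL, credentials, and table name are placeholders, and whether back-tick quoting is accepted depends entirely on the target database.

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("quoted-table-read").getOrCreate()

val props = new Properties()
props.setProperty("user", "app_user")      // placeholder credentials
props.setProperty("password", "secret")

// The quoted name is interpolated verbatim into queries such as
// "SELECT * FROM `weird.table(name)` WHERE 1=0", so the quoting style must
// match what the target database expects (back-ticks shown, MySQL-style).
val df = spark.read.jdbc(
  "jdbc:mysql://db-host:3306/mydb",        // placeholder URL
  "`weird.table(name)`",
  props)
```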

@sureshthalamati
Contributor Author

next() will return false because the result set will be empty when the query is where 1!=0. executeQuery() will throw an exception if the table is not found, so the next() call is not really required to find whether the table exists.

@sureshthalamati
Contributor Author

Typo in my previous comment: I meant when the query is where 1=0.

@sureshthalamati
Contributor Author

@rxin Thank you for reviewing the patch. Just to make sure, I tested without the next() call on MySQL, Postgres, and DB2, and it worked fine. Updated the pull request.
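A sketch of the check being discussed, assuming a plain JDBC Connection (this mirrors the idea rather than the exact merged code): executeQuery() either succeeds, meaning the table exists, or throws, meaning it does not, so next() is never needed.

```scala
import java.sql.Connection
import scala.util.Try

// The probe returns zero rows, so there is nothing to iterate with next();
// if the table is missing, executeQuery() throws and Try yields Failure.
def tableExists(conn: Connection, table: String): Boolean = {
  val probe = s"SELECT * FROM $table WHERE 1=0"
  Try(conn.prepareStatement(probe).executeQuery()).isSuccess
}
```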

@rxin
Contributor

rxin commented Sep 15, 2015

Thanks. Merging this in master.

@rxin
Contributor

rxin commented Sep 15, 2015

(Oops spoke too soon - I will merge after tests pass)

@SparkQA

SparkQA commented Sep 15, 2015

Test build #1760 has finished for PR 8676 at commit ee7b842.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Sep 15, 2015

retest this please

@SparkQA

SparkQA commented Sep 15, 2015

Test build #42504 has finished for PR 8676 at commit ee7b842.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Sep 16, 2015

@vanzin do you know what's going on with the tests?

[error] Execution of test test.org.apache.spark.sql.JavaApplySchemaSuite failed: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ExtendedYarnTest

@rxin
Contributor

rxin commented Sep 16, 2015

I've merged this.

@vanzin
Contributor

vanzin commented Sep 16, 2015

@rxin I reverted the patch that caused those.

@yhuai
Contributor

yhuai commented Sep 16, 2015

It has been merged to master.

@asfgit asfgit closed this in 64c29af Sep 16, 2015