[SPARK-21519][SQL] Add an option to the JDBC data source to initialize the target DB environment #18724

Closed

Conversation

LucaCanali
Contributor

Add an option to the JDBC data source to initialize the environment of the remote database session

What changes were proposed in this pull request?

This PR proposes a new option for the JDBC data source, tentatively called "sessionInitStatement", that implements the session initialization functionality found, for example, in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements). After each database session is opened to the remote DB, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle).
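
As a rough illustration of how the proposed option might be supplied from Spark, the sketch below builds the options map for a JDBC read. The helper object, the connection URL, the table name, and the Oracle PL/SQL block are placeholders invented for this example, and the option name assumes the tentative "sessionInitStatement" is kept.

```scala
// Hypothetical helper: collects the JDBC read options, including the
// proposed session-initialization statement, into a plain Map.
object SessionInitOptions {
  def jdbcReadOptions(url: String, table: String, initSql: String): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> table,
      // Option name as tentatively proposed in this PR.
      "sessionInitStatement" -> initSql
    )

  // Placeholder values for illustration only.
  val example: Map[String, String] = jdbcReadOptions(
    url = "jdbc:oracle:thin:@//dbhost:1521/service",
    table = "MYSCHEMA.MYTABLE",
    initSql = "BEGIN execute immediate 'alter session set time_zone=''UTC'''; END;"
  )
}
```

The resulting map could then be handed to `spark.read.format("jdbc").options(SessionInitOptions.example).load()`. That call needs a Spark session and a reachable database, so it is left out of the block.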

See also https://issues.apache.org/jira/browse/SPARK-21519

How was this patch tested?

Manually tested using the Spark SQL data source and the Oracle JDBC driver.

@gatorsmile
Member

test this please

@@ -135,6 +135,8 @@ class JDBCOptions(
case "REPEATABLE_READ" => Connection.TRANSACTION_REPEATABLE_READ
case "SERIALIZABLE" => Connection.TRANSACTION_SERIALIZABLE
}
// An option to execute custom SQL before fetching data from the remote DB
val sessionInitStatement = parameters.getOrElse(JDBC_SESSION_INIT_STATEMENT, "")
Member

Please use parameters.get
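
To make the suggestion concrete, here is a minimal sketch of the difference between the two lookups. It uses a plain `Map[String, String]` in place of the options class, and the key string and object name are assumptions for illustration, not the patch's actual code.

```scala
object InitStatementOption {
  // Assumed key string; the diff refers to it through the constant
  // JDBC_SESSION_INIT_STATEMENT, whose value is not shown here.
  val SessionInitKey = "sessionInitStatement"

  // getOrElse with "" as default: an unset option and an explicitly empty
  // one become indistinguishable, and callers must compare against "".
  def withDefault(parameters: Map[String, String]): String =
    parameters.getOrElse(SessionInitKey, "")

  // get returns Option[String]: None means "not set", so callers can
  // use foreach/map instead of checking for a sentinel value.
  def asOption(parameters: Map[String, String]): Option[String] =
    parameters.get(SessionInitKey)
}
```

With the `Option` form, the code that opens the session can write `asOption(params).foreach(...)` and skip the step entirely when nothing was configured.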

@gatorsmile
Member

Could you add a test case to JDBCSuite?

@SparkQA

SparkQA commented Jul 31, 2017

Test build #80065 has finished for PR 18724 at commit 92d082a.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

ping @LucaCanali

@LucaCanali
Contributor Author

Thank you very much @gatorsmile for the review. I plan to make the requested changes and add a test case; however, it will probably take another week before I can do that.

@gatorsmile
Member

Thanks for your time!

@gatorsmile
Member

ok to test

@@ -1007,4 +1007,14 @@ class JDBCSuite extends SparkFunSuite
assert(sql("select * from people_view").count() == 3)
}
}

test("SPARK-21519: option sessionInitStatement, run SQL to initialize the database session.") {
val initSQL = "SET @MYTESTVAR 21519"
Member

Could we add a test case for more than one statement?

@SparkQA

SparkQA commented Aug 10, 2017

Test build #80505 has finished for PR 18724 at commit 0a0ff0a.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.load()
assert(df1.collect() === Array(Row(21519)))

val initSQL2 = "SET SCHEMA DUMMY"
Member

Sorry, I might not have explained it clearly. Is it possible to have a test case that sends more than one statement in a single session initialization? Currently, these two examples each have only one statement.

Contributor Author

Thanks for the clarification. I have now added a test that runs 2 SQL statements.
For future reference, I'd like to stress that the code executed by the option "sessionInitStatement" is just the user-provided string, executed as-is on the JDBC connection, so it can use the features of the target database's language and syntax. In the test I wrote for the H2 database, I simply put together two commands separated by ";". When using sessionInitStatement to query Oracle, for example, the user-provided command can be a SQL statement or a PL/SQL block grouping multiple commands and logic.
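
The behaviour described in this comment can be sketched roughly as follows. This is not the patch's actual code: the object and method names are made up for illustration, and only standard `java.sql` calls are used. Whether several commands separated by ";" are accepted in one string depends on the target database and its driver.

```scala
import java.sql.Connection

object SessionInit {
  // Runs the user-provided initialization string, if any, on an already
  // opened connection. The string is passed through unchanged, so its
  // syntax is whatever the target database accepts.
  def runSessionInit(conn: Connection, initSql: Option[String]): Unit =
    initSql.foreach { sql =>
      val stmt = conn.createStatement()
      try stmt.execute(sql)
      finally stmt.close()
    }
}
```

Because the statement runs once per opened session, a read that opens several connections would run the same initialization string on each of them.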

@SparkQA

SparkQA commented Aug 11, 2017

Test build #80507 has finished for PR 18724 at commit 55e63a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM

@SparkQA

SparkQA commented Aug 11, 2017

Test build #80539 has finished for PR 18724 at commit 5792fd6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 0377338 Aug 11, 2017