[SPARK-21519][SQL] Add an option to the JDBC data source to initialize the target DB environment #18724
Conversation
…f the remote database session
test this please
@@ -135,6 +135,8 @@ class JDBCOptions(
      case "REPEATABLE_READ" => Connection.TRANSACTION_REPEATABLE_READ
      case "SERIALIZABLE" => Connection.TRANSACTION_SERIALIZABLE
    }
  // An option to execute custom SQL before fetching data from the remote DB
  val sessionInitStatement = parameters.getOrElse(JDBC_SESSION_INIT_STATEMENT, "")
Please use parameters.get
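For context, a minimal sketch of what the reviewer's suggestion might look like in JDBCOptions; this is an illustration of the Option-returning form, not necessarily the exact merged code:

  // Sketch only: returns None when the option is absent, instead of defaulting to "".
  val sessionInitStatement: Option[String] = parameters.get(JDBC_SESSION_INIT_STATEMENT)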
Could you add a test case to JDBCSuite?
Test build #80065 has finished for PR 18724 at commit
ping @LucaCanali
Thank you very much @gatorsmile for the review. I plan to provide the required changes and add a test case; however, it will probably take one more week before I can do that.
Thanks for your time!
ok to test
@@ -1007,4 +1007,14 @@ class JDBCSuite extends SparkFunSuite
     assert(sql("select * from people_view").count() == 3)
   }
 }

  test("SPARK-21519: option sessionInitStatement, run SQL to initialize the database session.") {
    val initSQL = "SET @MYTESTVAR 21519"
Could we add a test case with more than one statement?
Test build #80505 has finished for PR 18724 at commit
      .load()
    assert(df1.collect() === Array(Row(21519)))

    val initSQL2 = "SET SCHEMA DUMMY"
Sorry, I might not have explained it clearly. Would it be possible to have a test case that sends more than one statement in a single session initialization? Right now these two examples each contain only one statement.
Thanks for the clarification. I have now added a test that runs 2 SQL statements.
For future reference, I'd like to stress that the code executed via the "sessionInitStatement" option is simply the user-provided string fed to the execute method of the JDBC connection, so it can use any feature of the target database's language/syntax. In the test I wrote for the H2 database, I just put two commands together separated by ";". When using sessionInitStatement to query Oracle, for example, the user-provided command can be a SQL statement or a PL/SQL block grouping multiple commands and logic.
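To illustrate the multi-statement case, here is a rough sketch of how a caller might pass two statements in one session initialization; the H2 URL and the query are placeholders modeled on the JDBCSuite setup, not the exact test code:

  // Sketch only: two H2 statements in one string, passed verbatim to the
  // connection's execute method before the query below is run.
  val initSQL = "SET @MYTESTVAR1 21519; SET @MYTESTVAR2 333"
  val df = spark.read
    .format("jdbc")
    .option("url", "jdbc:h2:mem:testdb0;user=testUser;password=testPass")  // placeholder H2 URL
    .option("dbtable", "(SELECT NVL(@MYTESTVAR1, -1), NVL(@MYTESTVAR2, -1))")
    .option("sessionInitStatement", initSQL)
    .load()
  // Expect a single row (21519, 333) if both init statements ran in this session.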
Test build #80507 has finished for PR 18724 at commit
LGTM
Test build #80539 has finished for PR 18724 at commit
Thanks! Merging to master.
Add an option to the JDBC data source to initialize the environment of the remote database session
What changes were proposed in this pull request?
This PR proposes an option for the JDBC data source, tentatively called "sessionInitStatement", to implement the session-initialization functionality present, for example, in the Sqoop connector for Oracle (see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements). After each database session to the remote DB is opened, and before starting to read data, this option executes a custom SQL statement (or a PL/SQL block in the case of Oracle).
See also https://issues.apache.org/jira/browse/SPARK-21519
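As a rough illustration of how the proposed option could be used against Oracle (the URL, table, credentials, and the PL/SQL block below are placeholders; only the sessionInitStatement option name comes from this PR):

  // Hedged sketch: connection details are placeholders. The string is executed
  // once per JDBC session, after the connection to the remote DB is opened and
  // before any data is fetched.
  val df = spark.read
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")   // placeholder Oracle URL
    .option("dbtable", "MYSCHEMA.MYTABLE")                       // placeholder table
    .option("user", "myuser")
    .option("password", "mypassword")
    .option("sessionInitStatement",
      """BEGIN execute immediate 'alter session set "_serial_direct_read"=true'; END;""")
    .load()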
How was this patch tested?
Manually tested using the Spark SQL data source and Oracle JDBC.