New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector #28863
Conversation
Test build #124222 has finished for PR 28863 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of pointers to help reviewers.
The same applies here just like on MS SQL case. Namely I was able to make this work with active directory. Please see further info here.
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
pom.xml
Outdated
@@ -984,6 +984,12 @@ | |||
<version>8.2.2.jre8</version> | |||
<scope>test</scope> | |||
</dependency> | |||
<dependency> | |||
<groupId>com.oracle.database.jdbc</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the latest version from the Oracle JDBC driver which supports JDK8, JDK9, and JDK11: https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8
|
||
import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions | ||
|
||
private[sql] class OracleConnectionProvider(driver: Driver, options: JDBCOptions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is based on this.
result | ||
} | ||
|
||
override def setAuthenticationConfigIfNeeded(): Unit = SecurityConfigurationLock.synchronized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here synchronization is important to avoid race just like in other providers.
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
retest this please |
cc @HeartSaVioR |
Test build #124231 has finished for PR 28863 at commit
|
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
package org.apache.spark.sql.execution.datasources.jdbc.connection | ||
|
||
class OracleConnectionProviderSuite extends ConnectionProviderSuiteBase { | ||
test("setAuthenticationConfigIfNeeded must set authentication if not set") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the XXXConnectionProviderSuite
has the almost same test, so could you move it into ConnectionProviderSuiteBase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you suggest to do that? Driver registration and provider instantiation lines are different in each case.
The only duplicate what I see is the test name + the testSecureConnectionProvider
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. But, I felt a a bit less testing for creating a separate test file.
|
||
private[sql] class OracleConnectionProvider(driver: Driver, options: JDBCOptions) | ||
extends SecureConnectionProvider(driver, options) { | ||
override val appEntry: String = "kprb5module" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question; where does this value come? From the Oracle JDBC impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I've used JD-GUI to take a look at the details.
...la/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProvider.scala
Show resolved
Hide resolved
Note: I've checked that |
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
I'm intended to merge master into this PR and resolve conflicts when #28893 accepted. |
Thank you, @gaborgsomogyi . I reviewed #28893 . We can merge that if the timeout issue is resolved there. |
...cker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
Test build #124420 has finished for PR 28863 at commit
|
retest this please |
Test build #124425 has finished for PR 28863 at commit
|
Test build #124418 has finished for PR 28863 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @dongjoon-hyun
...la/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProvider.scala
Outdated
Show resolved
Hide resolved
...la/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProvider.scala
Outdated
Show resolved
Hide resolved
...la/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProvider.scala
Show resolved
Hide resolved
Retest this please |
Test build #124569 has finished for PR 28863 at commit
|
Retest this please. |
Test build #124573 has finished for PR 28863 at commit
|
Retest this please. |
Test build #124576 has finished for PR 28863 at commit
|
Test build #124640 has finished for PR 28863 at commit
|
retest this please |
Retest this please. |
Hi, @gaborgsomogyi . Is |
retest this please |
(Seems Jenkins sleeping now...) |
@dongjoon-hyun Yes, I've spent almost gross 1 month to make it work w/ |
|
Test build #124667 has finished for PR 28863 at commit
|
@dongjoon-hyun Oh gosh! OracleIntegrationSuite is a leftover from the original PR where driver upgrade was inside. |
retest this please |
Test build #124681 has finished for PR 28863 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @gaborgsomogyi and @maropu .
I tested locally the dependency and the new test suite.
Merged to master for Apache Spark 3.1.0.
@dongjoon-hyun thank you taking care and giving me suggestions like your last comment. I'm going to consider them in later contributions. |
Can this cause the build failure? #28959 (comment) |
Thanks, I'm looking at this, @MaxGekk and @jiangxb1987 . It's weird because GitHub action passed. |
What changes were proposed in this pull request?
When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it.
This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues.
In this PR I've added Oracle support.
What this PR contains:
OracleConnectionProvider
OracleConnectionProviderSuite
Why are the changes needed?
Missing JDBC kerberos support.
Does this PR introduce any user-facing change?
Yes, now user is able to connect to Oracle using kerberos.
How was this patch tested?
MiniKDC
but it was working only with Active Directory (AD is not available in docker, please see Add support for Kerberos/Active Directory/"windows" authentication microsoft/mssql-docker#165).