
[SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector #28863

Closed
wants to merge 4 commits into from


gaborgsomogyi
Contributor

@gaborgsomogyi gaborgsomogyi commented Jun 18, 2020

What changes were proposed in this pull request?

When loading DataFrames from a JDBC data source with Kerberos authentication, remote executors (in yarn-client/cluster and similar modes) fail to establish a connection because they lack a Kerberos ticket and the ability to generate one.

This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environments where exposing simple authentication is not an option due to IT policy.

In this PR I've added Oracle support.

What this PR contains:

  • Added OracleConnectionProvider
  • Added OracleConnectionProviderSuite

Why are the changes needed?

JDBC Kerberos support for Oracle is missing.

Does this PR introduce any user-facing change?

Yes, users are now able to connect to Oracle using Kerberos.

How was this patch tested?

@SparkQA

SparkQA commented Jun 18, 2020

Test build #124222 has finished for PR 28863 at commit 5cc3d0a.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

@gaborgsomogyi gaborgsomogyi left a comment


A couple of pointers to help reviewers.

The same applies here as in the MS SQL case: I was able to make this work with Active Directory. Please see further info here.

pom.xml Outdated
@@ -984,6 +984,12 @@
<version>8.2.2.jre8</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.oracle.database.jdbc</groupId>
Contributor Author


This is the latest version of the Oracle JDBC driver that supports JDK8, JDK9, and JDK11: https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8


import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions

private[sql] class OracleConnectionProvider(driver: Driver, options: JDBCOptions)
Contributor Author


The implementation is based on this.

result
}

override def setAuthenticationConfigIfNeeded(): Unit = SecurityConfigurationLock.synchronized {
Contributor Author


Synchronization is important here to avoid a race condition, just like in the other providers.
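To illustrate why the lock matters: the JAAS `Configuration` is a single JVM-wide setting. Assuming each provider reads the current configuration, wraps it with its own entry and installs the wrapper, two providers doing this concurrently could both wrap the same parent, and the later `setConfiguration` would silently discard the earlier entry. The following is a minimal sketch of a check-then-set under one shared lock, assuming a plain Scala `object` as the lock; `JaasConfigLock` and `JaasInstaller` are hypothetical names, not the Spark API.

```scala
import javax.security.auth.login.{AppConfigurationEntry, Configuration}

// Hypothetical JVM-wide lock shared by every provider (a stand-in for the
// lock the providers in this PR synchronize on).
object JaasConfigLock

object JaasInstaller {
  /** Installs `entry` under `appEntry` unless an entry for that name already exists.
    * Returns true only for the call that actually installed it. */
  def installIfNeeded(appEntry: String, entry: AppConfigurationEntry): Boolean =
    JaasConfigLock.synchronized {
      // Read, check and write the global configuration as one atomic step.
      // Without the lock, two providers could wrap the same parent and the
      // second setConfiguration would discard the first provider's entry.
      val parent = Configuration.getConfiguration
      val existing = parent.getAppConfigurationEntry(appEntry)
      if (existing != null && existing.nonEmpty) {
        false
      } else {
        // Answer for our own entry name and delegate everything else to the parent.
        Configuration.setConfiguration(new Configuration {
          override def getAppConfigurationEntry(name: String): Array[AppConfigurationEntry] =
            if (name == appEntry) Array(entry) else parent.getAppConfigurationEntry(name)
        })
        true
      }
    }
}
```

Checking for an existing entry inside the same critical section keeps the operation idempotent: many tasks may call it, but the entry is installed once.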

@gaborgsomogyi
Contributor Author

retest this please

@gaborgsomogyi
Contributor Author

cc @HeartSaVioR

@SparkQA

SparkQA commented Jun 18, 2020

Test build #124231 has finished for PR 28863 at commit 5cc3d0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

cc @maropu @MaxGekk

package org.apache.spark.sql.execution.datasources.jdbc.connection

class OracleConnectionProviderSuite extends ConnectionProviderSuiteBase {
test("setAuthenticationConfigIfNeeded must set authentication if not set") {
Member


All the XXXConnectionProviderSuite classes have almost the same test, so could you move it into ConnectionProviderSuiteBase?

Contributor Author


How would you suggest doing that? The driver registration and provider instantiation lines are different in each case.
The only duplication I see is the test name plus the testSecureConnectionProvider call.

Member


Ah, I see. But it felt like a bit too little testing to justify a separate test file.


private[sql] class OracleConnectionProvider(driver: Driver, options: JDBCOptions)
extends SecureConnectionProvider(driver, options) {
override val appEntry: String = "kprb5module"
Member


Just a question: where does this value come from? The Oracle JDBC implementation?

Contributor Author


Yes, I used JD-GUI to decompile the driver and look at the details.
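For context on how `appEntry` is used: JAAS resolves login settings by application name, so the name the driver requests (`kprb5module` for Oracle, per the decompiled driver discussed above) has to map to a Kerberos login module entry. The sketch below shows one way such an entry could be supplied, assuming a keytab-based `com.sun.security.auth.module.Krb5LoginModule` login and delegating all other names to the previously installed configuration. `KeytabJaasConfiguration` and the exact option set are illustrative assumptions, not Spark's implementation.

```scala
import java.util.{HashMap => JHashMap}
import javax.security.auth.login.{AppConfigurationEntry, Configuration}
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag

// Illustrative only: answers JAAS lookups for one application entry name with a
// keytab-based Kerberos login, and delegates every other name to `parent`.
class KeytabJaasConfiguration(
    parent: Configuration, // previously installed configuration; may be null
    appEntry: String,      // e.g. "kprb5module" for the Oracle driver
    keytab: String,
    principal: String) extends Configuration {

  private val entry: AppConfigurationEntry = {
    val options = new JHashMap[String, Object]()
    options.put("useKeyTab", "true")   // read the principal's key from the keytab
    options.put("keyTab", keytab)
    options.put("principal", principal)
    options.put("storeKey", "true")
    options.put("doNotPrompt", "true") // fail instead of prompting for a password
    new AppConfigurationEntry(
      "com.sun.security.auth.module.Krb5LoginModule",
      LoginModuleControlFlag.REQUIRED,
      options)
  }

  override def getAppConfigurationEntry(name: String): Array[AppConfigurationEntry] =
    if (name == appEntry) Array(entry)
    else if (parent != null) parent.getAppConfigurationEntry(name)
    else null
}
```

Delegating to `parent` rather than replacing it is what lets several providers (and any user-supplied JAAS file) coexist in one JVM.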

@maropu
Member

maropu commented Jun 20, 2020

Note: I've checked that OracleIntegrationSuite passed in my local Mac env.

@gaborgsomogyi
Contributor Author

I intend to merge master into this PR and resolve the conflicts once #28893 is accepted.

@dongjoon-hyun
Member

Thank you, @gaborgsomogyi. I reviewed #28893. We can merge it once the timeout issue there is resolved.

@SparkQA

SparkQA commented Jun 23, 2020

Test build #124420 has finished for PR 28863 at commit e16b4a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 23, 2020

Test build #124425 has finished for PR 28863 at commit e16b4a4.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 23, 2020

Test build #124418 has finished for PR 28863 at commit d698fea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public final class MapOutputCommitMessage
  • sealed trait LogisticRegressionSummary extends ClassificationSummary
  • class _ClassificationSummary(JavaWrapper):
  • class _TrainingSummary(JavaWrapper):
  • class _BinaryClassificationSummary(_ClassificationSummary):
  • class LogisticRegressionSummary(_ClassificationSummary):
  • class LogisticRegressionTrainingSummary(LogisticRegressionSummary, _TrainingSummary):
  • class BinaryLogisticRegressionSummary(_BinaryClassificationSummary,
  • case class Hour(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Minute(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • case class Second(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField
  • trait GetDateField extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant
  • case class DayOfYear(child: Expression) extends GetDateField
  • case class Year(child: Expression) extends GetDateField
  • case class YearOfWeek(child: Expression) extends GetDateField
  • case class Quarter(child: Expression) extends GetDateField
  • case class Month(child: Expression) extends GetDateField
  • case class DayOfMonth(child: Expression) extends GetDateField
  • case class DayOfWeek(child: Expression) extends GetDateField
  • case class WeekDay(child: Expression) extends GetDateField
  • case class WeekOfYear(child: Expression) extends GetDateField
  • case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan]
  • class StateStoreConf(

Member

@maropu maropu left a comment


@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Jun 27, 2020

Test build #124569 has finished for PR 28863 at commit e16b4a4.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jun 27, 2020

Test build #124573 has finished for PR 28863 at commit e16b4a4.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jun 27, 2020

Test build #124576 has finished for PR 28863 at commit e16b4a4.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 29, 2020

Test build #124640 has finished for PR 28863 at commit 461eed0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

retest this please

@dongjoon-hyun
Member

Retest this please.

@dongjoon-hyun
Member

Hi, @gaborgsomogyi. Is OracleKrbIntegrationSuite missing here?

@maropu
Member

maropu commented Jun 30, 2020

retest this please

@maropu
Member

maropu commented Jun 30, 2020

(It seems Jenkins is sleeping now...)

@gaborgsomogyi
Contributor Author

gaborgsomogyi commented Jun 30, 2020

@dongjoon-hyun Yes. I spent almost a month in total trying to make it work with MiniKDC, without success. Please see my comment above. I haven't found any clear statement that it should work, so I gave up. If somebody can make it work, or once Active Directory becomes available in Docker, we can add it.

@dongjoon-hyun
Member

dongjoon-hyun commented Jun 30, 2020

@gaborgsomogyi .

  • +1 for adding that important content to the "How was this patch tested?" section of the PR description. Your investigation and decision are worth recording in the commit log. In most cases, only commit logs prove your achievements and contributions and make people trust you.
  • The mention of existing integration tests (especially OracleIntegrationSuite) seems a little misleading. Do they actually test this patch?

@SparkQA

SparkQA commented Jun 30, 2020

Test build #124667 has finished for PR 28863 at commit 461eed0.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

@dongjoon-hyun Oh gosh! OracleIntegrationSuite is a leftover from the original PR, which included the driver upgrade.

@gaborgsomogyi
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 30, 2020

Test build #124681 has finished for PR 28863 at commit 461eed0.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @gaborgsomogyi and @maropu .
I tested the dependency and the new test suite locally.
Merged to master for Apache Spark 3.1.0.

@gaborgsomogyi
Contributor Author

@dongjoon-hyun thank you for taking care of this and for suggestions like your last comment. I'll keep them in mind in future contributions.

@MaxGekk
Member

MaxGekk commented Jun 30, 2020

Could this have caused the build failure? #28959 (comment)

@dongjoon-hyun
Member

Thanks, @MaxGekk and @jiangxb1987, I'm looking at this. It's weird because the GitHub Action passed.

6 participants