[SPARK-31021][SQL] Support MariaDB Kerberos login in JDBC connector #28019

Closed
wants to merge 4 commits into from

Conversation

gaborgsomogyi
Contributor

What changes were proposed in this pull request?

When loading DataFrames from a JDBC data source with Kerberos authentication, remote executors (in yarn-client, yarn-cluster, and similar modes) fail to establish a connection because they lack a Kerberos ticket or the ability to generate one.

This is a real issue when ingesting data from kerberized data sources (SQL Server, Oracle) in enterprise environments, where exposing simple authentication access is not an option because of IT policy.

In this PR I've added MariaDB support (other supported databases will come in later PRs).

What this PR contains:

  • Introduced SecureConnectionProvider and added basic secure functionalities
  • Added MariaDBConnectionProvider
  • Added MariaDBConnectionProviderSuite
  • Added MariaDBKrbIntegrationSuite docker integration test
  • Added some missing code documentation

Why are the changes needed?

JDBC Kerberos support is missing.

Does this PR introduce any user-facing change?

Yes, users can now connect to MariaDB using Kerberos.
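
For illustration, a Kerberos-secured read might be configured roughly as below. This is a minimal sketch, not code from this PR: it assumes the Spark JDBC data source options are named `keytab` and `principal`, and the helper `KerberosJdbcOptions`, the URL, the table name, the keytab path, and the principal are all hypothetical placeholders.

```scala
// Hypothetical helper (not part of this PR): collects the options for a
// Kerberos-secured JDBC read into one map.
object KerberosJdbcOptions {
  def build(
      url: String,
      table: String,
      keytab: String,
      principal: String): Map[String, String] = {
    // Both values are needed for a keytab-based login, so reject empty ones early.
    require(keytab.nonEmpty && principal.nonEmpty, "keytab and principal must both be set")
    Map(
      "url" -> url,
      "dbtable" -> table,
      "keytab" -> keytab,       // assumed option name; the file must be readable on every executor
      "principal" -> principal  // assumed option name
    )
  }
}

// Placeholder values, for illustration only.
val opts = KerberosJdbcOptions.build(
  url = "jdbc:mariadb://db.example.com:3306/exampledb",
  table = "example_table",
  keytab = "/path/to/user.keytab",
  principal = "user@EXAMPLE.COM")

// With a SparkSession named `spark` in scope, the read would then look like:
//   val df = spark.read.format("jdbc").options(opts).load()
```

Keeping the options in a plain map makes it easy to validate them before any executor tries to open a connection.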

How was this patch tested?

  • Additional + existing unit tests
  • Additional + existing integration tests
  • Test on cluster manually

@SparkQA

SparkQA commented Mar 25, 2020

Test build #120336 has finished for PR 28019 at commit ff04926.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCConfiguration(

@gaborgsomogyi
Contributor Author

Seems unrelated.

@gaborgsomogyi
Contributor Author

retest this please

@gaborgsomogyi
Contributor Author

Filed https://issues.apache.org/jira/browse/SPARK-31247

@SparkQA

SparkQA commented Mar 25, 2020

Test build #120353 has finished for PR 28019 at commit ff04926.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCConfiguration(

@gaborgsomogyi
Contributor Author

Seems unrelated.

@gaborgsomogyi
Contributor Author

retest this please

@gaborgsomogyi
Contributor Author

Filed https://issues.apache.org/jira/browse/SPARK-31252

@SparkQA

SparkQA commented Mar 25, 2020

Test build #120363 has finished for PR 28019 at commit ff04926.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCConfiguration(

@gaborgsomogyi
Contributor Author

Seems unrelated.

@gaborgsomogyi
Contributor Author

retest this please

@HyukjinKwon
Member

HyukjinKwon commented Mar 26, 2020

Hm, I think the tests have become considerably flaky lately. Yes, it might be best to file a JIRA for now.

@SparkQA

SparkQA commented Mar 26, 2020

Test build #120404 has finished for PR 28019 at commit ff04926.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCConfiguration(

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Mar 26, 2020

Test build #120407 has finished for PR 28019 at commit ff04926.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JDBCConfiguration(

@gaborgsomogyi
Contributor Author

cc @HeartSaVioR

Contributor

@HeartSaVioR HeartSaVioR left a comment

It'd be nice to add guidance comments when you refactor something as well; that would spare reviewers an unnecessary line-by-line (added vs. deleted) review of moved methods.

external/docker-integration-tests/pom.xml Outdated Show resolved Hide resolved
Contributor

@HeartSaVioR HeartSaVioR left a comment

The code change looks good, assuming the tests pass. The change mostly comes from removing duplication between Postgres and MariaDB, which makes total sense and looks better.

Unfortunately I still can't run the actual tests. I'll give it a try, but it would be nice if someone with a better understanding of this area could help review as well.

@@ -91,4 +98,66 @@ abstract class DockerKrbJDBCIntegrationSuite extends DockerJDBCIntegrationSuite
logInfo(s"Created executable resource file: ${newEntry.getAbsolutePath}")
newEntry
}

override def dataPreparation(conn: Connection): Unit = {
Contributor

FYI to further reviewers: this and the tests below were moved from PostgreKrbIntegrationSuite.

import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions
import org.apache.spark.util.SecurityUtils

private[jdbc] abstract class SecureConnectionProvider(driver: Driver, options: JDBCOptions)
Contributor

FYI to further reviewers: methods in SecureConnectionProvider (both class and object) are moved from PostgresConnectionProvider.
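
For readers unfamiliar with this area, the general technique behind a keytab-based secure connection provider can be sketched as follows. This is a hedged illustration and not the code moved in this PR: the class name `KeytabJaasConfiguration` and its parameters are hypothetical, and it only shows how a JAAS `Configuration` could serve a `Krb5LoginModule` entry for one application name while delegating every other lookup to the previously installed configuration.

```scala
import java.util.{HashMap => JHashMap}
import javax.security.auth.login.{AppConfigurationEntry, Configuration}

// Hypothetical sketch, not the class added by this PR.
class KeytabJaasConfiguration(
    parent: Configuration, // previously installed configuration; may be null
    appEntry: String,      // JAAS entry name the JDBC driver looks up (assumption: driver specific)
    keytab: String,
    principal: String) extends Configuration {

  // A single Krb5LoginModule entry that logs in from the keytab instead of the ticket cache.
  private val entry: AppConfigurationEntry = {
    val options = new JHashMap[String, String]()
    options.put("useTicketCache", "false")
    options.put("useKeyTab", "true")
    options.put("keyTab", keytab)
    options.put("principal", principal)
    new AppConfigurationEntry(
      "com.sun.security.auth.module.Krb5LoginModule",
      AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
      options)
  }

  // Serve our entry for the matching name and fall back to the parent otherwise.
  override def getAppConfigurationEntry(name: String): Array[AppConfigurationEntry] =
    if (name == appEntry) Array(entry)
    else if (parent != null) parent.getAppConfigurationEntry(name)
    else null
}

// Installing it changes JVM-wide state, so it is shown here only as a comment:
//   Configuration.setConfiguration(
//     new KeytabJaasConfiguration(Configuration.getConfiguration, entryName, keytab, principal))
```

Delegating to the parent keeps any JAAS entries that other components in the same JVM rely on.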

import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.execution.datasources.jdbc.{DriverRegistry, JDBCOptions}

abstract class ConnectionProviderSuiteBase extends SparkFunSuite with BeforeAndAfterEach {
Contributor

FYI to further reviewers: almost everything in ConnectionProviderSuiteBase is moved from PostgreConnectionProviderSuite.

@SparkQA

SparkQA commented Mar 30, 2020

Test build #120581 has finished for PR 28019 at commit 89f5ac9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

cc @vanzin @dongjoon-hyun

@gaborgsomogyi
Contributor Author

While implementing the DB2 Kerberos part, I realised that creating a new database is not essential for Kerberos testing, so I made this simplification in the last commit. Worth mentioning: I re-executed all the docker tests and all of them passed.

@SparkQA

SparkQA commented Apr 1, 2020

Test build #120670 has finished for PR 28019 at commit 2bb6426.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -121,8 +121,8 @@
 <scope>test</scope>
 </dependency>
 <dependency>
-<groupId>mysql</groupId>
-<artifactId>mysql-connector-java</artifactId>
+<groupId>org.mariadb.jdbc</groupId>
Contributor

@vanzin vanzin left a comment

Only some minor things.

@SparkQA

SparkQA commented Apr 6, 2020

Test build #120865 has finished for PR 28019 at commit 2740a50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Apr 9, 2020

Looks good, merging to master.

@vanzin vanzin closed this in 1354d2d Apr 9, 2020
@gaborgsomogyi
Contributor Author

@vanzin many thanks for taking care!

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020

Closes apache#28019 from gaborgsomogyi/SPARK-31021.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@apache.org>
5 participants