[SPARK-31021][SQL] Support MariaDB Kerberos login in JDBC connector #28019

gaborgsomogyi · 2020-03-25T09:53:52Z

What changes were proposed in this pull request?

When loading DataFrames from a JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection because they lack a Kerberos ticket or the ability to generate one.

This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environments where exposing simple authentication access is not an option due to IT policy.

In this PR I've added MariaDB support (other supported databases will come in later PRs).

What this PR contains:

Introduced SecureConnectionProvider and added basic secure functionalities
Added MariaDBConnectionProvider
Added MariaDBConnectionProviderSuite
Added MariaDBKrbIntegrationSuite docker integration test
Added some missing code documentation
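
To give a feel for what the "basic secure functionalities" involve, here is a minimal sketch of a JAAS `Configuration` that serves a keytab-based Kerberos entry under one application name and delegates every other lookup to a parent configuration. It is not the code from this PR: the class name `KeytabJaasConfiguration`, the keytab path, and the principal are hypothetical, and the entry name `Krb5ConnectorContext` is assumed to be what the MariaDB driver looks up. Only JDK classes are used.

```scala
import java.util.{HashMap => JHashMap}
import javax.security.auth.login.{AppConfigurationEntry, Configuration}
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag

// Hypothetical sketch, not the PR's code: a JAAS configuration that answers
// lookups for one application entry with a keytab-based Krb5LoginModule setup
// and forwards all other lookups to an optional parent configuration.
class KeytabJaasConfiguration(
    parent: Configuration,
    appEntry: String,
    keytab: String,
    principal: String) extends Configuration {

  private val entry: AppConfigurationEntry = {
    val opts = new JHashMap[String, String]()
    opts.put("useTicketCache", "false") // do not rely on an existing ticket cache
    opts.put("useKeyTab", "true")       // log in from the keytab instead
    opts.put("keyTab", keytab)
    opts.put("principal", principal)
    new AppConfigurationEntry(
      "com.sun.security.auth.module.Krb5LoginModule",
      LoginModuleControlFlag.REQUIRED,
      opts)
  }

  override def getAppConfigurationEntry(name: String): Array[AppConfigurationEntry] =
    if (name == appEntry) Array(entry)
    else if (parent != null) parent.getAppConfigurationEntry(name)
    else null // JAAS convention: null means "no entry with this name"
}

object KeytabJaasConfigurationDemo {
  def main(args: Array[String]): Unit = {
    // Entry name, keytab path and principal are placeholders for illustration.
    val conf = new KeytabJaasConfiguration(
      null, "Krb5ConnectorContext", "/path/to/user.keytab", "user@EXAMPLE.COM")
    val found = conf.getAppConfigurationEntry("Krb5ConnectorContext")
    println(found.head.getOptions.get("principal"))
    println(conf.getAppConfigurationEntry("other") == null)
  }
}
```

Wrapping a parent configuration rather than replacing it keeps any JAAS entries that other components have already registered in the same JVM.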

Why are the changes needed?

Missing JDBC kerberos support.

Does this PR introduce any user-facing change?

Yes, users can now connect to MariaDB using Kerberos.
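
As a rough illustration of the user-facing side, the sketch below assembles the options a JDBC read against a kerberized MariaDB might carry. It assumes the `keytab` and `principal` option names documented for Spark's JDBC data source; the helper name `kerberosJdbcOptions`, the host, table, keytab path, and principal are all hypothetical. The block stays free of Spark dependencies so it runs on its own, and a comment shows how the map would be handed to a `SparkSession`.

```scala
object KerberosJdbcOptionsSketch {
  // Hypothetical helper: collects JDBC data source options for a Kerberos login.
  // `keytab` and `principal` are assumed to be the option names Spark's JDBC source reads.
  def kerberosJdbcOptions(
      url: String,
      table: String,
      keytab: String,
      principal: String): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> table,
      "keytab" -> keytab,
      "principal" -> principal)

  def main(args: Array[String]): Unit = {
    // All values below are placeholders for illustration.
    val opts = kerberosJdbcOptions(
      url = "jdbc:mariadb://db.example.com:3306/mydb",
      table = "my_table",
      keytab = "/path/to/user.keytab",
      principal = "user@EXAMPLE.COM")

    // With a SparkSession named `spark` in scope, the read would look like:
    //   val df = spark.read.format("jdbc").options(opts).load()
    opts.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```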

How was this patch tested?

Additional + existing unit tests
Additional + existing integration tests
Test on cluster manually

external/docker-integration-tests/pom.xml

SparkQA · 2020-03-25T12:50:37Z

Test build #120336 has finished for PR 28019 at commit ff04926.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class JDBCConfiguration(

gaborgsomogyi · 2020-03-25T12:54:04Z

Seems unrelated.

gaborgsomogyi · 2020-03-25T12:54:13Z

retest this please

gaborgsomogyi · 2020-03-25T13:02:54Z

Filed https://issues.apache.org/jira/browse/SPARK-31247

SparkQA · 2020-03-25T15:32:02Z

Test build #120353 has finished for PR 28019 at commit ff04926.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class JDBCConfiguration(

gaborgsomogyi · 2020-03-25T15:45:28Z

Seems unrelated.

gaborgsomogyi · 2020-03-25T15:45:42Z

retest this please

gaborgsomogyi · 2020-03-25T16:16:05Z

Filed https://issues.apache.org/jira/browse/SPARK-31252

SparkQA · 2020-03-25T20:00:39Z

Test build #120363 has finished for PR 28019 at commit ff04926.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class JDBCConfiguration(

gaborgsomogyi · 2020-03-26T06:49:05Z

Seems unrelated.

gaborgsomogyi · 2020-03-26T06:49:16Z

retest this please

HyukjinKwon · 2020-03-26T06:54:31Z

Hm, I think the tests have become considerably flaky lately. Yes, it might be best to file a JIRA for now.

SparkQA · 2020-03-26T07:05:01Z

Test build #120404 has finished for PR 28019 at commit ff04926.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class JDBCConfiguration(

HyukjinKwon · 2020-03-26T07:12:40Z

retest this please

SparkQA · 2020-03-26T10:21:23Z

Test build #120407 has finished for PR 28019 at commit ff04926.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class JDBCConfiguration(

gaborgsomogyi · 2020-03-26T10:51:57Z

Filed https://issues.apache.org/jira/browse/SPARK-31266 and https://issues.apache.org/jira/browse/SPARK-31267

gaborgsomogyi · 2020-03-26T10:52:12Z

cc @HeartSaVioR

HeartSaVioR

It'd be nice to leave guidance comments when you refactor something as well; that would spare reviewers from going through moved methods line by line (added vs. deleted) unnecessarily.

external/docker-integration-tests/pom.xml

HeartSaVioR

The code change looks good assuming the tests pass. Most of the change comes from removing duplication between Postgres and MariaDB, which totally makes sense and looks better.

I still can't run the actual tests, unfortunately. I'll give it a try, but it would be nice if someone with a better understanding of this area could help review as well.

...-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala

HeartSaVioR · 2020-03-29T11:59:17Z

...tegration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerKrbJDBCIntegrationSuite.scala

@@ -91,4 +98,66 @@ abstract class DockerKrbJDBCIntegrationSuite extends DockerJDBCIntegrationSuite
    logInfo(s"Created executable resource file: ${newEntry.getAbsolutePath}")
    newEntry
  }
+
+  override def dataPreparation(conn: Connection): Unit = {


FYI to further reviewers: this and the tests below are moved from PostgreKrbIntegrationSuite.

...a/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProvider.scala

HeartSaVioR · 2020-03-29T12:09:34Z

...la/org/apache/spark/sql/execution/datasources/jdbc/connection/SecureConnectionProvider.scala

+import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions
+import org.apache.spark.util.SecurityUtils
+
+private[jdbc] abstract class SecureConnectionProvider(driver: Driver, options: JDBCOptions)


FYI to further reviewers: methods in SecureConnectionProvider (both class and object) are moved from PostgresConnectionProvider.
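
The shared base class can be pictured roughly as follows. This is a simplified, dependency-free sketch of the idea rather than the code in this PR: `SecureProviderSketch`, `connect`, and the subclass names are hypothetical, both entry names are assumptions about the respective driver defaults, and the real providers work with `java.sql.Driver` and `JDBCOptions` instead of plain functions.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sketch of the refactoring idea: Kerberos setup common to all
// databases lives in one abstract class, and each database supplies only the
// JAAS entry name its driver is assumed to look up.
abstract class SecureProviderSketch {
  /** JAAS application entry name for this database's JDBC driver. */
  def appEntry: String

  private var configured = false

  /** Installs the security configuration once, then opens the connection. */
  final def connect[T](installConfig: String => Unit)(open: () => T): T = synchronized {
    if (!configured) {
      installConfig(appEntry)
      configured = true
    }
    open()
  }
}

class MariaDBProviderSketch extends SecureProviderSketch {
  override def appEntry: String = "Krb5ConnectorContext" // assumed MariaDB driver default
}

class PostgresProviderSketch extends SecureProviderSketch {
  override def appEntry: String = "pgjdbc" // assumed PostgreSQL driver default
}

object SecureProviderSketchDemo {
  def main(args: Array[String]): Unit = {
    val installed = ListBuffer.empty[String]
    val provider = new MariaDBProviderSketch
    // Two connections are opened, but the configuration is installed only once.
    provider.connect(name => { installed += name; () })(() => "connection-1")
    provider.connect(name => { installed += name; () })(() => "connection-2")
    println(installed.toList)
  }
}
```

Keeping the one-time setup in the base class means a new database only has to contribute its entry name, which is the kind of duplication the move out of the Postgres provider removes.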

HeartSaVioR · 2020-03-29T12:14:09Z

...org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProviderSuiteBase.scala

+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.execution.datasources.jdbc.{DriverRegistry, JDBCOptions}
+
+abstract class ConnectionProviderSuiteBase extends SparkFunSuite with BeforeAndAfterEach {


FYI to further reviewers: almost everything in ConnectionProviderSuiteBase is moved from PostgreConnectionProviderSuite.

SparkQA · 2020-03-30T11:30:44Z

Test build #120581 has finished for PR 28019 at commit 89f5ac9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gaborgsomogyi · 2020-03-31T11:23:10Z

cc @vanzin @dongjoon-hyun

gaborgsomogyi · 2020-04-01T08:44:35Z

While implementing the DB2 Kerberos part I realised that creating a new database is not essential for Kerberos testing, so I made this simplification in the last commit. Worth mentioning that I re-executed all the docker tests and they all passed.

SparkQA · 2020-04-01T11:51:26Z

Test build #120670 has finished for PR 28019 at commit 2bb6426.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gaborgsomogyi · 2020-04-01T14:12:53Z

external/docker-integration-tests/pom.xml

@@ -121,8 +121,8 @@
      <scope>test</scope>
    </dependency>
    <dependency>
-      <groupId>mysql</groupId>
-      <artifactId>mysql-connector-java</artifactId>
+      <groupId>org.mariadb.jdbc</groupId>


Changing from mysql to mariadb is needed because of this: https://stackoverflow.com/questions/52718788/how-to-read-data-from-mariadb-using-spark-java

vanzin

Only some minor things.

.../docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala

...-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala

...a/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProvider.scala

SparkQA · 2020-04-06T11:36:38Z

Test build #120865 has finished for PR 28019 at commit 2740a50.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

vanzin · 2020-04-09T16:18:13Z

Looks good, merging to master.

gaborgsomogyi · 2020-04-09T18:44:12Z

@vanzin many thanks for taking care!

Closes apache#28019 from gaborgsomogyi/SPARK-31021.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@apache.org>

[SPARK-31021][SQL] Support MariaDB Kerberos login in JDBC connector

ff04926

Review fixes

89f5ac9

Simplification since creating new database is not essential for kerberos testing

2bb6426

vanzin reviewed Apr 3, 2020

Review fix

2740a50

vanzin closed this in 1354d2d Apr 9, 2020
