-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-31021][SQL] Support MariaDB Kerberos login in JDBC connector #28019
Conversation
Test build #120336 has finished for PR 28019 at commit
|
Seems unrelated. |
retest this please |
Test build #120353 has finished for PR 28019 at commit
|
Seems unrelated. |
retest this please |
Test build #120363 has finished for PR 28019 at commit
|
Seems unrelated. |
retest this please |
Hm, I think the tests became considerably flaky lately .. yes, might be best to file a JIRA for now ... |
Test build #120404 has finished for PR 28019 at commit
|
retest this please |
Test build #120407 has finished for PR 28019 at commit
|
cc @HeartSaVioR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to have guidance comments if you do refactor something as well - that would avoid review on moved method via line by line (added vs deleted) unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code change looks good assuming the tests pass - code change is majorly from removing possible deduplication between postgre and mariadb which totally makes sense and looks better.
I can't still run the actual tests unfortunately. I'll give a try, but it would be nice if someone in better understanding on this area can help reviewing as well.
...-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
@@ -91,4 +98,66 @@ abstract class DockerKrbJDBCIntegrationSuite extends DockerJDBCIntegrationSuite | |||
logInfo(s"Created executable resource file: ${newEntry.getAbsolutePath}") | |||
newEntry | |||
} | |||
|
|||
override def dataPreparation(conn: Connection): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI to further reviewers: this, and below tests are moved from PostgreKrbIntegrationSuite
.
...a/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProvider.scala
Outdated
Show resolved
Hide resolved
import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions | ||
import org.apache.spark.util.SecurityUtils | ||
|
||
private[jdbc] abstract class SecureConnectionProvider(driver: Driver, options: JDBCOptions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI to further reviewers: methods in SecureConnectionProvider (both class and object) are moved from PostgresConnectionProvider.
import org.apache.spark.SparkFunSuite | ||
import org.apache.spark.sql.execution.datasources.jdbc.{DriverRegistry, JDBCOptions} | ||
|
||
abstract class ConnectionProviderSuiteBase extends SparkFunSuite with BeforeAndAfterEach { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI to further reviewers: almost everything in ConnectionProviderSuiteBase is moved from PostgreConnectionProviderSuite.
Test build #120581 has finished for PR 28019 at commit
|
While I'm implementing DB2 kerberos part I've realised that creating new database is not essential for kerberos testing so I've made this simplification in the last commit. Worth to mention re-executed all the docker tests again and all passed. |
Test build #120670 has finished for PR 28019 at commit
|
@@ -121,8 +121,8 @@ | |||
<scope>test</scope> | |||
</dependency> | |||
<dependency> | |||
<groupId>mysql</groupId> | |||
<artifactId>mysql-connector-java</artifactId> | |||
<groupId>org.mariadb.jdbc</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing from mysql
to mariadb
is needed because of this: https://stackoverflow.com/questions/52718788/how-to-read-data-from-mariadb-using-spark-java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only some minor things.
.../docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
Outdated
Show resolved
Hide resolved
...-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala
Outdated
Show resolved
Hide resolved
...a/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProvider.scala
Outdated
Show resolved
Hide resolved
Test build #120865 has finished for PR 28019 at commit
|
Loks good, merging to master. |
@vanzin many thanks for taking care! |
### What changes were proposed in this pull request? When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it. This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues. In this PR I've added MariaDB support (other supported databases will come in later PRs). What this PR contains: * Introduced `SecureConnectionProvider` and added basic secure functionalities * Added `MariaDBConnectionProvider` * Added `MariaDBConnectionProviderSuite` * Added `MariaDBKrbIntegrationSuite` docker integration test * Added some missing code documentation ### Why are the changes needed? Missing JDBC kerberos support. ### Does this PR introduce any user-facing change? Yes, now user is able to connect to MariaDB using kerberos. ### How was this patch tested? * Additional + existing unit tests * Additional + existing integration tests * Test on cluster manually Closes apache#28019 from gaborgsomogyi/SPARK-31021. Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by: Marcelo Vanzin <vanzin@apache.org>
What changes were proposed in this pull request?
When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it.
This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues.
In this PR I've added MariaDB support (other supported databases will come in later PRs).
What this PR contains:
SecureConnectionProvider
and added basic secure functionalitiesMariaDBConnectionProvider
MariaDBConnectionProviderSuite
MariaDBKrbIntegrationSuite
docker integration testWhy are the changes needed?
Missing JDBC kerberos support.
Does this PR introduce any user-facing change?
Yes, now user is able to connect to MariaDB using kerberos.
How was this patch tested?