Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources #32344

Closed
wants to merge 4 commits into from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Apr 26, 2021

What changes were proposed in this pull request?

This PR proposes to introduce a new JDBC option refreshKrb5Config which allows to reflect the change of krb5.conf.

Why are the changes needed?

In the current master, JDBC datasources can't accept refreshKrb5Config which is defined in Krb5LoginModule.
So even if we change the krb5.conf after establishing a connection, the change will not be reflected.

The similar issue happens when we run multiple *KrbIntegrationSuites at the same time.
MiniKDC starts and stops every KerberosIntegrationSuite and different port number is recorded to krb5.conf.
Due to SecureConnectionProvider.JDBCConfiguration doesn't take refreshKrb5Config, KerberosIntegrationSuites except the first running one see the wrong port so those suites fail.
You can easily confirm with the following command.

build/sbt -Phive Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"

Does this PR introduce any user-facing change?

Yes. Users can set refreshKrb5Config to refresh krb5 relevant configuration.

How was this patch tested?

New test.

@github-actions github-actions bot added the SQL label Apr 26, 2021
@SparkQA
Copy link

SparkQA commented Apr 26, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42456/

@SparkQA
Copy link

SparkQA commented Apr 26, 2021

Test build #137934 has finished for PR 32344 at commit 1a2c472.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Configuration.setConfiguration(null)
withTempDir { dir =>
val dummyKrb5Conf = File.createTempFile("dummy", "krb5.conf", dir)
val origKrb5Conf = sys.props("java.security.krb5.conf")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use the pre-defined KRB5_CONF_PROP instead of java.security.krb5.conf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I've updated.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-35226][SQL] JDBC datasources should accept refreshKrb5Config parameter [SPARK-35226][SQL] Support refreshKrb5Config option in JDBC datasources Apr 26, 2021
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (with minor comment).
Thank you, @sarutak .

@dongjoon-hyun
Copy link
Member

Thank you for update.

@HyukjinKwon
Copy link
Member

cc @gaborgsomogyi FYI

@SparkQA
Copy link

SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42494/

@SparkQA
Copy link

SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42494/

@SparkQA
Copy link

SparkQA commented Apr 27, 2021

Test build #137974 has finished for PR 32344 at commit 5df750d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Copy link
Contributor

@HyukjinKwon thanks for pinging me.

I've had a look at the change and I think it looks good but can introduce a race. Let me share the use-case:

  • User sets refreshKrb5Config flag with Security context 1
  • User uses a JDBC connection provider
  • User modifies krb5.conf but JVM not yet realized that it must be reloaded
  • Spark authenticates here successfully
  • JVM loads the Security context 2 from the modified krb5.conf
  • Spark restores the previously saved Security context 1 here
  • The modified krb5.conf content just gone

This is just one example, the second one is when config update happens between security context modification and authentication. Such case JDBC authentication will fail temporarily.

I'm not against to add this but I think it's a must to mention somewhere that setting this flag can cause severe issues on running workloads. Seems like the doc change is missing, right?

@sarutak
Copy link
Member Author

sarutak commented Apr 27, 2021

@gaborgsomogyi Thanks for your suggestion. So, it seems better to note in the doc about the possibility of race condition.

@github-actions github-actions bot added the DOCS label Apr 28, 2021
@SparkQA
Copy link

SparkQA commented Apr 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42543/

@SparkQA
Copy link

SparkQA commented Apr 28, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42543/

@SparkQA
Copy link

SparkQA commented Apr 28, 2021

Test build #138024 has finished for PR 32344 at commit 600598e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@gaborgsomogyi gaborgsomogyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@sarutak sarutak closed this in 529b875 Apr 29, 2021
sarutak added a commit that referenced this pull request Apr 29, 2021
### What changes were proposed in this pull request?

This PR proposes to introduce a new JDBC option `refreshKrb5Config` which allows to reflect the change of `krb5.conf`.

### Why are the changes needed?

In the current master, JDBC datasources can't accept `refreshKrb5Config` which is defined in `Krb5LoginModule`.
So even if we change the `krb5.conf` after establishing a connection, the change will not be reflected.

The similar issue happens when we run multiple `*KrbIntegrationSuites` at the same time.
`MiniKDC` starts and stops every KerberosIntegrationSuite and different port number is recorded to `krb5.conf`.
Due to `SecureConnectionProvider.JDBCConfiguration` doesn't take `refreshKrb5Config`, KerberosIntegrationSuites except the first running one see the wrong port so those suites fail.
You can easily confirm with the following command.
```
build/sbt -Phive Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```
### Does this PR introduce _any_ user-facing change?

Yes. Users can set `refreshKrb5Config` to refresh krb5 relevant configuration.

### How was this patch tested?

New test.

Closes #32344 from sarutak/kerberos-refresh-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit 529b875)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
@sarutak
Copy link
Member Author

sarutak commented Apr 29, 2021

Merged to master and branch-3.1. Thanks all.

dongjoon-hyun pushed a commit that referenced this pull request May 23, 2021
…IntegrationSuite

### What changes were proposed in this pull request?

This PR fixes an test added in SPARK-35226 (#32344).

### Why are the changes needed?

`SELECT 1` seems non-valid query for DB2.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

DB2KrbIntegrationSuite passes on my laptop.

I also confirmed all the KrbIntegrationSuites pass with the following command.
```
build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```

Closes #32632 from sarutak/followup-SPARK-35226.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 23, 2021
…IntegrationSuite

### What changes were proposed in this pull request?

This PR fixes an test added in SPARK-35226 (#32344).

### Why are the changes needed?

`SELECT 1` seems non-valid query for DB2.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

DB2KrbIntegrationSuite passes on my laptop.

I also confirmed all the KrbIntegrationSuites pass with the following command.
```
build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```

Closes #32632 from sarutak/followup-SPARK-35226.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 1a43415)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
### What changes were proposed in this pull request?

This PR proposes to introduce a new JDBC option `refreshKrb5Config` which allows to reflect the change of `krb5.conf`.

### Why are the changes needed?

In the current master, JDBC datasources can't accept `refreshKrb5Config` which is defined in `Krb5LoginModule`.
So even if we change the `krb5.conf` after establishing a connection, the change will not be reflected.

The similar issue happens when we run multiple `*KrbIntegrationSuites` at the same time.
`MiniKDC` starts and stops every KerberosIntegrationSuite and different port number is recorded to `krb5.conf`.
Due to `SecureConnectionProvider.JDBCConfiguration` doesn't take `refreshKrb5Config`, KerberosIntegrationSuites except the first running one see the wrong port so those suites fail.
You can easily confirm with the following command.
```
build/sbt -Phive Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```
### Does this PR introduce _any_ user-facing change?

Yes. Users can set `refreshKrb5Config` to refresh krb5 relevant configuration.

### How was this patch tested?

New test.

Closes apache#32344 from sarutak/kerberos-refresh-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(cherry picked from commit 529b875)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…IntegrationSuite

### What changes were proposed in this pull request?

This PR fixes an test added in SPARK-35226 (apache#32344).

### Why are the changes needed?

`SELECT 1` seems non-valid query for DB2.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

DB2KrbIntegrationSuite passes on my laptop.

I also confirmed all the KrbIntegrationSuites pass with the following command.
```
build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```

Closes apache#32632 from sarutak/followup-SPARK-35226.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 1a43415)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants