[SPARK-33635][SS] Adjust the order of check in KafkaTokenUtil.needTokenUpdate to remedy perf regression #31056

HeartSaVioR · 2021-01-06T04:38:45Z

What changes were proposed in this pull request?

This PR proposes to adjust the order of the checks in KafkaTokenUtil.needTokenUpdate so that short-circuit evaluation applies in the non-delegation-token cases (insecure, and secured without a delegation token), which largely remedies the performance regression.
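To illustrate why reordering alone helps, here is a minimal Python sketch. It is not Spark's Scala code. The functions `need_token_update_before`, `need_token_update_after` and `is_service_enabled` are hypothetical stand-ins for KafkaTokenUtil.needTokenUpdate and HadoopDelegationTokenManager.isServiceEnabled. The sketch only models the guard condition, not the rest of the real method. The point is that when the cheap checks come first, `and` short-circuits in the non-delegation-token cases and the expensive check never runs.

```python
# Hypothetical sketch (not Spark's code): the order of conditions in a
# short-circuiting `and` decides whether an expensive check runs at all.

calls = {"expensive": 0}


def is_service_enabled():
    """Stand-in for an expensive check such as
    HadoopDelegationTokenManager.isServiceEnabled (assumed costly)."""
    calls["expensive"] += 1
    return True


def need_token_update_before(cluster_config, params):
    # Expensive check first: it runs on every call, even when no
    # delegation token is in use.
    return (is_service_enabled()
            and cluster_config is not None
            and "sasl.jaas.config" in params)


def need_token_update_after(cluster_config, params):
    # Cheap checks first: with no token cluster config (insecure, or
    # secured without a delegation token), `and` short-circuits and the
    # expensive check is skipped.
    return (cluster_config is not None
            and "sasl.jaas.config" in params
            and is_service_enabled())


if __name__ == "__main__":
    params = {}  # no JAAS config: the non-delegation-token case
    for _ in range(1000):
        need_token_update_before(None, params)
    print("before:", calls["expensive"])  # 1000
    calls["expensive"] = 0
    for _ in range(1000):
        need_token_update_after(None, params)
    print("after:", calls["expensive"])  # 0
```

Both versions return the same result for every input. Only the number of expensive calls differs, so the reorder is behavior-preserving.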

Why are the changes needed?

There is a serious performance regression from Spark 2.4 to Spark 3.0 on the read path of the Kafka data source.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually ran a reproducer (https://github.com/codegorillauk/spark-kafka-read, modified to just count records instead of writing to a Kafka topic) and measured the time.

The branch applying the change, with measurement added:

https://github.com/HeartSaVioR/spark/commits/debug-SPARK-33635-v3.0.1

The branch with only the measurement added:

https://github.com/HeartSaVioR/spark/commits/debug-original-ver-SPARK-33635-v3.0.1

The result (before the fix):

count: 10280000
Took 41.634007047 secs

21/01/06 13:16:07 INFO KafkaDataConsumer: debug ver. 17-original
21/01/06 13:16:07 INFO KafkaDataConsumer: Total time taken to retrieve: 82118 ms

The result (after the fix):

count: 10280000
Took 7.964058475 secs

21/01/06 13:08:22 INFO KafkaDataConsumer: debug ver. 17
21/01/06 13:08:22 INFO KafkaDataConsumer: Total time taken to retrieve: 987 ms
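Here is a worked calculation of the improvement. It uses only the figures reported above and is arithmetic on the posted numbers, not an additional measurement:

```python
# Speedup ratios computed from the before/after numbers reported above.
before_wall_secs = 41.634007047
after_wall_secs = 7.964058475
before_retrieve_ms = 82118
after_retrieve_ms = 987

wall_speedup = before_wall_secs / after_wall_secs
retrieve_speedup = before_retrieve_ms / after_retrieve_ms

print(f"wall-clock speedup: {wall_speedup:.1f}x")         # 5.2x
print(f"retrieve-time speedup: {retrieve_speedup:.1f}x")  # 83.2x
```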

HeartSaVioR · 2021-01-06T04:43:44Z

I made only the smallest change, as a short-term solution (the problem will persist for end users using Kafka delegation tokens), because the Spark 3.1.0 RC vote is in progress and I don't want to delay the release too much.

Ideally we would land a long-term solution in time (making the overhead of HadoopDelegationTokenManager.isServiceEnabled minor, or drastically reducing how often the check runs), but I just wanted to be safe.

HeartSaVioR · 2021-01-06T04:56:50Z

Further optimization depends on whether SparkConf can "change" on the fly. If we don't allow that (at least for Kafka/token-related configuration), we only need to call HadoopDelegationTokenManager.isServiceEnabled once per JVM and cache the result, instead of calling it on every get.

HeartSaVioR · 2021-01-06T05:05:33Z

cc. @tdas @zsxwing @jose-torres @viirya @gaborgsomogyi @xuanyuanking

HeartSaVioR · 2021-01-06T05:10:04Z

In addition, the "Total time taken to retrieve" value measures the pure overhead compared to Spark 2.4. However, I ran the application on my local dev machine, where parallelism is not fully enabled (just local[3]), while also doing other work in parallel. The value should be much smaller (probably trivial) in a properly isolated test environment with enough resources for full parallelism.

viirya

As a quick fix, this looks good since it just reorders the conditions. Even if we deal with a long-term solution later, I think there is no harm in having the quick fix now.

SparkQA · 2021-01-06T05:22:53Z

Test build #133711 has finished for PR 31056 at commit f5decdd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-01-06T05:57:15Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38299/

dongjoon-hyun

+1, LGTM. Thank you, @HeartSaVioR and @viirya. I agree with this fix.
Merged to master/3.1/3.0.

Closes #31056 from HeartSaVioR/SPARK-33635.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit fa93090)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

dongjoon-hyun · 2021-01-06T06:01:22Z

cc @HyukjinKwon since he is the release manager of Spark 3.1.0.

SparkQA · 2021-01-06T06:27:50Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38299/

gaborgsomogyi

Late LGTM.

gaborgsomogyi · 2021-01-06T10:36:42Z

Since SparkConf is static, we can cache the values calculated the first time. I've started working on the long-term solution.

viirya approved these changes Jan 6, 2021

dongjoon-hyun approved these changes Jan 6, 2021

dongjoon-hyun closed this in fa93090 Jan 6, 2021

gaborgsomogyi reviewed Jan 6, 2021

xuanyuanking mentioned this pull request Jan 21, 2021

[SPARK-34090][SS]Cache HadoopDelegationTokenManager.isServiceEnabled result used in KafkaTokenUtil.needTokenUpdate #31154

Closed
