
Conversation

@AngersZhuuuu (Contributor) commented Oct 23, 2020

What changes were proposed in this pull request?

Follow-up of #27831; original author: @chrysan.

Each request checks chunksBeingTransferred:

public long chunksBeingTransferred() {
  long sum = 0L;
  for (StreamState streamState : streams.values()) {
    sum += streamState.chunksBeingTransferred.get();
  }
  return sum;
}

which is called from checks like:

long chunksBeingTransferred = streamManager.chunksBeingTransferred();
if (chunksBeingTransferred >= maxChunksBeingTransferred) {
  logger.warn("The number of chunks being transferred {} is above {}, close the connection.",
    chunksBeingTransferred, maxChunksBeingTransferred);
  channel.close();
  return;
}

This traverses streams on every request, and fetching a data chunk accesses streams as well, which causes two problems:

  1. Repeated traversal of streams: the more streams there are, the longer each check takes.
  2. Lock contention on the ConcurrentHashMap streams.

In this PR, when maxChunksBeingTransferred is left at its default value, we skip computing chunksBeingTransferred entirely, since the limit is effectively disabled in that case. Users who set this configuration and hit the performance problem can also backport PR #27831.
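A minimal sketch of the resulting check, assuming the default of spark.shuffle.maxChunksBeingTransferred is Long.MAX_VALUE (illustrative only, not the exact diff):

// Illustrative sketch: only traverse the streams map when the limit is
// actually configured; with the default value the outer condition is false,
// so the expensive chunksBeingTransferred() call is skipped entirely.
if (maxChunksBeingTransferred < Long.MAX_VALUE) {
  long chunksBeingTransferred = streamManager.chunksBeingTransferred();
  if (chunksBeingTransferred >= maxChunksBeingTransferred) {
    logger.warn("The number of chunks being transferred {} is above {}, close the connection.",
      chunksBeingTransferred, maxChunksBeingTransferred);
    channel.close();
    return;
  }
}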

Why are the changes needed?

Speeds up the chunksBeingTransferred check and avoids lock contention on the streams map.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs.

@AngersZhuuuu (Contributor Author) commented Oct 23, 2020

ping @wangyum @dongjoon-hyun @HeartSaVioR @jiangxb1987 @Ngone51
With the patch, duration grows linearly with the number of streams. The results are in line with our expectations.

@wangyum (Member) commented Oct 23, 2020

Thank you for the numbers, @AngersZhuuuu. We have been running this change for more than half a year.

@SparkQA commented Oct 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34799/

@SparkQA commented Oct 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34799/

@SparkQA commented Oct 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34800/

@SparkQA commented Oct 23, 2020

Test build #130198 has finished for PR 30139 at commit 6857e4e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34800/

@SparkQA commented Oct 23, 2020

Test build #130199 has finished for PR 30139 at commit 72e7b87.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mridulm (Contributor) commented Oct 24, 2020

+CC @otterc

@dongjoon-hyun (Member) commented:
Could you add an empty commit whose authorship is the original author, @AngersZhuuuu? Then the Apache Spark merge script can give both of you authorship properly.

@AngersZhuuuu (Contributor Author) commented:
> Could you add an empty commit whose authorship is the original author, @AngersZhuuuu? Then the Apache Spark merge script can give both of you authorship properly.

Done. Thanks for mentioning this point.

@SparkQA commented Oct 25, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34841/

@SparkQA commented Oct 25, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34841/

@SparkQA commented Oct 25, 2020

Test build #130241 has finished for PR 30139 at commit 0654684.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mridulm (Contributor) left a comment:

I am concerned that the totalChunksBeingTransferred can diverge over time from the state of streams when there are concurrent updates. Either both should be updated within the same critical section, or we should carefully ensure there is no potential for divergence. Can you take another look and confirm this is not the case?

For example, can there be concurrent execution between chunkBeingSent, chunkSent and connectionTerminated? If yes, can we ensure the state remains consistent?

@AngersZhuuuu (Contributor Author) commented:
> I am concerned that the totalChunksBeingTransferred can diverge over time from the state of streams when there are concurrent updates. Either both should be updated within the same critical section, or we should carefully ensure there is no potential for divergence. Can you take another look and confirm this is not the case?
>
> For example, can there be concurrent execution between chunkBeingSent, chunkSent and connectionTerminated? If yes, can we ensure the state remains consistent?

In the original code we just use a ConcurrentHashMap (streams), and there is no strong locking mechanism there either. chunksBeingTransferred is only used for the check against the config:

long chunksBeingTransferred = streamManager.chunksBeingTransferred();
if (chunksBeingTransferred >= maxChunksBeingTransferred) {
  logger.warn("The number of chunks being transferred {} is above {}, close the connection.",
    chunksBeingTransferred, maxChunksBeingTransferred);
  channel.close();
  return;
}

IMO, we can add a lock to keep the value of totalChunksBeingTransferred strongly consistent, such as:
[screenshot of the proposed change]
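(The screenshot above is not reproduced here; a rough sketch of the kind of synchronized update being proposed, with an assumed method shape, might look like this:)

private void chunkBeingSent(long streamId) {
  StreamState streamState = streams.get(streamId);
  if (streamState != null) {
    // Update the per-stream counter and the global counter under one lock
    // so the two values cannot diverge.
    synchronized (totalChunksBeingTransferred) {
      streamState.chunksBeingTransferred.incrementAndGet();
      totalChunksBeingTransferred.incrementAndGet();
    }
  }
}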

And I have run the test in the description; this change has little effect on performance. WDYT?

@jiangxb1987 (Contributor) commented:
Yes, we should ensure the streamState and the totalChunksBeingTransferred are updated in sync. Other than that the PR looks good!


private final AtomicLong nextStreamId;
private final ConcurrentHashMap<Long, StreamState> streams;
private final AtomicLong totalChunksBeingTransferred = new AtomicLong(0);
A reviewer (Contributor) commented on the lines above:

nit: maybe rename to numChunksBeingTransferred? Because it's not accumulating all the chunks that are transferred in history.

@AngersZhuuuu (Contributor Author) replied:

> nit: maybe rename to numChunksBeingTransferred? Because it's not accumulating all the chunks that are transferred in history.

Updated

@otterc (Contributor) commented Oct 27, 2020

> IMO, we can add a lock to keep the value of totalChunksBeingTransferred strongly consistent, such as:
> [screenshot of the proposed change]
>
> And I have run the test in the description; this change has little effect on performance. WDYT?

This should have a considerable impact on performance when there are multiple open streams, because updates of different streams would lock on the single object totalChunksBeingTransferred. Isn't that the case?

@AngersZhuuuu (Contributor Author) commented Oct 27, 2020

> This should have a considerable impact on performance when there are multiple open streams, because updates of different streams would lock on the single object totalChunksBeingTransferred. Isn't that the case?

Since totalChunksBeingTransferred is atomic, its updates already contend with each other, so adding a lock around totalChunksBeingTransferred won't have much extra impact. And we can keep strong consistency by updating the streamState and the totalChunksBeingTransferred in sync. I also ran the test above and can't see any noticeable effect.

@otterc (Contributor) commented Oct 27, 2020

> > This should have a considerable impact on performance when there are multiple open streams, because updates of different streams would lock on the single object totalChunksBeingTransferred. Isn't that the case?
>
> Since totalChunksBeingTransferred is atomic, its updates already contend with each other, so adding a lock around totalChunksBeingTransferred won't have much extra impact. And we can keep strong consistency by updating the streamState and the totalChunksBeingTransferred in sync.

Hmmm. Every update to chunkSent and chunkBeingSent will compete for the lock on the totalChunksBeingTransferred object if we add synchronized (totalChunksBeingTransferred). This would increase the time for these operations. This would mean that to speed up chunksBeingTransferred, we are increasing the time of updates to streamState.

@AngersZhuuuu (Contributor Author) commented:
> Yes, we should ensure the streamState and the totalChunksBeingTransferred are updated in sync. Other than that the PR looks good!

How about the current change? It doesn't use AtomicLong but uses synchronized to keep strong consistency. The test result is:

OneForOneStreamManager fetch data duration test:
Stream Size     Max        Min       Avg
10000           1796       187       497.5
50000           4214       1295      2267.9
100000          10635      3643      5800.3

Process finished with exit code 0

@AngersZhuuuu (Contributor Author) commented Oct 27, 2020

> Hmmm. Every update to chunkSent and chunkBeingSent will compete for the lock on the totalChunksBeingTransferred object if we add synchronized (totalChunksBeingTransferred). This would increase the time for these operations. This would mean that to speed up chunksBeingTransferred, we are increasing the time of updates to streamState.

I know that, but as @jiangxb1987 and @mridulm mentioned, we need to ensure the streamState and the totalChunksBeingTransferred are updated in sync, and adding this lock is a strong guarantee. The work inside the critical section is very fast, so the impact is not significant. Also, it's a huge leap in performance compared to what it was before.

@AngersZhuuuu (Contributor Author) commented:
> Hmmm. Every update to chunkSent and chunkBeingSent will compete for the lock on the totalChunksBeingTransferred object

We remove a lot of contention on streams and only add a very quick lock on numChunksBeingTransferred.

@HeartSaVioR (Contributor) commented Oct 27, 2020

Sorry, but your latest change doesn't actually lock properly. Long is immutable, and you always replace the object when you do the calculation and assign the result to the field, which is also used as the lock. Your best bet would be to change it to a primitive long (to avoid boxing/unboxing) and have a separate Object field for locking.
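A minimal sketch of the suggested pattern, a primitive long guarded by a dedicated lock object (names and method shapes are illustrative, not the PR's final code):

private final Object lock = new Object();
private long numChunksBeingTransferred = 0L;

public void chunkBeingSent(long streamId) {
  synchronized (lock) {  // the lock object itself is never reassigned
    numChunksBeingTransferred++;
    // ... update the per-stream state here as well ...
  }
}

public void chunkSent(long streamId) {
  synchronized (lock) {
    numChunksBeingTransferred--;
    // ... update the per-stream state here as well ...
  }
}

public long chunksBeingTransferred() {
  synchronized (lock) {
    return numChunksBeingTransferred;
  }
}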

@AngersZhuuuu (Contributor Author) commented:
> Sorry, but your latest change doesn't actually lock properly. Long is immutable, and you always replace the object when you do the calculation and assign the result to the field, which is also used as the lock. Your best bet would be to change it to a primitive long and have a separate Object field for locking.

A big mistake on my side, thanks for your suggestion.

@SparkQA commented Nov 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35388/

@SparkQA commented Nov 9, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35388/

@SparkQA commented Nov 9, 2020

Test build #130779 has finished for PR 30139 at commit b07b999.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor) commented:
I see this now only skips calculating chunksBeingTransferred when the config is at its default value. This is far from the original approach, so could you please update the PR title and description based on the new change?

IMHO this PR no longer fixes SPARK-31069, but if there's no good idea for addressing the root issue, it'd be OK to keep it associated with SPARK-31069. If someone hits the issue with a non-default config, it can be filed again and we can work on the new JIRA issue.

@AngersZhuuuu changed the title [SPARK-31069][CORE] high cpu caused by chunksBeingTransferred in external shuffle service [SPARK-31069][CORE] Avoid repeat compute chunksBeingTransferred cause high cpu cost in external shuffle service when maxChunksBeingTransferred use default value. Nov 13, 2020
@AngersZhuuuu (Contributor Author) commented:
> I see this now only skips calculating chunksBeingTransferred when the config is at its default value. This is far from the original approach, so could you please update the PR title and description based on the new change?
>
> IMHO this PR no longer fixes SPARK-31069, but if there's no good idea for addressing the root issue, it'd be OK to keep it associated with SPARK-31069. If someone hits the issue with a non-default config, it can be filed again and we can work on the new JIRA issue.

Yea, updated, and I added guidance for users who need the original PR #27831.

@mridulm (Contributor) commented Nov 13, 2020

@AngersZhuuuu To clarify, the updated behavior of the PR is that by default we don't incur the cost of computing chunksBeingTransferred, since the default value effectively disables maxChunksBeingTransferred. But if maxChunksBeingTransferred is configured, the performance characteristics are the same as before.

I am fine with the change; ideally we should fix the behavior even when maxChunksBeingTransferred is configured, but that can be follow-up work.

@AngersZhuuuu (Contributor Author) commented:
> I am fine with the change; ideally we should fix the behavior even when maxChunksBeingTransferred is configured, but that can be follow-up work.

Yea, we need to think about how to guarantee both performance and consistent execution with minimal overhead as follow-up work. The current change will solve most users' problems.

@mridulm (Contributor) commented Nov 14, 2020

+CC @srowen, @jiangxb1987, @cloud-fan
I will leave it open till Monday in case others want to comment, and then merge.

@mridulm (Contributor) commented Nov 14, 2020

+CC @otterc, @wangyum, @HeartSaVioR, @Ngone51 who also reviewed this change.

@HeartSaVioR (Contributor) left a comment:

+1, looks OK to me.

It'd be ideal if we could find a good way we all agree on to resolve the original problem, but it doesn't look that easy.

@otterc (Contributor) left a comment:

Other than the nits, looks good to me.

long chunksBeingTransferred = streamManager.chunksBeingTransferred();
if (chunksBeingTransferred >= maxChunksBeingTransferred) {
logger.warn("The number of chunks being transferred {} is above {}, close the connection.",
chunksBeingTransferred, maxChunksBeingTransferred);
A reviewer (Contributor) commented on the lines above:

Nit: indentation should be 2 spaces

@AngersZhuuuu (Contributor Author) replied:

> Nit: indentation should be 2 spaces

Done

long chunksBeingTransferred = streamManager.chunksBeingTransferred();
if (chunksBeingTransferred >= maxChunksBeingTransferred) {
logger.warn("The number of chunks being transferred {} is above {}, close the connection.",
chunksBeingTransferred, maxChunksBeingTransferred);
A reviewer (Contributor) commented on the lines above:

Nit: indentation should be 2 spaces

@AngersZhuuuu (Contributor Author) replied:

> Nit: indentation should be 2 spaces

Done

@SparkQA commented Nov 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35690/

@SparkQA commented Nov 14, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35690/

@SparkQA commented Nov 14, 2020

Test build #131087 has finished for PR 30139 at commit fab5557.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu (Contributor Author) commented:
Any update?

@mridulm (Contributor) commented Nov 18, 2020

+CC @srowen, @HyukjinKwon. I merged the PR via ./dev/merge_spark_pr.py, but it failed with the output below [1].
When I look at the apache repo, the commit has made it through: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f8b95dddc1571194fd728d7e0c6de495895da99e

I had run a git fetch --all and git pull before merging. A subsequent pull showed no conflicts with local [2].

Any idea why there is a disconnect between the GitHub and apache repos? Thanks.

[1]

Merge complete (local ref PR_TOOL_MERGE_PR_30139_MASTER). Push to apache? (y/n): y
git push apache PR_TOOL_MERGE_PR_30139_MASTER:master
Username for 'https://git-wip-us.apache.org': mridulm80
Password for 'https://mridulm80@git-wip-us.apache.org': 
Enumerating objects: 26, done.
Counting objects: 100% (26/26), done.
Delta compression using up to 8 threads
Compressing objects: 100% (9/9), done.
Writing objects: 100% (14/14), 4.72 KiB | 4.72 MiB/s, done.
Total 14 (delta 5), reused 0 (delta 0)
remote: To git@github:apache/spark.git
remote:  ! [rejected]        f8b95dddc1571194fd728d7e0c6de495895da99e -> master (fetch first)
remote: error: failed to push some refs to 'git@github:apache/spark.git'
remote: hint: Updates were rejected because the remote contains work that you do
remote: hint: not have locally. This is usually caused by another repository pushing
remote: hint: to the same ref. You may want to first integrate the remote changes
remote: hint: (e.g., 'git pull ...') before pushing again.
remote: hint: See the 'Note about fast-forwards' in 'git push --help' for details.
remote: Syncing refs/heads/master...
remote: Could not sync with GitHub: 
remote: Sending notification emails to: [u'"commits@spark.apache.org" <commits@spark.apache.org>']
remote: Error running hook: /x1/gitbox/hooks/post-receive.d/01-sync-repo.py
To https://git-wip-us.apache.org/repos/asf/spark.git
   5e8549973dc..f8b95dddc15  PR_TOOL_MERGE_PR_30139_MASTER -> master
git rev-parse PR_TOOL_MERGE_PR_30139_MASTER
Restoring head pointer to master
git checkout master
Switched to branch 'master'
git branch
Deleting local branch PR_TOOL_MERGE_PR_30139
git branch -D PR_TOOL_MERGE_PR_30139
Deleting local branch PR_TOOL_MERGE_PR_30139_MASTER
git branch -D PR_TOOL_MERGE_PR_30139_MASTER
Pull request #30139 merged!
Merge hash: f8b95ddd

[2]

$ git pull apache-github master
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 13 (delta 5), reused 6 (delta 5), pack-reused 5
Unpacking objects: 100% (13/13), 1.63 KiB | 104.00 KiB/s, done.
From github.com:apache/spark
 * branch                    master     -> FETCH_HEAD
 + f8b95dddc15...7f3d99a8a5b master     -> origin/master  (forced update)
Updating 5e8549973dc..7f3d99a8a5b
Fast-forward
 python/pyspark/sql/functions.py                              | 4 ++--
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

@asfgit closed this in dd32f45 on Nov 18, 2020
@mridulm (Contributor) commented Nov 18, 2020

Scratch that; it was a concurrent merge issue at git, I have not seen this before :-)

@mridulm (Contributor) commented Nov 18, 2020

Thanks for fixing this, @AngersZhuuuu!

Thanks for the reviews @jiangxb1987, @HeartSaVioR, @otterc, @srowen, @cloud-fan

@HeartSaVioR (Contributor) commented:
I became a committer after the gitbox integration, and according to the guide, I guess we no longer need to set up the ASF git as a remote repo.

@mridulm (Contributor) commented Nov 18, 2020

@AngersZhuuuu Can you update the JIRA please? I am not sure if I have added your id as the assignee.

@mridulm (Contributor) commented Nov 18, 2020

My repo configs are from ages back, @HeartSaVioR :-)
I should perhaps update them ... good point.

@AngersZhuuuu (Contributor Author) commented:
> @AngersZhuuuu Can you update the JIRA please? I am not sure if I have added your id as the assignee.

The assignee is me now; does any other information need updating?
[screenshot of the JIRA issue showing the assignee]

@mridulm (Contributor) commented Nov 18, 2020

Thanks!

@HyukjinKwon (Member) commented:
Yeah, I migrated to gitbox completely IIRC. I remember there was a similar syncing issue before I had migrated. Now committers have to set up according to https://infra.apache.org/apache-github.html if I am not wrong. FYI, I believe we can now use gitbox and github interchangeably. This is my remote config, FYI:

git remote -v
...
apache	https://github.com/apache/spark.git (fetch)
apache	https://github.com/apache/spark.git (push)
apache-github	https://github.com/apache/spark.git (fetch)
apache-github	https://github.com/apache/spark.git (push)
...
origin	https://github.com/HyukjinKwon/spark.git (fetch)
origin	https://github.com/HyukjinKwon/spark.git (push)
...
upstream	https://github.com/apache/spark.git (fetch)
upstream	https://github.com/apache/spark.git (push)
...
