Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start #38901

Closed
wants to merge 4 commits into from

Conversation

pan3793
Copy link
Member

@pan3793 pan3793 commented Dec 4, 2022

What changes were proposed in this pull request?

Fix the condition for judging Netty prefer direct memory on executor start, the logic should match org.apache.spark.network.client.TransportClientFactory.

Why are the changes needed?

The check logical was added in SPARK-27991, the original intention is to avoid potential Netty OOM issue when Netty uses direct memory to consume shuffle data, but the condition is not sufficient, this PR updates the logic to match org.apache.spark.network.client.TransportClientFactory

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual testing.

@github-actions github-actions bot added the CORE label Dec 4, 2022
@pan3793
Copy link
Member Author

pan3793 commented Dec 4, 2022

cc @Ngone51 @mridulm

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@pan3793
Copy link
Member Author

pan3793 commented Dec 5, 2022

@srowen code and comment have been updated.

@srowen
Copy link
Member

srowen commented Dec 5, 2022

It's a bigger change, maybe someone with more understanding of the code can evaluate this

@pan3793 pan3793 changed the title [SPARK-41376][CORE] Executor netty direct memory check should respect spark.shuffle.io.preferDirectBufs [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start Dec 5, 2022
@mridulm
Copy link
Contributor

mridulm commented Dec 5, 2022

+CC @Ngone51 (who last updated this) and @cloud-fan (who merged the commit).

val preferDirectBufsForSharedByteBufAllocators =
env.conf.getBoolean("spark.network.io.preferDirectBufs", true)
val preferDirectBufs =
env.conf.getBoolean("spark.shuffle.io.preferDirectBufs", true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see where is this conf accessed? I didn't find it used anywhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a combination of ("spark", module, "io.preferDirectBufs"), for shuffle client, the module is "shuffle", see org.apache.spark.network.util.TransportConf

@@ -85,7 +85,19 @@ private[spark] class CoarseGrainedExecutorBackend(

logInfo("Connecting to driver: " + driverUrl)
try {
if (PlatformDependent.directBufferPreferred() &&
// The following logic originally comes from the constructor of
Copy link
Member

@Ngone51 Ngone51 Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add this logic into a NettyUitls function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid not, NettyUtils located at network-common, it can not access SparkConf which in core

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about NettyUtils just accept boolean inputs, not config

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, @Ngone51 would you please take a look again?

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK; can we or should we share this logic elsewhere? can it be called at other sites or is this the only place this check happens?

@pan3793
Copy link
Member Author

pan3793 commented Dec 7, 2022

is this the only place this check happens?

I think yes.

The following logic looks similar but not same, I think the existing code is clear enough.

if (conf.sharedByteBufAllocators()) {
this.pooledAllocator = NettyUtils.getSharedPooledByteBufAllocator(
conf.preferDirectBufsForSharedByteBufAllocators(), true /* allowCache */);
} else {
this.pooledAllocator = NettyUtils.createPooledByteBufAllocator(
conf.preferDirectBufs(), true /* allowCache */, conf.serverThreads());
}

if (conf.sharedByteBufAllocators()) {
this.pooledAllocator = NettyUtils.getSharedPooledByteBufAllocator(
conf.preferDirectBufsForSharedByteBufAllocators(), false /* allowCache */);
} else {
this.pooledAllocator = NettyUtils.createPooledByteBufAllocator(
conf.preferDirectBufs(), false /* allowCache */, conf.clientThreads());
}

@srowen
Copy link
Member

srowen commented Dec 8, 2022

Merged to master

@srowen srowen closed this in e6be300 Dec 8, 2022
@dongjoon-hyun
Copy link
Member

Thank you all.
If this is caused by SPARK-27991 at Spark 3.2.0, do we need to backport to branch-3.3 and 3.2?

@pan3793
Copy link
Member Author

pan3793 commented Dec 8, 2022

Thanks @dongjoon-hyun for checking, I think you are right, it should be ported to 3.2/3.3

@dongjoon-hyun
Copy link
Member

Could you make a backporting PR, @pan3793 ?

pan3793 added a commit to pan3793/spark that referenced this pull request Dec 8, 2022
… executor start

### What changes were proposed in this pull request?

Fix the condition for judging Netty prefer direct memory on executor start, the logic should match `org.apache.spark.network.client.TransportClientFactory`.

### Why are the changes needed?

The check logical was added in SPARK-27991, the original intention is to avoid potential Netty OOM issue when Netty uses direct memory to consume shuffle data, but the condition is not sufficient, this PR updates the logic to match `org.apache.spark.network.client.TransportClientFactory`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing.

Closes apache#38901 from pan3793/SPARK-41376.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
pan3793 added a commit to pan3793/spark that referenced this pull request Dec 8, 2022
… executor start

### What changes were proposed in this pull request?

Fix the condition for judging Netty prefer direct memory on executor start, the logic should match `org.apache.spark.network.client.TransportClientFactory`.

### Why are the changes needed?

The check logical was added in SPARK-27991, the original intention is to avoid potential Netty OOM issue when Netty uses direct memory to consume shuffle data, but the condition is not sufficient, this PR updates the logic to match `org.apache.spark.network.client.TransportClientFactory`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing.

Closes apache#38901 from pan3793/SPARK-41376.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
@pan3793
Copy link
Member Author

pan3793 commented Dec 8, 2022

@dongjoon-hyun I opened #38981 (for branch-3.3) and #38982 (for branch-3.2), please take a look.

dongjoon-hyun pushed a commit that referenced this pull request Dec 8, 2022
…ic on executor start

### What changes were proposed in this pull request?

Backport #38901 to branch-3.3.

Fix the condition for judging Netty prefer direct memory on executor start, the logic should match `org.apache.spark.network.client.TransportClientFactory`.

### Why are the changes needed?

The check logical was added in SPARK-27991, the original intention is to avoid potential Netty OOM issue when Netty uses direct memory to consume shuffle data, but the condition is not sufficient, this PR updates the logic to match `org.apache.spark.network.client.TransportClientFactory`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing.

Closes #38981 from pan3793/SPARK-41376-3.3.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Dec 9, 2022
…ic on executor start

### What changes were proposed in this pull request?

Backport #38901 to branch-3.2.

Fix the condition for judging Netty prefer direct memory on executor start, the logic should match `org.apache.spark.network.client.TransportClientFactory`.

### Why are the changes needed?

The check logical was added in SPARK-27991, the original intention is to avoid potential Netty OOM issue when Netty uses direct memory to consume shuffle data, but the condition is not sufficient, this PR updates the logic to match `org.apache.spark.network.client.TransportClientFactory`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing.

Closes #38982 from pan3793/SPARK-41376-3.2.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
… executor start

### What changes were proposed in this pull request?

Fix the condition for judging Netty prefer direct memory on executor start, the logic should match `org.apache.spark.network.client.TransportClientFactory`.

### Why are the changes needed?

The check logical was added in SPARK-27991, the original intention is to avoid potential Netty OOM issue when Netty uses direct memory to consume shuffle data, but the condition is not sufficient, this PR updates the logic to match `org.apache.spark.network.client.TransportClientFactory`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing.

Closes apache#38901 from pan3793/SPARK-41376.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…ic on executor start

### What changes were proposed in this pull request?

Backport apache#38901 to branch-3.2.

Fix the condition for judging Netty prefer direct memory on executor start, the logic should match `org.apache.spark.network.client.TransportClientFactory`.

### Why are the changes needed?

The check logical was added in SPARK-27991, the original intention is to avoid potential Netty OOM issue when Netty uses direct memory to consume shuffle data, but the condition is not sufficient, this PR updates the logic to match `org.apache.spark.network.client.TransportClientFactory`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing.

Closes apache#38982 from pan3793/SPARK-41376-3.2.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants