
[SPARK-48063][CORE] Enable spark.stage.ignoreDecommissionFetchFailure by default #46308

Closed

@dongjoon-hyun (Member) commented Apr 30, 2024

### What changes were proposed in this pull request?

This PR aims to enable `spark.stage.ignoreDecommissionFetchFailure` by default in Apache Spark 4.0.0 while keeping `spark.scheduler.maxRetainedRemovedDecommissionExecutors=0` unchanged, so that a user can adopt this feature by setting only one configuration, `spark.scheduler.maxRetainedRemovedDecommissionExecutors`.
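
For illustration, a minimal sketch of enabling the feature under the new default; the app name and the cache size `10` are arbitrary example values, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

// With spark.stage.ignoreDecommissionFetchFailure now true by default,
// enabling the feature only requires raising the removed-executor cache
// size above its default of 0.
val spark = SparkSession.builder()
  .appName("decommission-aware-app") // illustrative name
  .config("spark.scheduler.maxRetainedRemovedDecommissionExecutors", "10")
  .getOrCreate()
```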

### Why are the changes needed?

This feature was added in Apache Spark 3.4.0 via SPARK-40481 and SPARK-40979 and has been used for two years in production to support executor decommissioning:
- apache#37924
- apache#38441

### Does this PR introduce _any_ user-facing change?

No, because `spark.scheduler.maxRetainedRemovedDecommissionExecutors` is still `0`.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member Author)

Could you review this when you have some time, @huaxingao?
Technically, this removes one step from enabling the feature, but the feature itself remains disabled because the executor cache size is 0 by default.
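
A hedged before/after sketch of that "one step" (the value `10` is illustrative):

```scala
import org.apache.spark.SparkConf

// Before Spark 4.0.0, both settings were needed to use the feature:
//   .set("spark.stage.ignoreDecommissionFetchFailure", "true")
//   .set("spark.scheduler.maxRetainedRemovedDecommissionExecutors", "10")
// From Spark 4.0.0, the first is true by default, so only the cache
// size (default 0, which keeps the feature effectively off) remains:
val conf = new SparkConf()
  .set("spark.scheduler.maxRetainedRemovedDecommissionExecutors", "10")
```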

@huaxingao (Contributor) left a comment

LGTM. Thanks for the PR, @dongjoon-hyun.

@dongjoon-hyun (Member Author)

Thank you, @huaxingao!

@dongjoon-hyun (Member Author)

Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun deleted the SPARK-48063 branch on April 30, 2024 at 22:19
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
[SPARK-48063][CORE] Enable `spark.stage.ignoreDecommissionFetchFailure` by default


Closes apache#46308 from dongjoon-hyun/SPARK-48063.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>