[SPARK-26957][SCHEDULER] Config properties for spark dynamic scheduler pools #23865
Conversation
merge in spark
    val DEFAULT_SCHEDULING_MODE = SchedulingMode.FIFO
    val DEFAULT_MINIMUM_SHARE = 0
    val DEFAULT_WEIGHT = 1
    val defaultSchedulingMode = SchedulingMode.withName(conf.get(SCHEDULER_DEFAULT_SCHEDULING_MODE))
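To make the intent of that diff concrete, here is a minimal sketch of reading all three default pool properties from a SparkConf instead of hardcoded constants, assuming it lives inside Spark's scheduler package (where SchedulingMode is visible). The helper object and the property-name strings are assumptions for illustration only; the PR defines the actual config entries.

    import org.apache.spark.SparkConf
    import org.apache.spark.scheduler.SchedulingMode

    // Illustrative helper only: read the default pool properties from
    // configuration rather than hardcoded constants. The property names
    // are assumptions, not necessarily the ones this PR introduces.
    object DefaultPoolConf {
      def read(conf: SparkConf): (SchedulingMode.SchedulingMode, Int, Int) = {
        val mode = SchedulingMode.withName(
          conf.get("spark.scheduler.pool.default.schedulingMode", "FIFO"))
        val minShare = conf.getInt("spark.scheduler.pool.default.minShare", 0)
        val weight = conf.getInt("spark.scheduler.pool.default.weight", 1)
        (mode, minShare, weight)
      }
    }

The fallback values passed to each getter match the previously hardcoded defaults, which is what keeps the change backwards compatible.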
This change looks good. If a user enables the Spark fair scheduler, the default pool always uses FIFO mode.
I don't know much about this part, but if you want a non-default pool, wouldn't you configure your own pool? Should the defaults be configurable rather than just a default?
The only way to configure your own pools is in an XML file loaded on startup. In our case we'd like the system to be more dynamic, rather than having to hardcode pools in config files. Spark already supports creating pools dynamically like this, but doesn't support configuring them with anything but the default parameters. I cover the specific details of why we need this in the Jira ticket. Basically, we want a low-priority pool and a set of dynamic pools for different projects that run at a higher priority. There is no way to do that with the current setup.
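For background on the dynamic pools mentioned above: an application routes jobs to a pool per thread through a local property, using the existing Spark API, and a pool name never declared in fairscheduler.xml is created on the fly. A minimal example, assuming sc is an existing SparkContext:

    // Jobs submitted from this thread go to the pool named "projectA".
    // If "projectA" was never declared in fairscheduler.xml, Spark creates
    // it dynamically -- but, before this patch, always with the hardcoded
    // default properties.
    sc.setLocalProperty("spark.scheduler.pool", "projectA")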
Can one of the admins verify this patch?
Is there a way to move this forward?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This fix addresses the issue raised in SPARK-26957. If spark.scheduler.pool is set to a name that does not match any preconfigured pool, the pool is created on the fly, but always with hardcoded default properties. This fix allows those default scheduler pool properties to be set through configuration parameters.
The fix is fully backwards compatible because the new configuration properties default to the existing hardcoded values; this PR just allows those values to be overridden. The specific use case that motivated the change is explained in SPARK-26957.
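As a rough usage sketch (property names assumed for illustration, matching the sketch earlier in the thread; see the PR diff for the actual config entries), an application could then change what dynamically created pools look like:

    import org.apache.spark.SparkConf

    // Property names are illustrative assumptions; the values noted in
    // comments are the previously hardcoded defaults.
    val conf = new SparkConf()
      .set("spark.scheduler.mode", "FAIR") // enable the fair scheduler
      .set("spark.scheduler.pool.default.schedulingMode", "FAIR") // was FIFO
      .set("spark.scheduler.pool.default.minShare", "1") // was 0
      .set("spark.scheduler.pool.default.weight", "10") // was 1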
How was this patch tested?
A unit test was added to PoolSuite that sets the config values, generates a pool, and checks that the pool is using the configured values. An existing test verifies that with no special configuration settings, the original defaults are used.
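For readers who want the shape of that test, here is a sketch of what it might look like inside PoolSuite. The config names, the fake task-set manager, and the builder details are assumptions for illustration, not the literal test from the PR:

    test("dynamically created pools pick up configured default properties") {
      val conf = new SparkConf()
        .set("spark.scheduler.pool.default.schedulingMode", "FAIR")
        .set("spark.scheduler.pool.default.minShare", "3")
        .set("spark.scheduler.pool.default.weight", "2")

      val rootPool = new Pool("", SchedulingMode.FAIR, initMinShare = 0, initWeight = 0)
      val builder = new FairSchedulableBuilder(rootPool, conf)
      builder.buildPools()

      // Ask for a pool name that appears in no XML config; the builder
      // should create it with the configured defaults rather than the
      // hardcoded ones.
      val props = new java.util.Properties()
      props.setProperty("spark.scheduler.pool", "dynamicPool")
      builder.addTaskSetManager(fakeTaskSetManager, props) // fake manager: assumption

      val pool = rootPool.getSchedulableByName("dynamicPool")
      assert(pool.schedulingMode === SchedulingMode.FAIR)
      assert(pool.minShare === 3)
      assert(pool.weight === 2)
    }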
In addition to this unit testing, we are currently using a version of this patch in production on AWS EMR for Spark 2.4.0.
This contribution is my original work and I license the work to the project under the project’s open source license.