
Conversation

@DaveDeCaprio (Contributor)

What changes were proposed in this pull request?

This fix addresses the issue raised in SPARK-26957. If spark.scheduler.pool is set to a name that does not match a preconfigured pool, the pool is created with default properties that were previously hardcoded. This fix makes those default scheduler pool properties configurable through configuration parameters.

The fix is fully backwards compatible because the configuration properties default to the existing hardcoded values; this PR just allows those values to be updated. The specific use case that motivated this is explained in SPARK-26957.
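To make this concrete, setting the new defaults might look like the sketch below. The property names here are hypothetical placeholders; the actual keys are the config entries introduced by this PR, such as the one backing the SCHEDULER_DEFAULT_SCHEDULING_MODE constant seen in the diff further down.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.scheduler.mode", "FAIR")
  // Hypothetical key names for the defaults applied to dynamically created pools:
  .set("spark.scheduler.pool.default.schedulingMode", "FAIR")
  .set("spark.scheduler.pool.default.minShare", "1")
  .set("spark.scheduler.pool.default.weight", "10")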

How was this patch tested?

A unit test was added to PoolSuite that sets the config values, generates a pool, and checks that the pool is using the configured values. An existing test verifies that with no special configuration settings, the original defaults are used.

In addition to this unit testing, we are currently using a version of this patch in production on AWS EMR for Spark 2.4.0.

This contribution is my original work and I license the work to the project under the project's open source license.

// The original hardcoded defaults; per this PR they become the fallback
// values for the new configuration entries:
val DEFAULT_SCHEDULING_MODE = SchedulingMode.FIFO
val DEFAULT_MINIMUM_SHARE = 0
val DEFAULT_WEIGHT = 1
// The default scheduling mode is now read from configuration instead:
val defaultSchedulingMode = SchedulingMode.withName(conf.get(SCHEDULER_DEFAULT_SCHEDULING_MODE))
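As illustrative context for the snippet above, here is a rough sketch of how configured defaults could flow into dynamic pool creation. SCHEDULER_DEFAULT_MIN_SHARE and SCHEDULER_DEFAULT_WEIGHT are assumed companion config entries, and this is a sketch of the idea, not the exact patch:

// Sketch only: read the remaining pool defaults from configuration as well.
val defaultMinShare = conf.get(SCHEDULER_DEFAULT_MIN_SHARE)   // assumed config entry
val defaultWeight = conf.get(SCHEDULER_DEFAULT_WEIGHT)        // assumed config entry

// When a TaskSet names a pool that was not preconfigured, build it from the
// configured defaults rather than the hardcoded ones:
val pool = new Pool(poolName, defaultSchedulingMode, defaultMinShare, defaultWeight)
rootPool.addSchedulable(pool)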
@beliefer (Contributor) commented Feb 22, 2019

This change looks good. If the user enables the Spark fair scheduler, the default pool always uses FIFO mode.

@srowen (Member) commented Feb 22, 2019

I don't know much about this part, but if you want a non-default pool, wouldn't you configure your own pool? Should the defaults be configurable, rather than there being just one fixed default?

@DaveDeCaprio (Contributor, Author)

The only way to configure your own pools is via an XML file loaded at startup. In our case we'd like the system to be more dynamic, rather than having to hardcode pools in config files. Spark already supports creating pools dynamically like this, but it only configures them with the default parameters.

I cover the specific details on why we need this in the Jira ticket. Basically, we want a low priority pool and a set of dynamic pools for different projects that run at a higher priority. There is no way to do that with the current setup.
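For background, jobs are routed to pools through a thread-local property, and naming a pool that fairscheduler.xml does not define is exactly what triggers the dynamic creation at issue. A minimal sketch (the pool name and app setup are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf()
    .setMaster("local[2]")
    .setAppName("dynamic-pools-example")
    .set("spark.scheduler.mode", "FAIR"))

// "projectA" appears in no fairscheduler.xml; Spark creates the pool on first
// use, currently with the hardcoded defaults (FIFO, minShare = 0, weight = 1).
sc.setLocalProperty("spark.scheduler.pool", "projectA")
sc.parallelize(1 to 100).count()  // runs in the dynamically created pool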

@AmplabJenkins

Can one of the admins verify this patch?

@DaveDeCaprio (Contributor, Author)

Is there a way to move this forward?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions bot added the Stale label on Mar 27, 2020
@github-actions bot closed this on Mar 28, 2020
