
[SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend #18209

Closed
wants to merge 2 commits

Conversation

barnardb
Contributor

@barnardb barnardb commented Jun 6, 2017

What changes were proposed in this pull request?

Adds support for Nomad as a scheduler backend. Nomad is a cluster manager designed for both long-lived services and short-lived batch-processing workloads.

The integration supports client and cluster mode as well as dynamic allocation (scaling up only), has basic support for Python and R applications, and works with applications packaged either as JARs or as Docker images.
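For illustration, a submission to this backend might look like the following. The `nomad` master URL, the conf key, and the example paths here are assumptions based on the feature description above, not confirmed flags; consult docs/running-on-nomad.md for the real options.

```shell
# Hypothetical example: submitting a JAR-packaged application in cluster mode.
# The "nomad" master value and spark.executor.instances usage are assumptions
# for illustration only; see docs/running-on-nomad.md for the actual flags.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.executor.instances=4 \
  http://example.com/path/to/spark-examples.jar
```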

Documentation is in docs/running-on-nomad.md.

This will be presented at Spark Summit 2017.

A build of the pull request with Nomad support is available here.

Feedback would be much appreciated.

How was this patch tested?

This patch was tested with integration and manual tests, and a load test was performed to ensure its performance is no worse than the YARN integration's.

The feature was developed and tested against Nomad 0.5.6 (the current stable version)
on Spark 2.1.0, rebased to 2.1.1 and retested, and finally rebased to master and retested.

@srowen
Member

srowen commented Jun 6, 2017

I don't believe we'd accept a change this big into the core project. This isn't a widely used scheduler AFAICT (I had never heard of it). If possible, it can begin life as an external add-on.

@SparkQA

SparkQA commented Jun 6, 2017

Test build #3782 has finished for PR 18209 at commit c762194.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • --class CLASS_NAME: Name of your application's main class (required)
  • sealed trait PrimaryResource
  • case class PrimaryJar(url: String) extends PrimaryResource
  • case class PrimaryPythonFile(url: String) extends PrimaryResource
  • case class PrimaryRFile(url: String) extends PrimaryResource
  • case Some(_) => usageException("--class cannot be specified multiple times")
  • case class Parameters(command: ApplicationRunCommand, nomadUrl: Option[HttpHost])
  • sealed trait JobDescriptor
  • case class ExistingJob(
  • case class NewJob(job: Job) extends JobDescriptor
  • case class KeyPair(certFile: String, keyFile: String)
  • protected class NomadDriverEndpoint(rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
  • case class CommonConf(
  • case class ConfigurablePort(label: String)

@FRosner
Contributor

FRosner commented Jun 8, 2017

I very much appreciate this effort. We have been using Nomad for a year now, and it has proven to be a very well-designed and robust cluster manager. Currently we use Nomad to deploy a standalone Spark cluster, which blocks a lot of resources on Nomad just to be ready for jobs to be submitted.

If we were able to submit jobs directly to Nomad, it could greatly improve our resource utilization.

Thanks a lot for the contribution, @barnardb; we will definitely look into it and give it a try.

@rcgenova

rcgenova commented Jun 8, 2017

Hey @srowen. Nomad is an unusual product for HashiCorp in that we have had tremendous interest from the top of the funnel at an early stage (due in part to the success of our other products). This PR is the result of a first-class internal initiative that was driven by a number of top-tier customers. The majority of the changes are unit tests and documentation, and most of the complexity is isolated from the Spark code base. We fully expect, and have planned for, the onus of maintenance to be on us. We are committed to the integration and will dedicate resources to resolving issues quickly as they arise.

@srowen
Member

srowen commented Jun 8, 2017

The code should also just live outside Spark if that's true. If there are SPIs that need to be established to make that happen then that is what should be proposed. I know this has been rejected in the past though.

[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes).
[standalone mode](spark-standalone.html), [YARN mode](running-on-yarn.html),
[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes), and
[Nomad mode](running-on-nomade.html#dynamic-allocation-of-executors).

It should be running-on-nomad.html, shouldn't it?

@timperrett

Heartily agree with @FRosner here. We have also been running Nomad for quite some time, and this patched Spark distribution has indeed been working really nicely for us.

Huge thanks to @barnardb for the hard work and effort that went into making this a reality.

@jrasell

jrasell commented Jun 9, 2017

+1 to this. Nomad is our chosen scheduler, and Spark support for it would be a huge benefit, bringing additional flexibility to the great Spark project. Thanks to @barnardb for the great work on this.

@srowen
Member

srowen commented Jun 9, 2017

All: it's not relevant whether Nomad is a good piece of software or not, or whether you use it. I do not believe it makes sense to make the project support this code. Similar things have been rejected in the past. Please focus on what SPIs might enable you to support this externally.

@acaiafa
Copy link

acaiafa commented Jun 9, 2017

+1 these efforts.

@jerrypeng
Contributor

jerrypeng commented Jun 9, 2017

Nomad is a great scheduler; it solves scaling issues that most schedulers out there today cannot. I have been around the open-source community a bit, and I can say that if support for something isn't merged directly into a project and instead lives in an external add-on, supporting it will be a nightmare, and the support will probably eventually disappear, since there is most likely not going to be a full-time person monitoring PRs to see whether a change breaks the "add-on". Provided the code is of good quality, I am in favor of merging this PR into Spark.

@chicagobuss

chicagobuss commented Jun 9, 2017

I don't see why Spark should have native Mesos support but not native Nomad support; it doesn't make sense.

We rely heavily on Nomad for all ephemeral workload scheduling, and if this doesn't go into trunk, people will always have to follow a cumbersome patching (or "don't run the real thing, run this instead") process.

I think it's worth noting why we can't just run Mesos: when we run a Spark workload in the cloud, we provision tens of thousands of VMs and immediately want to start using them for a short burst of work. When we're done, it all goes away. Mesos does not handle this sort of extreme elasticity very well, but Nomad excels at it. We've benchmarked Nomad at scheduling nearly 2,500 containers per second, which is crazy fast compared to both Mesos and Kubernetes.

It's a win for everyone when there's more choice, especially since I'm sure HashiCorp will support the ongoing efforts to support and maintain this integration.

Thanks for the hard work on this, @barnardb. We tested it thoroughly and found it great for our needs. Would love to see this pushed through.

@stew

stew commented Jun 9, 2017

Thanks very much, @barnardb! We've been testing this patch extensively and it seems to be working great for us; it solves the problem much better than our own workaround did while we waited for a patch like this. I'd love to see this get supported!

@rxin
Contributor

rxin commented Jun 10, 2017

The next one to add is probably Kubernetes. Even Spark on Kubernetes is going through this cycle of being maintained as a separate project first.

Adding a new scheduler has huge overhead. It is not the initial development that's the issue, but the subsequent testing, releases, etc. In the past, it has almost always been bugs in scheduler integrations that block a Spark release.

@tejasapatil
Contributor

Given that at Facebook we use our own in-house scheduler, I see why people would want their scheduler implementations added right into the Spark codebase as first-class citizens. Like @srowen said, this should not be part of the Spark codebase. Many schedulers will be developed in the future, and if we keep adding them to Spark, it will be a lot of maintenance burden. Mesos and YARN have their place in the codebase given that they were there in the early days of the project and are widely used in deployments (not saying that this is good or bad). This PR makes things worse for the future by taking the easier route of adding yet another if/else or switch case for a new scheduler implementation. One could argue that this should have been done cleanly right from the start for Mesos/YARN, but to give the benefit of the doubt to the original authors, they didn't expect new schedulers to come along.

I would recommend adding interfaces in the Spark code to abstract things out, so that plugging in new schedulers would be as simple as adding a jar to the classpath. As a matter of fact, we are currently able to use Spark with our own scheduler by extending the existing interfaces. The only patching we have to do is 3–4 lines, which is not a huge pain during upgrades. It seems you need more than the current interfaces.

Having said this, I do appreciate the efforts put in this PR.
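The interface-based approach described above can be sketched as follows. Spark does ship a (limited) hook in this spirit, the `ExternalClusterManager` trait discovered via `java.util.ServiceLoader`, but the names below are illustrative stand-ins for this discussion, not Spark's actual API:

```scala
// Minimal sketch of a pluggable cluster-manager SPI. All names here are
// hypothetical; Spark's real (limited) analogue is ExternalClusterManager.
trait ClusterManagerPlugin {
  /** Whether this plugin can handle the given master URL. */
  def canCreate(masterUrl: String): Boolean

  /** Stand-in for constructing the scheduler backend for this manager. */
  def createBackend(masterUrl: String): String
}

object NomadPlugin extends ClusterManagerPlugin {
  def canCreate(masterUrl: String): Boolean = masterUrl.startsWith("nomad")
  def createBackend(masterUrl: String): String =
    s"NomadClusterSchedulerBackend($masterUrl)"
}

object PluginRegistry {
  // In a real build, plugins would be discovered from the classpath
  // (e.g. via java.util.ServiceLoader and a META-INF/services entry),
  // so supporting a new scheduler really would be "adding a jar to
  // the classpath".
  private val plugins: Seq[ClusterManagerPlugin] = Seq(NomadPlugin)

  def forMaster(masterUrl: String): Option[ClusterManagerPlugin] =
    plugins.find(_.canCreate(masterUrl))
}
```

With ServiceLoader-style discovery, master-URL dispatch in SparkContext would reduce to a single registry lookup instead of another if/else branch per scheduler.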

@manojlds

@rxin is there any timeline / plan around Kubernetes? Where can I follow that activity? Thanks!

@ash211
Contributor

ash211 commented Jun 22, 2017

@manojlds I'm a part of the Spark-on-k8s team that's currently building k8s integration for Spark outside of the Apache Spark repo. You can follow our work at https://github.com/apache-spark-on-k8s/spark

Here's the current outstanding diff on top of apache/branch-2.1: apache-spark-on-k8s#200, produced from about six months of work by a varied group from close to a dozen companies at this point. We aim to eventually bring this into the Apache repo in some form, though the timeline for that is not yet fixed.

You can follow from the original design and discussion for the kubernetes efforts at https://issues.apache.org/jira/browse/SPARK-18278

One of the major pieces of feedback we received back in November was that, if possible, the Kubernetes integration should be done as a plugin to the Apache Spark repo. Unfortunately, that plugin point does not yet exist in Spark, so we filed https://issues.apache.org/jira/browse/SPARK-19700 to begin the process of creating that API, but we have not yet devoted effort to building it. Once active development on the k8s integration begins to slow down and near completion (we're starting to see signs of this slowing this month), we'll probably shift focus to the plugin point as a means of gaining even wider use of the k8s integration. For what a plugin point could look like, we produced a thought experiment at palantir#81 which you may find interesting.

As of now, it remains unclear whether the k8s code will be brought in as a first-class citizen alongside Mesos and YARN, or whether Spark will gain a plugin point and the k8s integration.

We'd be happy to be a part of any wider efforts to create a plugin point for custom resource managers at SPARK-19700. Is that something the Nomad team would be interested in contributing to?

@rcgenova

@rxin We definitely understand the overhead involved when the integration surface area grows too large. All of the HashiCorp tools use a plugin strategy to minimize it. @ash211 We'd love to collaborate on or help validate any plugin designs.

@rcgenova

The Apache Spark fork enhanced to support HashiCorp Nomad as a scheduler is now located at: https://github.com/hashicorp/nomad-spark. If you are interested in trying it out, the best place to get started is here: https://www.nomadproject.io/guides/spark/spark.html.

@barnardb barnardb closed this Sep 26, 2017
@barnardb barnardb deleted the nomad branch September 26, 2017 16:19
@OneCricketeer

@rcgenova @barnardb Where does that fork stand in terms of support or feature readiness?

If possible, it can begin life as an external add-on. - @srowen

Is there a document describing how "add-ons" are maintained? My first thought is https://spark-packages.org/, but Nomad as a scheduler seems to require rebuilding all of Spark for some reason, not just a "drop-in" / "add-on" package...

@OneCricketeer

I would recommend to add interfaces in Spark code to abstract out things so that plugging in new schedulers would be as simple as adding a jar to the classpath. As a matter of fact, we are currently able to use Spark with our own scheduler by extending the existing interfaces. The only patching we have to do is 3-4 lines which is not super pain during upgrades. Seems like you need more than the current interfaces. - @tejasapatil

Do such interfaces exist in Spark at the present day? What efforts do you know of to make scheduler support "as simple as adding a jar to the classpath"?

I see https://issues.apache.org/jira/browse/SPARK-19700 is still open...

it remains unclear whether the k8s code will be brought in as a first-class citizen alongside Mesos and YARN

Well, it looks like it's first-class now. Is there historical context one can read on how that decision was ultimately made, instead of working on the plugin interface as discussed?

@OneCricketeer

Follow-up question: what's the external folder for, if not non-core packages?
