
[SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend #18209

Closed
wants to merge 2 commits

Conversation

barnardb
Contributor

@barnardb barnardb commented Jun 6, 2017

What changes were proposed in this pull request?

Adds support for Nomad as a scheduler backend. Nomad is a cluster manager designed for both long-lived services and short-lived batch-processing workloads.

The integration supports client and cluster mode as well as dynamic allocation (scaling up only), has basic support for Python and R applications, and works with applications packaged either as JARs or as Docker images.
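For illustration, a submission to this backend might look like the following. The `nomad` master URL, the conf key, and the example paths here are assumptions based on the feature description above, not confirmed flags; consult docs/running-on-nomad.md for the real options.

```shell
# Hypothetical example: submitting a JAR-packaged application in cluster mode.
# The "nomad" master value and spark.executor.instances usage are assumptions
# for illustration only; see docs/running-on-nomad.md for the actual flags.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.executor.instances=4 \
  http://example.com/path/to/spark-examples.jar
```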

Documentation is in docs/running-on-nomad.md.

This will be presented at Spark Summit 2017.

A build of the pull request with Nomad support is available here.

Feedback would be much appreciated.

How was this patch tested?

This patch was tested with integration and manual tests, and a load test was performed to ensure its performance is no worse than the YARN integration's.

The feature was developed and tested against Nomad 0.5.6 (the current stable version)
on Spark 2.1.0, rebased to 2.1.1 and retested, and finally rebased to master and retested.

@srowen
Member

srowen commented Jun 6, 2017

I don't believe we'd accept a change this big into the core project. This isn't a widely used scheduler AFAICT (I had never heard of it). If possible, it can begin life as an external add-on.

@SparkQA

SparkQA commented Jun 6, 2017

Test build #3782 has finished for PR 18209 at commit c762194.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • --class CLASS_NAME: Name of your application's main class (required)
  • sealed trait PrimaryResource
  • case class PrimaryJar(url: String) extends PrimaryResource
  • case class PrimaryPythonFile(url: String) extends PrimaryResource
  • case class PrimaryRFile(url: String) extends PrimaryResource
  • case Some(_) => usageException("--class cannot be specified multiple times")
  • case class Parameters(command: ApplicationRunCommand, nomadUrl: Option[HttpHost])
  • sealed trait JobDescriptor
  • case class ExistingJob(
  • case class NewJob(job: Job) extends JobDescriptor
  • case class KeyPair(certFile: String, keyFile: String)
  • protected class NomadDriverEndpoint(rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
  • case class CommonConf(
  • case class ConfigurablePort(label: String)

@FRosner
Contributor

FRosner commented Jun 8, 2017

I very much appreciate this effort. We have been using Nomad for a year now, and it has proven to be a very well-designed and robust cluster manager. Currently we use Nomad to deploy a standalone Spark cluster, which blocks a lot of resources on Nomad just to be ready for jobs to be submitted.

If we were able to submit jobs directly to Nomad, it could greatly improve our resource utilization.

Thanks a lot for the contribution, @barnardb; we will definitely look into it and give it a try.

@rcgenova

rcgenova commented Jun 8, 2017

Hey @srowen. Nomad is an unusual product for HashiCorp in that we have had tremendous interest from the top of the funnel at an early stage (due in part to the success of our other products). This PR is the result of a first-class internal initiative that was driven by a number of top-tier customers. The majority of the changes are unit tests and documentation, and most of the complexity is isolated from the Spark code base. We fully expect, and have planned for, the onus of maintenance to be on us. We are committed to the integration and will dedicate resources to resolving issues quickly as they arise.

@srowen
Member

srowen commented Jun 8, 2017

The code should also just live outside Spark if that's true. If there are SPIs that need to be established to make that happen then that is what should be proposed. I know this has been rejected in the past though.

[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes).
[standalone mode](spark-standalone.html), [YARN mode](running-on-yarn.html),
[Mesos coarse-grained mode](running-on-mesos.html#mesos-run-modes), and
[Nomad mode](running-on-nomade.html#dynamic-allocation-of-executors).

It should be running-on-nomad.html, shouldn't it?

@timperrett

Heartily agree with @FRosner here. We have also been running Nomad for quite some time, and this patched Spark distribution has indeed been working really nicely for us.

Huge thanks to @barnardb for the hard work and effort that went into making this a reality.

@jrasell

jrasell commented Jun 9, 2017

+1 to this. Nomad is our chosen scheduler, and Spark support for it would be a huge benefit, bringing additional flexibility to the great Spark project. Thanks to @barnardb for the great work on this.

@srowen
Member

srowen commented Jun 9, 2017

All: it's not relevant whether Nomad is a good piece of software or not, or whether you use it. I do not believe it makes sense to make the project support this code. Similar things have been rejected in the past. Please focus on what SPIs might enable you to support this externally.

@acaiafa
Copy link

acaiafa commented Jun 9, 2017

+1 these efforts.

@jerrypeng
Contributor

jerrypeng commented Jun 9, 2017

Nomad is a great scheduler; it solves scaling issues that most schedulers out there today cannot. I have been around the open-source community a bit, and I can say that if support for something isn't merged directly into a project and instead lives in an external add-on, supporting it will be a nightmare, and the support will probably eventually disappear, since there is most likely not going to be a full-time person monitoring PRs to see whether a change breaks the "add-on". Provided the code is of good quality, I am in favor of merging this PR into Spark.

@chicagobuss

chicagobuss commented Jun 9, 2017

I don't see why Spark should have native Mesos support but not native Nomad support; it doesn't make sense.

We rely heavily on Nomad for all ephemeral workload scheduling, and if this doesn't go into trunk, people will always have to follow a cumbersome patching (or "don't run the real thing, run this instead") process.

I think it's worth noting why we can't just run Mesos: when we run a Spark workload in the cloud, we provision tens of thousands of VMs and immediately want to start using them for a short burst of work. When we're done, it all goes away. Mesos does not handle this sort of extreme elasticity very well, but Nomad excels at it. We've benchmarked Nomad at scheduling nearly 2,500 containers per second, which is crazy fast compared to both Mesos and Kubernetes.

It's a win for everyone when there's more choice, especially since I'm sure HashiCorp will support the ongoing efforts to support and maintain this integration.

Thanks for the hard work on this, @barnardb. We tested it thoroughly and found it great for our needs. Would love to see this pushed through.

@stew

stew commented Jun 9, 2017

Thanks very much, @barnardb! We've been testing this patch extensively and it seems to be working great for us; it solves the problem much better than our own workaround did while we waited for a patch like this. I'd love to see this get supported!

@rxin
Contributor

rxin commented Jun 10, 2017

The next one to add is probably Kubernetes. Even Spark on Kubernetes is going through this cycle of being maintained as a separate project first.

Adding a new scheduler has huge overhead. It is not the initial development that's the issue, but the subsequent testing, releases, etc. In the past, it has almost always been bugs in scheduler integrations that block a Spark release.

@tejasapatil
Contributor

Given that at Facebook we use our own in-house scheduler, I see why people would want their scheduler implementations added right into the Spark codebase as first-class citizens. Like @srowen said, this should not be part of the Spark codebase. Many schedulers will be developed in the future, and if we keep adding them to Spark, it will be a lot of maintenance burden. Mesos and YARN have their place in the codebase given that they were there in the early days of the project and are widely used in deployments (not saying that this is good or bad). This PR makes things worse for the future by taking the easier route of adding yet another if/else or switch case for a new scheduler implementation. One could argue that this should have been done cleanly right from the start for Mesos/YARN, but to give the benefit of the doubt to the original authors, they didn't expect new schedulers to come along.

I would recommend adding interfaces in the Spark code to abstract things out, so that plugging in new schedulers would be as simple as adding a jar to the classpath. As a matter of fact, we are currently able to use Spark with our own scheduler by extending the existing interfaces. The only patching we have to do is 3–4 lines, which is not a huge pain during upgrades. It seems you need more than the current interfaces.

Having said this, I do appreciate the efforts put in this PR.
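The interface-based approach described above can be sketched as follows. Spark does ship a (limited) hook in this spirit, the `ExternalClusterManager` trait discovered via `java.util.ServiceLoader`, but the names below are illustrative stand-ins for this discussion, not Spark's actual API:

```scala
// Minimal sketch of a pluggable cluster-manager SPI. All names here are
// hypothetical; Spark's real (limited) analogue is ExternalClusterManager.
trait ClusterManagerPlugin {
  /** Whether this plugin can handle the given master URL. */
  def canCreate(masterUrl: String): Boolean

  /** Stand-in for constructing the scheduler backend for this manager. */
  def createBackend(masterUrl: String): String
}

object NomadPlugin extends ClusterManagerPlugin {
  def canCreate(masterUrl: String): Boolean = masterUrl.startsWith("nomad")
  def createBackend(masterUrl: String): String =
    s"NomadClusterSchedulerBackend($masterUrl)"
}

object PluginRegistry {
  // In a real build, plugins would be discovered from the classpath
  // (e.g. via java.util.ServiceLoader and a META-INF/services entry),
  // so supporting a new scheduler really would be "adding a jar to
  // the classpath".
  private val plugins: Seq[ClusterManagerPlugin] = Seq(NomadPlugin)

  def forMaster(masterUrl: String): Option[ClusterManagerPlugin] =
    plugins.find(_.canCreate(masterUrl))
}
```

With ServiceLoader-style discovery, master-URL dispatch in SparkContext would reduce to a single registry lookup instead of another if/else branch per scheduler.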

@manojlds

@rxin is there any timeline / plan around Kubernetes? Where can I follow that activity? Thanks!

@ash211
Contributor

ash211 commented Jun 22, 2017

@manojlds I'm a part of the Spark-on-k8s team that's currently building k8s integration for Spark outside of the Apache Spark repo. You can follow our work at https://github.com/apache-spark-on-k8s/spark

Here's the current outstanding diff on top of apache/branch-2.1: apache-spark-on-k8s#200, produced from about six months of work by a varied group from close to a dozen companies at this point. We aim to eventually bring this into the Apache repo in some form, though the timeline for that is not yet fixed.

You can follow from the original design and discussion for the kubernetes efforts at https://issues.apache.org/jira/browse/SPARK-18278

One of the major pieces of feedback we received back in November was that, if possible, the Kubernetes integration should be done as a plugin to the Apache Spark repo. Unfortunately, that plugin point does not yet exist in Spark, so we filed https://issues.apache.org/jira/browse/SPARK-19700 to begin the process of creating that API, but we have not yet devoted effort to building it. Once active development on the k8s integration begins to slow down and near completion (we're starting to see signs of this slowing this month), we'll probably shift focus to the plugin point as a means of gaining even wider use of the k8s integration. For what a plugin point could look like, we produced a thought experiment at palantir#81 which you may find interesting.

As of now, it remains unclear whether the k8s code will be brought in as a first-class citizen alongside Mesos and YARN, or whether Spark will gain a plugin point and the k8s integration.

We'd be happy to be a part of any wider efforts to create a plugin point for custom resource managers at SPARK-19700. Is that something the Nomad team would be interested in contributing to?

@rcgenova

@rxin We definitely understand the overhead involved when the integration surface area grows too large. All of the HashiCorp tools use a plugin strategy to minimize it. @ash211 We'd love to collaborate on or help validate any plugin designs.

@rcgenova

The Apache Spark fork enhanced to support HashiCorp Nomad as a scheduler is now located at: https://github.com/hashicorp/nomad-spark. If you are interested in trying it out, the best place to get started is here: https://www.nomadproject.io/guides/spark/spark.html.

@barnardb barnardb closed this Sep 26, 2017
@barnardb barnardb deleted the nomad branch September 26, 2017 16:19
@OneCricketeer

@rcgenova @barnardb Where does that fork stand in terms of support or feature readiness?

If possible, it can begin life as an external add-on. - @srowen

Is there a document describing how "add-ons" are maintained? My first thought is https://spark-packages.org/, but Nomad as a scheduler seems to require rebuilding all of Spark for some reason, not just a "drop-in" / "add-on" package...

@OneCricketeer

I would recommend to add interfaces in Spark code to abstract out things so that plugging in new schedulers would be as simple as adding a jar to the classpath. As a matter of fact, we are currently able to use Spark with our own scheduler by extending the existing interfaces. The only patching we have to do is 3-4 lines which is not super pain during upgrades. Seems like you need more than the current interfaces. - @tejasapatil

Do such interfaces exist in Spark at the present day? What efforts do you know of to make scheduler support "as simple as adding a jar to the classpath"?

I see https://issues.apache.org/jira/browse/SPARK-19700 is still open...

it remains unclear whether the k8s code will be brought in as a first-class citizen alongside Mesos and YARN

Well, it looks like it's first-class now. Is there historical context one can read on how that decision was ultimately made, instead of working on the plugin interface as discussed?

@OneCricketeer

Follow-up question: what's the external folder for, if not non-core packages?
