[SPARK-11714][Mesos] Make Spark on Mesos honor port restrictions on coarse grain mode #11157

Closed
wants to merge 17 commits into from


skonto
Contributor

@skonto skonto commented Feb 10, 2016

  • Make the Mesos coarse-grained scheduler accept port offers and pre-assign ports.

The previous attempt, for fine-grained mode, was #10808.

@andrewor14
Contributor

add to whitelist

@andrewor14
Contributor

@skonto can you add a link in the description to your other PR?

@skonto
Contributor Author

skonto commented Feb 11, 2016

Done, @andrewor14. I will resolve the conflicts.

@skonto
Contributor Author

skonto commented Feb 11, 2016

Jenkins test this please

Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala
	core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackendSuite.scala
@SparkQA

SparkQA commented Feb 11, 2016

Test build #51096 has started for PR 11157 at commit a4e575d.

@SparkQA

SparkQA commented Feb 11, 2016

Test build #51095 has finished for PR 11157 at commit 53f2ced.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@shaneknapp
Contributor

jenkins, test this please

@SparkQA

SparkQA commented Feb 11, 2016

Test build #51108 has finished for PR 11157 at commit a4e575d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Grouping(child: Expression) extends Expression with Unevaluable
    • case class GroupingID(groupByExprs: Seq[Expression]) extends Expression with Unevaluable
    • class ContinuousQueryManager(sqlContext: SQLContext)
    • class ContinuousQueryListenerBus(sparkListenerBus: LiveListenerBus)
    • class FileStreamSource(
    • trait HadoopFsRelationProvider extends StreamSourceProvider
    • abstract class ContinuousQueryListener

@skonto
Contributor Author

skonto commented Feb 11, 2016

Jenkins, test this please

@SparkQA

SparkQA commented Feb 11, 2016

Test build #51123 has finished for PR 11157 at commit a4e575d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Grouping(child: Expression) extends Expression with Unevaluable
    • case class GroupingID(groupByExprs: Seq[Expression]) extends Expression with Unevaluable
    • class ContinuousQueryManager(sqlContext: SQLContext)
    • class ContinuousQueryListenerBus(sparkListenerBus: LiveListenerBus)
    • class FileStreamSource(
    • trait HadoopFsRelationProvider extends StreamSourceProvider
    • abstract class ContinuousQueryListener

@skonto
Contributor Author

skonto commented Feb 11, 2016

Need to fix the test. Multiple executors per slave need different handling.
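Handling multiple executors per slave means that each executor launched from the same offer needs its own ports. A minimal sketch, assuming a hypothetical `PortRange` type and a lowest-port-first allocation (the PR's actual selection strategy may differ): take ports from the offered ranges and hand only the leftovers to the next executor.

```scala
object PortAllocationSketch {
  /** Inclusive range of port numbers, as in a Mesos "ports" offer (hypothetical type). */
  final case class PortRange(begin: Long, end: Long) {
    require(begin <= end, s"empty range $begin-$end")
    def size: Long = end - begin + 1
  }

  /**
   * Takes `count` distinct ports from the front of `ranges`.
   * Returns the chosen ports and the ranges still unused, or None if
   * the ranges hold fewer than `count` ports.
   */
  def takePorts(ranges: List[PortRange], count: Int): Option[(List[Long], List[PortRange])] = {
    require(count >= 0, "count must be non-negative")
    if (count == 0) {
      Some((Nil, ranges))
    } else {
      ranges match {
        case Nil => None
        case r :: rest =>
          val n = math.min(count.toLong, r.size)
          val taken = (r.begin until r.begin + n).toList
          // Keep the unused tail of this range for the next executor.
          val leftover = if (n == r.size) rest else PortRange(r.begin + n, r.end) :: rest
          takePorts(leftover, count - n.toInt).map { case (more, unused) => (taken ++ more, unused) }
      }
    }
  }
}
```

For example, with an offer of 31000-31002 and 31005-31006 and two ports per executor, the first executor gets 31000 and 31001, the second gets 31002 and 31005, and the remaining 31006 is too few for a third.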

@skonto
Contributor Author

skonto commented Feb 29, 2016

Still working on this; I had to get aligned with @mgummelt.

partitionResources(afterCPUResources.asJava, "mem", taskMemory)
val (resourcesLeft, portResourcesToUse) = remainingMemResources
.partition {r => ! ( r.getType == Value.Type.RANGES & r.getName == "ports" )}
Contributor

I think there are still style problems here.

Contributor Author

Yes, I haven't finished it yet. Port resources need to be managed like CPUs, e.g. by keeping global bookkeeping of them... but what exactly is the problem for you?
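The snippet under review negates a conjunction built with the non-short-circuiting `&` and has irregular spacing. A tidier form, sketched with hypothetical simplified stand-ins for the Mesos `Resource` and `Value.Type` protobufs (not the real classes): partition on the positive predicate using `&&`, then swap the tuple so the result keeps the `(resourcesLeft, portResourcesToUse)` order.

```scala
// Hypothetical, simplified stand-ins for the Mesos protobuf types
// (Resource, Value.Type); only the fields the predicate reads.
object PartitionSketch {
  sealed trait ValueType
  case object Scalar extends ValueType
  case object Ranges extends ValueType

  final case class Resource(name: String, valueType: ValueType)

  /** Returns (resources left, port resources to use). */
  def splitPorts(resources: Seq[Resource]): (Seq[Resource], Seq[Resource]) = {
    // Partition on the positive predicate with the short-circuiting &&,
    // then swap, instead of negating a conjunction built with &.
    val (ports, others) = resources.partition { r =>
      r.valueType == Ranges && r.name == "ports"
    }
    (others, ports)
  }
}
```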

@tnachen
Contributor

tnachen commented Mar 4, 2016

Can you fix the whitespace in your patch in general? There is quite a bit of extra whitespace in Utils.scala.

@skonto
Contributor Author

skonto commented Mar 7, 2016

OK, I will fix it, thanks. Still WIP.

@skonto
Contributor Author

skonto commented Mar 12, 2016

Updated the PR. @tnachen, @mgummelt, @dragos: ready for review.
This follows our design discussions:
Design choices:

At the offer evaluation step (scheduler side), check whether an offer has enough ports to start an executor:

  • If yes, we start the executor and accept the offer; otherwise, we decline the offer.
  • If a user asks for a specific port, as with spark.executor.port, we cannot make that same port available to all executors of the same job, because of a Mesos limitation (all ports on a slave are shared).

Our proposed solution is to start only one executor if the user sets spark.executor.port, since only one port with that value is available.

Port honoring for the Spark Cluster framework will be fixed in a new PR, to keep this one shorter and easier to review.
One concern I have, and am testing, is compatibility between frameworks running in coarse-grained mode and fine-grained mode on the same cluster.
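The offer-evaluation rules described above can be sketched as a single decision function. This is a minimal illustration, not the PR's code: `PortRange`, `Decision`, `portsNeeded` and `launchedExecutors` are hypothetical names, and a user-set spark.executor.port is modeled as `Some(port)`.

```scala
object OfferEvaluationSketch {
  /** Inclusive range of port numbers offered by a Mesos agent (hypothetical type). */
  final case class PortRange(begin: Long, end: Long) {
    def size: Long = end - begin + 1
    def contains(port: Long): Boolean = begin <= port && port <= end
  }

  sealed trait Decision
  case object Accept extends Decision
  case object Decline extends Decision

  /**
   * portsNeeded: ports one executor needs (hypothetical parameter).
   * fixedPort: models a user-set spark.executor.port; None if unset.
   * launchedExecutors: executors this framework has already started.
   */
  def evaluate(
      offered: Seq[PortRange],
      portsNeeded: Int,
      fixedPort: Option[Long],
      launchedExecutors: Int): Decision = {
    val enoughPorts = offered.map(_.size).sum >= portsNeeded
    fixedPort match {
      case Some(port) =>
        // Only one port carries this number, so start at most one executor.
        if (launchedExecutors == 0 && enoughPorts && offered.exists(_.contains(port))) Accept
        else Decline
      case None =>
        if (enoughPorts) Accept else Decline
    }
  }
}
```

Modeling the fixed-port case as a global "at most one executor" check follows the proposed solution literally; a per-slave check would be an alternative design.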

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52982 has finished for PR 11157 at commit d1cb53b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto skonto force-pushed the honour_ports_coarse branch 2 times, most recently from 33d74aa to 731c2f0 March 12, 2016 02:02
@SparkQA

SparkQA commented Mar 12, 2016

Test build #52983 has finished for PR 11157 at commit 33d74aa.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 12, 2016

Test build #52984 has finished for PR 11157 at commit 731c2f0.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 12, 2016

Test build #53002 has finished for PR 11157 at commit 023eedc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 12, 2016

Test build #53007 has finished for PR 11157 at commit c6ceb8b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -212,12 +212,32 @@ class MesosCoarseGrainedSchedulerBackendSuite extends SparkFunSuite
.registerDriverWithShuffleService(anyString, anyInt, anyLong, anyLong)
}

test("mesos kills an executor when told") {
test("Port offer decline when there is no appropriate range") {
Contributor

@mgummelt mgummelt Aug 10, 2016

Can we get a test where the offer contains the requested ports? We should check both that the offer is accepted and that the launched task contains the requested port as a resource.

Contributor Author

ok
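The two checks requested for the new test can be sketched without Mesos or Mockito (the real suite uses Mockito, judging by the `anyString`/`anyInt` matchers in the diff). This assumes a hypothetical `handleOffer` that returns the launched task's port resources, or `None` for a decline; it only shows the shape of the assertions, not the suite's actual test.

```scala
object RequestedPortsSketch {
  /** Inclusive range of port numbers offered by a Mesos agent (hypothetical type). */
  final case class PortRange(begin: Long, end: Long) {
    def contains(port: Long): Boolean = begin <= port && port <= end
  }

  /** Hypothetical launched task, reduced to the port resources it holds. */
  final case class Task(portResources: Seq[PortRange])

  /** Launches a task holding every requested port, or declines (None). */
  def handleOffer(offered: Seq[PortRange], requested: Seq[Long]): Option[Task] = {
    if (requested.forall(p => offered.exists(_.contains(p)))) {
      Some(Task(requested.map(p => PortRange(p, p))))
    } else {
      None
    }
  }

  /** Test-style checks for the two points raised in review. */
  def runChecks(): Unit = {
    val offered = Seq(PortRange(31000L, 31010L))
    val task = handleOffer(offered, Seq(31005L))
    // 1. The offer containing the requested port is accepted.
    assert(task.isDefined)
    // 2. The launched task carries the requested port as a resource.
    assert(task.get.portResources == Seq(PortRange(31005L, 31005L)))
    // An offer without the requested port is declined.
    assert(handleOffer(offered, Seq(4040L)).isEmpty)
  }
}
```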

@skonto
Contributor Author

skonto commented Aug 10, 2016

@mgummelt I added the extra test you asked for and fixed the rest.

@SparkQA

SparkQA commented Aug 11, 2016

Test build #63559 has finished for PR 11157 at commit b46b9d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Aug 11, 2016

@mgummelt: ready for merge, what do you think?

@mgummelt
Contributor

LGTM.

@srowen Can we get this merged into master?

if (requestedPorts.isEmpty) {
(offeredResources, List[Resource]())
}
else {
Member

Can we do a quick, minor style pass, since that's all I can really comment on: pull this else up to the previous line for consistency.

Contributor Author

ok
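With the `else` pulled up onto the closing-brace line as requested, the branch reads as one piece. The sketch below shows a whole helper in that `} else {` style, assuming hypothetical simplified `Resource`/`PortRange` types in place of the Mesos protobufs; the non-empty branch is one possible way to split requested ports out of an offer, not the merged implementation.

```scala
object PortPartitionSketch {
  final case class PortRange(begin: Long, end: Long)
  /** Simplified resource: a name plus ranges (empty for scalar resources). */
  final case class Resource(name: String, ranges: List[PortRange])

  /** Removes a single port from a list of inclusive ranges. */
  private def without(ranges: List[PortRange], port: Long): List[PortRange] =
    ranges.flatMap { r =>
      if (port < r.begin || port > r.end) {
        List(r)
      } else {
        List(PortRange(r.begin, port - 1), PortRange(port + 1, r.end))
          .filter(x => x.begin <= x.end)
      }
    }

  /**
   * Returns (resources left, port resources to use). Assumes the caller
   * already checked that every requested port is in the offer.
   */
  def partitionPorts(
      requestedPorts: List[Long],
      offeredResources: List[Resource]): (List[Resource], List[Resource]) = {
    if (requestedPorts.isEmpty) {
      (offeredResources, List[Resource]())
    } else {
      val (portResources, others) = offeredResources.partition(_.name == "ports")
      val leftRanges = requestedPorts.foldLeft(portResources.flatMap(_.ranges))(without)
      val left = if (leftRanges.isEmpty) others else others :+ Resource("ports", leftRanges)
      (left, List(Resource("ports", requestedPorts.map(p => PortRange(p, p)))))
    }
  }
}
```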

@skonto
Contributor Author

skonto commented Aug 14, 2016

@srowen ready

@SparkQA

SparkQA commented Aug 14, 2016

Test build #63755 has finished for PR 11157 at commit 3bc31cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Contributor Author

skonto commented Aug 15, 2016

@srowen could you merge this, please?

@srowen
Member

srowen commented Aug 15, 2016

Merged to master

@asfgit asfgit closed this in 1a028bd Aug 15, 2016
@mgummelt
Contributor

woot

@sun-rui
Contributor

sun-rui commented Aug 16, 2016

Has this change been documented in the Spark on Mesos guide?

@skonto
Contributor Author

skonto commented Aug 16, 2016

@sun-rui I will do it shortly in a separate PR. There was a question about that a few comments above. Thanks for mentioning it again.

@skonto
Contributor Author

skonto commented Aug 16, 2016

@sun-rui here you are: #14667
@mgummelt please check whether it is clear enough.
