@neurons neurons commented Jul 25, 2015

… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, the executorAllocationManager is not initialized in the SparkContext.

squito commented Jul 25, 2015

Jenkins, this is ok to test

A Member commented on the diff:
I think this is better as ! ... _conf.contains(...)? Although I don't think 0 executors is allowed, this logic would not work if it were.

The docs in running-on-yarn.md need to be updated to explain the new behavior a little.

A Contributor commented on the diff:
yes, +1 to @srowen's comment. We also need to add a comment here to explain why it's not enabled if we set spark.executor.instances, something like

// Dynamic allocation and explicitly setting the number of executors are inherently
// incompatible. In environments where dynamic allocation is turned on by default,
// the latter should override the former (SPARK-9092).

A Contributor commented on the diff:

since this is now duplicated in a few places, maybe it makes sense to make a Utils.isDynamicAllocationEnabled(conf) method and move the comment there instead.
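A helper like the one suggested here was eventually added to Spark's Utils. A minimal sketch of the logic, with a plain Map[String, String] standing in for SparkConf so it runs outside Spark (the object name and signature here are illustrative, not Spark's API):

```scala
// Sketch of the suggested shared helper; a Map stands in for SparkConf.
object DynAlloc {
  // Dynamic allocation and an explicitly set number of executors are
  // inherently incompatible: an explicit spark.executor.instances wins
  // and disables dynamic allocation (SPARK-9092).
  def isDynamicAllocationEnabled(conf: Map[String, String]): Boolean =
    conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean &&
      !conf.contains("spark.executor.instances")
}
```

Centralizing the check this way also gives the explanatory comment a single home, instead of duplicating it at every call site.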

srowen commented Jul 25, 2015

The thing I'm not 100% clear on is whether spark.executor.instances is always set if you set --num-executors. It may be my unfamiliarity with this part. I see parts where both are checked as a source of this setting but want to make sure that assumption is correct, that you'll end up with this as a SparkContext value (not just sys property) when you set --num-executors. The test doesn't test that.

CC @sryza and/or @andrewor14 for thoughts

vanzin commented Jul 28, 2015

whether spark.executor.instances is always set if you set --num-executors

It depends on whether it's client or cluster mode. In client mode, this is used:

  OptionAssigner(args.numExecutors, YARN, CLIENT, sysProp = "spark.executor.instances"),

So the command line arg is turned into a sys prop. In cluster mode, this is used:

  OptionAssigner(args.numExecutors, YARN, CLUSTER, clOption = "--num-executors"),

So the handling is now in YARN's ClientArguments.scala. That can get tricky to follow in the code.

@neurons I suggest getting rid of "--num-executors" in ClientArguments, and instead using spark.executor.instances everywhere, leaving the translation solely in SparkSubmit (the first line I mention above). That should also allow some code in ApplicationMaster and ApplicationMasterArguments to be cleaned up.

@andrewor14 commented:
ok to test

A Contributor commented on the diff:
can this test assert something stronger, that we actually end up with 6 executors?

@andrewor14 commented:
+1 to @vanzin's suggestion. I believe this patch in its current state will not work in cluster mode.

@andrewor14 commented:
Also, should we at least log a warning that dynamic allocation is overshadowed and not actually used? There is no other way for the user to find out; at best they could wait N seconds to see whether their executors actually got killed.
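A sketch of what such a warning could look like; the message text, function name, and the Map standing in for SparkConf are all assumptions for illustration, not the code that was merged:

```scala
// Illustrative only: computes the warning that would be logged when an
// explicitly set executor count silently disables dynamic allocation.
def dynamicAllocationWarning(conf: Map[String, String]): Option[String] = {
  val dynRequested =
    conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean
  if (dynRequested && conf.contains("spark.executor.instances")) {
    Some("spark.executor.instances is set explicitly; disabling dynamic allocation.")
  } else {
    None
  }
}
```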

SparkQA commented Jul 28, 2015

Test build #38736 has finished for PR 7657 at commit 32bf340.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 30, 2015

Test build #39119 has finished for PR 7657 at commit dadb40b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class IsotonicRegression(override val uid: String)

SparkQA commented Aug 2, 2015

Test build #39397 has finished for PR 7657 at commit 7aa4a92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class RequestExecutors(appId: String, requestedTotal: Int)
    • case class KillExecutors(appId: String, executorIds: Seq[String])

vanzin commented Aug 3, 2015

retest this please

A Contributor commented on the diff:
===

vanzin commented Aug 3, 2015

Looking good. There's still one place in YarnClientSchedulerBackend where --num-executors is being used that should also be cleaned up. That spot translates a deprecated way of setting the number of executors (SPARK_WORKER_INSTANCES); I think that code should be moved to SparkConf.validateSettings, where other deprecated settings are translated.

SparkQA commented Aug 3, 2015

Test build #192 has finished for PR 7657 at commit 7aa4a92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 3, 2015

Test build #39559 has finished for PR 7657 at commit 7aa4a92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

neurons commented Aug 3, 2015

retest this please

SparkQA commented Aug 3, 2015

Test build #1318 has finished for PR 7657 at commit 7aa4a92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

vanzin commented Aug 3, 2015

Last failure is legit. YarnAllocatorSuite uses --num-executors when instantiating ApplicationMasterArguments, that needs to be fixed.

A Contributor commented on the diff:
BTW you should also remove the numExecutors field from this class; that should uncover a couple of other spots where you might need to also fix things.

A Contributor commented on the diff:
The previous behavior seems to be to only consider SPARK_WORKER_INSTANCES if spark.executor.instances is not set.
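The behavior described above can be sketched as a small translation step; Maps stand in for the environment and SparkConf, and the function name is made up for illustration:

```scala
// Sketch: honor the deprecated SPARK_WORKER_INSTANCES env var only when
// spark.executor.instances has not been set explicitly, matching the
// previous behavior described in the review comment.
def translateDeprecatedExecutorCount(
    env: Map[String, String],
    conf: Map[String, String]): Map[String, String] =
  env.get("SPARK_WORKER_INSTANCES") match {
    case Some(n) if !conf.contains("spark.executor.instances") =>
      conf + ("spark.executor.instances" -> n)
    case _ => conf // explicit setting (or no env var): leave conf unchanged
  }
```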

Niranjan Padmanabhan added 7 commits August 10, 2015 13:05
…c allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, the executorAllocationManager is not initialized in the SparkContext.
… Modified dependencies of this change deeper down in the code base.
A Contributor commented on the diff:
Seems like you missed my previous comment, so I'll just paste it here, since GitHub makes it hard to find otherwise:

Actually, this code is not right after your changes. The value of spark.executor.instances, previously, was either the value set by the user (if dynamic allocation is disabled), or the value of spark.dynamicAllocation.initialExecutors if dynamic allocation is enabled (see ClientArguments.loadEnvironmentArgs).

So, instead, it should check both cases. Something like:

@volatile private var targetNumExecutors = 
  if (Utils.isDynamicAllocationEnabled(sparkConf)) {
    sparkConf.getInt("spark.dynamicAllocation.initialExecutors", 0)
  } else {
    sparkConf.getInt("spark.executor.instances", YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS)
  }

The way these options are propagated is still a little confusing (and the lack of tests doesn't help), so I hope that's enough. There might be some other cleanup possible, but I'm not gonna ask you to go there.
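The suggested initializer can be exercised in isolation. In this sketch a Map stands in for SparkConf, the helper from earlier in the thread is inlined, and 2 is assumed as a stand-in for YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS:

```scala
// Runnable sketch of the suggested target-executor selection, outside Spark.
object TargetExecutors {
  // Assumed stand-in for YarnSparkHadoopUtil.DEFAULT_NUMBER_EXECUTORS.
  val DefaultNumberExecutors = 2

  def isDynamicAllocationEnabled(conf: Map[String, String]): Boolean =
    conf.getOrElse("spark.dynamicAllocation.enabled", "false").toBoolean &&
      !conf.contains("spark.executor.instances")

  // Initial target: the configured initial executors under dynamic
  // allocation; otherwise the explicit (or default) fixed count.
  def initialTarget(conf: Map[String, String]): Int =
    if (isDynamicAllocationEnabled(conf)) {
      conf.getOrElse("spark.dynamicAllocation.initialExecutors", "0").toInt
    } else {
      conf.getOrElse("spark.executor.instances",
        DefaultNumberExecutors.toString).toInt
    }
}
```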

SparkQA commented Aug 10, 2015

Test build #40312 has finished for PR 7657 at commit 682626e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 11, 2015

Test build #40376 has finished for PR 7657 at commit 6da06c4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

vanzin commented Aug 11, 2015

Alright, LGTM. Let's try jenkins again. retest this please

vanzin commented Aug 11, 2015

(I'll try to run some local tests later on too.)

neurons commented Aug 11, 2015

Thank you @vanzin and @andrewor14 for your reviews.

SparkQA commented Aug 11, 2015

Test build #40489 has finished for PR 7657 at commit 6da06c4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

vanzin commented Aug 11, 2015

Tested this locally and it looks good. @andrewor14 did you have any other comments?

Test failures are unrelated (pyspark?) so I'll just ignore them.

vanzin commented Aug 12, 2015

Alright, will merge later today.

A Contributor commented on the diff:
no need to interpolate here

@andrewor14 commented:
Looks fine, though I won't have a chance to test this out on a real cluster and I think we should before the release. Feel free to merge it.

vanzin commented Aug 12, 2015

I won't have a chance to test this out on a real cluster

I tested yesterday, client and cluster mode, conf and command line options. All seems fine.

SparkQA commented Aug 12, 2015

Test build #40661 has finished for PR 7657 at commit 7f3e1ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

vanzin commented Aug 12, 2015

42nd time is the charm, I guess. Merging.

asfgit pushed a commit that referenced this pull request Aug 12, 2015
…c...

… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, the executorAllocationManager is not initialized in the SparkContext.

Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>

Closes #7657 from neurons/SPARK-9092.

(cherry picked from commit 738f353)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
vanzin commented Aug 12, 2015

Merged to master and 1.5. Thanks!

@asfgit asfgit closed this in 738f353 Aug 12, 2015
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
…c...

… allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, the executorAllocationManager is not initialized in the SparkContext.

Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>

Closes apache#7657 from neurons/SPARK-9092.