[SPARK-17663] [CORE] SchedulableBuilder should handle invalid data access via scheduler.allocation.file #15237

erenavsarogullari · 2016-09-25T22:21:26Z

What changes were proposed in this pull request?

If spark.scheduler.allocation.file has invalid minShare and/or weight values, this causes the following problems:

- NumberFormatException is thrown by the toInt function.
- SparkContext cannot be initialized.
- No meaningful error message is shown to the user.

In a nutshell, this functionality can be made more robust by choosing one of the following approaches:

1- Currently, if schedulingMode has an invalid value, a warning message is logged and the default value, FIFO, is used. The same pattern can be applied to minShare (default: 0) and weight (default: 1).
2- A meaningful error message can be shown to the user for all invalid cases.

This PR proposes:

- schedulingMode currently handles only empty values. It also needs to handle whitespace, non-uppercase values (fair, FaIr, etc.) and SchedulingMode.NONE by falling back to the default value (FIFO).
- minShare and weight currently handle only empty values. They also need to handle non-integer values by falling back to their default values.
- Some refactoring of PoolSuite.

Code to reproduce:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("spark-fairscheduler").setMaster("local")
conf.set("spark.scheduler.mode", "FAIR")
conf.set("spark.scheduler.allocation.file", "src/main/resources/fairscheduler-invalid-data.xml")
val sc = new SparkContext(conf)

fairscheduler-invalid-data.xml:

<allocations>
    <pool name="production">
        <schedulingMode>FIFO</schedulingMode>
        <weight>invalid_weight</weight>
        <minShare>2</minShare>
    </pool>
</allocations>

Stacktrace:

Exception in thread "main" java.lang.NumberFormatException: For input string: "invalid_weight"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:127)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:102)

How was this patch tested?

Added a unit test case.

erenavsarogullari · 2016-11-09T18:09:58Z

cc @kayousterhout @markhamstra @squito
Thanks in advance; all feedback is welcome ;)

squito · 2016-12-01T21:29:32Z

Jenkins, ok to test

squito

Hi @erenavsarogullari sorry this has taken so long for me to look at. Better error handling is definitely nice to have, thanks for working on it.

I personally would prefer to fail fast, rather than go to default values, but I guess that it already uses default values for the scheduling mode so you can stick with that.

I have a handful of relatively minor comments, but nothing big.
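For contrast with the default-value approach the PR takes, a fail-fast variant might look like the sketch below. It is hypothetical (the object and method names are not Spark's): instead of substituting a default, it throws an IllegalArgumentException that names the pool and property and keeps the original NumberFormatException as its cause.

```scala
// Hypothetical fail-fast parsing of an integer pool property (not Spark's code).
object FailFastParsing {
  // Parses `raw` as an Int, tolerating surrounding whitespace; on failure, throws
  // a descriptive error instead of falling back to a default value.
  def parseIntStrict(poolName: String, propertyName: String, raw: String): Int =
    try {
      raw.trim.toInt
    } catch {
      case e: NumberFormatException =>
        throw new IllegalArgumentException(
          s"Invalid $propertyName '$raw' for pool '$poolName' in the fair scheduler " +
            "allocation file: expected an integer.", e)
    }
}
```

With this variant SparkContext would still fail to start on bad input, but the message would point at the offending pool and property rather than only at a bare NumberFormatException.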

squito · 2016-12-01T21:43:28Z

core/src/test/scala/org/apache/spark/scheduler/PoolSuite.scala

+  val LOCAL = "local"
+  val APP_NAME = "PoolSuite"
+  val SCHEDULER_ALLOCATION_FILE_PROPERTY = "spark.scheduler.allocation.file"
+  val SCHEDULER_POOL_PROPERTY = "spark.scheduler.pool"


Instead of repeating these here, why not access the constants in FairSchedulableBuilder directly? It would be better if they were in an object, but you already reference an instance here in any case, so you don't need to do a bigger refactoring.

squito · 2016-12-01T21:44:18Z

core/src/test/scala/org/apache/spark/scheduler/PoolSuite.scala


    val properties1 = new Properties()
-    properties1.setProperty("spark.scheduler.pool", "1")
+    properties1.setProperty(SCHEDULER_POOL_PROPERTY, "1")


e.g., here you'd do:

properties1.setProperty(schedulableBuilder.FAIR_SCHEDULER_PROPERTIES, "1")
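If the property names lived in an object, as suggested above, the test could use them without a builder instance. A minimal sketch, assuming a hypothetical holder object named FairSchedulerConstants (in Spark the constant is a member of FairSchedulableBuilder):

```scala
import java.util.Properties

// Hypothetical holder object for the property names (name is an assumption).
object FairSchedulerConstants {
  // Local property that assigns jobs submitted from a thread to a pool.
  val FAIR_SCHEDULER_PROPERTIES = "spark.scheduler.pool"
}

object PoolSuiteExample {
  // Builds local properties that route a job to `poolName`, reusing the shared
  // constant instead of a string literal duplicated in the test suite.
  def poolProperties(poolName: String): Properties = {
    val props = new Properties()
    props.setProperty(FairSchedulerConstants.FAIR_SCHEDULER_PROPERTIES, poolName)
    props
  }
}
```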

squito · 2016-12-01T21:45:59Z

core/src/test/scala/org/apache/spark/scheduler/PoolSuite.scala

  def createTaskSetManager(stageId: Int, numTasks: Int, taskScheduler: TaskSchedulerImpl)
-    : TaskSetManager = {
+  : TaskSetManager = {


nit: old indentation was more correct. (Really this should have each arg on its own line but since you don't need to touch this, probably best to leave it alone in this pr.)

squito · 2016-12-01T22:05:21Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+    if (StringUtils.isNotBlank(data) && checkType(data.toInt)) {
+      data.toInt
+    } else {
+      logWarning(s"$propertyName is blank or invalid: $data, using the default $propertyName: " +


I think these warning messages need to provide more context -- the person who eventually notices these might not be the same as the one who created the xml file. Maybe change to something like:

"Error while loading scheduler allocation file at $path. $propertyName is blank ..."

squito · 2016-12-01T22:08:17Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

      logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
        poolName, schedulingMode, minShare, weight))
    }
  }

+  private def getSchedulingModeValue(data: String, defaultValue: SchedulingMode): SchedulingMode = {


This seems a little more complicated than is really necessary for what you're doing here. Couldn't you achieve the same thing by leaving the original code and changing the one line above it to:

val xmlSchedulingMode = (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase

not exactly the same -- it also allows whitespace around the scheduling mode, but maybe a good thing?

Sure, trim and toUpperCase help cover the blank and invalid schedulingMode cases. The schedulingMode = none/NONE case also needs to be checked: NONE is a valid SchedulingMode value, but it is not supported for pools.
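The normalization discussed in this thread (trim, upper-case, then reject NONE) can be sketched as follows. The SchedulingMode object below is a stand-in that mirrors Spark's enumeration values (FAIR, FIFO, NONE) so the sketch runs without Spark; the helper's name and signature are assumptions, not the merged code.

```scala
// Stand-in for Spark's SchedulingMode enumeration (FAIR, FIFO, NONE).
object SchedulingMode extends Enumeration {
  val FAIR = Value("FAIR")
  val FIFO = Value("FIFO")
  val NONE = Value("NONE")
}

object SchedulingModeParsing {
  // Trims and upper-cases the raw XML text, then accepts only FAIR or FIFO;
  // blank, unrecognized, and NONE values all fall back to `default`.
  def parseSchedulingMode(
      raw: String,
      default: SchedulingMode.Value): SchedulingMode.Value = {
    val normalized = raw.trim.toUpperCase(java.util.Locale.ROOT)
    SchedulingMode.values.find(_.toString == normalized) match {
      case Some(mode) if mode != SchedulingMode.NONE => mode
      case _ => default
    }
  }
}
```

Passing Locale.ROOT to toUpperCase keeps the result independent of the JVM's default locale; in a Turkish locale, for example, "fair" would otherwise upper-case to "FAİR" and never match.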

squito · 2016-12-01T22:10:10Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+  }
+
+  private def getIntValue(propertyName: String, data: String, defaultValue: Int): Int = {
+    if (StringUtils.isNotBlank(data) && checkType(data.toInt)) {


after getting rid of getSchedulingModeValue, I'd also probably get rid of checkType and just directly inline the try {...} catch { case NumberFormatException ...} here.
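With checkType removed and the try/catch inlined, the integer helper might look roughly like this sketch. The `warn` parameter is a hypothetical stand-in for Spark's logWarning so the code runs on its own, and the parameter list is illustrative rather than the one in the PR. In this sketch, blank values fall back to the default silently, while non-integer values also emit a warning.

```scala
// Sketch of getIntValue with an inlined try/catch (hypothetical signature).
object IntPropertyParsing {
  def getIntValue(
      poolName: String,
      propertyName: String,
      raw: String,
      defaultValue: Int,
      warn: String => Unit): Int = {
    val data = raw.trim
    if (data.isEmpty) {
      defaultValue // blank value: use the default without warning
    } else {
      try {
        data.toInt
      } catch {
        case _: NumberFormatException =>
          warn(s"Pool $poolName: $propertyName '$data' is not an integer, " +
            s"using the default $propertyName: $defaultValue.")
          defaultValue
      }
    }
  }
}
```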

squito · 2016-12-01T22:11:55Z

core/src/test/scala/org/apache/spark/scheduler/PoolSuite.scala

+    verifyPool(rootPool, "pool_with_whitespace_scheduling_mode", 3, 2, FIFO)
+    verifyPool(rootPool, "pool_with_empty_min_share", 0, 3, FAIR)
+    verifyPool(rootPool, "pool_with_empty_weight", 2, 1, FAIR)
+    verifyPool(rootPool, "pool_with_empty_scheduling_mode", 2, 2, FIFO)


with my suggestion on trim, you could also add a test case for a mode w/ surrounding whitespace.

Addressed ;)

SparkQA · 2016-12-02T00:39:42Z

Test build #69506 has finished for PR 15237 at commit 4b6053f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

erenavsarogullari · 2016-12-07T23:15:23Z

Many thanks @squito for the review.
I will address the comments asap ;)

erenavsarogullari · 2016-12-24T17:16:08Z

Hi @squito,
Many thanks again for the review. All comments have been addressed in a new commit.
It is ready for re-review / merge ;)

SparkQA · 2016-12-24T18:32:49Z

Test build #70571 has finished for PR 15237 at commit a4e06b2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

squito

Thanks for updating @erenavsarogullari , sorry for the delay in reviewing. I just have some minor updates requested.

squito · 2017-01-03T22:44:15Z

core/src/test/scala/org/apache/spark/scheduler/PoolSuite.scala

+    verifyPool(rootPool, "pool_with_empty_scheduling_mode", 2, 2, FIFO)
+    verifyPool(rootPool, "pool_with_min_share_surrounded_whitespace", 3, 2, FAIR)
+    verifyPool(rootPool, "pool_with_weight_surrounded_whitespace", 1, 2, FAIR)
+    verifyPool(rootPool, "pool_with_scheduling_mode_surrounded_whitespace", 3, 2, FAIR)


nit: can you just combine all these "surrounded_whitespace" cases into one for all the properties?

squito · 2017-01-03T22:55:30Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+      data.toInt
+    } catch {
+      case e: NumberFormatException =>
+        logWarning(s"Error while loading scheduler allocation file at $schedulerAllocFile. " +


ugh, this is kind of a nuisance, but I realize now that schedulerAllocFile isn't necessarily the right file -- that might be empty, and there might be a fairscheduler.xml sitting on the classpath. Can you get the right file name in both cases? (Better to leave it to not include any filename, than to

And can the warning include the poolname as well?

Finally, it would be nice to add this extra info to the mode warning too.
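A warning that carries the requested context could be assembled as in the sketch below. The file name is an Option because, as noted above, the loaded file may be the classpath default rather than the configured path; when it is unknown, the sketch omits it rather than guessing. All names here are hypothetical.

```scala
// Hypothetical builder for a context-rich warning about an invalid pool value.
object AllocationWarnings {
  def invalidValueMessage(
      fileName: Option[String],
      poolName: String,
      propertyName: String,
      value: String,
      defaultValue: String): String = {
    // Only mention the file when we actually know which one was loaded.
    val location = fileName.map(f => s" at $f").getOrElse("")
    s"Error while loading scheduler allocation file$location. " +
      s"$propertyName is blank or invalid: '$value' for pool $poolName, " +
      s"using the default $propertyName: $defaultValue"
  }
}
```

The same builder could serve the schedulingMode warning, so all three properties report their problems in one consistent format.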

squito · 2017-01-03T22:56:17Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

-        minShare = xmlMinShare.toInt
-      }
+      val xmlSchedulingMode = (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase
+      val schedulingMode = getSchedulingModeValue(xmlSchedulingMode, DEFAULT_SCHEDULING_MODE)


this is minor, but if you're going to have a helper method here, can you do all the parsing inside it? include the one line above (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase.

Same goes for getIntValue

erenavsarogullari · 2017-01-26T00:16:23Z

Hi @squito,
Firstly, sorry for the delay, and thanks again for the review.
All comments are addressed in the new patch (a1b2924), and it is ready for re-review / merge ;)

SparkQA · 2017-01-26T01:34:55Z

Test build #72004 has finished for PR 15237 at commit d5597c3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-01-26T02:30:30Z

Test build #72005 has finished for PR 15237 at commit a1b2924.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

squito

two very minor style issues, otherwise looks good.

I guess you decided against including the xml file in the warning messages? To get it, you'd need to adjust FairSchedulableBuilder.buildPools, where it creates the inputStream, so that it also sets a variable for the input filename at the same time. Would be nice to have, but this is an improvement even without that addition.

squito · 2017-01-30T15:28:35Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

      logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
        poolName, schedulingMode, minShare, weight))
    }
  }

+  private def getSchedulingModeValue(poolNode: Node, poolName: String, defaultValue: SchedulingMode)
+  : SchedulingMode = {


nit: multi-line method definitions should have each parameter on its own line, with the params double-indented:

private def getSchedulingModeValue(
    poolNode: Node,
    poolName: String,
    defaultValue: SchedulingMode): SchedulingMode = {
  // body
}

squito · 2017-01-30T15:29:21Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+  }
+
+  private def getIntValue(poolNode: Node, poolName: String, propertyName: String, defaultValue: Int)
+  : Int = {


same here on multi-line method def


erenavsarogullari · 2017-01-30T21:04:51Z

Hi @squito,

Firstly, thanks again for the review.
Sure, the scheduler file name can be obtained in the buildPools function by returning a tuple / case class such as (Option[InputStream], fileName). Given the current size of this PR, which is ready to merge, I can address this in a separate PR.
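The (stream, fileName) pairing could be sketched as below: resolve the allocation file once and keep its name next to the stream, so later warnings can cite it. `configuredPath` plays the role of spark.scheduler.allocation.file and `defaultResource` the classpath default (fairscheduler.xml); the object, method name and signature are assumptions, not Spark's buildPools.

```scala
import java.io.{FileInputStream, InputStream}

// Hypothetical resolver pairing the allocation-file stream with its name.
object AllocationFileResolver {
  def resolve(
      configuredPath: Option[String],
      defaultResource: String = "fairscheduler.xml"): Option[(InputStream, String)] =
    configuredPath match {
      case Some(path) =>
        // A configured but missing file still throws FileNotFoundException here.
        Some((new FileInputStream(path), path))
      case None =>
        // getResourceAsStream returns null when the resource is absent.
        Option(getClass.getClassLoader.getResourceAsStream(defaultResource))
          .map(stream => (stream, defaultResource))
    }
}
```

Returning None when no file is found keeps the "no allocation file" case explicit instead of hiding it behind a null stream.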

SparkQA · 2017-01-30T23:07:49Z

Test build #72169 has finished for PR 15237 at commit 7551cbd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2017-01-31T03:05:52Z

@erenavsarogullari sorry to push back, but if you're willing to do the filename thing now, why not just tackle it in this same pr? seems pretty minor to separate into its own issue, and conceptually fits with this.

If you just don't have time to do it in the next day or two, then sure, I can merge this as is and we can wait on the filename.

erenavsarogullari · 2017-01-31T19:28:31Z

Hi @squito,

Sorry, I am quite busy this week and would be happy to have this PR merged now if that is also OK for you. I plan to address fileName logging via a separate JIRA.

It could also be useful to inform the user about the following cases by adding logging:

- If the schedulerAllocFile property is not set and DEFAULT_SCHEDULER_FILE is used,
- If the schedulerAllocFile property is not set and DEFAULT_SCHEDULER_FILE does not exist,
- If schedulerAllocFile does not exist.

So I am thinking of creating a single JIRA for FairSchedulableBuilder logging improvements, covering both fileName logging (when the file is found successfully) and the failure cases above.
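The case split above could drive the informational log message roughly as in the following sketch. The function and its message texts are hypothetical (loosely modeled on the wording later proposed in SPARK-19466); only the four cases come from the comment.

```scala
// Hypothetical selection of a log message for each allocation-file case.
object AllocationFileLogging {
  def describe(
      configuredPath: Option[String],
      configuredExists: Boolean,
      defaultFound: Boolean,
      defaultFile: String = "fairscheduler.xml"): String =
    configuredPath match {
      case Some(path) if configuredExists =>
        s"Fair Scheduler file: $path is found successfully and will be parsed."
      case Some(path) =>
        s"Fair Scheduler file: $path does not exist; Fair Scheduler cannot be built."
      case None if defaultFound =>
        s"Fair Scheduler file: $defaultFile is found on the classpath and will be parsed."
      case None =>
        "No Fair Scheduler file found."
    }
}
```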

squito · 2017-02-06T14:27:58Z

merged to master, thanks @erenavsarogullari

Fair Scheduler logging for the following cases can be useful for the user:

1. If a **valid** `spark.scheduler.allocation.file` property is set, the user can be informed which scheduler file is processed when `SparkContext` initializes.
2. If an **invalid** `spark.scheduler.allocation.file` property is set, the following stacktrace is currently shown to the user. In addition, a more meaningful message can be shown by pointing out that the problem occurred while building the fair scheduler. Other potential issues can also be covered at this level, e.g. **Fair Scheduler can not be built. + exception stacktrace**

```
Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75)
```

3. If the `spark.scheduler.allocation.file` property is not set and the **default** fair scheduler file (**fairscheduler.xml**) is found on the classpath, it will be loaded, but the user is currently not informed that the default file is used, so logging can be useful, e.g. **Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.**
4. If the **spark.scheduler.allocation.file** property is not set and the **default** fair scheduler file does not exist on the classpath, the user is currently not informed, so logging can be useful, e.g. **No Fair Scheduler file found.**

This PR is also related to apache#15237: it emphasizes the fileName in warning logs when the fair scheduler file has invalid minShare, weight or schedulingMode values.

## How was this patch tested?

Added new unit tests.
Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes apache#16813 from erenavsarogullari/SPARK-19466.

Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes apache#15237 from erenavsarogullari/SPARK-17663.


erenavsarogullari changed the title ~~[SPARK-17663] [CORE] SchedulableBuilder should handle invalid data access via scheduler.al…~~ [SPARK-17663] [CORE] SchedulableBuilder should handle invalid data access via scheduler.allocation.file Sep 25, 2016

squito suggested changes Dec 1, 2016

squito suggested changes Jan 3, 2017

squito mentioned this pull request Jan 3, 2017

[SPARK-18066] [CORE] [TESTS] Add Pool usage policies test coverage for FIFO & FAIR Schedulers #15604

Closed

squito approved these changes Jan 30, 2017

erenavsarogullari added 4 commits January 30, 2017 19:54

SchedulableBuilder should handle invalid data access via scheduler.al…

8ace686

…location.file

Review comments are addressed.

399bf9a

Review comments are addressed.

558f8b1

Review comments are addressed.

7551cbd

erenavsarogullari mentioned this pull request Feb 5, 2017

[SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Logging #16813

Closed

asfgit closed this in 7beb227 Feb 6, 2017

erenavsarogullari deleted the SPARK-17663 branch February 21, 2017 16:53
