Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17663] [CORE] SchedulableBuilder should handle invalid data access via scheduler.allocation.file #15237

Closed
wants to merge 4 commits into from

Conversation

erenavsarogullari
Copy link
Member

@erenavsarogullari erenavsarogullari commented Sep 25, 2016

What changes were proposed in this pull request?

If spark.scheduler.allocation.file has invalid minShare or/and weight values, these cause :

  • NumberFormatException due to toInt function
  • SparkContext can not be initialized.
  • It does not show meaningful error message to user.

In a nutshell, this functionality can be more robust by selecting one of the following flows :

1- Currently, if schedulingMode has an invalid value, a warning message is logged and default value is set as FIFO. Same pattern can be used for minShare(default: 0) and weight(default: 1) as well
2- Meaningful error message can be shown to the user for all invalid cases.

PR offers :

  • schedulingMode handles just empty values. It also needs to be supported for whitespace, non-uppercase(fair, FaIr etc...) or SchedulingMode.NONE cases by setting default value(FIFO)
  • minShare and weight handle just empty values. They also need to be supported for non-integer cases by setting default values.
  • Some refactoring of PoolSuite.

Code to Reproduce :

val conf = new SparkConf().setAppName("spark-fairscheduler").setMaster("local")
conf.set("spark.scheduler.mode", "FAIR")
conf.set("spark.scheduler.allocation.file", "src/main/resources/fairscheduler-invalid-data.xml")
val sc = new SparkContext(conf)

fairscheduler-invalid-data.xml :

<allocations>
    <pool name="production">
        <schedulingMode>FIFO</schedulingMode>
        <weight>invalid_weight</weight>
        <minShare>2</minShare>
    </pool>
</allocations>

Stacktrace :

Exception in thread "main" java.lang.NumberFormatException: For input string: "invalid_weight"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:127)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:102)

How was this patch tested?

Added Unit Test Case.

@erenavsarogullari erenavsarogullari changed the title [SPARK-17663] [CORE] SchedulableBuilder should handle invalid data access via scheduler.al… [SPARK-17663] [CORE] SchedulableBuilder should handle invalid data access via scheduler.allocation.file Sep 25, 2016
@erenavsarogullari
Copy link
Member Author

cc @kayousterhout @markhamstra @squito
Thanks and all feedbacks are welcome in advance ;)

@squito
Copy link
Contributor

squito commented Dec 1, 2016

Jenkins, ok to test

Copy link
Contributor

@squito squito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @erenavsarogullari sorry this has taken so long for me to look at. Better error handling is definitely nice to have, thanks for working on it.

I personally would prefer to fail fast, rather than go to default values, but I guess that it already uses default values for the scheduling mode so you can stick with that.

I have a handful of relatively minor comments, but nothing big.

val LOCAL = "local"
val APP_NAME = "PoolSuite"
val SCHEDULER_ALLOCATION_FILE_PROPERTY = "spark.scheduler.allocation.file"
val SCHEDULER_POOL_PROPERTY = "spark.scheduler.pool"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of repeating this here, why not directly access the constant in FairSchedulableBuilder? would be better if it were in an object, but you have an instance where you reference this in any case so you dont' need to do a bigger refactoring.


val properties1 = new Properties()
properties1.setProperty("spark.scheduler.pool", "1")
properties1.setProperty(SCHEDULER_POOL_PROPERTY, "1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eg., here you'd do

properties1.setProperty(schedulableBuilder.FAIR_SCHEDULER_PROPERTIES, "1")

def createTaskSetManager(stageId: Int, numTasks: Int, taskScheduler: TaskSchedulerImpl)
: TaskSetManager = {
: TaskSetManager = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: old indentation was more correct. (Really this should have each arg on its own line but since you don't need to touch this, probably best to leave it alone in this pr.)

if (StringUtils.isNotBlank(data) && checkType(data.toInt)) {
data.toInt
} else {
logWarning(s"$propertyName is blank or invalid: $data, using the default $propertyName: " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these warning messages need to provide more context -- the person who eventually notices these might not be the same as the one who created the xml file. Maybe change to something like:

"Error while loading scheduler allocation file at $path. $propertyName is blank ..."

logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
poolName, schedulingMode, minShare, weight))
}
}

private def getSchedulingModeValue(data: String, defaultValue: SchedulingMode): SchedulingMode = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a little more complicated than is really necessary for the what you're doing here. couldn't you achieve the same thing by leaving the original code and changing the one line above the original to:

val xmlSchedulingMode = (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase

not exactly the same -- it also allows whitespace around the scheduling mode, but maybe a good thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, trim and toUpperCase functions help to cover for blank and invalid schedulingMode cases. Also schedulingMode = none/NONE case needs to be checked. This case is valid but unsupported.

}

private def getIntValue(propertyName: String, data: String, defaultValue: Int): Int = {
if (StringUtils.isNotBlank(data) && checkType(data.toInt)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after getting rid of getSchedulingModeValue, I'd also probably get rid of checkType and just directly inline the try {...} catch { case NumberFormatException ...} here.

verifyPool(rootPool, "pool_with_whitespace_scheduling_mode", 3, 2, FIFO)
verifyPool(rootPool, "pool_with_empty_min_share", 0, 3, FAIR)
verifyPool(rootPool, "pool_with_empty_weight", 2, 1, FAIR)
verifyPool(rootPool, "pool_with_empty_scheduling_mode", 2, 2, FIFO)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with my suggestion on trim, you could also add a test case for a mode w/ surrounding whitespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed ;)

@SparkQA
Copy link

SparkQA commented Dec 2, 2016

Test build #69506 has finished for PR 15237 at commit 4b6053f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari
Copy link
Member Author

Many thanks @squito for review.
I will address them asap ;)

@erenavsarogullari
Copy link
Member Author

Hi @squito,
Many thanks again for review. All comments have just been addressed via new commit.
Also It is ready to rereview / merge ;)

@SparkQA
Copy link

SparkQA commented Dec 24, 2016

Test build #70571 has finished for PR 15237 at commit a4e06b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@squito squito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating @erenavsarogullari , sorry for the delay in reviewing. I just have some minor updates requested.

verifyPool(rootPool, "pool_with_empty_scheduling_mode", 2, 2, FIFO)
verifyPool(rootPool, "pool_with_min_share_surrounded_whitespace", 3, 2, FAIR)
verifyPool(rootPool, "pool_with_weight_surrounded_whitespace", 1, 2, FAIR)
verifyPool(rootPool, "pool_with_scheduling_mode_surrounded_whitespace", 3, 2, FAIR)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you just combine all these "surrounded_whitespace" cases into one for all the properties?

data.toInt
} catch {
case e: NumberFormatException =>
logWarning(s"Error while loading scheduler allocation file at $schedulerAllocFile. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, this is kind of a nuisance, but I realize now that schedulerAllocFile isn't necessarily the right file -- that might be empty, and there might be a fairscheduler.xml sitting on the classpath. Can you get the right file name in both cases? (Better to leave it to not include any filename, than to

And can the warning include the poolname as well?

Finally, it would be nice to add this extra info to the mode warning too.

minShare = xmlMinShare.toInt
}
val xmlSchedulingMode = (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase
val schedulingMode = getSchedulingModeValue(xmlSchedulingMode, DEFAULT_SCHEDULING_MODE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is minor, but if you're going to have a helper method here, can you do all the parsing inside it? include the one line above (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase.

Same goes for getIntValue

@erenavsarogullari
Copy link
Member Author

Hi @squito,
Firstly, sorry for delay and thanks for review again.
All comments are addressed via new patch(a1b2924) and it is ready to re-review / merge ;)

@SparkQA
Copy link

SparkQA commented Jan 26, 2017

Test build #72004 has finished for PR 15237 at commit d5597c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 26, 2017

Test build #72005 has finished for PR 15237 at commit a1b2924.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@squito squito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two very minor style issues, otherwise looks good.

I guess you decided against including the xml file in the warning messages? To get it, you'd need to adjust FairSchedulerBuilder.buildPools, where it creates the inputStream -- it would also set a variable for the input filename at the same time. Would be nice to have, but this is an improvement even without that addition.

logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
poolName, schedulingMode, minShare, weight))
}
}

private def getSchedulingModeValue(poolNode: Node, poolName: String, defaultValue: SchedulingMode)
: SchedulingMode = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: multi-line method definitions should have each parameter on their own line, with the params double-indented:

private def getSchedulingModeValue(
    poolNode: Node
    poolName: String,
    defaultValue: SchedulingMode): SchedulingMode = {
  // body
}

}

private def getIntValue(poolNode: Node, poolName: String, propertyName: String, defaultValue: Int)
: Int = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here on multi-line method def

@erenavsarogullari
Copy link
Member Author

Hi @squito,

Firstly, thanks again for the review.
Sure, scheduler filename can be gotten via buildPools function by creating a tuple / case class as (Option[InputStream], fileName). I can address this via separated PR due to current size of this PR and ready to merge.

@SparkQA
Copy link

SparkQA commented Jan 30, 2017

Test build #72169 has finished for PR 15237 at commit 7551cbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor

squito commented Jan 31, 2017

@erenavsarogullari sorry to push back, but if you're willing to do the filename thing now, why not just tackle it in this same pr? seems pretty minor to separate into its own issue, and conceptually fits with this.

If you just don't have time to do it in the next day or two, than sure, I can just merge this as is and we can wait on the filename.

@erenavsarogullari
Copy link
Member Author

Hi @squito,

Sorry, i am quite busy this week and happy to be merged this PR now if it is also ok for you. I plan to address fileName logging via separated jira.

Also To inform user for the following cases can be useful by adding logging:

  • If schedulerAllocFile property is not set and DEFAULT_SCHEDULER_FILE is used,
  • If schedulerAllocFile property is not set and DEFAULT_SCHEDULER_FILE do not exist,
  • If schedulerAllocFile does not exist.

So thinking to create a single Jira for FairSchedulableBuilder logging improvement by covering both fileName logging(if file is found successfully) and above failure cases.

@squito
Copy link
Contributor

squito commented Feb 6, 2017

merged to master, thanks @erenavsarogullari

ghost pushed a commit to dbtsai/spark that referenced this pull request Feb 10, 2017
Fair Scheduler Logging for the following cases can be useful for the user.

1. If **valid** `spark.scheduler.allocation.file` property is set, user can be informed and aware which scheduler file is processed when `SparkContext` initializes.

2. If **invalid** `spark.scheduler.allocation.file` property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as **Fair Scheduler can not be built. + exception stacktrace**
```
Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75)
```
3. If `spark.scheduler.allocation.file` property is not set and **default** fair scheduler file (**fairscheduler.xml**) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as **Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.**

4. If **spark.scheduler.allocation.file** property is not set and **default** fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as **No Fair Scheduler file found.**

Also this PR is related with apache#15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values.

## How was this patch tested?
Added new Unit Tests.

Author: erenavsarogullari <erenavsarogullari@gmail.com>

Closes apache#16813 from erenavsarogullari/SPARK-19466.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…ess via scheduler.allocation.file

## What changes were proposed in this pull request?

If `spark.scheduler.allocation.file` has invalid `minShare` or/and `weight` values, these cause :
- `NumberFormatException` due to `toInt` function
- `SparkContext` can not be initialized.
- It does not show meaningful error message to user.

In a nutshell, this functionality can be more robust by selecting one of the following flows :

**1-** Currently, if `schedulingMode` has an invalid value, a warning message is logged and default value is set as `FIFO`. Same pattern can be used for `minShare`(default: 0) and `weight`(default: 1) as well
**2-** Meaningful error message can be shown to the user for all invalid cases.

PR offers :
- `schedulingMode` handles just empty values. It also needs to be supported for **whitespace**, **non-uppercase**(fair, FaIr etc...) or `SchedulingMode.NONE` cases by setting default value(`FIFO`)
- `minShare` and `weight` handle just empty values. They also need to be supported for **non-integer** cases by setting default values.
- Some refactoring of `PoolSuite`.

**Code to Reproduce :**

```
val conf = new SparkConf().setAppName("spark-fairscheduler").setMaster("local")
conf.set("spark.scheduler.mode", "FAIR")
conf.set("spark.scheduler.allocation.file", "src/main/resources/fairscheduler-invalid-data.xml")
val sc = new SparkContext(conf)
```

**fairscheduler-invalid-data.xml :**

```
<allocations>
    <pool name="production">
        <schedulingMode>FIFO</schedulingMode>
        <weight>invalid_weight</weight>
        <minShare>2</minShare>
    </pool>
</allocations>
```

**Stacktrace :**

```
Exception in thread "main" java.lang.NumberFormatException: For input string: "invalid_weight"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:127)
    at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:102)
```
## How was this patch tested?

Added Unit Test Case.

Author: erenavsarogullari <erenavsarogullari@gmail.com>

Closes apache#15237 from erenavsarogullari/SPARK-17663.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
Fair Scheduler Logging for the following cases can be useful for the user.

1. If **valid** `spark.scheduler.allocation.file` property is set, user can be informed and aware which scheduler file is processed when `SparkContext` initializes.

2. If **invalid** `spark.scheduler.allocation.file` property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as **Fair Scheduler can not be built. + exception stacktrace**
```
Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75)
```
3. If `spark.scheduler.allocation.file` property is not set and **default** fair scheduler file (**fairscheduler.xml**) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as **Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.**

4. If **spark.scheduler.allocation.file** property is not set and **default** fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as **No Fair Scheduler file found.**

Also this PR is related with apache#15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values.

## How was this patch tested?
Added new Unit Tests.

Author: erenavsarogullari <erenavsarogullari@gmail.com>

Closes apache#16813 from erenavsarogullari/SPARK-19466.
@erenavsarogullari erenavsarogullari deleted the SPARK-17663 branch February 21, 2017 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants