-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-17663] [CORE] SchedulableBuilder should handle invalid data access via scheduler.allocation.file #15237
Conversation
cc @kayousterhout @markhamstra @squito |
Jenkins, ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @erenavsarogullari sorry this has taken so long for me to look at. Better error handling is definitely nice to have, thanks for working on it.
I personally would prefer to fail fast, rather than go to default values, but I guess that it already uses default values for the scheduling mode so you can stick with that.
I have a handful of relatively minor comments, but nothing big.
val LOCAL = "local" | ||
val APP_NAME = "PoolSuite" | ||
val SCHEDULER_ALLOCATION_FILE_PROPERTY = "spark.scheduler.allocation.file" | ||
val SCHEDULER_POOL_PROPERTY = "spark.scheduler.pool" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of repeating this here, why not directly access the constant in FairSchedulableBuilder
? would be better if it were in an object
, but you have an instance where you reference this in any case so you dont' need to do a bigger refactoring.
|
||
val properties1 = new Properties() | ||
properties1.setProperty("spark.scheduler.pool", "1") | ||
properties1.setProperty(SCHEDULER_POOL_PROPERTY, "1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eg., here you'd do
properties1.setProperty(schedulableBuilder.FAIR_SCHEDULER_PROPERTIES, "1")
def createTaskSetManager(stageId: Int, numTasks: Int, taskScheduler: TaskSchedulerImpl) | ||
: TaskSetManager = { | ||
: TaskSetManager = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: old indentation was more correct. (Really this should have each arg on its own line but since you don't need to touch this, probably best to leave it alone in this pr.)
if (StringUtils.isNotBlank(data) && checkType(data.toInt)) { | ||
data.toInt | ||
} else { | ||
logWarning(s"$propertyName is blank or invalid: $data, using the default $propertyName: " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these warning messages need to provide more context -- the person who eventually notices these might not be the same as the one who created the xml file. Maybe change to something like:
"Error while loading scheduler allocation file at $path. $propertyName is blank ..."
logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format( | ||
poolName, schedulingMode, minShare, weight)) | ||
} | ||
} | ||
|
||
private def getSchedulingModeValue(data: String, defaultValue: SchedulingMode): SchedulingMode = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a little more complicated than is really necessary for the what you're doing here. couldn't you achieve the same thing by leaving the original code and changing the one line above the original to:
val xmlSchedulingMode = (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase
not exactly the same -- it also allows whitespace around the scheduling mode, but maybe a good thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, trim
and toUpperCase
functions help to cover for blank and invalid schedulingMode
cases. Also schedulingMode = none/NONE
case needs to be checked. This case is valid but unsupported.
} | ||
|
||
private def getIntValue(propertyName: String, data: String, defaultValue: Int): Int = { | ||
if (StringUtils.isNotBlank(data) && checkType(data.toInt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after getting rid of getSchedulingModeValue
, I'd also probably get rid of checkType
and just directly inline the try {...} catch { case NumberFormatException ...}
here.
verifyPool(rootPool, "pool_with_whitespace_scheduling_mode", 3, 2, FIFO) | ||
verifyPool(rootPool, "pool_with_empty_min_share", 0, 3, FAIR) | ||
verifyPool(rootPool, "pool_with_empty_weight", 2, 1, FAIR) | ||
verifyPool(rootPool, "pool_with_empty_scheduling_mode", 2, 2, FIFO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with my suggestion on trim
, you could also add a test case for a mode w/ surrounding whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed ;)
Test build #69506 has finished for PR 15237 at commit
|
Many thanks @squito for review. |
Hi @squito, |
Test build #70571 has finished for PR 15237 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating @erenavsarogullari , sorry for the delay in reviewing. I just have some minor updates requested.
verifyPool(rootPool, "pool_with_empty_scheduling_mode", 2, 2, FIFO) | ||
verifyPool(rootPool, "pool_with_min_share_surrounded_whitespace", 3, 2, FAIR) | ||
verifyPool(rootPool, "pool_with_weight_surrounded_whitespace", 1, 2, FAIR) | ||
verifyPool(rootPool, "pool_with_scheduling_mode_surrounded_whitespace", 3, 2, FAIR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you just combine all these "surrounded_whitespace" cases into one for all the properties?
data.toInt | ||
} catch { | ||
case e: NumberFormatException => | ||
logWarning(s"Error while loading scheduler allocation file at $schedulerAllocFile. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, this is kind of a nuisance, but I realize now that schedulerAllocFile
isn't necessarily the right file -- that might be empty, and there might be a fairscheduler.xml
sitting on the classpath. Can you get the right file name in both cases? (Better to leave it to not include any filename, than to
And can the warning include the poolname as well?
Finally, it would be nice to add this extra info to the mode warning too.
minShare = xmlMinShare.toInt | ||
} | ||
val xmlSchedulingMode = (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase | ||
val schedulingMode = getSchedulingModeValue(xmlSchedulingMode, DEFAULT_SCHEDULING_MODE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is minor, but if you're going to have a helper method here, can you do all the parsing inside it? include the one line above (poolNode \ SCHEDULING_MODE_PROPERTY).text.trim.toUpperCase
.
Same goes for getIntValue
Hi @squito, |
Test build #72004 has finished for PR 15237 at commit
|
Test build #72005 has finished for PR 15237 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two very minor style issues, otherwise looks good.
I guess you decided against including the xml file in the warning messages? To get it, you'd need to adjust FairSchedulerBuilder.buildPools
, where it creates the inputStream -- it would also set a variable for the input filename at the same time. Would be nice to have, but this is an improvement even without that addition.
logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format( | ||
poolName, schedulingMode, minShare, weight)) | ||
} | ||
} | ||
|
||
private def getSchedulingModeValue(poolNode: Node, poolName: String, defaultValue: SchedulingMode) | ||
: SchedulingMode = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: multi-line method definitions should have each parameter on their own line, with the params double-indented:
private def getSchedulingModeValue(
poolNode: Node
poolName: String,
defaultValue: SchedulingMode): SchedulingMode = {
// body
}
} | ||
|
||
private def getIntValue(poolNode: Node, poolName: String, propertyName: String, defaultValue: Int) | ||
: Int = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here on multi-line method def
Hi @squito, Firstly, thanks again for the review. |
Test build #72169 has finished for PR 15237 at commit
|
@erenavsarogullari sorry to push back, but if you're willing to do the filename thing now, why not just tackle it in this same pr? seems pretty minor to separate into its own issue, and conceptually fits with this. If you just don't have time to do it in the next day or two, than sure, I can just merge this as is and we can wait on the filename. |
Hi @squito, Sorry, i am quite busy this week and happy to be merged this PR now if it is also ok for you. I plan to address fileName logging via separated jira. Also To inform user for the following cases can be useful by adding logging:
So thinking to create a single Jira for |
merged to master, thanks @erenavsarogullari |
Fair Scheduler Logging for the following cases can be useful for the user. 1. If **valid** `spark.scheduler.allocation.file` property is set, user can be informed and aware which scheduler file is processed when `SparkContext` initializes. 2. If **invalid** `spark.scheduler.allocation.file` property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as **Fair Scheduler can not be built. + exception stacktrace** ``` Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(FileInputStream.java:195) at java.io.FileInputStream.<init>(FileInputStream.java:138) at java.io.FileInputStream.<init>(FileInputStream.java:93) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75) ``` 3. If `spark.scheduler.allocation.file` property is not set and **default** fair scheduler file (**fairscheduler.xml**) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as **Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.** 4. If **spark.scheduler.allocation.file** property is not set and **default** fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as **No Fair Scheduler file found.** Also this PR is related with apache#15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values. ## How was this patch tested? Added new Unit Tests. Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes apache#16813 from erenavsarogullari/SPARK-19466.
…ess via scheduler.allocation.file ## What changes were proposed in this pull request? If `spark.scheduler.allocation.file` has invalid `minShare` or/and `weight` values, these cause : - `NumberFormatException` due to `toInt` function - `SparkContext` can not be initialized. - It does not show meaningful error message to user. In a nutshell, this functionality can be more robust by selecting one of the following flows : **1-** Currently, if `schedulingMode` has an invalid value, a warning message is logged and default value is set as `FIFO`. Same pattern can be used for `minShare`(default: 0) and `weight`(default: 1) as well **2-** Meaningful error message can be shown to the user for all invalid cases. PR offers : - `schedulingMode` handles just empty values. It also needs to be supported for **whitespace**, **non-uppercase**(fair, FaIr etc...) or `SchedulingMode.NONE` cases by setting default value(`FIFO`) - `minShare` and `weight` handle just empty values. They also need to be supported for **non-integer** cases by setting default values. - Some refactoring of `PoolSuite`. **Code to Reproduce :** ``` val conf = new SparkConf().setAppName("spark-fairscheduler").setMaster("local") conf.set("spark.scheduler.mode", "FAIR") conf.set("spark.scheduler.allocation.file", "src/main/resources/fairscheduler-invalid-data.xml") val sc = new SparkContext(conf) ``` **fairscheduler-invalid-data.xml :** ``` <allocations> <pool name="production"> <schedulingMode>FIFO</schedulingMode> <weight>invalid_weight</weight> <minShare>2</minShare> </pool> </allocations> ``` **Stacktrace :** ``` Exception in thread "main" java.lang.NumberFormatException: For input string: "invalid_weight" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:580) at java.lang.Integer.parseInt(Integer.java:615) at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272) at scala.collection.immutable.StringOps.toInt(StringOps.scala:29) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:127) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$org$apache$spark$scheduler$FairSchedulableBuilder$$buildFairSchedulerPool$1.apply(SchedulableBuilder.scala:102) ``` ## How was this patch tested? Added Unit Test Case. Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes apache#15237 from erenavsarogullari/SPARK-17663.
Fair Scheduler Logging for the following cases can be useful for the user. 1. If **valid** `spark.scheduler.allocation.file` property is set, user can be informed and aware which scheduler file is processed when `SparkContext` initializes. 2. If **invalid** `spark.scheduler.allocation.file` property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as **Fair Scheduler can not be built. + exception stacktrace** ``` Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(FileInputStream.java:195) at java.io.FileInputStream.<init>(FileInputStream.java:138) at java.io.FileInputStream.<init>(FileInputStream.java:93) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75) ``` 3. If `spark.scheduler.allocation.file` property is not set and **default** fair scheduler file (**fairscheduler.xml**) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as **Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.** 4. If **spark.scheduler.allocation.file** property is not set and **default** fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as **No Fair Scheduler file found.** Also this PR is related with apache#15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values. ## How was this patch tested? Added new Unit Tests. Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes apache#16813 from erenavsarogullari/SPARK-19466.
What changes were proposed in this pull request?
If
spark.scheduler.allocation.file
has invalidminShare
or/andweight
values, these cause :NumberFormatException
due totoInt
functionSparkContext
can not be initialized.In a nutshell, this functionality can be more robust by selecting one of the following flows :
1- Currently, if
schedulingMode
has an invalid value, a warning message is logged and default value is set asFIFO
. Same pattern can be used forminShare
(default: 0) andweight
(default: 1) as well2- Meaningful error message can be shown to the user for all invalid cases.
PR offers :
schedulingMode
handles just empty values. It also needs to be supported for whitespace, non-uppercase(fair, FaIr etc...) orSchedulingMode.NONE
cases by setting default value(FIFO
)minShare
andweight
handle just empty values. They also need to be supported for non-integer cases by setting default values.PoolSuite
.Code to Reproduce :
fairscheduler-invalid-data.xml :
Stacktrace :
How was this patch tested?
Added Unit Test Case.