-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Logging #16813
Conversation
Looks reasonable, but I'd prefer slightly different log messages. |
val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE) | ||
if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE)) | ||
else { | ||
logWarning(s"No Fair Scheduler file found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fair Scheduler configuration file not found."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the consequence here? (for a user who sees this and wonders why it matters) I think this would be "Fair Scheduler configuration file not found (so jobs will be scheduled in FIFO order)"
} | ||
} | ||
|
||
is.foreach { i => buildFairSchedulerPool(i) } | ||
fileData.foreach { data => | ||
logInfo(s"Fair Scheduler file: ${data.fileName} is found successfully and will be parsed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s"Creating Fair Scheduler pools from ${data.fileName}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I find this a little confusing in the case where the default filename was used -- since the user didn't actually specify that file. Can you log separately in the two cases? So to use Mark's suggested message, s"Creating Fair Scheduler pools from ${data.fileName}" in the first case, and in the second case, s"Creating Fair Scheduler pools from default file ($DEFAULT_SCHEDULER_FILE). That way you can also keep the old code organization, which was a bit clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving the logging here!
val LOCAL = "local" | ||
val APP_NAME = "PoolSuite" | ||
val SCHEDULER_ALLOCATION_FILE_PROPERTY = "spark.scheduler.allocation.file" | ||
private val LOCAL = "local" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we don't make test variables private because they're not exposed publicly anyway, so I'd revert this.
val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE) | ||
if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE)) | ||
else { | ||
logWarning(s"No Fair Scheduler file found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the consequence here? (for a user who sees this and wonders why it matters) I think this would be "Fair Scheduler configuration file not found (so jobs will be scheduled in FIFO order)"
} | ||
} | ||
|
||
is.foreach { i => buildFairSchedulerPool(i) } | ||
fileData.foreach { data => | ||
logInfo(s"Fair Scheduler file: ${data.fileName} is found successfully and will be parsed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I find this a little confusing in the case where the default filename was used -- since the user didn't actually specify that file. Can you log separately in the two cases? So to use Mark's suggested message, s"Creating Fair Scheduler pools from ${data.fileName}" in the first case, and in the second case, s"Creating Fair Scheduler pools from default file ($DEFAULT_SCHEDULER_FILE). That way you can also keep the old code organization, which was a bit clearer.
} | ||
} catch { | ||
case NonFatal(t) => | ||
logError("Fair Scheduler can not be built.", t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "Error while building the fair scheduler pools: ", t
@@ -201,6 +202,49 @@ class PoolSuite extends SparkFunSuite with LocalSparkContext { | |||
verifyPool(rootPool, "pool_with_surrounded_whitespace", 3, 2, FAIR) | |||
} | |||
|
|||
test("SPARK-19466: Fair Scheduler should build fair scheduler when " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you actually move these test changes to a separate PR (and no need to include the JIRA name since this isn't fixing a bug)? Awesome to add test coverage but it seems separate from the logging improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, i will submit these test cases via separated PR ;)
Hi @kayousterhout and @markhamstra, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking better but can still be simplified a bit
private def getFileData(): Option[FileData] = { | ||
schedulerAllocFile.map { f => | ||
val file = new File(f) | ||
val fis = new FileInputStream(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you use new FileInputStream(f) like the old code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because schedulerAllocFile
returns file path and we need fileName
. I think fileName
provides clearer view instead of whole file path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. It seems like the whole path might be useful for making it super explicit which place the data is coming from though? e.g., because users often have generically defined files, like config.xml, which might exist in multiple locations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and if they do have multiple files with the same name on different paths and are expecting one of them to be used when Spark is actually trying to use one on a different path, then having the full path in the log message will be crucial to short-circuiting the debugging confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it might be useful. I was thinking about the case: if user has two fairscheduler.xml
files (one of them is default file in classpath and second one is in another path). I will also address this with the other comments, thanks again ;)
} | ||
|
||
// finally create "default" pool | ||
buildDefaultPool() | ||
} | ||
|
||
private def getFileData(): Option[FileData] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you in-line this like it was before? It's not very complicated to easier to just be in-line in the buildPools method. Also, the old code structure simplified the Option creation.
Some(FileData(fis, file.getName)) | ||
}.getOrElse { | ||
val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE) | ||
if(is != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after "if"
logInfo(s"Creating Fair Scheduler pools from default file: $DEFAULT_SCHEDULER_FILE") | ||
Some(FileData(is, DEFAULT_SCHEDULER_FILE)) | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the else should be on the same line as the closing bracket
DEFAULT_POOL_NAME, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT)) | ||
} | ||
} | ||
|
||
private def buildFairSchedulerPool(is: InputStream) { | ||
val xml = XML.load(is) | ||
private def buildFairSchedulerPool(fileData: FileData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to make this method unaware of the FileData class, and instead just have two input params: the InputStream and the Filename
|
||
val data = (poolNode \ propertyName).text.trim | ||
try { | ||
data.toInt | ||
} catch { | ||
case e: NumberFormatException => | ||
logWarning(s"Error while loading scheduler allocation file. " + | ||
logWarning(s"Error while loading Fair Scheduler configuration file: $fileName, " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about "Error while loading fair scheduler configuration from $filename: "
poolName, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT)) | ||
} | ||
} | ||
parentPool.addSchedulable(manager) | ||
logInfo("Added task set " + manager.name + " tasks to pool " + poolName) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eliminate added new line
@@ -69,60 +72,81 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf) | |||
val DEFAULT_WEIGHT = 1 | |||
|
|||
override def buildPools() { | |||
var is: Option[InputStream] = None | |||
var fileData: Option[FileData] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this class is so simple (and used only here, with my suggestion below) so I think it would be better to just make this Option[(InputStream, String)]
Hi @kayousterhout and @markhamstra, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great just a few last comments! Thanks!
fileData.foreach { case (is, fileName) => buildFairSchedulerPool(is, fileName) } | ||
} catch { | ||
case NonFatal(t) => | ||
logError("Error while building the fair scheduler pools: ", t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the filename (if it's defined) to this error message?
logInfo(s"Creating Fair Scheduler pools from default file: $DEFAULT_SCHEDULER_FILE") | ||
Some((is, DEFAULT_SCHEDULER_FILE)) | ||
} else { | ||
logWarning("Fair Scheduler configuration file not found so jobs will be scheduled " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add "($DEFAULT_SCHEDULER_FILE)" after "file "?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case happens when spark.scheduler.allocation.file
property is not set and default scheduler file does not exist in classpath so warning message is not specific for only default one. I think current generic message looks more suitable, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ok what about adding at the end "To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE, or set spark.scheduler.allocation.file to a file that contains the configuration." I think it's nice to give users as much info as possible about how to fix the problem, although I don't feel strongly so if you prefer the current message, that's fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, i totally agree for informing the user about how to fix the problem. Addressing by adding information.
@@ -140,14 +157,15 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf) | |||
private def getIntValue( | |||
poolNode: Node, | |||
poolName: String, | |||
propertyName: String, defaultValue: Int): Int = { | |||
propertyName: String, | |||
defaultValue: Int, fileName: String): Int = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you fix the spacing here? (fileName should be on its own line)
@@ -83,15 +83,19 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf) | |||
Some((is, DEFAULT_SCHEDULER_FILE)) | |||
} else { | |||
logWarning("Fair Scheduler configuration file not found so jobs will be scheduled " + | |||
"in FIFO order") | |||
s"in FIFO order. To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE " + | |||
"or set spark.scheduler.allocation.file to a file that contains the configuration.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last tiny nit: can you make a varaibel name spark.scheduler.allocaiton.file and use it here + above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, addressed ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- I'll merge this once tests pass. Thanks for your work on this. Awesome to have more clear logging / errors for this!
Thanks @kayousterhout. |
Jenkins, this is OK to test |
Jenkins test this please |
I think problem still continues. We can get techops support to trigger jenkins. |
Jenkins, test this please |
Ok I asked Shane about this. |
sorted. @kayousterhout try to trigger another build. the daemon checks every 5 mins for status, so be patient. :) |
Jenkins, this is ok to test |
Thanks Shane! |
Test build #72673 has finished for PR 16813 at commit
|
failure is probably unrelated Jenkins, retest this please |
@shaneknapp the build failed to trigger for me again, I did it manually via spark-prs.appspot.com |
Test build #3572 has finished for PR 16813 at commit
|
ok to test |
add to whitelist |
@squito -- you also were also missing from the admin group. this should fix that. |
I merged this into master -- thanks for your work on this @erenavsarogullari, and for adding Imran and I to the Jenkins admin group @shaneknapp! |
Thanks everyone for all your support :) |
Test build #72714 has finished for PR 16813 at commit
|
Fair Scheduler Logging for the following cases can be useful for the user. 1. If **valid** `spark.scheduler.allocation.file` property is set, user can be informed and aware which scheduler file is processed when `SparkContext` initializes. 2. If **invalid** `spark.scheduler.allocation.file` property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as **Fair Scheduler can not be built. + exception stacktrace** ``` Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(FileInputStream.java:195) at java.io.FileInputStream.<init>(FileInputStream.java:138) at java.io.FileInputStream.<init>(FileInputStream.java:93) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76) at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75) ``` 3. If `spark.scheduler.allocation.file` property is not set and **default** fair scheduler file (**fairscheduler.xml**) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as **Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.** 4. If **spark.scheduler.allocation.file** property is not set and **default** fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as **No Fair Scheduler file found.** Also this PR is related with apache#15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values. ## How was this patch tested? Added new Unit Tests. Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes apache#16813 from erenavsarogullari/SPARK-19466.
…for different build cases ## What changes were proposed in this pull request? Fair Scheduler can be built via one of the following options: - By setting a `spark.scheduler.allocation.file` property, - By setting `fairscheduler.xml` into classpath. These options are checked **in order** and fair-scheduler is built via first found option. If invalid path is found, `FileNotFoundException` will be expected. This PR aims unit test coverage of these use cases and a minor documentation change has been added for second option(`fairscheduler.xml` into classpath) to inform the users. Also, this PR was related with #16813 and has been created separately to keep patch content as isolated and to help the reviewers. ## How was this patch tested? Added new Unit Tests. Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes #16992 from erenavsarogullari/SPARK-19662.
Fair Scheduler Logging for the following cases can be useful for the user.
If valid
spark.scheduler.allocation.file
property is set, user can be informed and aware which scheduler file is processed whenSparkContext
initializes.If invalid
spark.scheduler.allocation.file
property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as Fair Scheduler can not be built. + exception stacktraceIf
spark.scheduler.allocation.file
property is not set and default fair scheduler file (fairscheduler.xml) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.If spark.scheduler.allocation.file property is not set and default fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as No Fair Scheduler file found.
Also this PR is related with #15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values.
How was this patch tested?
Added new Unit Tests.