Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Logging #16813

Closed
wants to merge 5 commits into from
Closed

[SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Logging #16813

wants to merge 5 commits into from

Conversation

erenavsarogullari
Copy link
Member

Fair Scheduler Logging for the following cases can be useful for the user.

  1. If valid spark.scheduler.allocation.file property is set, user can be informed and aware which scheduler file is processed when SparkContext initializes.

  2. If invalid spark.scheduler.allocation.file property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as Fair Scheduler can not be built. + exception stacktrace

Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75)
  1. If spark.scheduler.allocation.file property is not set and default fair scheduler file (fairscheduler.xml) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.

  2. If spark.scheduler.allocation.file property is not set and default fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as No Fair Scheduler file found.

Also this PR is related with #15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values.

How was this patch tested?

Added new Unit Tests.

@erenavsarogullari
Copy link
Member Author

cc @squito @kayousterhout

@markhamstra
Copy link
Contributor

Looks reasonable, but I'd prefer slightly different log messages.

val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
else {
logWarning(s"No Fair Scheduler file found.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fair Scheduler configuration file not found."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the consequence here? (for a user who sees this and wonders why it matters) I think this would be "Fair Scheduler configuration file not found (so jobs will be scheduled in FIFO order)"

}
}

is.foreach { i => buildFairSchedulerPool(i) }
fileData.foreach { data =>
logInfo(s"Fair Scheduler file: ${data.fileName} is found successfully and will be parsed.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s"Creating Fair Scheduler pools from ${data.fileName}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I find this a little confusing in the case where the default filename was used -- since the user didn't actually specify that file. Can you log separately in the two cases? So to use Mark's suggested message, s"Creating Fair Scheduler pools from ${data.fileName}" in the first case, and in the second case, s"Creating Fair Scheduler pools from default file ($DEFAULT_SCHEDULER_FILE). That way you can also keep the old code organization, which was a bit clearer.

Copy link
Contributor

@kayousterhout kayousterhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving the logging here!

val LOCAL = "local"
val APP_NAME = "PoolSuite"
val SCHEDULER_ALLOCATION_FILE_PROPERTY = "spark.scheduler.allocation.file"
private val LOCAL = "local"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we don't make test variables private because they're not exposed publicly anyway, so I'd revert this.

val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
else {
logWarning(s"No Fair Scheduler file found.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the consequence here? (for a user who sees this and wonders why it matters) I think this would be "Fair Scheduler configuration file not found (so jobs will be scheduled in FIFO order)"

}
}

is.foreach { i => buildFairSchedulerPool(i) }
fileData.foreach { data =>
logInfo(s"Fair Scheduler file: ${data.fileName} is found successfully and will be parsed.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I find this a little confusing in the case where the default filename was used -- since the user didn't actually specify that file. Can you log separately in the two cases? So to use Mark's suggested message, s"Creating Fair Scheduler pools from ${data.fileName}" in the first case, and in the second case, s"Creating Fair Scheduler pools from default file ($DEFAULT_SCHEDULER_FILE). That way you can also keep the old code organization, which was a bit clearer.

}
} catch {
case NonFatal(t) =>
logError("Fair Scheduler can not be built.", t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "Error while building the fair scheduler pools: ", t

@@ -201,6 +202,49 @@ class PoolSuite extends SparkFunSuite with LocalSparkContext {
verifyPool(rootPool, "pool_with_surrounded_whitespace", 3, 2, FAIR)
}

test("SPARK-19466: Fair Scheduler should build fair scheduler when " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you actually move these test changes to a separate PR (and no need to include the JIRA name since this isn't fixing a bug)? Awesome to add test coverage but it seems separate from the logging improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, i will submit these test cases via separated PR ;)

@erenavsarogullari
Copy link
Member Author

Hi @kayousterhout and @markhamstra,
Many thanks for the review and comments. All of them have just been addressed via new patch.
It is ready for re-review.

Copy link
Contributor

@kayousterhout kayousterhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking better but can still be simplified a bit

private def getFileData(): Option[FileData] = {
schedulerAllocFile.map { f =>
val file = new File(f)
val fis = new FileInputStream(file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you use new FileInputStream(f) like the old code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because schedulerAllocFile returns file path and we need fileName. I think fileName provides clearer view instead of whole file path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. It seems like the whole path might be useful for making it super explicit which place the data is coming from though? e.g., because users often have generically defined files, like config.xml, which might exist in multiple locations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and if they do have multiple files with the same name on different paths and are expecting one of them to be used when Spark is actually trying to use one on a different path, then having the full path in the log message will be crucial to short-circuiting the debugging confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it might be useful. I was thinking about the case: if user has two fairscheduler.xml files (one of them is default file in classpath and second one is in another path). I will also address this with the other comments, thanks again ;)

}

// finally create "default" pool
buildDefaultPool()
}

private def getFileData(): Option[FileData] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you in-line this like it was before? It's not very complicated to easier to just be in-line in the buildPools method. Also, the old code structure simplified the Option creation.

Some(FileData(fis, file.getName))
}.getOrElse {
val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
if(is != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after "if"

logInfo(s"Creating Fair Scheduler pools from default file: $DEFAULT_SCHEDULER_FILE")
Some(FileData(is, DEFAULT_SCHEDULER_FILE))
}
else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the else should be on the same line as the closing bracket

DEFAULT_POOL_NAME, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
}
}

private def buildFairSchedulerPool(is: InputStream) {
val xml = XML.load(is)
private def buildFairSchedulerPool(fileData: FileData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to make this method unaware of the FileData class, and instead just have two input params: the InputStream and the Filename


val data = (poolNode \ propertyName).text.trim
try {
data.toInt
} catch {
case e: NumberFormatException =>
logWarning(s"Error while loading scheduler allocation file. " +
logWarning(s"Error while loading Fair Scheduler configuration file: $fileName, " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about "Error while loading fair scheduler configuration from $filename: "

poolName, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
}
}
parentPool.addSchedulable(manager)
logInfo("Added task set " + manager.name + " tasks to pool " + poolName)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eliminate added new line

@@ -69,60 +72,81 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
val DEFAULT_WEIGHT = 1

override def buildPools() {
var is: Option[InputStream] = None
var fileData: Option[FileData] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class is so simple (and used only here, with my suggestion below) so I think it would be better to just make this Option[(InputStream, String)]

@erenavsarogullari
Copy link
Member Author

erenavsarogullari commented Feb 8, 2017

Hi @kayousterhout and @markhamstra,
All comments have been addressed via latest patch

Copy link
Contributor

@kayousterhout kayousterhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great just a few last comments! Thanks!

fileData.foreach { case (is, fileName) => buildFairSchedulerPool(is, fileName) }
} catch {
case NonFatal(t) =>
logError("Error while building the fair scheduler pools: ", t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the filename (if it's defined) to this error message?

logInfo(s"Creating Fair Scheduler pools from default file: $DEFAULT_SCHEDULER_FILE")
Some((is, DEFAULT_SCHEDULER_FILE))
} else {
logWarning("Fair Scheduler configuration file not found so jobs will be scheduled " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add "($DEFAULT_SCHEDULER_FILE)" after "file "?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case happens when spark.scheduler.allocation.file property is not set and default scheduler file does not exist in classpath so warning message is not specific for only default one. I think current generic message looks more suitable, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok what about adding at the end "To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE, or set spark.scheduler.allocation.file to a file that contains the configuration." I think it's nice to give users as much info as possible about how to fix the problem, although I don't feel strongly so if you prefer the current message, that's fine too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, i totally agree for informing the user about how to fix the problem. Addressing by adding information.

@@ -140,14 +157,15 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
private def getIntValue(
poolNode: Node,
poolName: String,
propertyName: String, defaultValue: Int): Int = {
propertyName: String,
defaultValue: Int, fileName: String): Int = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you fix the spacing here? (fileName should be on its own line)

@@ -83,15 +83,19 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
Some((is, DEFAULT_SCHEDULER_FILE))
} else {
logWarning("Fair Scheduler configuration file not found so jobs will be scheduled " +
"in FIFO order")
s"in FIFO order. To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE " +
"or set spark.scheduler.allocation.file to a file that contains the configuration.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one last tiny nit: can you make a varaibel name spark.scheduler.allocaiton.file and use it here + above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, addressed ;)

Copy link
Contributor

@kayousterhout kayousterhout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- I'll merge this once tests pass. Thanks for your work on this. Awesome to have more clear logging / errors for this!

@erenavsarogullari
Copy link
Member Author

Thanks @kayousterhout.
I think jenkins needs to be triggered.

@kayousterhout
Copy link
Contributor

Jenkins, this is OK to test

@kayousterhout
Copy link
Contributor

Jenkins test this please

@erenavsarogullari
Copy link
Member Author

I think problem still continues. We can get techops support to trigger jenkins.

@kayousterhout
Copy link
Contributor

Jenkins, test this please

@kayousterhout
Copy link
Contributor

Ok I asked Shane about this.

@shaneknapp
Copy link
Contributor

sorted. @kayousterhout try to trigger another build. the daemon checks every 5 mins for status, so be patient. :)

@kayousterhout
Copy link
Contributor

Jenkins, this is ok to test

@kayousterhout
Copy link
Contributor

Thanks Shane!

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72673 has finished for PR 16813 at commit d508d92.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor

squito commented Feb 10, 2017

failure is probably unrelated

Jenkins, retest this please

@squito
Copy link
Contributor

squito commented Feb 10, 2017

@shaneknapp the build failed to trigger for me again, I did it manually via spark-prs.appspot.com

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #3572 has finished for PR 16813 at commit d508d92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Copy link
Contributor

ok to test

@shaneknapp
Copy link
Contributor

add to whitelist

@shaneknapp
Copy link
Contributor

@squito -- you also were also missing from the admin group. this should fix that.

@asfgit asfgit closed this in dadff5f Feb 10, 2017
@kayousterhout
Copy link
Contributor

I merged this into master -- thanks for your work on this @erenavsarogullari, and for adding Imran and I to the Jenkins admin group @shaneknapp!

@erenavsarogullari
Copy link
Member Author

Thanks everyone for all your support :)

@SparkQA
Copy link

SparkQA commented Feb 10, 2017

Test build #72714 has finished for PR 16813 at commit d508d92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@erenavsarogullari erenavsarogullari deleted the SPARK-19466 branch February 11, 2017 21:05
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
Fair Scheduler Logging for the following cases can be useful for the user.

1. If **valid** `spark.scheduler.allocation.file` property is set, user can be informed and aware which scheduler file is processed when `SparkContext` initializes.

2. If **invalid** `spark.scheduler.allocation.file` property is set, currently, the following stacktrace is shown to user. In addition to this, more meaningful message can be shown to user by emphasizing the problem at building level of fair scheduler. Also other potential issues can be covered at this level as **Fair Scheduler can not be built. + exception stacktrace**
```
Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75)
```
3. If `spark.scheduler.allocation.file` property is not set and **default** fair scheduler file (**fairscheduler.xml**) is found in classpath, it will be loaded but currently, user is not informed for using default file so logging can be useful as **Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.**

4. If **spark.scheduler.allocation.file** property is not set and **default** fair scheduler file does not exist in classpath, currently, user is not informed so logging can be useful as **No Fair Scheduler file found.**

Also this PR is related with apache#15237 to emphasize fileName in warning logs when fair scheduler file has invalid minShare, weight or schedulingMode values.

## How was this patch tested?
Added new Unit Tests.

Author: erenavsarogullari <erenavsarogullari@gmail.com>

Closes apache#16813 from erenavsarogullari/SPARK-19466.
asfgit pushed a commit that referenced this pull request Aug 28, 2017
…for different build cases

## What changes were proposed in this pull request?
Fair Scheduler can be built via one of the following options:
- By setting a `spark.scheduler.allocation.file` property,
- By setting `fairscheduler.xml` into classpath.

These options are checked **in order** and fair-scheduler is built via first found option. If invalid path is found, `FileNotFoundException` will be expected.

This PR aims unit test coverage of these use cases and a minor documentation change has been added for second option(`fairscheduler.xml` into classpath) to inform the users.

Also, this PR was related with #16813 and has been created separately to keep patch content as isolated and to help the reviewers.

## How was this patch tested?
Added new Unit Tests.

Author: erenavsarogullari <erenavsarogullari@gmail.com>

Closes #16992 from erenavsarogullari/SPARK-19662.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants