[SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Logging #16813

erenavsarogullari · 2017-02-05T21:38:32Z

Fair Scheduler logging would be useful to the user in the following cases.

If a valid spark.scheduler.allocation.file property is set, the user can be informed which scheduler file is processed when SparkContext initializes.
If an invalid spark.scheduler.allocation.file property is set, the following stacktrace is currently shown to the user. In addition, a more meaningful message can be logged at the point where the fair scheduler is built, which also covers other potential failures at that level: Fair Scheduler can not be built. + exception stacktrace

Exception in thread "main" java.io.FileNotFoundException: INVALID_FILE (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:76)
	at org.apache.spark.scheduler.FairSchedulableBuilder$$anonfun$buildPools$1.apply(SchedulableBuilder.scala:75)

If the spark.scheduler.allocation.file property is not set and the default fair scheduler file (fairscheduler.xml) is found on the classpath, it is loaded, but the user is currently not told that the default file is in use. A log message would help, such as: Fair Scheduler file: fairscheduler.xml is found successfully and will be parsed.
If the spark.scheduler.allocation.file property is not set and the default fair scheduler file does not exist on the classpath, the user is currently not informed. A log message would help, such as: No Fair Scheduler file found.
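
The cases above amount to a lookup order: the explicit property first, then the classpath default, then no file at all. A minimal sketch of that order follows, using only the JDK and `println` in place of Spark's logging. The object name, `resolveSchedulerFile`, and the message wording are illustrative assumptions, not the code in this patch.

```scala
import java.io.{File, FileInputStream, InputStream}

object SchedulerFileLookup {
  // Default file name taken from the description above.
  val DefaultSchedulerFile = "fairscheduler.xml"

  // Hypothetical helper: `allocationFile` stands in for the value of
  // spark.scheduler.allocation.file. Returns the stream plus a label for
  // log messages, or None when no configuration file is available.
  def resolveSchedulerFile(allocationFile: Option[String]): Option[(InputStream, String)] =
    allocationFile match {
      case Some(path) =>
        // Property set: FileInputStream throws FileNotFoundException for an invalid path.
        val in = new FileInputStream(new File(path))
        println(s"Using fair scheduler file $path")
        Some((in, path))
      case None =>
        // Property not set: fall back to the default file on the classpath.
        val in = getClass.getClassLoader.getResourceAsStream(DefaultSchedulerFile)
        if (in != null) {
          println(s"Using default fair scheduler file $DefaultSchedulerFile")
          Some((in, DefaultSchedulerFile))
        } else {
          println("No fair scheduler file found")
          None
        }
    }
}
```

Returning the label together with the stream lets later log messages say where the configuration came from.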

This PR is also related to #15237: it adds the fileName to the warning logs emitted when the fair scheduler file has invalid minShare, weight or schedulingMode values.

How was this patch tested?

Added new Unit Tests.

erenavsarogullari · 2017-02-05T21:42:58Z

cc @squito @kayousterhout

markhamstra · 2017-02-06T21:21:38Z

Looks reasonable, but I'd prefer slightly different log messages.

markhamstra · 2017-02-06T21:18:04Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+        val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+        if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
+        else {
+          logWarning(s"No Fair Scheduler file found.")


"Fair Scheduler configuration file not found."


markhamstra · 2017-02-06T21:20:16Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

        }
      }

-      is.foreach { i => buildFairSchedulerPool(i) }
+      fileData.foreach { data =>
+        logInfo(s"Fair Scheduler file: ${data.fileName} is found successfully and will be parsed.")


s"Creating Fair Scheduler pools from ${data.fileName}"


kayousterhout

Thanks for improving the logging here!

kayousterhout · 2017-02-06T23:34:37Z

core/src/test/scala/org/apache/spark/scheduler/PoolSuite.scala

-  val LOCAL = "local"
-  val APP_NAME = "PoolSuite"
-  val SCHEDULER_ALLOCATION_FILE_PROPERTY = "spark.scheduler.allocation.file"
+  private val LOCAL = "local"


In general we don't make test variables private because they're not exposed publicly anyway, so I'd revert this.

kayousterhout · 2017-02-07T00:01:21Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+        val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+        if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
+        else {
+          logWarning(s"No Fair Scheduler file found.")


Can you add the consequence here? (for a user who sees this and wonders why it matters) I think this would be "Fair Scheduler configuration file not found (so jobs will be scheduled in FIFO order)"

kayousterhout · 2017-02-07T00:08:51Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

        }
      }

-      is.foreach { i => buildFairSchedulerPool(i) }
+      fileData.foreach { data =>
+        logInfo(s"Fair Scheduler file: ${data.fileName} is found successfully and will be parsed.")


nit: I find this a little confusing in the case where the default filename was used -- since the user didn't actually specify that file. Can you log separately in the two cases? So to use Mark's suggested message, s"Creating Fair Scheduler pools from ${data.fileName}" in the first case, and in the second case, s"Creating Fair Scheduler pools from default file ($DEFAULT_SCHEDULER_FILE)". That way you can also keep the old code organization, which was a bit clearer.

kayousterhout · 2017-02-07T00:10:47Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+      }
+    } catch {
+      case NonFatal(t) =>
+        logError("Fair Scheduler can not be built.", t)


How about "Error while building the fair scheduler pools: ", t

kayousterhout · 2017-02-07T00:12:11Z

core/src/test/scala/org/apache/spark/scheduler/PoolSuite.scala

@@ -201,6 +202,49 @@ class PoolSuite extends SparkFunSuite with LocalSparkContext {
    verifyPool(rootPool, "pool_with_surrounded_whitespace", 3, 2, FAIR)
  }

+  test("SPARK-19466: Fair Scheduler should build fair scheduler when " +


Could you actually move these test changes to a separate PR (and no need to include the JIRA name since this isn't fixing a bug)? Awesome to add test coverage but it seems separate from the logging improvement.

Sure, I will submit these test cases via a separate PR ;)

erenavsarogullari · 2017-02-07T21:45:02Z

Hi @kayousterhout and @markhamstra,
Many thanks for the review and comments. All of them have just been addressed in a new patch.
It is ready for re-review.

kayousterhout

This is looking better but can still be simplified a bit

kayousterhout · 2017-02-07T21:59:28Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+  private def getFileData(): Option[FileData] = {
+    schedulerAllocFile.map { f =>
+      val file = new File(f)
+      val fis = new FileInputStream(file)


Why can't you use new FileInputStream(f) like the old code?

Because schedulerAllocFile returns the file path and we need the fileName. I think the fileName gives a clearer view than the whole file path.

Ah I see. It seems like the whole path might be useful for making it super explicit which place the data is coming from though? e.g., because users often have generically defined files, like config.xml, which might exist in multiple locations.

Yes, and if they do have multiple files with the same name on different paths and are expecting one of them to be used when Spark is actually trying to use one on a different path, then having the full path in the log message will be crucial to short-circuiting the debugging confusion.

Yep, it might be useful. I was thinking about the case where the user has two fairscheduler.xml files (one is the default file on the classpath and the other is on another path). I will also address this with the other comments, thanks again ;)
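
The concern in this thread can be shown with plain `java.io.File`: two files with the same name in different directories are indistinguishable once only `getName` is logged. The two paths below are hypothetical and exist only for illustration.

```scala
import java.io.File

object PathVsName {
  // Hypothetical locations that happen to share a file name.
  val candidates: Seq[String] =
    Seq("/etc/spark/fairscheduler.xml", "/home/user/conf/fairscheduler.xml")

  // What a log line would show if only the file name were kept.
  def names: Seq[String] = candidates.map(p => new File(p).getName)

  // What a log line would show if the whole path were kept.
  def paths: Seq[String] = candidates.map(p => new File(p).getPath)
}
```

Both entries of `names` are `fairscheduler.xml`, while `paths` keeps them apart, which is the case for logging the full path.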

kayousterhout · 2017-02-07T22:08:41Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

    }

    // finally create "default" pool
    buildDefaultPool()
  }

+  private def getFileData(): Option[FileData] = {


Can you in-line this like it was before? It's not very complicated, so it's easier to just keep it in-line in the buildPools method. Also, the old code structure simplified the Option creation.

kayousterhout · 2017-02-07T22:12:01Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+      Some(FileData(fis, file.getName))
+    }.getOrElse {
+      val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+      if(is != null) {


nit: space after "if"

kayousterhout · 2017-02-07T22:13:19Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+        logInfo(s"Creating Fair Scheduler pools from default file: $DEFAULT_SCHEDULER_FILE")
+        Some(FileData(is, DEFAULT_SCHEDULER_FILE))
+      }
+      else {


the else should be on the same line as the closing bracket

kayousterhout · 2017-02-07T22:16:05Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

        DEFAULT_POOL_NAME, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
    }
  }

-  private def buildFairSchedulerPool(is: InputStream) {
-    val xml = XML.load(is)
+  private def buildFairSchedulerPool(fileData: FileData) {


it would be better to make this method unaware of the FileData class, and instead just have two input params: the InputStream and the Filename

kayousterhout · 2017-02-07T22:18:07Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala


    val data = (poolNode \ propertyName).text.trim
    try {
      data.toInt
    } catch {
      case e: NumberFormatException =>
-        logWarning(s"Error while loading scheduler allocation file. " +
+        logWarning(s"Error while loading Fair Scheduler configuration file: $fileName, " +


how about "Error while loading fair scheduler configuration from $filename: "
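
For context, a minimal sketch of a parse-with-default helper that names the file in its warning, along the lines suggested here. It takes the raw string directly instead of an XML node so it runs without extra libraries. The `warn` parameter stands in for Spark's `logWarning`, and the second half of the message is an assumed wording, not necessarily what the patch logs.

```scala
object IntProperty {
  // Parses `raw` as an Int, falling back to `defaultValue` on bad input.
  // `warn` is a stand-in for logWarning; by default it prints to stderr.
  def getIntValue(
      raw: String,
      poolName: String,
      propertyName: String,
      defaultValue: Int,
      fileName: String,
      warn: String => Unit = msg => Console.err.println(msg)): Int = {
    try {
      raw.trim.toInt
    } catch {
      case _: NumberFormatException =>
        // The file name comes first so the user can see which file to fix.
        warn(s"Error while loading fair scheduler configuration from $fileName: " +
          s"$propertyName is blank or invalid: $raw, using the default $propertyName: " +
          s"$defaultValue for pool: $poolName")
        defaultValue
    }
  }
}
```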

kayousterhout · 2017-02-07T22:18:17Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

          poolName, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
      }
    }
    parentPool.addSchedulable(manager)
    logInfo("Added task set " + manager.name + " tasks to pool " + poolName)
  }
+


eliminate added new line

kayousterhout · 2017-02-07T22:22:11Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

@@ -69,60 +72,81 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
  val DEFAULT_WEIGHT = 1

  override def buildPools() {
-    var is: Option[InputStream] = None
+    var fileData: Option[FileData] = None


this class is so simple (and used only here, with my suggestion below) so I think it would be better to just make this Option[(InputStream, String)]
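
A small sketch of the tuple-based shape suggested here: an `Option[(InputStream, String)]` can be destructured with a `case` pattern, so no dedicated class is needed. `describe` and `describeAll` are hypothetical helpers used only to make the example runnable.

```scala
import java.io.InputStream

object TupleOption {
  // Reads the whole stream and reports how many bytes came from which file.
  def describe(is: InputStream, fileName: String): String = {
    val bytes = Iterator.continually(is.read()).takeWhile(_ != -1).size
    s"$fileName: $bytes bytes"
  }

  // The tuple is unpacked in place, mirroring
  // fileData.foreach { case (is, fileName) => ... }.
  def describeAll(fileData: Option[(InputStream, String)]): Option[String] =
    fileData.map { case (is, fileName) => describe(is, fileName) }
}
```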

erenavsarogullari · 2017-02-08T22:54:50Z

Hi @kayousterhout and @markhamstra,
All comments have been addressed via latest patch

kayousterhout

This is looking great just a few last comments! Thanks!

kayousterhout · 2017-02-08T23:02:47Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+      fileData.foreach { case (is, fileName) => buildFairSchedulerPool(is, fileName) }
+    } catch {
+      case NonFatal(t) =>
+        logError("Error while building the fair scheduler pools: ", t)


can you add the filename (if it's defined) to this error message?
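
One way to include the file name only when it is defined is to build the message from an `Option[String]`. This is a sketch under that assumption. `BuildErrorMessage.message` is a hypothetical helper, and the exact wording in the merged patch may differ.

```scala
object BuildErrorMessage {
  // Appends the file name to the error message only when one is known.
  def message(fileName: Option[String]): String =
    fileName match {
      case Some(name) => s"Error while building the fair scheduler pools from $name"
      case None => "Error while building the fair scheduler pools"
    }
}
```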

kayousterhout · 2017-02-08T23:04:22Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

+          logInfo(s"Creating Fair Scheduler pools from default file: $DEFAULT_SCHEDULER_FILE")
+          Some((is, DEFAULT_SCHEDULER_FILE))
+        } else {
+          logWarning("Fair Scheduler configuration file not found so jobs will be scheduled " +


can you add "($DEFAULT_SCHEDULER_FILE)" after "file "?

This case happens when the spark.scheduler.allocation.file property is not set and the default scheduler file does not exist on the classpath, so the warning message is not specific to the default file. I think the current generic message is more suitable, WDYT?

Hm ok what about adding at the end "To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE, or set spark.scheduler.allocation.file to a file that contains the configuration." I think it's nice to give users as much info as possible about how to fix the problem, although I don't feel strongly so if you prefer the current message, that's fine too.

Exactly, I totally agree with informing the user about how to fix the problem. Addressed by adding this information.

kayousterhout · 2017-02-08T23:08:19Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

@@ -140,14 +157,15 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
  private def getIntValue(
      poolNode: Node,
      poolName: String,
-      propertyName: String, defaultValue: Int): Int = {
+      propertyName: String,
+      defaultValue: Int, fileName: String): Int = {


nit: can you fix the spacing here? (fileName should be on its own line)

kayousterhout · 2017-02-09T21:46:48Z

core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala

@@ -83,15 +83,19 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
          Some((is, DEFAULT_SCHEDULER_FILE))
        } else {
          logWarning("Fair Scheduler configuration file not found so jobs will be scheduled " +
-            "in FIFO order")
+            s"in FIFO order. To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE " +
+            "or set spark.scheduler.allocation.file to a file that contains the configuration.")


one last tiny nit: can you make a variable for the spark.scheduler.allocation.file property name and use it here + above?

Sure, addressed ;)
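
A sketch of what this nit asks for: the property name is held in one constant and interpolated into the warning. The constant name `SCHEDULER_ALLOCATION_FILE_PROPERTY` is borrowed from the PoolSuite diff earlier in this review. Whether the builder uses the same name is an assumption.

```scala
object FairSchedulerMessages {
  // Constant name borrowed from PoolSuite; its use here is an assumption.
  val SCHEDULER_ALLOCATION_FILE_PROPERTY = "spark.scheduler.allocation.file"
  val DEFAULT_SCHEDULER_FILE = "fairscheduler.xml"

  // Warning text from the diff above, with the property name interpolated.
  val notFoundWarning: String =
    "Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. " +
      s"To use fair scheduling, configure pools in $DEFAULT_SCHEDULER_FILE or set " +
      s"$SCHEDULER_ALLOCATION_FILE_PROPERTY to a file that contains the configuration."
}
```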

kayousterhout

LGTM -- I'll merge this once tests pass. Thanks for your work on this. Awesome to have more clear logging / errors for this!

erenavsarogullari · 2017-02-09T22:34:38Z

Thanks @kayousterhout.
I think jenkins needs to be triggered.

kayousterhout · 2017-02-09T22:36:51Z

Jenkins, this is OK to test

kayousterhout · 2017-02-09T23:13:18Z

Jenkins test this please

erenavsarogullari · 2017-02-10T00:09:37Z

I think problem still continues. We can get techops support to trigger jenkins.

kayousterhout · 2017-02-10T01:24:18Z

Jenkins, test this please

kayousterhout · 2017-02-10T01:29:07Z

Ok I asked Shane about this.

shaneknapp · 2017-02-10T01:32:21Z

sorted. @kayousterhout try to trigger another build. the daemon checks every 5 mins for status, so be patient. :)

kayousterhout · 2017-02-10T01:34:17Z

Jenkins, this is ok to test

kayousterhout · 2017-02-10T01:35:15Z

Thanks Shane!

SparkQA · 2017-02-10T03:40:44Z

Test build #72673 has finished for PR 16813 at commit d508d92.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

squito · 2017-02-10T04:40:43Z

failure is probably unrelated

Jenkins, retest this please

squito · 2017-02-10T04:55:39Z

@shaneknapp the build failed to trigger for me again, I did it manually via spark-prs.appspot.com

SparkQA · 2017-02-10T07:30:21Z

Test build #3572 has finished for PR 16813 at commit d508d92.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

shaneknapp · 2017-02-10T16:26:50Z

ok to test

shaneknapp · 2017-02-10T16:32:32Z

add to whitelist

shaneknapp · 2017-02-10T16:33:07Z

@squito -- you were also missing from the admin group. this should fix that.

kayousterhout · 2017-02-10T16:36:49Z

I merged this into master -- thanks for your work on this @erenavsarogullari, and for adding Imran and me to the Jenkins admin group @shaneknapp!

erenavsarogullari · 2017-02-10T17:41:41Z

Thanks everyone for all your support :)

SparkQA · 2017-02-10T19:02:17Z

Test build #72714 has finished for PR 16813 at commit d508d92.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes apache#16813 from erenavsarogullari/SPARK-19466.

…for different build cases

What changes were proposed in this pull request?

Fair Scheduler can be built via one of the following options:
- By setting a `spark.scheduler.allocation.file` property,
- By placing `fairscheduler.xml` on the classpath.

These options are checked **in order** and the fair scheduler is built via the first option found. If an invalid path is found, a `FileNotFoundException` is expected. This PR adds unit test coverage for these use cases, plus a minor documentation change for the second option (`fairscheduler.xml` on the classpath) to inform users. It is related to #16813 and was created separately to keep the patch content isolated and to help the reviewers.

How was this patch tested?

Added new Unit Tests.

Author: erenavsarogullari <erenavsarogullari@gmail.com>
Closes #16992 from erenavsarogullari/SPARK-19662.

markhamstra suggested changes Feb 6, 2017


kayousterhout reviewed Feb 7, 2017


erenavsarogullari added 3 commits February 8, 2017 22:23

Improve Fair Scheduler Logging

2ed8cad

Review comments are addressed.

64a88af

Latest review comments are addressed.

790097e

erenavsarogullari mentioned this pull request Feb 8, 2017

[SPARK-18066] [CORE] [TESTS] Add Pool usage policies test coverage for FIFO & FAIR Schedulers #15604

Closed

kayousterhout reviewed Feb 8, 2017


Review comments are addressed.

3da98c7

kayousterhout reviewed Feb 9, 2017


Review comments are addressed.

d508d92

kayousterhout approved these changes Feb 9, 2017


asfgit closed this in dadff5f Feb 10, 2017

erenavsarogullari deleted the SPARK-19466 branch February 11, 2017 21:05

erenavsarogullari mentioned this pull request Feb 19, 2017

[SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit Test coverage for different build cases #16992

Closed

kayousterhout mentioned this pull request Mar 24, 2017

[SPARK-17759] [CORE] Avoid adding duplicate schedulables #15326

Closed
