[SPARK-3900][YARN] ApplicationMaster's shutdown hook fails and IllegalStateException is thrown. #2924

Closed
sarutak wants to merge 8 commits into apache:master from sarutak:SPARK-3900-2


sarutak commented Oct 24, 2014

ApplicationMaster registers a shutdown hook that calls ApplicationMaster#cleanupStagingDir.

cleanupStagingDir invokes FileSystem.get(yarnConf), which in turn invokes FileSystem.getInternal. FileSystem.getInternal also registers a shutdown hook.
In the FileSystem of Hadoop 0.23, this registration does not check whether shutdown is already in progress (in 2.2, it does).

// 0.23
if (map.isEmpty()) {
  ShutdownHookManager.get().addShutdownHook(clientFinalizer, SHUTDOWN_HOOK_PRIORITY);
}

// 2.2
if (map.isEmpty()
    && !ShutdownHookManager.get().isShutdownInProgress()) {
  ShutdownHookManager.get().addShutdownHook(clientFinalizer, SHUTDOWN_HOOK_PRIORITY);
}

Thus, in 0.23, FileSystem can try to register another shutdown hook while ApplicationMaster's shutdown hook is running.
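
The difference between the two guards can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop code: `ToyHookManager`, `GuardDemo`, and their methods are hypothetical stand-ins that only mimic the behaviour described above (a manager that rejects registration once shutdown has started), and the `map.isEmpty()` part of the condition is left out.

```java
import java.util.ArrayList;
import java.util.List;

class GuardDemo {
    // Mimics the 0.23-style registration: no check for shutdown in progress.
    static void registerUnguarded(ToyHookManager manager) {
        manager.addShutdownHook(() -> { });
    }

    // Mimics the 2.2-style registration: skipped once shutdown has started.
    static void registerGuarded(ToyHookManager manager) {
        if (!manager.isShutdownInProgress()) {
            manager.addShutdownHook(() -> { });
        }
    }

    // Registers a cleanup hook that itself tries to register another hook,
    // then simulates shutdown and reports what happened inside the hook.
    static String runCleanup(boolean guarded) {
        ToyHookManager manager = new ToyHookManager();
        String[] outcome = {"not run"};
        manager.addShutdownHook(() -> {
            try {
                if (guarded) {
                    registerGuarded(manager);
                } else {
                    registerUnguarded(manager);
                }
                outcome[0] = "ok";
            } catch (IllegalStateException e) {
                outcome[0] = "failed: " + e.getMessage();
            }
        });
        manager.runHooks();
        return outcome[0];
    }

    public static void main(String[] args) {
        System.out.println("0.23-style: " + runCleanup(false));
        System.out.println("2.2-style:  " + runCleanup(true));
    }
}

// Hypothetical stand-in for a shutdown hook manager; NOT the Hadoop class.
class ToyHookManager {
    private final List<Runnable> hooks = new ArrayList<>();
    private boolean shutdownInProgress = false;

    boolean isShutdownInProgress() {
        return shutdownInProgress;
    }

    void addShutdownHook(Runnable hook) {
        if (shutdownInProgress) {
            throw new IllegalStateException(
                "Shutdown in progress, cannot add a shutdownHook");
        }
        hooks.add(hook);
    }

    // Simulates JVM exit: marks shutdown as started, then runs the hooks.
    void runHooks() {
        shutdownInProgress = true;
        for (Runnable hook : new ArrayList<>(hooks)) {
            hook.run();
        }
    }
}
```

In this toy model the unguarded path fails inside the cleanup hook, while the guarded path completes without registering anything.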

This causes the following IllegalStateException:

java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
        at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:152)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2306)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2278)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:316)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:162)
        at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:307)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:118)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
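
The trace shows the failure comes from the first FileSystem lookup happening inside the hook. One possible way to sidestep that, sketched here under assumptions and not necessarily what this patch does, is to perform the lookup before shutdown begins and let the hook reuse the result. `EagerLookupDemo`, `Hooks`, and `ToyFsCache` are hypothetical names; nothing below is Spark or Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

class EagerLookupDemo {
    // Hypothetical hook registry that rejects registration during shutdown.
    static final class Hooks {
        private final List<Runnable> hooks = new ArrayList<>();
        private boolean shuttingDown = false;

        void add(Runnable hook) {
            if (shuttingDown) {
                throw new IllegalStateException(
                    "Shutdown in progress, cannot add a shutdownHook");
            }
            hooks.add(hook);
        }

        // Simulates JVM exit: flags shutdown, then runs the registered hooks.
        void runAll() {
            shuttingDown = true;
            for (Runnable hook : new ArrayList<>(hooks)) {
                hook.run();
            }
        }
    }

    // Hypothetical cache whose first lookup registers a hook, with no
    // shutdown-in-progress check (the 0.23-style behaviour described above).
    static final class ToyFsCache {
        private final Hooks hooks;
        private Object cached;

        ToyFsCache(Hooks hooks) {
            this.hooks = hooks;
        }

        Object get() {
            if (cached == null) {
                hooks.add(() -> { }); // stands in for the client finalizer
                cached = new Object();
            }
            return cached;
        }
    }

    // Returns true if the cleanup hook obtained a handle without throwing.
    static boolean cleanupSucceeds(boolean eagerLookup) {
        Hooks hooks = new Hooks();
        ToyFsCache cache = new ToyFsCache(hooks);
        // Eager variant: look up the handle while registration is still allowed.
        Object fs = eagerLookup ? cache.get() : null;
        boolean[] ok = {false};
        hooks.add(() -> {
            try {
                Object handle = (fs != null) ? fs : cache.get();
                ok[0] = handle != null;
            } catch (IllegalStateException e) {
                ok[0] = false;
            }
        });
        hooks.runAll();
        return ok[0];
    }

    public static void main(String[] args) {
        System.out.println("lazy lookup inside hook succeeds:  " + cleanupSucceeds(false));
        System.out.println("eager lookup before hook succeeds: " + cleanupSucceeds(true));
    }
}
```

The design choice in the sketch is simply to move the side effect (hook registration) out of the shutdown path, so the cleanup hook only touches an already-obtained handle.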

SparkQA commented Oct 24, 2014

Test build #22132 has finished for PR 2924 at commit 9112817.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs

looks good, thanks @sarutak !

@asfgit asfgit closed this in d2987e8 Oct 24, 2014
@sarutak sarutak deleted the SPARK-3900-2 branch April 11, 2015 05:23