Skip to content

Commit

Permalink
[SPARK-11105] [YARN] Distribute log4j.properties to executors
Browse files Browse the repository at this point in the history
Currently log4j.properties file is not uploaded to executor's which is leading them to use the default values. This fix will make sure that file is always uploaded to distributed cache so that executor will use the latest settings.

If user specifies log configurations through --files then executors will be picking configs from --files instead of $SPARK_CONF_DIR/log4j.properties

Author: vundela <vsr@cloudera.com>
Author: Srinivasa Reddy Vundela <vsr@cloudera.com>

Closes #9118 from vundela/master.
  • Loading branch information
vundela authored and Marcelo Vanzin committed Oct 20, 2015
1 parent e18b571 commit 2f6dd63
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 1 deletion.
5 changes: 4 additions & 1 deletion docs/running-on-yarn.md
Original file line number Diff line number Diff line change
Expand Up @@ -81,14 +81,17 @@ all environment variables used for launching each container. This process is use
classpath problems in particular. (Note that enabling this requires admin privileges on cluster
settings and a restart of all node managers. Thus, this is not applicable to hosted clusters).

To use a custom log4j configuration for the application master or executors, there are two options:
To use a custom log4j configuration for the application master or executors, here are the options:

- upload a custom `log4j.properties` using `spark-submit`, by adding it to the `--files` list of files
to be uploaded with the application.
- add `-Dlog4j.configuration=<location of configuration file>` to `spark.driver.extraJavaOptions`
(for the driver) or `spark.executor.extraJavaOptions` (for executors). Note that if using a file,
the `file:` protocol should be explicitly provided, and the file needs to exist locally on all
the nodes.
- update the `$SPARK_CONF_DIR/log4j.properties` file and it will be automatically uploaded along
with the other configurations. Note that other 2 options has higher priority than this option if
multiple options are specified.

Note that for the first option, both executors and the application master will share the same
log4j configuration, which may cause issues when they run on the same node (e.g. trying to write
Expand Down
13 changes: 13 additions & 0 deletions yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
Original file line number Diff line number Diff line change
Expand Up @@ -497,6 +497,19 @@ private[spark] class Client(
*/
private def createConfArchive(): File = {
val hadoopConfFiles = new HashMap[String, File]()

// Uploading $SPARK_CONF_DIR/log4j.properties file to the distributed cache to make sure that
// the executors will use the latest configurations instead of the default values. This is
// required when user changes log4j.properties directly to set the log configurations. If
// configuration file is provided through --files then executors will be taking configurations
// from --files instead of $SPARK_CONF_DIR/log4j.properties.
val log4jFileName = "log4j.properties"
Option(Utils.getContextOrSparkClassLoader.getResource(log4jFileName)).foreach { url =>
if (url.getProtocol == "file") {
hadoopConfFiles(log4jFileName) = new File(url.getPath)
}
}

Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR").foreach { envKey =>
sys.env.get(envKey).foreach { path =>
val dir = new File(path)
Expand Down

0 comments on commit 2f6dd63

Please sign in to comment.