[SPARK-6662][YARN] Allow variable substitution in spark.yarn.historyServer.address

In Spark on YARN, an explicit hostname and port number must be set for "spark.yarn.historyServer.address" in SparkConf to create the HISTORY link. If the history server address is known and static, this is usually not a problem.

In the cloud, however, that is usually not the case. In EMR in particular, the history server always runs on the same node as the RM, so I could simply set it to ${yarn.resourcemanager.hostname}:18080 if variable substitution were allowed.

In fact, Hadoop configuration already implements variable substitution, so if this property is read via YarnConf, this is easy to achieve.

Author: Cheolsoo Park <cheolsoop@netflix.com>

Closes #5321 from piaozhexiu/SPARK-6662 and squashes the following commits:

e37de75 [Cheolsoo Park] Preserve the space between the Hadoop and Spark imports
79757c6 [Cheolsoo Park] Incorporate review comments
10e2917 [Cheolsoo Park] Add helper function that substitutes hadoop vars to SparkHadoopUtil
589b52c [Cheolsoo Park] Revert "Allow variable substitution for spark.yarn. properties"
ff9c35d [Cheolsoo Park] Allow variable substitution for spark.yarn. properties
Cheolsoo Park authored and tgravescs committed Apr 13, 2015
1 parent c5b0b29 commit 6cc5b3e
Showing 3 changed files with 37 additions and 5 deletions.
38 changes: 34 additions & 4 deletions core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -24,11 +24,10 @@ import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
 import org.apache.hadoop.fs.FileSystem.Statistics
 import org.apache.hadoop.mapred.JobConf
-import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}
-import org.apache.hadoop.security.Credentials
-import org.apache.hadoop.security.UserGroupInformation
+import org.apache.hadoop.mapreduce.JobContext
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}

-import org.apache.spark.{Logging, SparkContext, SparkConf, SparkException}
+import org.apache.spark.{Logging, SparkConf, SparkException}
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.util.Utils

@@ -201,6 +200,37 @@ class SparkHadoopUtil extends Logging {
     val baseStatus = fs.getFileStatus(basePath)
     if (baseStatus.isDir) recurse(basePath) else Array(baseStatus)
   }
+
+  private val HADOOP_CONF_PATTERN = "(\\$\\{hadoopconf-[^\\}\\$\\s]+\\})".r.unanchored
+
+  /**
+   * Substitute variables by looking them up in Hadoop configs. Only variables that match the
+   * ${hadoopconf- .. } pattern are substituted.
+   */
+  def substituteHadoopVariables(text: String, hadoopConf: Configuration): String = {
+    text match {
+      case HADOOP_CONF_PATTERN(matched) => {
+        logDebug(text + " matched " + HADOOP_CONF_PATTERN)
+        val key = matched.substring(13, matched.length() - 1) // remove ${hadoopconf- .. }
+        val eval = Option[String](hadoopConf.get(key))
+          .map { value =>
+            logDebug("Substituted " + matched + " with " + value)
+            text.replace(matched, value)
+          }
+        if (eval.isEmpty) {
+          // The variable was not found in Hadoop configs, so return text as is.
+          text
+        } else {
+          // Continue to substitute more variables.
+          substituteHadoopVariables(eval.get, hadoopConf)
+        }
+      }
+      case _ => {
+        logDebug(text + " didn't match " + HADOOP_CONF_PATTERN)
+        text
+      }
+    }
+  }
 }

 object SparkHadoopUtil {
3 changes: 2 additions & 1 deletion docs/running-on-yarn.md
@@ -87,7 +87,8 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
   <td><code>spark.yarn.historyServer.address</code></td>
   <td>(none)</td>
   <td>
-    The address of the Spark history server (i.e. host.com:18080). The address should not contain a scheme (http://). Defaults to not being set since the history server is an optional service. This address is given to the YARN ResourceManager when the Spark application finishes to link the application from the ResourceManager UI to the Spark history server UI.
+    The address of the Spark history server (i.e. host.com:18080). The address should not contain a scheme (http://). Defaults to not being set since the history server is an optional service. This address is given to the YARN ResourceManager when the Spark application finishes to link the application from the ResourceManager UI to the Spark history server UI.
+    For this property, YARN properties can be used as variables, and these are substituted by Spark at runtime. For eg, if the Spark history server runs on the same node as the YARN ResourceManager, it can be set to `${hadoopconf-yarn.resourcemanager.hostname}:18080`.
   </td>
 </tr>
 <tr>
@@ -223,6 +223,7 @@ private[spark] class ApplicationMaster(
       val appId = client.getAttemptId().getApplicationId().toString()
       val historyAddress =
         sparkConf.getOption("spark.yarn.historyServer.address")
+          .map { text => SparkHadoopUtil.get.substituteHadoopVariables(text, yarnConf) }
           .map { address => s"${address}${HistoryServer.UI_PATH_PREFIX}/${appId}" }
           .getOrElse("")
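
To make the behavior of the change easier to follow, here is a minimal standalone sketch of the substitution and of how the history link is then assembled. It is not the committed code. It assumes a plain `Map[String, String]` as a stand-in for the Hadoop `Configuration`, and the names `HadoopVarSketch`, `substitute`, `historyAddress` and `UiPathPrefix` are hypothetical. The value `"/history"` is only an assumed placeholder for `HistoryServer.UI_PATH_PREFIX`. Note that the committed syntax is `${hadoopconf-<key>}`, not the bare `${<key>}` form mentioned in the commit message.

```scala
import scala.annotation.tailrec

object HadoopVarSketch {
  // Same regex as in the diff: finds ${hadoopconf-<key>} anywhere in the text.
  private val HadoopConfPattern = "(\\$\\{hadoopconf-[^\\}\\$\\s]+\\})".r.unanchored

  // The diff hard-codes 13, which is the length of this prefix.
  private val Prefix = "${hadoopconf-"

  // Assumption: placeholder for HistoryServer.UI_PATH_PREFIX.
  val UiPathPrefix = "/history"

  /** Replace ${hadoopconf-<key>} variables using `conf` (a stand-in for Hadoop's Configuration). */
  @tailrec
  def substitute(text: String, conf: Map[String, String]): String =
    text match {
      case HadoopConfPattern(matched) =>
        // Strip the leading "${hadoopconf-" and the trailing "}".
        val key = matched.substring(Prefix.length, matched.length - 1)
        conf.get(key) match {
          // Found: replace and keep going, since more variables may remain.
          case Some(value) => substitute(text.replace(matched, value), conf)
          // Not found: return the text unchanged, as the diff does.
          case None => text
        }
      case _ => text
    }

  /** Mirrors the Option pipeline in the ApplicationMaster hunk. */
  def historyAddress(setting: Option[String], conf: Map[String, String], appId: String): String =
    setting
      .map(text => substitute(text, conf))
      .map(address => s"${address}${UiPathPrefix}/${appId}")
      .getOrElse("")

  def main(args: Array[String]): Unit = {
    val conf = Map("yarn.resourcemanager.hostname" -> "rm.example.com")
    println(substitute("${hadoopconf-yarn.resourcemanager.hostname}:18080", conf))
    println(historyAddress(Some("${hadoopconf-yarn.resourcemanager.hostname}:18080"), conf, "app_1"))
  }
}
```

One consequence of this logic, which the sketch shares with the diff as far as can be read from it: substitution stops at the first variable that cannot be resolved, so any variables after it in the same string are also left untouched.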
