diff --git a/docs/configuration.md b/docs/configuration.md
index a7a1477b35628..a8fddbc084568 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -143,6 +143,7 @@ of the most common options to set are:
The name of your application. This will appear in the UI and in log data.
|
+ 0.9.0 |
spark.driver.cores |
@@ -206,6 +207,7 @@ of the most common options to set are:
spark.driver.resource.{resourceName}.discoveryScript
for the driver to find the resource on startup.
+ 3.0.0 |
spark.driver.resource.{resourceName}.discoveryScript |
@@ -216,6 +218,7 @@ of the most common options to set are:
name and an array of addresses. For a client-submitted driver, the discovery script must assign
resource addresses to this driver that differ from those of other drivers on the same host.
+ 3.0.0 |
spark.driver.resource.{resourceName}.vendor |
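For context, the discovery script referenced in the rows above only has to print a ResourceInformation-style JSON object (a name plus an array of addresses) to STDOUT. A minimal sketch in Python, with hard-coded placeholder addresses rather than real hardware probing:

```python
#!/usr/bin/env python3
# Sketch of a resource discovery script: print a ResourceInformation-style
# JSON object (a name and an array of addresses) to STDOUT. The addresses
# here are hard-coded placeholders; a real script would probe the host.
import json

print(json.dumps({"name": "gpu", "addresses": ["0", "1"]}))
```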
@@ -226,6 +229,7 @@ of the most common options to set are:
the Kubernetes device plugin naming convention. (e.g. for GPUs on Kubernetes
this config would be set to nvidia.com or amd.com)
+ 3.0.0 |
spark.resources.discoveryPlugin |
@@ -293,6 +297,7 @@ of the most common options to set are:
spark.executor.resource.{resourceName}.discoveryScript
for the executor to find the resource on startup.
+ 3.0.0 |
spark.executor.resource.{resourceName}.discoveryScript |
@@ -302,6 +307,7 @@ of the most common options to set are:
write to STDOUT a JSON string in the format of the ResourceInformation class. This has a
name and an array of addresses.
+ 3.0.0 |
spark.executor.resource.{resourceName}.vendor |
@@ -312,6 +318,7 @@ of the most common options to set are:
the Kubernetes device plugin naming convention. (e.g. for GPUs on Kubernetes
this config would be set to nvidia.com or amd.com)
+ 3.0.0 |
spark.extraListeners |
@@ -337,6 +344,7 @@ of the most common options to set are:
Note: This will be overridden by SPARK_LOCAL_DIRS (Standalone), MESOS_SANDBOX (Mesos) or
LOCAL_DIRS (YARN) environment variables set by the cluster manager.
+ 0.5.0 |
spark.logConf |
@@ -344,6 +352,7 @@ of the most common options to set are:
Logs the effective SparkConf as INFO when a SparkContext is started.
|
+ 0.9.0 |
spark.master |
@@ -352,6 +361,7 @@ of the most common options to set are:
The cluster manager to connect to. See the list of
allowed master URLs.
+ 0.9.0 |
spark.submit.deployMode |
@@ -467,6 +477,7 @@ Apart from these, the following properties are also available, and may be useful
Instead, please set this through the --driver-java-options command line option or in
your default properties file.
+ 3.0.0 |
spark.driver.extraJavaOptions |
@@ -540,6 +551,7 @@ Apart from these, the following properties are also available, and may be useful
verbose gc logging to a file named for the executor ID of the app in /tmp, pass a 'value' of:
-verbose:gc -Xloggc:/tmp/{{APP_ID}}-{{EXECUTOR_ID}}.gc
+ 3.0.0 |
spark.executor.extraJavaOptions |
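A minimal PySpark sketch of wiring up the GC-logging example above through spark.executor.extraJavaOptions; the app name is a placeholder and the log path simply reuses the {{APP_ID}}/{{EXECUTOR_ID}} substitution shown in the description:

```python
from pyspark import SparkConf, SparkContext

# Send verbose GC logs to a per-executor file, reusing the substitution
# variables from the description above. "gc-logging-example" is a placeholder.
conf = (
    SparkConf()
    .setAppName("gc-logging-example")
    .set(
        "spark.executor.extraJavaOptions",
        "-verbose:gc -Xloggc:/tmp/{{APP_ID}}-{{EXECUTOR_ID}}.gc",
    )
)
sc = SparkContext(conf=conf)
```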
@@ -636,6 +648,7 @@ Apart from these, the following properties are also available, and may be useful
Add the environment variable specified by EnvironmentVariableName to the Executor
process. The user can specify multiple of these to set multiple environment variables.
+ 0.9.0 |
spark.redaction.regex |
@@ -659,7 +672,7 @@ Apart from these, the following properties are also available, and may be useful
By default the pyspark.profiler.BasicProfiler will be used, but this can be overridden by
passing a profiler class in as a parameter to the SparkContext constructor.
- |
+ 1.2.0 |
spark.python.profile.dump |
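To illustrate the profiler hook described above, a small PySpark sketch that turns on spark.python.profile and passes the (default) BasicProfiler explicitly; a custom profiler would subclass pyspark.profiler.Profiler and be passed the same way:

```python
from pyspark import SparkConf, SparkContext
from pyspark.profiler import BasicProfiler

# Enable Python worker profiling; profiler_cls defaults to BasicProfiler and is
# passed explicitly here only to show where a custom profiler would plug in.
conf = SparkConf().setAppName("profiling-example").set("spark.python.profile", "true")
sc = SparkContext(conf=conf, profiler_cls=BasicProfiler)

sc.parallelize(range(1000)).map(lambda x: x * x).count()
sc.show_profiles()  # print the accumulated per-RDD profiles to stdout
```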
@@ -670,6 +683,7 @@ Apart from these, the following properties are also available, and may be useful
by pstats.Stats(). If this is specified, the profile result will not be displayed
automatically.
+ 1.2.0 |
spark.python.worker.memory |
@@ -680,6 +694,7 @@ Apart from these, the following properties are also available, and may be useful
(e.g. 512m, 2g).
If the memory used during aggregation goes above this amount, it will spill the data to disk.
+ 1.1.0 |
spark.python.worker.reuse |
@@ -727,6 +742,7 @@ Apart from these, the following properties are also available, and may be useful
repositories given by the command-line option --repositories. For more details, see
Advanced Dependency Management.
+ 1.5.0 |
spark.jars.excludes |
@@ -735,6 +751,7 @@ Apart from these, the following properties are also available, and may be useful
Comma-separated list of groupId:artifactId, to exclude while resolving the dependencies
provided in spark.jars.packages to avoid dependency conflicts.
+ 1.5.0 |
spark.jars.ivy |
@@ -744,6 +761,7 @@ Apart from these, the following properties are also available, and may be useful
spark.jars.packages. This will override the Ivy property ivy.default.ivy.user.dir
which defaults to ~/.ivy2.
+ 1.3.0 |
spark.jars.ivySettings |
@@ -756,6 +774,7 @@ Apart from these, the following properties are also available, and may be useful
artifact server like Artifactory. Details on the settings file format can be
found at Settings Files
+ 2.2.0 |
spark.jars.repositories |
@@ -764,6 +783,7 @@ Apart from these, the following properties are also available, and may be useful
Comma-separated list of additional remote repositories to search for the maven coordinates
given with --packages or spark.jars.packages.
+ 2.3.0 |
spark.pyspark.driver.python |
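A PySpark sketch of the dependency options above; the spark-avro coordinate is one plausible example package, and the extra repository URL is a made-up placeholder:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("packages-example")  # placeholder name
    # Maven coordinates (groupId:artifactId:version) resolved at startup.
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.0.0")
    # Hypothetical extra repository searched in addition to the defaults.
    .config("spark.jars.repositories", "https://repo.example.com/maven")
    .getOrCreate()
)
```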
@@ -849,6 +869,7 @@ Apart from these, the following properties are also available, and may be useful
set to a non-zero value. This retry logic helps stabilize large shuffles in the face of long GC
pauses or transient network connectivity issues.
+ 1.2.0 |
spark.shuffle.io.numConnectionsPerPeer |
@@ -858,6 +879,7 @@ Apart from these, the following properties are also available, and may be useful
large clusters. For clusters with many hard disks and few hosts, this may result in insufficient
concurrency to saturate all disks, and so users may consider increasing this value.
+ 1.2.1 |
spark.shuffle.io.preferDirectBufs |
@@ -867,6 +889,7 @@ Apart from these, the following properties are also available, and may be useful
block transfer. For environments where off-heap memory is tightly limited, users may wish to
turn this off to force all allocations from Netty to be on-heap.
+ 1.2.0 |
spark.shuffle.io.retryWait |
@@ -875,6 +898,7 @@ Apart from these, the following properties are also available, and may be useful
(Netty only) How long to wait between retries of fetches. The maximum delay caused by retrying
is 15 seconds by default, calculated as maxRetries * retryWait.
+ 1.2.1 |
spark.shuffle.io.backLog |
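The 15-second figure quoted above follows directly from the defaults (maxRetries = 3, retryWait = 5s); a trivial check:

```python
# Defaults: spark.shuffle.io.maxRetries = 3, spark.shuffle.io.retryWait = 5s
max_retries = 3
retry_wait_seconds = 5
print(max_retries * retry_wait_seconds)  # 15 -- the maximum extra delay, in seconds
```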
@@ -887,6 +911,7 @@ Apart from these, the following properties are also available, and may be useful
application (see the spark.shuffle.service.enabled option below). If set below 1,
it will fall back to the OS default defined by Netty's io.netty.util.NetUtil#SOMAXCONN.
+ 1.1.1 |
spark.shuffle.service.enabled |
@@ -915,6 +940,7 @@ Apart from these, the following properties are also available, and may be useful
Cache entries are limited to the specified memory footprint, in bytes unless otherwise specified.
|
+ 2.3.0 |
spark.shuffle.maxChunksBeingTransferred |
@@ -926,6 +952,7 @@ Apart from these, the following properties are also available, and may be useful
spark.shuffle.io.retryWait); if those limits are reached, the task will fail with
a fetch failure.
+ 2.3.0 |
spark.shuffle.sort.bypassMergeThreshold |
@@ -1233,6 +1260,7 @@ Apart from these, the following properties are also available, and may be useful
How many finished executions the Spark UI and status APIs remember before garbage collecting.
|
+ 1.5.0 |
spark.streaming.ui.retainedBatches |
@@ -1240,6 +1268,7 @@ Apart from these, the following properties are also available, and may be useful
How many finished batches the Spark UI and status APIs remember before garbage collecting.
|
+ 1.0.0 |
spark.ui.retainedDeadExecutors |
@@ -1633,6 +1662,7 @@ Apart from these, the following properties are also available, and may be useful
Default number of partitions in RDDs returned by transformations like join,
reduceByKey, and parallelize when not set by user.
+ 0.5.0 |
spark.executor.heartbeatInterval |
@@ -1652,6 +1682,7 @@ Apart from these, the following properties are also available, and may be useful
Communication timeout to use when fetching files added through SparkContext.addFile() from
the driver.
+ 1.0.0 |
spark.files.useFetchCache |
@@ -1664,6 +1695,7 @@ Apart from these, the following properties are also available, and may be useful
disabled in order to use Spark local directories that reside on NFS filesystems (see
SPARK-6313 for more details).
+ 1.2.2 |
spark.files.overwrite |
@@ -1672,6 +1704,7 @@ Apart from these, the following properties are also available, and may be useful
Whether to overwrite files added through SparkContext.addFile() when the target file exists and
its contents do not match those of the source.
+ 1.0.0 |
spark.files.maxPartitionBytes |
@@ -1692,23 +1725,29 @@ Apart from these, the following properties are also available, and may be useful
2.1.0 |
- spark.hadoop.cloneConf |
- false |
- If set to true, clones a new Hadoop Configuration object for each task. This
+ spark.hadoop.cloneConf |
+ false |
+
+ If set to true, clones a new Hadoop Configuration object for each task. This
option should be enabled to work around Configuration thread-safety issues (see
SPARK-2546 for more details).
This is disabled by default in order to avoid unexpected performance regressions for jobs that
- are not affected by these issues. |
+ are not affected by these issues.
+
+ 1.0.3 |
- spark.hadoop.validateOutputSpecs |
- true |
- If set to true, validates the output specification (e.g. checking if the output directory already exists)
+ spark.hadoop.validateOutputSpecs |
+ true |
+
+ If set to true, validates the output specification (e.g. checking if the output directory already exists)
used in saveAsHadoopFile and other variants. This can be disabled to silence exceptions due to pre-existing
- output directories. We recommend that users do not disable this except if trying to achieve compatibility with
- previous versions of Spark. Simply use Hadoop's FileSystem API to delete output directories by hand.
- This setting is ignored for jobs generated through Spark Streaming's StreamingContext, since
- data may need to be rewritten to pre-existing output directories during checkpoint recovery. |
+ output directories. We recommend that users do not disable this except when trying to achieve compatibility
+ with previous versions of Spark. Simply use Hadoop's FileSystem API to delete output directories by hand.
+ This setting is ignored for jobs generated through Spark Streaming's StreamingContext, since data may
+ need to be rewritten to pre-existing output directories during checkpoint recovery.
+
+ 1.0.1 |
spark.storage.memoryMapThreshold |
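Both Hadoop-related flags above are ordinary Spark properties; a minimal sketch of flipping their defaults (illustration only, not a recommendation):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Illustration only: flip the defaults discussed in the rows above.
    .config("spark.hadoop.cloneConf", "true")
    .config("spark.hadoop.validateOutputSpecs", "false")
    .getOrCreate()
)
```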
@@ -1728,6 +1767,7 @@ Apart from these, the following properties are also available, and may be useful
Version 2 may have better performance, but version 1 may handle failures better in certain situations,
as per MAPREDUCE-4815.
+ 2.2.0 |
@@ -1842,7 +1882,7 @@ Apart from these, the following properties are also available, and may be useful
need to be increased, so that incoming connections are not dropped when a large number of
connections arrives in a short period of time.
- |
+ 3.0.0 |
spark.network.timeout |
@@ -1865,7 +1905,7 @@ Apart from these, the following properties are also available, and may be useful
block transfer. For environments where off-heap memory is tightly limited, users may wish to
turn this off to force all allocations to be on-heap.
- |
+ 3.0.0 |
spark.port.maxRetries |
@@ -1877,7 +1917,7 @@ Apart from these, the following properties are also available, and may be useful
essentially allows it to try a range of ports from the start port specified
to port + maxRetries.
- |
+ 1.1.1 |
spark.rpc.numRetries |
@@ -1920,7 +1960,7 @@ Apart from these, the following properties are also available, and may be useful
out and giving up. To avoid unwanted timeouts caused by long pauses such as GC,
you can set a larger value.
- |
+ 1.1.1 |
spark.network.maxRemoteBlockSizeFetchToMem |
@@ -2053,6 +2093,7 @@ Apart from these, the following properties are also available, and may be useful
that register to the listener bus. Consider increasing the value if the listener events corresponding
to the shared queue are dropped. Increasing this value may result in the driver using more memory.
+ 3.0.0 |
spark.scheduler.listenerbus.eventqueue.appStatus.capacity |
@@ -2062,6 +2103,7 @@ Apart from these, the following properties are also available, and may be useful
Consider increasing the value if the listener events corresponding to the appStatus queue are dropped.
Increasing this value may result in the driver using more memory.
+ 3.0.0 |
spark.scheduler.listenerbus.eventqueue.executorManagement.capacity |
@@ -2071,6 +2113,7 @@ Apart from these, the following properties are also available, and may be useful
executor management listeners. Consider increasing the value if the listener events corresponding to the
executorManagement queue are dropped. Increasing this value may result in the driver using more memory.
+ 3.0.0 |
spark.scheduler.listenerbus.eventqueue.eventLog.capacity |
@@ -2080,6 +2123,7 @@ Apart from these, the following properties are also available, and may be useful
that write events to eventLogs. Consider increasing the value if the listener events corresponding to the eventLog
queue are dropped. Increasing this value may result in the driver using more memory.
+ 3.0.0 |
spark.scheduler.listenerbus.eventqueue.streams.capacity |
@@ -2089,6 +2133,7 @@ Apart from these, the following properties are also available, and may be useful
Consider increasing the value if the listener events corresponding to the streams queue are dropped. Increasing
this value may result in the driver using more memory.
+ 3.0.0 |
spark.scheduler.blacklist.unschedulableTaskSetTimeout |
@@ -2271,6 +2316,7 @@ Apart from these, the following properties are also available, and may be useful
in order to assign resource slots (e.g. with a 0.2222 configuration, 1/0.2222 slots is floored to
4 tasks per resource, not rounded up to 5).
+ 3.0.0 |
spark.task.maxFailures |
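The flooring behaviour described above, worked out in plain Python (0.2222 is the example value from the description, not a recommended setting):

```python
import math

amount = 0.2222                          # spark.task.resource.{resourceName}.amount
tasks_per_resource = math.floor(1 / amount)
print(tasks_per_resource)                # 4 -- floored, not rounded up to 5
```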
@@ -2335,6 +2381,7 @@ Apart from these, the following properties are also available, and may be useful
Number of consecutive stage attempts allowed before a stage is aborted.
|
+ 2.2.0 |
@@ -2526,13 +2573,14 @@ like shuffle, just replace "rpc" with "shuffle" in the property names except
spark.{driver|executor}.rpc.netty.dispatcher.numThreads, which is only for RPC module.
-| Property Name | Default | Meaning |
+| Property Name | Default | Meaning | Since Version |
spark.{driver|executor}.rpc.io.serverThreads |
Fall back on spark.rpc.io.serverThreads
|
Number of threads used in the server thread pool |
+ 1.6.0 |
spark.{driver|executor}.rpc.io.clientThreads |
@@ -2540,6 +2588,7 @@ like shuffle, just replace "rpc" with "shuffle" in the property names except
Fall back on spark.rpc.io.clientThreads
Number of threads used in the client thread pool |
+ 1.6.0 |
spark.{driver|executor}.rpc.netty.dispatcher.numThreads |
@@ -2547,6 +2596,7 @@ like shuffle, just replace "rpc" with "shuffle" in the property names except
Fall back on spark.rpc.netty.dispatcher.numThreads
Number of threads used in RPC message dispatcher thread pool |
+ 3.0.0 |
@@ -2728,7 +2778,7 @@ Spark subsystems.
Executable for executing R scripts in client modes for driver. Ignored in cluster modes.
|
- |
+ 1.5.3 |
spark.r.shell.command |
@@ -2737,7 +2787,7 @@ Spark subsystems.
Executable for executing sparkR shell in client modes for driver. Ignored in cluster modes. It is the same as environment variable SPARKR_DRIVER_R, but takes precedence over it.
spark.r.shell.command is used for sparkR shell while spark.r.driver.command is used for running R script.
- |
+ 2.1.0 |
spark.r.backendConnectionTimeout |
@@ -2769,6 +2819,7 @@ Spark subsystems.
Checkpoint interval for graph and message in Pregel. It is used to avoid stackOverflowError due to long lineage chains
after many iterations. The checkpoint is disabled by default.
+ 2.2.0 |