[SPARK-31295][DOC] Supplement version for configuration appear in doc #28064
docs/spark-standalone.md
@@ -192,13 +192,15 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     The maximum number of completed applications to display. Older applications will be dropped from the UI to maintain this limit.<br/>
   </td>
+  <td>0.8.0</td>
 </tr>
 <tr>
   <td><code>spark.deploy.retainedDrivers</code></td>
   <td>200</td>
   <td>
     The maximum number of completed drivers to display. Older drivers will be dropped from the UI to maintain this limit.<br/>
   </td>
+  <td>1.1.0</td>
Contributor (Author): No JIRA ID, commit ID: 7446f5f#diff-29dffdccd5a7f4c8b496c293e87c8668
 </tr>
 <tr>
   <td><code>spark.deploy.spreadOut</code></td>
@@ -208,6 +210,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     to consolidate them onto as few nodes as possible. Spreading out is usually better for
     data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
   </td>
+  <td>0.6.1</td>
Contributor (Author): No JIRA ID, commit ID: bb2b9ff#diff-0e7ae91819fc8f7b47b0f97be7116325
 </tr>
 <tr>
   <td><code>spark.deploy.defaultCores</code></td>
@@ -219,6 +222,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     Set this lower on a shared cluster to prevent users from grabbing
     the whole cluster by default. <br/>
   </td>
+  <td>0.9.0</td>
Contributor (Author): No JIRA ID, commit ID: d8bcc8e#diff-29dffdccd5a7f4c8b496c293e87c8668
 </tr>
 <tr>
   <td><code>spark.deploy.maxExecutorRetries</code></td>
@@ -234,6 +238,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     <code>-1</code>.
     <br/>
   </td>
+  <td>1.6.3</td>
Contributor (Author): SPARK-16956, commit ID: ace458f#diff-29dffdccd5a7f4c8b496c293e87c8668
 </tr>
 <tr>
   <td><code>spark.worker.timeout</code></td>
@@ -250,6 +255,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     Amount of a particular resource to use on the worker.
   </td>
+  <td>3.0.0</td>
Contributor (Author): SPARK-27371, commit ID: cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8
 </tr>
 <tr>
   <td><code>spark.worker.resource.{resourceName}.discoveryScript</code></td>
@@ -258,6 +264,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     Path to resource discovery script, which is used to find a particular resource while the worker is starting up.
     The output of the script should be formatted like the <code>ResourceInformation</code> class.
   </td>
+  <td>3.0.0</td>
Contributor (Author): SPARK-27371, commit ID: cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8
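A hedged aside on the discovery-script row above: the expected stdout is a JSON rendering of the ResourceInformation class, i.e. a resource name plus its addresses, e.g. `{"name": "gpu", "addresses": ["0", "1"]}` (illustrative values, not taken from this diff).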
 </tr>
 <tr>
   <td><code>spark.worker.resourcesFile</code></td>
@@ -317,6 +324,7 @@ SPARK_WORKER_OPTS supports the following system properties:
     enabled). You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
     eventually gets cleaned up. This config may be removed in the future.
   </td>
+  <td>3.0.0</td>
Contributor (Author): SPARK-26288, commit ID: 8b0aa59#diff-6bdad48cfc34314e89599655442ff210
 </tr>
 <tr>
   <td><code>spark.storage.cleanupFilesAfterExecutorExit</code></td>
@@ -329,6 +337,7 @@ SPARK_WORKER_OPTS supports the following system properties:
     all files/subdirectories of a stopped and timed-out application.
     This only affects Standalone mode; support for other cluster managers can be added in the future.
   </td>
+  <td>2.4.0</td>
Contributor (Author): SPARK-24340, commit ID: 8ef167a#diff-916ca56b663f178f302c265b7ef38499
 </tr>
 <tr>
   <td><code>spark.worker.ui.compressedLogFileLengthCacheSize</code></td>
@@ -490,14 +499,16 @@ ZooKeeper is the best way to go for production-level high availability, but if y
 In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:

 <table class="table">
-  <tr><th style="width:21%">System property</th><th>Meaning</th></tr>
+  <tr><th style="width:21%">System property</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td><code>spark.deploy.recoveryMode</code></td>
   <td>Set to FILESYSTEM to enable single-node recovery mode (default: NONE).</td>
+  <td>0.8.1</td>
Contributor (Author): No JIRA ID, commit ID: d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668
 </tr>
 <tr>
   <td><code>spark.deploy.recoveryDirectory</code></td>
   <td>The directory in which Spark will store recovery state, accessible from the Master's perspective.</td>
+  <td>0.8.1</td>
Contributor (Author): No JIRA ID, commit ID: d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668
 </tr>
 </table>
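As a usage sketch of the two recovery properties above (the recovery directory path is hypothetical): add `export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/var/spark/recovery"` to conf/spark-env.sh before starting the Master.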
docs/sql-data-sources-avro.md
@@ -258,21 +258,34 @@ Data source options of Avro can be set via:
 ## Configuration
 Configuration of Avro can be done using the `setConf` method on SparkSession or by running `SET key=value` commands using SQL.
 <table class="table">
-  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
 <tr>
   <td>spark.sql.legacy.replaceDatabricksSparkAvro.enabled</td>
   <td>true</td>
-  <td>If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped to the built-in but external Avro data source module for backward compatibility.</td>
+  <td>
+    If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped
+    to the built-in but external Avro data source module for backward compatibility.
+  </td>
+  <td>2.4.0</td>
Contributor (Author): SPARK-25129, commit ID: ac0174e#diff-9a6b543db706f1a90f790783d6930a13
 </tr>
 <tr>
   <td>spark.sql.avro.compression.codec</td>
   <td>snappy</td>
-  <td>Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate, snappy, bzip2 and xz. Default codec is snappy.</td>
+  <td>
+    Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate,
+    snappy, bzip2 and xz. Default codec is snappy.
+  </td>
+  <td>2.4.0</td>
Contributor (Author): SPARK-24881, commit ID: 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13
 </tr>
 <tr>
   <td>spark.sql.avro.deflate.level</td>
   <td>-1</td>
-  <td>Compression level for the deflate codec used in writing of AVRO files. Valid value must be in the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level in the current implementation.</td>
+  <td>
+    Compression level for the deflate codec used in writing of AVRO files. Valid values are in
+    the range of 1 to 9 inclusive, or -1. The default value is -1, which corresponds to level 6
+    in the current implementation.
+  </td>
+  <td>2.4.0</td>
Contributor (Author): SPARK-24881, commit ID: 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13
 </tr>
 </table>
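To make the Avro table above concrete, here is a minimal sketch of both configuration routes it describes (a local SparkSession purely for illustration; the codec and level values are arbitrary supported choices):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch; a local session purely for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("avro-config-sketch")
  .getOrCreate()

// Programmatic route: setConf-style configuration on the session.
spark.conf.set("spark.sql.avro.compression.codec", "deflate")
spark.conf.set("spark.sql.avro.deflate.level", "5")

// SQL route: the same properties via SET key=value commands.
spark.sql("SET spark.sql.avro.compression.codec=deflate")
spark.sql("SET spark.sql.avro.deflate.level=5")
```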
docs/sql-data-sources-orc.md
@@ -27,15 +27,25 @@ serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileF
 the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.

 <table class="table">
-  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
+  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
 <tr>
   <td><code>spark.sql.orc.impl</code></td>
   <td><code>native</code></td>
-  <td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support. <code>hive</code> means the ORC library in Hive.</td>
+  <td>
+    The name of the ORC implementation. It can be one of <code>native</code> and <code>hive</code>.
+    <code>native</code> means the native ORC support. <code>hive</code> means the ORC library
+    in Hive.
+  </td>
+  <td>2.3.0</td>
Contributor (Author): SPARK-20728, commit ID: 326f1d6#diff-9a6b543db706f1a90f790783d6930a13
 </tr>
 <tr>
   <td><code>spark.sql.orc.enableVectorizedReader</code></td>
   <td><code>true</code></td>
-  <td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
+  <td>
+    Enables vectorized ORC decoding in the <code>native</code> implementation. If <code>false</code>,
+    a new non-vectorized ORC reader is used in the <code>native</code> implementation.
+    For the <code>hive</code> implementation, this is ignored.
+  </td>
+  <td>2.3.0</td>
Contributor (Author): SPARK-16060, commit ID: 60f6b99#diff-9a6b543db706f1a90f790783d6930a13
 </tr>
 </table>
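For illustration, a short sketch that applies the two ORC properties above before a read (assumes an existing SparkSession named `spark`; the input path is hypothetical):

```scala
// Values shown are the defaults listed in the table above.
spark.conf.set("spark.sql.orc.impl", "native")
spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

// Hypothetical path, purely for illustration.
val df = spark.read.orc("/data/events.orc")
df.printSchema()
```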
docs/sql-data-sources-parquet.md
@@ -258,7 +258,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
 `SET key=value` commands using SQL.

 <table class="table">
-<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td><code>spark.sql.parquet.binaryAsString</code></td>
   <td>false</td>
@@ -267,6 +267,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     not differentiate between binary data and strings when writing out the Parquet schema. This
     flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
   </td>
+  <td>1.1.1</td>
Contributor (Author): SPARK-2927, commit ID: de501e1#diff-41ef65b9ef5b518f77e2a03559893f4d
 </tr>
 <tr>
   <td><code>spark.sql.parquet.int96AsTimestamp</code></td>
@@ -275,6 +276,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     Some Parquet-producing systems, in particular Impala and Hive, store Timestamp into INT96. This
     flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
   </td>
+  <td>1.3.0</td>
Contributor (Author): SPARK-4987, commit ID: 67d5220#diff-41ef65b9ef5b518f77e2a03559893f4d
 </tr>
 <tr>
   <td><code>spark.sql.parquet.compression.codec</code></td>
@@ -287,11 +289,13 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     Note that <code>zstd</code> requires <code>ZStandardCodec</code> to be installed before Hadoop 2.9.0, <code>brotli</code> requires
     <code>BrotliCodec</code> to be installed.
   </td>
+  <td>1.1.1</td>
Contributor (Author): SPARK-3131, commit ID: 3a9d874#diff-41ef65b9ef5b518f77e2a03559893f4d
 </tr>
 <tr>
   <td><code>spark.sql.parquet.filterPushdown</code></td>
   <td>true</td>
   <td>Enables Parquet filter push-down optimization when set to true.</td>
+  <td>1.2.0</td>
Contributor (Author): SPARK-4391, commit ID: 576688a#diff-41ef65b9ef5b518f77e2a03559893f4d
 </tr>
 <tr>
   <td><code>spark.sql.hive.convertMetastoreParquet</code></td>
@@ -300,6 +304,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     When set to false, Spark SQL will use the Hive SerDe for parquet tables instead of the built-in
     support.
   </td>
+  <td>1.1.1</td>
Contributor (Author): SPARK-2406, commit ID: cc4015d#diff-ff50aea397a607b79df9bec6f2a841db
 </tr>
 <tr>
   <td><code>spark.sql.parquet.mergeSchema</code></td>
@@ -310,6 +315,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     schema is picked from the summary file or a random data file if no summary file is available.
     </p>
   </td>
+  <td>1.5.0</td>
Contributor (Author): SPARK-8690, commit ID: 246265f#diff-41ef65b9ef5b518f77e2a03559893f4d
 </tr>
 <tr>
   <td><code>spark.sql.parquet.writeLegacyFormat</code></td>
@@ -321,5 +327,6 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     example, decimals will be written in int-based format. If Parquet output is intended for use
     with systems that do not support this newer format, set to true.
   </td>
+  <td>1.6.0</td>
Contributor (Author): SPARK-10400, commit ID: 01cd688#diff-41ef65b9ef5b518f77e2a03559893f4d
 </tr>
 </table>
Contributor (Author): No JIRA ID, commit ID: 46eecd1#diff-29dffdccd5a7f4c8b496c293e87c8668
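Closing with a minimal sketch of the Parquet configuration routes this file describes, using properties from the table above (assumes an existing SparkSession named `spark`; values are examples only):

```scala
// Programmatic route via the session conf.
spark.conf.set("spark.sql.parquet.filterPushdown", "true")
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

// Equivalent SQL route.
spark.sql("SET spark.sql.parquet.mergeSchema=false")
```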