13 changes: 12 additions & 1 deletion docs/spark-standalone.md
@@ -192,13 +192,15 @@ SPARK_MASTER_OPTS supports the following system properties:
<td>
The maximum number of completed applications to display. Older applications will be dropped from the UI to maintain this limit.<br/>
</td>
<td>0.8.0</td>
Contributor Author: No JIRA ID, commit ID: 46eecd1#diff-29dffdccd5a7f4c8b496c293e87c8668
</tr>
<tr>
<td><code>spark.deploy.retainedDrivers</code></td>
<td>200</td>
<td>
The maximum number of completed drivers to display. Older drivers will be dropped from the UI to maintain this limit.<br/>
</td>
<td>1.1.0</td>
Contributor Author: No JIRA ID, commit ID: 7446f5f#diff-29dffdccd5a7f4c8b496c293e87c8668
</tr>
<tr>
<td><code>spark.deploy.spreadOut</code></td>
@@ -208,6 +210,7 @@ SPARK_MASTER_OPTS supports the following system properties:
to consolidate them onto as few nodes as possible. Spreading out is usually better for
data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
</td>
<td>0.6.1</td>
Contributor Author: No JIRA ID, commit ID: bb2b9ff#diff-0e7ae91819fc8f7b47b0f97be7116325
</tr>
<tr>
<td><code>spark.deploy.defaultCores</code></td>
@@ -219,6 +222,7 @@ SPARK_MASTER_OPTS supports the following system properties:
Set this lower on a shared cluster to prevent users from grabbing
the whole cluster by default. <br/>
</td>
<td>0.9.0</td>
Contributor Author: No JIRA ID, commit ID: d8bcc8e#diff-29dffdccd5a7f4c8b496c293e87c8668
</tr>
<tr>
<td><code>spark.deploy.maxExecutorRetries</code></td>
@@ -234,6 +238,7 @@ SPARK_MASTER_OPTS supports the following system properties:
<code>-1</code>.
<br/>
</td>
<td>1.6.3</td>
Contributor Author: SPARK-16956, commit ID: ace458f#diff-29dffdccd5a7f4c8b496c293e87c8668
</tr>
<tr>
<td><code>spark.worker.timeout</code></td>
@@ -250,6 +255,7 @@ SPARK_MASTER_OPTS supports the following system properties:
<td>
Amount of a particular resource to use on the worker.
</td>
<td>3.0.0</td>
Contributor Author: SPARK-27371, commit ID: cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8
</tr>
<tr>
<td><code>spark.worker.resource.{resourceName}.discoveryScript</code></td>
@@ -258,6 +264,7 @@ SPARK_MASTER_OPTS supports the following system properties:
Path to the resource discovery script, which is used to find a particular resource while the worker is starting up.
The output of the script should be formatted like the <code>ResourceInformation</code> class.
</td>
<td>3.0.0</td>
Contributor Author: SPARK-27371, commit ID: cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8
</tr>
<tr>
<td><code>spark.worker.resourcesFile</code></td>
@@ -317,6 +324,7 @@ SPARK_WORKER_OPTS supports the following system properties:
enabled). You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
eventually gets cleaned up. This config may be removed in the future.
</td>
<td>3.0.0</td>
Contributor Author: SPARK-26288, commit ID: 8b0aa59#diff-6bdad48cfc34314e89599655442ff210
</tr>
<tr>
<td><code>spark.storage.cleanupFilesAfterExecutorExit</code></td>
@@ -329,6 +337,7 @@ SPARK_WORKER_OPTS supports the following system properties:
all files/subdirectories of a stopped and timed-out application.
This only affects Standalone mode; support for other cluster managers can be added in the future.
</td>
<td>2.4.0</td>
Contributor Author: SPARK-24340, commit ID: 8ef167a#diff-916ca56b663f178f302c265b7ef38499
</tr>
<tr>
<td><code>spark.worker.ui.compressedLogFileLengthCacheSize</code></td>
@@ -490,14 +499,16 @@ ZooKeeper is the best way to go for production-level high availability, but if y
In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:

<table class="table">
<tr><th style="width:21%">System property</th><th>Meaning</th></tr>
<tr><th style="width:21%">System property</th><th>Meaning</th><th>Since Version</th></tr>
<tr>
<td><code>spark.deploy.recoveryMode</code></td>
<td>Set to FILESYSTEM to enable single-node recovery mode (default: NONE).</td>
<td>0.8.1</td>
Contributor Author: No JIRA ID, commit ID: d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668
</tr>
<tr>
<td><code>spark.deploy.recoveryDirectory</code></td>
<td>The directory in which Spark will store recovery state, accessible from the Master's perspective.</td>
<td>0.8.1</td>
Contributor Author: No JIRA ID, commit ID: d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668
</tr>
</table>

21 changes: 17 additions & 4 deletions docs/sql-data-sources-avro.md
@@ -258,21 +258,34 @@ Data source options of Avro can be set via:
## Configuration
Configuration of Avro can be done using the `setConf` method on SparkSession or by running `SET key=value` commands using SQL.
<table class="table">
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
<tr>
<td>spark.sql.legacy.replaceDatabricksSparkAvro.enabled</td>
<td>true</td>
<td>If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped to the built-in but external Avro data source module for backward compatibility.</td>
<td>
If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped
to the built-in but external Avro data source module for backward compatibility.
</td>
<td>2.4.0</td>
Contributor Author: SPARK-25129, commit ID: ac0174e#diff-9a6b543db706f1a90f790783d6930a13
</tr>
<tr>
<td>spark.sql.avro.compression.codec</td>
<td>snappy</td>
<td>Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate, snappy, bzip2 and xz. Default codec is snappy.</td>
<td>
Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate,
snappy, bzip2 and xz. Default codec is snappy.
</td>
<td>2.4.0</td>
beliefer (Contributor Author), Mar 29, 2020: SPARK-24881, commit ID: 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13
</tr>
<tr>
<td>spark.sql.avro.deflate.level</td>
<td>-1</td>
<td>Compression level for the deflate codec used in writing of AVRO files. Valid value must be in the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level in the current implementation.</td>
<td>
Compression level for the deflate codec used in writing of AVRO files. Valid value must be in
the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level
in the current implementation.
</td>
<td>2.4.0</td>
Contributor Author: SPARK-24881, commit ID: 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13
</tr>
</table>
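
As an illustration, a minimal Scala sketch of setting the Avro properties above through `spark.conf.set`; the codec and level values are arbitrary examples, and the spark-avro module is assumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object AvroConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AvroConfigExample")
      .master("local[*]")  // example master; adjust for a real cluster
      .getOrCreate()

    // Keep com.databricks.spark.avro mapped to the built-in Avro source (default: true).
    spark.conf.set("spark.sql.legacy.replaceDatabricksSparkAvro.enabled", "true")

    // Write Avro files with the deflate codec at level 5 instead of the snappy default.
    spark.conf.set("spark.sql.avro.compression.codec", "deflate")
    spark.conf.set("spark.sql.avro.deflate.level", "5")

    // The same properties can also be set in SQL:
    spark.sql("SET spark.sql.avro.compression.codec=deflate")

    spark.stop()
  }
}
```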

16 changes: 13 additions & 3 deletions docs/sql-data-sources-orc.md
@@ -27,15 +27,25 @@ serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileF
the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.

<table class="table">
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
<tr>
<td><code>spark.sql.orc.impl</code></td>
<td><code>native</code></td>
<td>The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>. <code>native</code> means the native ORC support. <code>hive</code> means the ORC library in Hive.</td>
<td>
The name of ORC implementation. It can be one of <code>native</code> and <code>hive</code>.
<code>native</code> means the native ORC support. <code>hive</code> means the ORC library
in Hive.
</td>
<td>2.3.0</td>
Contributor Author: SPARK-20728, commit ID: 326f1d6#diff-9a6b543db706f1a90f790783d6930a13
</tr>
<tr>
<td><code>spark.sql.orc.enableVectorizedReader</code></td>
<td><code>true</code></td>
<td>Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in <code>native</code> implementation. For <code>hive</code> implementation, this is ignored.</td>
<td>
Enables vectorized orc decoding in <code>native</code> implementation. If <code>false</code>,
a new non-vectorized ORC reader is used in <code>native</code> implementation.
For <code>hive</code> implementation, this is ignored.
</td>
<td>2.3.0</td>
Contributor Author: SPARK-16060, commit ID: 60f6b99#diff-9a6b543db706f1a90f790783d6930a13
</tr>
</table>
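
As an illustration, a minimal Scala sketch of selecting the ORC implementation and vectorized reader via `spark.conf.set`; the values simply restate the defaults from the table above:

```scala
import org.apache.spark.sql.SparkSession

object OrcConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrcConfigExample")
      .master("local[*]")  // example master; adjust for a real cluster
      .getOrCreate()

    // Use the native ORC implementation with vectorized decoding (both documented defaults).
    spark.conf.set("spark.sql.orc.impl", "native")
    spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

    // Equivalent SQL form:
    spark.sql("SET spark.sql.orc.impl=native")

    spark.stop()
  }
}
```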
9 changes: 8 additions & 1 deletion docs/sql-data-sources-parquet.md
@@ -258,7 +258,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
`SET key=value` commands using SQL.

<table class="table">
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
<tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
<tr>
<td><code>spark.sql.parquet.binaryAsString</code></td>
<td>false</td>
@@ -267,6 +267,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
not differentiate between binary data and strings when writing out the Parquet schema. This
flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
</td>
<td>1.1.1</td>
Contributor Author: SPARK-2927, commit ID: de501e1#diff-41ef65b9ef5b518f77e2a03559893f4d
</tr>
<tr>
<td><code>spark.sql.parquet.int96AsTimestamp</code></td>
@@ -275,6 +276,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Some Parquet-producing systems, in particular Impala and Hive, store Timestamp into INT96. This
flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
</td>
<td>1.3.0</td>
Contributor Author: SPARK-4987, commit ID: 67d5220#diff-41ef65b9ef5b518f77e2a03559893f4d
</tr>
<tr>
<td><code>spark.sql.parquet.compression.codec</code></td>
@@ -287,11 +289,13 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
Note that <code>zstd</code> requires <code>ZStandardCodec</code> to be installed before Hadoop 2.9.0, <code>brotli</code> requires
<code>BrotliCodec</code> to be installed.
</td>
<td>1.1.1</td>
Contributor Author: SPARK-3131, commit ID: 3a9d874#diff-41ef65b9ef5b518f77e2a03559893f4d
</tr>
<tr>
<td><code>spark.sql.parquet.filterPushdown</code></td>
<td>true</td>
<td>Enables Parquet filter push-down optimization when set to true.</td>
<td>1.2.0</td>
Contributor Author: SPARK-4391, commit ID: 576688a#diff-41ef65b9ef5b518f77e2a03559893f4d
</tr>
<tr>
<td><code>spark.sql.hive.convertMetastoreParquet</code></td>
@@ -300,6 +304,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
When set to false, Spark SQL will use the Hive SerDe for parquet tables instead of the built in
support.
</td>
<td>1.1.1</td>
Contributor Author: SPARK-2406, commit ID: cc4015d#diff-ff50aea397a607b79df9bec6f2a841db
</tr>
<tr>
<td><code>spark.sql.parquet.mergeSchema</code></td>
@@ -310,6 +315,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
schema is picked from the summary file or a random data file if no summary file is available.
</p>
</td>
<td>1.5.0</td>
Contributor Author: SPARK-8690, commit ID: 246265f#diff-41ef65b9ef5b518f77e2a03559893f4d
</tr>
<tr>
<td><code>spark.sql.parquet.writeLegacyFormat</code></td>
@@ -321,5 +327,6 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
example, decimals will be written in int-based format. If Parquet output is intended for use
with systems that do not support this newer format, set to true.
</td>
<td>1.6.0</td>
Contributor Author: SPARK-10400, commit ID: 01cd688#diff-41ef65b9ef5b518f77e2a03559893f4d
</tr>
</table>
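
As an illustration, a minimal Scala sketch of setting a few of the Parquet properties above via `spark.conf.set`; the chosen values are arbitrary examples:

```scala
import org.apache.spark.sql.SparkSession

object ParquetConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetConfigExample")
      .master("local[*]")  // example master; adjust for a real cluster
      .getOrCreate()

    // Compatibility flags for files written by other systems (e.g. older Impala/Hive).
    spark.conf.set("spark.sql.parquet.binaryAsString", "true")
    spark.conf.set("spark.sql.parquet.int96AsTimestamp", "true")

    // Write codec and schema-merging behaviour.
    spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
    spark.conf.set("spark.sql.parquet.mergeSchema", "true")

    // Equivalent SQL form:
    spark.sql("SET spark.sql.parquet.filterPushdown=true")

    spark.stop()
  }
}
```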