[SPARK-31295][DOC] Supplement version for configuration appear in doc#28064

Closed
beliefer wants to merge 1 commit into apache:master from beliefer:supplement-doc-for-data-sources

Conversation


@beliefer beliefer commented Mar 29, 2020

What changes were proposed in this pull request?

This PR supplements the version for configurations that appear in the docs.
I sorted out the information, shown below.

docs/spark-standalone.md

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.deploy.retainedApplications | 0.8.0 | None | 46eecd1#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.retainedDrivers | 1.1.0 | None | 7446f5f#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.spreadOut | 0.6.1 | None | bb2b9ff#diff-0e7ae91819fc8f7b47b0f97be7116325 |
spark.deploy.defaultCores | 0.9.0 | None | d8bcc8e#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.maxExecutorRetries | 1.6.3 | SPARK-16956 | ace458f#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.worker.resource.{resourceName}.amount | 3.0.0 | SPARK-27371 | cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |
spark.worker.resource.{resourceName}.discoveryScript | 3.0.0 | SPARK-27371 | cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8 |
spark.worker.resourcesFile | 3.0.0 | SPARK-27369 | 7cbe01e#diff-b2fc8d6ab7ac5735085e2d6cfacb95da |
spark.shuffle.service.db.enabled | 3.0.0 | SPARK-26288 | 8b0aa59#diff-6bdad48cfc34314e89599655442ff210 |
spark.storage.cleanupFilesAfterExecutorExit | 2.4.0 | SPARK-24340 | 8ef167a#diff-916ca56b663f178f302c265b7ef38499 |
spark.deploy.recoveryMode | 0.8.1 | None | d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668 |
spark.deploy.recoveryDirectory | 0.8.1 | None | d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668 |
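For context, these standalone-mode properties are normally set in `conf/spark-defaults.conf` (or via `SPARK_MASTER_OPTS`). A minimal sketch; the values below are purely illustrative, not recommendations:

```properties
# conf/spark-defaults.conf -- illustrative values only
spark.deploy.retainedApplications   200
spark.deploy.spreadOut              true
spark.deploy.defaultCores           4
spark.deploy.maxExecutorRetries     10
spark.deploy.recoveryMode           FILESYSTEM
spark.deploy.recoveryDirectory      /var/spark/recovery
```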

docs/sql-data-sources-avro.md

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.legacy.replaceDatabricksSparkAvro.enabled | 2.4.0 | SPARK-25129 | ac0174e#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.avro.compression.codec | 2.4.0 | SPARK-24881 | 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.avro.deflate.level | 2.4.0 | SPARK-24881 | 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13 |
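As a sketch, the Avro options above could be set in `conf/spark-defaults.conf`; the values below are examples, not defaults:

```properties
# Illustrative values only
spark.sql.legacy.replaceDatabricksSparkAvro.enabled  true
spark.sql.avro.compression.codec                     deflate
spark.sql.avro.deflate.level                         5
```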

docs/sql-data-sources-orc.md

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.orc.impl | 2.3.0 | SPARK-20728 | 326f1d6#diff-9a6b543db706f1a90f790783d6930a13 |
spark.sql.orc.enableVectorizedReader | 2.3.0 | SPARK-16060 | 60f6b99#diff-9a6b543db706f1a90f790783d6930a13 |
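A corresponding `conf/spark-defaults.conf` sketch (illustrative values):

```properties
spark.sql.orc.impl                    native
spark.sql.orc.enableVectorizedReader  true
```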

docs/sql-data-sources-parquet.md

Item name | Since version | JIRA ID | Commit ID | Note
-- | -- | -- | -- | --
spark.sql.parquet.binaryAsString | 1.1.1 | SPARK-2927 | de501e1#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.int96AsTimestamp | 1.3.0 | SPARK-4987 | 67d5220#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.compression.codec | 1.1.1 | SPARK-3131 | 3a9d874#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.filterPushdown | 1.2.0 | SPARK-4391 | 576688a#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.hive.convertMetastoreParquet | 1.1.1 | SPARK-2406 | cc4015d#diff-ff50aea397a607b79df9bec6f2a841db |
spark.sql.parquet.mergeSchema | 1.5.0 | SPARK-8690 | 246265f#diff-41ef65b9ef5b518f77e2a03559893f4d |
spark.sql.parquet.writeLegacyFormat | 1.6.0 | SPARK-10400 | 01cd688#diff-41ef65b9ef5b518f77e2a03559893f4d |
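Similarly, a hedged `conf/spark-defaults.conf` sketch for the Parquet options; the values shown are illustrative:

```properties
spark.sql.parquet.binaryAsString     false
spark.sql.parquet.int96AsTimestamp   true
spark.sql.parquet.compression.codec  snappy
spark.sql.parquet.filterPushdown     true
spark.sql.parquet.writeLegacyFormat  false
```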

Why are the changes needed?

This supplements configuration version information in the docs.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

Jenkins test

<td>
The maximum number of completed applications to display. Older applications will be dropped from the UI to maintain this limit.<br/>
</td>
<td>0.8.0</td>

No JIRA ID, commit ID: 46eecd1#diff-29dffdccd5a7f4c8b496c293e87c8668

<td>
The maximum number of completed drivers to display. Older drivers will be dropped from the UI to maintain this limit.<br/>
</td>
<td>1.1.0</td>

No JIRA ID, commit ID: 7446f5f#diff-29dffdccd5a7f4c8b496c293e87c8668

to consolidate them onto as few nodes as possible. Spreading out is usually better for
data locality in HDFS, but consolidating is more efficient for compute-intensive workloads. <br/>
</td>
<td>0.6.1</td>

No JIRA ID, commit ID: bb2b9ff#diff-0e7ae91819fc8f7b47b0f97be7116325

Set this lower on a shared cluster to prevent users from grabbing
the whole cluster by default. <br/>
</td>
<td>0.9.0</td>

No JIRA ID, commit ID: d8bcc8e#diff-29dffdccd5a7f4c8b496c293e87c8668

<code>-1</code>.
<br/>
</td>
<td>1.6.3</td>

SPARK-16956, commit ID: ace458f#diff-29dffdccd5a7f4c8b496c293e87c8668

<td>
Amount of a particular resource to use on the worker.
</td>
<td>3.0.0</td>

SPARK-27371, commit ID: cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8

Path to the resource discovery script, which is used to find a particular resource while the worker is starting up.
The output of the script should be formatted like the <code>ResourceInformation</code> class.
</td>
<td>3.0.0</td>

SPARK-27371, commit ID: cbad616#diff-d25032e4a3ae1b85a59e4ca9ccf189a8
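The discovery-script contract described in the quoted doc text can be sketched as a small shell script. This is a hypothetical example (the `nvidia-smi` invocation and the `gpu` resource name are assumptions, not part of this PR); Spark expects the script to print a single JSON object shaped like the `ResourceInformation` class:

```shell
#!/usr/bin/env bash
# Hypothetical discovery script for spark.worker.resource.gpu.discoveryScript.
# Prints one JSON object with the resource name and the addresses the worker
# may allocate, e.g. {"name": "gpu", "addresses": ["0", "1"]}.
ADDRESSES=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null \
  | sed 's/[0-9][0-9]*/"&"/' \
  | paste -s -d, -)
echo "{\"name\": \"gpu\", \"addresses\": [${ADDRESSES:-}]}"
```

If no GPUs (or no `nvidia-smi`) are present, the script still emits valid JSON with an empty address list.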

enabled). You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
eventually gets cleaned up. This config may be removed in the future.
</td>
<td>3.0.0</td>

SPARK-26288, commit ID: 8b0aa59#diff-6bdad48cfc34314e89599655442ff210

all files/subdirectories of a stopped and timeout application.
This only affects Standalone mode; support for other cluster managers can be added in the future.
</td>
<td>2.4.0</td>

SPARK-24340, commit ID: 8ef167a#diff-916ca56b663f178f302c265b7ef38499

<tr>
<td><code>spark.deploy.recoveryMode</code></td>
<td>Set to FILESYSTEM to enable single-node recovery mode (default: NONE).</td>
<td>0.8.1</td>

No JIRA ID, commit ID: d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668

<tr>
<td><code>spark.deploy.recoveryDirectory</code></td>
<td>The directory in which Spark will store recovery state, accessible from the Master's perspective.</td>
<td>0.8.1</td>

No JIRA ID, commit ID: d66c01f#diff-29dffdccd5a7f4c8b496c293e87c8668

If it is set to true, the data source provider <code>com.databricks.spark.avro</code> is mapped
to the built-in but external Avro data source module for backward compatibility.
</td>
<td>2.4.0</td>

SPARK-25129, commit ID: ac0174e#diff-9a6b543db706f1a90f790783d6930a13

Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate,
snappy, bzip2 and xz. Default codec is snappy.
</td>
<td>2.4.0</td>
@beliefer beliefer Mar 29, 2020


SPARK-24881, commit ID: 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13

the range from 1 to 9 inclusive, or -1. The default value is -1, which corresponds to level 6
in the current implementation.
</td>
<td>2.4.0</td>

SPARK-24881, commit ID: 0a0f68b#diff-9a6b543db706f1a90f790783d6930a13

<code>native</code> means the native ORC support. <code>hive</code> means the ORC library
in Hive.
</td>
<td>2.3.0</td>

SPARK-20728, commit ID: 326f1d6#diff-9a6b543db706f1a90f790783d6930a13

a new non-vectorized ORC reader is used in <code>native</code> implementation.
For <code>hive</code> implementation, this is ignored.
</td>
<td>2.3.0</td>

SPARK-16060, commit ID: 60f6b99#diff-9a6b543db706f1a90f790783d6930a13

not differentiate between binary data and strings when writing out the Parquet schema. This
flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
</td>
<td>1.1.1</td>

SPARK-2927, commit ID: de501e1#diff-41ef65b9ef5b518f77e2a03559893f4d

Some Parquet-producing systems, in particular Impala and Hive, store Timestamp into INT96. This
flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
</td>
<td>1.3.0</td>

SPARK-4987, commit ID: 67d5220#diff-41ef65b9ef5b518f77e2a03559893f4d

Note that <code>zstd</code> requires <code>ZStandardCodec</code> to be installed before Hadoop 2.9.0, <code>brotli</code> requires
<code>BrotliCodec</code> to be installed.
</td>
<td>1.1.1</td>

SPARK-3131, commit ID: 3a9d874#diff-41ef65b9ef5b518f77e2a03559893f4d

<td><code>spark.sql.parquet.filterPushdown</code></td>
<td>true</td>
<td>Enables Parquet filter push-down optimization when set to true.</td>
<td>1.2.0</td>

SPARK-4391, commit ID: 576688a#diff-41ef65b9ef5b518f77e2a03559893f4d

When set to false, Spark SQL will use the Hive SerDe for Parquet tables instead of the built-in
support.
</td>
<td>1.1.1</td>

SPARK-2406, commit ID: cc4015d#diff-ff50aea397a607b79df9bec6f2a841db

schema is picked from the summary file or a random data file if no summary file is available.
</p>
</td>
<td>1.5.0</td>

SPARK-8690, commit ID: 246265f#diff-41ef65b9ef5b518f77e2a03559893f4d

example, decimals will be written in int-based format. If Parquet output is intended for use
with systems that do not support this newer format, set to true.
</td>
<td>1.6.0</td>

SPARK-10400, commit ID: 01cd688#diff-41ef65b9ef5b518f77e2a03559893f4d


SparkQA commented Mar 29, 2020

Test build #120540 has finished for PR 28064 at commit 6787d16.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


@HyukjinKwon HyukjinKwon left a comment


Looks good. I will merge in a few days if there are no comments.

@HyukjinKwon

Merged to master.

@beliefer

@HyukjinKwon Thanks for all your help.

HyukjinKwon pushed a commit that referenced this pull request Apr 7, 2020

Closes #28064 from beliefer/supplement-doc-for-data-sources.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@HyukjinKwon

Merged to branch-3.0 too.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020

Closes apache#28064 from beliefer/supplement-doc-for-data-sources.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@beliefer beliefer deleted the supplement-doc-for-data-sources branch April 23, 2024 07:04
