diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 4d4b85e31c8ff..2c2ed53b478c3 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -192,6 +192,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     The maximum number of completed applications to display. Older applications will be dropped from the UI to maintain this limit.
   </td>
+  <td>0.8.0</td>
 </tr>
 <tr>
   <td>spark.deploy.retainedDrivers</td>
@@ -199,6 +200,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     The maximum number of completed drivers to display. Older drivers will be dropped from the UI to maintain this limit.
   </td>
+  <td>1.1.0</td>
 </tr>
 <tr>
   <td>spark.deploy.spreadOut</td>
@@ -208,6 +210,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     to consolidate them onto as few nodes as possible. Spreading out is usually better for
     data locality in HDFS, but consolidating is more efficient for compute-intensive workloads.
   </td>
+  <td>0.6.1</td>
 </tr>
 <tr>
   <td>spark.deploy.defaultCores</td>
@@ -219,6 +222,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     Set this lower on a shared cluster to prevent users from grabbing the whole cluster by default.
   </td>
+  <td>0.9.0</td>
 </tr>
 <tr>
   <td>spark.deploy.maxExecutorRetries</td>
@@ -234,6 +238,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     -1.
   </td>
+  <td>1.6.3</td>
 </tr>
 <tr>
   <td>spark.worker.timeout</td>
@@ -250,6 +255,7 @@ SPARK_MASTER_OPTS supports the following system properties:
   <td>
     Amount of a particular resource to use on the worker.
   </td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td>spark.worker.resource.{resourceName}.discoveryScript</td>
@@ -258,6 +264,7 @@ SPARK_MASTER_OPTS supports the following system properties:
     Path to resource discovery script, which is used to find a particular resource while the worker is starting up.
     And the output of the script should be formatted like the ResourceInformation class.
   </td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td>spark.worker.resourcesFile</td>
@@ -317,6 +324,7 @@ SPARK_WORKER_OPTS supports the following system properties:
     enabled). You should also enable spark.worker.cleanup.enabled, to ensure that the state
     eventually gets cleaned up. This config may be removed in the future.
   </td>
+  <td>3.0.0</td>
 </tr>
 <tr>
   <td>spark.storage.cleanupFilesAfterExecutorExit</td>
@@ -329,6 +337,7 @@ SPARK_WORKER_OPTS supports the following system properties:
     all files/subdirectories of a stopped and timeout application. This only affects Standalone mode,
     support of other cluster managers can be added in the future.
   </td>
+  <td>2.4.0</td>
 </tr>
 <tr>
   <td>spark.worker.ui.compressedLogFileLengthCacheSize</td>
@@ -490,14 +499,16 @@ ZooKeeper is the best way to go for production-level high availability, but if y
 In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:
 
 <table>
-  <tr><th>System property</th><th>Meaning</th></tr>
+  <tr><th>System property</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td>spark.deploy.recoveryMode</td>
   <td>Set to FILESYSTEM to enable single-node recovery mode (default: NONE).</td>
+  <td>0.8.1</td>
 </tr>
 <tr>
   <td>spark.deploy.recoveryDirectory</td>
   <td>The directory in which Spark will store recovery state, accessible from the Master's perspective.</td>
+  <td>0.8.1</td>
 </tr>
 </table>
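
The spark.deploy.defaultCores hunk above only matters for applications that do not cap themselves; the per-application counterpart is `spark.cores.max`. A minimal Scala sketch follows; the master URL, app name, and core count are placeholders, not values taken from the patch.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: cap this application's cores on a shared standalone cluster
// so it does not fall back to the master's spark.deploy.defaultCores.
// The master URL, app name, and core count below are placeholders.
val spark = SparkSession.builder()
  .appName("capped-app")
  .master("spark://master-host:7077")   // placeholder standalone master URL
  .config("spark.cores.max", "8")       // per-application core cap
  .getOrCreate()

spark.range(10).count()                 // trivial action to exercise the session
spark.stop()
```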
diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 8e6a4079cd5de..d926ae7703268 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -258,21 +258,34 @@ Data source options of Avro can be set via:
 ## Configuration
 Configuration of Avro can be done using the `setConf` method on SparkSession or by running `SET key=value` commands using SQL.
 <table>
-  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td>spark.sql.legacy.replaceDatabricksSparkAvro.enabled</td>
   <td>true</td>
-  <td>If it is set to true, the data source provider com.databricks.spark.avro is mapped to the built-in but external Avro data source module for backward compatibility.</td>
+  <td>
+    If it is set to true, the data source provider com.databricks.spark.avro is mapped
+    to the built-in but external Avro data source module for backward compatibility.
+  </td>
+  <td>2.4.0</td>
 </tr>
 <tr>
   <td>spark.sql.avro.compression.codec</td>
   <td>snappy</td>
-  <td>Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate, snappy, bzip2 and xz. Default codec is snappy.</td>
+  <td>
+    Compression codec used in writing of AVRO files. Supported codecs: uncompressed, deflate,
+    snappy, bzip2 and xz. Default codec is snappy.
+  </td>
+  <td>2.4.0</td>
 </tr>
 <tr>
   <td>spark.sql.avro.deflate.level</td>
   <td>-1</td>
-  <td>Compression level for the deflate codec used in writing of AVRO files. Valid value must be in the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level in the current implementation.</td>
+  <td>
+    Compression level for the deflate codec used in writing of AVRO files. Valid value must be in
+    the range of from 1 to 9 inclusive or -1. The default value is -1 which corresponds to 6 level
+    in the current implementation.
+  </td>
+  <td>2.4.0</td>
 </tr>
 </table>
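
As the hunk above notes, these Avro properties go through the SparkSession conf or a SQL `SET` command. A minimal Scala sketch, assuming the external spark-avro module is on the classpath; the codec choice and output path are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: set the Avro properties documented above on a SparkSession.
// Requires the external spark-avro module on the classpath; values and path are placeholders.
val spark = SparkSession.builder().appName("avro-conf-sketch").getOrCreate()

spark.conf.set("spark.sql.avro.compression.codec", "deflate")
spark.conf.set("spark.sql.avro.deflate.level", "5")
spark.sql("SET spark.sql.avro.compression.codec=deflate")   // equivalent SQL form

// Write a small DataFrame using the configured codec.
spark.range(100).toDF("id").write.format("avro").save("/tmp/ids_avro")
```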
diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index bddffe02602e8..4c4b3b1eee8c2 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -27,15 +27,25 @@ serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileF
 the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.
 
 <table>
-  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td>spark.sql.orc.impl</td>
   <td>native</td>
-  <td>The name of ORC implementation. It can be one of native and hive. native means the native ORC support. hive means the ORC library in Hive.</td>
+  <td>
+    The name of ORC implementation. It can be one of native and hive.
+    native means the native ORC support. hive means the ORC library
+    in Hive.
+  </td>
+  <td>2.3.0</td>
 </tr>
 <tr>
   <td>spark.sql.orc.enableVectorizedReader</td>
   <td>true</td>
-  <td>Enables vectorized orc decoding in native implementation. If false, a new non-vectorized ORC reader is used in native implementation. For hive implementation, this is ignored.</td>
+  <td>
+    Enables vectorized orc decoding in native implementation. If false,
+    a new non-vectorized ORC reader is used in native implementation.
+    For hive implementation, this is ignored.
+  </td>
+  <td>2.3.0</td>
 </tr>
 </table>
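
A minimal Scala sketch of the two ORC properties above; the input path is a placeholder and the values shown are simply the documented defaults.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: pick the ORC implementation and reader documented above.
// The input path is a placeholder; the values are the documented defaults.
val spark = SparkSession.builder().appName("orc-conf-sketch").getOrCreate()

spark.conf.set("spark.sql.orc.impl", "native")                   // or "hive"
spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")   // ignored by the hive implementation

val orcDf = spark.read.orc("/tmp/events_orc")                    // placeholder path
orcDf.printSchema()
```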
diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index 53a1111cd8286..6e52446c9e39e 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -258,7 +258,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
 `SET key=value` commands using SQL.
 
 <table>
-  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr>
 <tr>
   <td>spark.sql.parquet.binaryAsString</td>
   <td>false</td>
@@ -267,6 +267,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     not differentiate between binary data and strings when writing out the Parquet schema. This
     flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
   </td>
+  <td>1.1.1</td>
 </tr>
 <tr>
   <td>spark.sql.parquet.int96AsTimestamp</td>
@@ -275,6 +276,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     Some Parquet-producing systems, in particular Impala and Hive, store Timestamp into INT96. This
     flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.
   </td>
+  <td>1.3.0</td>
 </tr>
 <tr>
   <td>spark.sql.parquet.compression.codec</td>
@@ -287,11 +289,13 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     Note that zstd requires ZStandardCodec to be installed before Hadoop 2.9.0, brotli requires
     BrotliCodec to be installed.
   </td>
+  <td>1.1.1</td>
 </tr>
 <tr>
   <td>spark.sql.parquet.filterPushdown</td>
   <td>true</td>
   <td>Enables Parquet filter push-down optimization when set to true.</td>
+  <td>1.2.0</td>
 </tr>
 <tr>
   <td>spark.sql.hive.convertMetastoreParquet</td>
@@ -300,6 +304,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     When set to false, Spark SQL will use the Hive SerDe for parquet tables instead of the built in
     support.
   </td>
+  <td>1.1.1</td>
 </tr>
 <tr>
   <td>spark.sql.parquet.mergeSchema</td>
@@ -310,6 +315,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     schema is picked from the summary file or a random data file if no summary file is available.
   </td>
+  <td>1.5.0</td>
 </tr>
 <tr>
   <td>spark.sql.parquet.writeLegacyFormat</td>
@@ -321,5 +327,6 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     example, decimals will be written in int-based format. If Parquet output is intended for use with
     systems that do not support this newer format, set to true.
   </td>
+  <td>1.6.0</td>
 </tr>
 </table>
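
A minimal Scala sketch touching the Parquet properties that gain Since Version entries above; the codec, flags, and path are illustrative, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: set the Parquet properties documented above on a SparkSession.
// Codec, flags, and path are illustrative placeholders.
val spark = SparkSession.builder().appName("parquet-conf-sketch").getOrCreate()

spark.conf.set("spark.sql.parquet.compression.codec", "snappy")
spark.conf.set("spark.sql.parquet.filterPushdown", "true")
spark.conf.set("spark.sql.parquet.binaryAsString", "false")

// Schema merging can also be requested per read instead of globally.
val merged = spark.read.option("mergeSchema", "true").parquet("/tmp/table_parquet")  // placeholder path
merged.printSchema()
```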