[SPARK-27256][CORE][SQL] If the configuration is used to set the number of bytes, we'd better use bytesConf #24187
Conversation
Test build #103839 has finished for PR 24187 at commit
```diff
@@ -841,7 +841,7 @@ package object config {
   private[spark] val SHUFFLE_SORT_INIT_BUFFER_SIZE =
     ConfigBuilder("spark.shuffle.sort.initialBufferSize")
       .internal()
-      .intConf
+      .bytesConf(ByteUnit.BYTE)
       .checkValue(v => v > 0, "The value should be a positive integer.")
```
Do we need to check that the input value fits in the integer range?
Yeah, thanks.
```diff
@@ -579,15 +579,15 @@ package object config {
   private[spark] val FILES_MAX_PARTITION_BYTES = ConfigBuilder("spark.files.maxPartitionBytes")
     .doc("The maximum number of bytes to pack into a single partition when reading files.")
-    .longConf
+    .bytesConf(ByteUnit.BYTE)
```
I'm not sure this is a net helpful change, as the parameter is maxPartitionBytes. I agree it would have been better to call it maxPartitionSize and accept values like "10m". I'm not strongly against it, as existing values would still work. For other property values without "Bytes", I agree.
Thanks. Yeah, the parameter name is a bit confusing, but I think it is not very important whether the parameter name contains "Bytes" or not, so I prefer to change it.
I like this change since both styles (i.e. `1024` and `1k`) can be accepted.
OK, I'm fine with it.
Test build #103881 has finished for PR 24187 at commit

retest this please
We could have the same fix below, too? spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala (line 61 in 8ec6cb6)

Anyway, have you checked all the related places for the same fix?
Test build #103894 has finished for PR 24187 at commit

Thanks.

retest this please
OK, to make it easier for other reviewers to understand, could you list all the configs that this PR changed in the PR description?
Ok, thanks.
```diff
@@ -144,7 +144,7 @@ public UnsafeShuffleWriter(
     this.sparkConf = sparkConf;
     this.transferToEnabled = sparkConf.getBoolean("spark.file.transferTo", true);
     this.initialSortBufferSize =
-      (int) sparkConf.get(package$.MODULE$.SHUFFLE_SORT_INIT_BUFFER_SIZE());
+      (int) (long) sparkConf.get(package$.MODULE$.SHUFFLE_SORT_INIT_BUFFER_SIZE());
```
What if we don't have this long cast?
If we don't convert to long first, we will encounter an exception like this:

```
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
```
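The failure can be reproduced outside Spark with any boxed `Long` held behind an `Object` reference, which is effectively what the erased generic return of `SparkConf.get` looks like to the JVM. This is a minimal standalone sketch; the class name and value are made up for illustration:

```java
public class LongCastDemo {
    // Stand-in for a config getter that returns a boxed java.lang.Long,
    // as a bytesConf-backed entry does after type erasure.
    static Object confValue() {
        return Long.valueOf(4096L);
    }

    public static void main(String[] args) {
        Object value = confValue();
        try {
            // (int) on an Object compiles as a cast to Integer plus unboxing,
            // so it throws ClassCastException when the object is a Long.
            int bad = (int) value;
            System.out.println(bad);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
        // Casting to long first unboxes the Long, then the (int) cast is a
        // plain primitive narrowing, which is always legal.
        int good = (int) (long) value;
        System.out.println(good);  // prints 4096
    }
}
```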
```diff
-      .intConf
-      .checkValue(v => v > 0, "The value should be a positive integer.")
+      .bytesConf(ByteUnit.BYTE)
+      .checkValue(v => v > 0 && v <= Int.MaxValue, "The value should be a positive integer.")
```
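The added `v <= Int.MaxValue` bound matters because the config is now read back as a `long` and narrowed to `int` at the use site, and a primitive narrowing cast silently truncates rather than failing. A small Java sketch of the hazard the check guards against (the helper name is hypothetical):

```java
public class NarrowingDemo {
    // Same narrowing the use site performs after the (int) (long) cast.
    static int narrow(long v) {
        return (int) v;  // two's-complement truncation: no error, no warning
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;  // e.g. a user setting "4g"
        System.out.println(narrow(fourGiB));      // prints 0: the high 32 bits are dropped
        // This is why the config layer must reject values above Int.MaxValue.
        System.out.println(fourGiB <= Integer.MAX_VALUE);  // prints false
    }
}
```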
Please update the message, too.
OK, thanks.
LGTM and I leave it to other reviewers. cc: @cloud-fan @srowen @dongjoon-hyun
Test build #103900 has finished for PR 24187 at commit

Test build #103909 has finished for PR 24187 at commit

thanks, merging to master!
What changes were proposed in this pull request?

Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 256 megabytes, we must set `spark.sql.files.maxPartitionBytes=268435456`, which is very unfriendly to users. And if we set it like this: `spark.sql.files.maxPartitionBytes=256M`, we will encounter an exception.

This PR uses `bytesConf` to replace `longConf` or `intConf` when the configuration is used to set a number of bytes.

Configuration change list:
- `spark.files.maxPartitionBytes`
- `spark.files.openCostInBytes`
- `spark.shuffle.sort.initialBufferSize`
- `spark.shuffle.spill.initialMemoryThreshold`
- `spark.sql.autoBroadcastJoinThreshold`
- `spark.sql.files.maxPartitionBytes`
- `spark.sql.files.openCostInBytes`
- `spark.sql.defaultSizeInBytes`
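To illustrate what the change buys users, here is a rough sketch of the kind of suffix parsing a byte-valued config performs, so that both `268435456` and `256m` resolve to the same value. This is a hypothetical toy parser for illustration only, not Spark's actual implementation, and it handles only the `k`/`m`/`g` suffixes:

```java
import java.util.Locale;

public class ByteStringDemo {
    // Toy parser: accepts a plain byte count or a count with a k/m/g suffix.
    static long parseBytes(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        if (v.endsWith("k")) {
            multiplier = 1L << 10;
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("m")) {
            multiplier = 1L << 20;
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("g")) {
            multiplier = 1L << 30;
            v = v.substring(0, v.length() - 1);
        }
        return Long.parseLong(v.trim()) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("268435456"));  // plain byte counts still work
        System.out.println(parseBytes("256m"));       // the suffix form now works too
    }
}
```

Both calls print `268435456`, which is exactly the equivalence the PR description asks for.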
How was this patch tested?

1. Existing unit tests
2. Manual testing