
[SPARK-27256][CORE][SQL]If the configuration is used to set the number of bytes, we'd better use bytesConf'. #24187

Closed
wants to merge 1 commit

Conversation

Contributor

@10110346 commented Mar 23, 2019

What changes were proposed in this pull request?

Currently, if we want to configure spark.sql.files.maxPartitionBytes to 256 megabytes, we must set spark.sql.files.maxPartitionBytes=268435456, which is very unfriendly to users.

And if we set it like this: spark.sql.files.maxPartitionBytes=256M, we will encounter this exception:

Exception in thread "main" java.lang.IllegalArgumentException:
 spark.sql.files.maxPartitionBytes should be long, but was 256M
        at org.apache.spark.internal.config.ConfigHelpers$.toNumber(ConfigBuilder.scala)

This PR uses bytesConf to replace longConf or intConf wherever a configuration sets a number of bytes.
Configuration change list:
spark.files.maxPartitionBytes
spark.files.openCostInBytes
spark.shuffle.sort.initialBufferSize
spark.shuffle.spill.initialMemoryThreshold
spark.sql.autoBroadcastJoinThreshold
spark.sql.files.maxPartitionBytes
spark.sql.files.openCostInBytes
spark.sql.defaultSizeInBytes
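To illustrate the longConf-to-bytesConf change, here is a simplified, hypothetical stand-in (not Spark's real ConfigBuilder API): the old style parses only a plain byte count, while the new style also accepts suffixed sizes such as 256m.

```java
import java.util.Map;

class ConfigSketch {
    // Old style (longConf): only a plain number of bytes is accepted.
    static long parseLongConf(String s) {
        return Long.parseLong(s.trim());
    }

    // New style (bytesConf): also accepts size suffixes like "256m" or "1g".
    // Hypothetical sketch; Spark's actual parsing lives in its config utilities.
    static long parseBytesConf(String s) {
        String t = s.trim().toLowerCase();
        Map<Character, Long> units = Map.of('k', 1L << 10, 'm', 1L << 20, 'g', 1L << 30);
        char last = t.charAt(t.length() - 1);
        if (units.containsKey(last)) {
            return Long.parseLong(t.substring(0, t.length() - 1)) * units.get(last);
        }
        return Long.parseLong(t);
    }
}
```

Under this sketch, parseBytesConf("256m") and parseBytesConf("268435456") both yield the same byte count, while parseLongConf("256m") fails to parse, analogous to the exception reported above.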

How was this patch tested?

1. Existing unit tests
2. Manual testing


SparkQA commented Mar 23, 2019

Test build #103839 has finished for PR 24187 at commit 761e2b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -841,7 +841,7 @@ package object config {
private[spark] val SHUFFLE_SORT_INIT_BUFFER_SIZE =
ConfigBuilder("spark.shuffle.sort.initialBufferSize")
.internal()
-.intConf
+.bytesConf(ByteUnit.BYTE)
.checkValue(v => v > 0, "The value should be a positive integer.")
Member

Do we need to check whether the input value is within the integer range?

Contributor Author

Yeah, thanks.

@@ -579,15 +579,15 @@ package object config {

private[spark] val FILES_MAX_PARTITION_BYTES = ConfigBuilder("spark.files.maxPartitionBytes")
.doc("The maximum number of bytes to pack into a single partition when reading files.")
-.longConf
+.bytesConf(ByteUnit.BYTE)
Member

I'm not sure this is a net helpful change, as the parameter is maxPartitionBytes. I agree it would have been better to call it maxPartitionSize and accept values like "10m". I'm not strongly against it, as existing values would still work.

For other property values without "Bytes", I agree.

Contributor Author

Thanks.
Yeah, the parameter name is a bit confusing, but I think whether it contains "Bytes" is not very important; I prefer to change it.

Member

I like this change since both styles (i.e. 1024 and 1k) can be accepted.

Member

OK, I'm fine with it.


SparkQA commented Mar 25, 2019

Test build #103881 has finished for PR 24187 at commit ef67451.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@10110346
Contributor Author

retest this please

Member

maropu commented Mar 25, 2019

We could have the same fix below, too?

Anyway, have you checked all the related places for the same fix?


SparkQA commented Mar 25, 2019

Test build #103894 has finished for PR 24187 at commit ef67451.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

10110346 commented Mar 25, 2019

We could have the same fix below, too?

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala (line 61 at commit 8ec6cb6):
val FILESOURCE_TABLE_RELATION_CACHE_SIZE =

Anyway, have you checked all the related places for the same fix?

Thanks.
FILESOURCE_TABLE_RELATION_CACHE_SIZE is used to configure the number of entries, not the number of bytes.
I've checked where buildConf and ConfigBuilder are called.

@10110346
Contributor Author

retest this please

Member

maropu commented Mar 25, 2019

ok, to make it easier for other reviewers to understand, could you list all the configs this PR changes in the PR description?

@10110346
Contributor Author

ok, to make it easier for other reviewers to understand, could you list all the configs this PR changes in the PR description?

Ok, thanks.

@@ -144,7 +144,7 @@ public UnsafeShuffleWriter(
this.sparkConf = sparkConf;
this.transferToEnabled = sparkConf.getBoolean("spark.file.transferTo", true);
this.initialSortBufferSize =
-      (int) sparkConf.get(package$.MODULE$.SHUFFLE_SORT_INIT_BUFFER_SIZE());
+      (int) (long) sparkConf.get(package$.MODULE$.SHUFFLE_SORT_INIT_BUFFER_SIZE());
Member

what if we don't have this long cast?

Contributor Author

If we don't convert to long first, it will encounter an exception like this:
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
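The cast behavior can be reproduced outside Spark. A config getter that now returns a boxed java.lang.Long (simulated here with a hypothetical getConf returning Object) throws ClassCastException if cast straight to int, because that compiles to a cast to Integer; unboxing to long first and then narrowing works:

```java
class CastSketch {
    // Hypothetical stand-in for sparkConf.get(SHUFFLE_SORT_INIT_BUFFER_SIZE),
    // which returns the value as a boxed Long after the bytesConf change.
    static Object getConf() {
        return Long.valueOf(4096L);
    }

    // The fix in the PR: unbox to long first, then narrow to int.
    static int readBufferSize() {
        return (int) (long) getConf();
    }

    // Without the long cast, (int) attempts a cast to Integer on a boxed
    // Long and throws ClassCastException at runtime.
    static boolean directCastFails() {
        try {
            int ignored = (int) getConf();
            return ignored < 0; // not reached; keeps the variable used
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```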

- .intConf
- .checkValue(v => v > 0, "The value should be a positive integer.")
+ .bytesConf(ByteUnit.BYTE)
+ .checkValue(v => v > 0 && v <= Int.MaxValue, "The value should be a positive integer.")
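The widened guard can be sketched as a standalone validator (names here are illustrative, not Spark's): because the value is parsed as a Long but later narrowed to an int buffer size, it must be positive and also fit in the integer range.

```java
class BufferSizeCheck {
    // Mirrors the PR's widened checkValue guard: positive and within the
    // int range, since the Long value is later narrowed to an int.
    static long check(long v) {
        if (v <= 0 || v > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "The value must be a positive integer no greater than "
                    + Integer.MAX_VALUE + ", but was " + v);
        }
        return v;
    }
}
```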
Member

plz update the message, too.

Contributor Author

ok, thanks

Member

@maropu left a comment

LGTM and I leave it to other reviewers. cc: @cloud-fan @srowen @dongjoon-hyun


SparkQA commented Mar 25, 2019

Test build #103900 has finished for PR 24187 at commit ef67451.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Mar 25, 2019

Test build #103909 has finished for PR 24187 at commit 206857d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in e4b36df Mar 25, 2019
6 participants