[SPARK-27256][CORE][SQL] If the configuration is used to set the number of bytes, we'd better use bytesConf #24187
Conversation
Test build #103839 has finished for PR 24187 at commit
```diff
@@ -841,7 +841,7 @@ package object config {
   private[spark] val SHUFFLE_SORT_INIT_BUFFER_SIZE =
     ConfigBuilder("spark.shuffle.sort.initialBufferSize")
       .internal()
-      .intConf
+      .bytesConf(ByteUnit.BYTE)
       .checkValue(v => v > 0, "The value should be a positive integer.")
```
Do we need to check that the input value fits in the integer range?
Yeah, thanks.
```diff
@@ -579,15 +579,15 @@ package object config {
   private[spark] val FILES_MAX_PARTITION_BYTES = ConfigBuilder("spark.files.maxPartitionBytes")
     .doc("The maximum number of bytes to pack into a single partition when reading files.")
-    .longConf
+    .bytesConf(ByteUnit.BYTE)
```
I'm not sure this is a net helpful change, as the parameter is maxPartitionBytes. I agree it would have been better to call it maxPartitionSize and accept values like "10m". I'm not strongly against it, as existing values would still work. For other property values without "Bytes", I agree.
Thanks. Yeah, the parameter name is a bit confusing, but I think it is not very important whether the parameter name contains "Bytes" or not, so I prefer to change it.
I like this change since both styles (i.e. `1024` and `1k`) can be accepted.
OK, I'm fine with it.
Test build #103881 has finished for PR 24187 at commit

retest this please
We could have the same fix below, too? spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala (line 61 in 8ec6cb6)

Anyway, have you checked all the related places for the same fix?
Test build #103894 has finished for PR 24187 at commit

Thanks.

retest this please
OK, to make it easier for other reviewers to understand, could you list all the configs that this PR changed in the PR description?
Ok, thanks.
```diff
@@ -144,7 +144,7 @@ public UnsafeShuffleWriter(
     this.sparkConf = sparkConf;
     this.transferToEnabled = sparkConf.getBoolean("spark.file.transferTo", true);
     this.initialSortBufferSize =
-      (int) sparkConf.get(package$.MODULE$.SHUFFLE_SORT_INIT_BUFFER_SIZE());
+      (int) (long) sparkConf.get(package$.MODULE$.SHUFFLE_SORT_INIT_BUFFER_SIZE());
```
What if we don't have this long cast?
If we don't convert to long first, we will encounter an exception like this:

```
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
```
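The failure can be reproduced outside Spark with any boxed `Long` held behind an `Object` reference, which is effectively what the erased generic return of `SparkConf.get` looks like to the JVM. This is a minimal standalone sketch; the class name and value are made up for illustration:

```java
public class LongCastDemo {
    // Stand-in for a config getter that returns a boxed java.lang.Long,
    // as a bytesConf-backed entry does after type erasure.
    static Object confValue() {
        return Long.valueOf(4096L);
    }

    public static void main(String[] args) {
        Object value = confValue();
        try {
            // (int) on an Object compiles as a cast to Integer plus unboxing,
            // so it throws ClassCastException when the object is a Long.
            int bad = (int) value;
            System.out.println(bad);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
        // Casting to long first unboxes the Long, then the (int) cast is a
        // plain primitive narrowing, which is always legal.
        int good = (int) (long) value;
        System.out.println(good);  // prints 4096
    }
}
```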
```diff
-      .intConf
-      .checkValue(v => v > 0, "The value should be a positive integer.")
+      .bytesConf(ByteUnit.BYTE)
+      .checkValue(v => v > 0 && v <= Int.MaxValue, "The value should be a positive integer.")
```
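The added `v <= Int.MaxValue` bound matters because the config is now read back as a `long` and narrowed to `int` at the use site, and a primitive narrowing cast silently truncates rather than failing. A small Java sketch of the hazard the check guards against (the helper name is hypothetical):

```java
public class NarrowingDemo {
    // Same narrowing the use site performs after the (int) (long) cast.
    static int narrow(long v) {
        return (int) v;  // two's-complement truncation: no error, no warning
    }

    public static void main(String[] args) {
        long fourGiB = 4L * 1024 * 1024 * 1024;  // e.g. a user setting "4g"
        System.out.println(narrow(fourGiB));      // prints 0: the high 32 bits are dropped
        // This is why the config layer must reject values above Int.MaxValue.
        System.out.println(fourGiB <= Integer.MAX_VALUE);  // prints false
    }
}
```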
Please update the message, too.
OK, thanks.
LGTM and I leave it to other reviewers. cc: @cloud-fan @srowen @dongjoon-hyun
Test build #103900 has finished for PR 24187 at commit

Test build #103909 has finished for PR 24187 at commit

thanks, merging to master!
What changes were proposed in this pull request?

Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 256 megabytes, we must set `spark.sql.files.maxPartitionBytes=268435456`, which is very unfriendly to users. And if we set it like this: `spark.sql.files.maxPartitionBytes=256M`, we will encounter an exception.

This PR uses `bytesConf` to replace `longConf` or `intConf` when the configuration is used to set a number of bytes.

Configuration change list:
- `spark.files.maxPartitionBytes`
- `spark.files.openCostInBytes`
- `spark.shuffle.sort.initialBufferSize`
- `spark.shuffle.spill.initialMemoryThreshold`
- `spark.sql.autoBroadcastJoinThreshold`
- `spark.sql.files.maxPartitionBytes`
- `spark.sql.files.openCostInBytes`
- `spark.sql.defaultSizeInBytes`
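To illustrate what the change buys users, here is a rough sketch of the kind of suffix parsing a byte-valued config performs, so that both `268435456` and `256m` resolve to the same value. This is a hypothetical toy parser for illustration only, not Spark's actual implementation, and it handles only the `k`/`m`/`g` suffixes:

```java
import java.util.Locale;

public class ByteStringDemo {
    // Toy parser: accepts a plain byte count or a count with a k/m/g suffix.
    static long parseBytes(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        if (v.endsWith("k")) {
            multiplier = 1L << 10;
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("m")) {
            multiplier = 1L << 20;
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("g")) {
            multiplier = 1L << 30;
            v = v.substring(0, v.length() - 1);
        }
        return Long.parseLong(v.trim()) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("268435456"));  // plain byte counts still work
        System.out.println(parseBytes("256m"));       // the suffix form now works too
    }
}
```

Both calls print `268435456`, which is exactly the equivalence the PR description asks for.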
How was this patch tested?

1. Existing unit tests
2. Manual testing