[SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which command is compressed by broadcast #25123

jessecai wants to merge 5 commits into apache:master from jessecai:SPARK-28355
Conversation
ok to test
    private[spark] val BROADCAST_UDF_THRESHOLD = ConfigBuilder("spark.broadcast.UDFThreshold")
      .doc("The threshold at which a serialized command is compressed by broadcast, in " +
        "bytes unless otherwise specified")
      .bytesConf(ByteUnit.BYTE)
| "mechanisms to guarantee data won't be corrupted during broadcast") | ||
| .booleanConf.createWithDefault(true) | ||
|
|
||
| private[spark] val BROADCAST_UDF_THRESHOLD = ConfigBuilder("spark.broadcast.UDFThreshold") |
There was a problem hiding this comment.
The variable name looks confusing. COMMAND_COMPRESSION_THRESHOLD?
cc @HyukjinKwon
Test build #107560 has finished for PR 25123 at commit
    private[spark] val BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD =
      ConfigBuilder("spark.broadcast.UDFCompressionThreshold")
        .doc("The threshold at which a user-defined function (UDF) is compressed by broadcast, " +
I think this also applies to RDD APIs. We can just say, for instance, "The threshold at which Python commands for RDD APIs and user-defined functions (UDFs) are serialized by broadcast ...". Feel free to change the wording.
I updated the description to include this.
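For readers following along, here is a small usage sketch (not part of the PR diff) of how the proposed conf could be set from PySpark. The key name follows the reviewed diff above, and the 512 KiB value is an arbitrary example, not a recommended setting.

```python
# Illustrative only: configure the proposed threshold before creating the context.
# Key name taken from the reviewed diff ("spark.broadcast.UDFCompressionThreshold");
# the value below is an arbitrary example.
from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.broadcast.UDFCompressionThreshold", str(512 * 1024))
sc = SparkContext(conf=conf)
```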
Test build #107564 has finished for PR 25123 at commit
Test build #107603 has finished for PR 25123 at commit
Test build #107623 has finished for PR 25123 at commit
LGTM. Thanks! Merged to master.

LGTM too
Closes apache#25123 from jessecai/SPARK-28355.

Authored-by: Jesse Cai <jesse.cai@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?

The `_prepare_for_python_RDD` method currently broadcasts a pickled command if its length is greater than the hardcoded value `1 << 20` (1M). This change sets this value as a Spark conf instead.

## How was this patch tested?

Unit tests, manual tests.
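As a closing illustration, here is a minimal sketch of the behavior described above, assuming the conf key from the reviewed diff and reading it through `sc._conf`. The function name and structure are illustrative only, not the merged `_prepare_for_python_RDD` implementation.

```python
# A minimal sketch, not the merged implementation: broadcast the pickled command
# once it exceeds a configurable threshold instead of the hardcoded 1 << 20.
# The conf key follows the reviewed diff; reading it via sc._conf is an assumption.
from pyspark import SparkContext
from pyspark.serializers import CloudPickleSerializer

def _prepare_command(sc: SparkContext, command):
    ser = CloudPickleSerializer()
    pickled_command = ser.dumps(command)
    # Fall back to the previous hardcoded default (1 MiB) when the conf is unset.
    threshold = int(sc._conf.get("spark.broadcast.UDFCompressionThreshold", str(1 << 20)))
    if len(pickled_command) > threshold:
        # Large commands ride along as a broadcast variable rather than inline bytes.
        broadcast = sc.broadcast(pickled_command)
        pickled_command = ser.dumps(broadcast)
    return pickled_command
```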