[SPARK-33629][PYTHON]Make spark.buffer.size configuration visible on driver side #30592

gaborgsomogyi · 2020-12-03T12:45:10Z

What changes were proposed in this pull request?

`spark.buffer.size` was not applied on the driver side in PySpark. This PR fixes that.

Why are the changes needed?

To apply `spark.buffer.size` on the driver side as well.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests, plus manual testing.

Added the following code temporarily:

```
def local_connect_and_auth(port, auth_secret):
    ...
            sock.connect(sa)
            print("SPARK_BUFFER_SIZE: %d" % int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))  # <- This is the addition
            sockfile = sock.makefile("rwb", int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))
    ...
```
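The snippet reads the buffer size from the `SPARK_BUFFER_SIZE` environment variable, falls back to 65536, and passes it as the `buffering` argument of `socket.makefile`. Below is a minimal, self-contained sketch of that pattern, assuming a local `socket.socketpair()` in place of the real driver/JVM connection; `buffer_size_from_env` and `echo_roundtrip` are hypothetical helpers, not PySpark APIs.

```python
import os
import socket
from typing import Mapping, Optional

# Fallback used in the snippet above when SPARK_BUFFER_SIZE is unset.
DEFAULT_BUFFER_SIZE = 65536


def buffer_size_from_env(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: read SPARK_BUFFER_SIZE, defaulting to 65536."""
    if env is None:
        env = os.environ
    return int(env.get("SPARK_BUFFER_SIZE", DEFAULT_BUFFER_SIZE))


def echo_roundtrip(payload: bytes, env: Optional[Mapping[str, str]] = None) -> bytes:
    """Hypothetical helper: push bytes through buffered socket files of the configured size.

    A socketpair stands in for the real driver <-> JVM connection. Keep the
    payload small: nothing reads concurrently, so a payload larger than the
    kernel socket buffer would block on write.
    """
    size = buffer_size_from_env(env)
    left, right = socket.socketpair()
    with left, right:
        # Same call shape as the PR snippet: a read/write binary file object
        # whose buffer size comes from the environment.
        with left.makefile("rwb", size) as writer, right.makefile("rwb", size) as reader:
            writer.write(payload)
            writer.flush()  # writes stay in the user-space buffer until flushed
            return reader.read(len(payload))
```

For example, `buffer_size_from_env({"SPARK_BUFFER_SIZE": "10000"})` returns `10000`, the same value the temporary `print` reports in the manual test.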

Test:

```
# Compile Spark

echo "spark.buffer.size 10000" >> conf/spark-defaults.conf

$ ./bin/pyspark
Python 3.8.5 (default, Jul 21 2020, 10:48:26) 
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
20/12/03 13:38:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/12/03 13:38:14 WARN SparkEnv: I/O encryption enabled without RPC encryption: keys will be visible on the wire.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Python version 3.8.5 (default, Jul 21 2020 10:48:26)
Spark context Web UI available at http://192.168.0.189:4040
Spark context available as 'sc' (master = local[*], app id = local-1606999094506).
SparkSession available as 'spark'.
>>> sc.setLogLevel("TRACE")
>>> sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect()
...
SPARK_BUFFER_SIZE: 10000
...
[[0], [2], [3], [4], [6]]
>>>
```
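The test relies on a `spark.buffer.size 10000` line in `conf/spark-defaults.conf` surfacing as `SPARK_BUFFER_SIZE: 10000` in the Python process. The sketch below illustrates that mapping under stated assumptions: Spark itself loads this file on the JVM side, and the mechanism this PR uses to export the value is not shown here. `parse_spark_defaults` and `driver_env_from_conf` are hypothetical helpers, and only whitespace-separated `key value` lines are handled.

```python
from typing import Dict, Iterable, Mapping, Optional


def parse_spark_defaults(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: parse 'key value' lines in the spark-defaults.conf style.

    Blank lines and lines starting with '#' are skipped. Only whitespace
    separators are handled, which is a simplification.
    """
    conf: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace: the key, then the rest as the value.
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def driver_env_from_conf(conf: Mapping[str, str],
                         base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: expose spark.buffer.size to Python as SPARK_BUFFER_SIZE."""
    env = dict(base_env or {})
    if "spark.buffer.size" in conf:
        # Validate as an integer before exporting, because the reader calls int() on it.
        env["SPARK_BUFFER_SIZE"] = str(int(conf["spark.buffer.size"]))
    return env
```

For example, `driver_env_from_conf(parse_spark_defaults(["spark.buffer.size 10000"]))` yields `{"SPARK_BUFFER_SIZE": "10000"}`. Without such an export, a reader like the one in `local_connect_and_auth` falls back to 65536.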

HyukjinKwon · 2020-12-03T12:51:07Z

Thank you @gaborgsomogyi for fixing this.

gaborgsomogyi · 2020-12-03T13:04:54Z

@HyukjinKwon thank you for taking care of the PR!

SparkQA · 2020-12-03T14:01:36Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36734/

SparkQA · 2020-12-03T14:28:39Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36734/

SparkQA · 2020-12-03T16:05:20Z

Test build #132133 has finished for PR 30592 at commit 34786ed.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2020-12-03T16:37:24Z

Merged to master and branch-3.0.

[SPARK-33629][PYTHON]Make spark.buffer.size configuration visible on driver side

Closes #30592 from gaborgsomogyi/SPARK-33629.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit bd71186)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

[SPARK-33629][PYTHON]Make spark.buffer.size configuration visible on driver side

34786ed

HyukjinKwon approved these changes Dec 3, 2020

github-actions bot added CORE PYTHON labels Dec 3, 2020

srowen approved these changes Dec 3, 2020

HyukjinKwon closed this in bd71186 Dec 3, 2020
