
[SPARK-37244][PYTHON][FOLLOWUP] Adjust pyspark.rdd doctest #34529

Closed
wants to merge 1 commit

Conversation


@dongjoon-hyun (Member) commented Nov 9, 2021

What changes were proposed in this pull request?

This PR is a follow-up of #34526 to adjust one additional `pyspark.rdd` doctest.

```python
- >>> b''.join(result).decode('utf-8')
+ >>> ''.join([r.decode('utf-8') if isinstance(r, bytes) else r for r in result])
```

Why are the changes needed?

**Python 3.8/3.9**

```python
Using Python version 3.8.12 (default, Nov  8 2021 17:15:19)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636432954207).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
[b'bar\n', b'foo\n']
```

**Python 3.10**

```python
Using Python version 3.10.0 (default, Oct 29 2021 14:35:18)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636433378727).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
['bar\n', 'foo\n']
```
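The adjusted expression yields the same string for both shapes of `result` shown above. A minimal standalone sketch of the normalization (the `join_lines` helper name is mine, not part of the PR):

```python
def join_lines(result):
    # Normalize each line to str before joining: in this doctest the lines
    # come back as bytes on Python 3.8/3.9 but as str on Python 3.10.
    return ''.join(r.decode('utf-8') if isinstance(r, bytes) else r for r in result)

print(join_lines([b'bar\n', b'foo\n']))  # bytes input, as on Python 3.8/3.9
print(join_lines(['bar\n', 'foo\n']))    # str input, as on Python 3.10
# Both calls print the same two lines: bar, foo
```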

Does this PR introduce any user-facing change?

No.

How was this patch tested?

```
$ python/run-tests --testnames pyspark.rdd
```


SparkQA commented Nov 9, 2021

Test build #145018 has finished for PR 34529 at commit 795f083.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

Merged to master.


SparkQA commented Nov 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49490/

@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon !

@dongjoon-hyun deleted the SPARK-37244-2 branch November 9, 2021 06:51

SparkQA commented Nov 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49490/

sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
This PR aims to support building and running tests on Python 3.10.

Python 3.10 added many new features and breaking changes.
- https://docs.python.org/3/whatsnew/3.10.html
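One such change is visible in this doctest: the lines read back through `fileinput` with `hook_compressed` arrive as `bytes` on Python 3.8/3.9 but as `str` on 3.10, as the session logs show. A small sketch of the same read path with Spark removed, writing a gzip part file by hand (an illustration, not code from the PR):

```python
import gzip
import os
import tempfile
from fileinput import input as file_input, hook_compressed

# Write a tiny gzip "part" file by hand to stand in for Spark's output.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "part-00000.gz")
with gzip.open(path, "wb") as f:
    f.write(b"foo\nbar\n")

# Read it back the same way the doctest does; the element type of `result`
# (bytes vs. str) depends on the Python version.
with file_input([path], openhook=hook_compressed) as f:
    result = sorted(f)

# Version-agnostic normalization, mirroring the doctest fix.
lines = [r.decode("utf-8") if isinstance(r, bytes) else r for r in result]
print(lines)  # ['bar\n', 'foo\n'] on both 3.9 and 3.10
```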

This PR is a follow-up of apache#34526 to adjust one additional `pyspark.rdd` doctest; the diff, per-version session logs, and test command are the same as in the PR description above.

Closes apache#34529 from dongjoon-hyun/SPARK-37244-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 47ceae4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>