
[SPARK-37244][PYTHON][FOLLOWUP] Adjust pyspark.rdd doctest #34529

Closed
wants to merge 1 commit

Conversation


@dongjoon-hyun (Member) commented Nov 9, 2021

What changes were proposed in this pull request?

This PR is a follow-up of #34526 to adjust one additional `pyspark.rdd` doctest.

```python
- >>> b''.join(result).decode('utf-8')
+ >>> ''.join([r.decode('utf-8') if isinstance(r, bytes) else r for r in result])
```

Why are the changes needed?

**Python 3.8/3.9**

```python
Using Python version 3.8.12 (default, Nov  8 2021 17:15:19)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636432954207).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
[b'bar\n', b'foo\n']
```

**Python 3.10**

```python
Using Python version 3.10.0 (default, Oct 29 2021 14:35:18)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636433378727).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
['bar\n', 'foo\n']
```
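The adjusted expression yields the same string for both shapes of `result` shown above. A minimal standalone sketch of the normalization (the `join_lines` helper name is mine, not part of the PR):

```python
def join_lines(result):
    # Normalize each line to str before joining: in this doctest the lines
    # come back as bytes on Python 3.8/3.9 but as str on Python 3.10.
    return ''.join(r.decode('utf-8') if isinstance(r, bytes) else r for r in result)

print(join_lines([b'bar\n', b'foo\n']))  # bytes input, as on Python 3.8/3.9
print(join_lines(['bar\n', 'foo\n']))    # str input, as on Python 3.10
# Both calls print the same two lines: bar, foo
```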

Does this PR introduce any user-facing change?

No.

How was this patch tested?

```
$ python/run-tests --testnames pyspark.rdd
```


SparkQA commented Nov 9, 2021

Test build #145018 has finished for PR 34529 at commit 795f083.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

Merged to master.


SparkQA commented Nov 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49490/

@dongjoon-hyun (Member, Author)

Thank you, @HyukjinKwon !

@dongjoon-hyun deleted the SPARK-37244-2 branch November 9, 2021 06:51

SparkQA commented Nov 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49490/

sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
This PR aims to support building and running tests on Python 3.10.

Python 3.10 added many new features and breaking changes.
- https://docs.python.org/3/whatsnew/3.10.html
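One such change is visible in this doctest: the lines read back through `fileinput` with `hook_compressed` arrive as `bytes` on Python 3.8/3.9 but as `str` on 3.10, as the session logs show. A small sketch of the same read path with Spark removed, writing a gzip part file by hand (an illustration, not code from the PR):

```python
import gzip
import os
import tempfile
from fileinput import input as file_input, hook_compressed

# Write a tiny gzip "part" file by hand to stand in for Spark's output.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "part-00000.gz")
with gzip.open(path, "wb") as f:
    f.write(b"foo\nbar\n")

# Read it back the same way the doctest does; the element type of `result`
# (bytes vs. str) depends on the Python version.
with file_input([path], openhook=hook_compressed) as f:
    result = sorted(f)

# Version-agnostic normalization, mirroring the doctest fix.
lines = [r.decode("utf-8") if isinstance(r, bytes) else r for r in result]
print(lines)  # ['bar\n', 'foo\n'] on both 3.9 and 3.10
```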

This PR is a follow-up of apache#34526 to adjust one additional `pyspark.rdd` doctest; the diff, per-version session logs, and test command are the same as in the PR description above.

Closes apache#34529 from dongjoon-hyun/SPARK-37244-2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 47ceae4)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>