[SPARK-32435][PYTHON] Remove heapq3 port from Python 3 #29229
@srowen, @JoshRosen, @viirya can you take a look when you're available please?
@@ -498,7 +498,7 @@ def load(f):
 if current_chunk:
     chunks.append(iter(current_chunk))

-return heapq.merge(chunks, key=key, reverse=reverse)
+return heapq.merge(*chunks, key=key, reverse=reverse)
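For context, a minimal sketch of why the call sites need the star: the standard library's heapq.merge takes each sorted input as a separate positional argument, so a list of iterators has to be unpacked. The sample data below is made up for illustration and is not taken from Spark.

```python
import heapq

# Hypothetical sorted chunks, standing in for the sorted runs built in load().
chunks = [iter([1, 4, 7]), iter([2, 5, 8]), iter([3, 6, 9])]

# Unpacking passes each chunk as its own sorted input.
merged = list(heapq.merge(*chunks))

# Without unpacking, the outer list is treated as the single input,
# so its elements (the inner lists) come back unchanged.
lists = [[1, 4], [2, 5]]
not_merged = list(heapq.merge(lists))

# key and reverse are keyword-only; with reverse=True the inputs
# must already be sorted from largest to smallest.
by_len = list(heapq.merge(["a", "ccc"], ["bb", "dddd"], key=len))
descending = list(heapq.merge([9, 5, 1], [8, 4], reverse=True))
```

The un-unpacked call does not raise, which is likely why a missed call site only shows up as a test failure rather than an immediate error.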
There was a bit of modification when this file was ported from Python 3 because heapq
has to be able to compile with Python 2 as well. The diffs made are:
< def merge(iterables, key=None, reverse=False):
---
> def merge(*iterables, key=None, reverse=False):
216c218,219
< h_append([next(it), order * direction, it])
---
> next = it.__next__
> h_append([next(), order * direction, next])
223c226
< value, order, it = s = h[0]
---
> value, order, next = s = h[0]
225c228
< s[0] = next(it) # raises StopIteration when exhausted
---
> s[0] = next() # raises StopIteration when exhausted
231c234
< value, order, it = h[0]
---
> value, order, next = h[0]
233,234c236
< for value in it:
< yield value
---
> yield from next.__self__
239,240c241,243
< value = next(it)
< h_append([key(value), order * direction, value, it])
---
> next = it.__next__
> value = next()
> h_append([key(value), order * direction, value, next])
247c250
< key_value, order, value, it = s = h[0]
---
> key_value, order, value, next = s = h[0]
249c252
< value = next(it)
---
> value = next()
256c259
< key_value, order, value, it = h[0]
---
> key_value, order, value, next = h[0]
258,259c261
< for value in it:
< yield value
---
> yield from next.__self__
These differences don't appear to affect any behaviour.
I think it was ported from Python 3.5: https://github.com/python/cpython/blob/3.5/Lib/heapq.py
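As a quick sanity check on the `yield from next.__self__` lines in the diff above, here is a small sketch showing that a bound `__next__` keeps a reference to its iterator, so draining `next.__self__` is equivalent to the `for value in it` loop. The variable names are illustrative only.

```python
it = iter([10, 20, 30])

# Bind the method once, as the Python 3 version does.
advance = it.__next__

# The bound method remembers the iterator it came from.
same_object = advance.__self__ is it

first = advance()              # consumes 10
rest = list(advance.__self__)  # drains what is left of the same iterator
```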
LGTM. Thanks for doing this.
@@ -25,7 +25,7 @@
 import random
 import sys

-import pyspark.heapq3 as heapq
+import heapq
nit: put it with other python built-in imports above?
Test build #126529 has finished for PR 29229 at commit
Retest this please.
Test build #126537 has finished for PR 29229 at commit
Thank you guys. Let me take a look at the test failure tomorrow. It should be trivial to handle, I believe.
@@ -796,7 +796,7 @@ def load_partition(j):

 if self._sorted:
     # all the partitions are already sorted
-    sorted_items = heapq.merge(disk_items, key=operator.itemgetter(0))
+    sorted_items = heapq.merge(*disk_items, key=operator.itemgetter(0))
I missed one place here, which caused the test failure. I double-checked that this is the last one I missed.
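To illustrate the kind of call being fixed here, a sketch on made-up data: each partition is an iterator of (key, value) pairs already sorted by key, and key=operator.itemgetter(0) makes the merge compare keys only. The partition contents are hypothetical.

```python
import heapq
import operator

# Hypothetical sorted partitions of (key, value) pairs.
disk_items = [
    iter([("a", {"n": 1}), ("c", {"n": 3})]),
    iter([("a", {"n": 2}), ("b", {"n": 4})]),
]

# Only element 0 is compared; the dict values are never ordered against
# each other, and equal keys keep the order of the input iterables.
sorted_items = list(heapq.merge(*disk_items, key=operator.itemgetter(0)))
keys = [k for k, _ in sorted_items]
```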
Test build #126612 has finished for PR 29229 at commit
retest this please
Simply rebased to resolve conflicts.
Test build #126625 has finished for PR 29229 at commit
Merged to master. Thanks @dongjoon-hyun and @viirya
Test build #126631 has finished for PR 29229 at commit
What changes were proposed in this pull request?
This PR removes the manual port of heapq3.py introduced in SPARK-3073. The main reason for the port was to support Python 2.6 and 2.7, because Python 2's heapq.merge() does not support key and reverse.
See
Since we dropped Python 2 in SPARK-32138, we can remove it.
Why are the changes needed?
To remove unnecessary code. Also, we can leverage bug fixes made to heapq in Python 3.x.
Does this PR introduce any user-facing change?
No, dev-only.
How was this patch tested?
Existing tests should cover this. I locally ran and verified: