[SPARK-38491][PYTHON] Support ignore_index of Series.sort_values #35794
Looks good otherwise.
python/pyspark/pandas/frame.py
Ignore index for the resulting axis

>>> df.sort_values(by=['col1'], ignore_index=True)
   col1  col2  col3
0     A     2     0
1     B     9     9
2     C     4     3
3     D     7     2
4  None     8     4
Maybe we can refine the df by manually creating the index, to show more clearly how ignore_index works?
e.g.
df = ps.DataFrame({
        'col1': ['A', 'B', None, 'D', 'C'],
        'col2': [2, 9, 8, 7, 4],
        'col3': [0, 9, 4, 2, 3],
    },
    index=['idx1', 'idx2', 'idx3', 'idx4', 'idx5'],
    columns=['col1', 'col2', 'col3'])
It seems the current example shows the same result regardless of the ignore_index value.
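The effect of the suggested custom index can be checked with a minimal sketch like the one below. It uses plain pandas as a stand-in for pyspark.pandas so that it runs without a Spark session; that substitution is an assumption made for illustration, on the grounds that the PR aims for parity with pandas. The variable names kept and reset are hypothetical.

```python
import pandas as pd

# Assumption: plain pandas stands in for pyspark.pandas here, so no Spark
# session is needed. The data is the DataFrame proposed in the review.
df = pd.DataFrame(
    {
        "col1": ["A", "B", None, "D", "C"],
        "col2": [2, 9, 8, 7, 4],
        "col3": [0, 9, 4, 2, 3],
    },
    index=["idx1", "idx2", "idx3", "idx4", "idx5"],
)

# Default: each row keeps its original label after sorting.
kept = df.sort_values(by=["col1"])

# ignore_index=True: the original labels are dropped and the rows are
# relabeled 0, 1, ..., n - 1 in sorted order.
reset = df.sort_values(by=["col1"], ignore_index=True)

print(list(kept.index))
print(list(reset.index))
```

With string labels, the sorted frame carries the labels in a visibly shuffled order, while the ignore_index=True result shows a fresh integer range, so the two outputs cannot be mistaken for each other.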
Thanks @itholic. The resulting index order differs depending on the ignore_index input.

Sort by col1

>>> df.sort_values(by=['col1'])
   col1  col2  col3
0     A     2     0
1     B     9     9
4     C     4     3
3     D     7     2
2  None     8     4

Ignore index for the resulting axis

>>> df.sort_values(by=['col1'], ignore_index=True)
   col1  col2  col3
0     A     2     0
1     B     9     9
2     C     4     3
3     D     7     2
4  None     8     4

I can modify it if you still find it confusing, though.
Ohh.. I see. Then I think we can keep this as is.
Modified @itholic :)
Great!
Merged to master.
What changes were proposed in this pull request?
Support ignore_index of Series.sort_values, in which the resulting axis will be labeled 0, 1, …, n - 1.

Why are the changes needed?
To reach parity with pandas.
Older pandas versions support ignore_index as well.

Does this PR introduce any user-facing change?
Yes. ignore_index of Series.sort_values is supported.

How was this patch tested?
Unit tests.
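
The behaviour described above, where the resulting axis is labeled 0, 1, …, n - 1, can be sketched with pandas' own Series.sort_values, which is the reference this PR aims to match. This is an illustration under assumptions, not the patch itself: it uses plain pandas rather than pyspark.pandas, and the sample data and variable names are made up for the example.

```python
import pandas as pd

# Assumption: plain pandas is used as the reference behaviour; the PR
# adds the same keyword to the pandas API on Spark.
s = pd.Series([3, 1, 2], index=["a", "b", "c"])

# Without ignore_index the original labels travel with the values.
default_sorted = s.sort_values()

# With ignore_index=True the result is relabeled 0, 1, ..., n - 1.
reindexed = s.sort_values(ignore_index=True)

print(default_sorted.index.tolist(), default_sorted.tolist())
print(reindexed.index.tolist(), reindexed.tolist())
```

The values come out in the same order in both cases; only the labels on the resulting axis differ.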