Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38491][PYTHON] Support ignore_index of Series.sort_values #35794

Closed
wants to merge 3 commits into from

Conversation

xinrong-meng
Copy link
Member

@xinrong-meng xinrong-meng commented Mar 10, 2022

What changes were proposed in this pull request?

Support ignore_index of Series.sort_values, in which the resulting axis will be labeled 0, 1, …, n - 1.

Why are the changes needed?

To reach parity with pandas.

Older pandas support ignore_index as well:

>>> pdf = pd.DataFrame({"a": [1, 2, 3, 4, 5, None, 7], "b": [7, 6, 5, 4, 3, 2, 1]}, index=np.random.rand(7))
>>> pdf.sort_values("b", ignore_index=True)
     a  b
0  7.0  1
1  NaN  2
2  5.0  3
3  4.0  4
4  3.0  5
5  2.0  6
6  1.0  7
>>> pd.__version__
'1.0.0'

Does this PR introduce any user-facing change?

Yes. ignore_index of Series.sort_values is supported.

>>> psdf = ps.DataFrame({"a": [1, 2, 3, 4, 5, None, 7], "b": [7, 6, 5, 4, 3, 2, 1]}, index=np.random.rand(7))
>>> psdf
            a  b
0.971253  1.0  7
0.401039  2.0  6
0.322310  3.0  5
0.932521  4.0  4
0.058432  5.0  3
0.122754  NaN  2
0.842971  7.0  1
>>> psdf.sort_values("b", ignore_index=True)
     a  b
0  7.0  1
1  NaN  2
2  5.0  3
3  4.0  4
4  3.0  5
5  2.0  6
6  1.0  7

How was this patch tested?

Unit tests.

@xinrong-meng xinrong-meng changed the title Support ignore_index of Series.sort_values [SPARK-38491][PYTHON] Support ignore_index of Series.sort_values Mar 10, 2022
@xinrong-meng xinrong-meng marked this pull request as ready for review March 10, 2022 04:45
@xinrong-meng
Copy link
Member Author

Also CC @ueshin @itholic Thanks!

Copy link
Contributor

@itholic itholic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good otherwise.

Comment on lines 6907 to 6915
Ignore index for the resulting axis

>>> df.sort_values(by=['col1'], ignore_index=True)
col1 col2 col3
0 A 2 0
1 B 9 9
2 C 4 3
3 D 7 2
4 None 8 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe can we refine the df by manually creating the index to show how ignore_index works more clearly??

e.g.

df = ps.DataFrame({
    'col1': ['A', 'B', None, 'D', 'C'],
    'col2': [2, 9, 8, 7, 4],
    'col3': [0, 9, 4, 2, 3],
  },
  index=['idx1', 'idx2', 'idx3', 'idx4', 'idx5'],
  columns=['col1', 'col2', 'col3'])

Seems like the current example shows the same result regardless of ignore_index value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @itholic . The resulting index order is different considering different ignore_index input.

        Sort by col1

        >>> df.sort_values(by=['col1'])
           col1  col2  col3
        0     A     2     0
        1     B     9     9
        4     C     4     3
        3     D     7     2
        2  None     8     4

        Ignore index for the resulting axis

        >>> df.sort_values(by=['col1'], ignore_index=True)
           col1  col2  col3
        0     A     2     0
        1     B     9     9
        2     C     4     3
        3     D     7     2
        4  None     8     4

I may modify it if you still think it confusing though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh.. I see. Then I think we can keep this as is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified @itholic :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@HyukjinKwon
Copy link
Member

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants