[SPARK-38491][PYTHON] Support `ignore_index` of `Series.sort_values` #35794

xinrong-meng · 2022-03-10T02:08:24Z

What changes were proposed in this pull request?

Support ignore_index of Series.sort_values, in which the resulting axis will be labeled 0, 1, …, n - 1.
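For reference, here is a minimal sketch of the target behavior on a Series. It uses plain pandas (1.0 or later) rather than pandas-on-Spark, on the assumption that the new parameter is meant to mirror it; the variable names are illustrative only.

```python
import pandas as pd

# A small Series with non-default labels so the effect on the index is visible.
s = pd.Series([3, 1, 2], index=["x", "y", "z"])

# Default: each value keeps its original label after sorting.
default_sorted = s.sort_values()

# ignore_index=True: the result is relabeled 0, 1, ..., n - 1.
reindexed = s.sort_values(ignore_index=True)

print(default_sorted.index.tolist())  # ['y', 'z', 'x']
print(reindexed.index.tolist())       # [0, 1, 2]
```

The sorted values are the same in both cases; only the labels differ.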

Why are the changes needed?

To reach parity with pandas.

Older pandas versions support ignore_index as well:

>>> pdf = pd.DataFrame({"a": [1, 2, 3, 4, 5, None, 7], "b": [7, 6, 5, 4, 3, 2, 1]}, index=np.random.rand(7))
>>> pdf.sort_values("b", ignore_index=True)
     a  b
0  7.0  1
1  NaN  2
2  5.0  3
3  4.0  4
4  3.0  5
5  2.0  6
6  1.0  7
>>> pd.__version__
'1.0.0'

Does this PR introduce any user-facing change?

Yes. ignore_index of Series.sort_values is supported.

>>> psdf = ps.DataFrame({"a": [1, 2, 3, 4, 5, None, 7], "b": [7, 6, 5, 4, 3, 2, 1]}, index=np.random.rand(7))
>>> psdf
            a  b
0.971253  1.0  7
0.401039  2.0  6
0.322310  3.0  5
0.932521  4.0  4
0.058432  5.0  3
0.122754  NaN  2
0.842971  7.0  1
>>> psdf.sort_values("b", ignore_index=True)
     a  b
0  7.0  1
1  NaN  2
2  5.0  3
3  4.0  4
4  3.0  5
5  2.0  6
6  1.0  7

How was this patch tested?

Unit tests.

xinrong-meng · 2022-03-10T04:53:58Z

Also CC @ueshin @itholic Thanks!

itholic

Looks good otherwise.

itholic · 2022-03-10T05:27:09Z

python/pyspark/pandas/frame.py

+        Ignore index for the resulting axis
+
+        >>> df.sort_values(by=['col1'], ignore_index=True)
+           col1  col2  col3
+        0     A     2     0
+        1     B     9     9
+        2     C     4     3
+        3     D     7     2
+        4  None     8     4


Could we refine the df by creating the index manually, to show more clearly how ignore_index works?

e.g.

df = ps.DataFrame(
    {
        'col1': ['A', 'B', None, 'D', 'C'],
        'col2': [2, 9, 8, 7, 4],
        'col3': [0, 9, 4, 2, 3],
    },
    index=['idx1', 'idx2', 'idx3', 'idx4', 'idx5'],
    columns=['col1', 'col2', 'col3'])

It seems the current example shows the same result regardless of the ignore_index value.

Thanks @itholic. The resulting index order differs depending on the ignore_index input.

Sort by col1

>>> df.sort_values(by=['col1'])
   col1  col2  col3
0     A     2     0
1     B     9     9
4     C     4     3
3     D     7     2
2  None     8     4

Ignore index for the resulting axis

>>> df.sort_values(by=['col1'], ignore_index=True)
   col1  col2  col3
0     A     2     0
1     B     9     9
2     C     4     3
3     D     7     2
4  None     8     4

I can modify it if you still find it confusing, though.

Ohh.. I see. Then I think we can keep this as is.

Modified @itholic :)
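As a rough check of the string-indexed frame suggested earlier in this thread, the sketch below runs it through plain pandas (an assumption: pandas 1.0 or later, standing in for pandas-on-Spark, whose output is expected to match). The names `by_label` and `renumbered` are illustrative only.

```python
import pandas as pd

# The frame proposed in the review, with explicit string labels as the index.
df = pd.DataFrame(
    {
        "col1": ["A", "B", None, "D", "C"],
        "col2": [2, 9, 8, 7, 4],
        "col3": [0, 9, 4, 2, 3],
    },
    index=["idx1", "idx2", "idx3", "idx4", "idx5"],
    columns=["col1", "col2", "col3"],
)

# Without ignore_index, rows carry their original labels; None sorts last.
by_label = df.sort_values(by=["col1"])

# With ignore_index=True, the labels are replaced by 0, 1, ..., n - 1.
renumbered = df.sort_values(by=["col1"], ignore_index=True)

print(by_label.index.tolist())    # ['idx1', 'idx2', 'idx5', 'idx4', 'idx3']
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

With string labels the two results cannot be mistaken for each other, which is the point of building the index by hand.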

HyukjinKwon · 2022-03-11T10:15:33Z

Merged to master.

ignore_index

c5a0ca3

github-actions bot added CORE PYTHON labels Mar 10, 2022

HyukjinKwon approved these changes Mar 10, 2022

xinrong-meng changed the title ~~Support ignore_index of Series.sort_values~~ [SPARK-38491][PYTHON] Support ignore_index of Series.sort_values Mar 10, 2022

xinrong-meng marked this pull request as ready for review March 10, 2022 04:45

doc e.g.

c0ad677

itholic reviewed Mar 10, 2022

doc e.g.

c96c5a7

itholic approved these changes Mar 10, 2022

HyukjinKwon closed this in 36023c2 Mar 11, 2022
