BUG: DataFrame.sort_values() by 2 columns and a key function produces incorrect results #60673

Closed
@Dr-Irv

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame.from_records(
    [[let, num] for let in "DCBA" for num in [2, 1]], columns=["let", "num"]
)
print(df)

r1 = df.sort_values(["let", "num"])
print(r1)


def key_func(s: pd.Series) -> pd.Series:
    result = s.sort_values()
    return result


r2 = df.sort_values(["let", "num"], key=key_func)
print(r2)

Issue Description

When providing a key argument to sort_values() or sort_index(), and specifying more than one column, the results are not sorted correctly.

In the above code, the output is:

  let  num
0   D    2
1   D    1
2   C    2
3   C    1
4   B    2
5   B    1
6   A    2
7   A    1
  let  num
7   A    1
6   A    2
5   B    1
4   B    2
3   C    1
2   C    2
1   D    1
0   D    2
  let  num
0   D    2
1   D    1
2   C    2
3   C    1
4   B    2
5   B    1
6   A    2
7   A    1
  • The first DF is the original DF.
  • The second DF is the sorted DF without a key function. The rows are sorted first on the column let, then, to break ties, on the column num.
  • The third DF is the result of using a key function that sorts each column (based on the specification in the API: the function has to return a sorted column). The result is a DF that is not sorted; it comes back in the same order as the original DF. It should be the same as the second DF.
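The hierarchical ordering described in the list above can be checked mechanically. The following is a minimal sketch, not part of the original report: `is_lex_sorted` is a hypothetical helper that compares the rows, taken as tuples, against their lexicographically sorted order.

```python
import pandas as pd

df = pd.DataFrame.from_records(
    [[let, num] for let in "DCBA" for num in [2, 1]], columns=["let", "num"]
)


def is_lex_sorted(frame: pd.DataFrame, by: list[str]) -> bool:
    # Hypothetical helper: True if the rows are ordered by the first column
    # in `by`, with ties broken by each following column in turn.
    rows = list(frame[by].itertuples(index=False, name=None))
    return rows == sorted(rows)


r1 = df.sort_values(["let", "num"])

print(is_lex_sorted(df, ["let", "num"]))  # original frame
print(is_lex_sorted(r1, ["let", "num"]))  # sorted without a key function
```

The same helper can be applied to r2 from the reproducible example to confirm whether a given pandas version returns a hierarchically sorted result.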

Expected Behavior

The result of the sort with a key argument in this case should be the same as without the function. When specifying the key argument with more than one column, the result should be hierarchically sorted.
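For comparison, here is a minimal sketch of a case where the key and no-key results do agree. It assumes a key that returns the column's values unchanged and in the same positions; `identity_key` is a hypothetical name used only for illustration. The pandas documentation for the key parameter says the function should expect a Series, return a Series with the same shape, and is applied to each column in by independently. This sketch does not reproduce or explain the behavior reported for a key that returns a reordered column.

```python
import pandas as pd

df = pd.DataFrame.from_records(
    [[let, num] for let in "DCBA" for num in [2, 1]], columns=["let", "num"]
)


def identity_key(s: pd.Series) -> pd.Series:
    # Hypothetical key for illustration: returns each column's values
    # unchanged, position for position, without reordering them.
    return s


expected = df.sort_values(["let", "num"])
with_key = df.sort_values(["let", "num"], key=identity_key)

# Compare the two results, including their index labels.
print(with_key.equals(expected))
```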

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.10.14
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.2.3
numpy : 2.2.1
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.2
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : 1.1
hypothesis : None
gcsfs : None
jinja2 : 3.1.5
lxml.etree : 5.3.0
matplotlib : 3.8.4
numba : None
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 18.1.0
pyreadstat : 1.2.8
pytest : N/A
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.15.0
sqlalchemy : 2.0.36
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.11.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

Labels: Docs, Sorting (e.g. sort_index, sort_values)
