Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
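import pandas as pd

df = pd.DataFrame.from_records(
    [[let, num] for let in "DCBA" for num in [2, 1]], columns=["let", "num"]
)
print(df)

# Sort on "let", then on "num" to break ties, without a key function
r1 = df.sort_values(["let", "num"])
print(r1)

# Key function that returns the column sorted in ascending order
def key_func(s: pd.Series) -> pd.Series:
    result = s.sort_values()
    return result

# The same sort, passing key_func as the key
r2 = df.sort_values(["let", "num"], key=key_func)
print(r2)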
Issue Description
When providing a key argument to sort_values() or sort_index() and specifying more than one column, the result is not sorted correctly.
In the above code, the output is:
  let  num
0   D    2
1   D    1
2   C    2
3   C    1
4   B    2
5   B    1
6   A    2
7   A    1

  let  num
7   A    1
6   A    2
5   B    1
4   B    2
3   C    1
2   C    2
1   D    1
0   D    2

  let  num
0   D    2
1   D    1
2   C    2
3   C    1
4   B    2
5   B    1
6   A    2
7   A    1
- The first DF is the original DF.
- The second DF is the result of sorting without a key function. The rows are sorted first on column let and then, to break ties, on column num.
- The third DF is the result of using a key function that sorts each column (my reading of the API documentation is that the function has to return a sorted column). The result is not sorted at all: the rows come back in their original order. It should be the same as the second DF.
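The pandas documentation for sort_values() describes key as a vectorized function that receives a Series and returns a Series of the same shape, applied to each column in by independently. The third DF is consistent with pandas reading the returned values position by position and ignoring their index: key_func hands back each column already in ascending order, so row 0 is paired with ("A", 1), row 7 with ("D", 2), and so on. Those pairs are already in order, so the original row order is kept. The sketch below reproduces the third DF under that assumption. emulate_keyed_sort is a hypothetical helper written for this illustration, not pandas code, and the positional reading is inferred from the output rather than taken from the pandas source.

```python
import pandas as pd

df = pd.DataFrame.from_records(
    [[let, num] for let in "DCBA" for num in [2, 1]], columns=["let", "num"]
)


def key_func(s: pd.Series) -> pd.Series:
    # Key function from the reproducible example: it reorders the column.
    return s.sort_values()


def emulate_keyed_sort(frame: pd.DataFrame, by: list[str], key) -> pd.DataFrame:
    # Hypothetical helper, not pandas code. ASSUMPTION: the values returned by
    # `key` are matched to rows by position; their index labels are dropped.
    sort_keys = pd.DataFrame({col: key(frame[col]).to_numpy() for col in by})
    # sort_keys has a fresh RangeIndex (0..n-1), so after an ordinary
    # multi-column sort its index lists the row positions in sorted order.
    positions = sort_keys.sort_values(by).index.to_numpy()
    return frame.iloc[positions]


emulated = emulate_keyed_sort(df, ["let", "num"], key_func)
actual = df.sort_values(["let", "num"], key=key_func)
print(emulated)
print(emulated.equals(actual))
```

With an identity key (lambda s: s) the same helper gives the second DF, which is what the positional reading predicts when the key does not reorder anything.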
Expected Behavior
The result of the sort with a key argument should, in this case, be the same as the result without the key function. When the key argument is used with more than one column, the result should be sorted hierarchically.
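For comparison, a minimal sketch of a key that follows the documented contract: it transforms the values but keeps them in the input's row order. transform_key is a hypothetical name, and lower-casing is only an example transformation, chosen because it does not change the relative order of these single uppercase letters. Under these assumptions both sort_values() and sort_index() (where, per the pandas documentation, the key is applied to each MultiIndex level separately) produce the hierarchical order described above.

```python
import pandas as pd

df = pd.DataFrame.from_records(
    [[let, num] for let in "DCBA" for num in [2, 1]], columns=["let", "num"]
)


def transform_key(values):
    # Hypothetical key: returns transformed values in the SAME order as the
    # input (nothing is reordered). Object-dtype strings are lower-cased;
    # any other dtype is returned unchanged. Accepts a Series or an Index.
    if values.dtype == object:
        return values.str.lower()
    return values


# sort_values(): the key is applied to each column in `by` independently.
r1 = df.sort_values(["let", "num"])
r3 = df.sort_values(["let", "num"], key=transform_key)
print(r3)
print(r3.equals(r1))

# sort_index(): the key is applied to each level of the MultiIndex.
indexed = df.set_index(["let", "num"])
by_key = indexed.sort_index(key=transform_key)
print(by_key.index.equals(indexed.sort_index().index))
```

The only difference from key_func is that the returned values are not reordered; the sort itself is still done by pandas.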
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.10.14
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.2.3
numpy : 2.2.1
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.2
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : 1.1
hypothesis : None
gcsfs : None
jinja2 : 3.1.5
lxml.etree : 5.3.0
matplotlib : 3.8.4
numba : None
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 18.1.0
pyreadstat : 1.2.8
pytest : N/A
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.15.0
sqlalchemy : 2.0.36
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.11.0
xlrd : 2.0.1
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None