Wrong ordering of ordered pandas categoricals #137
Comments
Out of curiosity, why use |
It is part of generic code that handles different cases. In my case, I have an auto-generated report table. That's a pandas dataframe that can have multiple index levels. A useful use case recently emerged: some levels can be categorical with a strict predefined sort order. Example: level1 is a string, level2 is categorical. Sorting should be done on both. |
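Could you encapsulate this in a function?

    from natsort import natsorted

    def mysort(x):
        # Ordered categoricals carry their own ordering; use it.
        try:
            if x.ordered:
                return x.sort_values()
        except AttributeError:
            pass
        # Anything else falls back to natural sorting.
        return natsorted(x)

This way you can pass any data collection, and it will sort with |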
I'll try to provide a more complete example:
# create a dataframe with three columns
df = pd.DataFrame(dict(col1=[1, 2, 3, 4, 5], col2=['third', 'fourth', 'first', 'first', 'third'], col3=['type 10', 'type 10', 'type 12', 'type 2', 'type 2']))
# Convert 'col2' to an ordered categorical.
# Only values from the provided list are allowed, and they compare according to the order of that list.
df['col2'] = pd.Categorical(df['col2'], ['first', 'second', 'third', 'fourth'], ordered=True)
# Move two columns into index
df = df.set_index(['col2', 'col3'])
df = pd.DataFrame(dict(col1=[1, 2, 3, 4, 5], col2=['third', 'second', 'first', 'first', 'third'], col3=['type 10', 'type 10', 'type 12', 'type 2', 'type 2']))
Expected order:
Normal sorting can't deal with col3 (the second index level): it sorts those values as plain strings, although it does use the given ordering for col2 (the first level).
Natsorting doesn't properly handle the first level (col2):
If that's too abstract, you can imagine a setting like col2 -> department name, col3 -> task (a number with a description). In my case the structure of the index (the number of levels and their particular types) is different each time. |
After a bit of digging I see that it is challenging because of auto-conversion to string type. I wonder if there is some other string-like type with custom ordering that both natsort and pandas can handle. |
I'm sorry, but I am still not clear what you want to achieve. Can you annotate your examples a bit more? |
tried to add more details and some comments to the example above |
Would it be possible to enforce all index columns to be categorical? For example, for the column with the "non obvious" order you did
df['col2'] = pd.Categorical(df['col2'], ['first', 'second', 'third', 'fourth'], ordered=True)
For the other index column you could do
df['col3'] = pd.Categorical(df['col3'], natsorted(set(df['col3'])), ordered=True)
Then |
@arogozhnikov Did this solution work for you? |
@SethMMorton sorry for dropping the ball, yes I believe this should cover my needs |
Describe the bug
pandas has a separate dtype, Categorical, which can optionally be made sortable with an arbitrary order of elements. It is very convenient when the order of elements must be fixed and can't be described by any rule.
Expected behavior
When dealing with ordered categorical columns, natsorted should simply use the element comparison defined by the category ordering.
Environment (please complete the following information):
To Reproduce