Enhancement to the version sorting (versorted) output #13

Closed
tdruez opened this Issue Jul 28, 2014 · 7 comments

tdruez commented Jul 28, 2014

Hi Seth, thanks for the great work on this lib!
I'd like your opinion on something regarding the version sorting.

From https://github.com/SethMMorton/natsort/blob/master/test_natsort/test_natsort.py#L211

>>> a = ['1.9.9a', '1.11', '1.9.9b', '1.11.4', '1.10.1']

Let's add the '1.11a' version to the list.

>>> a = ['1.9.9a', '1.11', '1.9.9b', '1.11.4', '1.10.1', '1.11a']
>>> natsorted(a)
['1.9.9a', '1.9.9b', '1.10.1', '1.11', '1.11.4', '1.11a']

I think the '1.11a' should be sorted before the '1.11.4'.
What's your take on this? Thanks.

@SethMMorton SethMMorton added question and removed enhancement labels Jul 29, 2014

Owner

SethMMorton commented Jul 29, 2014

What output would you expect from the following:

>>> a = ['1.11', '1.11.4', '1.11a', '1.11.0', '1.11a.0', '1.11a.4']

I agree that '1.11a' would go before '1.11.4' in some cases, but I am not sure everyone would agree with this 100% of the time. The problem is that versioning for pre-releases is not overly strict, so you can get cases like this where the pattern doesn't match up.

We can investigate why the ordering is the way it is using the output from the natsort_keygen-generated function:

>>> from natsort import natsorted, natsort_keygen
>>> nsk = natsort_keygen()
>>> [nsk(x) for x in natsorted(a)]
[(u'', 1, '.', 11),
 (u'', 1, '.', 11, '.', 0),
 (u'', 1, '.', 11, '.', 4),
 (u'', 1, '.', 11, 'a'),
 (u'', 1, '.', 11, 'a.', 0),
 (u'', 1, '.', 11, 'a.', 4)]

Since 'a' comes after '.' in the ASCII table, the '.' is put first. If you wanted to reverse this, you could replace '.' in your strings with something that comes at the end of the ASCII table (such as '~'):

>>> natsorted(a, key=lambda x: x.replace('.', '~'))
['1.11', '1.11a', '1.11a.0', '1.11a.4', '1.11.0', '1.11.4']
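
To make the comparison concrete, here is a minimal standard-library sketch of the same idea. `simple_natural_key` is a hypothetical helper written for illustration, not natsort's actual implementation; it only assumes that the key alternates string chunks and integer chunks, as in the tuples printed above.

```python
import re


def simple_natural_key(text):
    """Split text into alternating string and integer chunks.

    Hypothetical, simplified stand-in for a natsort-style key;
    it is NOT natsort's real implementation.
    """
    # A capturing group makes re.split keep the digit runs,
    # so the result alternates: str, digits, str, digits, ...
    parts = re.split(r'(\d+)', text)
    # Drop the empty string left behind when the text ends in a digit.
    if parts[-1] == '':
        parts.pop()
    # Odd positions hold the captured digit runs; convert those to int.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


versions = ['1.11', '1.11.4', '1.11a', '1.11.0', '1.11a.0', '1.11a.4']

# Default: '.' (ASCII 46) sorts before 'a' (ASCII 97).
default_order = sorted(versions, key=simple_natural_key)

# Tilde trick: '~' (ASCII 126) sorts after every ASCII letter.
tilde_order = sorted(
    versions, key=lambda v: simple_natural_key(v.replace('.', '~'))
)

print(simple_natural_key('1.11a.4'))  # ('', 1, '.', 11, 'a.', 4)
print(default_order)  # ['1.11', '1.11.0', '1.11.4', '1.11a', '1.11a.0', '1.11a.4']
print(tilde_order)    # ['1.11', '1.11a', '1.11a.0', '1.11a.4', '1.11.0', '1.11.4']
```

The key function only changes what is compared; the returned list still contains the original strings with their '.' separators.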

Would this work for your use case? Is this common enough that you think it should be added to the natsort API?

tdruez commented Jul 29, 2014

Thanks for the detailed explanation.
I'm able to get the result I was looking for thanks to your suggestion of replacing '.' with '~'.
My use case was actually:

>>> a = ['1.2', '1.2rc1', '1.2beta2', '1.2beta', '1.2alpha', '1.2.1', '1.1', '1.3']
>>> natsorted(a, key=lambda x: x.replace('.', '~'), reverse=True)
['1.3', '1.2.1', '1.2rc1', '1.2beta2', '1.2beta', '1.2alpha', '1.2', '1.1']

I don't think it needs to be added to the API unless other people express a need for it too.
Your solution is good enough for me.
Thanks :)

@tdruez tdruez closed this Jul 29, 2014

Owner

SethMMorton commented Jul 29, 2014

I'm glad I could help! But there is one thing I am confused about... The results you show look like they were sorted in reverse order. Did you actually use the reverse keyword but not include it in your example? If not, it would be a bug.

tdruez commented Jul 30, 2014

Good catch, I do use the reverse.
I've edited my example to avoid further confusion.

cel4 commented Aug 24, 2014

Sorry for jumping in here, but I have a problem related to the one @tdruez described in this issue, and it probably does not make sense to open a new issue for it.

> ['1.9.9a', '1.9.9b', '1.10.1', '1.11', '1.11.4', '1.11a']
>
> I think the '1.11a' should be sorted before the '1.11.4'.
> What's your take on this? Thanks.

I have the same problem, but with release candidates: I would like to add 1.11rc1 to the list. I would expect to get 1.11a1 < 1.11b1 < 1.11rc1 < 1.11. I'm pretty surprised that @tdruez suggested 1.11a should be after 1.11.

Owner

SethMMorton commented Aug 24, 2014

If '1.11' were '1.11.0' instead, this would work as expected (assuming you do the '~' trick I suggested). The sorting algorithm doesn't actually comprehend the input as version numbers, but rather separates out the numbers for you so that things ascend properly. What is happening is that each of the four versions you suggest has '1.11' at the front, so the one with no trailing characters is placed first. Imagine that we replaced '1.11' with 'and', and you will see what I mean: and < anda1 < andb1 < andrc1
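
The 'and' analogy can be checked with nothing more than Python's built-in `sorted`, which compares strings character by character and places a string before any longer string that starts with it. The list name below is just for illustration.

```python
# A string that is a prefix of another always sorts first,
# so the bare 'and' lands ahead of every suffixed variant.
names = ['andrc1', 'and', 'andb1', 'anda1']
prefix_order = sorted(names)
print(prefix_order)  # ['and', 'anda1', 'andb1', 'andrc1']
```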

To remedy this, you can try something bold like this:

>>> natsorted(['1.11', '1.11rc1', '1.11a1', '1.11b1'], key=lambda x : x+'z')
['1.11a1', '1.11b1', '1.11rc1', '1.11']

This will tack on the 'z' character to each version, so that you will be sorting ['1.11z', '1.11rc1z', '1.11a1z', '1.11b1z'] instead. 'z' comes after any of 'a', 'b', or 'rc', so '1.11' ends up last. If you also need the '~' trick I suggested above, you could use the key lambda x : x.replace('.', '~')+'z'
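
As a rough check of the combined key, here is a standard-library sketch that applies both the '~' replacement and the 'z' suffix to a mixed list. `simple_natural_key` is a hypothetical, simplified stand-in for a natsort-style key (not natsort's real code), and the sample list is made up for illustration.

```python
import re


def simple_natural_key(text):
    """Hypothetical natsort-like key: alternating str and int chunks."""
    parts = re.split(r'(\d+)', text)
    if parts[-1] == '':
        parts.pop()
    # Odd positions are the captured digit runs.
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def prerelease_aware_key(version):
    # '~' pushes dotted point releases after letter suffixes;
    # the trailing 'z' pushes the bare release after a/b/rc tags.
    return simple_natural_key(version.replace('.', '~') + 'z')


versions = ['1.11', '1.11rc1', '1.11a1', '1.11b1', '1.11.4', '1.10.1']
combined_order = sorted(versions, key=prerelease_aware_key)
print(combined_order)
# ['1.10.1', '1.11a1', '1.11b1', '1.11rc1', '1.11', '1.11.4']
```

One limitation of this approach: a suffix that itself starts with 'z' or '~' would still sort after the bare release, so the trick only holds for tags such as a, b, alpha, beta, and rc.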

If for some reason this does not work, let me know why and I can try to suggest other ways.

cel4 commented Aug 24, 2014

Thanks, the adding-z hack seems to work for me 👍
