Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seq.sortBy cannot handle sequences of floats containing NaN #370

Closed
darklajid opened this issue Apr 17, 2015 · 9 comments
Closed

Seq.sortBy cannot handle sequences of floats containing NaN #370

darklajid opened this issue Apr 17, 2015 · 9 comments
Labels

Comments

@darklajid
Copy link

I stumbled upon these cases in a one-off fsx - I'm not regularly using F#. This might be entirely my fault.
That said:

Microsoft (R) F# Interactive version 12.0.30815.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

> [| 3.2; 1.5; nan; 4.7; nan; 8.5 |] |> Array.sortBy (fun x -> x);;
val it : float [] = [|nan; nan; 1.5; 3.2; 4.7; 8.5|]

Good.

> [| 3.2; 1.5; nan; 4.7; nan; 8.5 |] |> Seq.sort;;
val it : seq<float> = seq [nan; nan; 1.5; 3.2; ...]

Fine.

> [| 3.2; 1.5; nan; 4.7; nan; 8.5 |] |> Seq.sortBy (fun x -> x);;
val it : seq<float> = seq [3.2; 1.5; nan; 4.7; ...]

What? Why?

I would expect that Array.sortBy and Seq.sortBy behave the same. And I would expect that sortBy identity is the same as sort? What's going on here?

@latkin
Copy link
Contributor

latkin commented Apr 17, 2015

Sorting on Seq and List can, in general, give different results than sorting on Array. Seq and List use stable sorts, whereas Array uses unstable sort. This is an intentional choice, allowing for max performance in array processing code.

That explains why different codepaths are used, but it doesn't explain this particular bad behavior.

F# uses generic comparison by default, which has some subtleties. It seems that:

  • Some generic compares go down this path, where NaN isn't specially considered, and winds up being considered "equal to" everything
    • Seq.sortBy
    • compare nan 1.0 // returns 0
    • Array.sortDescending
  • Some generic compares go down this path, where NaN compares as "less than" true numbers
    • Array.sortBy
    • Seq.sort
    • compare (nan, 1.0) (1.0, 1.0) // returns -1

I think the former should be considered a bug, but @dsyme would need to weigh in. FWIW this inconsistency has been present for at least a few versions, it's not a new behavior.

Note that this same issue affected the standard .NET framework Array.Sort in .NET 3.5, and behavior was changed in .NET 4.0.

@latkin latkin added the Bug label Apr 17, 2015
@darklajid
Copy link
Author

Whoa. Thanks a lot for the fast and in-depth feedback. Yes, that linked SO discussion about C#/.Net is exactly the same issue. I'm certainly just a random newbie here, but I'd consider the .Net 4 (and Array.sort/Array.sortBy) behavior more sane and would be glad if F# would follow that bug fix.

On top of that: If I understand your explanation you're saying that Array.sortBy and Array.sortDescending would sort completely different if NaN is involved? Regardless of what you guys think about the handling of NaN in comparisons, the inconsistency is certainly unwanted/a bug, right? Just like Seq.sort and Seq.sortBy should behave the same way?

@latkin
Copy link
Contributor

latkin commented Apr 17, 2015

Your understanding is correct - as it stands we are in an uncomfortable situation where we have

let nums = [|nan; 3.; 1.; nan; 4.; 1.; 5.; |]

// result [|nan; nan; 1.0; 1.0; 3.0; 4.0; 5.0|]
nums |> Array.sort

// result [|nan; 3.0; 1.0; nan; 5.0; 4.0; 1.0|]
nums |> Array.sortDescending

@darklajid
Copy link
Author

My last lay man comment: In your first paragraph you point out that List/Seq and Array use different sort algorithms. It's worth mentioning that the error / interesting behavior happens regardless (Array, Seq, List), so that might be a red herring?

@forki
Copy link
Contributor

forki commented Apr 18, 2015

removed my last comment.

https://github.com/Microsoft/visualfsharp/pull/372/files#diff-21b017f6f70c9eecf67cd91ccbdc88a4R891 seems to actually fix it, but then we loose stability. @theimowski is working on a property test which check sorting stability for List and Seq

@forki
Copy link
Contributor

forki commented Apr 18, 2015

so I think I found the issue. The sorting code is actually working. It seems the float comparer is broken:

let c = Microsoft.FSharp.Core.LanguagePrimitives.FastGenericComparer<float>
c.Compare(3.,4.)  // -1
c.Compare(3.,3.)  // 0
c.Compare(nan,8.) // 0 -- ??? 

if we change the stableSortWithKeysAndComparer to use additional equality check then sortBy is working (of course this is not the solution). (at moment we do c.Compare(ki, keys.[j]) = 0 which doesn't work with nan)

Instead I assume we have to fix GenericComparisonFast for floats.
I mean I have no idea what c.Compare(nan,8.) should return, but 0 is not helpful for our case. ;-)
I assume we should have c.Compare(nan,8.) = -1 at least this is what @PatrickMcDonald suggested by testing this:

let c2 = System.Collections.Generic.Comparer<_>.Default
c2.Compare(nan,8.) // -1

@latkin
Copy link
Contributor

latkin commented Apr 18, 2015

Yes, as I mentioned in my earlier comment, root cause is that GenericCompareFast does not treat nan specially, which is IMO a bug (though maybe intentional as this is how .NET 3.5-era sorts did it?). Implementation of the various sorting functions is correct, as far as I can tell.

Current expected behavior when comparing nan is well-documented.

@forki
Copy link
Contributor

forki commented Apr 18, 2015

yeah seems I completely misread your comments above. I blame yesterday evening.

@forki
Copy link
Contributor

forki commented Apr 18, 2015

One question: would it be a good idea to replace the GenericComparisonFast cases for floats with calls to CompareTo? At the moment it's spitting out inline IL, but we would probably endup reimplementing CompareTo.

Another question: for the concrete part in the sorting algorithm we only need to decide if two keys are equal. maybe we could use a different method there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants