Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10864: [Rust] Use standard ordering for floats #8882

Closed
wants to merge 7 commits into from

Conversation

Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Dec 9, 2020

This implements ordering using IEEE 754 ordering as mentioned in this discussion (only for sort now).

I think this simplifies NaN-handling quite a bit. Performance-wise doesn't seem to be a big difference.

#8685

@Dandandan Dandandan marked this pull request as draft December 9, 2020 18:42
@github-actions
Copy link

github-actions bot commented Dec 9, 2020

@codecov-io
Copy link

codecov-io commented Dec 9, 2020

Codecov Report

Merging #8882 (eac99bd) into master (dfef236) will decrease coverage by 0.00%.
The diff coverage is 68.62%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8882      +/-   ##
==========================================
- Coverage   82.60%   82.60%   -0.01%     
==========================================
  Files         204      204              
  Lines       50189    50177      -12     
==========================================
- Hits        41459    41448      -11     
+ Misses       8730     8729       -1     
Impacted Files Coverage Δ
rust/arrow/src/compute/kernels/sort.rs 93.56% <68.62%> (+0.14%) ⬆️
rust/parquet/src/encodings/encoding.rs 95.24% <0.00%> (-0.20%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dfef236...eac99bd. Read the comment docs.

@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 11, 2020
@nevi-me
Copy link
Contributor

nevi-me commented Jan 2, 2021

@Dandandan what more work do you still need to do on this PR?

@Dandandan Dandandan changed the title ARROW-10864: [Rust] Use standard ordering for floats [WIP] ARROW-10864: [Rust] Use standard ordering for floats Jan 2, 2021
@Dandandan
Copy link
Contributor Author

Thanks @nevi-me for remembering me. I updated and finalized the PR.

@Dandandan Dandandan marked this pull request as ready for review January 2, 2021 10:06
Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, needs rebase though :)

I've gone through the totalOrder logic, looks the same as what I've seen on the net.
I also saw this old RFC which would be useful by implementing Ord on floats, potentially giving us totalOrder in the standard library rust-lang/rfcs#1249.

@Dandandan
Copy link
Contributor Author

@nevi-me yes, that is the info I also found.

There are also nightly implementations of total_cmp, we could switch whenever it is stabilized.
https://doc.rust-lang.org/std/primitive.f64.html#method.total_cmp

@Dandandan
Copy link
Contributor Author

@nevi-me rebased, but looks like needs-rebase label isn't removed

@nevi-me nevi-me removed the needs-rebase A PR that needs to be rebased by the author label Jan 3, 2021
@jhorstmann
Copy link
Contributor

@Dandandan there is another implementation of float comparison in use for lexicographical ordering (sorting by multiple columns) in ord.rs. That would also need to be updated to be consistent.

Ideally the min/max kernels would also follow the same ordering. For the simd implementations that could be tricky because they currently delegate all types to the packed_simd lt implementation. I didn't have time to look into it in more detail yet.

In my understanding, the only difference in behaviour should be around negative NaN and that small difference shouldn't block this PR.

@nevi-me nevi-me requested a review from alamb January 4, 2021 21:44
@alamb
Copy link
Contributor

alamb commented Jan 5, 2021

I will try and look at this PR later today

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code and verified we had test coverage for sorting of floating point nulls:

https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/sort.rs#L1177

I vote :shipit: Nice work @Dandandan

} else {
unreachable!("Partition by nan is only applicable to float types")
}
// sorts f32 in IEEE 754 total ordering
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good here to provide a pointer to the original source implementation (e.g. https://doc.rust-lang.org/std/primitive.f64.html#method.total_cmp) and a TODO to change to use that API when is stabilized

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments here: #9129

nevi-me pushed a commit that referenced this pull request Jan 9, 2021
…algorithm came from

Follow up from #8882

Closes #9129 from alamb/alamb/doc-improvement

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
alamb added a commit to apache/arrow-rs that referenced this pull request Apr 20, 2021
…algorithm came from

Follow up from apache/arrow#8882

Closes #9129 from alamb/alamb/doc-improvement

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
This implements ordering using IEEE 754 ordering as mentioned in this discussion (only for sort now).

I think this simplifies NaN-handling quite a bit. Performance-wise doesn't seem to be a big difference.

apache#8685

Closes apache#8882 from Dandandan/float_order

Authored-by: Heres, Daniel <danielheres@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
…algorithm came from

Follow up from apache#8882

Closes apache#9129 from alamb/alamb/doc-improvement

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants