Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider integration in core and std ? #15

Closed
Urgau opened this issue May 12, 2021 · 76 comments · Fixed by rust-lang/rust#86761
Closed

Consider integration in core and std ? #15

Urgau opened this issue May 12, 2021 · 76 comments · Fixed by rust-lang/rust#86761

Comments

@Urgau
Copy link

Urgau commented May 12, 2021

Have you consider replacing the dec2flt algorithm in the standard library by your crate ?

I remember have seen several members of the Rust Libs Team encouraging people how have better algorithm (fast, efficient, ...) to integrate their crate in the std library.

Source of dec2flt

@aldanor
Copy link
Owner

aldanor commented May 12, 2021

@Urgau I remember this has been already discussed somewhere in rustc GH issues and as I was searching for it, I noticed a related question has been raised again just yesterday - see rust-lang/rust#85198 for further details.

One of concerns to keep in mind: binary size; both dec2flt and the algo used here use lookup tables, but dec2flt will be a bit leaner, I believe.

@Urgau
Copy link
Author

Urgau commented May 12, 2021

Yeah I saw this issue, it's the primary reason why I search crate that do really fast float parsing and ask you that question.

One of concerns to keep in mind: binary size; both dec2flt and the algo used here use lookup tables, but dec2flt will be a bit leaner, I believe.

I know that for certain target, like embedded object, binary size is a big deal but for vast majority of users who use Rust on computers, size is not really a big deal. I think that most people will accept +10/20 kio (just a guess not a actual number) increase for blazing performance improvement in float parsing and for size constraint environment a cfg with -Zbuild-std for disabling the fast algo and go to a slower one in exchange for binary size reduce.

If binary size is the only major concern I think you should consider making a pull request to the rust repository.

@lemire
Copy link
Contributor

lemire commented May 12, 2021

I think that what @Urgau wrote makes sense. The binary size can be important, but that's something one can tweak depending on one's needs.

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented May 12, 2021

@aldanor I'm working on gradual integrations of faster float-parsing algorithms into dec2flt, although obviously, faster algorithms would be preferable if applicable (the benchmarks for this look really promising).

As detailed in this internals post, I'm looking to fix in the following order:

  • Fix disguised fast-path algorithms.
  • Fix the Bellerophon algorithm, which incorrectly requires arbitrary-precision arithmetic for the significant digits currently (making it very slow).
  • Implement faster, arbitrary-precision algorithms (both use iterative scaling based on powers of 2, making them very slow).

My goal is gradual changes, since obviously, it's much easier to justify these changes with minimal differences to binary size but major overall performance improvements. Especially since @hanna-kruppe, the author of the float parsing algorithms, has retired from Rust development, meaning people reviewing these changes are less familiar with the codebase or float-parsing algorithms in general.

Any feedback however would be greatly appreciated.

EDIT In conversations a few years ago with @hanna-kruppe, it was very clear that both binary size was important, and there was at the time a tepid response to moving away from Big32x40 (which can only hold ~385 digits, which is insufficient for a lot of floating-point cases).

@Alexhuszagh
Copy link
Contributor

I should add, I've got a working repository for implementing these changes into Rust's core library here, and the different branches provide improvements to different components of dec2flt.

  • core: A minimal, extracted version identical to Rust's dec2flt implementation.
  • disguised: Handle disguised fast-path cases for better performance.
  • infer: Reduce binary sizes by inferring the binary exponent, rather than explicitly store them (no performance penalty).
  • moderate: Fix major performance penalties in the Bellerophon algorithm.

I'll be working on slow-path algorithms later, and my goal is a quick but comprehensive fix of dec2flt in a way that should minimize the number of changes made initially, which hopefully should ease the burden of integrating these changes. I'm wholly amenable to making small, incremental changes and then having all improvements replaced by integration of this library, and if I can help in any way, let me know.

@lemire
Copy link
Contributor

lemire commented May 12, 2021

@Alexhuszagh

One point to consider is that the current library can effectively act as a fast path. That is, there is no need to throw away anything but maybe an existing fast path. So it can be non-invasive (and also, easily made optional).

In the Go runtime, the equivalent of the code in this repo. is running. All they did was to remove an existing fast path and replace it with a new one.

It is also relatively thin. Here is the gist of the code...

pub fn compute_float<F: Float>(q: i64, mut w: u64) -> AdjustedMantissa {
    let am_zero = AdjustedMantissa::zero_pow2(0);
    let am_inf = AdjustedMantissa::zero_pow2(F::INFINITE_POWER);
    let am_error = AdjustedMantissa::zero_pow2(-1);

    if w == 0 || q < F::SMALLEST_POWER_OF_TEN as i64 {
        return am_zero;
    } else if q > F::LARGEST_POWER_OF_TEN as i64 {
        return am_inf;
    }
    let lz = w.leading_zeros();
    w <<= lz;
    let (lo, hi) = compute_product_approx(q, w, F::MANTISSA_EXPLICIT_BITS + 3);
    if lo == 0xFFFF_FFFF_FFFF_FFFF {
        let inside_safe_exponent = (q >= -27) && (q <= 55);
        if !inside_safe_exponent {
            return am_error;
        }
    }
    let upperbit = (hi >> 63) as i32;
    let mut mantissa = hi >> (upperbit + 64 - F::MANTISSA_EXPLICIT_BITS as i32 - 3);
    let mut power2 = power(q as i32) + upperbit - lz as i32 - F::MINIMUM_EXPONENT;
    if power2 <= 0 {
        if -power2 + 1 >= 64 {
            return am_zero;
        }
        mantissa >>= -power2 + 1;
        mantissa += mantissa & 1;
        mantissa >>= 1;
        power2 = (mantissa >= (1_u64 << F::MANTISSA_EXPLICIT_BITS)) as i32;
        return AdjustedMantissa { mantissa, power2 };
    }
    if lo <= 1
        && q >= F::MIN_EXPONENT_ROUND_TO_EVEN as i64
        && q <= F::MAX_EXPONENT_ROUND_TO_EVEN as i64
        && mantissa & 3 == 1
        && (mantissa << (upperbit + 64 - F::MANTISSA_EXPLICIT_BITS as i32 - 3)) == hi
    {
        mantissa &= !1_u64;
    }
    mantissa += mantissa & 1;
    mantissa >>= 1;
    if mantissa >= (2_u64 << F::MANTISSA_EXPLICIT_BITS) {
        mantissa = 1_u64 << F::MANTISSA_EXPLICIT_BITS;
        power2 += 1;
    }
    mantissa &= !(1_u64 << F::MANTISSA_EXPLICIT_BITS);
    if power2 >= F::INFINITE_POWER {
        return am_inf;
    }
    AdjustedMantissa { mantissa, power2 }
}

@Alexhuszagh
Copy link
Contributor

@lemire I've mostly been using the term "moderate" path for that, to disambiguate from the term "fast path" used in dec2flt. Since the above uses an extended-precision representation of the significant digits of the float, it would replace the Bellerophon algorithm, which is used if fast_path cannot be used. Rust's dec2flt has a function named fast_path which only works in cases where both the significant digits and the exponent can exactly be represented as native floats (see below). In my opinion, the whole thing should be thrown out and replaced with the above code sample, but I'm working on providing minimal changes with major performance tweaks in the meantime.

https://github.com/rust-lang/rust/blob/28e2b29b8952485679367cc05699fb5154f4e5c3/library/core/src/num/dec2flt/algorithm.rs#L109-L144

@Alexhuszagh
Copy link
Contributor

Oh I should probably add: Rust's dec2flt doesn't parse the first 19-20 significant digits of the float except in the fast path, which is a major performance issue and trivial to fix (and then your algorithm could then be implemented into Rust's core library). The fast path just sees if there is less than 16-digits in the float's integral and fractional components, and then checks if these significant digits can be exactly represented as a native float.

If not, it defaults to creating a big integer from the significant digits, and then calls it's unoptimized Bellerophon algorithm, which can lead to extreme performance hits for no good reason.

In addition, the subsequent algorithm_r ideally prefers a value of z0 that is within 1 ULP of the correct value. This is trivial to implement, using your above algorithm, but would of course have to be modified for that purpose.

@lemire
Copy link
Contributor

lemire commented May 12, 2021

@Alexhuszagh

Right.

You have Clinger's fast path which is very cheap and handles things like (small) integers and so forth.

Otherwise, what you want to do is to grab, say, the first 19 digits. Most of the time, numbers won't have more than 17 digits. If you have no more than 19 digits, then the code above is good enough 99.99999% of the time, and even if you have more than 19 digits, the truncation is good enough almost all of the time. See section 11... https://arxiv.org/pdf/2101.11408.pdf

So it should be uncommon that you ever need to materialize a big integer.

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented May 12, 2021

Yep. My modifications keep Clinger's algorithm (since I don't believe anyone currently on the Rust core team has worked on dec2flt before), but moves significant digit parsing outside of the fast path, and adds error calculation for truncated values, and only after that do we use their big-integer algorithms (which have major issues in nearly every facet, but that's a different story).

Ideally, we'd remove it, and replace it with your code, but I'm trying minor, incremental improvements at first. Ideally, my goals right now are:

  1. Fix disguised fast path cases.
  2. Improve the Bellerophon algorithm, to handle subnormal floats and avoid needlessly using big-integers.
  3. Fix the glacially slow algorithm_m and algorithm_r with easy-to-implement slow-path algorithms.
  4. Increase the size of the big-integer storage, so it can handle more than 385 digits (since we need up to 767, which means a lot of floats cannot be parsed by Rust).
  5. Remove the last remaining Clinger algorithm, Bellerophon, and replace it with the Eisel-Lemire algorithm based on the code above.

@Urgau
Copy link
Author

Urgau commented May 12, 2021

Ideally, we'd remove it, and replace it with your code, but I'm trying minor, incremental improvements at first. Ideally, my goals right now are:

1. Fix disguised fast path cases.

2. Improve the Bellerophon algorithm, to handle subnormal floats and avoid needlessly using big-integers.

3. Fix the glacially slow `algorithm_m` and `algorithm_r` with easy-to-implement slow-path algorithms.

4. Increase the size of the big-integer storage, so it can handle more than 385 digits
   (since we need up to 767, which means a lot of floats cannot be parsed by Rust).

5. Remove the last remaining Clinger algorithm, Bellerophon, and replace it with the Eisel-Lemire algorithm
   based on the code above.

From an outside point of view: Wouldn't be it simpler and faster to simply replace the actual rust code with the code of fast-float ?

As I see things, there is actual many faults in the rust std implementation, and you are proposing to do a 5 steps improvement (but not a total fix ?). I think that instead of trying to patch the std rust code to ideally be perfect we should take the fastest road and completely replace the actual suboptimal and broken (apparently) by a working one. In this case fast-float or another crate if it fits better the need of the std library.

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented May 12, 2021

@Urgau Sure, if the maintainers are up for that. There's a few issues mentioned though, and that is:

  1. The original implementer is no longer working on Rust, so no one is familiar with the codebase I believe.
  2. There's a few assumptions internally that need to be addressed prior to making these changes.
  3. The changes required here are relatively trivial, and therefore easy to justify, prior to making larger changes.

If you can justify the changes, that would be ideal. I'm going to test the binary sizes, but this crate should be pretty lean (likely leaner than core), actually. Also, sure it's not a total fix, but it's an improvement of like 10,000x for some floats, which is not insignificant.

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented May 12, 2021

Ok I've run the numbers and the binary sizes are practically identical, albeit slightly smaller.

fast-float

opt-level size size(stripped)
0 3.3M 324K
1 3.3M 300K
2 1.3M 248K
3 1.3M 252K
s 1.3M 244K
z 1.3M 248K

core

opt-level size size(stripped)
0 3.6M 360K
1 3.5M 316K
2 1.3M 236K
3 1.3M 248K
s 1.3M 244K
z 1.3M 248K

If you can get the core team to approve of it, it would be a great addition to the core library. I'm still going to try to get incremental improvements implemented, for the reasons above (IMO, it's easier to justify), but ideally, this should be the end goal.

You would likely, however, have to use the [bignum[(https://github.com/rust-lang/rust/blob/master/library/core/src/num/bignum.rs) implementation for the Decimal struct.

As an FYI: I would recommend forking my dec2flt fork of Rust's num module if you're going to do this, since it allows you to implement these changes within the context of the Rust core library without having to build Rust from scratch every time. It's a minimal fork of the dec2flt library, in the context of the Rust core library, and should make it quick and easy to make changes, diffs, and then incorporate them back into the core library.

@Urgau
Copy link
Author

Urgau commented May 13, 2021

@Urgau Sure, if the maintainers are up for that.

Posted an question on the zulip stream for T-Libs.

Ok I've run the numbers and the binary sizes are practically identical, albeit slightly smaller.

That great news.

As an FYI: I would recommend forking my dec2flt fork of Rust's num module if you're going to do this, since it allows you to implement these changes within the context of the Rust core library without having to build Rust from scratch every time. It's a minimal fork of the dec2flt library, in the context of the Rust core library, and should make it quick and easy to make changes, diffs, and then incorporate them back into the core library.

Thanks for the tips but I'm not sure I'm qualified to try doing the integration in std. In fact I barely understand how float are store internally. I will read the academic paper but I'm not sure it will be sufficient.

@Urgau
Copy link
Author

Urgau commented May 13, 2021

@Alexhuszagh I have receive an response from Josh Triplett (who is co-lead of the Rust Libs Team). He said here:

[..] this seems like a great idea to me [...]

So if you want try to do the pull request replacing the dec2flt with the fast-float code, go for it.

There is however an legal concern also raised by Josh.

fast-float-rust says it's directly based on the C++ library. Is the Rust implementation derived from the C++ code or the algorithm in the paper? Asking because the C++ code is just Apache-2.0, but the Rust code says MIT/Apache-2.0.

If the Rust code were a derived work of the C++, it wouldn't be able to add the MIT dual license. (And the C++ code has had third-party contributions, so it isn't just the original author who would need to grant permission for that.)

@aldanor @lemire Would you mind clarifying for Josh Triplett the situation here or directly on zulip concerning the licenses.

Also I have being acting like your were okay for your implementation to be part of the rust standard library without never asking you if you were okay for that. So do you agree for your implementation to be part of the rust standard library ?

@lemire
Copy link
Contributor

lemire commented May 13, 2021

I have zero ownership of anything. I am just an interested fellow.

@lemire
Copy link
Contributor

lemire commented May 13, 2021

If anyone wants to chat about the algorithm, I am available (via Zoom). I did not write a single code of Rust for this project though. I am just around to help.

@Alexhuszagh
Copy link
Contributor

Thanks for the tips but I'm not sure I'm qualified to try doing the integration in std. In fact I barely understand how float are store internally. I will read the academic paper but I'm not sure it will be sufficient.

I was more so responding to the author of the crate in this section, sorry if that wasn't clear.

@aldanor
Copy link
Owner

aldanor commented May 13, 2021

I'm also "just an interested fellow", in a way - as Daniel is the true owner of the algorithm, and I was porting the C++ implementation to Rust (with some minor implementational changes and updates here and there) while also having the paper by my side as a reference.

If there are any license concerns - we can sure resolve them; I'm pretty sure all of us are interested first and foremost in this to succeed. We can either make the original library Apache2/MIT (which would require all contributors to sign off on that if I understand it correctly), or make this crate Apache2-only. Not a big deal.

If Josh and the rest of the core team are happy to go ahead with this - should we give it a go? I'd be happy to have a separate zoom/zulip/whatever discussion if needed. Never had experience of contributing anything major to rustc myself, but as long as there's some guidance re: what needs to go where, I think I can try.

@Alexhuszagh
Copy link
Contributor

@aldanor Rust is dual-licensed under both MIT/Apache-2.0, so I believe keeping the original license intact. I'm more than happy to have a Zoom call, if needed, or chat on Zulip. I've had a failed attempt (personal reasons forced me to take a step back) previously at integrating changes into the core library, but I'm tangentially familiar with the process. I would be more than happy to contribute any help I can.

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented May 13, 2021

Looking carefully at the implementation, there's likely a few things that would additionally need to be done to integrate this into Rust core.

  1. Validate digits occur after the exponent symbol.
  2. Ensure there are <a href=https://github.com/aldanor/fast-float-rust/blob/4680af7666df5b341c6bc46ab7a112ce1adea615/src/number.rs#L122-L166">significant digits at all. Rust does not allow floats of the format e10, .e10, \, +, or .. You currently only handle + and \.
  3. Case-sensitive matches on special values.

A simple playground of all these examples is here.

@Urgau
Copy link
Author

Urgau commented May 13, 2021

Looking carefully at the implementation, there's likely a few things that would additionally need to be done to integrate this into Rust core.

  1. Validate digits occur after the exponent symbol.
  2. Ensure there are significant digits at all. Rust does not allow floats of the format e10, .e10, \, +, or .. You currently only handle + and \.

These doesn't seems to be allowed by the IEEE 754 standard either.

  1. Case-sensitive matches on special values.

This was a bug on the implementation, that have been fix on nightly some weeks ago. See here and also here.

@Alexhuszagh
Copy link
Contributor

Ah cool, then only 1). and 2). are necessary.

@aldanor
Copy link
Owner

aldanor commented May 13, 2021

Hmm..

fn main() {
    for s in &[
        "e10", "1.23", "1.23e", "1.", ".1", "1.23e5", ".", ".e1", "0e", "0e1",
    ] {
        println!("{}", s);
        println!("  std: {:?}", s.parse::<f64>());
        println!("  ff:  {:?}", fast_float::parse::<f64, _>(s));
    }
}

yields (as of the current version)

e10
  std: Err(ParseFloatError { kind: Invalid })
  ff:  Err(Error)
1.23
  std: Ok(1.23)
  ff:  Ok(1.23)
1.23e
  std: Err(ParseFloatError { kind: Invalid })
  ff:  Err(Error)
1.
  std: Ok(1.0)
  ff:  Ok(1.0)
.1
  std: Ok(0.1)
  ff:  Ok(0.1)
1.23e5
  std: Ok(123000.0)
  ff:  Ok(123000.0)
.
  std: Err(ParseFloatError { kind: Invalid })
  ff:  Err(Error)
.e1
  std: Err(ParseFloatError { kind: Invalid })
  ff:  Err(Error)
0e
  std: Err(ParseFloatError { kind: Invalid })
  ff:  Err(Error)
0e1
  std: Ok(0.0)
  ff:  Ok(0.0)

@Alexhuszagh
Copy link
Contributor

Nevermind, I missed this: *s = start. My fault. So then we look good to go.

@Urgau
Copy link
Author

Urgau commented May 14, 2021

@aldanor I have discuss with Josh and the only way forward with license is to have fast_float be dual license under MIT/APACHE.

I have open a pull request in the fast_float repository to change the license.

If Josh and the rest of the core team are happy to go ahead with this - should we give it a go? I'd be happy to have a separate zoom/zulip/whatever discussion if needed. Never had experience of contributing anything major to rustc myself, but as long as there's some guidance re: what needs to go where, I think I can try.

Yep, you should give it a try. You should also watch the rustc-dev-guide, there are some very useful information about compiling rustc (shouldn't be necessary) and the standard library.

The files that are relevant for you are under the library/core/src/num/dev2flt directories. From what I can see all of the files except mod.rs who needs to be modify can be deleted/replaced with your fast_float port.

@lemire
Copy link
Contributor

lemire commented May 14, 2021

+1

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented May 14, 2021

Looking at the commits for fast-float, only @biojppm and @kitaisreal made commits prior to this port being made, and so would likely be the only ones who would need to agree to re-license their code so an (older) version of fast-float could be dual licensed, and therefore this crate dual-licensed.

We could ask permission from both of them. kitaisreal's contributions are trivial, so we could likely remove those changes if licensing becomes a major issue.

EDIT: Their name is "kitaisreal".

@lemire
Copy link
Contributor

lemire commented May 14, 2021

I have emailed @biojppm

@Alexhuszagh
Copy link
Contributor

I've got on my own fork all the data tests from Rust's core library, and I'll be running them tonight on a little-endian (32-bit) and big-endian (32-bit) architecture, since those are the only cases that haven't yet been comprehensively tested IIRC. None of the logic should affect these cases, but it's nice to be extra sure.

I'll be running this branch with all the unittests within for the following two architectures:

  • armv7-unknown-linux-gnueabihf (32-bit, LE)
  • powerpc-unknown-linux-gnu (32-bit, BE)

This is going to take a while, since these tests basically trigger the slow algorithm every time, but it shouldn't be too bad.

@Alexhuszagh
Copy link
Contributor

This should all be done as of Alexhuszagh/rust@24f5414. Let me know if all the changes seem satisfactory, and I'll submit the PR today.

@lemire
Copy link
Contributor

lemire commented Jun 30, 2021

@Alexhuszagh Very nice.

@Alexhuszagh
Copy link
Contributor

A PR has been submitted.

@Urgau
Copy link
Author

Urgau commented Jul 2, 2021

@Alexhuszagh @lemire I saw on zulip this PR rust-lang/rust#31407 about valid float being treated as invalid. Will the new algorithm fix this ?

@lemire
Copy link
Contributor

lemire commented Jul 2, 2021

@Urgau Nice point.

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented Jul 2, 2021

@Urgau That is actually addressed in the PR, in the introduction. Not the issue specifically, but the general idea:

Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the Big32x40, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.

// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>();   // Err(ParseFloatError { kind: Invalid })

So that's absolutely part of the motivation.

@Alexhuszagh
Copy link
Contributor

Major updates on the PR:

  1. Almost all the unsafe code has been removed, without affecting performance.
  2. A lot of internal compiler error workarounds have been solved, giving even more motivation for the PR.
  3. A lot of code has been de-duplicated, reducing the PR size substantially.
  4. All comments on code logic so far seem to have been addressed, and the code seems to be simple enough to clearly audit. A lot of preconditions have been documented, and a lot of "complex" sections of the code have been extensively documented so they're easy to understand.

The only major bottleneck right now seems to be they're waiting for a computer scientist who is familiar with float-parsing to comment on the PR. I think I know an expert....

@Urgau
Copy link
Author

Urgau commented Jul 4, 2021

@Alexhuszagh This is great. Thank you for creating the PR and taking it to the end.

I just have a small suggestion, it might be nice to add/update the benchmarks comparing speed and code size between the current implementation and the PR one.

PS: I just noticed that this PR rust-lang/rust#86048 is going to have conflicts with yours, you will certainly have to be rebased.

@Alexhuszagh
Copy link
Contributor

Alexhuszagh commented Jul 4, 2021

@Alexhuszagh This is great. Thank you for creating the PR and taking it to the end.

I just have a small suggestion, it might be nice to add/update the benchmarks comparing speed and code size between the current implementation and the PR one.

I'll update that after most of the changes have been made: there's one more involving Miri that I'm finishing. I doubt the benchmarks have changed much, because I've ensured every change to ensure the safe routines don't change the resulting ASM. See this repository for all the changes. In fact, everything besides step_by is not just practically (which I've kept unsafe, since it would have significant performance penalties), but actually identical,

PS: I just noticed that this PR rust-lang/rust#86048 is going to have conflicts with yours, you will certainly have to be rebased.

Yeah, once there's conflicting changes, I'll be more than happy to rebase. However, as it's not yet merged, I'll wait. I'll likely do a rebase once the Miri issues are fixed as well.

@lemire
Copy link
Contributor

lemire commented Jul 4, 2021

@Alexhuszagh This code looks great, let me know if there is anything specific that needs to be examined.

Alexhuszagh added a commit to Alexhuszagh/rust that referenced this issue Jul 17, 2021
Implementation is based off fast-float-rust, with a few notable changes.

- Some unsafe methods have been removed.
- Safe methods with inherently unsafe functionality have been removed.
- All unsafe functionality is documented and provably safe.
- Extensive documentation has been added for simpler maintenance.
- Inline annotations on internal routines has been removed.
- Fixed Python errors in src/etc/test-float-parse/runtests.py.
- Updated test-float-parse to be a library, to avoid missing rand dependency.
- Added regression tests for rust-lang#31109 and rust-lang#31407 in core tests.
- Added regression tests for rust-lang#31109 and rust-lang#31407 in ui tests.
- Use the existing slice primitive to simplify shared dec2flt methods
- Remove Miri ignores from dec2flt, due to faster parsing times.

- resolves rust-lang#85198
- resolves rust-lang#85214
- resolves rust-lang#85234
- fixes rust-lang#31407
- fixes rust-lang#31109
- fixes rust-lang#53015
- resolves rust-lang#68396
- closes aldanor/fast-float-rust#15
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 17, 2021
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.

# Summary

Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).

Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](https://github.com/lemire/simple_fastfloat_benchmark/blob/0e2b5d163d4074cc0bde2acdaae78546d6e5c5f1/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s.

In addition, [edge-cases](rust-lang#85234) in Rust's [dec2flt](https://github.com/rust-lang/rust/tree/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.

Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.

```rust
// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>();   // Err(ParseFloatError { kind: Invalid })
```

# Solution

This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core.

**Documentation**

Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows:

```rust
    // Round-to-even only happens for negative values of q
    // when q ≥ −4 in the 64-bit case and when q ≥ −17 in
    // the 32-bitcase.
    //
    // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we
    // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have
    // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
    //
    // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
    // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
    // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64
    // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
    // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase).
    //
    // Thus we have that we only need to round ties to even when
    // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10]
    // (in the 32-bit case). In both cases,the power of five(5^|q|)
    // fits in a 64-bit word.
    const MIN_EXPONENT_ROUND_TO_EVEN: i32;
    const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```

This ensures maintainability of the code base.

**Improvements for Disguised Fast-Path Cases**

The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.

However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in rust-lang#85198.

**Digit Parsing Improvements**

Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems.

**Unsafe Changes**

Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality.

```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}

...

#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```

The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.

```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: same as step_by, safe as long n is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}

...

/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```

This allows us to trivially demonstrate the new implementation of dec2flt is safe.

**Inline Annotations Have Been Removed**

In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](aldanor/fast-float-rust#15 (comment)).

**Fixed Correctness Tests**

Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.

**Undefined Behavior**

An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:

```rust
    #[inline]
    pub fn check_len(&self, n: usize) -> bool {
        unsafe { self.ptr.add(n) <= self.end }
    }
```

And the new implementation is as follows:

```rust
    /// Check if the slice at least `n` length.
    fn check_len(&self, n: usize) -> bool {
        n <= self.as_ref().len()
    }
```

Note that this has since been fixed in [fast-float-rust](aldanor/fast-float-rust#29).

**Inferring Binary Exponents**

Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](https://github.com/rust-lang/rust/blob/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt/table.rs#L8).

# Code Size

The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.

**new**

Using rustc version 1.55.0-dev.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|400k|300K
1|396k|292K
2|392k|292K
3|392k|296K
s|396k|292K
z|396k|292K

**old**

Using rustc version 1.53.0-nightly.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|3.2M|304K
1|3.2M|292K
2|3.1M|284K
3|3.1M|284K
s|3.1M|284K
z|3.1M|284K

# Correctness

The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham  [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.

# Issues Addressed

This will fix and close the following issues:

- resolves rust-lang#85198
- resolves rust-lang#85214
- resolves rust-lang#85234
- fixes rust-lang#31407
- fixes rust-lang#31109
- fixes rust-lang#53015
- resolves rust-lang#68396
- closes aldanor/fast-float-rust#15
@Urgau
Copy link
Author

Urgau commented Jul 17, 2021

The pull-request for integrating the algorthim in the rust standard library has been merged 🎉.

Thank you @Alexhuszagh for having created and brought this pull-request to the end.
Thank's also to you @lemire and @aldanor for your help.

@Urgau Urgau closed this as completed Jul 17, 2021
@lemire
Copy link
Contributor

lemire commented Jul 17, 2021

@Urgau @Alexhuszagh Great work. One fun challenge would be to re-use the same tables for serialization. It is on my todo to investigate this option.

@Alexhuszagh
Copy link
Contributor

@lemire Thanks for all the hard work, and helpful feedback. As of right now... 🎉🎉🎉 :crab rave:

@lemire
Copy link
Contributor

lemire commented Jul 18, 2021

@Alexhuszagh Do not hesitate to get in touch if you'd like to continue this work and you feel I can help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants