Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch float printing from grisu to ryu algorithm #32799

Merged
merged 1 commit into from
Aug 28, 2019
Merged

Switch float printing from grisu to ryu algorithm #32799

merged 1 commit into from
Aug 28, 2019

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Aug 5, 2019

Because it's faster and simpler.

TODO:

  • Update the license of ryu code
  • remove grisu code (might need to chat w/ @StefanKarpinski about printf here, but I think that's the only place we were using grisu internals)
  • Currently, this prints Float16/Float32/Float64 the same all the time (like we do for Signed); personally, I'm not a huge fan of the current 1.0, 1.0f0, Float16(1.0) inconsistency, so I'd rather just treat them like signed integers and print them all the same. If people think that's too heretical, I can look into keeping the same "typed" logic

Will fix #14698 and #22137

@JeffBezanson
Copy link
Sponsor Member

so I'd rather just treat them like integers and print them all the same

Unsigned integers are printed differently by show.

@quinnj
Copy link
Member Author

quinnj commented Aug 5, 2019

Unsigned integers are printed differently by show.

Yes, I should clarify that I was referring to Signed and how Int8, Int16, Int32, and Int64 all show the same, regardless of size.

@fredrikekre fredrikekre added the domain:display and printing Aesthetics and correctness of printed representations of objects. label Aug 5, 2019
@JeffreySarnoff
Copy link
Contributor

@quinnj something in the transition is reaching out from Base and grabbing DoubleFloats.
I am sure this is not intended, but I am not familiar with the flow of control and fallbacks.
If you shine some light on what is happening and what, added into DoubleFloats, would
preclude crashing -- I am happy to do that.
https://discourse.julialang.org/t/base-printf-ini-dec-has-a-fall-thru-issue/27247

@quinnj
Copy link
Member Author

quinnj commented Aug 7, 2019

Ok, I've pushed some updates here, notably adding some enhancements to Ryu.writeshortest and Ryu.writefixed to allow better matching of how we print things in Base (specifically, embedding a bunch of the logic from this function into the float-printing algorithms themselves).

So to give an overall status update on everything going on here:

  • Switching from using grisu to ryu algorithm for float printing; the ryu code is simpler and more performant (typically in the range of 2x-3x, but in some cases more)
  • For Float64, we pretty much print things exactly how we used to, the sole difference I'm aware of is (current) 1.0e-45 vs. (ryu) 1e-45; that is, for integral scientific notation printing, ryu avoids the extra .0. Both are valid floats for any parsers out there. If people think this would be too disruptive a change, I can try to dig into the ryu code again to see where this could be added, but personally, I like the more concise notation while the e-45 still has a strong "this is a float thing" connotation
  • For Float16/Float32, this PR removes their special display, i.e. Float16(1.0) and 1.0f0 both print as 1.0; this matches what we do for Int8/Int16/Int32/Int64/BigInt printing. I personally like the consistency, while noting that the implementation code is much simpler, and the outputting is faster (though admittedly, show performance isn't necessarily the most important).

The remaining TODO item is switching over the usages of Grisu in printf.jl to use Ryu. I've looked into it a little, and part of the difficulty is that Grisu is "lower level" than Ryu, with Grisu.grisu populating just plain digits in a buffer, and returning the # of digits, the position of the decimal point, and whether the input was negative. It's then left up to the user to take the buffer of digits and do the actual printing. Ryu, on the other hand, takes a buffer, and populates everything, negative sign, digits, decimal point, scientific notation, etc. So in my mind, we have two options, copy-paste some parts of Ryu into a new Ryu.printf function that only does what the Grisu.grisu function does, populating digits and returning len/decimal point. That requires no changes in printf.jl, but has some non-ideal code duplication in Ryu. Otherwise, we adapt printf.jl to be able to use Ryu directly, probably w/o needing to do as much work (since Ryu can do more). Probably depending on what @StefanKarpinski thinks, I can take one approach or the other.

@simonbyrne
Copy link
Contributor

This looks really nice.

  • It would be nice if it was somewhat extensible (allowing digit separators, custom spacing, exponent control, etc.) I'm thinking in particular of WIP: string formatting Julep Juleps#49 (yes I know it is old...).
  • I'd also love a way to print the exact decimal value of a float.

@quinnj
Copy link
Member Author

quinnj commented Aug 13, 2019

Alright, I think this is good to go; the 32-bit and windows 64-bit CI failures look unrelated. As a quick update/recap:

  • The commit in this PR add the the base/ryu directory w/ functions Ryu.writeshortest, Ryu.writefixed and Ryu.writeexp, which correspond to the printf format options %g, %f, and %e, respectively
  • This PR doesn't remove the grisu code, since the Printf stdlib relies heavily on grisu; see Rewrite printf functionality #32859
  • This PR switches core show for Float16, Float32, and Float64to use the ryu writeshortest algorithm instead of grisu
  • The only functional change here should be dropping the Float16 and Float32 type-specific printing; e.g. Float16(1.0) => 1.0 and 1.0f6 => 1.0e6. Note that I previously considered also having 1.0e6 => 1e6, but have since change it back to printing at least one decimal digit on integral scientific notation prints
  • Performance is great! On the order of 2-5x faster across the board, here are a couple examples to whet the appetite of your merge finger:
julia> @btime Ryu.writeshortest(2.2250738585072014e-308)
  61.852 ns (2 allocations: 496 bytes)
"2.2250738585072014e-308"

julia> @btime string(2.2250738585072014e-308)
  299.455 ns (7 allocations: 352 bytes)
"2.2250738585072014e-308"

julia> @btime Ryu.writeshortest(1.0)
  46.744 ns (2 allocations: 480 bytes)
"1.0"

julia> @btime string(1.0)
  162.301 ns (4 allocations: 192 bytes)
"1.0"

julia> @btime Ryu.writeshortest(512.0e+3)
  50.171 ns (2 allocations: 480 bytes)
"512000.0"

julia> @btime string(512.0e+3)
  189.952 ns (4 allocations: 192 bytes)
"512000.0"

Note that I didn't add a NEWS entry because I'm not sure where I'd put it; we seem to have a 1.3 section, but I think 1.3 is frozen? So this would be 1.4? Not sure.

@StefanKarpinski
Copy link
Sponsor Member

1.3 is not yet frozen, you've got a couple of days. Then again, I'm not sure if we want to merge this right before 1.3 feature freeze or very early in the 1.4 release cycle.

@quinnj
Copy link
Member Author

quinnj commented Aug 13, 2019

Ah, good to know. I mean, speaking somewhat selfishly, I'd rather merge sooner than wait for the next release cycle, just because I've kind of got the momentum going here and can respond quickly to things that crop up over the release candidates. But if people feel like it's too risky, then I guess we can wait. I'm not currently planning on doing any more here or w/ Printf now that the tests are all passing (there's the one weird generate_precompile.jl thing that @KristofferC says should be fixed soon), but I am going to incorporate the code, probably into Parsers.jl that can use the code on older versions.

@StefanKarpinski
Copy link
Sponsor Member

Yeah, I'd say let's go for it and get lots of testing in the 1.3 release.

base/ryu/Ryu.jl Outdated
decchar::UInt8=UInt8('.')) where {T <: Base.IEEEFloat}
buf = Vector{UInt8}(undef, neededdigits(T))
pos = writeshortest(buf, 1, x)
return unsafe_string(pointer(buf), pos-1)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forcing an (unsafe) copy here seem unnecessary. above do _string_n(neededdigits(T)) to make the Vector, then use String(resize!(buf, pos - 1)) here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just add an @assert that pos <= length(buf)? It seems like that would actually ensure more safety (in the case pos becomes something unsafe-ly large)

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that would still be slow. I'm suggesting you don't make an extra copy of the data here by sticking with the safe API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.......got it. Yeah, we can do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we actually want StringVector(n) instead of _string_n, right? The underlying functions all expect a Vector{UInt8}.

i = len - 1
while i >= 0
j = p10bits - e2
#=@inbounds=# mula, mulb, mulc = POW10_SPLIT[POW10_OFFSET[idx + 1] + i + 1]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#=@inbounds=# mula, mulb, mulc = POW10_SPLIT[POW10_OFFSET[idx + 1] + i + 1]
mula, mulb, mulc = POW10_SPLIT[POW10_OFFSET[idx + 1] + i + 1]

end
if e >= 100
c = e % 10
unsafe_copyto!(buf, pos, DIGIT_TABLE, 2 * div(e, 10) + 1, 2)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsafe_copyto!(buf, pos, DIGIT_TABLE, 2 * div(e, 10) + 1, 2)
copyto!(buf, pos, DIGIT_TABLE, 2 * div(e, 10) + 1, 2)

exp_form = true
pt = nexp + olength
prec = precision
if -4 < pt <= (precision == -1 ? 6 : precision) #&& !(pt >= olength && abs(mod(x + 0.05, 10^(pt - olength)) - 0.05) > 0.05)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if -4 < pt <= (precision == -1 ? 6 : precision) #&& !(pt >= olength && abs(mod(x + 0.05, 10^(pt - olength)) - 0.05) > 0.05)
if -4 < pt <= (precision == -1 ? 6 : precision)

const MANTISSA_MASK = 0x000fffffffffffff
const EXP_MASK = 0x00000000000007ff

memcpy(d, doff, s, soff, n) = ccall(:memcpy, Cvoid, (Ptr{UInt8}, Ptr{UInt8}, Int), d + doff - 1, s + soff - 1, n)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong method signature for memcpy and memmove—also I think this is already unsafe_copyto!

base/ryu/Ryu.jl Outdated
end
buf = Vector{UInt8}(undef, neededdigits(T))
pos = writeshortest(buf, 1, x)
GC.@preserve buf unsafe_write(io, pointer(buf), pos - 1)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GC.@preserve buf unsafe_write(io, pointer(buf), pos - 1)
write(io, resize!(bu, pos - 1))

although it also seems like write(io, writeshortest(x)) would give the same result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write(io, writeshortest(x)) would make an extra copy of the buffer though (when creating the string)

@@ -0,0 +1,362 @@
@inline function writeshortest(buf::Vector{UInt8}, pos, x::T,
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the only function which uses unsafe operations, should we add some bounds-checking for sanity?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 14, 2019

I'm not really qualified to comment on the algorithm, so I just made a few comments on other stuff.

printf.jl to be able to use Ryu directly

you could see how we handled BigFloat, which had the same problem, but I know you're already looking into this.

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to have always failed on linux32 CI?

@quinnj
Copy link
Member Author

quinnj commented Aug 14, 2019

On the 32-bit CI: both windows & linux 32-bit have seemed to be really flaky. The windows CI seems to often have some kind of error building julia dependencies, and the linux 32-bit times out during the "generate precompile statements" step. I could look further into the latter, but at first glance it didn't seem like the changes here would have caused any issues.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 14, 2019

I think you're making that up: I don't see any other time the linux32 bot run into an error like that in at least the past week, except every time it ran this PR. There were a couple configuration errors seen earlier this week so it would fail outside of the build, but those have gotten fixed.

@quinnj
Copy link
Member Author

quinnj commented Aug 14, 2019

Sorry, I didn't mean they've been flaky in general, just on this PR; I inspected the 32-bit results a few times and the errors seems different enough from run to run that it didn't seem likely that anything here was a problem.

I've since identified indeed a bug on 32-bit and pushed a fix; along with updates to address @vtjnash's review comments.

@quinnj
Copy link
Member Author

quinnj commented Aug 27, 2019

I'll merge this in 12 hours or so unless there are other comments/concerns.

@mbauman mbauman requested a review from vtjnash August 27, 2019 16:12
@StefanKarpinski StefanKarpinski dismissed vtjnash’s stale review August 27, 2019 16:37

not applicable: passes on 32-bit now

@quinnj quinnj merged commit ef75269 into master Aug 28, 2019
@delete-merged-branch delete-merged-branch bot deleted the jq/ryu branch August 28, 2019 02:25
@Keno
Copy link
Member

Keno commented Aug 28, 2019

For Float16/Float32, this PR removes their special display, i.e. Float16(1.0) and 1.0f0 both print as 1.0; this matches what we do for Int8/Int16/Int32/Int64/BigInt printing. I personally like the consistency, while noting that the implementation code is much simpler, and the outputting is faster (though admittedly, show performance isn't necessarily the most important).

I really don't think I'm ok with this (sorry for being late to the party). For T,S <: Signed (with T of larger bit size than S), we have the property that for some string representation x, parse(T, x) == T(parse(S, x)). However, in floating point this may not necessarily be true. Example:

julia> x = rand(Float16)
0.907

julia> x == 0.907
false

julia> Float64(x)
0.9072265625

@Keno Keno added the status:triage This should be discussed on a triage call label Aug 28, 2019

julia> reinterpret(Float32, UInt32[1 2 3 4 5])
1×5 reinterpret(Float32, ::Array{UInt32,2}):
1.4013e-45 2.8026e-45 4.2039e-45 5.60519e-45 7.00649e-45
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened here? These are not the same floating point value:

julia> 1.4013e-45 == 1.0e-45
false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore me, they are in Float32:

julia> 1.4013f-45 == 1.0f-45
true

@JeffBezanson
Copy link
Sponsor Member

I agree we should restore the type info in show.

@quinnj
Copy link
Member Author

quinnj commented Aug 28, 2019

So in my mind, this is a key difference between show and repr: show is just that, showing things, useful, pragmatic defaults. repr on the other hand has one additional requirement, the output can be input to get the same "thing". Do I have this wrong? My proposal then, would be to have repr print the Float16/1.0f0 junk, but have show be the more consistent, clean output as changed in this PR. Happy to go with whatever people want, but just wanted to give some context from where I came from w/ the change here.

@JeffBezanson
Copy link
Sponsor Member

  repr(x; context=nothing)

  Create a string from any value using the show function. You should not add methods to repr; define a
  show method instead.

@quinnj
Copy link
Member Author

quinnj commented Aug 29, 2019

@Keno, I understand the example you share of the difference between Signed and floats (w/ floats essentially adding extra precision when converted "up"), but I also don't necessarily feel that printing additional type information really solves anything, as that "issue" exists regardless. It's just a fundamental property of IEEE floats there that the up conversion adds extra precision; does printing type information help a user somehow be more aware of that? I don't think that's certain at all.

Frankly, I think

julia> Float16(1)
Float16(1.0)

julia> Float32(1)
1.0f0

julia> Float64(1)
1.0

is inconsistent and ugly. While we're making changes to float printing code internals, I think it's a good time to try and address this and come up with something better. Maybe we introduce 1.0h0 for Float16 literal syntax? Maybe we print Float32 as Float32(1.0)? I think there are a few different things we can consider, but I feel like we should at least improve the situation here.

@JeffBezanson
Copy link
Sponsor Member

The immediate problem is this:

julia> repr(Any[1.0f0, 1.0])
"Any[1.0, 1.0]"

because show == repr. We could restore the type info for show, but remove it for show with a MIME type. That way they'd all print as 1.0 on their own in the REPL.

@jmkuhn
Copy link
Contributor

jmkuhn commented Aug 29, 2019

Even before this PR we have:

julia> repr(Any[1.0f0, 1.0, big"1.0"])
"Any[1.0f0, 1.0, 1.0]"

@StefanKarpinski
Copy link
Sponsor Member

Yeah, the BigFloat printing should probably be fixed to print as big"1.0". A major issue there, of course, is that big"1.0" means different things depending on what the precision is set to.

@simonbyrne
Copy link
Contributor

What if we were to break numerical juxtaposition for just macros? We could then allow something like

1.0@big

to be parsed as

@val_big("1.0")

(and similar for @val_F16, @val_F32, etc.)

@chethega
Copy link
Contributor

I am a really big fan if the 1.0f0 printing and syntax.

I would actually prefer to extend it, i.e. print signed integers like 0i8, 14i16, 5i32 (on 64 bit systems, and just 5 on 32 bit), 7 (on 64 bit systems, and 7i64 on 64 bit) and -6i128. Unfortunately breaking for people who have variables named i8 or the like, and use juxtaposition for multiplication.

@quinnj quinnj mentioned this pull request Sep 17, 2019
@simonbyrne
Copy link
Contributor

simonbyrne commented Feb 14, 2020

microsoft/STL#498 includes a fairly extensive test suite of Ryu functionality that would be good to incorporate.

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Feb 16, 2020

@Keno Keno removed the status:triage This should be discussed on a triage call label Feb 27, 2020
Keno pushed a commit that referenced this pull request Jun 5, 2024
Switch float printing from grisu to ryu algorithm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:display and printing Aesthetics and correctness of printed representations of objects.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

printf inexact for very small values when printing to high precision
10 participants