Add LLVM intrinsics for floor/ceil/trunc/abs. #8364

ArchRobison · 2014-09-15T16:26:07Z

This PR adds intrinsics for floor/ceil/trunc. For this example:

function f(b,a)
    @simd for i=1:length(a)
        @inbounds b[i]=floor(a[i])
    end
end

with LLVM 3.5 I'm seeing an improvement of about 28x on a Haswell processor for Float32 arrays. For Float64 arrays, I'm getting about 4x improvement. Even with just LLVM 3.3, which can't vectorize floor, I'm seeing about 2.8x improvement for both Float32 and Float64.

The patch improves the "vector" floor from base/math.jl by about 1.4x (Float32) or 1.8x (Float64) using LLVM 3.3.
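For anyone who wants to try the loop on a current Julia release, here is a minimal sketch. The name `floor_loop!` and the sample data are illustrative and not taken from the PR, and `eachindex` stands in for the `1:length(a)` range used in 2014. It makes no attempt to reproduce the timings quoted above.

```julia
# Sketch only: `floor_loop!` is a hypothetical name, written for Julia 1.x.
function floor_loop!(b, a)
    # @simd permits vectorization; @inbounds drops bounds checks inside the loop.
    @inbounds @simd for i in eachindex(a, b)
        b[i] = floor(a[i])
    end
    return b
end

a = Float32[-1.5, -0.25, 0.0, 2.75]
b = floor_loop!(similar(a), a)
```

In the REPL, `@code_llvm floor_loop!(b, a)` prints the generated IR, which may show whether the floor call was vectorized on a given machine.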

While writing this patch, I discovered that the tests do not test whether "vector" floor, ceil, and trunc work at all. I'm happy to write such tests. I'm just not sure whether they belong in test/math.jl or test/numbers.jl. The latter is where the scalar tests for floor, ceil, and trunc are.

ViralBShah · 2014-09-15T17:56:47Z

Perhaps best to have all the tests related to the same function in the same place - in numbers.jl.

JeffBezanson · 2014-09-15T19:25:02Z

Of course it's a bit silly to have a test/math.jl; everything is math :) Could be separated into specfun.jl and a couple other things maybe.

ArchRobison · 2014-09-15T20:12:15Z

I added tests to numbers.jl. Is there a form of == that tests not only for numerical equality, but also for type equality? The tests that I added don't check the return type of vector trunc, round, etc.

jakebolewski · 2014-09-15T20:19:24Z

There is the @inferred macro in base/test.jl.
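In current Julia the macro lives in the `Test` standard library rather than in `base/test.jl`. A small sketch of what it does, with illustrative calls: `@inferred` throws an error if the return type inferred by the compiler differs from the type of the value actually returned, and otherwise hands back the value. It does not compare two results, so it is not a drop-in replacement for `==`.

```julia
using Test

# @inferred returns the value of the call, or errors when the
# inferred return type does not match the actual return type.
y = @inferred floor(1.5f0)
v = @inferred map(trunc, Float32[1.5, -2.5])
```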

ArchRobison · 2014-09-15T20:48:47Z

I'd prefer to have something that I can just use in place of ==, so that I don't have to write twice as many lines of test (a line for value checking and a line for type checking). It's a feature that cuts across other tests, so is probably best not tangled with this PR. I'll leave my tests as is for now.

eschnett · 2014-09-15T22:09:22Z

Would === check type equality as well?

If you're testing floating point numbers, don't you want to use isequal instead of ==?

ArchRobison · 2014-09-15T23:12:36Z

Goldilocks would say the problem is a bear:

== is too lenient: it checks only numerical equality, not type equality.
=== is too picky: it distinguishes arrays by address.
isequal is both too picky and too lenient: it distinguishes -0 from 0, but considers Float32[0,0,0] to be equal to Float64[0,0,0].
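The three behaviours can be checked directly. A short sketch, assuming Julia 1.x; the variable names are illustrative:

```julia
a32 = Float32[0, 0, 0]
a64 = Float64[0, 0, 0]

lenient   = (a32 == a64)         # true: values are compared, element types are ignored
identical = (a32 === copy(a32))  # false: two distinct mutable arrays are never ===
zeros_eq  = isequal(-0.0, 0.0)   # false: isequal separates the signed zeros
mixed_eq  = isequal(a32, a64)    # true: element types are still ignored
```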

JeffBezanson · 2014-09-16T02:47:44Z

It's not unheard-of to write something like

let ==(x,y) = isequal(x,y) && typeof(x)==typeof(y)
    @test foo() == bar()
end

ArchRobison · 2014-09-16T14:47:02Z

What's the syntax for using "Base.==" from inside the local definition of ==?

JeffBezanson · 2014-09-16T14:49:50Z

Ugh, this makes me wince, but I believe the only working syntax for that is Base.(:(==)).

ArchRobison · 2014-09-16T15:00:19Z

Now I'm wincing less at what I used earlier this morning:

let ==(x,y) = !(x!=y || typeof(x)!=typeof(y))

vchuravy · 2014-09-16T15:11:46Z

I would rather alias it (or use another equivalence operator that isn't used yet)

let ≈(x, y) = ==(x,y),  ==(x, y) = isequal(x,y) && typeof(x) ≈ typeof(y)
   ...
end

ArchRobison · 2014-09-16T17:01:56Z

~~I liked the alias idea.~~ I liked using another equivalence operator. PR updated to "wiggle" out of the problem.
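A sketch of the kind of helper the thread converges on, written for current Julia. The name `same_value_and_type` is hypothetical: the PR used a `≈` operator local to the tests, which in current Julia would shadow `isapprox`. The qualified call `Base.:(==)` is the present-day spelling of the `Base.(:(==))` form mentioned earlier.

```julia
# Hypothetical helper: true only when both the types and the values agree.
same_value_and_type(x, y) = typeof(x) == typeof(y) && Base.:(==)(x, y)

r1 = same_value_and_type(Float32[1, 2], Float32[1, 2])  # same type, same values
r2 = same_value_and_type(Float32[1, 2], Float64[1, 2])  # element types differ
r3 = same_value_and_type(-0.0, 0.0)                     # == treats signed zeros as equal
```

Using a plain function name sidesteps the question of how to reach the original `==` from inside a definition that shadows it.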

simonbyrne · 2014-09-18T12:05:54Z

Fantastic, thanks for doing this: while you're at it, would it be possible to also add a rint or nearbyint function? (in case you're wondering, they only differ by how they raise floating-point exceptions). At the moment, there is no other way to get round-to-even semantics, which is required to implement @printf hex-float format.
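For reference, Julia releases later than this thread take the rounding direction as an argument to `round`. A brief sketch of the two tie-breaking rules under discussion, assuming Julia 1.x:

```julia
ties = (0.5, 1.5, 2.5, 3.5)

# Ties go to the even neighbour (round-to-even semantics).
to_even = [round(x, RoundNearest) for x in ties]

# Ties go away from zero (the behaviour of C's round).
away = [round(x, RoundNearestTiesAway) for x in ties]
```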

simonbyrne · 2014-09-18T12:09:50Z

Also, there is an llvm intrinsic for round as well (which was the slowest of the bunch in #5983).

ArchRobison · 2014-09-18T22:48:19Z

I've added use of the LLVM 3.4 (and later) fabs intrinsic. It generates slightly better code than what Julia currently generates. Sadly, the LLVM 3.4 copysign intrinsic generates slightly better scalar code, but is not vectorizable by LLVM, even up to my copy of LLVM trunk. So I recommend not using the LLVM copysign intrinsic until LLVM can vectorize it.

Per remarks for #5983, what would be the Julia interface for using rint and nearbyint?

ArchRobison · 2014-09-18T23:12:52Z

I think I have a solution for round: use Julia instead of calling C :-)

function round(x::Float32)
    y = trunc(x)
    ifelse(x==y,y,trunc(x-y+x))
end

Please check this proof of correctness:

If x is +/- 0, infinity, or an integer, then x==y is true and the ifelse returns y.
If x is a NaN, then y is the same NaN value. The NaN propagates through the trunc(x-y+x).
Otherwise x is finite and not an integer. x-y will be computed exactly and have the same fractional part as x. When we add the x-y to x, the resulting sum is exact. To see this, note that we need at most one extra bit for the carry-out of the addition, but the least-significant 1s of the two fractions sum to 10 (binary), so we can toss the 0 to save a bit without loss of accuracy. If the fraction's absolute value was less than 0.5, then the sum will not cross into the next integer, otherwise it will.

With this definition of rounding, and LLVM trunc intrinsic support, I am seeing 10x improvement in an @simd loop using LLVM 3.5, and some improvement for scalar execution even with LLVM 3.3.

Of course you can cheat and do the proof by exhaustive search.
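The cases in the proof can be traced on concrete values. In this sketch the function is renamed `round_away` (a hypothetical name) so that `Base.round` is left untouched, and the sample inputs are illustrative.

```julia
# Same algorithm as above, under a different name.
function round_away(x::Float32)
    y = trunc(x)
    ifelse(x == y, y, trunc(x - y + x))
end

r1 = round_away(2.25f0)   # y = 2,  x - y = 0.25, x - y + x = 2.5,  trunc gives 2
r2 = round_away(2.5f0)    # y = 2,  x - y = 0.5,  x - y + x = 3.0,  trunc gives 3
r3 = round_away(-2.5f0)   # y = -2, x - y = -0.5, x - y + x = -3.0, trunc gives -3
r4 = round_away(7.0f0)    # integer input: x == y, so y = 7 is returned
r5 = round_away(NaN32)    # NaN in, NaN out
```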

ArchRobison · 2014-09-19T01:19:23Z

Here's a version with lower latency, since the 2*x can issue while trunc(x) is running.

function round(x::Float32)
    y = trunc(x)
    ifelse(x==y,y,trunc(2*x-y))
end

The rewrite of x-y+x as 2*x-y is okay because:

The computation of 2*x is exact when x!=trunc(x), since a non-integer x is far too small for the doubling to overflow.
The subtraction in 2*x-y is exact, because it must have the same result as x-y+x rounded to nearest, and the proof of the previous version showed that x-y+x is computed exactly.
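Both claims can be spot-checked on a handful of values. This is a sketch rather than a proof: the names `round_v1` and `round_v2` and the sample list are made up for illustration, and the list includes the largest non-integer Float32, 2^23 - 0.5.

```julia
round_v1(x::Float32) = (y = trunc(x); ifelse(x == y, y, trunc(x - y + x)))
round_v2(x::Float32) = (y = trunc(x); ifelse(x == y, y, trunc(2 * x - y)))

# Illustrative inputs: ties, a subnormal, the largest non-integers, Inf and NaN.
samples = Float32[0.25, 0.5, -0.5, 1.5, -2.5, 1.0f-40, 8388607.5, -8388607.5, Inf, NaN]

# The two formulations return the same value (isequal also matches NaN with NaN).
forms_agree = all(isequal(round_v1(x), round_v2(x)) for x in samples)

# Doubling in Float32 matches doubling in Float64, i.e. no rounding occurred.
doubling_exact = all(Float64(2 * x) == 2 * Float64(x) for x in samples if isfinite(x))
```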

timholy · 2014-09-19T01:37:41Z

That's very clever.

simonbyrne · 2014-09-19T09:52:13Z

+Inf

The logic seems correct to me. The nice thing about Float32 is that you can also just check all the values:

function round(x::Float32)
    y = trunc(x)
    ifelse(x==y,y,trunc(2*x-y))
end

function check_round()
    for u in 0x0000_0000:0xffff_ffff
        x = reinterpret(Float32,u)
        isequal(round(x),Base.round(x)) || error("Invalid round: ", x)
    end
end

check_round()

which passes okay (and only takes 77 seconds on my underpowered laptop).

EDIT: Oops, I see you've done that already.
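A note for anyone rerunning this on a current release: as far as I know, `Base.round(x)` in Julia 1.x rounds ties to even by default, so the comparison above would have to target `round(x, RoundNearestTiesAway)` instead. The sketch below compares against an independent reference computed in Float64, and visits only every 65537th bit pattern to stay fast. The names `round_away`, `reference`, and `check_round_sampled` are hypothetical; passing `0x0000_0001` as the step gives the full exhaustive run.

```julia
round_away(x::Float32) = (y = trunc(x); ifelse(x == y, y, trunc(2 * x - y)))

# Reference: floor(|x| + 0.5) with the sign of x, evaluated in Float64,
# which is assumed to carry enough precision for every Float32 input.
reference(x::Float32) = Float32(copysign(floor(abs(Float64(x)) + 0.5), Float64(x)))

function check_round_sampled(step::UInt32 = 0x0001_0001)
    for u in 0x0000_0000:step:0xffff_ffff
        x = reinterpret(Float32, u)
        isequal(round_away(x), reference(x)) || error("Invalid round: ", x)
    end
    return true
end

ok = check_round_sampled()
```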

StefanKarpinski · 2014-09-19T11:50:42Z

Very nice. I do love being able to check every value. It's hard to trust anything else – proofs can be wrong!

ArchRobison · 2014-09-19T14:59:30Z

Thanks @simonbyrne for showing how to do the exhaustive proof in Julia. I had been using a C variant because I didn't know about reinterpret. For the record, the Intel compiler's code for vectorized roundf is about 1.26x faster than the Julia code on a Haswell processor that I tried, but the icc code requires copysign, which LLVM can't vectorize.

I'll update the pull request with the fast round.

eschnett · 2014-09-19T15:25:30Z

LLVM can't vectorize copysign? That's... very surprising; copysign is a rather simple function that is very easy to vectorize. One hopes that this changes soon.
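To make the "simple function" point concrete: on IEEE floats copysign reduces to two masks and an OR per element. The sketch below uses a hypothetical name, `copysign_bits`, and says nothing about why LLVM of that era did not vectorize its own intrinsic.

```julia
# Hypothetical name; Base already provides copysign.
function copysign_bits(x::Float32, y::Float32)
    magnitude = reinterpret(UInt32, x) & 0x7fff_ffff  # clear the sign bit of x
    sign      = reinterpret(UInt32, y) & 0x8000_0000  # keep only the sign bit of y
    reinterpret(Float32, magnitude | sign)
end

c1 = copysign_bits(1.5f0, -0.0f0)
c2 = copysign_bits(-2.0f0, 3.0f0)
```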

JeffBezanson · 2014-09-19T15:33:16Z

I can never remember the differences between rint and nearbyint. Those names are just terrible.

JeffBezanson · 2014-09-19T15:34:31Z

test/numbers.jl

@@ -1312,6 +1312,21 @@ for x = 2^24-10:2^24+10
    @test iceil(y)      == i
 end

+# rounding vectors
+let ≈(x,y) = x==y || typeof(x)==typeof(y)


Do you mean && instead of || here?

ArchRobison · 2014-09-19

Yes, it should be &&. The || is a mistake left over from my earlier != work-around.

jwmerrill · 2014-09-23T02:00:16Z

Aren't different rounding modes required for doing interval arithmetic correctly?

ArchRobison · 2014-09-23T14:26:04Z

Different rounding modes are required for interval arithmetic, but it requires changing the mode frequently. E.g., a+b requires a "round up +" and a "round down +" operation. Multiplication gets trickier. So lexical scoping, or explicitly writing the operations with different names, seems more appropriate than having global state control.
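As an illustration of "explicitly writing the operations with different names", here is a sketch of interval addition that never touches global rounding state. Note the swapped-in technique: instead of true directed rounding, each bound is widened by one unit in the last place with `prevfloat`/`nextfloat`, which is looser than a hardware round-down or round-up add. The names `add_down`, `add_up`, and `interval_add` are hypothetical.

```julia
# Conservative stand-ins for "round down +" and "round up +".
add_down(a::Float64, b::Float64) = prevfloat(a + b)  # a lower bound on the exact sum
add_up(a::Float64, b::Float64)   = nextfloat(a + b)  # an upper bound on the exact sum

# An interval [lo, hi] is represented as a tuple for brevity.
interval_add(x, y) = (add_down(x[1], y[1]), add_up(x[2], y[2]))

s = interval_add((0.1, 0.1), (0.2, 0.2))
```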

simonbyrne · 2014-10-01T17:58:14Z

At the risk of getting sidetracked: according to the standard, rounding mode handling doesn't have to be always dynamically scoped, there just has to be a way to set it so that it is dynamically scoped (precedence of these is also language-defined). C just happened to only implement the dynamic-mode interface, and other languages seem to have just copied that.

Add tests for vector trunc, round, floor, ceil. Add fast algorithm for round.

ArchRobison · 2014-12-09T19:17:33Z

Holiday clearance time 😃 I think this PR is in a good state to commit, and (as noted before) if ccall is improved, we can redo ceil_llvm, floor_llvm, trunc_llvm, sqrt_llvm, powi_llvm, abs_float, and copysign_float.

simonbyrne · 2014-12-09T19:21:24Z

+1

simonbyrne · 2014-12-09T19:48:58Z

base/float.jl

-floor(x::Float64) = ccall((:floor, Base.libm_name), Float64, (Float64,), x)
+function round(x::Float64)
+    y = trunc(x)
+    ifelse(x==y,y,trunc(2.0*x-y))


I realise I should have said this beforehand, but won't 2.0*x overflow for large values of x?

ArchRobison · 2014-12-09

Yes, 2.0*x can overflow. But in that case, x is so large that x==trunc(x), so the other arm of the ifelse is taken.

StefanKarpinski · 2014-12-09

Large values of x are already integers so x == y and y is returned. At least that's my analysis.

simonbyrne · 2014-12-09

Ah, of course.
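The overflow case is easy to check. In this sketch `round_away` is a hypothetical name for the Float64 method in the diff. `ifelse` evaluates both arms, so the overflowed value is computed and then discarded.

```julia
round_away(x::Float64) = (y = trunc(x); ifelse(x == y, y, trunc(2.0 * x - y)))

big_x = floatmax(Float64)
overflowed = 2.0 * big_x    # Inf: the doubling does overflow
result = round_away(big_x)  # still big_x, because x == trunc(x) selects y
```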

ViralBShah · 2014-12-09T20:40:57Z

Sweet!

ArchRobison force-pushed the adr/floor branch from 869b764 to 327f62e Compare September 15, 2014 20:09

ArchRobison force-pushed the adr/floor branch from 327f62e to 3848b4c Compare September 16, 2014 15:11

ArchRobison force-pushed the adr/floor branch from 3848b4c to b389e26 Compare September 16, 2014 15:34

ArchRobison force-pushed the adr/floor branch from b389e26 to bfc600f Compare September 18, 2014 20:58

ArchRobison force-pushed the adr/floor branch from bfc600f to 81c614a Compare September 19, 2014 15:21

JeffBezanson reviewed Sep 19, 2014

ArchRobison force-pushed the adr/floor branch from 01101d7 to 7bba8a1 Compare September 30, 2014 21:21

ArchRobison force-pushed the adr/floor branch from 7bba8a1 to fcee85d Compare October 6, 2014 22:13

ArchRobison force-pushed the adr/floor branch 2 times, most recently from 96fc421 to c06a3c0 Compare October 16, 2014 18:54

simonbyrne mentioned this pull request Oct 21, 2014

round ties behaviour #8750

Closed

ArchRobison force-pushed the adr/floor branch 2 times, most recently from 4b07849 to 7e58dfa Compare October 24, 2014 14:06

ArchRobison force-pushed the adr/floor branch from 7e58dfa to a5429b0 Compare November 11, 2014 23:32

simonbyrne mentioned this pull request Nov 24, 2014

itrunc -> trunc, etc, fix Int128 vs float comparisons #9133

Merged

ArchRobison force-pushed the adr/floor branch from 913d8fd to 1f26721 Compare November 25, 2014 17:19

ArchRobison force-pushed the adr/floor branch from 1f26721 to 2a30d71 Compare December 2, 2014 18:54

Add LLVM intrinsics for floor/ceil/trunc/abs.

bde8f79

Add tests for vector trunc, round, floor, ceil. Add fast algorithm for round.

ArchRobison force-pushed the adr/floor branch from 2a30d71 to bde8f79 Compare December 8, 2014 20:09

JeffBezanson added a commit that referenced this pull request Dec 9, 2014

Merge pull request #8364 from ArchRobison/adr/floor

15fafce

Add LLVM intrinsics for floor/ceil/trunc/abs.

JeffBezanson merged commit 15fafce into JuliaLang:master Dec 9, 2014

simonbyrne reviewed Dec 9, 2014

simonbyrne mentioned this pull request Dec 11, 2014

Use LLVM intrinsics for floor, ceil, trunc, and round #5983

Closed

tkelman mentioned this pull request Jan 3, 2015

Added tests for math.jl functions #9568

Merged

hayd mentioned this pull request Jan 6, 2015

CLN move isqrt test to intfunc #9636

Merged
