Prevent allocations for FD{Int128} by freezing max_exp10(Int128) #41

NHDaly · 2018-12-04T18:42:31Z

This is a follow-up to https://github.com/JuliaMath/FixedPointDecimals.jl/pull. It seems that removing applicable wasn't sufficient for the case where T=Int128.

Before this PR, reinterpret(FD{Int128,f}, x) allocates because:

1. max_exp10(Int128) widens the Int128 to a BigInt, which causes allocations.
2. The result of max_exp10(Int128) isn't getting const-folded away, so the above widen operation occurs every time it's called at runtime.

julia> @btime reinterpret(FixedPointDecimals.FD{Int64,2}, 200)
  1.704 ns (0 allocations: 0 bytes)
>> FixedDecimal{Int64,2}(2.00)

julia> @btime reinterpret(FixedPointDecimals.FD{Int128,2}, 200)
  5.090 μs (152 allocations: 2.61 KiB)
>> FixedDecimal{Int128,2}(2.00)
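
To make the cause concrete, here is a minimal sketch of the kind of loop that the typed code further down suggests for max_exp10. It is a reconstruction for illustration, not the package's exact source, and the name max_exp10_sketch is hypothetical. The relevant detail is that the arithmetic is done in widen(T), and widen(Int128) is BigInt, which is heap-allocated.

```julia
# Hypothetical reconstruction (assumption): largest exponent e such that
# 10^e still fits in T, computed in the widened type to avoid overflow.
function max_exp10_sketch(::Type{T}) where {T<:Integer}
    limit = widen(T)(typemax(T))   # Int64 -> Int128, but Int128 -> BigInt
    power = one(limit)
    exponent = 0
    while limit > power
        power *= 10                # for BigInt, each multiply allocates
        exponent += 1
    end
    return exponent - 1
end

println(widen(Int64))              # Int128
println(widen(Int128))             # BigInt
println(max_exp10_sketch(Int64))   # 18
println(max_exp10_sketch(Int128))  # 38
```

For Int64 the whole loop runs on machine integers, so the compiler has a good chance of folding it. For Int128 every step goes through BigInt arithmetic, which is where the allocations in the benchmark above come from.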

I'm not entirely sure why max_exp10(Int128) isn't const-folding, but here's my guess:
Because max_exp10(::Type{Int128}) ends up more complicated (it allocates, etc.), I think it ends up not being inlined, whereas for other int types it is inlined. And because it's not inlined, LLVM isn't able to determine that its input is a static constant, so it doesn't know it can eliminate the function entirely.

Here we can see it being inlined for Int64 but not for Int128 (note the invoke of max_exp10 in the second listing):

julia> @code_typed reinterpret(FixedPointDecimals.FD{Int64,2}, 200)
>> CodeInfo(
   1 ──       nothing::Nothing                                                                                                                                  │
93 2 ┄─ %2  = φ (#1 => 1, #3 => %8)::Int128                                                                                                                     │╻  max_exp10
   │    %3  = φ (#1 => 0, #3 => %9)::Int64                                                                                                                      ││
   │    %4  = π (9223372036854775807, Int128)                                                                                                                   ││
   │    %5  = (Base.slt_int)(%2, %4)::Bool                                                                                                                      ││╻  >
   └───       goto #4 if not %5                                                                                                                                 ││
   3 ── %7  = π (10, Int128)                                                                                                                                    ││
   │    %8  = (Base.mul_int)(%2, %7)::Int128                                                                                                                    ││╻  *
   │    %9  = (Base.add_int)(%3, 1)::Int64                                                                                                                      ││╻  +
   └───       goto #2                                                                                                                                           ││
   4 ── %11 = (Base.sub_int)(%3, 1)::Int64                                                                                                                      ││╻  -
   └───       goto #5                                                                                                                                           ││
94 5 ── %13 = (Base.slt_int)(%11, 0)::Bool                                                                                                                      │╻  <
   └───       goto #7 if not %13                                                                                                                                │
   6 ──       goto #8                                                                                                                                           │
   7 ── %16 = $(Expr(:static_parameter, 2))::Const(2, false)                                                                                                    │
   └─── %17 = (Base.sle_int)(%16, %11)::Bool                                                                                                                    │╻  <=
   8 ┄─ %18 = φ (#6 => %13, #7 => %17)::Bool                                                                                                                    │
   └───       goto #10 if not %18                                                                                                                               │
95 9 ── %20 = %new(FixedDecimal{Int64,2}, i)::FixedDecimal{Int64,2}                                                                                             │
   └───       return %20                                                                                                                                        │
99 10 ─       invoke FixedPointDecimals._throw_storage_error($(Expr(:static_parameter, 2))::Int64, $(Expr(:static_parameter, 1))::Type, %11::Int64)::Union{}    │
   └───       $(Expr(:unreachable))::Union{}                                                                                                                    │
) => FixedDecimal{Int64,2}

julia> @code_typed reinterpret(FixedPointDecimals.FD{Int128,2}, 200)
>> CodeInfo(
93 1 ─ %1  = invoke FixedPointDecimals.max_exp10($(Expr(:static_parameter, 1))::Type{Int128})::Int64                                                                   │
94 │   %2  = (Base.slt_int)(%1, 0)::Bool                                                                                                                               │╻ <
   └──       goto #3 if not %2                                                                                                                                         │
   2 ─       goto #4                                                                                                                                                   │
   3 ─ %5  = $(Expr(:static_parameter, 2))::Const(2, false)                                                                                                            │
   └── %6  = (Base.sle_int)(%5, %1)::Bool                                                                                                                              │╻ <=
   4 ┄ %7  = φ (#2 => %2, #3 => %6)::Bool                                                                                                                              │
   └──       goto #6 if not %7                                                                                                                                         │
95 5 ─ %9  = (Base.sext_int)(Int128, i)::Int128                                                                                                                        │╻ rem
   │   %10 = %new(FixedDecimal{Int128,2}, %9)::FixedDecimal{Int128,2}                                                                                                  │
   └──       return %10                                                                                                                                                │
99 6 ─       invoke FixedPointDecimals._throw_storage_error($(Expr(:static_parameter, 2))::Int64, $(Expr(:static_parameter, 1))::Type, %1::Int64)::Union{}             │
   └──       $(Expr(:unreachable))::Union{}                                                                                                                            │
) => FixedDecimal{Int128,2}

And so, for Int64, LLVM is able to const-fold everything away, even though Julia wasn't able to:

julia> @code_native reinterpret(FixedPointDecimals.FD{Int64,2}, 200)
        .section        __TEXT,__text,regular,pure_instructions
; Function reinterpret {
; Location: FixedPointDecimals.jl:93
        decl    %eax
        movl    %esi, %eax
        retl
        nopw    %cs:(%eax,%eax)
;}

(I'm not including the @code_native for Int128, because it's very long.)

Before this commit, reinterpret(FD{Int128,f}, x) allocates because:

1. max_exp10(Int128) widens the Int128 to a BigInt, which causes allocations.
2. The result of max_exp10(Int128) isn't getting const-folded away, so the above widen operation occurs every time it's called at runtime.

This commit simply "freezes" the result for Int128s via a top-level `@eval` statement.
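
The freezing technique can be sketched as follows. The function names are hypothetical stand-ins rather than the package's definitions. The point is that `$(...)` inside a top-level `@eval` is evaluated once, when the definition is loaded, and the resulting integer literal is spliced into the method body.

```julia
# Hypothetical generic fallback (assumption): computes in widen(T),
# which is BigInt when T is Int128.
function max_exp10_generic(::Type{T}) where {T<:Integer}
    limit = widen(T)(typemax(T))
    power, exponent = one(limit), 0
    while limit > power
        power *= 10
        exponent += 1
    end
    return exponent - 1
end

# Generic entry point: defers to the loop for every integer type.
max_exp10_frozen(::Type{T}) where {T<:Integer} = max_exp10_generic(T)

# "Freeze" the Int128 case: the BigInt loop runs once, here, and the
# method body becomes the plain literal it produced.
@eval max_exp10_frozen(::Type{Int128}) = $(max_exp10_generic(Int128))

println(max_exp10_frozen(Int64))   # 18, via the generic loop
println(max_exp10_frozen(Int128))  # 38, via the frozen literal
```

Because the Int128 method is more specific than the generic one, dispatch selects it, and a caller only ever sees a method that returns a constant.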

NHDaly · 2018-12-04T18:47:36Z

Actually, another solution, which only just occurred to me as I was writing the above explanation, is to mark max_exp10 as Base.@pure. That seems to be enough of a hint to Julia that it can optimize away the entire call, so it disappears from the typed code before it even reaches LLVM:

julia> @code_typed reinterpret(FixedPointDecimals.FD{Int128,2}, 200)
>> CodeInfo(
94 1 ─      goto #3 if not false                                                                                         │
   2 ─      nothing::Nothing                                                                                             │
   3 ─      nothing::Nothing                                                                                             │
95 │   %4 = (Base.sext_int)(Int128, i)::Int128                                                                           │╻ rem
   │   %5 = %new(FixedDecimal{Int128,2}, %4)::FixedDecimal{Int128,2}                                                     │
   └──      return %5                                                                                                    │
) => FixedDecimal{Int128,2}

julia> @code_typed reinterpret(FixedPointDecimals.FD{Int64,2}, 200)
>> CodeInfo(
94 1 ─      goto #3 if not false                                                                                              │
   2 ─      nothing::Nothing                                                                                                  │
   3 ─      nothing::Nothing                                                                                                  │
95 │   %4 = %new(FixedDecimal{Int64,2}, i)::FixedDecimal{Int64,2}                                                             │
   └──      return %4                                                                                                         │
) => FixedDecimal{Int64,2}

Here's a commit where I did that solution, and got this result: 341c931
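
For readers who haven't seen the annotation, a rough sketch of what that alternative could look like is below. This is not the contents of the commit referenced above; the function name is hypothetical and the body is an assumed reconstruction. Base.@pure is documented as intended for internal compiler use, and newer Julia versions document Base.@assume_effects as the preferred replacement, so treat this as an illustration of the idea only.

```julia
# Hypothetical sketch (assumption): the same kind of loop, annotated so the
# compiler is allowed to evaluate calls with constant arguments at compile time.
# Caveat: the body calls generic functions (widen, *, >), which is exactly the
# sort of thing that makes it hard to be sure the annotation is valid.
Base.@pure function max_exp10_pure(::Type{T}) where {T<:Integer}
    limit = widen(T)(typemax(T))
    power, exponent = one(limit), 0
    while limit > power
        power *= 10
        exponent += 1
    end
    return exponent - 1
end

println(max_exp10_pure(Int64))   # 18
println(max_exp10_pure(Int128))  # 38
```

The annotation is a promise made by the programmer rather than something the compiler checks, which is the trade-off compared with freezing a single method via `@eval`.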

Maybe that's a better solution? But I've heard from people in the past that we should be wary of using Base.@pure, and that it's better to avoid it if possible. Does anyone else have an opinion on that?

@JeffBezanson @vtjnash: This is another example of what we talked about last week. Do you have a preference for either of these two solutions in this case?

coveralls · 2018-12-04T18:49:47Z

Coverage increased (+0.01%) to 98.837% when pulling 349643c on NHDaly:eval-maxexp10-Int128 into 1768c58 on JuliaMath:master.

TotalVerb

This LGTM. The @pure solution is interesting but I'm not sure myself on when it is safe and correct first.

codecov-io · 2018-12-06T04:21:51Z

Codecov Report

Merging #41 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
+ Coverage   98.82%   98.83%   +0.01%     
==========================================
  Files           1        1              
  Lines         170      172       +2     
==========================================
+ Hits          168      170       +2     
  Misses          2        2

Impacted Files	Coverage Δ
src/FixedPointDecimals.jl	`98.83% <100%> (+0.01%)`	⬆️

NHDaly · 2018-12-10T15:44:16Z

> This LGTM. The @pure solution is interesting but I'm not sure myself on when it is safe and correct first.

Yeah makes sense. I'm not really sure either. That said, I do see that there are lots of @pure functions in this file already..

But yeah, I think this PR is good to merge as-is if you're alright with it! :) Thanks!

TotalVerb · 2018-12-14T22:38:29Z

This looks good to me, so I'll merge it. I've invited you to collaborate on this repo; let me know if you did not get the invite link.

NHDaly · 2018-12-18T18:18:25Z

> This looks good to me, so I'll merge it.

:) Thanks!

> I've invited you to collaborate on this repo; let me know if you did not get the invite link.

Ah, sorry!! I really appreciate that. I've just accepted it now. I'm sorry I missed it -- thanks for the reminder!

TotalVerb approved these changes Dec 4, 2018

NHDaly mentioned this pull request Dec 6, 2018

Improve performance for FD multiplication: allow LLVM to optimize away the division by a constant. #43

Merged

Improve comment for freezing max_exp10(Int128)

349643c

TotalVerb merged commit 483325a into JuliaMath:master Dec 14, 2018

NHDaly deleted the eval-maxexp10-Int128 branch December 18, 2018 18:17

ghost pushed a commit to RelationalAI-oss/FixedPointDecimals.jl that referenced this pull request Dec 19, 2018

Remove @pure from maxexp10 after merging in JuliaMath#41

a8a1361

ghost pushed a commit to RelationalAI-oss/FixedPointDecimals.jl that referenced this pull request Dec 19, 2018

Remove @pure from maxexp10 after merging in JuliaMath#41

03ae80f
