
WIP: fix slowness of randn introduced by 84b6aec6 #8941

Merged 1 commit into master on Nov 10, 2014

Conversation

ViralBShah (Member)

Quoting @rfourquet:

@ViralBShah Great! I remember having tested an inlined rand() at some point (before #8854, I think, and without the other @inlines of this commit), but noticed degraded performance elsewhere (I think for the array version, rand(n::Int)). What I find now is that this commit slows down the randn[!] methods by up to 50%; can you confirm?

I just pushed a branch where randmtzig_randn uses a non-inlined version of rand, which seems to restore the previous performance of randn, but this starts to make me uneasy with all those @inline tweaks. Should I open a PR? (In that branch I also revert/modify part of this commit, because I think rand_close_open is dSFMT-specific, so I prefer to keep rand(r::AbstractRNG) = rand(r, Float64), which I find more self-documenting. Do you agree?)

@ViralBShah (Member, Author)

Do you think that if we simplify the fast code path in randn and @inline it, while moving all the slow code paths to a different function, we will get good performance for randn? I was trying that, but didn't quite finish the exploration.

Cc: @andreasnoack @vtjnash for the inlining sensitivity.
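
For concreteness, a structure-only sketch of that split, written in the Julia 0.3-era style used elsewhere in this thread. The helper names, the 0.99 threshold, and the sampling logic are placeholders and not the actual ziggurat code in base/random.jl; the point is only the shape: a tiny @inline fast path, with the rare fallback kept in a separate, deliberately non-inlined function.

accepts_fast(u) = u < 0.99              # stand-in for the cheap ziggurat accept test

@inline function fastpath_sample(r::MersenneTwister)
    u = rand(r)
    accepts_fast(u) && return u         # hot path: stays small, no call overhead once inlined
    return slowpath_sample(r, u)        # cold path: an ordinary, non-inlined call
end

# deliberately not @inline: the rare tail/wedge handling would live here
slowpath_sample(r::MersenneTwister, u) = rand(r)   # placeholder body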

@ViralBShah added the "performance (Must go faster)" label on Nov 8, 2014
@vtjnash (Member) commented Nov 8, 2014

I don't know of any reason that inlining should make something slower. In general, though, I don't recommend scattering @inbounds everywhere (it shouldn't be needed).

@rfourquet (Member)

I will try to see whether randn performance can be improved when I get back to my fast computer in two weeks.

@ViralBShah (Member, Author)

BTW, I am comparing the randn performance on master vs. 0.3.2, and I am only seeing at most a 10% difference.

@rfourquet (Member)

I compared master vs. master with 84b6aec reverted (or, equivalently, with this PR applied). On my machine, the scalar (resp. array) version is about 55% (resp. 35%) slower on master, tested with these functions:

function scalN(n)            # time n scalar randn draws from an explicit RNG
    m = MersenneTwister()
    tic()
    for i in 1:n
        randn(m)
    end
    toc()
end

function arrN(n, r=1)        # time r in-place randn! fills of an n-element array
    a = Array(Float64, n)    # Julia 0.3 syntax for an uninitialized Vector{Float64}
    m = MersenneTwister()
    tic()
    for i in 1:r
        randn!(m, a)
    end
    toc()
end
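
(For reference, an example invocation; later in the thread these are combined with their global-RNG counterparts into a single fun(10^7) call:)

scalN(10^7)    # time 10^7 scalar draws
arrN(10^7)     # time one randn! fill of a 10^7-element array (r defaults to 1)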

With the equivalent functions working with the global RNG, the timings are slightly different but of the same order.

@rfourquet (Member)

While preparing another PR, I found another fix for the slowdown: get the random values from the MersenneTwister.vals field starting from the end of the array. If this fix is preferred, I can include it in that upcoming PR (note that this implies that many hardcoded values in test/random.jl have to be modified accordingly).

@ViralBShah (Member, Author)

Could you elaborate a little on what this means? I don't fully understand the downsides of reading from the end of the array, or why it may be faster. I had only tested the global RNG, which is the most common case, and found the timings to be the same. Good to know that we both see the same results there.

Focusing on the rand(m) case now, it would be good to understand why the inlined version is slow and the non-inlined version is faster. @vtjnash can perhaps help us here. Also, let's remove as many of the @inbounds and @inlines as we can to get things down to the minimum before we make more changes.

Modifying hardcoded values in test/random.jl is perfectly ok.

@ViralBShah changed the title from "fix slowness of randn introduced by 84b6aec6" to "WIP: fix slowness of randn introduced by 84b6aec6" on Nov 10, 2014
@rfourquet (Member)

Sorry, I was unclear: the timings using the following functions are almost the same as those I posted above with an explicit RNG, which means that even with the global RNG there is a slowdown of about 30% (array version) and 50% (scalar version) that is fixed by this PR.

function scalNG(n)           # same as scalN, but using the global RNG
    tic()
    for i in 1:n
        randn()
    end
    toc()
end

function arrNG(n, r=1)       # same as arrN, but using the global RNG
    a = Array(Float64, n)
    tic()
    for i in 1:r
        randn!(a)
    end
    toc()
end

Regarding the other fix I mentioned, I really don't understand what is happening, and it is a very fragile solution too. I also found that it could slightly decrease the performance of other functions. I'll post a small patch later that can be discussed.

@rfourquet (Member)

The above-mentioned patch is against https://github.com/JuliaLang/julia/blob/rf/rand-fast-array/base/random.jl#L29 and changes all the mt_* functions:

@inline mt_avail(r::MersenneTwister) = r.idx
@inline mt_empty(r::MersenneTwister) = r.idx == 0
@inline mt_setfull!(r::MersenneTwister) = r.idx = length(r.vals)
@inline mt_setempty!(r::MersenneTwister) = r.idx = 0
@inline mt_pop!(r::MersenneTwister) = (@inbounds f = r.vals[r.idx]; r.idx -= 1; f)
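
For context, a sketch of how these accessors would combine in a scalar draw, assuming the MersenneTwister of the linked branch (with vals and idx fields). refill! is a hypothetical stand-in for the dSFMT step that regenerates r.vals, not a real function on that branch:

refill!(r::MersenneTwister) = nothing   # no-op placeholder for regenerating r.vals

function mt_next(r::MersenneTwister)
    if mt_empty(r)        # idx == 0: the cache is exhausted
        refill!(r)        # regenerate r.vals (placeholder above)
        mt_setfull!(r)    # idx points at the last entry of r.vals again
    end
    mt_pop!(r)            # read vals[idx], then decrement idx toward 0
end

With these definitions the emptiness check is simply r.idx == 0, and the cache is consumed from the end of r.vals downward.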

ViralBShah pushed a commit that referenced this pull request Nov 10, 2014
WIP: fix slowness of randn introduced by 84b6aec
@ViralBShah merged commit 4b4655e into master on Nov 10, 2014
@ViralBShah (Member, Author)

I still don't see a change pre/post this PR in randn() or randn(m), for either the scalar or the array case. Could you post the timings you are seeing in all cases?

@ViralBShah (Member, Author)

@rfourquet I just reverted this one, as I was unable to see better performance after merging it. Let's revisit this issue after the other open PRs have been merged.

@rfourquet (Member)

The timings I get, with fun(n) = (scalN(n); scalNG(n); arrN(n); arrNG(n)); fun(10^7), are:
On a 3-day-old master (64aa84d):

elapsed time: 1.153464712 seconds
elapsed time: 1.156994426 seconds
elapsed time: 1.293826568 seconds
elapsed time: 1.369713691 seconds

With this PR:

elapsed time: 0.733746595 seconds
elapsed time: 0.738688603 seconds
elapsed time: 0.927415479 seconds
elapsed time: 1.017022933 seconds 

@ViralBShah (Member, Author)

With the current master on my 2.4 GHz Core i5 Mac, I get numbers that are several times faster than yours.

julia> fun(n) = (scalN(n); scalNG(n); arrN(n); arrNG(n) ); fun(10^7)
elapsed time: 0.188870645 seconds
elapsed time: 0.183882291 seconds
elapsed time: 0.218336373 seconds
elapsed time: 0.216428959 seconds

@rfourquet (Member)

Yes, my laptop is quite slow (Core 2 Duo, 1.20 GHz)... But what are your numbers with this PR (12420d2)?

@ViralBShah (Member, Author)

With 12420d2 I get the following:

julia> fun(n) = (scalN(n); scalNG(n); arrN(n); arrNG(n) ); fun(10^7)
elapsed time: 0.170618011 seconds
elapsed time: 0.172811678 seconds
elapsed time: 0.179000492 seconds
elapsed time: 0.204780084 seconds

@rfourquet (Member)

OK, so this PR does not seem particularly useful in this case. Do you think the larger speed differences I observe could be due to a smaller cache on my computer?

@ViralBShah (Member, Author)

Can't really tell. I think we can keep the codebase clean for now, and keep an eye out for other regressions.

@ViralBShah added the "domain:randomness (Random number generation and the Random stdlib)" label on Nov 22, 2014
rfourquet added a commit that referenced this pull request Nov 24, 2014
All credits to @ViralBShah (cf. #8941 and #9126).
This change probably allows better inlining.