Optimize DisjointSets #36
I'm inclined to merge this, but I'm wondering if @JeffBezanson (or anyone else) has any thoughts on this optimization. It would be nice if we didn't have to resort to this to get good performance.
@@ -57,7 +59,7 @@ function union!(s::IntDisjointSets, x::Integer, y::Integer)
 s.ranks[xroot] += 1
 end
 end
-s.ngroups -= 1
+@inbounds s.ngroups.val -= 1
@inbounds shouldn't be necessary here or below. Did you test it both ways?
Yes, this was something that confused me, and I was trying to figure out how it works.
Without the @inbounds macro, I get times and memory allocation like:
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl disjoint *=
elapsed time: 0.152123427 seconds (16006872 bytes allocated)
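For anyone who wants to try the "both ways" comparison in isolation, here is a minimal sketch in current Julia syntax (not the 0.3 syntax used in this PR). `GroupCount`, `dec_plain!`, `dec_inbounds!`, and `measure` are hypothetical names made up for illustration; they are not part of DataStructures.jl, and the loop is only a stand-in for the counter update in union!, not the actual benchmark script. Since @inbounds only elides array bounds checks, and a field update involves no indexing, one would expect the two variants to behave the same, but the numbers on any given Julia version should be measured rather than assumed.

```julia
# Hypothetical stand-in for the wrapped counter field (not the package's type).
mutable struct GroupCount
    val::Int
end

# Decrement the counter n times without any macro.
function dec_plain!(c::GroupCount, n::Int)
    for _ in 1:n
        c.val -= 1
    end
    return c.val
end

# Same loop, with @inbounds applied to the field update.
function dec_inbounds!(c::GroupCount, n::Int)
    for _ in 1:n
        @inbounds c.val -= 1
    end
    return c.val
end

# Run once to compile, then record allocated bytes and elapsed time
# on fresh counters so compilation is not part of the measurement.
function measure(f, n::Int)
    f(GroupCount(n), n)                  # warm-up call
    c = GroupCount(n)
    bytes = @allocated f(c, n)
    c2 = GroupCount(n)
    seconds = @elapsed f(c2, n)
    return (result = c.val, bytes = bytes, seconds = seconds)
end

plain = measure(dec_plain!, 1_000_000)
inb = measure(dec_inbounds!, 1_000_000)
println("plain:    ", plain.seconds, " s, ", plain.bytes, " bytes")
println("inbounds: ", inb.seconds, " s, ", inb.bytes, " bytes")
```

The warm-up call matters because the first call to a Julia function includes compilation time and allocation, which would otherwise swamp a measurement this small.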
Here's the C++ code. I compiled it with
Interestingly, when I test this, I get less memory allocation, but almost no speedup:
$ git co master
Switched to branch 'master'
$ julia bench_disjoint_set.jl
elapsed time: 0.207052933 seconds (16006840 bytes allocated)
$ git co test
Switched to branch 'test'
$ julia bench_disjoint_set.jl
elapsed time: 0.199057531 seconds (6888 bytes allocated)
$ julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+3201
Commit 7807302* (2014-05-24 21:21 UTC)
Platform Info:
System: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm
What does your
I'm on
Gist of require("bench_disjoint_set.jl")
@code_native union!(s, x[1], y[1])
@code_llvm union!(s, x[1], y[1])
I still don't understand why this modification makes things faster -- even if you turn
Yes, I agree this is suspicious. There is something strange going on with the memory allocation. Even in the original
Can anyone else confirm this behavior?
I am looking into this. My feeling is that there is something subtle here. I am tweaking the code a bit to see how it fares.
I tweaked the implementation of It is now a little bit faster on my machine. I also tried your idea of turning On my machine, the immutable + IntWrapper version is not faster than the current improved version.
Yes, I think the idea of making the type If I remove the macro from line 65 of 3ffe27a, I get the old performance. Why? I was under the impression that
I'm wondering if this is still relevant?
The disjoint-set data structure seemed a bit slow, taking over 3x as long as a C++ reference:
Making the type immutable results in an almost 2x speed increase and eliminates essentially all of the memory allocation. This brings it to within 1.7x of the C++ running time.
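To make the idea concrete, here is a minimal sketch of an immutable disjoint-set type whose group count lives in a small mutable wrapper, written in current Julia syntax (`struct` / `mutable struct` rather than the 0.3-era `immutable` / `type`). It is an illustration under assumptions, not the code in this PR: `SketchDisjointSets`, `find_root!`, and `union_sets!` are hypothetical names, and the `IntWrapper` definition is a guess based on the `s.ngroups.val` access in the diff.

```julia
# Assumed shape of the wrapper: a mutable box so an immutable struct
# can still carry a count that changes.
mutable struct IntWrapper
    val::Int
end

# Immutable container: the field bindings are fixed, but the vectors and
# the wrapper they refer to can still be mutated in place.
struct SketchDisjointSets
    parents::Vector{Int}
    ranks::Vector{Int}
    ngroups::IntWrapper
end

# Start with n singleton groups, each element its own parent.
SketchDisjointSets(n::Integer) =
    SketchDisjointSets(collect(1:n), zeros(Int, n), IntWrapper(n))

# Find the root of x, then point every node on the path directly at it.
function find_root!(s::SketchDisjointSets, x::Integer)
    parents = s.parents
    root = x
    while parents[root] != root
        root = parents[root]
    end
    while parents[x] != root
        next = parents[x]
        parents[x] = root
        x = next
    end
    return root
end

# Union by rank; the group count is updated through the wrapper.
function union_sets!(s::SketchDisjointSets, x::Integer, y::Integer)
    xroot = find_root!(s, x)
    yroot = find_root!(s, y)
    xroot == yroot && return xroot
    if s.ranks[xroot] < s.ranks[yroot]
        xroot, yroot = yroot, xroot
    end
    s.parents[yroot] = xroot
    if s.ranks[xroot] == s.ranks[yroot]
        s.ranks[xroot] += 1
    end
    s.ngroups.val -= 1
    return xroot
end

s = SketchDisjointSets(5)
union_sets!(s, 1, 2)
union_sets!(s, 3, 4)
union_sets!(s, 2, 4)
println("groups remaining: ", s.ngroups.val)
```

The wrapper is needed because a plain `Int` field of an immutable struct cannot be reassigned, while the two vectors can already be updated in place without it.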