Optimize DisjointSets #36

wsliang · 2014-05-26T17:25:17Z

The disjoint-set data structure seemed a bit slow, over 3x the speed of a C++ reference:

~/disjoint > ./disjoint
0.061481
~/disjoint > ./disjoint
0.061004
~/disjoint > ./disjoint
0.061247

~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                       master=
elapsed time: 0.193454033 seconds (16006792 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                       master=
elapsed time: 0.192407743 seconds (16006856 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                       master=
elapsed time: 0.195339228 seconds (16006824 bytes allocated)

Making the type immutable results in an almost 2x speed increase, as well as eliminating essentially all of the memory allocation. This is within 1.7x C++ speed.

~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                     disjoint=
elapsed time: 0.103117607 seconds (6888 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                     disjoint=
elapsed time: 0.109216987 seconds (6888 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                     disjoint=
elapsed time: 0.103264055 seconds (6888 bytes allocated)

kmsquire · 2014-05-26T18:36:15Z

The IntWrapper is quite clever! Can you post the C++ code?

I'm inclined to merge this, but I'm wondering if @JeffBezanson (or anyone else) has any thoughts on this optimization. It would be nice if we didn't have to resort to this to get good performance.

kmsquire · 2014-05-26T18:40:53Z

src/disjoint_set.jl

@@ -57,7 +59,7 @@ function union!(s::IntDisjointSets, x::Integer, y::Integer)
                s.ranks[xroot] += 1
            end
        end
-        s.ngroups -= 1
+        @inbounds s.ngroups.val -= 1


@inbounds shouldn't be necessary here or below. Did you test it both ways?

Yes, this was something that confused me, and I was trying to figure out how it works.

Without the @inbounds macro, I get times and memory allocation like:

~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl disjoint *= elapsed time: 0.152123427 seconds (16006872 bytes allocated)

wsliang · 2014-05-26T18:52:38Z

Here's the C++ code. I compiled it with -O2.

#include <cstdio>
#include <cstdlib>
#include <ctime>
using namespace std;

const int N = 2000000;
const int T = N/2;

int parents[N];
int ranks[N];
int num_groups;

int x[T];
int y[T];

int find_root(int x)
{
    int p = parents[x];
    if (p != x)
        parents[x] = p = find_root(p);
    return p;
}

void merge(int x, int y)
{
    int xroot = find_root(x);
    int yroot = find_root(y);
    if (xroot != yroot)
    {
        int xrank = ranks[xroot];
        int yrank = ranks[yroot];

        if (xrank < yrank) parents[xroot] = yroot;
        else
        {
            parents[yroot] = xroot;
            if (xrank == yrank) ranks[xroot] += 1;
        }

        num_groups--;
    }
}

int main()
{
    for (int i = 0; i < N; i++)
    {
        parents[i] = i;
        ranks[i] = 0;
    }
    num_groups = N;

    for (int i = 0; i < T; i++)
    {
        x[i] = rand() % N;
        y[i] = rand() % N;
    }

    clock_t start = clock();
    for (int i = 0; i < T; i++)
        merge(x[i], y[i]);
    clock_t end = clock();

    printf("%lf\n", (end - start) / (double)(CLOCKS_PER_SEC));
}

kmsquire · 2014-05-26T18:53:53Z

Interestingly, when I test this, I get less memory allocation, but almost no speedup:

$ git co master
Switched to branch 'master'
$ julia bench_disjoint_set.jl 
elapsed time: 0.207052933 seconds (16006840 bytes allocated)
$ git co test
Switched to branch 'test'
$ julia bench_disjoint_set.jl 
elapsed time: 0.199057531 seconds (6888 bytes allocated)
$ julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+3201
Commit 7807302* (2014-05-24 21:21 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

What does your versioninfo() look like?

wsliang · 2014-05-26T18:57:19Z

I'm on

~ > julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+3213
Commit 631b022 (2014-05-26 00:59 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

kmsquire · 2014-05-26T19:19:16Z

Hmm. Even after recompiling, I don't get a speedup. Probably a difference in generated code for the two processors. @ihnorton, @vtjnash, @loladiro, any thoughts?

kmsquire · 2014-05-26T19:35:00Z

Gist of

require("bench_disjoint_set.jl")
@code_native union!(s, x[1], y[1])
@code_llvm union!(s, x[1], y[1])

at https://gist.github.com/kmsquire/f9263e55326d624e4821

lindahua · 2014-05-26T19:39:16Z

I still don't understand why this modification makes things faster -- even you turn IntDisjointSets into an immutable type, you have to use a mutable type IntWrapper (or Vector{Int}) to hold the number of groups.

wsliang · 2014-05-26T20:02:47Z

Yes, I agree this is suspicious.

There is something strange going on with the memory allocation. Even in the original master branch, if I just add an @inbounds macro call in front s.ngroups -= 1 on line 60 of disjoint_set.jl, I somehow eliminate all the extra memory allocation:

~/.julia/v0.3/DataStructures/src > julia ../test/bench_disjoint_set.jl                      inbounds 
elapsed time: 0.122493139 seconds (6888 bytes allocated)

Can anyone else confirm this behavior?

lindahua · 2014-05-26T20:14:21Z

I am looking into this. My feeling is that there is something subtle here. I am tweaking the code a bit and see how it fares.

lindahua · 2014-05-26T21:11:04Z

I tweaked the implementation of IntDisjointSet a little bit (see 3ffe27a).

It is now a little bit faster on my machine. I also tried your idea of turning IntDisjointSet into an immutable type using an IntWrapper (implemented in this branch.)

On my machine, the immutable + IntWrapper version is not faster than the current improved version.

wsliang · 2014-05-26T21:25:34Z

Yes, I think the idea of making the type immutable is a bit of a red herring. To me, the real mystery is why the @inbounds macro helps so much for field access of types.

If I remove the macro from line 65 of 3ffe27a, I get the old performance. Why? I was under the impression that @inbounds speeds up array accesses, but there are no arrays in the macro call on that line.

kmsquire · 2014-08-31T15:53:08Z

I'm wondering if this is still relevant?

Remove trailing whitespace

f64c5b0

wsliang changed the title ~~Disjoint~~ Oprimize DisjointSets May 26, 2014

wsliang changed the title ~~Oprimize DisjointSets~~ Optimize DisjointSets May 26, 2014

Make DisjointSets immutable

d587d45

kmsquire reviewed May 26, 2014
View reviewed changes

wsliang closed this Sep 10, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize DisjointSets #36

Optimize DisjointSets #36

wsliang commented May 26, 2014

kmsquire commented May 26, 2014

kmsquire May 26, 2014

wsliang May 26, 2014

wsliang commented May 26, 2014

kmsquire commented May 26, 2014

wsliang commented May 26, 2014

kmsquire commented May 26, 2014

kmsquire commented May 26, 2014

lindahua commented May 26, 2014

wsliang commented May 26, 2014

lindahua commented May 26, 2014

lindahua commented May 26, 2014

wsliang commented May 26, 2014

kmsquire commented Aug 31, 2014

Optimize DisjointSets #36

Optimize DisjointSets #36

Conversation

wsliang commented May 26, 2014

kmsquire commented May 26, 2014

kmsquire May 26, 2014

Choose a reason for hiding this comment

wsliang May 26, 2014

Choose a reason for hiding this comment

wsliang commented May 26, 2014

kmsquire commented May 26, 2014

wsliang commented May 26, 2014

kmsquire commented May 26, 2014

kmsquire commented May 26, 2014

lindahua commented May 26, 2014

wsliang commented May 26, 2014

lindahua commented May 26, 2014

lindahua commented May 26, 2014

wsliang commented May 26, 2014

kmsquire commented Aug 31, 2014