Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize DisjointSets #36

Closed
wants to merge 2 commits into from
Closed

Optimize DisjointSets #36

wants to merge 2 commits into from

Conversation

wsliang
Copy link
Contributor

@wsliang wsliang commented May 26, 2014

The disjoint-set data structure seemed a bit slow, over 3x the speed of a C++ reference:

~/disjoint > ./disjoint
0.061481
~/disjoint > ./disjoint
0.061004
~/disjoint > ./disjoint
0.061247
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                       master=
elapsed time: 0.193454033 seconds (16006792 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                       master=
elapsed time: 0.192407743 seconds (16006856 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                       master=
elapsed time: 0.195339228 seconds (16006824 bytes allocated)

Making the type immutable results in an almost 2x speed increase, as well as eliminating essentially all of the memory allocation. This is within 1.7x C++ speed.

~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                     disjoint=
elapsed time: 0.103117607 seconds (6888 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                     disjoint=
elapsed time: 0.109216987 seconds (6888 bytes allocated)
~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                     disjoint=
elapsed time: 0.103264055 seconds (6888 bytes allocated)

@wsliang wsliang changed the title Disjoint Oprimize DisjointSets May 26, 2014
@wsliang wsliang changed the title Oprimize DisjointSets Optimize DisjointSets May 26, 2014
@kmsquire
Copy link
Member

The IntWrapper is quite clever! Can you post the C++ code?

I'm inclined to merge this, but I'm wondering if @JeffBezanson (or anyone else) has any thoughts on this optimization. It would be nice if we didn't have to resort to this to get good performance.

@@ -57,7 +59,7 @@ function union!(s::IntDisjointSets, x::Integer, y::Integer)
s.ranks[xroot] += 1
end
end
s.ngroups -= 1
@inbounds s.ngroups.val -= 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inbounds shouldn't be necessary here or below. Did you test it both ways?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was something that confused me, and I was trying to figure out how it works.

Without the @inbounds macro, I get times and memory allocation like:

~/.julia/v0.3/DataStructures/test > julia bench_disjoint_set.jl                    disjoint *= 
elapsed time: 0.152123427 seconds (16006872 bytes allocated)

@wsliang
Copy link
Contributor Author

wsliang commented May 26, 2014

Here's the C++ code. I compiled it with -O2.

#include <cstdio>
#include <cstdlib>
#include <ctime>
using namespace std;

const int N = 2000000;
const int T = N/2;

int parents[N];
int ranks[N];
int num_groups;

int x[T];
int y[T];

int find_root(int x)
{
    int p = parents[x];
    if (p != x)
        parents[x] = p = find_root(p);
    return p;
}

void merge(int x, int y)
{
    int xroot = find_root(x);
    int yroot = find_root(y);
    if (xroot != yroot)
    {
        int xrank = ranks[xroot];
        int yrank = ranks[yroot];

        if (xrank < yrank) parents[xroot] = yroot;
        else
        {
            parents[yroot] = xroot;
            if (xrank == yrank) ranks[xroot] += 1;
        }

        num_groups--;
    }
}

int main()
{
    for (int i = 0; i < N; i++)
    {
        parents[i] = i;
        ranks[i] = 0;
    }
    num_groups = N;

    for (int i = 0; i < T; i++)
    {
        x[i] = rand() % N;
        y[i] = rand() % N;
    }

    clock_t start = clock();
    for (int i = 0; i < T; i++)
        merge(x[i], y[i]);
    clock_t end = clock();

    printf("%lf\n", (end - start) / (double)(CLOCKS_PER_SEC));
}

@kmsquire
Copy link
Member

Interestingly, when I test this, I get less memory allocation, but almost no speedup:

$ git co master
Switched to branch 'master'
$ julia bench_disjoint_set.jl 
elapsed time: 0.207052933 seconds (16006840 bytes allocated)
$ git co test
Switched to branch 'test'
$ julia bench_disjoint_set.jl 
elapsed time: 0.199057531 seconds (6888 bytes allocated)
$ julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+3201
Commit 7807302* (2014-05-24 21:21 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

What does your versioninfo() look like?

@wsliang
Copy link
Contributor Author

wsliang commented May 26, 2014

I'm on

~ > julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+3213
Commit 631b022 (2014-05-26 00:59 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

@kmsquire
Copy link
Member

Hmm. Even after recompiling, I don't get a speedup. Probably a difference in generated code for the two processors. @ihnorton, @vtjnash, @loladiro, any thoughts?

@kmsquire
Copy link
Member

Gist of

require("bench_disjoint_set.jl")
@code_native union!(s, x[1], y[1])
@code_llvm union!(s, x[1], y[1])

at https://gist.github.com/kmsquire/f9263e55326d624e4821

@lindahua
Copy link
Contributor

I still don't understand why this modification makes things faster -- even you turn IntDisjointSets into an immutable type, you have to use a mutable type IntWrapper (or Vector{Int}) to hold the number of groups.

@wsliang
Copy link
Contributor Author

wsliang commented May 26, 2014

Yes, I agree this is suspicious.

There is something strange going on with the memory allocation. Even in the original master branch, if I just add an @inbounds macro call in front s.ngroups -= 1 on line 60 of disjoint_set.jl, I somehow eliminate all the extra memory allocation:

~/.julia/v0.3/DataStructures/src > julia ../test/bench_disjoint_set.jl                      inbounds 
elapsed time: 0.122493139 seconds (6888 bytes allocated)

Can anyone else confirm this behavior?

@lindahua
Copy link
Contributor

I am looking into this. My feeling is that there is something subtle here. I am tweaking the code a bit and see how it fares.

@lindahua
Copy link
Contributor

I tweaked the implementation of IntDisjointSet a little bit (see 3ffe27a).

It is now a little bit faster on my machine. I also tried your idea of turning IntDisjointSet into an immutable type using an IntWrapper (implemented in this branch.)

On my machine, the immutable + IntWrapper version is not faster than the current improved version.

@wsliang
Copy link
Contributor Author

wsliang commented May 26, 2014

Yes, I think the idea of making the type immutable is a bit of a red herring. To me, the real mystery is why the @inbounds macro helps so much for field access of types.

If I remove the macro from line 65 of 3ffe27a, I get the old performance. Why? I was under the impression that @inbounds speeds up array accesses, but there are no arrays in the macro call on that line.

@kmsquire
Copy link
Member

I'm wondering if this is still relevant?

@wsliang wsliang closed this Sep 10, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants