Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix right candidate selection for large cluster refinement #20

Merged
merged 4 commits into from
Sep 27, 2019

Conversation

fapaul
Copy link
Contributor

@fapaul fapaul commented Sep 26, 2019

After reverse-engineering the algorithm and trying to understand the equation for computing the right candidate, we came to the conclusion the index need to be halved. Before it was causing issues due to IndexOutOfBoundsExceptions for larger clusters.

Now the equation seems to be sound, at least a quick check with Wolfram Alpha prooves that we cannot run out of bounds anymore.

@AHeise Can you give it a short review, if we haven't broken something unintentionally?

@@ -139,7 +139,7 @@ private static double scoreClustering(final byte[] partitions, final double[][]
.mapToObj(i -> {
// reverse of Gaussian
int leftIndex = (int) (Math.sqrt(i + 0.25) - 0.5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First error, it should be: int leftIndex = (int) (Math.sqrt(2 * i + 0.25) - 0.5);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation: Gaussian formula is i = n * (n + 1) / 2, with n is leftIndex
To solve: 0 = n^2 + n - 2 * i

@@ -139,7 +139,7 @@ private static double scoreClustering(final byte[] partitions, final double[][]
.mapToObj(i -> {
// reverse of Gaussian
int leftIndex = (int) (Math.sqrt(i + 0.25) - 0.5);
int rightIndex = i - getNumEdges(leftIndex) + leftIndex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second error, it should just be int rightIndex = i - getNumEdges(leftIndex);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation: leftIndex and rightIndex should take all possible values where leftIndex > rightIndex and leftIndex < numNodes and rightIndex < numNodes.
i is a random edge < numEdges
leftIndex is the largest node, such that leftIndex * (leftIndex + 1) / 2 <= i
rightIndex is then the remainder.

Examples:

i leftIndex rightIndex
0 0 0
1 1 0
2 1 1
3 2 0
4 2 1
5 2 2
6 3 0
7 3 1
8 3 2
9 3 3
10 4 0
11 4 1
12 4 2
13 4 3
14 4 4
15 5 0
16 5 1
17 5 2
18 5 3
19 5 4
20 5 5
21 6 0
22 6 1
23 6 2
24 6 3
25 6 4
26 6 5
27 6 6

@fapaul fapaul merged commit ea7e079 into master Sep 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants