Conversation

@Markoy8 commented on Oct 4, 2018

Added a class extending the existing AffinityPropagation.java class.

Its implementation improves on the previous one in terms of computational efficiency: the two core functions are reimplemented, reducing the computational complexity of each from O(n^3) to O(n^2).

Moreover, it adds the computation of the median of the similarities, used as the default when the "selfDivergence" parameter is not specified in input by the client.

The new implementation could substitute the existing one: it obtains exactly the same results in a much shorter computation time (with a local test dataset it is about 28 times faster).
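For context, the usual way this O(n^3) to O(n^2) reduction is achieved in affinity propagation is to precompute, for each row, the maximum and second maximum of availability(i,k) + similarity(i,k) once and reuse them for every column. The following is only a minimal sketch of that idea (the method and variable names are made up here, and damping is omitted), not the actual code in this pull request:

```java
// Sketch of the standard O(n^2) responsibility update for affinity propagation.
// For each row i, precompute the max and second max of a(i,k) + s(i,k); then
// r(i,k) = s(i,k) minus the largest of the remaining sums, which is the row max
// unless k itself holds the max, in which case it is the second max.
void updateResponsibilities(double[][] s, double[][] a, double[][] r)
{
    final int n = s.length;
    for (int i = 0; i < n; i++)
    {
        double max = Double.NEGATIVE_INFINITY;
        double secondMax = Double.NEGATIVE_INFINITY;
        int maxIndex = -1;
        for (int k = 0; k < n; k++)
        {
            final double sum = a[i][k] + s[i][k];
            if (sum > max)
            {
                secondMax = max;
                max = sum;
                maxIndex = k;
            }
            else if (sum > secondMax)
            {
                secondMax = sum;
            }
        }
        for (int k = 0; k < n; k++)
        {
            r[i][k] = s[i][k] - ((k == maxIndex) ? secondMax : max);
        }
    }
}
```

A similar trick, based on per-column sums computed once, applies to the availability update.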

@@ -0,0 +1,261 @@
package gov.sandia.cognition.learning.algorithm.clustering;
Member:

Hmmm.... Whitespace is different with this file versus all the other files in the Foundry. (This has 2-space tabs generally, while the rest of the Foundry uses 4-space tabs.) Could you please fix this?

Author:

I hope I have fixed it now.

return true; }

private void fillSimilarityDiagonal() {
// Set the self similarity based on the self divergence.
Member:

Could this comment be the function javadoc instead?

Author:

Yes, for sure.
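For illustration, the requested change is roughly the following sketch (the loop body mirrors the line shown in the surrounding diff context; only the inline comment moves into a Javadoc):

```java
/**
 * Sets the self similarity of each data point based on the self divergence.
 */
private void fillSimilarityDiagonal()
{
    for (int i = 0; i < this.similarities.length; i++)
    {
        this.similarities[i][i] = -this.getSelfDivergence();
    }
}
```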

similarities[i][i] = -getSelfDivergence();
}
}

Member:

I think the divergenciesArray (should be divergencesArray?) is a complex enough (but cool!) trick that it's worth a function Javadoc explaining its internal referencing specifically. It appears to be a 1-d vector of an upper-triangular 2-d array, right?

Author:

I didn't get this point. I simply extended the self divergence handling, setting it to the median of all the divergences when it's not specified by the client. What do I have to explain in a more specific way? Thank you.
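For illustration, a Javadoc along these lines would cover the point being asked about. This is only a sketch (the names are guesses based on this discussion, not the PR's actual code): the n*(n-1)/2 off-diagonal divergences are flattened into a 1-d array, sorted, and the median is used as the default self divergence.

```java
/**
 * Computes the default self divergence as the median of all pairwise
 * divergences. The upper triangle of the (symmetric) divergence matrix is
 * flattened into a 1-d array of length n * (n - 1) / 2, sorted, and the
 * median of that array is returned.
 */
static double computeMedianDivergence(final double[][] divergences)
{
    final int n = divergences.length;
    if (n < 2)
    {
        return 0.0;
    }
    final double[] divergencesArray = new double[n * (n - 1) / 2];
    int index = 0;
    for (int i = 0; i < n; i++)
    {
        for (int j = i + 1; j < n; j++)
        {
            divergencesArray[index++] = divergences[i][j];
        }
    }
    java.util.Arrays.sort(divergencesArray);
    final int middle = divergencesArray.length / 2;
    return (divergencesArray.length % 2 == 1)
        ? divergencesArray[middle]
        : 0.5 * (divergencesArray[middle - 1] + divergencesArray[middle]);
}
```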

* Updates the responsibilities matrix using the similarity values and the
* current availability values.
*/
@Override
Member:

Succinct and cool. Well done!

Author:

Thanks :)

columnArray.sum += columnArray.array[i];
}
}

Member:

Please make these classes "private static", unless you think a deriving class would want access to them, in which case make them "protected static".

I'm happy for this next part to be a discussion instead of a "change this".

These two classes are essentially just data stores (C-like structs) instead of full classes. As they're completely contained in this class, they aren't part of the API for the Foundry. However, would it make sense to move some of the logic around adding new values from the various helper functions above into these classes themselves?

Author:

Yes, they are just data stores, but I don't really know the code structure of the project yet, so I wouldn't know where to put them. To make the class "private static" I would first have to move the ColumnArray and MaxResult classes. If you could suggest where to put these new structures, I would be happy to do that.
If I understand correctly, you would move some of the logic that handles these classes into the classes themselves, am I right?

Member:

I think you can have them stay as internal static classes here. I think Jeremy's comment is that you can put something like computeMax in the MaxResult class.
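As an illustration of that suggestion, the max-finding logic could live on the struct-like class itself. A sketch only (the field names are assumptions, not the PR's actual code):

```java
/**
 * Simple data store for the maximum and second maximum of a row of values,
 * along with the index of the maximum. Moving the computation here keeps
 * the helper logic next to the data it fills.
 */
private static class MaxResult
{
    double max = Double.NEGATIVE_INFINITY;
    double secondMax = Double.NEGATIVE_INFINITY;
    int maxIndex = -1;

    static MaxResult computeMax(final double[] values)
    {
        final MaxResult result = new MaxResult();
        for (int i = 0; i < values.length; i++)
        {
            if (values[i] > result.max)
            {
                result.secondMax = result.max;
                result.max = values[i];
                result.maxIndex = i;
            }
            else if (values[i] > result.secondMax)
            {
                result.secondMax = values[i];
            }
        }
        return result;
    }
}
```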

@@ -0,0 +1,261 @@
package gov.sandia.cognition.learning.algorithm.clustering;
Member:

In your email to me you said this optimized class gives identical results to the original class. Can you please update the tests for the original class so that this optimized class is tested in the exact same ways? That way, if a future coder changes something in this class, we can ensure there are no unexpected regressions?

Author:

Yes, for sure. I added a new test, named OptimizedAffinityPropagationTest, that tests the OptimizedAffinityPropagation class exactly the way AffinityPropagationTest tests AffinityPropagation. Maybe I should also add a new test method for the case in which the user does not specify the self divergence in input. Would that be useful for you?

added new test for the new class

added Java doc

Fixed some requests from GitHub
}


private double[] createSumArray(double[] availability, double[] similarity)
Member:

You may be able to avoid needing a utility function like this if you use DenseVectors instead of doing it all with raw arrays. There may be a slight performance penalty from the vector overhead, though that is generally what we use in the rest of the Foundry to get the easier utility methods.

Author:

I don't understand whether you would like me to change the raw-array structure everywhere or only at this point. I'm following what is already implemented in the AffinityPropagation class. If I have to change it, could you provide me an example, please?

Member:

It's true those classes use the raw arrays, but they don't need to do these kinds of vector-style operations in their implementation. The way to do it would be to use the DenseVector via VectorFactory.getDenseDefault to make the two arrays, then fill them via set(), then this operation is just availability.plus(similarity).
But that may require changing quite a bit of the code, so if you think it is better to keep it closer to the other implementation then you can leave it this way (though you may want to add a note saying why).
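For example, something like the following sketch. It assumes the Foundry's VectorFactory/Vector methods roughly as described above (copyArray, plus, toArray); the exact calls should be checked against the API:

```java
import gov.sandia.cognition.math.matrix.Vector;
import gov.sandia.cognition.math.matrix.VectorFactory;

// Sketch: replace the hand-written createSumArray with Vector arithmetic.
// copyArray, plus, and toArray are assumed from the Foundry's Vector API;
// if the rest of the class stays on raw arrays, toArray() converts back.
double[] createSumArray(final double[] availability, final double[] similarity)
{
    final VectorFactory<?> factory = VectorFactory.getDenseDefault();
    final Vector a = factory.copyArray(availability);
    final Vector s = factory.copyArray(similarity);
    return a.plus(s).toArray();
}
```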

assertTrue( index >= 0 );
assertEquals( membershipCounts[index], cluster.getMembers().size() );
}
}
Member:

Per Jeremy's comment, it would be good to have a unit test that generates some random data and then checks that the two implementations return the same result.
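Something along these lines could work. This is a rough sketch only; the constructors, learn() usage, and factory methods are assumed to match the existing AffinityPropagationTest and the Foundry API rather than verified here:

```java
// Sketch of a regression test inside OptimizedAffinityPropagationTest.
// Assumes the same divergence-function + selfDivergence constructor that
// AffinityPropagationTest already uses; createUniformRandom is from memory
// of the Foundry's VectorFactory and should be double-checked.
// (Imports for JUnit and the Foundry classes are elided in this sketch.)
public void testMatchesOriginalOnRandomData()
{
    final Random random = new Random(47);
    final List<Vector> data = new ArrayList<Vector>();
    for (int i = 0; i < 100; i++)
    {
        data.add(VectorFactory.getDenseDefault().createUniformRandom(
            2, -10.0, 10.0, random));
    }

    final double selfDivergence = 10.0;
    final AffinityPropagation<Vector> original = new AffinityPropagation<Vector>(
        EuclideanDistanceMetric.INSTANCE, selfDivergence);
    final OptimizedAffinityPropagation<Vector> optimized =
        new OptimizedAffinityPropagation<Vector>(
            EuclideanDistanceMetric.INSTANCE, selfDivergence);

    final Collection<CentroidCluster<Vector>> expected = original.learn(data);
    final Collection<CentroidCluster<Vector>> actual = optimized.learn(data);

    assertEquals(expected.size(), actual.size());
    // Then compare exemplars and memberships cluster by cluster.
}
```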

}
}

private double sum(Vector columnVector) {
Member:

You should be able to just do columnVector.sum() instead of needing this method.

* @param selfDivergence The maxValue for self-divergence to use, which
* controls the number of clusters created.
*/
OptimizedAffinityPropagation(
Member:

These other constructors should be public too.
