Conversation

@Markoy8 commented on Oct 4, 2018

Added a class extending the existing AffinityPropagation.java class.

Its implementation improves on the previous one in terms of computational efficiency: the two core functions are reimplemented, reducing the computational complexity of each from O(n^3) to O(n^2).

Moreover, it adds the computation of the median of the similarities, used as the default when the "selfDivergence" parameter is not specified in input by the client.

The new implementation could substitute the existing one: it obtains exactly the same results in a much shorter computation time (with a local test dataset it is about 28 times faster).
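For context, the usual way this O(n^3) to O(n^2) reduction is achieved in affinity propagation is to precompute, for each row, the maximum and second maximum of availability(i,k) + similarity(i,k) once and reuse them for every column. The following is only a minimal sketch of that idea (the method and variable names are made up here, and damping is omitted), not the actual code in this pull request:

```java
// Sketch of the standard O(n^2) responsibility update for affinity propagation.
// For each row i, precompute the max and second max of a(i,k) + s(i,k); then
// r(i,k) = s(i,k) minus the largest of the remaining sums, which is the row max
// unless k itself holds the max, in which case it is the second max.
void updateResponsibilities(double[][] s, double[][] a, double[][] r)
{
    final int n = s.length;
    for (int i = 0; i < n; i++)
    {
        double max = Double.NEGATIVE_INFINITY;
        double secondMax = Double.NEGATIVE_INFINITY;
        int maxIndex = -1;
        for (int k = 0; k < n; k++)
        {
            final double sum = a[i][k] + s[i][k];
            if (sum > max)
            {
                secondMax = max;
                max = sum;
                maxIndex = k;
            }
            else if (sum > secondMax)
            {
                secondMax = sum;
            }
        }
        for (int k = 0; k < n; k++)
        {
            r[i][k] = s[i][k] - ((k == maxIndex) ? secondMax : max);
        }
    }
}
```

A similar trick, based on per-column sums computed once, applies to the availability update.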

@@ -0,0 +1,261 @@
package gov.sandia.cognition.learning.algorithm.clustering;
Member:

Hmmm.... Whitespace is different with this file versus all the other files in the Foundry. (This has 2-space tabs generally, while the rest of the Foundry uses 4-space tabs.) Could you please fix this?

Author:

I hope I have fixed it now.

return true; }

private void fillSimilarityDiagonal() {
// Set the self similarity based on the self divergence.
Member:

Could this comment be the function javadoc instead?

Author:

Yes, for sure.
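For illustration, the requested change is roughly the following sketch (the loop body mirrors the line shown in the surrounding diff context; only the inline comment moves into a Javadoc):

```java
/**
 * Sets the self similarity of each data point based on the self divergence.
 */
private void fillSimilarityDiagonal()
{
    for (int i = 0; i < this.similarities.length; i++)
    {
        this.similarities[i][i] = -this.getSelfDivergence();
    }
}
```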

similarities[i][i] = -getSelfDivergence();
}
}

Member:

I think the divergenciesArray (should be divergencesArray?) is a complex enough (but cool!) trick that it's worth a function Javadoc explaining its internal referencing specifically. It appears to be a 1-d vector of an upper-triangular 2-d array, right?

Author:

I didn't get this point. I simply extended the self divergence handling, setting it to the median of all the divergences when it's not specified by the client. What do I have to explain in a more specific way? Thank you.
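For illustration, a Javadoc along these lines would cover the point being asked about. This is only a sketch (the names are guesses based on this discussion, not the PR's actual code): the n*(n-1)/2 off-diagonal divergences are flattened into a 1-d array, sorted, and the median is used as the default self divergence.

```java
/**
 * Computes the default self divergence as the median of all pairwise
 * divergences. The upper triangle of the (symmetric) divergence matrix is
 * flattened into a 1-d array of length n * (n - 1) / 2, sorted, and the
 * median of that array is returned.
 */
static double computeMedianDivergence(final double[][] divergences)
{
    final int n = divergences.length;
    if (n < 2)
    {
        return 0.0;
    }
    final double[] divergencesArray = new double[n * (n - 1) / 2];
    int index = 0;
    for (int i = 0; i < n; i++)
    {
        for (int j = i + 1; j < n; j++)
        {
            divergencesArray[index++] = divergences[i][j];
        }
    }
    java.util.Arrays.sort(divergencesArray);
    final int middle = divergencesArray.length / 2;
    return (divergencesArray.length % 2 == 1)
        ? divergencesArray[middle]
        : 0.5 * (divergencesArray[middle - 1] + divergencesArray[middle]);
}
```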

* Updates the responsibilities matrix using the similarity values and the
* current availability values.
*/
@Override
Member:

Succinct and cool. Well done!

Author:

Thanks :)

columnArray.sum += columnArray.array[i];
}
}

Member:

Please make these classes "private static", unless you think a deriving class would want access to them, in which case make them "protected static".

I'm happy for this next part to be a discussion instead of a "change this".

These two classes are essentially just data stores (C-like structs) instead of full classes. As they're completely contained in this class, they aren't part of the API for the Foundry. However, would it make sense to move some of the logic around adding new values from the various helper functions above into these classes themselves?

Author:

Yes, they are just data stores, but I don't really know the code structure of the project yet, so I wouldn't know where to put them. To make the class "private static" I would first have to move the ColumnArray and MaxResult classes. If you could suggest where to put these new structures, I would be happy to do that.
If I understand correctly, you would move some of the logic that handles these classes into the classes themselves, am I right?

Member:

I think you can have them stay as internal static classes here. I think Jeremy's comment is that you can put something like computeMax in the MaxResult class.
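As an illustration of that suggestion, the max-finding logic could live on the struct-like class itself. A sketch only (the field names are assumptions, not the PR's actual code):

```java
/**
 * Simple data store for the maximum and second maximum of a row of values,
 * along with the index of the maximum. Moving the computation here keeps
 * the helper logic next to the data it fills.
 */
private static class MaxResult
{
    double max = Double.NEGATIVE_INFINITY;
    double secondMax = Double.NEGATIVE_INFINITY;
    int maxIndex = -1;

    static MaxResult computeMax(final double[] values)
    {
        final MaxResult result = new MaxResult();
        for (int i = 0; i < values.length; i++)
        {
            if (values[i] > result.max)
            {
                result.secondMax = result.max;
                result.max = values[i];
                result.maxIndex = i;
            }
            else if (values[i] > result.secondMax)
            {
                result.secondMax = values[i];
            }
        }
        return result;
    }
}
```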

@@ -0,0 +1,261 @@
package gov.sandia.cognition.learning.algorithm.clustering;
Member:

In your email to me you said this optimized class gives identical results to the original class. Can you please update the tests for the original class so that this optimized class is tested in the exact same ways? That way, if a future coder changes something in this class, we can ensure there are no unexpected regressions?

Author:

Yes, for sure. I added a new test, named OptimizedAffinityPropagationTest, that tests the OptimizedAffinityPropagation class exactly the way AffinityPropagationTest tests AffinityPropagation. Maybe I should also add a new test method for the case in which the user does not specify the self divergence in input. Would that be useful for you?

added new test for the new class

added Java doc

Fixed some requests from GitHub
}


private double[] createSumArray(double[] availability, double[] similarity)
Member:

You may be able to avoid needing a utility function like this if you use DenseVectors instead of doing it all with raw arrays. There may be a slight performance penalty from the vector overhead, though that is generally what we use in the rest of the Foundry to get the easier utility methods.

Author:

I don't understand whether you would like me to change the raw-array structure everywhere or only at this point. I'm following what is already implemented in the AffinityPropagation class. If I have to change it, could you provide me an example, please?

Member:

It's true those classes use the raw arrays, but they don't need to do these kinds of vector-style operations in their implementation. The way to do it would be to use the DenseVector via VectorFactory.getDenseDefault to make the two arrays, then fill them via set(), then this operation is just availability.plus(similarity).
But that may require changing quite a bit of the code, so if you think it is better to keep it closer to the other implementation then you can leave it this way (though you may want to add a note saying why).
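For example, something like the following sketch. It assumes the Foundry's VectorFactory/Vector methods roughly as described above (copyArray, plus, toArray); the exact calls should be checked against the API:

```java
import gov.sandia.cognition.math.matrix.Vector;
import gov.sandia.cognition.math.matrix.VectorFactory;

// Sketch: replace the hand-written createSumArray with Vector arithmetic.
// copyArray, plus, and toArray are assumed from the Foundry's Vector API;
// if the rest of the class stays on raw arrays, toArray() converts back.
double[] createSumArray(final double[] availability, final double[] similarity)
{
    final VectorFactory<?> factory = VectorFactory.getDenseDefault();
    final Vector a = factory.copyArray(availability);
    final Vector s = factory.copyArray(similarity);
    return a.plus(s).toArray();
}
```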

assertTrue( index >= 0 );
assertEquals( membershipCounts[index], cluster.getMembers().size() );
}
}
Member:

Per Jeremy's comment, it would be good to have a unit test that generates some random data and then checks that the two implementations return the same result.
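Something along these lines could work. This is a rough sketch only; the constructors, learn() usage, and factory methods are assumed to match the existing AffinityPropagationTest and the Foundry API rather than verified here:

```java
// Sketch of a regression test inside OptimizedAffinityPropagationTest.
// Assumes the same divergence-function + selfDivergence constructor that
// AffinityPropagationTest already uses; createUniformRandom is from memory
// of the Foundry's VectorFactory and should be double-checked.
// (Imports for JUnit and the Foundry classes are elided in this sketch.)
public void testMatchesOriginalOnRandomData()
{
    final Random random = new Random(47);
    final List<Vector> data = new ArrayList<Vector>();
    for (int i = 0; i < 100; i++)
    {
        data.add(VectorFactory.getDenseDefault().createUniformRandom(
            2, -10.0, 10.0, random));
    }

    final double selfDivergence = 10.0;
    final AffinityPropagation<Vector> original = new AffinityPropagation<Vector>(
        EuclideanDistanceMetric.INSTANCE, selfDivergence);
    final OptimizedAffinityPropagation<Vector> optimized =
        new OptimizedAffinityPropagation<Vector>(
            EuclideanDistanceMetric.INSTANCE, selfDivergence);

    final Collection<CentroidCluster<Vector>> expected = original.learn(data);
    final Collection<CentroidCluster<Vector>> actual = optimized.learn(data);

    assertEquals(expected.size(), actual.size());
    // Then compare exemplars and memberships cluster by cluster.
}
```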

}
}

private double sum(Vector columnVector) {
Member:

You should be able to just do columnVector.sum() instead of needing this method.

* @param selfDivergence The maxValue for self-divergence to use, which
* controls the number of clusters created.
*/
OptimizedAffinityPropagation(
Member:

These other constructors should be public too.
