Clustering Implementation #59

AndreasDahl · 2015-10-22T12:47:53Z

I have now created two different naive clustering implementations, each living in a separate branch. We have to decide which implementation we want to move forward with.

https://github.com/BIO-DIKU/klust/compare/naive-clust-collection

This implementation most closely follows our UML and simply stores all the results in a collection (currently a vector).
The problem I see with this implementation is that it does not scale well to large numbers of sequences, since every result stays in the collection.
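
To make the trade-off concrete, here is a minimal sketch of the collection approach. It is not the code in the branch: `CollectionClusterer`, `Cluster` and `similarity` are hypothetical names, and the position-wise identity measure is only a stand-in for whatever comparison klust actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a real similarity measure: the fraction of
// positions that match, relative to the longer of the two sequences.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;
    const std::size_t shortest = std::min(a.size(), b.size());
    std::size_t matches = 0;
    for (std::size_t i = 0; i < shortest; ++i) {
        if (a[i] == b[i]) ++matches;
    }
    return static_cast<double>(matches) / static_cast<double>(longest);
}

// A cluster keeps its centroid and a copy of every member sequence.
struct Cluster {
    std::string centroid;
    std::vector<std::string> members;
};

// Assumed shape of the collection approach: all results live in a vector
// until the caller asks for them.
class CollectionClusterer {
public:
    explicit CollectionClusterer(double threshold) : threshold_(threshold) {
        assert(threshold >= 0.0 && threshold <= 1.0);
    }

    // Put the sequence in the first cluster whose centroid is similar
    // enough; otherwise the sequence becomes the centroid of a new cluster.
    void add(const std::string& seq) {
        for (Cluster& c : clusters_) {
            if (similarity(c.centroid, seq) >= threshold_) {
                c.members.push_back(seq);
                return;
            }
        }
        clusters_.push_back(Cluster{seq, {seq}});
    }

    const std::vector<Cluster>& clusters() const { return clusters_; }

private:
    double threshold_;
    std::vector<Cluster> clusters_;
};
```

In this sketch memory grows with the total number of input sequences, because each `Cluster` holds all of its members, which is the scaling concern described above.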

https://github.com/BIO-DIKU/klust/compare/naive-clust-immediate

To address the problems of the first solution, I have created this solution, which returns an immediate result for each sequence provided. Once the clustering is done, it is possible to get the summarised clusters with how many sequences they contain (but no references to specific sequences).
This implementation is suitable for the output format UCLUST uses.
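
A minimal sketch of the immediate approach, again not the code in the branch: `ImmediateClusterer`, `Result`, `Hit`, `ClusterSummary` and `similarity` are hypothetical names, and the position-wise identity measure is only a placeholder for the real comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a real similarity measure: the fraction of
// positions that match, relative to the longer of the two sequences.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 1.0;
    const std::size_t shortest = std::min(a.size(), b.size());
    std::size_t matches = 0;
    for (std::size_t i = 0; i < shortest; ++i) {
        if (a[i] == b[i]) ++matches;
    }
    return static_cast<double>(matches) / static_cast<double>(longest);
}

// What the caller learns about a sequence right after adding it.
enum class Hit { NewCentroid, Member };

struct Result {
    Hit type;
    std::size_t cluster;  // index of the cluster the sequence landed in
};

// Per-cluster summary: the centroid and a member count, no member list.
struct ClusterSummary {
    std::string centroid;
    std::size_t size;
};

class ImmediateClusterer {
public:
    explicit ImmediateClusterer(double threshold) : threshold_(threshold) {
        assert(threshold >= 0.0 && threshold <= 1.0);
    }

    // Classify the sequence and return the outcome immediately, so the
    // caller can write it out and discard the sequence.
    Result add(const std::string& seq) {
        for (std::size_t i = 0; i < summaries_.size(); ++i) {
            if (similarity(summaries_[i].centroid, seq) >= threshold_) {
                ++summaries_[i].size;
                return {Hit::Member, i};
            }
        }
        summaries_.push_back(ClusterSummary{seq, 1});
        return {Hit::NewCentroid, summaries_.size() - 1};
    }

    const std::vector<ClusterSummary>& summary() const { return summaries_; }

private:
    double threshold_;
    std::vector<ClusterSummary> summaries_;
};
```

Here memory grows with the number of clusters rather than the number of sequences, at the cost of not being able to list the members of a cluster afterwards.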

Please ask if you have any questions. When a decision is made we can open a pull request for the appropriate branch and fix it up and review it properly.

Also, please join the discussion. Even if you have no strong opinion, I would like to hear that too.

AndreasDahl added the discussion label Oct 22, 2015

AndreasDahl mentioned this issue Oct 22, 2015

Centroid class #40

Open

AndreasDahl added the question label Oct 25, 2015
