Clustering Implementation #59

Open
AndreasDahl opened this issue Oct 22, 2015 · 0 comments

@AndreasDahl
Member

I have now created two different naive clustering implementations, each living in a separate branch. We have to decide which implementation we want to move forward with.

https://github.com/BIO-DIKU/klust/compare/naive-clust-collection

This implementation most closely follows our UML and simply stores all the results in a collection (currently a vector).
The problem I see with it is that it does not scale well to a large number of sequences.
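
To make the trade-off concrete, here is a rough sketch of the collection approach. It is written in Python for brevity and is not the code in the branch. The greedy "compare against the first member of each cluster" rule, the `kmer_similarity` function, and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import Callable, List


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Hypothetical similarity measure: Jaccard index of the two k-mer sets."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def cluster_collect(
    sequences: List[str],
    threshold: float = 0.5,
    similarity: Callable[[str, str], float] = kmer_similarity,
) -> List[List[str]]:
    """Greedy clustering that keeps every sequence in its cluster.

    Assumption: the first member of each cluster acts as its centroid.
    """
    clusters: List[List[str]] = []
    for seq in sequences:
        for cluster in clusters:
            if similarity(seq, cluster[0]) >= threshold:
                cluster.append(seq)  # every sequence stays in memory
                break
        else:
            # No existing centroid was similar enough: start a new cluster.
            clusters.append([seq])
    return clusters
```

In this sketch the returned collection holds every input sequence, so memory use grows with the number of sequences, which is the scaling concern mentioned above.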

https://github.com/BIO-DIKU/klust/compare/naive-clust-immediate

To address the problems with the first solution, I created this one. It returns a result immediately for each sequence provided. Once clustering is done, you can get the summarised clusters along with the number of sequences each contains (but no references to the specific sequences).
This implementation is suitable for the output format UCLUST uses.
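
A rough sketch of the immediate approach, again in Python for brevity and not the code in the branch. The class name `ImmediateClusterer`, the `add` and `summary` methods, the `kmer_similarity` function, and the greedy centroid rule are all hypothetical names and choices made for illustration.

```python
from typing import Callable, List, Tuple


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Hypothetical similarity measure: Jaccard index of the two k-mer sets."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


class ImmediateClusterer:
    """Returns a result per sequence and keeps only centroids and counts."""

    def __init__(
        self,
        threshold: float = 0.5,
        similarity: Callable[[str, str], float] = kmer_similarity,
    ) -> None:
        self.threshold = threshold
        self.similarity = similarity
        self.centroids: List[str] = []  # one representative per cluster
        self.sizes: List[int] = []      # member count per cluster

    def add(self, seq: str) -> Tuple[int, bool]:
        """Assign seq to a cluster; return (cluster id, started a new cluster)."""
        for cluster_id, centroid in enumerate(self.centroids):
            if self.similarity(seq, centroid) >= self.threshold:
                self.sizes[cluster_id] += 1
                return cluster_id, False
        # No match: seq becomes the centroid of a new cluster.
        self.centroids.append(seq)
        self.sizes.append(1)
        return len(self.centroids) - 1, True

    def summary(self) -> List[Tuple[str, int]]:
        """Centroid and size of each cluster, with no member references."""
        return list(zip(self.centroids, self.sizes))
```

Used like this, each call to `add` gives a result that can be written out right away, and the summary is available at the end:

```python
clusterer = ImmediateClusterer()
for seq in ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]:
    cluster_id, is_new = clusterer.add(seq)
    print(seq, cluster_id, is_new)
print(clusterer.summary())
```

In this sketch only the centroids and counts are retained, so memory grows with the number of clusters instead of the number of sequences.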


Please ask if you have any questions. Once a decision is made, we can open a pull request for the chosen branch, fix it up, and review it properly.

Also, please join the discussion. Even if you have no strong opinion, I would like to hear that too.
