I have now created two different naive clustering implementations, each living in a separate branch. We have to decide which implementation we want to move forward with.
https://github.com/BIO-DIKU/klust/compare/naive-clust-collection
This implementation most closely matches our UML and simply stores all the results in a collection (currently a vector).
The problem I see with this implementation is that it does not scale well to a large number of sequences, since every result is kept in the collection.
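To make the trade-off concrete, here is a minimal sketch of the collection idea. It is not the code in the branch: the names `CollectionClusterer`, `Cluster` and `IsSimilar` are hypothetical, and the mismatch-count similarity test is only a stand-in for whatever comparison we end up using.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical similarity test (an assumption, not taken from klust):
// two sequences match when they have equal length and differ in at most
// max_mismatches positions.
bool IsSimilar(const std::string& a, const std::string& b,
               std::size_t max_mismatches) {
  if (a.size() != b.size()) return false;
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) ++mismatches;
  }
  return mismatches <= max_mismatches;
}

// One cluster: its centroid plus every member sequence.
struct Cluster {
  std::string centroid;
  std::vector<std::string> members;
};

// Collection-style naive clustering: every sequence stays in memory,
// stored inside the cluster it was assigned to.
class CollectionClusterer {
 public:
  explicit CollectionClusterer(std::size_t max_mismatches)
      : max_mismatches_(max_mismatches) {}

  // Assigns seq to the first cluster whose centroid is similar enough,
  // or starts a new cluster with seq as its centroid.
  void Add(const std::string& seq) {
    for (Cluster& c : clusters_) {
      if (IsSimilar(c.centroid, seq, max_mismatches_)) {
        c.members.push_back(seq);
        return;
      }
    }
    clusters_.push_back(Cluster{seq, {seq}});
  }

  const std::vector<Cluster>& clusters() const { return clusters_; }

 private:
  std::size_t max_mismatches_;
  std::vector<Cluster> clusters_;
};
```

In this sketch the stored data grows with the total number of sequences, not just the number of clusters, which is the scaling concern described above.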
https://github.com/BIO-DIKU/klust/compare/naive-clust-immediate
To address the problems of the first solution, I have created this one, which returns an immediate result for each sequence provided. Once the clustering is done, the summarised clusters can be retrieved along with how many sequences each contains (but without references to specific sequences).
This implementation is suitable for the output format UCLUST uses.
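A minimal sketch of the immediate-result idea, again not the code in the branch. The names `ImmediateClusterer`, `Assignment`, `ClusterSummary` and `IsSimilar` are hypothetical, and the mismatch-count similarity test is an assumed placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical similarity test (an assumption, not taken from klust):
// equal length and at most max_mismatches differing positions.
bool IsSimilar(const std::string& a, const std::string& b,
               std::size_t max_mismatches) {
  if (a.size() != b.size()) return false;
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) ++mismatches;
  }
  return mismatches <= max_mismatches;
}

// Result handed back to the caller for each sequence as it is processed.
struct Assignment {
  std::size_t cluster_id;  // index of the cluster the sequence joined
  bool is_new_centroid;    // true when the sequence started a new cluster
};

// Per-cluster summary: the centroid and a member count, no member list.
struct ClusterSummary {
  std::string centroid;
  std::size_t size;
};

// Immediate-style naive clustering: only centroids and counts are stored.
class ImmediateClusterer {
 public:
  explicit ImmediateClusterer(std::size_t max_mismatches)
      : max_mismatches_(max_mismatches) {}

  // Reports the assignment right away so the caller can write it out
  // without the clusterer holding on to the sequence.
  Assignment Add(const std::string& seq) {
    for (std::size_t id = 0; id < summaries_.size(); ++id) {
      if (IsSimilar(summaries_[id].centroid, seq, max_mismatches_)) {
        ++summaries_[id].size;
        return Assignment{id, false};
      }
    }
    summaries_.push_back(ClusterSummary{seq, 1});
    return Assignment{summaries_.size() - 1, true};
  }

  // Available once all sequences have been added.
  const std::vector<ClusterSummary>& summaries() const { return summaries_; }

 private:
  std::size_t max_mismatches_;
  std::vector<ClusterSummary> summaries_;
};
```

A caller would write one output record per `Add` call and then one record per entry in `summaries()` at the end. Stored data in this sketch grows with the number of clusters only, at the cost of no longer knowing which sequences belong to which cluster.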
Please ask if you have any questions. When a decision is made we can open a pull request for the appropriate branch and fix it up and review it properly.
Also, please join the discussion. Even if you have no strong opinion, I would like to hear that too.