Problem with the number of clusters estimation #3

Open
QuentinLetourneur opened this issue Dec 20, 2017 · 2 comments

@QuentinLetourneur

I ran MetaGen with default parameters on a simulated dataset containing 40 bacterial species, but the optimal number of clusters it found was very low (5).

To rule out a local minimum at 5, I raised the bic_min option to 22 and set bic_step to 5, but the reported optimal number of clusters was still 5.

Here is the MetaGen output:

Initializing...
Initialization finished.
Selecting the number of clusters ...
The searching range for the number of clusters is from 22 to 49 with step size 5
Running time for selecting number of clusters
19.17818The optimal number of cluster is 5
number of iterations= 9
The BIC score for 5 clusters finished
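
As a quick sanity check on the log above, here is a minimal Python sketch (not MetaGen's code) of the candidate cluster counts that the reported search range would produce. It assumes the range is an inclusive arithmetic grid; the function and the `bic_max` name are hypothetical and used only for illustration.

```python
def candidate_cluster_counts(bic_min, bic_max, bic_step):
    """Enumerate candidate cluster counts.

    Assumption: the search range is an inclusive arithmetic grid
    from bic_min to bic_max. This is an illustration, not MetaGen's code.
    """
    return list(range(bic_min, bic_max + 1, bic_step))


# Values taken from the log line "from 22 to 49 with step size 5".
candidates = candidate_cluster_counts(22, 49, 5)
print(candidates)
print(5 in candidates)
```

Under that assumption, 5 is not among the candidates, so the reported optimum does not match the search range printed in the log.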

I'd like to have your thoughts on the matter.

Thanks in advance,

Quentin

@BioAlgs
Owner

BioAlgs commented Dec 21, 2017 via email

@QuentinLetourneur
Author

Thanks for your reply,

I can't see the image you sent.

I forgot some important details about the dataset I used: it is composed of 30 samples, each containing 30 bacterial species drawn from a list of 40. Coverage is at least 50x for all genomes.

It's a simulated dataset that I created from genomes taken from NCBI.

What is confusing is that I assembled this dataset with CLC both with and without the scaffolding step. In the first case MetaGen works fine, but in the second case I get the issue I described above.
The metrics of the two assemblies aren't that different, so I don't think there was a problem with the assembly.

Quentin
