Problem with the number of clusters estimation #3
Thank you for your feedback.
In my simulation setting, we only have 40 samples with 100 bacterial
species. We do not have simulated data with 40 bacterial species.
I re-ran MetaGen on the 120x-40-100sp data, and the following is my output.
Could you specify which data you are using?
[image: Inline image 1]
Have a nice holiday!
Xin Xing
On Wed, Dec 20, 2017 at 12:07 PM, QuentinLetourneur < ***@***.***> wrote:
I ran MetaGen with default parameters on a simulated dataset containing 40
bacterial species, but the optimal number of clusters found was very low (5).
I raised the bic_min option to 22, to avoid something like a local minimum
at 5, and set bic_step to 5, but the optimal number of clusters was
still 5.
Here is the MetaGen output:
Initializing...
Initialization finished.
Selecting the number of clusters ...
The searching range for the number of clusters is from 22 to 49 with step size 5
Running time for selecting number of clusters
19.17818The optimal number of cluster is 5
number of iterations= 9
The BIC score for 5 clusters finished
I'd like to have your thoughts on the matter.
Thanks in advance,
Quentin
--
Xin Xing
Department of Statistics
University of Georgia
Thanks for your reply. I can't see the image you sent.
I forgot some important details about the dataset I used: it is composed of 30 samples, each containing 30 bacteria sampled from a list of 40. The coverage is at least 50x for all genomes. It is a simulated dataset that I created from genomes taken from NCBI.
What is confusing is that I assembled this dataset with CLC both with and without the scaffolding step. In the first case MetaGen works fine, but in the second case I have the issue I mentioned.
Quentin
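For anyone reading along, the search range printed in the log of the original report ("from 22 to 49 with step size 5") can be enumerated with a small sketch. This is not MetaGen's code. It assumes the candidates form an inclusive arithmetic sequence starting at bic_min, and the name `bic_max` is a hypothetical label for the upper bound of 49 shown in the log.

```python
def candidate_cluster_counts(bic_min, bic_max, bic_step):
    """List candidate numbers of clusters from bic_min up to bic_max (inclusive).

    Assumption: the search grid is a plain arithmetic sequence. `bic_max` is a
    hypothetical name for the upper bound; only bic_min and bic_step are
    option names mentioned in this thread.
    """
    if bic_step <= 0:
        raise ValueError("bic_step must be positive")
    return list(range(bic_min, bic_max + 1, bic_step))


# Values taken from the log: range 22 to 49, step size 5.
candidates = candidate_cluster_counts(22, 49, 5)
print(candidates)        # [22, 27, 32, 37, 42, 47]
print(5 in candidates)   # False
```

Under that assumption, the reported optimum of 5 is not one of the candidate values in the grid, which may be a useful detail to include when following up on this issue.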