The number of unique clusters in the "01.clusterseq" file does not match the number of unique clusters in the "03.sum_cluster" file #30

Closed
xuanji2017 opened this issue Sep 10, 2021 · 1 comment

Comments

@xuanji2017

Hi,
Thank you for making this great tool.
I finally got the 03. results folder. However, when I checked the number of unique clusters in "01.clusterseq.GCA_000210735.tsv", I found that it does not match the number of clusters in "03.summarize.GCA_000210735.clusters.tsv" (for example, 1331 vs. 1234). The number of groups differs in the same way. In addition, the number of unique inferred_seq values in "01.clusterseq.GCA_000210735.tsv" does not match the number of contigs in "04.makefasta.GCA_000210735.all_seqs.fna". Do you have any explanation for this? Thanks a lot!
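
For anyone who wants to reproduce this kind of count, here is a minimal sketch in Python. It assumes the TSV files have a header row and that the column of interest is named something like `cluster` or `inferred_seq`; the `cluster` name and the helper `count_unique` are hypothetical, so check the actual header of your files first.

```python
import csv


def count_unique(tsv_path, column):
    """Count distinct non-empty values in one column of a tab-separated file.

    Assumes the file has a header row that contains `column`.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return len({row[column] for row in reader if row[column]})
```

Calling `count_unique("01.clusterseq.GCA_000210735.tsv", "cluster")` and the same function on the 03.summarize file would give the two numbers being compared, provided both files really use that column name.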

@durrantmm
Collaborator

Great question! The genotyping step applies a filter to clusters, controlled by the option "--filter-clusters-inferred-assembly". It removes clusters that were never identified from an assembly, meaning they were only found in the reference. You can remove this filter if you make your own custom Snakemake pipeline.
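
To see which clusters were dropped between the two files, a set difference is enough. This is only a sketch under assumptions: both TSV files are taken to have a header row and to share a cluster identifier column, here called `cluster`, which is a hypothetical name, as are the helper functions. It lists the missing IDs but does not by itself confirm that the filter is what removed them.

```python
import csv


def column_values(tsv_path, column):
    """Return the set of values found in one column of a tab-separated file."""
    with open(tsv_path, newline="") as handle:
        return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def dropped_clusters(before_path, after_path, column="cluster"):
    """List IDs present in `before_path` but absent from `after_path`.

    `column` is an assumed header name; adjust it to the real files.
    """
    before = column_values(before_path, column)
    after = column_values(after_path, column)
    return sorted(before - after)
```

If the filter described above accounts for the whole difference, the IDs returned for the 01.clusterseq file versus the 03.summarize file should be the clusters that were only found in the reference.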
