Can I use CAMISIM to simulate a sample with multiple strains of the same species? #86
Hi,
Perfect! A distribution file (distribution_*.txt) is generated after running the simulation with default parameters. For instance, in my case one of my genomes ("Genome_4") has an abundance value of 0.50, which I take to mean that 50% of the genetic data in the simulated metagenome originates from this genome (at least if all genomes are of equal size). In addition, there is a reads_mapping.tsv file. In this file I see only 1 read with genome_id Genome_4, and many reads for the other genomes in my sample. Why is this? Should the number of reads per genome in the reads_mapping.tsv file reflect the distribution in the distribution_*.txt file, or not? I think I might be mixing some things up. Clarification: Regards,
The values in the distribution files are not percentages; they are only relative measures with respect to each other. I would assume that if Genome_4 has an abundance of 0.5, the other genomes have much higher abundances. Apart from that, every read appears in reads_mapping.tsv and should be uniquely mapped to a genome.
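To make the point about relative abundances concrete, here is a minimal sketch that rescales the values to fractions summing to 1 and counts reads per genome for comparison. It assumes the distribution file is tab-separated with a genome ID and an abundance per line, and that the genome ID is the second column of reads_mapping.tsv with header lines starting with `#`; both layouts, the sample values, and the genome names other than Genome_4 are assumptions for illustration, so check them against your own output files.

```python
from collections import Counter


def normalize_abundances(lines):
    """Rescale relative abundances so that they sum to 1."""
    abundances = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        genome_id, value = line.split("\t")[:2]
        abundances[genome_id] = float(value)
    total = sum(abundances.values())
    return {genome_id: value / total for genome_id, value in abundances.items()}


def count_reads_per_genome(lines, genome_column=1):
    """Count reads per genome ID (assumed to be in the second column)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        counts[line.split("\t")[genome_column]] += 1
    return counts


# Made-up example data; real files would be read with open(...).
distribution = ["Genome_1\t2.0\n", "Genome_4\t0.5\n", "Genome_7\t7.5\n"]
mapping = [
    "#anonymous_read_id\tgenome_id\ttax_id\tread_id\n",  # assumed header
    "S0R0\tGenome_7\t111\tr0\n",
    "S0R1\tGenome_7\t111\tr1\n",
    "S0R2\tGenome_1\t222\tr2\n",
    "S0R3\tGenome_4\t333\tr3\n",
]

fractions = normalize_abundances(distribution)
counts = count_reads_per_genome(mapping)
for genome_id, fraction in sorted(fractions.items()):
    print(genome_id, round(fraction, 3), counts[genome_id])
```

In this made-up example an abundance of 0.5 corresponds to only 5% of the total, because the other values are much larger. Genome length may also affect how many reads a genome contributes, so the read counts need not match the normalized fractions exactly.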
Perfect, many thanks
Hello,
I am working on a classifier that is capable of giving a taxonomic, strain-level ID to WGS reads. I know that CAMISIM can be used to generate simulated strains based on 1 genome, but can I also use CAMISIM to simulate a sample consisting of different, but real(!), strains?
I have a number of genomes (fasta) of multiple strains belonging to different species. These strains are already present in the NCBI RefSeq database. Imagine 4 strains for species A, 10 strains for species B, 2 strains for species C, etc. I also know the taxonomy ID of these strains. I'd like to generate simulated reads based on this community, to see if my classifier can detect those individual strains correctly.
If the answer is yes, could you give me some guidance on how best to go about this? For instance, would you advise generating a .biom profile, or shall I use the de novo option? Would you advise setting genomes_total equal to genomes_real in this case?
Regards,
Nienke
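For readers with the same setup, the sketch below shows one way to assemble the two input tables for a community of real strains: a metadata table and a genome-to-path table. The column names (`genome_ID`, `OTU`, `NCBI_ID`, `novelty_category`), the `known_strain` label, the choice of one OTU per species, and all strain names, paths and taxonomy IDs are assumptions for illustration, not confirmed in this thread; verify them against the CAMISIM documentation before use.

```python
import csv
import io

# Hypothetical strains: (genome ID, FASTA path, OTU per species, NCBI taxonomy ID).
strains = [
    ("speciesA_strain1", "genomes/speciesA_strain1.fasta", 1, 1000001),
    ("speciesA_strain2", "genomes/speciesA_strain2.fasta", 1, 1000002),
    ("speciesB_strain1", "genomes/speciesB_strain1.fasta", 2, 2000001),
]


def write_tables(strains, metadata_out, genome_to_id_out):
    """Write a metadata table (with header) and a genome-to-path table."""
    meta = csv.writer(metadata_out, delimiter="\t", lineterminator="\n")
    # Assumed header; check it against the CAMISIM documentation.
    meta.writerow(["genome_ID", "OTU", "NCBI_ID", "novelty_category"])
    paths = csv.writer(genome_to_id_out, delimiter="\t", lineterminator="\n")
    for genome_id, fasta_path, otu, tax_id in strains:
        # "known_strain" is an assumed label for genomes that are real strains.
        meta.writerow([genome_id, otu, tax_id, "known_strain"])
        paths.writerow([genome_id, fasta_path])


# In-memory buffers keep the sketch self-contained; use open(..., "w") for real files.
metadata_buf, paths_buf = io.StringIO(), io.StringIO()
write_tables(strains, metadata_buf, paths_buf)
print(metadata_buf.getvalue())
print(paths_buf.getvalue())
```

Since every genome in this scenario is a real strain, no simulated strains would be needed, which is why setting genomes_total equal to genomes_real seems a natural choice here; the thread itself does not confirm this.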