Can I use CAMISIM to simulate a sample with multiple strains of the same species? #86

NienkeMekkes · 2020-09-08T13:40:04Z

Hello,

I am working on a classifier that is capable of giving a taxonomic, strain-level ID to WGS reads. I know that CAMISIM can be used to generate simulated strains based on 1 genome, but can I also use CAMISIM to simulate a sample consisting of different, but real(!), strains?

I have a number of genomes (fasta) of multiple strains belonging to different species. These strains are already present in the NCBI RefSeq database. Imagine 4 strains for species A, 10 strains for species B, 2 strains for species C, etc. I also know the taxonomy ID of these strains. I'd like to generate simulated reads based on this community, to see if my classifier can detect those individual strains correctly.

If the answer is yes, could you give me some guidance on how best to go about this? For instance, would you advise generating a .biom profile, or shall I use the de novo option? Would you advise setting genomes_total equal to genomes_real in this case?

Regards,
Nienke

AlphaSquad · 2020-09-08T14:28:58Z

Hi,
this is absolutely possible; I would even go as far as saying that this is one of the main ideas we had in mind when developing CAMISIM.
If you have the genomes but no BIOM profile, I would recommend the de novo mode. Then you can either let CAMISIM choose the abundances based on a log-normal distribution or, if you want to fine-tune the individual abundances of the genomes, provide all the abundances yourself (using the distribution_file_paths option). As you described, if you have all the genomes you want to simulate reads from, then setting genomes_total = genomes_real is the way to go.
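
For anyone who wants to provide the abundances themselves, here is a minimal Python sketch of preparing such a file. It assumes the file passed via distribution_file_paths is a two-column, tab-separated list of genome ID and abundance; that layout, the helper name write_abundances, and the genome IDs are assumptions for illustration, so check them against the CAMISIM documentation before relying on this.

```python
import csv
import io


def write_abundances(abundances, handle):
    """Write one 'genome_id<TAB>abundance' row per genome.

    Assumption: a two-column, tab-separated layout without a header.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for genome_id, value in abundances.items():
        if value < 0:
            raise ValueError(f"negative abundance for {genome_id}: {value}")
        writer.writerow([genome_id, value])


# Hypothetical genome IDs and relative abundances for three real strains.
abundances = {
    "speciesA_strain1": 4.0,
    "speciesA_strain2": 1.0,
    "speciesB_strain1": 2.5,
}

# Written to an in-memory buffer here; use open(path, "w", newline="")
# to write an actual file instead.
buffer = io.StringIO()
write_abundances(abundances, buffer)
abundance_text = buffer.getvalue()
print(abundance_text, end="")
```

The values only need to be meaningful relative to each other, so they do not have to sum to 1 or to 100.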

NienkeMekkes · 2020-09-10T13:42:17Z

Perfect!
Thanks for your explanation. I have just run CAMISIM on some closely related strains without errors. I do have a question about the output files that I didn't understand:

A distribution is generated after running the simulation with default parameters (distribution_*.txt). For instance in my case, one of my genomes ("Genome_4") has an abundance value of 0.50, meaning 50% of the genetic data in the simulated metagenome originates from this genome (if all genomes are of equal size at least).

In addition, there is also a reads_mapping.tsv file. In this file I only see 1 read with genome_id Genome_4, and many other reads for the other genomes in my sample. Why is this? Should the number of reads in the reads_mapping.tsv file reflect the distribution in the distribution_*.txt file, or not? I think I might be mixing some things up.

Clarification:
My desired output would be a big set of closely related reads of which the taxonomic ID is known, so I can see how well my taxonomic read classifier works.

Regards,
Nienke

AlphaSquad · 2020-09-10T17:03:53Z

The values in the distribution files are not percentages; they are only a relative measure with respect to each other. I would assume that if Genome_4 has an abundance of 0.5, the other genomes have a much higher abundance. Apart from that, every read appears in reads_mapping.tsv and should be uniquely mapped to a genome.
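
To check this on your own output, a short Python sketch along these lines may help. It rescales the relative abundances into fractions of the total and counts the reads per genome in the mapping rows. The abundance values, the column order of the mapping rows (read ID, genome ID, taxonomy ID), and the treatment of lines starting with "#" as a header are assumptions made for illustration; adjust genome_column to match your actual reads_mapping.tsv.

```python
from collections import Counter


def to_fractions(relative):
    """Rescale relative abundances so that they sum to 1."""
    total = sum(relative.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return {genome: value / total for genome, value in relative.items()}


def reads_per_genome(lines, genome_column=1):
    """Count reads per genome in tab-separated mapping rows.

    Assumption: the genome ID sits in column `genome_column` (0-based),
    and lines starting with '#' are header or comment lines.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[genome_column]] += 1
    return counts


# Hypothetical values: Genome_4 has 0.5, the others are much higher.
relative = {"Genome_1": 12.0, "Genome_2": 7.5, "Genome_4": 0.5}
fractions = to_fractions(relative)

# Hypothetical mapping rows standing in for the real file contents.
mapping_lines = [
    "#read_id\tgenome_id\ttax_id\n",
    "read_0\tGenome_1\t1111\n",
    "read_1\tGenome_1\t1111\n",
    "read_2\tGenome_2\t2222\n",
    "read_3\tGenome_4\t4444\n",
]
counts = reads_per_genome(mapping_lines)

for genome in relative:
    print(genome, round(fractions[genome], 3), counts[genome])
```

With these made-up numbers, an abundance of 0.5 corresponds to only 2.5% of the total, which is why a genome with a seemingly large value can still contribute very few reads. Read counts also depend on genome size, so the two columns are not expected to match exactly.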

NienkeMekkes · 2020-09-11T09:28:16Z

Perfect, many thanks

NienkeMekkes closed this as completed Sep 11, 2020

This was referenced Mar 12, 2023

Grasping CAMISIM ndreey/ghost-magnet#13

Open

CAMISIM: Generate the abundance profiles ndreey/ghost-magnet#19

Open
