Generation from different genomes for the same species #186
Comments
Hi, thank you for your interest in CAMISIM. In terms of abundance: if you use ART as the read simulator, the abundance CAMISIM uses is the depth at which a genome is sequenced, which means you would need to normalise by genome length if you wanted the same number of reads. For Nanosim and wgsim there is currently a bug that ignores genome sizes, so there you would actually get the same number of reads. So generally I think your files look correct. I would recommend simulating a small dataset of around 100 MB in a single sample and checking whether the output looks like you would expect. If it does not, feel free to come back.
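The length normalisation mentioned above can be sketched briefly. This is a minimal illustration, not CAMISIM code: with ART, abundance acts as sequencing depth, so at equal abundance a longer genome yields more reads; dividing each abundance by the genome length and renormalising gives roughly equal read counts. The genome names and lengths below are hypothetical placeholders.

```python
# Hypothetical genome lengths (in bp); replace with your own genomes.
genome_lengths = {
    "genome_A": 5_000_000,
    "genome_B": 2_500_000,
}

# Start from equal raw shares (what a naive abundance file would contain).
raw_abundance = {name: 0.5 for name in genome_lengths}

# Divide by genome length so depth translates into ~equal read counts,
# then renormalise so the abundances sum to 1 again.
weighted = {name: a / genome_lengths[name] for name, a in raw_abundance.items()}
total = sum(weighted.values())
normalised = {name: w / total for name, w in weighted.items()}

for name, ab in normalised.items():
    print(f"{name}\t{ab:.6f}")
```

Here the genome half the length ends up with twice the abundance, so both genomes contribute about the same number of reads.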
Thank you for your quick answer! I will try it out with a small generation and come back if something looks off.
Hello. First of all, thank you for your help last time: your method worked well and I could generate data two different times. Here is the output:
2024-03-25 17:10:37 WARNING: [MetagenomeSimulationPipeline] The output will require approximately 108.0 GigaByte.
And the content of my different files (they are quite long, but I prefer to write them entirely): cousinConfig.ini [Main] [ReadSimulator] [CommunityDesign] [community0] metadata.tsv genome_to_id.tsv abundance0.tsv MGYG-HGUT-00870 1.594999612415094e-09
Hi, it seems like you actually have the genome with the ID
Hello,
Rather than an issue with the code, I would need some advice on data generation.
I would like to generate a sample of some species using only my own genomes.
I only need to generate reads, no assembly; I just want to make sure I will get the right proportions.
For each species, I have xi consensus genomes, themselves obtained from clustering yi genomes.
Rather than using one genome per species, I would like, for the sake of diversity in data generation, to use all the xi * yi genomes linked to the species.
For example, let's say I have 2 species: E. coli and V. parvula.
I have two clusters of genomes for E. coli: E.coli_A and E.coli_B.
Cluster A contains 3 genomes and cluster B contains 2 genomes, so I have, in total, E.coli_A_1, E.coli_A_2, E.coli_A_3, E.coli_B_1 and E.coli_B_2.
V. parvula has one cluster of 4 genomes: V.parvula_A_1, V.parvula_A_2, V.parvula_A_3, V.parvula_A_4.
I want my species to be equally represented, and each genome to be equally represented within its species.
My abundances would therefore be 1/2 * 1/5 = 1/10 for each E. coli genome and 1/2 * 1/4 = 1/8 for each V. parvula genome.
How would you recommend designing my abundance file and metadata?
Can I just consider each genome to be on the same level and have metadata like this:
Genome_ID	OTU	NCBI_ID	novelty_category
E.coli_A_1	1	2	novel_species
E.coli_A_2	2	2	novel_species
E.coli_A_3	3	2	novel_species
E.coli_B_1	4	2	novel_species
E.coli_B_2	5	2	novel_species
V.parvula_A_1	6	2	novel_species
V.parvula_A_2	7	2	novel_species
V.parvula_A_3	8	2	novel_species
V.parvula_A_4	9	2	novel_species
and abundance like this:
E.coli_A_1	0.1
E.coli_A_2	0.1
E.coli_A_3	0.1
E.coli_B_1	0.1
E.coli_B_2	0.1
V.parvula_A_1	0.125
V.parvula_A_2	0.125
V.parvula_A_3	0.125
V.parvula_A_4	0.125
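The abundance values above follow directly from the equal-shares arithmetic, so they can also be generated programmatically. A minimal sketch (the mapping mirrors the example species; the output file name follows CAMISIM's abundance0.tsv convention, and nothing here is CAMISIM's own API):

```python
# Equal-within-species abundances: each species gets an equal share of the
# community, split evenly across that species' genomes.
species_to_genomes = {
    "E.coli": ["E.coli_A_1", "E.coli_A_2", "E.coli_A_3",
               "E.coli_B_1", "E.coli_B_2"],
    "V.parvula": ["V.parvula_A_1", "V.parvula_A_2",
                  "V.parvula_A_3", "V.parvula_A_4"],
}

abundance = {}
species_share = 1.0 / len(species_to_genomes)   # 1/2 per species
for genomes in species_to_genomes.values():
    per_genome = species_share / len(genomes)   # 1/10 for E. coli, 1/8 for V. parvula
    for g in genomes:
        abundance[g] = per_genome

# Write a tab-separated abundance file (genome ID, abundance; no header).
with open("abundance0.tsv", "w") as fh:
    for g, a in abundance.items():
        fh.write(f"{g}\t{a}\n")
```

The resulting values sum to 1 across the community, which matches the abundance table above.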
Or do I have to tell CAMISIM when they come from the same species? I figured this distribution would ensure each genome is treated independently at the proportion I hope to get, but I just want to make sure I am not mistaken.
(Regarding abundance, I was told that if I wanted each genome to yield the exact same number of reads, I should normalise its abundance by its length, since longer genomes will produce more reads; is that so?)
Thank you for your quick answer.