Is profile-based design with custom genomes possible? #177

CassandraHjo · 2023-12-04T13:28:06Z

I want to use my own FASTA files as input genomes for the simulation, and I am wondering whether profile-based community design is possible with custom genomes, i.e. without using genomes from NCBI. Would it be an option to provide my own genome sequence collection in the profile-based design, as I can in the de novo design?

AlphaSquad · 2023-12-04T17:16:50Z

Actually this is possible, yes! You will need to use the -ar/--additional-references option. This file has to be in a tab-separated format with the 4 columns NCBI_ID Scientific_name genome_path novelty_category (without header). The NCBI ID is required for mapping the scientific name from the profile to your genome; for the novelty category you can just use known_strain.
If you do not want to use genomes from the NCBI at all, you will additionally need to use the -ref/--reference-genomes option and point it to an empty file, so CAMISIM does not use the default reference list in addition to the one provided by you. The command could look like this:
./metagenome_from_profile --additional-references /your/reference/file.tsv --reference-genomes /path/to/empty/file.tsv -p /your/profile.biom
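
For anyone preparing these two files by script, here is a minimal sketch in Python. The function name, the file names and the example entry are hypothetical; the taxonomy ID 562 with the name Escherichia coli is only an illustration and has to be replaced by the real ID and scientific name of each of your genomes.

```python
import csv
import os


def write_reference_files(entries, references_tsv, empty_tsv):
    """Write the headerless 4-column reference TSV and an empty reference list.

    entries: iterable of (ncbi_id, scientific_name, genome_path, novelty_category).
    """
    with open(references_tsv, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for ncbi_id, scientific_name, genome_path, novelty in entries:
            # Store absolute genome paths so the file works from any directory.
            writer.writerow(
                [ncbi_id, scientific_name, os.path.abspath(genome_path), novelty]
            )
    # Create (or truncate) the empty file to pass to --reference-genomes.
    open(empty_tsv, "w").close()


# Hypothetical example entry: replace with your own genomes and their taxonomy.
example_entries = [
    (562, "Escherichia coli", "genomes/MAG0001.fasta", "known_strain"),
]
```

Calling write_reference_files(example_entries, "references.tsv", "empty.tsv") would then produce the two files to pass to --additional-references and --reference-genomes.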

CassandraHjo · 2023-12-12T09:48:59Z

I am not interested in gsa or pooled_gsa. Do I still need to find a unique NCBI ID or can I just use 2?

For example in the reference_file.tsv can an entry look like this?
2 MAG0001 genomes/MAG0001.fasta known_strain

AlphaSquad · 2023-12-12T12:09:44Z

You don't need to provide these NCBI IDs then, so yes it could look like this, though just to be safe I'd advise using absolute paths to your genomes.

CassandraHjo · 2023-12-12T12:24:04Z

What does the biom file need to contain if an entry in the reference file looks like the one above?

AlphaSquad · 2023-12-12T13:24:45Z

Actually, looking at the code right now, I was mistaken. For CAMISIM to work, every entry in the reference file needs to have a "correct" NCBI ID and scientific name. If you choose 2 as your taxonomy ID, CAMISIM assumes that all your input genomes are on the taxonomic level of superkingdom and the mapping will not work. Additionally, the mapping from the BIOM profile to your genome is performed via the scientific name, so using MAG0001 will not work, as it will not be recognised as a scientific name.
The format of your BIOM profile should be similar to the mini.biom profile provided. The abundances are stored under data and the taxonomy in the same format as in the mini.biom, i.e. they need the metadata and taxonomy keywords - usually QIIME produces these files in the correct format already.
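
A quick way to sanity-check a profile before running CAMISIM could look like the sketch below. It assumes a JSON-based BIOM file (BIOM 1.0 layout with top-level "data" and "rows"); an HDF5-based BIOM file would need converting first. The function names and the example profile are made up for illustration, and CAMISIM itself may check more than this.

```python
import json


def rows_missing_taxonomy(biom):
    """Return the IDs of rows without a non-empty metadata/taxonomy entry."""
    missing = []
    for row in biom.get("rows", []):
        metadata = row.get("metadata") or {}
        if not metadata.get("taxonomy"):
            missing.append(row.get("id"))
    return missing


def check_profile(biom):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if "data" not in biom:
        problems.append("no 'data' section (abundances)")
    missing = rows_missing_taxonomy(biom)
    if missing:
        problems.append("rows without metadata/taxonomy: " + ", ".join(map(str, missing)))
    return problems


def check_profile_file(path):
    """Load a JSON BIOM file from disk and check it."""
    with open(path) as handle:
        return check_profile(json.load(handle))


# Hypothetical minimal profile with one OTU and one sample.
example_profile = {
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["k__Bacteria", "g__Escherichia", "s__coli"]}},
    ],
    "columns": [{"id": "sample1", "metadata": None}],
    "data": [[0, 0, 42.0]],
}
```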

CassandraHjo · 2024-01-04T08:34:41Z

I do not have the NCBI ID for all my custom genomes. Is there another way to make the profile-based design work, or is the de novo design (which I am able to run) the best option?

AlphaSquad · 2024-01-04T10:04:30Z

Since you do not use CAMISIM's option to download genomes, the de novo design might actually be the best (and more accurate) choice. To use the abundances from the input profile, you would need to provide them for your genomes via the distribution_file_paths option: tab-separated files with genome ID and abundance from the BIOM file. Note that for the de novo design to work you will still need to provide NCBI taxonomy IDs, but if you do not plan on using the taxonomic profile gold standard, any valid NCBI ID should work (e.g. 2 for Bacteria).

CassandraHjo · 2024-01-08T08:12:09Z

Do I need to change the phase in the config file if I am using the distribution_file_paths option?

AlphaSquad · 2024-01-08T09:58:54Z

No, you should not need to change the phase; CAMISIM will automatically use the files if they are provided. Note that for multiple samples these need to be absolute paths, comma-separated without whitespace:
distribution_file_paths=/path/to/sample1.tsv,/path/to/sample2.tsv
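
If the list of samples is long, the value for that config line can be assembled by a small helper like this sketch. The function name is hypothetical, and rejecting paths that contain a comma is my own assumption, made because such a path could not be told apart from the separator.

```python
import os


def distribution_paths_value(paths):
    """Join per-sample distribution files into one comma-separated value."""
    absolute = [os.path.abspath(p) for p in paths]
    for path in absolute:
        # Assumption: a comma inside a path would be read as a separator.
        if "," in path:
            raise ValueError("path contains a comma: " + path)
    # No whitespace is added around the commas.
    return ",".join(absolute)


value = distribution_paths_value(["/path/to/sample1.tsv", "/path/to/sample2.tsv"])
config_line = "distribution_file_paths=" + value
```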

CassandraHjo · 2024-01-08T10:00:26Z

Should the tsv files include headers?

AlphaSquad · 2024-01-08T10:08:10Z

No, these do not need a header, just genome_ID and abundance, tab-separated.
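
As a rough illustration, one such per-sample file could be written from a dictionary of abundances as follows. The function name, the genome IDs and the values are placeholders; the sketch writes the abundances exactly as given and does not normalise them.

```python
def write_distribution(abundances, out_path):
    """Write one headerless line per genome: genome_ID <tab> abundance."""
    with open(out_path, "w") as handle:
        for genome_id, abundance in abundances.items():
            handle.write(f"{genome_id}\t{abundance}\n")


# Hypothetical abundances for one sample, taken from the BIOM profile by hand.
example_abundances = {"MAG0001": 0.7, "MAG0002": 0.3}
```

Calling write_distribution(example_abundances, "/path/to/sample1.tsv") once per sample gives the files to list in distribution_file_paths.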

AlphaSquad mentioned this issue Jan 11, 2024

How to map relevant genomes to OTU through scientific names？ #185

Closed

AlphaSquad closed this as completed Jun 27, 2024
