Default Databases With Custom Partitions
Suppose only "Complete Genome" bacterial sequences need to be considered in the analysis. The following steps yield an FM-index of those sequences.
Download all default databases (GenBank, RefSeq Complete Genomes):
mtsv_setup database --path today --thread 8 --download_only
The following command builds a FASTA datastore from the Complete Genome sequences downloaded by the previous command, using 4 cores.
mtsv_setup database --path today --thread 4 --build_only --includedb "Complete Genome" genbank
The partition format can be thought of as a set operation between comma-separated lists of TaxIDs to include and to exclude. For example, a partition of Bacilli (91061) without B. anthracis (1392) or B. cereus (1396) could be specified with the string "91061-1396,1392". The "-" denotes the set difference operation: TaxIDs before it are included and TaxIDs after it are excluded. It can be left out if no exclusion is desired.
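To make the convention concrete, here is a minimal Python sketch of how such a string could be split into include and exclude sets. The helper name `parse_partition` is hypothetical and this is not mtsv's actual parsing code; it only assumes the format described above (at most one "-", with comma-separated TaxIDs on either side).

```python
from typing import Set, Tuple


def parse_partition(spec: str) -> Tuple[Set[int], Set[int]]:
    """Split a partition string into (include, exclude) TaxID sets.

    Hypothetical helper for illustration only; assumes the form
    "<include IDs>-<exclude IDs>", where the "-" part is optional.
    """
    # str.partition returns an empty exclude part when "-" is absent.
    include_part, _, exclude_part = spec.partition("-")
    include = {int(t) for t in include_part.split(",") if t.strip()}
    exclude = {int(t) for t in exclude_part.split(",") if t.strip()}
    return include, exclude


include, exclude = parse_partition("91061-1396,1392")
print(sorted(include), sorted(exclude))  # [91061] [1392, 1396]
```

Under this reading, a string with no "-" such as "2" gives an include set of {2} and an empty exclude set.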
The following command builds one FM-index of bacteria (TaxID 2) from the Complete Genome assemblies, using 2 cores.
mtsv_setup custom_db --path today --thread 2 --partition 2 --customdb "Complete Genome"