Default Databases With Custom Partitions

ktaed edited this page Dec 18, 2019 · 7 revisions

Default Databases (oneclick) with Custom Partitions

When only "Complete Genome" bacterial sequences need to be considered in an analysis, the following steps yield an FM-index of those sequences.

Steps 1a (only if not previously built) and 1b:

Download all default databases (GenBank, RefSeq Complete Genomes):

 mtsv_setup database --path today --thread 8 --download_only

The following command builds a FASTA datastore from the Complete Genome download produced by the previous command, using 4 cores.

 mtsv_setup database --path today --thread 4 --build_only --includedb "Complete Genome" genbank

Step 2:

The partition format can be thought of as set operations on comma-separated lists of TaxIDs: the TaxIDs to include come first, followed by "-" and the TaxIDs to exclude. For example, a partition of Bacilli (91061) without B. anthracis (1392) or B. cereus (1396) could be specified with the string "91061-1396,1392". The "-" denotes the set difference operation and can be left out if no exclusion is desired.
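To make the convention concrete, the sketch below splits a partition string into its include and exclude TaxID lists. It is only an illustration of the format as described above, not MTSV's own parser; the function name `parse_partition` is hypothetical.

```python
from typing import List, Tuple


def parse_partition(spec: str) -> Tuple[List[int], List[int]]:
    """Split a partition string into (include, exclude) TaxID lists.

    Assumes the format described in the text: comma-separated TaxIDs to
    include, optionally followed by "-" and comma-separated TaxIDs to exclude.
    This is an illustrative helper, not part of MTSV.
    """
    # Everything before the first "-" is included, everything after is excluded.
    include_part, _, exclude_part = spec.partition("-")
    include = [int(t) for t in include_part.split(",") if t.strip()]
    exclude = [int(t) for t in exclude_part.split(",") if t.strip()]
    return include, exclude


if __name__ == "__main__":
    # Bacilli without B. cereus or B. anthracis
    print(parse_partition("91061-1396,1392"))
    # No "-" means nothing is excluded
    print(parse_partition("2"))
```

Read this way, "91061-1396,1392" includes Bacilli and excludes TaxIDs 1396 and 1392, while a bare "2" includes TaxID 2 with no exclusions.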

The following command builds one FM-index of bacteria (TaxID 2) from the Complete Genome assemblies using 2 cores.

mtsv_setup custom_db --path today --thread 2 --partition 2 --customdb "Complete Genome"
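A partition string with exclusions, such as the Bacilli example from Step 2, would presumably be passed to --partition in the same way. The sketch below assembles such a command and prints it with shell quoting applied, which matters because "Complete Genome" contains a space. It uses only the flags shown on this page; the helper name `custom_db_command` is hypothetical, and the assumption that --partition accepts the exclusion string directly should be checked against your MTSV version.

```python
import shlex
from typing import List


def custom_db_command(path: str, threads: int, partition: str, customdb: str) -> List[str]:
    """Assemble the argument list for `mtsv_setup custom_db`.

    Hypothetical helper: it only arranges the flags shown on this page and
    does not validate them against MTSV.
    """
    return [
        "mtsv_setup", "custom_db",
        "--path", path,
        "--thread", str(threads),
        "--partition", partition,
        "--customdb", customdb,
    ]


# Bacilli (91061) without B. cereus (1396) or B. anthracis (1392)
argv = custom_db_command("today", 2, "91061-1396,1392", "Complete Genome")

# Print a copy-pasteable command line; shlex.join quotes arguments as needed.
print(shlex.join(argv))
```

The same list could be handed to `subprocess.run(argv, check=True)` to run the build directly, which avoids shell quoting issues altogether.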