"Genome-Matcher" DNA Sequence Search System
This software is aimed at helping researchers search through and compare the DNA of bacteria responsible for common illnesses. The idea is that if CDC researchers have better tools for processing the DNA of various organisms, they can identify new bacterial strains more quickly and develop cures.
Before running the search system, change PROVIDED_DIR on line 15 of main.cpp to the absolute path of your "data" directory:
const string PROVIDED_DIR = "/Users/jin/Gee-nomics/data";
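Why the path must be edited can be seen in a minimal sketch, assuming the program builds each data file's path by joining PROVIDED_DIR with a file name such as Ferroplasma_acidarmanus.txt. The helpers dataFilePath and openDataFile are hypothetical and not taken from main.cpp. With an unedited PROVIDED_DIR, the joined path points at a directory that does not exist on your machine, so opening the file fails.

```cpp
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: full path of one data file inside the data directory.
// operator/ inserts a separator only when needed, so it works whether or
// not dir ends in '/'.
fs::path dataFilePath(const std::string& dir, const std::string& fileName) {
    return fs::path(dir) / fileName;
}

// Hypothetical helper: opens a data file and reports a clear error when the
// directory is wrong, the usual symptom of an unedited PROVIDED_DIR.
bool openDataFile(const std::string& dir, const std::string& fileName,
                  std::ifstream& in) {
    const fs::path p = dataFilePath(dir, fileName);
    in.open(p);
    if (!in.is_open()) {
        std::cerr << "Cannot open " << p << "; check PROVIDED_DIR in main.cpp\n";
        return false;
    }
    return true;
}
```

For example, dataFilePath(PROVIDED_DIR, "Ferroplasma_acidarmanus.txt") would yield /Users/jin/Gee-nomics/data/Ferroplasma_acidarmanus.txt with the value shown above.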
To find the absolute path of your "data" directory on Mac OS X, open a terminal, change into the directory where you cloned the Genome-Matcher repository, and run:
$ cd data
$ pwd
/Users/jin/Genome-Matcher/data
Then set PROVIDED_DIR in main.cpp to the path that pwd prints.
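To confirm the path before rebuilding, a small check like the one below could help. It is a sketch assuming a C++17 compiler with <filesystem>; isUsableDataDir and listDataFiles are hypothetical helpers, not part of the project. It tests the two requirements described above (an absolute path that names an existing directory) and lists the .txt data files found there.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True if dir is an absolute path naming an existing directory.
bool isUsableDataDir(const std::string& dir) {
    std::error_code ec;  // non-throwing overload: a missing path just gives false
    const fs::path p(dir);
    return p.is_absolute() && fs::is_directory(p, ec);
}

// Sorted names of the .txt files directly inside dir; empty if dir cannot be read.
std::vector<std::string> listDataFiles(const std::string& dir) {
    std::vector<std::string> names;
    std::error_code ec;
    // On error the iterator equals the end iterator, so the loop does nothing.
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        if (entry.is_regular_file() && entry.path().extension() == ".txt")
            names.push_back(entry.path().filename().string());
    }
    std::sort(names.begin(), names.end());
    return names;
}
```

A relative value such as "data" fails the first check even if the directory exists, which is why the README asks for the absolute path from pwd.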
When you run the program, it might look like this:
Welcome to the Gee-nomics test harness!
The genome library is initially empty, with a default minSearchLength of 10
Commands:
  c - create new genome library
  a - add one genome manually
  l - load one data file
  d - load all provided data files
  e - find matches exactly
  s - find matching SNiPs
  r - find related genomes (manual)
  f - find related genomes (file)
  ? - show this menu
  q - quit
Enter command: a
Enter name: yeti
Enter DNA sequence: ACGTACGTAAAACCCCGGGGTTTTNANANANANA
Enter command: e
Enter DNA sequence for which to find exact matches: AAAACCCCGGGGTTNN
Enter minimum sequence match length: 12
1 matches of AAAACCCCGGGGTTNN found:
  length 14 position 8 in yeti
Enter command: e
Enter DNA sequence for which to find exact matches: CCCCAAAATTTT
Enter minimum sequence match length:
No matches of CCCCAAAATTTT were found.
Enter command: s
Enter DNA sequence for which to find exact matches and SNiPs: AAAACCTCGGGGTTNN
Enter minimum sequence match length: 12
1 matches and/or SNiPs of AAAACCTCGGGGTTNN found:
  length 14 position 8 in yeti
Enter command: c
Enter minimum search length (3-100): 4
Enter command: a
Enter name: sasquatch
Enter DNA sequence: GGGGTTTTAAAACCCCACGTACGTACGTNANANANA
Enter command: r
Enter DNA sequence: AAATCCCTGGGGTTTTNANA
Enter match percentage threshold (0-100): 20
Require (e)xact match or allow (S)NiPs (e or s): s
1 related genomes were found:
  50.00% sasquatch
Enter command: c
Enter minimum search length (3-100): 10
Enter command: d
Loaded 1 genomes from Ferroplasma_acidarmanus.txt
Loaded 2 genomes from Halobacterium_jilantaiense.txt
Loaded 105 genomes from Halorubrum_chaoviator.txt
Loaded 83 genomes from Halorubrum_californiense.txt
Loaded 55 genomes from Halorientalis_regularis.txt
Loaded 121 genomes from Halorientalis_persicus.txt
Loaded 1 genomes from Ferroglobus_placidus.txt
Loaded 1 genomes from Desulfurococcus_mucosus.txt
Enter command: e
Enter DNA sequence for which to find exact matches: ACGAATCACGTGCGAGA
Enter minimum sequence match length: 11
2 matches of ACGAATCACGTGCGAGA found:
  length 17 position 568 in NZ_AOJK01000080.1 Halorubrum californiensis DSM 19288 contig_80, whole genome shotgun sequence
  length 12 position 1977 in NZ_FOCX01000065.1 Halorientalis persicus strain IBRC-M 10043, whole genome shotgun sequence
Enter command: q
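The e and s commands in the session above appear to report, for each genome, the longest stretch of the genome that matches the beginning of the query, provided it reaches the minimum match length. The brute-force sketch below illustrates that behavior under stated assumptions: a SNiP match tolerates at most one mismatched base, and only the single longest hit per genome is kept. Match, matchLengthAt and findBestMatch are hypothetical names, and the project itself may use a faster index rather than scanning every position.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// One reported hit: how many bases matched and where the match starts.
struct Match {
    std::size_t length;
    std::size_t position;
};

// Number of leading query bases that match genome starting at pos.
// With allowSnip, one mismatched base is tolerated (an assumption about the
// "s" command) and counts toward the length; the walk stops at the next mismatch.
std::size_t matchLengthAt(const std::string& genome, std::size_t pos,
                          const std::string& query, bool allowSnip) {
    std::size_t len = 0;
    bool usedSnip = false;
    while (len < query.size() && pos + len < genome.size()) {
        if (genome[pos + len] != query[len]) {
            if (!allowSnip || usedSnip) break;
            usedSnip = true;  // accept this single mismatch and keep going
        }
        ++len;
    }
    return len;
}

// Scans every start position and keeps the longest match (earliest on ties).
// Returns nothing when the best match is shorter than minLength.
std::optional<Match> findBestMatch(const std::string& genome,
                                   const std::string& query,
                                   std::size_t minLength, bool allowSnip) {
    Match best{0, 0};
    for (std::size_t pos = 0; pos < genome.size(); ++pos) {
        const std::size_t len = matchLengthAt(genome, pos, query, allowSnip);
        if (len > best.length) best = Match{len, pos};
    }
    if (best.length == 0 || best.length < minLength) return std::nullopt;
    return best;
}
```

Applied to the yeti sequence with a minimum length of 12, both the exact query AAAACCCCGGGGTTNN and the SNiP query AAAACCTCGGGGTTNN give length 14 at position 8, as in the session. The scan costs roughly genome length times query length per genome, which is fine for toy inputs but slow for the full data files.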