| Bowtie: an Ultrafast, Lightweight Short Read Aligner | |
| Bowtie Getting Started Guide | |
| ============================ | |
| Download and extract the appropriate Bowtie binary release from | |
| http://bowtie-bio.sf.net into a fresh directory. Change to that | |
| directory. | |
| Performing alignments | |
| --------------------- | |
| The Bowtie source and binary packages come with a pre-built index of | |
| the E. coli genome, and a set of 1,000 35-bp reads simulated from that | |
| genome. To use Bowtie to align those reads, issue the following | |
| command. If you get a "command not found" error, prefix "bowtie" | |
| with "./" so the shell looks in the current directory. | |
| bowtie e_coli reads/e_coli_1000.fq | |
| The first argument to bowtie is the basename of the index for the | |
| genome to be searched. The second argument is the name of a FASTQ file | |
| containing the reads. | |
| Depending on your computer, the run might take anywhere from a few | |
| seconds to about a minute. You will see bowtie print many lines of | |
| output. Each line is an alignment for a read. The name of the aligned | |
| read appears in the leftmost column. The final line should say | |
| "Reported 699 alignments to 1 output stream(s)" or something similar. | |
| Next, issue this command: | |
| bowtie -t e_coli reads/e_coli_1000.fq e_coli.map | |
| This run calculates the same alignments as the previous run, but the | |
| alignments are written to e_coli.map (the final argument) rather than | |
| to the screen. Also, the -t option instructs Bowtie to print timing | |
| statistics. The output should look something like this: | |
| Time loading forward index: 00:00:00 | |
| Time loading mirror index: 00:00:00 | |
| Seeded quality full-index search: 00:00:00 | |
| # reads processed: 1000 | |
| # reads with at least one reported alignment: 699 (69.90%) | |
| # reads that failed to align: 301 (30.10%) | |
| Reported 699 alignments to 1 output stream(s) | |
| Time searching: 00:00:00 | |
| Overall time: 00:00:00 | |
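As a quick cross-check of the summary above, the alignment file can be
counted directly. The sketch below is not part of Bowtie: the helper
names count_alignments and count_aligned_reads are made up for
illustration, and it assumes the default output format, with one
alignment per line and the read name in the first tab-separated column.

```shell
# Count alignment records in a Bowtie output file (one record per line).
count_alignments() {
    # Reading from stdin makes wc print only the number; tr strips padding.
    wc -l < "$1" | tr -d ' '
}

# Count distinct read names, taken from the first tab-separated column.
count_aligned_reads() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Report both counts if the output of the previous run is present.
if [ -f e_coli.map ]; then
    echo "alignments:    $(count_alignments e_coli.map)"
    echo "aligned reads: $(count_aligned_reads e_coli.map)"
fi
```

For the run above, both numbers should match the count in the
"Reported ... alignments" line.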
| Installing a pre-built index | |
| ---------------------------- | |
| Download the pre-built S. cerevisiae genome package from the Bowtie | |
| FTP site: | |
| ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/s_cerevisiae.ebwt.zip | |
| All pre-built indexes are packaged as .zip archives, and the S. | |
| cerevisiae archive is named s_cerevisiae.ebwt.zip. When it has | |
| finished downloading, extract the archive into the Bowtie 'indexes' | |
| subdirectory using your preferred unzip tool. The index is now | |
| installed. | |
| To test that the index is properly installed, issue this command from | |
| the Bowtie install directory: | |
| bowtie -c s_cerevisiae ATTGTAGTTCGAGTAAGTAATGTGGGTTTG | |
| This command searches the S. cerevisiae index with a single read. The | |
| -c option instructs Bowtie to obtain read sequences directly from | |
| the command line rather than from a file. If the index is installed | |
| properly, this command should print a single alignment and then exit. | |
| If you would rather install pre-built indexes somewhere other than the | |
| 'indexes' subdirectory of the Bowtie install directory, simply set the | |
| BOWTIE_INDEXES environment variable to point to your preferred | |
| directory and extract indexes there instead. | |
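For example, in a Bourne-style shell the variable could be set as
follows. This is only a sketch: the directory name my_indexes is a
placeholder, and the extraction step assumes that the archive was
downloaded to the current directory and that the unzip utility is
installed.

```shell
# Keep an existing setting, otherwise fall back to a placeholder directory.
BOWTIE_INDEXES="${BOWTIE_INDEXES:-$PWD/my_indexes}"
export BOWTIE_INDEXES
mkdir -p "$BOWTIE_INDEXES"

# Extract the downloaded archive into that directory, if it is present.
# -o overwrites existing files without prompting; -d sets the target.
if [ -f s_cerevisiae.ebwt.zip ]; then
    unzip -o s_cerevisiae.ebwt.zip -d "$BOWTIE_INDEXES"
fi
```

To make the setting permanent, the export line would normally go in
your shell startup file.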
| Building a new index | |
| -------------------- | |
| The pre-built E. coli index included with Bowtie is built from the | |
| sequence for strain 536, known to cause urinary tract infections. We | |
| will create a new index from the sequence of E. coli strain O157:H7, a | |
| strain known to cause food poisoning. Download and decompress the | |
| sequence file from: | |
| ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/Escherichia_coli/all_assembly_versions/GCF_000513035.1_E._coli_O157/GCF_000513035.1_E._coli_O157_genomic.fna.gz | |
| Once it has been downloaded and decompressed, move it to the Bowtie | |
| install directory and issue this command: | |
| bowtie-build GCF_000513035.1_E._coli_O157_genomic.fna e_coli_O157_H7 | |
| The command should finish quickly, and print several lines of status | |
| messages. When the command has completed, note that the current | |
| directory contains four new files named e_coli_O157_H7.1.ebwt, | |
| e_coli_O157_H7.2.ebwt, e_coli_O157_H7.rev.1.ebwt, and | |
| e_coli_O157_H7.rev.2.ebwt. These files constitute the index. Move | |
| them to the 'indexes' subdirectory to install the index. | |
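The move can be scripted so that a partial index is never installed.
The function below is a sketch, not something shipped with Bowtie: the
name install_index is made up, and it checks only for the four file
names listed above.

```shell
# Move the four files of a Bowtie index into a target directory.
# Usage: install_index BASENAME [DIR]    (DIR defaults to ./indexes)
install_index() {
    idx_base=$1
    idx_dest=${2:-indexes}
    # Refuse a partial install: all four files must be present first.
    for idx_suffix in 1.ebwt 2.ebwt rev.1.ebwt rev.2.ebwt; do
        if [ ! -f "$idx_base.$idx_suffix" ]; then
            echo "missing index file: $idx_base.$idx_suffix" >&2
            return 1
        fi
    done
    mkdir -p "$idx_dest" || return 1
    for idx_suffix in 1.ebwt 2.ebwt rev.1.ebwt rev.2.ebwt; do
        mv "$idx_base.$idx_suffix" "$idx_dest/" || return 1
    done
}

# Example, run from the Bowtie install directory after bowtie-build:
#   install_index e_coli_O157_H7
```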
| To test that the index is properly installed, issue this command: | |
| bowtie -c e_coli_O157_H7 GAACCGTATTCACCCGCCATCCCCATGCCG | |
| If the index is installed properly, this command should print a single | |
| alignment and then exit. | |
| Finding variations with SAMtools | |
| -------------------------------- | |
| SAMtools (http://samtools.sf.net) is a suite of tools for storing, | |
| manipulating, and analyzing alignments such as those output by Bowtie. | |
| SAMtools understands alignments in either of two complementary | |
| formats: the human-readable SAM format, or the binary BAM format. | |
| Because Bowtie can output SAM (using the -S/--sam option), and SAM can | |
| be converted to BAM using SAMtools, Bowtie users can make full use | |
| of the analyses implemented in SAMtools, or in any other tools | |
| supporting SAM or BAM. | |
| We will use SAMtools to find SNPs in a set of simulated reads included | |
| with Bowtie. The reads cover the first 10,000 bases of the pre-built | |
| E. coli genome and contain 10 SNPs throughout. First, we run 'bowtie' | |
| to align the reads, being sure to specify the -S option. We also | |
| specify an output file that we will use as input for the next step | |
| (though pipes can be used to accomplish the same thing without the | |
| intermediate file): | |
| bowtie -S e_coli reads/e_coli_10000snp.fq ec_snp.sam | |
| Next, we convert the SAM file to BAM in preparation for sorting. We | |
| assume that SAMtools is installed and that the samtools binary is | |
| accessible in the PATH. | |
| samtools view -bS -o ec_snp.bam ec_snp.sam | |
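The pipe alternative mentioned earlier can be sketched as a small shell
function. The name align_to_bam is made up for illustration. The sketch
relies on bowtie writing alignments to standard output when no output
file is named, and on samtools view reading SAM from standard input
when given "-" as the input file.

```shell
# Align reads and convert straight to BAM, with no intermediate SAM file.
# Usage: align_to_bam INDEX READS.fq OUT.bam
align_to_bam() {
    bam_index=$1
    bam_reads=$2
    bam_out=$3
    # bowtie -S emits SAM on stdout; samtools view -bS reads it from "-".
    bowtie -S "$bam_index" "$bam_reads" | samtools view -bS -o "$bam_out" -
}

# Example, equivalent to the two commands above:
#   align_to_bam e_coli reads/e_coli_10000snp.fq ec_snp.bam
```

Keeping ec_snp.sam on disk, as in the commands above, is still handy
when you want to inspect the alignments by eye.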
| Next, we sort the BAM file, in preparation for SNP calling: | |
| samtools sort ec_snp.bam ec_snp.sorted | |
| We now have a sorted BAM file called ec_snp.sorted.bam. Sorted BAM is | |
| a useful format because the alignments are both compressed, which is | |
| convenient for long-term storage, and sorted, which is convenient for | |
| variant discovery. Finally, we call variants from the sorted BAM: | |
| samtools pileup -cv -f genomes/NC_008253.fna ec_snp.sorted.bam | |
| For this sample data, the 'samtools pileup' command should print | |
| records for 10 distinct SNPs, the first being at position 541 in the | |
| reference. | |
| See the SAMtools web site for details on how to use these and other | |
| tools in the SAMtools suite: http://samtools.sf.net/. |