diff --git a/MANUAL b/MANUAL index b1395ac8..b82d8bff 100644 --- a/MANUAL +++ b/MANUAL @@ -24,7 +24,7 @@ including for variation calling, ChIP-seq, RNA-seq, BS-seq. [Bowtie 2] and tools, including [TopHat]: a fast splice junction mapper for RNA-seq reads, [Cufflinks]: a tool for transcriptome assembly and isoform quantitiation from RNA-seq reads, [Crossbow]: a cloud-enabled software tool for analyzing -reseuqncing data, and [Myrna]: a cloud-enabled software tool for aligning +resequencing data, and [Myrna]: a cloud-enabled software tool for aligning RNA-seq reads and measuring differential gene expression. If you use [Bowtie 2] for your published research, please cite the [Bowtie @@ -59,6 +59,7 @@ The chief differences between Bowtie 1 and Bowtie 2 are: 1. For reads longer than about 50 bp Bowtie 2 is generally faster, more sensitive, and uses less memory than Bowtie 1. For relatively short reads (e.g. less than 50 bp) Bowtie 1 is sometimes faster and/or more sensitive. +B 2. Bowtie 2 supports gapped alignment with affine gap penalties. Number of gaps and gap lengths are not restricted, except by way of the configurable scoring @@ -85,7 +86,7 @@ alignments lie along a continuous spectrum of alignment scores where the not align in a paired fashion, Bowtie 2 attempts to find unpaired alignments for each mate. -8. Bowtie 2 reports a spectrum of mapping qualities, in contrast fo Bowtie 1 +8. Bowtie 2 reports a spectrum of mapping qualities, in contrast for Bowtie 1 which reports either 0 or high. 9. Bowtie 2 does not align colorspace reads. @@ -113,8 +114,7 @@ you may want to consider using tools like [NUCmer], [BLAT], or [BLAST]. These tools can be extremely slow when the reference genome is long, but are often adequate when the reference is short. -Bowtie 2 does not support alignment of colorspace reads. This might be -supported in future versions. +Bowtie 2 does not support alignment of colorspace reads. 
[MUMmer]: http://mummer.sourceforge.net/ [NUCmer]: http://mummer.sourceforge.net/manual/#nucmer @@ -158,14 +158,14 @@ from the MSYS environment. +Bowtie 2 is using the multithreading software model in order to +speed up execution times on SMP architectures where this is possible. +The Threading Building Blocks library, TBB, is now the default -+threading library in bowtie2. On POSIX platforms (like linux, Mac -+OS, etc) if TBB is not available the pthread library will be used. ++threading library in Bowtie 2. On POSIX platforms (like Linux, Mac ++OS, etc.) if TBB is not available the pthread library will be used. +Although it is possible to use pthread library on Windows, a non-POSIX -+platform, due to performance reasons bowtie 2 will try to use Windows ++platform, due to performance reasons Bowtie 2 will try to use Windows +native multithreading if possible. We recommend that you first +install the [Threading Building Blocks library], but if unable to +do so please specify `make NO_TBB=1`. TBB comes installed by default -+on many popular linux distros. Please note, packages built without ++on many popular Linux distros. Please note, packages built without +TBB will have _-legacy_ appended to the name. [MinGW]: http://www.mingw.org/ @@ -218,7 +218,7 @@ characters match. We use alignment to make an educated guess as to where a read originated with respect to the reference genome. It's not always possible to determine this with certainty. For instance, if the reference genome contains several long -stretches of As (`AAAAAAAAA` etc) and the read sequence is a short stretch of As +stretches of As (`AAAAAAAAA` etc.) and the read sequence is a short stretch of As (`AAAAAAA`), we cannot know for certain exactly where in the sea of `A`s the read originated. @@ -268,7 +268,7 @@ Scores: higher = more similar An alignment score quantifies how similar the read sequence is to the reference sequence aligned to. The higher the score, the more similar they are. 
A score -is calculated by subtracting penalties for each difference (mismatch, gap, etc) +is calculated by subtracting penalties for each difference (mismatch, gap, etc.) and, in local alignment mode, adding bonuses for each match. The scores can be configured with the `--ma` (match bonus), `--mp` (mismatch @@ -437,7 +437,7 @@ a pair. See the [SAM specification] for a more detailed description of the ### Some SAM optional fields describe more paired-end properties -The last severeal fields of each SAM record usually contain SAM optional fields, +The last several fields of each SAM record usually contain SAM optional fields, which are simply tab-separated strings conveying additional information about the reads and alignments. A SAM optional field is formatted like this: "XP:i:1" where "XP" is the `TAG`, "i" is the `TYPE` ("integer" in this case), and "1" is @@ -552,7 +552,7 @@ beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. See the [SAM specification] for details. Bowtie 2 does not "find" alignments in any specific order, so for reads that -have more than N distinct, valid alignments, Bowtie 2 does not garantee that +have more than N distinct, valid alignments, Bowtie 2 does not guarantee that the N alignments reported are the best possible in terms of alignment score. Still, this mode can be effective and fast in situations where the user cares more about whether a read aligns (or aligns a certain number of times) than @@ -581,7 +581,7 @@ Bowtie 2's search for alignments for a given read is "randomized." That is, when Bowtie 2 encounters a set of equally-good choices, it uses a pseudo-random number to choose. For example, if Bowtie 2 discovers a set of 3 equally-good alignments and wants to decide which to report, it picks a pseudo-random integer -0, 1 or 2 and reports the corresponding alignment. Abitrary choices can crop up +0, 1 or 2 and reports the corresponding alignment. 
Arbitrary choices can crop up at various points during alignment. The pseudo-random number generator is re-initialized for every read, and the @@ -612,18 +612,18 @@ does], except Bowtie 1 attempts to align the entire read this way. This initial step makes Bowtie 2 much faster than it would be without such a filter, but at the expense of missing some valid alignments. For instance, it is possible for a read to have a valid overall alignment but to have no valid -seed alignments because each potential seed alignment is interruped by too many +seed alignments because each potential seed alignment is interrupted by too many mismatches or gaps. -The tradeoff between speed and sensitivity/accuracy can be adjusted by setting +The trade-off between speed and sensitivity/accuracy can be adjusted by setting the seed length (`-L`), the interval between extracted seeds (`-i`), and the number of mismatches permitted per seed (`-N`). For more sensitive alignment, set these parameters to (a) make the seeds closer together, (b) make the seeds shorter, and/or (c) allow more mismatches. You can adjust these options one-by-one, though Bowtie 2 comes with some useful combinations of options -pre-packaged as "[preset options]." +prepackaged as "[preset options]." -`-D` and `-R` are also options that adjust the tradeoff between speed and +`-D` and `-R` are also options that adjust the trade-off between speed and sensitivity/accuracy. ### FM Index memory footprint @@ -667,7 +667,7 @@ Bowtie 2 comes with some useful combinations of parameters packaged into shorter "preset" parameters. For example, running Bowtie 2 with the `--very-sensitive` option is the same as running with options: `-D 20 -R 3 -N 0 -L 20 -i S,1,0.50`. 
The preset options that come with Bowtie 2 are designed to cover a wide area of -the speed/sensitivity/accuracy tradeoff space, with the presets ending in `fast` +the speed/sensitivity/accuracy trade-off space, with the presets ending in `fast` generally being faster but less sensitive and less accurate, and the presets ending in `sensitive` generally being slower but more sensitive and more accurate. See the [documentation for the preset options] for details. @@ -678,7 +678,7 @@ Filtering Some reads are skipped or "filtered out" by Bowtie 2. For example, reads may be filtered out because they are extremely short or have a high proportion of ambiguous nucleotides. Bowtie 2 will still print a SAM record for such a read, -but no alignment will be reported and and the `YF:i` SAM optional field will be +but no alignment will be reported and the `YF:i` SAM optional field will be set to indicate the reason the read was filtered. * `YF:Z:LN`: the read was filtered because it had length less than or equal to @@ -697,7 +697,7 @@ and the last (11th) field of the read's QSEQ record contains `1`. If a read could be filtered for more than one reason, the value `YF:Z` flag will reflect only one of those reasons. -Alignment summmary +Alignment summary ------------------ When Bowtie 2 finishes running, it prints messages summarizing what happened. @@ -739,7 +739,7 @@ wrapper scripts that call binary programs as appropriate. The wrappers shield users from having to distinguish between "small" and "large" index formats, discussed briefly in the following section. Also, the `bowtie2` wrapper provides some key functionality, like the ability to handle compressed inputs, -and the fucntionality for `--un`, `--al` and related options. +and the functionality for `--un`, `--al` and related options. It is recommended that you always run the bowtie2 wrappers and not run the binaries directly. @@ -1205,7 +1205,7 @@ be valid in that case. 
If trimming options `-3` or `-5` are also used, the `-I` constraint is applied with respect to the untrimmed mates. The larger the difference between `-I` and `-X`, the slower Bowtie 2 will -run. This is because larger differences bewteen `-I` and `-X` require that +run. This is because larger differences between `-I` and `-X` require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient. @@ -1223,7 +1223,7 @@ constraint is applied with respect to the untrimmed mates, not the trimmed mates. The larger the difference between `-I` and `-X`, the slower Bowtie 2 will -run. This is because larger differences bewteen `-I` and `-X` require that +run. This is because larger differences between `-I` and `-X` require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient. @@ -1398,7 +1398,7 @@ Spec][SAM]. Specify `--rg` multiple times to set multiple fields. See the --omit-sec-seq When printing secondary alignments, Bowtie 2 by default will write out the `SEQ` -and `QUAL` strings. Specifying this option causes Bowtie 2 to print an asterix +and `QUAL` strings. Specifying this option causes Bowtie 2 to print an asterisk in those fields instead. #### Performance options @@ -1493,7 +1493,9 @@ left to right, the fields are: Note that the [SAM specification] disallows whitespace in the read name. If the read name contains any whitespace characters, Bowtie 2 will truncate the name at the first whitespace character. This is similar to the - behavior of other tools. + behavior of other tools. The standard behavior of truncating at the first + whitespace can be suppressed with `--sam-noqname-trunc` at the expense of + generating non-standard SAM. 2. Sum of all applicable flags. 
Flags relevant to Bowtie are: @@ -1996,21 +1998,21 @@ Run the paired-end example: $BT2_HOME/bowtie2 -x $BT2_HOME/example/index/lambda_virus -1 $BT2_HOME/example/reads/reads_1.fq -2 $BT2_HOME/example/reads/reads_2.fq -S eg2.sam -Use `samtools view` to convert the SAM file into a BAM file. BAM is a the +Use `samtools view` to convert the SAM file into a BAM file. BAM is the binary format corresponding to the SAM text format. Run: samtools view -bS eg2.sam > eg2.bam Use `samtools sort` to convert the BAM file to a sorted BAM file. - samtools sort eg2.bam eg2.sorted + samtools sort eg2.bam -o eg2.sorted.bam We now have a sorted BAM file called `eg2.sorted.bam`. Sorted BAM is a useful format because the alignments are (a) compressed, which is convenient for long-term storage, and (b) sorted, which is conveneint for variant discovery. To generate variant calls in VCF format, run: - samtools mpileup -uf $BT2_HOME/example/reference/lambda_virus.fa eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf + samtools mpileup -uf $BT2_HOME/example/reference/lambda_virus.fa eg2.sorted.bam | bcftools view -Ov - > eg2.raw.bcf Then to view the variants, run: diff --git a/MANUAL.markdown b/MANUAL.markdown index f493333f..e3604a1a 100644 --- a/MANUAL.markdown +++ b/MANUAL.markdown @@ -29,7 +29,7 @@ including for variation calling, ChIP-seq, RNA-seq, BS-seq. [Bowtie 2] and tools, including [TopHat]: a fast splice junction mapper for RNA-seq reads, [Cufflinks]: a tool for transcriptome assembly and isoform quantitiation from RNA-seq reads, [Crossbow]: a cloud-enabled software tool for analyzing -reseuqncing data, and [Myrna]: a cloud-enabled software tool for aligning +resequencing data, and [Myrna]: a cloud-enabled software tool for aligning RNA-seq reads and measuring differential gene expression. If you use [Bowtie 2] for your published research, please cite the [Bowtie @@ -64,6 +64,7 @@ The chief differences between Bowtie 1 and Bowtie 2 are: 1. 
For reads longer than about 50 bp Bowtie 2 is generally faster, more sensitive, and uses less memory than Bowtie 1. For relatively short reads (e.g. less than 50 bp) Bowtie 1 is sometimes faster and/or more sensitive. +B 2. Bowtie 2 supports gapped alignment with affine gap penalties. Number of gaps and gap lengths are not restricted, except by way of the configurable scoring @@ -90,7 +91,7 @@ alignments lie along a continuous spectrum of alignment scores where the not align in a paired fashion, Bowtie 2 attempts to find unpaired alignments for each mate. -8. Bowtie 2 reports a spectrum of mapping qualities, in contrast fo Bowtie 1 +8. Bowtie 2 reports a spectrum of mapping qualities, in contrast for Bowtie 1 which reports either 0 or high. 9. Bowtie 2 does not align colorspace reads. @@ -167,14 +168,14 @@ from the MSYS environment. +Bowtie 2 is using the multithreading software model in order to +speed up execution times on SMP architectures where this is possible. +The Threading Building Blocks library, TBB, is now the default -+threading library in bowtie2. On POSIX platforms (like linux, Mac -+OS, etc) if TBB is not available the pthread library will be used. ++threading library in Bowtie 2. On POSIX platforms (like Linux, Mac ++OS, etc.) if TBB is not available the pthread library will be used. +Although it is possible to use pthread library on Windows, a non-POSIX -+platform, due to performance reasons bowtie 2 will try to use Windows ++platform, due to performance reasons Bowtie 2 will try to use Windows +native multithreading if possible. We recommend that you first +install the [Threading Building Blocks library], but if unable to +do so please specify `make NO_TBB=1`. TBB comes installed by default -+on many popular linux distros. Please note, packages built without ++on many popular Linux distros. Please note, packages built without +TBB will have _-legacy_ appended to the name. [MinGW]: http://www.mingw.org/ @@ -227,7 +228,7 @@ characters match. 
We use alignment to make an educated guess as to where a read originated with respect to the reference genome. It's not always possible to determine this with certainty. For instance, if the reference genome contains several long -stretches of As (`AAAAAAAAA` etc) and the read sequence is a short stretch of As +stretches of As (`AAAAAAAAA` etc.) and the read sequence is a short stretch of As (`AAAAAAA`), we cannot know for certain exactly where in the sea of `A`s the read originated. @@ -277,7 +278,7 @@ Scores: higher = more similar An alignment score quantifies how similar the read sequence is to the reference sequence aligned to. The higher the score, the more similar they are. A score -is calculated by subtracting penalties for each difference (mismatch, gap, etc) +is calculated by subtracting penalties for each difference (mismatch, gap, etc.) and, in local alignment mode, adding bonuses for each match. The scores can be configured with the [`--ma`] (match bonus), [`--mp`] (mismatch @@ -448,7 +449,7 @@ a pair. See the [SAM specification] for a more detailed description of the ### Some SAM optional fields describe more paired-end properties -The last severeal fields of each SAM record usually contain SAM optional fields, +The last several fields of each SAM record usually contain SAM optional fields, which are simply tab-separated strings conveying additional information about the reads and alignments. A SAM optional field is formatted like this: "XP:i:1" where "XP" is the `TAG`, "i" is the `TYPE` ("integer" in this case), and "1" is @@ -564,7 +565,7 @@ beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. See the [SAM specification] for details. 
Bowtie 2 does not "find" alignments in any specific order, so for reads that -have more than N distinct, valid alignments, Bowtie 2 does not garantee that +have more than N distinct, valid alignments, Bowtie 2 does not guarantee that the N alignments reported are the best possible in terms of alignment score. Still, this mode can be effective and fast in situations where the user cares more about whether a read aligns (or aligns a certain number of times) than @@ -593,7 +594,7 @@ Bowtie 2's search for alignments for a given read is "randomized." That is, when Bowtie 2 encounters a set of equally-good choices, it uses a pseudo-random number to choose. For example, if Bowtie 2 discovers a set of 3 equally-good alignments and wants to decide which to report, it picks a pseudo-random integer -0, 1 or 2 and reports the corresponding alignment. Abitrary choices can crop up +0, 1 or 2 and reports the corresponding alignment. Arbitrary choices can crop up at various points during alignment. The pseudo-random number generator is re-initialized for every read, and the @@ -624,18 +625,18 @@ does], except Bowtie 1 attempts to align the entire read this way. This initial step makes Bowtie 2 much faster than it would be without such a filter, but at the expense of missing some valid alignments. For instance, it is possible for a read to have a valid overall alignment but to have no valid -seed alignments because each potential seed alignment is interruped by too many +seed alignments because each potential seed alignment is interrupted by too many mismatches or gaps. -The tradeoff between speed and sensitivity/accuracy can be adjusted by setting +The trade-off between speed and sensitivity/accuracy can be adjusted by setting the seed length ([`-L`]), the interval between extracted seeds ([`-i`]), and the number of mismatches permitted per seed ([`-N`]). 
For more sensitive alignment, set these parameters to (a) make the seeds closer together, (b) make the seeds shorter, and/or (c) allow more mismatches. You can adjust these options one-by-one, though Bowtie 2 comes with some useful combinations of options -pre-packaged as "[preset options]." +prepackaged as "[preset options]." -[`-D`] and [`-R`] are also options that adjust the tradeoff between speed and +[`-D`] and [`-R`] are also options that adjust the trade-off between speed and sensitivity/accuracy. [preset options]: #presets-setting-many-settings-at-once @@ -682,7 +683,7 @@ Bowtie 2 comes with some useful combinations of parameters packaged into shorter "preset" parameters. For example, running Bowtie 2 with the `--very-sensitive` option is the same as running with options: `-D 20 -R 3 -N 0 -L 20 -i S,1,0.50`. The preset options that come with Bowtie 2 are designed to cover a wide area of -the speed/sensitivity/accuracy tradeoff space, with the presets ending in `fast` +the speed/sensitivity/accuracy trade-off space, with the presets ending in `fast` generally being faster but less sensitive and less accurate, and the presets ending in `sensitive` generally being slower but more sensitive and more accurate. See the [documentation for the preset options] for details. @@ -695,7 +696,7 @@ Filtering Some reads are skipped or "filtered out" by Bowtie 2. For example, reads may be filtered out because they are extremely short or have a high proportion of ambiguous nucleotides. Bowtie 2 will still print a SAM record for such a read, -but no alignment will be reported and and the `YF:i` SAM optional field will be +but no alignment will be reported and the `YF:i` SAM optional field will be set to indicate the reason the read was filtered. * `YF:Z:LN`: the read was filtered because it had length less than or equal to @@ -714,7 +715,7 @@ and the last (11th) field of the read's QSEQ record contains `1`. 
If a read could be filtered for more than one reason, the value `YF:Z` flag will reflect only one of those reasons. -Alignment summmary +Alignment summary ------------------ When Bowtie 2 finishes running, it prints messages summarizing what happened. @@ -756,7 +757,7 @@ wrapper scripts that call binary programs as appropriate. The wrappers shield users from having to distinguish between "small" and "large" index formats, discussed briefly in the following section. Also, the `bowtie2` wrapper provides some key functionality, like the ability to handle compressed inputs, -and the fucntionality for [`--un`], [`--al`] and related options. +and the functionality for [`--un`], [`--al`] and related options. It is recommended that you always run the bowtie2 wrappers and not run the binaries directly. @@ -1616,7 +1617,7 @@ be valid in that case. If trimming options [`-3`] or [`-5`] are also used, the [`-I`] constraint is applied with respect to the untrimmed mates. The larger the difference between [`-I`] and [`-X`], the slower Bowtie 2 will -run. This is because larger differences bewteen [`-I`] and [`-X`] require that +run. This is because larger differences between [`-I`] and [`-X`] require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient. @@ -1642,7 +1643,7 @@ constraint is applied with respect to the untrimmed mates, not the trimmed mates. The larger the difference between [`-I`] and [`-X`], the slower Bowtie 2 will -run. This is because larger differences bewteen [`-I`] and [`-X`] require that +run. This is because larger differences between [`-I`] and [`-X`] require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient. @@ -1990,7 +1991,7 @@ Spec][SAM]. Specify `--rg` multiple times to set multiple fields. See the
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index (based on the Burrows-Wheeler Transform or BWT) to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 gigabytes of RAM. Bowtie 2 supports gapped, local, and paired-end alignment modes. Multiple processors can be used simultaneously to achieve greater alignment speed. Bowtie 2 outputs alignments in SAM format, enabling interoperation with a large number of other tools (e.g. SAMtools, GATK) that use SAM. Bowtie 2 is distributed under the GPLv3 license, and it runs on the command line under Windows, Mac OS X and Linux.
-Bowtie 2 is often the first step in pipelines for comparative genomics, including for variation calling, ChIP-seq, RNA-seq, BS-seq. Bowtie 2 and Bowtie (also called "Bowtie 1" here) are also tightly integrated into some tools, including TopHat: a fast splice junction mapper for RNA-seq reads, Cufflinks: a tool for transcriptome assembly and isoform quantitiation from RNA-seq reads, Crossbow: a cloud-enabled software tool for analyzing reseuqncing data, and Myrna: a cloud-enabled software tool for aligning RNA-seq reads and measuring differential gene expression.
+Bowtie 2 is often the first step in pipelines for comparative genomics, including for variation calling, ChIP-seq, RNA-seq, and BS-seq. Bowtie 2 and Bowtie (also called "Bowtie 1" here) are also tightly integrated into some tools, including TopHat: a fast splice junction mapper for RNA-seq reads, Cufflinks: a tool for transcriptome assembly and isoform quantitation from RNA-seq reads, Crossbow: a cloud-enabled software tool for analyzing resequencing data, and Myrna: a cloud-enabled software tool for aligning RNA-seq reads and measuring differential gene expression.
If you use Bowtie 2 for your published research, please cite the Bowtie paper. Thank you!
Bowtie 1 was released in 2009 and was geared toward aligning the relatively short sequencing reads (up to 25-50 nucleotides) prevalent at the time. Since then, technology has improved both sequencing throughput (more nucleotides produced per sequencer per day) and read length (more nucleotides per read).
The chief differences between Bowtie 1 and Bowtie 2 are:
For reads longer than about 50 bp Bowtie 2 is generally faster, more sensitive, and uses less memory than Bowtie 1. For relatively short reads (e.g. less than 50 bp) Bowtie 1 is sometimes faster and/or more sensitive.
Bowtie 2 supports gapped alignment with affine gap penalties. Number of gaps and gap lengths are not restricted, except by way of the configurable scoring scheme. Bowtie 1 finds just ungapped alignments.
Bowtie 2 supports local alignment, which doesn't require reads to align end-to-end. Local alignments might be "trimmed" ("soft clipped") at one or both extremes in a way that optimizes alignment score. Bowtie 2 also supports end-to-end alignment which, like Bowtie 1, requires that the read align entirely.
There is no upper limit on read length in Bowtie 2. Bowtie 1 had an upper limit of around 1000 bp.
Bowtie 2 allows alignments to overlap ambiguous characters (e.g. Ns) in the reference. Bowtie 1 does not.
Bowtie 2 does away with Bowtie 1's notion of alignment "stratum", and its distinction between "Maq-like" and "end-to-end" modes. In Bowtie 2 all alignments lie along a continuous spectrum of alignment scores, where the scoring scheme is similar to Needleman-Wunsch and Smith-Waterman.
Bowtie 2's paired-end alignment is more flexible. E.g. for pairs that do not align in a paired fashion, Bowtie 2 attempts to find unpaired alignments for each mate.
-Bowtie 2 reports a spectrum of mapping qualities, in contrast fo Bowtie 1 which reports either 0 or high.
+Bowtie 2 reports a spectrum of mapping qualities, in contrast to Bowtie 1, which reports either 0 or high.
Bowtie 2 does not align colorspace reads.
Bowtie 2 is not a "drop-in" replacement for Bowtie 1. Bowtie 2's command-line arguments and genome index format are both different from Bowtie 1's.
Bowtie 1 and Bowtie 2 are not general-purpose alignment tools like MUMmer, BLAST or Vmatch. Bowtie 2 works best when aligning to large genomes, though it supports arbitrarily small reference sequences (e.g. amplicons). It handles very long reads (i.e. upwards of 10s or 100s of kilobases), but it is optimized for the read lengths and error modes yielded by recent sequencers, such as the Illumina HiSeq 2000, Roche 454, and Ion Torrent instruments.
If your goal is to align two very large sequences (e.g. two genomes), consider using MUMmer. If your goal is very sensitive alignment to a relatively short reference sequence (e.g. a bacterial genome), this can be done with Bowtie 2 but you may want to consider using tools like NUCmer, BLAT, or BLAST. These tools can be extremely slow when the reference genome is long, but are often adequate when the reference is short.
-Bowtie 2 does not support alignment of colorspace reads. This might be supported in future versions.
+Bowtie 2 does not support alignment of colorspace reads.
We said those Bowtie 2 versions were in "beta" to convey that they were not as polished as a tool that had been around for a while, and were still in flux. As of version 2.0.1, Bowtie 2 is no longer in "beta".
Building Bowtie 2 from source requires a GNU-like environment with GCC, GNU Make and other basics. It should be possible to build Bowtie 2 on most vanilla Linux installations or on a Mac installation with Xcode installed. Bowtie 2 can also be built on Windows using a 64-bit MinGW distribution and MSYS. To simplify the MinGW setup, it may be worth investigating popular MinGW personal builds, since these come already prepared with most of the toolchains needed.
First, download the source package from the sourceforge site. Make sure you're getting the source package; the file downloaded should end in -source.zip. Unzip the file, change to the unzipped directory, and build the Bowtie 2 tools by running GNU make (usually with the command make, but sometimes with gmake) with no arguments. If building with MinGW, run make from the MSYS environment.
-+Bowtie 2 is using the multithreading software model in order to +speed up execution times on SMP architectures where this is possible. +The Threading Building Blocks library, TBB, is now the default +threading library in bowtie2. On POSIX platforms (like linux, Mac +OS, etc) if TBB is not available the pthread library will be used. +Although it is possible to use pthread library on Windows, a non-POSIX +platform, due to performance reasons bowtie 2 will try to use Windows +native multithreading if possible. We recommend that you first +install the Threading Building Blocks library, but if unable to +do so please specify make NO_TBB=1. TBB comes installed by default +on many popular linux distros. Please note, packages built without +TBB will have -legacy appended to the name.
+Bowtie 2 uses multithreading to speed up execution on SMP architectures where this is possible. The Threading Building Blocks library (TBB) is now the default threading library in Bowtie 2. On POSIX platforms (like Linux, Mac OS, etc.), if TBB is not available, the pthread library will be used. Although it is possible to use the pthread library on Windows, a non-POSIX platform, for performance reasons Bowtie 2 will try to use Windows native multithreading if possible. We recommend that you first install the Threading Building Blocks library, but if you are unable to do so, please specify make NO_TBB=1. TBB comes installed by default on many popular Linux distros. Please note, packages built without TBB will have -legacy appended to the name.
By adding your new Bowtie 2 directory to your PATH environment variable, you ensure that whenever you run bowtie2, bowtie2-build or bowtie2-inspect from the command line, you will get the version you just installed without having to specify the entire path. This is recommended for most users. To do this, follow your operating system's instructions for adding the directory to your PATH.
If you would like to install Bowtie 2 by copying the Bowtie 2 executable files to an existing directory in your PATH, make sure that you copy all the executables, including bowtie2, bowtie2-align-s, bowtie2-align-l, bowtie2-build, bowtie2-build-s, bowtie2-build-l, bowtie2-inspect, bowtie2-inspect-s and bowtie2-inspect-l.
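One way to confirm that a copy-based install is complete is to check that every executable named above resolves on your PATH. The following is a minimal sketch, not part of Bowtie 2 itself; the helper name `missing_executables` and its `which` parameter are hypothetical and exist only for this illustration.

```python
import shutil

# Executable names taken from the paragraph above.
BOWTIE2_EXECUTABLES = [
    "bowtie2", "bowtie2-align-s", "bowtie2-align-l",
    "bowtie2-build", "bowtie2-build-s", "bowtie2-build-l",
    "bowtie2-inspect", "bowtie2-inspect-s", "bowtie2-inspect-l",
]


def missing_executables(names=BOWTIE2_EXECUTABLES, which=shutil.which):
    """Return the names that cannot be found on the current PATH.

    `which` is injectable so the check can be exercised without a real
    installation; by default it is the standard-library shutil.which.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All Bowtie 2 executables were found on PATH.")
```

If the script reports missing names, the usual cause is that only some of the files were copied, or that the target directory is not actually on your PATH.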
bowtie2 aligner
Where dash symbols represent gaps and vertical bars show where aligned characters match.
-We use alignment to make an educated guess as to where a read originated with respect to the reference genome. It's not always possible to determine this with certainty. For instance, if the reference genome contains several long stretches of As (AAAAAAAAA etc) and the read sequence is a short stretch of As (AAAAAAA), we cannot know for certain exactly where in the sea of As the read originated.
We use alignment to make an educated guess as to where a read originated with respect to the reference genome. It's not always possible to determine this with certainty. For instance, if the reference genome contains several long stretches of As (AAAAAAAAA etc.) and the read sequence is a short stretch of As (AAAAAAA), we cannot know for certain exactly where in the sea of As the read originated.
By default, Bowtie 2 performs end-to-end read alignment. That is, it searches for alignments involving all of the read characters. This is also called an "untrimmed" or "unclipped" alignment.
When the --local option is specified, Bowtie 2 performs local read alignment. In this mode, Bowtie 2 might "trim" or "clip" some read characters from one or both ends of the alignment if doing so maximizes the alignment score.
@@ -158,7 +158,7 @@
+An alignment score quantifies how similar the read sequence is to the reference sequence aligned to. The higher the score, the more similar they are. A score is calculated by subtracting penalties for each difference (mismatch, gap, etc.) and, in local alignment mode, adding bonuses for each match.
The scores can be configured with the --ma (match bonus), --mp (mismatch penalty), --np (penalty for having an N in either the read or the reference), --rdg (affine read gap penalty) and --rfg (affine reference gap penalty) options.
A mismatched base at a high-quality position in the read receives a penalty of -6 by default. A length-2 read gap receives a penalty of -11 by default (-5 for the gap open, -3 for the first extension, -3 for the second extension). Thus, in end-to-end alignment mode, if the read is 50 bp long and it matches the reference exactly except for one mismatch at a high-quality position and one length-2 read gap, then the overall score is -(6 + 11) = -17.
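The arithmetic in the example above can be checked with a short sketch. The function names are hypothetical and exist only for illustration; the penalty values are the defaults quoted in the text (6 per high-quality mismatch, 5 per read gap open, 3 per read gap extension).

```python
# Default end-to-end penalties quoted in the text above.
MISMATCH_PENALTY = 6  # mismatch at a high-quality read position
GAP_OPEN = 5          # read gap open
GAP_EXTEND = 3        # each read gap extension


def read_gap_penalty(length, gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Affine gap penalty: one open plus one extension per gap position."""
    return gap_open + gap_extend * length


def end_to_end_score(n_mismatches, gap_lengths):
    """Score of an end-to-end alignment: zero minus all penalties."""
    penalty = n_mismatches * MISMATCH_PENALTY
    penalty += sum(read_gap_penalty(n) for n in gap_lengths)
    return -penalty


# One high-quality mismatch plus one length-2 read gap.
print(end_to_end_score(1, [2]))  # -17
```

This sketch ignores quality-dependent mismatch penalties, N penalties, and the match bonus used in local mode.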
@@ -191,7 +191,7 @@
The SAM FLAGS field, the second field in a SAM record, has multiple bits that describe the paired-end nature of the read and alignment. The first (least significant) bit (1 in decimal, 0x1 in hexadecimal) is set if the read is part of a pair. The second bit (2 in decimal, 0x2 in hexadecimal) is set if the read is part of a pair that aligned in a paired-end fashion. The fourth bit (8 in decimal, 0x8 in hexadecimal) is set if the read is part of a pair and the other mate in the pair did not align. The sixth bit (32 in decimal, 0x20 in hexadecimal) is set if the read is part of a pair and the other mate in the pair aligned to the Crick strand (or, equivalently, if the reverse complement of the other mate aligned to the Watson strand). The seventh bit (64 in decimal, 0x40 in hexadecimal) is set if the read is mate 1 in a pair. The eighth bit (128 in decimal, 0x80 in hexadecimal) is set if the read is mate 2 in a pair. See the SAM specification for a more detailed description of the FLAGS field.
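As a minimal sketch of how the FLAGS value can be unpacked, the snippet below tests each bit in turn. The short labels and the `decode_flags` helper are made up for illustration; the bit meanings follow the SAM specification, which also defines the 0x4, 0x10 and 0x100 bits included here.

```python
# (bit, label) pairs; labels are illustrative, meanings follow the SAM spec.
FLAG_BITS = [
    (0x1, "paired"),                # read is part of a pair
    (0x2, "proper_pair"),           # pair aligned in a paired-end fashion
    (0x4, "unaligned"),             # read itself has no alignment
    (0x8, "mate_unaligned"),        # other mate has no alignment
    (0x10, "reverse_strand"),       # read aligned to the reverse strand
    (0x20, "mate_reverse_strand"),  # other mate aligned to the reverse strand
    (0x40, "mate1"),                # read is mate 1
    (0x80, "mate2"),                # read is mate 2
    (0x100, "secondary"),           # secondary alignment
]


def decode_flags(flag):
    """Return the labels of all bits set in a SAM FLAGS value."""
    return [label for bit, label in FLAG_BITS if flag & bit]


print(decode_flags(83))
# ['paired', 'proper_pair', 'reverse_strand', 'mate1']
```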
The last several fields of each SAM record usually contain SAM optional fields, which are simply tab-separated strings conveying additional information about the reads and alignments. A SAM optional field is formatted like this: "XP:i:1" where "XP" is the TAG, "i" is the TYPE ("integer" in this case), and "1" is the VALUE. See the SAM specification for details regarding SAM optional fields.
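The TAG:TYPE:VALUE layout described above can be split with a few lines of Python. This is only a sketch: `parse_optional_field` is a hypothetical helper, and it converts only the integer (`i`) and float (`f`) types, leaving every other type as a string.

```python
def parse_optional_field(field):
    """Split a SAM optional field such as 'XP:i:1' into (tag, type, value)."""
    # Split on the first two colons only, so string values may contain ':'.
    tag, type_code, value = field.split(":", 2)
    if type_code == "i":
        value = int(value)
    elif type_code == "f":
        value = float(value)
    return tag, type_code, value


print(parse_optional_field("XP:i:1"))  # ('XP', 'i', 1)
```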
The fragment and read lengths might be such that alignments for the two mates from a pair overlap each other. Consider this example:
(For these examples, assume we expect mate 1 to align to the left of mate 2.)
@@ -224,19 +224,19 @@
See also: -R, which sets the maximum number of times Bowtie 2 will "re-seed" when attempting to align a read with repetitive seeds. Increasing -R makes Bowtie 2 slower, but increases the likelihood that it will report the correct alignment for a read that aligns many places.
In -k mode, Bowtie 2 searches for up to N distinct, valid alignments for each read, where N equals the integer specified with the -k parameter. That is, if -k 2 is specified, Bowtie 2 will search for at most 2 distinct alignments. It reports all alignments found, in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. See the SAM specification for details.
+Bowtie 2 does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, Bowtie 2 does not guarantee that the N alignments reported are the best possible in terms of alignment score. Still, this mode can be effective and fast in situations where the user cares more about whether a read aligns (or aligns a certain number of times) than where exactly it originated.
-a mode is similar to -k mode except that there is no upper limit on the number of alignments Bowtie 2 should report. Alignments are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. See the SAM specification for details.
Some tools are designed with this reporting mode in mind. Bowtie 2 is not! For very large genomes, this mode is very slow.
+Bowtie 2's search for alignments for a given read is "randomized." That is, when Bowtie 2 encounters a set of equally-good choices, it uses a pseudo-random number to choose. For example, if Bowtie 2 discovers a set of 3 equally-good alignments and wants to decide which to report, it picks a pseudo-random integer 0, 1 or 2 and reports the corresponding alignment. Arbitrary choices can crop up at various points during alignment.
The pseudo-random number generator is re-initialized for every read, and the seed used to initialize it is a function of the read name, nucleotide string, quality string, and the value specified with --seed. If you run the same version of Bowtie 2 on two reads with identical names, nucleotide strings, and quality strings, and if --seed is set the same for both runs, Bowtie 2 will produce the same output; i.e., it will align the read to the same place, even if there are multiple equally good alignments. This is intuitive and desirable in most cases. Most users expect Bowtie 2 to produce the same output when run twice on the same input.
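The idea of seeding the generator from the read itself can be illustrated as follows. This is not Bowtie 2's actual hashing scheme: the use of SHA-256, the `pick_alignment` helper, and the way the inputs are joined are all assumptions chosen to show why identical reads with the same --seed value always get the same choice.

```python
import hashlib
import random


def pick_alignment(candidates, name, seq, qual, seed=0):
    """Pick one of several equally good candidates, reproducibly per read."""
    # Derive the generator seed from the read name, sequence, qualities,
    # and the user-supplied seed value (hypothetical scheme).
    key = "\t".join([name, seq, qual, str(seed)]).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return candidates[rng.randrange(len(candidates))]


ties = ["chr1:100", "chr1:500", "chr2:42"]
first = pick_alignment(ties, "read1", "AAAAAAA", "IIIIIII", seed=0)
second = pick_alignment(ties, "read1", "AAAAAAA", "IIIIIII", seed=0)
print(first == second)  # True
```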
However, when the user specifies the --non-deterministic option, Bowtie 2 will use the current time to re-initialize the pseudo-random number generator. When this is specified, Bowtie 2 might report different alignments for identical reads. This is counter-intuitive for some users, but might be more appropriate in situations where the input consists of many identical reads.
To rapidly narrow the number of possible alignments that must be considered, Bowtie 2 begins by extracting substrings ("seeds") from the read and its reverse complement and aligning them in an ungapped fashion with the help of the FM Index. This is "multiseed alignment" and it is similar to what Bowtie 1 does, except Bowtie 1 attempts to align the entire read this way.
-This initial step makes Bowtie 2 much faster than it would be without such a filter, but at the expense of missing some valid alignments. For instance, it is possible for a read to have a valid overall alignment but to have no valid seed alignments because each potential seed alignment is interruped by too many mismatches or gaps.
-The tradeoff between speed and sensitivity/accuracy can be adjusted by setting the seed length (-L), the interval between extracted seeds (-i), and the number of mismatches permitted per seed (-N). For more sensitive alignment, set these parameters to (a) make the seeds closer together, (b) make the seeds shorter, and/or (c) allow more mismatches. You can adjust these options one-by-one, though Bowtie 2 comes with some useful combinations of options pre-packaged as "preset options."
-D and -R are also options that adjust the tradeoff between speed and sensitivity/accuracy.
This initial step makes Bowtie 2 much faster than it would be without such a filter, but at the expense of missing some valid alignments. For instance, it is possible for a read to have a valid overall alignment but to have no valid seed alignments because each potential seed alignment is interrupted by too many mismatches or gaps.
+The trade-off between speed and sensitivity/accuracy can be adjusted by setting the seed length (-L), the interval between extracted seeds (-i), and the number of mismatches permitted per seed (-N). For more sensitive alignment, set these parameters to (a) make the seeds closer together, (b) make the seeds shorter, and/or (c) allow more mismatches. You can adjust these options one-by-one, though Bowtie 2 comes with some useful combinations of options prepackaged as "preset options."
-D and -R are also options that adjust the trade-off between speed and sensitivity/accuracy.
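The seed-extraction step described above can be sketched as follows. The helper names are hypothetical, and the sketch takes the interval as a plain integer rather than evaluating an interval function. It uses the figures from the -i option description in this manual: a 30-character read, seed length 10, and seed interval 6.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (N stays N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def extract_seeds(read, seed_len, interval):
    """Return (strand, offset, seed) for the read and its reverse complement."""
    seeds = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        # Step through the sequence, keeping only seeds that fit entirely.
        for offset in range(0, len(seq) - seed_len + 1, interval):
            seeds.append((strand, offset, seq[offset:offset + seed_len]))
    return seeds


read = "ACGT" * 7 + "AC"  # 30 characters
for strand, offset, seed in extract_seeds(read, seed_len=10, interval=6):
    print(strand, offset, seed)
```

With these inputs, seeds start at offsets 0, 6, 12 and 18 on each strand. Shorter seeds or smaller intervals produce more seeds, which is where the extra sensitivity and the extra running time come from.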
Bowtie 2 uses the FM Index to find ungapped alignments for seeds. This step accounts for the bulk of Bowtie 2's memory footprint, as the FM Index itself is typically the largest data structure used. For instance, the memory footprint of the FM Index for the human genome is about 3.2 gigabytes of RAM.
Bowtie 2 allows alignments to overlap ambiguous characters in the reference. An alignment position that contains an ambiguous character in the read, reference, or both, is penalized according to --np. --n-ceil sets an upper limit on the number of positions that may contain ambiguous reference characters in a valid alignment. The optional field XN:i reports the number of ambiguous reference characters overlapped by an alignment.
Note that the multiseed heuristic cannot find seed alignments that overlap ambiguous reference characters. For an alignment overlapping an ambiguous reference character to be found, it must have one or more seed alignments that do not overlap ambiguous reference characters.
Bowtie 2 comes with some useful combinations of parameters packaged into shorter "preset" parameters. For example, running Bowtie 2 with the --very-sensitive option is the same as running with options: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. The preset options that come with Bowtie 2 are designed to cover a wide area of the speed/sensitivity/accuracy trade-off space, with the presets ending in fast generally being faster but less sensitive and less accurate, and the presets ending in sensitive generally being slower but more sensitive and more accurate. See the documentation for the preset options for details.
Some reads are skipped or "filtered out" by Bowtie 2. For example, reads may be filtered out because they are extremely short or have a high proportion of ambiguous nucleotides. Bowtie 2 will still print a SAM record for such a read, but no alignment will be reported and the YF:Z SAM optional field will be set to indicate the reason the read was filtered.
YF:Z:LN: the read was filtered because it had length less than or equal to the number of seed mismatches set with the -N option.
YF:Z:NS: the read was filtered because it contains a number of ambiguous characters (usually N or .) greater than the ceiling specified with --n-ceil.
YF:Z:QC: the read was filtered because it was marked as failing quality control and the user specified the --qc-filter option. This only happens when the input is in Illumina's QSEQ format (i.e. when --qseq is specified) and the last (11th) field of the read's QSEQ record contains 1.
If a read could be filtered for more than one reason, the value of the YF:Z flag will reflect only one of those reasons.
When Bowtie 2 finishes running, it prints messages summarizing what happened. These messages are printed to the "standard error" ("stderr") filehandle. For datasets consisting of unpaired reads, the summary might look like this:
20000 reads; of these:
20000 (100.00%) were unpaired; of these:
@@ -280,7 +280,7 @@ Alignment summmary
96.70% overall alignment rate
The indentation indicates how subtotals relate to totals.
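Because the summary goes to stderr as plain text, a script can pull the headline figure out with a regular expression. The sketch below is an assumption about how one might do this, not part of Bowtie 2; `overall_alignment_rate` is a hypothetical helper and the sample text contains only the summary lines shown above.

```python
import re


def overall_alignment_rate(stderr_text):
    """Return the overall alignment rate as a float, or None if absent."""
    match = re.search(r"([\d.]+)% overall alignment rate", stderr_text)
    return float(match.group(1)) if match else None


summary = (
    "20000 reads; of these:\n"
    "  20000 (100.00%) were unpaired; of these:\n"
    "96.70% overall alignment rate\n"
)
print(overall_alignment_rate(summary))  # 96.7
```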
The bowtie2, bowtie2-build and bowtie2-inspect executables are actually wrapper scripts that call binary programs as appropriate. The wrappers shield users from having to distinguish between "small" and "large" index formats, discussed briefly in the following section. Also, the bowtie2 wrapper provides some key functionality, like the ability to handle compressed inputs, and the functionality for --un, --al and related options.
It is recommended that you always run the bowtie2 wrappers and not run the binaries directly.
bowtie2-build can index reference genomes of any size. For genomes less than about 4 billion nucleotides in length, bowtie2-build builds a "small" index using 32-bit numbers in various parts of the index. When the genome is longer, bowtie2-build builds a "large" index using 64-bit numbers. Small indexes are stored in files with the .bt2 extension, and large indexes are stored in files with the .bt2l extension. The user need not worry about whether a particular index is small or large; the wrapper scripts will automatically build and use the appropriate index.
--tab5
+Each read or pair is on a single line. An unpaired read line is [name]\t[seq]\t[qual]\n. A paired-end read line is [name]\t[seq1]\t[qual1]\t[seq2]\t[qual2]\n. An input file can be a mix of unpaired and paired-end reads and Bowtie 2 recognizes each according to the number of fields, handling each as it should.
--tab6
Similar to --tab5 except, for paired-end reads, the second end can have a different name from the first: [name1]\t[seq1]\t[qual1]\t[name2]\t[seq2]\t[qual2]\n
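The field counts in the --tab5 and --tab6 layouts are enough to tell unpaired reads from pairs. The following sketch shows one way to read such lines; `parse_tab_line` is a hypothetical helper, not a Bowtie 2 function, and it returns one (name, seq, qual) tuple per mate.

```python
def parse_tab_line(line):
    """Parse one --tab5 or --tab6 line into a list of (name, seq, qual)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:  # unpaired read
        name, seq, qual = fields
        return [(name, seq, qual)]
    if len(fields) == 5:  # --tab5 pair: both mates share one name
        name, seq1, qual1, seq2, qual2 = fields
        return [(name, seq1, qual1), (name, seq2, qual2)]
    if len(fields) == 6:  # --tab6 pair: each mate has its own name
        name1, seq1, qual1, name2, seq2, qual2 = fields
        return [(name1, seq1, qual1), (name2, seq2, qual2)]
    raise ValueError(f"expected 3, 5 or 6 tab-separated fields, got {len(fields)}")


print(parse_tab_line("r1\tACGT\tIIII\tTTGA\tIIII\n"))
# [('r1', 'ACGT', 'IIII'), ('r1', 'TTGA', 'IIII')]
```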
The minimum fragment length for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 0 (essentially imposing no minimum)
The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.
The larger the difference between -I and -X, the slower Bowtie 2 will run. This is because larger differences between -I and -X require that Bowtie 2 scan a larger window to determine if a concordant alignment exists. For typical fragment length ranges (200 to 400 nucleotides), Bowtie 2 is very efficient.
Default: 500.
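The -I and -X examples above reduce to simple arithmetic on the fragment length, sketched below. The helper names are hypothetical, and the check deliberately ignores mate orientation, trimming, and overlapping or dovetailing mates.

```python
def fragment_length(mate1_len, gap, mate2_len):
    """Fragment length spanned by two mates separated by a gap."""
    return mate1_len + gap + mate2_len


def fragment_ok(length, min_frag=0, max_frag=500):
    """True if the fragment length satisfies both -I and -X (defaults 0, 500)."""
    return min_frag <= length <= max_frag


# -I 60: two 20-bp mates with a 20-bp gap pass, a 19-bp gap does not.
print(fragment_ok(fragment_length(20, 20, 20), min_frag=60))   # True
print(fragment_ok(fragment_length(20, 19, 20), min_frag=60))   # False
# -X 100: a 60-bp gap passes, a 61-bp gap does not.
print(fragment_ok(fragment_length(20, 60, 20), max_frag=100))  # True
print(fragment_ok(fragment_length(20, 61, 20), max_frag=100))  # False
```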
--omit-sec-seq
When printing secondary alignments, Bowtie 2 by default will write out the SEQ and QUAL strings. Specifying this option causes Bowtie 2 to print an asterisk in those fields instead.
Each subsequent line describes an alignment or, if the read failed to align, a read. Each line is a collection of at least 12 fields separated by tabs; from left to right, the fields are:
Name of read that aligned.
-Note that the SAM specification disallows whitespace in the read name. If the read name contains any whitespace characters, Bowtie 2 will truncate the name at the first whitespace character. This is similar to the behavior of other tools.
Note that the SAM specification disallows whitespace in the read name. If the read name contains any whitespace characters, Bowtie 2 will truncate the name at the first whitespace character. This is similar to the behavior of other tools. The standard behavior of truncating at the first whitespace can be suppressed with --sam-noqname-trunc at the expense of generating non-standard SAM.
Sum of all applicable flags. Flags relevant to Bowtie are:
Reads (specified with |
| - + | +||
- | - + | +
Reads interleaved FASTQ files where the first two records (8 lines) represent a mate pair. - |
| - + | +||
- |
-
- Each read or pair is on a single line. An unpaired read line is [name]. A paired-end read line is [name]. An input file can be a mix of unpaired and paired-end reads and Bowtie 2 recognizes each according to the number of fields, handling each as it should. - | |
| - + | +
+ Each read or pair is on a single line. An unpaired read line is [name]\t[seq]\t[qual]\n. A paired-end read line is [name]\t[seq1]\t[qual1]\t[seq2]\t[qual2]\n. An input file can be a mix of unpaired and paired-end reads and Bowtie 2 recognizes each according to the number of fields, handling each as it should. + |
+|
- |
-
- Similar to | |
| - + | +
+ Similar to |
+|
- | - + | +
Reads (specified with |
| - + | +||
- | - + | +
Reads (specified with |
| - + | +||
- | - + | +
Reads (specified with |
| - + | +||
- | - + | +
The read sequences are given on command line. I.e. |
| - + | +||
- | - + | +
Skip (i.e. do not align) the first |
| - + | +||
- | - + | +
Align the first |
| - + | +||
- | - + | +
Trim |
| - + | +||
- | - + | +
Trim |
| - + | +||
- | - + | +
Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines. - |
| - + | +||
- | - + | +
Input qualities are ASCII chars equal to the Phred quality plus 64. This is also called the "Phred+64" encoding. - |
| - + | +||
- | - + | +
Convert input qualities from Solexa (which can be negative) to Phred (which can't). This scheme was used in older Illumina GA Pipeline versions (prior to 1.3). Default: off. - |
| - + | +||
- | - + | +
Quality values are represented in the read input file as space-separated ASCII integers, e.g., |
--end-to-end mode| - + | ||
- | - + | +
Same as: |
| - + | +||
- | - + | +
Same as: |
| - + | +||
- | - + | +
Same as: |
| - + | +||
- | - + | +
Same as: |
--local mode| - + | ||
- | - + | +
Same as: |
| - + | +||
- | - + | +
Same as: |
| - + | +||
- | - + | +
Same as: |
| - + | +||
- | - + | +
Same as: |
| - + | ||
- | - + | +
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0. - |
| - + | +||
- | - + | +
Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the |
| - + | +||
- | - + | +
Sets a function governing the interval between seed substrings to use during multiseed alignment. For instance, if the read has 30 characters, and seed length is 10, and the seed interval is 6, the seeds extracted will be:
Since it's best to use longer intervals for longer reads, this parameter sets the interval as a function of the read length, rather than a single one-size-fits-all number. For instance, specifying |
| - + | +||
- | - + | +
Sets a function governing the maximum number of ambiguous characters (usually |
| - + | +||
- | - + | +
"Pads" dynamic programming problems by |
| - + | +||
- | - + | +
Disallow gaps within |
| - + | +||
- | - + | +
When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in |
| - + | +||
- | - + | +
If |
| - + | +||
- | - + | +
By default, Bowtie 2 will attempt to find either an exact or a 1-mismatch end-to-end alignment for the read before trying the multiseed heuristic. Such alignments can be found very quickly, and many short read alignments have exact or near-exact end-to-end alignments. However, this can lead to unexpected alignments when the user also sets options governing the multiseed heuristic, like |
| - + | +||
- | - + | +
In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end. The match bonus |
| - + | +||
- | - + | +
In this mode, Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score. The match bonus |
| - + | ||
- | - + | +
Sets the match bonus. In |
| - + | +||
- | - + | +
Sets the maximum ( |
| - + | +||
- | - + | +
Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as |
| - + | +||
- | - + | +
Sets the read gap open ( |
| - + | +||
- | - + | +
Sets the reference gap open ( |
| - + | +||
- | - + | +
Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying |
| - + | ||
- | - + | +
By default, When Note: Bowtie 2 is not designed with large values for |
| - + | +||
- |
-
- Like |
+
+ Like Note: Bowtie 2 is not designed with |
| - + | ||
- | - + | +
Up to |
| - + | +||
- | - + | +
|
| - + | ||
- | - + | +
The minimum fragment length for valid paired-end alignments. E.g. if The larger the difference between Default: 0 (essentially imposing no minimum) - |
| - + | +||
- | - + | +
The maximum fragment length for valid paired-end alignments. E.g. if The larger the difference between Default: 500. - |
| - + | +||
- | - + | +
The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if |
| - + | +||
- | - + | +
By default, when |
| - + | +||
- | - + | +
By default, |
| - + | +||
- | - + | +
If the mates "dovetail", that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates cannot dovetail in a concordant alignment. - |
| - + | +||
- | - + | +
If one mate alignment contains the other, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: a mate can contain the other in a concordant alignment. - |
| - + | +||
- | - + | +
If one mate alignment overlaps the other at all, consider that to be non-concordant. See also: Mates can overlap, contain or dovetail each other. Default: mates can overlap in a concordant alignment. - |
| - + | ||
- | - + | +
Print the wall-clock time required to load the index files and align the reads. This is printed to the "standard error" ("stderr") filehandle. Default: off. - |
| - + | +||
- | - + | +
Write unpaired reads that fail to align to file at |
| - + | +||
- | - + | +
Write unpaired reads that align at least once to file at |
| - + | +||
- | - + | +
Write paired-end reads that fail to align concordantly to file(s) at |
| - + | +||
- | - + | +
Write paired-end reads that align concordantly at least once to file(s) at |
| - + | +||
- | - + | +
Print nothing besides alignments and serious errors. - |
| - + | +||
- | - + | +
Write |
| - + | +||
- | - + | +
Write |
| - + | +||
- | - + | +
Write a new |
| - + | ||
- | - + | +
Suppress SAM records for reads that failed to align. - |
| - + | +||
- | - + | +
Suppress SAM header lines (starting with |
| - + | +||
- | - + | +
Suppress |
| - + | +||
- | - + | +
Set the read group ID to |
| - + | +||
- | - + | +
Add |
| - + | +||
- |
-
- When printing secondary alignments, Bowtie 2 by default will write out the |
+ When printing secondary alignments, Bowtie 2 by default will write out the |
+
-
- | - + | +
Override the offrate of the index with |
| - + | +||
- | - + | +
Launch |
| - + | +||
- | - + | +
Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when |
| - + | +||
- | - + | +
Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent |
| - + | ||
- | - + | +
Filter out reads for which the QSEQ filter field is non-zero. Only has an effect when read format is |
| - + | +||
- | - + | +
Use |
| - + | +||
- | - + | +
Normally, Bowtie 2 re-initializes its pseudo-random generator for each read. It seeds the generator with a number derived from (a) the read name, (b) the nucleotide sequence, (c) the quality sequence, (d) the value of the |
| - + | +||
- | - + | +
Print version information and quit. - |
| - + | +||
- | - + | +
Print usage information and quit. - |
Following is a brief description of the SAM format as output by bowtie2. For more details, see the SAM format specification.
By default, bowtie2 prints a SAM header with @HD, @SQ and @PG lines. When one or more --rg arguments are specified, bowtie2 will also print an @RG line that includes all user-specified --rg tokens separated by tabs.
Each subsequent line describes an alignment or, if the read failed to align, a read. Each line is a collection of at least 12 fields separated by tabs; from left to right, the fields are:
Name of read that aligned.
-Note that the SAM specification disallows whitespace in the read name. If the read name contains any whitespace characters, Bowtie 2 will truncate the name at the first whitespace character. This is similar to the behavior of other tools.
Note that the SAM specification disallows whitespace in the read name. If the read name contains any whitespace characters, Bowtie 2 will truncate the name at the first whitespace character. This is similar to the behavior of other tools. The standard behavior of truncating at the first whitespace can be suppressed with --sam-noqname-trunc at the expense of generating non-standard SAM.
Sum of all applicable flags. Flags relevant to Bowtie are:
Thus, an unpaired read that aligns to the reverse reference strand will have flag 16. A paired-end read that is the first mate in a pair, aligns to the reverse reference strand, and belongs to a pair that aligned in a paired-end fashion will have flag 83 (= 64 + 16 + 2 + 1).
Name of reference sequence where alignment occurs
1-based offset into the forward reference strand where leftmost character of the alignment occurs
Optional fields. Fields are tab-separated. bowtie2 outputs zero or more of these optional fields for each alignment, depending on the type of the alignment:
AS:i:<N>
Alignment score. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if SAM record is for an aligned read.
XS:i:<N>
Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.
YS:i:<N>
Alignment score for opposite mate in the paired-end alignment. Only present if the SAM record is for a read that aligned as part of a paired-end alignment.
XN:i:<N>
The number of ambiguous bases in the reference covering this alignment. Only present if SAM record is for an aligned read.
XM:i:<N>
The number of mismatches in the alignment. Only present if SAM record is for an aligned read.
XO:i:<N>
The number of gap opens, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read.
XG:i:<N>
The number of gap extensions, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read.
NM:i:<N>
The edit distance; that is, the minimal number of one-nucleotide edits (substitutions, insertions and deletions) needed to transform the read string into the reference string. Only present if SAM record is for an aligned read.
YF:Z:<S>
String indicating reason why the read was filtered out. See also: Filtering. Only appears for reads that were filtered out.
YT:Z:<S>
Value of UU indicates the read was not part of a pair. Value of CP indicates the read was part of a pair and the pair aligned concordantly. Value of DP indicates the read was part of a pair and the pair aligned discordantly. Value of UP indicates the read was part of a pair but the pair failed to align either concordantly or discordantly.
MD:Z:<S>
A string representation of the mismatched reference bases in the alignment. See SAM format specification for details. Only present if SAM record is for an aligned read.
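Optional fields follow the SAM TAG:TYPE:VALUE layout and start after the eleven mandatory columns, so they can be collected into a dictionary. This is a minimal Python sketch; the function name `parse_optional_fields` and the SAM record below are hypothetical and were not produced by an actual Bowtie 2 run.

```python
def parse_optional_fields(fields):
    """Map each TAG to its value, converting type 'i' to int (illustrative helper)."""
    parsed = {}
    for field in fields:
        # Split only twice: a Z-typed value may itself contain colons.
        tag, type_code, value = field.split(":", 2)
        parsed[tag] = int(value) if type_code == "i" else value
    return parsed

# Hypothetical record: 11 mandatory SAM columns followed by optional fields.
sam_line = "\t".join([
    "read1", "0", "ref1", "100", "42", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII",
    "AS:i:-6", "XN:i:0", "XM:i:1", "XO:i:0", "XG:i:0",
    "NM:i:1", "MD:Z:4A5", "YT:Z:UU",
])
tags = parse_optional_fields(sam_line.split("\t")[11:])
print(tags["AS"], tags["YT"])
```

Because fields such as XS:i and YS:i are only present for some alignments, downstream code should use `tags.get("XS")` rather than assume every tag exists.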
bowtie2-build indexer
bowtie2-build builds a Bowtie index from a set of DNA sequences. bowtie2-build outputs a set of 6 files with suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. In the case of a large index these suffixes will end in bt2l instead of bt2. These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence FASTA files are no longer used by Bowtie 2 once the index is built.
Bowtie 2's .bt2 index format is different from Bowtie 1's .ebwt format, and they are not compatible with each other.
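Since an index is just a fixed set of files sharing a basename, a quick completeness check is easy to script. The Python sketch below assumes only the six suffixes listed above and the bt2l ending for large indexes; the helper names and the `lambda_virus` basename are illustrative.

```python
import os

# The six suffixes named in the text, without the .bt2 / .bt2l ending.
INDEX_SUFFIXES = (".1", ".2", ".3", ".4", ".rev.1", ".rev.2")

def index_file_names(bt2_base, large=False):
    """List the file names expected for an index basename (illustrative helper)."""
    ending = ".bt2l" if large else ".bt2"
    return [bt2_base + suffix + ending for suffix in INDEX_SUFFIXES]

def missing_index_files(bt2_base, large=False):
    """Return the expected index files that do not exist on disk."""
    return [p for p in index_file_names(bt2_base, large) if not os.path.isfile(p)]

print(index_file_names("lambda_virus"))
```

An empty list from `missing_index_files` suggests the index is complete for that basename; it does not verify the contents of the files.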
Usage:
bowtie2-build [options]* <reference_in> <bt2_base>
bowtie2-inspect index inspector
bowtie2-inspect extracts information from a Bowtie index about what kind of index it is and what reference sequences were used to build it. When run without any options, the tool will output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns). It can also be used to extract just the reference sequence names using the -n/--names option or a more verbose summary using the -s/--summary option.
Usage:
bowtie2-inspect [options]* <bt2_base>
Bowtie 2 comes with some example files to get you started. The example files are not scientifically significant; we use the Lambda phage reference genome simply because it's short, and the reads were generated by a computer program, not a sequencer. However, these files will let you start running Bowtie 2 and downstream tools right away.
First follow the manual instructions to obtain Bowtie 2. Set the BT2_HOME environment variable to point to the new Bowtie 2 directory containing the bowtie2, bowtie2-build and bowtie2-inspect binaries. This is important, as the BT2_HOME variable is used in the commands below to refer to that directory.
SAMtools is a collection of tools for manipulating and analyzing SAM and BAM alignment files. BCFtools is a collection of tools for calling variants and manipulating VCF and BCF files, and it is typically distributed with SAMtools. Using these tools together allows you to get from alignments in SAM format to variant calls in VCF format. This example assumes that samtools and bcftools are installed and that the directories containing these binaries are in your PATH environment variable.
Run the paired-end example:
$BT2_HOME/bowtie2 -x $BT2_HOME/example/index/lambda_virus -1 $BT2_HOME/example/reads/reads_1.fq -2 $BT2_HOME/example/reads/reads_2.fq -S eg2.sam
-Use samtools view to convert the SAM file into a BAM file. BAM is a the binary format corresponding to the SAM text format. Run:
+Use samtools view to convert the SAM file into a BAM file. BAM is the binary format corresponding to the SAM text format. Run:
samtools view -bS eg2.sam > eg2.bam
Use samtools sort to convert the BAM file to a sorted BAM file.
-samtools sort eg2.bam eg2.sorted
+samtools sort eg2.bam -o eg2.sorted.bam
We now have a sorted BAM file called eg2.sorted.bam. Sorted BAM is a useful format because the alignments are (a) compressed, which is convenient for long-term storage, and (b) sorted, which is convenient for variant discovery. To generate variant calls in VCF format, run:
-samtools mpileup -uf $BT2_HOME/example/reference/lambda_virus.fa eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf
+samtools mpileup -uf $BT2_HOME/example/reference/lambda_virus.fa eg2.sorted.bam | bcftools view -Ov - > eg2.raw.bcf
Then to view the variants, run:
bcftools view eg2.raw.bcf
See the official SAMtools guide to Calling SNPs/INDELs with SAMtools/BCFtools for more details and variations on this process.