Software for detecting genomic structural variants from DNA sequencing data

Breakfast

Breakfast is software for detecting genomic structural variants from DNA sequencing data. Its features include:

  • Identifies structural variants based on breakpoint-overlapping reads
  • Extremely fast: analyzes ~40 million reads per minute per CPU core
  • Reports the full sequence of all breakpoint-supporting reads, and shows mismatched bases
  • Identifies PCR/optical duplicates and does not count them as independent sources of evidence
  • Can be run on sorted or unsorted BAM files, or can read BAM input from a pipe
  • Uses pre-existing Bowtie indexes to speed up alignment (does not require its own index)
  • Provides tools for filtering out rearrangements that are present in control samples

Installation

The easiest way to install Breakfast is to download one of the pre-built binary packages:

  • Breakfast 0.1 (x86-64 Linux)

If a suitable binary package is not available, you can also build Breakfast directly from source code. Note that installing this way requires a Rust compiler and the Cargo build system to be available:

git clone https://github.com/annalam/breakfast.git
cd breakfast
cargo install --path .

Running Breakfast

To run Breakfast, you need a BAM file containing sequenced reads (in this example, tumor.bam). You also need a Bowtie index and the Bowtie1 executable in your PATH. A Breakfast analysis begins with the "breakfast detect" command, which searches the BAM file for unaligned reads that support a genomic breakpoint:

breakfast detect tumor.bam bowtie_indexes/hg38 > tumor.sv

Detailed overview of the Breakfast algorithm

Unaligned reads are split into two anchors of customizable size: one anchor from the 5' end of the read, and one anchor from the 3' end of the read. These anchors are then aligned against the reference genome using a Bowtie index. If both anchors align to the reference genome (but the read as a whole did not), the read is considered to support the existence of a genomic rearrangement. Aligned reads in the input BAM file are omitted from analysis.
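The anchor-splitting step described above can be sketched in a few lines of Rust. This is only an illustration, not Breakfast's actual source: the function name `split_anchors` is hypothetical, and the rule that a read must be at least twice the anchor length (so the two anchors do not overlap) is an assumption made for the example.

```rust
/// Split a read into a 5' anchor and a 3' anchor of `anchor_len` bases each.
///
/// Assumption for this sketch: reads too short to yield two non-overlapping
/// anchors are skipped (returns `None`). The read is expected to be plain
/// ASCII (A/C/G/T/N), so byte offsets and base offsets coincide.
fn split_anchors(read: &str, anchor_len: usize) -> Option<(&str, &str)> {
    if anchor_len == 0 || read.len() < 2 * anchor_len {
        return None;
    }
    // Anchor taken from the 5' end (start) of the read.
    let five_prime = &read[..anchor_len];
    // Anchor taken from the 3' end (end) of the read.
    let three_prime = &read[read.len() - anchor_len..];
    Some((five_prime, three_prime))
}
```

Calling `split_anchors("ACGTACGTTTGGCCAA", 5)` yields the anchors `ACGTA` and `GCCAA`. Each anchor would then be aligned separately against the reference genome, and the read kept as breakpoint evidence only if both anchors align.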

Duplicate DNA fragments are identified based on "fragment signatures". For each unaligned read, a fragment signature is generated by taking the first 8 bases of the read and the first 8 bases of its paired mate. Together, these bases identify the boundaries of the DNA fragment. When reporting evidence for an identified genomic breakpoint, Breakfast reports only one read from each cluster of reads that share the same fragment signature, preferentially picking the read that has the highest degree of overlap with the genomic breakpoint (i.e. longest flanks).
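The duplicate handling can be illustrated with a short Rust sketch. It is a simplified model, not Breakfast's actual code: the `SupportingRead` struct and all function names are hypothetical, and scoring "longest flanks" as the shorter of the two flank lengths is an assumption made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical record for one breakpoint-supporting read.
#[derive(Debug, Clone, PartialEq)]
struct SupportingRead {
    seq: String,        // sequence of the read
    mate_seq: String,   // sequence of its paired mate
    left_flank: usize,  // bases of the read on one side of the breakpoint
    right_flank: usize, // bases of the read on the other side
}

/// Number of leading bases taken from each mate (8, per the description above).
const SIGNATURE_BASES: usize = 8;

/// Fragment signature: first 8 bases of the read followed by the
/// first 8 bases of its mate.
fn fragment_signature(read: &SupportingRead) -> String {
    let head = |s: &str| s.chars().take(SIGNATURE_BASES).collect::<String>();
    format!("{}{}", head(&read.seq), head(&read.mate_seq))
}

/// Assumed overlap score: the length of the shorter flank.
fn overlap_score(read: &SupportingRead) -> usize {
    read.left_flank.min(read.right_flank)
}

/// Keep one read per fragment signature, preferring the best overlap score.
fn deduplicate(reads: &[SupportingRead]) -> Vec<SupportingRead> {
    let mut best: HashMap<String, &SupportingRead> = HashMap::new();
    for read in reads {
        let kept = best.entry(fragment_signature(read)).or_insert(read);
        if overlap_score(read) > overlap_score(*kept) {
            *kept = read;
        }
    }
    let mut unique: Vec<SupportingRead> = best.into_values().cloned().collect();
    // Sort so the output order does not depend on hash iteration order.
    unique.sort_by(|a, b| a.seq.cmp(&b.seq));
    unique
}
```

Two reads whose own first 8 bases and whose mates' first 8 bases both match fall into the same cluster, and only the one with the larger score survives. A real implementation might break ties or measure overlap differently.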