Home

jfostier edited this page Feb 16, 2016 · 14 revisions

Getting Started

Welcome to the Brownie wiki page!


Prerequisites

Brownie requires a number of packages to be installed on your system. Required: CMake; Google's Sparsehash; gcc (GCC 4.7 or a more recent version). Optional: ZLIB; Googletest (unit testing).

How to install these packages:

As root, execute the following commands:

On Red Hat / Fedora distributions

* yum install cmake

* yum install sparsehash-devel

* yum install zlib-devel (optional)

On Ubuntu / Debian distributions

* aptitude install cmake

* aptitude install libsparsehash-dev

* aptitude install zlib1g-dev (optional)
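Before building, it can help to confirm that the required tools are on your PATH and that gcc meets the 4.7 minimum. The following is a minimal sketch, not part of Brownie: the helper names `version_ge` and `check_tool` are made up for illustration, and the version comparison assumes GNU `sort -V` is available.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME: report whether NAME is on the PATH; never aborts.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

check_tool cmake
check_tool gcc

# gcc -dumpversion prints e.g. "4.8.5" or, on newer releases, only "9".
if command -v gcc >/dev/null 2>&1; then
    gcc_version=$(gcc -dumpversion)
    if version_ge "$gcc_version" 4.7; then
        echo "gcc $gcc_version is recent enough"
    else
        echo "gcc $gcc_version is older than 4.7"
    fi
fi
```

The script only reports what it finds; it does not check for Sparsehash or ZLIB, which are libraries rather than commands.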


Installation

Installing Brownie is simple. First, unpack the brownie-xxx.tar.gz file (where xxx denotes the version number) and enter the resulting directory:

tar -xzvf brownie-xxx.tar.gz

cd brownie-xxx

From this directory, run the following commands:

  1. mkdir build

  2. cd build

  3. cmake ..

  4. make install

Running ./brownie without any arguments prints:

brownie: missing input read file

Try 'brownie --help' for more information

A useful option to specify where cmake should install the software is CMAKE_INSTALL_PREFIX. For example, to install in your local $HOME/brownie directory you would run: cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/brownie


Running Instructions

For impatient users:

./brownie reads.fastq

reads.fastq is the input file to be cleaned by Brownie. The corrected reads are written to reads.corr.fastq, in the same order as in the input.
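Judging from this example, the default output name appears to be formed by inserting .corr before the file extension. The sketch below mimics that rule so you can predict where the output will land. It is inferred from the single example above, not taken from Brownie's source, and `default_output` is a hypothetical helper name.

```shell
# default_output FILE: insert ".corr" before the last extension of FILE.
# Assumes a bare file name (no dots in any directory part of the path).
default_output() {
    case "$1" in
        *.*) echo "${1%.*}.corr.${1##*.}" ;;   # name.ext -> name.corr.ext
        *)   echo "$1.corr" ;;                 # no extension: just append
    esac
}

default_output reads.fastq   # prints reads.corr.fastq
```

If you need a specific name, pass it explicitly with -o rather than relying on the default.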

A closer look at Brownie's help information

Executing ./brownie --help prints the help information on your screen, like this:

Usage: brownie [options] [file_options] file1 [[file_options] file2]...

Corrects sequence reads in file(s)

[options]

-h --help display help page

-i --info display information page

-s --singlestranded enable single stranded DNA [default = false]. Sequence data are assumed to be double stranded by default; specify this option explicitly if your input data are single stranded.

[options arg]

-k --kmersize kmer size [default = 31]. Optional parameter; only odd numbers in the range 9 to 31 are allowed. To use a larger kmer size, change the line add_definitions("-DMAXKMERLENGTH=31") in the CMakeLists.txt file to, for instance, add_definitions("-DMAXKMERLENGTH=63") if your kmer size is less than 64, then rebuild.

-t --threads number of threads [default = available cores]

-g --genomesize size of the genome [default = auto]

-p --pathtotmp path to directory to store temporary files [default = current directory]. The given directory must already exist.

[file_options]

-o --output output file name [default = inputfile.corr]

--perfectgraph skip read and graph correction. With this option Brownie only builds the de Bruijn graph and does not modify it (please see the explanation of Brownie's output files).

--graph skip read correction. With this option Brownie builds the de Bruijn graph and modifies it to remove erroneous nodes.

examples:

./brownie inputA.fastq

./brownie -k 29 -t 4 -o outputA.fasta inputA.fasta -o outputB.fasta inputB.fastq
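The constraints on -k described in the help text (an odd number, at least 9, at most the compiled-in maximum) can be checked before starting a long run. This is a minimal sketch under those stated constraints; `valid_kmer` is a hypothetical helper, and the optional second argument stands for whatever MAXKMERLENGTH value your build uses.

```shell
# valid_kmer K [MAX]: succeed if K is an odd integer with 9 <= K <= MAX.
# MAX defaults to 31, the stock MAXKMERLENGTH value.
valid_kmer() {
    k=$1
    max=${2:-31}
    case "$k" in
        ''|0*|*[!0-9]*) return 1 ;;   # reject empty, zero-prefixed or non-numeric input
    esac
    [ "$k" -ge 9 ] && [ "$k" -le "$max" ] && [ $((k % 2)) -eq 1 ]
}

for k in 8 21 30 31 33; do
    if valid_kmer "$k"; then
        echo "k=$k ok"
    else
        echo "k=$k rejected"
    fi
done
```

With the default maximum, only 21 and 31 pass in this loop; calling `valid_kmer 33 63` would succeed for a build compiled with MAXKMERLENGTH=63.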


Reference based error correction by Brownie

Brownie performs graph construction and error correction in two separate stages. You can therefore build a perfect graph from the reference genome sequence that is closest to your read file; Brownie then processes the reference genome like a FASTA read file. Because Brownie automatically ignores reads that occur only once in the input file, you should duplicate your genome file first. Once the perfect graph has been built, you can correct your read file.

In summary, execute the following commands for reference-based error correction:

cat genome.fasta genome.fasta > genome2.fasta

./brownie -t 12 -p sillyDir -k 25 --perfectgraph genome2.fasta

./brownie -t 12 -p sillyDir -k 25 -o BrownieCorrected.fastq initialReads.fastq
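The duplication step is easy to sanity-check: the duplicated file should contain exactly twice as many FASTA records as the original. The sketch below demonstrates this on a toy genome written to a temporary directory. The helper names and the toy sequences are made up for illustration and are not part of Brownie.

```shell
# count_records FILE: number of FASTA records (lines starting with '>').
count_records() {
    grep -c '^>' "$1"
}

# duplicate_fasta IN OUT: write two consecutive copies of IN to OUT.
duplicate_fasta() {
    cat "$1" "$1" > "$2"
}

# Toy genome in a temporary directory (hypothetical data).
tmp=$(mktemp -d)
printf '>chr1\nACGTACGTAC\n>chr2\nTTGACCA\n' > "$tmp/genome.fasta"

duplicate_fasta "$tmp/genome.fasta" "$tmp/genome2.fasta"
echo "records before: $(count_records "$tmp/genome.fasta")"
echo "records after:  $(count_records "$tmp/genome2.fasta")"

rm -rf "$tmp"
```

Note that plain concatenation only works cleanly if the genome file ends with a newline; otherwise the last sequence line of the first copy runs into the first header of the second copy.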


How to produce a simulated reads file to play with Brownie

To generate simulated Illumina data, you can refer to this paper: ART: a next-generation sequencing read simulator

After downloading the application, you can run the following command to generate reads with length 250 and coverage 100:

./art_illumina -i genome.fasta -p -l 250 -f 100 -m 300 -s 30 -o reads250

genome.fasta is the input file from which the reads are generated.
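To get a feel for how large the simulated data set will be, you can estimate the number of reads from the usual definition of coverage: coverage = (number of reads × read length) / genome size. The sketch below applies that formula with the -f 100 and -l 250 values used above. The 5 Mbp genome size is a hypothetical example, `expected_reads` is a made-up helper, and the exact count ART produces may differ (for instance through rounding or the way it counts read pairs).

```shell
# expected_reads GENOME_BP COVERAGE READ_LEN:
# approximate number of reads needed to reach the given fold coverage.
expected_reads() {
    echo $(( $1 * $2 / $3 ))
}

# Hypothetical 5 Mbp genome, 100x coverage, 250 bp reads.
expected_reads 5000000 100 250   # prints 2000000
```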