Welcome to the Brownie wiki page!
Brownie requires a number of packages to be installed on your system. Required: CMake; Google's Sparsehash; GCC (version 4.7 or more recent). Optional: ZLIB; Googletest (for unit testing).
How to install these packages:
As root, execute the following commands:
On Red Hat / Fedora distributions:
* yum install cmake
* yum install sparsehash-devel
* yum install zlib-devel (optional)
On Ubuntu / Debian distributions:
* aptitude install cmake
* aptitude install libsparsehash-dev
* aptitude install zlib1g-dev (optional)
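Before building, it can help to confirm that the required tools are present. The following is a minimal sketch, not part of Brownie: the helper name version_ge is made up for illustration, and it assumes a `sort` that supports version sorting (`-V`, as in GNU coreutils). It only checks for cmake and gcc on the PATH and compares the gcc version against the 4.7 minimum stated above.

```shell
# Hypothetical helper (not part of Brownie): succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort), which GNU coreutils provides.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether each required build tool is on the PATH.
for tool in cmake gcc; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done

# Compare the installed gcc version with the documented minimum of 4.7.
if command -v gcc >/dev/null 2>&1; then
    gcc_version="$(gcc -dumpversion)"
    if version_ge "$gcc_version" 4.7; then
        echo "gcc $gcc_version is recent enough"
    else
        echo "gcc $gcc_version is older than 4.7"
    fi
fi
```

The sketch does not check for the Sparsehash headers, because their install location differs between distributions.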
The installation of Brownie is now simple. First, unzip the brownie-xxx.tar.gz file (where xxx denotes the version number):
tar -xzvf brownie-xxx.tar.gz
cd brownie-xxx
From this directory, run the following commands:
Running ./brownie without arguments should then print:
brownie: missing input read file
Try 'brownie --help' for more information
A useful option to specify where cmake should install the software is CMAKE_INSTALL_PREFIX. For example, to install in your local $HOME/brownie directory you would run: cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/brownie
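The `cmake ..` invocation above suggests a conventional out-of-source CMake build. As a sketch under that assumption (the directory name build, the helper name build_brownie, and the exact sequence of steps are not taken from the Brownie documentation), the configure, compile, and install steps could be wrapped like this:

```shell
# Hypothetical helper: configure, compile, and install from the unpacked source tree.
# Assumes a standard out-of-source CMake build in a subdirectory named "build".
# The function body runs in a subshell, so the caller's working directory is unchanged.
build_brownie() (
    prefix="${1:-$HOME/brownie}"   # install location; defaults to $HOME/brownie
    mkdir -p build || exit 1
    cd build || exit 1
    cmake .. -DCMAKE_INSTALL_PREFIX="$prefix" || exit 1
    make || exit 1
    make install
)

# Usage, from inside the brownie-xxx directory:
#   build_brownie "$HOME/brownie"
```

Installing under $HOME avoids the need for root privileges during `make install`.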
For impatient users:
reads.fastq is the input file that Brownie will clean. The corrected reads are written to reads.corr.fastq, in the same order as in the input file.
A closer look at Brownie's help information
Running ./brownie --help prints the help information to your screen, like this:
Usage: brownie [options] [file_options] file1 [[file_options] file2]...
Corrects sequence reads in file(s)
-h --help display help page
-i --info display information page
-s --singlestranded enable single stranded DNA [default = false]. Sequence data are assumed to be double stranded by default; specify this option explicitly if your input data are single stranded.
-k --kmersize kmer size [default = 31], an optional parameter in the range 9 to 31; only odd values are allowed. To allow a larger kmer size, change the MAXKMERLENGTH definition in the CMakeLists.txt file, for instance to
add_definitions("-DMAXKMERLENGTH=63") if your kmer size is less than 64, and rebuild Brownie.
-t --threads number of threads [default = available cores]
-g --genomesize size of the genome [default = auto]
-p --pathtotmp path to directory to store temporary files [default = current directory]; the given directory must already exist.
-o --output output file name [default = inputfile.corr]
--perfectgraph skip read and graph correction. With this option Brownie only builds the De Bruijn graph and does not modify it (see the explanation of Brownie's output files).
--graph skip read correction. With this option Brownie builds the De Bruijn graph and modifies it to remove erroneous nodes.
./brownie -k 29 -t 4 -o outputA.fasta inputA.fasta -o outputB.fasta inputB.fastq
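The constraints on -k described above (an odd value between 9 and 31 with the default build) are easy to get wrong in scripts. The following sketch validates a value before Brownie is launched. The function name valid_kmer and its optional second argument are hypothetical; the second argument stands in for a MAXKMERLENGTH raised at compile time.

```shell
# Hypothetical pre-flight check (not part of Brownie).
# $1: requested kmer size; $2: largest size the build supports (default 31).
valid_kmer() {
    k="$1"
    max="${2:-31}"
    # Reject empty or non-numeric input.
    case "$k" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Must lie in [9, max] and be odd.
    [ "$k" -ge 9 ] && [ "$k" -le "$max" ] && [ $((k % 2)) -eq 1 ]
}

# Usage:
#   valid_kmer 29 && ./brownie -k 29 -t 4 -o outputA.fasta inputA.fasta
```

With the default build, `valid_kmer 30` and `valid_kmer 33` both fail, while `valid_kmer 33 63` succeeds for a build configured with a larger maximum.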
Reference based error correction by Brownie
Brownie performs graph construction and error correction in two separate stages. You can therefore build a perfect graph from the reference genome sequence that is closest to your read file; Brownie then processes the reference genome like a FASTA read file. Because Brownie automatically ignores reads that occur only once in the input file, you should duplicate your genome file first. Once the perfect graph has been built, you can correct your read file.
In summary, execute the following commands for reference-based error correction:
cat genome.fasta genome.fasta > genome2.fasta
./brownie -t 12 -p sillyDir -k 25 --perfectgraph genome2.fasta
./brownie -t 12 -p sillyDir -k 25 -o BrownieCorrected.fastq initialReads.fastq
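Since the perfect-graph step depends on every reference sequence appearing twice, a quick sanity check on the duplicated file can catch a mistyped `cat` command. This is a minimal sketch: the helper names count_fasta_records and is_duplicated are hypothetical, and it only counts FASTA header lines (lines starting with `>`).

```shell
# Hypothetical helpers (not part of Brownie).
# Print the number of FASTA records (header lines starting with '>') in file $1.
count_fasta_records() {
    grep -c '^>' "$1" || true   # grep exits non-zero on zero matches; still prints 0
}

# Succeeds when file $2 holds exactly twice as many records as file $1.
is_duplicated() {
    orig_n="$(count_fasta_records "$1")"
    dup_n="$(count_fasta_records "$2")"
    [ "$orig_n" -gt 0 ] && [ "$dup_n" -eq $((2 * orig_n)) ]
}

# Usage:
#   is_duplicated genome.fasta genome2.fasta && echo "genome2.fasta looks duplicated"
```

The check compares record counts only; it does not verify that the sequences themselves are identical.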
How to produce a simulated reads file to play with Brownie
To simulate Illumina data, you can refer to this paper: ART: a next-generation sequencing read simulator
After downloading the application, you can run the following command to produce reads of length 250 at a coverage of 100.
./art_illumina -i genome.fasta -p -l 250 -f 100 -m 300 -s 30 -o reads250
genome.fasta is the input file from which the reads are produced.
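To get a feel for how much data the command above generates, the usual coverage relation (coverage = number of reads × read length / genome size) can be turned around. The sketch below assumes that relation holds approximately for the simulator's output. The function name expected_reads and the 5,000,000 bp genome size are hypothetical illustration values, not taken from Brownie or ART.

```shell
# Hypothetical helper: approximate number of reads needed for a given coverage.
# $1: genome size in bases, $2: read length, $3: fold coverage.
expected_reads() {
    genome_size="$1"
    read_length="$2"
    coverage="$3"
    # Integer arithmetic: reads = coverage * genome_size / read_length
    echo $(( coverage * genome_size / read_length ))
}

# Example with an assumed 5,000,000 bp genome, read length 250, coverage 100.
expected_reads 5000000 250 100   # prints 2000000
```

For a paired-end run, that total would be split across the two reads of each pair.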