Francesco Vezzi authored and Francesco Vezzi committed Feb 27, 2015
1 parent eaa7daa, commit b8f1b94
Showing 220 changed files with 81,680 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Francesco Vezzi francesco.vezzi@scilifelab.se |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
|
||
|
||
cmake_minimum_required (VERSION 2.6) | ||
project(FRC_aling) | ||
|
||
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/Modules/") | ||
|
||
add_definitions( -Wno-deprecated ) | ||
|
||
find_package(Boost COMPONENTS program_options system filesystem REQUIRED) | ||
|
||
# set our library and executable destination dirs | ||
set( EXECUTABLE_OUTPUT_PATH "${CMAKE_SOURCE_DIR}/bin" ) | ||
|
||
# define compiler flags for all code | ||
set( CMAKE_BUILD_TYPE Release ) | ||
|
||
include_directories("${PROJECT_SOURCE_DIR}/src") | ||
include_directories("${PROJECT_SOURCE_DIR}/lib/") | ||
include_directories("${PROJECT_SOURCE_DIR}/lib/bamtools/src") | ||
include_directories( ${Boost_INCLUDE_DIRS} ) | ||
|
||
|
||
# sources to compile | ||
file(GLOB FRC_FILES | ||
${PROJECT_SOURCE_DIR}/src/FRC_align.cpp | ||
${PROJECT_SOURCE_DIR}/src/data_structures/Contig.cpp | ||
${PROJECT_SOURCE_DIR}/src/data_structures/Features.cpp | ||
${PROJECT_SOURCE_DIR}/src/data_structures/FRC.cpp | ||
) | ||
|
||
# sources to compile | ||
file(GLOB FIND_TRANS_FILES | ||
${PROJECT_SOURCE_DIR}/src/FindTranslocations.cpp | ||
${PROJECT_SOURCE_DIR}/src/data_structures/Translocation.cpp | ||
${PROJECT_SOURCE_DIR}/src/common.h | ||
) | ||
|
||
|
||
add_subdirectory(lib) | ||
|
||
# FRC executable | ||
add_executable(FRC ${FRC_FILES}) | ||
target_link_libraries(FRC ${ZLIB_LIBRARIES}) | ||
target_link_libraries(FRC BamTools) | ||
target_link_libraries(FRC ${Boost_LIBRARIES}) | ||
|
||
|
||
|
||
# Find_translocations executable | ||
add_executable(FindTranslocations ${FIND_TRANS_FILES}) | ||
target_link_libraries(FindTranslocations ${ZLIB_LIBRARIES}) | ||
target_link_libraries(FindTranslocations BamTools) | ||
target_link_libraries(FindTranslocations ${Boost_LIBRARIES}) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
All the tools distributed with this package are distributed under GNU General Public License version 3.0 (GPLv3). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
Installation Instructions | ||
************************* | ||
|
||
From the FRCurve directory run: | ||
|
||
mkdir build | ||
cd build | ||
cmake .. | ||
make | ||
|
||
You will find the binaries in the bin directory under the main directory. If the build fails, the most likely cause is a problem | ||
with the local installation of Boost. |
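When the build fails at the Boost step, one common remedy is to tell CMake where Boost is installed. The sketch below assumes Boost lives under a custom prefix and that the CMake version in use still ships the FindBoost module, which reads the `BOOST_ROOT` hint. The helper names `run` and `configure_frc`, the `DRY_RUN` switch, and the example prefix are illustrative assumptions, not part of the package.

```shell
#!/bin/sh
# Hypothetical helper, not part of FRCurve.
# With DRY_RUN set to a non-empty value, commands are printed instead of run.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Configure and build in ./build, pointing CMake at an explicit Boost prefix.
# $1: assumed Boost install prefix, e.g. $HOME/opt/boost
configure_frc() {
    boost_prefix="$1"
    run mkdir -p build &&
    run cd build &&
    run cmake -DBOOST_ROOT="$boost_prefix" .. &&
    run make
}
```

Calling `configure_frc "$HOME/opt/boost"` from the FRCurve directory would run the four steps in order; setting `DRY_RUN=1` first only prints them for review.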
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
|
||
2014-05-26: New detailed README.md and explicit reference to FindTranslocations |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
INSTALLATION | ||
============== | ||
|
||
From the FRCurve directory run: | ||
- mkdir build | ||
- cd build | ||
- cmake .. | ||
- make | ||
|
||
You will find the binaries in the bin directory under the main directory. If the build fails, the most likely cause is a problem | ||
with the local installation of Boost. | ||
|
||
|
||
DESCRIPTION | ||
============== | ||
FRCurve is a package of tools that process BAM files in order to evaluate and analyze de novo assemblies and assemblers, and to identify | ||
genomic regions suspected of containing structural variations. The tools have already been applied successfully in several de novo and resequencing projects. | ||
|
||
This package contains two tools: | ||
|
||
1. FRCbam: tool to compute Feature Response Curves in order to validate and rank assemblies and assemblers | ||
2. FindTranslocations: tool to identify chromosomal rearrangements using Mate Pairs | ||
|
||
|
||
FRCbam | ||
-------------- | ||
**USAGE: basic, no CE-stats tuning** | ||
|
||
1. Assemble your data (n PE libraries and m MP libraries) with your favorite tools. Let us call the assemblies A_tool1, A_tool2, etc. | ||
2. Align one PE library and one MP library against each of your assemblies (e.g., A_tool1): | ||
    1. Use the same parameters for all alignments. | ||
    2. The PE library is mandatory; the MP library is highly recommended. | ||
    3. Sort the generated BAM files by coordinate and index them. We will call them A_tool1_PE_lib.bam and A_tool1_MP_lib.bam. | ||
    4. Use the PE library with the largest read coverage (i.e., vertical coverage) and the MP library with the largest spanning coverage (i.e., horizontal coverage). | ||
3. Run FRCurve for each assembly: | ||
```FRC --pe-sam A_tool1_PE_lib.bam --pe-min-insert MIN_PE_INS --pe-max-insert MAX_PE_INS --mp-sam A_tool1_MP_lib.bam --mp-min-insert MIN_MP_INS --mp-max-insert MAX_MP_INS | ||
--genome-size ESTIMATED_GENOME_SIZE --output OUTPUT_HEADER``` | ||
|
||
where: | ||
|
||
* ```--pe-sam A_tool1_PE_lib.bam```: sorted BAM file obtained by aligning the PE library against assembly A_tool1; | ||
* ```--pe-min-insert MIN_PE_INS```: estimated minimum insert length of the PE library; | ||
* ```--pe-max-insert MAX_PE_INS```: estimated maximum insert length of the PE library; | ||
* ```--mp-sam A_tool1_MP_lib.bam```: sorted BAM file obtained by aligning the MP library against assembly A_tool1; | ||
* ```--mp-min-insert MIN_MP_INS```: estimated minimum insert length of the MP library; | ||
* ```--mp-max-insert MAX_MP_INS```: estimated maximum insert length of the MP library; | ||
* ```--genome-size ESTIMATED_GENOME_SIZE```: estimated genome size; | ||
* ```--output OUTPUT_HEADER```: output header. | ||
|
||
**IMPORTANT**: | ||
If ```--genome-size``` is not specified the assembly length is used to compute FRCurve. In order to be able to compare FRCurves | ||
obtained with different tools (and hence producing slightly different assembly sizes) the same ```ESTIMATED_GENOME_SIZE``` | ||
must be specified. | ||
|
||
OUTPUT: | ||
|
||
* ```OUTPUT_HEADER_Features.txt```: human readable description of features: contig start end feature_type | ||
* ```OUTPUT_HEADER_FRC.txt```: FRCurve computed with all the features (to be plotted) | ||
* ```OUTPUT_HEADER_FEATURE.txt```: FRCurve for the corresponding feature | ||
* ```OUTPUT_HEADER_featureType.txt```: for each featureType the specific FRCurve | ||
* ```Features.gff```: features description in GFF format (for visualization) | ||
* ```OUTPUT_HEADER_CEstats_PE.txt```: CEvalues distribution (for CE_stats tuning) | ||
* ```OUTPUT_HEADER_CEstats_MP.txt```: CEvalues distribution (for CE_stats tuning) | ||
|
||
**USAGE: advanced, CE-stats tuning** | ||
|
||
CE-stats are able to identify the presence of insertion and deletion events. Different insert sizes give the possibility to | ||
identify different events. In order to avoid too many False Positives (or too many False Negatives) a tuning phase is | ||
highly recommended. | ||
|
||
Once step 3 of the basic usage is done, the user can already plot the FRCurves (for all the features or for only some of them). | ||
However, the CE_stats-based features have been computed with default (i.e., not optimal) parameters. Each run of FRCbam produces two | ||
files: ```OUTPUT_HEADER_CEstats_PE.txt``` and ```OUTPUT_HEADER_CEstats_MP.txt```. These files contain the distribution of the ```CE_values``` | ||
on each assembly. These values must be plotted as suggested on page 3 of the Supplementary Material | ||
(see http://www.nada.kth.se/~vezzi/publications/supplementary.pdf) to estimate the optimal ```CE_min``` and ```CE_max``` values for | ||
the PE and MP libraries, respectively. | ||
Once the optimal parameters are estimated, the FRCurves must be recomputed for all assemblies (only the ```COMPR``` and ```STRECH``` features | ||
will change), specifying the following extra parameters: | ||
|
||
* ```--CEstats-PE-min CE_PE_MIN```: all positions with CE values computed with the PE library lower than this are considered compressions | ||
* ```--CEstats-PE-max CE_PE_MAX```: all positions with CE values computed with the PE library higher than this are considered expansions | ||
* ```--CEstats-MP-min CE_MP_MIN```: all positions with CE values computed with the MP library lower than this are considered compressions | ||
* ```--CEstats-MP-max CE_MP_MAX```: all positions with CE values computed with the MP library higher than this are considered expansions | ||
|
||
FindTranslocations | ||
-------------- | ||
The tool is under constant development. A detailed user guide will be available soon; for now, run the tool with ```--help``` to discover the options. | ||
|
||
|
||
|
||
|
||
LICENCE | ||
============== | ||
All the tools distributed with this package are distributed under GNU General Public License version 3.0 (GPLv3). | ||
|
||
|
||
|
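The basic and tuned FRC runs described in the README can be scripted so that every assembly is evaluated with identical parameters. The following is a minimal sketch, not part of the package: the helper `frc_cmd`, the insert-size bounds, the genome size, and the use of the assembly name as output header are placeholder assumptions; only the flags themselves come from the README. It prints one command line per assembly rather than running it, so the commands can be reviewed first.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with estimates for your libraries.
PE_MIN=300
PE_MAX=700
MP_MIN=2000
MP_MAX=4000
# Use the same genome size for every assembly so the FRCurves are comparable.
GENOME_SIZE=5000000

# Print the FRC command for one assembly name (e.g. A_tool1).
# Extra arguments (e.g. the --CEstats-* options) are appended unchanged.
frc_cmd() {
    asm="$1"
    shift
    echo "FRC --pe-sam ${asm}_PE_lib.bam --pe-min-insert $PE_MIN --pe-max-insert $PE_MAX" \
         "--mp-sam ${asm}_MP_lib.bam --mp-min-insert $MP_MIN --mp-max-insert $MP_MAX" \
         "--genome-size $GENOME_SIZE --output $asm" "$@"
}

# Basic run with default CE thresholds: one command per assembly.
for asm in A_tool1 A_tool2; do
    frc_cmd "$asm"
done
```

For the tuned re-run, the thresholds read from the CEstats plots can be appended, for example `frc_cmd A_tool1 --CEstats-PE-min CE_PE_MIN --CEstats-PE-max CE_PE_MAX`, with the placeholders replaced by the estimated values.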
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
project( BamTools ) | ||
|
||
# BamTools version information | ||
set (BamTools_VERSION_MAJOR 2) | ||
set (BamTools_VERSION_MINOR 0) | ||
set (BamTools_VERSION_BUILD 5) | ||
|
||
# add our includes root path | ||
include_directories( bamtools/src ) | ||
|
||
# list subdirectories to build in | ||
add_subdirectory( bamtools/src/api ) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# ========================== | ||
# BamTools CMakeLists.txt | ||
# (c) 2010 Derek Barnett | ||
# | ||
# top-level | ||
# ========================== | ||
|
||
# set project name | ||
project( BamTools ) | ||
|
||
# Cmake requirements | ||
cmake_minimum_required( VERSION 2.6.4 ) | ||
|
||
# Force the build directory to be different from source directory | ||
macro( ENSURE_OUT_OF_SOURCE_BUILD MSG ) | ||
string( COMPARE EQUAL "${CMAKE_SOURCE_DIR}" "${CMAKE_BINARY_DIR}" insource ) | ||
get_filename_component( PARENTDIR ${CMAKE_SOURCE_DIR} PATH ) | ||
string( COMPARE EQUAL "${CMAKE_SOURCE_DIR}" "${PARENTDIR}" insourcesubdir ) | ||
IF( insource OR insourcesubdir ) | ||
message( FATAL_ERROR "${MSG}" ) | ||
ENDIF( insource OR insourcesubdir ) | ||
endmacro( ENSURE_OUT_OF_SOURCE_BUILD ) | ||
|
||
ensure_out_of_source_build( " | ||
${PROJECT_NAME} requires an out of source build. | ||
$ mkdir build | ||
$ cd build | ||
$ cmake .. | ||
$ make | ||
(or the Windows equivalent)\n" ) | ||
|
||
# set BamTools version information | ||
set( BamTools_VERSION_MAJOR 2 ) | ||
set( BamTools_VERSION_MINOR 3 ) | ||
set( BamTools_VERSION_BUILD 0 ) | ||
|
||
# set our library and executable destination dirs | ||
set( EXECUTABLE_OUTPUT_PATH "${CMAKE_SOURCE_DIR}/bin" ) | ||
set( LIBRARY_OUTPUT_PATH "${CMAKE_SOURCE_DIR}/lib" ) | ||
|
||
# define compiler flags for all code | ||
set( CMAKE_BUILD_TYPE Release ) | ||
add_definitions( -Wall -D_FILE_OFFSET_BITS=64 ) | ||
|
||
# ----------------------------------------------- | ||
# handle platform-/environment-specific defines | ||
|
||
# If planning to run in Node.js environment, run: | ||
# cmake -DEnableNodeJS=true | ||
if( EnableNodeJS ) | ||
add_definitions( -DSYSTEM_NODEJS=1 ) | ||
endif() | ||
|
||
# If running on SunOS | ||
if( "${CMAKE_SYSTEM_NAME}" MATCHES "SunOS" ) | ||
add_definitions( -DSUN_OS ) | ||
endif() | ||
|
||
# ------------------------------------------- | ||
|
||
# add our includes root path | ||
include_directories( src ) | ||
|
||
# list subdirectories to build in | ||
add_subdirectory( src ) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
The MIT License | ||
|
||
Copyright (c) 2009-2010 Derek Barnett, Erik Garrison, Gabor Marth, Michael Stromberg | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in | ||
all copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
THE SOFTWARE. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
-------------------------------------------------------------------------------- | ||
README : BAMTOOLS | ||
-------------------------------------------------------------------------------- | ||
|
||
BamTools provides both a programmer's API and an end-user's toolkit for handling | ||
BAM files. | ||
|
||
I. Learn More | ||
|
||
II. License | ||
|
||
III. Acknowledgements | ||
|
||
IV. Contact | ||
|
||
-------------------------------------------------------------------------------- | ||
I. Learn More: | ||
-------------------------------------------------------------------------------- | ||
|
||
Installation steps, tutorial, API documentation, etc. are all now available | ||
through the BamTools project wiki: | ||
|
||
https://github.com/pezmaster31/bamtools/wiki | ||
|
||
Join the mailing list(s) to stay informed of updates or get involved with | ||
contributing: | ||
|
||
https://github.com/pezmaster31/bamtools/wiki/Mailing-lists | ||
|
||
-------------------------------------------------------------------------------- | ||
II. License : | ||
-------------------------------------------------------------------------------- | ||
|
||
Both the BamTools API and toolkit are released under the MIT License. | ||
Copyright (c) 2009-2010 Derek Barnett, Erik Garrison, Gabor Marth, | ||
Michael Stromberg | ||
|
||
See included file LICENSE for details. | ||
|
||
-------------------------------------------------------------------------------- | ||
III. Acknowledgements : | ||
-------------------------------------------------------------------------------- | ||
|
||
* Aaron Quinlan for several key feature ideas and bug fix contributions | ||
* Baptiste Lepilleur for the public-domain JSON parser (JsonCPP) | ||
* Heng Li, author of SAMtools - the original C-language BAM API/toolkit. | ||
|
||
-------------------------------------------------------------------------------- | ||
IV. Contact : | ||
-------------------------------------------------------------------------------- | ||
|
||
Feel free to contact me with any questions, comments, suggestions, bug reports, | ||
etc. | ||
|
||
Derek Barnett | ||
Marth Lab | ||
Biology Dept., Boston College | ||
|
||
Email: derekwbarnett@gmail.com | ||
Project Website: http://github.com/pezmaster31/bamtools |