new FindTranslocation
Francesco Vezzi authored and Francesco Vezzi committed Feb 27, 2015
1 parent eaa7daa commit b8f1b94
Showing 220 changed files with 81,680 additions and 0 deletions.
1 change: 1 addition & 0 deletions AUTHORS
@@ -0,0 +1 @@
Francesco Vezzi francesco.vezzi@scilifelab.se
56 changes: 56 additions & 0 deletions CMakeLists.txt
@@ -0,0 +1,56 @@


cmake_minimum_required (VERSION 2.6)
project(FRC_aling)

set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/Modules/")

add_definitions( -Wno-deprecated )

find_package(Boost COMPONENTS program_options system filesystem REQUIRED)

# set our library and executable destination dirs
set( EXECUTABLE_OUTPUT_PATH "${CMAKE_SOURCE_DIR}/bin" )

# define compiler flags for all code
set( CMAKE_BUILD_TYPE Release )

include_directories("${PROJECT_SOURCE_DIR}/src")
include_directories("${PROJECT_SOURCE_DIR}/lib/")
include_directories("${PROJECT_SOURCE_DIR}/lib/bamtools/src")
include_directories( ${Boost_INCLUDE_DIRS} )


# source files to compile
file(GLOB FRC_FILES
${PROJECT_SOURCE_DIR}/src/FRC_align.cpp
${PROJECT_SOURCE_DIR}/src/data_structures/Contig.cpp
${PROJECT_SOURCE_DIR}/src/data_structures/Features.cpp
${PROJECT_SOURCE_DIR}/src/data_structures/FRC.cpp
)

# source files to compile
file(GLOB FIND_TRANS_FILES
${PROJECT_SOURCE_DIR}/src/FindTranslocations.cpp
${PROJECT_SOURCE_DIR}/src/data_structures/Translocation.cpp
${PROJECT_SOURCE_DIR}/src/common.h
)


add_subdirectory(lib)

# FRC executable
add_executable(FRC ${FRC_FILES})
target_link_libraries(FRC ${ZLIB_LIBRARIES})
target_link_libraries(FRC BamTools)
target_link_libraries(FRC ${Boost_LIBRARIES})



# Find_translocations executable
add_executable(FindTranslocations ${FIND_TRANS_FILES})
target_link_libraries(FindTranslocations ${ZLIB_LIBRARIES})
target_link_libraries(FindTranslocations BamTools)
target_link_libraries(FindTranslocations ${Boost_LIBRARIES})


1 change: 1 addition & 0 deletions COPYING
@@ -0,0 +1 @@
All the tools distributed with this package are distributed under GNU General Public License version 3.0 (GPLv3).
12 changes: 12 additions & 0 deletions INSTALL
@@ -0,0 +1,12 @@
Installation Instructions
*************************

From the FRCurve directory run:

mkdir build
cd build
cmake ..
make

The binaries are placed in the bin directory at the top level of the source tree. If the build fails, the most common cause
is a problem with the local installation of Boost.
2 changes: 2 additions & 0 deletions NEWS
@@ -0,0 +1,2 @@

2014-05-26: New detailed README.md and explicit reference to FindTranslocations
97 changes: 97 additions & 0 deletions README.md
@@ -0,0 +1,97 @@
INSTALLATION
==============

From the FRCurve directory run:
- mkdir build
- cd build
- cmake ..
- make

The binaries are placed in the bin directory at the top level of the source tree. If the build fails, the most common cause
is a problem with the local installation of Boost.
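
If CMake cannot locate Boost, one option is to point it at the installation explicitly. The sketch below is a dry run that only prints the configure command. The helper name `configure_command` and the `/opt/boost` prefix are hypothetical; `BOOST_ROOT` is the hint variable read by CMake's FindBoost module.

```shell
#!/bin/sh
# Dry-run sketch: prints the configure command instead of running it.
# configure_command is a hypothetical helper, not part of this package.
configure_command() {
    # $1 (optional): prefix of a Boost installation, e.g. /opt/boost
    if [ -n "${1:-}" ]; then
        printf 'cmake -DBOOST_ROOT=%s ..\n' "$1"
    else
        printf 'cmake ..\n'
    fi
}

# Example with an assumed install prefix; run the printed line from the build directory.
configure_command /opt/boost
```

The printed line replaces the plain `cmake ..` step above; `make` is then run as usual.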


DESCRIPTION
==============
FRCurve is a package of tools that process BAM files in order to evaluate and analyze de novo assemblies and assemblers, and to identify
genomic regions suspected of harboring structural variations. The tools have already been applied successfully in several de novo and resequencing projects.

This package contains two tools:

1. FRCbam: tool to compute Feature Response Curves in order to validate and rank assemblies and assemblers
2. FindTranslocations: tool to identify chromosomal rearrangements using Mate Pairs


FRCbam
--------------
**USAGE: basic, no CE-stats tuning**

1. Assemble your data (n PE libraries and m MP libraries) with your favorite tools. Let us call the assemblies A_tool1, A_tool2, etc.
2. Align one PE library and one MP library against each of your assemblies (e.g., A_tool1):
    1. Use the same alignment parameters.
    2. The PE library is mandatory; the MP library is highly recommended.
    3. Sort the generated BAM files by coordinate and index them. We will call them A_tool1_PE_lib.bam and A_tool1_MP_lib.bam.
    4. Use the PE library with the largest read coverage (i.e., vertical coverage) and the MP library with the largest spanning coverage (i.e., horizontal coverage).
3. Run FRCurve for each assembly:
```FRC --pe-sam A_tool1_PE_lib.bam --pe-min-insert MIN_PE_INS --pe-max-insert MAX_PE_INS --mp-sam A_tool1_MP_lib.bam --mp-min-insert MIN_MP_INS --mp-max-insert MAX_MP_INS
--genome-size ESTIMATED_GENOME_SIZE --output OUTPUT_HEADER```

where:

* ```--pe-sam A_tool1_PE_lib.bam```: sorted BAM file obtained by aligning the PE library against assembly A_tool1;
* ```--pe-min-insert MIN_PE_INS```: estimated minimum insert length of the PE library;
* ```--pe-max-insert MAX_PE_INS```: estimated maximum insert length of the PE library;
* ```--mp-sam A_tool1_MP_lib.bam```: sorted BAM file obtained by aligning the MP library against assembly A_tool1;
* ```--mp-min-insert MIN_MP_INS```: estimated minimum insert length of the MP library;
* ```--mp-max-insert MAX_MP_INS```: estimated maximum insert length of the MP library;
* ```--genome-size ESTIMATED_GENOME_SIZE```: estimated genome size;
* ```--output OUTPUT_HEADER```: header (prefix) used for the names of the output files.
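
Step 2 above asks for coordinate-sorted, indexed BAM files. A minimal dry-run sketch of that preparation follows. It assumes samtools 1.x command syntax, and both the helper name `sort_index_commands` and the `*_unsorted.bam` input names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the samtools commands for each alignment file.
# Assumes samtools 1.x syntax; the *_unsorted.bam input names are hypothetical.
sort_index_commands() {
    # $1: output name without extension, e.g. A_tool1_PE_lib
    # samtools sort orders by coordinate unless told otherwise
    printf 'samtools sort -o %s.bam %s_unsorted.bam\n' "$1" "$1"
    printf 'samtools index %s.bam\n' "$1"
}

for lib in A_tool1_PE_lib A_tool1_MP_lib; do
    sort_index_commands "$lib"
done
```

Piping the script's output into `sh` would run the printed commands, provided samtools is installed and the input files exist.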

**IMPORTANT**:
If ```--genome-size``` is not specified, the assembly length is used to compute the FRCurve. Different tools produce assemblies of slightly
different sizes, so in order to compare FRCurves obtained with different tools the same ```ESTIMATED_GENOME_SIZE```
must be specified for every assembly.
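
One way to keep the genome size identical across runs is to generate every command from a single set of variables. The following is a dry-run sketch that prints one FRC command per assembly. The numeric values are placeholder assumptions, and `frc_command` is a hypothetical helper; only the FRC options themselves come from the usage above.

```shell
#!/bin/sh
# Dry-run sketch: prints one FRC command per assembly, all sharing one genome size.
# Every value below is a placeholder assumption; substitute your own estimates.
GENOME_SIZE=5000000
PE_MIN=200
PE_MAX=600
MP_MIN=2000
MP_MAX=4000

frc_command() {
    # $1: assembly name, used for the BAM file names and as the output header
    printf 'FRC --pe-sam %s_PE_lib.bam --pe-min-insert %s --pe-max-insert %s' "$1" "$PE_MIN" "$PE_MAX"
    printf ' --mp-sam %s_MP_lib.bam --mp-min-insert %s --mp-max-insert %s' "$1" "$MP_MIN" "$MP_MAX"
    printf ' --genome-size %s --output %s\n' "$GENOME_SIZE" "$1"
}

for asm in A_tool1 A_tool2; do
    frc_command "$asm"
done
```

Because `GENOME_SIZE` is set once, the resulting curves are computed against the same reference length and can be compared directly.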

OUTPUT:

* ```OUTPUT_HEADER_Features.txt```: human readable description of features: contig start end feature_type
* ```OUTPUT_HEADER_FRC.txt```: FRCurve computed with all the features (to be plotted)
* ```OUTPUT_HEADER_FEATURE.txt```: FRCurve for the corresponding feature
* ```OUTPUT_HEADER_featureType.txt```: for each featureType the specific FRCurve
* ```Features.gff```: features description in GFF format (for visualization)
* ```OUTPUT_HEADER_CEstats_PE.txt```: CEvalues distribution (for CE_stats tuning)
* ```OUTPUT_HEADER_CEstats_MP.txt```: CEvalues distribution (for CE_stats tuning)

**USAGE: advanced, CE-stats tuning**

CE-stats can identify the presence of insertion and deletion events. Libraries with different insert sizes make it possible to
identify different events. To avoid too many False Positives (or too many False Negatives), a tuning phase is
highly recommended.

Once step 3 of the basic usage is done, the user can already plot the FRCurves (for all the features or for only some of them).
At this point the CE_stats-based features have been computed with default (i.e., not optimal) parameters. Each run of FRCbam produces two
files: ```OUTPUT_HEADER_CEstats_PE.txt``` and ```OUTPUT_HEADER_CEstats_MP.txt```. These files contain the distribution of the ```CE_values```
on each assembly. These values must be plotted as suggested on page 3 of the Supplementary Material
(see http://www.nada.kth.se/~vezzi/publications/supplementary.pdf) to estimate the optimal ```CE_min``` and ```CE_max``` values for
the PE and MP libraries, respectively.
Once the optimal parameters are estimated, the FRCurves must be recomputed for all assemblies (only the ```COMPR``` and ```STRECH``` features
will change), specifying the following extra parameters:

* ```--CEstats-PE-min CE_PE_MIN```: all positions with CE values computed with the PE library lower than this are considered compressions
* ```--CEstats-PE-max CE_PE_MAX```: all positions with CE values computed with the PE library higher than this are considered expansions
* ```--CEstats-MP-min CE_MP_MIN```: all positions with CE values computed with the MP library lower than this are considered compressions
* ```--CEstats-MP-max CE_MP_MAX```: all positions with CE values computed with the MP library higher than this are considered expansions
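
The recomputation step can be sketched as the original command plus the four tuning options. The example below is a dry run that prints the full command. `cestats_options` is a hypothetical helper, and all upper-case names are the placeholders used in this README, to be replaced with the values estimated from the plots.

```shell
#!/bin/sh
# Dry-run sketch: appends the CE-stats thresholds to a base FRC command line.
# cestats_options is a hypothetical helper; upper-case names are README placeholders.
cestats_options() {
    # $1..$4: PE min, PE max, MP min, MP max thresholds
    printf '%s %s %s %s %s %s %s %s\n' \
        --CEstats-PE-min "$1" --CEstats-PE-max "$2" \
        --CEstats-MP-min "$3" --CEstats-MP-max "$4"
}

BASE='FRC --pe-sam A_tool1_PE_lib.bam --pe-min-insert MIN_PE_INS --pe-max-insert MAX_PE_INS'
BASE="$BASE --mp-sam A_tool1_MP_lib.bam --mp-min-insert MIN_MP_INS --mp-max-insert MAX_MP_INS"
BASE="$BASE --genome-size ESTIMATED_GENOME_SIZE --output OUTPUT_HEADER"

# Print the tuned command for one assembly.
printf '%s %s\n' "$BASE" "$(cestats_options CE_PE_MIN CE_PE_MAX CE_MP_MIN CE_MP_MAX)"
```

The same four thresholds would be passed for every assembly so that the recomputed curves remain comparable.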

FindTranslocations
--------------
The tool is under constant development. A detailed user guide will be available soon; for now, run the tool with ```--help``` to discover the options.




LICENCE
==============
All the tools distributed with this package are distributed under GNU General Public License version 3.0 (GPLv3).



12 changes: 12 additions & 0 deletions lib/CMakeLists.txt
@@ -0,0 +1,12 @@
project( BamTools )

# BamTools version information
set (BamTools_VERSION_MAJOR 2)
set (BamTools_VERSION_MINOR 0)
set (BamTools_VERSION_BUILD 5)

# add our includes root path
include_directories( bamtools/src )

# list subdirectories to build in
add_subdirectory( bamtools/src/api )
65 changes: 65 additions & 0 deletions lib/bamtools/CMakeLists.txt
@@ -0,0 +1,65 @@
# ==========================
# BamTools CMakeLists.txt
# (c) 2010 Derek Barnett
#
# top-level
# ==========================

# set project name
project( BamTools )

# Cmake requirements
cmake_minimum_required( VERSION 2.6.4 )

# Force the build directory to be different from source directory
macro( ENSURE_OUT_OF_SOURCE_BUILD MSG )
string( COMPARE EQUAL "${CMAKE_SOURCE_DIR}" "${CMAKE_BINARY_DIR}" insource )
get_filename_component( PARENTDIR ${CMAKE_SOURCE_DIR} PATH )
string( COMPARE EQUAL "${CMAKE_SOURCE_DIR}" "${PARENTDIR}" insourcesubdir )
IF( insource OR insourcesubdir )
message( FATAL_ERROR "${MSG}" )
ENDIF( insource OR insourcesubdir )
endmacro( ENSURE_OUT_OF_SOURCE_BUILD )

ensure_out_of_source_build( "
${PROJECT_NAME} requires an out of source build.
$ mkdir build
$ cd build
$ cmake ..
$ make
(or the Windows equivalent)\n" )

# set BamTools version information
set( BamTools_VERSION_MAJOR 2 )
set( BamTools_VERSION_MINOR 3 )
set( BamTools_VERSION_BUILD 0 )

# set our library and executable destination dirs
set( EXECUTABLE_OUTPUT_PATH "${CMAKE_SOURCE_DIR}/bin" )
set( LIBRARY_OUTPUT_PATH "${CMAKE_SOURCE_DIR}/lib" )

# define compiler flags for all code
set( CMAKE_BUILD_TYPE Release )
add_definitions( -Wall -D_FILE_OFFSET_BITS=64 )

# -----------------------------------------------
# handle platform-/environment-specific defines

# If planning to run in Node.js environment, run:
# cmake -DEnableNodeJS=true
if( EnableNodeJS )
add_definitions( -DSYSTEM_NODEJS=1 )
endif()

# If running on SunOS
if( "${CMAKE_SYSTEM_NAME}" MATCHES "SunOS" )
add_definitions( -DSUN_OS )
endif()

# -------------------------------------------

# add our includes root path
include_directories( src )

# list subdirectories to build in
add_subdirectory( src )
22 changes: 22 additions & 0 deletions lib/bamtools/LICENSE
@@ -0,0 +1,22 @@
The MIT License

Copyright (c) 2009-2010 Derek Barnett, Erik Garrison, Gabor Marth, Michael Stromberg

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

60 changes: 60 additions & 0 deletions lib/bamtools/README
@@ -0,0 +1,60 @@
--------------------------------------------------------------------------------
README : BAMTOOLS
--------------------------------------------------------------------------------

BamTools provides both a programmer's API and an end-user's toolkit for handling
BAM files.

I. Learn More

II. License

III. Acknowledgements

IV. Contact

--------------------------------------------------------------------------------
I. Learn More:
--------------------------------------------------------------------------------

Installation steps, tutorial, API documentation, etc. are all now available
through the BamTools project wiki:

https://github.com/pezmaster31/bamtools/wiki

Join the mailing list(s) to stay informed of updates or get involved with
contributing:

https://github.com/pezmaster31/bamtools/wiki/Mailing-lists

--------------------------------------------------------------------------------
II. License :
--------------------------------------------------------------------------------

Both the BamTools API and toolkit are released under the MIT License.
Copyright (c) 2009-2010 Derek Barnett, Erik Garrison, Gabor Marth,
Michael Stromberg

See included file LICENSE for details.

--------------------------------------------------------------------------------
III. Acknowledgements :
--------------------------------------------------------------------------------

* Aaron Quinlan for several key feature ideas and bug fix contributions
* Baptiste Lepilleur for the public-domain JSON parser (JsonCPP)
* Heng Li, author of SAMtools - the original C-language BAM API/toolkit.

--------------------------------------------------------------------------------
IV. Contact :
--------------------------------------------------------------------------------

Feel free to contact me with any questions, comments, suggestions, bug reports,
etc.

Derek Barnett
Marth Lab
Biology Dept., Boston College

Email: derekwbarnett@gmail.com
Project Website: http://github.com/pezmaster31/bamtools