This is a bugfix release.
- fixed a segfault in some multi-threaded situations.
- removed some files to reduce the size of the distribution.
- fixed a bug with the -storage-type file option.
Integration of the Leon compressor into GATB-Core:
- The Leon file format can now be handled natively by all software relying on GATB-Core. In other words, you can process reads stored in a Leon file without decompressing it first.
- More details at https://github.com/GATB/gatb-core/wiki/Using-GATB-Core-integrated-Leon-compressor
- Unit tests and a large-scale test suite for the Leon compressor; see https://ci.inria.fr/gatb-core/view/Leon/job/tool-leon-functional-tests/lastBuild/console
Time and memory optimisations:
- Faster k-mer counting (inspired by KMC3, though not yet as fast)
- More efficient graph representation using compressed vectors
- Faster unitig compaction (engineering improvements in the BCALM code)
- New compact encoding scheme for loading abundance values in memory (8 bits per value, covering the range 0 to 50,000 with at most 5% error)
- Parameterizable graph simplification steps (see Graph.hpp and Minia): optional tip clipping, bulge removal and erroneous connection removal
- Preliminary support for loading unitigs (in GraphUnitigs.cpp) from a GFA1 graph generated by BCALM (using scripts/convertToGFA.py in the BCALM repository)
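The 8-bit abundance encoding mentioned above can be illustrated with a minimal sketch. This is not GATB-Core's actual scheme: it assumes a lookup table of representative values that stores small counts exactly (up to 20) and then grows by at most 10% per entry, so that rounding any count up to 50,000 to its nearest entry stays within 5%. The class name AbundanceCodec is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit abundance codec (NOT GATB-Core's implementation).
// Decoding is a table lookup; encoding picks the nearest table entry.
class AbundanceCodec {
public:
    explicit AbundanceCodec(uint32_t maxValue = 50000) {
        uint32_t v = 0;
        table_.push_back(v);
        while (v < maxValue) {
            // Next entry: v + 1 for small values, otherwise v + floor(v / 10),
            // so consecutive entries never differ by more than 10%.
            v = std::max(v + 1, v + v / 10);
            table_.push_back(v);
        }
        assert(table_.size() <= 256);  // every code must fit in one byte
    }

    uint8_t encode(uint32_t value) const {
        auto it = std::lower_bound(table_.begin(), table_.end(), value);
        if (it == table_.end())  // saturate values above the covered range
            return static_cast<uint8_t>(table_.size() - 1);
        size_t i = static_cast<size_t>(it - table_.begin());
        // Round down when the previous entry is strictly closer.
        if (i > 0 && value - table_[i - 1] < table_[i] - value) --i;
        return static_cast<uint8_t>(i);
    }

    // `code` must come from encode().
    uint32_t decode(uint8_t code) const { return table_[code]; }
    size_t size() const { return table_.size(); }

private:
    std::vector<uint32_t> table_;  // strictly increasing representative values
};
```

With these choices the table needs fewer than 256 entries; a finer growth ratio could spend the spare codes on a smaller error bound.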
New ways to compile, making compilation easier:
- Added a simple makefile to compile a GATB tool without CMake.
- Added support for Docker: using docker/Dockerfile, one can build a Docker image containing GATB-Core.
- Two new ways to compile the example code snippets:
    cmake -DGATB_CORE_INCLUDE_EXAMPLES=True ..
    cd example ; make [folder]/[examplename.cpp] (for instance, make kmer/kmer2)
- A new graph object is introduced: GraphUnitigs, optimized to traverse unitigs but not to query individual kmers.
- A few graph API functions changed.
- Updated MPHF and HDF5.
- This release now requires a C++11-compatible compiler.
Compiling the GATB-Core library now requires a C++11-capable compiler.
CMake 3.1.0 is the minimum version of CMake required to compile GATB-Core.
The HDF5 library (used for data storage) has been upgraded to the latest release, 1.8.18.
Parameters "-mphf none", "-mphf emphf" and "-mphf boophf" and variable WITH_MPHF are deprecated. Please remove them from your applications (e.g. in Graph::create()). BooPHF is now the default MPHF object and it is always compiled. Emphf has been removed from the library.
Debug compilation is now done using the standard CMake option "-DCMAKE_BUILD_TYPE=Debug" instead of "-Ddebug=1".
Developers, please pay attention to these breaking changes:
Graph::Vector is now GraphVector
Graph::create() no longer accepts '-mphf ...' (see the technical notes above)
New GraphUnitigs class that offers a de Bruijn graph representation based on unitigs (created by BCALM2) loaded in memory. It has the same API as the Graph class although some functions aren't implemented, as accessing a node that is not an extremity of a unitig isn't supported in this representation. The representation is designed to traverse unitigs quickly, skipping over all non-branching nodes. This representation doesn't use the Bloom filter nor the MPHF. To use this representation, have a look at Minia's code: https://github.com/GATB/minia/blob/ee00a34f1a49a1fcdd757e0bdaf7d03190896322/src/Minia.cpp#L116
New functions to traverse the graph have been added; see simplePath* in Graph.hpp. These functions are mostly designed to take advantage of GraphUnitigs, and they have the same API in Graph too. They will also replace the Traversal class. So far, only partial compatibility with the original Graph class has been implemented.
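To illustrate what traversing a simple path means, here is a minimal sketch on a toy, single-stranded de Bruijn graph held in a hash set of k-mers. It is not the simplePath* API: the names KmerSet and extendSimplePath are hypothetical, and reverse complements are ignored for brevity. The walk keeps extending while the current node has exactly one successor and that successor has exactly one predecessor, which is the non-branching condition that delimits a unitig.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy, single-stranded de Bruijn graph: nodes are the k-mers in the set,
// and an edge joins two k-mers that overlap by k-1 characters.
struct KmerSet {
    std::unordered_set<std::string> kmers;

    std::vector<std::string> successors(const std::string& km) const {
        std::vector<std::string> out;
        for (char b : std::string("ACGT")) {
            std::string next = km.substr(1) + b;
            if (kmers.count(next)) out.push_back(next);
        }
        return out;
    }

    std::vector<std::string> predecessors(const std::string& km) const {
        std::vector<std::string> out;
        for (char b : std::string("ACGT")) {
            std::string prev = b + km.substr(0, km.size() - 1);
            if (kmers.count(prev)) out.push_back(prev);
        }
        return out;
    }
};

// Hypothetical helper: walks forward from `start` along non-branching nodes
// and returns the spelled sequence. Stops at a dead end, a branching node,
// a merge point, or when a circular path returns to `start`.
std::string extendSimplePath(const KmerSet& g, const std::string& start) {
    std::string seq = start;
    std::string cur = start;
    while (true) {
        std::vector<std::string> succ = g.successors(cur);
        if (succ.size() != 1) break;                  // dead end or branch
        const std::string next = succ[0];
        if (g.predecessors(next).size() != 1) break;  // next has several parents
        if (next == start) break;                     // circular unitig
        seq += next.back();                           // append one new base
        cur = next;
    }
    return seq;
}
```

For example, with the k-mers ACG, CGT and GTA the walk from ACG spells ACGTA; adding CGA makes ACG a branching node, so the walk stops immediately.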
BooPHF is now the default MPHF object used by GATB-Core
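For readers unfamiliar with BooPHF, the following sketch shows, in simplified form, the layered idea it is based on (the BBHash algorithm): keys are hashed into an array per level, keys that collide are retried at the next level with a different hash, and each key that lands alone receives the rank of its slot as its index. This is not BooPHF's code or API; the class name TinyMphf, the splitmix64-style hash, gamma = 2.0, the level cap and the explicit rank table (instead of a compact bit vector with a rank structure) are all assumptions made to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified sketch of the BBHash idea (NOT BooPHF's code or API).
// Assumes the input keys are distinct.
class TinyMphf {
public:
    explicit TinyMphf(std::vector<uint64_t> keys, double gamma = 2.0) {
        uint64_t assigned = 0;  // indices handed out so far
        for (int level = 0; level < kMaxLevels && !keys.empty(); ++level) {
            size_t size = std::max<size_t>(1, static_cast<size_t>(gamma * keys.size()));
            // Count keys per slot, saturating at 2 (= collision).
            std::vector<uint8_t> hits(size, 0);
            for (uint64_t k : keys) {
                uint8_t& h = hits[hash(k, level) % size];
                if (h < 2) ++h;
            }
            // Slots hit exactly once receive consecutive indices; this explicit
            // table stands in for a bit vector plus a rank structure.
            std::vector<int64_t> slot(size, -1);
            for (size_t p = 0; p < size; ++p)
                if (hits[p] == 1) slot[p] = static_cast<int64_t>(assigned++);
            // Colliding keys are retried at the next level with another hash.
            std::vector<uint64_t> remaining;
            for (uint64_t k : keys)
                if (hits[hash(k, level) % size] != 1) remaining.push_back(k);
            levels_.push_back(std::move(slot));
            keys.swap(remaining);
        }
        for (uint64_t k : keys) fallback_[k] = assigned++;  // rare leftovers
        n_ = assigned;
    }

    // Returns a unique index in [0, size()) for every construction key.
    // Keys outside the set get an arbitrary answer (a "false positive").
    uint64_t lookup(uint64_t key) const {
        for (size_t level = 0; level < levels_.size(); ++level) {
            const std::vector<int64_t>& slot = levels_[level];
            int64_t s = slot[hash(key, static_cast<int>(level)) % slot.size()];
            if (s >= 0) return static_cast<uint64_t>(s);
        }
        auto it = fallback_.find(key);
        return it != fallback_.end() ? it->second : 0;
    }

    uint64_t size() const { return n_; }

private:
    static constexpr int kMaxLevels = 16;

    // splitmix64-style mixer, seeded differently at each level.
    static uint64_t hash(uint64_t key, int level) {
        uint64_t z = key + 0x9E3779B97F4A7C15ULL * static_cast<uint64_t>(level + 1);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        return z ^ (z >> 31);
    }

    std::vector<std::vector<int64_t>> levels_;
    std::unordered_map<uint64_t, uint64_t> fallback_;
    uint64_t n_ = 0;
};
```

The function is minimal because only slots hit by exactly one key receive an index, and indices are handed out consecutively, so n keys map onto exactly 0..n-1.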
In addition to HDF5, we introduce experimental support for a raw file format. It was designed for two reasons: to avoid potential memory leaks due to HDF5 (unclear at this point), and to avoid HDF5 file corruption (whenever a job is interrupted after k-mer counting, the .h5 file containing the k-mer counts sometimes cannot be re-opened). The format is experimental, so use it at your own risk. It holds essentially the same content as the previous HDF5 format, but each dataset is stored in its own file. Also, JSON is used instead of XML for structured configuration. To enable this format, pass "-storage-type file" in your configuration string (e.g. Graph::create()).
GATB-Core version 1.2.2, release notes
This is a bug-fix release:
- fixed a compilation issue with old versions of clang (prior to clang 4.3 on mac). This GATB-Core release is the last one to officially support clang versions older than 4.3 on mac and 3.2 on linux.
GATB-Core version 1.2.1, release notes
This is a bug-fix release:
- fixed bugs when the MPHF is queried on a false-positive node.
- fixed bugs that caused "Pool allocation failed" on some large instances.
- fixed some compilation issues regarding clang versions (inconsistent version numbers between mac and linux).
- fixed an include problem in the binary distribution that caused an unnecessary dependency on Boost.
Released on 2016-06-28/10:42:44
GATB-Core version 1.2.0, release notes
Assembly-inspired de Bruijn graph simplifications
    // removes tips, bubbles and erroneous connections,
    // similar to some of the algorithms implemented in the SPAdes assembler
    graph.simplify();
Faster graph traversal can be activated using a single command.
    // allocates 1 byte/node to precompute adjacency for each node in the MPHF;
    // faster graph traversal (especially using neighbors())
    graph.precomputeAdjacency();
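One byte per node is enough because a node in a de Bruijn graph has at most four successors and four predecessors. The sketch below assumes a hypothetical bit layout (bits 0-3 for successors ending in A, C, G, T; bits 4-7 for predecessors starting with A, C, G, T) and uses a single-stranded toy graph keyed by k-mer strings instead of MPHF indices; it does not reproduce GATB-Core's internal format, and buildAdjacencyBytes and successorsFromByte are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical layout (not GATB-Core's): for a node, bit i (0-3) is set if the
// successor ending in "ACGT"[i] exists, and bit 4 + i if the predecessor
// starting with "ACGT"[i] exists. Single-stranded toy graph.
std::unordered_map<std::string, uint8_t> buildAdjacencyBytes(
        const std::vector<std::string>& kmers) {
    std::unordered_map<std::string, uint8_t> adj;
    for (const std::string& km : kmers) adj[km] = 0;  // record membership first
    const std::string bases = "ACGT";
    for (const std::string& km : kmers) {
        uint8_t a = 0;
        for (int i = 0; i < 4; ++i) {
            if (adj.count(km.substr(1) + bases[i]))
                a |= static_cast<uint8_t>(1u << i);        // successor exists
            if (adj.count(bases[i] + km.substr(0, km.size() - 1)))
                a |= static_cast<uint8_t>(1u << (4 + i));  // predecessor exists
        }
        adj[km] = a;
    }
    return adj;
}

// After precomputation, listing successors needs no membership queries.
std::vector<std::string> successorsFromByte(const std::string& km, uint8_t a) {
    const std::string bases = "ACGT";
    std::vector<std::string> out;
    for (int i = 0; i < 4; ++i)
        if (a & (1u << i)) out.push_back(km.substr(1) + bases[i]);
    return out;
}
```

For the k-mers ACG, CGT, CGA and TAC, the byte for ACG is 137: bits 0 and 3 for the successors CGA and CGT, plus bit 7 for the predecessor TAC.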
Improvements in MPHF and kmer counting
- New implementation for the minimal perfect hash function (switched from emphf to BooPHF)
- Non-canonical k-mer counting is supported via "cmake -DNONCANONICAL=1"
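Canonical counting merges each k-mer with its reverse complement, here by keeping the smaller of the two encodings. The minimal sketch below shows that step with a 2-bit encoding (A=0, C=1, G=2, T=3, k <= 32); the function names are hypothetical, and GATB-Core's own k-mer types may encode or order bases differently. With non-canonical counting, the value returned by encodeKmer would be used as-is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helpers (not GATB-Core's Kmer API). 2-bit encoding with
// A=0, C=1, G=2, T=3, so the complement of base b is 3 - b. Assumes k <= 32
// and an input made only of A, C, G, T (any other character is read as T).
uint64_t encodeKmer(const std::string& s) {
    uint64_t v = 0;
    for (char c : s) {
        uint64_t b = (c == 'A') ? 0 : (c == 'C') ? 1 : (c == 'G') ? 2 : 3;
        v = (v << 2) | b;
    }
    return v;
}

uint64_t reverseComplement(uint64_t v, int k) {
    uint64_t rc = 0;
    for (int i = 0; i < k; ++i) {
        rc = (rc << 2) | (3 - (v & 3));  // complement the last base and append it
        v >>= 2;
    }
    return rc;
}

// Canonical form: the smaller of a k-mer and its reverse complement.
uint64_t canonical(uint64_t v, int k) {
    return std::min(v, reverseComplement(v, k));
}
```

For example, ACG (encoded as 6) and its reverse complement CGT (encoded as 27) both map to the canonical value 6, so canonical counting tallies them together.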
Breaking API changes
- neighbors<Node>(..) becomes neighbors(..)
- neighbors<Edge>(..) becomes neighborsEdge(..)
- iterator<Node>(..) becomes iterator(..)
- iterator<BranchingNode>(..) becomes iteratorBranching(..)
- node.kmer.get<T>() becomes node.template getKmer<T>()
- successors<Node>(..) becomes successors(..)
- const Node& becomes Node& (as MPHF indices are now cached in Node objects)
and so on for all graph functions.
- The basic kmer type (Kmer<>::Type) no longer has a constructor. Use [kmer].setVal(0) to set the value of the variable [kmer] to zero.
For instance, the following code:
    optimum = Kmer<span>::Type(0);
should be replaced by:
    optimum.setVal(0);
- Graph is now a templated object (GraphTemplate<Node_t, Edge_t, GraphDataVariant_t>) behind the scenes. However this change is transparent to users of previous versions of GATB-core, as compatibility with the Graph class is preserved.
- bug fixes in how queries with dir=DIR_INCOMING are handled.
- various minor bug fixes