

  • This is a bugfix release
    • fixed a segfault in some multi-threaded situations
    • removed some spurious large files from the distribution
    • fixed a bug with the "-storage-type file" option


  • Integration of Leon compressor into GATB-Core :

  • Time and memory optimisations :

    • Faster k-mer counting (inspired by KMC3 but not yet as fast :)

    • More efficient graph representation using compressed vectors (in GraphUnitigs.cpp)

    • Faster unitig compaction (engineering improvements in BCALM code)

    • New compact encoding scheme to load the abundance values in memory (encoded on 8 bits, value range = 0 to 50k with 5% max error)

  • Parameterizable graph simplification steps (see Graph.hpp and Minia): optional tip-clipping, bulge removal and erroneous connection removal

  • Preliminary support for loading unitigs (in GraphUnitigs.cpp) from a GFA1 graph format generated by BCALM (using scripts/ in BCALM repository)

  • Adding new ways to compile, making compilation easier :

    • Added a simple makefile to compile a GATB tool without CMake (see examples/Makefile)

    • Added support for Docker. Using docker/Dockerfile one can build a docker image containing GATB-core.

    • Two new ways to compile the example code snippets:

      • cmake -DGATB_CORE_INCLUDE_EXAMPLES=True ..
      • cd examples ; make [folder]/[examplename.cpp] (for instance, "make kmer/kmer2" will compile kmer2.cpp)
  • Various bugfixes
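
To give an idea of how abundance values up to 50k can fit in 8 bits with a bounded error, here is a minimal sketch. It is not the encoding used in GATB-Core: the function names (abundanceTable, encodeAbundance, decodeAbundance) and the step rule max(1, t/20) are assumptions chosen for illustration. Each code maps to a representative value, and consecutive representatives grow by at most 5%, so rounding an abundance down to its representative loses less than 5%.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of GATB-Core: builds 256 increasing
// representative values. Consecutive values differ by max(1, t/20), so
// small abundances are stored exactly and larger ones within 5%.
inline std::array<std::uint32_t, 256> buildAbundanceTable()
{
    std::array<std::uint32_t, 256> table{};
    std::uint32_t value = 0;
    for (auto& slot : table) {
        slot = value;
        value += std::max<std::uint32_t>(1, value / 20);
    }
    return table;
}

inline const std::array<std::uint32_t, 256>& abundanceTable()
{
    static const std::array<std::uint32_t, 256> table = buildAbundanceTable();
    return table;
}

// Encode: index of the largest representative <= abundance.
// Abundances beyond the last representative saturate at code 255.
inline std::uint8_t encodeAbundance(std::uint32_t abundance)
{
    const auto& table = abundanceTable();
    auto it = std::upper_bound(table.begin(), table.end(), abundance);
    assert(it != table.begin()); // table[0] == 0, so one entry always fits
    return static_cast<std::uint8_t>((it - table.begin()) - 1);
}

// Decode: representative value stored for this code.
inline std::uint32_t decodeAbundance(std::uint8_t code)
{
    return abundanceTable()[code];
}
```

With this layout, decodeAbundance(encodeAbundance(v)) never exceeds v, and the one-byte code replaces a 32-bit counter per k-mer.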



  • A new graph object is introduced: GraphUnitigs, optimized to traverse unitigs but not to query individual kmers.
  • A few graph API functions changed.
  • Updated MPHF and HDF5.
  • This release now requires a C++11-compatible compiler.


  • Tech notice

    • Compiling the GATB-Core library now requires a C++11-capable compiler.

    • CMake 3.1.0 is the minimum release of CMake required to compile GATB-Core.

    • HDF5 library (used for data storage) upgraded to release 1.8.18

    • Parameters "-mphf none", "-mphf emphf" and "-mphf boophf" and variable WITH_MPHF are deprecated. Please remove them from your applications (e.g. in Graph::create()). BooPHF is now the default MPHF object and it is always compiled. Emphf has been removed from the library.

    • Debug compilation is now done using the standard CMake rule "-DCMAKE_BUILD_TYPE=Debug", instead of "-Ddebug=1".

  • API changes

    • Developers, please pay attention to these breaking changes:

      • Graph::Vector is now GraphVector
      • Graph::Iterator is now GraphIterator
      • Graph::create() no longer accepts '-mphf ...' (see Tech notice, above)
  • New features

    • New GraphUnitigs class that offers a de Bruijn graph representation based on unitigs (created by BCALM2) loaded in memory. It has the same API as the Graph class, although some functions are not implemented, because accessing a node that is not an extremity of a unitig is not supported in this representation. The representation is designed to traverse unitigs quickly, skipping over all non-branching nodes. It uses neither the Bloom filter nor the MPHF. To use this representation, have a look at Minia's code.

    • New functions to traverse the graph have been added. See simplePath* in Graph.hpp. These functions are mostly designed to take advantage of GraphUnitigs, and they have the same API in Graph too. They will also replace the Traversal class. Partial compatibility with the original Graph class has been implemented so far.

    • BooPHF is now the default MPHF object used by GATB-Core

    • In addition to HDF5, we introduce experimental support for a raw file format. It was made for two reasons: to avoid potential memory leaks due to HDF5 (unclear at this point), and to avoid HDF5 file corruption (when a job is interrupted after kmer counting, the h5 file containing the kmer counts sometimes cannot be re-opened). The format is experimental, so use it at your own risk. It holds basically the same content as the previous HDF5 format, but each dataset is stored in its own file. Also, JSON is used instead of XML for structured configuration. To enable this format, pass "-storage-type file" in your configuration string (e.g. Graph::create()).
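
The idea of traversing a simple path, i.e. skipping over non-branching nodes, can be illustrated with a toy de Bruijn graph. The sketch below does not use the GATB-Core API: KmerSet, neighbors and simplePathForward are hypothetical names, k-mers are plain strings, and reverse complements are ignored for brevity. It walks forward from a node as long as the current node has exactly one successor and that successor has exactly one predecessor.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

using KmerSet = std::unordered_set<std::string>;

// Neighbors of `kmer` present in the set: successors when forward is true,
// predecessors otherwise. Toy model: reverse complements are not considered.
inline std::vector<std::string> neighbors(const KmerSet& graph,
                                          const std::string& kmer,
                                          bool forward)
{
    const std::string alphabet = "ACGT";
    std::vector<std::string> result;
    for (char c : alphabet) {
        std::string candidate = forward
            ? kmer.substr(1) + c
            : c + kmer.substr(0, kmer.size() - 1);
        if (graph.count(candidate)) result.push_back(candidate);
    }
    return result;
}

// Walks forward from `start` through non-branching nodes and returns the
// sequence spelled by the path (the start k-mer plus one letter per step).
inline std::string simplePathForward(const KmerSet& graph,
                                     const std::string& start)
{
    assert(graph.count(start) == 1);
    std::string sequence = start;
    std::string current = start;
    while (true) {
        std::vector<std::string> out = neighbors(graph, current, true);
        if (out.size() != 1) break;                 // dead end or branching
        const std::string& next = out[0];
        if (neighbors(graph, next, false).size() != 1) break; // several parents
        if (next == start) break;                   // the path closed a cycle
        sequence += next.back();
        current = next;
    }
    return sequence;
}
```

For the k-mer set {AAC, ACG, CGT, GTA, GTC}, starting at AAC yields AACGT: the walk stops at CGT because it has two successors. A unitig-based representation stores such paths once, which is why only unitig extremities need to be addressable.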


This is a bug-fix release:

  • fixed a compilation issue with old versions of the clang compiler (prior to clang 4.3 on Mac). This GATB-Core release is the last one to officially support clang versions older than 4.3 on Mac and 3.2 on Linux.


  • bug fixes when MPHF is queried on a false positive node.

  • bug fixes that caused "Pool allocation failed" on some large instances.

  • fixed some compilation issues related to the clang version (version numbers are inconsistent between Mac and Linux).

  • fixed include problem in binary distribution that caused undue dependency on boost.


  • Assembly-inspired de Bruijn graph simplifications can be performed using a single command.

Here is an example:

// removes tips, bubbles and erroneous connections, 
// similar to some of the algorithms implemented in the SPAdes assembler
  • Faster graph traversal can be activated using a single command.

Here is an example:

// allocates 1 byte/node to precompute adjacency for each node 
// in the MPHF. 
// Faster graph traversal (especially using neighbors()).
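
Why one byte per node is enough: a node of a DNA de Bruijn graph has at most four outgoing and four incoming edges (one per nucleotide), so eight flags describe its whole neighborhood. The sketch below illustrates this with a hypothetical bit layout and hypothetical names (Direction, addEdge, hasEdge, degree); the layout actually used by GATB-Core may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, for illustration only: bits 0-3 flag outgoing edges
// labelled A, C, G, T; bits 4-7 flag incoming edges in the same order.
enum Direction { OUTGOING = 0, INCOMING = 1 };

inline unsigned nucleotideIndex(char c)
{
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
    }
    assert(false && "expected A, C, G or T");
    return 0;
}

// Returns the adjacency byte with the given edge flagged as present.
inline std::uint8_t addEdge(std::uint8_t adjacency, Direction dir, char nucleotide)
{
    unsigned bit = 4 * dir + nucleotideIndex(nucleotide);
    return static_cast<std::uint8_t>(adjacency | (1u << bit));
}

// Tests a single edge with one shift and one mask, no graph query needed.
inline bool hasEdge(std::uint8_t adjacency, Direction dir, char nucleotide)
{
    unsigned bit = 4 * dir + nucleotideIndex(nucleotide);
    return ((adjacency >> bit) & 1u) != 0;
}

// Degree in one direction = number of set bits in the matching nibble.
inline unsigned degree(std::uint8_t adjacency, Direction dir)
{
    unsigned nibble = (adjacency >> (4 * dir)) & 0xFu;
    unsigned count = 0;
    for (; nibble != 0; nibble &= nibble - 1) ++count;
    return count;
}
```

Once such a byte is stored for each node (indexed through the MPHF), listing neighbors becomes a lookup instead of up to eight membership queries.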
  • Breaking API changes

Major changes in API are:

  neighbors<Node>(..)           becomes  neighbors(..)
  neighbors<Edge>(..)           becomes  neighborsEdge(..)
  iterator<Node>(..)            becomes  iterator(..)
  iterator<BranchingNode>(..)   becomes  iteratorBranching(..)
  node.kmer.get<Type>()         becomes  node.template getKmer<Type>()
  successors<Node>(..)          becomes  successors(..)
  const Node&                   becomes  Node&
       (as MPHF indices are now cached in Node objects)

  etc. for all functions of the type:
  - xxx<Node>,
  - xxx<Edge>,
  - xxx<BranchingNode>,
  - xxx<BranchingEdge>.
  • The basic kmer type (Kmer<>::Type) no longer has a constructor. Use [kmer].setVal(0) to set the value of the variable [kmer] to zero.

For instance, the following code:

optimum = Kmer<span>::Type(0)

becomes:

optimum.setVal(0)


  • Graph is now a templated object (GraphTemplate<Node_t, Edge_t, GraphDataVariant_t>) behind the scenes. However this change is transparent to users of previous versions of GATB-core, as compatibility with the Graph class is preserved.

  • New implementation for the minimal perfect hash function (switched from emphf to BooPHF)

  • Non-canonical k-mer counting is supported via "cmake -DNONCANONICAL=1"

  • bug fixes in how queries with dir=DIR_INCOMING are handled.

  • various minor bug fixes
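
As a reminder of what canonical versus non-canonical counting means, here is a small self-contained sketch. It does not use GATB-Core: reverseComplement, canonical and countKmers are hypothetical helpers working on plain strings. In canonical mode a k-mer and its reverse complement are counted as the same k-mer (the lexicographically smaller of the two is used as the key); in non-canonical mode each k-mer is counted exactly as it appears in the read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

inline char complementBase(char c)
{
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
    }
    assert(false && "expected A, C, G or T");
    return 'N';
}

inline std::string reverseComplement(const std::string& kmer)
{
    std::string rc(kmer.rbegin(), kmer.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complementBase);
    return rc;
}

// Canonical form: the smaller of the k-mer and its reverse complement.
inline std::string canonical(const std::string& kmer)
{
    return std::min(kmer, reverseComplement(kmer));
}

// Counts the k-mers of one read, merging strands only when asked to.
inline std::map<std::string, unsigned> countKmers(const std::string& read,
                                                  std::size_t k,
                                                  bool useCanonical)
{
    std::map<std::string, unsigned> counts;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        std::string kmer = read.substr(i, k);
        ++counts[useCanonical ? canonical(kmer) : kmer];
    }
    return counts;
}
```

On the read AACGTT with k = 3, canonical counting reports two distinct k-mers (AAC and ACG, each seen twice), whereas non-canonical counting reports four (AAC, ACG, CGT, GTT, each seen once).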


  • Re-design to support a variable number of kmer sizes: one can now use the cmake variable KSIZE_LIST, for instance: cmake -DKSIZE_LIST="32 64 96" ..

  • Allows "auto" value for the -abundance-min parameter


  • Re-design of the SortingCountAlgorithm with the introduction of the ICountProcessor interface, which should allow the development of new tools based on kmer counting


  • Fixed a memory alignment issue that occurred on macOS in some cases

  • Re-introduced multi-pass management in DSK

  • Fixed the configuration of the number of passes for some bank inputs

  • Temporary files now have unique names, so dbgh5 can be launched several times in the same working directory


  • Correction of scripts for new project creation and delivery process


  • 2x to 3x speed-up of the kmer counting and graph construction phases (optimizations based on minimizers and improved Bloom filters). GATB's k-mer counter has been improved using techniques from KMC2, and its running times are now competitive with KMC2.

  • Ability to store arbitrary information associated with each kmer of the graph, enabled by a minimal perfect hash function (costs only 2.61 bits/kmer of memory)

  • Improved API with new possibilities (banks and kmers management)

  • Many new snippets showing how to use the library.


Modifications of Kmer::Model class for kmers management

  • better implementation (factorization, optimization)
  • introduction of minimizers concept

WARNING ! These modifications introduced small API changes. Please read snippets kmer2 and kmer5 to see how to handle kmers now.
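
For readers new to minimizers, here is a minimal sketch of the concept. It is independent of Kmer::Model and of the snippets mentioned above: the function name minimizer and the use of plain lexicographic order are assumptions for illustration, and the ordering used inside GATB-Core may differ. The minimizer of a k-mer is its smallest m-mer (m < k) under some ordering; overlapping k-mers tend to share the same minimizer, which makes it a convenient key for grouping k-mers into partitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the lexicographically smallest substring of length m in `kmer`.
// Hypothetical helper: real implementations usually work on 2-bit encoded
// k-mers and may use a different ordering of m-mers.
inline std::string minimizer(const std::string& kmer, std::size_t m)
{
    assert(m >= 1 && m <= kmer.size());
    std::string best = kmer.substr(0, m);
    for (std::size_t i = 1; i + m <= kmer.size(); ++i) {
        // compare(pos, len, str) compares kmer[pos, pos+len) with str
        if (kmer.compare(i, m, best) < 0) {
            best = kmer.substr(i, m);
        }
    }
    return best;
}
```

For example, with m = 3 the two overlapping 7-mers GATTACA and TTACAGG (both taken from GATTACAGG) have the same minimizer, ACA, so they would fall into the same partition.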