The Shark Machine Leaning Library. See more:
Clone or download


Shark is a fast, modular, general open-source C++ machine
learning library. 

Shark is licensed under the GNU Lesser General Public License, please
see the files COPYING and COPYING.LESSER, or visit .

Any application of the SHARK code toward military research and use is
expressly against the wishes of the SHARK development team.


The entry point to the Shark library documentation is located at
doc/index.html . For installation instructions, please click on
"Getting started" on that page. Short version of installation guide:
issue "ccmake ." in the main directory to select your build options,
and afterwards issue "make" in the main directory -- you should be
done (assuming Boost and CMake were installed). See the documentation
for detailed instructions.

BUILDING THE DOCUMENTATION: To build the documentation yourself (e.g.,
if you need to read it locally in order to install it, i.e., because
you don't have internet), see doc/README.txt


README.txt          This file (residing in the root directory of
                    the Shark library).

CMakeLists.txt      Definitions for the CMake build system.

include/            This directory and its sub-directories hold
                    all include files of the library. Note that
                    some functionality is implemented in lower-
                    level Impl/ folders and inline .inl files.

lib/                The Shark library is placed in this directory.
                    In the source code distribution this directory
                    is initially empty, and the library is placed
                    into the directory as the results of
                    compilation. Binary distributions already
                    contain the library, pre-built in release mode.

doc/                All documentation files are found in this
                    sub-directory. In packaged versions of Shark
                    the html documentation is pre-built; the
                    repository provides the corresponding sources.
                    The documentation contains technical reference
                    documents for all classes and functions as well
                    as a collection of introductory and advanced

doc/index.html      Entry point to the Shark documentation.

examples/           The examples directory contains example
                    use-cases of the most important algorithms
                    implemented in Shark. Besides exemplifying
                    powerful learning algorithms, these programs
                    are intended as starting points for
                    experimentation with the library. The
                    executables corresponding to the C++ example
                    programs are found in examples/bin/.

Test/               Shark comes with a large collection of unit
                    tests, all of which reside inside the Test

bin/                The binaries of the Shark unit tests are placed
                    here. Once the CMake build system is set up
                    (with the "ccmake" command or equivalent) the
                    whole test suite can be executed with the
                    command "make test", issued in the Shark root

src/                Source files of the Shark library. Note that
                    from Shark version 3 onwards large parts of the
                    library are templated and therefore header-only.

gpl-3.0.txt         GNU general public license, version 3.

Depending of the type of Shark distribution (binary or source
package, or current repository snapshot) not all of theses files
and directories are present.


The organization of the include/ directory reflects the structure of
the Shark library. It consists of the following modules:


LinAlg              Data structures and algorithms for typical
                    linear algebra computations. For (dense and
                    sparse) vector and matrix classes Shark relies
                    on Boost uBLAS. Many higher level algorithms
                    (such as singular value decomposition) are
                    still implemented by the library itself.

Statistics          This component is new in Shark 3. It wraps the
                    capabilities of Boost accumulators, and it
                    provides tools that appear regularly in machine
                    learning, such as the Mann-Whitney U-test (also
                    known as the Wilcoxon rank-sum test).


Core                The core module is the central place for all
                    top-level interfaces. In addition it holds a
                    few infrastructure classes, such as exceptions.

Data                The data module hosts data containers that have
                    been specifically designed for the needs of
                    machine learning code. Also, data can be
                    imported and exported from and to different
                    standard machine learning data file formats.


Models              Models are adaptive systems, the architectures
                    on top of which (machine) learning happens.
                    Shark features a rich set of models, from simple
                    linear maps to (feed-forward and recurrent)
                    neural networks, support vector machines, and
                    different types of trees. Models can also be
                    concatenated with data format converters and
                    other models.

ObjectiveFunctions  This module collects different types of cost,
                    fitness, or objective functions for learning.
                    The bandwidth includes data-dependent error
                    functions based on simple loss functions,
                    cross-validation, area under the ROC curve, and
                    different objectives used for model selection.

Algorithms          All actual learning algorithms reside in this
                    module. There are two main groups of learning
                    algorithms, namely iterative optimizers and
                    more specialized model trainers. General
                    optimizers are organized into direct search
                    and gradient-based optimization. Specialized
                    algorithms for linear programming (a part of
                    GLPK, the GNU linear programming kit) and
                    quadratic programming for training of non-linear
                    support vector machines are included. Shark
                    also ships with algorithms for efficient
                    nearest neighbor search.

Unsupervised        This module contains the Shark implementation
                    of restricted Bolzmann machines (RBMs),
                    a recent experimental feature of Shark.

We use external libraries, images and code:

* Shark uses the `Doxygen documentation system
  Doxygen is available under the LGPL, as Shark.
* Shark also uses the `Sphinx documentation system:
  Sphinx is available under the 2-clause (simplified/Free) BSD license
* The Shark documentation links between Sphinx and Doxygen using Doxylink written
  by Matt Williams and released under a 2-clause (simplified/Free) BSD license
* The website header is derived from the Mollio, set
  of html/css templates. Mollio is licensed under the
  GPLv2 <
* The page icon in the local table of contents is one of Nicolas Gallagher's
  pure CSS GUI icons
  Nicolas Gallagher's work is licensed under the
  GNU GPLv2 <
* Download icon from the Steel System Icons set by Uriy1966,
  free for non-commercial use.