mlpack is an intuitive, fast, and flexible header-only C++ machine learning library with bindings to other languages. It aims to provide fast, lightweight implementations of both common and cutting-edge machine learning algorithms.
mlpack's lightweight C++ implementation makes it ideal for deployment, and it can also be used for interactive prototyping via C++ notebooks (these can be seen in action on mlpack's homepage).
In addition to its powerful C++ interface, mlpack provides command-line programs and bindings to the Python, R, Julia, and Go languages.
If you use mlpack, please cite the software.
mlpack can be installed by following the instructions in the README or the Windows build guide. The following basic guides are highly recommended before using mlpack.
- First steps:
  - mlpack C++ quickstart: create a couple simple C++ programs that use mlpack
  - Sample Windows mlpack C++ application: create a working mlpack Windows program using Visual Studio
- Basics of matrices and data in mlpack:
- Reference for mlpack core classes:
- Using mlpack natively with our extensions in Python, R, CLI, Julia, and Go:
Documentation for each machine learning algorithm that mlpack implements is detailed in the sections below.
- Classification algorithms: classify points as discrete labels (`0`, `1`, `2`, ...).
- Regression algorithms: predict continuous values.
- Clustering algorithms: group points into clusters.
- Geometric algorithms: computations based on distance metrics (nearest neighbors, kernel density estimation, etc.).
- Preprocessing utilities: prepare data for machine learning algorithms.
- Transformations: transform data from one space to another (principal components analysis, etc.).
- Modeling utilities: cross-validation, hyperparameter tuning, etc.
Classify points as discrete labels (`0`, `1`, `2`, ...).
- `AdaBoost`: Adaptive Boosting
- `DecisionTree`: ID3-style decision tree classifier
- `HoeffdingTree`: streaming/incremental decision tree classifier
- `LinearSVM`: simple linear support vector machine classifier
- `LogisticRegression`: L2-regularized logistic regression (two-class only)
- `NaiveBayesClassifier`: simple multi-class naive Bayes classifier
- `Perceptron`: simple Perceptron classifier
- `RandomForest`: parallelized random forest classifier
- `SoftmaxRegression`: L2-regularized softmax regression (i.e. multi-class logistic regression)
Predict continuous values.
- `BayesianLinearRegression`: Bayesian L2-penalized linear regression
- `DecisionTreeRegressor`: ID3-style decision tree regressor
- `LARS`: Least Angle Regression (LARS), L1-regularized and L2-regularized
- `LinearRegression`: L2-regularized linear regression (ridge regression)
Group points into clusters.
Computations based on distance metrics.
Prepare data for machine learning algorithms.
Transform data from one space to another.
- `LocalCoordinateCoding`: local coordinate coding with dictionary learning
- `NMF`: non-negative matrix factorization
- `PCA`: principal components analysis
- `SparseCoding`: sparse coding with dictionary learning
Tools for assembling a full data science pipeline.
- Cross-validation: k-fold cross-validation tools for any mlpack algorithm
- Hyperparameter tuning: generic hyperparameter tuner to find good hyperparameters for any mlpack algorithm
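To make concrete what a k-fold split computes, here is a minimal, library-independent sketch in plain C++. It does not use mlpack's cross-validation tools; the helper name `KFoldRanges` and its signature are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of mlpack): for each of the k folds, returns
// the half-open range [begin, end) of point indices held out for validation.
// All remaining indices form that fold's training set.
std::vector<std::pair<std::size_t, std::size_t>>
KFoldRanges(const std::size_t numPoints, const std::size_t k)
{
  assert(k >= 2 && k <= numPoints);

  std::vector<std::pair<std::size_t, std::size_t>> folds;
  folds.reserve(k);

  const std::size_t base = numPoints / k;   // Minimum fold size.
  const std::size_t extra = numPoints % k;  // First `extra` folds get one more.

  std::size_t begin = 0;
  for (std::size_t i = 0; i < k; ++i)
  {
    const std::size_t size = base + (i < extra ? 1 : 0);
    folds.emplace_back(begin, begin + size);
    begin += size;
  }

  return folds;
}
```

A model would then be trained k times, each time holding out one range for evaluation and averaging the resulting scores. For example, `KFoldRanges(10, 3)` yields the ranges `[0, 4)`, `[4, 7)`, and `[7, 10)`.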
mlpack's bindings to other languages have less complete functionality than mlpack in C++, but almost all the same algorithms are available.
| Python | -- | quickstart | -- | reference |
| Julia | -- | quickstart | -- | reference |
| R | -- | quickstart | -- | reference |
| Command-line programs | -- | quickstart | -- | reference |
| Go | -- | quickstart | -- | reference |
- mlpack examples repository: numerous fully-working example applications of mlpack, in C++ and other languages.
- mlpack models repository: complex models in C++ built with mlpack
For additional documentation beyond the resources above, consult the source code, where each method is fully documented.
The following general documentation can be useful if you are interested in contributing to mlpack:
Throughout the codebase, mlpack uses some common template parameter policies. These are documented below.
- The `ElemType` policy: element types for data
- The `MetricType` policy: distance metrics
- The `KernelType` policy: kernel functions
- The `TreeType` policy: space trees (ball trees, KD-trees, etc.)
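As a rough illustration of what a template parameter policy looks like, the sketch below parameterizes a brute-force nearest neighbor search on a metric class. It uses only the standard library. The `Point` alias, the two metric structs, the `NearestNeighbor` function, and the assumption that a metric exposes a static `Evaluate()` function are all illustrative choices here, not a statement of mlpack's exact interface; the policy documentation listed above is the authority on the real requirements.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Two illustrative metric policies. Each exposes a static Evaluate() function;
// this interface is an assumption made for the sketch.
struct EuclideanMetric
{
  static double Evaluate(const Point& a, const Point& b)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
  }
};

struct ManhattanMetric
{
  static double Evaluate(const Point& a, const Point& b)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += std::abs(a[i] - b[i]);
    return sum;
  }
};

// Brute-force nearest neighbor search, parameterized on the metric policy.
// Returns the index of the reference point closest to `query`.
// Assumes `references` is non-empty and all points share one dimensionality.
template<typename MetricType = EuclideanMetric>
std::size_t NearestNeighbor(const std::vector<Point>& references,
                            const Point& query)
{
  std::size_t best = 0;
  double bestDistance = std::numeric_limits<double>::max();
  for (std::size_t i = 0; i < references.size(); ++i)
  {
    const double d = MetricType::Evaluate(references[i], query);
    if (d < bestDistance)
    {
      bestDistance = d;
      best = i;
    }
  }
  return best;
}
```

Calling `NearestNeighbor<ManhattanMetric>(...)` instead of `NearestNeighbor<EuclideanMetric>(...)` changes the distance computation without touching the search loop, which is the purpose of a policy parameter.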
In addition, the following documentation may be useful when developing bindings for other languages:
- Timers: timing parts of bindings
- Writing an mlpack binding: simple examples of mlpack bindings
- Automatic bindings: details on mlpack's automatic binding generator system.
For a list of changes in each version of mlpack, see the changelog.