A Python library for memory efficient parallelized ensemble learning.
ML-Ensemble combines the Scikit-learn estimator API with a neural-network style API to allow straight-forward multi-layer ensemble network architectures that can be used as any Scikit-learn estimator.
The core principle of ML-Ensemble is to maximize parallel processing at minimum memory consumption. ML-Ensemble uses memory mapping to remain memory neutral during parallel estimation.
For full documentation, see here.
Ensembles are built by adding layers to an instance object: layers in their
turn are comprised of a list of estimators. No matter how complex the
ensemble, to train it call the fit
method:
ensemble = Subsemble()
# First layer
ensemble.add(list_of_estimators)
# Second layer
ensemble.add(list_of_estimators)
# Final meta estimator
ensemble.add_meta(estimator)
# Train ensemble
ensemble.fit(X, y)
Training data is persisted to a memory cache that each sub-process has access to, allowing parallel processing to require no more memory than processing on a single thread. For more details, see here.
Expect 95-97% of training time to be spent fitting the base estimators - irrespective of data size. The time it takes to fit an ensemble depends therefore entirely on how fast the chosen base learners are, and how many CPU cores are available.
Moreover, ensemble classes that fit estimators on subsets scale more efficiently than the base learners when these do not scale linearly.
The core unit of an ensemble is a layer, much as a hidden unit in a neural network. Each layer contains an ensemble class specification and a mapping of preprocessing pipelines to base learners to be used during fitting.
The modular build of an ensemble allows great flexibility in architecture, both in terms of the depth of the ensemble (number of layers) and how each layer generates predictions.
ML-Ensemble offers the possibility to specify, for each layer, a set of preprocessing pipelines that maps to different (or the same) sets of estimators. One set of estimators might benefit from one type of preprocessing, while for a different set of estimators another preprocessing pipeline is desired. This can easily be achieved in ML-Ensemble:
preprocessing = {'pipeline-1': list_of_transformers_1,
'pipeline-2': list_of_transformers_2}
estimators = {'pipeline-1': list_of_estimators_1,
'pipeline-2': list_of_estimators_2}
ensemble.add(estimators, preprocessing)
ML Ensemble implements a dedicated diagnostics and model selection suite
for intuitive and speedy ensemble evaluation. In particular, the Evaluator
class allows users to evaluate several preprocessing pipelines and several
estimators in one go, possibly even evaluating different estimators on
different preprocessing pipelines.
Moreover, the EnsembleTransformer
allows user to build a transformer that
behaves as any ensemble, but that returns the predictions from the fit
call. Hence, the transformer can be used together with the Evaluator
to
pre-generate training folds for model selection of the meta estimator
(or further layers). For a full example, see here.
preprocessing_dict = {'pipeline-1': list_of_transformers_1,
'pipeline-2': list_of_transformers_2}
evaluator = Evaluator(scorer=score_func)
evaluator.fit(X, y, list_of_estimators, param_dists_dict, preprocessing_dict)
Output is ordered per transformation case and estimators, and can be given a
readable tabular view by passing the summary
attribute to a Pandas
DataFrame
, as in the example below.
fit_time_mean fit_time_std train_score_mean train_score_std test_score_mean test_score_std params
prep-1 est-1 0.001353 0.001316 0.957037 0.005543 0.960000 0.032660 {}
est-2 0.000447 0.000012 0.980000 0.004743 0.966667 0.033333 {'n_neighbors': 15}
prep-2 est-1 0.001000 0.000603 0.957037 0.005543 0.960000 0.032660 {}
est-2 0.000448 0.000036 0.965185 0.003395 0.960000 0.044222 {'n_neighbors': 8}
prep-3 est-1 0.000735 0.000248 0.791111 0.019821 0.780000 0.133500 {}
est-2 0.000462 0.000143 0.837037 0.014815 0.800000 0.126491 {'n_neighbors': 9}
Execute
pip install mlens
To ensure latest version is installed, fork the GitHub repository.
git clone https://flennerhag/mlens.git; cd mlens;
python install setup.py
For the latest in-development version, install the dev
branch of the
repository.
git clone https://flennerhag/mlens.git; cd mlens; git checkout dev;
python install setup.py
MIT License
Copyright (c) 2017 Sebastian Flennerhag
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.