<div align="center">
    <img src="https://alan-turing-institute.github.io/MLJTutorials/assets/infra/MLJLogo2.svg" alt="MLJ" width="200">
</div>

<h2 align="center">A Machine Learning Toolbox for Julia.
<p align="center">
  <a href="https://travis-ci.com/alan-turing-institute/MLJ.jl">
    <img src="https://travis-ci.com/alan-turing-institute/MLJ.jl.svg?branch=master" alt="Build Status">
  </a>
  <a href="https://coveralls.io/github/alan-turing-institute/MLJ.jl?branch=master">
    <img src="https://coveralls.io/repos/github/alan-turing-institute/MLJ.jl/badge.svg?branch=master" alt="Coverage">
  </a>
  <a href="https://slackinvite.julialang.org/">
    <img src="https://img.shields.io/badge/chat-on%20slack-yellow.svg" alt="#mlj">
  </a>
  <a href="https://alan-turing-institute.github.io/MLJ.jl/stable/">
    <img src="https://img.shields.io/badge/docs-stable-blue.svg" alt="Documentation">
  </a>
</p>
</h2>

MLJ is a machine learning framework for Julia aiming to provide a convenient way to use and combine a multitude of tools and models available in the Julia ML/Stats ecosystem.
MLJ is released under the MIT license and sponsored by the [Alan Turing Institute](https://www.turing.ac.uk/).

<br>
<p align="center">
  <a href="#using-mlj">Using MLJ</a> •
  <a href="#the-mlj-universe">MLJ Universe</a> •
  <a href="#contributing-to-mlj">Contributing</a> •
  <a href="#models-available">Available Models</a> •
  <a href="https://github.com/alan-turing-institute/MLJ.jl/blob/master/docs/src/mlj_cheatsheet.md">MLJ Cheatsheet</a> •
  <a href="#citing-mlj">Citing MLJ</a>
</p>

### Key goals

* Offer a consistent way to use, compose and tune machine learning models in Julia,
* Promote the improvement of the Julia ML/Stats ecosystem by making it easier to use models from a wide range of packages,
* Unlock performance gains by exploiting Julia's support for parallelism, automatic differentiation, GPUs, optimisation, etc.

### Key features

* Data agnostic: train models on any data supported by the [Tables.jl](https://github.com/JuliaData/Tables.jl) interface,
* Extensive support for model composition (*pipelines* and *learning networks*),
* Convenient syntax to tune and evaluate (composite) models,
* Consistent interface to handle probabilistic predictions.
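The tuning and evaluation syntax can be sketched as follows. This is a minimal sketch, not part of the original README: it assumes MLJ and DecisionTree.jl are installed, and keyword names (e.g. `ranges`) have varied across MLJ versions:

```julia
using MLJ

# Load a built-in dataset and a model.
X, y = @load_iris
@load DecisionTreeClassifier
tree = DecisionTreeClassifier()

# Evaluate with 6-fold cross-validation and a probabilistic measure.
evaluate(tree, X, y, resampling=CV(nfolds=6), measure=cross_entropy)

# Tuning is a model wrapper: the tuned model composes like any other model.
r = range(tree, :max_depth, lower=1, upper=5)
tuned_tree = TunedModel(model=tree, tuning=Grid(), resampling=CV(nfolds=6),
                        ranges=r, measure=cross_entropy)
```

Because `TunedModel` returns an ordinary model, it can itself be evaluated, composed or wrapped again, which is what "tuning implemented as a wrapper" means in practice.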
---

### Using MLJ

It is a good idea to use a [separate environment](https://julialang.github.io/Pkg.jl/v1/environments/) for MLJ in order to avoid version clashes with other packages you may be using.
You can do so with

```julia
julia> using Pkg; Pkg.activate("My_MLJ_env", shared=true)
```

Installing MLJ is also done with the package manager:

```julia
julia> Pkg.add(["MLJ", "MLJModels"])
```

It is important to note that MLJ is essentially a big wrapper providing unified access to _model-providing packages_, so you will also need to make sure these packages are available in your environment.
For instance, if you want to use a **Decision Tree Classifier**, you need to have [DecisionTree.jl](https://github.com/bensadeghi/DecisionTree.jl) installed:

```julia
julia> Pkg.add("DecisionTree");
julia> using MLJ;
julia> @load DecisionTreeClassifier
```

For a list of models and their packages see the [table below](#models-available), or run

```julia
using MLJ
models()
```

We recommend you start with models marked as coming from _mature_ packages such as _DecisionTree_, _ScikitLearn_ or _XGBoost_.
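Once loaded, a model is bound to data via a *machine*, which is then trained and used for prediction. The following minimal sketch is not from the original README; it assumes DecisionTree.jl is in the environment and uses the iris dataset bundled with MLJ:

```julia
using MLJ

X, y = @load_iris                   # a table of features and a categorical target
@load DecisionTreeClassifier        # requires DecisionTree.jl in the environment
tree = DecisionTreeClassifier(max_depth=3)

mach = machine(tree, X, y)          # bind the model to the data
fit!(mach, rows=1:120)              # train on a subset of rows
yhat = predict(mach, rows=121:150)  # probabilistic predictions
predict_mode(mach, rows=121:150)    # point predictions
```

Because `DecisionTreeClassifier` is probabilistic, `predict` returns a vector of distributions; `predict_mode` collapses these to class labels.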

#### Tutorials

The best place to get started with MLJ is to go to the [MLJ Tutorials](https://alan-turing-institute.github.io/MLJTutorials/) website.
Each tutorial can be downloaded as a notebook or Julia script to facilitate experimentation with the packages.

You're also welcome to join the `#mlj` Julia slack channel to ask questions and make suggestions.

---

### The MLJ Universe

The MLJ universe is made up of several repositories, some of which can be used independently of MLJ (indicated with a ⟂ symbol):

* (⟂) [MLJBase.jl](https://github.com/alan-turing-institute/MLJBase.jl) offers essential tools to load and interpret data, describe ML models and use metrics; it is the repository you should interface with if you wish to make your package accessible via MLJ,
* [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) offers tools to compose, tune and evaluate models,
* [MLJModels.jl](https://github.com/alan-turing-institute/MLJModels.jl) contains interfaces to a number of important model-providing packages such as [DecisionTree.jl](https://github.com/bensadeghi/DecisionTree.jl), [ScikitLearn.jl](https://github.com/cstjean/ScikitLearn.jl) and [XGBoost.jl](https://github.com/dmlc/XGBoost.jl), as well as a few built-in transformations (one-hot encoding, standardisation, ...); it also hosts the *model registry*, which keeps track of all models accessible via MLJ,
* (⟂) [ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl), a lightweight package to help specify the *interpretation* of data beyond how the data is currently encoded,
* (⟂) [MLJLinearModels.jl](https://github.com/alan-turing-institute/MLJLinearModels.jl), an experimental package for a wide range of penalised linear models such as Lasso, Elastic-Net, Robust regression, LAD regression, etc.,
* [MLJFlux.jl](https://github.com/alan-turing-institute/MLJFlux.jl), an experimental package to use Flux within MLJ.

and maybe most importantly:

* [MLJTutorials](https://github.com/alan-turing-institute/MLJTutorials), which collects tutorials on how to use MLJ.

---

### Contributing to MLJ

MLJ is an ambitious project and we need all the help we can get!
There are multiple ways you can contribute; the table below indicates where you can help, along with a (subjective) indication of the Julia and ML expertise required.

Julia | ML  | What to do
----- | --- | ----------
=     | =   | use MLJ and give us feedback, help us write better tutorials, suggest missing features, test the less mature model packages
⭒     | =   | package to facilitate visualising results in MLJ
⭒     | ⭒   | add/improve data pre-processing tools
⭒     | ⭒   | add/improve interfaces to other model-providing packages
⭒     | ⭒   | functionalities for time series
⭒     | ⭒   | functionalities for systematic benchmarking of models
⭒     | ⭒   | functionalities for natural language processing (NLP)
⭒⭒    | =   | decrease the overhead incurred by MLJ
⭒⭒    | =   | improve support for sparse data
⭒⭒    | ⭒   | add parallelism and/or multithreading to MLJ (*there is an ongoing effort to interface with [Dagger.jl](https://github.com/JuliaParallel/Dagger.jl)*)
⭒     | ⭒⭒  | add an interface with probabilistic programming packages (*there is an ongoing effort to interface with [Soss.jl](https://github.com/cscherrer/Soss.jl)*)
⭒⭒    | ⭒⭒  | more sophisticated HP tuning (BO, bandit, early stopping, ...), possibly as part of an external package, possibly integrating with Julia's optimisation and autodiff packages

If you're interested in one of these beyond the first one, please get in touch with either Anthony Blaom or Thibaut Lienart on Slack and we can guide you further.
Thank you!

You can also have a look at MLJ's [release notes](https://github.com/alan-turing-institute/MLJ.jl/releases) to get an idea of what's been happening recently.

---

### Models available

There is a wide range of models accessible via MLJ.
We are always looking for contributors to add new models or help us test existing ones.
The table below indicates the models that are accessible at present, along with a subjective indication of how mature the underlying package is:

* *experimental*: the package is fairly new and/or under active development; you can help by testing these packages and making them more robust,
* *medium*: the package is fairly mature but may benefit from optimisations and/or extra features; you can help by suggesting either,
* *high*: the package is very mature and its functionalities are expected to be fairly optimised and tested.

| Package | Models | Maturity | Note |
| ------- | ------ | -------- | ---- |
| [Clustering.jl] | KMeans, KMedoids | high | † |
| [DecisionTree.jl] | DecisionTreeClassifier, DecisionTreeRegressor | high | † |
| [GLM.jl] | LinearRegressor, LinearBinaryClassifier, LinearCountRegressor | medium | † |
| [LIBSVM.jl] | LinearSVC, SVC, NuSVC, NuSVR, EpsilonSVR, OneClassSVM | high | also via ScikitLearn.jl |
| [MLJModels.jl] (builtins) | StaticTransformer, FeatureSelector, FillImputer, UnivariateStandardizer, Standardizer, UnivariateBoxCoxTransformer, OneHotEncoder, ConstantRegressor, ConstantClassifier | medium | |
| [MLJLinearModels.jl] | LinearRegressor, RidgeRegressor, LassoRegressor, ElasticNetRegressor, QuantileRegressor, HuberRegressor, RobustRegressor, LADRegressor, LogisticClassifier, MultinomialClassifier | experimental | |
| [MultivariateStats.jl] | RidgeRegressor, PCA, KernelPCA, ICA, LDA, BayesianLDA, SubspaceLDA, BayesianSubspaceLDA | high | † |
| [NaiveBayes.jl] | GaussianNBClassifier, MultinomialNBClassifier, HybridNBClassifier | low | |
| [NearestNeighbors.jl] | KNNClassifier, KNNRegressor | high | |
| [ScikitLearn.jl] | SVMClassifier, SVMRegressor, SVMNuClassifier, SVMNuRegressor, SVMLClassifier, SVMLRegressor, ARDRegressor, BayesianRidgeRegressor, ElasticNetRegressor, ElasticNetCVRegressor, HuberRegressor, LarsRegressor, LarsCVRegressor, LassoRegressor, LassoCVRegressor, LassoLarsRegressor, LassoLarsCVRegressor, LassoLarsICRegressor, LinearRegressor, OrthogonalMatchingPursuitRegressor, OrthogonalMatchingPursuitCVRegressor, PassiveAggressiveRegressor, RidgeRegressor, RidgeCVRegressor, SGDRegressor, TheilSenRegressor, LogisticClassifier, LogisticCVClassifier, PerceptronClassifier, RidgeClassifier, RidgeCVClassifier, PassiveAggressiveClassifier, SGDClassifier, GaussianProcessRegressor, GaussianProcessClassifier, AdaBoostRegressor, AdaBoostClassifier, BaggingRegressor, BaggingClassifier, GradientBoostingRegressor, GradientBoostingClassifier, RandomForestRegressor, RandomForestClassifier, GaussianNB, MultinomialNB, ComplementNB, BayesianLDA, BayesianQDA | high | † |
| [XGBoost.jl] | XGBoostRegressor, XGBoostClassifier, XGBoostCount | high | |

**Note** (†): some models are missing; your help is welcome to complete the interface. Get in touch with Thibaut Lienart on Slack if you would like to help, thanks!

[Clustering.jl]: https://github.com/JuliaStats/Clustering.jl
[DecisionTree.jl]: https://github.com/bensadeghi/DecisionTree.jl
[GaussianProcesses.jl]: https://github.com/STOR-i/GaussianProcesses.jl
[GLM.jl]: https://github.com/JuliaStats/GLM.jl
[LIBSVM.jl]: https://github.com/mpastell/LIBSVM.jl
[MLJLinearModels.jl]: https://github.com/alan-turing-institute/MLJLinearModels.jl
[MLJModels.jl]: https://github.com/alan-turing-institute/MLJModels.jl
[MultivariateStats.jl]: https://github.com/JuliaStats/MultivariateStats.jl
[NaiveBayes.jl]: https://github.com/dfdx/NaiveBayes.jl
[NearestNeighbors.jl]: https://github.com/KristofferC/NearestNeighbors.jl
[ScikitLearn.jl]: https://github.com/cstjean/ScikitLearn.jl
[XGBoost.jl]: https://github.com/dmlc/XGBoost.jl

---

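The registry backing this table can also be queried from the REPL. A brief sketch, not from the original README; the `matching` filter and `info` lookup are assumed available in recent MLJ versions:

```julia
using MLJ

X, y = @load_iris
models(matching(X, y))          # models compatible with this data
info("DecisionTreeClassifier")  # metadata for a specific registered model
```

Querying by `matching(X, y)` filters on scientific types, so the result depends on how your data is interpreted, not just on how it is stored.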
### Citing MLJ

<a href="https://doi.org/10.5281/zenodo.3541506">
  <img src="https://zenodo.org/badge/DOI/10.5281/zenodo.3541506.svg" alt="Cite MLJ">
</a>

```bibtex
@software{anthony_blaom_2019_3541506,
  author       = {Anthony Blaom and
                  Franz Kiraly and
                  Thibaut Lienart and
                  Sebastian Vollmer},
  title        = {alan-turing-institute/MLJ.jl: v0.5.3},
  month        = nov,
  year         = 2019,
  publisher    = {Zenodo},
  version      = {v0.5.3},
  doi          = {10.5281/zenodo.3541506},
  url          = {https://doi.org/10.5281/zenodo.3541506}
}
```

#### Contributors

*Core design*: A. Blaom, F. Kiraly, S. Vollmer

*Active maintainers*: A. Blaom, T. Lienart

*Active collaborators*: D. Arenas, D. Buchaca, J. Hoffimann, S. Okon, J. Samaroo, S. Vollmer

*Past collaborators*: D. Aluthge, E. Barp, G. Bohner, M. K. Borregaard, V. Churavy, H. Devereux, M. Giordano, M. Innes, F. Kiraly, M. Nook, Z. Nugent, P. Oleśkiewicz, A. Shridar, Y. Simillides, A. Sengupta, A. Stechemesser.

#### License

MLJ is supported by the Alan Turing Institute and released under the MIT "Expat" License.