ROADMAP.md

File metadata and controls

151 lines (110 loc) · 6.09 KB

Road map

February 2020; updated January 2021

Please see the contributing guidelines if you are interested in contributing to MLJ.

Guiding goals

  • Usability, interoperability, extensibility, reproducibility, and code transparency.

  • Offer state-of-the-art tools for model composition and model optimization (hyper-parameter tuning).

  • Avoid common pain-points of other frameworks:

    • identifying all models that solve a given task

    • routine operations requiring a lot of code

    • passage from data source to algorithm-specific data format

    • probabilistic predictions: inconsistent representations, lack of options for performance evaluation

  • Add some focus to Julia machine learning software development more generally.

Priorities

Priorities are somewhat fluid, depending on funding offers and available talent. Rough priorities for the core development team at present are marked below. However, we are always keen to review external contributions in any area.

Future enhancements

The following road map is more big-picture; see also this list.

Adding models

  • Integrate deep learning using Flux.jl. Done, but the experience can be improved by finishing #139, and performance can be improved by implementing a data front-end once MLJBase #501 is merged.

  • Probabilistic programming: Turing.jl, Gen, Soss.jl (#157, discourse thread). Done, but experimental; requires extension of probabilistic scoring functions to "distributions" that can only be sampled.

  • Feature engineering (Python's featuretools?, recursive feature elimination?) #426, MLJModels #314

Enhancing core functionality

  • Iterative model control #139

  • Add more tuning strategies. HyperOpt.jl integration? Particular focus on random search (#37; done), Bayesian methods (starting with Gaussian-process methods, à la PyMC3), and a POC for AD-powered gradient descent. See here for the complete wish-list (#74, #38, #37). Add tuning strategies for non-Cartesian spaces of models (MLJTuning #18).
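    The random-search strategy already merged illustrates how tuning strategies plug in. The following is a hedged sketch, not part of the roadmap: it assumes MLJ's `TunedModel`/`RandomSearch` API, and `DecisionTreeClassifier` (via MLJDecisionTreeInterface) is used purely as a stand-in model.

    ```julia
    using MLJ

    # Load an atomic model type (assumes MLJDecisionTreeInterface is installed):
    Tree = @load DecisionTreeClassifier pkg=DecisionTree
    tree = Tree()

    # A one-dimensional hyper-parameter range to sample from:
    r = range(tree, :max_depth, lower=1, upper=10)

    # Wrap the model in a random-search tuning strategy; fitting the wrapped
    # model runs the search and retrains on the best candidate found:
    tuned_tree = TunedModel(model=tree,
                            tuning=RandomSearch(),
                            range=r,
                            resampling=CV(nfolds=5),
                            measure=log_loss,
                            n=25)

    X, y = @load_iris
    mach = machine(tuned_tree, X, y)
    fit!(mach)
    ```

    A Bayesian strategy would slot in the same way, by substituting a new `TuningStrategy` implementation for `RandomSearch()`.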

  • Systematic benchmarking, probably modeled on MLaut #69

  • Give EnsembleModel a more extensible API, extend beyond bagging (boosting, etc.), and migrate to a separate repository? #363

  • Enhance complex model composition: introduce a canned stacking model wrapper (POC); get rid of macros for creating pipelines; and possibly implement target transformations as a wrapper (MLJBase #594).
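    For concreteness, here is a hedged sketch of macro-free pipeline construction, assuming the non-macro `Pipeline` constructor (the intended replacement for the `@pipeline` macro) and a tree classifier as a stand-in component:

    ```julia
    using MLJ

    # Assumes MLJDecisionTreeInterface provides the classifier:
    Tree = @load DecisionTreeClassifier pkg=DecisionTree

    # A linear pipeline built without macros: coerce features to Continuous,
    # standardize them, then classify:
    pipe = Pipeline(ContinuousEncoder(), Standardizer(), Tree())

    X, y = @load_iris
    mach = machine(pipe, X, y)
    fit!(mach)
    ```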

Broadening scope

  • Spin off a stand-alone measures (loss functions) package (currently here). Introduce measures for multi-target models (MLJBase #502).

  • Add sparse data support and better support for NLP models; we could use NaiveBayes.jl as a POC (currently wrapped only for dense input), but the API needs finalizing first (#731).

  • POC implementation of time series classification models (#303, ScientificTypes #14).

  • POC for time series forecasting; probably needs MLJBase #502 first, and someone to finish the PR on time series cross-validation.

  • Add tools, or a separate repository, for visualization in MLJ. The only end-to-end visualization provided at present is for two-parameter model tuning (#85, closed; #416, #342).

  • Add more pre-processing tools and enhance MLJScientificTypes' autotype method.

Scalability

  • Roll out data front-ends for all models after MLJBase #501 is merged.

  • Online learning support and distributed data #60

  • DAG scheduling for learning network training #72 (multithreading first?)

  • Automated estimates of cpu/memory requirements #71

  • Add multithreading to tuning (MLJTuning #15). Done.
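The multithreaded tuning now merged is exposed through the `acceleration` option of `TunedModel`. A minimal hedged sketch, assuming `CPUThreads` (from ComputationalResources, re-exported by MLJ) and a stand-in tree model:

```julia
using MLJ

Tree = @load DecisionTreeClassifier pkg=DecisionTree
tree = Tree()
r = range(tree, :max_depth, lower=1, upper=10)

# Evaluate candidate models on separate threads:
tuned = TunedModel(model=tree,
                   tuning=Grid(resolution=10),
                   range=r,
                   acceleration=CPUThreads())
```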