February 2020
Please visit the contributing guidelines if you are interested in contributing to MLJ.
- Usability, interoperability, extensibility, reproducibility, and code transparency
- Offer state-of-the-art tools for model composition and model optimization (hyper-parameter tuning)
- Avoid common pain points of other frameworks:
  - identifying all models that solve a given task
  - routine operations requiring a lot of code
  - passage from data source to algorithm-specific data format
  - probabilistic predictions: inconsistent representations, lack of options for performance evaluation
- Add some focus to Julia machine learning software development more generally
Priorities are somewhat fluid, depending on funding offers and available talent. Rough priorities for the core development team at present are marked with † below. However, we are always keen to review external contributions in any area.
- † Probabilistic programming: Turing.jl, Gen, Soss.jl (POC) #157, discourse thread
- Feature engineering (Python featuretools?, recursive feature elimination?) #426
- † Iterative model control #139
- † Add more tuning strategies, in particular HyperOpt.jl integration, with a focus on random search, Bayesian methods, and AD-powered gradient descent. See here for the complete wish-list. #74 #38 #37
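New tuning strategies would plug into MLJ's existing `TunedModel` wrapper as alternative values of its `tuning` keyword, alongside the current grid search. A minimal sketch of that wrapper as it stands (assuming MLJ and DecisionTree.jl are installed; keyword names and `@load` behavior vary across MLJ versions):

```julia
using MLJ

# load a model instance (in MLJ ≥ 0.16, @load returns a type instead)
tree = @load DecisionTreeClassifier pkg=DecisionTree

# a one-dimensional hyper-parameter range to search over
r = range(tree, :max_depth, lower=2, upper=10)

# wrap the model in a tuning strategy; a new strategy (random search,
# Bayesian, ...) would replace Grid here
tuned_tree = TunedModel(model=tree,
                        tuning=Grid(resolution=9),
                        resampling=CV(nfolds=5),
                        range=r,
                        measure=cross_entropy)

# tuning happens as a side-effect of fitting the wrapped model
X, y = @load_iris
mach = machine(tuned_tree, X, y)
fit!(mach)
fitted_params(mach).best_model
```

The design point is that a tuned model is itself just a model, so any new search strategy composes with resampling, measures, and pipelines for free.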
- Give `EnsembleModel` a more extensible API, extend it beyond bagging (boosting, etc.), and migrate it to a separate repository? #363
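For reference, `EnsembleModel` currently wraps a single "atomic" model into a homogeneous bagged ensemble; the roadmap item above is about generalizing this beyond bagging. A sketch of the current usage (field names such as `atom` differ across MLJ versions, where it was later renamed `model`):

```julia
using MLJ

tree = @load DecisionTreeClassifier pkg=DecisionTree

# bagged ensemble of 100 trees, each trained on a random
# 80% row sub-sample
forest = EnsembleModel(atom=tree, n=100, bagging_fraction=0.8)

X, y = @load_iris
mach = machine(forest, X, y)
fit!(mach)

# probabilistic predictions, aggregated over the ensemble members
predict(mach, X)
```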
- † Enhance complex model composition, in particular stacking (POC) #311 #282
- Spin off a stand-alone measures (loss functions) package (currently here)
- Add sparse data support (NLP); could use NaiveBayes.jl as a test case (currently wrapped only for dense input)
- POC for implementation of time series models #303, ScientificTypes #14
- Add tools, or a separate repository, for visualization in MLJ. The only end-to-end visualization currently provided is for two-parameter model tuning #85 (closed) #416 #342
- Add more pre-processing tools; enhance MLJScientificTypes' `autotype` method.
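As context for the item above, `autotype` heuristically suggests scientific types for the columns of a table, which can then be applied with `coerce`. A minimal sketch (assuming MLJScientificTypes, or ScientificTypes in older setups, is installed):

```julia
using MLJScientificTypes

# a column table with an integer column that is really categorical
X = (age    = [23, 45, 34, 51],
     ncalls = [1, 2, 1, 2])

# suggest coercions; :few_to_finite treats few-valued columns
# as categorical (finite) scientific types
suggestions = autotype(X, only_changes=true, rules=(:few_to_finite,))

# apply the suggestions and inspect the resulting schema
Xfixed = coerce(X, suggestions)
schema(Xfixed)
```

Enhancements to `autotype` would amount to richer rule sets passed via the `rules` keyword.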
- Online learning support and distributed data #60
- DAG scheduling for learning network training #72 (multithreading first?)
- Automated estimates of CPU/memory requirements #71
- Add multithreading to tuning (MLJTuning #15)