Disintegration of MLJModels #244

ablaom · 2020-04-29T06:07:10Z

Maintenance of MLJModels has become increasingly burdensome for several reasons. Perhaps the biggest problem is that its centralised approach to providing model API implementations ("glue code") means:

Testing takes a long time. Tolerable, just.
[extras] must include a very large number of packages, which invariably cause fatal version conflicts with the packages in [deps] during CI. (As I understand it, during a test, the [deps] are essentially pinned when [extras] are loaded.) Less tolerable.
With the existing package manager, we have no way to specify bounds on the algorithm-providing packages (the ones in [extras]). The latest release compatible with [deps] always gets loaded. If just one (of these many) packages makes a change that breaks the "glue code", then MLJModels CI fails.

While JuliaLang/Pkg.jl#1285 may help with the second and third issues, I don't think that is close to being resolved. We have also observed elsewhere that code loading using Requires.jl can be slower than otherwise. (And there is #243.)

While the plan has always been for all algorithm-providing packages to implement their MLJ interfaces natively, this is not going to happen quickly. In the meantime, it would be good to address the issues above.

In discussions of the core team, @tlienart has suggested the following remedy: Move the glue code for each package X into its own repository Xglue, with its own testing, and make X an ordinary dependency of Xglue. (The package Xglue would be purely a "utility" package and essentially invisible to general users.)
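As a rough illustration of the proposal above, the following sketch shows what the Project.toml of a glue package might look like once X is an ordinary dependency, and checks that every dependency carries a [compat] bound. The package names XGlue and X, all UUIDs, and all version bounds are placeholders invented for this example; only the TOML standard library (Julia 1.6 or later) is assumed.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical Project.toml for a glue package "XGlue" wrapping a package "X".
# Names, UUIDs and version bounds are placeholders, not real registry entries.
project_toml = """
name = "XGlue"
uuid = "00000000-0000-0000-0000-000000000001"
version = "0.1.0"

[deps]
MLJModelInterface = "00000000-0000-0000-0000-000000000002"
X = "00000000-0000-0000-0000-000000000003"

[compat]
MLJModelInterface = "0.3"
X = "1.2"
julia = "1"
"""

project = TOML.parse(project_toml)

# Because X now sits in [deps], it can be given an explicit [compat] bound.
# Collect any dependency that is still missing one.
unbounded = [name for name in keys(project["deps"])
             if !haskey(project["compat"], name)]

println(isempty(unbounded) ? "all deps are bounded" :
        "missing compat for: $(join(unbounded, ", "))")
```

With a layout like this, a breaking release of X would only affect XGlue's own CI, and only once XGlue's [compat] entry is widened to admit it.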

Such migrations could be performed incrementally and I believe each migration would trigger only a patch release, basically because the model registry (which tracks where to find glue code) is part of MLJModels (and so is opaque to MLJ).

I think this is a good idea and propose beginning this disintegration; see TODO list.
cc @DilumAluthge

ablaom · 2020-04-29T07:53:03Z

See also @tlienart 's earlier post at JuliaAI/MLJ.jl#276

tlienart · 2020-04-29T07:55:12Z

Great, and just to stress it: we don't want this to be the rule. We want this to be a temporary solution, and hope that the packages for which we end up writing glue code will integrate and own said glue code (we understand that some of these are slow-moving packages with a lot of legacy and may be slow to integrate things).

We definitely do not want other package devs with brand new cool packages to suggest a glue package. They should just own the interface, like ParallelKMeans, EvoTrees or MLJLinearModels.

DilumAluthge · 2020-04-29T15:59:06Z

FWIW, we now have support for storing multiple registered Julia packages inside subdirectories of a single Git repository.

So e.g. if I already have a Git repository for my package MyPackage.jl, I can now make a subdirectory inside that repository, and inside that subdirectory I can store a package called MyPackageGlue.jl.

DilumAluthge · 2020-04-29T16:00:20Z

FWIW, we now have support for storing multiple registered Julia packages inside subdirectories of a single Git repository.

So e.g. if I already have a Git repository for my package MyPackage.jl, I can now make a subdirectory inside that repository, and inside that subdirectory I can store a package called MyPackageGlue.jl.

This works exactly the same as if MyPackage.jl and MyPackageGlue.jl are stored in different Git repositories. But I figure that at least some people will find it easier/more convenient to store MyPackage.jl and MyPackageGlue.jl in the same Git repository.

tlienart · 2020-04-29T17:14:44Z

That's great! Though here I think that having separate repos has the advantage that when eventually the original package (e.g. MultivariateStats) owns the glue code, we can just archive the repo, whereas here you'd have to remove a subrepo, which might be messier to maintain overall (?)

DilumAluthge · 2020-04-29T17:51:17Z

So it's not a subrepo or anything fancy. It's just a regular folder. So if you want to move the glue code to the main package, just rename and move the files.

DilumAluthge · 2020-04-29T18:28:05Z

Does that explanation make sense?

Here's an example repo with two packages: https://github.com/DilumAluthge/FooBar

There's nothing fancy, just regular folders. If you want to move the Bar package code into the Foo package code, just move/rename the files. If you want to delete the Bar package entirely, just delete the files.

ablaom · 2020-04-29T23:01:48Z

@DilumAluthge How do release notes work in a repo hosting multiple packages?

DilumAluthge · 2020-04-29T23:31:10Z

The idea is that you would make tags that look like this:

PkgA-v1.2.3
PkgB-v4.4.6

In the appropriate tag, you have release notes for the relevant package only.

DilumAluthge · 2020-04-29T23:32:17Z

You can have any system you want of course. That is just a suggestion.

tlienart · 2020-04-30T08:13:27Z

I think this is cool, but I also think that in this particular case having separate repos is simpler and would make it clearer where the code ought to be, as well as keeping the git history clear. I have a feeling that subfolders would cause more headaches, especially for new maintainers.

Our end goal here is that this code ends up outside MLJModels and is integrated in other packages so I think having subfolders (I understand it's not subrepos) just does not seem to be the right approach here even if it's fully supported by Pkg.

DilumAluthge · 2020-04-30T08:14:44Z

Yeah, given the long term goal here, I think it makes sense to keep these "glue" packages in separate Git repositories.

ablaom · 2020-06-10T01:38:47Z

Packages with implementation code to migrate out to a separate package:

Those marked with * are candidates for the algorithm-providing package to host the model implementation code.

Others get migrated to a new package called "MLJPackageInterface" (for example, MLJScikitLearnInterface for ScikitLearn).

Important

The package_name trait should stay the same, but the load_path trait will need to reflect the new location of the model implementation code. For example:

name = "Birch",
package_name = "ScikitLearn",
load_path = "MLJScikitLearnInterface.Birch"
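One way these traits might be declared in the new interface package is sketched below, assuming the traits are overloaded directly as functions from MLJModelInterface. The empty Birch struct is a stand-in for illustration only; the real wrapper has hyperparameters and fit/transform methods that are omitted here.

```julia
import MLJModelInterface
const MMI = MLJModelInterface

# Stand-in model type for illustration; fields and methods are omitted.
mutable struct Birch <: MMI.Unsupervised end

# package_name still names the algorithm-providing package ...
MMI.package_name(::Type{<:Birch}) = "ScikitLearn"

# ... while load_path points at the package now hosting the glue code.
MMI.load_path(::Type{<:Birch}) = "MLJScikitLearnInterface.Birch"

println(MMI.package_name(Birch), " => ", MMI.load_path(Birch))
```

Keeping package_name unchanged means users still refer to the model by the package they already know, while the registry resolves the code from the new location.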

cc @tlienart

DilumAluthge · 2020-09-16T08:48:11Z

The MLJ.jl README has a very nice visualization of the dependency graph, stored in the repo at https://github.com/alan-turing-institute/MLJ.jl/blob/master/material/MLJ_stack.svg

As we break up MLJModels and MLJBase, we should definitely remember to keep this image up-to-date.

ablaom · 2021-02-08T01:36:31Z

All done 🎉

Thanks to all who helped in this large project, especially @OkonSamuel @tlienart and @ExpandingMan

ablaom added the code organization label Apr 29, 2020

ablaom mentioned this issue Apr 29, 2020

[proposal] set explicit compat versions for the model registry JuliaAI/MLJ.jl#276

Closed

ablaom mentioned this issue May 2, 2020

XGBoostClassifier can't be serialised. Add custom serialisation? JuliaAI/MLJ.jl#512

Closed

ablaom mentioned this issue Jun 10, 2020

Remove ScikitLearn and update all classifiers to predict UnivariateFiniteArrays #265

Merged

This was referenced Jul 23, 2020

Can't use @load inside a package JuliaAI/MLJ.jl#613

Closed

lazy activation of models not working from within packages #22

Closed

ablaom mentioned this issue Sep 15, 2020

New package: SossMLJ v0.1.0 JuliaRegistries/General#21452

Merged

DilumAluthge mentioned this issue Sep 16, 2020

Disintegration of MLJBase (discussion and tracking issue) JuliaAI/MLJBase.jl#416

Open

5 tasks

ablaom mentioned this issue Sep 28, 2020

Migrate interface for NaiveBayes out to MLJNaiveBayesInterface #309

Closed

This was referenced Oct 13, 2020

Complete migration of MultivariateStats and NaiveBayes interfaces #318

Merged

Meta issue: lssues for possible collaboration with UCL JuliaAI/MLJ.jl#673

Closed

OkonSamuel mentioned this issue Oct 26, 2020

migration of Clustering interface to MLJClusteringInterface.jl #329

Merged

ablaom mentioned this issue Oct 27, 2020

Generalize KNNRegressor to multitarget case #328

Merged

ablaom pinned this issue Nov 4, 2020

This was referenced Nov 5, 2020

DecisionTree interface emigration #335

Closed

Emigration of GLM #336

Closed

ablaom mentioned this issue Nov 6, 2020

Emigration of LIBSVM #339

Closed

ablaom mentioned this issue Nov 30, 2020

Curated list of models JuliaAI/MLJ.jl#716

Closed

ablaom closed this as completed Feb 8, 2021

ablaom unpinned this issue Feb 8, 2021

ablaom mentioned this issue Feb 8, 2021

For a 0.14.0 release #362

Merged

frapac mentioned this issue May 6, 2021

Restructure dependencies in MadNLP MadNLP/MadNLP.jl#32

Closed
