MLJFlow.jl

Branch	Build	Coverage
dev

MLJ is a Julia framework for combining and tuning machine learning models. MLJFlow is a package that extends the MLJ capabilities to use MLflow as a backend for model tracking and experiment management. To be specific, MLJFlow provides a close to zero-preparation to use MLflow with MLJ; by the usage of function extensions that automate the MLflow cycle (create experiment, create run, log metrics, log parameters, log artifacts, etc.).

Background

This project is part of the GSoC 2023 program. The proposal description can be found here. The entire workload is divided into three different repositories: MLJ.jl, MLFlowClient.jl and this one.

Features

MLflow cycle automation (create experiment, create run, log metrics, log parameters, log artifacts, etc.)
Provides a wrapper Logger for MLFlowClient.jl clients and associated metadata; instances of this type are valid "loggers", which can be passed to MLJ functions supporting the logger keyword argument.
Provides MLflow integration with MLJ's evaluate!/evaluate method (model performance evaluation)
Extends MLJ's MLJ.save method, to save trained machines as retrievable MLflow client artifacts
Provides MLflow integration with MLJ's TunedModel wrapper (to log hyper-parameter tuning workflows)
Provides MLflow integration with MLJ's IteratedModel wrapper (to log controlled iteration of tree gradient boosters, neural networks, and other iterative models)
Plays well with composite models (pipelines, stacks, etc.)

Examples

Logging a model performance evaluation

The example below assumes the user is familiar with basic MLflow concepts. We suppose an MLflow API service is running on a local server, with address "http://127.0.0.1:5000". (In a shell/console, run mlflow server to launch an mlflow service on a local server.)

Refer to the MLflow documentation for necessary background.

We assume MLJDecisionTreeClassifier is in the user's active Julia package environment.

using MLJ # Requires MLJ.jl version 0.19.3 or higher

We first define a logger, providing the address of our running MLflow. The experiment name and artifact location are optional.

logger = MLJFlow.Logger(
    "http://127.0.0.1:5000";
    experiment_name="MLJFlow test",
    artifact_location="./mlj-test"
)

Next, grab some synthetic data and choose an MLJ model:

X, y = make_moons(100) # a table and a vector with 100 rows
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
model = DecisionTreeClassifier(max_depth=4)

Now we call evaluate as usual but provide the logger as a keyword argument:

evaluate(model, X, y, resampling=CV(nfolds=5), measures=[LogLoss(), Accuracy()], logger=logger)

Navigate to "http://127.0.0.1:5000" on your browser and select the "Experiment" matching the name above ("MLJFlow test"). Select the single run displayed to see the logged results of the performance evaluation.

Saving and retrieving trained machines as MLflow artifacts

Let's train the model on all data and save the trained machine as an MLflow artifact:

mach = machine(model, X, y) |> fit!
run = MLJBase.save(logger, mach)

Notice that in this case MLJBase.save returns a run (and instance of MLFlowRun from MLFlowClient.jl).

To retrieve an artifact we need to use the MLFlowClient.jl API, and for that we need to know the MLflow service that our logger wraps:

service = MLJFlow.service(logger)

And we reconstruct our trained machine thus:

using MLFlowClient
artifacts = MLFlowClient.listartifacts(service, run)
mach2 = machine(artifact[1].filepath)

We can predict using the deserialized machine:

predict(mach2, X)

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
.github		.github
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
Project.toml		Project.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github

.github

src

src

test

test

.gitignore

.gitignore

LICENSE

LICENSE

Project.toml

Project.toml

README.md

README.md

Repository files navigation

MLJFlow.jl

Background

Features

Examples

Logging a model performance evaluation

Saving and retrieving trained machines as MLflow artifacts

About

Releases 3

Packages

Contributors 2

Languages

License

JuliaAI/MLJFlow.jl

Folders and files

Latest commit

History

Repository files navigation

MLJFlow.jl

Background

Features

Examples

Logging a model performance evaluation

Saving and retrieving trained machines as MLflow artifacts

About

Resources

License

Stars

Watchers

Forks

Languages