PARTIAL #6 - Update README and add migration guide
Galileo-Galilei committed Nov 3, 2020
1 parent 0cb6b15 commit dec8c56
Showing 4 changed files with 115 additions and 39 deletions.
20 changes: 14 additions & 6 deletions .bumpversion.cfg
@@ -4,9 +4,17 @@ current_version = 0.3.0
[bumpversion:file:setup.py]

[bumpversion:file:kedro_mlflow/__init__.py]
[bumpversion:file:kedro-mlflow/docs/source/01_introduction/02_motivation.md]
[bumpversion:file:kedro-mlflow/docs/source/01_introduction/03_installation.md]
[bumpversion:file:kedro-mlflow/docs/source/02_hello_world_example/01_example_project.md]
[bumpversion:file:kedro-mlflow/docs/source/02_hello_world_example/02_first_steps.md]
[bumpversion:file:kedro-mlflow/docs/source/03_tutorial/04_version_parameters.md]
[bumpversion:file:kedro-mlflow/docs/source/03_tutorial/05_version_datasets.md]

[bumpversion:file:README.md]

[bumpversion:file:docs/source/01_introduction/02_motivation.md]

[bumpversion:file:docs/source/01_introduction/03_installation.md]

[bumpversion:file:docs/source/02_hello_world_example/01_example_project.md]

[bumpversion:file:docs/source/02_hello_world_example/02_first_steps.md]

[bumpversion:file:docs/source/03_tutorial/04_version_parameters.md]

[bumpversion:file:docs/source/03_tutorial/05_version_datasets.md]
50 changes: 29 additions & 21 deletions README.md
@@ -1,6 +1,7 @@
**General information**

[![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue.svg)](https://pypi.org/project/kedro-mlflow/) [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![SemVer](https://img.shields.io/badge/semver-2.0.0-green)](https://semver.org/)

----------------------------------------------------------
| Software repository | Latest release | Total downloads |
@@ -10,30 +11,17 @@
**Code health**

----------------------------------------------------------
| Branch | Tests | Coverage | Documentation | Deployment |
|--------|-------|----------|---------------|------------|
| `develop`| [![test](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/test/badge.svg?branch=develop)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=workflow%3Atest+branch%3Adevelop)| [![codecov](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/develop/graph/badge.svg)](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/develop)|[![Documentation](https://readthedocs.org/projects/kedro-mlflow/badge/?version=latest)](https://kedro-mlflow.readthedocs.io/en/latest/)| [![create-release-candidate](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/create-release-candidate/badge.svg?branch=develop)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=branch%3Adevelop+workflow%3Acreate-release-candidate)|
| `master` | [![test](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/test/badge.svg?branch=master)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=workflow%3Atest+branch%3Amaster) | [![codecov](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/master/graph/badge.svg)](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/master)|[![Documentation](https://readthedocs.org/projects/kedro-mlflow/badge/?version=stable)](https://kedro-mlflow.readthedocs.io/en/stable/)|[![publish](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/publish/badge.svg?branch=master)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=branch%3Amaster+workflow%3Apublish)|

**Main contributors**

The following people actively maintain, enhance and discuss design to make this package as good as possible. Many thanks to them!
- [Yolan Honoré-Rougé](https://github.com/galileo-galilei)
- [Kajetan Maurycy Olszewski](https://github.com/kaemo)
- [Adrian Piotr Kruszewski](https://github.com/akruszewski)
- [Takieddine Kadiri](https://github.com/takikadiri)

# Release and roadmap

The [release history](https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/CHANGELOG.md) tracks package improvements over time. The main features coming in the next releases are [listed on github milestones](https://github.com/Galileo-Galilei/kedro-mlflow/milestones). Feel free to upvote/downvote and discuss prioritization in the associated issues.
| Branch | Tests | Coverage | Documentation | Deployment | Activity |
|--------|-------|----------|---------------|------------|------------|
| `develop`| [![test](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/test/badge.svg?branch=develop)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=workflow%3Atest+branch%3Adevelop)| [![codecov](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/develop/graph/badge.svg)](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/develop)|[![Documentation](https://readthedocs.org/projects/kedro-mlflow/badge/?version=latest)](https://kedro-mlflow.readthedocs.io/en/latest/)| [![create-release-candidate](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/create-release-candidate/badge.svg?branch=develop)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=branch%3Adevelop+workflow%3Acreate-release-candidate)|[![commit](https://img.shields.io/github/commits-since/Galileo-Galilei/kedro-mlflow/0.3.0)](https://github.com/Galileo-Galilei/kedro-mlflow/compare/0.3.0...develop)|
| `master` | [![test](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/test/badge.svg?branch=master)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=workflow%3Atest+branch%3Amaster) | [![codecov](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/master/graph/badge.svg)](https://codecov.io/gh/Galileo-Galilei/kedro-mlflow/branch/master)|[![Documentation](https://readthedocs.org/projects/kedro-mlflow/badge/?version=stable)](https://kedro-mlflow.readthedocs.io/en/stable/)|[![publish](https://github.com/Galileo-Galilei/kedro-mlflow/workflows/publish/badge.svg?branch=master)](https://github.com/Galileo-Galilei/kedro-mlflow/actions?query=branch%3Amaster+workflow%3Apublish)||

# What is kedro-mlflow?

``kedro-mlflow`` is a [kedro-plugin](https://kedro.readthedocs.io/en/stable/04_user_guide/10_developing_plugins.html) for lightweight and portable integration of [mlflow](https://mlflow.org/docs/latest/index.html) capabilities inside [kedro](https://kedro.readthedocs.io/en/stable/index.html) projects. It enforces [``Kedro`` principles](https://kedro.readthedocs.io/en/stable/12_faq/01_faq.html?highlight=principles#what-is-the-philosophy-behind-kedro) to make mlflow usage as production ready as possible. Its core functionalities are :

- **versioning**: you can effortlessly register your parameters or your datasets with minimal configuration in a kedro run. Later, you will be able to browse your runs in the mlflow UI, and retrieve the runs you want. This is directly linked to [Mlflow Tracking](https://www.mlflow.org/docs/latest/tracking.html).
- **model packaging**: ``kedro-mlflow`` offers a convenient API to register a pipeline as a ``model`` in the mlflow sense. Consequently, you can *API-fy* or serve your kedro pipeline with one line of code, or share a model without worrying about the preprocessing to be made for further use. This is directly linked to [Mlflow Models](https://www.mlflow.org/docs/latest/models.html).

- **versioning**: `kedro-mlflow` intends to enhance reproducibility for machine learning experimentation. With `kedro-mlflow` installed, you can effortlessly register your parameters or your datasets with minimal configuration in a kedro run. Later, you will be able to browse your runs in the mlflow UI, and retrieve the runs you want. This is directly linked to [Mlflow Tracking](https://www.mlflow.org/docs/latest/tracking.html).
- **model packaging**: ``kedro-mlflow`` intends to be an agnostic machine learning framework for people who want to write portable, production-ready machine learning pipelines. It offers a convenient API to convert a Kedro pipeline to a ``model`` in the mlflow sense. Consequently, you can *API-fy* or serve your Kedro pipeline with one line of code, or share a model without worrying about the preprocessing to be made for further use. This is directly linked to [Mlflow Models](https://www.mlflow.org/docs/latest/models.html).

# How do I install kedro-mlflow?

@@ -53,7 +41,6 @@ pip install --upgrade git+https://github.com/Galileo-Galilei/kedro-mlflow.git@de

I strongly recommend using ``conda`` (a package manager) to create an environment and reading the [``kedro`` installation guide](https://kedro.readthedocs.io/en/stable/02_getting_started/01_prerequisites.html).


# Getting started

The documentation contains:
@@ -68,6 +55,27 @@ Some frequently asked questions on more advanced features:
- You want to easily create an API to share your awesome model with anyone? -> [See if ``pipeline_ml_factory`` can fit your needs](https://github.com/Galileo-Galilei/kedro-mlflow/issues/16)
- You want to do something that is not straightforward with the current implementation? Open an issue, and let's see what happens!

# Release and roadmap

The [release history](https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/CHANGELOG.md) tracks package improvements over time. The main features coming in the next releases are [listed on github milestones](https://github.com/Galileo-Galilei/kedro-mlflow/milestones). Feel free to upvote/downvote and discuss prioritization in the associated issues.

# Disclaimer

This package is still in active development. We use [SemVer](https://semver.org/) principles to version our releases. Until we reach the `1.0.0` milestone, breaking changes will lead to a `<minor>` version number increment, while releases which do not introduce breaking changes in the API will lead to a `<patch>` version number increment.

Be aware that we will not reach the `1.0.0` milestone before Kedro does (mlflow has already reached `1.0.0`).
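
The pre-`1.0.0` rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `next_version` is hypothetical and is not part of `kedro-mlflow` or of the release tooling.

```python
def next_version(current: str, breaking: bool) -> str:
    """Return the next version number under the pre-1.0.0 rule stated above.

    `current` is assumed to be a plain "<major>.<minor>.<patch>" string.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if major >= 1:
        # The rule above only describes releases before the 1.0.0 milestone.
        raise ValueError("this sketch only covers versions below 1.0.0")
    if breaking:
        # A breaking change increments <minor> and resets <patch>.
        return f"{major}.{minor + 1}.0"
    # A non-breaking release only increments <patch>.
    return f"{major}.{minor}.{patch + 1}"
```

For instance, a breaking change released on top of `0.3.0` gives `0.4.0`, while a non-breaking release gives `0.3.1`.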

If you want to see how to migrate from one version of `kedro-mlflow` to another, see the [migration guide](docs/source/03_tutorial/00_migration_guide.md).

# Can I contribute?

I'd be happy to receive help to maintain and improve the package. Please check the [contributing guidelines](https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/CONTRIBUTING.md).
We'd be happy to receive help to maintain and improve the package. Please check the [contributing guidelines](https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/CONTRIBUTING.md).

#### Main contributors

The following people actively maintain, enhance and discuss design to make this package as good as possible. Many thanks to them!

- [Yolan Honoré-Rougé](https://github.com/galileo-galilei)
- [Kajetan Maurycy Olszewski](https://github.com/kaemo)
- [Adrian Piotr Kruszewski](https://github.com/akruszewski)
- [Takieddine Kadiri](https://github.com/takikadiri)
35 changes: 23 additions & 12 deletions docs/source/02_hello_world_example/02_first_steps.md
@@ -1,30 +1,37 @@
# First steps with the plugins

## Initialize kedro-mlflow

Run

```console
kedro mlflow init
```

You should see the following message:

```console
'conf/base/mlflow.yml' successfully updated.
```

The ``conf/base`` folder is updated:

![](../imgs/initialized_project.png)
![initialized_project](../imgs/initialized_project.png)

If you have configured your own mlflow server, you can specify the tracking uri in the ``mlflow.yml`` file (replace the highlighted line below):

![](../imgs/mlflow_yml.png)
![mlflow_yml](../imgs/mlflow_yml.png)

## Run the pipeline

Open a new terminal and launch:

```console
kedro run
```

If the pipeline executes properly, you should see the following log:

```console
2020-07-13 21:29:24,939 - kedro.versioning.journal - WARNING - Unable to git describe path/to/km-example
2020-07-13 21:29:25,401 - kedro.io.data_catalog - INFO - Loading data from `example_iris_data` (CSVDataSet)...
@@ -56,11 +63,12 @@ If the pipeline executes properly, you should see the following log:

Since we have kept the default value of the ``mlflow.yml``, the tracking uri (the place where runs are recorded) is a local ``mlruns`` folder which has just been created with the execution:

![](../imgs/once_run_project.png)
![once_run_project](../imgs/once_run_project.png)

## Open the UI

Launch the ui:

```console
kedro mlflow ui
```
@@ -69,15 +77,18 @@ And open the following address in your favorite browser

``http://localhost:5000/``

![](../imgs/mlflow_host_page.png)
![mlflow_host_page](../imgs/mlflow_host_page.png)

Now click on the last run executed; you will land on this page:

![](../imgs/mlflow_run.png)
![mlflow_run](../imgs/mlflow_run.png)

### Parameters versioning

Note that the parameters have been recorded *automagically*. Here, two parameter formats are used:
1. The parameter ``example_test_data_ratio``, which is called in the ``pipeline.py`` file with the ``params:`` prefix

1. The parameter ``example_test_data_ratio``, which is called in the ``pipeline.py`` file with the
``params:`` prefix
2. The dictionary of all parameters in ``parameters.yml``, which is a reserved keyword in ``Kedro``. Note that **this is bad practice** because you cannot know which parameters are really used inside the function called. Another problem is that it can generate overly long parameter names and lead to mlflow errors.

You can see that these are effectively the registered parameters in the pipeline with the ``kedro-viz`` plugin:
@@ -89,13 +100,13 @@ kedro viz

Open your browser at the following address:

```
```browser
http://localhost:4141/
```

You should see the following graph:

![](../imgs/kedro_viz_params.png)
![kedro_viz_params](../imgs/kedro_viz_params.png)

which indicates clearly which parameters are logged (in the red boxes with the "parameter" icon).

@@ -109,16 +120,16 @@ With this run, artifacts are empty. This is expected: mlflow does not know what

First, open the ``catalog.yml`` file, which should look like this:

![](../imgs/default_catalog.png)
![default_catalog](../imgs/default_catalog.png)

And persist the model as a pickle with the ``MlflowArtifactDataSet`` class:

![](../imgs/updated_catalog.png)
![updated_catalog](../imgs/updated_catalog.png)

Reopen the ui, select the last run and see that the file was uploaded:

![](../imgs/run_with_artifact.png)
![run_with_artifact](../imgs/run_with_artifact.png)

This works for any type of file (including images with ``MatplotlibWriter``) and the UI even offers a preview for ``png`` and ``csv``, which is really convenient to compare runs.

*Note: Mlflow offers specific logging for machine learning models that may be better suited for your use case, but is not supported yet in ``kedro-mlflow==0.3.0``*
*Note: Mlflow offers specific logging for machine learning models that may be better suited for your use case; see `MlflowModelLoggerDataSet`.*
49 changes: 49 additions & 0 deletions docs/source/03_tutorial/00_migration_guide.md
@@ -0,0 +1,49 @@
# Migration guide

This page explains how to migrate an existing kedro project between versions of `kedro-mlflow` that introduce breaking changes.

## Migration from 0.3.0 to 0.4.0

### Catalog entries

Replace the following entries:

|old                                    |new                                              |
|---------------------------------------|-------------------------------------------------|
|`kedro_mlflow.io.MlflowArtifactDataSet`|`kedro_mlflow.io.artifacts.MlflowArtifactDataSet`|
|`kedro_mlflow.io.MlflowMetricsDataSet` |`kedro_mlflow.io.metrics.MlflowMetricsDataSet`   |
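
If a project has many catalog entries, the renaming can be applied mechanically. The sketch below is one possible approach using plain text replacement; the helper `migrate_catalog_text` and the sample entry are hypothetical, and only the two old and new paths come from the table above.

```python
# Old -> new dataset paths, copied from the table above.
RENAMED_DATASETS = {
    "kedro_mlflow.io.MlflowArtifactDataSet": "kedro_mlflow.io.artifacts.MlflowArtifactDataSet",
    "kedro_mlflow.io.MlflowMetricsDataSet": "kedro_mlflow.io.metrics.MlflowMetricsDataSet",
}


def migrate_catalog_text(text: str) -> str:
    """Replace every old dataset path found in the text of a catalog file."""
    for old, new in RENAMED_DATASETS.items():
        text = text.replace(old, new)
    return text


# Hypothetical catalog entry, used only to illustrate the replacement.
sample_catalog = """\
example_model:
  type: kedro_mlflow.io.MlflowArtifactDataSet
  data_set:
    type: pickle.PickleDataSet
    filepath: data/06_models/model.pkl
"""

migrated_catalog = migrate_catalog_text(sample_catalog)
```

Because the new paths do not contain the old ones as substrings, running the helper twice on the same text leaves it unchanged. To use it on a real project, read the content of `catalog.yml`, pass it through the helper and write the result back after reviewing the differences.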

### Hooks

Hooks are now auto-registered if you use `kedro>=0.16.4`. You can remove the following entry from your `run.py`:

```python
hooks = (
    MlflowPipelineHook(),
    MlflowNodeHook()
)
```
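
Before removing the entry, it may be worth checking that the installed version of kedro meets the `0.16.4` threshold mentioned above. The following is a minimal sketch with a hypothetical helper `is_at_least`; it compares only the numeric parts of the version string, which you would typically obtain from `kedro.__version__`.

```python
import re


def is_at_least(version: str, minimum: str = "0.16.4") -> bool:
    """Return True if `version` is greater than or equal to `minimum`.

    Only the first three numeric components are compared, so suffixes
    such as "rc1" are ignored in this sketch.
    """

    def parse(value: str) -> tuple:
        return tuple(int(part) for part in re.findall(r"\d+", value)[:3])

    return parse(version) >= parse(minimum)
```

If the check returns `False`, the hooks still have to be declared manually as shown in the snippet above.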

### KedroPipelineModel

Be aware that if you had saved a pipeline as an mlflow model with `pipeline_ml_factory`, retraining this pipeline with `kedro-mlflow==0.4.0` will lead to a new behaviour. Let us assume the name of your output in the `DataCatalog` was `predictions`. The output of a registered model will change from:

```json
{
    predictions:
    {
        <your model-predictions>
    }
}
```

to:

```json
{
    <your model-predictions>
}
```

Thus, any code that parses the predictions of this model must be updated accordingly.
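
While models trained with both versions are still being served, client code can accept either shape. The helper below is a hypothetical sketch, not part of `kedro-mlflow`: it assumes the response has already been decoded into Python objects and that the `DataCatalog` output was named `predictions`.

```python
def extract_predictions(response, output_name="predictions"):
    """Return the model predictions from an old-style or new-style response.

    Old style (0.3.0): {"<output_name>": <predictions>}
    New style (0.4.0): <predictions>
    """
    is_old_style = isinstance(response, dict) and set(response) == {output_name}
    if is_old_style:
        # Unwrap the extra level keyed by the DataCatalog output name.
        return response[output_name]
    # New-style responses already are the predictions.
    return response
```

Note that this heuristic is ambiguous if the new-style predictions are themselves a dictionary whose only key is the output name; in that case the format should be selected explicitly instead.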
