
💥 ♻️ Rename MlflowModelSaverDataSet to MlflowModelLocalFileSystemDataset (#391)
Galileo-Galilei committed Oct 22, 2023
1 parent 494b3af commit 772b584
Showing 10 changed files with 27 additions and 26 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
- ``MlflowMetricsDataSet``->``MlflowMetricsDataset``
- :boom: :recycle: Rename the following ``DataSets`` to make their use more explicit, and use the ``Dataset`` suffix:
- ``MlflowModelLoggerDataSet``->``MlflowModelTrackingDataset``
- ``MlflowModelSaverDataSet``->``MlflowModelLocalFileSystemDataset``

## [0.11.10] - 2023-10-03

@@ -304,7 +305,7 @@

### Fixed

- :bug: Fix `TypeError: unsupported operand type(s) for /: 'str' and 'str'` when using `MlflowArtifactDataSet` with `MlflowModelSaverDataSet` ([#116](https://github.com/Galileo-Galilei/kedro-mlflow/issues/116))
- :bug: Fix `TypeError: unsupported operand type(s) for /: 'str' and 'str'` when using `MlflowArtifactDataSet` with `MlflowModelLocalFileSystemDataset` ([#116](https://github.com/Galileo-Galilei/kedro-mlflow/issues/116))
- :memo: Fix various docs typo ([#6](https://github.com/Galileo-Galilei/kedro-mlflow/issues/6))
- :bug: When the underlying Kedro pipeline fails, the associated mlflow run is now marked as 'FAILED' instead of 'FINISHED'. It is rendered with a red cross instead of the green tick in the mlflow user interface ([#121](https://github.com/Galileo-Galilei/kedro-mlflow/issues/121)).
- :bug: Fix a bug which made `KedroPipelineModel` impossible to load if one of its artifacts was a `MlflowModel<Saver/Logger>DataSet`. These datasets could not be deep-copied because one of their attributes was a module ([#122](https://github.com/Galileo-Galilei/kedro-mlflow/issues/122)).
@@ -322,7 +323,7 @@
- :sparkles: `kedro-mlflow` now supports configuring the project in `pyproject.toml` (_Only for kedro>=0.16.5_) ([#96](https://github.com/Galileo-Galilei/kedro-mlflow/issues/96))
- :sparkles: `pipeline_ml_factory` now accepts that `inference` pipeline `inputs` may be in `training` pipeline `inputs` ([#71](https://github.com/Galileo-Galilei/kedro-mlflow/issues/71))
- :sparkles: `pipeline_ml_factory` now automatically infers the schema of the input dataset to validate data at inference time. The output schema can be declared manually in the `model_signature` argument ([#70](https://github.com/Galileo-Galilei/kedro-mlflow/issues/70))
- :sparkles: Add two DataSets for model logging and saving: `MlflowModelTrackingDataset` and `MlflowModelSaverDataSet` ([#12](https://github.com/Galileo-Galilei/kedro-mlflow/issues/12))
- :sparkles: Add two DataSets for model logging and saving: `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset` ([#12](https://github.com/Galileo-Galilei/kedro-mlflow/issues/12))
- :sparkles: `MlflowPipelineHook` and `MlflowNodeHook` are now [auto-registered](https://kedro.readthedocs.io/en/latest/hooks/introduction.html#registering-your-hook-implementations-with-kedro) if you use `kedro>=0.16.4` ([#29](https://github.com/Galileo-Galilei/kedro-mlflow/issues/29))

### Fixed
2 changes: 1 addition & 1 deletion docs/source/01_introduction/02_motivation.md
@@ -40,7 +40,7 @@ Above implementations have the advantage of being very straightforward and *mlfl
| Set up configuration | ``mlflow.yml`` | ``MlflowHook`` |
| Logging parameters | ``mlflow.yml`` | ``MlflowHook`` |
| Logging artifacts | ``catalog.yml`` | ``MlflowArtifactDataset`` |
| Logging models | ``catalog.yml`` | `MlflowModelTrackingDataset` and `MlflowModelSaverDataSet` |
| Logging models | ``catalog.yml`` | `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset` |
| Logging metrics | ``catalog.yml`` | ``MlflowMetricsDataset`` |
| Logging Pipeline as model | ``hooks.py`` | ``KedroPipelineModel`` and ``pipeline_ml_factory`` |

10 changes: 5 additions & 5 deletions docs/source/04_experimentation_tracking/04_version_models.md
@@ -6,12 +6,12 @@ MLflow allows to serialize and deserialize models to a common format, track thos

## How to track models using MLflow in a Kedro project?

`kedro-mlflow` introduces two new `DataSet` types that can be used in `DataCatalog` called `MlflowModelTrackingDataset` and `MlflowModelSaverDataSet`. The two have very similar API, except that:
`kedro-mlflow` introduces two new `DataSet` types that can be used in the `DataCatalog`, called `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset`. The two have very similar APIs, except that:

- the ``MlflowModelTrackingDataset`` is used to load from and save to the mlflow artifact store. It accepts an optional `run_id` argument to load from and save to a given `run_id`, which must exist in the mlflow server you are logging to.
- the ``MlflowModelSaverDataSet`` is used to load from and save to a given path. It uses the standard `filepath` argument in the constructor of Kedro DataSets. Note that it **does not log in mlflow**.
- the ``MlflowModelLocalFileSystemDataset`` is used to load from and save to a given path. It uses the standard `filepath` argument in the constructor of Kedro DataSets. Note that it **does not log in mlflow**.

*Note: If you use ``MlflowModelTrackingDataset``, it will be saved during training in your current run. However, you will need to specify the run id to predict with (since it is not persisted locally, it will not pick the latest model by default). You may prefer to combine ``MlflowModelSaverDataSet`` and ``MlflowArtifactDataset`` to make persist it both locally and remotely, see further.*
*Note: If you use ``MlflowModelTrackingDataset``, it will be saved during training in your current run. However, you will need to specify the run id to predict with (since it is not persisted locally, it will not pick the latest model by default). You may prefer to combine ``MlflowModelLocalFileSystemDataset`` and ``MlflowArtifactDataset`` to persist it both locally and remotely, see below.*

Suppose you would like to register a `scikit-learn` model of your `DataCatalog` in mlflow; you can use the following yaml API:

@@ -35,7 +35,7 @@ During save, a model object from node output is logged to mlflow using ``log_mod

During load, the model is retrieved from the ``run_id`` if specified, else it is retrieved from the mlflow active run. If there is no mlflow active run, the loading fails. This will never happen if you are using the `kedro run` command, because the `MlflowHook` creates a new run before each pipeline run.
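
As a minimal sketch of this behaviour (the `run_id` below is a placeholder and must point to a run that already exists on your tracking server):

```python
from kedro_mlflow.io.models import MlflowModelTrackingDataset

# Without run_id, save() logs the model to the currently active run and
# load() reads it back from that same run.
# With an explicit run_id, load() fetches the model logged under that run.
model_ds = MlflowModelTrackingDataset(
    flavor="mlflow.sklearn",
    run_id="<existing-run-id>",  # placeholder: replace with a real run id
)
trained_model = model_ds.load()
```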

**For ``MlflowModelSaverDataSet``**
**For ``MlflowModelLocalFileSystemDataset``**

During save, a model object from a node output is saved locally under the specified ``filepath`` using the ``save_model`` function of the specified ``flavor``.
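
As an illustrative sketch of this round trip (the `filepath` below is a placeholder; no active mlflow run is required):

```python
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from sklearn.linear_model import LinearRegression

# Save the model to the local filesystem with the sklearn flavor;
# this dataset does not log anything to mlflow.
model_ds = MlflowModelLocalFileSystemDataset(
    flavor="mlflow.sklearn",
    filepath="data/06_models/my_linear_regression",  # hypothetical path
)
model_ds.save(LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0]))

# Later (e.g. at inference time), reload it from the same filepath
reloaded_model = model_ds.load()
```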

@@ -60,7 +60,7 @@ If you want to save your model both locally and remotely within the same run, y
sklearn_model:
type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
data_set:
type: kedro_mlflow.io.models.MlflowModelSaverDataSet
type: kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset
flavor: mlflow.sklearn
filepath: data/06_models/sklearn_model
```
8 changes: 4 additions & 4 deletions docs/source/07_python_objects/01_DataSets.md
@@ -115,7 +115,7 @@ my_model:
- "kedro==0.18.11"
```

### ``MlflowModelSaverDataSet``
### ``MlflowModelLocalFileSystemDataset``

The ``MlflowModelLocalFileSystemDataset`` accepts the following arguments:

@@ -132,7 +132,7 @@ The use is very similar to ``MlflowModelTrackingDataset``, but you have to speci
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from sklearn.linear_model import LinearRegression

mlflow_model_tracking = MlflowModelSaverDataSet(
mlflow_model_tracking = MlflowModelLocalFileSystemDataset(
flavor="mlflow.sklearn", filepath="path/to/where/you/want/model"
)
mlflow_model_tracking.save(LinearRegression().fit(data))
@@ -141,7 +141,7 @@ mlflow_model_tracking.save(LinearRegression().fit(data))
The same arguments are available, plus an additional [`version` common to usual `AbstractVersionedDataset`](https://kedro.readthedocs.io/en/stable/kedro.io.AbstractVersionedDataset.html)

```python
mlflow_model_tracking = MlflowModelSaverDataSet(
mlflow_model_tracking = MlflowModelLocalFileSystemDataset(
flavor="mlflow.sklearn",
filepath="path/to/where/you/want/model",
version="<valid-kedro-version>",
@@ -153,7 +153,7 @@ and with the YAML API in the `catalog.yml`:

```yaml
my_model:
type: kedro_mlflow.io.models.MlflowModelSaverDataSet
type: kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset
flavor: mlflow.sklearn
filepath: path/to/where/you/want/model
version: <valid-kedro-version>
2 changes: 1 addition & 1 deletion docs/source/08_API/kedro_mlflow.io.rst
@@ -41,7 +41,7 @@ Models DataSet
:undoc-members:
:show-inheritance:

.. automodule:: kedro_mlflow.io.models.mlflow_model_saver_dataset
.. automodule:: kedro_mlflow.io.models.mlflow_model_local_filesystem_dataset
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion kedro_mlflow/io/models/__init__.py
@@ -1,3 +1,3 @@
from .mlflow_model_local_filesystem_dataset import MlflowModelLocalFileSystemDataset
from .mlflow_model_registry_dataset import MlflowModelRegistryDataset
from .mlflow_model_saver_dataset import MlflowModelSaverDataSet
from .mlflow_model_tracking_dataset import MlflowModelTrackingDataset
kedro_mlflow/io/models/{mlflow_model_saver_dataset.py → mlflow_model_local_filesystem_dataset.py}
@@ -9,7 +9,7 @@
)


class MlflowModelSaverDataSet(MlflowModelRegistryDataset):
class MlflowModelLocalFileSystemDataset(MlflowModelRegistryDataset):
"""Wrapper for saving, logging and loading for all MLflow model flavor."""

def __init__(
4 changes: 2 additions & 2 deletions tests/io/metrics/test_mlflow_metric_history_dataset.py
@@ -44,11 +44,11 @@ def test_mlflow_metric_history_dataset_save_load(mlflow_client, save_mode, load_
"history": metric_as_history,
}

metric_ds_saver = MlflowMetricHistoryDataset(
metric_ds_model_local_filesystem = MlflowMetricHistoryDataset(
key="my_metric", save_args={"mode": save_mode}
)
with mlflow.start_run():
metric_ds_saver.save(mode_metrics_mapping[save_mode])
metric_ds_model_local_filesystem.save(mode_metrics_mapping[save_mode])
run_id = mlflow.active_run().info.run_id

# check existence
@@ -8,7 +8,7 @@
from kedro_datasets.pickle import PickleDataset
from sklearn.linear_model import LinearRegression

from kedro_mlflow.io.models import MlflowModelSaverDataSet
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from kedro_mlflow.mlflow import KedroPipelineModel
from kedro_mlflow.pipeline import pipeline_ml_factory

@@ -110,12 +110,12 @@ def test_save_unversioned_under_same_path(
model_config = {
"name": "linreg",
"config": {
"type": "kedro_mlflow.io.models.MlflowModelSaverDataSet",
"type": "kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset",
"flavor": "mlflow.sklearn",
"filepath": linreg_path.as_posix(),
},
}
mlflow_model_ds = MlflowModelSaverDataSet.from_config(**model_config)
mlflow_model_ds = MlflowModelLocalFileSystemDataset.from_config(**model_config)
mlflow_model_ds.save(linreg_model)
# check that second save does not fail
# this happens if the underlying folder already exists
@@ -127,13 +127,13 @@ def test_save_load_local(linreg_path, linreg_model, versioned):
model_config = {
"name": "linreg",
"config": {
"type": "kedro_mlflow.io.models.MlflowModelSaverDataSet",
"type": "kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset",
"filepath": linreg_path.as_posix(),
"flavor": "mlflow.sklearn",
"versioned": versioned,
},
}
mlflow_model_ds = MlflowModelSaverDataSet.from_config(**model_config)
mlflow_model_ds = MlflowModelLocalFileSystemDataset.from_config(**model_config)
mlflow_model_ds.save(linreg_model)

if versioned:
@@ -167,7 +167,7 @@ def test_pyfunc_flavor_python_model_save_and_load(
model_config = {
"name": "kedro_pipeline_model",
"config": {
"type": "kedro_mlflow.io.models.MlflowModelSaverDataSet",
"type": "kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset",
"filepath": (
tmp_path / "data" / "06_models" / "my_custom_model"
).as_posix(),
Expand All @@ -180,7 +180,7 @@ def test_pyfunc_flavor_python_model_save_and_load(
},
}

mlflow_model_ds = MlflowModelSaverDataSet.from_config(**model_config)
mlflow_model_ds = MlflowModelLocalFileSystemDataset.from_config(**model_config)
mlflow_model_ds.save(kedro_pipeline_model)

assert mlflow.active_run() is None
4 changes: 2 additions & 2 deletions tests/mlflow/test_kedro_pipeline_model.py
@@ -9,7 +9,7 @@
from kedro_datasets.pickle import PickleDataset
from sklearn.linear_model import LinearRegression

from kedro_mlflow.io.models import MlflowModelSaverDataSet
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from kedro_mlflow.mlflow import KedroPipelineModel
from kedro_mlflow.mlflow.kedro_pipeline_model import KedroPipelineModelError
from kedro_mlflow.pipeline import pipeline_ml_factory
@@ -423,7 +423,7 @@ def predict_fun(model, data):
)

# emulate training by creating the model manually
model_dataset = MlflowModelSaverDataSet(
model_dataset = MlflowModelLocalFileSystemDataset(
filepath=(tmp_path / "model.pkl").resolve().as_posix(), flavor="mlflow.sklearn"
)

