
💥 ♻️ Rename MlflowModelSaverDataSet to MlflowModelLocalFileSystemDataset (#391)
Galileo-Galilei committed Oct 22, 2023
1 parent 494b3af commit 772b584
Showing 10 changed files with 27 additions and 26 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
- ``MlflowMetricsDataSet``->``MlflowMetricsDataset``
- :boom: :recycle: Rename the following ``DataSets`` to make their use more explicit, and use the ``Dataset`` suffix:
- ``MlflowModelLoggerDataSet``->``MlflowModelTrackingDataset``
- ``MlflowModelSaverDataSet``->``MlflowModelLocalFileSystemDataset``

## [0.11.10] - 2023-10-03

@@ -304,7 +305,7 @@

### Fixed

- :bug: Fix `TypeError: unsupported operand type(s) for /: 'str' and 'str'` when using `MlflowArtifactDataSet` with `MlflowModelSaverDataSet` ([#116](https://github.com/Galileo-Galilei/kedro-mlflow/issues/116))
- :bug: Fix `TypeError: unsupported operand type(s) for /: 'str' and 'str'` when using `MlflowArtifactDataSet` with `MlflowModelLocalFileSystemDataset` ([#116](https://github.com/Galileo-Galilei/kedro-mlflow/issues/116))
- :memo: Fix various docs typo ([#6](https://github.com/Galileo-Galilei/kedro-mlflow/issues/6))
- :bug: When the underlying Kedro pipeline fails, the associated mlflow run is now marked as 'FAILED' instead of 'FINISHED'. It is rendered with a red cross instead of the green tick in the mlflow user interface ([#121](https://github.com/Galileo-Galilei/kedro-mlflow/issues/121)).
- :bug: Fix a bug which made `KedroPipelineModel` impossible to load if one of its artifacts was a `MlflowModel<Saver/Logger>DataSet`. These datasets could not be deep-copied because one of their attributes was a module ([#122](https://github.com/Galileo-Galilei/kedro-mlflow/issues/122)).
@@ -322,7 +323,7 @@
- :sparkles: `kedro-mlflow` now supports configuring the project in `pyproject.toml` (_Only for kedro>=0.16.5_) ([#96](https://github.com/Galileo-Galilei/kedro-mlflow/issues/96))
- :sparkles: `pipeline_ml_factory` now accepts that `inference` pipeline `inputs` may be in `training` pipeline `inputs` ([#71](https://github.com/Galileo-Galilei/kedro-mlflow/issues/71))
- :sparkles: `pipeline_ml_factory` now automatically infers the schema of the input dataset to validate data at inference time. The output schema can be declared manually in the `model_signature` argument ([#70](https://github.com/Galileo-Galilei/kedro-mlflow/issues/70))
- :sparkles: Add two DataSets for model logging and saving: `MlflowModelTrackingDataset` and `MlflowModelSaverDataSet` ([#12](https://github.com/Galileo-Galilei/kedro-mlflow/issues/12))
- :sparkles: Add two DataSets for model logging and saving: `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset` ([#12](https://github.com/Galileo-Galilei/kedro-mlflow/issues/12))
- :sparkles: `MlflowPipelineHook` and `MlflowNodeHook` are now [auto-registered](https://kedro.readthedocs.io/en/latest/hooks/introduction.html#registering-your-hook-implementations-with-kedro) if you use `kedro>=0.16.4` ([#29](https://github.com/Galileo-Galilei/kedro-mlflow/issues/29))

### Fixed
2 changes: 1 addition & 1 deletion docs/source/01_introduction/02_motivation.md
@@ -40,7 +40,7 @@ Above implementations have the advantage of being very straightforward and *mlfl
| Set up configuration | ``mlflow.yml`` | ``MlflowHook`` |
| Logging parameters | ``mlflow.yml`` | ``MlflowHook`` |
| Logging artifacts | ``catalog.yml`` | ``MlflowArtifactDataset`` |
| Logging models | ``catalog.yml`` | `MlflowModelTrackingDataset` and `MlflowModelSaverDataSet` |
| Logging models | ``catalog.yml`` | `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset` |
| Logging metrics | ``catalog.yml`` | ``MlflowMetricsDataset`` |
| Logging Pipeline as model | ``hooks.py`` | ``KedroPipelineModel`` and ``pipeline_ml_factory`` |

10 changes: 5 additions & 5 deletions docs/source/04_experimentation_tracking/04_version_models.md
@@ -6,12 +6,12 @@ MLflow allows to serialize and deserialize models to a common format, track thos

## How to track models using MLflow in a Kedro project?

`kedro-mlflow` introduces two new `DataSet` types that can be used in `DataCatalog` called `MlflowModelTrackingDataset` and `MlflowModelSaverDataSet`. The two have very similar API, except that:
`kedro-mlflow` introduces two new `DataSet` types that can be used in the `DataCatalog`, called `MlflowModelTrackingDataset` and `MlflowModelLocalFileSystemDataset`. The two have very similar APIs, except that:

- the ``MlflowModelTrackingDataset`` is used to load from and save to the mlflow artifact store. It accepts an optional `run_id` argument to load from and save to a given `run_id`, which must exist in the mlflow server you are logging to.
- the ``MlflowModelSaverDataSet`` is used to load from and save to a given path. It uses the standard `filepath` argument in the constructor of Kedro DataSets. Note that it **does not log in mlflow**.
- the ``MlflowModelLocalFileSystemDataset`` is used to load from and save to a given path. It uses the standard `filepath` argument in the constructor of Kedro DataSets. Note that it **does not log in mlflow**.

*Note: If you use ``MlflowModelTrackingDataset``, it will be saved during training in your current run. However, you will need to specify the run id to predict with (since it is not persisted locally, it will not pick the latest model by default). You may prefer to combine ``MlflowModelSaverDataSet`` and ``MlflowArtifactDataset`` to make persist it both locally and remotely, see further.*
*Note: If you use ``MlflowModelTrackingDataset``, it will be saved during training in your current run. However, you will need to specify the run id to predict with (since it is not persisted locally, it will not pick the latest model by default). You may prefer to combine ``MlflowModelLocalFileSystemDataset`` and ``MlflowArtifactDataset`` to persist it both locally and remotely, see below.*

Suppose you would like to register a `scikit-learn` model of your `DataCatalog` in mlflow; you can use the following yaml API:

@@ -35,7 +35,7 @@ During save, a model object from node output is logged to mlflow using ``log_mod

During load, the model is retrieved from the ``run_id`` if specified, else it is retrieved from the mlflow active run. If there is no mlflow active run, the loading fails. This will never happen if you are using the `kedro run` command, because the `MlflowHook` creates a new run before each pipeline run.
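
As a minimal sketch of this behaviour (the `run_id` below is a placeholder and must point to a run that already exists on your tracking server):

```python
from kedro_mlflow.io.models import MlflowModelTrackingDataset

# Without run_id, save() logs the model to the currently active run and
# load() reads it back from that same run.
# With an explicit run_id, load() fetches the model logged under that run.
model_ds = MlflowModelTrackingDataset(
    flavor="mlflow.sklearn",
    run_id="<existing-run-id>",  # placeholder: replace with a real run id
)
trained_model = model_ds.load()
```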

**For ``MlflowModelSaverDataSet``**
**For ``MlflowModelLocalFileSystemDataset``**

During save, a model object from a node output is saved locally under the specified ``filepath`` using the ``save_model`` function of the specified ``flavor``.
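
As an illustrative sketch of this round trip (the `filepath` below is a placeholder; no active mlflow run is required):

```python
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from sklearn.linear_model import LinearRegression

# Save the model to the local filesystem with the sklearn flavor;
# this dataset does not log anything to mlflow.
model_ds = MlflowModelLocalFileSystemDataset(
    flavor="mlflow.sklearn",
    filepath="data/06_models/my_linear_regression",  # hypothetical path
)
model_ds.save(LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0]))

# Later (e.g. at inference time), reload it from the same filepath
reloaded_model = model_ds.load()
```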

@@ -60,7 +60,7 @@ If you want to save your model both locally and remotely within the same run, y
sklearn_model:
type: kedro_mlflow.io.artifacts.MlflowArtifactDataset
data_set:
type: kedro_mlflow.io.models.MlflowModelSaverDataSet
type: kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset
flavor: mlflow.sklearn
filepath: data/06_models/sklearn_model
```
8 changes: 4 additions & 4 deletions docs/source/07_python_objects/01_DataSets.md
@@ -115,7 +115,7 @@ my_model:
- "kedro==0.18.11"
```

### ``MlflowModelSaverDataSet``
### ``MlflowModelLocalFileSystemDataset``

The ``MlflowModelLocalFileSystemDataset`` accepts the following arguments:

@@ -132,7 +132,7 @@ The use is very similar to ``MlflowModelTrackingDataset``, but you have to speci
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from sklearn.linear_model import LinearRegression

mlflow_model_tracking = MlflowModelSaverDataSet(
mlflow_model_tracking = MlflowModelLocalFileSystemDataset(
flavor="mlflow.sklearn", filepath="path/to/where/you/want/model"
)
mlflow_model_tracking.save(LinearRegression().fit(data))
@@ -141,7 +141,7 @@ mlflow_model_tracking.save(LinearRegression().fit(data))
The same arguments are available, plus an additional [`version` common to usual `AbstractVersionedDataset`](https://kedro.readthedocs.io/en/stable/kedro.io.AbstractVersionedDataset.html)

```python
mlflow_model_tracking = MlflowModelSaverDataSet(
mlflow_model_tracking = MlflowModelLocalFileSystemDataset(
flavor="mlflow.sklearn",
filepath="path/to/where/you/want/model",
version="<valid-kedro-version>",
@@ -153,7 +153,7 @@ and with the YAML API in the `catalog.yml`:

```yaml
my_model:
type: kedro_mlflow.io.models.MlflowModelSaverDataSet
type: kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset
flavor: mlflow.sklearn
filepath: path/to/where/you/want/model
version: <valid-kedro-version>
2 changes: 1 addition & 1 deletion docs/source/08_API/kedro_mlflow.io.rst
@@ -41,7 +41,7 @@ Models DataSet
:undoc-members:
:show-inheritance:

.. automodule:: kedro_mlflow.io.models.mlflow_model_saver_dataset
.. automodule:: kedro_mlflow.io.models.mlflow_model_local_filesystem_dataset
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion kedro_mlflow/io/models/__init__.py
@@ -1,3 +1,3 @@
from .mlflow_model_local_filesystem_dataset import MlflowModelLocalFileSystemDataset
from .mlflow_model_registry_dataset import MlflowModelRegistryDataset
from .mlflow_model_saver_dataset import MlflowModelSaverDataSet
from .mlflow_model_tracking_dataset import MlflowModelTrackingDataset
kedro_mlflow/io/models/{mlflow_model_saver_dataset.py → mlflow_model_local_filesystem_dataset.py}
@@ -9,7 +9,7 @@
)


class MlflowModelSaverDataSet(MlflowModelRegistryDataset):
class MlflowModelLocalFileSystemDataset(MlflowModelRegistryDataset):
"""Wrapper for saving, logging and loading for all MLflow model flavor."""

def __init__(
4 changes: 2 additions & 2 deletions tests/io/metrics/test_mlflow_metric_history_dataset.py
@@ -44,11 +44,11 @@ def test_mlflow_metric_history_dataset_save_load(mlflow_client, save_mode, load_
"history": metric_as_history,
}

metric_ds_saver = MlflowMetricHistoryDataset(
metric_ds_model_local_filesystem = MlflowMetricHistoryDataset(
key="my_metric", save_args={"mode": save_mode}
)
with mlflow.start_run():
metric_ds_saver.save(mode_metrics_mapping[save_mode])
metric_ds_model_local_filesystem.save(mode_metrics_mapping[save_mode])
run_id = mlflow.active_run().info.run_id

# check existence
@@ -8,7 +8,7 @@
from kedro_datasets.pickle import PickleDataset
from sklearn.linear_model import LinearRegression

from kedro_mlflow.io.models import MlflowModelSaverDataSet
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from kedro_mlflow.mlflow import KedroPipelineModel
from kedro_mlflow.pipeline import pipeline_ml_factory

@@ -110,12 +110,12 @@ def test_save_unversioned_under_same_path(
model_config = {
"name": "linreg",
"config": {
"type": "kedro_mlflow.io.models.MlflowModelSaverDataSet",
"type": "kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset",
"flavor": "mlflow.sklearn",
"filepath": linreg_path.as_posix(),
},
}
mlflow_model_ds = MlflowModelSaverDataSet.from_config(**model_config)
mlflow_model_ds = MlflowModelLocalFileSystemDataset.from_config(**model_config)
mlflow_model_ds.save(linreg_model)
# check that second save does not fail
# this happens if the underlying folder already exists
@@ -127,13 +127,13 @@ def test_save_load_local(linreg_path, linreg_model, versioned):
model_config = {
"name": "linreg",
"config": {
"type": "kedro_mlflow.io.models.MlflowModelSaverDataSet",
"type": "kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset",
"filepath": linreg_path.as_posix(),
"flavor": "mlflow.sklearn",
"versioned": versioned,
},
}
mlflow_model_ds = MlflowModelSaverDataSet.from_config(**model_config)
mlflow_model_ds = MlflowModelLocalFileSystemDataset.from_config(**model_config)
mlflow_model_ds.save(linreg_model)

if versioned:
@@ -167,7 +167,7 @@ def test_pyfunc_flavor_python_model_save_and_load(
model_config = {
"name": "kedro_pipeline_model",
"config": {
"type": "kedro_mlflow.io.models.MlflowModelSaverDataSet",
"type": "kedro_mlflow.io.models.MlflowModelLocalFileSystemDataset",
"filepath": (
tmp_path / "data" / "06_models" / "my_custom_model"
).as_posix(),
Expand All @@ -180,7 +180,7 @@ def test_pyfunc_flavor_python_model_save_and_load(
},
}

mlflow_model_ds = MlflowModelSaverDataSet.from_config(**model_config)
mlflow_model_ds = MlflowModelLocalFileSystemDataset.from_config(**model_config)
mlflow_model_ds.save(kedro_pipeline_model)

assert mlflow.active_run() is None
4 changes: 2 additions & 2 deletions tests/mlflow/test_kedro_pipeline_model.py
@@ -9,7 +9,7 @@
from kedro_datasets.pickle import PickleDataset
from sklearn.linear_model import LinearRegression

from kedro_mlflow.io.models import MlflowModelSaverDataSet
from kedro_mlflow.io.models import MlflowModelLocalFileSystemDataset
from kedro_mlflow.mlflow import KedroPipelineModel
from kedro_mlflow.mlflow.kedro_pipeline_model import KedroPipelineModelError
from kedro_mlflow.pipeline import pipeline_ml_factory
@@ -423,7 +423,7 @@ def predict_fun(model, data):
)

# emulate training by creating the model manually
model_dataset = MlflowModelSaverDataSet(
model_dataset = MlflowModelLocalFileSystemDataset(
filepath=(tmp_path / "model.pkl").resolve().as_posix(), flavor="mlflow.sklearn"
)

