Logging a finetuned Transformer model with the mlflow.transformers flavor results in '...ForSequenceClassification' object has no attribute 'model' #424

Closed
hugocool opened this issue Jun 12, 2023 · 5 comments

Context

I am training a transformer model for a downstream task and would like to save it in mlflow using the mlflow.transformers flavor. However, when I log the trained model, for example in Jupyter with the catalog.save function, I get an error. The same model saves fine when I use the PickleDataSet.

So what is the correct way to log transformers using the mlflow.transformers flavor?

Steps to Reproduce

catalog.yml
finetuned_classifier:
  type: kedro_mlflow.io.models.MlflowModelLoggerDataSet
  flavor: mlflow.transformers

From the training example:

from transformers import BertForSequenceClassification, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-large-uncased")

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total # of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=test_dataset            # evaluation dataset
)

trainer.train()

model = trainer.model

And this is where the error happens:
catalog.save('finetuned_classifier', model)

Expected Result

Saving data to 'finetuned_classifier' (MlflowModelLoggerDataSet)...    data_catalog.py:384

Actual Result

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/core.py:215 in     │
│ save                                                                                             │
│                                                                                                  │
│   212 │   │                                                                                      │
│   213 │   │   try:                                                                               │
│   214 │   │   │   self._logger.debug("Saving %s", str(self))                                     │
│ ❱ 215 │   │   │   self._save(data)                                                               │
│   216 │   │   except DataSetError:                                                               │
│   217 │   │   │   raise                                                                          │
│   218 │   │   except (FileNotFoundError, NotADirectoryError):                                    │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro_mlflow/io/models/mlfl │
│ ow_model_logger_dataset.py:123 in _save                                                          │
│                                                                                                  │
│   120 │   │   else:                                                                              │
│   121 │   │   │   # if there is no run_id, log in active run                                     │
│   122 │   │   │   # OR open automatically a new run to log                                       │
│ ❱ 123 │   │   │   self._save_model_in_run(model)                                                 │
│   124 │                                                                                          │
│   125 │   def _save_model_in_run(self, model):                                                   │
│   126                                                                                            │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro_mlflow/io/models/mlfl │
│ ow_model_logger_dataset.py:140 in _save_model_in_run                                             │
│                                                                                                  │
│   137 │   │   │   # Otherwise we save using the common workflow where first argument is the      │
│   138 │   │   │   # model object and second is the path.                                         │
│   139 │   │   │   if self._logging_activated:                                                    │
│ ❱ 140 │   │   │   │   self._mlflow_model_module.log_model(                                       │
│   141 │   │   │   │   │   model, self._artifact_path, **self._save_args                          │
│   142 │   │   │   │   )                                                                          │
│   143                                                                                            │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/utils/docstring_util │
│ s.py:235 in version_func                                                                         │
│                                                                                                  │
│   232 │   │   │   installed_version = Version(get_distribution(module_key).version)              │
│   233 │   │   │   if installed_version < Version(min_ver) or installed_version > Version(max_v   │
│   234 │   │   │   │   warnings.warn(notice, category=FutureWarning, stacklevel=2)                │
│ ❱ 235 │   │   │   return func(*args, **kwargs)                                                   │
│   236 │   │                                                                                      │
│   237 │   │   version_func.__doc__ = (                                                           │
│   238 │   │   │   "    .. Note:: " + notice + "\n" * 2 + func.__doc__ if func.__doc__ else not   │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/transformers.py:729  │
│ in log_model                                                                                     │
│                                                                                                  │
│    726 │   │   │   │   │   │   │   │   │   │   │    release without warning.                     │
│    727 │   :param kwargs: Additional arguments for :py:class:`mlflow.models.model.Model`         │
│    728 │   """                                                                                   │
│ ❱  729 │   return Model.log(                                                                     │
│    730 │   │   artifact_path=artifact_path,                                                      │
│    731 │   │   flavor=mlflow.transformers,                                                       │
│    732 │   │   registered_model_name=registered_model_name,                                      │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/models/model.py:562  │
│ in log                                                                                           │
│                                                                                                  │
│   559 │   │   │   local_path = tmp.path("model")                                                 │
│   560 │   │   │   run_id = mlflow.tracking.fluent._get_or_start_run().info.run_id                │
│   561 │   │   │   mlflow_model = cls(artifact_path=artifact_path, run_id=run_id, metadata=meta   │
│ ❱ 562 │   │   │   flavor.save_model(path=local_path, mlflow_model=mlflow_model, **kwargs)        │
│   563 │   │   │   mlflow.tracking.fluent.log_artifacts(local_path, mlflow_model.artifact_path)   │
│   564 │   │   │   tracking_uri = _resolve_tracking_uri()                                         │
│   565 │   │   │   if (                                                                           │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/utils/docstring_util │
│ s.py:235 in version_func                                                                         │
│                                                                                                  │
│   232 │   │   │   installed_version = Version(get_distribution(module_key).version)              │
│   233 │   │   │   if installed_version < Version(min_ver) or installed_version > Version(max_v   │
│   234 │   │   │   │   warnings.warn(notice, category=FutureWarning, stacklevel=2)                │
│ ❱ 235 │   │   │   return func(*args, **kwargs)                                                   │
│   236 │   │                                                                                      │
│   237 │   │   version_func.__doc__ = (                                                           │
│   238 │   │   │   "    .. Note:: " + notice + "\n" * 2 + func.__doc__ if func.__doc__ else not   │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/transformers.py:391  │
│ in save_model                                                                                    │
│                                                                                                  │
│    388 │   """                                                                                   │
│    389 │   import transformers                                                                   │
│    390 │                                                                                         │
│ ❱  391 │   _validate_transformers_model_dict(transformers_model)                                 │
│    392 │                                                                                         │
│    393 │   if isinstance(transformers_model, dict):                                              │
│    394 │   │   transformers_model = _TransformersModel.from_dict(**transformers_model)           │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/transformers.py:175  │
│ in _validate_transformers_model_dict                                                             │
│                                                                                                  │
│    172 │   │   │   )                                                                             │
│    173 │   │   model = transformers_model[_MODEL_KEY]                                            │
│    174 │   else:                                                                                 │
│ ❱  175 │   │   model = transformers_model.model                                                  │
│    176 │   if not hasattr(model, "name_or_path"):                                                │
│    177 │   │   raise MlflowException(                                                            │
│    178 │   │   │   f"The submitted model type {type(model).__name__} does not inherit "          │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1269 in __getattr__                                                                              │
│                                                                                                  │
│   1266 │   │   │   modules = self.__dict__['_modules']                                           │
│   1267 │   │   │   if name in modules:                                                           │
│   1268 │   │   │   │   return modules[name]                                                      │
│ ❱ 1269 │   │   raise AttributeError("'{}' object has no attribute '{}'".format(                  │
│   1270 │   │   │   type(self).__name__, name))                                                   │
│   1271 │                                                                                         │
│   1272 │   def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'DistilBertForSequenceClassification' object has no attribute 'model'

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 1>:1                                                                              │
│                                                                                                  │
│ ❱ 1 catalog.save('finetuned_pre_trained_isco_classifier', model)                                 │
│   2                                                                                              │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py:38 │
│ 6 in save                                                                                        │
│                                                                                                  │
│   383 │   │                                                                                      │
│   384 │   │   self._logger.info("Saving data to '%s' (%s)...", name, type(dataset).__name__)     │
│   385 │   │                                                                                      │
│ ❱ 386 │   │   dataset.save(data)                                                                 │
│   387 │                                                                                          │
│   388 │   def exists(self, name: str) -> bool:                                                   │
│   389 │   │   """Checks whether registered data set exists by calling its `exists()`             │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/core.py:614 in     │
│ save                                                                                             │
│                                                                                                  │
│   611 │   │   self._version_cache.clear()                                                        │
│   612 │   │   save_version = self.resolve_save_version()  # Make sure last save version is set   │
│   613 │   │   try:                                                                               │
│ ❱ 614 │   │   │   super().save(data)                                                             │
│   615 │   │   except (FileNotFoundError, NotADirectoryError) as err:                             │
│   616 │   │   │   # FileNotFoundError raised in Win, NotADirectoryError raised in Unix           │
│   617 │   │   │   _default_version = "YYYY-MM-DDThh.mm.ss.sssZ"                                  │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/core.py:222 in     │
│ save                                                                                             │
│                                                                                                  │
│   219 │   │   │   raise                                                                          │
│   220 │   │   except Exception as exc:                                                           │
│   221 │   │   │   message = f"Failed while saving data to data set {str(self)}.\n{str(exc)}"     │
│ ❱ 222 │   │   │   raise DataSetError(message) from exc                                           │
│   223 │                                                                                          │
│   224 │   def __str__(self):                                                                     │
│   225 │   │   def _to_str(obj, is_root=False):                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
DataSetError: Failed while saving data to data set MlflowModelLoggerDataSet(artifact_path=model, 
flavor=mlflow.transformers, load_args={}, save_args={}).
'DistilBertForSequenceClassification' object has no attribute 'model'

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • kedro and kedro-mlflow version used (pip show kedro and pip show kedro-mlflow):
$ pip show kedro
Name: kedro
Version: 0.18.9
Summary: Kedro helps you build production-ready data and analytics pipelines
Home-page: 
Author: Kedro
Author-email: 
License: Apache Software License (Apache 2.0)
Location: /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages
Requires: anyconfig, attrs, build, cachetools, click, cookiecutter, dynaconf, fsspec, gitpython, importlib-metadata, importlib-resources, jmespath, more-itertools, omegaconf, pip-tools, pluggy, PyYAML, rich, rope, setuptools, toml, toposort
Required-by: kedro-datasets, kedro-docker, kedro-mlflow, kedro-viz

$ pip show kedro-mlflow
Name: kedro-mlflow
Version: 0.11.8
Summary: A kedro-plugin to use mlflow in your kedro projects
Home-page: https://github.com/Galileo-Galilei/kedro-mlflow
Author: Yolan Honoré-Rougé
Author-email: 
License: Apache Software License (Apache 2.0)
Location: /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages
Requires: kedro, mlflow, pydantic
Required-by: 

$ pip show mlflow
Name: mlflow
Version: 2.4.1
Summary: MLflow: A Platform for ML Development and Productionization
Home-page: https://mlflow.org/
Author: Databricks
Author-email: 
License: Apache License 2.0
Location: /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages
Requires: alembic, click, cloudpickle, databricks-cli, docker, entrypoints, Flask, gitpython, gunicorn, importlib-metadata, Jinja2, markdown, matplotlib, numpy, packaging, pandas, protobuf, pyarrow, pytz, pyyaml, querystring-parser, requests, scikit-learn, scipy, sqlalchemy, sqlparse
Required-by: kedro-mlflow

  • Python version used (python -V):
    python 3.9.10
  • Operating system and version:
    amazon linux SUSE

Galileo-Galilei (Owner) commented Jun 12, 2023

Hi @hugocool,

I am sorry that you are facing issues. Thank you for the very detailed bug report. The error comes from mlflow itself and you should likely raise the issue there if we cannot fix it. From your example:

from transformers import BertForSequenceClassification, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-large-uncased")

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total # of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=test_dataset            # evaluation dataset
)

trainer.train()

model = trainer.model

import mlflow.transformers

mlflow.transformers.save_model(model, "model/")  # this should raise the same error

According to the mlflow documentation, the "model" argument should be:

"A trained transformers Pipeline or a dictionary that maps required components of a pipeline to the named keys of [“model”, “image_processor”, “tokenizer”, “feature_extractor”]. The model key in the dictionary must map to a value that inherits from PreTrainedModel, TFPreTrainedModel, or FlaxPreTrainedModel."

So this should likely work: catalog.save('finetuned_classifier', {"model": model}). Can you check?
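
As a rough sketch of the dictionary form described in the quoted documentation, the helper below assembles components under the documented keys and rejects any other key. `build_components` is a hypothetical helper, not part of mlflow or kedro-mlflow, and the placeholder objects only make the snippet runnable on its own. A text model will generally also need its tokenizer in the dict so that a pipeline can be rebuilt from the components.

```python
# Key names copied from the documentation quote above.
ALLOWED_KEYS = {"model", "image_processor", "tokenizer", "feature_extractor"}


def build_components(model, **extras):
    """Assemble a components dict using only the keys named in the quoted docs."""
    unknown = set(extras) - ALLOWED_KEYS
    if unknown:
        raise KeyError(f"unsupported component keys: {sorted(unknown)}")
    components = {"model": model}
    # Skip components that were not provided (None) so the dict stays minimal.
    components.update({k: v for k, v in extras.items() if v is not None})
    return components


# Placeholders so the sketch runs standalone; in the setting of this issue they
# would be `trainer.model` and a tokenizer loaded for the same checkpoint.
model = object()
tokenizer = object()

components = build_components(model, tokenizer=tokenizer)
print(sorted(components))

# Assumed usage with the catalog entry from the report (not run here):
# catalog.save("finetuned_classifier", components)
```

Passing the dict through catalog.save relies on the dataset forwarding its first argument unchanged to log_model, which is what the kedro-mlflow frames in the traceback show.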

hugocool (Author) commented Jun 13, 2023

That does not work. If I set the return value of my training node to `dict(model=trainer.model)`, I get:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/transformers.py:1032 │
│ in _build_pipeline_from_model_input                                                              │
│                                                                                                  │
│   1029 │   pipeline_config = model.to_dict()                                                     │
│   1030 │   pipeline_config.update({"task": task})                                                │
│   1031 │   try:                                                                                  │
│ ❱ 1032 │   │   return pipeline(**pipeline_config)                                                │
│   1033 │   except Exception as e:                                                                │
│   1034 │   │   raise MlflowException(                                                            │
│   1035 │   │   │   "The provided model configuration cannot be created as a Pipeline. "          │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/transformers/pipelines/__in │
│ it__.py:868 in pipeline                                                                          │
│                                                                                                  │
│   865 │   │   │   │   tokenizer = config                                                         │
│   866 │   │   │   else:                                                                          │
│   867 │   │   │   │   # Impossible to guess what is the right tokenizer here                     │
│ ❱ 868 │   │   │   │   raise Exception(                                                           │
│   869 │   │   │   │   │   "Impossible to guess which tokenizer to use. "                         │
│   870 │   │   │   │   │   "Please provide a PreTrainedTokenizer class or a path/identifier to    │
│   871 │   │   │   │   )                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Impossible to guess which tokenizer to use. Please provide a PreTrainedTokenizer class or a path/identifier to a pretrained tokenizer.

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/core.py:215 in     │
│ save                                                                                             │
│                                                                                                  │
│   212 │   │                                                                                      │
│   213 │   │   try:                                                                               │
│   214 │   │   │   self._logger.debug("Saving %s", str(self))                                     │
│ ❱ 215 │   │   │   self._save(data)                                                               │
│   216 │   │   except DataSetError:                                                               │
│   217 │   │   │   raise                                                                          │
│   218 │   │   except (FileNotFoundError, NotADirectoryError):                                    │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro_mlflow/io/models/mlfl │
│ ow_model_logger_dataset.py:123 in _save                                                          │
│                                                                                                  │
│   120 │   │   else:                                                                              │
│   121 │   │   │   # if there is no run_id, log in active run                                     │
│   122 │   │   │   # OR open automatically a new run to log                                       │
│ ❱ 123 │   │   │   self._save_model_in_run(model)                                                 │
│   124 │                                                                                          │
│   125 │   def _save_model_in_run(self, model):                                                   │
│   126                                                                                            │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro_mlflow/io/models/mlfl │
│ ow_model_logger_dataset.py:140 in _save_model_in_run                                             │
│                                                                                                  │
│   137 │   │   │   # Otherwise we save using the common workflow where first argument is the      │
│   138 │   │   │   # model object and second is the path.                                         │
│   139 │   │   │   if self._logging_activated:                                                    │
│ ❱ 140 │   │   │   │   self._mlflow_model_module.log_model(                                       │
│   141 │   │   │   │   │   model, self._artifact_path, **self._save_args                          │
│   142 │   │   │   │   )                                                                          │
│   143                                                                                            │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/utils/docstring_util │
│ s.py:235 in version_func                                                                         │
│                                                                                                  │
│   232 │   │   │   installed_version = Version(get_distribution(module_key).version)              │
│   233 │   │   │   if installed_version < Version(min_ver) or installed_version > Version(max_v   │
│   234 │   │   │   │   warnings.warn(notice, category=FutureWarning, stacklevel=2)                │
│ ❱ 235 │   │   │   return func(*args, **kwargs)                                                   │
│   236 │   │                                                                                      │
│   237 │   │   version_func.__doc__ = (                                                           │
│   238 │   │   │   "    .. Note:: " + notice + "\n" * 2 + func.__doc__ if func.__doc__ else not   │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/transformers.py:729  │
│ in log_model                                                                                     │
│                                                                                                  │
│    726 │   │   │   │   │   │   │   │   │   │   │    release without warning.                     │
│    727 │   :param kwargs: Additional arguments for :py:class:`mlflow.models.model.Model`         │
│    728 │   """                                                                                   │
│ ❱  729 │   return Model.log(                                                                     │
│    730 │   │   artifact_path=artifact_path,                                                      │
│    731 │   │   flavor=mlflow.transformers,                                                       │
│    732 │   │   registered_model_name=registered_model_name,                                      │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/models/model.py:562  │
│ in log                                                                                           │
│                                                                                                  │
│   559 │   │   │   local_path = tmp.path("model")                                                 │
│   560 │   │   │   run_id = mlflow.tracking.fluent._get_or_start_run().info.run_id                │
│   561 │   │   │   mlflow_model = cls(artifact_path=artifact_path, run_id=run_id, metadata=meta   │
│ ❱ 562 │   │   │   flavor.save_model(path=local_path, mlflow_model=mlflow_model, **kwargs)        │
│   563 │   │   │   mlflow.tracking.fluent.log_artifacts(local_path, mlflow_model.artifact_path)   │
│   564 │   │   │   tracking_uri = _resolve_tracking_uri()                                         │
│   565 │   │   │   if (                                                                           │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/utils/docstring_util │
│ s.py:235 in version_func                                                                         │
│                                                                                                  │
│   232 │   │   │   installed_version = Version(get_distribution(module_key).version)              │
│   233 │   │   │   if installed_version < Version(min_ver) or installed_version > Version(max_v   │
│   234 │   │   │   │   warnings.warn(notice, category=FutureWarning, stacklevel=2)                │
│ ❱ 235 │   │   │   return func(*args, **kwargs)                                                   │
│   236 │   │                                                                                      │
│   237 │   │   version_func.__doc__ = (                                                           │
│   238 │   │   │   "    .. Note:: " + notice + "\n" * 2 + func.__doc__ if func.__doc__ else not   │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/transformers.py:407  │
│ in save_model                                                                                    │
│                                                                                                  │
│    404 │   resolved_task = _get_or_infer_task_type(transformers_model, task)                     │
│    405 │                                                                                         │
│    406 │   if not isinstance(transformers_model, transformers.Pipeline):                         │
│ ❱  407 │   │   built_pipeline = _build_pipeline_from_model_input(transformers_model, resolved_t  │
│    408 │   else:                                                                                 │
│    409 │   │   built_pipeline = transformers_model                                               │
│    410                                                                                           │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/mlflow/transformers.py:1034 │
│ in _build_pipeline_from_model_input                                                              │
│                                                                                                  │
│   1031 │   try:                                                                                  │
│   1032 │   │   return pipeline(**pipeline_config)                                                │
│   1033 │   except Exception as e:                                                                │
│ ❱ 1034 │   │   raise MlflowException(                                                            │
│   1035 │   │   │   "The provided model configuration cannot be created as a Pipeline. "          │
│   1036 │   │   │   "Please verify that all required and compatible components are "              │
│   1037 │   │   │   "specified with the correct keys.",                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
MlflowException: The provided model configuration cannot be created as a Pipeline. Please verify that all required and compatible components are specified with 
the correct keys.

The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/ec2-user/ISCO-Classification/.venv/bin/kedro:8 in <module>                                 │
│                                                                                                  │
│   5 from kedro.framework.cli import main                                                         │
│   6 if __name__ == "__main__":                                                                   │
│   7 │   sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/framework/cli/cli.py: │
│ 211 in main                                                                                      │
│                                                                                                  │
│   208 │   """                                                                                    │
│   209 │   _init_plugins()                                                                        │
│   210 │   cli_collection = KedroCLI(project_path=Path.cwd())                                     │
│ ❱ 211 │   cli_collection()                                                                       │
│   212                                                                                            │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/click/core.py:1130 in       │
│ __call__                                                                                         │
│                                                                                                  │
│   1127 │                                                                                         │
│   1128 │   def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:                           │
│   1129 │   │   """Alias for :meth:`main`."""                                                     │
│ ❱ 1130 │   │   return self.main(*args, **kwargs)                                                 │
│   1131                                                                                           │
│   1132                                                                                           │
│   1133 class Command(BaseCommand):                                                               │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/framework/cli/cli.py: │
│ 139 in main                                                                                      │
│                                                                                                  │
│   136 │   │   )                                                                                  │
│   137 │   │                                                                                      │
│   138 │   │   try:                                                                               │
│ ❱ 139 │   │   │   super().main(                                                                  │
│   140 │   │   │   │   args=args,                                                                 │
│   141 │   │   │   │   prog_name=prog_name,                                                       │
│   142 │   │   │   │   complete_var=complete_var,                                                 │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/click/core.py:1055 in main  │
│                                                                                                  │
│   1052 │   │   try:                                                                              │
│   1053 │   │   │   try:                                                                          │
│   1054 │   │   │   │   with self.make_context(prog_name, args, **extra) as ctx:                  │
│ ❱ 1055 │   │   │   │   │   rv = self.invoke(ctx)                                                 │
│   1056 │   │   │   │   │   if not standalone_mode:                                               │
│   1057 │   │   │   │   │   │   return rv                                                         │
│   1058 │   │   │   │   │   # it's not safe to `ctx.exit(rv)` here!                               │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/click/core.py:1657 in       │
│ invoke                                                                                           │
│                                                                                                  │
│   1654 │   │   │   │   super().invoke(ctx)                                                       │
│   1655 │   │   │   │   sub_ctx = cmd.make_context(cmd_name, args, parent=ctx)                    │
│   1656 │   │   │   │   with sub_ctx:                                                             │
│ ❱ 1657 │   │   │   │   │   return _process_result(sub_ctx.command.invoke(sub_ctx))               │
│   1658 │   │                                                                                     │
│   1659 │   │   # In chain mode we create the contexts step by step, but after the                │
│   1660 │   │   # base command has been invoked.  Because at that point we do not                 │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/click/core.py:1404 in       │
│ invoke                                                                                           │
│                                                                                                  │
│   1401 │   │   │   echo(style(message, fg="red"), err=True)                                      │
│   1402 │   │                                                                                     │
│   1403 │   │   if self.callback is not None:                                                     │
│ ❱ 1404 │   │   │   return ctx.invoke(self.callback, **ctx.params)                                │
│   1405 │                                                                                         │
│   1406 │   def shell_complete(self, ctx: Context, incomplete: str) -> t.List["CompletionItem"]:  │
│   1407 │   │   """Return a list of completions for the incomplete value. Looks                   │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/click/core.py:760 in invoke │
│                                                                                                  │
│    757 │   │                                                                                     │
│    758 │   │   with augment_usage_errors(__self):                                                │
│    759 │   │   │   with ctx:                                                                     │
│ ❱  760 │   │   │   │   return __callback(*args, **kwargs)                                        │
│    761 │                                                                                         │
│    762 │   def forward(                                                                          │
│    763 │   │   __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any  # noqa: B902             │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/src/isco_classification/cli.py:117 in run                     │
│                                                                                                  │
│   114 │   with KedroSession.create(env=env, extra_params=params) as session:                     │
│   115 │   │   context = session.load_context()                                                   │
│   116 │   │   runner_instance = _instantiate_runner(runner, is_async, context)                   │
│ ❱ 117 │   │   session.run(                                                                       │
│   118 │   │   │   tags=tag,                                                                      │
│   119 │   │   │   runner=runner_instance,                                                        │
│   120 │   │   │   node_names=node_names,                                                         │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/framework/session/ses │
│ sion.py:425 in run                                                                               │
│                                                                                                  │
│   422 │   │   )                                                                                  │
│   423 │   │                                                                                      │
│   424 │   │   try:                                                                               │
│ ❱ 425 │   │   │   run_result = runner.run(                                                       │
│   426 │   │   │   │   filtered_pipeline, catalog, hook_manager, session_id                       │
│   427 │   │   │   )                                                                              │
│   428 │   │   │   self._run_called = True                                                        │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/runner/runner.py:92   │
│ in run                                                                                           │
│                                                                                                  │
│    89 │   │   │   self._logger.info(                                                             │
│    90 │   │   │   │   "Asynchronous mode is enabled for loading and saving data"                 │
│    91 │   │   │   )                                                                              │
│ ❱  92 │   │   self._run(pipeline, catalog, hook_manager, session_id)                             │
│    93 │   │                                                                                      │
│    94 │   │   self._logger.info("Pipeline execution completed successfully.")                    │
│    95                                                                                            │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/runner/sequential_run │
│ ner.py:70 in _run                                                                                │
│                                                                                                  │
│   67 │   │                                                                                       │
│   68 │   │   for exec_index, node in enumerate(nodes):                                           │
│   69 │   │   │   try:                                                                            │
│ ❱ 70 │   │   │   │   run_node(node, catalog, hook_manager, self._is_async, session_id)           │
│   71 │   │   │   │   done_nodes.add(node)                                                        │
│   72 │   │   │   except Exception:                                                               │
│   73 │   │   │   │   self._suggest_resume_scenario(pipeline, done_nodes, catalog)                │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/runner/runner.py:320  │
│ in run_node                                                                                      │
│                                                                                                  │
│   317 │   if is_async:                                                                           │
│   318 │   │   node = _run_node_async(node, catalog, hook_manager, session_id)                    │
│   319 │   else:                                                                                  │
│ ❱ 320 │   │   node = _run_node_sequential(node, catalog, hook_manager, session_id)               │
│   321 │                                                                                          │
│   322 │   for name in node.confirms:                                                             │
│   323 │   │   catalog.confirm(name)                                                              │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/runner/runner.py:436  │
│ in _run_node_sequential                                                                          │
│                                                                                                  │
│   433 │                                                                                          │
│   434 │   for name, data in items:                                                               │
│   435 │   │   hook_manager.hook.before_dataset_saved(dataset_name=name, data=data, node=node)    │
│ ❱ 436 │   │   catalog.save(name, data)                                                           │
│   437 │   │   hook_manager.hook.after_dataset_saved(dataset_name=name, data=data, node=node)     │
│   438 │   return node                                                                            │
│   439                                                                                            │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/data_catalog.py:38 │
│ 6 in save                                                                                        │
│                                                                                                  │
│   383 │   │                                                                                      │
│   384 │   │   self._logger.info("Saving data to '%s' (%s)...", name, type(dataset).__name__)     │
│   385 │   │                                                                                      │
│ ❱ 386 │   │   dataset.save(data)                                                                 │
│   387 │                                                                                          │
│   388 │   def exists(self, name: str) -> bool:                                                   │
│   389 │   │   """Checks whether registered data set exists by calling its `exists()`             │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/core.py:614 in     │
│ save                                                                                             │
│                                                                                                  │
│   611 │   │   self._version_cache.clear()                                                        │
│   612 │   │   save_version = self.resolve_save_version()  # Make sure last save version is set   │
│   613 │   │   try:                                                                               │
│ ❱ 614 │   │   │   super().save(data)                                                             │
│   615 │   │   except (FileNotFoundError, NotADirectoryError) as err:                             │
│   616 │   │   │   # FileNotFoundError raised in Win, NotADirectoryError raised in Unix           │
│   617 │   │   │   _default_version = "YYYY-MM-DDThh.mm.ss.sssZ"                                  │
│                                                                                                  │
│ /home/ec2-user/ISCO-Classification/.venv/lib/python3.9/site-packages/kedro/io/core.py:222 in     │
│ save                                                                                             │
│                                                                                                  │
│   219 │   │   │   raise                                                                          │
│   220 │   │   except Exception as exc:                                                           │
│   221 │   │   │   message = f"Failed while saving data to data set {str(self)}.\n{str(exc)}"     │
│ ❱ 222 │   │   │   raise DataSetError(message) from exc                                           │
│   223 │                                                                                          │
│   224 │   def __str__(self):                                                                     │
│   225 │   │   def _to_str(obj, is_root=False):                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
DataSetError: Failed while saving data to data set MlflowModelLoggerDataSet(artifact_path=model, flavor=mlflow.transformers, load_args={}, save_args={}).
The provided model configuration cannot be created as a Pipeline. Please verify that all required and compatible components are specified with the correct keys.

What does work is packaging the model as a transformers Pipeline.
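
A minimal sketch of that workaround, assuming the fine-tuned model and its tokenizer are both available in the node that produces `finetuned_classifier`. The function name `package_as_pipeline` and the `text-classification` task are illustrative assumptions, not taken from the project:

```python
def package_as_pipeline(model, tokenizer, task="text-classification"):
    """Hypothetical Kedro node: wrap a fine-tuned model in a transformers Pipeline.

    The returned Pipeline is what the node would hand to the catalog entry
    that uses the mlflow.transformers flavor, instead of the bare model.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    # `task` is an assumption; pick the task that matches the model head.
    return pipeline(task=task, model=model, tokenizer=tokenizer)
```

Returning the Pipeline rather than the bare `BertForSequenceClassification` object means `save_model` takes the branch that skips `_build_pipeline_from_model_input`, which is where the traceback above fails.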

@Galileo-Galilei
Owner

Indeed a pipeline is supposed to work 👍
"A trained transformers Pipeline or a dictionary that maps required components of a pipeline to the named keys of [“model”, “image_processor”, “tokenizer”, “feature_extractor”]. The model key in the dictionary must map to a value that inherits from PreTrainedModel, TFPreTrainedModel, or FlaxPreTrainedModel.".

Looking at the MLflow docs, it seems that your dictionary needs a tokenizer key in addition to the model key.
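
A rough sketch of the dictionary form, for illustration only. The helper names `build_components` and `log_components` are hypothetical, and passing `task` explicitly is an assumption that may or may not be needed for a locally fine-tuned model:

```python
# Keys named in the MLflow documentation quoted above.
ALLOWED_KEYS = {"model", "image_processor", "tokenizer", "feature_extractor"}


def build_components(model, tokenizer):
    """Map a fine-tuned model and its tokenizer to the keys MLflow expects."""
    if model is None or tokenizer is None:
        raise ValueError("both a model and a tokenizer are required")
    components = {"model": model, "tokenizer": tokenizer}
    # Guard against typos in the keys, which MLflow would otherwise reject later.
    assert set(components) <= ALLOWED_KEYS
    return components


def log_components(components, artifact_path="model", task="text-classification"):
    """Log the components dict with the mlflow.transformers flavor."""
    # Imported lazily so the helper above stays usable without mlflow installed.
    import mlflow.transformers

    return mlflow.transformers.log_model(
        transformers_model=components,
        artifact_path=artifact_path,
        task=task,  # assumption: set explicitly instead of relying on inference
    )
```

In a Kedro project the node would return the dictionary from `build_components` and leave the logging to the catalog entry; `log_components` only shows the equivalent direct MLflow call.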

If that is OK with you, I'll close the issue!

@hugocool
Author

Okay.

Yeah, it might be helpful for others to have an examples section in kedro-mlflow covering the different model flavors. Transformers could be added there as well.
Working with transformers is also a little more involved than with most other packages because of its own packaging ecosystem (the Trainer, PreTrainedModel and Pipeline),
especially if you want to do things like reloading models, multi-step fine-tuning, HPO, and/or cross-validation on separate test sets.

One more thing I would like to add, which was also mentioned in the Kedro Slack, is the way the MetricsDataSet works. I would expect to be able to log a dict directly, but instead I need to create a separate MetricDataSet for each metric and return the metrics as a list.
The docs aren't very clear on this, and the MetricsDataSet goes against my expectations (at least for me), mainly because logging multiple metrics (for example loss, accuracy, precision, F1, etc.) is the much more common use case, and it now requires quite an ugly solution, IMHO.

@Galileo-Galilei
Owner

I am closing the issue because there is another one open about MLflowMetricsDataset (#440) and the original issue is solved.

I will accept documentation PRs, but it is not possible for me to document all these specific behaviours, which are pure MLflow and much better covered in its own documentation.
