Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hera fails when a script receives list of pydantic objects as list of expressions #827

Closed
2 of 4 tasks
halilbilgin opened this issue Nov 1, 2023 · 6 comments
Closed
2 of 4 tasks
Labels
semver:patch A change requiring a patch version bump type:bug A general bug

Comments

@halilbilgin
Copy link

halilbilgin commented Nov 1, 2023

Pre-bug-report checklist

1. This bug can be reproduced using YAML

2. This bug occurs when...

  • running Hera code without submitting to Argo (e.g. when exporting to YAML)
  • running Hera code and submitting to Argo

Bug report

Hera script input serialization fails when a script receives a list of pydantic objects as expressions
x_dataset = x_dataset_config().result
y_dataset = y_dataset_config().result

# this doesn't work
train_model(name="train-model-fails", arguments={"dataset_configs": [x_dataset, y_dataset]}).result

Where x_dataset_config, y_dataset_config and train_model are script callables, train-model step receives the following error while serializing the input:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/site-packages/hera/workflows/runner.py", line 313, in <module>
    _run()
  File "/usr/local/lib/python3.9/site-packages/hera/workflows/runner.py", line 306, in _run
    result = _runner(args.entrypoint, kwargs_list)
  File "/usr/local/lib/python3.9/site-packages/hera/workflows/runner.py", line 280, in _runner
    return function(**kwargs)
  File "/usr/local/lib/python3.9/site-packages/hera/workflows/runner.py", line 37, in inner
    return f(**filtered_kwargs)
  File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
  File "pydantic/decorator.py", line 133, in pydantic.decorator.ValidatedFunction.call
  File "pydantic/decorator.py", line 130, in pydantic.decorator.ValidatedFunction.init_model_instance
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for TrainModel
dataset_configs
  value is not a valid list (type=type_error.list)

The value of the input parameter ranking_dataset_configs train_model receives is as follows

["{"filename": "x"}", "{"filename": "y"}"]

Describe the bug
Double quotes in the expression values are not escaped in Argo, which causes an issue.
Above, train_model script receives a list of configs which is equivalent to ["{{steps.x-dataset-config.outputs.result}}", "{{steps.y-dataset-config.outputs.result}}"]. However, Argo doesn't escape the quotes in the expression values so the evaluated list becomes:
["{"filename": "x"}", "{"filename": "y"}"] so, hera fails to serialize this since this is not a valid json dump.

To Reproduce
Full Hera code to reproduce the bug:

from hera.workflows import WorkflowsService, Container, Parameter, Workflow, script, Steps, Script
import json
from pydantic import BaseModel
from typing import List
from hera.shared import global_config

global_config.set_class_defaults(Script, constructor="runner")
global_config.experimental_features["script_runner"] = True
global_config.image = "hbilgin123/example_hera"
global_config.script_command = ["python"]

class DatasetConfig(BaseModel):
    filename: str


@script(constructor="runner")
def x_dataset_config() -> DatasetConfig:
    return DatasetConfig(filename="x")


@script(constructor="runner")
def y_dataset_config() -> DatasetConfig:
    return DatasetConfig(filename="y")

@script(constructor="runner")
def escape_quotes(data: dict) -> str:
    return json.dumps(data).replace('"', '\\"')


@script(constructor="runner")
def train_model_fails(dataset_configs: List[DatasetConfig]) -> str:
    return "Some model training has been done here"


@script(constructor="runner")
def train_model_works(dataset_configs: List[str]) -> str:
    processed_configs = [DatasetConfig.parse_raw(raw_config) for raw_config in dataset_configs]
    return "Some model training has been done here"


with Workflow(
    generate_name="arguments-parameters-",
    entrypoint="try",
    namespace="argo",
    workflows_service=WorkflowsService(host="https://127.0.0.1:2746", verify_ssl=False),
) as w:
    with Steps(name="try") as s:
        x_dataset = x_dataset_config().result
        y_dataset = y_dataset_config().result
        escaped_datasets = [escape_quotes(name='escape-x', arguments={'data': x_dataset}).result, escape_quotes(name='escape-y', arguments={'data': y_dataset}).result]

        with s.parallel():
            # this works
            train_model_works(name="train-model-works", arguments={"dataset_configs": escaped_datasets}).result

            # this doesn't work
            train_model_fails(name="train-model-fails-1", arguments={"dataset_configs": [x_dataset, y_dataset]}).result

            # this doesn't work neither
            train_model_fails(name="train-model-fails-2", arguments={"dataset_configs": escaped_datasets}).result

Expected behavior
The train-model-fails-1 step above could have worked without any additional change.

My current workaround
Escaping double quotes and serializing dataset_configs manually as it is shown in the example above (i.e. train-model-works step)

Environment

  • OS: Windows, WSL2 (minikube)
  • Browser: Chrome
  • Version of Hera: 5.9.0
  • Version of Argo: 3.4.5

Additional context
Add any other context about the problem here.

@halilbilgin
Copy link
Author

halilbilgin commented Nov 1, 2023

Yaml template

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: arguments-parameters-
  name: arguments-parameters-h5tr8
  namespace: argo
spec:
  entrypoint: try
  templates:
  - name: try
    steps:
    - - name: x-dataset-config
        template: x-dataset-config
    - - name: y-dataset-config
        template: y-dataset-config
    - - arguments:
          parameters:
          - name: data
            value: '{{steps.x-dataset-config.outputs.result}}'
        name: escape-x
        template: escape-quotes
    - - arguments:
          parameters:
          - name: data
            value: '{{steps.y-dataset-config.outputs.result}}'
        name: escape-y
        template: escape-quotes
    - - arguments:
          parameters:
          - name: dataset_configs
            value: '["{{steps.escape-x.outputs.result}}", "{{steps.escape-y.outputs.result}}"]'
        name: train-model-works
        template: train-model-works
      - arguments:
          parameters:
          - name: dataset_configs
            value: '["{{steps.x-dataset-config.outputs.result}}", "{{steps.y-dataset-config.outputs.result}}"]'
        name: train-model-fails-1
        template: train-model-fails
      - arguments:
          parameters:
          - name: dataset_configs
            value: '["{{steps.escape-x.outputs.result}}", "{{steps.escape-y.outputs.result}}"]'
        name: train-model-fails-2
        template: train-model-fails
  - name: x-dataset-config
    script:
      args:
      - -m
      - hera.workflows.runner
      - -e
      - example_hera.callable_list_parameter_passing:x_dataset_config
      command:
      - python
      image: hbilgin123/example_hera
      source: '{{inputs.parameters}}'
  - name: y-dataset-config
    script:
      args:
      - -m
      - hera.workflows.runner
      - -e
      - example_hera.callable_list_parameter_passing:y_dataset_config
      command:
      - python
      image: hbilgin123/example_hera
      source: '{{inputs.parameters}}'
  - inputs:
      parameters:
      - name: data
    name: escape-quotes
    script:
      args:
      - -m
      - hera.workflows.runner
      - -e
      - example_hera.callable_list_parameter_passing:escape_quotes
      command:
      - python
      image: hbilgin123/example_hera
      source: '{{inputs.parameters}}'
  - inputs:
      parameters:
      - name: dataset_configs
    name: train-model-works
    script:
      args:
      - -m
      - hera.workflows.runner
      - -e
      - example_hera.callable_list_parameter_passing:train_model_works
      command:
      - python
      image: hbilgin123/example_hera
      source: '{{inputs.parameters}}'
  - inputs:
      parameters:
      - name: dataset_configs
    name: train-model-fails
    script:
      args:
      - -m
      - hera.workflows.runner
      - -e
      - example_hera.callable_list_parameter_passing:train_model_fails
      command:
      - python
      image: hbilgin123/example_hera
      source: '{{inputs.parameters}}'

This is how the failed step looks on Argo UI
image

@halilbilgin halilbilgin changed the title Hera fails when a script receives lists pydantic objects as expressions Hera fails when a script receives list of pydantic objects as expressions Nov 1, 2023
@halilbilgin halilbilgin changed the title Hera fails when a script receives list of pydantic objects as expressions Hera fails when a script receives list of pydantic objects as list of expressions Nov 1, 2023
@halilbilgin
Copy link
Author

Updated the example above with the docker image published to docker hub.

@flaviuvadan flaviuvadan added semver:patch A change requiring a patch version bump type:bug A general bug labels Nov 15, 2023
@flaviuvadan
Copy link
Collaborator

@halilbilgin thanks for reporting this!

@elliotgunton any immediate thoughts here?

@flaviuvadan
Copy link
Collaborator

@elliotgunton ping

@elliotgunton
Copy link
Collaborator

Need to play around with some examples, but initial thought is figuring out that when the user has arguments like

arguments={"dataset_configs": [x_dataset, y_dataset]}

That get converted to:

value: '["{{steps.escape-x.outputs.result}}", "{{steps.escape-y.outputs.result}}"]'

What we actually want (I think) is without quotes:

value: '[{{steps.escape-x.outputs.result}}, {{steps.escape-y.outputs.result}}]'

So probably we want to consider something like whether the parameter is in a list, then do we want it to be "stringified"?

@elliotgunton
Copy link
Collaborator

The root issue is a dupe of #1146. Closing this as I'm consolidating the "loader" issues into one (#1166)

@elliotgunton elliotgunton closed this as not planned Won't fix, can't repro, duplicate, stale Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver:patch A change requiring a patch version bump type:bug A general bug
Projects
None yet
Development

No branches or pull requests

3 participants