Bug: Pandas ValueError occurs during task_instance.render_template() #1017

Closed
howardyoo opened this issue Aug 17, 2022 · 3 comments · Fixed by #1028

Comments

@howardyoo
Contributor

A code change in 0.11.0 (https://github.com/OpenLineage/OpenLineage/pull/870/files) added the following
to integration/airflow/openlineage/airflow/listener.py:
[image]

Here is the relevant part of the exception trace:

[2022-08-12 00:02:08,729] {taskinstance.py:1909} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1471, in _run_raw_task
    self._execute_task_with_callbacks(context, test_mode)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1575, in _execute_task_with_callbacks
    task_orig = self.render_templates(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2232, in render_templates
    rendered_task = self.task.render_template_fields(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1188, in render_template_fields
    self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 71, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/abstractoperator.py", line 341, in _do_render_template_fields
    if not value:
  File "/usr/local/lib/python3.9/site-packages/pandas/core/generic.py", line 1537, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

It looks like the ValueError is raised when the listener calls render_templates() on the task instance,
in this part of the Airflow code:
https://github.com/apache/airflow/blob/fe0f903f750d79cdb7ae0cfe105397eedc9af141/airflow/models/abstractoperator.py#L360

The error itself comes from the pandas package. The DAG file that triggered it does use a pandas DataFrame:

from datetime import datetime, timedelta
from pandas import DataFrame
from typing import Any, Dict
import yaml
...
def xcom_to_list(xcom, cols: Dict[str, str]) -> list:
    """
    Take xcom and return a list of dictionaries.
        Parameter:
            xcom: Data from upstream task.
            cols (dict): Dictionary of column name mapped with old column name.
        Return:
            results (list)
    """
    import pandas as pd

    df = pd.DataFrame(xcom)[cols.values()]
    df.columns = cols.keys()
    results = df.to_dict(orient="records")

    return results
...

The error appears to occur because a pandas DataFrame is passed as the value, and Python then evaluates

    if not value:

A DataFrame refuses to be evaluated as a boolean, so the check raises: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Could this be fixed so that OpenLineage does not fail when an Airflow DAG uses a pandas DataFrame?
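For illustration, here is a minimal sketch of the failure mode and of one way an emptiness check could tolerate such values. To stay self-contained it does not import pandas: AmbiguousTruth is a hypothetical stand-in that, like a DataFrame, raises on implicit bool() and exposes an `empty` attribute. The helpers is_blank and fields_to_render are also made up for this example; this is not the Airflow code and not the change that was eventually merged.

```python
class AmbiguousTruth:
    """Hypothetical stand-in for pandas.DataFrame: refuses implicit bool()."""

    def __init__(self, rows):
        self.rows = list(rows)

    @property
    def empty(self):
        # Mirrors the idea of DataFrame.empty: True when there is no data.
        return not self.rows

    def __bool__(self):
        raise ValueError("The truth value of a AmbiguousTruth is ambiguous.")


def is_blank(value):
    """Return True for None or empty values without calling bool() on
    objects that expose a boolean `empty` attribute."""
    if value is None:
        return True
    empty = getattr(value, "empty", None)
    if isinstance(empty, bool):
        return empty
    # Ordinary values (str, list, dict, ...) support plain truthiness.
    return not value


def fields_to_render(fields):
    """Keep only the non-blank fields, as a skip-if-empty loop would."""
    return [(name, value) for name, value in fields.items() if not is_blank(value)]
```

With this check, a field holding the stand-in object is kept or skipped according to its `empty` attribute instead of aborting the whole loop with a ValueError.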

@howardyoo howardyoo added area:integration/airflow openlineage-airflow kind:bug Something isn't working labels Aug 17, 2022
@howardyoo howardyoo assigned howardyoo and mobuchowski and unassigned howardyoo Aug 17, 2022
@howardyoo
Contributor Author

>>> from pandas import DataFrame
>>> df = DataFrame()
>>> if not df:
...   print("hey")
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

As the session above shows, it is easy to test whether a DataFrame is compatible with the "if not value" check: it is not.

@JDarDagran
Contributor

Hey @howardyoo,
As you correctly pointed out, the render_templates method is called during task execution, which makes this an Airflow-specific issue. Pandas DataFrames are certainly not supported in Airflow XComs (and I'm not sure what the connection is between the xcom_to_list function and the rendered template fields). In any case, the Airflow code snippet shows that you cannot set a Pandas DataFrame (or anything else that raises an exception on the "if not value" check) as a template field in an Airflow Operator.

Could you elaborate on how a Pandas DataFrame ends up as a template field, and on how xcom_to_list is used?

@JDarDagran JDarDagran assigned JDarDagran and unassigned mobuchowski Aug 18, 2022
@howardyoo
Contributor Author

Also, just my two cents: it may be better to wrap the render_templates() call in a try/except block, so that if rendering fails for any reason we can catch the error and even report the error message as an errorFacet, for more graceful error handling.
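A rough sketch of that idea, under stated assumptions: FakeTaskInstance is a hypothetical stand-in for an Airflow task instance, render_safely is a made-up helper, and the returned dictionary (its "message" and "stackTrace" keys) is only an illustrative shape, not the actual OpenLineage facet schema or listener code.

```python
import traceback


class FakeTaskInstance:
    """Hypothetical stand-in for an Airflow TaskInstance."""

    def __init__(self, fail):
        self.fail = fail
        self.rendered = False

    def render_templates(self):
        if self.fail:
            raise ValueError("The truth value of a DataFrame is ambiguous.")
        self.rendered = True


def render_safely(task_instance):
    """Render templates; return None on success, or a dict describing the
    error (illustrative shape only) instead of letting it propagate."""
    try:
        task_instance.render_templates()
    except Exception as exc:
        return {
            "message": str(exc),
            "stackTrace": traceback.format_exc(),
        }
    return None
```

The caller can then decide whether to attach the returned error details to the emitted event or simply log them, while the task itself keeps running.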
