[2022-08-12 00:02:08,729] {taskinstance.py:1909} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1471, in _run_raw_task
    self._execute_task_with_callbacks(context, test_mode)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1575, in _execute_task_with_callbacks
    task_orig = self.render_templates(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2232, in render_templates
    rendered_task = self.task.render_template_fields(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1188, in render_template_fields
    self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 71, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/abstractoperator.py", line 341, in _do_render_template_fields
    if not value:
  File "/usr/local/lib/python3.9/site-packages/pandas/core/generic.py", line 1537, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
A ValueError() was raised.
The error seems to come from the pandas package, and the DAG file that caused this error does use a pandas DataFrame:
from datetime import datetime, timedelta
from pandas import DataFrame
from typing import Any, Dict
import yaml
...
def xcom_to_list(xcom, cols: Dict[str, str]) -> list:
    """
    Take xcom and return a list of dictionaries.

    Parameter:
        xcom: Data from upstream task.
        cols (dict): Dictionary of column name mapped with old column name.

    Return:
        results (list)
    """
    import pandas as pd
    df = pd.DataFrame(xcom)[cols.values()]
    df.columns = cols.keys()
    results = df.to_dict(orient="records")
    return results
...
It looks like the ValueError arose because a pandas DataFrame was given as the value, and Python tried to evaluate
if not value:
which raised: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I would like to see if this can be fixed, so that OpenLineage avoids this error when an Airflow DAG uses a pandas DataFrame.
>>> from pandas import DataFrame
>>> df = DataFrame()
>>> if not df:
...     print("hey")
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The simple test above shows that a DataFrame is not compatible with an if not value check.
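For illustration, one way a truthiness check could tolerate such objects is sketched below. This is only a sketch, not Airflow's or OpenLineage's actual code: AmbiguousTruth is a hypothetical stand-in that mimics how a DataFrame raises on truth testing (so the example runs without pandas), and is_falsy is a hypothetical helper name.

```python
class AmbiguousTruth:
    """Hypothetical stand-in for pandas.DataFrame: truth testing raises ValueError."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __bool__(self):
        # Mimics the pandas behaviour shown in the traceback.
        raise ValueError("The truth value of a DataFrame is ambiguous.")

    @property
    def empty(self):
        # pandas objects expose a boolean .empty attribute; this copies that idea.
        return len(self.rows) == 0


def is_falsy(value):
    """Hypothetical helper: a truthiness check that does not raise on DataFrame-like objects."""
    empty = getattr(value, "empty", None)
    if isinstance(empty, bool):
        # Prefer an explicit emptiness flag when the object provides one.
        return empty
    try:
        return not value
    except ValueError:
        # Truth value is ambiguous: treat the value as present rather than crash.
        return False
```

With this helper, is_falsy(AmbiguousTruth([])) returns True and is_falsy(AmbiguousTruth([1])) returns False, while ordinary values such as "" or None behave as they would under a plain not check.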
Hey @howardyoo,
as you correctly pointed out, the render_templates method is called during task execution, which makes this an Airflow-specific issue. Pandas DataFrames are certainly not supported in Airflow XComs (I'm not really sure what the connection is between the xcom_to_list method and rendered template fields). However, the Airflow code snippet shows that you cannot set a Pandas DataFrame (or anything that raises an exception on the if not value check) as a template field in an Airflow Operator.
Could you share an example of the Pandas DataFrame being used as a template field, and explain how xcom_to_list is used?
Also, just my two cents, but it may be better to wrap the render_templates() call in a try/except block, so that if rendering fails for some reason we can capture the error and even populate the error message as an errorFacet, for more graceful error handling.
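A minimal sketch of that suggestion follows. It is an illustration under stated assumptions, not the listener's real code: FakeTaskInstance is a hypothetical stand-in for an Airflow task instance, safe_render is a hypothetical helper name, and the returned dictionary only gestures at what an error facet might carry.

```python
from typing import Any, Dict, Optional


class FakeTaskInstance:
    """Hypothetical stand-in for an Airflow TaskInstance."""

    def __init__(self, fail: bool):
        self.fail = fail

    def render_templates(self) -> str:
        if self.fail:
            # Simulates the failure seen in the traceback.
            raise ValueError("The truth value of a DataFrame is ambiguous.")
        return "rendered-task"


def safe_render(task_instance: Any) -> Dict[str, Optional[str]]:
    """Render templates, capturing any failure instead of letting it propagate."""
    try:
        task = task_instance.render_templates()
        return {"task": task, "error": None}
    except Exception as exc:
        # Keep the message so it could later be attached to an error facet.
        return {"task": None, "error": f"{type(exc).__name__}: {exc}"}
```

Calling safe_render(FakeTaskInstance(fail=True)) returns a dictionary whose error entry holds the exception type and message, so the caller can continue and report the failure rather than abort the task.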
We have a code change in 0.11.0 (https://github.com/OpenLineage/OpenLineage/pull/870/files) which modified
integration/airflow/openlineage/airflow/listener.py
The exception trace shown above suggests that, when the listener called
render_templates()
on the task instance, a ValueError() was raised in this part of the Airflow code:
https://github.com/apache/airflow/blob/fe0f903f750d79cdb7ae0cfe105397eedc9af141/airflow/models/abstractoperator.py#L360