So apply_defaults throws an error if any args or kwargs which are explicitly mentioned in the function signature are empty (i.e. None). Since GreatExpectationsBigQueryOperator names expectation_suite_name as a positional arg (not variable, i.e. hidden in *args), it becomes mandatory in the function call.
The problem is when I want to use checkpoints, which have this check in the GreatExpectationsOperator:
# Check that only the correct args to validate are passed
# this doesn't cover the case where only one of expectation_suite_name or batch_kwargs is specified
# along with one of the others, but I'm ok with just giving precedence to the correct one
if sum(bool(x) for x in [(expectation_suite_name and batch_kwargs), assets_to_validate, checkpoint_name]) != 1:
    raise ValueError("Exactly one of expectation_suite_name + batch_kwargs, assets_to_validate, \
     or checkpoint_name is required to run validation.")
As a result, the apply_defaults() call on GreatExpectationsBigQueryOperator forces me to pass expectation_suite_name, but the GreatExpectationsOperator requires me to name only the checkpoint_name.
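To make the counting concrete, here is a minimal sketch of the same check as a standalone function. The function name `count_validation_modes` and the sample values are made up for illustration; only the expression inside `sum(...)` is taken from the check quoted above.

```python
def count_validation_modes(expectation_suite_name=None, batch_kwargs=None,
                           assets_to_validate=None, checkpoint_name=None):
    """Return how many validation modes the quoted check counts as set.

    Hypothetical helper: the real operator raises ValueError when this
    count is not exactly 1.
    """
    return sum(bool(x) for x in [(expectation_suite_name and batch_kwargs),
                                 assets_to_validate, checkpoint_name])


# Checkpoint only: one mode is set.
checkpoint_only = count_validation_modes(checkpoint_name="my_checkpoint")

# Checkpoint plus suite name but no batch_kwargs: still one mode,
# because (suite_name and None) is falsy.
checkpoint_and_suite = count_validation_modes(
    expectation_suite_name="my_suite", checkpoint_name="my_checkpoint")

# Checkpoint plus suite name plus batch_kwargs: two modes are set.
checkpoint_suite_batch = count_validation_modes(
    expectation_suite_name="my_suite", batch_kwargs={"table": "placeholder"},
    checkpoint_name="my_checkpoint")

print(checkpoint_only, checkpoint_and_suite, checkpoint_suite_batch)
```

So passing a checkpoint together with only a suite name would satisfy the check on its own; the count only goes above 1 when batch_kwargs is also truthy.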
Without expectation_suite_name:
airflow.exceptions.AirflowException: Argument ['expectation_suite_name'] is required
With expectation_suite_name:
Traceback (most recent call last):
File "dags/gfk/test_ge.py", line 54, in <module>
dag=dag,
File "/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py", line 98, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/great_expectations_provider/operators/great_expectations_bigquery.py", line 150, in __init__
**kwargs)
File "/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py", line 98, in wrapper
result = func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/great_expectations_provider/operators/great_expectations.py", line 85, in __init__
or checkpoint_name is required to run validation.")
ValueError: Exactly one of expectation_suite_name + batch_kwargs, assets_to_validate, or checkpoint_name is required to run validation.
I wonder why it fails though, since the check seems to allow for one of the values (batch_kwargs or expectation_suite_name) to be set, but it does not work. I will investigate it a bit further tomorrow.
To access the expectation_suite_name of the parent class, maybe just move it back into the variable kwargs, call super(), and read it via self? Or remove apply_defaults?
Versions
airflow==1.10.12
airflow-provider-great-expectations==0.0.4
great-expectations==0.13.19
Python 3.7.10
Found out why it doesn't work. batch_kwargs is always set in the GreatExpectationsBigQueryOperator before the super() call:
batch_kwargs = self.get_batch_kwargs()

# Call the parent constructor but override the default alerting behavior in the parent by hard coding
# fail_task_on_validation_failure=False. This is done because we want to alert a little differently
# than the parent class by sending an email to the user and then throwing an Airflow exception whenever
# data doesn't match Expectations.
super().__init__(data_context=data_context, batch_kwargs=batch_kwargs,
                 expectation_suite_name=expectation_suite_name, fail_task_on_validation_failure=False,
                 **kwargs)
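The two failure modes can be reproduced without Airflow using stand-in classes. This is only a sketch under assumptions: `ParentOperator`, `ChildOperator`, `try_build` and the placeholder batch_kwargs are hypothetical names, not the provider's code, and the missing-argument case surfaces here as a plain `TypeError` instead of the `AirflowException` that apply_defaults raises.

```python
class ParentOperator:
    """Stand-in for GreatExpectationsOperator: only the argument check."""

    def __init__(self, expectation_suite_name=None, batch_kwargs=None,
                 assets_to_validate=None, checkpoint_name=None):
        modes = [(expectation_suite_name and batch_kwargs),
                 assets_to_validate, checkpoint_name]
        if sum(bool(x) for x in modes) != 1:
            raise ValueError(
                "Exactly one of expectation_suite_name + batch_kwargs, "
                "assets_to_validate, or checkpoint_name is required to run validation.")
        self.checkpoint_name = checkpoint_name


class ChildOperator(ParentOperator):
    """Stand-in for GreatExpectationsBigQueryOperator."""

    def __init__(self, expectation_suite_name, **kwargs):
        # As in the snippet above, batch_kwargs is always built and
        # forwarded, whether or not a checkpoint was requested.
        batch_kwargs = self.get_batch_kwargs()
        super().__init__(expectation_suite_name=expectation_suite_name,
                         batch_kwargs=batch_kwargs, **kwargs)

    def get_batch_kwargs(self):
        # Hypothetical placeholder; the real method builds BigQuery batch kwargs.
        return {"table": "placeholder"}


def try_build(**kwargs):
    """Return 'ok' or the name of the exception raised by the constructor."""
    try:
        ChildOperator(**kwargs)
        return "ok"
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# Without expectation_suite_name: rejected as a missing required argument.
print(try_build(checkpoint_name="my_checkpoint"))
# With expectation_suite_name: suite + batch_kwargs AND checkpoint both count.
print(try_build(expectation_suite_name="my_suite", checkpoint_name="my_checkpoint"))
# Suite name alone works, since batch_kwargs is filled in automatically.
print(try_build(expectation_suite_name="my_suite"))
```

In this sketch there is no combination of arguments that lets the child class run with a checkpoint, which matches the behaviour described in the issue.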