Remove apply_defaults from GreatExpectationsBigQueryOperator (or remove explicit kwargs) #25

Closed
Milchdealer opened this issue Apr 28, 2021 · 1 comment

apply_defaults throws an error if any argument that is named explicitly in the function signature is empty (i.e. None). Since GreatExpectationsBigQueryOperator names expectation_suite_name as a positional arg (rather than leaving it variable, i.e. hidden in *args), it becomes mandatory in every call.
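The mechanism can be illustrated without Airflow. The following is only a simplified stand-in, not Airflow's actual implementation: `require_declared_args` and `NamedArgOperator` are hypothetical names, and `TypeError` is used in place of `AirflowException`. It assumes the decorator rejects any call that does not supply a named parameter lacking a default value.

```python
import functools
import inspect


def require_declared_args(func):
    """Simplified stand-in for apply_defaults (hypothetical, not Airflow code):
    reject calls that omit a named parameter that has no default value."""
    sig = inspect.signature(func)
    required = {
        name
        for name, param in sig.parameters.items()
        if name != "self"
        and param.default is inspect.Parameter.empty
        and param.kind
        not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    }

    @functools.wraps(func)
    def wrapper(self, **kwargs):
        # Anything named in the signature but absent from the call is an error.
        missing = sorted(required - set(kwargs))
        if missing:
            raise TypeError("Argument {} is required".format(missing))
        return func(self, **kwargs)

    return wrapper


class NamedArgOperator:
    """Hypothetical operator that names expectation_suite_name explicitly."""

    @require_declared_args
    def __init__(self, expectation_suite_name, **kwargs):
        self.expectation_suite_name = expectation_suite_name
        self.extra = kwargs
```

With this stand-in, `NamedArgOperator(checkpoint_name="cp")` is rejected because `expectation_suite_name` is part of the signature, whereas a parameter left inside `**kwargs` would never be flagged as missing.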

The problem arises when I want to use checkpoints, because GreatExpectationsOperator contains this check:

# Check that only the correct args to validate are passed
# this doesn't cover the case where only one of expectation_suite_name or batch_kwargs is specified
# along with one of the others, but I'm ok with just giving precedence to the correct one
if sum(bool(x) for x in [(expectation_suite_name and batch_kwargs), assets_to_validate, checkpoint_name]) != 1:
    raise ValueError("Exactly one of expectation_suite_name + batch_kwargs, assets_to_validate, \
     or checkpoint_name is required to run validation.")

As a result, the apply_defaults() call on GreatExpectationsBigQueryOperator forces me to pass expectation_suite_name, while GreatExpectationsOperator requires me to pass only checkpoint_name.

Without expectation_suite_name:

airflow.exceptions.AirflowException: Argument ['expectation_suite_name'] is required

With expectation_suite_name:

Traceback (most recent call last):
  File "dags/gfk/test_ge.py", line 54, in <module>
    dag=dag,
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py", line 98, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/great_expectations_provider/operators/great_expectations_bigquery.py", line 150, in __init__
    **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/decorators.py", line 98, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/great_expectations_provider/operators/great_expectations.py", line 85, in __init__
    or checkpoint_name is required to run validation.")
ValueError: Exactly one of expectation_suite_name + batch_kwargs, assets_to_validate,              or checkpoint_name is required to run validation.

I wonder why it fails, though: the check seems to allow one of the two values (batch_kwargs or expectation_suite_name) to be set on its own, yet it does not work. I will investigate a bit further tomorrow.

To keep expectation_suite_name accessible from the parent class, maybe just move it back into the variable kwargs, call super(), and access it via self? Or remove apply_defaults?
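A rough sketch of the first suggestion, using stand-in classes rather than the real provider code. `ParentOperatorSketch` and `BigQueryOperatorSketch` are hypothetical names, the parent reproduces only the argument check quoted above, and the `get_batch_kwargs` body is a placeholder.

```python
class ParentOperatorSketch:
    """Stand-in for GreatExpectationsOperator; only the argument check is reproduced."""

    def __init__(self, expectation_suite_name=None, batch_kwargs=None,
                 assets_to_validate=None, checkpoint_name=None, **kwargs):
        modes = [(expectation_suite_name and batch_kwargs), assets_to_validate, checkpoint_name]
        if sum(bool(x) for x in modes) != 1:
            raise ValueError("Exactly one of expectation_suite_name + batch_kwargs, "
                             "assets_to_validate, or checkpoint_name is required "
                             "to run validation.")
        self.expectation_suite_name = expectation_suite_name
        self.batch_kwargs = batch_kwargs
        self.checkpoint_name = checkpoint_name


class BigQueryOperatorSketch(ParentOperatorSketch):
    """Stand-in subclass: expectation_suite_name is no longer named in the signature."""

    def __init__(self, **kwargs):
        # expectation_suite_name, if given, travels inside **kwargs to the parent,
        # so nothing in this signature makes it mandatory.
        super().__init__(batch_kwargs=self.get_batch_kwargs(), **kwargs)

    def get_batch_kwargs(self):
        # Placeholder value; the real operator builds this from its BigQuery settings.
        return {"query": "SELECT 1"}


op = BigQueryOperatorSketch(checkpoint_name="my_checkpoint")
print(op.expectation_suite_name)  # None, read from self after super().__init__()
```

In this sketch the subclass reads the suite name from `self` after the parent constructor has run, so the parent remains the only place that decides which arguments are required.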

Versions

  • airflow==1.10.12
  • airflow-provider-great-expectations==0.0.4
  • great-expectations==0.13.19
  • Python 3.7.10

Milchdealer commented Apr 29, 2021

Found out why it doesn't work: batch_kwargs is always set in GreatExpectationsBigQueryOperator before the super() call:

batch_kwargs = self.get_batch_kwargs()
# Call the parent constructor but override the default alerting behavior in the parent by hard coding
# fail_task_on_validation_failure=False.  This is done because we want to alert a little differently
# than the parent class by sending an email to the user and then throwing an Airflow exception whenever
# data doesn't match Expectations.
super().__init__(data_context=data_context, batch_kwargs=batch_kwargs,
                 expectation_suite_name=expectation_suite_name, fail_task_on_validation_failure=False,
                 **kwargs)
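The effect on the parent's check can be reproduced in isolation. This is a minimal sketch: `validation_modes_selected` is a hypothetical helper that wraps the `sum(...)` expression from the check quoted earlier, and the `batch_kwargs` value is a placeholder.

```python
def validation_modes_selected(expectation_suite_name=None, batch_kwargs=None,
                              assets_to_validate=None, checkpoint_name=None):
    """Count how many validation modes the parent's check considers selected."""
    return sum(bool(x) for x in [
        (expectation_suite_name and batch_kwargs),
        assets_to_validate,
        checkpoint_name,
    ])


# batch_kwargs is always populated by the BigQuery operator (placeholder value here).
batch_kwargs = {"query": "SELECT 1"}

only_checkpoint = validation_modes_selected(
    batch_kwargs=batch_kwargs, checkpoint_name="my_checkpoint")
suite_and_checkpoint = validation_modes_selected(
    expectation_suite_name="my_suite", batch_kwargs=batch_kwargs,
    checkpoint_name="my_checkpoint")

print(only_checkpoint)       # 1 -> the check passes
print(suite_and_checkpoint)  # 2 -> the check raises ValueError
```

Because batch_kwargs is always truthy here, supplying expectation_suite_name turns the first entry on, and adding checkpoint_name brings the count to 2, which is exactly the case the check rejects.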
