-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding ge bigquery operator #6
adding ge bigquery operator #6
Conversation
…ly change to the base operator is that the execute method returns the results at the end so the bigquery operator can examine the results and use them to get a data docs link
Thanks so much for resubmitting @brian-lavery! I'll take a look this week! |
Just one immediate question/suggestion: Do you want to try out using the convenience methods to create a Data Context which sets some defaults for the GCS backend? Not sure if this covers all your parameters, but maybe take a look? https://docs.greatexpectations.io/en/latest/guides/how_to_guides/configuring_data_contexts/how_to_instantiate_a_data_context_without_a_yml_file.html |
Hi @spbail . I made the change to use the convenience method. I tested it out and it worked. Happy New Year!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, love seeing how this is coming together! I just have a few remarks which would be good to address/explain, but none of them are absolutely critical.
:type datadocs_domain: str | ||
:param email_to: Email address to receive any alerts when expectations are not met. | ||
:type email_to: str | ||
:param fail_if_expectations_not_met: Fail the Airflow task if expectations are not met? Defaults to True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parent class uses "fail_task_on_validation_failure" for this, which would be good to stick with so the behavior is similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sam. This change has been made.
} | ||
|
||
if self.validation_type == GreatExpectationsValidations.SQL.value: | ||
batch_kwargs["query"] = self.validation_type_input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought here: If you look at the parent class, it allows different types of mutually exclusive parameters to specify what asset to validate. I find that a little nicer than specifying the validation_type. So the possible parameters for the class could be
(...., data_asset_name, query, table, ...) and then check in the code that only one of table or query is populated. See line 67 in great_expectations.py
This means you're doing more work behind the scenes, but a user only needs to specify either the table or query and doesn't have to worry about the validation_type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sam. This change has been made.
data_context_config = self.create_data_context_config() | ||
data_context = BaseDataContext(project_config=data_context_config) | ||
batch_kwargs = self.get_batch_kwargs() | ||
super().__init__(data_context=data_context, batch_kwargs=batch_kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See note below, I'd add a comment to explain why you're not passing through the fail_task_on_validation_failure/fail_task_if_expectations_not_met argument as-is.
Maybe it's a bit ugly, but do you think it's possible to stick with "fail_task_on_validation_failure", so if anyone looks at the parent class and this one it's clear they're the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sam. This change has been made.
site_name='gcs_site')[0][ | ||
'site_url'] | ||
self.log.info("Data docs url is: %s", data_docs_url) | ||
if results["success"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the logging is already done in the parent class, so probably not needed here.
I'm assuming the reason for not passing through the "fail on validation failure" is that you want the alert email before raising the error right?
I think this works for now, but I'd put in an explicit comment that you're overwriting the parent class behavior somewhat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Sam. This change has been made.
@cla-bot check |
@brian-lavery Do you mind signing the CLA, see link in the comment above ^ |
@cla-bot check |
@spbail Discovered this preparing mugs. @brian-lavery signed :) |
Just tested it out locally, and it works! Yay! I didn't test out the email configuration, but the GCS stores and BigQuery datasource work well. I think we're good to ship this! I'll add an example in another commit, and then I can cut a release later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works \o/
Hi @spbail. This pull request is adding the GE Bigquery operator that extends the GE base operator. The only change to the base operator is that the execute method returns the results at the end so the bigquery operator can examine the results to determine success or failure and and use them to get a data docs link. I did a functional test and it worked fine using GE 0.13.1 and Cloud Composer version composer-1.11.3-airflow-1.10.6. I'll still working on unit tests.