Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-271] [Feature] not_null test selects column instead of * #4769

Closed
1 task done
willbowditch opened this issue Feb 23, 2022 · 2 comments · Fixed by #4777
Closed
1 task done

[CT-271] [Feature] not_null test selects column instead of * #4769

willbowditch opened this issue Feb 23, 2022 · 2 comments · Fixed by #4777
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! Team:Adapters Issues designated for the adapter area of the code

Comments

@willbowditch
Copy link
Contributor

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

The built in test for not_null writes SQL equivalent to:

SELECT *
FROM model
WHERE column IS NULL

On databases like BigQuery this ends up being quite expensive when compared to:

SELECT column
FROM model
WHERE column IS NULL

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

PR ready to go

Anything else?

No response

@willbowditch willbowditch added enhancement New feature or request triage labels Feb 23, 2022
@github-actions github-actions bot changed the title [Feature] not_null test selects column instead of * [CT-271] [Feature] not_null test selects column instead of * Feb 23, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 24, 2022

@willbowditch Good call!

Two quick thoughts:

  • It's occasionally reasonable for databases to reimplement the built-in generic tests, because of an attribute of that database. For instance, BigQuery doesn't support a fairly common group by syntax, which led us to reimplement bigquery__test_unique over there: Created bigquery__test_unique to handle expressions passed to BQ uniq… dbt-bigquery#10
  • It's desirable for this test to include the full row output when using --store-failures. If the query result stored in the database contained just the null values of the null column, it can't do much to contextualize why those rows are null. Maybe we could have it both ways?
{% macro default__test_not_null(model, column_name) %}

{% set column_list = '*' if should_store_failures() else column_name %}

select {{ column_list }}
from {{ model }}
where {{ column_name }} is null

{% endmacro %}

The first approach would require a code change in the dbt-bigquery repo; the second one could happen here. I lean toward the second, even if it risks making the code in the built-in test look slightly more tricky. I'm going to mark this a good first issue—it sounds like you've got a PR at the ready.

@jtcohen6 jtcohen6 added good_first_issue Straightforward + self-contained changes, good for new contributors! Team:Adapters Issues designated for the adapter area of the code Team:Execution and removed triage labels Feb 24, 2022
@willbowditch
Copy link
Contributor Author

Thanks @jtcohen6. I've just raised a PR to core with the additional store-failures check added in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors! Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants