Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix at_least_one test when group_by_columns is configured #922

Merged

Conversation

katieclaiborne-duet
Copy link
Contributor

@katieclaiborne-duet katieclaiborne-duet commented Jun 25, 2024

resolves #838

Problem

The pruning logic introduced in #776 disrupted the grouping behavior in the at_least_one test.

Specifically, the initial pruned_rows CTE includes a limit 1 statement, which selects a single record where the target column is not null.

As a result, the test passes inappropriately when the group_by_columns parameter is configured, as long as the target column has at least one not-null value.

Essentially, the group_by_columns parameter currently has no effect on the test. It evaluates target column values as a single group, regardless of whether or not the parameter is present.

Solution

Restore the at_least_one test to its prior state.

Since the goal of #776 was to improve the performance of the test, I also considered a set-logic approach to evaluating a column at the group level:

  • identify all group-by column values
  • identify group-by column values with at least one not-null target column value
  • return group-by column values without at least one not-null target column value as errors

In BigQuery, however, this version of the test took twice as long as the simple reverted version.

Checklist

  • This code is associated with an issue which has been triaged and accepted for development.
  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the README.md (if applicable)

@katieclaiborne-duet katieclaiborne-duet marked this pull request as ready for review June 25, 2024 19:01
@katieclaiborne-duet
Copy link
Contributor Author

@dbeatty10, would you mind to review when you have a chance?

Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A crucial step before merging this would be the addition of test(s) as described in #922 (comment)

@dbeatty10 dbeatty10 changed the title Revert pruning in at_least_one test Revert pruning in at_least_one test Jul 2, 2024
Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this PR @katieclaiborne-duet 🤩

I used #934 to quickly check that the new test you added correctly surfaces the error case discovered in #838. Looks good ✅

It would be nice to keep the optimization and also fix the bug 🤞

When group_by_columns is an empty list, is there a way to keep the optimizations in #776 but avoid the bug reported in #838?

@dbeatty10
Copy link
Contributor

Background

The at_least_one data test was originally added in #32. Then #633 / 78afedd added the group_by_columns argument, but no test was added that would catch a future regression. Then #776 accidentally added a regression, and this was reported in #838.

@dbeatty10 dbeatty10 added the bug Something isn't working label Jul 2, 2024
@katieclaiborne-duet
Copy link
Contributor Author

When group_by_columns is an empty list, is there a way to keep the optimizations in #776 but avoid the bug reported in #838?

Good idea! Let me know what you think of this latest pass.

Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome @katieclaiborne-duet 🤩

Thank you for creating a relevant test for this bug and implementing a targeted fix 🧠

@dbeatty10 dbeatty10 added this pull request to the merge queue Jul 8, 2024
Merged via the queue into dbt-labs:main with commit be74299 Jul 8, 2024
4 checks passed
@dbeatty10 dbeatty10 changed the title Revert pruning in at_least_one test Fix at_least_one test when group_by_columns is configured Jul 8, 2024
@katieclaiborne-duet katieclaiborne-duet deleted the kc-revert-at-least-one-pruning branch July 8, 2024 19:18
@tffragoso
Copy link

Thank you so much for the fix, @katieclaiborne-duet! 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug in at_least_one test code logic
3 participants