You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When we started using pre-commit-dbt (the original project) we found that some hooks did not exhibit what we believe is the "correct" behavior. We made updates to the way some hooks worked and merged those changes. Now that we have moved to dbt-checkpoint and officially updated the version, some people are running into issues due to the new behavior.
What were we trying to solve for?
We noticed that a hook we were using check_model_has_all_columns was not always catching issues. The root cause was that it only checked that a property (yml) file contained all the columns in a model (sql) file if the model was changed. However, if the property file was the only file changed then it was not compared to the model file. This would allow someone to delete or add a column to the property file even though it did not match the model file. We felt this check should assure that these two files are always in sync and as such it should check if either file is changed.
How was this issue addressed?
We "fill in" the missing files so that if the yml file is changed we find the corresponding sql file so we can make sure we do the proper check.
Side effects
@followingell reported an issue with check-model-has-tests where excluded files were being included in the check. @karabulute found what we had implemented and submitted a PR that removed the functionality we added.
Our Opinion
We don't believe we should remove get_missing_file_paths from all the hooks because without this we are indirectly not complying with the spirit of the hook.
Proposal
We propose we add a parameter to the hooks that have this behavior so that files can be excluded, but we cannot use the pre-commit exclude parameter because we don't have that information in dbt-checkpoint.
Instead of doing this
- id: check-model-has-tests
description: "Ensures that the model has a number of tests"
args: ["--test-cnt", "1", "--"]
exclude: |
(?x)(
models/demo
)
We propose this
- id: check-model-has-tests
description: "Ensures that the model has a number of tests"
args: ["--test-cnt", "1", "--exclude","models/demo", "--"]
@noel or @BAntonellini : Can you provide some more clarity on why the pre-commit exclude configuration is not an option here?
Its availability at multiple levels of the pre-commit-config is a compelling reason to use it if possible instead of creating another CLI exclude option. For example, if I always wanted to ignore a given directory for every single ID, it would be simpler/DRYer to use exclude at the repo level in my pre-commit-config.
@JFrackson pre-commit's exclude works in a previous stage before reaching dbt-checkpoint hooks. What we receive in dbt-checkpoint are the results of pre-commit configurations, and we have no access to what the user specified in exclude, be it at repo or hook level.
Thanks for clarifying @BAntonellini . So without a more significant change it wouldn't be possible. That seems fine to me then to merge this as is! Let's merge this and then push a v1.2 so that there's a new stable release with this patch and the others recently.
Also, just in case users would find it useful: we can create an issue for making exclusions easier so that users can leave comments or reactions on what they would like to see (i.e. to keep conversation in one place). Would you mind creating that issue after you merge?
Background
When we started using pre-commit-dbt (the original project) we found that some hooks did not exhibit what we believe is the "correct" behavior. We made updates to the way some hooks worked and merged those changes. Now that we have moved to dbt-checkpoint and officially updated the version, some people are running into issues due to the new behavior.
What were we trying to solve for?
We noticed that a hook we were using
check_model_has_all_columns
was not always catching issues. The root cause was that it only checked that a property (yml) file contained all the columns in a model (sql) file if the model was changed. However, if the property file was the only file changed then it was not compared to the model file. This would allow someone to delete or add a column to the property file even though it did not match the model file. We felt this check should assure that these two files are always in sync and as such it should check if either file is changed.How was this issue addressed?
We "fill in" the missing files so that if the yml file is changed we find the corresponding sql file so we can make sure we do the proper check.
Side effects
@followingell reported an issue with check-model-has-tests where
excluded
files were being included in the check. @karabulute found what we had implemented and submitted a PR that removed the functionality we added.Our Opinion
We don't believe we should remove
get_missing_file_paths
from all the hooks because without this we are indirectly not complying with the spirit of the hook.Proposal
We propose we add a parameter to the hooks that have this behavior so that files can be excluded, but we cannot use the pre-commit
exclude
parameter because we don't have that information in dbt-checkpoint.Instead of doing this
We propose this
Which hooks are Impacted
Hooks that implement yml/sql file discovery:
The text was updated successfully, but these errors were encountered: