Improving / Updating how some hooks work #118

noel · 2023-03-23T19:28:20Z

Background

When we started using pre-commit-dbt (the original project), we found that some hooks did not exhibit what we believe is the "correct" behavior. We changed how those hooks work and merged the changes. Now that we have moved to dbt-checkpoint and officially updated the version, some people are running into issues caused by the new behavior.

What were we trying to solve for?

We noticed that a hook we were using, check_model_has_all_columns, was not always catching issues. The root cause was that it compared the property (yml) file against the model (sql) file only when the model file was changed. If the property file was the only file changed, it was never compared to the model file. This allowed someone to add or delete a column in the property file even though the result no longer matched the model file. We felt this check should ensure that the two files are always in sync, and as such it should run when either file is changed.

How was this issue addressed?

We "fill in" the missing files: if only the yml file is changed, we find the corresponding sql file (and vice versa) so that the proper check can still run.
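To make the idea concrete, here is a minimal sketch of what "filling in" could look like. It is not the actual get_missing_file_paths implementation: the MODEL_FILES index, the paths in it, and the function name fill_in_missing_files are all hypothetical and only illustrate the pairing logic.

```python
# Hypothetical index: model name -> (sql path, yml path).
# Assumption: a real hook would derive this from dbt metadata, not a literal dict.
MODEL_FILES = {
    "customers": ("models/customers.sql", "models/schema.yml"),
    "orders": ("models/orders.sql", "models/schema.yml"),
    "demo_events": ("models/demo/demo_events.sql", "models/demo/demo.yml"),
}


def fill_in_missing_files(changed):
    """Return the changed files plus the counterpart of every changed file."""
    changed_set = set(changed)  # only originally changed files trigger a lookup
    seen = set(changed)         # used to avoid adding the same path twice
    result = list(changed)
    for sql_path, yml_path in MODEL_FILES.values():
        if sql_path in changed_set or yml_path in changed_set:
            for path in (sql_path, yml_path):
                if path not in seen:
                    seen.add(path)
                    result.append(path)
    return result


# Only the property file changed: both models it documents are pulled in.
print(fill_in_missing_files(["models/schema.yml"]))
# Only a model file changed: its property file is pulled in.
print(fill_in_missing_files(["models/customers.sql"]))
```

Note that the discovered files never went through pre-commit's own filtering, which is why a file the user excluded there can reappear at this step.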

Side effects

@followingell reported an issue with check-model-has-tests where excluded files were being included in the check. @karabulute found what we had implemented and submitted a PR that removed the functionality we added.

Our Opinion

We don't believe we should remove get_missing_file_paths from all the hooks, because without it the hooks would no longer comply with the spirit of what they are meant to check.

Proposal

We propose adding a parameter to the hooks that have this behavior so that files can be excluded. We cannot use the pre-commit exclude parameter for this because that information is not available to dbt-checkpoint.
Instead of doing this

- id: check-model-has-tests
  description: "Ensures that the model has a number of tests"
  args: ["--test-cnt", "1", "--"]
  exclude: |
    (?x)(
      models/demo
    )

We propose this

- id: check-model-has-tests
  description: "Ensures that the model has a number of tests"
  args: ["--test-cnt", "1", "--exclude", "models/demo", "--"]
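A rough sketch of how a hook could honor such an argument is below. It assumes the value of --exclude is treated as a regular expression searched against each path, which the proposal does not spell out; parse_args and apply_exclude are hypothetical names, not dbt-checkpoint functions.

```python
import argparse
import re


def parse_args(argv):
    """Parse the hook arguments shown in the proposed config."""
    parser = argparse.ArgumentParser()
    parser.add_argument("filenames", nargs="*")
    parser.add_argument("--test-cnt", type=int, default=1)
    # Assumption: the exclude value is a regex, similar to pre-commit's exclude.
    parser.add_argument("--exclude", type=str, default="")
    return parser.parse_args(argv)


def apply_exclude(paths, pattern):
    """Drop every path that matches the exclude pattern (no pattern: keep all)."""
    if not pattern:
        return list(paths)
    regex = re.compile(pattern)
    return [p for p in paths if not regex.search(p)]


args = parse_args(
    ["--test-cnt", "1", "--exclude", "models/demo", "--", "models/customers.sql"]
)
# Pretend file discovery added a model from the excluded folder.
discovered = args.filenames + ["models/demo/demo_events.sql"]
print(apply_exclude(discovered, args.exclude))
```

Applying the filter after file discovery is the point of the design: the hook can keep filling in counterpart files and still respect the user's exclusions.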

Which hooks are impacted

Hooks that implement yml/sql file discovery:

Discover SQL:
- check_column_name_contract https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-column-name-contract
- check_model_has_all_columns https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-has-all-columns
- check_model_has_tests https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-has-tests
- check_model_has_tests_by_group https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-has-tests-by-group
- check_model_has_tests_by_name https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-has-tests-by-name
- check_model_has_tests_by_type https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-has-tests-by-type
- check_model_name_contract https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-name-contract
- check_model_parents_database https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-parents-database
- check_model_tags https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-tags
Discover YML:
- check_model_has_description https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-model-has-description
Discover both:
- check_macro_has_description https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-macro-has-description


JFrackson · 2023-04-03T13:25:49Z

@noel or @BAntonellini: Can you provide some more clarity on why the pre-commit exclude configuration is not an option here?

Its availability at multiple levels of the pre-commit-config is a compelling reason to use it if possible instead of creating another CLI exclude option. For example, if I always wanted to ignore a given directory for every single ID, it would be simpler/DRYer to use exclude at the repo level in my pre-commit-config.

BAntonellini · 2023-04-03T14:26:40Z

@JFrackson pre-commit's exclude is applied at an earlier stage, before the dbt-checkpoint hooks are invoked. What dbt-checkpoint receives is the result of the pre-commit configuration (the already filtered list of files), and we have no access to what the user specified in exclude, whether at the repo or hook level.
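As an illustration only, the two stages can be simulated like this. The functions pre_commit_stage and hook_main are hypothetical stand-ins, not pre-commit or dbt-checkpoint code; the exclude pattern is the one from the config in the proposal.

```python
import re

# The exclude regex from the pre-commit config in the proposal.
EXCLUDE = r"""(?x)(
  models/demo
)"""


def pre_commit_stage(staged_files, exclude):
    """Stand-in for pre-commit: drop excluded files before the hook runs."""
    exclude_re = re.compile(exclude)
    return [f for f in staged_files if not exclude_re.search(f)]


def hook_main(argv):
    """Stand-in for a hook: it sees only file names, never the exclude regex."""
    return list(argv)


staged = ["models/customers.sql", "models/demo/demo_events.sql"]
hook_argv = pre_commit_stage(staged, EXCLUDE)
print(hook_main(hook_argv))
```

From inside the hook, an excluded file is indistinguishable from a file that simply was not changed, so any file discovery done by the hook cannot know to skip it.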

JFrackson · 2023-04-10T08:49:46Z

Thanks for clarifying, @BAntonellini. So without a more significant change it wouldn't be possible. Merging this as is seems fine to me, then! Let's merge it and then push a v1.2 so that there's a new stable release with this patch and the other recent ones.

Also, just in case users would find it useful: we can create an issue for making exclusions easier so that users can leave comments or reactions on what they would like to see (i.e. to keep conversation in one place). Would you mind creating that issue after you merge?

noel added the enhancement New feature or request label Mar 23, 2023

BAntonellini mentioned this issue Mar 23, 2023

Fix/support excluding files at hook level #119

Merged

BAntonellini added the priority: high label Nov 28, 2023
