Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: add warning on duplicated yaml keys #5146

Merged
merged 4 commits into from
Apr 28, 2022

Conversation

jeremyyeo
Copy link
Contributor

@jeremyyeo jeremyyeo commented Apr 25, 2022

resolves #5114

Description

When we have a duplicate top level key in our schema yml files, we should see the following warning:

$ dbt run
08:01:05  Running with dbt=1.2.0-a1
08:01:05  [WARNING]: Compilation Error
  Duplicate 'models' top level key found in yaml file models/schema.yml.
08:01:05  [WARNING]: Compilation Error
  Duplicate 'models' top level key found in yaml file models/sub/schema.yml.
08:01:05  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
08:01:05  
08:01:06  Concurrency: 1 threads (target='dev')
08:01:06  
08:01:06  1 of 2 START table model public.my_model_a ..................................... [RUN]
08:01:06  1 of 2 OK created table model public.my_model_a ................................ [SELECT 1 in 0.10s]

  1. In core/dbt/exceptions.py, DuplicateYamlKeyException is a no-op of CompilationException - is this okay?

  2. For the tests, because we warn instead of error, I'm using run_dbt_and_capture() from dbt.tests.utils and asserting that stdout has the right string - not too sure if this is the right way to do this. I also tried something along the lines of:

from dbt.exceptions import DuplicateYamlKeyException

...

def test_duplicate_key_in_yaml(project):
    with pytest.raises(DuplicateYamlKeyException):
        run_dbt(["run"])

Which didn't quite work because we print a warning instead of raising an error. (Ftr, I just cribbed what was in dbt/tests/functional/basic/tests_basic.py for this one).

Checklist

@jeremyyeo jeremyyeo requested a review from emmyoop April 25, 2022 10:31
@jeremyyeo jeremyyeo requested review from a team as code owners April 25, 2022 10:31
@cla-bot cla-bot bot added the cla:yes label Apr 25, 2022
@github-actions
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@jeremyyeo
Copy link
Contributor Author

jeremyyeo commented Apr 25, 2022

Integration test from days of yonder fails led me to a yaml test file (https://github.com/dbt-labs/dbt-core/blob/main/test/integration/006_simple_dependency_tests/local_dependency/models/schema.yml) with actual duplicated top level keys - it looks like the "mapping node should be a set" 1 so curious if that particular test worked properly before 🤔


Looks like as a byproduct of this, previously when the project had multiple duplicated keys:

...
models:
  - name: my_model_a
    columns:
      - name: id
        tests:
          - unique
  
 models:
  - name: my_model_b
    columns:
      - name: id
        tests:
          - unique

A dbt test would silently test according to the models key that is the last in the schema yml file (i.e. just my_model_b will be tested), but now, since the DuplicateYamlKeyException is raised in the loader, we will not be doing any tests at all.

Footnotes

  1. https://stackoverflow.com/questions/47668308/duplicate-key-in-yaml-configuaration-file

@emmyoop emmyoop removed request for a team and VersusFacit April 25, 2022 16:01
Copy link
Member

@emmyoop emmyoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few comments about structure that I wanted to get back to you on. I'll take a look at the failing legacy tests and see what's happening there next.

tests/functional/duplications/test_basic_duplications.py Outdated Show resolved Hide resolved
core/dbt/clients/yaml_helper.py Show resolved Hide resolved
Comment on lines 80 to 81
msg = f"{e} {path.searched_path}/{path.relative_path}."
dbt.exceptions.warn_or_error(msg, log_fmt=warning_tag("{}"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use warn_or_raise here instead of warn_or_error. That way the exception you throw above bubbles up instead of getting overlaid by the CompilationException by warn_or_error. To get the message passed through as DuplicateYamlKeyException you can modify the message.

e.msg = f"{e} {path.searched_path}/{path.relative_path}."
dbt.exceptions.warn_or_raise(e, log_fmt=warning_tag("{}"))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new changes I see:

$ dbt run                                 
10:59:42  Running with dbt=1.2.0-a1
10:59:42  [WARNING]: Compilation Error
  Compilation Error
    Duplicate 'models' key found in yaml file models/schema.yml.
10:59:43  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
10:59:43  
10:59:43  Concurrency: 1 threads (target='dev')
...

Wondering if that was what you expected here?

@jeremyyeo jeremyyeo requested a review from emmyoop April 27, 2022 23:04
@@ -4,7 +4,6 @@ sources:
schema: invalid_schema
tables:
- name: my_table
sources:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emmyoop surprisingly fixing the duplicate top level keys in this yaml file in the legacy tests causes it to pass so now really curious what this was doing previously... makes me wonder if we had left 2 sources keys in this file by mistake the last time around but test was passing due to pyaml default of only reading the last source key of this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing anything in the tests that indicate we did it on purpose so you may be right. These tests are going to be rewritten and we'll work through some of the weirdness in them as we go.

Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@gshank gshank merged commit 3733817 into main Apr 28, 2022
@gshank gshank deleted the fix/ct-520-warn-on-duplicated-yaml-keys branch April 28, 2022 13:31
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request May 20, 2022
* add warning on duplicated yaml keys

* update structure and tests

* fix old test schema file

* add changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants