Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3123] Make seed inputs optional for unit tests #8652

Closed
Tracked by #8283
graciegoheen opened this issue Sep 15, 2023 · 6 comments · Fixed by #9064
Closed
Tracked by #8283

[CT-3123] Make seed inputs optional for unit tests #8652

graciegoheen opened this issue Sep 15, 2023 · 6 comments · Fixed by #9064
Assignees
Labels
enhancement New feature or request user docs [docs.getdbt.com] Needs better documentation

Comments

@graciegoheen
Copy link
Contributor

graciegoheen commented Sep 15, 2023

Description

Our proposed unit testing spec, doesn't yet take into account if one of the inputs to the model you want to add a unit test to is a seed.

Current behavior: Let's say model_a depends on my_seed. If you want to add a unit test to model_a - you need to define an input for my_seed. That feels redundant since seeds don't often change and are already small data inputs.

Proposed solution:

Let's say you're unit testing my_favorite_model that depends on:

  • my_favorite_source (a source)
  • my_second_favorite_model (a model)
  • and my_special_seed (a seed)

You must supply mock input data for all of the direct parent models and sources.

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_favorite_source')
        format: csv
        fixture: my_favorite_source_fixture
      - input: ref('my_second_favorite_model')
        format: csv
        fixture: my_second_favorite_model_fixture

In this case, we just use your seed for my_special_seed input.

You may supply mock input data for direct parent seeds, if desired.

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_favorite_source')
        format: csv
        fixture: my_favorite_source_fixture
      - input: ref('my_second_favorite_model')
        format: csv
        fixture: my_second_favorite_model_fixture
      - input: ref('my_special_seed')
        format: csv
        fixture: my_special_seed_fixture

In this case, we use your fixture my_special_seed_fixture for my_special_seed input.

Acceptance Criteria

  • If no input is provided for a seed, we will use your seed (ignore pre and post hooks)
  • If an input is provided for a seed, we will use the input

Impact to other teams

None

Will backports be required?

No

Context

@graciegoheen graciegoheen added the enhancement New feature or request label Sep 15, 2023
@github-actions github-actions bot changed the title Handle input seeds in unit tests [CT-3123] Handle input seeds in unit tests Sep 15, 2023
@dbeatty10
Copy link
Contributor

Maybe it "just works"?

Let's suppose my_upstream_model_a is a model (rather than a seed) and this is how we configure a fixture for it:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_upstream_model_a')
        format: csv
        fixture: my_upstream_model_a_fixture

Then we can imagine that the above should "just work" even if my_upstream_model_a were a seed instead -- the seed's CSV rows would just be replaced by the CSV rows within my_upstream_model_a_fixture, right?

Can we think of reasons why it wouldn't "just work"?

An additional thought

For the special case of my_upstream_model_a_fixture being a seed, what about the idea of being able to make format and fixture optional:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_upstream_model_a')

Or maybe even going the whole way and making the input: optional for upstream seeds:

unit-tests:
  - name: test_my_model
    model: my_model
    given:
      # presumably other `input`s are explicitly defined here, but at least `my_upstream_model_a` is implicitly defined -- since it's a seed, we already have all the data we need to mock it

@graciegoheen
Copy link
Contributor Author

graciegoheen commented Sep 21, 2023

Updated the issue!! Thanks @dbeatty10 for your great ideas :)


Additional thought...

What would happen if you didn't care to supply mock data for one of the direct parent models (because it's not relevant for the logic that you're trying to test), where you want to instead use an "empty" mock data set or a one-row, all NULL data set?

  • would you not have to supply the input at all?
unit-tests:
  - name: test_my_model
    model: my_model
    given:
  • or would we require that you explicitly specify that behavior?
unit-tests:
  - name: test_my_model
    model: my_model
    given:
      - input: ref('my_upstream_model_a')
        rows: {}

The second feels less magical, but I like that it's an intentional decision rather than the risk of "oops, I forgot to add mock data for my_upstream_model_a but didn't get any error"!

Food for thought... probably for a separate issue. cc: @MichelleArk

@graciegoheen graciegoheen changed the title [CT-3123] Handle input seeds in unit tests [CT-3123] Make seed inputs optional for unit tests Sep 21, 2023
@graciegoheen
Copy link
Contributor Author

graciegoheen commented Sep 25, 2023

Notes from refinement:

  • we should have a consistent limit for how large a seed & fixture can be (maybe if you seed file is larger than the size limit for fixture, you must define a fixture for your seed) -> let's open up a new issue for this!

@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Oct 5, 2023
@graciegoheen
Copy link
Contributor Author

Notes from estimation:

  • seeds can have pre-hooks & post-hooks; can we just ignore these for unit testing? that's what we're doing for models

@aranke
Copy link
Member

aranke commented Nov 14, 2023

Hey @graciegoheen, can you please clarify how this scenario can be reached during unit testing?

seeds can have pre-hooks & post-hooks; can we just ignore these for unit testing? that's what we're doing for models

From what I can see, these only get triggered during dbt seed and not during dbt unit-test.

Furthermore, the spec labels this as an anti-goal:

Unit testing seeds, snapshots, analyses, or macros directly (they can be tested by making a passthrough model though!)

Given this, is it fair to cut pre- / post-hooks for seeds out of scope?

@graciegoheen
Copy link
Contributor Author

Yes, unit testing seeds directly is an anti-goal - so we can cut pre- / post- hooks for seeds out of scope!

aranke added a commit that referenced this issue Nov 16, 2023
…fied in YAML config (#9064)

Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Fix #8652: Use seed value if rows not specified
gshank added a commit that referenced this issue Jan 16, 2024
* Initial implementation of unit testing (from pr #2911)

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>

* 8295 unit testing artifacts (#8477)

* unit test config: tags & meta (#8565)

* Add additional functional test for unit testing selection, artifacts, etc (#8639)

* Enable inline csv format in unit testing (#8743)

* Support unit testing incremental models (#8891)

* update unit test key: unit -> unit-tests (#8988)


* convert to use unit test name at top level key (#8966)

* csv file fixtures (#9044)

* Unit test support for `state:modified` and `--defer` (#9032)

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>

* Allow use of sources as unit testing inputs (#9059)

* Use daff for diff formatting in unit testing (#8984)

* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)

Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Fix #8652: Use seed value if rows not specified

* Move unit testing to test and build commands (#9108)

* Enable unit testing in non-root packages (#9184)

* convert test to data_test (#9201)

* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)

* In build command run unit tests before models (#9273)

---------

Co-authored-by: Michelle Ark <michelle.ark@dbtlabs.com>
Co-authored-by: Michelle Ark <MichelleArk@users.noreply.github.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
4 participants