[CT-3122] [spike] explore testing command options to understand technical complexities and determine best option #8651

Closed
1 task done
Tracked by #8283
graciegoheen opened this issue Sep 15, 2023 · 8 comments
Labels
user docs [docs.getdbt.com] Needs better documentation

Comments

@graciegoheen
Contributor

graciegoheen commented Sep 15, 2023

Housekeeping

  • I am a maintainer of dbt-core

Short description

As discussed in this issue, as we expand our test coverage we need to rethink our testing commands to ensure clarity and usability. We will have 2 types of tests in dbt:

  • dbt "data" tests: Test your data outputs (dbt models, snapshots, seeds, etc.) and inputs (dbt sources) in your warehouse to ensure your data is valid given your defined assertions. This is the type of testing we currently support.
  • dbt "unit" tests: Test your modeling logic using a small set of static inputs to validate that your code is working as expected. This is the new type of testing we're building as part of this initiative.

We have a few options for the associated commands:

  1. dbt test --unit and dbt test --data
  2. dbt test unit and dbt test data
  3. dbt unit-test and dbt data-test
  4. dbt test --select test_type:unit and dbt test --select test_type:data

We would also need to decide what happens to the legacy dbt test and dbt build commands:

Options for dbt test:

  1. dbt test runs all of your tests (both unit and data)
  2. dbt test is an alias for dbt test --data (or whichever option we go with above); only runs data tests

Options for dbt build:

  1. dbt build runs all of your tests (both unit and data), and we provide a way to just run one type of test like dbt build --unit and dbt build --data
  2. dbt build only runs data tests

While option 2 might more accurately maintain legacy dbt test/build behavior, running all tests by default wouldn't actually be a behavior change right away, because no one will have unit tests defined in their projects yet.

1, on the other hand, would be simpler to explain and makes it clear that these things are both "tests".

We would recommend as a Best Practice that you should exclude unit tests from production runs, sorta like how we recommend that you exclude unchanged views: https://docs.getdbt.com/docs/cloud/billing#build-only-changed-views.

This may impact the "test type" and "test name" selection methods. I'd be inclined to get rid of the test_type selection method, since its values aren't actually "types" so much as methods of implementing data tests (there are also other ways to achieve this selection, whether via tags or file_path).

There may be technical complexities with each of these options, so we should spike investigating them to decide the best path forward.

From @MichelleArk

In order to support:
dbt test --data
dbt test --unit
we'd likely do the if/else routing for task setup somewhere here, which would make it difficult with our current click setup to differentiate between which options are supported for which subcommand (--unit or --data).
But on the flip side, defining these as click subcommands (dbt test unit, dbt test data) would make it difficult to preserve backwards compatibility with the previous dbt test command.
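
A rough sketch of the flag-based trade-off described above, using argparse from the standard library as a stand-in for click (dbt's real CLI is built on click and is not shown here). The program name, the `route` helper, and the idea that `--store-failures` applies only to data tests are all assumptions made for illustration. It shows that when both flags live on one command, the routing and the "which option belongs to which test type" check have to be written by hand:

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a single `dbt test` command with two flags.
    parser = argparse.ArgumentParser(prog="dbt-test-sketch")
    parser.add_argument("--unit", action="store_true", help="run unit tests")
    parser.add_argument("--data", action="store_true", help="run data tests")
    # Assumption for illustration: this option only makes sense for data tests.
    parser.add_argument("--store-failures", action="store_true")
    return parser


def route(argv: List[str]) -> List[str]:
    """Return the names of the tasks that would run for the given arguments."""
    parser = build_parser()
    args = parser.parse_args(argv)

    # With neither flag given, fall back to running both test types.
    run_unit = args.unit or not args.data
    run_data = args.data or not args.unit

    # The parser cannot express "this option belongs to --data only",
    # so the compatibility check is manual if/else logic.
    if args.store_failures and not run_data:
        parser.error("--store-failures is only supported with data tests")

    tasks = []
    if run_unit:
        tasks.append("unit")
    if run_data:
        tasks.append("data")
    return tasks
```

With real subcommands each one would carry its own option set and help text, which is the clarity gained on that side, at the cost of the backwards-compatibility concern noted above.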

Acceptance criteria

We understand the complexities of each of the command options and have determined the best path forward. See expanded acceptance criteria in comment below.

Impact to Other Teams

None

Will backports be required?

None

Context

No response

@github-actions github-actions bot changed the title [spike] explore testing command options to understand technical complexities and determine best option [CT-3122] [spike] explore testing command options to understand technical complexities and determine best option Sep 15, 2023
@jtcohen6
Contributor

jtcohen6 commented Oct 2, 2023

Considerations from estimation:

  • (3) is cleanest for reflecting that these are separate commands, including in --help text

Why not pursue (3)?

Scope of this spike:

  • Technical complexity of each option above (1-3)

Preference from engineering is for option (3). If that's the choice, we do not need a spike.

@graciegoheen
Contributor Author

graciegoheen commented Oct 5, 2023

Acceptance criteria:

dbt test

  • dbt test runs all of your tests (both unit tests and data tests)
  • we provide a simple mechanism to the dbt test command to specify which types of tests to run (unit tests and/or data tests)

dbt build

  • dbt build runs all of your tests (both unit and data)
  • we provide a simple mechanism to the dbt build command to specify which types of tests to run (unit tests and/or data tests)

The question for both is ... what is the mechanism to specify which types of tests to run

  • If we one day added a 3rd type of tests, the mechanism should be flexible enough to select exactly the test types you want to be executed (only a, only b, only c, a and b, b and c, a and c, a and b and c).
  • The simple mechanism could be:
    • a flag
    • an environment variable
    • both a flag and an environment variable
    • a selector
    • something else?
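
One way to picture the "flag and/or environment variable" mechanism is the sketch below. It is only an illustration: the comma-separated format, the flag-over-environment precedence, and the helper names are assumptions, not anything decided in this thread. The point is that a set-valued setting handles any combination of types, including a future third one, without adding a new flag per type:

```python
from typing import FrozenSet, Mapping, Optional

# Adding a third test type would only mean extending this tuple.
KNOWN_TEST_TYPES = ("unit", "data")


def parse_test_types(raw: str) -> FrozenSet[str]:
    """Parse a comma-separated list such as "unit,data" into a set of types."""
    names = frozenset(part.strip() for part in raw.split(",") if part.strip())
    unknown = names - set(KNOWN_TEST_TYPES)
    if unknown:
        raise ValueError(f"unknown test types: {sorted(unknown)}")
    return names


def resolve_test_types(
    flag_value: Optional[str], env: Mapping[str, str]
) -> FrozenSet[str]:
    """Assumed precedence: explicit flag, then environment variable, then all."""
    if flag_value is not None:
        return parse_test_types(flag_value)
    env_value = env.get("TEST_TYPES")  # variable name and format are assumed
    if env_value:
        return parse_test_types(env_value)
    return frozenset(KNOWN_TEST_TYPES)
```

Passing `os.environ` as `env` would read the real environment; a plain dict keeps the sketch easy to test.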

Open question: Are we planning to support different flags for unit tests vs. data tests? Or will we have the same flags for both?
- --store-failures
- --warn-error
- --select by test_name

What to do about the test_type method

  • this is actually just "how was the data test implemented" (yml or in sql)
  • there's a work-around (just add a tag "singular_tests" to your tests/singular folder and select via tag, or select via the path method, e.g. path:tests/singular/)
  • this is not a very popular selection method

I believe our two "test types" are unit tests and data tests.
I believe two ways to implement data tests are "singular" and "generic".

We could:
a) get rid of this selector (to avoid confusion)
b) add to this selector (test_type can be unit, data, generic, or singular)
c) leave this selector as is
d) rename the selector to be defined: (or something clearer)

@graciegoheen graciegoheen removed their assignment Oct 24, 2023
@graciegoheen
Contributor Author

graciegoheen commented Oct 31, 2023

Notes from refinement:

  • 2 would be incredibly tricky to do (sounds like not worth it!)

@ChenyuLInx
Contributor

ChenyuLInx commented Oct 31, 2023

Option 2 (dbt test unit and dbt test data) will not work because it is incredibly hard to add a subcommand under an existing command. We tried that with deps (link) and it didn't work out well.

@aranke
Member

aranke commented Oct 31, 2023

Can we just add unit-test as a step that build runs and bypass the issue altogether?

Users can then use selectors to specify whether they want unit tests, data tests, or both; similar to seed or snapshot.

@graciegoheen
Contributor Author

graciegoheen commented Nov 1, 2023

@aranke How would users select that they want only unit tests or data tests to run? Via --select? An environment variable?

@graciegoheen
Contributor Author

graciegoheen commented Nov 6, 2023

the unit testing squad had a great discussion today on “what should the mechanism be to select only unit tests or only data tests to run” and has narrowed it down to 2 main options:

  1. option 1 - unit tests and data tests are different resource types (like models vs. seeds) that have their own unique commands dbt unit-test and dbt data-test and can be selected via --select resource_type:unit-test or --select resource_type:data-test for dbt build
  2. option 2 - unit tests and data tests are types of tests that can be selected using --select test_type:unit or --select test_type:data for dbt build and dbt test

Example scenarios:

“i need to run my unit tests” - likely in development
option 1: dbt unit-test
option 2: dbt test -s test_type:unit

“i need to run all of my tests” - likely in development & CI
option 1: dbt unit-test && dbt data-test
option 2: dbt test

“i need to run just my data tests” - likely in production
option 1: dbt data-test
option 2: dbt test -s test_type:data

“i need to build everything except my unit tests” - likely in production
option 1: dbt build -e resource_type:unit-test
option 2: dbt build -e test_type:unit

“i need to build everything” - likely in CI
option 1: dbt build
option 2: dbt build

some additional considerations

for option 1:

  • dbt test dies eventually, but for now is an alias for dbt data-test
  • we kill the test_type method

for option 2:

  • dbt test executes all test types by default
  • dbt test -s test_type:generic and dbt test -s test_type:singular still work as documented (they’re just sub-selections of test_type:data)
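
To make the option 2 semantics concrete, here is a toy model of `test_type` selection in which `generic` and `singular` are sub-selections of `data`. This is not dbt's selector implementation; the `Node` class, the `apply_selection` helper, and the sample node names are hypothetical and only mirror the scenarios listed above:

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed hierarchy: "data" is shorthand for both ways of implementing a data test.
TEST_TYPE_MEMBERS = {
    "unit": {"unit"},
    "data": {"generic", "singular"},
    "generic": {"generic"},
    "singular": {"singular"},
}


@dataclass(frozen=True)
class Node:
    name: str
    test_kind: Optional[str] = None  # None for models, seeds, and so on


def matches(node: Node, criterion: str) -> bool:
    """Check a node against a criterion of the form "test_type:<value>"."""
    method, _, value = criterion.partition(":")
    if method != "test_type" or value not in TEST_TYPE_MEMBERS:
        raise ValueError(f"unsupported criterion: {criterion}")
    return node.test_kind in TEST_TYPE_MEMBERS[value]


def apply_selection(
    nodes: List[Node], select: Optional[str] = None, exclude: Optional[str] = None
) -> List[str]:
    """Keep nodes matching `select` (all if None), then drop those matching `exclude`."""
    kept = [n for n in nodes if select is None or matches(n, select)]
    return [n.name for n in kept if exclude is None or not matches(n, exclude)]


# Hypothetical project contents.
PROJECT = [
    Node("my_model"),
    Node("my_model_unit_test", "unit"),
    Node("not_null_my_model_id", "generic"),
    Node("assert_positive_totals", "singular"),
]
```

In this model, `apply_selection(PROJECT, exclude="test_type:unit")` corresponds to the "build everything except my unit tests" scenario, and `select="test_type:data"` picks up both generic and singular tests.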

@graciegoheen
Contributor Author

graciegoheen commented Nov 9, 2023

After some discussion with dbt users - we're going to move forward with option 2: unit tests and data tests are types of tests that can be selected using --select test_type:unit or --select test_type:data for dbt build and dbt test. Some main considerations:

  • We are compelled by "These are all kinds of tests." - They all involve hitting the DWH, actually running real queries (even if it's with fixture inputs). So a command like dbt test just makes sense if you want to run all of the tests.
  • We like the idea that dbt build -s my_model does everything for that model, and you could use standard-fare selection syntax (including even a default yaml selector) to pare it down if desired.
  • Option 2 has "less new things" / better for backwards-compatibility
  • We like the clarity of explicitly excluding unit tests when desired (like in production runs) in a way that can be seen in the command, i.e. dbt build --exclude test_type:unit

Thanks for weighing in here! I'll go ahead and close this out and open up some implementation ticket(s):

  • add new test types test_type:unit and test_type:data to --select and --exclude for dbt build, dbt test, and dbt list
  • add new TEST_TYPES environment variable to allow folks to specify the test types to run in a given environment

7 participants