[CT-3033] [Spike] Explore support multiple unit test materialization strategies: CTE vs 'seed'-based #8499

MichelleArk · 2023-08-25T19:19:02Z

From the discussion thread: #8275 (reply in thread)

There are two main high-level implementation approaches for unit testing in dbt:

inlining fixtures/expected outputs as CTEs in a single unit test query (to produce the 'actual' result), or
persisting all input fixtures and expected outputs, then querying the result of the model SQL run against the persisted input fixtures. Once the unit test finishes, clean up any persisted fixtures from the warehouse.
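
To make the CTE-based approach concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in warehouse. It is not dbt's implementation. The `{{ ref('...') }}` placeholder substitution, the `__fixture_` naming, and all helper functions are hypothetical. Each fixture is rendered as a `select ... union all select ...` CTE, the model's ref is rewritten to point at that CTE, and everything runs as one query, so nothing is persisted.

```python
import sqlite3

# Hypothetical model SQL; the placeholder stands in for dbt's ref() call.
MODEL_SQL = "select id, amount * 2 as doubled from {{ ref('stg_orders') }}"


def sql_literal(value):
    """Render a Python value as a SQL literal (ints, floats, strings, None only)."""
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def fixture_cte(cte_name, rows):
    """Render fixture rows (a list of dicts) as `name as (select ... union all ...)`."""
    cols = list(rows[0])
    selects = [
        "select " + ", ".join(f"{sql_literal(row[c])} as {c}" for c in cols)
        for row in rows
    ]
    return f"{cte_name} as (\n  " + "\n  union all ".join(selects) + "\n)"


def build_cte_query(model_sql, fixtures):
    """Inline every fixture as a CTE and point the model's refs at it."""
    ctes = []
    for name, rows in fixtures.items():
        cte_name = f"__fixture_{name}"
        ctes.append(fixture_cte(cte_name, rows))
        model_sql = model_sql.replace("{{ ref('" + name + "') }}", cte_name)
    return "with " + ",\n".join(ctes) + "\nselect * from (\n" + model_sql + "\n)"


def run_cte_unit_test(conn, model_sql, fixtures, expected):
    """Run the single inlined query and compare it to the expected rows."""
    actual = conn.execute(build_cte_query(model_sql, fixtures)).fetchall()
    return sorted(actual) == sorted(expected)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    fixtures = {"stg_orders": [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]}
    print(run_cte_unit_test(conn, MODEL_SQL, fixtures, [(1, 20), (2, 10)]))
    # No tables were created in the database.
    print(conn.execute("select count(*) from sqlite_master").fetchone()[0])
```

The single round trip is what makes this route cheap. The cost is that anything that cannot be expressed inside one `with ... select` statement is out of reach.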

I think both are technically feasible and would actually have pretty similar implementations under the hood: either using a materialization that leverages existing ephemeral logic for the 'CTE trickery' route, or actually materializing inputs and the 'actual' test model in the warehouse using the existing seed materialization.

Tradeoffs:
Actually materializing the input/actual datasets is a more accurate representation of how models are run in production than the CTE-based approach, and it would support a larger set of SQL/dbt functionality than CTEs. For example: syntax used in sql_headers that may not be valid in a standalone query, or certain types that can be inserted but not declared in a standalone query (dbt-labs/dbt-project-evaluator#290). Do any other limitations come to mind? The tradeoff is performance: materializing fixtures, expected, and actual datasets in the warehouse, querying them to obtain a diff, and reliably deleting them at the end of the test run all add latency.
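
For comparison, here is a minimal sketch of the seed-based approach under the same assumptions: sqlite3 stands in for the warehouse, and the helper and table names are hypothetical. Fixtures and expected rows are persisted as tables, and the model is materialized as an 'actual' table. A symmetric `EXCEPT` query produces the diff, and a `finally` block drops everything so that cleanup runs even when the test fails. Each extra create/insert/drop is likely a separate round trip on a remote warehouse, which is where the added latency would come from.

```python
import sqlite3

# Hypothetical model SQL; the placeholder stands in for dbt's ref() call.
MODEL_SQL = "select id, amount * 2 as doubled from {{ ref('stg_orders') }}"

# Symmetric difference between actual and expected. EXCEPT is set-based,
# so this sketch does not detect differences in duplicate row counts.
DIFF_SQL = """
select count(*) from (
  select * from (select * from __actual except select * from __expected)
  union all
  select * from (select * from __expected except select * from __actual)
)
"""


def seed_table(conn, table, rows):
    """Persist rows (a list of dicts) as a table, like a tiny seed."""
    cols = list(rows[0])
    conn.execute(f"create table {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"insert into {table} values ({placeholders})",
        [tuple(row[c] for c in cols) for row in rows],
    )


def run_seed_unit_test(conn, model_sql, fixtures, expected):
    """Persist inputs, materialize 'actual', diff against 'expected', clean up."""
    created = []  # record names before creating, so cleanup covers partial failures
    try:
        for name, rows in fixtures.items():
            table = f"__fixture_{name}"
            created.append(table)
            seed_table(conn, table, rows)
            model_sql = model_sql.replace("{{ ref('" + name + "') }}", table)
        # Materialize the model's 'actual' output, as a real run would.
        created.append("__actual")
        conn.execute(f"create table __actual as {model_sql}")
        created.append("__expected")
        seed_table(conn, "__expected", expected)
        # fetchall() exhausts the cursor so no pending statement blocks the drops.
        mismatches = conn.execute(DIFF_SQL).fetchall()[0][0]
        return mismatches == 0
    finally:
        for table in created:
            conn.execute(f"drop table if exists {table}")
        conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    fixtures = {"stg_orders": [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]}
    expected = [{"id": 1, "doubled": 20}, {"id": 2, "doubled": 10}]
    print(run_seed_unit_test(conn, MODEL_SQL, fixtures, expected))
```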

Next steps
So far we've started with the CTE approach, mostly for the sake of simplicity, but I do believe it'd be very worthwhile to spike the seed-based approach and quantify more precisely how much slower and more complex that approach would be. @gshank also suggested exploring implementing both strategies and selecting the strategy based on user configuration or the presence of certain conditions (e.g. a sql_header, or a particular type on the model being tested). I think a non-CTE strategy would also be necessary to test complex or custom materializations end-to-end (#8275 (reply in thread)).

Let's implement the seed-based strategy in a spike to understand:

its technical complexity
whether/how it deviates from the CTE-based approach
performance implications relative to the CTE-based approach (with a remote adapter rather than dbt-postgres)

Ultimately let's use those learnings to recommend whether we should implement unit tests with:

just the CTE-based approach (as it is currently)
just the seed-based approach
a combination of both - with a top-level strategy pattern that toggles between the two under appropriate conditions or user configuration.
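
The "combination of both" option could look roughly like the following strategy selector. This is a hypothetical sketch: `UnitTestTarget`, `choose_strategy`, the `configured_strategy` setting, and the set of materializations treated as CTE-friendly are assumptions for illustration, not existing dbt config or APIs. It encodes the conditions raised above. Explicit user configuration is checked first, then a sql_header, then types that cannot be declared inline, then complex or custom materializations. If none of these apply, it falls back to the cheaper CTE approach.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Strategy(Enum):
    CTE = "cte"
    SEED = "seed"


# Assumption: materializations simple enough to emulate with one inlined query.
CTE_FRIENDLY_MATERIALIZATIONS = {"view", "table", "ephemeral"}


@dataclass
class UnitTestTarget:
    """Hypothetical summary of the model under test (not a dbt class)."""
    name: str
    materialization: str = "view"
    sql_header: Optional[str] = None
    column_types: dict = field(default_factory=dict)
    configured_strategy: Optional[str] = None  # hypothetical user config key


def choose_strategy(target, types_requiring_seed=frozenset()):
    """Pick CTE or seed-based execution for one unit test."""
    # 1. An explicit user configuration always wins.
    if target.configured_strategy is not None:
        return Strategy(target.configured_strategy)
    # 2. sql_header statements may not be valid inside one standalone query.
    if target.sql_header:
        return Strategy.SEED
    # 3. Some types can be inserted into a table but not declared inline.
    if any(t.lower() in types_requiring_seed for t in target.column_types.values()):
        return Strategy.SEED
    # 4. Complex or custom materializations need an end-to-end run.
    if target.materialization not in CTE_FRIENDLY_MATERIALIZATIONS:
        return Strategy.SEED
    # 5. Otherwise prefer the cheaper CTE approach.
    return Strategy.CTE
```

For example, a model with a sql_header would be routed to the seed-based strategy. A plain view with no header and no special types would keep the current CTE behavior, so the default path stays fast.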


martynydbt · 2023-11-01T15:38:56Z

this may roll to the next sprint

MichelleArk mentioned this issue Aug 25, 2023: [CT-2911] [Epic] Unit testing dbt models #8283 (Closed)

github-actions bot changed the title ~~[Spike] Explore support multiple unit test materialization strategies: CTE vs 'seed'-based~~ [CT-3033] [Spike] Explore support multiple unit test materialization strategies: CTE vs 'seed'-based Aug 25, 2023

graciegoheen assigned gshank and unassigned gshank Sep 12, 2023

MichelleArk mentioned this issue Sep 18, 2023: [CT-2997] Ability to test incremental models #8422 (Closed)

martynydbt assigned gshank and unassigned gshank Sep 26, 2023

graciegoheen mentioned this issue Oct 2, 2023: [CT-2998] [SPIKE] Handle unit testing of JSON and ARRAY data types #8423 (Closed)

graciegoheen assigned gshank Oct 5, 2023

graciegoheen mentioned this issue Oct 26, 2023: [CT-3153] [implementation] limit the size of fixtures for unit testing #8708 (Open)

graciegoheen unassigned gshank Nov 7, 2023

graciegoheen added this to the v1.8 milestone Feb 15, 2024

graciegoheen mentioned this issue Feb 27, 2024: [Feature] I want my unit tests to optionally execute pre/post hooks #9683 (Closed)

graciegoheen added the enhancement New feature or request label Mar 12, 2024

martynydbt assigned MichelleArk Mar 12, 2024

martynydbt modified the milestones: v1.8, v1.9 Mar 13, 2024

MichelleArk removed their assignment Mar 19, 2024

MichelleArk mentioned this issue Apr 24, 2024: [feature] unit testing a recursive CTE fails dbt-labs/dbt-bigquery#1178 (Open)

ChenyuLInx mentioned this issue May 6, 2024: [CT-3129] Ability to test the incremental behavior of models using incremental strategy that requires partitioning #8664 (Open)

graciegoheen mentioned this issue May 6, 2024: [Unit Testing] Allow explicit precision testing in unit tests #9884 (Open)

jtcohen6 mentioned this issue May 9, 2024: support merge_update_columns logic for unit_tests #10115 (Open)

jtcohen6 mentioned this issue Jun 3, 2024: [Bug] Macro adapter.get_columns_in_relation() doesn't return any columns during unit testing #10139 (Open)
