Simplify staging models and inner join in staging. #85

alanmcruickshank · 2022-02-13T10:51:15Z

This reverts many of the changes from #75 and takes an alternate approach. It also supports #84.

Problem

Run results are all stored together; manifest items are not. To resolve some of the race-condition errors, we need a consistent way of matching elements from the run results file to elements from the manifest, which means being able to apply operations to all manifest items together.

The separate staging models for tests, models, seeds, etc. make operations across all elements difficult. They also lead to more models than we need and a lot of duplicated logic.

Aim

The aim of this PR is to unify the staging models for all manifest items into a single staging table. Seeds, tests and models coexist nicely, but sources and exposures take a little more effort.
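A minimal sketch of what "a single staging table" can look like, assuming the extra effort for sources and exposures is mostly about their columns differing from those of other nodes. The per-type tables (`stg_models`, `stg_tests`, `stg_sources`) and their columns are hypothetical stand-ins, not the package's actual schema. The SQL runs through Python's `sqlite3` so it is executable on its own; a warehouse dialect would differ in the details.

```python
import sqlite3


def build_demo_db():
    """Create hypothetical per-type staging tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table stg_models  (node_id text, name text, materialization text);
        create table stg_tests   (node_id text, name text);
        create table stg_sources (node_id text, source_name text, name text);
        insert into stg_models  values ('model.proj.orders', 'orders', 'table');
        insert into stg_tests   values ('test.proj.not_null_orders_id', 'not_null_orders_id');
        insert into stg_sources values ('source.proj.raw.orders', 'raw', 'orders');
    """)
    return conn


# One node-level relation covering every resource type. Columns a type does
# not have are padded with NULL so each UNION ALL branch has the same number
# of columns in the same order; column names come from the first branch.
UNIFIED_NODES = """
    select node_id, 'model' as resource_type, name, materialization, null as source_name
    from stg_models
    union all
    select node_id, 'test', name, null, null
    from stg_tests
    union all
    select node_id, 'source', name, null, source_name
    from stg_sources
    order by node_id
"""


def unified_nodes(conn):
    return conn.execute(UNIFIED_NODES).fetchall()


if __name__ == "__main__":
    for row in unified_nodes(build_demo_db()):
        print(row)
```

The cost is the redundant, mostly-NULL columns. The benefit is that any later operation, such as a join against run results, is written once, not once per node type.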

This can then be joined to the run results file early in the model tree, to ensure we only look at results which have records in both the manifest and the run results file.
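A minimal sketch of that early inner join. It assumes rows are matched on an `invocation_id` plus `node_id` pair. The PR does not say which columns the join uses, so treat the key, the table names and the sample rows as hypothetical. The sample data includes one invocation whose run results have not arrived yet and one whose manifest has not, which is the kind of upload ordering that causes race conditions.

```python
import sqlite3


def build_demo_db():
    """Hypothetical unified node table and run results table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table stg_nodes (invocation_id text, node_id text, resource_type text);
        create table stg_run_results (invocation_id text, node_id text, status text);
        -- inv1: manifest and run results are both present.
        insert into stg_nodes values ('inv1', 'model.proj.orders', 'model');
        insert into stg_run_results values ('inv1', 'model.proj.orders', 'success');
        -- inv2: manifest loaded, run results not yet arrived.
        insert into stg_nodes values ('inv2', 'model.proj.orders', 'model');
        -- inv3: run result loaded, manifest not yet arrived.
        insert into stg_run_results values ('inv3', 'model.proj.orders', 'error');
    """)
    return conn


# Joining once, early, means every downstream model only ever sees results
# that have a matching manifest record, and vice versa.
NODE_EXECUTIONS = """
    select r.invocation_id, r.node_id, n.resource_type, r.status
    from stg_run_results as r
    inner join stg_nodes as n
        on r.invocation_id = n.invocation_id
       and r.node_id = n.node_id
"""


def node_executions(conn):
    return conn.execute(NODE_EXECUTIONS).fetchall()


if __name__ == "__main__":
    print(node_executions(build_demo_db()))
```

Only the `inv1` row survives. The half-loaded `inv2` and `inv3` invocations are excluded until both of their artifacts are present.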

Alternate approaches

I'm pretty sure that having all the nodes in the same model at some point is the right route to take, but a few of the choices I've made have alternatives:

We could do the UNION in a non-staging model. I considered this, but I think it just adds model bloat without much upside.
We could make the join between manifest and run results in node_executions a left join, so that elements which have only a manifest entry or only a run result could still come through. This would give better latency and maybe more flexibility, but it raises the risk of race conditions, and I couldn't think of a good reason to use records which have one but not the other.
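A minimal sketch contrasting the two options, using the same hypothetical `invocation_id` plus `node_id` key (an assumption; the PR does not specify the join columns). With run results on the left side, a left join keeps results whose manifest has not been loaded yet. Their manifest columns come back as NULL. Only a full outer join would also keep manifest-only rows.

```python
import sqlite3


def build_demo_db():
    """Hypothetical tables: one matched invocation, one result without a manifest."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table stg_nodes (invocation_id text, node_id text, resource_type text);
        create table stg_run_results (invocation_id text, node_id text, status text);
        insert into stg_nodes values ('inv1', 'model.proj.orders', 'model');
        insert into stg_run_results values ('inv1', 'model.proj.orders', 'success');
        -- Result from an invocation whose manifest has not been loaded yet.
        insert into stg_run_results values ('inv2', 'model.proj.orders', 'success');
    """)
    return conn


def node_executions(conn, join_type):
    # Only the two join types under discussion are allowed, so nothing else
    # can be spliced into the SQL string.
    if join_type not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {join_type}")
    sql = f"""
        select r.invocation_id, r.node_id, n.resource_type, r.status
        from stg_run_results as r
        {join_type} join stg_nodes as n
            on r.invocation_id = n.invocation_id
           and r.node_id = n.node_id
        order by r.invocation_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print("inner:", node_executions(conn, "inner"))
    print("left: ", node_executions(conn, "left"))
```

The left join surfaces the `inv2` result sooner, but with `resource_type` as `None`. Every downstream model would then have to cope with half-described rows, which is the trade-off the inner join avoids.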

NiallRees

This is really great @alanmcruickshank! It looks like there are one or two SQLFluff errors, but I'll get this merged once they're resolved. Much DRYer than before.

NiallRees · 2022-02-15T17:30:47Z

models/staging/stg_dbt__nodes.sql

+
+)
+
+select * from surrogate_key


It feels a little weird to combine all of these in one model, but at the cost of some redundant columns it massively reduces the overall amount of code, so I'm in favour.

alanmcruickshank · 2022-02-15T21:15:13Z

Tests are passing now, but don't merge yet. I found an issue on my end.

alanmcruickshank · 2022-02-15T21:48:38Z

False alarm: it was an issue with my local test setup. All good to merge 👍

Simplify staging models and inner join in staging.

075a646

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 10:51 Failure

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 19:28 Failure

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 19:33 Failure

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 19:50 Failure

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 20:00 Failure

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 20:18 Failure

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 20:27 Failure

Fix typos in initial commit

7c2b90d

alanmcruickshank force-pushed the ac/single_results_model branch from 5123954 to 7c2b90d February 13, 2022 20:31

alanmcruickshank temporarily deployed to Approve Integration Tests February 13, 2022 20:31 Inactive

alanmcruickshank had a problem deploying to Approve Integration Tests February 13, 2022 20:31 Failure

linting

17e7954

alanmcruickshank had a problem deploying to Approve Integration Tests February 14, 2022 11:14 Failure

alanmcruickshank temporarily deployed to Approve Integration Tests February 14, 2022 11:14 Inactive

Merge remote-tracking branch 'origin/main' into ac/single_results_model

2f7deae

alanmcruickshank temporarily deployed to Approve Integration Tests February 15, 2022 14:05 Inactive

NiallRees reviewed Feb 15, 2022


alanmcruickshank added 2 commits February 15, 2022 18:19

Merge remote-tracking branch 'origin/main' into ac/single_results_model

81d92a2

linting

784c290

alanmcruickshank temporarily deployed to Approve Integration Tests February 15, 2022 19:23 Inactive

NiallRees merged commit d8bd5e1 into brooklyn-data:main Feb 16, 2022

alanmcruickshank mentioned this pull request Feb 24, 2022

Allow for build command artifacts to populate by adjusting filter for data:args:which = 'run' #78

Closed
