-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify staging models and inner join in staging. #85
Simplify staging models and inner join in staging. #85
Conversation
5123954
to
7c2b90d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really great @alanmcruickshank, looks like there are one or two SQLFluff errors but will get this merged once they're resolved. Much DRYer than before.
|
||
) | ||
|
||
select * from surrogate_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels a little weird to combine all these in one model - but for the tradeoff in redundant columns it does massively reduce the overall amount of code so I'm in favour
Tests are passing now - but don't merge yet. Found an issue on my end. |
False alarm - it was an issue with my local test setup. All good to merge 👍 |
This reverts many of the changes from #75 and takes an alternate approach. It also supports #84.
Problem
Run results are all stored together, manifest items are not. To solve some of the race condition errors we need a consistent way of matching up elements from the results file and elements from the manifest. This means being able to apply operations to all manifest items together.
The disparate staging models for tests, models, seeds etc... make operations on all elements difficult. This also leads to more models than we need and much duplicated logic.
Aim
The aim of this PR is to unify all staging models for all manifest into a single staging table. Seeds, tests and models co-exist nicely, but this requires a little bit of effort for sources and exposures.
This can then be joined on the results file, early in the model tree to ensure we only look at results which have records in both the manifest and and run results file.
Alternate approaches
I'm pretty sure that having all the nodes in the same model at some point is the right route to take, but there are a few choices I've made which have other options:
node_executions
a left join so that elements which have one of a manifest or run result could come through if they don't have both. This would lead to better latency and maybe better flexibility - but it raises the risk of race conditions and I couldn't think of a good reason for using records which have one but not the other.