Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify staging models and inner join in staging. #85

Merged

Conversation

alanmcruickshank
Copy link
Contributor

This reverts many of the changes from #75 and takes an alternate approach. It also supports #84.

Problem

Run results are all stored together, manifest items are not. To solve some of the race condition errors we need a consistent way of matching up elements from the results file and elements from the manifest. This means being able to apply operations to all manifest items together.

The disparate staging models for tests, models, seeds etc... make operations on all elements difficult. This also leads to more models than we need and much duplicated logic.

Aim

The aim of this PR is to unify all staging models for all manifest into a single staging table. Seeds, tests and models co-exist nicely, but this requires a little bit of effort for sources and exposures.

This can then be joined on the results file, early in the model tree to ensure we only look at results which have records in both the manifest and and run results file.

Alternate approaches

I'm pretty sure that having all the nodes in the same model at some point is the right route to take, but there are a few choices I've made which have other options:

  • We could do the UNION in a non staging model. I considered this but I think it just adds model bloat without a good upside.
  • We could make the join between manifest and run results in node_executions a left join so that elements which have one of a manifest or run result could come through if they don't have both. This would lead to better latency and maybe better flexibility - but it raises the risk of race conditions and I couldn't think of a good reason for using records which have one but not the other.

@alanmcruickshank alanmcruickshank temporarily deployed to Approve Integration Tests February 13, 2022 20:31 Inactive
@alanmcruickshank alanmcruickshank temporarily deployed to Approve Integration Tests February 14, 2022 11:14 Inactive
@alanmcruickshank alanmcruickshank temporarily deployed to Approve Integration Tests February 15, 2022 14:05 Inactive
@alanmcruickshank alanmcruickshank temporarily deployed to Approve Integration Tests February 15, 2022 14:05 Inactive
Copy link
Contributor

@NiallRees NiallRees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great @alanmcruickshank, looks like there are one or two SQLFluff errors but will get this merged once they're resolved. Much DRYer than before.


)

select * from surrogate_key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a little weird to combine all these in one model - but for the tradeoff in redundant columns it does massively reduce the overall amount of code so I'm in favour

@alanmcruickshank alanmcruickshank temporarily deployed to Approve Integration Tests February 15, 2022 19:23 Inactive
@alanmcruickshank alanmcruickshank temporarily deployed to Approve Integration Tests February 15, 2022 19:23 Inactive
@alanmcruickshank
Copy link
Contributor Author

Tests are passing now - but don't merge yet. Found an issue on my end.

@alanmcruickshank
Copy link
Contributor Author

False alarm - it was an issue with my local test setup. All good to merge 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants