[CT-2068] [Feature] Let's add Materialized View as a materialization, finally #6911
Comments
I agree with this approach of not making auto-refresh the default. It will help users be more mindful of managing compute costs (i.e., making sure their development environments are running with intent). While I don't find using MVs for analytics compelling without auto-refresh in prod, a user interacting with an MV in dbt will primarily be doing it during development, where having clear levers will be more helpful.
Materialized views in Materialize work under different assumptions from other databases (i.e., native incremental updates, no need for scheduled refreshes), so at first glance it doesn't look like we could inherit from
On our end, the materialization looks like:
Configurations
Lifecycle
We should talk ;)
I like that.
Updated with a new
Updated following internal review:
Updated to remove
Wondering how this can be closed as completed when the dbt-bigquery part is still a draft PR? dbt-labs/dbt-bigquery#639
Incorrectly closed due to automation.
Yay for automation!
On auto_refresh and backup: is backup taken care of in core, or do we need any new adapter work? Is auto_refresh a signal from core that the adapters should implement as appropriate?
I imagine Databricks has its own platform-specific settings, and those would need to be added into
I'm working with @dataders to put together something that should be much more helpful than this response, but hopefully that answers your immediate question.
The only outstanding work for this feature addresses the nuances around automatic refreshes on materialized views that are already scheduled to refresh. This functionality already has associated issues in each adapter's respective repository. With that context, I'm closing this as complete.
materialized: materialized_view
Context: The time has come, let's do this.
I'll elaborate on why we're doing this now and not before in a public post in the near future. The short version is that a new generation of Materialized Views (MVs) is emerging, powered by streaming technologies, and we want to support them. This also allows dbt to retroactively support some of the current implementations in a cleaner way. Win, win, win.
This specific issue is about building the foundation. It is about repurposing the work already done in the experimental package into a default materialization plus Postgres, Redshift and BigQuery implementations, and making it official.
Please note that Snowflake is not in the list above. For dbt-snowflake, we'll skip their current MVs and directly support Dynamic Tables when they're ready (we already have an issue for it!).
A new materialization
So yes, we need a new materialization with a default implementation in dbt core, and 3 dispatch implementations for Postgres, Redshift and BigQuery.
I like the ergonomics designed in the experimental package...
I see no reason for us to change them for now (well... see the alternative option below). I reused them but added on_definition_change (now split into on_configuration_change and on_schema_change for easier onboarding from incremental models) to offer behavior similar to what on_schema_change provides for incremental models. Here, it's not only the schema that we may need to change, but most importantly configuration options like the auto_refresh mode or refresh_interval_minutes. These options can be modified via an ALTER statement that doesn't require the full state of the MV to be recomputed (not that it's limited to that, but it allows those too).
on_configuration_change: mimics the behavior of on_schema_change from incremental models, focused on configs.
A normal run of dbt on MVs should only be about "moving data", not deploying code changes. So either refresh, or do nothing. We can still try to pass along configuration changes (indexes, labels, warehouses...) because there's a chance we can do that without triggering a full refresh of the view (which is what we are trying to avoid).
We also want to provide that capability for the logic of the view (the SQL statement), but at this point we can't do it cleanly. We can't compare the logic that is deployed with the one in the model (I dread comparing SQL statements as strings), and no platform in scope supports CREATE OR REPLACE. So we can't be clever, and we can't be brutal and expect the platform to be clever.
We also can't limit the scope to on_schema_change, because an MV can only apply one version of its logic at a point in time, unlike incremental models, where the previous logic can live alongside the new one. We will follow up on that later.
Default implementation
Note how the current expectation is that base MVs require a refresh.
dbt run (…OR exists and full-refresh):
on_configuration_change: we can't use ignore here, as on_schema_change=ignore for incremental models has set weird expectations on the word. Let's use skip instead for now.
skip (instead of ignore): Do nothing, i.e., stage changes until the next full refresh. (Documentation follow-up: this can create uncertainty when there is a delta between the model definition and what's in the database, the same as on_schema_change=ignore for incremental models.)
fail: Fail the run.
apply: [default] Apply the changes (implementation detail: let's try to favor commands that do not trigger full recomputes, such as ALTER or CREATE OR REPLACE).
dbt run --full-refresh (…OR exists and full-refresh):
Described more simply, for an MV already created:
dbt run --full-refresh: we drop and create.
dbt run: we refresh (or do nothing if auto_refresh is enabled).
If on_configuration_change is set to something other than skip, we compare the configuration in the database with the one in the model, and if there are diffs, we apply or fail.
When applying, we should try options in order of preference, favoring commands that avoid a full recompute. This choice needs to be made for each configuration (some will support ALTER, some won't).
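The run-time behavior for an already-created MV can be sketched as a small decision function. This is a hypothetical Python illustration, not dbt's implementation: the function name, the action strings, the dictionary-based configs and the `alterable` set (the configs a given adapter can change via ALTER) are all assumptions. One design choice worth flagging: if any changed config can't be ALTERed, the sketch rebuilds the whole MV instead of mixing ALTERs with a DROP/CREATE, since the rebuild picks up every change anyway.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical shape of an MV configuration, e.g. {"auto_refresh": True}.
Config = Dict[str, object]

VALID_MODES = {"apply", "skip", "fail"}


@dataclass
class Plan:
    """Ordered list of abstract actions for one existing materialized view."""
    actions: List[str] = field(default_factory=list)


def plan_existing_mv(
    full_refresh: bool,
    on_configuration_change: str,
    db_config: Config,
    model_config: Config,
    alterable: Set[str],
) -> Plan:
    if on_configuration_change not in VALID_MODES:
        raise ValueError(f"unknown on_configuration_change: {on_configuration_change!r}")
    plan = Plan()

    # dbt run --full-refresh: drop and recreate, no diffing needed.
    if full_refresh:
        plan.actions += ["drop", "create"]
        return plan

    # Keys whose value in the model differs from what is deployed.
    changed = sorted(k for k, v in model_config.items() if db_config.get(k) != v)
    deployed = dict(db_config)

    if changed and on_configuration_change == "fail":
        raise RuntimeError(f"configuration changes detected: {changed}")

    if changed and on_configuration_change == "apply":
        if not set(changed) <= alterable:
            # Some change can't be ALTERed: rebuild, which also repopulates the data.
            plan.actions += ["drop", "create"]
            return plan
        plan.actions += [f"alter {k}" for k in changed]
        deployed.update({k: model_config[k] for k in changed})

    # "skip" leaves `deployed` as it is in the database. Then move data,
    # unless the deployed MV refreshes itself.
    if not deployed.get("auto_refresh", False):
        plan.actions.append("refresh")
    return plan
```

With the Redshift rules listed below, for instance, `alterable` would be `{"auto_refresh"}`; switching `auto_refresh` from false to true under `apply` then yields a single ALTER and no refresh, while the same change under `skip` still refreshes because the deployed MV is not auto-refreshing.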
Postgres: Materialized Views
on_configuration_change = apply: indexes via DROP/CREATE INDEX
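For Postgres, applying index changes means diffing the indexes deployed on the MV against those declared in the model, then dropping and creating only the difference. A minimal sketch, assuming an index is identified by its column list and uniqueness; the `IndexKey` shape and the `index_changes` helper are hypothetical, not dbt-postgres code, and the generated SQL is illustrative (real index names may need schema qualification).

```python
from typing import Dict, List, Tuple

# Hypothetical index identity: (columns, unique).
IndexKey = Tuple[Tuple[str, ...], bool]


def index_changes(
    relation: str,
    deployed: Dict[str, IndexKey],  # existing index name -> (columns, unique)
    desired: List[IndexKey],
) -> List[str]:
    """Return DROP INDEX / CREATE INDEX statements that reconcile indexes
    on a Postgres materialized view without rebuilding the view itself."""
    deployed_keys = set(deployed.values())
    desired_keys = set(desired)
    statements = []
    # Drop deployed indexes that the model no longer declares.
    for name, key in sorted(deployed.items()):
        if key not in desired_keys:
            statements.append(f"drop index if exists {name};")
    # Create declared indexes that don't exist yet (Postgres picks the name).
    for columns, unique in sorted(desired_keys - deployed_keys):
        kind = "unique index" if unique else "index"
        statements.append(f"create {kind} on {relation} ({', '.join(columns)});")
    return statements
```

Because unchanged indexes produce no statements, a run with no index diff leaves the MV untouched, which fits the goal of not deploying code changes during a normal run.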
Redshift Materialized Views
on_configuration_change = apply: auto_refresh via ALTER, everything else via DROP/CREATE
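A similar hypothetical sketch for Redshift: when auto_refresh is the only setting that differs, a single ALTER MATERIALIZED VIEW ... AUTO REFRESH statement is enough, and any other difference falls back to DROP/CREATE. The `redshift_apply` helper and the config keys are assumptions, and the CREATE statement is deliberately simplified: it only carries auto_refresh, not other settings such as backup or distribution/sort keys.

```python
from typing import Dict, List

# Per the rules above: on Redshift, only auto_refresh is changed with ALTER.
REDSHIFT_ALTERABLE = {"auto_refresh"}


def redshift_apply(
    relation: str,
    sql: str,
    db_config: Dict[str, object],
    model_config: Dict[str, object],
) -> List[str]:
    """Return the statements needed to apply configuration changes (hypothetical)."""
    changed = {k for k, v in model_config.items() if db_config.get(k) != v}
    if not changed:
        return []
    if changed <= REDSHIFT_ALTERABLE:
        # Only auto_refresh differs: cheap ALTER, no recompute of the MV.
        flag = "yes" if model_config["auto_refresh"] else "no"
        return [f"alter materialized view {relation} auto refresh {flag};"]
    # Any other difference requires a rebuild (simplified CREATE statement).
    flag = "yes" if model_config.get("auto_refresh", False) else "no"
    return [
        f"drop materialized view if exists {relation};",
        f"create materialized view {relation} auto refresh {flag} as {sql};",
    ]
```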
BigQuery Materialized Views
on_configuration_change = apply: enable_refresh/refresh_interval_minutes/max_staleness via ALTER, everything else via DROP/CREATE
Backward compatibility
People using materialized: materialized_view from the experimental package in dbt core 1.4 and earlier should not be broken by the addition of default and adapter-level materialized_view materializations. The chain of precedence in macros will preserve existing behavior. The package will need to be uninstalled to benefit from the new implementation.
Additional details, not required for implementation
Risks
Most users interested in this topic are already using the experimental package. We need to make sure that this implementation won't break them when we roll it out. I'm guessing the chain of inheritance should work in our favor here, but we need to be sure.
Same thing for community/vendor-supported adapters already implementing a version of MVs (Materialize, for example).
Information collection will be tricky across the board. MVs are usually not accessible via the default information schemas / catalogues.
Cascading drops in full-refresh are tricky. If an upstream table used in an MV is dropped/altered, the MV may end up in a funky state. See comments in the experimental package. We may need to revisit some of the behaviors or re-calibrate expectations.
Alternative/Options
default auto_refresh = true
I'm seriously wondering whether the default ergonomics should be more aggressively geared towards auto-refreshing. [Edit following comments below] Not a good idea.
Who will this benefit?
A lot of people:
Anything else?
Historical perspective on the topic: https://www.youtube.com/watch?v=awj-5aYXZnc