-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-1521] [Bug] The static partitions configuration for incremental materializations doesn't work if it is a list that is declared via a statement block #6278
Comments
@achilleas-stachtiaris Thanks for the thorough write-up! This is a general principle within dbt: All node definitions, dependencies, and configuration need to be resolved at parse time, before dbt has made any connections to the database. If we don't do a good job of stating that in the docs on configs today, we should! It's what enables dbt to statically and reliably understand the shape of the DAG, without database state as an input, and to support I appreciate that this does limit certain use cases. A similar one that comes to mind: before the addition of
That doesn't work because the We do enable a small subset of configurations to accept macros, which are then "late-rendered" at runtime, namely To support this category of use cases, we'd need a more-general rule about what dbt can expect to know when. We've talked about "decorators" for macros that would indicate whether they're:
That's a much bigger idea, and worth further discussion. In the meantime, I'm going to close this one, as a known limitation of dbt configuration, and outside the scope of the built-in feature at play here ("static" |
@jtcohen6 Right back at you, thanks a lot for the prompt and informative reply! 😃 This makes sense, I do admit that the ability to declare Macros for hooks in the Using your advise and with a bit more research, I slightly repurposed the
Where That way I can "dynamically parameterise" the "static" input, which enables incremental Works well, so great outcome 🙌 I love the idea you mentioned regarding Macro decorators, will try and see if I can contribute somehow 😃 Thanks! 🎉 |
Amazing! Glad you were able to get this working for your use case. I opened up a new discussion for the Big Idea around clearly defining macro that are "pure" versus ones with dynamic state / side effects: #6280 |
Is this a new bug in dbt-core?
Current Behavior
Referencing the
incremental
materialization documentation for Static Partitions, thepartitions
configuration accepts alist
, which is then iterated by the macrobq_copy_partitions()
, and evaluated by the macrobq_insert_overwrite_sql
in order to invokebq_static_insert_overwrite_sql
ifpartitions is not none and partitions != []
.When defining this
list
in a Macro and then supplying it to thepartitions
configuration of a model, it only invokes thebq_static_insert_overwrite_sql()
if thelist
is declared using in-line text within the{% set %}
clause. If astatement_block
is used to construct the list (for example,dbt_utils.get_column_values()
which is iterable),bq_dynamic_insert_overwrite_sql ()
is invoked instead because thepartitions
list
is omitted 🤔 . And this happens even if thelist
is evaluated identically with one that is defined using in-line text inside a{% set %}
clause.See the following code / dummy data as examples:
Macro to construct the
partitions
list
If the
use_statement_block
is True, the Macro will usedbt_utils.get_column_values()
to construct alist
. Otherwise, the Macro will use a list defined in-line in text via{% set %}
Model defining one column to be queried in a statement block to construct a
partitions
list
An incremental model
You will notice a few Jinja {{ }} evaluators, we use these to compare the two
lists
which contain ourpartitions
Remember that
bq_insert_overwrite_sql
uses this evaluatorpartitions is not none and partitions != []
to determine whether to invoke thestatic
ordynamic
statement - this is very important ❗Also,
{{ partitions | join (', ') }}
is used bybq_static_insert_overwrite_sql()
Expected Behavior
By default, in the above example of an
incremental
model, we set the Macro touse_statement_block=False
, which will use thelist
as per the in-line definition of{% set %}
, and that works as expectedly 👍dbt run --select partitions
dbt run --select incremental_model --full-refresh
dbt run --select incremental_model
In
target/run/../partitions.sql
we can see the correctly compiled statement against BigQuery, which uses thepartitions
configuration to determine which partitions to replace.I expected the same to occur when we set the Macro to
use_statement_block=True
, as the Jinja evaluators which we placed in SQL comments demonstrate that thelist
objects in memory are evaluated in the same way. What would be the reason for dbt to not accept thepartitions
configuration? 🤔🐛 The actual bug
When we do set
use_statement_block=True
, thepartitions
configuration gets omitted, and thebq_dynamic_insert_overwrite_sql ()
is invoked as a result. This means that dbt does not accept thelist
as a validconfig
variable, and will again recompute its own partitions merge strategy rather than using thepartitions
list
supplied in the configuration.❗ Looking at the source, I cannot understand why this is happening, and believe it is a bug.
I've looked through the incremental materialization and the insert_overwrite strategy source
Thanks a lot for taking the time to read through this! 😃
Steps To Reproduce
Given the code I shared above, you can reproduce this by:
dbt run --select partitions
dbt run --select incremental_model --full-refresh
dbt run --select incremental_model
- withuse_statement_block=False
which will behave as expecteddbt run --select incremental_model
- withuse_statement_block=True
which will behave unexpectedlyRelevant log output
No response
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
No response
The text was updated successfully, but these errors were encountered: