Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow _dbt_max_partition to be disabled #17

Closed
hui-zheng opened this issue Mar 31, 2020 · 4 comments · Fixed by #41
Closed

allow _dbt_max_partition to be disabled #17

hui-zheng opened this issue Mar 31, 2020 · 4 comments · Fixed by #41
Labels
enhancement New feature or request

Comments

@hui-zheng
Copy link
Contributor

Describe the feature

In dbt v0.16, dbt run will make query calls for _dbt_max_partition on all BigQuery partitioned incremental models.

In our situation, we have many very large partitioned incremental models that have no use of _dbt_max_partition. However, they still have to query _dbt_max_partition. It is very costly when the runs in production add up.

Could we disable _dbt_max_partition for models that have no use of it?

Describe alternatives you've considered

To allow the model config to disable _dbt_max_partition for a particular model.

for example,

{{ config(
    materialized='table',
    partition_by={
      "field": "created_at",
      "data_type": "timestamp",
      "enable_dbt_max_partition": false
    }
)}}

Additional context

This is specific to BigQuery database, and it's part of the new merge feature from dbt v0.16

Who will this benefit?

Everyone who own very large BigQuery tables for incremental runs but not using _dbt_max_partition

@yu-iskw
Copy link
Contributor

yu-iskw commented Feb 8, 2021

I am for the suggestion. Calculating _dbt_max_partition is very expensive on.a very large BigQuery table. However, we don't always use _dbt_max_partition in an incremental model. In addition, I think the implicit feature is a little difficult to dbt beginners. If we need to calculate something like _dbt_max_partition, it would be good to explicitly declare the variable with config.pre_hook.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 8, 2021

I agree we should come out with a resolution for this in the next version of dbt. It feels relevant to ongoing and recent work around:

Here's a comment I left over on dbt-labs/dbt-core#2976, but I think this issue is the better place for discussing it:

Perhaps we should add support for a new config that would override the default value used within dbt today. So, for instance:

  • If the user sets the config to None, we'd skip setting the _dbt_max_partition var entirely, and save the cost of the standardSQL query
  • If the user wishes, they could set the config using adapter.get_partitions_metadata(target_relation) to get a $0 dynamic metadata lookup (though using legacySQL). We could offer a special config value or recommended code snippet for this.
  • If the user wanted to set the max partition dynamically based on an arbitrary query, even a ref to a separate model, they could. It would be executed at model runtime, as part of the BQ script.

What's not obvious to me is whether that config should be:

  • exclusive to handling _dbt_max_partition
  • generic handling for declare/set scripting variables
  • rolled into a default value of set_sql_header, which is the current way folks are using BQ scripting with dbt models

@pcasteran
Copy link
Contributor

pcasteran commented Feb 16, 2021

I like the idea of using the set_sql_header macro to add the _dbt_max_partition variable to the query context.

It would be even better to have it disabled by default and explicitly enabled by the model creator only when required (via a code snippet as proposed). But this seems not possible for backward compatibility reasons as the current default behavior is to compute the _dbt_max_partition variable.
So I guess the best option is to have the proposed flag allowing to disable the computation of this variable.

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Oct 12, 2021
@jtcohen6 jtcohen6 added the enhancement New feature or request label Oct 12, 2021
@jtcohen6
Copy link
Contributor

Really helpful comment by @bcolbert978 about another way we could go about this: dbt-labs/dbt-core#3666 (comment)

I'm starting to think, _dbt_max_partition should just be a macro! And users can override/reimplement/disable that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants