[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with `time_ingestion_partitioning` option #487

github-christophe-oudar · 2023-01-27T21:19:50Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

As we've introduced time_ingestion_partitioning flag in 1.4.0, I figured out that using INSERT DML statement for that option appears not to work as expected.

Let's consider following example:

SELECT *
FROM a

Let's say that it's being billed 1 TB (because a table data is 1 TB)
Then

Select *
FROM a
LIMIT 0

would be detected as a 0 bytes billed request.
But if we do

INSERT INTO b (...)
SELECT * 
FROM a
LIMIT 0

Then we're going to be billed 1 TB just like the same query with LIMIT 0.
It looks like it's a DML issue but it can be a problem for dbt user till it's fixed as it's the approach that is used by dbt-bigquery.

The use case for that LIMIT 0 is to do some "dry run" for smart slim CI.
It's a blocker to migrate some model that have large inputs as CI would be expensive as even for flat price usage, you'll still pay for some (useless) slot time.

Describe alternatives you've considered

As discussed with @jtcohen6, there are some ways to fix it:

Use a staging table
- It can be a great workaround but it creates another temp table to clean up, it creates a bit more latency as it's adding yet another statement to be executed and finally, without the limit 0, it means that we would billed the output of the intermediate table as a result of splitting the query in 2.
Use a staging copy only for copy_partitions option
- As I would suggest to use copy_partitions most of the time and it would use bq copy to copy from the staging table (that would use column type partitioning) to the time ingestion table, it would work well with a bit more latency (and still I'm not even sure as we would replace the DML INSERT by a bq copy which tend to be faster).

Who will this benefit?

Any user that runs queries with limit 0 and don't want to waste resources

Are you interested in contributing this feature?

Yes, since I wrote the time_ingestion_partitioning & copy_partitions features, it would likely be more efficient that I work that one too

Anything else?

No response

The text was updated successfully, but these errors were encountered:

github-christophe-oudar · 2023-04-28T09:41:36Z

Here a workaround I'm using:

I'm writing my main model in a model my_model_ephemeral

{{
    config(
        materialized=time_partition_source_workaround('ephemeral'),
    )
}}
SELECT ... from `prod_table` LIMIT 0 (only in CI)

And then I have a second model my_model:

SELECT * from {{ ref('my_model_ephemeral') }}

The macro is the following:

{% macro time_partition_source_workaround(original_materialization) -%}
{% if target.name == 'ci' -%}
table
{%- else -%}
{{ original_materialization }}
{%- endif -%}
{%- endmacro %}

In practice, it works like this:
In CI, I output the data as "empty" to a table (like a temp table), the table materialization works as expected, the query isn't billing/reading data. Then the insert of this empty table in the time ingestion partitioned table works as intended.
In production, it's just used a CTE in a single query (like it would be intended in the first place).

github-christophe-oudar · 2023-08-21T10:04:53Z

Apparently it was fixed in BQ so we can close that issue

github-christophe-oudar · 2023-08-24T10:00:16Z

Apparently it was fixed in BQ so we can close that issue

Well it's not.

github-christophe-oudar added enhancement New feature or request triage labels Jan 27, 2023

github-actions bot changed the title ~~[Feature] Avoid billed bytes on empty model resulting of limit 0 usage with time_ingestion_partitioning option~~ [CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with time_ingestion_partitioning option Jan 27, 2023

Fleid added help_wanted Extra attention is needed and removed triage labels Feb 14, 2023

github-christophe-oudar closed this as completed Aug 21, 2023

github-christophe-oudar reopened this Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with `time_ingestion_partitioning` option #487

[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with `time_ingestion_partitioning` option #487

github-christophe-oudar commented Jan 27, 2023 •

edited

Loading

github-christophe-oudar commented Apr 28, 2023

github-christophe-oudar commented Aug 21, 2023

github-christophe-oudar commented Aug 24, 2023

[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with time_ingestion_partitioning option #487

[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with time_ingestion_partitioning option #487

Comments

github-christophe-oudar commented Jan 27, 2023 • edited Loading

Is this your first time submitting a feature request?

Describe the feature

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

github-christophe-oudar commented Apr 28, 2023

github-christophe-oudar commented Aug 21, 2023

github-christophe-oudar commented Aug 24, 2023

[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with `time_ingestion_partitioning` option #487

[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with `time_ingestion_partitioning` option #487

github-christophe-oudar commented Jan 27, 2023 •

edited

Loading