[CT-1934] [Feature] Avoid billed bytes on empty model resulting of limit 0 usage with time_ingestion_partitioning option #487
Is this your first time submitting a feature request?
Describe the feature
Since we introduced the time_ingestion_partitioning flag in 1.4.0, I have found that using an INSERT DML statement for that option does not appear to work as expected. Consider the following example: a query that reads table a is billed 1 TB (because table a holds 1 TB of data). The same query with LIMIT 0 is detected as a 0 bytes billed request. But if we run that LIMIT 0 query inside an INSERT DML statement, we are billed 1 TB, just like the same query without LIMIT 0.
It looks like a DML issue, but it can be a problem for dbt users until it is fixed, as this is the approach used by dbt-bigquery.
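The comparison above can be sketched as follows. This is only an illustration of the three statement shapes being contrasted, not the SQL that dbt-bigquery actually generates: the table name a comes from the description, while the target table b, the helper name build_statements, and the use of SELECT * are assumptions made for the example.

```python
def build_statements(source: str = "a", target: str = "b") -> dict[str, str]:
    """Return the three statement shapes compared in this issue.

    `target` and the `SELECT *` projection are hypothetical; the real
    statements emitted by dbt-bigquery list columns explicitly.
    """
    # Plain read of the source table: billed for the full table (1 TB in the report).
    full_scan = f"SELECT * FROM {source}"
    # Same read with LIMIT 0: reported as a 0 bytes billed request.
    limit_zero = f"{full_scan} LIMIT 0"
    # LIMIT 0 query wrapped in INSERT DML: reported as billed like the full scan.
    insert_limit_zero = f"INSERT INTO {target} {limit_zero}"
    return {
        "full_scan": full_scan,
        "limit_zero": limit_zero,
        "insert_limit_zero": insert_limit_zero,
    }


if __name__ == "__main__":
    for label, sql in build_statements().items():
        print(f"{label}: {sql}")
```

Running each statement as a dry run and comparing the reported bytes is one way to confirm the behaviour on a given project.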
The use case for that LIMIT 0 is to do some "dry run" for smart slim CI.
It is a blocker for migrating models that have large inputs, as CI would be expensive: even with flat-rate pricing, you still pay for some (useless) slot time.
Describe alternatives you've considered
As discussed with @jtcohen6, there are some ways to fix it:
- [...] limit 0, it means that we would be billed for the output of the intermediate table as a result of splitting the query in two.
- [...] copy_partitions option [...] copy_partitions most of the time, and it would use bq copy to copy from the staging table (which would use column type partitioning) to the time ingestion table. It would work well with a bit more latency (and I'm not even sure about that, as we would replace the DML INSERT with a bq copy, which tends to be faster).
Who will this benefit?
Any user who runs queries with limit 0 and doesn't want to waste resources
Are you interested in contributing this feature?
Yes, since I wrote the time_ingestion_partitioning & copy_partitions features, it would likely be more efficient for me to work on this one too
Anything else?
No response