-
Notifications
You must be signed in to change notification settings - Fork 418
Description
Feature description
dlt has a biqquery_adaptor that accepts a partition column name, this is rather limiting, big query supports different partition types (i.e. timestamp | date | datetime | int64) at different granularities (i.e. hour | day | month | year) or integer ranges (i.e. start + end + interval), along with options to raise an exception if a query doesn't filter by the partition key, and/or to expire partitions after n days etc.
Are you a dlt user?
Yes, I run dlt in production.
Use case
This is important for cost management, I've observed a large step change in BigQuery costs using incremental merge replication on large tables - using proper partitioning rectifies this but for all but in most cases I've had to manually create the destinations tables in order to properly partition.
Proposed solution
One option is to change the type of the partition argument to a dict similar to dbt (https://docs.getdbt.com/reference/resource-configs/bigquery-configs) rather than a column name and generate the PARTITION BY clause
The other option is to leave partition as a str, but accept the BigQuery syntax literally ie.
DATE_TRUNC(transaction_date, MONTH)
OPTIONS (
partition_expiration_days = 3,
require_partition_filter = TRUE)
see https://cloud.google.com/bigquery/docs/creating-partitioned-tables
Related issues
No response
Metadata
Metadata
Assignees
Labels
Type
Projects
Status