Skip to content

Add better support for BigQuery partitions #2516

@waterworthd-cim

Description

@waterworthd-cim

Feature description

dlt has a biqquery_adaptor that accepts a partition column name, this is rather limiting, big query supports different partition types (i.e. timestamp | date | datetime | int64) at different granularities (i.e. hour | day | month | year) or integer ranges (i.e. start + end + interval), along with options to raise an exception if a query doesn't filter by the partition key, and/or to expire partitions after n days etc.

Are you a dlt user?

Yes, I run dlt in production.

Use case

This is important for cost management, I've observed a large step change in BigQuery costs using incremental merge replication on large tables - using proper partitioning rectifies this but for all but in most cases I've had to manually create the destinations tables in order to properly partition.

Proposed solution

One option is to change the type of the partition argument to a dict similar to dbt (https://docs.getdbt.com/reference/resource-configs/bigquery-configs) rather than a column name and generate the PARTITION BY clause

The other option is to leave partition as a str, but accept the BigQuery syntax literally ie.

DATE_TRUNC(transaction_date, MONTH)
  OPTIONS (
    partition_expiration_days = 3,
    require_partition_filter = TRUE)

see https://cloud.google.com/bigquery/docs/creating-partitioned-tables

Related issues

No response

Metadata

Metadata

Assignees

Labels

questionFurther information is requested

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions