
Add Bigquery Partition and Atomic Insert Overwrite Support #500

Merged
eakmanrq merged 6 commits into main from eakmanrq/improve_bq_imp on Mar 7, 2023

@eakmanrq (Contributor) commented Mar 7, 2023

Some BigQuery-specific context:

  • BigQuery partitioning is limited: it only supports date/timestamp columns and integer ranges. For a date/time column it assumes DAY buckets by default, but HOUR is also available (as are MONTH and YEAR, which aren't relevant to us). To handle this, I use the cadence's interval unit to choose the partitioning granularity. The current implementation doesn't support integer partitioning, so it assumes the partition column is a date or timestamp.
  • BigQuery can reproduce the runtime behavior of Spark's INSERT OVERWRITE with a MERGE whose join condition is FALSE, combined with a DELETE clause conditioned on the filter expression. Because this is a single statement, the overwrite is atomic. This is now the approach we use.
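A minimal Python sketch of the cadence-to-partition mapping described in the first point. This is not the code from this PR: the `IntervalUnit` enum and the `bigquery_partition_clause` helper are hypothetical, and the sketch assumes the partition column is a TIMESTAMP (integer partitioning is out of scope, matching the current implementation). HOUR is BigQuery's finest time-unit partition granularity, so sub-daily cadences map to HOUR and everything else falls back to the default DAY buckets.

```python
from enum import Enum


class IntervalUnit(Enum):
    # Hypothetical cadence units; the real SQLMesh enum may differ.
    MINUTE = "minute"
    HOUR = "hour"
    DAY = "day"
    MONTH = "month"
    YEAR = "year"


def bigquery_partition_clause(time_column: str, unit: IntervalUnit) -> str:
    """Build a BigQuery PARTITION BY clause for a TIMESTAMP column.

    Assumption: the column is a TIMESTAMP, not an integer. HOUR is the
    finest granularity BigQuery supports, so sub-daily cadences use HOUR;
    daily and coarser cadences use the default DAY buckets.
    """
    if unit in (IntervalUnit.MINUTE, IntervalUnit.HOUR):
        granularity = "HOUR"
    else:
        granularity = "DAY"
    return f"PARTITION BY TIMESTAMP_TRUNC({time_column}, {granularity})"


print(bigquery_partition_clause("event_ts", IntervalUnit.HOUR))
# prints: PARTITION BY TIMESTAMP_TRUNC(event_ts, HOUR)
```

Monthly or yearly cadences still get DAY partitions here; that is finer than strictly necessary but always correct for overwriting a single interval.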
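A minimal sketch of the MERGE-based overwrite from the second point. The `build_insert_overwrite_merge` helper is hypothetical (not the PR's implementation) and only assembles the SQL string. `ON FALSE` guarantees that no source row matches a target row, so every incoming row falls into `WHEN NOT MATCHED` and is inserted, while every existing row falls into `WHEN NOT MATCHED BY SOURCE` and is deleted only if it satisfies the filter. Both actions run in one statement. The sketch assumes the caller passes a source query already restricted to the same slice as the filter.

```python
def build_insert_overwrite_merge(target: str, source_query: str, where: str) -> str:
    """Emulate Spark-style INSERT OVERWRITE with a single BigQuery MERGE.

    Hypothetical helper: `where` selects the target rows to replace and
    should describe the same slice that `source_query` produces.
    """
    return (
        f"MERGE INTO {target} AS __target\n"
        f"USING ({source_query}) AS __source\n"
        # Nothing ever matches, so every row is either "new" or "stale".
        "ON FALSE\n"
        # Stale target rows inside the overwritten slice are removed...
        f"WHEN NOT MATCHED BY SOURCE AND {where} THEN DELETE\n"
        # ...and every source row is inserted as-is.
        "WHEN NOT MATCHED THEN INSERT ROW"
    )


sql = build_insert_overwrite_merge(
    "my_dataset.events",
    "SELECT * FROM my_dataset.events_staging WHERE ds = '2023-03-07'",
    "ds = '2023-03-07'",
)
print(sql)
```

`INSERT ROW` inserts all source columns by position, so this assumes the source query's column list lines up with the target table's schema.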

This PR also updates the base merge implementation to work with the latest SQLGlot.

Resolves:

@eakmanrq requested a review from a team March 7, 2023 21:56
@eakmanrq force-pushed the eakmanrq/improve_bq_imp branch from 165633a to 0e21c79 March 7, 2023 21:57
@eakmanrq requested a review from tobymao March 7, 2023 22:30
@eakmanrq force-pushed the eakmanrq/improve_bq_imp branch from 6ca6817 to 75a2161 March 7, 2023 22:53
@eakmanrq merged commit 1b02516 into main Mar 7, 2023
@eakmanrq deleted the eakmanrq/improve_bq_imp branch March 7, 2023 23:01
