
Support for ingestion time partition table on BigQuery as incremental materialization #136

Merged (6 commits) on Oct 20, 2022

Conversation


@Kayrnt Kayrnt commented Mar 13, 2022

resolves #75

Description

Support for ingestion time partition table on BigQuery as incremental materialization

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-bigquery next" section.

{{ log('table ' ~ table) }}
{%- set columns = result.columns -%}
{{ log('columns ' ~ columns) }}
{{ return(load_result('get_columns_with_types_in_query').table.columns | list) }}
Kayrnt (Contributor) commented on this diff:

Hello @jtcohen6 @McKnight-42, I could use some help over here:
I want to create a table such as

create or replace table `project`.`cou`.`test_ingestion_dbt` (`x` INT64)

Yet, as it apparently goes through the agate lib, I'm getting

create or replace table `project`.`cou`.`test_ingestion_dbt` (`x` <dbt.clients.agate_helper.Number object at 0x104688ee0>)

That code is mostly the same as https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/adapters/columns.sql#L24-L34

I want a return similar to get_columns_in_relation from the adapter, so that I get the real BQ type and not a Number, which can't be used to create the actual BQ table with the correct columns.
My intent is to use the data_type here https://github.com/dbt-labs/dbt-bigquery/pull/136/files#diff-167e3557df7f18f1520c5db0045dfac9923e38a617c909d703be844192b28ebeR76

However, if I need to do so, I'll have to store the "dry run request" in a table so that I can use get_columns_in_relation, making the command even longer/more complex to run.

I honestly don't think that's required. Indeed, if I run

WITH base AS (
select 1 as f1, 1.3 as f2, "test" as f3
) 
select * from base where false limit 0

here is the result from https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/getQueryResults

{
  "kind": "bigquery#getQueryResultsResponse",
  "etag": "XXX",
  "schema": {
    "fields": [
      {
        "name": "f1",
        "type": "INTEGER",
        "mode": "NULLABLE"
      },
      {
        "name": "f2",
        "type": "FLOAT",
        "mode": "NULLABLE"
      },
      {
        "name": "f3",
        "type": "STRING",
        "mode": "NULLABLE"
      }
    ]
  },
  "jobReference": {
    "projectId": "XXX",
    "jobId": "XXX",
    "location": "US"
  },
  "totalRows": "0",
  "totalBytesProcessed": "0",
  "jobComplete": true,
  "cacheHit": false
}

As you can see, the types are reported correctly, so the issue is with agate and the way the data is parsed.

Do you think the original types are accessible somewhere?
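
For illustration, a minimal Python sketch, assuming the google-cloud-bigquery client is used directly rather than dbt's connection wrapper: the job result exposes the schema with the real BQ types even for a zero-row dry run.

from google.cloud import bigquery

client = bigquery.Client()
job = client.query('select 1 as f1, 1.3 as f2, "test" as f3 limit 0')
rows = job.result()  # completes the job; returns zero rows

for field in rows.schema:
    # field.field_type is the raw BigQuery type: INTEGER / FLOAT / STRING
    print(field.name, field.field_type, field.mode)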

My guess is that I have to create a custom version of load_result to access the returned schema; is that the best way to do it?

An unrelated downside of that approach (and maybe something to deal with in another change) is that all those columns are going to be created as "NULLABLE", but we have the same issue when the schema is inferred (in the usual incremental case).

Except for that, I'm making progress, but we're not there yet (once I have everything working, I'll still need to write tests).
I'm losing a lot of time adding logging of fields for development, as I don't think I can work with a step-by-step debugger (in Python code and in Jinja), can I?

Thanks!

@Kayrnt Kayrnt force-pushed the incremental-ingestion-time branch from 93caabc to 9e1c480 on March 29, 2022 at 16:27
@dataders dataders commented Apr 1, 2022

@Kayrnt you're very close here. Correct me if I'm wrong, but my understanding is that you'd like to:

for a given relation (table/view), give me the column type for each column.

The way to do this is with the column_type column of the table returned by the get_catalog() (bigquery__get_catalog()) macro.

Not only do I totally understand the approach you've taken, but I've also made the same mistake a few times. Your call statement('get_columns_with_types_in_query'.... construction was sound, and it will return an empty table with column names.

The problem is with load_result, which takes the JSON returned by the call statement and converts it into an Agate table. When Agate is given data to turn into an Agate table, there's a specific way it guesses the type of each column provided, by looking at the values in each column. Because there are no example values for each column, Agate defaults to assuming they are Number type. If you're interested, you can read more on Agate Data Types, Agate's type inference, and how dbt-core modifies Agate's type inference and even adds a new type in agate_helper.py.

You are correct that directly querying BQ via an API will return the column types; however, AFAICT, there's nothing in dbt-core or dbt-bigquery that uses the metadata in every query response and passes the needed types to Agate before loading. Perhaps it could be done, though, starting with BigQueryConnectionManager.get_table_from_response(). Agree or disagree: @VersusFacit @McKnight-42 @jtcohen6?
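
A minimal sketch of that idea (the mapping and helper name are illustrative assumptions, not dbt-bigquery's actual implementation): pass explicit column_types built from the BigQuery response schema, so Agate never has to infer from zero rows.

import agate

# Illustrative subset of a BigQuery-to-agate type mapping (an assumption,
# not dbt's actual mapping).
BQ_TO_AGATE = {
    "INTEGER": agate.Number(),
    "FLOAT": agate.Number(),
    "STRING": agate.Text(),
    "BOOLEAN": agate.Boolean(),
}

def table_from_bq_schema(fields, rows):
    # Explicit column_types make agate skip type inference entirely.
    names = [f["name"] for f in fields]
    types = [BQ_TO_AGATE.get(f["type"], agate.Text()) for f in fields]
    return agate.Table(rows, column_names=names, column_types=types)

# With the dry-run schema from the getQueryResults response quoted above:
fields = [
    {"name": "f1", "type": "INTEGER"},
    {"name": "f2", "type": "FLOAT"},
    {"name": "f3", "type": "STRING"},
]
table = table_from_bq_schema(fields, rows=[])
print([(c.name, type(c.data_type).__name__) for c in table.columns])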

@Kayrnt Kayrnt force-pushed the incremental-ingestion-time branch 10 times, most recently from e7dfb33 to a8b32a2 on April 10, 2022 at 19:44
@Kayrnt Kayrnt force-pushed the incremental-ingestion-time branch 2 times, most recently from f8e8739 to b8b0d3a on April 11, 2022 at 10:18
@Kayrnt Kayrnt marked this pull request as ready for review April 11, 2022 10:31
@Kayrnt Kayrnt force-pushed the incremental-ingestion-time branch 2 times, most recently from 1b1783c to 926e1dc on April 11, 2022 at 22:27
@Kayrnt Kayrnt marked this pull request as draft April 11, 2022 22:35
Comment on lines 481 to 486

response = BigQueryAdapterResponse(  # type: ignore[call-arg]
-    _message=message, rows_affected=num_rows, code=code, bytes_processed=bytes_processed
+    _message=message,
+    rows_affected=num_rows,
+    code=code,
+    bytes_processed=bytes_processed,
+    fields=fields,
A reviewer (Contributor) commented:

Looks like you're introducing non-black-formatted code. Any chance you want to take on formatting impl.py with black, to make your actual changes easier to diff here?

Kayrnt replied:

Indeed, I ran black after my changes, but I can definitely run it in a dedicated commit to push before mine.

@dataders dataders left a review comment:

I think you're almost there!

@Kayrnt Kayrnt force-pushed the incremental-ingestion-time branch from 926e1dc to 3314332 on April 12, 2022 at 06:48
@Kayrnt Kayrnt force-pushed the incremental-ingestion-time branch 2 times, most recently from 88d550b to 42d527d on April 13, 2022 at 16:46
@github-christophe-oudar commented:

I've rebased the change to work with the Python support.
The code should be properly tested, and the tests appear to pass.
I'll run a few tests locally to confirm it works as intended, but you can re-review the code.

@Kayrnt Kayrnt force-pushed the incremental-ingestion-time branch from f532c54 to b0a97c5 on August 8, 2022 at 15:44
@McKnight-42 McKnight-42 self-assigned this Aug 9, 2022
@Kayrnt Kayrnt (Author) commented Aug 29, 2022

What's the status on your end @jtcohen6 @McKnight-42? It would be sad to miss the 1.3 release 😉
@dataders I think you'll have to dismiss your review because it looks like I can't (and I think I've made the changes since then, but feel free to review 🙌). Thanks!

@@ -69,6 +73,13 @@ def render(self, alias: Optional[str] = None):
        else:
            return f"{self.data_type}_trunc({column}, {self.granularity})"

    def render_wrapped(self, alias: Optional[str] = None):
        """Wrap the partitioning column when time involved to ensure it is properly casted to matching time."""
        if self.data_type in ("date", "timestamp", "datetime"):
A reviewer (Contributor) commented:

Do we need to include the "time" type here as well?

Kayrnt replied:

As far as I know, it's not possible to partition by "time", see https://cloud.google.com/bigquery/docs/partitioned-tables?hl=fr

The reviewer replied:

ahh makes sense, thanks for the clarification!
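
To make the exchange above concrete, here is a standalone sketch (a simplification, not the PR's exact render_wrapped) of why "time" never appears in that check:

def render_wrapped(column: str, data_type: str) -> str:
    # BigQuery supports time partitioning on DATE / TIMESTAMP / DATETIME
    # columns but not on a bare TIME column, so "time" is never checked here.
    if data_type in ("date", "timestamp", "datetime"):
        # Wrap so partition comparisons use a consistent temporal type,
        # e.g. timestamp(`created_at`).
        return f"{data_type}({column})"
    return column  # e.g. integer-range partitioning passes through unchanged

print(render_wrapped("`created_at`", "timestamp"))  # timestamp(`created_at`)
print(render_wrapped("`bucket_id`", "int64"))       # `bucket_id`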

@McKnight-42 commented:

@github-christophe-oudar shed light on this one while testing locally as part of the review. Looking good, hoping to get it merged soon!

@github-christophe-oudar commented:

It looks like the stars are about to align then! ⭐ ⭐ ⭐
After that I'll rebase #167
Let's be ambitious and try to merge both in September 🙌

@github-christophe-oudar commented:

👋
Any chance we can merge in the next few weeks?
It will become a blocker on my end, so I'm wondering if I need to fork for my organisation.
Thanks 🙇

@colin-rogers-dbt colin-rogers-dbt left a review comment:

LGTM

@colin-rogers-dbt commented:

@cla-bot check

@cla-bot cla-bot bot added the cla:yes label Oct 20, 2022
cla-bot bot commented Oct 20, 2022

The cla-bot has been summoned, and re-checked this pull request!

Labels: cla:yes, ok to test, ready_for_review