[ADAP-874] [Bug] Columns are unquoted with nested data (incremental + time_ingestion_partitioning) #913

Closed
jeremyyeo opened this issue Sep 7, 2023 · 2 comments · Fixed by #919
Labels
bug (Something isn't working) · incremental · partitioning (Related to creating, replacing, or pruning partitions to avoid full table scans)

Comments

@jeremyyeo
Contributor

jeremyyeo commented Sep 7, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

It looks like the DDL that creates the target incremental table does not quote all the columns/fields in a struct by default - this causes columns/fields that use reserved keywords (e.g. group) to fail.

Expected Behavior

dbt should generate a statement in which all columns and nested fields are quoted.

Steps To Reproduce

  1. Create a raw table containing a struct with a reserved-keyword field:
create or replace table dbt_jyeo.raw_data as
select current_timestamp as created_at, 1 id, struct(2 as `group`, 'b' as c) as meta
;
  2. Set up the dbt project:
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: 1.0
models:
  my_dbt_project:
    +materialized: view

# models/sources.yml
version: 2
sources:
  - name: dbt_jyeo
    tables:
      - name: raw_data
-- models/test_struct_inc.sql
{{ 
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        partition_by = {
            "field": "_partitiontime",
            "data_type": "timestamp",
            "time_ingestion_partitioning": true
        }
    )
}}

select [struct(a.meta.`group` as `group`, 'foo' as v)] as some_data, date(created_at) as _partitiontime from {{ source('dbt_jyeo', 'raw_data') }} as a
  3. Build the model:
$ dbt run -s test_struct_inc --full-refresh
03:26:35  Running with dbt=1.6.0
03:26:36  Registered adapter: bigquery=1.6.4
03:26:39  Found 1 model, 1 source, 0 exposures, 0 metrics, 933 macros, 0 groups, 0 semantic models
03:26:39  
03:27:46  Concurrency: 1 threads (target='bq')
03:27:46  
03:27:46  1 of 1 START sql incremental model dbt_jyeo.test_struct_inc .................... [RUN]
03:27:52  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:cef5b81c-4286-495a-b650-4632f81c26ff&page=queryresults
03:27:52  1 of 1 ERROR creating sql incremental model dbt_jyeo.test_struct_inc ........... [ERROR in 5.21s]
03:27:52  
03:27:52  Finished running 1 incremental model in 0 hours 1 minutes and 12.34 seconds (72.34s).
03:27:52  
03:27:52  Completed with 1 error and 0 warnings:
03:27:52  
03:27:52  Database Error in model test_struct_inc (models/test_struct_inc.sql)
03:27:52    Syntax error: Unexpected keyword GROUP at [7:104]
03:27:52  
03:27:52  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
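
As a side note on step 1: group is a reserved keyword in BigQuery, so the field has to be backtick-quoted wherever it is referenced, which is why the model SQL above selects a.meta.`group`. A quick sanity check, sketched against the dbt_jyeo.raw_data table created in step 1:

select id, meta.`group` as g from dbt_jyeo.raw_data;
-- referencing meta.group without the backticks should fail with the same kind of
-- "Unexpected keyword GROUP" syntax error, since GROUP is reserved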

Relevant log output

============================== 15:26:35.477181 | a12e9dcc-a2c3-454e-9895-4723dd460b0d ==============================
15:26:35.477181 [info ] [MainThread]: Running with dbt=1.6.0
15:26:35.482767 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'False', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'fail_fast': 'False', 'log_path': '/Users/jeremy/src/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'debug': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'invocation_command': 'dbt run -s test_struct_inc --target bq --full-refresh', 'introspect': 'True', 'target_path': 'None', 'log_format': 'default', 'send_anonymous_usage_stats': 'True'}
15:26:36.450148 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'a12e9dcc-a2c3-454e-9895-4723dd460b0d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11190efa0>]}
15:26:36.467150 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'a12e9dcc-a2c3-454e-9895-4723dd460b0d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x111935850>]}
15:26:36.469007 [info ] [MainThread]: Registered adapter: bigquery=1.6.4
15:26:36.604614 [debug] [MainThread]: checksum: 7cc018d4f948065a61996e8bb1e3bfcb0d7a0b5c8765a71b01f80eb85159b7fc, vars: {}, profile: , target: bq, version: 1.6.0
15:26:36.606596 [debug] [MainThread]: Partial parsing not enabled
15:26:39.833138 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'a12e9dcc-a2c3-454e-9895-4723dd460b0d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x111d380d0>]}
15:26:39.858685 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': 'a12e9dcc-a2c3-454e-9895-4723dd460b0d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11197e220>]}
15:26:39.859468 [info ] [MainThread]: Found 1 model, 1 source, 0 exposures, 0 metrics, 933 macros, 0 groups, 0 semantic models
15:26:39.860068 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'a12e9dcc-a2c3-454e-9895-4723dd460b0d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11197e1c0>]}
15:26:39.861890 [info ] [MainThread]: 
15:26:39.863083 [debug] [MainThread]: Acquiring new bigquery connection 'master'
15:26:39.864780 [debug] [ThreadPool]: Acquiring new bigquery connection 'list_cse-sandbox-319708'
15:26:39.865502 [debug] [ThreadPool]: Opening a new connection, currently in state init
15:26:41.910950 [debug] [ThreadPool]: Re-using an available connection from the pool (formerly list_cse-sandbox-319708, now list_cse-sandbox-319708_dbt_jyeo)
15:26:41.912455 [debug] [ThreadPool]: Opening a new connection, currently in state closed
15:27:46.979151 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': 'a12e9dcc-a2c3-454e-9895-4723dd460b0d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x111cae520>]}
15:27:46.981688 [info ] [MainThread]: Concurrency: 1 threads (target='bq')
15:27:46.982408 [info ] [MainThread]: 
15:27:46.985853 [debug] [Thread-1  ]: Began running node model.my_dbt_project.test_struct_inc
15:27:46.986997 [info ] [Thread-1  ]: 1 of 1 START sql incremental model dbt_jyeo.test_struct_inc .................... [RUN]
15:27:46.988146 [debug] [Thread-1  ]: Re-using an available connection from the pool (formerly list_cse-sandbox-319708_dbt_jyeo, now model.my_dbt_project.test_struct_inc)
15:27:46.989065 [debug] [Thread-1  ]: Began compiling node model.my_dbt_project.test_struct_inc
15:27:47.004208 [debug] [Thread-1  ]: Writing injected SQL for node "model.my_dbt_project.test_struct_inc"
15:27:47.006922 [debug] [Thread-1  ]: Timing info for model.my_dbt_project.test_struct_inc (compile): 15:27:46.990604 => 15:27:47.006480
15:27:47.007785 [debug] [Thread-1  ]: Began executing node model.my_dbt_project.test_struct_inc
15:27:47.375247 [debug] [Thread-1  ]: Opening a new connection, currently in state closed
15:27:47.420043 [debug] [Thread-1  ]: On model.my_dbt_project.test_struct_inc: 
    select * from (
      

select [struct(a.meta.`group` as `group`, 'foo' as v)] as some_data, date(created_at) as _partitiontime from `cse-sandbox-319708`.`dbt_jyeo`.`raw_data` as a
    ) as __dbt_sbq
    where false
    limit 0
  
15:27:49.184743 [debug] [Thread-1  ]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:cc94a20c-2083-42ca-a6e6-6ce453f0c755&page=queryresults
15:27:50.286102 [debug] [Thread-1  ]: On model.my_dbt_project.test_struct_inc: /* {"app": "dbt", "dbt_version": "1.6.0", "profile_name": "all", "target_name": "bq", "node_id": "model.my_dbt_project.test_struct_inc"} */

    
  
    

    create or replace table `cse-sandbox-319708`.`dbt_jyeo`.`test_struct_inc`(`some_data` ARRAY<STRUCT<group INT64, v STRING>>)
      
    partition by timestamp_trunc(_PARTITIONTIME, day)
    

    OPTIONS()
  
  
15:27:50.682913 [debug] [Thread-1  ]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:57295df1-8d9e-4502-8934-ddbad346cdf0&page=queryresults
15:27:50.683824 [debug] [Thread-1  ]: BigQuery adapter: Retry attempt 1 of 1 after error: BadRequest('Syntax error: Unexpected keyword GROUP at [7:104]')
15:27:52.159772 [debug] [Thread-1  ]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:cef5b81c-4286-495a-b650-4632f81c26ff&page=queryresults
15:27:52.161826 [error] [Thread-1  ]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:cef5b81c-4286-495a-b650-4632f81c26ff&page=queryresults
15:27:52.164817 [debug] [Thread-1  ]: Timing info for model.my_dbt_project.test_struct_inc (execute): 15:27:47.008517 => 15:27:52.163810
15:27:52.195637 [debug] [Thread-1  ]: Database Error in model test_struct_inc (models/test_struct_inc.sql)
  Syntax error: Unexpected keyword GROUP at [7:104]
15:27:52.197059 [debug] [Thread-1  ]: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'a12e9dcc-a2c3-454e-9895-4723dd460b0d', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11afe6d00>]}
15:27:52.198863 [error] [Thread-1  ]: 1 of 1 ERROR creating sql incremental model dbt_jyeo.test_struct_inc ........... [ERROR in 5.21s]
15:27:52.200341 [debug] [Thread-1  ]: Finished running node model.my_dbt_project.test_struct_inc
15:27:52.204424 [debug] [MainThread]: Connection 'master' was properly closed.
15:27:52.205337 [debug] [MainThread]: Connection 'model.my_dbt_project.test_struct_inc' was properly closed.
15:27:52.206044 [info ] [MainThread]: 
15:27:52.206741 [info ] [MainThread]: Finished running 1 incremental model in 0 hours 1 minutes and 12.34 seconds (72.34s).
15:27:52.207874 [debug] [MainThread]: Command end result
15:27:52.230977 [info ] [MainThread]: 
15:27:52.231870 [info ] [MainThread]: Completed with 1 error and 0 warnings:
15:27:52.233508 [info ] [MainThread]: 
15:27:52.234545 [error] [MainThread]: Database Error in model test_struct_inc (models/test_struct_inc.sql)
15:27:52.235212 [error] [MainThread]:   Syntax error: Unexpected keyword GROUP at [7:104]
15:27:52.235914 [info ] [MainThread]: 
15:27:52.236600 [info ] [MainThread]: Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
15:27:52.237638 [debug] [MainThread]: Command `dbt run` failed at 15:27:52.237469 after 76.82 seconds
15:27:52.238265 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10d72fbe0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x111935850>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11afdbb20>]}
15:27:52.238832 [debug] [MainThread]: Flushing usage events

Environment

- OS: macOS
- Python: 3.9.13
- dbt-core: 1.6.0
- dbt-bigquery: 1.6.4

Additional Context

The debug logs show:

    create or replace table `cse-sandbox-319708`.`dbt_jyeo`.`test_struct_inc`(`some_data` ARRAY<STRUCT<group INT64, v STRING>>)
      
    partition by timestamp_trunc(_PARTITIONTIME, day)
    

    OPTIONS()

but dbt should instead be generating:

    create or replace table `cse-sandbox-319708`.`dbt_jyeo`.`test_struct_inc`(`some_data` ARRAY<STRUCT<`group` INT64, `v` STRING>>)
      
    partition by timestamp_trunc(_PARTITIONTIME, day)
    

    OPTIONS()

I quoted both fields ("group" and "v") for completeness, but quoting "group" alone would have been enough.
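
The difference can also be verified outside of dbt by running the two create statements directly in the BigQuery console - a minimal sketch, assuming a scratch dataset dbt_jyeo and made-up table names:

-- unquoted nested field: expected to fail with "Syntax error: Unexpected keyword GROUP"
create or replace table dbt_jyeo.quoting_check_bad(`some_data` ARRAY<STRUCT<group INT64, v STRING>>);

-- backtick-quoted nested fields: expected to succeed
create or replace table dbt_jyeo.quoting_check_good(`some_data` ARRAY<STRUCT<`group` INT64, `v` STRING>>);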

A couple of other things I looked at or attempted without success:

(A) Setting the quote column property - I didn't find a way to set that property so that it took effect here.

(B) Using contracts / data_type:

-- models/schema.yml
version: 2
models:
  - name: test_struct_inc
    config:
      contract:
        enforced: true
    columns:
      - name: some_data
        data_type: ARRAY<STRUCT<`group` INT64, v STRING>>
jeremyyeo added the bug and triage labels on Sep 7, 2023
github-actions bot changed the title from [Bug] Columns are unquoted with nested data (incremental + time_ingestion_partitioning) to [ADAP-874] [Bug] Columns are unquoted with nested data (incremental + time_ingestion_partitioning) on Sep 7, 2023
dbeatty10 added the partitioning and incremental labels on Sep 7, 2023
@github-christophe-oudar
Contributor

I wrote a quick fix for that problem: #919
Hopefully it won't lead to any regression. We could be a bit safer by using a parameter, but I'm not sure it's worth it, as I don't know of a case where the data type without the quote would pass and the quoted one would fail.
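For instance, both of the following forms should be accepted by BigQuery, which is what makes always quoting a low-risk change - a sketch with made-up table names against a scratch dataset dbt_jyeo:

create or replace table dbt_jyeo.quote_check_a(data ARRAY<STRUCT<`group` INT64, v STRING>>);   -- only the reserved field quoted
create or replace table dbt_jyeo.quote_check_b(data ARRAY<STRUCT<`group` INT64, `v` STRING>>); -- all fields quoted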

@dbeatty10
Contributor

Thanks for raising a PR @github-christophe-oudar !
