Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-767] [Bug] Model's contract validation fails on query data types change #861

Closed
2 tasks done
dementiev27 opened this issue Aug 3, 2023 · 1 comment · Fixed by #862
Closed
2 tasks done
Labels
bug Something isn't working

Comments

@dementiev27
Copy link
Contributor

dementiev27 commented Aug 3, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Model data types are being detected incorrectly due to empty query caching in BigQuery, resulting in contract validation errors.

To get column data types, DBT executes a query like this:

select * from (
    -- model's body goes here
    select
        id,
        amount
    from project.dataset.payments
    --
) as __dbt_sbq
where false
limit 0

This query is cached in BigQuery, leading to incorrect data types detection in case of their change.

Expected Behavior

Model data types are detected correctly and contract validation works as expected.

Steps To Reproduce

  1. Create a model that selects data from any table:
select
    customer_id,
    sum(amount) as total_amount
from {{ source('finance', 'payments') }}
where finished is true
group by customer_id
version: 2

models:
  - name: customer_payments
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: string
      - name: total_amount
        data_type: integer
  1. Run this model. Under the hood, DBT will execute the "empty" query mentioned earlier.
  2. Change the customer_id data type in the payments source table.
    Also, appropriately change the data type of customer_id for the model in the YAML config.
  3. Run the model again. The dbt will not detect the customer_id data type change, and you will get the error:
This model has an enforced contract that failed.
Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
| column_name | definition_type | contract_type | mismatch_reason    |
| ----------- | --------------- | ------------- | ------------------ |
| customer_id | INT64           | STRING        | data type mismatch |

Relevant log output

No response

Environment

- OS: macOS 13.4.1
- Python: 3.10.2
- dbt-core: 1.5.2
- dbt-bigquery: 1.5.3

Additional Context

No response

@dementiev27 dementiev27 added bug Something isn't working triage labels Aug 3, 2023
@github-actions github-actions bot changed the title [Bug] Model's contact validation fails on query data types change [ADAP-767] [Bug] Model's contact validation fails on query data types change Aug 3, 2023
@dbeatty10
Copy link
Contributor

Thanks for reporting this and opening up an associated PR @dementiev27 !

Maybe the BigQuery cache invalidation isn't working as we'd expect? It seems like BigQuery should invalidate its cache when an underlying table is rebuilt, but that doesn't appear to be happening.

I was able to reproduce the scenario you reported, and those details are in the "Reprex" toggle immediately below.

Reprex

Start with a model like this:
models/customer_payments.sql

{{ config(materialized="table") }}

select
    customer_id
from {{ ref('bogus_source_data') }}

Create the "source" data and the initial contracted model:

cat << EOF > models/bogus_source_data.sql
{{ config(materialized="table") }}

select
    cast(1 as integer) as customer_id
    -- '1' as customer_id
EOF
cat << EOF > models/_models.yaml
models:
  - name: customer_payments
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: integer
        # data_type: string
EOF
dbt run -s bogus_source_data
dbt run -s customer_payments --full-refresh

Change the data type within the "source" data and the contracted model:

cat << EOF > models/bogus_source_data.sql
{{ config(materialized="table") }}

select
    -- cast(1 as integer) as customer_id
    '1' as customer_id
EOF
cat << EOF > models/_models.yaml
models:
  - name: customer_payments
    config:
      contract:
        enforced: true
    columns:
      - name: customer_id
        # data_type: integer
        data_type: string
EOF
dbt run -s bogus_source_data
dbt run -s customer_payments

Get an error like this:

22:59:24  Compilation Error in model customer_payments (models/customer_payments.sql)
22:59:24    This model has an enforced contract that failed.
22:59:24    Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
22:59:24    
22:59:24    | column_name | definition_type | contract_type | mismatch_reason    |
22:59:24    | ----------- | --------------- | ------------- | ------------------ |
22:59:24    | customer_id | INT64           | STRING        | data type mismatch |

Updating models/customer_payments.sql like the following and re-running everything above seems work:

{{ config(materialized="table") }}

-- {{ modules.datetime.datetime.now() }}
select
    customer_id
from {{ ref('bogus_source_data') }}

Adding a SQL comment that generates a unique timestamp each time it is executed seems to successfully invalidate the BigQuery cache.

This feels kinda hacky, but I think we'd need to do something like this if BigQuery doesn't invalidate it's own cache when an underlying table is rebuilt.

One of the members of our engineering team will take a look at your PR -- thanks again for submitting it 🏆

@dbeatty10 dbeatty10 removed the triage label Aug 3, 2023
@dbeatty10 dbeatty10 changed the title [ADAP-767] [Bug] Model's contact validation fails on query data types change [ADAP-767] [Bug] Model's contract validation fails on query data types change Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants