Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model contract enforce is not done at compile time #496

Open
faizhasan opened this issue Nov 9, 2023 · 8 comments
Open

model contract enforce is not done at compile time #496

faizhasan opened this issue Nov 9, 2023 · 8 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@faizhasan
Copy link

When trying to enforce contracts on a model schema, the dbt run command is supposed to throw a Compilation Error before dbt has materialised the table when the contract does not match - instead the run continues to happen and a Compilation error is thrown after the run is executed.

If the table already exists, it gets dropped, making the situation worse for downstream systems.

tested with dbt-athena 1.5.2 and 1.6.4 versions.

test_contract.sql

-------------------
{{
  config(
    materialized = 'table',
    )
}}

select
  'abc123' as customer_id,
  'My Best Customer' as customer_namex

schema.yml

version: 2

models:
    - name: test_contract
      config:
        materialized: table
        contract:
          enforced: true
      columns:
        - name: customer_id
          data_type: string
        - name: customer_name
          data_type: string

Result:

04:12:02  Running with dbt=1.6.8
04:12:02  Registered adapter: athena=1.6.4
04:12:02  Found 19 models, 4 tests, 0 sources, 0 exposures, 0 metrics, 380 macros, 0 groups, 0 semantic models
04:12:02
04:12:08  Concurrency: 1 threads (target='prod')
04:12:08
04:12:08  1 of 1 START sql table model z_xxxxxx.test_contract ................... [RUN]
04:12:11  1 of 1 ERROR creating sql table model z_xxxxxx.test_contract .......... [ERROR in 3.57s]
04:12:11
04:12:11  Finished running 1 table model in 0 hours 0 minutes and 9.35 seconds (9.35s).
04:12:11
04:12:11  Completed with 1 error and 0 warnings:
04:12:11
04:12:11    Compilation Error in model test_contract (models/presentation/test_contract.sql)
  This model has an enforced contract that failed.
  Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.

  | column_name    | definition_type | contract_type | mismatch_reason       |
  | -------------- | --------------- | ------------- | --------------------- |
  | customer_name  |                 | VARCHAR       | missing in definition |
  | customer_namex | VARCHAR         |               | missing in contract   |


  > in macro assert_columns_equivalent (macros/materializations/models/table/columns_spec_ddl.sql)
  > called by macro default__get_assert_columns_equivalent (macros/materializations/models/table/columns_spec_ddl.sql)
  > called by macro get_assert_columns_equivalent (macros/materializations/models/table/columns_spec_ddl.sql)
  > called by macro athena__create_table_as (macros/materializations/models/table/create_table_as.sql)
  > called by macro create_table_as (macros/materializations/models/table/create_table_as.sql)
  > called by macro safe_create_table_as (macros/materializations/models/table/create_table_as.sql)
  > called by macro materialization_table_athena (macros/materializations/models/table/table.sql)
  > called by model test_contract (models/presentation/test_contract.sql)
04:12:11
04:12:11  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Expected:
contract is checked at Compile time before dbt run runs the materialisation on the table.

@nicor88
Copy link
Member

nicor88 commented Nov 9, 2023

@faizhasan did you check if this is related to dbt-core and not to dbt-athena? in my understanding this is the expected behvior. Maybe @Jrmyy can also help out here.

@faizhasan
Copy link
Author

faizhasan commented Nov 9, 2023

Unsure what you mean by testing with dbt-core, do you mean with a different adapter like dbt-spark?

In general a Compilation error is thrown before runtime execution, not after. At the moment the behavior of contract enforcement in dbt-athena is destructive - the model table gets removed if contract does not match.

The dbt documentation refers to contract enforcement in 2 places both clearly saying this should happen before model is executed.

reference/resource-configs/contract

When you dbt run your model, before dbt has materialized it as a table in the database, you will see this error:

govern/model-contracts

When building a model with a defined contract, dbt will do two things differently:

1. dbt will run a "preflight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. This check is agnostic to the order of columns specified in your model (SQL) or YAML spec.
2. dbt will include the column names, data types, and constraints in the DDL statements it submits to the data platform, which will be enforced while building or updating the model's table.

@nicor88
Copy link
Member

nicor88 commented Nov 10, 2023

The destructive behaviour is indeed far from ideal.
I looked how the table materialization looks like in dbt-athena, and I didn't see any reference to contracts, that's why I suspect that this behavior gets inherited by dbt-core.

Could you tell me in version of dbt-core and dbt-athena you saw this behavior?

@Jrmyy
Copy link
Member

Jrmyy commented Nov 10, 2023

Our implementation is slightly different from dbt-core because we are limited by what can be done in Trino. dbt-core handles it directly during the table creation (with a create or replace) but we can't do that, so we do it before creating the table.

This means that indeed, in the case the materialization is table and is_ha=False, the table is:

  • deleted
  • then the contract is checked -> if there is an error indeed the table will be deleted
  • then we create the table

We can try to make the contract check before dropping the table. This is a good catch, feel free to submit a PR if you feel like it, otherwise we will try to tackle it as soon as we can

@Jrmyy Jrmyy added bug Something isn't working good first issue Good for newcomers labels Nov 10, 2023
@nicor88
Copy link
Member

nicor88 commented Nov 10, 2023

@Jrmyy i missed that contract check was inside the cta, we should definitely try to move it out and put in the materialization

@Jrmyy
Copy link
Member

Jrmyy commented Nov 10, 2023

This could be tricky because we want to do it also for all tables we create along the way (to stay consistent with dbt-core implementation), which means temp tables, ha tables, etc

@nicor88
Copy link
Member

nicor88 commented Nov 10, 2023

Shall we recommend users to use iceberg or ha=true or incremental model for contracts enforcement? @faizhasan could you try with ha=True for your use cases and see if you get a less destructive behaviour?

@Jrmyy
Copy link
Member

Jrmyy commented Nov 13, 2023

We can but the long term solution should be to have a non-destructive behavior for contract enforcement or not, independent from the high availability feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants