Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Missing query-comments on insert statements for models with enforced contracts #9617

Open
2 tasks done
OFlanagan opened this issue Feb 21, 2024 · 14 comments
Open
2 tasks done
Labels
bug Something isn't working model_contracts

Comments

@OFlanagan
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Queries run against redshift do not all have the query-comment appended at the start.
This query-comment is present for many CREATE and DELETE commands but is missing for inserts for:
incrementals (inserting into the incremental table from the temp table)
and seeds (the insert does not have query comments)

Expected Behavior

the documentation states "A string to inject as a comment in each query that dbt runs against your database"
This means that the above inserts should also include the query-comment
https://docs.getdbt.com/reference/project-configs/query-comment

Steps To Reproduce

  1. define a seed or incremental in your dbt project
  2. ensure query-comments are enabled for your project
  3. run dbt run --select seed_name or similar for incremental

Relevant log output

No response

Environment

- OS: Amazon Linux 23
- Python: 3.11.6
- dbt-core: 1.6.6

Which database adapter are you using with dbt?

redshift

Additional Context

No response

@OFlanagan OFlanagan added bug Something isn't working triage labels Feb 21, 2024
@dbeatty10 dbeatty10 self-assigned this Feb 21, 2024
@dbeatty10
Copy link
Contributor

Thanks for reporting this @OFlanagan 👍

Which incremental strategies are you seeing this with?

  • append
  • merge
  • delete+insert

Is it all of the above, so just one or two of them?

@OFlanagan
Copy link
Author

We are currently using the default settings on redshift. I am not sure what incremental strategy that would correspond to but it is not merge.
I can see deletes and inserts onto our tables so I would guess delete+insert. The deletes have the query-comments but the inserts don't

@dbeatty10
Copy link
Contributor

I'm seeing what you are saying about seeds, but I'm not seeing the same thing for incremental models.

Can you share the relevant sections of your logs/dbt.log file?

Here's what I'm seeing in mine which includes the query comment:

[0m10:53:46.124315 [debug] [Thread-1 (]: Using redshift connection "model.my_project.my_incremental_append"
[0m10:53:46.124881 [debug] [Thread-1 (]: On model.my_project.my_incremental_append: /* {"app": "dbt", "dbt_version": "1.7.8", "profile_name": "redshift", "target_name": "blue", "node_id": "model.my_project.my_incremental_append"} */

      insert into "ci"."dbt_dbeatty"."my_incremental_append" ("id")
    (
        select "id"
        from "my_incremental_append__dbt_tmp105342706163"
    )
[0m10:53:46.650463 [debug] [Thread-2 (]: Using redshift connection "model.my_project.my_incremental_delete_insert"
[0m10:53:46.651705 [debug] [Thread-2 (]: On model.my_project.my_incremental_delete_insert: /* {"app": "dbt", "dbt_version": "1.7.8", "profile_name": "redshift", "target_name": "blue", "node_id": "model.my_project.my_incremental_delete_insert"} */

      insert into "ci"."dbt_dbeatty"."my_incremental_delete_insert" ("id")
    (
        select "id"
        from "my_incremental_delete_insert__dbt_tmp105342744424"
    )

But seeds are not showing the query comment:

[0m10:53:44.071793 [debug] [Thread-4 (]: Using redshift connection "seed.my_project.my_seed"
[0m10:53:44.072505 [debug] [Thread-4 (]: On seed.my_project.my_seed: insert into "ci"."dbt_dbeatty"."my_seed" ("id") values
          (%s)...

@OFlanagan
Copy link
Author

I probably can't share the logs for confidentiality reasons but for the tmp table:

�[0m14:19:03.392616 [debug] [Thread-2 (]: Using redshift connection "model.redshift.tablename_staging"
�[0m14:19:03.393573 [debug] [Thread-2 (]: On model.redshift.tablename_staging: insert into tablename__dbt_tmpnumber
(--rest of query

and then when inserting from tmp table into main table
�[0m14:24:44.153468 [debug] [Thread-3 (]: On model.redshift.tablename_staging: insert into tablename_staging (columns)
(
select columns
from tablename_staging__dbt_tmpnumber
)

I can try updating to the latest version of dbt-core and see if that resolves the issue for incrementals

@dbeatty10
Copy link
Contributor

I can try updating to the latest version of dbt-core and see if that resolves the issue for incrementals

Yeah, give it a shot with the latest versions for dbt-core and dbt-redshift and let me know how it goes.

@OFlanagan
Copy link
Author

I updated to dbt-core 1.7.8 and dbt-redshift redshift=1.7.3 and still don't see the comments on the inserts

@dbeatty10
Copy link
Contributor

@OFlanagan I'm not sure what is different between our environments... 🤔

See below for my exact project files -- could you try out this simplified project to eliminate anything in your context that might be affecting the query comments?

dbt_project.yml

name: "my_project"
version: "1.0.0"
config-version: 2
profile: "redshift"

clean-targets:
  - dbt_packages
  - target
  - logs

seeds/my_seed.csv

id
1

models/my_model.sql

{{ config(materialized="table") }}

select 1 as id

models/my_incremental_append.sql

{{ config(
    materialized="incremental",
    strategy="append",
) }}

select 1 as id

models/my_incremental_delete_insert.sql

{{ config(
    materialized="incremental",
    strategy="delete_insert",
) }}

select 1 as id
dbt clean
dbt build --full-refresh
dbt build

Then examine code logs/dbt.log.

@OFlanagan
Copy link
Author

OFlanagan commented Feb 22, 2024

When I run that project I do see the comments on the insert.

However the incremental strategies appear to be performing a different set of operations to those in the project where I am encountering the issue.
In the sample project you have provided for the delete_insert model DBT creates a temp table with

create temporary table "my_incremental_delete_insert__dbt_tmp090014408572" as ( select 1 as id);

and then the increment is loaded into the main table with

      insert into "dwh"."dbt"."my_incremental_delete_insert" ("id")
    (
        select "id"
        from "my_incremental_delete_insert__dbt_tmp090014408572"
    )

Both of these queries have the query comment.

In my project I am seeing

a create temporary table command with the DDL (I am using distkey and sortkey in redshift - probably the difference). This has the query-comment

and then a separate insert into the temp table (with the query-comment)

and then a delete from the main table (with query-comment)

and then an insert into the main table from the tmp table (without the query-comment)

I would assume that the key difference here is the dist style and sort key settings in redshift

@OFlanagan
Copy link
Author

I tried without the diststyle and sort key specifications but that didn't fix it.
Do you know why the operations used would differ?

@dbeatty10
Copy link
Contributor

I'm not sure why the query comment isn't showing. There's surely something different, but it's not yet clear what that difference is.

Here's an idea for further troubleshooting:

  • Add the model from the simple project into your real project and then run it.
    • if it doesn't have the query comments, then that implies it's something about your project or environment
    • if it does have the query comments, then that implies it is something about your model

Wanna give that a shot and see what happens?

@dbeatty10 dbeatty10 removed the triage label Feb 22, 2024
@OFlanagan
Copy link
Author

Great call.

The issue appears when a contract is specified in the properties file

Model File

{{ config(
    materialized="incremental",
    on_schema_change='fail',
    unique_key='composite_key',
    sort=['sort_id1','sort_id2'],
    dist='id'
) }}

select 1 as id, 2 as sort_id1, 3 as sort_id2, 4 as composite_key

in the properties file

models:
  - name: my_incremental2
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: INTEGER
      - name: sort_id1
        data_type: INTEGER
      - name: sort_id2
        data_type: INTEGER
      - name: composite_key
        data_type: INTEGER

Logs

�[0m08:56:38.638862 [debug] [Thread-1 (]: Using redshift connection "model.redshift.my_incremental2"
�[0m08:56:38.640454 [debug] [Thread-1 (]: On model.redshift.my_incremental2: insert into my_incremental2__dbt_tmp085638083527
    (
      
    select id, sort_id1, sort_id2, composite_key
    from (
        

select 1 as id, 2 as sort_id1, 3 as sort_id2, 4 as composite_key
    ) as model_subq
    )
  ;

�[0m08:56:51.966516 [debug] [Thread-1 (]: On model.redshift.my_incremental2: insert into dwh.raw_data.my_incremental2 ("id", "sort_id1", "sort_id2", "composite_key")
    (
        select "id", "sort_id1", "sort_id2", "composite_key"
        from my_incremental2__dbt_tmp085638083527
    )

@dbeatty10
Copy link
Contributor

You found it! Nice sleuth work :shipit:

It looks like model contracts have an insert statement that doesn't have the query comments applied. Namely, not all SQL statements in here include query comments.

It's not limited to incremental models. See below for a simple reproducible example using Postgres.

models/my_model.sql

{{ config(materialized="table") }}

select 1 as id

models/_models.yml

models:
  - name: my_model
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: integer

Subset of queries -- notice no query comment prior to insert:

BEGIN
/* {"app": "dbt", "dbt_version": "1.7.8", "profile_name": "postgres", "target_name": "postgres", "node_id": "model.my_project.my_model"} */

  
    

  create  table "postgres"."dbt_dbeatty"."my_model__dbt_tmp"
  
  
    
  
  (
    id integer
    
    )
 ;
    insert into "postgres"."dbt_dbeatty"."my_model__dbt_tmp" (
      id
    )
  
  (
    
    select id
    from (
        

select 1 as id
    ) as model_subq
  );
  
/* {"app": "dbt", "dbt_version": "1.7.8", "profile_name": "postgres", "target_name": "postgres", "node_id": "model.my_project.my_model"} */
alter table "postgres"."dbt_dbeatty"."my_model" rename to "my_model__dbt_backup"
/* {"app": "dbt", "dbt_version": "1.7.8", "profile_name": "postgres", "target_name": "postgres", "node_id": "model.my_project.my_model"} */
alter table "postgres"."dbt_dbeatty"."my_model__dbt_tmp" rename to "my_model"
COMMIT

Acceptance criteria

  • Each individual SQL statement executed during model contracts has its own query comment applied

Out of scope

  • Query comments for seeds

@dbeatty10 dbeatty10 removed their assignment Feb 23, 2024
@dbeatty10 dbeatty10 changed the title [Bug] Missing query-comments on inserts for seeds and incrementals [Bug] Missing query-comments on insert statements for models with enforced contracts Feb 23, 2024
@OFlanagan
Copy link
Author

Thanks @dbeatty10 . I notice that you have identified "Query comments for seeds" as out of scope. Will there be a separate issue for that?

@dbeatty10
Copy link
Contributor

Thanks @dbeatty10 . I notice that you have identified "Query comments for seeds" as out of scope. Will there be a separate issue for that?

Would you mind opening an issue for seeds specifically? You can just copy the same exact description you have for this issue and just make some slight tweaks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working model_contracts
Projects
None yet
Development

No branches or pull requests

2 participants