Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Existing dynamic tables attempted to rebuild instead of issuing a fast alter and moving on #1017

Closed
2 tasks done
nydnarb opened this issue May 3, 2024 · 8 comments
Closed
2 tasks done
Labels
bug Something isn't working duplicate

Comments

@nydnarb
Copy link

nydnarb commented May 3, 2024

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Note that this issue resolved itself, so it might not be a bug anymore.

Consider this dynamic table:

models:
  - name: aws_cost_reports_monthly
    config:
      schema: aws_cost_reports
      alias: monthly
      materialized: 'dynamic_table'
      target_lag: '1 hour'
      snowflake_warehouse: 'transforming'

This model has been materialized and when dbt Cloud does our daily build, it does not need to build this table. If I inspect run artifacts in the target/run folder, I see this (which aligns with my expectation):

alter dynamic table "ANALYTICS"."AWS_COST_REPORTS"."MONTHLY" set warehouse = transforming

However, on May 2nd, we had an error with this model.

14:06:00    Database Error in model aws_cost_reports_monthly (models/aws_cost_reports/aws_cost_reports_monthly.sql)
  090106 (22000): Cannot perform CREATE TABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.

Upon investigating the run artifacts, I saw the following SQL scripts (reformatted and unchanged):

-- get the standard backup name
    
-- drop any pre-existing backup
    
drop table if exists "ANALYTICS"."AWS_COST_REPORTS"."MONTHLY__dbt_backup" cascade
;

alter table "ANALYTICS"."AWS_COST_REPORTS"."MONTHLY" rename to MONTHLY__dbt_backup
;
        
create dynamic table analytics.aws_cost_reports.monthly
  target_lag = '1 hour'
  warehouse = transforming
  as ( << long SQL query redacted >> )
;

-- get the standard backup name
    
drop table if exists "ANALYTICS"."AWS_COST_REPORTS"."MONTHLY__dbt_backup" cascade
;

So, the error occurred because of the unqualified name MONTHLY__dbt_backup. However, the bigger issue is why dbt was attempting to rebuild a dynamic table that was already materialized?

I'm not sure exactly how dbt determines if a dynamic table is already materialized but it could be related to the error. For example, if dbt failed to "see" the materialized dynamic table because it was checking an unqualified location that didn't match the appropriate location, then perhaps dbt thought it had to materialize this dynamic table for the first time. Note that @jeremyyeo shared this bug in Slack while we were discussing the issue: #1016

I confirmed this behavior on dbt-snowflake 1.7.3 and dbt-core 1.7.13. Also, I confirmed that snowflake bundle 2024_03 was disabled. This issue occurred on dbt Cloud and locally in a python Docker container.

However, as of May 3rd, jobs in dbt Cloud are running successfully. When I check the artifacts for aws_cost_reports_monthly, I see the appropriate alter statement.

Here is a link to the Slack discussion: #1016

Expected Behavior

The issue is resolved and expected behavior has returned. That is, when a dynamic table is already materialized and dbt Cloud builds it, it issues an alter command.

However, I am concerned about this part of the materialization:

alter table "ANALYTICS"."AWS_COST_REPORTS"."MONTHLY" rename to MONTHLY__dbt_backup

It relies on default behavior in Snowflake to infer the schema and database. Since that behavior is subject to change, dbt might want to be more explicit here.

Steps To Reproduce

I can't reproduce the issue because it appears to be resolved.

Relevant log output

See above

Environment

dbt Cloud on 1.7.13 and Snowflake 1.7.3

and locally:

# dbt debug
17:19:53  Running with dbt=1.7.13
17:19:53  dbt version: 1.7.13
17:19:53  python version: 3.10.14
17:19:53  python path: /usr/local/bin/python
17:19:53  os info: Linux-6.1.64-0-virt-aarch64-with-glibc2.36
17:19:53  Using profiles dir at /app/dbt
17:19:53  Using profiles.yml file at /app/dbt/profiles.yml
17:19:53  Using dbt_project.yml file at /app/dbt/dbt_project.yml
17:19:53  adapter type: snowflake
17:19:53  adapter version: 1.7.3
17:19:53  Configuration:
17:19:53    profiles.yml file [OK found and valid]
17:19:53    dbt_project.yml file [OK found and valid]
17:19:53  Required dependencies:
17:19:53   - git [OK found]

17:19:53  Connection:
17:19:53    account: ******
17:19:53    user: ******
17:19:53    database: ******
17:19:53    warehouse: ******
17:19:53    role: ******
17:19:53    schema: ******
17:19:53    authenticator: username_password_mfa
17:19:53    oauth_client_id: None
17:19:53    query_tag: None
17:19:53    client_session_keep_alive: False
17:19:53    host: None
17:19:53    port: None
17:19:53    proxy_host: None
17:19:53    proxy_port: None
17:19:53    protocol: None
17:19:53    connect_retries: 1
17:19:53    connect_timeout: None
17:19:53    retry_on_database_errors: False
17:19:53    retry_all: False
17:19:53    insecure_mode: False
17:19:53    reuse_connections: None
17:19:53  Registered adapter: snowflake=1.7.3
17:19:54    Connection test: [OK connection ok]

17:19:54  All checks passed!


### Additional Context

_No response_
@nydnarb nydnarb added bug Something isn't working triage labels May 3, 2024
@nydnarb
Copy link
Author

nydnarb commented May 4, 2024

Note that the issue came back. I'm not sure if folks have access to dbt Cloud run logs, but I will share two runs. This run is from yesterday and you can see the alter statement if you look at the artifact run/dl_dbt/models/aws_cost_reports/aws_cost_reports_monthly.sql:
https://cloud.getdbt.com/deploy/93718/projects/178054/runs/279831506 There were no changes to that model and then last night and this morning, dbt attempted again to recreate the dynamic table. Here are links to those two runs:

I tried locally and once again dbt is attempting to recreate the entire table.

Not sure if y'all have access to dbt Cloud customer logs, so I'm happy to hop on a call if that would be useful.

@nydnarb
Copy link
Author

nydnarb commented May 6, 2024

It appears that Snowflake was toggling the 2024_03 bundle back and forth during the time that the issue was occurring and then resolving itself. Every time I checked to see if the bundle was enabled, it was not. However, it appears it's enabled now and the issue persists. I believe the source of this issue is already captured in this issue: #1016 This issue can very likely be closed.

@dataders
Copy link
Contributor

dataders commented May 6, 2024

@nydnarb can you test the proposed change in #1016 by adding this macro (with the change on L45) to your project's macros/ dir to see if the error persists?

{% macro snowflake__get_catalog_tables_sql(information_schema) -%}
    select
        table_catalog as "table_database",
        table_schema as "table_schema",
        table_name as "table_name",
        case
            when (is_dynamic = 'YES' OR is_dynamic = 'Y') and table_type = 'BASE TABLE' THEN 'DYNAMIC TABLE'
            else table_type
        end as "table_type",
        comment as "table_comment",


        -- note: this is the _role_ that owns the table
        table_owner as "table_owner",


        'Clustering Key' as "stats:clustering_key:label",
        clustering_key as "stats:clustering_key:value",
        'The key used to cluster this table' as "stats:clustering_key:description",
        (clustering_key is not null) as "stats:clustering_key:include",


        'Row Count' as "stats:row_count:label",
        row_count as "stats:row_count:value",
        'An approximate count of rows in this table' as "stats:row_count:description",
        (row_count is not null) as "stats:row_count:include",


        'Approximate Size' as "stats:bytes:label",
        bytes as "stats:bytes:value",
        'Approximate size of the table as reported by Snowflake' as "stats:bytes:description",
        (bytes is not null) as "stats:bytes:include",


        'Last Modified' as "stats:last_modified:label",
        to_varchar(convert_timezone('UTC', last_altered), 'yyyy-mm-dd HH24:MI'||'UTC') as "stats:last_modified:value",
        'The timestamp for last update/change' as "stats:last_modified:description",
        (last_altered is not null and table_type='BASE TABLE') as "stats:last_modified:include"
    from {{ information_schema }}.tables
{%- endmacro %}

@nydnarb
Copy link
Author

nydnarb commented May 6, 2024

I just did the following.

First, I made sure that, in my sandbox database, the schemas dbt_brandyn and dbt_brandyn_aws_cost_reports were dropped. The model is currently not materialized.

Then, I added the above macro to the macros folder.

Then, I updated the dynamic table model so that is only looked at the last three days worth of data so it doesn't take 45 minutes to materialize (e.g., where _metadata_file_last_modified > dateadd('day', -3, sysdate())).

Then, dbt build -s aws_cost_reports_monthly. Returned:

20:52:12  Finished running 1 dynamic_table model, 12 tests, 2 hooks in 0 hours 0 minutes and 31.00 seconds (31.00s).

Checking target/run/dl_dbt/models/aws_cost_reports/aws_cost_reports_monthly.sql I see

create dynamic table sandbox.dbt_brandyn_aws_cost_reports.monthly ...

This behavior is as expected. Success! But...

Then, I try running it again: dbt build -s aws_cost_reports_monthly:

20:55:04  1 of 13 ERROR creating sql dynamic_table model dbt_brandyn_aws_cost_reports.monthly  [ERROR in 0.86s]
...
20:55:04  Finished running 1 dynamic_table model, 12 tests, 2 hooks in 0 hours 0 minutes and 7.99 seconds (7.99s).
20:55:05
20:55:05  Completed with 1 error and 0 warnings:
20:55:05
20:55:05    Database Error in model aws_cost_reports_monthly (models/aws_cost_reports/aws_cost_reports_monthly.sql)
  090106 (22000): Cannot perform CREATE TABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
  compiled Code at target/run/dl_dbt/models/aws_cost_reports/aws_cost_reports_monthly.sql
20:55:05
20:55:05  Done. PASS=0 WARN=0 ERROR=1 SKIP=12 TOTAL=13

Checking target/run/dl_dbt/models/aws_cost_reports/aws_cost_reports_monthly.sql I see this:

-- get the standard backup name
-- drop any pre-existing backup
drop table if exists "SANDBOX"."DBT_BRANDYN_AWS_COST_REPORTS"."MONTHLY__dbt_backup" cascade;
alter table "SANDBOX"."DBT_BRANDYN_AWS_COST_REPORTS"."MONTHLY" rename to MONTHLY__dbt_backup;
create dynamic table sandbox.dbt_brandyn_aws_cost_reports.monthly
target_lag = '1 hour'
warehouse = sandboxing
as (...);
-- get the standard backup name
drop table if exists "SANDBOX"."DBT_BRANDYN_AWS_COST_REPORTS"."MONTHLY__dbt_backup" cascade

Looks like the error came from the alter table statement above. But, this output seems to indicate that the dynamic table materialization is still trying to do a full refresh on the table. Upon trying a third time, I got exactly the same error as when I tried the second time.

For completeness:

# dbt debug
21:02:43  Running with dbt=1.7.13
21:02:43  dbt version: 1.7.13
21:02:43  python version: 3.10.14
21:02:43  python path: /usr/local/bin/python
21:02:43  os info: Linux-6.1.64-0-virt-aarch64-with-glibc2.36
21:02:43  Using profiles dir at /app/dbt
21:02:43  Using profiles.yml file at /app/dbt/profiles.yml
21:02:43  Using dbt_project.yml file at /app/dbt/dbt_project.yml
21:02:43  adapter type: snowflake
21:02:43  adapter version: 1.7.3
21:02:43  Configuration:
21:02:43    profiles.yml file [OK found and valid]
21:02:43    dbt_project.yml file [OK found and valid]
21:02:43  Required dependencies:
21:02:43   - git [OK found]

@clareawilson
Copy link

We're seeing the same error with the alter table statement -
090106 (22000): Cannot perform CREATE TABLE. This session does not have a current schema. Call 'USE SCHEMA', or use a qualified name.
using bundle 2024_03

@nydnarb
Copy link
Author

nydnarb commented May 8, 2024

The issue has been identified. dbt uses a show terse objects command to get a list of database objects. Snowflake changed the behavior of this command in bundle 2024_03 without letting any of us know. Note that in their change logs, they explained that the behavior of show tables would change, but Snowflake also changed the behavior of show terse objects without mentioning it. The team at dbt is working with Snowflake on a solution.

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 9, 2024

@nydnarb Thank you so much for the thorough issue & follow-up comments!

So, the error occurred because of the unqualified name MONTHLY__dbt_backup.

I agree, and I am thinking that we might be able to avoid this error by fully qualifying the name within the alter ... rename.

However, the bigger issue is why dbt was attempting to rebuild a dynamic table that was already materialized?

This is the issue that we are tracking down with Snowflake. We would like to be able to reliably detect whether an object is already a dynamic table, in order to atomically operate on it (either alter or create or replace), rather than dropping + recreating (risk of lost data / downtime):

@jtcohen6 jtcohen6 added duplicate and removed triage labels May 9, 2024
@jtcohen6
Copy link
Contributor

I've opened a separate bug for the unqualified renaming:

I'd also like to share what I believe to be a viable workaround, while we sort this out with Snowflake:

I'm going to close this issue as a duplicate in the meantime.

@jtcohen6 jtcohen6 closed this as not planned Won't fix, can't repro, duplicate, stale May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working duplicate
Projects
None yet
Development

No branches or pull requests

4 participants