Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-619] [Bug] Column comparison in snapshot_check_all_get_existing_columns #5222

Closed
jtcohen6 opened this issue May 9, 2022 · 0 comments · Fixed by #5223
Closed

[CT-619] [Bug] Column comparison in snapshot_check_all_get_existing_columns #5222

jtcohen6 opened this issue May 9, 2022 · 0 comments · Fixed by #5223
Labels
adapter_plugins Issues relating to third-party adapter plugins bug Something isn't working snapshots Issues related to dbt's snapshot functionality Team:Adapters Issues designated for the adapter area of the code

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 9, 2022

#4893 added a snapshot_check_all_get_existing_columns macro, with more complex logic to handle column schema evolution, in cases where a user has simultaneously added a new column to the snapshot query and added it to an explicit check_cols config (i.e. not "all").

{% macro snapshot_check_all_get_existing_columns(node, target_exists, check_cols_config) -%}
{% if check_cols_config == 'all' %}
{%- set query_columns = get_columns_in_query(node['compiled_sql']) -%}
{% elif check_cols_config is iterable and (check_cols_config | length) > 0 %}
{% set query_columns = check_cols_config %}
{% else %}
{% do exceptions.raise_compiler_error("Invalid value for 'check_cols': " ~ check_cols_config) %}
{% endif %}
{%- if not target_exists -%}
{# no table yet -> return whatever the query does #}
{{ return([false, query_columns]) }}
{%- endif -%}
{# handle any schema changes #}
{%- set target_table = node.get('alias', node.get('name')) -%}
{%- set target_relation = adapter.get_relation(database=node.database, schema=node.schema, identifier=target_table) -%}
{%- set existing_cols = get_columns_in_query('select * from ' ~ target_relation) -%}
{%- set ns = namespace() -%} {# handle for-loop scoping with a namespace #}
{%- set ns.column_added = false -%}
{%- set intersection = [] -%}
{%- for col in query_columns -%}
{%- if col in existing_cols -%}
{%- do intersection.append(col) -%}
{%- else -%}
{% set ns.column_added = true %}
{%- endif -%}
{%- endfor -%}
{{ return([ns.column_added, intersection]) }}
{%- endmacro %}

However, this logic depends on comparison between user-provided check_cols config to the columns accessed from the database. This doesn't account for quoting/casing differences, which are just about guaranteed on Snowflake. The macro always finds that column_added, so snapshots with check strategy + explicit check_cols config always return where TRUE for their row_changed_expr:

{%- set row_changed_expr -%}
(
{%- if column_added -%}
{{ get_true_sql() }}
{%- else -%}

Hence, these failing tests: dbt-labs/dbt-snowflake#154

We have two options for refactoring this macro:

  1. Do a case-and-quote-insensitive comparison between check_cols_config and existing_cols
  2. Query check_cols_config from the database, in all cases, to ensure that casing/quoting will line up exactly

Personally, I'd opt for the second. It's better to run an extra query than to risk snapshot failure (very bad!). I'm aware that the change in #4893 has us running at least 2 extra queries, relative to the previous behavior for explicit check_cols. I don't think it's worth an opt-in config: the principle of the thing is that snapshots shouldn't fail, even if it requires a lot of metadata queries along the way. Worst case, this is user-space code, someone could override snapshot_check_all_get_existing_columns for a more concise (if risky) approach.

@jtcohen6 jtcohen6 added bug Something isn't working snapshots Issues related to dbt's snapshot functionality adapter_plugins Issues relating to third-party adapter plugins Team:Adapters Issues designated for the adapter area of the code labels May 9, 2022
@github-actions github-actions bot changed the title [Bug] Column comparison in snapshot_check_all_get_existing_columns [CT-619] [Bug] Column comparison in snapshot_check_all_get_existing_columns May 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapter_plugins Issues relating to third-party adapter plugins bug Something isn't working snapshots Issues related to dbt's snapshot functionality Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant