Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-964] [Regression] Truncate relation names when appending a suffix is creating cache inconsistency #5586

Closed
2 tasks done
tlfbrito opened this issue Jul 30, 2022 · 9 comments · Fixed by dbt-labs/dbt-redshift#149 or #5656
Labels
adapter_caching Issues related to the adapter's relation cache bug Something isn't working regression Team:Adapters Issues designated for the adapter area of the code
Milestone

Comments

@tlfbrito
Copy link

tlfbrito commented Jul 30, 2022

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

We have a few models with long and similar names, for example:

  • my_example__shipping_group_configurations__path_groups_and_settings__path_group__shipping_path_codes
  • my_example__shipping_group_configurations__path_groups_and_settings

After updating to DBT 1.2 we started to see the following error:
Cache inconsistency detected: in rename, new key

It that v1.2 is truncating both of the table identifiers to "identifier='shipping_group_configurations__path_groups_and_sett__dbt_backup'" probably due to #4921

Expected/Previous Behavior

In the 1.1 version this truncate did not happen.

Steps To Reproduce

Create two models with more than 63 characters in the name with a common prefix name.

Relevant log output

No response

Environment

- dbt (working version): 1.1
- dbt (regression version): 1.2

Which database adapter are you using with dbt?

redshift

Additional Context

No response

@tlfbrito tlfbrito added bug Something isn't working regression triage labels Jul 30, 2022
@github-actions github-actions bot changed the title [Regression] Truncate relation names when appending a suffix is creating cache inconsistency [CT-964] [Regression] Truncate relation names when appending a suffix is creating cache inconsistency Jul 30, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 1, 2022

@tlfbrito Thanks for opening! You've identified the right root cause. We implemented smarter logic for handling relation name truncation on Postgres. Because the dbt-redshift adapter inherits from dbt-postgres, it will use any new dbt-postgres methods/macros unless told otherwise.

Here's the relevant macro logic:

{% macro postgres__make_relation_with_suffix(base_relation, suffix, dstring) %}
{% if dstring %}
{% set dt = modules.datetime.datetime.now() %}
{% set dtstring = dt.strftime("%H%M%S%f") %}
{% set suffix = suffix ~ dtstring %}
{% endif %}
{% set suffix_length = suffix|length %}
{% set relation_max_name_length = 63 %}
{% if suffix_length > relation_max_name_length %}
{% do exceptions.raise_compiler_error('Relation suffix is too long (' ~ suffix_length ~ ' characters). Maximum length is ' ~ relation_max_name_length ~ ' characters.') %}
{% endif %}
{% set identifier = base_relation.identifier[:relation_max_name_length - suffix_length] ~ suffix %}
{{ return(base_relation.incorporate(path={"identifier": identifier })) }}
{% endmacro %}

Redshift allows relation names up to 127 characters (docs), so it doesn't require the same complex handling at the 63-character mark. Potential solutions:

  1. Reimplement redshift__make_temp_relation to just use default__make_temp_relation instead (restores previous behavior, probably simplest)
  2. Copy-paste postgres__make_relation_with_suffix as redshift__make_relation_with_suffix, just changing relation_max_name_length from 63 to 127 characters
  3. Create a new property/method on the adapter or Relation class, identifier_max_length, and use that within postgres__make_relation_with_suffix

Then, we should add a test for this: two models with >63 character names, where the first 63 characters are identical.

@jtcohen6 jtcohen6 added Team:Adapters Issues designated for the adapter area of the code and removed triage labels Aug 1, 2022
@jtcohen6 jtcohen6 added this to the v1.2.x milestone Aug 1, 2022
@Goodkat
Copy link
Contributor

Goodkat commented Aug 3, 2022

@jtcohen6 I am working currently on the solution number two (Copy-paste postgres__make_relation_with_suffix as redshift__make_relation_with_suffix) and created the PR for that. Please take a look.
I am not sure about the tests, since I have no access to Redshift just now. Can we mock it?

@VersusFacit
Copy link
Contributor

Merge completed. We had "local" and CI tests reporting success. If you want to use the fix, point your adapters at repo HEAD.

@Limess
Copy link

Limess commented Aug 11, 2022

I just tested this with dbt 1.2.1RC1 and I'm seeing the same error:

[2022-08-11, 15:35:24 BST] {{subprocess.py:92}} INFO - 14:35:24  Running with dbt=1.2.1-rc1
...
...
...
[2022-08-11, 15:46:01 BST] {{subprocess.py:92}} INFO - 14:46:01  Completed with 2 errors and 0 warnings:
[2022-08-11, 15:46:01 BST] {{subprocess.py:92}} INFO - 14:46:01
[2022-08-11, 15:46:01 BST] {{subprocess.py:92}} INFO - 14:46:01  Cache inconsistency detected: in rename, new key _ReferenceKey(database='analytics', schema='analytics_intermediate', identifier='int_organisations_content_licenses_with_time_ranges__dbt_backup') already in cache:  ...
[2022-08-11, 15:46:01 BST] {{subprocess.py:92}} INFO - 14:46:01
[2022-08-11, 15:46:01 BST] {{subprocess.py:92}} INFO - 14:46:01  Cache inconsistency detected: in rename, new key _ReferenceKey(database='analytics', schema='analytics_intermediate', identifier='int_organisations_content_set_access_with_time_rang__dbt_backup') already in cache ...

both dbt-core and dbt-redshift are 1.2.1rc1

@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 11, 2022

@Limess Thanks for testing and following up so quickly! I'll reopen this so we can keep working.

It sounds like you have two models with similar names which are colliding. Is it these?

  • int_organisations_content_licenses_with_time_ranges
  • int_organisations_content_set_access_with_time_ranges

I'm surprised to hear there's a collision between those two, since their first 51 characters are different. This should work even without the Redshift macro override that we included in 1.2.1rc1. Are there other models with whom they share the same first 51 characters?

@jtcohen6 jtcohen6 reopened this Aug 11, 2022
@Limess
Copy link

Limess commented Aug 11, 2022

Yep, there are 4 models like this within the project:

  • int_organisations_content_licenses_with_time_ranges
  • int_organisations_content_set_access_with_time_ranges
  • int_organisations_content_set_access_with_time_ranges_api
  • int_organisations_content_licenses_with_time_ranges_api

@jtcohen6
Copy link
Contributor

Ah, the issue is that while dbt-labs/dbt-redshift#149 reimplemented make_temp_relation, it didn't reimplement make_backup_relation. As soon as we apply the same logic there, it works for me locally:

$ dbt run
16:55:11  Running with dbt=1.2.1-rc1
16:55:11  Found 4 models, 0 tests, 0 snapshots, 0 analyses, 476 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
16:55:11
16:55:18  Concurrency: 5 threads (target='dev')
16:55:18
16:55:18  1 of 4 START view model dbt_jcohen.int_organisations_content_licenses_with_time_ranges  [RUN]
16:55:18  2 of 4 START view model dbt_jcohen.int_organisations_content_licenses_with_time_ranges_api  [RUN]
16:55:18  3 of 4 START view model dbt_jcohen.int_organisations_content_set_access_with_time_ranges  [RUN]
16:55:18  4 of 4 START view model dbt_jcohen.int_organisations_content_set_access_with_time_ranges_api  [RUN]
16:55:21  3 of 4 OK created view model dbt_jcohen.int_organisations_content_set_access_with_time_ranges  [CREATE VIEW in 3.44s]
16:55:22  1 of 4 OK created view model dbt_jcohen.int_organisations_content_licenses_with_time_ranges  [CREATE VIEW in 4.16s]
16:55:22  2 of 4 OK created view model dbt_jcohen.int_organisations_content_licenses_with_time_ranges_api  [CREATE VIEW in 4.82s]
16:55:23  4 of 4 OK created view model dbt_jcohen.int_organisations_content_set_access_with_time_ranges_api  [CREATE VIEW in 5.58s]
16:55:25
16:55:25  Finished running 4 view models in 0 hours 0 minutes and 13.47 seconds (13.47s).
16:55:25
16:55:25  Completed successfully
16:55:25
16:55:25  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

I think we can do that quite simply, within the dbt-postgres plugin, by tapping into Relation.relation_max_name_length(), rather than requiring dbt-redshift to reimplement every one of those macros.

@VersusFacit Code I was using to test out this fix locally:

@jtcohen6
Copy link
Contributor

Hey @tlfbrito @Limess - We included a fix for this in dbt-core + dbt-redshift v1.2.1-rc2. Would you be able to try installing the RC2 and confirm that it's now working for your projects?

pip install dbt-core==1.2.1rc2 dbt-redshift==1.2.1rc2

@Limess
Copy link

Limess commented Aug 23, 2022

We haven't seen any errors since deploying rc2 the day it was released

@jtcohen6 jtcohen6 added the adapter_caching Issues related to the adapter's relation cache label Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adapter_caching Issues related to the adapter's relation cache bug Something isn't working regression Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
5 participants