
[CT-1090] [Bug] dbt.type_timestamp() is not working using dbt 1.2.1 #5720

Closed
2 tasks done
clausherther opened this issue Aug 26, 2022 · 2 comments · Fixed by #5907
Labels
bug (Something isn't working) · packages (Functionality for interacting with installed packages)

Comments

@clausherther
Contributor

clausherther commented Aug 26, 2022

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Using dbt.type_timestamp() I get:
'dict object' has no attribute 'type_timestamp'
but it works using type_timestamp(). dbt.<macro>() works for other macros, though.

Expected Behavior

I expected dbt.type_timestamp() to work flawlessly. :smile:

Steps To Reproduce

Using 1.2.1, try

select
    cast(my_column as {{ dbt.type_timestamp() }}) 
from
    my_model

Relevant log output

'dict object' has no attribute 'type_timestamp'

Environment

- OS: macOS 12.5.1
- Python: 3.9
- dbt: 1.2.1

Which database adapter are you using with dbt?

postgres, snowflake, bigquery

Additional Context

No response

@clausherther added the bug and triage labels on Aug 26, 2022
@github-actions bot changed the title from "[Bug] dbt.type_timestamp() is not working using dbt 1.2.1" to "[CT-1090] [Bug] dbt.type_timestamp() is not working using dbt 1.2.1" on Aug 26, 2022
@dbeatty10 self-assigned this on Aug 26, 2022
@jtcohen6
Contributor

In order to reproduce this:

  • Custom generic test definition that references a dbt.* macro that was moved from dbt_utils
  • A version of the dbt_utils package installed with backwards compatibility (= same macro in the dbt_utils namespace, dispatching into the dbt namespace)

Why generic tests? For parsing speed, dbt constructs slimmer rendering contexts when parsing generic tests. It attempts to include just the macros that are called by (depended on by) the generic test definition... and any macros those macros depend on, and so on. The 'dict object' has no attribute 'type_timestamp' error really means that, for whatever reason, dbt isn't capturing that dependency and isn't including type_timestamp as a macro within the dbt namespace when parsing that generic test. (The dbt macro namespace is treated as a dictionary during macro resolution.)
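
For intuition, here is a minimal sketch (not dbt's actual code) of why the error message reads that way: the dbt namespace is handed to Jinja as a plain dict, so calling dbt.type_timestamp() when that key was never added to the slimmed-down context surfaces as Jinja's generic UndefinedError. The context contents below are hypothetical.

from jinja2 import Environment
from jinja2.exceptions import UndefinedError

env = Environment()
template = env.from_string("select cast(x as {{ dbt.type_timestamp() }})")

# Pretend this is the slim test-parsing context: the 'dbt' namespace dict exists,
# but type_timestamp was never captured as a dependency, so the key is missing.
slim_context = {"dbt": {"some_other_macro": lambda: "..."}}  # hypothetical contents

try:
    template.render(**slim_context)
except UndefinedError as exc:
    print(exc)  # 'dict object' has no attribute 'type_timestamp'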

If the package is installed, for whatever reason, dbt is resolving {{ type_timestamp() }} to point to dbt_utils.type_timestamp instead of dbt.type_timestamp. I'm not sure why—in general, only root project macros should get that kind of "global" priority; macros from packages should not—but in any case, dbt_utils.type_timestamp then redirects to dbt.type_timestamp, and all is well with the world.

In any case, it's possible to avoid this error by:

  • referencing it just as type_timestamp in the generic test definition (or removing the test / the macro call entirely)
  • uninstalling dbt_utils :)

In an ideal world, our macro dispatching/resolution order wouldn't yield this surprising outcome! I think the code involved here is pretty gnarly. IIRC it's around here:

def add_macro(self, macro: ParsedMacro, ctx: Dict[str, Any]):
    macro_name: str = macro.name

    # MacroGenerator is in clients/jinja.py
    # a MacroGenerator object is a callable object that will
    # execute the MacroGenerator.__call__ function
    macro_func: MacroGenerator = MacroGenerator(macro, ctx, self.node, self.thread_ctx)

    # internal macros (from plugins) will be processed separately from
    # project macros, so store them in a different place
    if macro.package_name in self.internal_package_names:
        self._add_macro_to(self.internal_packages, macro, macro_func)
    else:
        # if it's not an internal package
        self._add_macro_to(self.packages, macro, macro_func)
        # add to locals if it's the package this node is in
        if macro.package_name == self.search_package:
            self.locals[macro_name] = macro_func
        # add to globals if it's in the root package
        elif macro.package_name == self.root_package:
            self.globals[macro_name] = macro_func

Workaround: Never use the dbt.* prefix for macros migrated from a package while also having a backwards-compatible version of that package installed.

In a weird way, this issue (in its narrowest sense) will be fixed when dbt_utils removes that backwards compatibility. But we do risk running into this again in the future when we perform other package → core macro migrations.

@jtcohen6
Contributor

jtcohen6 commented Sep 22, 2022

Taking another look here, since we might want to consider recommending dbt.* as the go-forward standard, instead of calling utility macros out of the global namespace, and this bug is a big scary thing standing in our way. (cc @joellabes @dbeatty10)

Repro case

-- macros/some_test.sql
{% test some_test(model) %}
  select cast(current_timestamp as {{ dbt.type_timestamp() }})
  limit 0
{% endtest %}
-- models/my_model.sql
select 1 as id
# models/config.yml
version: 2
models:
  - name: my_model
    tests:
      - some_test
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 0.9.2
# note that removing this removes the bug!
$ dbt deps
$ dbt parse
...
dbt.exceptions.CompilationException: Compilation Error in test some_test_my_model_ (models/config.yml)
  'dict object' has no attribute 'type_timestamp'

  > in macro test_some_test (macros/custom_test.sql)
  > called by test some_test_my_model_ (models/config.yml)

Big idea, restated

We build a special context for generic test parsing that contains only the macros we think are necessary. That happens in these funky methods:

# this overrides _build_namespace in ManifestContext which provides a
# complete namespace of all macros to only specify macros in the depends_on
# This only provides a namespace with macros in the test node
# 'depends_on.macros' by using the TestMacroNamespace
def _build_test_namespace(self):
    depends_on_macros = []
    # all generic tests use a macro named 'get_where_subquery' to wrap 'model' arg
    # see generic_test_builders.build_model_str
    get_where_subquery = self.macro_resolver.macros_by_name.get("get_where_subquery")
    if get_where_subquery:
        depends_on_macros.append(get_where_subquery.unique_id)
    if self.model.depends_on and self.model.depends_on.macros:
        depends_on_macros.extend(self.model.depends_on.macros)
    lookup_macros = depends_on_macros.copy()
    for macro_unique_id in lookup_macros:
        lookup_macro = self.macro_resolver.macros.get(macro_unique_id)
        if lookup_macro:
            depends_on_macros.extend(lookup_macro.depends_on.macros)
    macro_namespace = TestMacroNamespace(
        self.macro_resolver, self._ctx, self.model, self.thread_ctx, depends_on_macros
    )
    self.namespace = macro_namespace

# Currently this is just used by test processing in the schema
# parser (in connection with the MacroResolver). Future work
# will extend the use of these classes to other parsing areas.
# One of the features of this class compared to the MacroNamespace
# is that you can limit the number of macros provided to the
# context dictionary in the 'to_dict' manifest method.
class TestMacroNamespace:
    def __init__(self, macro_resolver, ctx, node, thread_ctx, depends_on_macros):
        self.macro_resolver = macro_resolver
        self.ctx = ctx
        self.node = node  # can be none
        self.thread_ctx = thread_ctx
        self.local_namespace = {}
        self.project_namespace = {}
        if depends_on_macros:
            dep_macros = []
            self.recursively_get_depends_on_macros(depends_on_macros, dep_macros)
            for macro_unique_id in dep_macros:
                if macro_unique_id in self.macro_resolver.macros:
                    # Split up the macro unique_id to get the project_name
                    (_, project_name, macro_name) = macro_unique_id.split(".")
                    # Save the plain macro_name in the local_namespace
                    macro = self.macro_resolver.macros[macro_unique_id]
                    macro_gen = MacroGenerator(
                        macro,
                        self.ctx,
                        self.node,
                        self.thread_ctx,
                    )
                    self.local_namespace[macro_name] = macro_gen
                    # We also need the two part macro name
                    if project_name not in self.project_namespace:
                        self.project_namespace[project_name] = {}
                    self.project_namespace[project_name][macro_name] = macro_gen

    def recursively_get_depends_on_macros(self, depends_on_macros, dep_macros):
        for macro_unique_id in depends_on_macros:
            if macro_unique_id in dep_macros:
                continue
            dep_macros.append(macro_unique_id)
            if macro_unique_id in self.macro_resolver.macros:
                macro = self.macro_resolver.macros[macro_unique_id]
                if macro.depends_on.macros:
                    self.recursively_get_depends_on_macros(macro.depends_on.macros, dep_macros)

The problem: When dbt_utils is installed, dbt thinks that the generic test macro depends on dbt_utils.type_timestamp, instead of dbt.type_timestamp. So dbt_utils.type_timestamp shows up in the special test-parsing rendering context, whereas dbt.type_timestamp is missing, leading to the awkward error. (Yes, that error message should probably be better, too!)
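
Illustratively (with hypothetical placeholder values), the namespace built above ends up shaped roughly like this, which is why the dbt.type_timestamp lookup comes up empty:

# Rough, hypothetical shape of the test-parsing namespace when dbt_utils is
# installed: the dependency was recorded as macro.dbt_utils.type_timestamp, so
# only the dbt_utils entry carries the macro and the dbt entry goes without it.
project_namespace = {
    "dbt_utils": {"type_timestamp": "<MacroGenerator for dbt_utils.type_timestamp>"},
    "dbt": {"get_where_subquery": "<MacroGenerator for dbt.get_where_subquery>"},
    # no project_namespace["dbt"]["type_timestamp"]
    # -> 'dict object' has no attribute 'type_timestamp'
}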

This is the spot where we're saying that some_test depends on dbt_utils.type_timestamp, instead of dbt.type_timestamp:

dep_macro_id = self.macro_resolver.get_macro_id(package_name, macro_name)

ipdb> possible_macro_calls
['dbt.type_timestamp']
ipdb> dep_macro_id
'macro.dbt_utils.type_timestamp'

Why? Because our macro resolution logic, which has a special path for when a specific package namespace has been specified, skips that path if the specified package namespace is an "internal" one (dbt):

def get_macro(self, local_package, macro_name):
    local_package_macros = {}
    if local_package not in self.internal_package_names and local_package in self.packages:
        local_package_macros = self.packages[local_package]
    # First: search the local packages for this macro
    if macro_name in local_package_macros:
        return local_package_macros[macro_name]
    # Now look up in the standard search order
    if macro_name in self.macros_by_name:
        return self.macros_by_name[macro_name]
    return None

  • Precise fix: Add logic to get_macro that prefers macros defined in the dbt namespace if a macro has been called as dbt.some_macro. This... makes sense to me! (A sketch follows below.)
  • Blunt fix (risks a performance hit): Just always include all macros from the dbt namespace in the namespace built for generic test parsing.
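
To make the precise fix concrete, here is a hedged sketch. It is not necessarily what #5907 ends up doing, and the internal_packages_namespace attribute name is an assumption made purely for illustration:

# Illustrative only: the point is to stop skipping the package-specific lookup
# just because the requested namespace is an internal one like `dbt`.
def get_macro(self, local_package, macro_name):
    local_package_macros = {}
    if local_package in self.packages:
        local_package_macros = self.packages[local_package]
    elif local_package in self.internal_package_names:
        # honor an explicit call like dbt.type_timestamp() by searching the
        # internal (dbt) macros first; the attribute name below is hypothetical
        local_package_macros = self.internal_packages_namespace.get(local_package, {})
    # First: search the requested package for this macro
    if macro_name in local_package_macros:
        return local_package_macros[macro_name]
    # Now fall back to the standard search order
    if macro_name in self.macros_by_name:
        return self.macros_by_name[macro_name]
    return None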

The precise fix could maybe be a breaking change for people who have been depending on funky/undocumented behavior around resolution order for global macros — namely, that my root project's create_table_as() will always "win" whenever create_table_as() or dbt.create_table_as() is called. For the latter case, dispatch makes it possible to do this more clearly and explicitly, and I'm inclined to say it's the direction we should head in.
