
Rework adapter.dispatch(), remove packages arg #3383

Closed
jtcohen6 opened this issue May 21, 2021 · 2 comments · Fixed by #3363

@jtcohen6
Contributor

jtcohen6 commented May 21, 2021

We've run into some issues (#3362) with statically analyzing macros, which is required to resolve v0.19.1 bugs, continue our performance work, and enable compelling features in the future.

The big blocker is that adapter.dispatch() is not quite statically analyzable today, since the packages argument can accept a Jinja macro. We're working on a fix that will be backwards compatible with how we know this is used in the wild today by packages such as dbt-utils, fivetran-utils, and dbt-expectations.

Namely, so long as the user sets a variable named <package_name>_dispatch_list, dbt will set packages to the equivalent of var('<package_name>_dispatch_list', []) + ['<package_name>']. All open source packages I know of that lean on dispatch have followed this convention for the _get_namespaces() macro that they currently pass to the packages argument of dispatched macros.
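For reference, a minimal sketch of that convention (modeled on the dbt_utils pattern; the exact macro name varies by package). The package exposes a namespace-collection macro that reads the magic var:

{% macro _get_utils_namespaces() %}
  {{ return(var('dbt_utils_dispatch_list', []) + ['dbt_utils']) }}
{% endmacro %}

And the end user opts in by setting the var in dbt_project.yml:

vars:
  dbt_utils_dispatch_list: ['spark_utils']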

In the meantime, we also want to change dispatch() to:

  • Safeguard how users are actually getting value from it
  • Ensure that all macro dependencies in a dbt project are statically analyzable

Proposal

The Jinja macro adapter.dispatch() should only take one argument, macro_name (a string literal). Package maintainers shouldn't need to think about what kind of vars/macros to pass into the packages arg at all.

For users, instead of setting a magic variable name (e.g. dbt_utils_dispatch_list), we should support a new config specification in dbt_project.yml:

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['spark_utils', 'dbt_utils']
  - macro_namespace: fivetran_utils
    search_order: ['my_project', 'dbt_utils', 'fivetran_utils']

When dbt goes to dispatch a macro, it uses the search_order corresponding to the macro_namespace (package) of the dispatching macro. This accomplishes the same functionality as the current packages + vars setup, but with much less confusion, and it centralizes all macro "routing" logic in the project file. (If a dispatching macro does not have a "route" set in the project dispatch config, it just searches within its own project—the same as current behavior when packages is None.)

This should also make the syntax for dispatch a little less onerous:

{% macro my_macro(field) %}
  {{ return(adapter.dispatch('my_macro')(field)) }}
{% endmacro %}
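For comparison, a sketch of what the same macro typically looks like today, with a namespace-collection macro threaded through the packages argument (my_package and _get_namespaces are illustrative names following the convention described above):

{% macro my_macro(field) %}
  {{ return(adapter.dispatch('my_macro', packages=my_package._get_namespaces())(field)) }}
{% endmacro %}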

Whether the Python dispatch method still takes two arguments, with search_order passed in as packages, is an implementation detail. But in v0.20.0, we should start raising a deprecation warning if the user passes a value to the packages argument of the adapter.dispatch() Jinja macro.

And in a future version of dbt, once everyone has had a chance to upgrade to this new way of dispatching, we should remove the Jinja AST-parsing logic that #3363 is adding for backwards compatibility.

@jtcohen6 jtcohen6 added the enhancement New feature or request label May 21, 2021
@jtcohen6 jtcohen6 added this to the Margaret Mead milestone May 21, 2021
@dataders
Contributor

I think I 90% understand where this is coming from, and I'm sure I'll get the remaining 10% as I brood on it. Thanks @jtcohen6 for your strong written communication skills!

I'm also pretty certain I've run into the missing shim macro issue described in #3362 while debugging synapse__ macros that themselves dispatch to the corresponding sqlserver__ macro. Does this mean a missing shim macro won't throw this error anymore? 'dict object' has no attribute 'current_timestamp'

I'm also onboard with the new dispatch config. You're right that there may be dragons in assuming that the dispatch var name aligns with the project name. Definitely better to lock it down explicitly. Side note: the disconnect between repo and package names for dbt-audit-helper (dbt-audit-helper vs. audit-helper) threw me for a small loop last week.

default search space

Perhaps an explicit search path would be a happy path for users who are new to shimming and want to develop a set of shims for a package.

e.g. a user who wants to propose a change to tsql-utils could put this in their current project, then put whatever sqlserver__ macros they need into their current macros/ dir?

config:
- macro_namespace: dbt-utils
  search_order: ['dbt_utils', 'my_project', 'tsql-utils']

Before, it seems that macros in the user's current project were available to the package being shimmed, right? Maybe this was never the case and I would just override the main macro rather than trying to put a shim into the macros/ dir...

implications for dispatched adapters

This question might be more about v0.20.0 than this particular proposal. I tried to see how dbt-redshift's new inheritance behavior works and couldn't find it...

With this proposal, would all projects using the dbt-synapse adapter need to add a dispatch config like this?

config:
- macro_namespace: dbt-sqlserver
  search_order: ['spark_utils', 'dbt_utils']

@jtcohen6
Contributor Author

Thanks for the quick read + feedback @swanderz!

Does this mean a missing shim macro won't throw this error anymore? 'dict object' has no attribute 'current_timestamp'

I sure hope not! That's an ugly error. I'm not positive what's causing it, but I very much hope that this change will simplify the experience of building and testing shim (and shim-able) packages.

e.g. a user who wants to make a proposed change to tsql-utils could put this in their current project then put whatever sqlserver__ macro into their current macros/ dir?

That's right. You can just as easily shim dbt_utils with macros in your own root project macros/ folder, as you can with macros coming from an installed compatibility package. This is a great way of achieving the desired functionality, before potentially upstreaming it to an open source shim package.
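For example (file and macro names purely illustrative), a root-project shim is just a file in macros/ that provides the adapter-prefixed implementation dispatch is searching for:

{# macros/sqlserver_shims.sql in the root project #}
{% macro sqlserver__some_macro(field) %}
  {# SQL Server-specific implementation goes here #}
  cast({{ field }} as nvarchar(max))
{% endmacro %}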

As far as adapter inheritance in v0.20.0: This is built-in, and not something a user needs to specify in their dbt_project.yml at all. Already, dbt searches for macros based on two criteria: adapter and package. Simply by using dbt-redshift, dbt will now know to look for macros prefixed redshift__, postgres__, and default__ (in that order).
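As a minimal illustration (the concat macro and its variants are hypothetical), a dispatched macro only needs adapter-prefixed implementations. On dbt-redshift, dispatch looks for redshift__concat first; since none is defined here, it falls back to postgres__concat, and would use default__concat only if that were missing too:

{% macro concat(fields) %}
  {{ return(adapter.dispatch('concat')(fields)) }}
{% endmacro %}

{% macro default__concat(fields) %}
  concat({{ fields | join(', ') }})
{% endmacro %}

{% macro postgres__concat(fields) %}
  {{ fields | join(' || ') }}
{% endmacro %}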

So in general, I think a config like this makes a lot of sense:

dispatch:
  - macro_namespace: dbt_utils
    search_order: ['my_project', 'tsql_utils', 'dbt_utils']

That is, when dispatching dbt_utils.some_macro on dbt-synapse (which inherits from dbt-sqlserver), dbt would look for macros in the following order:

  1. my_project.synapse__some_macro
  2. my_project.sqlserver__some_macro
  3. my_project.default__some_macro
  4. tsql_utils.synapse__some_macro
  5. tsql_utils.sqlserver__some_macro
  6. tsql_utils.default__some_macro
  7. dbt_utils.synapse__some_macro
  8. dbt_utils.sqlserver__some_macro
  9. dbt_utils.default__some_macro

The first one it finds is the one it will use. To put the same logic in English:

  • Look in my own project first, to see if I've overridden any macros
  • Then look in tsql_utils, preferring the synapse__ version if available, sqlserver__ otherwise
  • Finally, if none of the above is found, use the default__ version in dbt_utils

Question: Should we name search_order something different in this proposed config, given that it's only search order of packages (user-controllable), and doesn't account for search order of adapters (built in)?
