Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-112] Better UX for macro dispatch #4646

Closed
jtcohen6 opened this issue Jan 31, 2022 · 8 comments
Closed

[CT-112] Better UX for macro dispatch #4646

jtcohen6 opened this issue Jan 31, 2022 · 8 comments
Labels
enhancement New feature or request stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 31, 2022

Big idea

We're moving, slowly but surely, in the direction of sensible defaults for macro dispatch. Now that we've established conventions, let's find ways to make it easier/quicker to write.

Take dbt_utils.hash for example. This has to include a lot of supporting code to do something that's ultimately pretty simple:

{% macro hash(field) -%}
  {{ return(adapter.dispatch('hash', 'dbt_utils') (field)) }}
{%- endmacro %}

{% macro default__hash(field) -%}
    md5(cast({{field}} as {{dbt_utils.type_string()}}))
{%- endmacro %}

{% macro bigquery__hash(field) -%}
    to_hex({{dbt_utils.default__hash(field)}})
{%- endmacro %}

What are the problems here?

  • The first macro, which dispatches a macro by its own name, is redundant / boilerplate. Could we support dispatch for all macros, and make this the default behavior?
  • The need to specify the default__ adapter decorator is unintuitive for folks who are looking to take advantage of dispatch's package override capabilities, and who might be unfamiliar with its history as (primarily) a way to support cross-database macros

Imagine the same macro, like:

{% macro hash(field) -%}
    md5(cast({{field}} as {{dbt_utils.type_string()}}))
{%- endmacro %}

{% macro bigquery__hash(field) -%}
    to_hex({{dbt_utils.default__hash(field)}})
{%- endmacro %}

The implied (default) behavior is that a macro named hash, in the package named dbt_utils:

  • Is going to search for macros named hash in the dbt_utils namespace
  • Is itself the default__ implementation of hash

The prime beneficiaries of this change would be package maintainers (+ us, as global_project maintainers).

There's two additional considerations we need to make, for "special kinds of macros": materializations and generic tests.

Materializations

Adapter-specific materializations work differently from adapter-specific macros. Namely:

Could we make materialization macro generation / discovery work more like dispatch? We'd need to offer backwards compatibility / avoiding breaking changes for folks who currently rely on the materialization_<name>_<adapter> construction.

One other thing: I actually like the syntax for defining adapter-specific materializations. (It's important to have the adapter__-prefixed version available, in order to call from other macros.) As part of that reconciliation, could we support this for other macros, too? Same example as above:

{% macro hash(field) -%}
    md5(cast({{field}} as {{dbt_utils.type_string()}}))
{%- endmacro %}

{% macro hash(field), adapter = 'bigquery' -%}
    to_hex({{dbt_utils.default__hash(field)}})
{%- endmacro %}

Better, no?

And then even:

macros:
    # this is what we'd actually want
  - name: hash 
    description: "Hash macro"

  - name: bigquery__hash   # don't love this
    description: "Special note about the BigQuery implementation"

  - name: hash
    adapter: bigquery   # maybe this?? do we need it?
    description: "Special note about the BigQuery implementation"

Generic tests

Tests aren't dispatch-able at all today, so it's necessary to do something like:

{% test at_least_one(model, column_name) %}
  {{ return(adapter.dispatch('test_at_least_one', 'dbt_utils')(model, column_name)) }}
{% endtest %}

{% macro default__test_at_least_one(model, column_name) %}

If we pursue either/both of the ideas presented above, I think we'd be in good shape for tests, too:

  • If we made the changes proposed above—dispatch was the implied/default behavior for all macros, and it was consistent for "special macros"—then creating a macro named bigquery__test_at_least_one should "just work"
  • Now imagine that, but with the nicer syntax of {% test at_least_one(model, column_name), adapter = 'bigquery' %}
@jtcohen6 jtcohen6 added enhancement New feature or request Team:Language Team:Adapters Issues designated for the adapter area of the code labels Jan 31, 2022
@github-actions github-actions bot changed the title Better UX for macro dispatch [CT-112] Better UX for macro dispatch Jan 31, 2022
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jul 31, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2022

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@github-actions github-actions bot closed this as completed Aug 8, 2022
@jtcohen6
Copy link
Contributor Author

I'm going to un-stale this, I'd still like to do it someday :)

@jtcohen6 jtcohen6 reopened this Sep 13, 2022
@jtcohen6 jtcohen6 removed the stale Issues that have gone stale label Sep 13, 2022
@jeremyyeo
Copy link
Contributor

Following this thread - some global_project macros are not dispatched today1 which could be confusing to some end users when they try and follow our docs on "Overriding global macros" (https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch#overriding-global-macros).

Footnotes

  1. Like should_store_failures()

@dataders
Copy link
Contributor

  1. Like should_store_failures()

You raise a great point. Virtually all of my PRs to dbt-core are to do exactly this. However, we don't normally see a need for this macro to be overridden, as there's not actually SQL code in there.

  1. Can you explain the user's usecase for wanting to override it?
  2. Does the user know they can just put a macro with the same name as the one they want to override within a .sql file in their macros/ dir?

@jeremyyeo
Copy link
Contributor

TL;DR: All macros should be dispatch for the sake of consistency - I think this makes it "better UX".

@dataders - My 2c is that it's irrelevant what the use-case is for wanting to override a global_project macro (perhaps for argument sake we can say the end user wants to add some var('STORE_OR_NOT') or something to this here macro in order to control it's behaviour).

I shared this internally:

image

Just checking on the behaviour of dispatching in dbt-core. Going by docs - all (?) built-in macro behaviour should be overridable by importing them from packages yes? I did find some instances where this isn't true, for example should_store_failures() does not seem to have adapter.dispatch in core - this means that to import the behaviour, you need to add a dispatch macro into your main dbt project (or simply return shared_lib.should_store_failures() I suppose). For generate_x_name you don't need to do "extra work" in your main dbt project because dbt-core has those as dispatched.


Does the user know they can just put a macro with the same name as the one they want to override within a .sql file in their macros/ dir?

Yea 100%. In this scenario, a team leader wants to write a "shared_lib" with overriden "core macros" like generate_x_name(), ref(), should_store_failures(), etc so that all other projects follow some certain pattern. This shared_lib dbt package is imported into many other dbt projects via the packages.yml file.

Now as the end user of the "shared_lib" package, one has to remember that... "oh for should_store_failures, I need to remember to add this dispatch thing into my own dbt project but I don't need to do that for generate_x_name overrides" - which is probably unnecessary cognitive burden.

So at the end of it all, I think all core macros should have the same pattern wrt overriding them in root projects without needing to think of when you'd need to also add some dispatching code in your root project.

Although Jerco has mentioned "special kinds of macros" - so perhaps differentiating between macros that are dispatched and not dispatched by default is appropriate? I'm not sure what category to put this should_store_failures macro but just for simplicity I'm voting for making all macros the same way wrt overriding :P

sdebruyn added a commit to microsoft/dbt-synapse that referenced this issue Sep 24, 2022
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Mar 21, 2023
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

No branches or pull requests

3 participants