Skip to content
This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

support cross_db_utils macros with materialize specific macros #2

Open
rjnn opened this issue Dec 23, 2020 · 4 comments
Open

support cross_db_utils macros with materialize specific macros #2

rjnn opened this issue Dec 23, 2020 · 4 comments
Assignees

Comments

@rjnn
Copy link

rjnn commented Dec 23, 2020

in dbt_utils, dbt has definitions to translate between the various syntaxes of the various databases - postgres, snowflake, bigquery, etc. so that when moving a defined model from e.g. bigquery to e.g. snowflake, users get the same behavior.

For instance, every database defines EXCEPT one way, but bigquery gives you something else if you say EXCEPT, and gives you the same behavior when you say EXCEPT DISINCT. Therefore, dbt has a set of macros so that users get the same behavior. These macros are listed here:
https://github.com/fishtown-analytics/dbt-utils/tree/master/macros/cross_db_utils
Under except.sql, you will see the pattern match that replaces except with except distinct if the database is bigquery.

We also need to build a similar set of macros for Materialize. There already are macros for postgres, and it may be that we just need to copy them. Or it may be that we need to do them, plus a few more. It also turns out that we don't need to upstream our behavior to the main repo, we can simply add our own macros in our adapter (I don't know where, but @jthandy indicated that there was a place).

Assigned to @JLDLaughlin for triage.

@rjnn rjnn changed the title support cross_db_utils macros with materialize specific support cross_db_utils macros with materialize specific macros Dec 23, 2020
@jtcohen6
Copy link

Current

It also turns out that we don't need to upstream our behavior to the main repo, we can simply add our own macros in our adapter (I don't know where, but @jthandy indicated that there was a place).

dbt v0.18.0 introduced adapter.dispatch (docs), a mechanism by which, given a set of search namespaces (installed packages + root project), dbt can find the best implementation of a given macro for the current adapter.

In effect, this enables "shim" packages to extend + override packages such as dbt-utils without needing to upstream their additions. Check out:

For example, let's say a dbt-spark user wants to use dbt_utils.date_spine in a model in their project. The date_spine macro depends on dbt_utils.datediff; default__datediff does not work for SparkSQL. Rather than needing to copy-paste-rewrite the entire macro, however, the user can:

  • Install dbt_utils + spark_utils in packages.yml
  • Add a quick "override" config to dbt_project.yml:
vars:
  dbt_utils_dispatch_list: ['spark_utils']

When dbt goes to dispatch dbt_utils.datediff, it will find spark_utils.spark__datediff as the best candidate and use that instead. As a result, the original date_spine macro "just works" on Spark.

For the time being, your best bet here may be to create a materialize-utils package that extends/shims dbt-utils in this fashion. If there are dbt_utils macros where dbt-materialize could simply use the postgres__ implementation instead of the default__ implementation, you could make that redirect explicit. E.g.:

{% macro materialize__datediff() %}
  {{ return(dbt_utils.postgres__datediff(first_date, second_date, datepart)) }}
{% endmacro %}

As long as users install materialize-utils and add the dbt_utils_dispatch_list: ['materialize-utils'] to their project config, the overridden dbt_utils macros will "just work."

Future

  1. In Inherit from dbt-postgres #4, I propose that dbt-materialize should explicitly depend on dbt-postgres. (It already does, implicitly.)
  2. There's also an open issue around adapter.dispatch (Use all adapter names for adapter.dispatch dbt-labs/dbt-core#2923) to make the behavior I described above happen implicitly / automatically.

In a world with both of these changes, dbt would dispatch macro implementations in the following order:

first_package.materialize__my_macro
first_package.postgres__my_macro
first_package.default__my_macro
second_package.materialize__my_macro
second_package.postgres__my_macro
second_package.default__my_macro
...

@JLDLaughlin
Copy link
Contributor

Thank you for your guidance @jtcohen6! I just created a new materialize-utils repo.

Quick question for you - should materialize-utils only contain overrides for dbt-utils macros? meaning that all other overrides should be in dbt-materialize? (ie: materialize__create_view_as)

@JLDLaughlin
Copy link
Contributor

Also for you @jtcohen6 - is there anything on dbt's end that expects our utils package to be named exactly materialize-utils? Or is it possible for us to rename it to materialize-dbt-utils?

@jtcohen6
Copy link

You could name it whatever you'd like!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants