Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post hooks that call macros get parsed with execute = False #2370

Closed
clrcrl opened this issue Apr 28, 2020 · 7 comments
Closed

Post hooks that call macros get parsed with execute = False #2370

clrcrl opened this issue Apr 28, 2020 · 7 comments
Labels
bug stale

Comments

@clrcrl
Copy link
Contributor

@clrcrl clrcrl commented Apr 28, 2020

Describe the bug

Easier to read the steps to reproduce馃憞:

Steps to reproduce

  1. Add the following (bad) macro to your project:
$ cat macros/post_hook_with_execute.sql
{% macro post_hook_with_execute() %}
    {% if execute %}
        select flibbity
    {% else %}
        select floo
    {% endif %}

{% endmacro %}
  1. Call it as a post hook:
$ cat models/my_model.sql
{{ config(
    post_hook=post_hook_with_execute()
) }}

select 1
  1. Run the model
$ dbt run -m my_model -t dev_redshift
Running with dbt=0.16.1
Found 10 models, 28 tests, 0 snapshots, 0 analyses, 150 macros, 0 operations, 0 seed files, 3 sources

17:46:00 | Concurrency: 1 threads (target='dev_redshift')
17:46:00 |
17:46:00 | 1 of 1 START table model dbt_claire.my_model......................... [RUN]
17:46:01 | 1 of 1 ERROR creating table model dbt_claire.my_model................ [ERROR in 0.63s]
17:46:01 |
17:46:01 | Finished running 1 table model in 2.50s.

Completed with 1 error and 0 warnings:

Database Error in model my_model (models/my_model.sql)
  column "floo" does not exist
  compiled SQL at target/run/jaffle_shop/my_model.sql

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Expected behavior

This macro is expected to fail, but it should fail because of column "flibbity" does not exist.

Actual behavior

The macro fails on the else condition, "floo"

Why this is important

There's legit reasons why you'd want to use the if execute statement here 鈥 for example, if you need to run an introspective query to generate your post-hook SQL.

This came up because @willwnekowicz was using the following macro that generates "comment on" statements in Postgres (also works on Redshift)

$ cat macros/wills_macro.sql
{% macro get_source_columns_comments(source_table, current_table) %}
    {% set source_columns_comments_query %}
        SELECT c.column_name,pgd.description
        FROM pg_catalog.pg_statio_all_tables as st
          inner join pg_catalog.pg_description pgd on (pgd.objoid=st.relid)
          inner join information_schema.columns c on (pgd.objsubid=c.ordinal_position
            and  c.table_schema=st.schemaname and c.table_name=st.relname)
        WHERE table_name='{{ source_table }}';
    {% endset %}
    {% set results = run_query(source_columns_comments_query) %}
    {% if execute %}
        {% set result_rows = results.rows %}
    {% else %}
        {% set result_rows = [] %}
    {% endif %}
    {% for result_row in result_rows %}
        comment on column {{ current_table }}.{{ result_row[0] }} is E'{{ result_row[1] }}';
    {% endfor %}
{% endmacro %}

$ cat models/staging/jaffle_shop/stg_jaffle_shop__customers.sql
{% set columns_comments = get_source_columns_comments('customers', this) %}
{{ log(columns_comments, info=True) }}
{{config(
    post_hook=columns_comments
)}}
with source as (

    select * from {{ source('raw_jaffle_shop', 'customers') }}

),

renamed as (

    select
        id,
        first_name,
        last_name,
        email

    from source

)

select * from renamed

System information

Which database are you using dbt with?
n/a:dbt issue

The output of dbt --version:
0.16.1

@clrcrl clrcrl changed the title Post hooks that called macros get parsed with execute = Falsw Post hooks that call macros get parsed with execute = Falsw Apr 28, 2020
@clrcrl clrcrl changed the title Post hooks that call macros get parsed with execute = Falsw Post hooks that call macros get parsed with execute = False Apr 28, 2020
@beckjake
Copy link
Contributor

@beckjake beckjake commented Apr 28, 2020

The problem here is that calls to config() are evaluated at parse time (this is how things like the database parameter can work!). When you do {{ config(post_hook=post_hook_with_execute()) }}, what happens (skipping over the uninteresting steps) is:

  • when the model is being parsed (execute=False!), the jinja config() call happens and the string result of post_hook_with_execute() is stored as a result. In this case, that's 'select floo'
  • When the model is evaluated again with execute=True, and post_hook_with_execute() returns select flibbity, the call to config() is ignored. This model evaluation returns the sql value that a materialization will have access to.
  • In the materialization context, dbt renders each hook's sql (from within the materialization context!) and runs the SQL of that hook.

Anyway, tl;dr: Hooks are the special exception to don't nest your curlies. Changing your example to:

{{ config(
    post_hook=" {{ post_hook_with_execute() }}",
) }}

select 1

should fix it.

@willwnekowicz
Copy link

@willwnekowicz willwnekowicz commented Apr 29, 2020

Reposting from slack...

That flow definitely makes sense.
This solutions looks promising but doesn鈥檛 work with variable arguments. source_table_name doesn鈥檛 get passed into the macro in this version:

post_hook='{{ get_source_columns_comments(source_table_name, "all_deals") }}'

and if I try nesting the curlies on that:

post_hook='{{ get_source_columns_comments({{ source_table_name }}, "all_deals") }}'

.. I get:

expected token ':', got '}'

Though, hardcoding the arguments works:

post_hook='{{ get_source_columns_comments("deals_log", "all_deals") }}'

鈥nd that鈥檚 a good enough workaround for me!

@drewbanin drewbanin removed the triage label Apr 29, 2020
@drewbanin
Copy link
Contributor

@drewbanin drewbanin commented Apr 29, 2020

@beckjake my thinking here is increasingly that we shouldn't store info like this at parse-time, and we'd instead be better served by re-capturing config calls at runtime. We can definitely still capture refs in hooks at parse-time, and I imagine we'll want to capture materialization configs (as ephemeral configs have nonlocal effects on other models) too.

Beyond that though, I don't think there's a good reason why we couldn't capture the args provided to config() at runtime and use those config values when executing a materialization. Is there some big consideration I'm not thinking of that would make this really challenging?

@ramtej23
Copy link

@ramtej23 ramtej23 commented Jun 9, 2021

Can we include multiple pre_hook & post_hook inside a model in dbt?

@jtcohen6
Copy link
Contributor

@jtcohen6 jtcohen6 commented Jun 9, 2021

@ramtej23 Absolutely! Both pre_hook and post_hook can take a list of hooks as their argument. Check out the docs.

@ramtej23
Copy link

@ramtej23 ramtej23 commented Jun 9, 2021

@jtcohen6 Thank you for the reply. What exactly my question was in a model while defining the config blocks in that we will include pre_hook/post_hook. But it is not accepting the multiple pre_hook/post_hook in a single config block. Is there any way to define the pre_hook/post_hook outside the config block.
Thanks.

@github-actions
Copy link
Contributor

@github-actions github-actions bot commented Dec 7, 2021

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug stale
Projects
None yet
Development

No branches or pull requests

6 participants