Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty transient tables #64

Closed
timothyjang123 opened this issue Jan 22, 2022 · 6 comments · Fixed by #107
Closed

Empty transient tables #64

timothyjang123 opened this issue Jan 22, 2022 · 6 comments · Fixed by #107

Comments

@timothyjang123
Copy link

timothyjang123 commented Jan 22, 2022

I'm noticing some undesirable behavior for the incremental population of the transient tables.

Let's assume fct_dbt__test_executions doesn't get populated on a dbt run dbt_artifacts run, but gets created as a table with 0 records. Any subsequent run with how the incremental filter is setup like below is blocked from populating the table (which can easily happen if we only do model runs, without tests):

where artifact_generated_at > (select max(artifact_generated_at) from {{ this }})

The above will always be false since a null will be returned from the transient table. A fix may be as simple as adding another condition to is_incremental like below:

    {% if is_incremental() and var_row_count > 0 %}
    -- this filter will only be applied on an incremental run
    where artifact_generated_at > (select max(artifact_generated_at) from {{ this }})
    {% endif %}

Where var_row_count is just a count of {{ this }}. Might be overthinking this.. but it's strange when the incrementally populated table is empty while the view counterpart returns the current test executions.

Any suggestions or insights?

@davidshapir
Copy link

We are seeing the same issue and just arrived at this root cause today. We were thinking along the same lines of adjusting the incremental condition. Has that worked for y'all?

@timothyjang123
Copy link
Author

@davidshapir I haven't had a chance to test this but am hoping to later this week - will let you know!

@davidshapir
Copy link

davidshapir commented Jan 27, 2022

@timothyjang123 - Sounds good! Can you share whatever macro or other operation you use in var_row_count if it ends up working?

@tommyh
Copy link

tommyh commented Jan 27, 2022

@timothyjang123 - I'm curious if making the template dynamic with var_row_count might suffer from the "parse time" vs "run time" issue explained in this dbt article. I was wondering if this could be done in SQL with something like:

where artifact_generated_at > (select max(artifact_generated_at, '1970-01-01 00:00:00 +0000') from {{ this }})

Note: this code is 100% untested, but I will try to look into this today. :)

context: I'm @davidshapir's coworker.

@timothyjang123
Copy link
Author

Hmm I think I see what you're saying - I haven't seen that article before and I'd have to test out the differences between dbt compile/run to see how the issue would work with the change mentioned. From what I read, I think it might have unexpected behavior (strange logs) but during execution it could work as expected - partly because the dependent view is just a saved query so it wouldn't have that same ref() issue mentioned in the article.

where artifact_generated_at > (select ifnull(max(artifact_generated_at), '1970-01-01 00:00:00 +0000') from {{ this }})

Regardless, I think the above is more precise for what we're trying to do - thanks @tommyh! FYI, I modified the above a bit for my own use in Snowflake.

@tommyh
Copy link

tommyh commented Jan 27, 2022

@timothyjang123 - the edgecase to test with var_row_count approach would be: when the fct_dbt__test_executions does have rows in it, is the var_row_count variable greater than 0? Because, if I'm reading that blog article correctly which I totally might not be, then var_row_count will be 0 during "parse" time when the model gets compiled.

ifnull - yes, that looks great!

tommyh added a commit to incrediblehealth/dbt_artifacts that referenced this issue Jan 27, 2022
If the incremental table is empty, ie: `dim_dbt__models`, then `max(artifact-generated_at)` will be `NULL`, which means the `dbt_models_incremental` CTE will return 0 rows.

brooklyn-data#64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants