[CT-682] [Feature] Overriding temp table name #5291

Closed
1 task done
yoavo-datricks opened this issue May 23, 2022 · 5 comments
Labels
enhancement New feature or request Team:Adapters Issues designated for the adapter area of the code wontfix Not a bug or out of scope for dbt-core

Comments

@yoavo-datricks

Is this your first time opening an issue?

Describe the Feature

We are executing dbt models via RPC with an external orchestrator.
We need to execute the same model multiple times in parallel, with each execution creating a different table via an alias.
However, because the temp table's name is derived from the model's name rather than its alias, the concurrent executions collide and crash.
Is it possible to add a way to override the temp table's name at the model scope, or to base it on the alias by default?

Describe alternatives you've considered

I can execute the models sequentially but that would take much more time.

Who will this benefit?

Everyone who is interested in executing the same model multiple times in parallel and ensuring different table names via aliases.

Are you interested in contributing this feature?

No response

Anything else?

No response

yoavo-datricks added the enhancement and triage labels on May 23, 2022
github-actions bot changed the title from "[Feature] Overriding temp table name" to "[CT-682] [Feature] Overriding temp table name" on May 23, 2022
jtcohen6 added the Team:Adapters label and removed the triage and Team:Execution labels on May 31, 2022
@jtcohen6
Contributor

Hey @yoavo-datricks, thanks for opening!

We need to execute the same model multiple times in a parallel - each would create a different table with an alias.

Could you tell me more about your use case here? I have some guesses, but it would be worth understanding further, since we generally say that running the same dbt model concurrently is an anti-pattern.

That being said, we did just do a bunch of work around cleaning up and consolidating the logic for temp relations within materializations (#4921, #5221). I have a hunch that this might be resolved by that work: we're now using a suffixed version of {{ this.identifier }} for the temp table, which should account for the model alias, rather than the model name.

Any chance you could try installing dbt-core from the main branch, to see if the issue is resolved?

@yoavo-datricks
Author

yoavo-datricks commented Jun 2, 2022

Hi @jtcohen6. We had multiple models that were almost identical to each other, so we created a more generic model driven by parameters and aliases. However, we use Airflow to execute the models and would like to have them created concurrently, mostly to save time. We will test the main branch in the coming days and let you know.
Thank you for the quick response.
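
For readers following along, here is a minimal sketch of what a generic model driven by parameters and an alias might look like. The model name `generic_events`, the upstream model `stg_events`, and the vars `target_alias` and `region` are all hypothetical; none of them appear in this thread. It relies only on dbt's `config(alias=...)`, `var()`, and `ref()`.

```sql
-- models/generic_events.sql  (hypothetical model; all names are illustrative)
-- One model file, many target tables: the alias decides the table name.
{{ config(
    materialized='table',
    alias=var('target_alias')
) }}

select *
from {{ ref('stg_events') }}
where region = '{{ var("region") }}'

-- Example invocations, one per target table:
--   dbt run --select generic_events --vars '{"target_alias": "events_eu", "region": "eu"}'
--   dbt run --select generic_events --vars '{"target_alias": "events_us", "region": "us"}'
```

Each invocation writes to a different final table, but whether the temp table names also differ depends on whether they are derived from the alias or from the model name, which is the question raised in this issue.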

@meisam-napster

This is similar to #2881, a ticket that was closed. We're also using Airflow, and this causes problems when backfilling a table: currently, we have to run the incremental model within Airflow with a concurrency of 1.

@jtcohen6
Contributor

jtcohen6 commented Nov 15, 2022

@meisam-napster As stated above, running the same dbt model multiple times concurrently is an anti-pattern and officially unsupported. That said, based on the refactoring work we did in v1.2, it should be simple enough to reimplement the make_intermediate_relation and make_temp_relation macros in your own project, and use (e.g.) the current timestamp to create disambiguated identifiers.
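
A minimal sketch of what such an override might look like, placed in a project's own `macros/` directory. It assumes that both macros take `(base_relation, suffix)` and dispatch to adapter implementations under the `dbt` namespace, as in the v1.2 global project; check the macro source for your installed dbt-core and adapter before relying on it. The file name is hypothetical, and the sketch has not been tested against every adapter.

```sql
{# macros/unique_temp_relations.sql: hypothetical file name in your own project #}

{% macro make_temp_relation(base_relation, suffix='__dbt_tmp') %}
    {#- Assumption: a run_started_at timestamp is unique enough per invocation -#}
    {%- set unique_suffix = suffix ~ '_' ~ run_started_at.strftime('%Y%m%d%H%M%S%f') -%}
    {#- Delegate to the adapter's own implementation, changing only the suffix -#}
    {{ return(adapter.dispatch('make_temp_relation', 'dbt')(base_relation, unique_suffix)) }}
{% endmacro %}

{% macro make_intermediate_relation(base_relation, suffix='__dbt_tmp') %}
    {#- Same idea for the intermediate relation used by table-style builds -#}
    {%- set unique_suffix = suffix ~ '_' ~ run_started_at.strftime('%Y%m%d%H%M%S%f') -%}
    {{ return(adapter.dispatch('make_intermediate_relation', 'dbt')(base_relation, unique_suffix)) }}
{% endmacro %}
```

Because the sketch only changes the suffix and still dispatches to the adapter, adapter-specific handling of temp relations is left alone. Two caveats: the longer identifier may exceed name-length limits on some warehouses, and a timestamp is only probabilistically unique, so something like the `invocation_id` context variable (with its dashes replaced) may be a safer disambiguator.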

I'm going to close this issue in the meantime.

jtcohen6 added the wontfix label and removed the triage label on Nov 15, 2022
jtcohen6 closed this as not planned on Nov 15, 2022
@meisam-napster

meisam-napster commented Nov 15, 2022

@jtcohen6 I'd like to give a bit more clarity on the use case and our reasons for wanting this. Then we can either write it off as an anti-pattern, or maybe you have another approach.

The default, and what dbt recommends, is that an incremental model does a full refresh on its first run and then runs daily, one execution per day. That is great and makes sense if you're dealing with a small amount of data.

However, problems arise when you have a lot more data to transform in BQ: the full refresh runs over the entire underlying table, so it exhausts resources or takes too long and never completes.

So instead of the first run being the "full historical" full refresh, we need to run day by day to backfill all the historical data. So far we've managed to achieve this by doing something like this in our model:

    {% if is_incremental() %} -- applies on incremental runs
      AND ingestion_date = '{{ dbt_airflow_macros.ds() }}'
    {% else %} -- added this condition so that the first time an incremental model is run or a full-refresh is invoked, it doesn't attempt to run on ALL historical data, instead on the execution date it will create the table
      AND ingestion_date = '{{ dbt_airflow_macros.ds() }}'
    {% endif %}

This works fine, but as others above and I have said, we cannot run several of these concurrently, even though each run is for a different execution date and produces different (per-day) but idempotent data. Calling this an "anti-pattern" is confusing.

We don't want to do anything that is an anti-pattern; we want to do what's recommended. But if this is deemed an anti-pattern, what is the actual solution for this scenario?

4 participants