Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on running on parallel #3

Open
jrpereirajr opened this issue Jan 19, 2024 · 18 comments
Open

Error on running on parallel #3

jrpereirajr opened this issue Jan 19, 2024 · 18 comments

Comments

@jrpereirajr
Copy link

jrpereirajr commented Jan 19, 2024

Hi!

I have set up a Proof of Concept (PoC) that calls the dbt run command after ingesting some data. The ingestion process is executed using an IRIS Interoperability Production. Since this production invokes its hosts asynchronously, multiple dbt run commands are issued concurrently.

To provide a visual representation of this production, please refer to the following image:

image

In order to increase throughput, I have adjusted the pool size of the host that issues the dbt run command to allow concurrent calls. However, after making this adjustment, dbt-iris started logging error messages like the following:

<Table 'customer1_dbt.stg__ordem_servico__dbt_tmp' already exists>

I think the cause is dbt-iris always try to create a new table with the same name, because there are many dbt process running in parallel due serveral interoperability BOs running in parallel as well.

I would like to know if this indicates a misuse of dbt and dbt-iris or if there is an underlying issue.

Thanks!

@bdeboe
Copy link
Contributor

bdeboe commented Jan 19, 2024

Hi @jrpereirajr !

can you share a little more about the transformations envisaged by your dbt project?
Is it correct that that table is meant to be specific to your process, and should not be "seen" by other dbt run sessions that might be running concurrently?

@jrpereirajr
Copy link
Author

jrpereirajr commented Jan 19, 2024

Hi Benjamin!

I didn't created that table. I think it was created by the dbt-iris plugin because I found this file in the directory target/run:

/* create_view_as */
create table "customer1_dbt"."stg_ordem_servico__dbt_tmp" as
    
with

source as (
    select 
        OriginalId,
...

Note that the name of the table is the same presented in the error message - customer1_dbt.stg_ordem_servico__dbt_tmp.

This is the model SQL file that generated the statement mentioned early:

{{
    config(
        materialized='view'
    )
}}

with

source as (
    select 
        OriginalId,
        SessionId,
        Codigo
    from {{source('shift_lis', 'Order')}}
),

renamed as (
    select
        OriginalId as EntityId,
        SessionId,
        Codigo
        current_timestamp(3) as LastModified
    from source
)

select * from renamed

Thank you!

@bdeboe
Copy link
Contributor

bdeboe commented Jan 21, 2024

Hi,

it looks like running multiple jobs concurrently may not be an intended pattern, at least from (not finding much in the dbt docs) this thread: dbt-labs/dbt-core#2881
At the bottom there is a reference to how the Snowflake adapter deals with the issue by supporting the invocation ID as a suffix for naming these tables, as part of a macro. I wonder if that is something we could consider for the IRIS one as well @daimor ?

@daimor
Copy link
Member

daimor commented Jan 21, 2024

Yes, I think we can make this too

@fernandolferreira
Copy link

Hi @daimor , do you have any news about this issue?

@daimor
Copy link
Member

daimor commented Apr 8, 2024

Sorry, I did not have a chance to look yet.

@jrpereirajr
Copy link
Author

Hi guys! Do you have any updates?

@daimor
Copy link
Member

daimor commented Jul 6, 2024

I've released a new version, also with upgrade to the latest dbt
please check it

@jrpereirajr
Copy link
Author

jrpereirajr commented Jul 12, 2024

Hi @daimor.

Thank you for your response.

After installing version v1.8.17, bash was not able to find dbt command. By issuing a dbt command, I've getting this error:
bash: dbt: command not found

This was I used to test it into a container:

ARG IMAGE=containers.intersystems.com/intersystems/irishealth-ml:2023.1.4.580.0
FROM $IMAGE

USER root

RUN apt-get update
RUN apt-get install -y python3.10-venv nano curl

COPY requirements.txt ./

RUN pip install -r ./requirements.txt
...

requirements.txt:

dbt-iris
numpy

After downgrading to version v1.5.7, it went back to work:

dbt-iris==1.5.7
numpy

Did I miss something?

@daimor
Copy link
Member

daimor commented Jul 18, 2024

dbt-core, in the latest version not a functional requirement for dbt-iris itself
and it can be installed separately (and I think supposed to be installed this way)
I'll update package to have it in dependencies for compatibility

@jrpereirajr
Copy link
Author

Hi @daimor.

Thank you for installing instructions. By installing dbt-core first and then dbt-iris I was able to run dbt again.

However, after installing dbt 1.8.18 I'm now getting the following error on concurrent executions:

image

As you can see, I opened two shell sessions and tried to execute the same dbt run command at the same time, but got a <Table or view not found> error.

@bdeboe
Copy link
Contributor

bdeboe commented Aug 1, 2024

any chance you'd be able to show more of the generated SQL? You can use dbt compile to see those in case you weren't already tracking them

@jrpereirajr
Copy link
Author

Hi @bdeboe,

Sorry for the delay, I was on PTO.
Are you looking for the content created within target folder? If so, have you a secure way to send it for privacy reasons?

@bdeboe
Copy link
Contributor

bdeboe commented Sep 6, 2024

I think it would be helpful to verify if the generated SQL uses different table names for both invocations, as I believe @daimor intended with his change. If not, we need to look into the model file to see if perhaps the materialization option isn't what the code expects it to be (could well be the code that needs to change, of course).
And in case it does use different table names, we should check which table is being dropped earlier than expected and triggering that error.

@jrpereirajr
Copy link
Author

Hi @bdeboe

I sent you the files into your e-mail.

Thanks!

@bdeboe
Copy link
Contributor

bdeboe commented Sep 23, 2024

Hi @jrpereirajr ,

My apologies for the long delay in responding. I received this while traveling and had forgotten about it by the time I got back.

Now having taken a closer look I’m wondering what the actual table name is that is referenced in the error message (scrubbed from your screenshot). Any modules with materialized=’view’ will in practice result in the following sequence of statements, in which “tab” is the name of the table (module) being created and 122334 is a hash to avoid collisions between processes:

DROP TABLE IF EXISTS tab_122334;
CREATE TABLE tab_122334 AS SELECT … ;
DROP TABLE IF EXISTS tab__dbt_backup;
ALTER TABLE tab RENAME tab__dbt_backup;
DROP TABLE IF EXISTS tab;
ALTER TABLE tab_122334 RENAME tab;
DROP TABLE IF EXISTS tab__dbt_backup;

Because there is no IF EXISTS syntax for the ALTER TABLE command, I can imagine concurrent processes might try to rename a table that just got deleted by another process and result in the table not found error on either tab or tab__dbt_backup. If that is the case, the error would probably not reproduce consistently as you have to have "bad luck" for the two processes to be swapping out the tables at the same time, so I'm not sure if this is the true reason. Can you please confirm the full error message?

Either way, I think eventually you'll want to build some concurrency control into your flow, for example using the LOCK command around the entire module using macros. Depending on what happens in the module (and how well it anticipates increments), we'll want to make sure the CREATE TABLE ... AS SELECT from a second process doesn't overwrite the result from the first process.

Thanks,
benjamin

@jrpereirajr
Copy link
Author

Hi @bdeboe! No need to apologies. :)

As "full error message", I understood running the same test again (as described here), but this time with --debug flag. So, I ran it again and sent you the outputs. Please, let me know if I was correct.

In the meanwhile I'll try your suggestion regarding the macro with LOCK command.

Thanks!

@bdeboe
Copy link
Contributor

bdeboe commented Sep 23, 2024

Hi @jrpereirajr ,

Thanks for your prompt reply.
The error message in the debug output (which is indeed very comprehensive and helpful) is slightly different:

[SQLCODE: <-400>:<Fatal error occurred>]
[Location: <ServerLoop>]
[%msg: <ERROR #5864: User '<user>' in process '6140' has '<table>' open for editing.>]

This might be another manifestation of the same thing: that the two processes are trying to swap the tables at the same time. You'd get this error when the ALTER TABLE statements happen at the same time, and the earlier "Table not found" error when one process' ALTER TABLE or DROP TABLE already completed right when the second process gets to it.

I suppose there's no developers accessing the system while dbt is running?

Either way, using locking in a macro should help avoid the error and offer some level of control. I think we can also consider adding locking in the dbt-iris implementation, and automatically taking and releasing locks at the start/end of a model's execution.
Maybe @daimor already considered this or noted relevant alternative approaches in other adapters?

It's not a silver bullet though, because in a view or table materialization, you risk replacing your table with one that's slightly older when two processes overtake one another. Say you have a model A that builds a staging table and a model B that uses table materialization to build the final result. Even if you use locking within the model, there's a risk:

  1. Process 1 builds model A
  2. Process 2 builds model A (based on slightly more recent data)
  3. Process 2 happens to finish first building model A, and in model B swaps the target table
  4. Process 1 finishes building model A and in model B swaps the target table for the not-so-recent data it built in model A

Some form of incremental logic is likely safer when you risk having concurrent processes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants