[Bug] jobs timing out early regardless of job_execution_timeout_seconds #1081

Closed
2 tasks done
colin-rogers-dbt opened this issue Jan 31, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@colin-rogers-dbt
Contributor

colin-rogers-dbt commented Jan 31, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Pulling this from a slack thread: https://getdbt.slack.com/archives/C99SNSRTK/p1706719877028389

Hello everyone,
We currently schedule our jobs via GitHub Actions to trigger our dbt + BigQuery stack, and our SQL models are timing out after ~900 seconds even though the relevant profile's job_execution_timeout_seconds is set to 5400.
This issue was not present 3 days ago. Is anyone else having issues like this?

Expected Behavior

job_execution_timeout_seconds is respected in model execution

Steps To Reproduce

TBD

Relevant log output

13:44:38  Running with dbt=1.6.6
13:44:39  Registered adapter: bigquery=1.6.7

...

13:45:27  Concurrency: 8 threads (target='prod-ci')

...

14:01:06  22 of 142 ERROR creating sql table model dbt_core_offer.dim_offer_home_violation  [ERROR in 902.79s]
14:05:34  64 of 142 ERROR creating sql table model dbt_intermediate.int_offer_group ...... [ERROR in 901.68s]

...

14:06:11  Completed with 2 errors and 1 warning:
14:06:11  
14:06:11    Runtime Error in model dim_offer_home_violation (models/core/offer/dimension/dim_offer_home_violation.sql)
  Query exceeded configured timeout of 5400s
14:06:11  
14:06:11    Runtime Error in model int_offer_group (models/intermediate/offer/cached_4x_daily/int_offer_group.sql)
  Query exceeded configured timeout of 5400s

Environment

- OS:
- Python:
- dbt-core: 1.6.6
- dbt-bigquery: 1.6.7

Additional Context

No response

colin-rogers-dbt added the bug (Something isn't working) and triage labels Jan 31, 2024
@colin-rogers-dbt
Contributor Author

colin-rogers-dbt commented Jan 31, 2024

Looks like this was introduced in the 2.16.0 release of python-api-core.

@nathangriffiths-cdx

We have encountered the same issue:

dbt-core 1.5.6
dbt-bigquery 1.5.6

Downgrading the google-api-core version to < 2.16 appears to fix the issue.

@colin-rogers-dbt
Contributor Author

For more context see: googleapis/python-api-core#591

@yu-iskw
Contributor

yu-iskw commented Feb 1, 2024

Following the approach @colin-rogers-dbt described at googleapis/python-api-core#591 (comment), we can improve how the code calls the BigQuery API(s) at a fundamental level.

Let me suggest a change so that we can control the timeout of BigQuery jobs. Here is the example code. It would be great to address what @colin-rogers-dbt suggested. Meanwhile, it would be good to enable setting the timeout argument of QueryJob.result.

yu-iskw@15bf262
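
For readers who do not want to open the commit, here is a minimal sketch of the idea of forwarding an explicit timeout. It is not taken from the linked commit or from dbt-bigquery: the helper name `wait_for_rows` is hypothetical, and the only library behavior assumed is that `QueryJob.result()` accepts `max_results` and `timeout` and raises `concurrent.futures.TimeoutError` when the timeout elapses.

```python
import concurrent.futures


def wait_for_rows(query_job, timeout_seconds=None, max_results=None):
    """Block on a BigQuery job, forwarding an explicit timeout.

    `wait_for_rows` is a hypothetical helper for illustration only.
    `query_job` is expected to behave like google.cloud.bigquery.QueryJob.
    """
    try:
        # Pass the timeout explicitly instead of relying on a library default.
        return query_job.result(max_results=max_results, timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # Ask BigQuery to cancel the job before reporting the failure.
        query_job.cancel()
        raise RuntimeError(
            f"Query exceeded configured timeout of {timeout_seconds}s"
        ) from None
```

With `timeout_seconds=None` the call waits as long as the client library allows, so a caller would pass the profile's job_execution_timeout_seconds here.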

[UPDATE]
googleapis/python-api-core#592 enables us to avoid the timeout issue even if we don't set the timeout argument of QueryJob.result(), so we no longer need to pass it once the patch is released.
I tested the patched code, and it looks good to me.

googleapis/python-api-core#591 (comment)

@sprivers

sprivers commented Feb 2, 2024

We started seeing this issue this week as well. We do not specify job_execution_timeout_seconds anywhere, yet our queries were timing out after 900 seconds.

After some research, we found that the last build where this worked correctly used the following transitive Google library dependencies:

works with
google-api-core-2.15.0
google-auth-2.26.2
google-cloud-bigquery-3.16.0
google-cloud-core-2.4.1
google-cloud-dataproc-5.8.0

However, when we built our latest Docker image, it pulled in newer versions of the Google libraries:

fails with
google-api-core-2.16.1
google-auth-2.27.0
google-cloud-bigquery-3.17.1
google-cloud-core-2.4.1
google-cloud-dataproc-5.9.0

After updating our requirements.txt to pin the versions of the Google libraries, we no longer see the 900-second timeout:

google-api-core==2.15.0
google-auth==2.26.2
google-cloud-bigquery==3.16.0
google-cloud-core==2.4.1
google-cloud-dataproc==5.8.0
google-cloud-storage==2.14.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.62.0
dbt-core==1.7.5
dbt-bigquery==1.7.2
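
Since the regression arrived through transitive dependencies picked up at image build time, it may help to verify the pins inside the built image. The following is a rough sketch, assuming a hypothetical helper `find_pin_mismatches`; it uses only the standard library's `importlib.metadata` and is not part of dbt.

```python
from importlib import metadata


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for every pin that is not met.

    `pins` maps distribution names to exact versions, as in requirements.txt.
    `find_pin_mismatches` is a hypothetical helper for illustration only.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the distribution is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling `find_pin_mismatches({"google-api-core": "2.15.0", "google-cloud-bigquery": "3.16.0"})` in the image returns an empty dict when both pins hold.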

One of these libraries is causing job_execution_timeout to be set on the connection properties:

    @classmethod
    def get_job_execution_timeout_seconds(cls, conn):
        credentials = conn.credentials
        # We do not have this defined anywhere, so it should not be set.
        return credentials.job_execution_timeout_seconds

For us, this code should never execute:

        # only use async logic if user specifies a timeout
        if job_execution_timeout:  # should never get here, because this should not be defined
            loop = asyncio.new_event_loop()
            future_iterator = asyncio.wait_for(
                loop.run_in_executor(None, functools.partial(query_job.result, max_results=limit)),
                timeout=job_execution_timeout,
            )

            try:
                iterator = loop.run_until_complete(future_iterator)
            except asyncio.TimeoutError:
                query_job.cancel()
                raise DbtRuntimeError(
                    f"Query exceeded configured timeout of {job_execution_timeout}s"
                )
            finally:
                loop.close()
        else:
            iterator = query_job.result(max_results=limit)
        return query_job, iterator
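
The timeout pattern in the snippet above can be tried in isolation. This is a minimal sketch under stated assumptions: `run_with_timeout` and `slow_call` are hypothetical names, and `slow_call` merely stands in for a blocking call such as `query_job.result()`. It shows that `asyncio.wait_for` only stops waiting at the deadline; the worker thread itself is not interrupted, which is presumably why the quoted code also calls `query_job.cancel()`.

```python
import asyncio
import functools
import time


def run_with_timeout(blocking_fn, timeout_seconds, **kwargs):
    """Run a blocking callable in a worker thread and stop waiting after a deadline.

    Hypothetical helper for illustration; raises asyncio.TimeoutError on expiry.
    """
    loop = asyncio.new_event_loop()
    try:
        # The default executor runs blocking_fn in a background thread.
        future = loop.run_in_executor(None, functools.partial(blocking_fn, **kwargs))
        # wait_for stops waiting at the deadline; it cannot interrupt the thread.
        return loop.run_until_complete(asyncio.wait_for(future, timeout=timeout_seconds))
    finally:
        loop.close()


def slow_call(delay):
    """Stand-in for a blocking call such as query_job.result()."""
    time.sleep(delay)
    return "done"


if __name__ == "__main__":
    print(run_with_timeout(slow_call, 1.0, delay=0.01))
    try:
        run_with_timeout(slow_call, 0.05, delay=0.5)
    except asyncio.TimeoutError:
        print("timed out")
```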

@yaozhang09

Any update on getting an actual fix in a release?

@jmesterh

jmesterh commented Feb 8, 2024

python-api-core 2.17.0, which includes the patch, has been released: https://github.com/googleapis/python-api-core/releases/tag/v2.17.0
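
Going by the versions reported in this thread (introduced in 2.16.0, still failing on 2.16.1, patched in 2.17.0), a quick check for an affected install could look like the sketch below. The function names are hypothetical, and the parser only handles plain numeric versions such as `2.16.1`, not pre-release suffixes.

```python
def parse_version(text):
    """Turn '2.16.1' into (2, 16, 1); only handles plain numeric releases."""
    parts = [int(part) for part in text.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '2.16' to (2, 16, 0)
    return tuple(parts)


def is_affected(api_core_version):
    """True for google-api-core versions from 2.16.0 up to, but not including, 2.17.0."""
    return (2, 16, 0) <= parse_version(api_core_version) < (2, 17, 0)
```

The installed version can be read with `importlib.metadata.version("google-api-core")` and passed to `is_affected`.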

@Fleid
Contributor

Fleid commented Feb 22, 2024

@colin-rogers-dbt do you have an ETA? I see it in the current sprint.

@colin-rogers-dbt
Contributor Author

Fix has been released in 1.6.10 and 1.7.6
