[CT-1760] [Bug] Python resource_tracker causing `dbt test` to fail #6535
Comments
@alexrosenfeld10 Thanks for opening! After a bit of googling, it seems like this will be tricky to reproduce. When a Python application is running multiple processes (as dbt does to handle threaded execution), this warning crops up when the application is terminated partway through, e.g. if it hits a memory limit. For example:
Is there any chance your Argo Workflow is running out of memory? I'd hope that there'd be other error messages to say something to that effect, and explain why the process is being killed... But it also sounds like this is related to a bug in Python 3.8+, related to a switch within the …
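For reference, the warning discussed in this thread comes from Python's `multiprocessing` resource tracker: when a process that registered a shared resource (a semaphore, a shared-memory segment, etc.) exits without releasing it, the tracker prints a leak warning at interpreter shutdown. A minimal sketch that provokes the same class of warning on Python 3.8+ (the segment size here is arbitrary, chosen only for illustration):

```python
import subprocess
import sys

# The child deliberately leaks a shared-memory segment: it calls neither
# shm.close() nor shm.unlink(), so the resource_tracker complains on
# stderr when the interpreter shuts down.
child_code = """
from multiprocessing import shared_memory
shm = shared_memory.SharedMemory(create=True, size=64)
"""

proc = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)
# Typical stderr output:
# "UserWarning: resource_tracker: There appear to be 1 leaked
#  shared_memory objects to clean up at shutdown"
print(proc.stderr)
```

The same tracker fires when a process is killed partway through (e.g. by a memory limit or SIGTERM), which is why the warning tends to be a symptom of abrupt termination rather than its cause.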
@jtcohen6 thanks for the reply! I wonder if the best course of action here is to simply wait for … I'm unable to check the memory history on those pods (they've been churned and our monitoring software no longer has the data). I can check if / when this happens again. However, I'd be pretty surprised if they ran out of memory, as all they're responsible for is … I didn't see any other error messages related to pods being OOMKilled, which I would've expected to see as well if that was the issue.
Agreed, I wouldn't have expected that either. It would make more sense if it were a more memory-intensive command, e.g. … I missed this on the first read, but I do see a message … My hypothesis remains that, while the …
Glad dbt has their sights set on the new version! Thanks for the info. I've had to fork the Snowflake connectors and DIY stuff because their development is fairly slow (that said, I'm using the .NET connector, which I understand isn't their main market share). Hopefully they come through with Python 3.11 support soonish! It's possible it's being interrupted somewhere somehow, but I don't see any other indicators of that in our system. We'll look for more details if it pops up again. I'll close this out, as the next actionable thing here is to upgrade Python and collect more data. Thanks again for the info!
Is this a new bug in dbt-core?
Current Behavior
We're seeing sporadic errors causing our production Argo Workflow runs to fail. This is not a consistent issue, nor is it easily reproducible, but when it happens the `dbt test` command fails and the pod is terminated with code 143. This is obviously not a good thing in a production environment, and it causes alerts that go to on-call engineers.

For full clarity, I will post the various layers that ultimately invoke `dbt test`.

The runs execute the following Argo Workflow template:

The `execute_domain.sh` script looks like this (slightly abridged, but this is where the rubber meets the road):

The error that occurs during the run is:

Here is a screenshot detailing the error happening during the test run, and not during any other time:
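As context for the exit code above: shells and container runtimes report a signal death as 128 + the signal number, and SIGTERM is signal 15, so code 143 means the process was terminated externally rather than crashing on its own. A small sketch confirming the arithmetic (Python's `subprocess` reports the same death as a negative signal number):

```python
import signal
import subprocess
import sys

# The child sends SIGTERM to itself; from a shell or a Kubernetes pod
# this surfaces as exit code 128 + 15 = 143.
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
)

# subprocess uses the negative signal number for signal deaths:
print(proc.returncode)  # -15, i.e. killed by SIGTERM (15)
print(128 + int(signal.SIGTERM))  # 143, the code the pod reports
```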
Note: I realize the logs indicate that the resource_tracker issue is a `warning`-level log, implying it's not the root cause of the failure... However, there are no other relevant logs output by `dbt` during these failures, so this is all I have to go on.

Expected Behavior

`dbt test` should run end to end without crashing.

Steps To Reproduce
This is challenging to reproduce. Let's get a discussion going, and from there if no one on the dbt Labs side of things has ideas, we can investigate ways to reproduce this in a shareable fashion.
Relevant log output
Environment
Which database adapter are you using with dbt?
No response
Additional Context
No response