Log panic file contents on worker failure #294
…nto mguntupa/panic
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
##             main    #294    +/-   ##
==========================================
- Coverage   84.16%  83.81%  -0.36%
==========================================
  Files          17      17
  Lines        1503    1538    +35
==========================================
+ Hits         1265    1289    +24
- Misses        238     249    +11
def _monitor(self):
    restart_count = 0  # Initialize a counter for restarts
    max_restarts = 5  # Set the maximum number of restarts
Maybe we should make this configurable?
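One possible shape for this, as a minimal sketch rather than the project's actual configuration mechanism: resolve the limit from an explicit setting first, then an environment variable, then fall back to the current hard-coded 5. The names `resolve_max_restarts`, `DEFAULT_MAX_RESTARTS`, and `WORKER_MAX_RESTARTS` are hypothetical and do not appear in the PR.

```python
import os

DEFAULT_MAX_RESTARTS = 5  # matches the value hard-coded in the diff


def resolve_max_restarts(configured=None, env=None):
    """Return the restart limit: explicit value, then env var, then default."""
    env = os.environ if env is None else env
    if configured is not None:
        value = configured
    else:
        # WORKER_MAX_RESTARTS is an assumed variable name, not an existing setting.
        value = env.get("WORKER_MAX_RESTARTS", DEFAULT_MAX_RESTARTS)
    value = int(value)
    if value < 0:
        raise ValueError("max_restarts must be >= 0")
    return value
```

`_monitor` could then use `max_restarts = resolve_max_restarts(...)` in place of the literal.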
# log.debug("Worker log output stopped")

def _monitor(self):
    restart_count = 0  # Initialize a counter for restarts
Would it make sense to reset this when a worker starts up successfully?
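A rough sketch of what that could look like, assuming the monitor has some signal that the worker came up healthy (for example a successful status ping). `RestartBudget` and its method names are hypothetical, not code from this PR.

```python
class RestartBudget:
    """Tracks consecutive worker restarts against a limit."""

    def __init__(self, max_restarts=5):
        self.max_restarts = max_restarts
        self.restart_count = 0

    def record_restart(self):
        """Count one restart; return False once the limit is exceeded."""
        self.restart_count += 1
        return self.restart_count <= self.max_restarts

    def record_healthy_start(self):
        """Reset the counter after the worker has started successfully."""
        self.restart_count = 0
```

With the reset, the limit bounds consecutive failed restarts rather than total restarts over the life of the process, so a worker that crashes occasionally over a long run is not abandoned permanently.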
Description
Changes include:
Regenerate the pydantic models
Grab the location of the panic file (ping the status endpoint)
If the worker dies (warning):
  Check that the panic file is non-empty (size greater than 0)
  Read it in line by line, or limit the size of the message (look at splitting on an empty line)
  Feed the panic file into logging (at warning level) for DCS Monitor and OTEL to pick up
Limit restarts of the worker to x (5)
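The panic-file steps above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function name `log_panic_file`, the logger name, and the 8192-character cap are all hypothetical, and the panic file path is assumed to have already been obtained from the status endpoint.

```python
import logging
import os

log = logging.getLogger("worker")  # hypothetical logger name

MAX_PANIC_CHARS = 8192  # assumed cap on how much of the file is logged


def log_panic_file(path, max_chars=MAX_PANIC_CHARS, logger=log):
    """Log the start of a panic file at WARNING; return the text or None."""
    try:
        # Skip empty files: nothing useful to report.
        if os.path.getsize(path) == 0:
            return None
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            text = fh.read(max_chars)  # bounded read limits message size
    except OSError:
        # Missing or unreadable file: treat as "no panic details".
        return None
    # Keep only the text before the first empty line.
    message = text.strip().split("\n\n", 1)[0].strip()
    if not message:
        return None
    logger.warning("Worker panic file %s: %s", path, message)
    return message
```

Logging through the standard `logging` module at warning level means any handler already attached for log collection would see the message without extra wiring.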
Checklist