Log fatal errors #52

Closed
lars-t-hansen opened this issue Jun 21, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@lars-t-hansen
Collaborator

See #51. We should detect unrecoverable errors that prevent monitoring from working, and we should arguably log them to syslog or some other standard medium where they will be seen.

@lars-t-hansen
Collaborator Author

There are probably two aspects to this. One is propagating the errors properly within sonar (add_gpu_info in ps.rs drops the error on the floor, for example). The other is logging in a standard location in a standard way.
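
A minimal sketch of what propagating the error could look like, as opposed to dropping it. Everything here is hypothetical and for illustration only: `ProbeError`, `query_gpu`, `add_gpu_info_checked` and the `simulate_failure` flag are made-up names, not sonar's actual code or the real signature of add_gpu_info.

```rust
use std::fmt;

// Hypothetical error type; sonar's real error handling may look different.
#[derive(Debug, PartialEq)]
pub enum ProbeError {
    Gpu(String),
}

impl fmt::Display for ProbeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProbeError::Gpu(msg) => write!(f, "GPU probe failed: {}", msg),
        }
    }
}

impl std::error::Error for ProbeError {}

// Stand-in for the real GPU query. `simulate_failure` exists only so the
// sketch can exercise both outcomes.
fn query_gpu(simulate_failure: bool) -> Result<Vec<String>, ProbeError> {
    if simulate_failure {
        Err(ProbeError::Gpu(
            "listing command exited with an error".to_string(),
        ))
    } else {
        Ok(vec!["gpu0".to_string()])
    }
}

// Hypothetical variant of a function that used to swallow the error:
// `?` returns the failure to the caller, which can then decide whether
// to log it, retry, or abort.
pub fn add_gpu_info_checked(
    records: &mut Vec<String>,
    simulate_failure: bool,
) -> Result<(), ProbeError> {
    let gpus = query_gpu(simulate_failure)?;
    records.extend(gpus);
    Ok(())
}
```

With this shape the decision about where to report the error moves to the top level, which is also where a syslog or stderr policy would naturally live.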

@lars-t-hansen
Collaborator Author

It looks like it might be sufficient to use the syslog crate, https://crates.io/crates/syslog, with whatever plumbing is necessary to adapt to the local system conventions hidden behind the syslog service.

@lars-t-hansen lars-t-hansen self-assigned this Jul 24, 2023
@lars-t-hansen
Collaborator Author

An alternative view is that sonar will always be run by cron, and that the logging and output handling performed by cron (mailing the output to the owner of the job) is sufficient. I think there's no rush to implement anything here; we need to examine the entire pipeline first.

@bast
Member

bast commented Jul 27, 2023

We can then follow this: https://rust-lang-nursery.github.io/rust-cookbook/development_tools/debugging/log.html

@lars-t-hansen
Collaborator Author

Related to logging, I see intermittent clusters of sonar errors reported by cron of this form:

SONAR ERROR: "CPU process listing failed"

(This morning there was a cluster of six of these on ML7. It would have been useful to see more information here, to better diagnose the problem. Over the summer I had a similar cluster on another of the nodes.)
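
To get more than a bare message into the cron mail, the error could carry the exit status and stderr of the failed command. A rough sketch of that idea using only the standard library; `describe_failure` and `run_and_describe` are hypothetical helpers, not functions that exist in sonar:

```rust
use std::process::Command;

// Hypothetical helper: builds a diagnostic message from what a failed
// subprocess makes available.
pub fn describe_failure(what: &str, exit_code: Option<i32>, stderr: &str) -> String {
    let status = match exit_code {
        Some(code) => format!("exit code {}", code),
        // On Unix, a missing exit code usually means the process was
        // terminated by a signal.
        None => "no exit code".to_string(),
    };
    let detail = stderr.trim();
    if detail.is_empty() {
        format!("{} ({})", what, status)
    } else {
        format!("{} ({}): {}", what, status, detail)
    }
}

// Hypothetical wrapper: runs a command and returns its stdout, or an
// error string that says why the command failed.
pub fn run_and_describe(what: &str, program: &str, args: &[&str]) -> Result<String, String> {
    let output = Command::new(program)
        .args(args)
        .output()
        .map_err(|e| format!("{}: could not start {}: {}", what, program, e))?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(describe_failure(
            what,
            output.status.code(),
            &String::from_utf8_lossy(&output.stderr),
        ))
    }
}
```

A message built this way would show whether the listing command could not be started, exited with an error, or produced something on stderr, which is the information missing from the report above.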

@lars-t-hansen lars-t-hansen added the enhancement New feature or request label Aug 8, 2023
@lars-t-hansen
Collaborator Author

PR #71 addresses all of this.


2 participants