Define a "job" for the ML and light-HPC systems #1

Closed
lars-t-hansen opened this issue Jun 14, 2023 · 1 comment

Comments

@lars-t-hansen
Collaborator

See here for a general discussion of this matter. We need to define what a "job" is on systems without a job queue, because the job is the most reasonable unit of work for the other tools to deal with. Most plausibly, a job is something synthesized from the output of sonar. Sonar is effectively a sampling profiler: each sample is a snapshot of the running processes with their current resource usage. A simple tool that monitors that output may be able to build up a set of jobs over time and log them in the same form that SLURM uses. We may need to augment sonar's output slightly so that multiple back-to-back runs of the same programs by the same user can be distinguished as separate jobs; this should not be too hard.
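To make the idea concrete, here is a minimal sketch of how jobs might be synthesized from a stream of samples. It is not sonar's actual output format or the eventual implementation: the `Sample` fields, the `Job` record, the `synthesize_jobs` function, and the `max_gap` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One profiler observation (hypothetical fields, not sonar's real schema)."""
    timestamp: int  # seconds since some epoch
    user: str
    command: str
    cpu_pct: float


@dataclass
class Job:
    """A synthesized job: a run of samples sharing the same user and command."""
    user: str
    command: str
    start: int
    end: int
    samples: int = 1


def synthesize_jobs(samples, max_gap=600):
    """Group samples by (user, command).

    A sample extends the current job for its key if it arrives within
    max_gap seconds of that job's last sample; otherwise it starts a new job.
    Jobs are returned in order of their first sample.
    """
    jobs = []
    open_jobs = {}  # (user, command) -> most recent Job for that key
    for s in sorted(samples, key=lambda s: s.timestamp):
        key = (s.user, s.command)
        job = open_jobs.get(key)
        if job is not None and s.timestamp - job.end <= max_gap:
            job.end = s.timestamp
            job.samples += 1
        else:
            job = Job(s.user, s.command, s.timestamp, s.timestamp)
            open_jobs[key] = job
            jobs.append(job)
    return jobs
```

Note that a gap heuristic like this cannot separate two runs that start and stop between consecutive samples, which is exactly the back-to-back case mentioned above; distinguishing those would need extra information in the samples themselves.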

@lars-t-hansen
Collaborator Author

A sensible system has been implemented for this in sonar, and seems to work OK.
