Support multiple jobs on the same node #41

Open
stephenlienharrell opened this issue Jun 13, 2023 · 2 comments

Comments

@stephenlienharrell
Member

Currently we collect everything at the node level. We need to determine which metrics can be split out on a per-core or per-socket basis, which cannot, and whether splitting them out is useful.

@stephenlienharrell
Member Author

For CPU:
We need core affinity matched to job ID.
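
One possible way to approach the CPU side is sketched below. It assumes the collector already has a mapping from job ID to PIDs (how that mapping is obtained is not settled in this issue) and that it runs on Linux, where Python exposes `os.sched_getaffinity`. The function names `cores_by_job` and `shared_cores` are hypothetical, not part of the project.

```python
import os
from collections import defaultdict


def cores_by_job(job_pids, get_affinity=None):
    """Map each job ID to the set of cores its processes are allowed to run on.

    job_pids: dict of job ID -> iterable of PIDs (assumed to come from elsewhere).
    get_affinity: callable(pid) -> iterable of core numbers; defaults to
    os.sched_getaffinity, which is Linux-only.
    """
    if get_affinity is None:
        get_affinity = os.sched_getaffinity
    cores = {job_id: set() for job_id in job_pids}
    for job_id, pids in job_pids.items():
        for pid in pids:
            try:
                cores[job_id].update(get_affinity(pid))
            except (ProcessLookupError, PermissionError):
                # The process exited or is not ours to inspect; skip it.
                continue
    return cores


def shared_cores(job_cores):
    """Return {core: set of job IDs} for cores claimed by more than one job."""
    owners = defaultdict(set)
    for job_id, cores in job_cores.items():
        for core in cores:
            owners[core].add(job_id)
    return {core: jobs for core, jobs in owners.items() if len(jobs) > 1}
```

If `shared_cores` returns anything, per-core counters for those cores cannot be attributed to a single job, which is one concrete case of a metric that is "not able to be split out". Note that affinity only says where a process may run; an unpinned job will report every core on the node.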

For memory:
We need to find, programmatically, all memory used under the primary job starter. Find the job starter, then get the memory of all its child processes: ps -o pid,ppid,pgid,comm,%cpu,%me

Take this snapshot at the same time as the rest of the metrics. Find out whether there is a way to get the job ID, then match the job ID to specific processes on the node to get a snapshot of memory usage.
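
A minimal sketch of the "find the job starter, then sum its children" idea follows. It assumes the job starter's PID is already known, and it uses `ps -e -o pid=,ppid=,rss=` (resident set size in KiB, headers suppressed) rather than the exact column list above, since only PID, parent PID, and a memory figure are needed. The function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def parse_ps(text):
    """Parse 'pid ppid rss' lines into {pid: (ppid, rss_kib)}."""
    procs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, ppid, rss = map(int, fields)
        procs[pid] = (ppid, rss)
    return procs


def tree_rss_kib(procs, root_pid):
    """Sum RSS (KiB) of root_pid and all of its descendants."""
    children = defaultdict(list)
    for pid, (ppid, _) in procs.items():
        children[ppid].append(pid)
    total, stack, seen = 0, [root_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in procs:
            continue
        seen.add(pid)
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


def snapshot():
    """Take one process-table snapshot via ps (assumes procps-style ps)."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=,ppid=,rss="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

Calling `tree_rss_kib(snapshot(), starter_pid)` in the same collection pass as the other metrics would give a per-job figure. Summing RSS counts shared pages once per process, so the result is an upper bound rather than an exact footprint.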

Can we do this programmatically for any other statistics?

@stephenlienharrell
Member Author

Regarding the approach above, we need to make sure we can capture detached processes.
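
Detached processes get reparented, so walking parent PIDs from the job starter will miss them. One option that might cover this case is to group processes by cgroup instead of by ancestry. The sketch below is an assumption-heavy illustration: it assumes a scheduler that places each job in a cgroup whose path contains a `job_<id>` component (Slurm's cgroup plugins do this in common configurations, but this issue does not name a scheduler), and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed cgroup path convention: ".../job_<digits>/..." (not confirmed here).
JOB_RE = re.compile(r"/job_(\d+)(?:/|$)")


def job_id_from_cgroup(cgroup_text):
    """Return the job ID found in the contents of /proc/<pid>/cgroup, or None."""
    for line in cgroup_text.splitlines():
        match = JOB_RE.search(line)
        if match:
            return match.group(1)
    return None


def pids_by_job(proc_root="/proc"):
    """Scan /proc and group PIDs by the job ID in their cgroup path."""
    jobs = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            text = (entry / "cgroup").read_text()
        except OSError:
            continue  # process exited or is not readable
        job_id = job_id_from_cgroup(text)
        if job_id is not None:
            jobs.setdefault(job_id, []).append(int(entry.name))
    return jobs
```

Because cgroup membership is inherited across fork and survives reparenting, a daemonized child should still appear under its job, provided it has not been moved to another cgroup.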
