Batchless job manager #50

Closed
wants to merge 30 commits into from

Conversation

lars-t-hansen
Collaborator

A job manager that computes a job ID from the process tree, useful when no queue system is in use (e.g. on the UiO ML-nodes and light-HPC).

@lars-t-hansen
Collaborator Author

This is broken somehow: when --batchless is specified, every process is reported as a zombie. Contrast the first run with the second below:

[larstha@ml8 sonar]$ target/debug/sonar ps
2023-06-20T12:24:47.939400058+00:00,ml8.hpc.uio.no,192,root,0,tuned,4,90116,0,0,0,0
2023-06-20T12:24:47.939400058+00:00,ml8.hpc.uio.no,192,mateuwa,0,python,6693,277876104,0,0,0,0
2023-06-20T12:24:47.939400058+00:00,ml8.hpc.uio.no,192,riccarsi,0,python,281.7,233097616,1111,81,0,6868992
2023-06-20T12:24:47.939400058+00:00,ml8.hpc.uio.no,192,riccarsi,0,top,2.8,3828,0,0,0,0
2023-06-20T12:24:47.939400058+00:00,ml8.hpc.uio.no,192,einarvid,0,python3,5258.2,278281020,0,0,0,0
2023-06-20T12:24:47.939400058+00:00,ml8.hpc.uio.no,192,zabbix,0,zabbix_agentd,5.1,2024,0,0,0,0
2023-06-20T12:24:47.939400058+00:00,ml8.hpc.uio.no,192,einarvid,0,mongod,2.2,3547712,0,0,0,0
[larstha@ml8 sonar]$ target/debug/sonar ps --batchless
2023-06-20T12:24:54.630248696+00:00,ml8.hpc.uio.no,192,_zombie_2288850,0,_unknown_,0,0,11111111111111111111111111111111,0,0,1642496
2023-06-20T12:24:54.630248696+00:00,ml8.hpc.uio.no,192,_zombie_2288850,0,python,0,0,1,20,0,1642496
2023-06-20T12:24:54.630248696+00:00,ml8.hpc.uio.no,192,_zombie_1202293,0,_unknown_,0,0,11111111111111111111111111111111,0,0,1773568
2023-06-20T12:24:54.630248696+00:00,ml8.hpc.uio.no,192,_zombie_1202293,0,python,0,0,1,31,0,1773568
2023-06-20T12:24:54.630248696+00:00,ml8.hpc.uio.no,192,_zombie_1198074,0,_unknown_,0,0,11111111111111111111111111111111,0,0,3452928
2023-06-20T12:24:54.630248696+00:00,ml8.hpc.uio.no,192,_zombie_1198074,0,python,0,0,1111,34,0,3452928

@lars-t-hansen
Collaborator Author

The reason it fails in that way is that the ps command hangs when its output is sufficiently large, and the timeout then kills it. On ML8 the full ps output is currently about 2000 lines, or 150KB of text. Even with a timeout of 100s this output is not processed properly. It looks like the workaround in command.rs is not appropriate or not sufficient.
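
One plausible explanation, not confirmed in this thread, is the classic pipe deadlock: if the parent waits for the child to exit before reading its stdout, the child blocks as soon as the pipe buffer fills (64 KiB by default on Linux), so 150KB of ps output would never finish no matter how long the timeout is. A minimal sketch of one way to avoid that, assuming only the Rust standard library; `run_with_timeout` is a hypothetical helper and is not the code in command.rs:

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not sonar's command.rs): run a subprocess, capture
/// its stdout, and kill it if it has not exited within `timeout`.
fn run_with_timeout(program: &str, args: &[&str], timeout: Duration) -> Result<String, String> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|e| format!("spawn failed: {e}"))?;

    // Drain stdout on a separate thread so the child can never block on a
    // full pipe while this thread is polling for its exit.
    let mut stdout = child.stdout.take().ok_or("no stdout pipe")?;
    let reader = thread::spawn(move || {
        let mut buf = String::new();
        stdout.read_to_string(&mut buf).map(|_| buf)
    });

    // Poll for exit until the deadline passes.
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait() {
            Ok(Some(_status)) => break,
            Ok(None) if Instant::now() >= deadline => {
                // Kill and reap the child; the reader thread is left to
                // finish on its own once the pipe closes.
                let _ = child.kill();
                let _ = child.wait();
                return Err("timed out".to_string());
            }
            Ok(None) => thread::sleep(Duration::from_millis(10)),
            Err(e) => return Err(format!("wait failed: {e}")),
        }
    }

    // The child has exited, so the reader sees EOF and returns everything.
    reader
        .join()
        .map_err(|_| "reader thread panicked".to_string())?
        .map_err(|e| format!("read failed: {e}"))
}
```

Called as, for example, `run_with_timeout("ps", &["-e"], Duration::from_secs(2))`, this returns the full output regardless of its size, because reading and waiting happen concurrently. Whether this matches the actual failure in command.rs would need to be checked against that file.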
