This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

monitor process memory consumption and alert for om[is]agent #2419

Merged
merged 4 commits into from
Apr 2, 2019


xudifsd
Member

@xudifsd xudifsd commented Mar 27, 2019

Collect per-process memory consumption from the RSS value reported by ps, and report only processes that use more than 500 MiB, to save space in Prometheus. Also define an alert rule for the omiagent and omsagent processes: these two frequently cause OOM on Azure VMs, which the DRI should take care of.
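As a rough illustration of the approach described above, the sketch below parses ps-style output and keeps only processes above the threshold. It is not the code from this PR: the `ProcessInfo` type and the `parse_ps_output` and `large_processes` helpers are hypothetical names, and the input is assumed to look like the output of `ps -eo pid,rss,args`, where RSS is reported in KiB.

```python
import collections

# Hypothetical record type; the actual structure used in ps.py may differ.
ProcessInfo = collections.namedtuple("ProcessInfo", ["pid", "rss", "cmd"])

# Same threshold as in the PR: 500 MiB, expressed in bytes.
THRESHOLD_BYTES = 500 * 1024 * 1024


def parse_ps_output(text):
    """Parse text shaped like the output of `ps -eo pid,rss,args`.

    ps reports RSS in KiB, so the value is converted to bytes here.
    Header lines and malformed lines are skipped.
    """
    infos = []
    for line in text.splitlines():
        # Split into pid, rss, and the rest of the command line.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue
        pid, rss_kib, cmd = parts
        if not (pid.isdigit() and rss_kib.isdigit()):
            continue  # e.g. the "PID RSS COMMAND" header
        infos.append(ProcessInfo(int(pid), int(rss_kib) * 1024, cmd))
    return infos


def large_processes(infos, threshold=THRESHOLD_BYTES):
    """Keep only processes whose RSS exceeds the threshold."""
    return [info for info in infos if info.rss > threshold]


if __name__ == "__main__":
    sample = (
        "  PID    RSS COMMAND\n"
        "    1   1024 /sbin/init\n"
        " 4242 614400 /opt/omi/bin/omiagent 9 10 --destdir /\n"
    )
    for info in large_processes(parse_ps_output(sample)):
        print(info.pid, info.rss, info.cmd)
```

Filtering before export keeps the number of time series small, at the cost of losing visibility into processes below the threshold.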

@xudifsd
Member Author

xudifsd commented Mar 27, 2019

fixed #2385

@coveralls

coveralls commented Mar 27, 2019

Coverage Status

Coverage increased (+0.2%) to 52.895% when pulling 145bf53 on dixu/process-mem into 62630ea on master.

src/job-exporter/src/ps.py (two outdated review threads, resolved)

if info.rss > 500 * 1024 * 1024:
    # only record large memory consumption to save space in prometheus
    cmd = info.cmd.split()[0]  # remove args
Member

@mzmssg mzmssg Apr 1, 2019


Here we strip the command arguments, so most commands will be reported as bash or python. Counting them seems meaningless and wasteful.
How about excluding such processes?

Member Author


I removed the args because omiagent also runs with args, for example: /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING. I think the args are useless.
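The trade-off in this thread can be seen in a small sketch. The `command_name` helper is a hypothetical wrapper around the same `split()[0]` idea used in the diff, and the two python command lines are made-up examples: stripping args yields a stable name for omiagent, but distinct interpreter-driven processes collapse to the same name, which is the reviewer's concern.

```python
def command_name(cmd):
    """Return the executable part of a command line, dropping its arguments."""
    parts = cmd.split()
    return parts[0] if parts else ""


# Command line quoted in the reply above.
omi = ("/opt/omi/bin/omiagent 9 10 --destdir / "
       "--providerdir /opt/omi/lib --loglevel WARNING")

# Hypothetical interpreter-driven processes, for illustration only.
train = "python train.py --epochs 10"
serve = "python serve.py"

print(command_name(omi))    # /opt/omi/bin/omiagent
print(command_name(train))  # python
print(command_name(serve))  # python
```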

Member


Why not filter for omiagent here, instead of in the alert rules?

Member Author


That would be a special case in the code.

@xudifsd xudifsd merged commit 1bcb440 into master Apr 2, 2019
@xudifsd xudifsd deleted the dixu/process-mem branch April 2, 2019 05:31
3 participants