omiagent using 100% cpu #615
Comments
@ELadner-cvx, it shows there are 2 providers in your omiagent process: |
I'm having the same issue. Note that I am running Docker on this instance; while searching, I found that Docker seems to be a common factor in other reports. Should I be running it in a container? Any other ideas?
$ cat /etc/os-release | grep VERSION=
VERSION="18.04.2 LTS (Bionic Beaver)"
$ docker -v
Docker version 18.09.3, build 774a1f4
$ ps aux | egrep "(oms|omi)agent"
omsagent 1692 0.2 0.5 1267116 46992 ? Sl 14:03 0:04 /opt/microsoft/omsagent/ruby/bin/ruby /opt/microsoft/omsagent/bin/omsagent -d /var/opt/microsoft/omsagent/{hash}/run/omsagent.pid -o /var/opt/microsoft/omsagent/{hash}/log/omsagent.log -c /etc/opt/microsoft/omsagent/{hash}/conf/omsagent.conf --no-supervisor
root 4504 0.0 0.1 391268 10792 ? Sl 14:24 0:00 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
root 29127 0.0 0.2 219956 20448 ? Sl 06:45 0:03 python /var/lib/waagent/Microsoft.EnterpriseCloud.Monitoring.OmsAgentForLinux-1.9.1/omsagent.py -telemetry
$ dpkg -s omi
Package: omi
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 4764
Maintainer: Microsoft Corporation
Architecture: amd64
Source: omi
Version: 1.6.0.0
Provides: omi
Depends: libc6 (>= 2.3.6), libpam-runtime (>= 0.79-3)
Conffiles:
/etc/opt/omi/conf/omiserver.conf {hash}
/etc/opt/omi/conf/omilogrotate.conf {hash}
Description: Open Management Infrastructure
omi server
$ dpkg -s omsagent
Package: omsagent
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 83204
Maintainer: Microsoft Corporation
Architecture: amd64
Source: omsagent
Version: 1.9.0.0
Provides: omsagent
Depends: omi (>= 1.3.0.2), scx (>= 1.6.3.212)
Conffiles:
/etc/opt/microsoft/omsagent/sysconf/installinfo.txt {hash}
Description: Microsoft Operations Management Suite for UNIX/Linux agent
Provides agent for the Microsoft Operations Management Suite. |
@dmhendricks Whether it runs in Docker or on a normal box does not seem to matter. The issue has appeared on some customers' machines this year, there are many similar high-CPU ICM items, and the related teams (the OMSagent team and the DSC team) appear to be investigating. |
@JumpingYang001 @ELadner-cvx I'm investigating a similar issue. When analysing the pmap output, I can only see libSCXCoreProviderModule.so loaded. Also, strace shows that 94% of the CPU time is taken by the system call "select", and I then noticed this tight loop in sock/selector.c. The following condition being true means the system call was interrupted by a signal: ( -1 == r ) and ( errno == EINTR ).
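The pmap check mentioned above can be approximated without pmap itself. This is a minimal sketch, assuming a Linux host where `/proc/<pid>/maps` is readable (usually as root for omiagent); the function names `loaded_shared_objects` and `read_maps` are hypothetical helpers, not part of OMI.

```python
import os


def loaded_shared_objects(maps_text):
    """Return the sorted, de-duplicated paths of shared objects in a maps dump.

    Each line of /proc/<pid>/maps has the form:
        address perms offset dev inode [pathname]
    Anonymous mappings have no pathname and are skipped.
    """
    libs = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        # Keep names such as libfoo.so and libfoo.so.6
        if name.endswith(".so") or ".so." in name:
            libs.add(path)
    return sorted(libs)


def read_maps(pid):
    """Read the raw mapping table for a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as handle:
        return handle.read()
```

Calling `loaded_shared_objects(read_maps(pid))` with the PID of the busy omiagent process lists every mapped library, which makes it easy to see which provider modules are actually loaded.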
|
The select system call in the code snapshot above is a blocking call that waits for some event to occur. Hence this code segment should not be an issue. |
There are no debug symbols, so there is no way to get the OMI call stacks. |
I have the same issue here. Observing the process whilst it exhibits this issue, I note the following:
$ strace -cwp 124261
strace: Process 124261 attached
100.00 11.303628 1027603 11 select
100.00 11.303777 22 total
So I don't think the issue is in the omiagent code as such, but in how the system reacts to the select call. |
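To read a summary like the one above more carefully, it may help to turn the rows into numbers. The sketch below assumes the usual `strace -c` column order (% time, seconds, usecs/call, calls, optional errors, syscall); `parse_strace_summary` and `mean_seconds_per_call` are hypothetical helper names.

```python
def parse_strace_summary(text):
    """Parse the per-syscall rows of an `strace -c` summary.

    Header, separator, and "total" lines are skipped. Returns a dict
    keyed by syscall name.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled.
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            percent = float(parts[0])
            seconds = float(parts[1])
            usecs_per_call = int(parts[2])
            calls = int(parts[3])
        except ValueError:
            continue  # separator or other non-data line
        rows[parts[-1]] = {
            "percent": percent,
            "seconds": seconds,
            "usecs_per_call": usecs_per_call,
            "calls": calls,
        }
    return rows


def mean_seconds_per_call(row):
    """Average time attributed to one call of the given syscall row."""
    return row["seconds"] / row["calls"]
```

Because `-w` makes strace report wall-clock time rather than system time, the roughly one second per `select` call in the figures above would suggest each call blocked for about a second instead of returning immediately. That is consistent with the CPU being consumed somewhere other than in `select` itself, though it does not prove it.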
With kernel highres=off I can say we are still observing the 100% CPU behaviour whilst the agent is waiting for a request to complete. A few seconds after the request completes omiagent recovers and goes back to the expected sleep pattern. |
Hi guys, please follow resolution steps in this documentation: https://github.com/microsoft/OMS-Agent-for-Linux/blob/master/docs/Troubleshooting.md#i-see-omiagent-using-100-cpu |
I just worked out the same here. Also note https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1044666, which manifests itself after about 12 hours of running. When you find the thread that's running at 100% and strace it, it repeatedly attempts to locate '/etc/pki/nssdb/cert9.db-wal' and '/etc/pki/nssdb/cert9.db-journal'; the longer the process runs, the more iterations of these checks occur. Their solution is: 'Fix: NSS now avoids calls to sdb_measureAccess in lib/softoken/sdb.c s_open if [environment variable] NSS_SDB_USE_CACHE is "yes"' |
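One way to confirm the repeated lookups described above is to count them in a saved trace. This is a rough sketch, assuming the trace was written to a text file (for example with `strace -f -p <pid> -o trace.txt`) and that the probed path is the first quoted string on each line; `count_path_lookups` is a hypothetical helper.

```python
import re
from collections import Counter

# First double-quoted string on a trace line, typically the path argument.
_FIRST_QUOTED = re.compile(r'"([^"]*)"')


def count_path_lookups(strace_lines, suffixes=("-wal", "-journal")):
    """Count how often paths ending in the given suffixes are probed."""
    counts = Counter()
    for line in strace_lines:
        match = _FIRST_QUOTED.search(line)
        if match and match.group(1).endswith(tuple(suffixes)):
            counts[match.group(1)] += 1
    return counts
```

Passing an open file works because the function only iterates over lines, for example `count_path_lookups(open("trace.txt"))`. Comparing the counts from two traces of equal duration taken a few hours apart would show whether the number of probes grows with process uptime.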
Fixed in microsoft/pal@6c0c108. |
The omiagent process is using 100% cpu. Included pmap output after reviewing the other reports of this type.