omiagent using 100% cpu #615

Closed
ELadner-cvx opened this issue Jan 14, 2019 · 11 comments

@ELadner-cvx

The omiagent process is using 100% CPU. I've included pmap output below after reviewing the other reports of this type.

# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core) 
# rpm -qa | grep omi
omi-1.4.2-5.x86_64
# ps -eaf | grep [1]0441
omsagent  10441   5497 53 Jan10 ?        2-00:03:11 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
# pmap 10441
10441:   /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
0000000000400000    712K r-x-- omiagent
00000000005b1000     44K rw--- omiagent
00000000005bc000    132K rw---   [ anon ]
00000000011bd000  10704K rw---   [ anon ]
00007f6574000000  11544K rw---   [ anon ]
00007f6574b46000  53992K -----   [ anon ]
00007f657c000000  65536K rw---   [ anon ]
00007f6584000000    132K rw---   [ anon ]
00007f6584021000  65404K -----   [ anon ]
00007f658c000000  65508K rw---   [ anon ]
00007f658fff9000     28K -----   [ anon ]
00007f6591ad6000 103592K r---- locale-archive
00007f6598000000   1748K rw---   [ anon ]
00007f65981b5000  63788K -----   [ anon ]
00007f659c000000    968K rw---   [ anon ]
00007f659c0f2000  64568K -----   [ anon ]
00007f65a0000000    132K rw---   [ anon ]
00007f65a0021000  65404K -----   [ anon ]
00007f65a409f000      4K -----   [ anon ]
00007f65a40a0000    252K rw---   [ anon ]
00007f65a40df000      4K -----   [ anon ]
00007f65a40e0000    252K rw---   [ anon ]
00007f65a411f000    160K r-x-- libmicxx.so
00007f65a4147000   1020K ----- libmicxx.so
00007f65a4246000      8K rw--- libmicxx.so
00007f65a4248000    128K rw---   [ anon ]
00007f65a4268000   3308K r-x-- libSCXCoreProviderModule.so
00007f65a45a3000   2048K ----- libSCXCoreProviderModule.so
00007f65a47a3000    160K rw--- libSCXCoreProviderModule.so
00007f65a47cb000     16K rw---   [ anon ]
00007f65a56a0000    220K r-x-- libMSFT_nxServiceResource_root-oms.so
00007f65a56d7000   2048K ----- libMSFT_nxServiceResource_root-oms.so
00007f65a58d7000     16K rw--- libMSFT_nxServiceResource_root-oms.so
00007f65a58db000    128K rw---   [ anon ]
00007f65a58fb000    160K r-x-- libnsspem.so
00007f65a5923000   2048K ----- libnsspem.so
00007f65a5b23000      4K r---- libnsspem.so
00007f65a5b24000      4K rw--- libnsspem.so
00007f65a5b25000      8K r-x-- libnsssysinit.so
00007f65a5b27000   2044K ----- libnsssysinit.so
00007f65a5d26000      4K r---- libnsssysinit.so
00007f65a5d27000      4K rw--- libnsssysinit.so
00007f65a5d28000    524K r-x-- libfreeblpriv3.so
00007f65a5dab000   2044K ----- libfreeblpriv3.so
00007f65a5faa000      8K r---- libfreeblpriv3.so
00007f65a5fac000      4K rw--- libfreeblpriv3.so
00007f65a5fad000     16K rw---   [ anon ]
00007f65a5fb1000    708K r-x-- libsqlite3.so.0.8.6
00007f65a6062000   2044K ----- libsqlite3.so.0.8.6
00007f65a6261000      8K r---- libsqlite3.so.0.8.6
00007f65a6263000     12K rw--- libsqlite3.so.0.8.6
00007f65a6266000    240K r-x-- libsoftokn3.so
00007f65a62a2000   2048K ----- libsoftokn3.so
00007f65a64a2000      4K r---- libsoftokn3.so
00007f65a64a3000      4K rw--- libsoftokn3.so
00007f65a64a4000     20K r-x-- libnss_dns-2.17.so
00007f65a64a9000   2048K ----- libnss_dns-2.17.so
00007f65a66a9000      4K r---- libnss_dns-2.17.so
00007f65a66aa000      4K rw--- libnss_dns-2.17.so
00007f65a66ab000     48K r-x-- libnss_files-2.17.so
00007f65a66b7000   2044K ----- libnss_files-2.17.so
00007f65a68b6000      4K r---- libnss_files-2.17.so
00007f65a68b7000      4K rw--- libnss_files-2.17.so
00007f65a68b8000     24K rw---   [ anon ]
00007f65a68be000      4K -----   [ anon ]
00007f65a68bf000   8192K rw---   [ anon ]
00007f65a70bf000      4K -----   [ anon ]
00007f65a70c0000   8192K rw---   [ anon ]
00007f65a78c0000      4K -----   [ anon ]
00007f65a78c1000   8192K rw---   [ anon ]
00007f65a80c1000      8K r-x-- libfreebl3.so
00007f65a80c3000   2044K ----- libfreebl3.so
00007f65a82c2000      4K r---- libfreebl3.so
00007f65a82c3000      4K rw--- libfreebl3.so
00007f65a82c4000     32K r-x-- libcrypt-2.17.so
00007f65a82cc000   2044K ----- libcrypt-2.17.so
00007f65a84cb000      4K r---- libcrypt-2.17.so
00007f65a84cc000      4K rw--- libcrypt-2.17.so
00007f65a84cd000    184K rw---   [ anon ]
00007f65a84fb000    112K r-x-- libsasl2.so.3.0.0
00007f65a8517000   2044K ----- libsasl2.so.3.0.0
00007f65a8716000      4K r---- libsasl2.so.3.0.0
00007f65a8717000      4K rw--- libsasl2.so.3.0.0
00007f65a8718000     28K r-x-- librt-2.17.so
00007f65a871f000   2044K ----- librt-2.17.so
00007f65a891e000      4K r---- librt-2.17.so
00007f65a891f000      4K rw--- librt-2.17.so
00007f65a8920000    328K r-x-- libldap-2.4.so.2.10.7
00007f65a8972000   2048K ----- libldap-2.4.so.2.10.7
00007f65a8b72000      8K r---- libldap-2.4.so.2.10.7
00007f65a8b74000      4K rw--- libldap-2.4.so.2.10.7
00007f65a8b75000     56K r-x-- liblber-2.4.so.2.10.7
00007f65a8b83000   2044K ----- liblber-2.4.so.2.10.7
00007f65a8d82000      4K r---- liblber-2.4.so.2.10.7
00007f65a8d83000      4K rw--- liblber-2.4.so.2.10.7
00007f65a8d84000    232K r-x-- libnspr4.so
00007f65a8dbe000   2044K ----- libnspr4.so
00007f65a8fbd000      4K r---- libnspr4.so
00007f65a8fbe000      8K rw--- libnspr4.so
00007f65a8fc0000      8K rw---   [ anon ]
00007f65a8fc2000     16K r-x-- libplc4.so
00007f65a8fc6000   2044K ----- libplc4.so
00007f65a91c5000      4K r---- libplc4.so
00007f65a91c6000      4K rw--- libplc4.so
00007f65a91c7000     12K r-x-- libplds4.so
00007f65a91ca000   2044K ----- libplds4.so
00007f65a93c9000      4K r---- libplds4.so
00007f65a93ca000      4K rw--- libplds4.so
00007f65a93cb000    160K r-x-- libnssutil3.so
00007f65a93f3000   2044K ----- libnssutil3.so
00007f65a95f2000     28K r---- libnssutil3.so
00007f65a95f9000      4K rw--- libnssutil3.so
00007f65a95fa000   1168K r-x-- libnss3.so
00007f65a971e000   2048K ----- libnss3.so
00007f65a991e000     20K r---- libnss3.so
00007f65a9923000      8K rw--- libnss3.so
00007f65a9925000      8K rw---   [ anon ]
00007f65a9927000    144K r-x-- libsmime3.so
00007f65a994b000   2044K ----- libsmime3.so
00007f65a9b4a000     12K r---- libsmime3.so
00007f65a9b4d000      4K rw--- libsmime3.so
00007f65a9b4e000    308K r-x-- libssl3.so
00007f65a9b9b000   2044K ----- libssl3.so
00007f65a9d9a000     16K r---- libssl3.so
00007f65a9d9e000      4K rw--- libssl3.so
00007f65a9d9f000      4K rw---   [ anon ]
00007f65a9da0000    160K r-x-- libssh2.so.1.0.1
00007f65a9dc8000   2048K ----- libssh2.so.1.0.1
00007f65a9fc8000      4K r---- libssh2.so.1.0.1
00007f65a9fc9000      4K rw--- libssh2.so.1.0.1
00007f65a9fca000    200K r-x-- libidn.so.11.6.11
00007f65a9ffc000   2044K ----- libidn.so.11.6.11
00007f65aa1fb000      4K r---- libidn.so.11.6.11
00007f65aa1fc000      4K rw--- libidn.so.11.6.11
00007f65aa1fd000     84K r-x-- libgcc_s-4.8.5-20150702.so.1
00007f65aa212000   2044K ----- libgcc_s-4.8.5-20150702.so.1
00007f65aa411000      4K r---- libgcc_s-4.8.5-20150702.so.1
00007f65aa412000      4K rw--- libgcc_s-4.8.5-20150702.so.1
00007f65aa413000   1028K r-x-- libm-2.17.so
00007f65aa514000   2044K ----- libm-2.17.so
00007f65aa713000      4K r---- libm-2.17.so
00007f65aa714000      4K rw--- libm-2.17.so
00007f65aa715000    932K r-x-- libstdc++.so.6.0.19
00007f65aa7fe000   2044K ----- libstdc++.so.6.0.19
00007f65aa9fd000     32K r---- libstdc++.so.6.0.19
00007f65aaa05000      8K rw--- libstdc++.so.6.0.19
00007f65aaa07000     84K rw---   [ anon ]
00007f65aaa1c000    408K r-x-- libcurl.so.4.3.0
00007f65aaa82000   2044K ----- libcurl.so.4.3.0
00007f65aac81000      8K r---- libcurl.so.4.3.0
00007f65aac83000      4K rw--- libcurl.so.4.3.0
00007f65aac84000      4K rw---   [ anon ]
00007f65aac85000    556K r-x-- libomsconfig.so
00007f65aad10000   2048K ----- libomsconfig.so
00007f65aaf10000     48K rw--- libomsconfig.so
00007f65aaf1c000    148K rw---   [ anon ]
00007f65aaf41000    384K r-x-- libpcre.so.1.2.0
00007f65aafa1000   2048K ----- libpcre.so.1.2.0
00007f65ab1a1000      4K r---- libpcre.so.1.2.0
00007f65ab1a2000      4K rw--- libpcre.so.1.2.0
00007f65ab1a3000    144K r-x-- libselinux.so.1
00007f65ab1c7000   2044K ----- libselinux.so.1
00007f65ab3c6000      4K r---- libselinux.so.1
00007f65ab3c7000      4K rw--- libselinux.so.1
00007f65ab3c8000      8K rw---   [ anon ]
00007f65ab3ca000     88K r-x-- libresolv-2.17.so
00007f65ab3e0000   2044K ----- libresolv-2.17.so
00007f65ab5df000      4K r---- libresolv-2.17.so
00007f65ab5e0000      4K rw--- libresolv-2.17.so
00007f65ab5e1000      8K rw---   [ anon ]
00007f65ab5e3000     12K r-x-- libkeyutils.so.1.5
00007f65ab5e6000   2044K ----- libkeyutils.so.1.5
00007f65ab7e5000      4K r---- libkeyutils.so.1.5
00007f65ab7e6000      4K rw--- libkeyutils.so.1.5
00007f65ab7e7000     52K r-x-- libkrb5support.so.0.1
00007f65ab7f4000   2048K ----- libkrb5support.so.0.1
00007f65ab9f4000      4K r---- libkrb5support.so.0.1
00007f65ab9f5000      4K rw--- libkrb5support.so.0.1
00007f65ab9f6000     16K r-x-- libcap-ng.so.0.0.0
00007f65ab9fa000   2048K ----- libcap-ng.so.0.0.0
00007f65abbfa000      4K r---- libcap-ng.so.0.0.0
00007f65abbfb000      4K rw--- libcap-ng.so.0.0.0
00007f65abbfc000     84K r-x-- libz.so.1.2.7
00007f65abc11000   2044K ----- libz.so.1.2.7
00007f65abe10000      4K r---- libz.so.1.2.7
00007f65abe11000      4K rw--- libz.so.1.2.7
00007f65abe12000    100K r-x-- libk5crypto.so.3.1
00007f65abe2b000   2044K ----- libk5crypto.so.3.1
00007f65ac02a000      8K r---- libk5crypto.so.3.1
00007f65ac02c000      4K rw--- libk5crypto.so.3.1
00007f65ac02d000     12K r-x-- libcom_err.so.2.1
00007f65ac030000   2044K ----- libcom_err.so.2.1
00007f65ac22f000      4K r---- libcom_err.so.2.1
00007f65ac230000      4K rw--- libcom_err.so.2.1
00007f65ac231000    868K r-x-- libkrb5.so.3.3
00007f65ac30a000   2044K ----- libkrb5.so.3.3
00007f65ac509000     56K r---- libkrb5.so.3.3
00007f65ac517000     12K rw--- libkrb5.so.3.3
00007f65ac51a000    296K r-x-- libgssapi_krb5.so.2.2
00007f65ac564000   2048K ----- libgssapi_krb5.so.2.2
00007f65ac764000      4K r---- libgssapi_krb5.so.2.2
00007f65ac765000      8K rw--- libgssapi_krb5.so.2.2
00007f65ac767000    120K r-x-- libaudit.so.1.0.0
00007f65ac785000   2044K ----- libaudit.so.1.0.0
00007f65ac984000      4K r---- libaudit.so.1.0.0
00007f65ac985000      4K rw--- libaudit.so.1.0.0
00007f65ac986000     40K rw---   [ anon ]
00007f65ac990000   1800K r-x-- libc-2.17.so
00007f65acb52000   2048K ----- libc-2.17.so
00007f65acd52000     16K r---- libc-2.17.so
00007f65acd56000      8K rw--- libc-2.17.so
00007f65acd58000     20K rw---   [ anon ]
00007f65acd5d000   2256K r-x-- libcrypto.so.1.0.2k
00007f65acf91000   2048K ----- libcrypto.so.1.0.2k
00007f65ad191000    112K r---- libcrypto.so.1.0.2k
00007f65ad1ad000     52K rw--- libcrypto.so.1.0.2k
00007f65ad1ba000     16K rw---   [ anon ]
00007f65ad1be000    412K r-x-- libssl.so.1.0.2k
00007f65ad225000   2048K ----- libssl.so.1.0.2k
00007f65ad425000     16K r---- libssl.so.1.0.2k
00007f65ad429000     28K rw--- libssl.so.1.0.2k
00007f65ad430000     52K r-x-- libpam.so.0.83.1
00007f65ad43d000   2048K ----- libpam.so.0.83.1
00007f65ad63d000      4K r---- libpam.so.0.83.1
00007f65ad63e000      4K rw--- libpam.so.0.83.1
00007f65ad63f000      8K r-x-- libdl-2.17.so
00007f65ad641000   2048K ----- libdl-2.17.so
00007f65ad841000      4K r---- libdl-2.17.so
00007f65ad842000      4K rw--- libdl-2.17.so
00007f65ad843000     92K r-x-- libpthread-2.17.so
00007f65ad85a000   2044K ----- libpthread-2.17.so
00007f65ada59000      4K r---- libpthread-2.17.so
00007f65ada5a000      4K rw--- libpthread-2.17.so
00007f65ada5b000     16K rw---   [ anon ]
00007f65ada5f000    136K r-x-- ld-2.17.so
00007f65ada99000    676K r-x-- libmi.so
00007f65adb42000   1020K ----- libmi.so
00007f65adc41000     32K rw--- libmi.so
00007f65adc49000    176K rw---   [ anon ]
00007f65adc75000     28K r--s- gconv-modules.cache
00007f65adc7c000     16K rw---   [ anon ]
00007f65adc80000      4K r---- ld-2.17.so
00007f65adc81000      4K rw--- ld-2.17.so
00007f65adc82000      4K rw---   [ anon ]
00007fff43b50000    132K rw---   [ stack ]
00007fff43bdb000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total           718444K
@JumpingYang001
Contributor

@ELadner-cvx, the output shows two providers loaded in your omiagent process:
libSCXCoreProviderModule.so is the SCX provider: https://github.com/Microsoft/scxcore
libMSFT_nxServiceResource_root-oms.so appears to be the omsconfig provider: https://github.com/Microsoft/PowerShell-DSC-for-Linux

@dmhendricks

I'm having the same issue. Note that I am running Docker on this instance, and while searching I found that Docker seems to be a common factor in these reports. Should I be running the agent in a container? Any other ideas?

$ cat /etc/os-release | grep VERSION=
VERSION="18.04.2 LTS (Bionic Beaver)"

$ docker -v
Docker version 18.09.3, build 774a1f4

$ ps aux | egrep "(oms|omi)agent"
omsagent  1692  0.2  0.5 1267116 46992 ?       Sl   14:03   0:04 /opt/microsoft/omsagent/ruby/bin/ruby /opt/microsoft/omsagent/bin/omsagent -d /var/opt/microsoft/omsagent/{hash}/run/omsagent.pid -o /var/opt/microsoft/omsagent/{hash}/log/omsagent.log -c /etc/opt/microsoft/omsagent/{hash}/conf/omsagent.conf --no-supervisor
root      4504  0.0  0.1 391268 10792 ?        Sl   14:24   0:00 /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
root     29127  0.0  0.2 219956 20448 ?        Sl   06:45   0:03 python /var/lib/waagent/Microsoft.EnterpriseCloud.Monitoring.OmsAgentForLinux-1.9.1/omsagent.py -telemetry

$ dpkg -s omi
Package: omi
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 4764
Maintainer: Microsoft Corporation
Architecture: amd64
Source: omi
Version: 1.6.0.0
Provides: omi
Depends: libc6 (>= 2.3.6), libpam-runtime (>= 0.79-3)
Conffiles:
 /etc/opt/omi/conf/omiserver.conf {hash}
 /etc/opt/omi/conf/omilogrotate.conf {hash}
Description: Open Management Infrastructure
 omi server

$ dpkg -s omsagent
Package: omsagent
Status: install ok installed
Priority: optional
Section: utils
Installed-Size: 83204
Maintainer: Microsoft Corporation
Architecture: amd64
Source: omsagent
Version: 1.9.0.0
Provides: omsagent
Depends: omi (>= 1.3.0.2), scx (>= 1.6.3.212)
Conffiles:
 /etc/opt/microsoft/omsagent/sysconf/installinfo.txt {hash}
Description: Microsoft Operations Management Suite for UNIX/Linux agent
 Provides agent for the Microsoft Operations Management Suite.

@JumpingYang001
Contributor

@dmhendricks Whether on Docker or a regular machine, this issue seems to be affecting some customers' machines this year; we have seen many similar high-CPU ICM items. The related teams (the OMSAgent team and the DSC team) are investigating.

@abenbachir

abenbachir commented Apr 3, 2019

@JumpingYang001 @ELadner-cvx I'm investigating a similar issue. When analysing pmap, the only provider I can see is libSCXCoreProviderModule.so. Do you think this is related to libSCXCoreProviderModule.so?

Also, strace shows 94% of the CPU time being taken by the "select" system call, and I noticed this tight loop in sock/selector.c.

The loop condition ( -1 == r ) && ( errno == EINTR ) is true when the system call was interrupted by a signal:

static int _Select(
    fd_set* readSet,
    fd_set* writeSet,
    fd_set* exceptSet,
    MI_Uint64 timeoutUsec,
    MI_Boolean* keepRunning)
{
...
    do
    {
        r = select(n, readSet, writeSet, exceptSet, _tv);
    }
    while( (*keepRunning == MI_TRUE) && ( -1 == r ) && ( errno == EINTR ) );

    return r;
}
4992:   /opt/omi/bin/omiagent 9 10 --destdir / --providerdir /opt/omi/lib --loglevel WARNING
0000000000400000    712K r-x-- omiagent
00000000006b1000     40K rw--- omiagent
...
00007f846383f000   3308K r-x-- libSCXCoreProviderModule.so
00007f8463b7a000   2048K ----- libSCXCoreProviderModule.so
00007f8463d7a000    160K rw--- libSCXCoreProviderModule.so

@sarojcare
Contributor

The select system call in the code snippet above is a blocking call that waits for an event to occur, so this code segment should not be the issue.

@abenbachir
Copy link

There are no debug symbols, so there is no way to get OMI call stacks.

@cradockc

cradockc commented May 9, 2019

I have the same issue here. When observing the process whilst it exhibits this issue I note the following:

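strace -cwp 124261
strace: Process 124261 attached
... (some 10-15 seconds later)
^Cstrace: Process 124261 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00   11.303628     1027603        11           select
  0.00    0.000149          14        11           getppid
------ ----------- ----------- --------- --------- ----------------
100.00   11.303777                    22           total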
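The usecs/call column is the total seconds divided by the number of calls. Because `-w` makes strace summarize wall-clock time per call rather than system CPU time, this works out to each traced select() call lasting about one second:

```latex
\frac{11.303628\ \text{s}}{11\ \text{calls}} \approx 1.0276\ \text{s/call} \approx 1027603\ \mu\text{s/call},
\qquad
\frac{0.000149\ \text{s}}{11\ \text{calls}} \approx 14\ \mu\text{s/call}
```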

So I don't think the issue is the omiagent code as such, but rather how the system reacts to the select call.
When high-resolution timers were first introduced, there was some talk of select and highres timers not playing well together (cf. https://www.centos.org/forums/viewtopic.php?t=54235). So I'm testing a couple of servers with highres=off to see whether this resolves the immediate issue.

@cradockc

With kernel highres=off, we still observe the 100% CPU behaviour while the agent is waiting for a request to complete, but a few seconds after the request completes, omiagent recovers and goes back to the expected sleep pattern.
With kernel highres=on, it does not appear to recover, and the only way to bring the agent back under control is to send it an interrupt (kill) signal or restart the entire service.

@abenbachir

Hi all, please follow the resolution steps in this documentation: https://github.com/microsoft/OMS-Agent-for-Linux/blob/master/docs/Troubleshooting.md#i-see-omiagent-using-100-cpu

@cradockc

I just worked out the same here.

Also note https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1044666, which manifests itself after about 12 hours of running. If you find the thread that's running at 100% and strace it, you'll see it repeatedly trying to locate '/etc/pki/nssdb/cert9.db-wal' and '/etc/pki/nssdb/cert9.db-journal'; the longer the process runs, the more iterations of these checks occur. Their solution is: 'Fix: NSS now avoids calls to sdb_measureAccess in lib/softoken/sdb.c s_open if [environment variable] NSS_SDB_USE_CACHE is "yes" '

@JumpingYang001
Contributor

Fixed in microsoft/pal@6c0c108.
