tallow eating 100% of 1 CPU thread #15

Open
grahamwhaley opened this issue Nov 1, 2019 · 9 comments

Comments

@grahamwhaley

I noticed in top that tallow was consuming 100% of a cpu thread.

A gdb attach shows its stack as:

(gdb) where
#0  0x00007fa3ae02fd4c in ?? () from /usr/lib64/libsystemd.so.0
#1  0x00007fa3ae0301be in ?? () from /usr/lib64/libsystemd.so.0
#2  0x00007fa3ae03e057 in sd_journal_get_data () from /usr/lib64/libsystemd.so.0
#3  0x0000555ba651b775 in ?? ()
#4  0x00007fa3ae1472c3 in __libc_start_main () from /usr/lib64/haswell/libc.so.6
#5  0x0000555ba651baee in ?? ()

A continue/stop then showed it as:

(gdb) where
#0  0x00007fa3ae0ca578 in ?? () from /usr/lib64/libpcre.so.1
#1  0x00007fa3ae0db08b in pcre_exec () from /usr/lib64/libpcre.so.1
#2  0x0000555ba651b801 in ?? ()
#3  0x00007fa3ae1472c3 in __libc_start_main () from /usr/lib64/haswell/libc.so.6
#4  0x0000555ba651baee in ?? ()

The tallow journal looks like:

 # journalctl -u tallow
-- Logs begin at Thu 2019-10-31 10:18:08 GMT, end at Fri 2019-11-01 17:28:18 GMT. --
Oct 31 13:57:04 skull tallow[216312]: Journal was rotated, resetting
Oct 31 18:00:40 skull systemd[1]: Stopping Tallow Service...
Oct 31 18:00:40 skull systemd[1]: tallow.service: Succeeded.
Oct 31 18:00:40 skull systemd[1]: Stopped Tallow Service.
-- Reboot --
Nov 01 09:53:49 skull systemd[1]: Started Tallow Service.
Nov 01 09:53:49 skull tallow[397]: /usr/share/tallow/sshd.json: 10 patterns
Nov 01 09:53:49 skull tallow[397]: Skipped reading /etc/tallow: No such file or directory
Nov 01 09:53:49 skull tallow[397]: Loaded 10 patterns total
Nov 01 09:53:49 skull tallow[397]: tallow 18 Started
Nov 01 10:06:34 skull tallow[397]: Journal was rotated, resetting
Nov 01 10:46:00 skull systemd[1]: Stopping Tallow Service...
Nov 01 10:46:00 skull systemd[1]: tallow.service: Succeeded.
Nov 01 10:46:00 skull systemd[1]: Stopped Tallow Service.
Nov 01 10:46:00 skull systemd[1]: Started Tallow Service.
Nov 01 10:46:00 skull tallow[134447]: /usr/share/tallow/sshd.json: 10 patterns
Nov 01 10:46:00 skull tallow[134447]: Skipped reading /etc/tallow: No such file or directory
Nov 01 10:46:00 skull tallow[134447]: Loaded 10 patterns total
Nov 01 10:46:00 skull tallow[134447]: tallow 18 Started
Nov 01 14:44:35 skull tallow[134447]: Journal was rotated, resetting

The only 'interesting' thing on this machine is that it is running a single node k8s cluster.

The machine is:

# cat /etc/os-release
NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
ID_LIKE=clear-linux-os
VERSION_ID=31460
PRETTY_NAME="Clear Linux OS"
ANSI_COLOR="1;35"
HOME_URL="https://clearlinux.org"
SUPPORT_URL="https://clearlinux.org"
BUG_REPORT_URL="mailto:dev@lists.clearlinux.org"
PRIVACY_POLICY_URL="http://www.intel.com/privacy"

@chuckn408

What happens when you kill tallow?
- Does another process rise in CPU?
- Does idle CPU increase?
- What does your process tree look like when this occurs?
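
To answer questions like these with numbers rather than a glance at top, the CPU time of a single process can be sampled from /proc. The following is only a rough sketch, assuming a Linux /proc filesystem; `cpu_ticks`, `cpu_percent` and `sample` are made-up helper names, not part of tallow.

```python
import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before reading the numeric fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3 in proc(5));
    # utime and stime are fields 14 and 15, i.e. indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before: int, ticks_after: int, seconds: float, hz: int) -> float:
    """CPU use as a percentage of one thread over the sampling interval."""
    return 100.0 * (ticks_after - ticks_before) / (hz * seconds)


def sample(pid: int, seconds: float = 1.0) -> float:
    """Sample one process twice and report its CPU use in between."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    with open(f"/proc/{pid}/stat") as f:
        before = cpu_ticks(f.read())
    time.sleep(seconds)
    with open(f"/proc/{pid}/stat") as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, seconds, hz)
```

Calling `sample(pid)` with the tallow PID before and after a restart would show whether the process really sits near 100% of one thread.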

@grahamwhaley
Author

Oh, that was months ago ;-), and I don't think I've (knowingly) seen it since.
iirc, I killed/restarted tallow, and it behaved again. I think we considered whether it had something to do with log rotation/wrap at the time - maybe @ahkok remembers or has some further ideas...
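
One way to check the log-rotation idea would be to pull the rotation messages out of `journalctl -u tallow` output and compare their times with when the CPU spike began. This is a rough sketch only, assuming the short journalctl line format shown in the original report; `rotation_events` is a made-up helper name.

```python
import re

# Matches lines such as:
#   Oct 31 13:57:04 skull tallow[216312]: Journal was rotated, resetting
ROTATION = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ tallow\[(\d+)\]: "
    r"Journal was rotated, resetting$"
)


def rotation_events(journal_text: str) -> list:
    """Return (timestamp, pid) pairs for each rotation message found."""
    events = []
    for line in journal_text.splitlines():
        match = ROTATION.match(line.strip())
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events
```

If the spike consistently starts right after one of the returned timestamps, that would support the rotation theory; if not, it would point elsewhere.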

@ahkok
Contributor

ahkok commented Feb 18, 2020

I encountered the issue myself a few months back, which is why I'm keeping this open. I have not yet determined whether the bug is gone now (e.g. due to some of the recent large changes).

@NicolaiThagaard

Hi. I came across the exact same issue as @grahamwhaley did. The problem was solved by killing and restarting tallow. I have noticed it happens when you stay in an SSH session for too long (over 30 hours; don't ask me why I was in a session for so long). This issue is still a problem on the newest Clear Linux OS, and I have seen the error on my system multiple times now. It would be really nice if you could take a look at it ;-)

My machine:

NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
ID_LIKE=clear-linux-os
VERSION_ID=32820
PRETTY_NAME="Clear Linux OS"
ANSI_COLOR="1;35"
HOME_URL="https://clearlinux.org"
SUPPORT_URL="https://clearlinux.org"
BUG_REPORT_URL="mailto:dev@lists.clearlinux.org"
PRIVACY_POLICY_URL="http://www.intel.com/privacy"
BUILD_ID=32820
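
Until the root cause is found, the kill-and-restart workaround described in this thread could be automated. The sketch below is a stopgap under stated assumptions, not a fix: the threshold and sample count are arbitrary, `should_restart` and `restart_tallow` are made-up names, and the unit name `tallow.service` is taken from the journal output in the original report.

```python
import subprocess


def should_restart(samples, threshold=95.0, min_consecutive=5):
    """True when the last `min_consecutive` CPU samples are all at or above threshold."""
    if len(samples) < min_consecutive:
        return False
    return all(s >= threshold for s in samples[-min_consecutive:])


def restart_tallow():
    """Restart the service through systemd (needs sufficient privileges)."""
    subprocess.run(["systemctl", "restart", "tallow.service"], check=True)
```

Requiring several consecutive high samples avoids restarting on a short, legitimate burst of work, such as tallow re-reading the journal after a rotation.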

@gybfefe

gybfefe commented Jan 18, 2022

I can second that in 2022 on Clear Linux: the symptom is that tallow restarts, who knows why, and then takes 100% of its little thread. Thank God CPUs have more cores nowadays! :)

@erkexzcx

erkexzcx commented Feb 11, 2022

Not good

image

Also not good

image

Not good as well (average is about 77 °C), Intel NUC i7.

image

EDIT: Intel NUC server at home, no firewalls, no access from the outside (only 80/443 ports for obvious reasons).

@geckolinux

Same issue here. I noticed performance was a bit off on a huge Ruby batch job that I run frequently, and it turns out tallow was using an entire core. This is on a VPS that I access over SSH. I had been running the job during the past ~72 hours, but I recently rebooted the VPS and started the job again; this is the first time I've noticed tallow running away. I have noticed performance inconsistencies in the past too, so I'll keep an eye on it.

@Renegade-Master

I also noticed tallow eating up a core's worth of CPU. I sent a SIGTERM to the process, and when it restarted it was using "normal" resources.

Information (reduced):

$ cat /etc/os-release
NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
VERSION_ID=39050
BUILD_ID=39050
$ lscpu
Architecture:           x86_64
  CPU op-mode(s):       32-bit, 64-bit
  Address sizes:        39 bits physical, 48 bits virtual
  Byte Order:           Little Endian
CPU(s):                 2
  On-line CPU(s) list:  0,1
Vendor ID:              GenuineIntel
  Model name:           Intel(R) Celeron(R) CPU G3930 @ 2.90GHz
    Thread(s) per core: 1
    Core(s) per socket: 2
    Socket(s):          1
    Stepping:           9

@chriselrod

I just observed this as well

$ cat /etc/os-release 
NAME="Clear Linux OS"
VERSION=1
ID=clear-linux-os
ID_LIKE=clear-linux-os
VERSION_ID=39930
PRETTY_NAME="Clear Linux OS"
ANSI_COLOR="1;35"
HOME_URL="https://clearlinux.org"
SUPPORT_URL="https://clearlinux.org"
BUG_REPORT_URL="mailto:dev@lists.clearlinux.org"
PRIVACY_POLICY_URL="http://www.intel.com/privacy"
BUILD_ID=39930
$ lscpu 
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  36
  On-line CPU(s) list:   0-35
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  18
    Socket(s):           1
    Stepping:            7
    CPU(s) scaling MHz:  72%
    CPU max MHz:         4800.0000
    CPU min MHz:         1200.0000
