[stemcells>3312] syslogd on all VMs has memory leak #1537
We're observing syslogd using an excessive amount of memory on all stemcells >3312 (i.e. 3312.x series). This seems to be related to the version of rsyslogd, which got updated from
This prevents us from using most recent 3312.x stemcells which contain important security fixes.
dpkg_l.txt for 3312 stemcell has:
e.g. 3312.8 has:
This update probably happened because the base_os step to install packages on Ubuntu uses a PPA to install rsyslog which just contains 'latest v8 version' of rsyslog.
Here are some numbers from random VMs showing that rsyslogd consumes far more memory in version 8.23.0 than it did in version 8.22.0:
Interestingly enough, most CF components continue to work. There are, however, other services that don't cope well when co-located with a memory hog.
Our 00-forwarder.conf contains:
which is basically what metron agent configures. I.e. we send logs via TCP.
We executed the following script on 2 VMs with stemcell 3312.7 and 3312:
Additional observation: our remote TCP log endpoint (an ELK) is not stable and seems to reset connections quite often.
Using stemcell 3312.7 (rsyslogd 8.23.0) we see increasing memory consumption and the following message in /var/log/syslog:
On stemcell 3312 (rsyslogd 8.22.0), memory consumption is stable and we don't see the messages above in /var/log/syslog.
Therefore our conclusion is that there is a memory leak in rsyslog 8.23.0 when the remote syslog endpoint is not stable.
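For anyone who wants to repeat the comparison, here is a minimal sketch of one way to sample a process's resident memory over time on Linux. It is not the script used above. The function names, the sampling interval, and the 1 MiB growth threshold are assumptions made for illustration; it reads the `VmRSS` field from `/proc/<pid>/status`.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid, samples=5, interval_s=60.0):
    """Yield (unix_time, rss_kib) pairs for a running process (Linux only)."""
    status_path = Path(f"/proc/{pid}/status")
    for i in range(samples):
        yield time.time(), parse_vmrss_kib(status_path.read_text())
        if i < samples - 1:
            time.sleep(interval_s)


def is_steadily_growing(rss_values, min_growth_kib=1024):
    """True if no sample is lower than the previous one and the total
    growth is at least min_growth_kib (an arbitrary threshold)."""
    if len(rss_values) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(rss_values, rss_values[1:]))
    return never_shrinks and (rss_values[-1] - rss_values[0]) >= min_growth_kib
```

To use it, pass the rsyslogd PID (for example from `pidof rsyslogd`) to `sample_rss` and feed the collected RSS values to `is_steadily_growing`. A short run can only show growth, not prove a leak, so sampling over a longer window on both stemcells gives a fairer comparison.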
Thank you for this detailed investigation.
We see that version 8.24 was recently released, but the changelog for that version makes no mention of a fix for the issue.
What I have now to reproduce the problem on bosh-lite:
Then on that vm run a command to produce log entries:
Let me know if this is similar to what you're seeing.
rsyslog is configured to buffer, but in one of our environments, what happens is that
tries to buffer to
Resulting in permission errors:
The way to confirm that this is the case is to validate that the rsyslogd user is
Rsyslog was pinned back to 8.22 shortly after this issue was filed, and has remained at that version in subsequent stemcell series.
On Tue, Oct 10, 2017 at 1:42 PM Paul Nath wrote:
> What is the status of this issue? Has this been resolved?
We'd like to propose releasing said version pin.
We (the team currently maintaining
The issue is that "slow increase in allocated memory" isn't necessarily a memory leak, especially if the memory allocation:
This is what we believe was actually happening in the replication case using HAProxy above. Without a functioning backend, HAProxy accepts and then quickly kills TCP connections, never actually accepting data. When we ran the replication case for longer than ten minutes, the memory usage stabilized and held steady at around 1.4% of available memory. Rsyslog is configured to enqueue actions that haven't been completed in a linked list. If all the replication shows is the growth of action queues, we'd expect the memory reservation to drop as soon as a single good connection was made, and when we performed that experiment, that's exactly what we saw.
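The difference between queue growth and a leak can be illustrated with a toy model. The sketch below is not rsyslog's implementation: `ToyActionQueue` and `simulate` are hypothetical names, the queue is unbounded, and "delivery" simply empties it. It only shows the shape of the behaviour described above, where pending messages pile up while the destination is down and are released on the first good connection.

```python
from collections import deque


class ToyActionQueue:
    """Toy in-memory queue of undelivered messages (illustrative only)."""

    def __init__(self):
        self.pending = deque()

    def submit(self, message, destination_up):
        """Enqueue a message; if the destination is reachable, flush the
        whole queue and return how many messages were delivered."""
        self.pending.append(message)
        if destination_up:
            delivered = len(self.pending)
            self.pending.clear()
            return delivered
        return 0


def simulate(availability):
    """Return the queue depth after each tick, given up/down flags per tick."""
    queue = ToyActionQueue()
    depths = []
    for tick, destination_up in enumerate(availability):
        queue.submit(f"msg-{tick}", destination_up)
        depths.append(len(queue.pending))
    return depths


# Five ticks with the endpoint down, one good connection, then down again.
print(simulate([False] * 5 + [True] + [False] * 2))
# prints [1, 2, 3, 4, 5, 0, 1, 2]
```

A real leak would look different: the retained memory would not return to baseline after a successful delivery. The model leaves out queue size limits, which would account for a plateau rather than unbounded growth.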
Not only that, but the issue "replicates" exactly the same way using the reported method with both the "before/unaffected"
Loading the impstats module will show you the queue status. Even a short connect/disconnect to HAProxy can let rsyslog send some messages (and with TCP, it won't know that they don't get through; you need to use RELP to be sure of delivery), so this would look very different with different volumes of logs being sent.
@mfine30 FYI, the team did some previous investigation into bumping rsyslog on the 3469.x through story #155528288. The intent was to let people test out that stemcell before bumping it everywhere, but I don't think we ended up getting much feedback on it.
For what it's worth, we're still installing this same rsyslog in the Xenial stemcell (which is probably bad)...
$ wget -qO- https://s3.amazonaws.com/bosh-aws-light-stemcells/light-bosh-stemcell-97-aws-xen-hvm-ubuntu-xenial-go_agent.tgz | tar -xOzf- packages.txt | grep rsyslog
ii  rsyslog              8.22.0-0adiscon1trusty1  amd64  a rocket-fast system for log processing
ii  rsyslog-gnutls       8.22.0-0adiscon1trusty1  amd64  TLS protocol support for rsyslog
ii  rsyslog-mmjsonparse  8.22.0-0adiscon1trusty1  amd64  Parsing/handling of CEE/Lumberjack JSON messages in rsyslog
ii  rsyslog-relp         8.22.0-0adiscon1trusty1  amd64  RELP protocol support for rsyslog
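A check like this could be scripted when comparing several stemcells. The following is a rough sketch that assumes `packages.txt` uses `dpkg -l` style lines as in the output above. The helper names are made up for illustration, and the sample listing is copied from that output.

```python
def parse_dpkg_lines(listing):
    """Map package name -> version for installed ('ii') lines of dpkg -l style text."""
    versions = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


def upstream_version(debian_version):
    """Drop an optional epoch ('N:') and the Debian revision (after the last '-')."""
    without_epoch = debian_version.split(":", 1)[-1]
    return without_epoch.rsplit("-", 1)[0]


# Sample taken from the packages.txt output shown in this thread.
SAMPLE = """\
ii rsyslog 8.22.0-0adiscon1trusty1 amd64 a rocket-fast system for log processing
ii rsyslog-gnutls 8.22.0-0adiscon1trusty1 amd64 TLS protocol support for rsyslog
ii rsyslog-relp 8.22.0-0adiscon1trusty1 amd64 RELP protocol support for rsyslog
"""

versions = parse_dpkg_lines(SAMPLE)
print(upstream_version(versions["rsyslog"]))  # prints 8.22.0
```

The `trusty1` suffix in the revision suggests the package was built for Trusty even though the stemcell is Xenial, which a check like this would make easy to spot across stemcell versions.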
We may want to consider if this is something we should bump in a minor version, or if it warrants a full new version.