
Excessive amount of logs in audit.log on CIS-hardened systems caused by grafana-agent accessing log files inside LXD containers #52

Open
przemeklal opened this issue Jan 31, 2024 · 5 comments
Labels: bug (Something isn't working), Priority: High

@przemeklal (Member)

Bug Description

On a CIS-hardened (level 2) Charmed OpenStack control node hosting 25 LXD containers running OpenStack control plane services, installing and running grafana-agent inside those containers caused massive amounts of logs to be written to audit.log at the host level (12 GB in less than a day, after which the host ran out of disk space).

Pretty much all of these "new" entries in audit.log are reports of grafana-agent accessing /var/log/.../*.log files.

Typical entries in audit.log look like this one:

type=AVC msg=audit(1706697973.264:19350496): apparmor="DENIED" operation="capable" namespace="root//lxd-juju-5f7845-5-lxd-0_<var-snap-lxd-common-lxd>" profile="snap.grafana-agent.grafana-agent" pid=1591373 comm="agent" capability=2  capname="dac_read_search"
type=SYSCALL msg=audit(1706697973.264:19350496): arch=c000003e syscall=262 success=yes exit=0 a0=ffffffffffffff9c a1=c0054dc030 a2=c0040e5148 a3=0 items=1 ppid=4083 pid=1591373 auid=4294967295 uid=100000 gid=100000 euid=100000 suid=100000 fsuid=100000 egid=100000 sgid=100000 fsgid=100000 tty=(none) ses=4294967295 comm="agent" exe="/snap/grafana-agent/16/agent" key=(null)
type=CWD msg=audit(1706697973.264:19350496): cwd="/var/snap/grafana-agent/16"
type=PATH msg=audit(1706697973.264:19350496): item=0 name="/var/log/aodh/aodh-evaluator.log" inode=229979748 dev=fc:00 mode=0100644 ouid=100113 ogid=100119 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PROCTITLE msg=audit(1706697973.264:19350496): proctitle=2F736E61702F67726166616E612D6167656E742F31362F6167656E74002D636F6E6669672E657870616E642D656E76002D636F6E6669672E66696C65002F6574632F67726166616E612D6167656E742E79616D6C
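For readability, the hex-encoded proctitle in the last record can be decoded to confirm which process generated these entries. A minimal sketch, assuming the standard auditd tooling is available on the host (the event id 19350496 is taken from the sample above):

ausearch -a 19350496 -i   # interpret the whole event, including proctitle
# or decode the hex string directly:
echo 2F736E61702F67726166616E612D6167656E742F31362F6167656E74 | xxd -r -p; echo

Either way this resolves to /snap/grafana-agent/16/agent started with -config.file /etc/grafana-agent.yaml, i.e. the grafana-agent snap inside the container.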

Logs from /var/log/aodh/aodh-evaluator.log (and all other files logged in audit.log) are searchable in Loki and everything else looks fine. There aren't any related errors being reported in the logs of the grafana-agent running inside the LXD.

Additionally, not all files accessed by grafana-agent in the LXDs are reported in audit.log at the host level. The main difference seems to be the ownership of the log files and directories. For example, I see many logs reporting /var/log/aodh/*.log files, /var/log/barbican/*.log files, etc., but nothing about /var/log/juju/*.log or /var/log/syslog.

Their ownership is as follows:

# no entries in audit.log for these
227484779 drwxr-xr-x   2 syslog    adm             4.0K Jan 29 00:08 juju         
227484432 -rw-r-----   1 syslog    adm             2.6M Jan 31 12:43 syslog           
# these are being reported constantly
227484930 drwxr-x---   2 barbican  adm              12K Jan 31 00:00 barbican         
227484609 -rw-rw-r--   1 root      utmp            286K Jan 31 12:34 lastlog     
227484694 -rw-rw----   1 hacluster haclient           0 Dec 25  2022 pacemaker.log
227484449 -rw-rw-r--   1 root      utmp            183K Jan 31 12:34 wtmp

It seems that as long as files are owned by syslog:adm, grafana-agent's syscalls are not recorded. Accessing files owned by the root, barbican (an OpenStack service user), or hacluster users results in massive amounts of audit logs.

This may or may not be related to group membership of these user accounts:

root@juju-5f7845-5-lxd-1:/var/log# groups barbican 
barbican : barbican
root@juju-5f7845-5-lxd-1:/var/log# groups hacluster 
hacluster : haclient
root@juju-5f7845-5-lxd-1:/var/log# groups root
root : root sudo lxd
root@juju-5f7845-5-lxd-1:/var/log# groups syslog
syslog : syslog adm tty
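To check how this hypothesis holds up across the whole log, a per-file summary of the audit events can be pulled on the host. A rough sketch, assuming auditd's aureport/ausearch tools are installed there:

# files referenced in audit events, busiest first
aureport -f -i --summary
# count only the AppArmor denials for the dac_read_search capability from the grafana-agent profile
ausearch -m AVC -i | grep 'snap.grafana-agent.grafana-agent' | grep -c dac_read_search

If the ownership theory is right, the file summary should be dominated by the aodh/, barbican/, etc. paths and contain none of the syslog:adm-owned ones.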

This massive audit.log spam may have catastrophic results. For example, if the CIS rule "4.1.2.3 Ensure system is disabled when audit logs are full" is in place, in the worst case it may simply shut down the system after running out of space on the /var/log/audit partition.
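For context, that CIS rule is usually implemented through auditd's disk-space actions; the values below are an illustrative sketch of a typical level 2 configuration, not the exact settings from the affected host:

# /etc/audit/auditd.conf (excerpt)
space_left_action = email            # warn when the audit partition starts filling up
action_mail_acct = root
admin_space_left_action = halt       # with this set, exhausting /var/log/audit halts the system

With admin_space_left_action = halt (or disk_full_action set similarly), the log spam described above is enough to take the node down.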

The issue doesn't occur with filebeat, for example, so it might also be related to grafana-agent being a snap.

Is there anything that can be tweaked in grafana-agent snap that could help with this?

Also, my recommendation is to avoid relating grafana-agent to Loki in any CIS-hardened deployments until this is resolved.

To Reproduce

Deploy grafana-agent in any OpenStack control-plane LXD container running on a CIS-hardened host, relate it to Loki, and watch /var/log/audit/audit.log.
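A rough command sequence for this, assuming machine 0 is the CIS-hardened host, the control-plane application inside the container is called inner, and a Loki endpoint is already offered/consumed as loki (all names are placeholders):

juju deploy grafana-agent
juju relate grafana-agent inner    # subordinate gets installed inside the LXD container
juju relate grafana-agent loki
# then, on the host:
sudo tail -f /var/log/audit/audit.log | grep snap.grafana-agent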

Environment

CIS-hardened Ubuntu 20.04

grafana-agent  0.35.4         16     latest/stable  0x12b       -

Charmed Openstack focal/ussuri

Relevant log output

Snippets posted above

audit.log file sizes for the sake of completeness:
-r--r----- 1 root adm  9.3G Jan 30 00:00 audit.log-20240130_000001
-rw-r----- 1 root adm     0 Jan 30 00:00 audit.log.1
-rw-r----- 1 root adm  2.1G Jan 30 06:28 audit.log

Additional context

This is a potential blocker for grafana-agent deployments on CIS-hardened clouds.

dstathis added the bug (Something isn't working) and Priority: High labels on Feb 12, 2024
@dstathis (Contributor)

Can you please provide steps for deploying a usable cis_hardened (level2) system with juju?

@dstathis (Contributor) commented Mar 5, 2024

Hi @przemeklal, I managed to relate grafana agent to a cis hardened system (cis_level2_server) and I did not see this behavior. Could you see if you can reproduce the issue?

@przemeklal (Member, Author)

Deploying g-agent to a cis-hardened machine is not enough to reproduce it. You should also deploy LXCs on that machine and relate g-agent to the apps running in these containers:

juju deploy ch:ubuntu inner --to lxd:0 --series focal  # where 0 is your cis-hardened machine id
juju relate grafana-agent inner

Once g-agents inside these LXDs try to access stuff in /var/log, auditd log spam starts one level below, on the LXD "host".
FWIW, LXDs should also be hardened but it would be interesting to see what happens if they're not.

dstathis self-assigned this Mar 21, 2024
@dstathis (Contributor)

I am still having issues reproducing this nested lxd setup. If someone could reach out and schedule some time to walk me through it that would be very helpful.

@dstathis (Contributor)

The issue seems to be solved by the classically confined snap. Waiting for approval in the Snap Store. https://forum.snapcraft.io/t/classic-version-for-the-grafana-agent-snap/40378?u=dstathis
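Once the classic revision is published, switching to it should presumably be a matter of something like the following (channel and flags are an assumption until the store request is approved, and the charm normally manages the snap itself):

sudo snap remove grafana-agent
sudo snap install grafana-agent --classic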
