axosyslog:4.7.1 on CentOS 7: iv_fd_epoll_timerfd_poll: got error 1[Operation not permitted] #85

Open
mstopa-splunk opened this issue May 6, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@mstopa-splunk

Hi team, do you support axosyslog:4.7.1 on CentOS 7?

We have the following error on start after upgrading:

sudo podman run ghcr.io/axoflow/axosyslog:4.7.1
iv_fd_epoll_timerfd_poll: got error 1[Operation not permitted]
iv_fd_epoll_timerfd_poll: got error 1[Operation not permitted]
@ikheifets-splunk

ikheifets-splunk commented May 6, 2024

@bazsi @MrAnno it seems that CentOS 7 will reach EOL on 30 June 2024

@ikheifets-splunk

ikheifets-splunk commented May 7, 2024

After talking with Balazs, I understood that setting IV_EXCLUDE_POLL_METHOD="epoll-timerfd epoll" for ivykis can solve the problem.

We tested this on typical AWS EC2 instances running CentOS 7 (7.9.2009, kernel 3.10.0).

P.S. Strangely, syslog-ng 4.6 works fine under the same OS.

@bazsi
Member

bazsi commented May 8, 2024

Hmm.. there was an update to ivykis 0.43 which contains this patch:

buytenh/ivykis@491daf4

This should transparently fall back to using the older syscall as long as it returns ENOSYS (no such system call). But maybe this is not the case? Can you check with strace that it indeed invokes epoll_pwait2() and that it is returning EPERM instead of ENOSYS?

Quoting the manpage:

       epoll_pwait2() was added in Linux 5.11.
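
As a complement to the strace check, a small probe can show which errno the environment returns for epoll_pwait2(). This is only a sketch and is not part of ivykis or axosyslog: the function name `probe_epoll_pwait2` is made up for illustration, the fallback syscall number 441 is an assumption for headers that predate the syscall, and passing a `struct timespec` assumes a 64-bit Linux target where it matches the kernel's layout.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Assumption: older headers (e.g. on CentOS 7) do not define this number. */
#ifndef SYS_epoll_pwait2
#define SYS_epoll_pwait2 441
#endif

/* Hypothetical helper: returns 0 if epoll_pwait2() worked, the errno it
 * failed with otherwise, or -1 if no epoll instance could be created. */
int probe_epoll_pwait2(void)
{
    struct epoll_event ev;
    struct timespec zero = { 0, 0 };   /* zero timeout: return immediately */
    int epfd = epoll_create1(0);
    long ret;
    int err;

    if (epfd < 0)
        return -1;

    /* Raw syscall, so this does not depend on a libc wrapper being present. */
    ret = syscall(SYS_epoll_pwait2, epfd, &ev, 1, &zero, NULL, (size_t)0);
    err = (ret < 0) ? errno : 0;

    close(epfd);
    return err;
}
```

Called from a small `main()` inside the container, a result of `ENOSYS` would mean the kernel simply lacks the syscall, while `EPERM` would point at a filtering policy in the container runtime.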

@bazsi
Member

bazsi commented May 8, 2024

Just confirmed that when running in a non-privileged container under CentOS 7, we get EPERM instead of ENOSYS.

That's why the autodetection for epoll_pwait2() does not work.

@rjha-splunk

Thanks @bazsi for checking this. Looks like we have a workaround then.

buytenh added a commit to buytenh/ivykis that referenced this issue May 16, 2024
Commit 491daf4 ("iv_fd_epoll: Add support for epoll_pwait2().")
added support for epoll_pwait2(), with a fallback to epoll_wait() in
case epoll_pwait2() is not supported by the kernel we are running on,
which would be indicated by epoll_pwait2() returning -ENOSYS.

Some reports (e.g. axoflow/axosyslog#85,
#33 (comment))
suggest that some container technologies can cause -EPERM to be
returned for epoll_pwait2(), independently of whether or not
epoll_pwait2() is actually supported by the kernel we are running on,
and this trips us up because we don't currently handle -EPERM
gracefully, as we did not expect that we would have to do so.

Making system calls return -EPERM to indicate that they were filtered
out by a security policy framework seems somewhat dubious, especially
when considering the amount of application and user confusion generated
by system calls that are not documented as being able to fail with
-EPERM now suddenly being able to fail with -EPERM, but there is not
much we can do about this.

I would be against adding EPERM-as-ENOSYS fallbacks for every current
or future case where we handle ENOSYS, but:

1. it seems that this is the only case where this triggers;

2. upstream seems to agree that this EPERM behavior is a bug (see
   e.g. these links dug up by László Várady:
   containers/common#573,
   containers/podman#10337,
   opencontainers/runtime-spec#1087), so
   there will hopefully be no new cases of this in the future;

3. there's at least one container technology release (podman on
   CentOS 7) where this bug triggers and where the platform is
   sufficiently old to no longer be receiving updates, as pointed
   out by Balazs Scheidler, so this issue can't be fixed by users
   updating their container software.

Under these circumstances, adding a workaround on our end seems
reasonable, and this commit does so.

This issue was originally reported by @mstopa-splunk on GitHub.
Workaround originally by Balazs Scheidler.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
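
The fallback idea described in the commit message can be sketched roughly as follows. This is not the actual ivykis code: `wait_fn`, `wait_with_fallback`, `means_unavailable`, `new_call_unusable` and the `fake_*` functions are hypothetical names, and the callbacks merely stand in for wrappers around epoll_pwait2() and epoll_wait().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait syscall wrapper:
 * returns >= 0 on success, or -1 with errno set on failure. */
typedef int (*wait_fn)(int timeout_ms);

/* Sticky flag: once the newer call is known to be unusable, skip it. */
bool new_call_unusable = false;

/* ENOSYS: the kernel lacks the syscall. EPERM: what some container
 * policies return for syscalls they filter out. */
bool means_unavailable(int err)
{
    return err == ENOSYS || err == EPERM;
}

int wait_with_fallback(wait_fn try_new, wait_fn use_old, int timeout_ms)
{
    assert(try_new != NULL && use_old != NULL);

    if (!new_call_unusable) {
        int ret = try_new(timeout_ms);
        if (ret >= 0 || !means_unavailable(errno))
            return ret;            /* success, or a genuine error to report */
        new_call_unusable = true;  /* remember the result, then fall through */
    }
    return use_old(timeout_ms);
}

/* Fakes for illustration only: a filtered "new" call, a "new" call that
 * fails for an unrelated reason, and a working "old" call. */
int fake_new_eperm(int timeout_ms)  { (void)timeout_ms; errno = EPERM;  return -1; }
int fake_new_einval(int timeout_ms) { (void)timeout_ms; errno = EINVAL; return -1; }
int fake_old_ok(int timeout_ms)     { (void)timeout_ms; return 1; }
```

With `fake_new_eperm` as the newer call, the first invocation falls back to the older call and sets the flag, so later invocations skip the newer call entirely. With `fake_new_einval`, the error is passed through unchanged, which keeps unrelated failures visible instead of silently masking them.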
MrAnno added the bug (Something isn't working) label May 16, 2024