Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted #1498

Closed
alogoc opened this Issue Aug 4, 2016 · 2 comments

alogoc commented Aug 4, 2016

Issue Report

I am using the following service, https://github.com/coreos/etcd/blob/master/contrib/systemd/etcd2-backup-coreos/etcd2-backup.service, to back up my etcd2 data to S3 via an rclone container.

My version is slightly modified to add $DATE and $HOSTNAME to the path.

[Unit]
Description=rclone powered etcd2 backup service
After=etcd2.service

[Service]
Type=oneshot

ExecStartPre=/usr/bin/rm -rf ${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/mkdir -p ${ETCD_BACKUP_DIR}/member/snap
ExecStartPre=/usr/bin/echo ETCD_DATA_DIR: ${ETCD_DATA_DIR}
ExecStartPre=/usr/bin/echo ETCD_BACKUP_DIR: ${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/echo HOSTNAME: ${HOSTNAME}
ExecStartPre=/usr/bin/etcdctl backup --data-dir=${ETCD_DATA_DIR} --backup-dir=${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/touch ${ETCD_BACKUP_DIR}/member/snap/iamhere.txt

# Copy the last backup, in case the new upload gets corrupted
ExecStartPre=/bin/sh -c '/usr/bin/docker run --rm \
                              -v ${RCLONE_CONFIG_PATH}:/etc/rclone.conf \
                              quay.io/coreos/rclone:latest --config /etc/rclone.conf --checksum=${RCLONE_CHECKSUM} \
                              copy ${RCLONE_ENDPOINT}/${HOSTNAME}/`date +%F`/%m ${RCLONE_ENDPOINT}/${HOSTNAME}/`date +%F`/%m_backup'

# Upload new backup
ExecStart=/bin/sh -c '/usr/bin/docker run --rm \
                          -v ${ETCD_BACKUP_DIR}:/etcd2backup \
                          -v ${RCLONE_CONFIG_PATH}:/etc/rclone.conf \
                          quay.io/coreos/rclone:latest --config ${RCLONE_CONFIG_PATH} --checksum=${RCLONE_CHECKSUM} \
                          copy /etcd2backup/ ${RCLONE_ENDPOINT}/${HOSTNAME}/`date +%F`/%m/'

[Install]
WantedBy=multi-user.target
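
For readers following the unit above, the remote path is built from three pieces: the `${HOSTNAME}` environment variable, the output of `date +%F` (an ISO date such as `2016-08-04`), and `%m`, which systemd expands to the machine ID before the command runs. A minimal sketch of that expansion follows. The helper name `backup_dest` and its arguments are hypothetical and are not part of the unit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original unit): builds the remote
# destination the same way the ExecStart line does.
#   $1 = rclone endpoint, e.g. the value of RCLONE_ENDPOINT
#   $2 = host name, e.g. the value of HOSTNAME
#   $3 = machine ID, which systemd substitutes for %m inside a unit file
backup_dest() {
    endpoint=$1
    host=$2
    machine_id=$3
    # date +%F prints the current date as YYYY-MM-DD
    printf '%s/%s/%s/%s/\n' "$endpoint" "$host" "$(date +%F)" "$machine_id"
}
```

Calling `backup_dest "$RCLONE_ENDPOINT" "$HOSTNAME" "$(cat /etc/machine-id)"` on a node prints the path the upload step would target on that day.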

Everything was working fine for quite some time until now.

All nodes in the cluster have the very same error.

Error

journalctl -p err -b
Aug 04 15:27:47 kubeworker4 systemd[2000]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:27:58 kubeworker4 systemd[2166]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:28:06 kubeworker4 systemd[2314]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:28:08 kubeworker4 systemd[1]: Failed to start rclone powered etcd2 backup service.
Aug 04 15:38:13 kubeworker4 systemd[9813]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:38:18 kubeworker4 systemd[1]: Failed to start rclone powered etcd2 backup service.

CoreOS Version

$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1068.8.0
VERSION_ID=1068.8.0
BUILD_ID=2016-07-18-0616
PRETTY_NAME="CoreOS 1068.8.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

What hardware/cloud provider/hypervisor is being used to run CoreOS?

VMware

SELINUX:

cat /etc/selinux/config  | grep SELINUX
# SELINUX can take one of these three values:
SELINUX=permissive
# SELINUXTYPE can take one of these four values:
SELINUXTYPE=mcs

user@.service:

cat /usr/lib/systemd/system/user@.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=User Manager for UID %i
After=systemd-user-sessions.service

[Service]
User=%i
PAMName=systemd-user
Type=notify
ExecStart=-/usr/lib/systemd/systemd --user
Slice=user-%i.slice
KillMode=mixed
Delegate=yes
TasksMax=infinity

alogoc commented Aug 18, 2016

Update:

Found a workaround using systemctl reset-failed. After manually clearing the failed unit, the process starts again and completes successfully. For the record, rebooting never helped with this.

I suppose this is just a temporary "fix" and the problem will most probably occur again.
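
A minimal sketch of the workaround as a shell function, in case it helps someone reproduce it. The unit name `etcd2-backup.service` is an assumption taken from the upstream file name, so adjust it to the name actually installed. The `SYSTEMCTL` variable is also an addition here and exists only so the sequence can be dry-run.

```shell
#!/bin/sh
# Assumption: the backup unit is installed as etcd2-backup.service.
UNIT="${UNIT:-etcd2-backup.service}"
# Overridable so the sequence can be dry-run (for example SYSTEMCTL=echo).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

reset_and_retry() {
    # Clear the failed state of the unit, then start it again.
    "$SYSTEMCTL" reset-failed "$UNIT" && "$SYSTEMCTL" start "$UNIT"
}
```

Run `reset_and_retry` as root on the affected node. With `SYSTEMCTL=echo` set beforehand, it only prints the two systemctl invocations.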

The only relevant thread I found was on the Fedora bug tracker, https://bugzilla.redhat.com/show_bug.cgi?id=911370, and that issue should have been resolved after systemd-207.

Member

dm0- commented Sep 9, 2016

Can you try writing the following contents to /etc/pam.d/systemd-user and see if that helps your issue?

account  include system-auth
session  required pam_loginuid.so
session  include system-auth
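
One possible way to write those three lines, sketched as a small shell function. The function name `write_pam_config` is hypothetical. Writing the target path requires root, and any existing file at that path is overwritten.

```shell
#!/bin/sh
# Hypothetical helper: writes the suggested PAM configuration to the
# path given as $1 (intended target: /etc/pam.d/systemd-user).
write_pam_config() {
    cat > "$1" <<'EOF'
account  include system-auth
session  required pam_loginuid.so
session  include system-auth
EOF
}
```

For example, run `write_pam_config /etc/pam.d/systemd-user` as root, then start the backup service again to see whether the PAM error still appears in the journal.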