This repository has been archived by the owner. It is now read-only.

Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted #1498

Closed
alogoc opened this issue Aug 4, 2016 · 2 comments

@alogoc alogoc commented Aug 4, 2016

Issue Report

I am using the following service, https://github.com/coreos/etcd/blob/master/contrib/systemd/etcd2-backup-coreos/etcd2-backup.service, to back up my etcd2 data to S3 via an rclone container.

My version is slightly modified to add the date and $HOSTNAME to the path.

[Unit]
Description=rclone powered etcd2 backup service
After=etcd2.service

[Service]
Type=oneshot

ExecStartPre=/usr/bin/rm -rf ${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/mkdir -p ${ETCD_BACKUP_DIR}/member/snap
ExecStartPre=/usr/bin/echo ETCD_DATA_DIR: ${ETCD_DATA_DIR}
ExecStartPre=/usr/bin/echo ETCD_BACKUP_DIR: ${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/echo HOSTNAME: ${HOSTNAME}
ExecStartPre=/usr/bin/etcdctl backup --data-dir=${ETCD_DATA_DIR} --backup-dir=${ETCD_BACKUP_DIR}
ExecStartPre=/usr/bin/touch ${ETCD_BACKUP_DIR}/member/snap/iamhere.txt

# Copy the last backup, in case the new upload gets corrupted
ExecStartPre=/bin/sh -c '/usr/bin/docker run --rm \
                              -v ${RCLONE_CONFIG_PATH}:/etc/rclone.conf \
                              quay.io/coreos/rclone:latest --config /etc/rclone.conf --checksum=${RCLONE_CHECKSUM} \
                              copy ${RCLONE_ENDPOINT}/${HOSTNAME}/`date +%F`/%m ${RCLONE_ENDPOINT}/${HOSTNAME}/`date +%F`/%m_backup'

# Upload new backup
ExecStart=/bin/sh -c '/usr/bin/docker run --rm \
                          -v ${ETCD_BACKUP_DIR}:/etcd2backup \
                          -v ${RCLONE_CONFIG_PATH}:/etc/rclone.conf \
                          quay.io/coreos/rclone:latest --config ${RCLONE_CONFIG_PATH} --checksum=${RCLONE_CHECKSUM} \
                          copy /etcd2backup/ ${RCLONE_ENDPOINT}/${HOSTNAME}/`date +%F`/%m/'

[Install]
WantedBy=multi-user.target
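
To make the modified path easier to check, the destination can be previewed outside systemd. This is only a minimal sketch and not part of the unit: `backup_target` is a hypothetical helper, and its third argument stands in for systemd's `%m` specifier, which expands to the machine ID stored in /etc/machine-id.

```shell
#!/bin/sh
# Hypothetical helper (not part of the unit): prints the remote path that
# the modified ExecStart line would copy to.
#   $1  rclone endpoint, i.e. the value of RCLONE_ENDPOINT
#   $2  host name, i.e. the value of HOSTNAME
#   $3  machine ID, i.e. what systemd substitutes for %m
backup_target() {
    endpoint=$1
    host=$2
    machine_id=$3
    # date +%F prints the current date as YYYY-MM-DD.
    printf '%s/%s/%s/%s\n' "$endpoint" "$host" "$(date +%F)" "$machine_id"
}

# Example call on a node (left commented out so sourcing has no side effects):
# backup_target "$RCLONE_ENDPOINT" "$(hostname)" "$(cat /etc/machine-id)"
```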

Everything had been working fine for quite some time until now.

All nodes in the cluster show the very same error.

Error

journalctl -p err -b
Aug 04 15:27:47 kubeworker4 systemd[2000]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:27:58 kubeworker4 systemd[2166]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:28:06 kubeworker4 systemd[2314]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:28:08 kubeworker4 systemd[1]: Failed to start rclone powered etcd2 backup service.
Aug 04 15:38:13 kubeworker4 systemd[9813]: user@0.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 04 15:38:18 kubeworker4 systemd[1]: Failed to start rclone powered etcd2 backup service.

CoreOS Version

$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1068.8.0
VERSION_ID=1068.8.0
BUILD_ID=2016-07-18-0616
PRETTY_NAME="CoreOS 1068.8.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

What hardware/cloud provider/hypervisor is being used to run CoreOS?

VMware

SELINUX:

cat /etc/selinux/config  | grep SELINUX
# SELINUX can take one of these three values:
SELINUX=permissive
# SELINUXTYPE can take one of these four values:
SELINUXTYPE=mcs

user@.service:

cat /usr/lib/systemd/system/user@.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=User Manager for UID %i
After=systemd-user-sessions.service

[Service]
User=%i
PAMName=systemd-user
Type=notify
ExecStart=-/usr/lib/systemd/systemd --user
Slice=user-%i.slice
KillMode=mixed
Delegate=yes
TasksMax=infinity

@alogoc alogoc commented Aug 18, 2016

Update:

I found a workaround: systemctl reset-failed. After I manually clear the failed unit, the process starts again and completes successfully. For the record, rebooting never helped.

I suppose this is just a temporary "fix" and the problem will most probably occur again.

The only relevant thread I found is a Fedora bug report, https://bugzilla.redhat.com/show_bug.cgi?id=911370, which should have been solved after systemd-207.
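
The workaround can be sketched roughly as follows. The report does not say which unit was cleared, so the unit names in the example call are assumptions: `user@0.service` is taken from the journal output above and `etcd2-backup.service` from the upstream file name. `SYSTEMCTL` and `clear_and_restart` are hypothetical names introduced here so the sequence can be dry-run with `SYSTEMCTL=echo`.

```shell
#!/bin/sh
# SYSTEMCTL is a hypothetical override; set SYSTEMCTL=echo to print the
# commands instead of running them.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Clear the recorded "failed" state of one unit, then start a service.
#   $1  unit whose failed state should be reset (assumed: user@0.service)
#   $2  service to start afterwards (assumed: etcd2-backup.service)
clear_and_restart() {
    failed_unit=$1
    service=$2
    $SYSTEMCTL reset-failed "$failed_unit" || return 1
    $SYSTEMCTL start "$service"
}

# Example call (left commented out so sourcing has no side effects):
# clear_and_restart user@0.service etcd2-backup.service
```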

@dm0- dm0- commented Sep 9, 2016

Can you try writing the following contents to /etc/pam.d/systemd-user and see if that helps your issue?

account  include system-auth
session  required pam_loginuid.so
session  include system-auth
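
One way to apply the suggestion, as a sketch only: the function below writes exactly the three lines above to the target file and keeps a copy of any file already there. `write_systemd_user_pam`, `PAM_FILE`, and the `.bak` suffix are hypothetical names, not part of the suggestion, and the function needs to run as root to write under /etc/pam.d.

```shell
#!/bin/sh
# PAM_FILE is a hypothetical override so the sketch can be tried against a
# scratch path before touching the real file.
PAM_FILE=${PAM_FILE:-/etc/pam.d/systemd-user}

write_systemd_user_pam() {
    target=${1:-$PAM_FILE}
    # Keep a copy of any existing file before overwriting it.
    if [ -e "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    # Quoted heredoc delimiter: the contents are written verbatim.
    cat > "$target" <<'EOF'
account  include system-auth
session  required pam_loginuid.so
session  include system-auth
EOF
}

# Example call (left commented out so sourcing has no side effects):
# write_systemd_user_pam
```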