Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: better labeling control over cgroupfs #39

Closed
pcmoore opened this issue Mar 11, 2018 · 29 comments

Comments

Projects
None yet
6 participants
@pcmoore
Copy link
Member

commented Mar 11, 2018

Taken from RHBZ 1553803 created by @rhatdan:

When creating a new directory in a cgroup file system, the new directory by default should get the label of the parent directory.

If I label a directory

/sys/fs/cgroup/unified/system.slice/docker-UUID

system_u:object_r:container_file_t:s0:c1,c2

Now I go into this directory and create a new directory.

The directory ends up labeled as

system_u:object_r:cgroup_t:s0

Where it should have been labeled

system_u:object_r:container_file_t:s0:c1,c2

This bug is preventing us for further locking down containers, by allowing them to modify partial hiarchies.

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented Mar 11, 2018

A follow up from @stephensmalley:

I think that supporting this requires patching kernfs (used internally by cgroup) to call security_inode_init_security() on newly created files. See mm/shmem.c:shmem_mknod() for an example of how we do this for tmpfs.

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented Mar 11, 2018

@stephensmalley is likely referring to commit 570bc1c (below), there have been some tweaks since it was originally introduced, but this is the starting point:

commit 570bc1c2e5ccdb408081e77507a385dc7ebed7fa
Author: Stephen Smalley
Date:   Fri Sep 9 13:01:43 2005 -0700

[PATCH] tmpfs: Enable atomic inode security labeling

This patch modifies tmpfs to call the inode_init_security LSM hook to set
up the incore inode security state for new inodes before the inode becomes
accessible via the dcache.

As there is no underlying storage of security xattrs in this case, it is
not necessary for the hook to return the (name, value, len) triple to the
tmpfs code, so this patch also modifies the SELinux hook function to
correctly handle the case where the (name, value, len) pointers are NULL.

The hook call is needed in tmpfs in order to support proper security
labeling of tmpfs inodes (e.g.  for udev with tmpfs /dev in Fedora).  With
this change in place, we should then be able to remove the
security_inode_post_create/mkdir/...  hooks safely.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

@pcmoore pcmoore self-assigned this Mar 29, 2018

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2018

Some quick notes so I don't forget this later ...

It appears that adding security_inode_init_security() to kernfs_init_inode() should get us what we need. Some quick thoughts on how we would call the security_inode_init_security():

security_inode_init_security(inode, dir, name, callback, fs_data)
  • inode: the inode passed to kernfs_init_inode()
  • dir: the inode associated with kn->parent, we can probably just call kernfs_get_inode(sb, kn->parent)
  • name: a qstr'd version of kn->name
  • callback: it looks like we need to write a callback, we can probably borrow heavily from shmem_initxattrs() as both use simple_xattrs
  • fs_data: I believe we can just pass NULL here

These are just quick notes on this, I haven't written anything, or dug too deply yet, it may all fall apart very quickly once I start putting this together.

@stephensmalley

This comment has been minimized.

Copy link
Member

commented Mar 29, 2018

I'm not sure kernfs_init_inode() will work. We only want this to be called for directory/file creation operations, not for lookups, and kernfs_init_inode() gets called whenever an inode is created, which can also simply happen when a file that was previously created has its inode evicted and is then subsequently looked up again. In that situation, kernfs_refresh_inode() will extract the security context from the kn->iattr->ia_secdata/ia_secdata_len and apply it to the new inode via security_inode_notifysecctx(). Optimally we'd only call security_inode_init_security() from some higher level function(s) that are only invoked upon mkdir/creat/... calls. For shmem, this was shmem_mknod. Don't know if such a function exists in kernfs or perhaps it exists in the fs layer on top of kernfs (e.g. sysfs) instead. It's also better to call in a higher level function where parent inode (and file dentry) is already available; calling kernfs_get_inode() on the parent could create sticky locking scenarios. Agree that we should be able to copy shmem_initxattrs() and tweak for callback.

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented May 2, 2018

Looking at this again while I've got a few minutes, more notes below:

Based on the above, it looks like we want to hook kernfs_new_node() / __kernfs_new_node() with the possibility of special work in kernfs_create_root() depending on where the hook is placed.

@stephensmalley

This comment has been minimized.

Copy link
Member

commented May 2, 2018

Hmm...no inodes available there, which is what security_inode_init_security() needs. Looks like inodes might not even be instantiated until subsequent lookup. Looking at this again, we might need a new LSM hook that takes kernfs_node and mode instead of parent dir and new file inodes. Also, on second look, simple xattrs are NOT used by kernfs for security xattrs; kernfs_security_xattr_set() instead saves the context to kn->iattr->ia_secdata/ia_secdata_len, and this is later extracted by kernfs_refresh_inode() and set on the inode when it is instantiated. So if we call a new LSM hook from kernfs_new_node(), get a context, and stash it in ia_secdata, then it should be assigned to the inode automatically when the inode is created. OTOH, this means that we'll be allocating memory for the context for every kernfs node and using that on refresh, unlike the current situation where we only do this when userspace explicitly sets a context on a sysfs (or cgroup) file; otherwise we just let SELinux internally label it based on genfscon. And we have to be careful to not break other filesystems built on kernfs (e.g. sysfs) by any changes we make here. If we truly only want to alter cgroup behavior, that would argue for making the changes in cgroup_mkdir and similar. Not sure though how easily we can then access kernfs internals. This seems increasingly ugly.

@stephensmalley

This comment has been minimized.

Copy link
Member

commented May 2, 2018

BTW we already have one variant of security_inode_init_security(), security_dentry_init_security(), which is similar (no inode available yet, only a dentry). Used by nfsv4. Here we have neither, just a kernfs_node.

@doverride

This comment has been minimized.

Copy link

commented Aug 8, 2018

The default{user,role,type,range} rules should ideally apply here. Just to note: Fedora uses defaultrange target which is not the default.

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented Oct 18, 2018

Hi all,

I have been trying to find a solution to this problem for a while now and it seems to me there are two ways to implement this:

  1. Call security_inode_init_security() on new kernfs inodes whose backing kernfs nodes do not yet have a security context associated.

    Due to the nature of kernfs (it merely maps an internal hierarchical structure into a virtual filesystem, its nodes can be created/removed while it is not mounted anywhere, and the same structure can be mounted under multiple mountpoints, sharing basic inode attributes including LSM security context), the inodes are usually not created at the time that the backing kernfs node is created, but at the time the userspace (or kernel internally) does a lookup on the parent inode.

    For the basic cgroupfs+SELinux+genfscon scenario this doesn't really matter, but generally it means that whoever is the first to access the given node is the one whose subject context is used to determine the target context.

  2. Treat the raw kernfs nodes as a new object class and provide special LSM hooks and SELinux policy rules for them.

    We cannot treat kernfs nodes themselves as files, because that's not what they really are. They have their relative paths, some file-like attributes (mapped to the mounted files) and that's about it.

    Note that to make it possible to meaningfully label them, there needs to be a way to distinguish e.g. cgroup nodes from sysfs ones, i.e. each kernfs tree needs to be somehow globally named by its creator.

    This would be the most clean way to solve the problem, but also would be more difficult to implement and would require coordination with policy changes (plus some way to provide some forward/backward compatibility maybe).

FWIW, I managed to implement a proof-of-concept version of solution 1, that is surprisingly relatively simple, makes the context inheritance work, and doesn't seem to break anything. WIP patch is available here:
https://gitlab.com/omos/linux-public/commit/46eaa9fdd55b3aa0a9ce20a8dc5ec39abf82db84

I'm almost sure I did't do everything right (especially the mount.c changes worry me), but it might be a good starting point.

@pcmoore, @stephensmalley: Is it possible that solution 1 could be ever considered acceptable or should I scrap it completely, roll up my sleeves, and instead work on solution 2? Or would you rather see a different approach?

@stephensmalley

This comment has been minimized.

Copy link
Member

commented Oct 19, 2018

I'd prefer a focused solution on cgroup itself, possibly in cgroup_mkdir() with some kind of helper function in kernfs for setting the context. The concerns I have with solution 1 and the proof-of-concept patch are:

  1. labeling shouldn't be dependent on the first process to look up the inode. Even for SELinux, this could affect the user/role/level of the inode depending on policy, and even more so for Smack.
  2. ignoring failures while setting the secdata seems undesirable/unsafe,
  3. we do not want to have to allocate and store a separate copy of the attribute in kernfs for every inode, only for those that are explicitly set via setxattr() or dynamically created.

Solution 2 seems overly complicated for little gain.

@stephensmalley

This comment has been minimized.

Copy link
Member

commented Oct 19, 2018

Addendum to point 1 in the previous comment: labeling on first lookup could even affect the type of the inode if there is a type_transition rule in policy. Userspace-created files need to be labeled upon creation based on the creator's context as well as the parent directory's context.
Addendum to point 3 in the previous comment: I guess all kernfs nodes are dynamically created by nature, but the point is that we only need/want to store a copy of the attribute value separately if it would differ from the value we would get from a policy genfscon lookup (security_genfs_sid), which today only occurs if explicitly set by userspace using setxattr(). Storing it for every inode will get costly especially for sysfs.

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented Oct 30, 2018

OK, I'll look into doing it in such a way that security_inode_init() is called as part of the creation. However this will be tricky, because we will need to force the creation of all inodes that are created during a cgroupfs mkdir/mount operation and dealing with inodes is intentionally hidden away from kernfs consumers...

Ad 1: Yeah, I guess this solution won't really fly... We could perhaps blacklist cgroupfs from such rules (I believe they wouldn't work right now either), but obviously that would be ugly...

Ad 2: I could easily fix that, I just didn't want to change the existing function signatures too much.

Ad 3: I'm not entirely sure how this could be achieved. How can we distinguish if the context returned by security_inode_init_security() is "default" or not?

@stephensmalley

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

I'm not sure we necessarily have to create the inodes immediately. We could introduce another LSM hook as noted earlier that works without needing the inode, like selinux_dentry_init_security(), and stash the resulting context in the kn->iattr->ia_secdata/secdata_len for later use by kernfs_refresh_inode(). Fundamentally the only inputs we need are the parent dir SID/context, the new file name and the mode. Can we mirror the handling for setting the uid/gid (ala cgroup_kn_set_ugid)?
I guess the question is when is it even possible for the context to be other than the one returned by security_genfs_sid(). Previously it was only possible if explicitly set by userspace via setxattr(), in which case we stash it in ia_secdata/secdata_len and use that upon refresh. The case in view here is that userspace has set the context on a cgroup directory and is now creating a subdirectory within that directory. So in that situation the parent kernfs node will have a non-NULL ia_secdata/secdata_len. Can we use that to determine when we actually need to bother with calling the new hook? Actually, is the property we want just that we copy down the ia_secdata/secdata_len from parent to child upon file creations if set? In which case we don't need a LSM hook at all?

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2018

I'm not sure we necessarily have to create the inodes immediately. We could introduce another LSM hook as noted earlier that works without needing the inode, like selinux_dentry_init_security(), and stash the resulting context in the kn->iattr->ia_secdata/secdata_len for later use by kernfs_refresh_inode(). Fundamentally the only inputs we need are the parent dir SID/context, the new file name and the mode.

Looking at selinux_determine_inode_label(), there are some decisions made based on the superblock, but the cgroup_mkdir() function doesn't have a reference to the superblock via which the creation is done. Should we ignore the superblock-dependent logic entirely in the new hook (and just do the security_transition_sid branch)? Or do we need to break the kernfs abstraction and expose the inode/dentry/superblock in kernfs callbacks?

[...] So in that situation the parent kernfs node will have a non-NULL ia_secdata/secdata_len. Can we use that to determine when we actually need to bother with calling the new hook?

OK, I think we could make that work somehow... But first I'd like to have the above and below concerns resolved.

Actually, is the property we want just that we copy down the ia_secdata/secdata_len from parent to child upon file creations if set? In which case we don't need a LSM hook at all?

My concern about such approach is that in general other LSMs may have a different logic for assigning labels to new files than just simple inheritance. In fact, even SELinux has some alternative behaviors depending on the sbsec flags (although I'm not sure if these are relevant here).

@stephensmalley

This comment has been minimized.

Copy link
Member

commented Dec 4, 2018

Looking at selinux_determine_inode_label(), there are some decisions made based on the superblock, but the cgroup_mkdir() function doesn't have a reference to the superblock via which the creation is done. Should we ignore the superblock-dependent logic entirely in the new hook (and just do the security_transition_sid branch)? Or do we need to break the kernfs abstraction and expose the inode/dentry/superblock in kernfs callbacks?

I think we likely need to take the superblock into account, particularly to avoid breaking any users who are performing context= mounts with cgroup (if there are any).

The other approach would be to save creator cred upon cgroup_mkdir() and defer label determination to inode creation (first lookup), but using the creator cred. Not sure about the tradeoffs.

My concern about such approach is that in general other LSMs may have a different logic for assigning labels to new files than just simple inheritance. In fact, even SELinux has some alternative behaviors depending on the sbsec flags (although I'm not sure if these are relevant here).

Yes, that's valid. So we at least need a hook so that other LSMs can provide different behavior.

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented Dec 10, 2018

I think we likely need to take the superblock into account, particularly to avoid breaking any users who are performing context= mounts with cgroup (if there are any).

Seems like cgroupfs cannot even be mounted with the context option:

# mkdir cgroupmnt
# mount -o context=system_u:object_r:tmpfs_t:s0 -t cgroup2 cgroup2 cgroupmnt
mount: /mnt/testarea/test/cgroup: wrong fs type, bad option, bad superblock on cgroup2, missing codepage or helper program, or other error.
# dmesg | tail -n 1
[ 1684.004502] cgroup: cgroup2: unknown option ""

The same with rootcontext, defcontext, and fscontext...

Nonetheless, there are code paths that can create kernfs nodes (also in cgroup) in a context without any reference to the superblock. So if we insist on adding a hook that requires a superblock reference, we could end up in a situation where the inheritance would mostly work, but sometimes not. If we respect the superblock properties only in the cases where we have it (e.g. mkdir calls), then we end up again with inconsistent behavior (superblock options not always respected).

Considering the above, I would say the best solution is to add a hook that takes just the parent context (taken from kernfs_node::iattr->ia_secdata[_len]), name, and mode and returns the new context. In SELinux, it would just do a simple security_context_to_sid() -> security_transition_sid() -> security_sid_to_context_force() chain.

I have put together a PoC here (it is probably not commitable as-is, but maybe it is a starting point):
https://gitlab.com/omos/linux-public/compare/selinux-next...selinux-fix-cgroupfs-v5

@stephensmalley

This comment has been minimized.

Copy link
Member

commented Dec 10, 2018

The inability to perform a context mount with cgroup seems like a bug to me.

@rhatdan

This comment has been minimized.

Copy link

commented Dec 12, 2018

Any chance it was already mounted, and you were attempting to mount it a second time with a different label?

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2018

Yes, it was already mounted as usual under /sys/fs/cgroup/unified and I was creating a second mount. I retried it now on an eudyptula-boot VM [1], where I don't have cgroupfs mounted after boot, but I got the same error:

# mount
rootshare on / type 9p (ro,relatime,sync,dirsync,access=user,trans=virtio,version=9p2000.u)
tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,seclabel)
tmpfs on /var/tmp type tmpfs (rw,nosuid,nodev,relatime,seclabel)
tmpfs on /var/log type tmpfs (rw,nosuid,nodev,relatime,seclabel)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime,seclabel)
configshare on /tmp/config type 9p (rw,relatime,sync,dirsync,access=any,trans=virtio,version=9p2000.u)
proc on /proc type proc (rw,relatime)
sys on /sys type sysfs (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
homeshare on /root type 9p (rw,relatime,sync,dirsync,access=0,trans=virtio)
moduleshare on /usr/lib/modules type 9p (ro,relatime,sync,dirsync,access=0,trans=virtio)
devtmpfs on /dev type devtmpfs (rw,relatime,seclabel,size=10240k,nr_inodes=122988,mode=755)
devpts on /dev/pts type devpts (rw,relatime,seclabel,mode=600,ptmxmode=000)
tmpfs on /etc/resolv.conf type tmpfs (rw,nosuid,nodev,relatime,seclabel)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /var/lib/selinux type tmpfs (rw,nosuid,nodev,relatime,seclabel)
tmpfs on /etc/selinux type tmpfs (rw,nosuid,nodev,relatime,seclabel)
# mount -n -t tmpfs tmpfs /sys/fs/cgroup
# mkdir /sys/fs/cgroup/unified
# mount -n -o context=system_u:object_r:tmpfs_t:s0 -t cgroup2 cgroup2 /sys/fs/cgroup/unified
[   31.575952] cgroup: cgroup2: unknown option ""
mount: /sys/fs/cgroup/unified: wrong fs type, bad option, bad superblock on cgroup2, missing codepage or helper program, or other error.

[1] https://github.com/vincentbernat/eudyptula-boot

@stephensmalley

This comment has been minimized.

Copy link
Member

commented Dec 12, 2018

The context mount options are extracted via security_sb_copy_data() -> selinux_sb_copy_data(). They are removed from the option string passed on to the filesystem code. If the only mount options are context ones, then we'll end up with an empty string. Looks like filesystems like ext4 will then ignore an empty string, whereas cgroup is treating this as an error? This doesn't appear to be a recent bug but should probably be fixed at some point.

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

Looks like this one-line patch fixes that particular issue:

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6aaf5dd5383b..8cb616232035 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1744,7 +1744,7 @@ static int parse_cgroup_root_flags(char *data, unsigned int *root_flags)
 
        *root_flags = 0;
 
-       if (!data)
+       if (!data || *data == '\0')
                return 0;
 
        while ((token = strsep(&data, ",")) != NULL) {

Anyway, after applying the patch over selinux/next I played a bit with it and I noticed that with a context= mount, cgroupfs still allows me to relabel a file/directory, whereas with tmpfs I get -ENOTSUP... Looks like the logic in selinux_is_sblabel_mnt() should be revised - shouldn't it always return false (at least) if sbsec->behavior == SECURITY_FS_USE_MNTPOINT, regardless of whether the 'special handling' part evaluates to true?

Also, I was thinking whether we couldn't still make cgroupfs work with context= mounts by patching the right hooks to (conditionally) override the actual inode's context based on the superblock properties and ignoring the context we get from kernfs via selinux_inode_notifysecctx(). Basically we would need to handle this in the hook that gets called when the security struct is being initialized for an inode of an existing filesystem object (I don't remember which hook it is right now...) and in selinux_inode_notifysecctx()... Dumping these thoughts here while I dig into it deeper...

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2018

[...] Basically we would need to handle this in the hook that gets called when the security struct is being initialized for an inode of an existing filesystem object (I don't remember which hook it is right now...) [...]

Update: this is selinux_d_instantiate(), which calls inode_doinit_with_dentry(), which already does all the right things. So we should be able to fix the cgroup context= mounting by just tweaking selinux_inode_notifysecctx() and somehow resolving the selinux_is_sblabel_mnt() issue.

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 9, 2019

kernfs: Initialize security of newly created nodes
Use the new security_object_init_security() hook to allow LSMs to
possibly assign a non-default security context to newly created nodes
based on the context of their parent node.

This fixes an issue with cgroupfs under SELinux, where newly created
cgroup subdirectories would not inherit its parent's context if it had
been set explicitly to a non-default value (other than the genfs context
specified by the policy). This can be reproduced as follows:

    # mkdir /sys/fs/cgroup/unified/test
    # chcon -R system_u:object_r:cgroup_t:s0:c123 /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.controllers
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.max.depth
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.max.descendants
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.procs
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.stat
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.subtree_control
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.threads
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0      0 Jan  8 04:54 init.scope
    drwxr-xr-x. 25 root root system_u:object_r:cgroup_t:s0      0 Jan  8 04:54 system.slice
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0:c123 0 Jan  8 04:59 test
    drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0      0 Jan  8 04:55 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan  8 05:10 /sys/fs/cgroup/unified/test/subdir

Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:cgroup_t:s0:c123 0 Jan  8 05:10 /sys/fs/cgroup/unified/test/subdir

Link: SELinuxProject/selinux-kernel#39
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 10, 2019

kernfs: Initialize security of newly created nodes
Use the new security_object_init_security() hook to allow LSMs to
possibly assign a non-default security context to newly created nodes
based on the context of their parent node.

This fixes an issue with cgroupfs under SELinux, where newly created
cgroup subdirectories would not inherit its parent's context if it had
been set explicitly to a non-default value (other than the genfs context
specified by the policy). This can be reproduced as follows:

    # mkdir /sys/fs/cgroup/unified/test
    # chcon -R system_u:object_r:cgroup_t:s0:c123 /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.controllers
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.max.depth
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.max.descendants
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.procs
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.stat
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.subtree_control
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0      0 Jan  8 05:00 cgroup.threads
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0      0 Jan  8 04:54 init.scope
    drwxr-xr-x. 25 root root system_u:object_r:cgroup_t:s0      0 Jan  8 04:54 system.slice
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0:c123 0 Jan  8 04:59 test
    drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0      0 Jan  8 04:55 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan  8 05:10 /sys/fs/cgroup/unified/test/subdir

Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:cgroup_t:s0:c123 0 Jan  8 05:10 /sys/fs/cgroup/unified/test/subdir

Link: SELinuxProject/selinux-kernel#39
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>

WOnder93 added a commit to WOnder93/selinux-testsuite that referenced this issue Jan 30, 2019

selinux-testsuite: Add minimal test for cgroupfs labeling inheritance
This patch add a tiny test that checks that labels of newly created
files are correctly inherited from the parent in cgroupfs.

Should start passing when the following issue is fixed:
SELinuxProject/selinux-kernel#39

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

WOnder93 added a commit to WOnder93/selinux-testsuite that referenced this issue Jan 30, 2019

selinux-testsuite: Add minimal test for cgroupfs labeling inheritance
This patch add a tiny test that checks that labels of newly created
files are correctly inherited from the parent in cgroupfs.

Should start passing when the following issue is fixed:
SELinuxProject/selinux-kernel#39

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

WOnder93 added a commit to WOnder93/selinux-testsuite that referenced this issue Jan 30, 2019

selinux-testsuite: Add minimal test for cgroupfs labeling inheritance
This patch adds a tiny test that checks that labels of newly created
cgroupfs files/directories are correctly inherited from the parent.

Should start passing when the following issue is fixed:
SELinuxProject/selinux-kernel#39

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

WOnder93 added a commit to WOnder93/selinux-testsuite that referenced this issue Jan 30, 2019

selinux-testsuite: Add minimal test for cgroupfs label inheritance
This patch adds a tiny test that checks that labels of newly created
cgroupfs files/directories are correctly inherited from the parent.

Should start passing when the following issue is fixed:
SELinuxProject/selinux-kernel#39

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jan 30, 2019

kernfs: initialize security of newly created nodes
Use the new security_kernfs_init_security() hook to allow LSMs to
possibly assign a non-default security context to a newly created kernfs
node based on the attributes of the new node and also its parent node.

This fixes an issue with cgroupfs under SELinux, where newly created
cgroup subdirectories/files would not inherit its parent's context if
it had been set explicitly to a non-default value (other than the genfs
context specified by the policy). This can be reproduced as follows (on
Fedora/RHEL):

    # mkdir /sys/fs/cgroup/unified/test
    # # Need permissive to change the label under Fedora policy:
    # setenforce 0
    # chcon -t container_file_t /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.controllers
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.depth
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.descendants
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.procs
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.stat
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.subtree_control
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.threads
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 init.scope
    drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:21 system.slice
    drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
    drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Link: SELinuxProject/selinux-kernel#39
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

WOnder93 added a commit to WOnder93/selinux-testsuite that referenced this issue Jan 30, 2019

selinux-testsuite: Add minimal test for cgroupfs label inheritance
This patch adds a tiny test that checks that labels of newly created
cgroupfs files/directories are correctly inherited from the parent.

Should start passing when the following issue is fixed:
SELinuxProject/selinux-kernel#39

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 5, 2019

kernfs: initialize security of newly created nodes
Use the new security_kernfs_init_security() hook to allow LSMs to
possibly assign a non-default security context to a newly created kernfs
node based on the attributes of the new node and also its parent node.

This fixes an issue with cgroupfs under SELinux, where newly created
cgroup subdirectories/files would not inherit its parent's context if
it had been set explicitly to a non-default value (other than the genfs
context specified by the policy). This can be reproduced as follows (on
Fedora/RHEL):

    # mkdir /sys/fs/cgroup/unified/test
    # # Need permissive to change the label under Fedora policy:
    # setenforce 0
    # chcon -t container_file_t /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.controllers
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.depth
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.descendants
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.procs
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.stat
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.subtree_control
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.threads
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 init.scope
    drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:21 system.slice
    drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
    drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Link: SELinuxProject/selinux-kernel#39
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

fengguang pushed a commit to 0day-ci/linux that referenced this issue Feb 5, 2019

kernfs: initialize security of newly created nodes
Use the new security_kernfs_init_security() hook to allow LSMs to
possibly assign a non-default security context to a newly created kernfs
node based on the attributes of the new node and also its parent node.

This fixes an issue with cgroupfs under SELinux, where newly created
cgroup subdirectories/files would not inherit its parent's context if
it had been set explicitly to a non-default value (other than the genfs
context specified by the policy). This can be reproduced as follows (on
Fedora/RHEL):

    # mkdir /sys/fs/cgroup/unified/test
    # # Need permissive to change the label under Fedora policy:
    # setenforce 0
    # chcon -t container_file_t /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.controllers
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.depth
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.descendants
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.procs
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.stat
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.subtree_control
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.threads
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 init.scope
    drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:21 system.slice
    drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
    drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Link: SELinuxProject/selinux-kernel#39
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

@pcmoore pcmoore removed their assignment Feb 14, 2019

AndyLavr added a commit to Dragon-Team/dragon-v5 that referenced this issue Mar 5, 2019

security: Allow initializing the kernfs node's secctx based on its...
...parent

Changes in v6:
- remove copy-pasted duplicate macro definition

v5: https://lore.kernel.org/selinux/20190205110638.30782-1-omosnace@redhat.com/T/
Changes in v5:
- fix misplaced semicolon detected by 0day robot

v4: https://lore.kernel.org/selinux/20190205085915.5183-1-omosnace@redhat.com/T/
Changes in v4:
- reorder and rename hook arguments
- avoid allocating kernfs_iattrs unless needed

v3: https://lore.kernel.org/selinux/20190130114150.27807-1-omosnace@redhat.com/T/
Changes in v3:
- rename the hook to "kernfs_init_security"
- change the hook interface to simply pass pointers to struct iattr and
  struct simple_xattrs of both the new node and its parent
- add full security xattr support to kernfs (and fixup SELinux behavior
  to handle it properly)

v2: https://lore.kernel.org/selinux/20190109162830.8309-1-omosnace@redhat.com/T/
Changes in v2:
- add docstring for the new hook in union security_list_options
- initialize *ctx to NULL and *ctxlen to 0 in case the hook is not
  implemented

v1: https://lore.kernel.org/selinux/20190109091028.24485-1-omosnace@redhat.com/T/

TL;DR:
This series adds a new security hook that allows to initialize the
security context of kernfs properly, taking into account the parent
context (and possibly other attributes). Kernfs nodes require special
handling here, since they are not bound to specific inodes/superblocks,
but instead represent the backing tree structure that is used to build
the VFS tree when the kernfs tree is mounted.

The kernfs nodes initially do not store any security context and rely on
the LSM to assign some default context to inodes created over them.
Kernfs inodes, however, allow setting an explicit context via the
*setxattr(2) syscalls, in which case the context is stored inside the
kernfs node's internal structure.

SELinux (and possibly other LSMs) initialize the context of newly
created FS objects based on the parent object's context (usually the
child inherits the parent's context, unless the policy dictates
otherwise). This is done by hooking the creation of the new inode
corresponding to the newly created file/directory via
security_inode_init_security() (most filesystems always
create a fresh inode when a new FS object is created). However, kernfs
nodes can be created "behind the scenes" while the filesystem is not
mounted anywhere and thus no inodes can exist for them yet.

Therefore, to allow maintaining similar behavior for kernfs nodes, a new
LSM hook is needed, which will allow initializing the kernfs node's
security context based on its own attributes and those of the parent's
node.

The main motivation for this change is that the userspace users of
cgroupfs (which is built on kernfs) expect the usual security context
inheritance to work under SELinux (see [1] and [2]). This functionality
is required for better confinement of containers under SELinux.

Patch 1/5 changes SELinux to fetch security context from extended
attributes on kernfs filesystems, falling back to genfs-defined context
if that fails. Without this patch the 2/5 would be a regression for
SELinux (due to the removal of ...notifysecctx() call.

Patch 2/5 implements full security xattr support in kernfs using
simple_xattrs; patch 3/5 adds the new LSM hook; patch 4/5 implements the
new hook in SELinux; and patch 5/5 modifies kernfs to call the new hook
on new node creation.

Testing:
- passed the reproducer from the commit message of the last patch
- passed SELinux testsuite on Fedora 29 (x86_64) when applied on top of
  current Rawhide kernel (5.0.0-0.rc5.git0.1) [3]
  - including the new proposed selinux-testsuite subtest [4] (adapted
    from the reproducer)

[1] SELinuxProject/selinux-kernel#39
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1553803
[3] https://copr.fedorainfracloud.org/coprs/omos/kernel-testing/build/854148/
[4] SELinuxProject/selinux-testsuite#48

Ondrej Mosnacek (5):
  selinux: try security xattr after genfs for kernfs filesystems
  kernfs: use simple_xattrs for security attributes
  LSM: add new hook for kernfs node initialization
  selinux: implement the kernfs_init_security hook
  kernfs: initialize security of newly created nodes

  Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

pcmoore added a commit that referenced this issue Mar 21, 2019

kernfs: initialize security of newly created nodes
Use the new security_kernfs_init_security() hook to allow LSMs to
possibly assign a non-default security context to a newly created kernfs
node based on the attributes of the new node and also its parent node.

This fixes an issue with cgroupfs under SELinux, where newly created
cgroup subdirectories/files would not inherit its parent's context if
it had been set explicitly to a non-default value (other than the genfs
context specified by the policy). This can be reproduced as follows (on
Fedora/RHEL):

    # mkdir /sys/fs/cgroup/unified/test
    # # Need permissive to change the label under Fedora policy:
    # setenforce 0
    # chcon -t container_file_t /sys/fs/cgroup/unified/test
    # ls -lZ /sys/fs/cgroup/unified
    total 0
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.controllers
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.depth
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.max.descendants
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.procs
    -r--r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.stat
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.subtree_control
    -rw-r--r--.  1 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 cgroup.threads
    drwxr-xr-x.  2 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 init.scope
    drwxr-xr-x. 26 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:21 system.slice
    drwxr-xr-x.  3 root root system_u:object_r:container_file_t:s0 0 Jan 29 03:15 test
    drwxr-xr-x.  3 root root system_u:object_r:cgroup_t:s0         0 Jan 29 03:06 user.slice
    # mkdir /sys/fs/cgroup/unified/test/subdir

Actual result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root system_u:object_r:cgroup_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Expected result:

    # ls -ldZ /sys/fs/cgroup/unified/test/subdir
    drwxr-xr-x. 2 root root unconfined_u:object_r:container_file_t:s0 0 Jan 29 03:15 /sys/fs/cgroup/unified/test/subdir

Link: #39

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
@dustymabe

This comment has been minimized.

Copy link

commented May 8, 2019

hmmm looks like githubs automation broke with this one. The commit message:

Should start passing when the following issue is fixed:
https://github.com/SELinuxProject/selinux-kernel/issues/39

closed this PR... Please re-open

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented May 8, 2019

/me shakes angry fist

"GitHuuuuuuub!"

@pcmoore pcmoore reopened this May 8, 2019

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented May 8, 2019

@dustymabe I ran that test against a 5.1/5.2 kernel and it passed without problem - are you seeing failures?

@dustymabe

This comment has been minimized.

Copy link

commented May 10, 2019

@dustymabe I ran that test against a 5.1/5.2 kernel and it passed without problem - are you seeing failures?

I'm not running any tests. I just noticed that this issue got closed and it seemed unintentional so wanted to notify you. If this issue is fixed (i'm not familiar enough with it to know) then it would be nice if we tracked down the commit that fixed it and point to that when we close this issue.

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented May 10, 2019

It appears that the commit mentioned above will resolve this issue. Although the commit message is a bit confusing. @WOnder93?

@WOnder93

This comment has been minimized.

Copy link
Contributor

commented May 11, 2019

This issue should be resolved with e19dfdc (+ a few prerequisite & followup commits), which is now merged in Linus' tree. The commit that auto-closed this issue was the testsuite commit, where I referenced the issue and accidentally put the magic word "fixes:" before the link.

From my POV, the issue can be closed now.

@pcmoore

This comment has been minimized.

Copy link
Member Author

commented May 11, 2019

Thanks, closing.

@pcmoore pcmoore closed this May 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.