client: Clear setuid bits when writing or truncating #12412

merged 5 commits into from Dec 19, 2016


None yet

4 participants

jtlayton commented Dec 9, 2016

This is the remainder of the fix for this tracker bug:

The basic idea is to declare a new CEPH_SETATTR_KILL_SGUID flag. When this is set in a setattr request, the MDS will clear those bits out of the mode. This should allow us to fix the mode properly without relying on a read/modify/write cycle of the mode (which would be racy).

Truncates always go to the server so there we just set the KILL_SGUID flag if we don't have Ax caps.

Writes will now take As caps along with Fw in order to check whether to issue a setattr. I think that should be mostly harmless since we'll already have them in most cases, but there could be some small performance impact there. I'm open to suggestions if anyone sees a better way to do this.

Also, there are a couple of bugfixes in here for problems that I noticed by inspection, and I beefed up the new ClearSetuid test to test truncate and write behavior.

Once this is in, we should be set to change ceph-fuse over to handling setuid clearing itself, which will close some races that can occur now.

jtlayton added some commits Dec 8, 2016
@jtlayton jtlayton test: add missing ceph_ll_close calls
There's a noticable up to tick-length hang that occurs when you call
ceph_shutdown with open files still in play. Ensure that we always
call ceph_ll_close after a successful ceph_ll_create.

Signed-off-by: Jeff Layton <>
@jtlayton jtlayton client: drop correct caps when setting btime
btime lives under auth caps, but the client drops file caps when setting
it. Change it to drop auth caps instead.

Signed-off-by: Jeff Layton <>
@jtlayton jtlayton test: flesh out the setuid/setgit clearing test
Add tests for write and truncate and switch to symbolic constants.

Signed-off-by: Jeff Layton <>
@jtlayton jtlayton client/mds: have write codepath clear the setuid/setgid bits if they'…
…re set

Declare a new CEPH_SETATTR_KILL_SGUID flag. When that bit is set in
the mask, then that tells the server to clear out the S_ISUID and
S_ISGID bits. Doing that is less racy and problematic than trying
to do a read/modify/write cycle on the mode. Note that this flag is
ignored if the mode is being set in the same setattr call (uncommon,
but possible from something like ganesha or samba).

Then, change the client library write code to get As caps when issuing a
write, so we can check whether the mode has the setuid/setgid bits set.
If it does, then call __setattrx to clear them using

Signed-off-by: Jeff Layton <>
@jtlayton jtlayton client: set CEPH_SETATTR_KILL_SGUID on truncate
On a size change, try to clobber the S_ISUID and S_ISGID bits if we
have Ax caps. If we don't have them, then ask the server to do it

Signed-off-by: Jeff Layton <>
jtlayton commented Dec 9, 2016

I kicked off a test last night that has mostly passed, but the queue seems to be paused or something at the moment:

@jtlayton jtlayton added the cephfs label Dec 12, 2016

Couple comments on code, a few below based on the commit messages. And a more general question:
I get the impression from your verbal descriptions that this clearing is really supposed to be atomic with respect to the file being changed. We can do that with the metadata changes, but with write it's obviously going to be two separate steps we do our best to disguise as one thing. Here, it's being done with a setattr request to the MDS. But it seems like it would be more reliable to have the MDS locally clear ISUID|ISGID when it hands out the write capability on files with it set, and to only hand out that cap on demand. (I don't remember if we give out Fw proactively or not.) That would also mean we don't need As on every write, which even though it should be cheap seems inimical to some realistic workloads (like maybe some rsync configs?). Did you look at any options like that? Do you think the client-side best effort is good enough?

"There's a noticable up to tick-length hang that occurs when you call ceph_shutdown with open files still in play." Are there pieces of tick() we should invoke from shutdown()?

"test: flesh out the setuid/setgit clearing test" s/setgit/setgid/ ? :)

"Note that this flag is ignored if the mode is being set in the same setattr call (uncommon, but possible from something like ganesha or samba)." I'm not seeing that.

@@ -3800,7 +3800,7 @@ void Server::handle_client_setattr(MDRequestRef& mdr)
pi->mode = (pi->mode & ~07777) | (req->head.args.setattr.mode & 07777);
- else if ((mask & (CEPH_SETATTR_UID|CEPH_SETATTR_GID)) &&
S_ISREG(pi->mode)) {
pi->mode &= ~S_ISUID;
if ((pi->mode & (S_ISGID|S_IXGRP)) == (S_ISGID|S_IXGRP))
gregsfortytwo Dec 13, 2016 Member

This is from the previous set of commits but I guess I didn't ask — why are we unconditionally removing S_ISUID and doing a check before removing S_ISGID?

jtlayton Dec 13, 2016 Contributor

Having the S_ISGID bit set on a file that does not have the group execute bit is (oddly enough) used to indicate that the file is a candidate for mandatory locking. So you only want to clear that bit if the group execute bit is also set.

+ struct ceph_statx stx = { 0 };
+ put_cap_ref(in, CEPH_CAP_AUTH_SHARED);
+ r = __setattrx(in, &stx, CEPH_SETATTR_KILL_SGUID, f->actor_perms);
gregsfortytwo Dec 13, 2016 Member

So this is asynchronously sending off an MDS request and then doing the normal write. That does mean under nasty failure conditions we could send off the write, fail to send off the setattr, and then crash...

jtlayton Dec 13, 2016 Contributor

I'm pretty sure that's a synchronous call. do_setattr calls make_request, which appears to wait on the reply (like we would for any other setattr). Am I missing something there?

gregsfortytwo Dec 13, 2016 Member, I'm not sure how I got that confused. You're right of course.

@gregsfortytwo gregsfortytwo removed their assignment Dec 13, 2016

I don't see how this could be racy. Once we have As caps we know that the mode can't change until the client_mutex is released, which happens after the write is complete. It is possible that we might crash before we ever issue the write after doing the setattr, but I don't think we can make it any more atomic than that and if we end up clearing some mode bits in that case, then I think we can live with it (after all, we were going to do a write).

I did consider whether we might just clear them when the client takes Fw caps, and have the MDS recall Fw when we need to change the mode. The problem there is that POSIX is pretty clear that you need to clear the bits on the actual write. The client though, will request Fw when it opens a file for write. It's possible (maybe even likely) that some applications might open files O_RDWR and only issue reads to that fd. If we clear the bits on Fw acquisition then we'd clear the bits in that situation as well.

That said, I'm open to the idea if we want to go beyond what POSIX requires there. Just be forewarned that it has the potential to break some weird corner cases.


Oof. We don't actually hold on to As caps until the write is complete as we release them before doing the setattr call. That's hard to fix as we need to release As caps in order for the server to clear the bits in the mode for us.

So, it's possible we'd check the mode, and see it without any setuid bits, and before we do the write, a setattr from another client races in and sets those bits in the mode. I guess the question there is...does it matter? After all, we end up with the same result if the mode change gets in just after the write.

I tend to think this is good enough, and not likely to hurt performance in the common case.


I'm not so concerned about making writes slow when you have the bits set — it's more about imposing costs when they aren't, and I really don't like forcing a check every time.
In terms of the racing setattr, that also shouldn't be a problem since it requires the user to have overlapping syscalls. ;)

So, how about we change the client so it doesn't request write caps on a file with these mode bits set until it actually gets into the write syscall (and/or the MDS ignores Fw cap requests that are bundled up with opens on those files)? That doesn't seem too hard once we know the bits are set, which we we can handle with a one-time look at the mode (which we probably already have caps on if we're opening).


I agree about not imposing extra penalty on the common case where these aren't set. That said, I'm not convinced that what you're suggesting will be any better in that regard.

If we do that then we'll also need to ensure that the MDS recalls all outstanding Fw caps if a client has a request to change the mode in such a way that either setuid bit is set. The client will also need to proactively drop Fw on any mode changes that set the setuid bits as well.

Also, if the client has Ax and tries to set the mode to something with either bit set, then we'll need to take steps to ensure that the MDS recalls Fw caps in that case as well (probably by just dropping Ax at that point and sending the mode change directly to the MDS).

So yeah, with that you don't need As caps on the client to do a write. But...once we get As we're unlikely to lose them. uid/gid/mode/btime changes are quite rare in most workloads, so getting and holding As caps in most cases should be no big deal. Having to sort out As caps on the initial write doesn't seem any more costly than having to acquire Fw in the same situation.

Also, what about legacy clients? Are we ok with someone mounting their kraken share up with their jewel client and stripping off the setuid bits when they accidentally open a file with O_RDWR? I'm not yuuuugely concerned there, but we should be aware that that is the likely outcome in that situation.



I guess that's a good enough argument for me. We can do something more sophisticated if it causes trouble.
I went through where we actually do stuff that revokes As and I'm not seeing any other cases we need to worry about.

@liewegas, can you think of anything that would make it really bad for our writers to maintain an As cap on files they are actively writing?


It's kind of a bummer about older clients not stripping off the setuid/gid bits properly — I guess for them we just have to hope the kernel FUSE is doing the right thing?


Yes, older fuse clients should do the right thing as FUSE will send down a setattr request when the kernel needs to strip those bits. Aside from not handling that correctly on chown on recent buggy kernels, that should keep working as it always has.

@liewegas liewegas changed the title from Clear setuid bits when writing or truncating to client: Clear setuid bits when writing or truncating Dec 19, 2016


@jtlayton jtlayton merged commit 20f936e into master Dec 19, 2016

3 checks passed

Signed-off-by all commits in this PR are signed
Unmodifed Submodules submodules for project are unmodified
default Build finished.
@jtlayton jtlayton deleted the wip-jlayton-suid branch Dec 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment