Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mds: add protection from clients without fscrypt support #54725

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

lxbsz
Copy link
Member

@lxbsz lxbsz commented Nov 30, 2023

Clients that do not support fscrypt can execute operations that may cause unrecoverable data loss. Add protection on the MDS so that it prevents these clients from executing some operations.

Note, however, that clients will still be able corrupt encrypted files by appending data to them. And they will still be able to read encrypted data from those files.

This PR originally from @luis-henrix 's PR #45073. And I just fix the lock order issue, and will run more tests for it.

Please note this PR will fix one issue from Luis' previous PR:

1, MDS crash issue https://tracker.ceph.com/issues/63685

Fixes: https://tracker.ceph.com/issues/65217

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@lxbsz
Copy link
Member Author

lxbsz commented Dec 1, 2023

Locally the xfstests passed.

Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

1 similar comment
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@vshankar
Copy link
Contributor

@lxbsz Is this ready for review?

@lxbsz
Copy link
Member Author

lxbsz commented Jan 18, 2024

@lxbsz Is this ready for review?

@vshankar Yeah, it's ready.

Copy link
Contributor

@chrisphoffman chrisphoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to see there's been discussion about server side semantics enforcement.

Do we need any tests included in this PR to ensure our intended behavior doesn't change? We can't one day inadvertently allow older or non-friendly clients to be able to do known malicious things.

@lxbsz
Copy link
Member Author

lxbsz commented Feb 26, 2024

I'm happy to see there's been discussion about server side semantics enforcement.

Do we need any tests included in this PR to ensure our intended behavior doesn't change?

Yeah, we need to run qa.

@chrisphoffman
Copy link
Contributor

I'm happy to see there's been discussion about server side semantics enforcement.
Do we need any tests included in this PR to ensure our intended behavior doesn't change?

Yeah, we need to run qa.

To be more specific, I was proposing new tests. For example testing a client working on a fscrypt tree that the client doesn't know about fscrypt. We want to make sure server enforced behavior fails on unaware clients. What do you think?

Do we already test this? If so, can you point me to it?

@lxbsz
Copy link
Member Author

lxbsz commented Feb 27, 2024

I'm happy to see there's been discussion about server side semantics enforcement.
Do we need any tests included in this PR to ensure our intended behavior doesn't change?

Yeah, we need to run qa.

To be more specific, I was proposing new tests. For example testing a client working on a fscrypt tree that the client doesn't know about fscrypt. We want to make sure server enforced behavior fails on unaware clients. What do you think?

Do we already test this? If so, can you point me to it?

Hmm, sounds reasonable to me. Currently there has not thus test yet.

Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@lxbsz lxbsz force-pushed the fscrypt-pro-clients branch 6 times, most recently from 43303ff to bd65286 Compare March 28, 2024 03:31
@lxbsz
Copy link
Member Author

lxbsz commented Mar 28, 2024

This is ready to review!

@lxbsz
Copy link
Member Author

lxbsz commented Mar 28, 2024

@luis-henrix @vshankar @chrisphoffman

Could you guys to have a look ? I am planing to push this to the recently releases. Thanks!

@lxbsz
Copy link
Member Author

lxbsz commented Mar 29, 2024

@lxbsz
Copy link
Member Author

lxbsz commented Mar 29, 2024

jenkins test make check arm64

@lxbsz
Copy link
Member Author

lxbsz commented Mar 29, 2024

jenkins test windows

@lxbsz
Copy link
Member Author

lxbsz commented Mar 29, 2024

Added a tracker for this https://tracker.ceph.com/issues/65217.

@gregsfortytwo
Copy link
Member

What's the impact of this PR going to be on the manager, which doesn't yet support fscrypt?

It's very important that we still be able to create snapshots and manipulate subvolumes. ;)

@lxbsz
Copy link
Member Author

lxbsz commented Apr 4, 2024

What's the impact of this PR going to be on the manager, which doesn't yet support fscrypt?

It's very important that we still be able to create snapshots and manipulate subvolumes. ;)

For the clients which hasn't support fscrypt yet, they couldn't do any of mkdir/mknod/mksnap/create/link/symlink/truncate/write operations to the encrypted directories/files. But they could only read and rm them.

The non-fscrypt supporting client couldn't make new directories/files, that because for each directory/file the key will be unique and since this client doesn't support the fscrypt, so this couldn't be done.

Else, for example when a non-fscrypt client creates a directory/file under it the kclient will try to dencrypt the dentry name, including the snapshot names, with the key and then will just assume the dentry name is corrupted as shown in tracker https://tracker.ceph.com/issues/65217.

So for manager they should create the snapshot under the kclient anyway, won't it ?

For the subvolumes the should encrypt the tail directory of the path, for example for /volume/subvol1 the / and volume won't be encrypted and only the subvol1 will be. So the manager should manipulate the subvol1 via the kclient, won't it ?

And the manager still could create new subvolumes under /volume/ and rm the existing subvolumes from /volume/.

@gregsfortytwo
Copy link
Member

What's the impact of this PR going to be on the manager, which doesn't yet support fscrypt?

It's very important that we still be able to create snapshots and manipulate subvolumes. ;)

For the clients which hasn't support fscrypt yet, they couldn't do any of mkdir/mknod/mksnap/create/link/symlink/truncate/write operations to the encrypted directories/files. But they could only read and rm them.

Does this mean move operations also won't work? I'm thinking about manual intervention when doing disaster recovery, here.

The non-fscrypt supporting client couldn't make new directories/files, that because for each directory/file the key will be unique and since this client doesn't support the fscrypt, so this couldn't be done.

Else, for example when a non-fscrypt client creates a directory/file under it the kclient will try to dencrypt the dentry name, including the snapshot names, with the key and then will just assume the dentry name is corrupted as shown in tracker https://tracker.ceph.com/issues/65217.

Ah. I did not realize these dentries aren't flagged individually as encrypted.

So for manager they should create the snapshot under the kclient anyway, won't it ?

No, the manager doesn't have a kernel mount and we can't give it one.

For the subvolumes the should encrypt the tail directory of the path, for example for /volume/subvol1 the / and volume won't be encrypted and only the subvol1 will be. So the manager should manipulate the subvol1 via the kclient, won't it ?

And the manager still could create new subvolumes under /volume/ and rm the existing subvolumes from /volume/.

I don't know the exact tree structure for subvolumes; I guess the encrypted directory will be one below where we take snapshots? @vshankar @kotreshhr your thoughts here?

@chrisphoffman
Copy link
Contributor

Does this mean move operations also won't work? I'm thinking about manual intervention when doing disaster recovery, here.

I tried the following on the ext4 implementation: mv of a locked encrypted file to a different dir within the same locked encrypted tree, doesn't work.

The mv command uses renameat2.

Consider:
https://docs.kernel.org/filesystems/fscrypt.html#without-the-key

Under "Without the key" it states:
nor can a name in an encrypted directory be the source or target of a rename

I expect your scenario will be supported in the future, see:
It is not currently possible to backup and restore encrypted files without the encryption key. This would require special APIs which have not yet been implemented.

@lxbsz
Copy link
Member Author

lxbsz commented Apr 9, 2024

Does this mean move operations also won't work? I'm thinking about manual intervention when doing disaster recovery, here.

I tried the following on the ext4 implementation: mv of a locked encrypted file to a different dir within the same locked encrypted tree, doesn't work.

The mv command uses renameat2.

Consider: https://docs.kernel.org/filesystems/fscrypt.html#without-the-key

Under "Without the key" it states: nor can a name in an encrypted directory be the source or target of a rename

I expect your scenario will be supported in the future, see: It is not currently possible to backup and restore encrypted files without the encryption key. This would require special APIs which have not yet been implemented.

Yeah, this has already supported:

In kclient:

[xiubli@ceph kcephfs]$ fscrypt lock mydir1/
"mydir1/" is now locked.
[xiubli@ceph kcephfs]$ ls mydir1/
JlQsm3usl+hj5pkp0CV1IHEBNdI2fDLHcx+UI6ZvaiU
[xiubli@ceph kcephfs]$ cd mydir1/
[xiubli@ceph mydir1]$ mv JlQsm3usl+hj5pkp0CV1IHEBNdI2fDLHcx+UI6ZvaiU klds
mv: cannot move 'JlQsm3usl+hj5pkp0CV1IHEBNdI2fDLHcx+UI6ZvaiU' to 'klds': Required key not available

Then in fuse client, no matter the mydir1 is locked or not it will always fail as:

[xiubli@ceph libcephfs]$ ls mydir1
JlQsm3usl+hj5pkp0CV1IHEBNdI2fDLHcx+UI6ZvaiU
[xiubli@ceph libcephfs]$ cd mydir1/
[xiubli@ceph mydir1]$ mv JlQsm3usl+hj5pkp0CV1IHEBNdI2fDLHcx+UI6ZvaiU kld
mv: cannot move 'JlQsm3usl+hj5pkp0CV1IHEBNdI2fDLHcx+UI6ZvaiU' to 'kld': Read-only file system

Clients that do not support fscrypt can execute operations that may cause
unrecoverable data loss.  Add protection on the MDS so that it prevents
these clients from executing some operations.

Note, however, that clients will still be able corrupt encrypted files by
appending data to them.  And they will still be able to read encrypted
data from those files.

Fixes: https://tracker.ceph.com/issues/64852
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Clients that do not support fscrypt can open & trunc a file and cause
unrecoverable data loss. This commit will just block it.

Fixes: https://tracker.ceph.com/issues/64961
Fixes: https://tracker.ceph.com/issues/64852
Signed-off-by: Xiubo Li <xiubli@redhat.com>
In kclient the 'fscrypt_file' is a u64 number, while in MDS and
libcephfs it just switch it to a vector of uint8_t, which couldn't
be covert to a integer back by the stoll() and will crash the
client daemon in userspace.

Fixes: https://tracker.ceph.com/issues/64961
Fixes: https://tracker.ceph.com/issues/64852
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Actually the libcephfs hasn't support the fscrypt yet, we shouldn't
add the CEPHFS_FEATURE_ALTERNATE_NAME feature bit.

Fixes: https://tracker.ceph.com/issues/64852
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Just preparse for the fscrypt-protect tests.

Fixes: https://tracker.ceph.com/issues/64852
Signed-off-by: Xiubo Li <xiubli@redhat.com>
s/task/tasks/

Fixes: https://tracker.ceph.com/issues/64852
Signed-off-by: Xiubo Li <xiubli@redhat.com>
We need to mount a kclient and a fuse client to run the test, and
the kclient will creat and write to a encrypted file, when the
fuse client try to write the encrypted file it should fail and the
file contents won't change.

Fixes: https://tracker.ceph.com/issues/64852
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@lxbsz
Copy link
Member Author

lxbsz commented Apr 16, 2024

jenkins retest this please

Copy link
Contributor

@chrisphoffman chrisphoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to digest the lock portion further, but wanted to include some comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what your trying to do here further? Do you have a recent teuthology run for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure to get #56326 merged before this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cephfs Ceph File System tests
Projects
None yet
4 participants