
quincy: mds: prevent clients from exceeding the xattrs key/value limits #50981

Open
wants to merge 5 commits into base: quincy
Conversation

rishabh-d-dave
Contributor

backport tracker: https://tracker.ceph.com/issues/59037


backport of #46357
parent tracker: https://tracker.ceph.com/issues/55725

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

@rishabh-d-dave
Contributor Author

jenkins test make check

@rishabh-d-dave
Contributor Author

jenkins test api

@rishabh-d-dave rishabh-d-dave requested a review from a team April 13, 2023 09:13
@vshankar
Contributor

jenkins retest this please

@vshankar
Contributor

@rishabh-d-dave Could you investigate the sphinx failure?

@rishabh-d-dave
Contributor Author

@vshankar

@rishabh-d-dave Could you investigate the sphinx failure?

There's not enough information here to understand which part of the patch caused this failure.

Here's the failure message:

/home/jenkins-build/build/workspace/ceph-pr-docs/doc/api/mon_command_api.rst:1173: WARNING: Inline substitution_reference start-string without end-string.

@vshankar
Contributor

@vshankar

@rishabh-d-dave Could you investigate the sphinx failure?

There's not enough information here to understand which part of the patch caused this failure.

Here's the failure message:

/home/jenkins-build/build/workspace/ceph-pr-docs/doc/api/mon_command_api.rst:1173: WARNING: Inline substitution_reference start-string without end-string.

I think sphinx is trying to auto-generate an .rst file based on the mon command source.

@rishabh-d-dave
Contributor Author

jenkins test make check

@rishabh-d-dave
Contributor Author

jenkins test api

@rishabh-d-dave
Contributor Author

jenkins test api

@rishabh-d-dave
Contributor Author

jenkins test api

@vshankar
Contributor

@rishabh-d-dave Have you inspected the conflict resolution in the commits? It might be the reason for the sphinx failures and the other failures too.

@rishabh-d-dave
Contributor Author

jenkins retest this please

@rishabh-d-dave
Contributor Author

@mchangir Since Yuri removed the testing tag, the QA run must've failed. Can you please post about the failure on this PR so that I can try to fix it? Thanks!

@vshankar
Contributor

vshankar commented Jul 4, 2023

jenkins retest this please

@rishabh-d-dave
Contributor Author

@mchangir Since Yuri removed the testing tag, the QA run must've failed. Can you please post about the failure on this PR so that I can try to fix it? Thanks!

ping

@mchangir
Contributor

@mchangir Since Yuri removed the testing tag, the QA run must've failed. Can you please post about the failure on this PR so that I can try to fix it? Thanks!

ping

I think this was one of the PRs in the set of PRs being tested where there were 72 failed jobs. I don't have the link to the QA run any more.

@mchangir
Contributor

jenkins test make check

@rishabh-d-dave
Contributor Author

https://jenkins.ceph.com/job/ceph-pull-requests/118142/

----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 441, in test_fs_set
    self._assert_valid_command(['fs', 'set', 'default', 'max_file_size', '2'])
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 56, in _assert_valid_command
    self.assertNotIn(result, [{}, None])
AssertionError: {} unexpectedly found in [{}, None]

======================================================================
FAIL: test_fs_set_cluster_down (__main__.TestFS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 429, in test_fs_set_cluster_down
    self._assert_valid_command(['fs', 'set', 'default', 'down', 'true'])
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 56, in _assert_valid_command
    self.assertNotIn(result, [{}, None])
AssertionError: {} unexpectedly found in [{}, None]

======================================================================
FAIL: test_fs_set_cluster_joinable (__main__.TestFS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 435, in test_fs_set_cluster_joinable
    self._assert_valid_command(['fs', 'set', 'default', 'joinable', 'true'])
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 56, in _assert_valid_command
    self.assertNotIn(result, [{}, None])
AssertionError: {} unexpectedly found in [{}, None]

======================================================================
FAIL: test_fs_set_cluster_not_joinable (__main__.TestFS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 438, in test_fs_set_cluster_not_joinable
    self._assert_valid_command(['fs', 'set', 'default', 'joinable', 'false'])
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 56, in _assert_valid_command
    self.assertNotIn(result, [{}, None])
AssertionError: {} unexpectedly found in [{}, None]

======================================================================
FAIL: test_fs_set_cluster_up (__main__.TestFS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 432, in test_fs_set_cluster_up
    self._assert_valid_command(['fs', 'set', 'default', 'down', 'false'])
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 56, in _assert_valid_command
    self.assertNotIn(result, [{}, None])
AssertionError: {} unexpectedly found in [{}, None]

======================================================================
FAIL: test_fs_set_max_mds (__main__.TestFS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 425, in test_fs_set_max_mds
    self._assert_valid_command(['fs', 'set', 'default', 'max_mds', '1'])
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/pybind/test_ceph_argparse.py", line 56, in _assert_valid_command
    self.assertNotIn(result, [{}, None])
AssertionError: {} unexpectedly found in [{}, None]

----------------------------------------------------------------------

@vshankar
Contributor

https://jenkins.ceph.com/job/ceph-pull-requests/118142/

So, this backport is causing some jenkins test failures? What's needed to fix those?

@vshankar vshankar removed the needs-qa label Jul 25, 2023
@rishabh-d-dave
Contributor Author

jenkins retest this

@rishabh-d-dave
Contributor Author

jenkins retest this please

@vshankar
Contributor

vshankar commented Sep 6, 2023

@rishabh-d-dave ping?

@trociny (Contributor) left a comment

Maybe we should also backport #48720 to reduce conflicts? It would be helpful for me too when backporting #51942.

@@ -378,6 +378,7 @@ COMMAND("fs set "
"|allow_new_snaps|inline_data|cluster_down|allow_dirfrags|balancer"
"|standby_count_wanted|session_timeout|session_autoclose"
"|allow_standby_replay|down|joinable|min_compat_client "
"|refuse_client_session|max_xattr_size "
Contributor

refuse_client_session is wrong here.

Contributor

@rishabh-d-dave pinging on this now that #50981 (comment) is resolved.

Contributor

I think this has been taken care of, @rishabh-d-dave?

Good to mark the comment as resolved?

@vshankar
Contributor

vshankar commented Oct 5, 2023

@rishabh-d-dave ping?

@rishabh-d-dave
Contributor Author

jenkins retest this please

This new configuration option allows defining the maximum size of a
filesystem's xattrs blob.  This is a filesystem-wide knob that replaces
the per-MDS mds_max_xattr_pairs_size option.

Note:

The kernel client patch to handle this new configuration was merged before
the corresponding ceph-side pull request.  This was unfortunate because in
the meantime PR ceph#43284 was merged and the encoding/decoding of
'bal_rank_mask' got in between.  Hence 'max_xattr_size' is being
encoded/decoded before 'bal_rank_mask'.

URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit 7b8def5)

	src/mds/MDSMap.cc
	-  The change has been backported as it is but the lines
	   surrounding the change are different in Quincy compared to
	   main branch.
	src/mon/MonCommands.h
	-  The change has been backported as it is but the lines
	   surrounding the change are different in Quincy compared to
	   main branch.
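The wire-ordering constraint from the note above can be illustrated with a toy encoder. This is a plain-Python sketch, not Ceph's actual dencoder; the field widths are assumed for illustration only:

```python
import struct

# Toy model of the commit note's point: because the kernel client shipped
# first and expects 'max_xattr_size' ahead of 'bal_rank_mask', both sides
# must encode and decode the two fields in that fixed order.
# Field widths here are illustrative, not Ceph's real encoding.

def encode_tail(max_xattr_size: int, bal_rank_mask: str) -> bytes:
    mask = bal_rank_mask.encode()
    return struct.pack("<QI", max_xattr_size, len(mask)) + mask

def decode_tail(buf: bytes) -> tuple:
    max_xattr_size, mask_len = struct.unpack_from("<QI", buf, 0)
    off = struct.calcsize("<QI")
    return max_xattr_size, buf[off:off + mask_len].decode()
```

A decoder that read the fields in the opposite order would misinterpret the buffer, which is why the backport keeps 'max_xattr_size' first even though 'bal_rank_mask' landed earlier on the ceph side.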
Commit eb915d0 ("cephfs: fix write_buf's _len overflow problem") added
a limit to the total size of xattrs.  This limit is respected by clients
doing a "sync" operation, i.e. MDS_OP_SETXATTR.  However, clients with
CAP_XATTR_EXCL can still buffer these operations and ignore these limits.

This patch prevents clients from crashing the MDSs by imposing the xattr
limits even when the clients have the Xx caps.  It replaces the per-MDS
knob "max_xattr_pairs_size" with a new mdsmap setting that the clients
can access.

Unfortunately, misbehaving clients, i.e. old clients that don't respect
this xattrs limit and buffer their xattrs, will see those xattrs vanish.

URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit 13e07ff)
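The guard this commit describes can be modeled in a few lines. This is a sketch in plain Python, not the actual MDS C++ code; the function name and error handling are hypothetical:

```python
# Sketch (not Ceph code) of the check described above: total the inode's
# xattr blob -- every key plus every value -- and reject a setxattr that
# would push it past the filesystem-wide max_xattr_size, regardless of
# whether the client holds the Xx (exclusive xattr) caps.
ENOSPC = 28

def try_setxattr(xattrs: dict, key: bytes, value: bytes,
                 max_xattr_size: int) -> int:
    """Return 0 on success, or -ENOSPC if the blob would exceed the limit."""
    # Size of all other pairs, excluding any old value being replaced.
    others = sum(len(k) + len(v) for k, v in xattrs.items() if k != key)
    if others + len(key) + len(value) > max_xattr_size:
        return -ENOSPC
    xattrs[key] = value
    return 0
```

The point of the patch is that this check now runs on the MDS for all paths, so a buffering client with exclusive caps can no longer bypass it.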
When doing a sync setxattr (MDS_OP_SETXATTR) to set the first xattr in an
inode, it is possible for the client to set a huge xattr key/value that
would exceed the limit.  Prevent this from happening by checking the size
against the maximum allowed size.

URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit ca1a397)
…p schema

Let the mdsmap schema know about the new field 'max_xattr_size'.  This prevents
the following error:

tasks.mgr.dashboard.helper._ValError: \
  In `input['fs_map']['filesystems'][0]['mdsmap']`: unknown keys: {'max_xattr_size'}

URL: https://tracker.ceph.com/issues/55725
Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit bf7ddd8)
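The dashboard error above comes from a strict known-keys schema check. Its shape can be sketched as follows (a hypothetical helper, not the actual tasks.mgr.dashboard.helper code; the key set is illustrative):

```python
# Sketch of the strict-keys validation that produced the error above: the
# schema lists the keys it knows, and any extra key in the input -- such as
# the new 'max_xattr_size' mdsmap field -- fails validation until the
# schema is taught about the field.
class ValError(Exception):
    pass

def validate_known_keys(obj: dict, known: set, path: str) -> None:
    unknown = set(obj) - known
    if unknown:
        raise ValError(f"In `{path}`: unknown keys: {unknown}")

# The fix in this commit amounts to adding the new field to the known set
# (other keys shown here are placeholders).
MDSMAP_KEYS = {"max_mds", "session_timeout", "max_xattr_size"}
```

With 'max_xattr_size' in the known set, an mdsmap carrying the new field passes validation instead of raising the unknown-keys error quoted above.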
…ield

Signed-off-by: Luís Henriques <lhenriques@suse.de>
(cherry picked from commit b70c752)
@yuriw yuriw removed the needs-qa label Jan 3, 2024
@vshankar vshankar added needs-qa and removed DNM labels Feb 7, 2024
@batrick (Member) left a comment

Status: Review in progress
8 participants