qa/cephfs: set joinable on FS before exiting tests in TestFSFail #57333

rishabh-d-dave · 2024-05-07T14:57:00Z

After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.

Fixes: https://tracker.ceph.com/issues/65841
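The fix described above can be sketched as a self-contained unittest example. It is a minimal sketch, not the actual TestFSFail code. `FakeFilesystem`, its `fail()`/`set_joinable()`/`wait_for_daemons()` methods, and the `tearDown()` check are hypothetical stand-ins. They model only two facts from the description: a failed FS is not joinable, so no MDS becomes active, and teardown needs an active MDS in order to unmount.

```python
import unittest


class FakeFilesystem:
    """Hypothetical stand-in for a CephFS handle; models only joinable/MDS state."""

    def __init__(self):
        self.joinable = True
        self.mds_up = True

    def fail(self):
        # Models "ceph fs fail": the FS becomes non-joinable and the MDS goes down.
        self.joinable = False
        self.mds_up = False

    def set_joinable(self, joinable=True):
        self.joinable = joinable
        # Toy assumption: once the FS is joinable, a standby MDS takes the rank.
        if joinable:
            self.mds_up = True

    def wait_for_daemons(self):
        # A real wait would poll; this toy model only checks the final state.
        if not self.mds_up:
            raise RuntimeError("timed out waiting for an active MDS")


class TestFSFailSketch(unittest.TestCase):
    def setUp(self):
        self.fs = FakeFilesystem()

    def tearDown(self):
        # Stand-in for CephFSTestCase.tearDown(): unmounting needs an active MDS.
        if not self.fs.mds_up:
            raise RuntimeError("cannot unmount: no active MDS")

    def test_fs_fail(self):
        self.fs.fail()
        self.assertFalse(self.fs.joinable)
        # The fix: make the FS joinable again and wait for the MDS before
        # returning, so that tearDown() can unmount.
        self.fs.set_joinable()
        self.fs.wait_for_daemons()
```

Wrapping the test body in `try`/`finally` would restore the FS even when an earlier assertion fails. Note that `unittest` runs `addCleanup()` callbacks only after `tearDown()`, so in this model they would run too late to help the unmount.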

qa/tasks/cephfs/test_admin.py

rishabh-d-dave · 2024-05-08T09:42:27Z

There was an extra "t" in TestFSFail in the PR title.

qa/tasks/cephfs/test_admin.py

rishabh-d-dave · 2024-05-08T14:06:07Z

.

rishabh-d-dave · 2024-05-08T14:37:53Z

.

rishabh-d-dave · 2024-05-08T14:42:12Z

@batrick @vshankar Tests are running fine with vstart_runner and with both the FUSE and kernel clients. PTAL. I intend to add a fix for https://tracker.ceph.com/issues/65865 to this PR so that the mds fail issue can be caught in QA runs. This would also make backporting easier.

batrick

Please remove "qa/cephfs: rectify variable name used for MDS name". This is code churn.

batrick · 2024-05-08T14:52:44Z

qa/tasks/cephfs/test_admin.py

@@ -2283,7 +2283,7 @@ def test_with_health_warn_oversize_cache(self):
        errmsg = 'mds_cache_oversized'
        self.negtest_ceph_cmd(args=f'mds fail {active_mds_name}',
                              retval=1, errmsgs=errmsg)
-        self.run_ceph_cmd(f'mds fail {self.fs.name} --yes-i-really-mean-it')
+        self.run_ceph_cmd(f'mds fail {active_mds_name} --yes-i-really-mean-it')


Why did this not show up in QA as a failure?

I talked about it in standup: mds fail returns zero in such a scenario. I'll add a fix for this in the same PR.
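To show why passing the wrong name went unnoticed, here is a toy model of an idempotent mds fail. It is not the monitor's implementation. Failing a name that is not an active MDS is treated as a no-op, so the return value is 0 and only stderr hints at the mistake. `ToyMDSMap`, the MDS name "a", the FS name "cephfs", and the stderr text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMDSMap:
    """Hypothetical toy model of idempotent "mds fail"; not Ceph's MDSMonitor."""

    active: set = field(default_factory=set)

    def mds_fail(self, who):
        """Return (retval, stderr) the way a toy "ceph mds fail <who>" might."""
        if who not in self.active:
            # Idempotent no-op: nothing to fail, so still report success (0).
            return 0, f"no active MDS named {who!r} (toy message)"
        self.active.discard(who)
        return 0, ""


mdsmap = ToyMDSMap(active={"a"})
# The bug from the diff: the FS name is passed where an MDS name belongs.
print(mdsmap.mds_fail("cephfs"))
# The fix: pass the MDS name.
print(mdsmap.mds_fail("a"))
print(mdsmap.active)
```

Both calls return 0, so a test that checks only the return value cannot tell the wrong call from the right one.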

https://tracker.ceph.com/issues/65865

Done, I've added fixes for this too. It works fine with vstart; testing with teuthology now.

@batrick @vshankar @gregsfortytwo

I've fixed the return value issue with mds fail (https://tracker.ceph.com/issues/65865) and the error message issue with mds fail (https://tracker.ceph.com/issues/65875), and added a test for each.

But I just recalled that mds fail returning zero in this case is intentional, for the sake of idempotency. So we have two options.

Either keep returning zero, and modify the QA code to check stderr for every command and halt the test if the output on stderr wasn't expected.

Or give up idempotency and return a nonzero value. This would halt the tests automatically.

I think the first option is preferable, because giving up idempotency is not an option. Therefore I should drop the commits that fix the "mds fail return value" issue and instead modify run_ceph_cmd() to check stderr and halt if stderr contains anything unexpected.

Since then, I've removed the src and test code for the "mds fail return value" issue and added these two commits to catch the same kind of issue:

f3c781f

880421e
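The first option can be sketched as a strict command runner. A nonzero exit still fails, and a zero exit with unexpected text on stderr now fails too. `run_cmd_strict()` and `UnexpectedStderr` are hypothetical names. This is a minimal sketch, not the qa framework's run_ceph_cmd() and not the code in the commits listed above.

```python
import subprocess
import sys


class UnexpectedStderr(RuntimeError):
    """Raised when a command exits 0 but writes unexpected text to stderr."""


def run_cmd_strict(args, expected_stderr=()):
    """Run args; fail on a nonzero exit OR on stderr text not listed as expected.

    Hypothetical helper; any stderr containing one of expected_stderr is tolerated.
    """
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args,
                                            proc.stdout, proc.stderr)
    stderr = proc.stderr.strip()
    if stderr and not any(msg in stderr for msg in expected_stderr):
        raise UnexpectedStderr(f"{args!r} exited 0 but wrote to stderr: {stderr!r}")
    return proc.stdout


if __name__ == "__main__":
    noisy = [sys.executable, "-c", "import sys; sys.stderr.write('no such MDS')"]
    run_cmd_strict(noisy, expected_stderr=("no such MDS",))  # tolerated
    try:
        run_cmd_strict(noisy)  # not expected, so the "test" halts here
    except UnexpectedStderr as err:
        print("halted:", err)
```

This only catches commands that report something on stderr. A command that silently treats a bad argument as a no-op still passes.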


Some commands may not return any output and still follow idempotency rules. What would you do about those?

I can't think of anything other than making them print an error message on the standard error stream.

batrick · 2024-05-08T14:54:49Z

qa/tasks/cephfs/cephfs_test_case.py

@@ -438,3 +438,27 @@ def create_client(self, client_id, moncap=None, osdcap=None, mdscap=None):

        self.run_ceph_cmd(*cmd)
        return self.get_ceph_cmd_stdout(f'auth get {self.client_name}')
+
+    def gen_health_warn_mds_cache_oversized(self):


No, I think these are too specific to the nuances of the test. Keep them in test_admin, please.

Cool. There are some other places (test_client_limits.py, IIRC) where at least one of them could be reused.

rishabh-d-dave · 2024-05-08T16:13:05Z

@batrick

Please remove "qa/cephfs: rectify variable name used for MDS name". This is code churn.

Agreed, but it is misleading too.

rishabh-d-dave · 2024-05-09T05:15:28Z

https://jenkins.ceph.com/job/ceph-api/73683/

rishabh-d-dave · 2024-05-09T05:15:33Z

jenkins test api

rishabh-d-dave · 2024-05-16T06:27:18Z

@batrick

Can this PR be changed to just
- qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd
- qa/cephfs: bring MDS up after running TestFSFail & TestMDSFail
Or is there another interdependency I'm missing?

Done.

@batrick With the latest change the interdependency is gone, so I've moved one of these commits to PR #57493, since you wanted one commit per PR. PTAL.

cc @vshankar

rishabh-d-dave · 2024-05-16T07:28:55Z

I had a conversation with Venky about this. It's good to do a quick QA run for this PR so that it can be merged ASAP, making sure test_admin passes in QA runs:

https://pulpito.ceph.com/rishabh-2024-05-16_07:25:57-fs:functional-main-testing-default-smithi/

rishabh-d-dave · 2024-05-16T07:30:12Z

CI failures look unrelated:
make check: https://jenkins.ceph.com/job/ceph-pull-requests/135100/
ceph api: https://jenkins.ceph.com/job/ceph-api/74128/

rishabh-d-dave · 2024-05-16T07:30:18Z

jenkins test make check

rishabh-d-dave · 2024-05-16T07:30:21Z

jenkins test api

rishabh-d-dave · 2024-05-16T08:50:26Z

https://pulpito.ceph.com/rishabh-2024-05-16_08:48:59-fs:functional-main-testing-default-smithi/

rishabh-d-dave · 2024-05-16T11:33:48Z

This PR is under test in https://tracker.ceph.com/issues/66065.

* refs/pull/57333/head: qa/cephfs: set joinable on FS before exiting tests in TestFSFail Reviewed-by: Venky Shankar <vshankar@redhat.com>

rishabh-d-dave · 2024-05-16T12:40:06Z

This PR is under test in https://tracker.ceph.com/issues/66067.

* refs/pull/57333/head: qa/cephfs: set joinable on FS before exiting tests in TestFSFail Reviewed-by: Venky Shankar <vshankar@redhat.com>

After running TestFSFail, CephFSTestCase.tearDown() fails attempting to unmount CephFS. Set joinable on FS and wait for the MDS to be up before exiting the test. This will ensure that unmounting is successful in teardown. Fixes: https://tracker.ceph.com/issues/65841 Signed-off-by: Rishabh Dave <ridave@redhat.com>

* refs/pull/57333/head: qa/cephfs: set joinable on FS before exiting tests in TestFSFail Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>

* refs/pull/57333/head: qa/cephfs: set joinable on FS before exiting tests in TestFSFail Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>


rishabh-d-dave · 2024-05-17T12:23:50Z

The QA run was successful: https://pulpito.ceph.com/rishabh-2024-05-17_04:50:48-fs:functional-main-testing-default-smithi/

rishabh-d-dave added bug-fix, cephfs, tests and needs-review labels May 7, 2024

rishabh-d-dave requested review from batrick, vshankar and a team May 7, 2024 14:57

batrick requested changes May 8, 2024

qa/tasks/cephfs/test_admin.py Outdated

vshankar changed the title ~~qa/cephfs: fix estFSFail.test_with_health_warn_oversize_cache~~ qa/cephfs: fix TesttFSFail.test_with_health_warn_oversize_cache May 8, 2024

rishabh-d-dave changed the title ~~qa/cephfs: fix TesttFSFail.test_with_health_warn_oversize_cache~~ qa/cephfs: fix TestFSFail.test_with_health_warn_oversize_cache May 8, 2024

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from 6d9aaec to 9179e88 Compare May 8, 2024 11:28

kotreshhr reviewed May 8, 2024

qa/tasks/cephfs/test_admin.py

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from 9179e88 to 0e236e3 Compare May 8, 2024 13:42

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from 0e236e3 to ea267b9 Compare May 8, 2024 14:06

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from ea267b9 to e9bf105 Compare May 8, 2024 14:38

batrick requested changes May 8, 2024

rishabh-d-dave changed the title ~~qa/cephfs: fix TestFSFail.test_with_health_warn_oversize_cache~~ qa/cephfs: fixes to src and qa code for "fs fail" and "mds fai" May 8, 2024

rishabh-d-dave changed the title ~~qa/cephfs: fixes to src and qa code for "fs fail" and "mds fai"~~ qa/cephfs: fixes for src and qa code for "fs fail" and "mds fai" May 8, 2024

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from e9bf105 to 5efafc3 Compare May 8, 2024 18:45

rishabh-d-dave changed the title ~~qa/cephfs: fixes for src and qa code for "fs fail" and "mds fai"~~ qa/cephfs: fixes for src and qa code for "fs fail" and "mds fail" May 8, 2024

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from 5efafc3 to fd6f45f Compare May 9, 2024 06:50

rishabh-d-dave requested a review from a team as a code owner May 9, 2024 06:50

github-actions bot added the core label May 9, 2024

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from 92b30bc to 6114359 Compare May 16, 2024 06:22

rishabh-d-dave mentioned this pull request May 16, 2024

qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd #57493

Merged

rishabh-d-dave changed the title ~~qa/cephfs: fix QA code for "fs fail" and "mds fail"~~ qa/cephfs: set joinable on FS before exiting tests in TestFSFail May 16, 2024

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from 6114359 to 85ccbe1 Compare May 16, 2024 06:32

vshankar approved these changes May 16, 2024

rishabh-d-dave added the wip-rishabh-testing3 label May 16, 2024

batrick approved these changes May 16, 2024

rishabh-d-dave force-pushed the fs-fail-cache-kclient branch from 85ccbe1 to faa30e0 Compare May 16, 2024 16:41

rishabh-d-dave merged commit 6859fe6 into ceph:main May 17, 2024
10 of 11 checks passed

rishabh-d-dave deleted the fs-fail-cache-kclient branch May 17, 2024 12:27

This was referenced Jun 10, 2024

squid: mon,cephfs: require confirmation flag to bring down unhealthy MDS #57840

Open

reef: mon,cephfs: require confirmation flag to bring down unhealthy MDS #57837

Open

quincy: mon,cephfs: require confirmation flag to bring down unhealthy MDS #57841

Open
