Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qa/cephfs: set joinable on FS before exiting tests in TestFSFail #57333

Merged
merged 1 commit into from
May 17, 2024

Conversation

rishabh-d-dave
Copy link
Contributor

@rishabh-d-dave rishabh-d-dave commented May 7, 2024

After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.

Fixes: https://tracker.ceph.com/issues/65841

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

qa/tasks/cephfs/test_admin.py Outdated Show resolved Hide resolved
@vshankar vshankar changed the title qa/cephfs: fix estFSFail.test_with_health_warn_oversize_cache qa/cephfs: fix TesttFSFail.test_with_health_warn_oversize_cache May 8, 2024
@rishabh-d-dave rishabh-d-dave changed the title qa/cephfs: fix TesttFSFail.test_with_health_warn_oversize_cache qa/cephfs: fix TestFSFail.test_with_health_warn_oversize_cache May 8, 2024
@rishabh-d-dave
Copy link
Contributor Author

There was an extra in TestFSFail in PR title.

@rishabh-d-dave
Copy link
Contributor Author

.

@rishabh-d-dave
Copy link
Contributor Author

.

@rishabh-d-dave
Copy link
Contributor Author

@batrick @vshankar Tests are running fine with vstart_runner and with FUSE and kernel client driver. PTAL. I intend to add fix for https://tracker.ceph.com/issues/65865 here so that mds fail issue can be discovered in QA run. This would also make backporting easy.

Copy link
Member

@batrick batrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove qa/cephfs: rectify variable name used for MDS name. This is code churn.

@@ -2283,7 +2283,7 @@ def test_with_health_warn_oversize_cache(self):
errmsg = 'mds_cache_oversized'
self.negtest_ceph_cmd(args=f'mds fail {active_mds_name}',
retval=1, errmsgs=errmsg)
self.run_ceph_cmd(f'mds fail {self.fs.name} --yes-i-really-mean-it')
self.run_ceph_cmd(f'mds fail {active_mds_name} --yes-i-really-mean-it')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this not show up in QA as a failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked about it in standup -- mds fail returns zero in such a scenario. Will add a fix for this on same PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, added fixes for this too. It's working fine with vstart, testing with teuth now...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@batrick @vshankar @gregsfortytwo

I've fixed the return value issue with mds fail (https://tracker.ceph.com/issues/65865) and errmsg issue with mds fail (https://tracker.ceph.com/issues/65875) and also added a test for both.

But just now I recalled that returning zero in case of mds fail is done intentionally and deliberately for the sake of idempotency. So we have 2 possibilities.

  1. Either return zero and modify QA code to check stderr for every command and halt the test if output on stderr wasn't expected.
  2. Or forsake idempotecy and do not return zero. This will automatically halt tests.

I think the preference lies with first one because forsaking idempotency is not an option. And therefore I should delete the commits that fixes "mds fail return value" issue and instead modify run_ceph_cmd() to check stderr and halt if stderr has any extra values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since then I've removed src and test code for "mds fail return value" issue and added these two commits to avoid same issues.

  1. f3c781f
  2. 880421e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@batrick @vshankar @gregsfortytwo

I've fixed the return value issue with mds fail (https://tracker.ceph.com/issues/65865) and errmsg issue with mds fail (https://tracker.ceph.com/issues/65875) and also added a test for both.

But just now I recalled that returning zero in case of mds fail is done intentionally and deliberately for the sake of idempotency. So we have 2 possibilities.

  1. Either return zero and modify QA code to check stderr for every command and halt the test if output on stderr wasn't expected.

Some command may not return any output and follow idempotentcy rules. What would you do about those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of anything other than making them print some error message on standard error stream.

@@ -438,3 +438,27 @@ def create_client(self, client_id, moncap=None, osdcap=None, mdscap=None):

self.run_ceph_cmd(*cmd)
return self.get_ceph_cmd_stdout(f'auth get {self.client_name}')

def gen_health_warn_mds_cache_oversized(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think these are to specific to the nuances of the test. Keep them to test_admin please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. There are some other places (test_client_limits.py, IIRC) where at least one of the could be reused.

@rishabh-d-dave
Copy link
Contributor Author

@batrick

Please remove qa/cephfs: rectify variable name used for MDS name. This is code churn.

Agreed but it is misguiding too.

@rishabh-d-dave rishabh-d-dave changed the title qa/cephfs: fix TestFSFail.test_with_health_warn_oversize_cache qa/cephfs: fixes to src and qa code for "fs fail" and "mds fai" May 8, 2024
@rishabh-d-dave rishabh-d-dave changed the title qa/cephfs: fixes to src and qa code for "fs fail" and "mds fai" qa/cephfs: fixes for src and qa code for "fs fail" and "mds fai" May 8, 2024
@rishabh-d-dave rishabh-d-dave changed the title qa/cephfs: fixes for src and qa code for "fs fail" and "mds fai" qa/cephfs: fixes for src and qa code for "fs fail" and "mds fail" May 8, 2024
@rishabh-d-dave
Copy link
Contributor Author

@rishabh-d-dave
Copy link
Contributor Author

jenkins test api

@rishabh-d-dave rishabh-d-dave requested a review from a team as a code owner May 9, 2024 06:50
@github-actions github-actions bot added the core label May 9, 2024
@rishabh-d-dave
Copy link
Contributor Author

@batrick

Can this PR be changed to just
qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd qa/cephfs: bring MDS up after running TestFSFail & TestMDSFail
or is there another interdependency I'm missing?

Done.

@batrick with latest change, inter-dependency is no more, so I've moved one of these commits to PR #57493 since you wanted 1 commit per PR. PTAL

cc @vshankar

@rishabh-d-dave rishabh-d-dave changed the title qa/cephfs: fix QA code for "fs fail" and "mds fail" qa/cephfs: set joinable on FS before exiting tests in TestFSFail May 16, 2024
@rishabh-d-dave
Copy link
Contributor Author

Had a conversation with Venky about this, it's good to do a quick QA for this PR so that it can be merge ASAP to make sure test_admin does fine in QA runs -

https://pulpito.ceph.com/rishabh-2024-05-16_07:25:57-fs:functional-main-testing-default-smithi/

@rishabh-d-dave
Copy link
Contributor Author

CI failures look unrelated -
make check: https://jenkins.ceph.com/job/ceph-pull-requests/135100/
ceph api: https://jenkins.ceph.com/job/ceph-api/74128/

@rishabh-d-dave
Copy link
Contributor Author

jenkins test make check

@rishabh-d-dave
Copy link
Contributor Author

jenkins test api

@rishabh-d-dave
Copy link
Contributor Author

@rishabh-d-dave
Copy link
Contributor Author

This PR is under test in https://tracker.ceph.com/issues/66065.

rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request May 16, 2024
* refs/pull/57333/head:
	qa/cephfs: set joinable on FS before exiting tests in TestFSFail

Reviewed-by: Venky Shankar <vshankar@redhat.com>
@rishabh-d-dave
Copy link
Contributor Author

This PR is under test in https://tracker.ceph.com/issues/66067.

rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request May 16, 2024
* refs/pull/57333/head:
	qa/cephfs: set joinable on FS before exiting tests in TestFSFail

Reviewed-by: Venky Shankar <vshankar@redhat.com>
After running TestFSFail, CephFSTestCase.tearDown() fails attempting
to unmount CephFS. Set joinable on FS and wait for the MDS to be up
before exiting the test. This will ensure that unmounting is
successful in teardown.

Fixes: https://tracker.ceph.com/issues/65841
Signed-off-by: Rishabh Dave <ridave@redhat.com>
rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request May 16, 2024
* refs/pull/57333/head:
	qa/cephfs: set joinable on FS before exiting tests in TestFSFail

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request May 16, 2024
* refs/pull/57333/head:
	qa/cephfs: set joinable on FS before exiting tests in TestFSFail

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request May 16, 2024
* refs/pull/57333/head:
	qa/cephfs: set joinable on FS before exiting tests in TestFSFail

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request May 16, 2024
* refs/pull/57333/head:
	qa/cephfs: set joinable on FS before exiting tests in TestFSFail

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
rishabh-d-dave added a commit to rishabh-d-dave/ceph that referenced this pull request May 17, 2024
* refs/pull/57333/head:
	qa/cephfs: set joinable on FS before exiting tests in TestFSFail

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
@rishabh-d-dave
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants