Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/nfs: use object_format decorators to simplify response handling #46209

Merged
merged 17 commits into from
Jan 22, 2023

Conversation

phlogistonjohn
Copy link
Contributor

Depends on: #45467

The changes above add a new python module to the mgr - a general way to handle mgr command responses and formatting. As yet it is unused. This series of changes converts the nfs mgr module to use the object_format decorators. This serves as both a demonstration of how the decorators can be used and how they benefit a mgr module - the code within the module becomes simpler and more pythonic. Only at the "outermost" layer do we concern ourselves with creating mgr response tuples. Because the nfs module is one of sufficient complexity and history not all existing APIs can use the simplest object_format.Responder decorator. A few edge cases (discussed below) were found and this required the addition of two simpler decorators "EmptyResponder" and "ErrorResponseHandler".

There are approximately three kinds of conversion that were done:

  • A function already returned JSON and could be mapped directly to Responder
  • A function makes changes to system state and (usually) returned no information were mapped to EmptyResponder
  • A function passed raw data stored in a rados object to the client. The one case of this we used the ErrorResponseHandler to ensure exceptions of the right base class are converted to response tuples. Otherwise the only other change was to isolate the mgr response logic to the CLICommand function.

I am filing this PR in draft state until the predecessor pr is merged. As it'll be in draft, discussion of things like the general approach and naming are quite welcome.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@phlogistonjohn
Copy link
Contributor Author

CC: @adk3798 @rkachach @epuertat
@ajarr may also be interested since you were making changes related to this topic recently

FWIW I still need to test parts of this. I'll leave it in draft until that's been done.

Note: I picked the nfs module because I recently worked in this area and was somewhat familiar with the module. If people have suggestions for other modules that would benefit from being updated to use object_format, please let me know and we can discuss that. I find that only when the "rubber hits the road" do we find out stuff that could be missing from the general approach of #45467.

@github-actions github-actions bot added build/ops CI Continuous Integration labels May 13, 2022
@phlogistonjohn phlogistonjohn force-pushed the jjm-format-nfs-mod branch 2 times, most recently from f9d56bd to 695762b Compare May 24, 2022 15:35
@phlogistonjohn phlogistonjohn marked this pull request as ready for review May 24, 2022 17:46
@djgalloway djgalloway changed the base branch from master to main May 25, 2022 19:59
Copy link
Contributor

@adk3798 adk3798 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems pretty great. Even the conversion process from returning the old tuple to using the decorators looks fairly straightforward. Just a few minor things and questions from me.

src/pybind/mgr/object_format.py Outdated Show resolved Hide resolved
src/pybind/mgr/object_format.py Outdated Show resolved Hide resolved
src/pybind/mgr/nfs/export.py Outdated Show resolved Hide resolved
self._delete_export(cluster_id=cluster_id, pseudo_path=None,
export_obj=export)
except Exception as e:
raise NFSException(f"Failed to delete export {export.export_id}: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the fact that we're not raising an ErrorResponse wrapping the other exception here like in a lot of other cases related to the fact that we're called by a function using the EmptyResponder decorator rather than the regular Responder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should find most (all?) of the ErrorResponse exceptions are being added where the code used to return error-condition tuples. In this case the code was already raising an NFSException. AFAICT, the only place that calls delete_all_exports is delete_nfs_cluster in cluster.py, which will end up catching all exceptions and using ErrorResponse.wrap on the caught exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until I looked just now I assumed that some other code might call delete_all_exports and catch NFSException. I still think its worth changing as little as possible, though.

return 0, "", ""
return 0, "", "Cluster does not exist"
return
raise ErrorResponse("Cluster does not exist",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this previously returned a a return code 0 even while including an error message. Is the change to now return -errno.ENOENT instead for a return code on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That was probably an oversight, but I'm not sure what's best, retaining the somewhat strange old behavior of sending the success error code when this is clearly an error or trying to avoid any behavior change here. Would be happy to hear your opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends if we think it's more important for the command to be idempotent or to make sure users are aware that the cluster they deleted isn't there anymore. We actually had a command like this in cephadm that would just print success if you tried to delete a nonexistent service, but eventually somebody requested we have it print an error when you do that. So I guess maybe raising here is the right thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my subconscious was probably encouraging the idea that errors should be errors. Otherwise it is not worth reporting at all. :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajarr flagged this in our standup today. See my review comment. :)

src/pybind/mgr/nfs/module.py Show resolved Hide resolved
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@phlogistonjohn phlogistonjohn force-pushed the jjm-format-nfs-mod branch 2 times, most recently from e64a882 to 0f1fa6b Compare June 15, 2022 21:10
@phlogistonjohn
Copy link
Contributor Author

phlogistonjohn commented Jun 15, 2022

I've applied a number of small, what I figured would be noncontroversial, changes. There are a few open sub-threads that I have not taken before I hear back from the reviewers. Threads I've acted upon should be marked as such.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
…ator

Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
The "export apply" functionality is unusual in that it allows either
one or multiple nested requests to change or create an export.
The previous implementation would concatenate the results of
multiple change operations into a single string. It also would continue
making changes if one case failed, adding the error to the string
and setting a non-zero error code.

The updated version keeps the general behavior but returns structured
JSON (or other formatted data) with one entry per change request.  In
order to accomplish this and match the old behavior as closely as
possible we add an intermediate type (AppliedExportResults) that can
return both the response data (the `to_simplified` method) and track if
there was a failure mixed in with the various updates (the
`mgr_return_value` method).

Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
…rator

Signed-off-by: John Mulligan <jmulligan@redhat.com>
…decorator

Signed-off-by: John Mulligan <jmulligan@redhat.com>
This decorator is no longer needed as equivalent functionality is
handled internally by the class.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
This function is now unused as we no longer need to coerce exceptions
into response tuples at the layer in the code.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
These formatting changes are made by autopep8 when running tox.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
The patches that add object formatting / decorators to the nfs module
also made error handling more generic when accessing an nfs cluster and
now returns a nonzero exit code. A test was after the PR adding the
object format support that only checked an error message.
Update the test to match the new nfs module behavior as well as fixing a
typo.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
@phlogistonjohn
Copy link
Contributor Author

@adk3798 when do you think we can next retry merging this?

@adk3798
Copy link
Contributor

adk3798 commented Jan 18, 2023

@adk3798 when do you think we can next retry merging this?

as soon as centos 8 builds are possible again, which I think is when pushing to and pulling from the quay.ceph.io container registry works again. I tried testing it last weekend but that's when the builds broke. Since this seemed to cause a test failure in the last run we can't really merge it without checking the patch you added for the test works.

@adk3798
Copy link
Contributor

adk3798 commented Jan 22, 2023

https://pulpito.ceph.com/adking-2023-01-21_05:38:21-orch:cephadm-wip-adk-testing-2023-01-20-1359-distro-default-smithi/

Lots of failures (13) but all accounted for

Overall, PRs in the run should be okay to merge other than the mon crush location one causing failures. Will start to try and clean up the test suite now that we're able to make builds and run tests again.

@adk3798 adk3798 merged commit 919b244 into ceph:main Jan 22, 2023
@adk3798
Copy link
Contributor

adk3798 commented Jan 22, 2023

@phlogistonjohn were we planning to backport this t oquincy? I know we did backport the original PR that added the decorator.

@phlogistonjohn
Copy link
Contributor Author

No, IIRC we discussed this in one of the Weekly meetings and opted not to backport it. If we find out that not backporting it means we have difficulties backporting other things we can "easily" change course and backport this series later.

nizamial09 added a commit to rhcs-dashboard/ceph that referenced this pull request Feb 9, 2023
when you create an nfs export from dashboard it leaves this traceback and error

```
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard ERROR taskexec] Error while calling Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
                                             Traceback (most recent call last):
                                               File "/usr/share/ceph/mgr/dashboard/tools.py", line 550, in _run
                                                 val = self.task.fn(*self.task.fn_args, **self.task.fn_kwargs)  # type: ignore
                                               File "/usr/share/ceph/mgr/dashboard/controllers/nfs.py", line 148, in create
                                                 ret, _, err = export_mgr.apply_export(cluster_id, json.dumps(raw_ex))
                                             TypeError: 'AppliedExportResults' object is not iterable
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO taskmgr] finished Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO request] [::ffff:192.168.100.1:43896] [POST] [500] [0.767s] [admin] [172.0B] /api/nfs-ganesha/export
```
This started after ceph#46209, so dashboard code needs to be adapted

Fixes: https://tracker.ceph.com/issues/58681
Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this pull request Feb 9, 2023
when you create an nfs export from dashboard it leaves this traceback and error

```
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard ERROR taskexec] Error while calling Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
                                             Traceback (most recent call last):
                                               File "/usr/share/ceph/mgr/dashboard/tools.py", line 550, in _run
                                                 val = self.task.fn(*self.task.fn_args, **self.task.fn_kwargs)  # type: ignore
                                               File "/usr/share/ceph/mgr/dashboard/controllers/nfs.py", line 148, in create
                                                 ret, _, err = export_mgr.apply_export(cluster_id, json.dumps(raw_ex))
                                             TypeError: 'AppliedExportResults' object is not iterable
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO taskmgr] finished Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO request] [::ffff:192.168.100.1:43896] [POST] [500] [0.767s] [admin] [172.0B] /api/nfs-ganesha/export
```
This started after ceph#46209, so dashboard code needs to be adapted

Fixes: https://tracker.ceph.com/issues/58681
Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this pull request Feb 13, 2023
when you create an nfs export from dashboard it leaves this traceback and error

```
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard ERROR taskexec] Error while calling Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
                                             Traceback (most recent call last):
                                               File "/usr/share/ceph/mgr/dashboard/tools.py", line 550, in _run
                                                 val = self.task.fn(*self.task.fn_args, **self.task.fn_kwargs)  # type: ignore
                                               File "/usr/share/ceph/mgr/dashboard/controllers/nfs.py", line 148, in create
                                                 ret, _, err = export_mgr.apply_export(cluster_id, json.dumps(raw_ex))
                                             TypeError: 'AppliedExportResults' object is not iterable
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO taskmgr] finished Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO request] [::ffff:192.168.100.1:43896] [POST] [500] [0.767s] [admin] [172.0B] /api/nfs-ganesha/export
```
This started after ceph#46209, so dashboard code needs to be adapted

Fixes: https://tracker.ceph.com/issues/58681
Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this pull request Feb 13, 2023
when you create/edit an nfs export from dashboard it leaves this traceback and error

```
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard ERROR taskexec] Error while calling Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
                                             Traceback (most recent call last):
                                               File "/usr/share/ceph/mgr/dashboard/tools.py", line 550, in _run
                                                 val = self.task.fn(*self.task.fn_args, **self.task.fn_kwargs)  # type: ignore
                                               File "/usr/share/ceph/mgr/dashboard/controllers/nfs.py", line 148, in create
                                                 ret, _, err = export_mgr.apply_export(cluster_id, json.dumps(raw_ex))
                                             TypeError: 'AppliedExportResults' object is not iterable
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO taskmgr] finished Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO request] [::ffff:192.168.100.1:43896] [POST] [500] [0.767s] [admin] [172.0B] /api/nfs-ganesha/export
```
This started after ceph#46209, so dashboard code needs to be adapted

Fixes: https://tracker.ceph.com/issues/58681
Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this pull request Feb 14, 2023
when you create/edit an nfs export from dashboard it leaves this traceback and error

```
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard ERROR taskexec] Error while calling Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
                                             Traceback (most recent call last):
                                               File "/usr/share/ceph/mgr/dashboard/tools.py", line 550, in _run
                                                 val = self.task.fn(*self.task.fn_args, **self.task.fn_kwargs)  # type: ignore
                                               File "/usr/share/ceph/mgr/dashboard/controllers/nfs.py", line 148, in create
                                                 ret, _, err = export_mgr.apply_export(cluster_id, json.dumps(raw_ex))
                                             TypeError: 'AppliedExportResults' object is not iterable
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO taskmgr] finished Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO request] [::ffff:192.168.100.1:43896] [POST] [500] [0.767s] [admin] [172.0B] /api/nfs-ganesha/export
```
This started after ceph#46209, so dashboard code needs to be adapted

Fixes: https://tracker.ceph.com/issues/58681
Signed-off-by: Nizamudeen A <nia@redhat.com>
nizamial09 added a commit to rhcs-dashboard/ceph that referenced this pull request Feb 14, 2023
when you create/edit an nfs export from dashboard it leaves this traceback and error

```
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard ERROR taskexec] Error while calling Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
                                             Traceback (most recent call last):
                                               File "/usr/share/ceph/mgr/dashboard/tools.py", line 550, in _run
                                                 val = self.task.fn(*self.task.fn_args, **self.task.fn_kwargs)  # type: ignore
                                               File "/usr/share/ceph/mgr/dashboard/controllers/nfs.py", line 148, in create
                                                 ret, _, err = export_mgr.apply_export(cluster_id, json.dumps(raw_ex))
                                             TypeError: 'AppliedExportResults' object is not iterable
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO taskmgr] finished Task(ns=nfs/create, md={'path': 'e2e.nfs.bucket', 'fsal': 'RGW', 'cluster_id': 'testnfs'})
Feb 09 14:15:54 ceph-node-00 ceph-mgr[3235]: [dashboard INFO request] [::ffff:192.168.100.1:43896] [POST] [500] [0.767s] [admin] [172.0B] /api/nfs-ganesha/export
```
This started after ceph#46209, so dashboard code needs to be adapted

Fixes: https://tracker.ceph.com/issues/58681
Signed-off-by: Nizamudeen A <nia@redhat.com>
@phlogistonjohn phlogistonjohn deleted the jjm-format-nfs-mod branch February 22, 2023 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants