New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr: Add one finisher thread per module #47893
Conversation
|
@kotreshhr @batrick I'm not against the approach taken, however, I would like to check if the point made here(*) is worth considering? |
I think the issue with that approach is you'd still need to somehow queue other events like mdsmap/osdmap notifications or even subsequent CLI commands. How many async commands should run at once? Leave it to the modules to decide? It gets messy quickly, especially with ordering guarantees. This approach keeps the sequential nature of commands/events but prevents one module from blocking others from operating. I think it's a nice compromise. |
I'm a tiny bit worried about bloating up ceph-mgr's memory usage with this approach. It solves the issue in hand, however, we are adding up a finite memory (queue, locks, etc..) for each mgr module. IIRC, there was some discussion (somewhere) about trimming ceph-mgr's memory footprint. @badone ^^ |
I don't recall. Are you sure it was me involved (did you mean Boris maybe)? |
Maybe, I'm not sure. It was a while back... Anyway, we need more eyes on this change. |
|
@vshankar Do you see it adding significantly to the memory usage of the manager? Seems to me the additional overhead of having a finisher for each module plus a generic one shouldn't be massively different to having just a generic one since the content of the combined finisher queues should be about the same size as the queue of the finisher in the one finisher approach right? |
There would be one thread per mgr plugin. That's different that having one thread which is the case now. And then, there is a (finisher) thread even for modules that are not enabled (which can be worked around). But still... How many plugins do we have now in ceph-mgr? |
|
In the mid-to-high thirties if my quick count is correct but they are not all enabled/active of course. |
Yes, there is nothing changed w.r.t queue size. With this approach, the queue size is distributed among multiple threads which was with the single thread before. The added memory usage is the new Finisher object per new thread. I am not very sure yet how much that would add up to resource consumption. |
|
I guess this change is fine since noone has shown objection on the approach taken here. @badone - @kotreshhr has run additionally this through rados:mgr. Would that be enough as a source of validation? |
|
ping? |
ping? |
Have Patrick's concerns been addressed? It's still showing a requested change. |
@badone I had missed noticing Patrick comment. I am working on it. I will refresh the PR once done. Are there any additional testing that you think which needs to be run ? |
|
@kotreshhr I think you can just run the same suites again on our updated PR and that should be sufficient. |
2467508
to
48f5e65
Compare
|
jenkins test windows |
|
Scheduled fs and rados suite: |
|
jenkins test api |
Yes, Done. Thank you the review. It was added as a separate suite initially as it required a single mgr cluster. I fixed it down the line but the new suite remained. |
Sure, the changes are in place but I couldn't do sanity test since teuthology is down. I will do it once it's back and add |
af37e52
to
929eb07
Compare
|
@yuriw marked |
|
jenkins test windows |
1 similar comment
|
jenkins test windows |
|
jenkins test api |
|
jenkins test windows |
2 similar comments
|
jenkins test windows |
|
jenkins test windows |
|
@athanatos PTAL. Addressed your comments |
|
Repinging @athanatos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suite changes look reasonable to me now -- as long as tests pass LGTM
| if val >= minval: | ||
| if isinstance(minval, int) and val >= minval: | ||
| seen.add(name) | ||
| elif isinstance(expected_val, int) and val == expected_val: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kotreshhr This caused the following errors:
2023-03-28T10:34:36.302 ERROR:teuthology.run_tasks:Manager failed: check-counter
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_main/teuthology/run_tasks.py", line 192, in run_tasks
suppress = manager.__exit__(*exc_info)
File "/home/teuthworker/src/git.ceph.com_teuthology_main/teuthology/task/__init__.py", line 132, in __exit__
self.end()
File "/home/teuthworker/src/github.com_ceph_ceph-c_9d12d2cb27d4a45c58218390bfbb56820b010b77/qa/tasks/check_counter.py", line 117, in end
elif isinstance(expected_val, int) and val == expected_val:
UnboundLocalError: local variable 'expected_val' referenced before assignment
More detail please see https://pulpito.ceph.com/yuriw-2023-03-28_00:15:34-fs-wip-yuri5-testing-2023-03-27-1315-distro-default-smithi/.
Could you have a look ? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Xiubo. I will take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it. Ran the failed one with the fix. It works. Hopefully this is the last failure.
http://pulpito.front.sepia.ceph.com/khiremat-2023-03-29_06:17:28-fs-wip-yuri5-testing-2023-03-27-1315-distro-default-smithi/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuriw This needs a qa re-run after this fix.
Fixes: https://tracker.ceph.com/issues/51177 Signed-off-by: Kotresh HR <khiremat@redhat.com>
929eb07
to
2c2ef6d
Compare
|
jenkins test make check arm64 |
1 similar comment
|
jenkins test make check arm64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work.
Adds a finisher thread for each module. Each module commands are
executed via corresponding finisher thread. There is generic
finisher thread via which all other commands like notify, config
change are run.
Fixes: https://tracker.ceph.com/issues/51177
Signed-off-by: Kotresh HR khiremat@redhat.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows