cephadm: set pids-limit unlimited for all ceph daemons #50083
jenkins test api
We actually had this set up before, but ran into issues. A teuthology test failed in the fs suite, so the change was narrowed to affect only iscsi and rgw daemons (ceph#45798). The changes were then reverted entirely in quincy and pacific (leaving no pids-limit modifying code at all) because the LRC ran into issues with the change related to the podman version (ceph#45932).
This new patch addresses the podman versions: the fix that makes -1 work as a pids-limit seems to have landed in podman 3.4.1, based on containers/podman#12040.
We'll need to make sure this doesn't break anything in the fs suites again, as I don't remember the details of the first issue or why setting the pids-limit only for iscsi and rgw fixed it. Assuming that isn't a problem, we should hopefully be able to unify at least how reef and quincy handle this, now that the podman version issue is addressed in this patch.
See the linked tracker issue for a discussion of why we're attempting this again and why I'm trying to do this for all ceph daemon types.
Fixes: https://tracker.ceph.com/issues/58685
Signed-off-by: Adam King <adking@redhat.com>
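The version gating described above could look roughly like the following. This is a minimal sketch, not the code from this patch: the helper names `parse_version` and `pids_limit_args` and the constant are hypothetical, and omitting the flag on older podman is an assumption, since the description does not say what happens below 3.4.1.

```python
from typing import List, Tuple

# Per the PR description, -1 as a pids-limit appears to work from podman 3.4.1.
# The constant name is hypothetical.
MIN_PODMAN_UNLIMITED_PIDS: Tuple[int, int, int] = (3, 4, 1)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' (optionally followed by a '-suffix') into a 3-tuple of ints."""
    core = text.strip().split('-', 1)[0]
    parts = [int(p) for p in core.split('.')[:3]]
    while len(parts) < 3:
        parts.append(0)  # treat '3.4' as '3.4.0'
    return (parts[0], parts[1], parts[2])


def pids_limit_args(podman_version: str) -> List[str]:
    """Return extra `podman run` arguments for an unlimited pids-limit.

    Assumption: on podman older than 3.4.1 the flag is simply omitted,
    leaving the engine's default limit in place.
    """
    if parse_version(podman_version) >= MIN_PODMAN_UNLIMITED_PIDS:
        return ['--pids-limit=-1']
    return []


if __name__ == '__main__':
    for version in ('3.4.0', '3.4.1', '4.0.0-rc1'):
        print(version, pids_limit_args(version))
```

Comparing parsed tuples rather than raw strings avoids mis-ordering versions such as 3.10.0 and 3.4.1.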
There is an existing test for both iSCSI containers; let's make sure it still passes.
Failed/dead job reruns: https://pulpito.ceph.com/adking-2023-02-24_17:44:54-orch:cephadm-wip-adk-testing-2023-02-20-1650-distro-default-smithi/
After reruns, 3 failures and 1 dead job.
Another instance of this test passed in the original run, so I didn't bother with another rerun.
Overall, nothing to block merging. I will note that the initial version of the basic monitoring stack test passed.
The iscsi pids limit test passed in the reruns (the initial failure was a teuthology/infra issue): https://pulpito.ceph.com/adking-2023-02-24_17:44:54-orch:cephadm-wip-adk-testing-2023-02-20-1650-distro-default-smithi/7186371