
cephadm: set pids-limit unlimited for all ceph daemons #50083

Merged
1 commit merged on Feb 28, 2023

Conversation

adk3798
Contributor

@adk3798 adk3798 commented Feb 12, 2023

We actually had this set up before, but ran into issues. A teuthology test in the fs suite failed, so the change was modified to only affect iscsi and rgw daemons (#45798), and then the changes were reverted entirely (so no pids-limit-modifying code at all) in quincy and pacific because the LRC ran into issues related to the podman version (#45932).

This new patch addresses the podman version problem: the patch that makes -1 work as a pids-limit appears to have landed in podman 3.4.1, based on containers/podman#12040. We'll need to make sure this doesn't break anything in the fs suites again, as I don't remember the details of the first issue or why setting the pids-limit only for iscsi and rgw fixed it. Assuming that isn't a problem, we should hopefully be able to unify at least how reef and quincy handle this, now that the podman version issue is being addressed in this patch.

See the linked tracker issue for a discussion on why we're going at this again and why I'm trying to do this for all ceph daemon types.
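The version gating described above could be sketched roughly like this (illustrative Python only, not the actual cephadm code; the helper name and the choice to omit the flag entirely on older podman are assumptions made for the sketch):

```python
# podman only accepts --pids-limit=-1 from 3.4.1 onward
# (see containers/podman#12040)
PIDS_LIMIT_UNLIMITED_PODMAN_VERSION = (3, 4, 1)


def unlimited_pids_args(engine: str, version: tuple) -> list:
    """Container-engine args to lift the pids limit, if supported.

    engine: 'podman' or 'docker'; version: engine version as a tuple.
    """
    if engine == 'podman' and version < PIDS_LIMIT_UNLIMITED_PODMAN_VERSION:
        # older podman rejects -1, so leave the default limit in place
        return []
    return ['--pids-limit=-1']
```

Tuple comparison handles the version check directly, e.g. `unlimited_pids_args('podman', (3, 0, 1))` yields no extra args, while podman 3.4.1+ and docker get `--pids-limit=-1`.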

Fixes: https://tracker.ceph.com/issues/58685

Signed-off-by: Adam King <adking@redhat.com>

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@adk3798 adk3798 requested a review from a team as a code owner February 12, 2023 20:41
@vumrao vumrao requested review from a team February 13, 2023 17:43
@adk3798
Contributor Author

adk3798 commented Feb 13, 2023

jenkins test api

@idryomov
Contributor

  * [x]  No tests

There is an existing test for both iSCSI containers, let's make sure it still passes.

@vumrao vumrao self-requested a review February 15, 2023 20:33
@adk3798
Contributor Author

adk3798 commented Feb 27, 2023

https://pulpito.ceph.com/adking-2023-02-21_05:38:18-orch:cephadm-wip-adk-testing-2023-02-20-1650-distro-default-smithi/

failed/dead job reruns: https://pulpito.ceph.com/adking-2023-02-24_17:44:54-orch:cephadm-wip-adk-testing-2023-02-20-1650-distro-default-smithi/

After reruns, 3 failures and 1 dead job

  • the dead job was a failure pulling a podman package:
Failed to download packages: podman-docker-3:4.3.1-2.module_el8.8.0+1254+78119b6e.noarch:
  Cannot download, all mirrors were already tried without success
    Another instance of this test passed in the original run, so I didn't bother with another rerun.

  • 2 failures were the staggered upgrade test issue tracked by https://tracker.ceph.com/issues/58535
  • the last failure was in the currently flaky test_nfs task; it passed when I tried an interactive rerun for debugging

Overall, nothing to block merging. Will note that the initial version of the basic monitoring stack test passed.

  * [x]  No tests

There is an existing test for both iSCSI containers, let's make sure it still passes.

The iscsi pids-limit test passed in the reruns (and the initial failure was a teuthology/infra issue): https://pulpito.ceph.com/adking-2023-02-24_17:44:54-orch:cephadm-wip-adk-testing-2023-02-20-1650-distro-default-smithi/7186371
