cephadm: ISCSI: Allow unlimited number of threads #42214
src/cephadm/cephadm
| @@ -2498,6 +2498,8 @@ def get_container(ctx: CephadmContext, | |||
| # So the container can modprobe iscsi_target_mod and have write perms | |||
| # to configfs we need to make this a privileged container. | |||
| privileged = True | |||
| # --pids-limit 0 to allow more than 2048 threads | |||
| container_args.append('--pids-limit=0') | |||
So we are consciously ignoring TasksMax=infinity from systemd/ceph*@.service.in templates?
Unfortunately, the cephadm binary does not yet have access to those templates. That depends on #41855. See
Lines 3094 to 3146 in f0b79b3
| def get_unit_file(ctx, fsid): | |
| # type: (CephadmContext, str) -> str | |
| extra_args = '' | |
| if isinstance(ctx.container_engine, Podman): | |
| extra_args = ('ExecStartPre=-/bin/rm -f %t/%n-pid %t/%n-cid\n' | |
| 'ExecStopPost=-/bin/rm -f %t/%n-pid %t/%n-cid\n' | |
| 'Type=forking\n' | |
| 'PIDFile=%t/%n-pid\n') | |
| if ctx.container_engine.version >= CGROUPS_SPLIT_PODMAN_VERSION: | |
| extra_args += 'Delegate=yes\n' | |
| docker = isinstance(ctx.container_engine, Docker) | |
| u = """# generated by cephadm | |
| [Unit] | |
| Description=Ceph %i for {fsid} | |
| # According to: | |
| # http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget | |
| # these can be removed once ceph-mon will dynamically change network | |
| # configuration. | |
| After=network-online.target local-fs.target time-sync.target{docker_after} | |
| Wants=network-online.target local-fs.target time-sync.target | |
| {docker_requires} | |
| PartOf=ceph-{fsid}.target | |
| Before=ceph-{fsid}.target | |
| [Service] | |
| LimitNOFILE=1048576 | |
| LimitNPROC=1048576 | |
| EnvironmentFile=-/etc/environment | |
| ExecStart=/bin/bash {data_dir}/{fsid}/%i/unit.run | |
| ExecStop=-{container_path} stop ceph-{fsid}-%i | |
| ExecStopPost=-/bin/bash {data_dir}/{fsid}/%i/unit.poststop | |
| KillMode=none | |
| Restart=on-failure | |
| RestartSec=10s | |
| TimeoutStartSec=120 | |
| TimeoutStopSec=120 | |
| StartLimitInterval=30min | |
| StartLimitBurst=5 | |
| {extra_args} | |
| [Install] | |
| WantedBy=ceph-{fsid}.target | |
| """.format(container_path=ctx.container_engine.path, | |
| fsid=fsid, | |
| data_dir=ctx.data_dir, | |
| extra_args=extra_args, | |
| # if docker, we depend on docker.service | |
| docker_after=' docker.service' if docker else '', | |
| docker_requires='Requires=docker.service\n' if docker else '') | |
| return u |
for the current template.
Since all those templates override TasksMax, perhaps TasksMax=infinity could be added to this template in the interim?
Just wondering because --pids-limit seems to be treated differently between podman and docker:
$ podman run --help | grep pids-limit
--pids-limit int Tune container pids limit (set 0 for unlimited, -1 for server defaults) (default 2048)
$ docker run --help | grep pids-limit
--pids-limit int Tune container pids limit (set -1 for unlimited)
And also because there are other settings in those templates that we are missing in cephadm deployments. For example LimitNOFILE=1048576?
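Going only by the two help texts quoted above, the value that means "unlimited" differs per engine. A minimal sketch of picking the value by engine follows; `pids_limit_arg` is a hypothetical helper for illustration, not something in cephadm, and the values are taken solely from the quoted help output:

```python
def pids_limit_arg(engine: str) -> str:
    """Return a --pids-limit argument meaning "unlimited" for the given engine.

    Hypothetical helper; the values come from the `podman run --help` and
    `docker run --help` output quoted in this thread.
    """
    unlimited = {
        'podman': '0',   # podman: "set 0 for unlimited"
        'docker': '-1',  # docker: "set -1 for unlimited"
    }
    try:
        return '--pids-limit=' + unlimited[engine.lower()]
    except KeyError:
        raise ValueError(f'unknown container engine: {engine!r}')
```

With such a helper, the engine name decides the flag instead of a hard-coded `--pids-limit=0`.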
LimitNOFILE etc. are there already:
Lines 3122 to 3123 in f0b79b3
| LimitNOFILE=1048576 | |
| LimitNPROC=1048576 |
So this must come from somewhere else. But you're right, we should probably see if we can adopt other settings as well.
Ah, I missed the shared section. GitHub quotes can be deceiving...
So why isn't TasksMax=infinity in the shared section? Is there a particular reason it was omitted?
Pushed. Does this work for you? Yes, we definitely need to get the service files in sync.
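The pushed commit itself is not shown in this thread, so the following is only a sketch of what having `TasksMax=infinity` in the shared `[Service]` section could look like. The trimmed `UNIT` template, the placeholder fsid, and the `service_settings` checker are all assumptions made for illustration:

```python
def service_settings(unit_text: str) -> dict:
    """Collect key=value pairs from the [Service] section of a unit file.

    Simplified parser for illustration: later duplicates overwrite earlier
    ones, which is not how systemd treats every directive.
    """
    settings = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('[') and line.endswith(']'):
            section = line[1:-1]
            continue
        if section == 'Service' and '=' in line:
            key, value = line.split('=', 1)
            settings[key.strip()] = value.strip()
    return settings


# Trimmed, illustrative version of the template quoted earlier in the thread,
# with TasksMax=infinity placed next to the existing limits.
UNIT = """# generated by cephadm
[Unit]
Description=Ceph %i for {fsid}

[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
TasksMax=infinity

[Install]
WantedBy=ceph-{fsid}.target
""".format(fsid='00000000-0000-0000-0000-000000000000')
```

A check such as `service_settings(UNIT).get('TasksMax') == 'infinity'` could then guard against the setting drifting out of the template again.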
e79800f to e7bd53a
This doesn't fix the tcmu-runner container's TasksMax value, as the container isn't managed by systemd.
By any chance, do you know what needs to be done to fix this?
I suspect my initial idea of passing
I think we have to do both:
7f96322 to 8b78f5b
It looks like using Only using |
Limits (e.g. TasksMax) of the tcmu-container are not managed by systemd, as the container is executed in the background. Thus we have to set them again here. Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
8b78f5b to 71da44c
@dsavineau + @idryomov: in Pacific, with the exception of the tcmu-runner, the cgroups are indeed shared between the systemd unit and the containers. This hasn't landed downstream yet, but upstream this fix should be enough.
I think @lxbsz has indicated that we want unlimited threads for both iscsi containers (i.e. non tcmu-runner container too), so this workaround appears incomplete to me.
It seems like more changes might be wanted here, but at least the current version of this is passing
Yeah, we need to fix this in both containers, or we will hit the same issue sooner or later. In that BZ I hit the thread limit in the tcmu-runner container, and Gopi hit it in the ceph-iscsi container instead.
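One way to confirm the effective limit in each of the two containers is to read the cgroup `pids.max` file from inside it. The sketch below assumes the container sees its own cgroup at the usual cgroup v2 or cgroup v1 paths; the helper names are hypothetical and nothing here is part of cephadm:

```python
from pathlib import Path
from typing import Optional

# Assumed locations: cgroup v2 first, then the cgroup v1 pids controller.
CANDIDATES = ('/sys/fs/cgroup/pids.max', '/sys/fs/cgroup/pids/pids.max')


def parse_pids_max(text: str) -> Optional[int]:
    """Parse the content of a pids.max file; None means unlimited ("max")."""
    value = text.strip()
    return None if value == 'max' else int(value)


def read_pids_limit(paths=CANDIDATES) -> Optional[int]:
    """Return the limit from the first pids.max file found in `paths`."""
    for candidate in paths:
        path = Path(candidate)
        if path.is_file():
            return parse_pids_max(path.read_text())
    raise FileNotFoundError('no pids.max found; is the pids controller enabled?')
```

Running `read_pids_limit()` in both the tcmu-runner and the ceph-iscsi container would show whether either one is still capped at a numeric value rather than reporting unlimited.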
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This PR is replaced by #44579.
Otherwise we're not able to create the maximum number of LUNs per target.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>