[skip ci] container: add pids limit parameter #6777

Merged
1 commit merged into ceph:master on Aug 4, 2021

@asm0deuz asm0deuz commented Jul 29, 2021

Description of problem:

RGW container fails to start (on a RHEL8 setup) with:

Jul 15 14:20:43 servera kernel: cgroup: fork rejected by pids controller in /machine.slice/libpod-54832992fcbbbf92b0d10d0491f7ff987728bec87c1c55b79cb3921c6f503f49.scope
Jul 15 14:20:43 servera conmon[34853]: terminate called after throwing an instance of 'std::system_error'
Jul 15 14:20:43 servera conmon[34853]:  what():  Resource temporarily unavailable
Jul 15 14:20:43 servera conmon[34853]: *** Caught signal (Aborted) **
Jul 15 14:20:43 servera conmon[34853]: in thread 7f5d605e1280 

The podman default pids-limit is set to 2048.

$ grep . sys/fs/cgroup/pids/machine.slice/libpod-*/pids.max
sys/fs/cgroup/pids/machine.slice/libpod-9336707e04da464b9128b7c57a0ee9b70efc5acb5207ea03ab413583c2264283.scope/pids.max:2048
sys/fs/cgroup/pids/machine.slice/libpod-d0eb2257c2fb371e015af9b68c716ed280399d9d2e88925aad9323ac26f659f3.scope/pids.max:2048

This value of 2048 is more than sufficient when rgw thread pool size uses its default value of 512. However, when rgw thread pool size is raised to a value close to the pids-limit, no room is left for the other processes to spawn and run within the container, and the container crashes.
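As a rough illustration of the problem described above, the following sketch computes how many more tasks a container's pids cgroup can still hold. The function names are hypothetical, and the directory passed to `read_pids_headroom` is assumed to be a cgroup v1 pids scope such as the `libpod-*.scope` directories listed earlier.

```python
from pathlib import Path


def pids_headroom(current_text, max_text):
    """Return how many more tasks fit under pids.max, or None if unlimited.

    Both arguments are the raw contents of pids.current and pids.max.
    The kernel writes the literal string "max" when there is no limit.
    """
    if max_text.strip() == "max":
        return None
    return int(max_text) - int(current_text)


def read_pids_headroom(cgroup_dir):
    """Read pids.current and pids.max from a pids cgroup directory."""
    cgroup = Path(cgroup_dir)
    return pids_headroom(
        (cgroup / "pids.current").read_text(),
        (cgroup / "pids.max").read_text(),
    )
```

A headroom close to zero means the next fork or thread creation in the container is likely to be rejected by the pids controller, which matches the kernel message in the log above.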

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041

Contributor

@dsavineau dsavineau left a comment

Could you squash your commits?

Also the --pids-limit parameter isn't specific to podman, the same exists for docker too.

Regarding the value, I'm not sure it's a good idea to hardcode 2048: even if it's the default podman value, someone could have overridden it.

@dsavineau
Contributor

Also the --pids-limit parameter isn't specific to podman, the same exists for docker too.

Just to clarify, --pids-limit is available for docker too but the default is different (4096 on docker 1.13.1 for instance).

@guits I'm wondering if we should just set unlimited pids for all containers. What do you think ?

@guits
Collaborator

guits commented Jul 30, 2021

@guits I'm wondering if we should just set unlimited pids for all containers. What do you think ?

that seems to be the easiest fix. I'm just not sure what could be the impact.

@asm0deuz
Collaborator Author

Just did the test on my lab using --pids-limit=0 (unlimited). The max is limited by the systemd setting DefaultTasksMax:

[ansible@servera]# sudo systemctl show -p DefaultTasksMax
DefaultTasksMax=49484

[ansible@servera ~]$ sudo cat /sys/fs/cgroup/pids/machine.slice/libpod-907d46541475e6234f80c78facd3aeb9c552da0ae419c99df17d066000d220d3.scope/pids.max
49484
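The observation above can be modelled in a few lines. This is only a sketch of the two cases seen in this thread (the podman default of 2048, and `--pids-limit=0` falling back to `DefaultTasksMax`), not a description of how podman or systemd compute the value. Both function names are hypothetical.

```python
def parse_systemctl_property(line):
    """Split one 'Key=Value' line as printed by `systemctl show -p`."""
    key, _, value = line.strip().partition("=")
    return key, value


def observed_pids_max(pids_limit, default_tasks_max):
    """Model the pids.max values observed in this thread.

    Assumption: 0 means no container-level limit, so the systemd
    DefaultTasksMax value ends up in pids.max. Any other value is
    assumed to be written to pids.max unchanged.
    """
    if pids_limit == 0:
        return default_tasks_max
    return pids_limit
```

With the values from this lab, `observed_pids_max(0, 49484)` gives 49484 and `observed_pids_max(2048, 49484)` gives 2048.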

With the rgw thread pool size default value of 512 the pids.current value is:

[ansible@servera ~]$ cat /sys/fs/cgroup/pids/machine.slice/libpod-907d46541475e6234f80c78facd3aeb9c552da0ae419c99df17d066000d220d3.scope/pids.current
601

When increasing rgw thread pool size up to 2048:

[ansible@servera ~]$ cat /sys/fs/cgroup/pids/machine.slice/libpod-c9e69c9ab167f50a5a6f17a95501fcb97798c3069c35212f0ca9c9bd52262a3e.scope/pids.current
2137

The delta between rgw thread pool size and pids.current is 89 in both cases (601 - 512 and 2137 - 2048).

Either we go for the unlimited value or we use some kind of formula to calculate the pids-limit for the container.
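For reference, the "formula" option could look something like the sketch below. The overhead of 89 comes from the two measurements above. The safety margin and the function name are assumptions made for illustration; nothing like this was proposed or merged in this PR.

```python
import math

# pids.current minus rgw thread pool size, as measured in both tests above.
OBSERVED_OVERHEAD = 89


def suggested_pids_limit(rgw_thread_pool_size, overhead=OBSERVED_OVERHEAD, margin=0.25):
    """Return a pids-limit that covers the thread pool plus some slack.

    margin is a hypothetical safety factor applied on top of the
    measured thread count, rounded up to a whole task.
    """
    base = rgw_thread_pool_size + overhead
    return base + math.ceil(base * margin)
```

The measured overhead comes from a single lab and two data points, so a computed limit would have to be revalidated for other daemons and releases. That fragility is one argument for the unlimited option.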

@guits
Collaborator

guits commented Aug 2, 2021

Either we go for the unlimited value or we use some kind of formula to calculate the pids-limit for the container.

let's go with unlimited value

@guits
Collaborator

guits commented Aug 2, 2021

@guits I'm wondering if we should just set unlimited pids for all containers. What do you think ?

@dsavineau is that really relevant to do this for all Ceph services?

@guits
Collaborator

guits commented Aug 3, 2021

jenkins test centos-container-all_in_one

@guits
Collaborator

guits commented Aug 3, 2021

jenkins test centos-container-external_clients

@guits
Collaborator

guits commented Aug 3, 2021

jenkins test centos-container-update

@guits
Collaborator

guits commented Aug 3, 2021

jenkins test centos-non_container-all_daemons

@guits
Collaborator

guits commented Aug 3, 2021

jenkins test centos-non_container-collocation

@guits
Collaborator

guits commented Aug 3, 2021

jenkins test centos-non_container-lvm_osds

@dsavineau dsavineau changed the title from "Ceph rgw/podman pids limit" to "container: add pids limit parameter" Aug 3, 2021
Contributor

@dsavineau dsavineau left a comment

except the missing backslash for ceph-mgr, this looks good

The only thing needed here is to squash your two commits and amend the result to add the BZ link (like Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041 in the commit body) just before the Signed-off-by line

roles/ceph-mgr/templates/ceph-mgr.service.j2 (outdated, resolved)
@dsavineau dsavineau changed the title from "container: add pids limit parameter" to "[skip ci] container: add pids limit parameter" Aug 3, 2021
sufficient for the default value (512) of rgw thread pool size.
But if its value is increased near to the pids-limit value,
it does not leave place for the other processes to spawn and run within
the container and the container crashes.

pids-limit set to unlimited regardless of the container engine.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1987041

Signed-off-by: Teoman ONAY <tonay@redhat.com>
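The commit message says the limit is set to unlimited regardless of the container engine. A minimal sketch of how the flag value could be chosen per engine follows. The value 0 for podman is the one tested earlier in this thread. The value -1 for docker is an assumption taken from Docker's documented "unlimited" setting, not from this conversation. The function name is hypothetical, and the merged service templates should be treated as the authority.

```python
# Value meaning "unlimited" for each engine.
# podman: 0, as tested in this thread.
# docker: -1, an assumption based on Docker's documentation.
UNLIMITED_PIDS = {"podman": 0, "docker": -1}


def pids_limit_flag(container_binary):
    """Return the --pids-limit argument that disables the pids limit."""
    try:
        value = UNLIMITED_PIDS[container_binary]
    except KeyError:
        raise ValueError(f"unsupported container engine: {container_binary}")
    return f"--pids-limit={value}"
```

For example, `pids_limit_flag("podman")` returns `--pids-limit=0`.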
@dsavineau dsavineau requested a review from guits August 3, 2021 19:14
@dsavineau dsavineau changed the title from "[skip ci] container: add pids limit parameter" to "container: add pids limit parameter" Aug 3, 2021
@dsavineau
Contributor

jenkins test centos-container-all_in_one

@dsavineau
Contributor

jenkins test centos-container-all_daemons

@dsavineau
Contributor

jenkins test centos-container-lvm_osds

@dsavineau
Contributor

jenkins test centos-container-lvm_batch

@dsavineau
Contributor

jenkins test centos-container-external_clients

@guits
Collaborator

guits commented Aug 4, 2021

jenkins test centos-container-all_in_one

@asm0deuz asm0deuz requested a review from dsavineau August 4, 2021 08:14
@guits guits changed the title from "container: add pids limit parameter" to "[skip ci] container: add pids limit parameter" Aug 4, 2021
@guits guits merged commit 9b5d97a into ceph:master Aug 4, 2021
@sebastian-philipp

See ceph/ceph#42214 for the corresponding cephadm PR. It would be great if someone could look that PR over, as I'm too ignorant in this particular area.

@asm0deuz asm0deuz deleted the ceph-rgw/podman-pids-limit branch August 17, 2021 09:01