Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup1 delete: proceed to the next subsystem when a cgroup is not found #296

Merged
merged 2 commits into from
Jun 20, 2023

Conversation

singholt
Copy link
Contributor

@singholt singholt commented Jun 10, 2023

Signed-off-by: Anuj Singh singholt@amazon.com

The cgroup1 Delete method checks for running processes within each subsystem, before deleting the cgroup from it.

I noticed that there are cases when removing a cgroup from one subsystem, also removes it from others.
For example, on Amazon Linux (kernel 4.14), removing a cgroup from net_cls subsystem also removes the cgroup from net_prio. This is because both net_cls and net_prio have a symlink to a common dir net_cls, net_prio. Similarly, removing a cgroup from cpu also removes it from the cpuacct subsystem, because of a symlink to cpu,cpuacct.

ls -l /sys/fs/cgroup/net_cls /sys/fs/cgroup/net_prio
/sys/fs/cgroup/net_cls -> net_cls,net_prio
/sys/fs/cgroup/net_prio -> net_cls,net_prio 

In such a case the Delete() errors out with an error like

lstat /sys/fs/cgroup/net_prio/test: no such file or directory

Sample code: https://gist.github.com/singholt/2a600802c0f74c8acdacd247760dc291 that fails on the Delete() without this fix but passes with it.

This change handles the above case, when a query to get cgroup processes within a subsystem returns a fs.ErrNotExist. In such a case, we move to the next subsystem, instead of erroring out of the Delete() immediately. This ensures that the cgroup is deleted from all subsystems that have it.

As per cgroups v1 documentation, since files in a cgroup directory cannot and need not be removed, it is safe to assume that if cgroup.procs doesn't exist for a cgroup (within a subsystem), the cgroup has been removed from the subsystem.

This PR has two commits - first addressing the problem above, with the unit test updated and second, just fixing a typo.

@singholt singholt force-pushed the main branch 2 times, most recently from 54ad06b to 9e56b3e Compare June 15, 2023 15:59
The Delete method checks for running processes within each subsystem, before deleting the cgroup from it. On certain systems, there are cases when removing a cgroup from one subsystem, also removes it from others.
For example, removing a cgroup from net_cls subsystem also removes the cgroup from net_prio. This is because both net_cls and net_prio have a symlink to 'net_cls,net_prio'. This is also true for cpu and cpuacct.

This change handles the above case, when a query to get cgroup processes within a subsystem returns a fs.ErrNotExist. In such a case, we move to the next subsystem, instead of erroring out of Delete() immediately.
This ensures that the cgroup is deleted from all subsystems that have it.

Signed-off-by: Anuj Singh <singholt@amazon.com>
Signed-off-by: Anuj Singh <singholt@amazon.com>
@singholt singholt changed the title cgroup delete: proceed to the next subsystem when a cgroup is not found cgroup1 delete: proceed to the next subsystem when a cgroup is not found Jun 15, 2023
Copy link
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@singholt
Copy link
Contributor Author

thanks for reviewing Phil! It looks like I may need another review, perhaps @kzys ? I'd appreciate if you could take a look 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants