Investigate memory consumption of che-operator with growing number of namespaces on the cluster #20647

Closed
tolusha opened this issue Oct 18, 2021 · 4 comments · Fixed by eclipse-che/che-operator#1166
Assignees
Labels
area/che-operator Issues and PRs related to Eclipse Che Kubernetes Operator kind/task Internal things, technical debt, and to-do tasks to be performed. new&noteworthy For new and/or noteworthy issues that deserve a blog post, new docs, or emphasis in release notes severity/P1 Has a major impact to usage or development of the system.
Milestone

Comments

@tolusha
Contributor

tolusha commented Oct 18, 2021

Is your task related to a problem? Please describe

The che-operator pod is OOMKilled when there are a lot of namespaces on the cluster.
That was fixed by [1], but I can still observe that the operator consumes too much memory.
We have to find the cause and fix it.

[1] https://issues.redhat.com/browse/CRW-2383
[2] eclipse-che/che-operator#1146

Describe the solution you'd like

N/A

Describe alternatives you've considered

No response

Additional context

devfile/devworkspace-operator#616
#20529

Release Notes Text

Title: Improved operator memory consumption
Content: Fixed an operator issue that forced users to increase the operator's memory requirement as the number of namespaces on the cluster grew.

@tolusha tolusha added kind/task Internal things, technical debt, and to-do tasks to be performed. sprint/next severity/P1 Has a major impact to usage or development of the system. area/che-operator Issues and PRs related to Eclipse Che Kubernetes Operator team/deploy labels Oct 18, 2021
@tolusha tolusha mentioned this issue Oct 18, 2021
25 tasks
@mmorhun mmorhun self-assigned this Oct 19, 2021
@tolusha tolusha added this to the 7.38 milestone Oct 20, 2021
@mmorhun mmorhun modified the milestones: 7.38, 7.39 Oct 26, 2021
@ibuziuk
Member

ibuziuk commented Oct 27, 2021

FYI, here is what the memory consumption pattern looks like on the Sandbox clusters:

image

image

image

@tolusha
Contributor Author

tolusha commented Nov 19, 2021

Don't close the issue until the doc is ready.

@tolusha tolusha reopened this Nov 19, 2021
@l0rd l0rd added new&noteworthy For new and/or noteworthy issues that deserve a blog post, new docs, or emphasis in release notes status/release-notes-review-needed Issues that needs to be reviewed by the doc team for the Release Notes wording labels Nov 23, 2021
@tolusha tolusha closed this as completed Nov 26, 2021
@mmorhun
Contributor

mmorhun commented Nov 30, 2021

The problem was in the caches managed by the Controller Runtime framework used in Che Operator. The framework caches all objects of a kind as soon as the operator gets at least one object of that kind. This had worked fine, but when Che Operator was switched to watch all namespaces (this PR), it started to cache all objects in the cluster. We spotted the bug only on a big cluster.
To fix the issue, we limit the set of objects that the Operator caches. However, it is not possible to control caching of every single object, as the Controller Runtime framework doesn't allow it. Instead, it is possible to define a selector for each kind of object that should be cached. For that, we need a label on each object of a kind we limit the cache for (we do it for all standard kinds like deployment, service, etc., and do not need the label for namespaces and all CRs).
The label that should be present on all objects the Operator interacts with is app.kubernetes.io/part-of=che.eclipse.org.
After applying the fix, memory usage on a cluster with ~5000 namespaces dropped from ~1.2 GiB (with peaks up to ~3.3 GiB) to just ~30 MiB:

Screenshot from 2021-11-16 12-41-21

Also, a migration is implemented, so the update should be seamless for existing Che installations.
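
To illustrate the idea of a per-kind label selector on a cache, here is a minimal standard-library sketch. It is a toy model only, not the Controller Runtime API and not the actual che-operator code: the `Object` and `FilteredCache` types, their methods, and the `eclipse-che` and `user-1` namespace names are hypothetical. Only the label key and value come from the comment above.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for a Kubernetes object.
type Object struct {
	Kind      string
	Namespace string
	Name      string
	Labels    map[string]string
}

// Label key and value taken from the comment above.
const (
	partOfKey   = "app.kubernetes.io/part-of"
	partOfValue = "che.eclipse.org"
)

// FilteredCache is a toy cache that keeps a label selector per kind.
// Kinds without a selector are cached without restriction.
type FilteredCache struct {
	selectors map[string]map[string]string // kind -> required labels
	items     map[string][]Object          // kind -> cached objects
}

// NewFilteredCache builds an empty cache with the given per-kind selectors.
func NewFilteredCache(selectors map[string]map[string]string) *FilteredCache {
	return &FilteredCache{selectors: selectors, items: map[string][]Object{}}
}

// matches reports whether actual carries every required label.
func matches(required, actual map[string]string) bool {
	for k, v := range required {
		if actual[k] != v {
			return false
		}
	}
	return true
}

// Add caches obj unless its kind has a selector that obj does not satisfy.
// It returns true when the object was cached.
func (c *FilteredCache) Add(obj Object) bool {
	if sel, ok := c.selectors[obj.Kind]; ok && !matches(sel, obj.Labels) {
		return false
	}
	c.items[obj.Kind] = append(c.items[obj.Kind], obj)
	return true
}

// Count returns how many objects of the given kind are cached.
func (c *FilteredCache) Count(kind string) int { return len(c.items[kind]) }

func main() {
	sel := map[string]string{partOfKey: partOfValue}
	// Restrict standard kinds; namespaces get no selector, as in the comment.
	c := NewFilteredCache(map[string]map[string]string{"Deployment": sel, "Service": sel})

	c.Add(Object{Kind: "Deployment", Namespace: "eclipse-che", Name: "che",
		Labels: map[string]string{partOfKey: partOfValue}})
	c.Add(Object{Kind: "Deployment", Namespace: "user-1", Name: "unrelated"}) // skipped: no label
	c.Add(Object{Kind: "Namespace", Name: "user-1"})                          // cached: kind is unrestricted

	fmt.Println(c.Count("Deployment"), c.Count("Namespace"))
}
```

In this model, the unlabeled deployment from an unrelated namespace never enters the cache, which is the effect the fix relies on: memory then scales with the number of labeled objects rather than with every object in the cluster.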

@nickboldt
Contributor

sync'd to Red Hat JIRA https://issues.redhat.com/browse/CRW-2549

@max-cx max-cx removed the status/release-notes-review-needed Issues that needs to be reviewed by the doc team for the Release Notes wording label Jan 12, 2023
6 participants