Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use custom cache function in Che Operator #1166

Merged
merged 21 commits into from
Nov 19, 2021
Merged

feat: Use custom cache function in Che Operator #1166

merged 21 commits into from
Nov 19, 2021

Conversation

mmorhun
Copy link
Contributor

@mmorhun mmorhun commented Nov 5, 2021

What does this PR do?

This PR introduces custom cache function for the default k8s client of the Operator. This is required, because default cache function caches every object of the kind cluster-wide that the Operator ever reads. Such behavior ends up in a huge memory usage on a cluster with many resources. Consequently, the Operator is terminated by OOM.

However, Operator's controller runtime has own restrictions on the cache function: every kind of k8s object is limited to a single selector that is used to detect if specific object should be cached. In turn, a selector cannot have or condition in it. And that's the price we have to pay: all k8s objects that are handled by the default Operator client should have a label (or set of them) set. Otherwise, Che Operator will not see the objects at all (unless a separate slow client is used).

After this PR is merged all k8s resources that Che Operator interacts with must have app.kubernetes.io/part-of=che.eclipse.org label.

The requirement above is also applicable to the user defined configurations in Config Maps and Secrets (such as custom TLS certificates, etc.). To not to break existing installations, Che Operator will search for such objects on its start and add the required label. This way existing deployments of Che will not be broken.

Screenshot/screencast of this PR

N/A

What issues does this PR fix or reference?

Resolves: eclipse-che/che#20647
Depends on: che-dockerfiles/che-tls-secret-creator#6

How to test this PR?

  1. Deploy Eclipse Che using operator installer.
  2. Edit Che Operator deployment by setting memory limit to 256Mi
  3. Create 500 namespaces: for i in {1..500}; do oc create namespace test$i; done
  4. Wait and check no OOM error in the Operator pod
  5. Restart Operator pod
  6. Wait and check no OOM error in the Operator pod

PR Checklist

As the author of this Pull Request I made sure that:

Reviewers

Reviewers, please comment how you tested the PR when approving it.

@tolusha
Copy link
Contributor

tolusha commented Nov 5, 2021

What about this label?
app.kubernetes.io/part-of: che.eclipse.org

main.go Show resolved Hide resolved
Copy link
Contributor

@amisevsk amisevsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real issues on my end but I don't have a deep insight into how the Che operator works.

pkg/deploy/migrate-labels.go Outdated Show resolved Hide resolved
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
…work migration.

Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
@openshift-ci
Copy link

openshift-ci bot commented Nov 19, 2021

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm label Nov 19, 2021
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
Signed-off-by: Mykola Morhun <mmorhun@redhat.com>
@openshift-ci
Copy link

openshift-ci bot commented Nov 19, 2021

@mmorhun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/v7-devworkspace-happy-path 6e2b7f6 link true /test v7-devworkspace-happy-path

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate memory consumption of che-operator with growing number of namespaces on the cluster
6 participants