CPU Spikes on Openshift with unusual operator behaviour #3441

Closed
Mohid-A opened this issue Jul 12, 2022 · 8 comments
Labels
area/operator kind/bug Something isn't working

@Mohid-A

Mohid-A commented Jul 12, 2022

Hi Community,

We are running into an issue when the camel-k operator restarts while more than four integrations are running on OCP (version below). Upon restart, the operator keeps reconciling the integrations continuously, which causes CPU spikes on the master node and increased latency on the kube-apiserver. The logs for the issue are below:

VERSION

Camel-k-operator 1.9.1
Camel K Client 1.9.1
OCP 4.9.37 using Kubernetes 1.22

Command to reproduce the issue

kamel --kube-config=$QA_KUBECONFIG run $APP --trait container.enabled=$ENABLED --trait container.request-cpu=$REQUESTCPU --trait container.request-memory=$REQUESTMEMORY --trait container.limit-cpu=$LIMITCPU --trait container.limit-memory=$LIMITMEMORY --trait jvm.options=-Doracle.jdbc.timezoneAsRegion=false --pod-template $PVC2 --config secret:$SECRET --config configmap:$CONFIGMAP -t logging.level=DEBUG

where $APP is the variable holding the integration file.

Error Log

{"level":"info","ts":1657633698.0446534,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.0447812,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.3031335,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.3032014,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.5536115,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification"}
{"level":"info","ts":1657633698.5536752,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"send-email-notification","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"send-email-notification"}
{"level":"info","ts":1657633698.9342558,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633698.9343414,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1299353,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42"}
{"level":"info","ts":1657633699.1300168,"logger":"camel-k.controller.integration","msg":"Invoking action monitor","request-namespace":"esb-jpoller-qa","request-name":"appointments3plcn-wh42","api-version":"camel.apache.org/v1","kind":"Integration","ns":"esb-jpoller-qa","name":"appointments3plcn-wh42"}
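In the log above, `appointments3plcn-wh42` is reconciled three times in under a second and `send-email-notification` twice, which is what distinguishes a reconciliation loop from the single pass expected at startup. The rough diagnostic sketch below groups the "Reconciling Integration" entries by namespace/name and flags integrations reconciled more than once. It assumes the operator's JSON log format shown here; the function names and the `max_count` threshold are hypothetical helpers, not part of Camel K.

```python
import json
from collections import defaultdict


def reconcile_stats(lines):
    """Group "Reconciling Integration" entries by "namespace/name".

    Returns a dict mapping each key to the sorted list of "ts" values
    (seconds since the epoch, as logged by the operator).
    """
    stamps = defaultdict(list)
    for line in lines:
        # Skip any "pod container" prefix added by log-tailing tools.
        start = line.find("{")
        if start < 0:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if entry.get("msg") != "Reconciling Integration":
            continue
        key = f'{entry.get("request-namespace")}/{entry.get("request-name")}'
        stamps[key].append(float(entry["ts"]))
    return {key: sorted(ts) for key, ts in stamps.items()}


def suspicious(stats, max_count=1):
    """Return integrations reconciled more than max_count times, with counts."""
    return {key: len(ts) for key, ts in stats.items() if len(ts) > max_count}


def min_gap(timestamps):
    """Smallest interval in seconds between consecutive reconciliations, or None."""
    return min((b - a for a, b in zip(timestamps, timestamps[1:])), default=None)
```

Fed the ten lines above (for example, saved from `oc logs` on the operator pod), `suspicious` reports both integrations, and the shortest gap for `appointments3plcn-wh42` is roughly 0.2 s. Over a short window right after a restart, a healthy operator should report each integration about once.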

Expected Behavior

We want the operator to be stable upon restart, as restarting has an impact on the platform and other workloads.

Thanks

@heiko-braun

@christophd Have you encountered this before?

@squakez squakez added the kind/bug Something isn't working label Jul 12, 2022
@squakez
Contributor

squakez commented Jul 12, 2022

Thanks for reporting the problem. I managed to reproduce the issue in a local environment as well. Strangely, this happens whenever more than a few integrations are running (I tried with 5). If you stop the operator pod, as soon as it restarts it repeatedly reconciles all the running integrations for a few seconds. In my case it stops after less than a minute, but it is worth investigating to see how to fix it.

@Mohid-A
Author

Mohid-A commented Jul 12, 2022

In our case, when the operator pod restarts, the reconciliation does not stop. The only workaround we found was to delete running integrations until the count was down to 4, which stopped this operator behavior.

@gtata007

Is this a Camel K operator issue or an environment (OpenShift) issue?
If it is an operator issue, is there another Camel K version (1.6.0 or 1.6.3) that might be stable on OpenShift?

@heiko-braun

If I remember correctly, @christophd and @astefanutti talked about this recently.

@astefanutti
Member

astefanutti commented Jul 13, 2022

If I understand the issue correctly, it is twofold:

1. All the Integration resources are reconciled upon the operator startup:

This is the standard operator behavior: all the managed resources are reconciled once at startup, so that any changes to their state that occurred while the operator was down are taken into account and the system can reach eventual consistency.
That may indeed cause a spike in compute resources and API server requests. We could look into further tuning the client-side QPS and Burst parameters that control API request throttling. These were increased as part of #2814, but we could make them configurable.

2. The reconciliation goes on indefinitely:

This may be an occurrence of the issue fixed by #3285, which is yet to be released in the upcoming 1.9.3 version.
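To make the QPS and Burst trade-off concrete: client-go's REST client throttles requests with a token bucket, where Burst is the bucket size (how many requests may go out back-to-back) and QPS is the refill rate in tokens per second. The sketch below simulates that behavior in plain Python for a client that issues requests one at a time; it is an illustration, not client-go's implementation, and the `throttle` function and the numbers used with it are hypothetical rather than Camel K's actual settings.

```python
def throttle(arrivals, qps, burst):
    """Return the time each request is actually sent under a token bucket.

    The bucket starts full with `burst` tokens and refills at `qps` tokens
    per second. Each request consumes one token; if none is available, the
    (blocking) client waits until one has been refilled.
    """
    if qps <= 0 or burst < 1:
        raise ValueError("qps must be > 0 and burst must be >= 1")
    tokens = float(burst)
    last = None  # time of the previous send, used to compute the refill
    sent = []
    for t in sorted(arrivals):
        # Requests go out in order, so one cannot be sent before the previous one.
        now = t if not sent else max(t, sent[-1])
        if last is not None:
            tokens = min(float(burst), tokens + (now - last) * qps)
        if tokens < 1.0:
            # Wait just long enough for the bucket to hold one full token.
            now += (1.0 - tokens) / qps
            tokens = 1.0
        tokens -= 1.0
        last = now
        sent.append(now)
    return sent
```

For example, with `qps=5, burst=10`, 20 requests issued at t = 0 (as when an operator reconciles several integrations at startup) are sent as 10 immediate requests followed by 10 requests spaced 0.2 s apart, the last one at about t = 2.0 s. Raising Burst lets more of the startup spike through at once, while raising QPS shortens the tail; both shift load onto the API server rather than removing it.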

@squakez
Contributor

squakez commented Jul 13, 2022

Thanks for the feedback @astefanutti. I've just tested with 1.10-nightly and I confirm the indefinite reconciliation loop has been fixed:

camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.6277504,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it2"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702227.955836,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it3"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.074853,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it4"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.6067405,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it5"}
camel-k-operator-74d899c876-ns2zx camel-k-operator {"level":"info","ts":1657702228.981502,"logger":"camel-k.controller.integration","msg":"Reconciling Integration","request-namespace":"default","request-name":"it1"}

I am keeping this open until we officially release both 1.10 and 1.9.3.

@squakez squakez modified the milestones: 1.10.0, 1.9.3 Jul 13, 2022
@tadayosi tadayosi modified the milestones: 1.10.0, 1.11.0 Aug 25, 2022
@tadayosi
Member

I mistakenly put it to 1.11.0. Moving it back to 1.10.0 as it can be closed once we release 1.10.0.

@oscerd oscerd modified the milestones: 1.10.0, 1.11.0 Sep 5, 2022
@oscerd oscerd closed this as completed Sep 5, 2022