Problem updating the external secret for s390x-knative #1452

Closed
BbolroC opened this issue Feb 6, 2022 · 7 comments

Comments

BbolroC (Contributor) commented Feb 6, 2022

This is the same issue as reported in #1441

Since 2022-02-06 07:30:00 UTC, updates to the following external secrets have not been working:

https://github.com/GoogleCloudPlatform/oss-test-infra/blob/master/prow/knative/cluster/build/600-kubernetes_external_secrets.yaml

@cjwagner Could you look into this again?

cjwagner (Member) commented Feb 7, 2022

Looks like KES just stopped again for some reason. The log output right before it gets stuck is different from normal operation:

{"level":30,"message_time":"2022-02-06T07:10:29.761Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"starting poller for test-pods/s390x-cluster1"}
{"level":30,"message_time":"2022-02-06T07:10:39.751Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"running poll on the secret test-pods/s390x-cluster1"}
{"level":30,"message_time":"2022-02-06T07:10:39.767Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"fetching secret kubeconfig from GCP Secret for project s390x-knative with version latest"}
{"level":30,"message_time":"2022-02-06T07:10:39.767Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"fetching secret ko-docker-repository from GCP Secret for project s390x-knative with version latest"}
{"level":30,"message_time":"2022-02-06T07:10:39.767Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"fetching secret registry-certificate from GCP Secret for project s390x-knative with version latest"}
{"level":30,"message_time":"2022-02-06T07:10:39.767Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"fetching secret knative01-ssh from GCP Secret for project s390x-knative with version latest"}
{"level":30,"message_time":"2022-02-06T07:10:39.767Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"fetching secret docker-config from GCP Secret for project s390x-knative with version latest"}
{"level":30,"message_time":"2022-02-06T07:10:39.797Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"getting secret test-pods/s390x-cluster1"}
{"level":30,"message_time":"2022-02-06T07:10:39.808Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"updating secret test-pods/s390x-cluster1"}
{"level":30,"message_time":"2022-02-06T07:10:39.844Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"stopping poller for test-pods/s390x-cluster1"}
{"level":30,"message_time":"2022-02-06T07:10:39.845Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"starting poller for test-pods/s390x-cluster1"}
{"level":30,"message_time":"2022-02-06T07:10:43.284Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"Stopping watch stream for namespace * due to event: END"}
{"level":30,"message_time":"2022-02-06T07:10:43.288Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"stopping poller for test-pods/s390x-cluster1"}
{"level":30,"message_time":"2022-02-06T07:11:43.289Z","pid":18,"hostname":"kubernetes-external-secrets-5487fc7656-cwf77","payload":{},"msg":"No watch event for 60000 ms, restarting watcher for *"}

It looks like KES in the service cluster is experiencing similar issues:

{"level":30,"message_time":"2022-01-27T00:08:22.938Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"starting poller for prow-monitoring/grafana"}
{"level":30,"message_time":"2022-01-27T00:08:32.927Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"running poll on the secret prow-monitoring/grafana"}
{"level":30,"message_time":"2022-01-27T00:08:32.936Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"fetching secret knative-prow__prow-monitoring__grafana from GCP Secret for project knative-tests with version latest"}
{"level":30,"message_time":"2022-01-27T00:08:32.959Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"getting secret prow-monitoring/grafana"}
{"level":30,"message_time":"2022-01-27T00:08:32.967Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"skipping secret prow-monitoring/grafana upsert, objects are the same"}
{"level":30,"message_time":"2022-01-27T00:08:32.976Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"stopping poller for prow-monitoring/grafana"}
{"level":30,"message_time":"2022-01-27T00:08:32.976Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"starting poller for prow-monitoring/grafana"}
{"level":30,"message_time":"2022-01-27T00:08:37.252Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"Stopping watch stream for namespace * due to event: END"}
{"level":30,"message_time":"2022-01-27T00:08:37.255Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"stopping poller for prow-monitoring/grafana"}
{"level":30,"message_time":"2022-01-27T00:09:37.255Z","pid":18,"hostname":"kubernetes-external-secrets-85b98f665c-r26tz","payload":{},"msg":"No watch event for 60000 ms, restarting watcher for *"}

Looking at the upstream repo, this appears to be an issue others are experiencing as well, and the only recommended solution is migrating to an alternative tool. external-secrets/kubernetes-external-secrets#826 (comment)

I'll restart the pods for now, but we may need to look into external-secrets/kubernetes-external-secrets#864. It is strange that we've only seen this affect the Knative Prow instance.

BbolroC (Contributor, Author) commented Feb 7, 2022

Thanks for looking into the issue and restarting the pod. Tomorrow I will try to figure out what makes KES operations on s390x-cluster1 different from the others. Thank you. 😉

BbolroC (Contributor, Author) commented Feb 8, 2022

No meaningful findings toward a solution so far.

Until the migration to ESO is done, would it be possible to schedule a regular restart of the operator during the period when it is least used?
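
For illustration, something along these lines might do it. The name, namespace, schedule, and service account below are placeholders I made up (not taken from this repo), and the service account would need RBAC permission to restart the deployment:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: kes-nightly-restart              # placeholder name
  namespace: kes                         # placeholder; wherever the KES deployment lives
spec:
  schedule: "0 3 * * *"                  # e.g. daily at 03:00 UTC, assuming that is a quiet window
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: kes-restarter   # placeholder SA; needs permission to patch the deployment
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - rollout
            - restart
            - deployment/kubernetes-external-secrets
            - --namespace=kes            # same placeholder namespace

If a CronJob feels too heavy, running the same kubectl rollout restart manually during a quiet window would have the same effect.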

Or could we set up a monitor on the kubernetes_external_secrets_sync_calls_count metric (https://github.com/external-secrets/kubernetes-external-secrets#metrics, and see external-secrets/kubernetes-external-secrets#362 (comment)) and restart the pod when the metric stops increasing?
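
As an illustration only, a Prometheus alerting rule along these lines could flag the stall. The group name, alert name, and time windows are made up, and the exact expression would depend on how the KES metrics are scraped in this Prow instance:

groups:
- name: kes-staleness                    # placeholder group name
  rules:
  - alert: KESSyncCallsStalled           # placeholder alert name
    expr: increase(kubernetes_external_secrets_sync_calls_count[30m]) == 0
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: KES has recorded no secret sync calls for 30 minutes; the pod may be stuck and need a restart.

The restart itself would still have to be triggered off the alert, either manually or via whatever automation is preferred.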

Any suggestion from your side for working around this problem for a while would be welcome!

BbolroC (Contributor, Author) commented Feb 28, 2022

@cjwagner Since around 23:00 on Feb 25, the external secret for s390x-knative has not been updated. Could you restart the pod again? In the meantime, I will restructure my code so that our CI works independently of whether KES is working, before the next failure occurs. Thank you.

cjwagner (Member) commented Feb 28, 2022

According to the logs, KES is updating the s390x-cluster1 secret successfully (there is no s390x-knative secret):

{"level":30,"message_time":"2022-02-28T19:25:20.366Z","pid":19,"hostname":"kubernetes-external-secrets-5487fc7656-chsv6","payload":{},"msg":"updating secret test-pods/s390x-cluster1"}

I'll restart the pod, but I wouldn't expect a change in behavior this time, since the KES pod is not stuck like it was before.
FWIW, we are currently planning to replace KES with an alternative next month.

BbolroC (Contributor, Author) commented Feb 28, 2022

Oh, good to know. Thanks for restarting the pod. Based on today's log that you showed me, it looks like everything is working fine. 😉 But according to the log on the 25th from my side, the secret was not updated (maybe a transient failure? 🤔). I then stopped updating the secret; I should have kept it updated until you could look into the pod. Sorry.

Again, thanks a lot! Have a great start to the week!

BbolroC (Contributor, Author) commented Mar 18, 2022

I will close this issue because the work to make our CI independent of whether KES is working has been completed.

BbolroC closed this as completed Mar 18, 2022