Workflow-Controller panic: can't create an event with namespace 'default' in namespace 'argo'' #7613

Closed
shuker85 opened this issue Jan 21, 2022 · 9 comments · Fixed by #8020
Labels
area/controller Controller issues, panics type/bug

Comments

@shuker85
Contributor

Summary

I noticed an unexpected restart of the workflow-controller pod.

What happened/what you expected to happen?
This instance is managed by Argo CD and is not currently in active use.

What version is it broken in?
v3.2.6
What version was it working in?

Diagnostics

What Kubernetes provider are you using?
GKE on 1.21.5-gke.1802

What executor are you running? Docker/K8SAPI/Kubelet/PNS/Emissary

Emissary

...
time="2022-01-21T07:54:25.370Z" level=info msg="Update leases 200"
time="2022-01-21T07:54:29.124Z" level=info msg="List workflows 200"
time="2022-01-21T07:54:29.124Z" level=info msg=healthz age=5m0s err="<nil>" instanceID= labelSelector="!workflows.argoproj.io/phase,!workflows.argoproj.io/controller-instanceid" managedNamespace=
time="2022-01-21T07:54:30.377Z" level=info msg="Get leases 200"
time="2022-01-21T07:54:30.385Z" level=info msg="Update leases 200"
time="2022-01-21T07:54:35.431Z" level=info msg="Get leases 200"
time="2022-01-21T07:54:35.442Z" level=info msg="Update leases 200"
time="2022-01-21T07:54:48.550Z" level=info msg="Get leases 500"
E0121 07:54:48.551558       1 leaderelection.go:325] error retrieving resource lock argo/workflow-controller: etcdserver: request timed out
I0121 07:54:50.443189       1 leaderelection.go:278] failed to renew lease argo/workflow-controller: timed out waiting for the condition
E0121 07:54:50.443268       1 leaderelection.go:301] Failed to release lock: resource name may not be empty
time="2022-01-21T07:54:50.443Z" level=info msg="stopped leading" id=argo-workflow-controller-7797b9c9d5-p48p5
E0121 07:54:50.443353       1 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:".16cc39f57309758c", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Lease", Namespace:"", Name:"", UID:"", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"", FieldPath:""}, Reason:"LeaderElection", Message:"argo-workflow-controller-7797b9c9d5-p48p5 stopped leading", Source:v1.EventSource{Component:"workflow-controller", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc072b7929a6a118c, ext:542583113832237, loc:(*time.Location)(0x3346660)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc072b7929a6a118c, ext:542583113832237, loc:(*time.Location)(0x3346660)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'can't create an event with namespace 'default' in namespace 'argo''(may retry after sleeping)
panic: http: Server closed

goroutine 148 [running]:
github.com/argoproj/argo-workflows/v3/workflow/metrics.runServer.func1(0x1, 0x21eabe1, 0x8, 0x2382, 0x0, 0x0, 0xc0007bc0e0)
	/go/src/github.com/argoproj/argo-workflows/workflow/metrics/server.go:56 +0x11d
created by github.com/argoproj/argo-workflows/v3/workflow/metrics.runServer
	/go/src/github.com/argoproj/argo-workflows/workflow/metrics/server.go:53 +0x27e

Message from the maintainers:

Impacted by this regression? Give it a 👍. We prioritise the issues with the most 👍.

@shuker85 shuker85 added type/bug type/regression Regression from previous behavior (a specific type of bug) triage labels Jan 21, 2022
@sarabala1979
Member

@shuker85 How did you install Argo Workflows?

@sarabala1979
Member

kubernetes-sigs/kind#717

@shuker85
Contributor Author

Via the Helm chart, deployed using the app-of-apps pattern.

@alexec
Contributor

alexec commented Jan 26, 2022

Leader election is confused somehow, thinking it is in the wrong namespace.

Are you using the managed namespace feature, where the controller runs in a different namespace from the workflows?
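
One possible reading of the logged event, sketched as a small model rather than the actual client-go code: the event's `InvolvedObject` is a `Lease` with an empty `Namespace` and `Name`, the event itself carries `Namespace:"default"`, and it is then rejected when written in the controller's namespace. The sketch assumes that an empty involved-object namespace is defaulted to `default` and that the namespaced event client refuses events addressed elsewhere. `objectRef`, `eventNamespace` and `createEvent` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// objectRef is a hypothetical stand-in for the object an event refers to.
type objectRef struct {
	Kind, Namespace, Name string
}

// eventNamespace models the assumed defaulting step: an involved object
// without a namespace yields an event in "default".
func eventNamespace(ref objectRef) string {
	if ref.Namespace == "" {
		return "default"
	}
	return ref.Namespace
}

// createEvent models a namespaced event client that rejects events
// addressed to a namespace other than its own.
func createEvent(clientNamespace, eventNS string) error {
	if clientNamespace != "" && eventNS != clientNamespace {
		return fmt.Errorf("can't create an event with namespace '%v' in namespace '%v'", eventNS, clientNamespace)
	}
	return nil
}

func main() {
	// As in the logged event: Kind is "Lease", Namespace and Name are empty.
	emptyLease := objectRef{Kind: "Lease"}
	fmt.Println(createEvent("argo", eventNamespace(emptyLease)))

	// A fully populated reference is accepted.
	lease := objectRef{Kind: "Lease", Namespace: "argo", Name: "workflow-controller"}
	fmt.Println(createEvent("argo", eventNamespace(lease)))
}
```

If this reading is right, the namespace mismatch is a symptom of the lease reference being empty at release time, which would also fit the preceding "resource name may not be empty" line, rather than of the controller being configured for the wrong namespace.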

@emenendez

Hi! I'm also running into this error. In my case, I've installed Argo Workflows as part of Kubeflow, and yes, our workflows are in different namespaces from the controller. Can you suggest any other steps I can try to troubleshoot? Thank you!

@alexec alexec reopened this Feb 25, 2022
@alexec
Contributor

alexec commented Feb 25, 2022

Please add the stack trace.

@emenendez

Thanks @alexec!

I0224 06:27:16.729930       1 leaderelection.go:278] failed to renew lease kubeflow/workflow-controller: timed out waiting for the condition
E0224 06:27:16.730011       1 leaderelection.go:301] Failed to release lock: resource name may not be empty
time="2022-02-24T06:27:16.730Z" level=info msg="stopped leading" id=workflow-controller-f78bdfff-8ntb2
E0224 06:27:16.730072       1 event.go:273] Unable to write event: 'can't create an event with namespace 'default' in namespace 'kubeflow'' (may retry after sleeping)
time="2022-02-24T06:27:16.730Z" level=info msg="Shutting workflow TTL worker"
panic: http: Server closed

goroutine 235 [running]:
github.com/argoproj/argo-workflows/v3/workflow/metrics.runServer.func1(0x1, 0x1eeb199, 0x8, 0x2382, 0x0, 0x0, 0xc00aa43500)
	/go/src/github.com/argoproj/argo-workflows/workflow/metrics/server.go:56 +0x117
created by github.com/argoproj/argo-workflows/v3/workflow/metrics.runServer
	/go/src/github.com/argoproj/argo-workflows/workflow/metrics/server.go:53 +0x27e

@alexec
Contributor

alexec commented Feb 25, 2022

Are you using --managed-namespace?

@alexec alexec removed the type/regression Regression from previous behavior (a specific type of bug) label Feb 25, 2022
@alexec
Contributor

alexec commented Feb 25, 2022

The panic is a red herring. It is caused by the shutdown and has no further impact. Any fix for it would go here, but I think it is merely annoying, so a fix is not needed:

mux := http.NewServeMux()

The event issue is coming from here, but I can't see the problem:

go leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{

There is probably a bug in the upstream client-go library:

kubernetes/client-go#966

I think all we need to do is this:

	ReleaseOnCancel: false,

@shuker85 would you like to submit a PR?

shuker85 added a commit to shuker85/argo-workflows that referenced this issue Feb 27, 2022
Should fix argoproj#7613

Signed-off-by: Shyukri Shyukriev <shyukri.shyukriev@mariadb.com>
alexec pushed a commit that referenced this issue Feb 28, 2022
Signed-off-by: Shyukri Shyukriev <shyukri.shyukriev@mariadb.com>
4 participants