antrea-mc-controller no longer deploys successfully #6149

Closed
antoninbas opened this issue Mar 26, 2024 · 0 comments · Fixed by #6150
Labels
area/multi-cluster Issues or PRs related to multi cluster. kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@antoninbas
Contributor

Describe the bug
The antrea-mc-controller never becomes ready. The logs show the following:

I0326 18:43:08.528590       1 leader.go:124] "Leader MC Controller Starting Manager"
I0326 18:43:08.529271       1 server.go:185] "Starting metrics server" logger="controller-runtime.metrics"
I0326 18:43:08.529547       1 stale_controller.go:97] "Starting StaleResCleanupController"
E0326 18:43:08.530113       1 stale_controller.go:71] "Fail to get MemberClusterAnnounces" err="the cache is not started, can not read objects"
I0326 18:43:08.530113       1 server.go:224] "Serving metrics server" logger="controller-runtime.metrics" bindAddress=":8080" secure=false
I0326 18:43:08.531109       1 server.go:191] "Starting webhook server" logger="controller-runtime.webhook"
W0326 18:43:08.538798       1 reflector.go:539] pkg/mod/k8s.io/client-go@v0.29.2/tools/cache/reflector.go:229: failed to list *v1alpha1.ResourceExport: resourceexports.multicluster.crd.antrea.io is forbidden: User "system:serviceaccount:antrea-multicluster:antrea-mc-controller" cannot list resource "resourceexports" in API group "multicluster.crd.antrea.io" at the cluster scope
E0326 18:43:08.539779       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.29.2/tools/cache/reflector.go:229: Failed to watch *v1alpha1.ResourceExport: failed to list *v1alpha1.ResourceExport: resourceexports.multicluster.crd.antrea.io is forbidden: User "system:serviceaccount:antrea-multicluster:antrea-mc-controller" cannot list resource "resourceexports" in API group "multicluster.crd.antrea.io" at the cluster scope
W0326 18:43:08.541011       1 reflector.go:539] pkg/mod/k8s.io/client-go@v0.29.2/tools/cache/reflector.go:229: failed to list *v1alpha1.MemberClusterAnnounce: memberclusterannounces.multicluster.crd.antrea.io is forbidden: User "system:serviceaccount:antrea-multicluster:antrea-mc-controller" cannot list resource "memberclusterannounces" in API group "multicluster.crd.antrea.io" at the cluster scope
E0326 18:43:08.541105       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.29.2/tools/cache/reflector.go:229: Failed to watch *v1alpha1.MemberClusterAnnounce: failed to list *v1alpha1.MemberClusterAnnounce: memberclusterannounces.multicluster.crd.antrea.io is forbidden: User "system:serviceaccount:antrea-multicluster:antrea-mc-controller" cannot list resource "memberclusterannounces" in API group "multicluster.crd.antrea.io" at the cluster scope
W0326 18:43:09.988975       1 reflector.go:539] pkg/mod/k8s.io/client-go@v0.29.2/tools/cache/reflector.go:229: failed to list *v1alpha1.ResourceExport: resourceexports.multicluster.crd.antrea.io is forbidden: User "system:serviceaccount:antrea-multicluster:antrea-mc-controller" cannot list resource "resourceexports" in API group "multicluster.crd.antrea.io" at the cluster scope
E0326 18:43:09.989068       1 reflector.go:147] pkg/mod/k8s.io/client-go@v0.29.2/tools/cache/reflector.go:229: Failed to watch *v1alpha1.ResourceExport: failed to list *v1alpha1.ResourceExport: resourceexports.multicluster.crd.antrea.io is forbidden: User "system:serviceaccount:antrea-multicluster:antrea-mc-controller" cannot list resource "resourceexports" in API group "multicluster.crd.antrea.io" at the cluster scope

This is causing the jenkins-e2e-multicluster CI job to fail.
I am still investigating, but I think that #5843 may have caused this regression.

To Reproduce
Create a K8s cluster and use the latest (unreleased) version of all Antrea images: antrea/antrea-controller-ubuntu:latest, antrea/antrea-agent-ubuntu:latest, and antrea/antrea-mc-controller:latest.

helm install -n kube-system antrea build/charts/antrea/ --set featureGates.Multicluster=true --set multicluster.enableGateway=true
kubectl create ns antrea-multicluster
antctl mc deploy leadercluster -n antrea-multicluster

Expected
The antrea-mc-controller Pod becomes Ready after a short while.

Actual behavior
The antrea-mc-controller Pod never becomes Ready and hence keeps getting restarted by K8s.

Versions:
Only affects top-of-tree (latest), not any released Antrea version.

@antoninbas antoninbas added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/multi-cluster Issues or PRs related to multi cluster. labels Mar 26, 2024
@antoninbas antoninbas self-assigned this Mar 26, 2024
antoninbas added a commit to antoninbas/antrea that referenced this issue Mar 26, 2024
A few issues were introduced by antrea-io#5843 because of changes in the
sigs.k8s.io/controller-runtime interface.

The biggest issue was that the call to ctrl.NewManager was not using the
Options object populated earlier in the setupManagerAndCertController
function. Instead it was creating and using a new, incomplete Options
object.

Fixes antrea-io#6149

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
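Why passing a fresh Options object leads to the "forbidden ... at the cluster scope" errors in the logs can be illustrated with a minimal, stdlib-only model. `managerOptions`, `newManager`, `setupManager`, `listScopes`, and `allowed` are hypothetical stand-ins, not the controller-runtime API. The sketch assumes that the populated Options restricted the manager's cache to the Pod's namespace (in recent controller-runtime releases this kind of restriction lives in the cache options, e.g. `cache.Options.DefaultNamespaces`; whether Antrea sets exactly that field is an assumption here). Without that restriction, the informers list cluster-wide, which a namespaced RBAC Role does not permit.

```go
package main

import "fmt"

// managerOptions is a hypothetical stand-in for ctrl.Options; only the
// field relevant to this bug is modeled.
type managerOptions struct {
	// CacheNamespaces restricts the cache; empty means cluster scope.
	CacheNamespaces []string
}

// manager is a hypothetical stand-in for the value returned by ctrl.NewManager.
type manager struct {
	opts managerOptions
}

func newManager(opts managerOptions) *manager {
	return &manager{opts: opts}
}

// listScopes returns the scopes the cache's informers would list from.
// An empty string denotes a cluster-scoped list.
func (m *manager) listScopes() []string {
	if len(m.opts.CacheNamespaces) == 0 {
		return []string{""}
	}
	return m.opts.CacheNamespaces
}

// allowed models a namespaced Role: listing is permitted only in the
// namespaces where the ServiceAccount was granted the Role.
func allowed(scope string, roleNamespaces map[string]bool) bool {
	return scope != "" && roleNamespaces[scope]
}

// setupManager is a hypothetical model of the setup function.
func setupManager(podNamespace string, buggy bool) *manager {
	// Options populated earlier in the setup function.
	opts := managerOptions{CacheNamespaces: []string{podNamespace}}
	if buggy {
		// Bug: a new, incomplete Options object is used instead of opts.
		return newManager(managerOptions{})
	}
	// Fix: pass the populated Options object.
	return newManager(opts)
}

func main() {
	roles := map[string]bool{"antrea-multicluster": true}
	for _, buggy := range []bool{true, false} {
		m := setupManager("antrea-multicluster", buggy)
		for _, scope := range m.listScopes() {
			name := scope
			if name == "" {
				name = "<cluster scope>"
			}
			fmt.Printf("buggy=%v list in %s: allowed=%v\n", buggy, name, allowed(scope, roles))
		}
	}
}
```

Running it prints `allowed=false` for the cluster-scoped list made by the buggy setup and `allowed=true` for the namespaced list made after the fix.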
antoninbas added a commit that referenced this issue Mar 27, 2024
A few issues were introduced by #5843 because of changes in the
sigs.k8s.io/controller-runtime interface.

The biggest issue was that the call to ctrl.NewManager was not using the
Options object populated earlier in the setupManagerAndCertController
function. Instead it was creating and using a new, incomplete Options
object.

Additionally, the decoder is no longer injected automatically; it needs to be
instantiated by us. Otherwise, the admission webhook panics.
See kubernetes-sigs/controller-runtime#2695

Fixes #6149

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
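The decoder change can be illustrated with a minimal, stdlib-only model. `Decoder`, `jsonDecoder`, `newDecoder`, `validator`, `safeHandle`, and the ClusterID rule are all hypothetical; in controller-runtime the corresponding constructor is `admission.NewDecoder`, whose exact signature varies between releases. The sketch assumes the decoder is an interface-typed field on the handler, so a handler whose decoder was never set panics on first use instead of returning an error.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Decoder is a hypothetical stand-in for the admission decoder.
type Decoder interface {
	Decode(raw []byte, into any) error
}

// jsonDecoder is a hypothetical Decoder backed by encoding/json.
type jsonDecoder struct{}

func (jsonDecoder) Decode(raw []byte, into any) error {
	return json.Unmarshal(raw, into)
}

// newDecoder must now be called explicitly: nothing injects the decoder.
func newDecoder() Decoder {
	return jsonDecoder{}
}

// memberClusterAnnounce is a hypothetical, trimmed-down object.
type memberClusterAnnounce struct {
	ClusterID string `json:"clusterID"`
}

// validator is a hypothetical admission webhook handler.
type validator struct {
	decoder Decoder
}

func (v *validator) handle(raw []byte) (bool, error) {
	var obj memberClusterAnnounce
	// Calling a method on a nil interface value panics at runtime.
	if err := v.decoder.Decode(raw, &obj); err != nil {
		return false, err
	}
	// Hypothetical rule: a non-empty ClusterID is required.
	return obj.ClusterID != "", nil
}

// safeHandle runs handle and reports whether it panicked.
func safeHandle(v *validator, raw []byte) (allowed bool, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	allowed, err := v.handle(raw)
	if err != nil {
		return false, false
	}
	return allowed, false
}

func main() {
	raw := []byte(`{"clusterID":"cluster-a"}`)
	// Before the fix: the decoder is never set, so the handler panics.
	_, panicked := safeHandle(&validator{}, raw)
	fmt.Println("without decoder, panicked:", panicked)
	// After the fix: the decoder is instantiated explicitly.
	allowed, panicked := safeHandle(&validator{decoder: newDecoder()}, raw)
	fmt.Println("with decoder, allowed:", allowed, "panicked:", panicked)
}
```

Running it prints `without decoder, panicked: true` followed by `with decoder, allowed: true panicked: false`.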