Create new LoadBalancer svc instead of modifying the Juju ClusterIP svc #319

Closed
kian99 opened this issue Mar 28, 2024 · 5 comments · Fixed by #356

Comments

@kian99

kian99 commented Mar 28, 2024

Bug Description

This bug report doesn't necessarily need a fix in Traefik, but there is some behaviour which could be changed to mitigate an underlying Juju bug.

Currently (seen in Juju 3.1.6), when a Traefik unit is unable to connect to a Juju controller, the LoadBalancer svc that Traefik created reverts to a ClusterIP, as mentioned in this PR. In one observed cloud (OpenStack/PS6), this triggers deletion of the existing load balancer on the cloud, so when the service is patched back from ClusterIP to LoadBalancer, the LB in the cloud no longer exists.

The current workaround in the above scenario is to manually run kubectl patch svc cos-ingress -p '{"metadata":{"annotations":{"loadbalancer.openstack.org/load-balancer-id":null}}}' so that Traefik requests a new LB.

Some relevant information on why this happens is available in this discussion. Quoting the relevant part:

A controller restart will cause juju to rewrite/patch the resources it thinks it owns. Anything which updates those juju managed resources from behind juju's back will cause trouble.

In light of the above, it seems that the Traefik operator shouldn't modify the existing ClusterIP service that Juju creates, but should instead create a separate LoadBalancer service with the same selector as the one currently used. This would prevent the svc from being updated when Juju "resets" things.
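A minimal sketch of what such a separate service could look like, built as a plain manifest so it does not depend on any particular Kubernetes client. The `-lb` name suffix, the `cos` namespace, the port names, and the `app.kubernetes.io/name` selector label are all assumptions for illustration; the real selector should be copied from the service Juju created.

```python
import json


def build_loadbalancer_service(app_name, namespace, ports):
    """Return a manifest for a standalone LoadBalancer Service.

    The Juju-created ClusterIP service is left untouched; this service only
    reuses its pod selector (assumed here to be app.kubernetes.io/name).
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            # Hypothetical naming scheme: a distinct name avoids clashing
            # with the service that Juju owns.
            "name": f"{app_name}-lb",
            "namespace": namespace,
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app.kubernetes.io/name": app_name},
            "ports": [
                {"name": name, "port": port, "targetPort": port}
                for name, port in ports.items()
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_loadbalancer_service(
        "cos-ingress", "cos", {"web": 80, "websecure": 443}
    )
    print(json.dumps(manifest, indent=2))
```

Because the manifest has its own name, a Juju "reset" of the ClusterIP service would not change its type.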

To avoid the above manual operation, the operator could detect when the cloud is returning a 404 for the requested LB and clear the load-balancer-id annotation so that a new load balancer is automatically requested. Alternatively, the existing behaviour may be desirable, as it makes it easier to debug these issues on the cloud, which are out of the operator's control.
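The automatic variant could be sketched roughly as follows. The annotation key is the one from the kubectl command above; how the operator would learn that the cloud returned a 404 is not specified here, so the `lb_lookup_status` parameter is a hypothetical placeholder for that signal.

```python
import json

LB_ID_ANNOTATION = "loadbalancer.openstack.org/load-balancer-id"


def clear_lb_id_patch():
    """Build a JSON merge patch that removes the load-balancer-id annotation.

    In a merge patch, setting a key to null deletes it, which mirrors the
    manual kubectl patch workaround.
    """
    return {"metadata": {"annotations": {LB_ID_ANNOTATION: None}}}


def should_clear_lb_id(annotations, lb_lookup_status):
    """Decide whether the stale LB id should be dropped.

    lb_lookup_status is a hypothetical HTTP status obtained from the cloud
    when looking up the annotated load balancer.
    """
    return LB_ID_ANNOTATION in annotations and lb_lookup_status == 404


if __name__ == "__main__":
    # Example annotation value is made up for illustration.
    annotations = {LB_ID_ANNOTATION: "example-lb-id"}
    if should_clear_lb_id(annotations, 404):
        print(json.dumps(clear_lb_id_patch()))
```

Keeping the decision separate from the patch body makes it easy to put the automatic behaviour behind a config option if the manual workflow is preferred for debugging.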

To Reproduce

WIP

Environment

Juju: v3.1.6
traefik-k8s: latest/stable 166

Relevant log output

-

Additional context

Relevant Juju bug https://bugs.launchpad.net/juju/+bug/2059411

kian99 changed the title from "Request new LoadBalancer when previous no longer exists" to "Create new LoadBalancer svc instead of modifying the Juju ClusterIP svc" on Mar 28, 2024
@kian99
Author

kian99 commented Mar 28, 2024

Renaming/rewording this bug with new information.

@gregory-schiano

FYI, we're affected by this issue on our COS deployment. Currently, more or less every week, we lose our Octavia LB and have to manually patch the service (remove the LB ID annotation), open a DNS PR to change the IP, and sometimes open a firewall (FW) PR as well.

@sed-i
Contributor

sed-i commented Mar 28, 2024

Waiting for followup from juju team. @wallyworld

@tlm

tlm commented Apr 2, 2024

Hi Everyone,

Thanks for providing the information. From the Juju team's perspective, I agree with the current description of this issue: the charm should not be modifying any resource made by Juju in the Kubernetes cluster. Juju operates in a reconciliation loop where it constantly drives the desired state back into the external system. If a charm modifies Juju's resources, we end up in a situation where the change ping-pongs back and forth.
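The ping-pong effect can be illustrated with a toy model. This is not Juju's implementation, only an assumed simplification in which the reconciler overwrites every field it owns with the desired value.

```python
def reconcile(desired, actual):
    """Return a copy of `actual` with every owned field reset to `desired`."""
    fixed = dict(actual)
    fixed.update(desired)
    return fixed


if __name__ == "__main__":
    desired = {"type": "ClusterIP"}      # what the reconciler wants
    actual = {"type": "ClusterIP"}       # what exists in the cluster

    actual["type"] = "LoadBalancer"      # charm patches the service
    actual = reconcile(desired, actual)  # next reconcile pass reverts it
    print(actual["type"])
```

A service that the charm creates itself is not part of `desired` in this model, so a reconcile pass would leave it alone.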

The better approach would be for the charm to provision its own service. Juju will detect this and also clean up this service on behalf of the charm when the charm is removed from a controller.

We very much understand that, in an ideal world, ingress and load balancers would be modelled in Juju itself, which would give everyone the best of both worlds.

@wallyworld

Juju used to allow an application to be deployed such that the type of k8s service created for it could be specified. Unfortunately, the transition to sidecar charms saw that capability go away. We have at various times internally discussed ideas around allowing resources created by Juju to be patched but, as Tom says, this is dangerous and would be a last resort that should be avoided if there's a viable alternative. We really do want to properly model ingress and other missing aspects of the network model offered by Juju, but that's a way off.
