Create new LoadBalancer svc instead of modifying the Juju ClusterIP svc #319

kian99 · 2024-03-28T11:25:20Z

Bug Description

~~This bug report doesn't necessarily need a fix in Traefik but there is some behaviour which could be changed to improve an underlying Juju bug.~~

Currently (seen in Juju 3.1.6), when a Traefik unit is unable to connect to a Juju controller, the LoadBalancer svc that Traefik created reverts to ClusterIP, as mentioned in this PR. In one observed cloud (OpenStack/PS6) this causes the existing load balancer on the cloud to be deleted, so when the service is patched back from ClusterIP to LoadBalancer, the LB in the cloud no longer exists.

The current fix in the above scenario is to manually run `kubectl patch svc cos-ingress -p '{"metadata":{"annotations":{"loadbalancer.openstack.org/load-balancer-id":null}}}'` and have Traefik request a new LB.
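To illustrate why setting the annotation to `null` removes it, here is a minimal sketch of JSON merge patch semantics (RFC 7386) applied to a plain dict. The service state and the `stale-lb-id` value are made-up placeholders, and `merge_patch` is a hypothetical helper, not part of kubectl or the charm. kubectl's default patch strategy may differ in other respects (for example, list handling), so treat this only as a model of the null-deletes-key behaviour.

```python
import json
from copy import deepcopy

LB_ID_ANNOTATION = "loadbalancer.openstack.org/load-balancer-id"


def merge_patch(target, patch):
    """Apply a JSON merge patch: a null value deletes the key, dicts merge recursively."""
    if not isinstance(patch, dict):
        # Scalars and lists replace the target wholesale.
        return deepcopy(patch)
    result = deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Same patch body as in the kubectl command above.
patch = json.loads(
    '{"metadata":{"annotations":{"loadbalancer.openstack.org/load-balancer-id":null}}}'
)

# Hypothetical service state; annotation values are placeholders.
service = {
    "metadata": {
        "name": "cos-ingress",
        "annotations": {LB_ID_ANNOTATION: "stale-lb-id", "example.com/other": "kept"},
    },
    "spec": {"type": "LoadBalancer"},
}

patched = merge_patch(service, patch)
print(json.dumps(patched["metadata"]["annotations"]))
```

Only the stale load-balancer ID is dropped; other annotations and the service spec are left alone.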

Some relevant information on why this happens is available in this discussion; quoting the relevant bits:

A controller restart will cause juju to rewrite/patch the resources it thinks it owns. Anything which updates those juju managed resources from behind juju's back will cause trouble.

In light of the above, it seems that the Traefik operator shouldn't modify the existing ClusterIP service that Juju creates, but should instead create a separate LoadBalancer service with the same selector as the one currently used. This would prevent the svc from being updated when Juju "resets" things.
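A rough sketch of what such a separate service could look like, built as a plain manifest dict. Everything specific here is an assumption for illustration: the `-lb` name suffix, the `cos` namespace, the port list, and the `app.kubernetes.io/name` selector label (which I believe Juju puts on sidecar charm pods, but check against the selector on the existing svc).

```python
import json


def build_lb_service(app_name, namespace, ports):
    """Build a LoadBalancer Service manifest that is separate from the Juju-owned svc.

    `ports` is a list of (name, port) tuples. The selector label is an assumption;
    in practice it should be copied from the existing Juju-created service.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            # Hypothetical name, chosen so it cannot collide with the Juju svc.
            "name": f"{app_name}-lb",
            "namespace": namespace,
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app.kubernetes.io/name": app_name},
            "ports": [
                {"name": name, "port": port, "targetPort": port}
                for name, port in ports
            ],
        },
    }


# Illustrative values only.
manifest = build_lb_service("cos-ingress", "cos", [("web", 80), ("websecure", 443)])
print(json.dumps(manifest, indent=2))
```

Because the Juju-created ClusterIP svc is never touched, a controller restart has nothing of its own to rewrite here.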

To avoid the above manual operation, the operator could detect when the cloud returns a 404 for the requested LB and clear the load-balancer-id annotation so that a new load balancer is requested automatically. Alternatively, the existing behaviour may be desirable, as it makes it easier to debug these cloud-side issues, which are out of the operator's control.
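A minimal sketch of that decision logic, assuming the operator somehow obtains an HTTP status for the LB lookup (how it would get that from the cloud is not specified here). `annotation_patch_for` is a hypothetical helper that only builds the patch body; it does not talk to Kubernetes.

```python
from typing import Optional

LB_ID_ANNOTATION = "loadbalancer.openstack.org/load-balancer-id"


def annotation_patch_for(lb_lookup_status: int, annotations: dict) -> Optional[dict]:
    """Return a merge patch that clears a stale LB id, or None if nothing should change.

    lb_lookup_status: HTTP status the cloud returned for the annotated LB (assumed input).
    annotations: current annotations on the service.
    """
    if lb_lookup_status != 404:
        # LB still exists, or the failure is something else worth debugging by hand.
        return None
    if LB_ID_ANNOTATION not in annotations:
        # Nothing stale to clear.
        return None
    return {"metadata": {"annotations": {LB_ID_ANNOTATION: None}}}


# Placeholder annotation value for illustration.
print(annotation_patch_for(404, {LB_ID_ANNOTATION: "stale-lb-id"}))
print(annotation_patch_for(200, {LB_ID_ANNOTATION: "stale-lb-id"}))
```

Only a 404 triggers the patch, so other cloud errors still surface as they do today.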

To Reproduce

WIP

Environment

Juju: v3.1.6
traefik-k8s: latest/stable 166

Relevant log output

Additional context

Relevant Juju bug https://bugs.launchpad.net/juju/+bug/2059411

kian99 · 2024-03-28T12:08:20Z

Renaming/rewording this bug with new information.

gregory-schiano · 2024-03-28T14:14:48Z

FYI we're affected by this issue on our COS deployment: currently, more or less every week, we lose our Octavia LB and have to manually patch the service (remove the LB ID annotation), make a DNS PR to change the IP, and sometimes do a FW PR.

sed-i · 2024-03-28T14:31:50Z

Waiting for followup from juju team. @wallyworld

tlm · 2024-04-02T06:53:52Z

Hi Everyone,

Thanks for providing the information. From the Juju team's perspective, I agree with the current description of this issue: the charm should not be modifying any resource made by Juju in the Kubernetes cluster. Juju operates in a reconciliation loop where it constantly drives the desired state back into the external system. If the charm modifies our resources, we are going to end up in a situation where we ping-pong the change around.
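As a toy model of that ping-pong (not Juju's actual reconciler): one actor keeps patching the service type to LoadBalancer, while a reconcile step keeps driving it back to the desired ClusterIP. All function names here are made up for illustration.

```python
JUJU_DESIRED_TYPE = "ClusterIP"  # what the reconciler believes the svc should be


def charm_patch(service):
    """The charm patches the Juju-owned service behind Juju's back."""
    service["type"] = "LoadBalancer"


def juju_reconcile(service):
    """The reconciler drives the service back to its desired state."""
    service["type"] = JUJU_DESIRED_TYPE


def simulate(rounds):
    """Record the service type after each alternating patch/reconcile step."""
    service = {"type": "ClusterIP"}
    history = []
    for _ in range(rounds):
        charm_patch(service)
        history.append(service["type"])
        juju_reconcile(service)
        history.append(service["type"])
    return history


history = simulate(3)
print(history)
```

Every flip back to ClusterIP is a point where a cloud like the one described above could delete the load balancer.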

The better approach would be for the charm to provision its own service. Juju will detect this and also clean up this service on behalf of the charm when the charm is removed from a controller.

In an ideal world we very much understand that we need to model ingress and load balancers into Juju and that will give everyone the best of both worlds.

wallyworld · 2024-04-03T07:36:30Z

Juju used to allow an application to be deployed such that the type of k8s service created for it could be specified. Unfortunately, the transition to sidecar charms saw that capability go away. We have at various times internally discussed ideas around allowing resources created by Juju to be patched, but as Tom says, this is dangerous and would be a last resort that should be avoided if there's a viable alternative. We really do want to properly model ingress and other missing aspects of the network model offered by Juju, but that's a way off.

kian99 added Status: Triage Type: Bug labels Mar 28, 2024

kian99 changed the title ~~Request new LoadBalancer when previous no longer exists~~ Create new LoadBalancer svc instead of modifying the Juju ClusterIP svc Mar 28, 2024

IbraAoad mentioned this issue May 10, 2024

KubernetesServicePatch Add support for creating a new K8s service canonical/observability-libs#90

Merged

IbraAoad mentioned this issue May 29, 2024

Create new LoadBalancer svc instead of modifying the Juju ClusterIP svc #356

Merged

IbraAoad closed this as completed May 29, 2024
