
feat: skip retry on not found cluster #550

Closed
wants to merge 2 commits

Conversation

omegabytes (Contributor)

Skips retrying config reconciliation when the cluster does not exist.
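
A minimal sketch of the idea (not the actual Koko code), assuming the cenkalti/backoff-style retry helper implied by the backoff.Permanent call in the reviewed diff below: returning a permanent error makes the retry loop stop immediately instead of scheduling another attempt.

// Sketch only, assuming the cenkalti/backoff retry helper implied by
// backoff.Permanent in the diff below; the real reconciliation loop in
// Koko differs.
package main

import (
	"fmt"
	"strings"

	"github.com/cenkalti/backoff/v4"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func reconcileWithRetry(reconcile func() error) error {
	operation := func() error {
		err := reconcile()
		if err == nil {
			return nil
		}
		// A missing cluster can never be fixed by retrying, so mark the
		// error as permanent to stop the backoff loop right away.
		if s, ok := status.FromError(err); ok &&
			s.Code() == codes.InvalidArgument &&
			strings.Contains(s.Message(), "cluster not found") {
			return backoff.Permanent(err)
		}
		// Everything else is retried with exponential backoff.
		return err
	}
	return backoff.Retry(operation, backoff.NewExponentialBackOff())
}

func main() {
	err := reconcileWithRetry(func() error {
		return status.Error(codes.InvalidArgument, "cluster not found: 4168295f")
	})
	fmt.Println(err) // returned after a single attempt, no further retries
}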

@omegabytes requested a review from a team as a code owner on December 12, 2022 20:36
@omegabytes force-pushed the feat/enchance-config-reconciliation branch from e05ce1e to 9c57d7c on December 12, 2022 20:39
@GGabriele (Contributor)

Were you able to reproduce the issue locally and confirm that your change solves it?

if s.Code() == codes.InvalidArgument &&
strings.Contains(s.Message(), "cluster not found") {
m.logger.Error("cluster not found, skipping retry")
err = backoff.Permanent(err)
@GGabriele (Contributor), Dec 13, 2022

Since this only stops the reconciliation retry (which is good), Koko will still have the nodes registered, the streamer open, and the pings being handled:

2022-12-13T17:55:37.987+0100	info	ws/manager.go:297	websocket ping handler received hash	{"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f", "client-ip": "127.0.0.1:52889", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542", "node-protocol": "ws", "node-hostname": "docker-desktop", "node-version": "2.8.0.0-enterprise-edition", "config_hash": "249455dfb4ba3489249455dfb4ba3489"}
2022-12-13T17:55:37.988+0100	error	ws/manager.go:185	update kong node resource	{"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f", "error": "rpc error: code = InvalidArgument desc = cluster not found: 4168295f-015e-4190-837e-0fcc5d72a52f", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542"}

I think we should also clean up the registered nodes and release the streamer. This can be achieved by adding something like this:

for _, node := range m.nodes.All() {
	m.nodes.Remove(node)
	m.removeNode(node)
}

This will cause the streamer to be released (along with its resources):

2022-12-13T17:50:35.794+0100	info	ws/manager.go:405	no nodes connected, disabling stream	{"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f"}

If the DP keeps attempting to connect, it will receive a 401 this time:

2022-12-13T17:50:41.086+0100	error	ws/server.go:117	failed to authenticate DP node	{"component": "control-server", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542", "node-hostname": "docker-desktop", "node-version": "2.8.0.0-enterprise-edition", "error": " (code 401)"}

@javierguerragiraldez @hbagdi WDYT?

@omegabytes (Contributor, Author)

m.removeNode(node) already calls m.nodes.Remove(node), but it handles the error, which can only ever be not found. Are you calling m.nodes.Remove(node) here and discarding the error on purpose? Should we tweak that behavior instead?
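
For illustration, a hypothetical, self-contained sketch of that relationship (the names mirror the snippet above, but the real Koko types and logging differ): removeNode already calls nodes.Remove and handles its only possible error, so an extra nodes.Remove before it is redundant.

// Hypothetical sketch only; the real Koko Manager differs. It shows why
// calling m.nodes.Remove(node) before m.removeNode(node) is redundant:
// removeNode already calls Remove and handles its only error (not found).
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("node not found")

type nodeList struct{ ids map[string]bool }

func (l *nodeList) Remove(id string) error {
	if !l.ids[id] {
		return errNotFound
	}
	delete(l.ids, id)
	return nil
}

type manager struct{ nodes *nodeList }

// removeNode wraps nodes.Remove and only logs the not-found case.
func (m *manager) removeNode(id string) {
	if err := m.nodes.Remove(id); err != nil {
		fmt.Println("remove node:", err) // only ever "node not found"
	}
	// ...release any other per-node resources here...
}

func main() {
	m := &manager{nodes: &nodeList{ids: map[string]bool{"a": true}}}
	_ = m.nodes.Remove("a") // first removal succeeds, error discarded
	m.removeNode("a")       // inner Remove now returns not found, which is swallowed
}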

@GGabriele (Contributor)

Oops, bad copy and paste on my side with the double call. Calling m.removeNode(node) alone should be enough (please verify).
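
With that, the cleanup suggested above would reduce to something like this (still to be verified against the actual Manager API):

for _, node := range m.nodes.All() {
	m.removeNode(node) // removeNode handles the not-found error itself
}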
