
feat: skip retry on not found cluster #550

Closed
wants to merge 2 commits

Conversation

omegabytes (Contributor)

Skips retrying config reconciliation when the cluster does not exist.
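
A minimal sketch of the idea (not the actual Koko code), assuming the cenkalti/backoff-style retry helper implied by the backoff.Permanent call in the reviewed diff below: returning a permanent error makes the retry loop stop immediately instead of scheduling another attempt.

// Sketch only, assuming the cenkalti/backoff retry helper implied by
// backoff.Permanent in the diff below; the real reconciliation loop in
// Koko differs.
package main

import (
	"fmt"
	"strings"

	"github.com/cenkalti/backoff/v4"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func reconcileWithRetry(reconcile func() error) error {
	operation := func() error {
		err := reconcile()
		if err == nil {
			return nil
		}
		// A missing cluster can never be fixed by retrying, so mark the
		// error as permanent to stop the backoff loop right away.
		if s, ok := status.FromError(err); ok &&
			s.Code() == codes.InvalidArgument &&
			strings.Contains(s.Message(), "cluster not found") {
			return backoff.Permanent(err)
		}
		// Everything else is retried with exponential backoff.
		return err
	}
	return backoff.Retry(operation, backoff.NewExponentialBackOff())
}

func main() {
	err := reconcileWithRetry(func() error {
		return status.Error(codes.InvalidArgument, "cluster not found: 4168295f")
	})
	fmt.Println(err) // returned after a single attempt, no further retries
}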

@omegabytes requested a review from a team as a code owner on December 12, 2022 20:36
@omegabytes force-pushed the feat/enchance-config-reconciliation branch from e05ce1e to 9c57d7c on December 12, 2022 20:39
@GGabriele (Contributor)

Were you able to reproduce the issue locally and confirm that your change solves it?

if s.Code() == codes.InvalidArgument &&
strings.Contains(s.Message(), "cluster not found") {
m.logger.Error("cluster not found, skipping retry")
err = backoff.Permanent(err)
@GGabriele (Contributor), Dec 13, 2022

Since this only stops the reconciliation retry (which is good), Koko will still have the nodes registered, the streamer open, and the pings being handled:

2022-12-13T17:55:37.987+0100	info	ws/manager.go:297	websocket ping handler received hash	{"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f", "client-ip": "127.0.0.1:52889", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542", "node-protocol": "ws", "node-hostname": "docker-desktop", "node-version": "2.8.0.0-enterprise-edition", "config_hash": "249455dfb4ba3489249455dfb4ba3489"}
2022-12-13T17:55:37.988+0100	error	ws/manager.go:185	update kong node resource	{"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f", "error": "rpc error: code = InvalidArgument desc = cluster not found: 4168295f-015e-4190-837e-0fcc5d72a52f", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542"}

I think we should also clean up the registered nodes and release the streamer. This can be achieved by adding something like this:

for _, node := range m.nodes.All() {
	m.nodes.Remove(node)
	m.removeNode(node)
}

This will cause the streamer to be released (along with its resources):

2022-12-13T17:50:35.794+0100	info	ws/manager.go:405	no nodes connected, disabling stream	{"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f"}

If the DP keeps attempting to connect, it will receive a 401 this time:

2022-12-13T17:50:41.086+0100	error	ws/server.go:117	failed to authenticate DP node	{"component": "control-server", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542", "node-hostname": "docker-desktop", "node-version": "2.8.0.0-enterprise-edition", "error": " (code 401)"}

@javierguerragiraldez @hbagdi WDYT?

@omegabytes (Contributor, Author)

m.removeNode(node) already calls m.nodes.Remove(node), but it handles the error, which can only ever be not found. Are you calling m.nodes.Remove(node) here and discarding the error on purpose? Should we tweak that behavior instead?
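
For illustration, a hypothetical, self-contained sketch of that relationship (the names mirror the snippet above, but the real Koko types and logging differ): removeNode already calls nodes.Remove and handles its only possible error, so an extra nodes.Remove before it is redundant.

// Hypothetical sketch only; the real Koko Manager differs. It shows why
// calling m.nodes.Remove(node) before m.removeNode(node) is redundant:
// removeNode already calls Remove and handles its only error (not found).
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("node not found")

type nodeList struct{ ids map[string]bool }

func (l *nodeList) Remove(id string) error {
	if !l.ids[id] {
		return errNotFound
	}
	delete(l.ids, id)
	return nil
}

type manager struct{ nodes *nodeList }

// removeNode wraps nodes.Remove and only logs the not-found case.
func (m *manager) removeNode(id string) {
	if err := m.nodes.Remove(id); err != nil {
		fmt.Println("remove node:", err) // only ever "node not found"
	}
	// ...release any other per-node resources here...
}

func main() {
	m := &manager{nodes: &nodeList{ids: map[string]bool{"a": true}}}
	_ = m.nodes.Remove("a") // first removal succeeds, error discarded
	m.removeNode("a")       // inner Remove now returns not found, which is swallowed
}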

@GGabriele (Contributor)

Oops, bad copy and paste on my side with the double call. Calling m.removeNode(node) alone should be enough (please verify).
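
With that, the cleanup suggested above would reduce to something like this (still to be verified against the actual Manager API):

for _, node := range m.nodes.All() {
	m.removeNode(node) // removeNode handles the not-found error itself
}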
