Were you able to reproduce the issue locally and see it solved with your change?
if s.Code() == codes.InvalidArgument &&
	strings.Contains(s.Message(), "cluster not found") {
	m.logger.Error("cluster not found, skipping retry")
	err = backoff.Permanent(err)
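For context, here is a minimal, self-contained sketch of the pattern, assuming the cenkalti/backoff/v4 package that the `backoff.Permanent` call above suggests; the `reconcile` helper and the cluster ID are made up for illustration. Wrapping the error with `backoff.Permanent` makes `backoff.Retry` return immediately instead of scheduling another attempt:

```go
// Sketch only, not the actual Koko code: show how a "cluster not found"
// gRPC error can be turned into a permanent error so the retry loop stops.
package main

import (
	"fmt"
	"strings"

	"github.com/cenkalti/backoff/v4"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// reconcile stands in for the config-reconciliation call against Koko.
func reconcile(clusterID string) error {
	return status.Error(codes.InvalidArgument, "cluster not found: "+clusterID)
}

func main() {
	op := func() error {
		err := reconcile("4168295f-015e-4190-837e-0fcc5d72a52f")
		if err == nil {
			return nil
		}
		if s, ok := status.FromError(err); ok &&
			s.Code() == codes.InvalidArgument &&
			strings.Contains(s.Message(), "cluster not found") {
			// Marking the error permanent tells backoff.Retry to give up
			// right away; transient errors keep being retried as before.
			return backoff.Permanent(err)
		}
		return err
	}

	err := backoff.Retry(op, backoff.NewExponentialBackOff())
	fmt.Println("retry stopped:", err)
}
```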
Since this is only stopping the reconciliation retry (which is good), Koko will still have the nodes registered, as well as the streamer open and the pings being handled:
2022-12-13T17:55:37.987+0100 info ws/manager.go:297 websocket ping handler received hash {"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f", "client-ip": "127.0.0.1:52889", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542", "node-protocol": "ws", "node-hostname": "docker-desktop", "node-version": "2.8.0.0-enterprise-edition", "config_hash": "249455dfb4ba3489249455dfb4ba3489"}
2022-12-13T17:55:37.988+0100 error ws/manager.go:185 update kong node resource {"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f", "error": "rpc error: code = InvalidArgument desc = cluster not found: 4168295f-015e-4190-837e-0fcc5d72a52f", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542"}
I think we should also do some cleanup of the registered nodes and release the streamer. This can be achieved by adding something like this:
for _, node := range m.nodes.All() {
	m.nodes.Remove(node)
	m.removeNode(node)
}
This will cause the streamer to be released (along with its resources):
2022-12-13T17:50:35.794+0100 info ws/manager.go:405 no nodes connected, disabling stream {"cluster-id": "4168295f-015e-4190-837e-0fcc5d72a52f"}
If the DP keeps attempting to connect, it will receive a 401 this time:
2022-12-13T17:50:41.086+0100 error ws/server.go:117 failed to authenticate DP node {"component": "control-server", "node-id": "bcd75180-95ed-457d-af5f-0a08e147a542", "node-hostname": "docker-desktop", "node-version": "2.8.0.0-enterprise-edition", "error": " (code 401)"}
@javierguerragiraldez @hbagdi WDYT?
m.removeNode(node) already calls m.nodes.Remove(node), but handles the error, which is only ever "not found". Are you calling m.nodes.Remove(node) here and discarding the error purposefully? Should we tweak the behavior instead?
Oops, I guess the double call was a wrong copy & paste. Calling m.removeNode(node) alone should be enough (please verify).
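For the record, a hedged sketch of the corrected cleanup loop; the manager fields and method signatures are assumed from this conversation, not copied from Koko:

```go
// removeNode already deregisters the node from m.nodes and handles the only
// possible "not found" error, so a single call per node is enough.
for _, node := range m.nodes.All() {
	m.removeNode(node)
}
```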
Skips retrying config reconciliation when the cluster does not exist.