fix: set node callback each time we reinit the coordinator in servertailnet #12140

Merged 1 commit on Feb 14, 2024

@spikecurtis spikecurtis commented Feb 14, 2024

I think this will resolve #12136, but let's get a proper system-level test before closing.

Before this change, we only registered the node callback at start of day for the server tailnet. If the coordinator changes, as we know happens when we are licensed for the PGCoordinator, we close the connection to the old coordinator and open a new one to the new coordinator.

The callback is designed to direct the updates to the new coordinator, but there is nothing that specifically triggers it to fire after we connect to the new coordinator.

If we have STUN, then periodic re-STUNs will generally get it to fire eventually, but without STUN we could go indefinitely without a callback.

This PR changes the servertailnet to re-register the callback each time we reconnect to the coordinator. Registering a callback (even if it's the same callback) triggers an immediate call with our node information, so the new coordinator will have it.
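
To illustrate the idea, here is a minimal sketch of the pattern, not the actual servertailnet code. The `conn`, `coordinator`, `Node`, and `reinitCoordinator` names are hypothetical stand-ins; the only behavior taken from the description above is that registering a callback fires it immediately with the current node information.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a hypothetical stand-in for the node information we publish.
type Node struct {
	ID        int
	Endpoints []string
}

// conn is a hypothetical stand-in for the tailnet connection.
type conn struct {
	mu   sync.Mutex
	node Node
	cb   func(Node)
}

// SetNodeCallback stores cb and immediately calls it with the current
// node, mirroring the "registering triggers an immediate call" behavior
// described in this PR.
func (c *conn) SetNodeCallback(cb func(Node)) {
	c.mu.Lock()
	c.cb = cb
	n := c.node
	c.mu.Unlock()
	if cb != nil {
		cb(n) // invoked outside the lock so the callback cannot deadlock on c.mu
	}
}

// coordinator is a hypothetical stand-in that records the updates it receives.
type coordinator struct {
	name  string
	nodes []Node
}

// UpdateNode records a node update sent to this coordinator.
func (co *coordinator) UpdateNode(n Node) {
	co.nodes = append(co.nodes, n)
}

// reinitCoordinator is called on every (re)connect. Re-registering the
// callback here guarantees the new coordinator gets our node right away,
// without waiting for an unrelated event such as a re-STUN.
func reinitCoordinator(c *conn, co *coordinator) {
	c.SetNodeCallback(co.UpdateNode)
}

func main() {
	c := &conn{node: Node{ID: 1, Endpoints: []string{"127.0.0.1:53738"}}}
	oldCoord := &coordinator{name: "old"}
	newCoord := &coordinator{name: "new"}

	reinitCoordinator(c, oldCoord) // start of day
	reinitCoordinator(c, newCoord) // coordinator changed

	fmt.Println(len(oldCoord.nodes), len(newCoord.nodes))
}
```

In this sketch, registering only once at start of day would leave `newCoord.nodes` empty until some later event happened to fire the callback.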

@spikecurtis spikecurtis marked this pull request as ready for review February 14, 2024 16:35

@mafredri mafredri left a comment

LGTM!

I wonder if the test failure is related, though?

    t.go:108: 2024-02-14 16:34:18.608 [erro]  pgcoord: failed to write binding to database  coordinator_id=d6a85ffb-78b2-4c20-b37a-201b43f7d658  binding_id="[74 174 236 19 74 52 64 134 184 243 51 11 140 133 243 253]"  node="id:2386665641149345937  as_of:{seconds:1707928458  nanos:568114000}  key:\"np\\x8a\\xe4\\x1a\\xebv\\xcc=\\x07\\xe6M\\xf5\\x05\\x84\\x8c\\n\\x99\\x91\\xfb\\xd2ҳ\\x1eY\\x1a\\xeeg\\xce\\x19q\\xf5+B\"  disco:\"discokey:d9c88cb39f04aa14298f300bd8871b1b9916256d3a86b91a73652e1e5229fb13\"  preferred_derp:999  derp_latency:{key:\"999-v4\"  value:0.00029301}  addresses:\"fd7a:115c:a1e0:4594:bb1c:bb6b:af5c:27e7/128\"  allowed_ips:\"fd7a:115c:a1e0:4594:bb1c:bb6b:af5c:27e7/128\"  endpoints:\"127.0.0.1:53738\"  endpoints:\"172.17.0.1:53738\"  endpoints:\"192.168.100.229:53738\""  error="pq: insert or update on table \"tailnet_peers\" violates foreign key constraint \"tailnet_peers_coordinator_id_fkey\""

@spikecurtis spikecurtis merged commit 04991f4 into main Feb 14, 2024
28 checks passed
@spikecurtis spikecurtis deleted the spike/12136-fix-node-callback-servertailnet branch February 14, 2024 16:45
@github-actions github-actions bot locked and limited conversation to collaborators Feb 14, 2024

Successfully merging this pull request may close these issues.

Coder web terminal and apps not loading on big