Various fixes. #5
Merged
Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
The heartbeat function was previously part of the dqlite dialFunc, but the heartbeat uses the dqlite client helpers, which in turn make more calls to the dialFunc, so this effectively creates an infinite loop. With the heartbeat lock this doesn't pose any functional issue, but I've noticed my CPU spiking to 600% during heartbeats, which is certainly not good! This commit instead adds a loopHeartbeat helper which calls the heartbeat function once per second. This helper is called once the database is opened. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
… be sent
If we are in the middle of an update, it's possible some node is unreachable, so don't fail if we can't send that node a notification that we have updated our schema. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
The TCP connection can't be reused unless the body is fully read and closed, so this explicitly makes sure of that for each returned response. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
I've encountered an odd issue where the dqlite client helpers will sometimes lock up, so this adds a timeout context of 5 seconds (like the client context). Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
This properly sets the amount of time to sleep between heartbeat attempts to a minimum of 2 seconds and a maximum of half the client HeartbeatTimeout. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
… data
cluster.Query is called with concurrent=true, so access to the hbInfo map can cause a panic if two goroutines access it at once. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
Additionally close the database on stop. Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
Signed-off-by: Max Asnaashari <max.asnaashari@canonical.com>
This is a bit longer than 8 commits now as I've added the shutdown command as well.
Depends on #4
This adds several bug fixes that I noticed upon more nuanced testing:
Most importantly, this moves initiating the heartbeat out of the dqlite dialFunc. Calls to the dqlite client methods (to check the leader and cluster status during a heartbeat) will initiate further calls to the dialFunc, which triggers more heartbeats, and so on indefinitely. I noticed my CPU spiking up to 600% from getting slammed with heartbeat requests. The corresponding commit here adds a helper that calls the heartbeat function every second, and which now runs in a goroutine upon opening the database.
Auto-schema-update logic was missing from the heartbeat sequence. Now there is an environment variable, SCHEMA_UPDATE, that will be checked for an auto-update executable.
Adds a 5s context timeout for calls to the dqlite client since they can be blocking in some cases. I'm unable to replicate this frequently enough (it seems to happen randomly as far as I can tell) to be able to test how exactly this works, but I noticed that very occasionally, attempting to check if we are the dqlite leader froze the daemon. If this happens on the leader then no heartbeats will be sent until the leader is reloaded.
Ensures all API responses are fully read and closed so we can reuse the connection.
Prevents erroring out during reload if other cluster members are still updating.
The heartbeat timeout logic was incorrect, and is now correctly sleeping a maximum of half the client context timeout for heartbeats (HeartbeatTimeout), and a minimum of 2 seconds, as the other cluster members will report their heartbeat times a bit ahead of the leader.
Adds some debug logs for the heartbeat.