SnapbackSM - processStateMachineOperation() function rewrite + peerset monitoring v0 #1469
*/
async computeContentNodePeerSet (nodeUserInfoList) {
computeContentNodePeerSet (nodeUserInfoList) {
Instead of mapping three times plus a concat, you can just add directly to the set in one iteration.
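A minimal sketch of the single-pass approach suggested in this comment. The function body is not shown in this thread, so the field names `primary`, `secondary1`, `secondary2` and the `selfEndpoint` parameter are assumptions for illustration, not the actual implementation:

```javascript
// Hypothetical input shape: each entry lists one user's replica set endpoints.
// The field names (primary, secondary1, secondary2) are assumed, not confirmed.
function computeContentNodePeerSet (nodeUserInfoList, selfEndpoint) {
  const peerSet = new Set()
  // One pass over the list: add every endpoint straight into the Set,
  // instead of three map() calls followed by concat().
  for (const userInfo of nodeUserInfoList) {
    for (const endpoint of [userInfo.primary, userInfo.secondary1, userInfo.secondary2]) {
      // Skip empty slots and (by assumption) this node's own endpoint.
      if (endpoint && endpoint !== selfEndpoint) {
        peerSet.add(endpoint)
      }
    }
  }
  return peerSet
}

// Usage with made-up endpoints
const examplePeers = computeContentNodePeerSet(
  [
    { primary: 'https://cn1.example.com', secondary1: 'https://cn2.example.com', secondary2: 'https://cn3.example.com' },
    { primary: 'https://cn2.example.com', secondary1: 'https://cn1.example.com', secondary2: null }
  ],
  'https://cn1.example.com'
)
console.log([...examplePeers])
```

The Set deduplicates endpoints as they are added, so no intermediate arrays are needed.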
Overall this looks great! Mostly minor cleanup stuff and renaming, nothing major. Reads so much better than the previous flow
@dmanjunath addressed all comments - easiest way to diff would be to look at the last 3 commits
Nice
Description
What is the purpose of this PR? What is the current behavior? New behavior? Relevant links (e.g. Trello) and/or information pertaining to PR?
Add PeerSet Monitoring (skeleton code) to Snapback processStateMachineOperation() to check if any replicas in peerSet are unhealthy
- computePeerHealth() function
- updateReplicaSet() function with a log to start gathering metrics
Rewrite core SnapbackSM logic (processStateMachineOperation()):
Tests
List the manual tests and repro instructions to verify that this PR works as anticipated. Include log analysis if possible.
❗ If this change impacts clients, make sure that you have tested the clients ❗
snapbackDevModeEnabled() (this forces all sync ops to go through processStateMachineOperation())
Testing unhealthy peer:
❗ Reminder 💡❗:
If this PR touches a critical flow (such as Indexing, Uploads, Gateway or the Filesystem), make sure to add the requires-special-attention label. Add relevant labels as necessary.