Consume SecondarySyncRequestOutcomes in Snapback reconfig #1614
Conversation
@dmanjunath refactored per discussion, pls re-review

```diff
@@ -10,6 +10,7 @@
    "logLevel": "debug",
    "redisHost": "localhost",
    "redisPort": 4379,
```
just re-ordering
```
# Sync / SnapbackSM configs
snapbackReconfigEnabled=true
snapbackJobInterval=10000 # ms
snapbackModuloBase=2
```
all re-ordering, except for the new `minimumSecondaryUserSyncSuccessPercent` on line 51
```js
},

// Rate limit configs
/**
```
all changes are just re-ordering + grouping, except for `minimumSecondaryUserSyncSuccessPercent` on line 559
```js
  }
},

const Utils = {
```
I'm not particularly familiar with this code structure convention. What was the benefit of grouping functions into consts, and then reassigning them to `SecondarySyncHealthTracker` when exported?
just for logical separation, helps me understand the code separation a lot better. trying to avoid the situation in other classes where we have 20-30 functions all at the same level even though there is a clear hierarchical structure
makes no difference logically
I understand how this format possibly helps with clearly defining the hierarchy. However, I have a few points against this:
- it's a little strange that in these consts, you have inner functions referencing the const itself (e.g. in the `Utils` const, the fn `_getMetricsMatchingPattern()` calls `Utils._getAllKeysMatchingPattern(pattern)`). Looks like we could've just made this a static class and/or kept the prior commit version of this file
- it's redundant to have outer consts like `Setters` and `Getters` to group the actual fns. Getters/setters should be very clearly obvious via the function name and grouped structurally in the same area of the file.
- this file introduces a new pattern that isn't around in the code base. I'd rather keep to current structures we have in the code base (no hierarchical structuring of fns, just arranging fns in the proper place and possibly introducing strong comments like `// ========= GETTERS =========` to divide the file)
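To make the trade-off concrete, here's a minimal sketch of both structures being discussed. The function names echo the thread; the bodies are placeholder stubs standing in for the real redis-backed lookups, not the actual implementation:

```javascript
// Option A: grouping related fns into a const. Note the self-reference:
// the inner fn must call through the const itself.
const Utils = {
  _getAllKeysMatchingPattern (pattern) {
    // placeholder for a redis SCAN over keys matching `pattern`
    return [`${pattern}:a`, `${pattern}:b`]
  },
  _getMetricsMatchingPattern (pattern) {
    return Utils._getAllKeysMatchingPattern(pattern)
  }
}

// Option B: flat fns arranged under strong comment dividers,
// matching the existing code base convention.

// ========= GETTERS =========

function getAllKeysMatchingPattern (pattern) {
  // placeholder for the same redis SCAN
  return [`${pattern}:a`, `${pattern}:b`]
}

function getMetricsMatchingPattern (pattern) {
  return getAllKeysMatchingPattern(pattern)
}
```

Both options are logically equivalent; the disagreement is purely about whether the grouping lives in the object structure or in comments.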
```js
/**
 * If either secondary has a Sync success rate for user below threshold, add it to `unhealthyReplicas` list
 */
const userSecondarySyncMetrics = await SecondarySyncHealthTracker.computeUserSecondarySyncSuccessRates(
```
can we move this logic into the peerSetManager file? It would be cleaner to have the peer health determination logic there, as I moved all the peer health calculation in my reconfig PR into that file too
i think this is actually fine where it is. the distinction is peerSetManager is abstracted on a node level whereas this is per user per node. if i understand correctly, the unhealthyReplicas is on a per user basis below.
vicky i kinda see your point, but this secondarySyncHealth tracking logic only works from the primary's angle, and PeerSetManager doesn't really have a notion of that distinction right now
might make sense eventually, but given it's like 8 lines, not sure it's worth doing right now
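For context, a rough sketch of what those ~8 lines of per-user health determination could look like, assuming per-secondary success/failure counts and the configured minimum percent (names are illustrative, not the actual SnapbackSM code):

```javascript
// For each secondary in the user's metrics, compute the sync success
// rate and flag the secondary as unhealthy for this user if it falls
// below the configured threshold.
function findUnhealthySecondaries (userSecondarySyncMetrics, minimumSuccessPercent) {
  const unhealthyReplicas = []
  for (const [secondary, { successCount, failureCount }] of Object.entries(userSecondarySyncMetrics)) {
    const total = successCount + failureCount
    // No recorded syncs yet - treat the secondary as healthy for now
    if (total === 0) continue
    const successPercent = (successCount / total) * 100
    if (successPercent < minimumSuccessPercent) {
      unhealthyReplicas.push(secondary)
    }
  }
  return unhealthyReplicas
}
```

This illustrates why the logic is primary-centric: the rates are per user per secondary, which PeerSetManager's node-level abstraction doesn't currently model.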
some more cleanup comments but structurally 👍
creator-node/src/config.js
```js
doc: 'Minimum percent of successful Syncs for a user on a secondary for the secondary to be considered healthy for that user',
format: 'nat',
env: 'minimumSecondaryUserSyncSuccessPercent',
default: 75
```
this is likely what it's defaulted to for third party nodes, do we want it at 50 or 75?
will set to 50 now for minimal impact rollout but the end goal for this should def be higher as this is at a per-user level
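For illustration, a plain-JS sketch of the semantics of the new key under the agreed rollout default of 50 (this is not the actual convict schema from config.js; the helper name and env-lookup shape are hypothetical):

```javascript
// Resolve minimumSecondaryUserSyncSuccessPercent from an env-like map,
// falling back to the rollout default of 50, and validate it as a
// natural number in percent range.
function resolveSuccessPercent (env) {
  const raw = env.minimumSecondaryUserSyncSuccessPercent
  const value = raw !== undefined ? Number(raw) : 50
  if (!Number.isInteger(value) || value < 0 || value > 100) {
    throw new Error(`Invalid minimumSecondaryUserSyncSuccessPercent: ${raw}`)
  }
  return value
}
```

Keeping the default low at rollout limits blast radius; raising it later only requires an env override, no code change.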
```js
const secondary = Utils._parseSecondaryFromRedisKey(key)

if (!(secondary in secondarySyncMetrics)) {
  // This case can be hit for old secondaries that have been cycled out of user's replica set - these can be safely skipped
```
wouldn't the only case where secondary is not in sync metrics be if there's no data for that secondary? e.g. no syncs recorded?
no, `secondarySyncMetrics` would contain every current user secondary that was passed in as a param, even if it had 0 data
the case where a secondary from redis is not in `secondarySyncMetrics` would be if it previously had data recorded for it but is no longer in the current replica set
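The invariant described above can be sketched as follows (all names are hypothetical; the real code parses secondaries out of redis keys rather than taking a passed-in array of entries):

```javascript
// Pre-seed metrics for every *current* secondary, so they appear even
// with 0 data. Any redis entry whose secondary is missing from the map
// must belong to a node cycled out of the user's replica set.
function aggregateSyncData (currentSecondaries, redisEntries) {
  const secondarySyncMetrics = {}
  for (const secondary of currentSecondaries) {
    secondarySyncMetrics[secondary] = { successCount: 0, failureCount: 0 }
  }
  for (const { secondary, success } of redisEntries) {
    if (!(secondary in secondarySyncMetrics)) {
      // Old secondary cycled out of user's replica set - safe to skip
      continue
    }
    if (success) secondarySyncMetrics[secondary].successCount += 1
    else secondarySyncMetrics[secondary].failureCount += 1
  }
  return secondarySyncMetrics
}
```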
Description
What is the purpose of this PR? What is the current behavior? New behavior? Relevant links (e.g. Trello) and/or information pertaining to PR?
Previous PR built SecondarySyncRequestOutcomes tracking in redis.
This now consumes it in Snapback to more intelligently trigger reconfigs.
Tests
List the manual tests and repro instructions to verify that this PR works as anticipated. Include log analysis if possible.
❗ If this change impacts clients, make sure that you have tested the clients ❗
SecondarySyncRequestOutcome logging:
SecondarySyncRequestOutcome success rate tracking:
❗ Reminder 💡❗:
If this PR touches a critical flow (such as Indexing, Uploads, Gateway or the Filesystem), make sure to add the `requires-special-attention` label. Add relevant labels as necessary.

How will this change be monitored?
For features that are critical or could fail silently please describe the monitoring/alerting being added.