-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SecondarySyncHealthTracker change recordfail #4353
Conversation
...r-node/src/services/stateMachineManager/stateReconciliation/issueSyncRequest.jobProcessor.ts
Outdated
Show resolved
Hide resolved
ignore tests for now. will update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all looking good besides having the sync tracker as a class. that's my only big request to change here so most comments are about that or just nits
...-node/src/services/stateMachineManager/stateMonitoring/findReplicaSetUpdates.jobProcessor.ts
Outdated
Show resolved
Hide resolved
...-node/src/services/stateMachineManager/stateMonitoring/findReplicaSetUpdates.jobProcessor.ts
Outdated
Show resolved
Hide resolved
creator-node/src/services/stateMachineManager/stateReconciliation/SecondarySyncHealthTracker.ts
Outdated
Show resolved
Hide resolved
creator-node/src/services/stateMachineManager/stateReconciliation/SecondarySyncHealthTracker.ts
Outdated
Show resolved
Hide resolved
creator-node/src/services/stateMachineManager/stateReconciliation/SecondarySyncHealthTracker.ts
Outdated
Show resolved
Hide resolved
creator-node/src/services/stateMachineManager/stateMonitoring/monitorState.jobProcessor.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work - it was especially nice and reassuring to see the manual tests!
creator-node/src/services/stateMachineManager/stateReconciliation/SecondarySyncHealthTracker.ts
Show resolved
Hide resolved
creator-node/src/services/stateMachineManager/stateReconciliation/SecondarySyncHealthTracker.ts
Outdated
Show resolved
Hide resolved
failure_fetching_user_replica_set: 10, | ||
failure_force_resync_check: 10, | ||
failure_fetching_user_gateway: 10, | ||
failure_delete_db_data: 3, | ||
failure_delete_disk_data: 3, | ||
failure_sync_secondary_from_primary: 10, | ||
failure_db_transaction: 3, | ||
failure_export_wallet: 10, | ||
failure_import_not_consistent: 3, | ||
failure_import_not_contiguous: 3, | ||
failure_inconsistent_clock: 10, | ||
failure_undefined_sync_status: 3, | ||
default: 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flagging that we'll probably want to fine tune these and add other errors based on what we see in prod after release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically these errors only get added from the recordFailure
call. these fields are from the prometheus constants so if more errors come up, there needs to be a manual update here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree these all seem reasonable for now and we can tweak later
creator-node/src/services/stateMachineManager/stateReconciliation/SecondarySyncHealthTracker.ts
Show resolved
Hide resolved
@@ -0,0 +1,19 @@ | |||
// See prometheus.constants.js for reference on potential errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe valuable to denote either in comment or in variable name that these are NUMBER_OF_DAILY_RETRIES
* add flag to disable sync tracking * disable recording sync * update env var * update test * remove extra }) * remove .only * WIP remove recordSuccess * update tests to remove recordSuccess * fix test * WIP * also wip * wip also redis * wip change names + redis calls * WIP * remove unused file * remove personal file * remove unused redis command * fix old comments + update retry + refactor * pass in serialized data but let class instance handle processing * add question mark + remove sync type * fix configs * remove additional properties * add sync undefined status response * remove comment * update some tests * fix some tests * fix test again * fix tests again * fix test * remove. only * fix tests * fix tests * fix some tests * fix tests again * remove .only * fix test? * write tests * fix test * cleanup * remove unused vars * move fn * short circuit * remove t/f map * fix tests * fix tests * remove forced failure
Description
Summary: This PR is to relieve pressure on redis by only tracking the errors encountered on the primary->secondary syncs and adding fine tuned granularity on the error encountered to the number of times encountered.
This PR includes:
SecondarySyncHealthTracker
to go from success rate \ error rate computation -> error countSecondarySyncHealthTracker
is consumed around various placesTests
)Tests
SecondarySyncHealthTracker.test.ts
Happy path:
Wallet was able to sync successfully with no errors recorded in redis
Redis
User state
Sad path:
Uploaded track and user only had primary clock values increase:
SecondarySyncHealthTracker
logs highlighting wallet encountered max errors allowed:and
Sync job failing for a wallet
Redis highlighting that a wallet-secondary-date recorded the max allowed fails:
Monitoring - How will this change be monitored? Are there sufficient logs / alerts?
Check log surrounding
"SecondarySyncHealthTracker":"computeWalletOnSecondaryExceedsMaxErrorsAllowed","msg":"Wallets on secondaries have exceeded the allowed error capacity for today
for the wallets that exceed the max errors allowedand also
Wallet encountered max errors allowed on secondary
for if max errors are encountered