Recover empty queues, fix 1 cause of missing Bull locks #4315
Conversation
I did a first pass, but I'm pretty confused by a lot of this logic tbh; I think I need to do another, deeper pass.
Don't block on me for testing this on staging/prod (it might be too much work to squash and cherry-pick the commit onto a prod node).
creator-node/src/services/stateMachineManager/stateReconciliation/index.js
Nice!
Still confusing, but not worth further effort rn.
The only thing I can think of that might immediately help is a ./utils/cluster/README.md that just summarizes the different worker types. I know you documented them across various files, but they're still hard to track down.
Description
Increases stability of cluster+Bull when running into unexpected failures: recovers queues that end up empty and fixes one cause of missing Bull locks.
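For context on the lock half of this change: Bull holds a Redis lock for each active job and renews it on a timer; if renewal slips (e.g., due to a blocked event loop), the lock expires and the job is treated as stalled. Below is a hedged sketch of the relevant queue settings with illustrative values and queue name, not this PR's actual fix:

```js
const Bull = require('bull')

// Illustrative queue: the name, URL, and values are assumptions, not from this PR.
const queue = new Bull('monitoring-state', 'redis://localhost:6379', {
  settings: {
    lockDuration: 30000,    // ms a job's lock is valid before the job can be considered stalled
    lockRenewTime: 15000,   // ms between automatic lock renewals while a job runs
    stalledInterval: 30000, // ms between Bull's checks for stalled jobs
    maxStalledCount: 1      // stalls allowed before the job is failed outright
  }
})
```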
Tests
I added debug logs (now removed) to verify that the correct worker is marked as special when respawning (a rough sketch of that pattern follows), and I did the following: `A up; A seed clear; A seed create-user; A seed upload-track` (then killed a worker process with `kill` after docker exec'ing into the container).
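For reference, a minimal sketch (my own illustration, not this PR's code) of carrying a "special" designation across a respawn with Node's cluster module:

```js
const cluster = require('cluster')

let specialWorkerId = null // id of the one worker allowed to run singleton work

function forkWorker (markAsSpecial) {
  const worker = cluster.fork()
  if (markAsSpecial) specialWorkerId = worker.id
  return worker
}

if (cluster.isMaster) {
  forkWorker(true)  // the first worker is the special one
  forkWorker(false)

  cluster.on('exit', (worker, code, signal) => {
    // Respawn, preserving the special designation if the special worker died
    const wasSpecial = worker.id === specialWorkerId
    console.log(`worker ${worker.process.pid} exited (${signal || code}), respawning (special=${wasSpecial})`)
    forkWorker(wasSpecial)
  })
}
```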
Monitoring - How will this change be monitored? Are there sufficient logs / alerts?
Bull queue state is exposed at the /health/bull endpoint. In addition, a log is emitted whenever one of the recurring queues (monitoring-state, c-node-endpoint-to-sp-id, or recover-orphaned-data) is found empty and recovered: "<queue name> was empty - restarting it". A sketch of that check follows.
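A hedged sketch of what that detect-and-restart check can look like with Bull's API (the queue names above are real; the helper name and re-add callback are assumptions):

```js
// Re-add a queue's recurring job if the queue has gone empty.
async function recoverQueueIfEmpty (queue, reAddJob) {
  const counts = await queue.getJobCounts() // { waiting, active, completed, failed, delayed }
  if (counts.waiting === 0 && counts.active === 0 && counts.delayed === 0) {
    console.log(`${queue.name} was empty - restarting it`)
    await reAddJob(queue) // e.g. queue.add({}, { repeat: { every: 60000 } })
  }
}
```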