
fix: handle script reinstantiation during Redis failover #316

Closed · wants to merge 2 commits

Conversation

rtertiaer

This PR handles script re-instantiation during Redis failover events (at least in AWS). When instances are swapped, the script cache disappears even though the contents of the Redis keyspace remain intact. At present, bee-queue only loads scripts during startup, which means that after a failover, EVALSHA calls fail for already-running bee-queue instances. This PR catches that error, re-instantiates the scripts on the new Redis server, and retries the call.

An alternative, less obtuse implementation could use SCRIPT EXISTS to check before every EVALSHA, but that would double the round trips to Redis on every run; instead, we catch this odd edge case and only re-instantiate the scripts when we encounter it.
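The catch-and-retry approach described above can be sketched roughly as follows. This is an illustrative stand-in, not bee-queue's actual internals: `runScript`, `loadScripts`, and the NOSCRIPT detection shown here are hypothetical names.

```javascript
// Rough sketch of the catch-and-retry approach (hypothetical names, not
// bee-queue's actual API). On a NOSCRIPT error, reload every script on the
// (possibly new) Redis server, then retry the original call once.
function runScript(client, sha, args, loadScripts) {
  return client.evalsha(sha, ...args).catch((err) => {
    // Redis replies NOSCRIPT when its script cache is empty, e.g. after
    // failing over to a fresh instance.
    if (!/NOSCRIPT/.test(err.message)) throw err;
    return loadScripts(client).then(() => client.evalsha(sha, ...args));
  });
}
```

Any error other than NOSCRIPT still propagates to the caller unchanged, so this only adds a round trip on the failover edge case itself.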

My JavaScript isn't great; I'd appreciate any and all checks for correctness, code quality, style, and anything I just hadn't considered. For that reason, I'm filing this as a draft to get early feedback on the implementation. I'll test it more fully on localhost later; at present, npm test passes all 140 tests.

@rtertiaer (Author)

I believe this may address #147.

@rtertiaer rtertiaer marked this pull request as draft November 10, 2020 21:57
lib/queue.js (outdated; resolved)
lib/queue.js (resolved)
@skeggse skeggse self-requested a review November 12, 2020 02:28
@skeggse (Member) left a comment

We might want/need to revisit lib/lua/index.js, seeing as it was written with the assumption that none of the script loading was performance-critical. I suspect it's more important now.

Also worth noting:

An alternative and less obtuse implementation could be to utilize SCRIPT EXISTS to check before every EVALSHA

^ this is actually how the proposed changes operate, because buildCache invokes loadScriptIfMissing, which runs SCRIPT EXISTS.
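For readers following along, the "check with SCRIPT EXISTS, then SCRIPT LOAD only if missing" pattern the reviewer refers to looks roughly like this sketch; the client interface and names are illustrative, and the real `loadScriptIfMissing` in lib/lua/index.js may differ.

```javascript
// Sketch of the "check, then load" pattern: only issue SCRIPT LOAD when
// SCRIPT EXISTS reports the sha is absent from the server's script cache.
// The client interface here is a hypothetical stand-in.
function loadScriptIfMissing(client, sha, source) {
  return client.script('exists', sha).then(([exists]) => {
    if (exists) return sha;
    // SCRIPT LOAD returns the sha1 of the newly cached script.
    return client.script('load', source);
  });
}
```

Note that SCRIPT EXISTS returns an array of 0/1 flags (one per sha passed), which is why the result is destructured above.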

lib/queue.js (outdated; resolved)
lib/queue.js (outdated; resolved)

lib/queue.js (outdated)
err.name === 'ReplyError' &&
err.message === 'NOSCRIPT No matching script. Please use EVAL.'
) {
return lua.buildCache(this.client).then((client) => {
Member:

I'm somewhat concerned that this will hammer the Redis server if the client tries to execute a bunch of scripts all at once - they'll all come back with NOSCRIPT errors and will all separately trigger this code path. IMO we should handle that by introducing new code (either in buildCache or between here and buildCache) to memoize the promise. It'd want to be per-client, I suspect - maybe using a WeakMap (just storing it on the Queue would work too)?
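The memoization being suggested could look something like the following sketch (all names hypothetical): concurrent NOSCRIPT failures against the same client then share one in-flight reload instead of each triggering its own buildCache call.

```javascript
// Sketch: memoize the in-flight reload per client (keyed in a WeakMap) so
// that a burst of NOSCRIPT errors triggers buildCache once, not once per
// failed command. Names are hypothetical, not bee-queue's actual internals.
const reloadPromises = new WeakMap();

function reloadScriptsOnce(client, buildCache) {
  let promise = reloadPromises.get(client);
  if (!promise) {
    const clear = () => reloadPromises.delete(client);
    promise = buildCache(client).then(
      (value) => { clear(); return value; },
      (err) => { clear(); throw err; } // allow a retry after a failed reload
    );
    reloadPromises.set(client, promise);
  }
  return promise;
}
```

Clearing the entry once the reload settles matters: a later failover should be able to trigger a fresh reload rather than reusing a stale resolved promise.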

Author:

I believe WeakMap requires Node versions >= 12 (10 behind an interpreter flag); does that change this recommendation at all? I see that you're planning to deprecate Node 4 and 6 - wondering about 8/10 in this context, given that your friends at Mixmax (probably) aren't the only users of this package. I know I personally still use 10 in some places :L (though this could be the kick in the butt to prioritize some other things.)

Member:

I'm seeing Node 0.12, not 12 - am I looking at something different?

Author:

nah, you're correct - I meant 0.12, not 12. d'oh, I'm just mixed up.

lib/queue.js (outdated)
if (err.code !== 'NOSCRIPT') {
throw err;
}
return lua.buildCache(this.client).then((client) => {
Member:

The promise produced by buildCache will not resolve to the client. We'll want to use _commandable again anyway in case the Queue is in the process of exiting.

Author:

d'oh!

lib/lua/index.js (outdated)
Comment on lines 54 to 61
  if (scriptsLoaded) return scriptsLoadedPromise;
  scriptsLoaded = true;
  // We could theoretically pipeline this, but it's pretty insignificant.
- return readScripts().then(() =>
+ return (scriptsLoadedPromise = readScripts().then(() =>
    Promise.all(
      Object.keys(shas).map((key) => loadScriptIfMissing(client, key))
    )
- );
+ ));
Author:

@skeggse is this a reasonable way to accomplish this, in lieu of a WeakMap? I see this being done above when reading the scripts in. Perhaps this isn't reasonable because it's not done at the Client level?

Member:

Yeah, we'll need to do it per-Queue or per-Client - otherwise we won't correctly load scripts into auxiliary clients.

lib/queue.js (outdated)
@@ -32,6 +32,8 @@ class Queue extends Emitter {
    this.bclient = null;
    this.eclient = null;

+   this._scriptCachePromise = new WeakMap();
Member:

I suspect you're already aware but I don't see where this is used.

@rtertiaer (Author)

In light of a potential rewrite, given that Mixmax has successfully accomplished the overarching task that prompted this, and my lack of time to fix up this PR, I'm closing it. Thanks for the time spent reviewing! It has been good for me 💟


3 participants