Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could not queue, ReplyError from redis use replicate_commands() #99

Closed
wwillsey opened this issue Jan 9, 2019 · 8 comments
Closed

Could not queue, ReplyError from redis use replicate_commands() #99

wwillsey opened this issue Jan 9, 2019 · 8 comments

Comments

@wwillsey
Copy link

wwillsey commented Jan 9, 2019

Hi! Im running into an error:

ReplyError: ERR Error running script (call to f_71aee5518c2d20433a3cd18e00231b3003bd41f1): @user_script:213: @user_script: 213: Write commands not allowed after non deterministic commands. Call redis.replicate_commands() at the start of your script in order to switch to single commands replication mode.
    at JavascriptRedisParser.returnError (/usr/src/app/node_modules/ioredis/lib/redis/parser.js:24:25)
    at JavascriptRedisParser.execute (/usr/src/app/node_modules/redis-parser/lib/parser.js:572:12)
    at Socket.<anonymous> (/usr/src/app/node_modules/ioredis/lib/redis/event_handler.js:107:22)
    at Socket.emit (events.js:182:13)
    at Socket.EventEmitter.emit (domain.js:441:20)
    at addChunk (_stream_readable.js:287:12)
    at readableAddChunk (_stream_readable.js:268:11)
    at Socket.Readable.push (_stream_readable.js:223:10)
    at TCP.onStreamRead [as onread] (internal/stream_base_commons.js:122:17)
From previous event:
    at Command.initPromise (/usr/src/app/node_modules/ioredis/lib/command.js:63:18)
    at new Command (/usr/src/app/node_modules/ioredis/lib/command.js:51:8)
    at Script.execute (/usr/src/app/node_modules/ioredis/lib/script.js:22:17)
    at Redis.register (/usr/src/app/node_modules/ioredis/lib/commander.js:164:20)
    at _this3.Promise (/usr/src/app/node_modules/bottleneck/lib/RedisDatastore.js:135:52)
    at /usr/src/app/node_modules/bottleneck/lib/RedisDatastore.js:122:14
    at Generator.next (<anonymous>)
    at asyncGeneratorStep (/usr/src/app/node_modules/bottleneck/lib/RedisDatastore.js:11:103)
    at _next (/usr/src/app/node_modules/bottleneck/lib/RedisDatastore.js:13:194)
    at _tickCallback (internal/process/next_tick.js:43:7)
    at processImmediate (timers.js:605:7)
    at process.topLevelDomainCallback (domain.js:120:23)
  name: 'ReplyError',
  message:
   'ERR Error running script (call to f_71aee5518c2d20433a3cd18e00231b3003bd41f1): @user_script:213: @user_script: 213: Write commands not allowed after non deterministic commands. Call redis.replicate_commands() at the start of your script in order to switch to single commands replication mode. ',
  command:
   { name: 'evalsha',
     args:
      [ '71aee5518c2d20433a3cd18e00231b3003bd41f1',
        '7',
        'mcperf:b_group-key-key:consume:f6cfa8f0-d0c1-11e8-8b5f-7fa90c56db27_settings',
        'mcperf:b_group-key-key:consume:f6cfa8f0-d0c1-11e8-8b5f-7fa90c56db27_job_weights',
        'mcperf:b_group-key-key:consume:f6cfa8f0-d0c1-11e8-8b5f-7fa90c56db27_job_expirations',
        'mcperf:b_group-key-key:consume:f6cfa8f0-d0c1-11e8-8b5f-7fa90c56db27_job_clients',
        'mcperf:b_group-key-key:consume:f6cfa8f0-d0c1-11e8-8b5f-7fa90c56db27_client_running',
        'mcperf:b_group-key-key:consume:f6cfa8f0-d0c1-11e8-8b5f-7fa90c56db27_client_num_queued',
        'mcperf:b_group-key-key:consume:f6cfa8f0-d0c1-11e8-8b5f-7fa90c56db27_client_last_registered',
        '1547072997557',
        'p2z8cwhjqt',
        'z7xhq92umbp',
        '1',
        '300000' ] } } CallStack:    at Object.limiter.on [as cb] (/usr/src/app/node_modules/mce-utils/utils/RateLimitedJobQueue.js:188:16)
    at /usr/src/app/node_modules/bottleneck/lib/Events.js:90:71
    at Generator.next (<anonymous>)
    at asyncGeneratorStep (/usr/src/app/node_modules/bottleneck/lib/Events.js:3:103)
    at _next (/usr/src/app/node_modules/bottleneck/lib/Events.js:5:194)
    at /usr/src/app/node_modules/bottleneck/lib/Events.js:5:364
    at new Promise (<anonymous>)
    at /usr/src/app/node_modules/bottleneck/lib/Events.js:5:97
    at /usr/src/app/node_modules/bottleneck/lib/Events.js:109:25
    at Array.map (<anonymous>)
    at /usr/src/app/node_modules/bottleneck/lib/Events.js:75:40
    at Generator.next (<anonymous>)
    at asyncGeneratorStep (/usr/src/app/node_modules/bottleneck/lib/Events.js:3:103)
    at _next (/usr/src/app/node_modules/bottleneck/lib/Events.js:5:194)
    at /usr/src/app/node_modules/bottleneck/lib/Events.js:5:364
    at new Promise (<anonymous>)
    at /usr/src/app/node_modules/bottleneck/lib/Events.js:5:97
    at Events.trigger (/usr/src/app/node_modules/bottleneck/lib/Events.js:124:7)
    at _drainOne.then.catch.e (/usr/src/app/node_modules/bottleneck/lib/Bottleneck.js:332:28)
    at tryCatcher (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/promise.js:689:18)
    at Async._drainQueue (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues [as _onImmediate] (/usr/src/app/node_modules/mce-utils/node_modules/bluebird/js/release/async.js:17:14)
    at processImmediate (timers.js:632:19)
    at process.topLevelDomainCallback (domain.js:120:23)

It only comes through the debug event, with message 'Could not queue ' along with the data from above image.

It only happens a small percent of the queue attempts, but the schedule submit succeeds, but job never queues.

Any help would be appreciated!
-Will

@SGrondin
Copy link
Owner

Thank you for the bug report! This is definitely a bug in Bottleneck, I'll fix it ASAP.

Can you please paste your settings?
Are you using Redis Cluster or Redis Sentinel?

Thank you!

@wwillsey
Copy link
Author

const connection = new Bottleneck.IORedisConnection({ client: redisClient });

this.limiter = new Bottleneck.Group({ connection, maxConcurrent: 5, minTime: 20 });

Let me know if I can get you any more info, I'll get it to you asap.

Thanks!

@SGrondin
Copy link
Owner

SGrondin commented Jan 10, 2019

I need a little bit more information:

  • You seem to be using the latest versino (v2.15.0), is that true?
  • What is the version of your Redis server?
  • Is your Redis server using Redis Cluster (master-slave replication) or is it just a single node?
  • How is your ioredis client created? Which options are you using?

Thank you! For reference, this error message seems to indicate that in a small percentage of cases, the Redis operations executed by Bottleneck are not compatible with replication. I designed Bottleneck to work with replication, so I need to find which edge case I missed. Different versions of Redis have different requirements in order to support replication in a safe and atomic manner.

EDIT: I'm 98% sure I've found the problem, I believe it's an optimization in the new load balancing algorithm that's not compatible with replication. Have you been using Bottleneck with your current settings since before v2.14? I believe this issue was introduced in v2.14, and it would help me reproduce the problem faster if you were able to confirm that this issue did not happen before v2.14. Anyways, I'll have a fix by Saturday at the latest.

@wwillsey
Copy link
Author

wwillsey commented Jan 10, 2019

Here's my redis INFO:

Server

redis_version:4.0.7
os:Linux 4.15.0-39-generic x86_64

Replication

role:master
connected_slaves:2

Cluster

cluster_enabled:0

ioredis using version 3.2.2,
client created with no special options, just specify master and sentinels
bottleneck 2.15.0

Thanks for such speedy response!

@wwillsey
Copy link
Author

I can try out a pre 2.14 version and see if the issue persists. Will update when i do.

@SGrondin
Copy link
Owner

I can try out a pre 2.14 version and see if the issue persists. Will update when i do.

Please do, that will help. Just be aware that pre 2.14 does not try to spread out load between instances in a Cluster, it's purely left out to randomness.

@SGrondin
Copy link
Owner

I was about to confirm that the Redis PUBSUB NUMSUB command is the culprit here. It's not compatible with Redis Cluster (replication) because it surprisingly returns server-local data, not cluster data. This makes it non-deterministic and thus incompatible with replication. It also caused Bottleneck in some cases to not make the right decision during load balancing. The unusual behavior of this command doesn't seem to be documented at all. I found this github issue and I was able to confirm it by testing it myself.

127.0.0.1:6379> eval "redis.call('pubsub', 'numsub', 'a'); redis.call('set', 'a', 'b')" 0
(error) ERR Error running script (call to f_269f55ba573a9db2b3729e11cbc9394d138543b1): @user_script:1: @user_script: 1: Write commands not allowed after non deterministic commands. Call redis.replicate_commands() at the start of your script in order to switch to single commands replication mode.

I just published v2.15.1, it avoids using the problematic command altogether.

@SGrondin
Copy link
Owner

This also helped me find another issue where in certain cases it would still believe a limiter was present after it had gone away. It would look "stuck" and only execute jobs every 4-5s on average. This has been fixed in 2.15.2 and there's now a strong test to ensure this cannot happen anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants