Sentinel aware #302

Open
tanguylebarzic opened this Issue Oct 10, 2012 · 42 comments

Projects

None yet
@tanguylebarzic

Hi,

Are there plans to make node_redis sentinel aware? I'm thinking about being able to use a list of sentinels at startup, then discover and connect to the masters, and subscribing to sentinels message to take into account masters changes. If not, would be happy to start working on it!

Tanguy

@joaojeronimo

+1 , why not make a wrapper on top of this node_redis client ? If you start working on it please let me know because I'd like to star that repository.

@tanguylebarzic

Hi,

I've started working on this, the result can be checked at https://github.com/tanguylebarzic/node_redis
An example of use:

var RedisMetaClient = require("./../node_redis/sentinel").RedisMetaClient;
var sentinels = [
    {host: "127.0.0.1", port: 26382},
    {host: "127.0.0.1", port: 26383},
    {host: "127.0.0.1", port: 26384}
];
var redisMetaClient = new RedisMetaClient("mymaster", sentinels);
var i = 0;

var client = redisMetaClient.createMasterClient();
client.on('error', function(error){
    console.log(error);
});

setInterval(function(){
    client.set('test:' + i, i, function(error, result){
        if(!error){
            console.log('success, ' + i);
            ++i;
        }
        else {
            console.log('error, ' + i);
        }
   });
}, 500);

Basically, you first have to create a 'RedisMetaClient', configured with the name of the master and a list of sentinels. Then, calling redisMetaClient.createMasterClient(); will give you a client similar to the usual redis.createClient();

What works:

  • Basic functionnality, ie. setup of the correct master, failover... If the master change, all the current masterClients (created by redisMetaClient.createMasterClient(); will be updated to point to the new master, so that one doesn't need to look for client.on('error') to update the connection.
    I have try to preserve the idea of an offline queue, ie. being able to send commands before the master is found and configured.
  • I have tested with simple requests, pub/sub... but not with multi (may work, just not sure).

How was is it done:
I've tried to modify as less as possible the code and behaviour of the single client (what's in 'client.js'). sentinel.js contains kind of a wrapper around client.js, to deal with sentinels.

What remains to be done:

  • Dealing with slaves (for now, nothing is done around them).
  • Testing (both by hand and automatic) of multiple scenarios
  • Documentation

Known issues:

  • the event ready is triggered multiple times for a masterClient if a failover happened, which can result in unexpected behaviours based on the way it's used
  • many others that I don't know about
@benbuckman

tanguylebarzic, thanks for starting to tackle sentinel support. I'm confused by your approach, however: You seem to be implementing the sentinel functionality (e.g. identifying the master and slaves, handling a quorum) in your node.js code. But in the sentinel docs that is all set up in the redis-sentinel daemon's config. Node shouldn't care about who the master or slaves are for a given sentinel, it should simply connect to the sentinel, which is handling all that already. (And that SentinelClient should behave like a regular RedisClient, so it's backwards-compatible with connect-redis session store, etc.)

Can you clarify your approach? Thanks!

@benbuckman

I've started implementing an alternate approach with a RedisSentinelClient that behaves transparently like a RedisClient: https://github.com/DocuSignDev/node_redis

Always interested in feedback, suggestions, and/or collaborators.

@tanguylebarzic

newleafdigital: sorry for the late reply. indeed, I noticed it was not clean to implement this the way I did (too much logic on the client). I've changed it to better match antirez' guidelines, although there are still some questions remaining IMO. Your approach looks interesting (and add cluster awareness!), I'm going to look at it!

@jochenonline

@newleafdigital: I have tried your fork together with socket.io. When I setup the redisstore:

io.set('store', new socketio.RedisStore({
    redis: redis,
    redisPub:   redis.createClient( 26379, "127.0.0.1", {sentinel: true} ),
    redisSub:   redis.createClient( 26379, "127.0.0.1", {sentinel: true} ),
    redisClient:redis.createClient( 26379, "127.0.0.1", {sentinel: true} )
 }));

My app crashes on startup:

 Error: Redis connection to 127.0.0.1:6379 failed - connect ECONNREFUSED
     at RedisClient.on_error (/home/green/debug/node_modules/redis/index.js:168:24)
     at Socket.RedisClient.initialize_stream_listeners.stream.on.self.should_buffer (/home/green/debug/node_modules/redis/index.js:93:14)
     at Socket.EventEmitter.emit (events.js:96:17)
     at Socket._destroy.self.errorEmitted (net.js:328:14)
     at process.startup.processNextTick.process._tickCallback (node.js:244:9)
 [Wed, 13 Feb 2013 16:46:56 GMT] INFO worker 6622 died

Please be aware that the error message sais that the connection failed to port 6379, instead of 26379!!!

Any idea what could be the problem? When I connect directly to the redis-server (master) everything is fine.

@benbuckman

Try using a single hash for the parameters:

redis.createClient({ port: 26379, host: "127.0.0.1", sentinel: true } );

When you connect to the sentinel via redis-cli -p 23679 (not to the master - sentinel != master), does it connect?

@jochenonline
  1. Yes, connecting to the sentinel via redis-cli -p 26379 connects successfully.
  2. Using the alternate syntax doesn't crash on connect but returns an error on the first get of the redisSentinelClient:
[Error: ERR unknown command 'get']

In that case I used the normal redis functionality (not socket.io).

@jochenonline

Hhhhhm. Maybe no problem of your module....When I connect to the sentinel it connects successfully, but it does not know any redis commands:

green@mycomp1:~/redis/redis$ src/redis-cli -p 26379
redis 127.0.0.1:26379> set test 1
(error) ERR unknown command 'set'
redis 127.0.0.1:26379>

What could be the reason for this? Just to be clear: I am connecting to the sentinel, neither to the master nor to one of the slaves. Correct?

ADDENDUM: Reading the docs more in detail I saw that this behavior is normal. But why does the your client do the same? Shouldn't it behave as a "normal" redisClient?

@benbuckman

The sentinel itself shouldn't handle get/set/etc, but the sentinel client should handle it, by delegating it to the master.
Could you put some debugging code in the SentinelClient constructor, to make sure it's instantiating a SentinelClient and not a regular client?
There's also a sentinel test suite that we added, run with Mocha -- maybe run that (changing the port) and see if those pass on your system?

Thanks

@jochenonline

@newleafdigital: Is it possible that you expect the sentinel running in the same machine as redis? I have my sentinel running on a different machine (the one on which node runs) and I doubt that this scenario works for the current implementation. I still get the error

 Error: Redis connection to 127.0.0.1:6379 failed - connect ECONNREFUSED

and that leads me to the conclusion that it tries to connect to a redis instance on 127.0.0,1 === this machine.

@jochenonline

This seems to be a specific problem of using your client with socket.io. Did you ever test the two together?

@jochenonline

@newleafdigital: Now I seem to have found out what is going on. I did not deeply dig into your code but I can quite exactly describe the behaviour:

In my current configuration (node-app on machine1, redis_master on machine2, redis_slaves on machines3+4, redis_sentinels on machines2+3+4) your code only works if I connect to one of the slave_sentinels on machine3+4. It does not work if I connect to the sentinel on the master (machine2).

When I connect to one of the two slave_sentinels the systems logs correctly:

debug connected to sentinel listener
debug connected to sentinel talker
debug new master info { host: 'the.ip.of.master', port: '6379' }
debug Changing master from 127.0.0.1:9999 to the.ip.of.master:6379
debug New master is ready [Yippieee!]

When I connect to the sentinel on the master it sais (and does not work afterwards):

debug connected to sentinel listener
debug connected to sentinel talker
debug new master info { host: '127.0.0.1', port: '6379' } 
debug Changing master from 127.0.0.1:9999 to 127.0.0.1:6379

and no debug New master is ready. As you can see, when I connect to the master's sentinel, the master is not addressed correctly (127.0.0.1 instead of the correct ip).

If the sentinels run on other machines but the redis' machines (i.e. on the same machine like the node-app - as in my very first configuration) it only works too, if the sentinel is connected to one of the slaves (and not the master).

And...together wird socket.io/RedisStore it does not work either (concerning my configuration) in any way, even if I connect to the sentinel on one of the slaves.

@DTrejo
Contributor
DTrejo commented Feb 24, 2013

+1 this is great as a separate module as it is much more likely to change.

If you'd like to have your module mentioned in the wiki, come up with a tag for it e.g. #redis-sentinel and then
I'll add a link in the README to all packages with that tag e.g. https://npmjs.org/browse/keyword/redis-sentinel

@benbuckman

Thanks for all the feedback here, and sorry I haven't had a chance to reply in depth. I've allocated myself time for this next week (March 4).

@DTrejo, you're right about making this its own module, I think I'll do that (next week as well).

@jamessharp

Hi guys

I've also been having a look at this (I need to be able to connect to get the master from a list of sentinels and reconnect to a new master if the old one goes down).

I've taken a slightly different approach to @newleafdigital by just using the existing RedisClient reconnection stuff and changing the desired host/port so when reconnection happens we reconnect to a new instance (so for all intents and purposes it's as if the client has just done a normal reconnect)

The advantages to this method are that it is just a wrapper over node_redis so there is no need to get down and dirty messing with that code. Also it should mean that since the client returned is a (slightly extended) RedisClient then it should slot in nicely with existing code. And finally it just feels simpler than the existing code

My code is here: https://github.com/ortoo/node-redis-sentinel - it's fairly limited at the moment but feels pretty extensible. Any thoughts/assertions that I'm going off on the wrong track would be appreciated.

Current functionality:

  • Specify a list of sentinel endpoints and master name
  • Get a RedisClient back that will connect to the master
  • If the current master dies then RedisClient will attempt to reconnect (resolving the master from the sentinels)
@brycebaril
Member

👍 to all of the people working on sentinel libraries! I haven't had a chance to take a look yet, but I'm eager to try it out.

@benbuckman

@jamessharp, nice work on node-redis-sentinel. I'm trying to figure out if I should adopt yours or split my SentinelClient into its own module. One thing I can't quite figure out with yours, is it meant to create a single client that transparently handles failover/reconnect? Or does the app need to instantiate 3 clients (master, slave, sentinel) and switch between the active client when it fails over?

From your description above, it sounds like the former, but looking at the code and trying to use it, it seems more like the latter.

Thanks!

@jamessharp

Thanks. It's meant to create a single client that transparently handles failover/reconnect. The three clients that you can have (master/slave/sentinel) are meant to give you a persistent connection to each of the different types. So if you are happy to do reads from a slave then you can get a slave connection (which will attempt to transparently reconnect to a new slave instance if the one you're accessing goes down). If you need a permanent connection to the master (whichever server that happens to be) then you can just use the master client and if you want a direct connection to the sentinel instance (that again should transparently failover) you can grab the sentinel client.

Hope that makes sense.

Of course there's still quite a lot of work that needs doing but it works well enough for me at the moment...

@benbuckman

Thanks for the quick reply @jamessharp. What are the big missing pieces?

I wrote a test here to see how it works - https://gist.github.com/newleafdigital/5469571. Before running the script, you start up a master+slave+sentinel; the script connects to the master, and every second adds an incremental key to a hash. After 5 seconds, the master is killed, and the idea is to see if anything is lost. It seems that the master client never actually connects to the new master, so I'm not sure if I'm doing it wrong, or if it's not working.

During a failover, is it supposed to buffer the data to avoid loss, or is I/O done during the failover supposed to be lost?

Thanks

@jamessharp

A couple of thoughts as to why it may not be working:

  • the default sentinel configuration won't actually failover until the master has been down for 30 seconds (I think)
  • the time between reconnection attempts of the client backs off exponentially so there could be a minute before the reconnection actually happens

I'm not sure whether the data is buffered. I'd hope so but the behaviour will be the same as whatever happens when the underlying client tries to reconnect.

The main missing piece is detecting when the master is changed without it going down (I.e. a manual failover). But some more work could be done on making the reconnection as snappy as possible.

@benbuckman

I added a test with our implementation to compare. The results are in the gist. All the I/O done during the failover is lost with node-redis-sentinel, and no data is lost with our (much bulkier) solution. We're buffering in ours; the basic redisClient drains its buffer on disconnect or errors (I don't remember exactly), so it can't easily buffer through a failover.

I like the simplicity of your solution, but I wonder if it's sufficient for the goals we were hoping to achieve:

  • Transparent, drop-in replacement for redisClient
  • handles all redisClient operations include pub/sub transparently
  • no loss of data during failover

For now, I'll assume our implementation adds some value, and I'll separate it into its own module. If it turns out your much simpler alternative will suffice, we'll switch to that.

Thanks for enriching the space!

@jamessharp

Yeh your goals are spot on. However a lot of them would actually be useful in the core node_redis client. When it comes down to it there shouldn't be any difference between a reconnection to a single instance (i.e. what node_redis currently handles) and a failover to a new master/slave in a sentinel controlled cluster.

Rather than putting in the logic for no loss of data during failover and pub/sub persistence in a separate module, I reckon we should put it in the core client and then have a separate module for the sentinel specific logic (along the lines of what I've done in node-redis-sentinel). Its win-win. It makes for a better node_redis client and a simpler sentinel module.

@brycebaril
Member

I've started work on a refactor of the core node_redis client to accomplish a couple things: break up the current index.js so it is a bit easier to work on, but also to add in a mechanism for plugins such as this. I agree that due to the way that node_redis manages command replays and offline queueing for connection interruptions, the sentinel behavior probably works best working directly with the core client.

My goal is that the refactor results in code where you can drop in a 3rd party connection manager library much the way you can currently swap in a parser, and it will expose the appropriate hooks such that something like a sentinel library could use the core client's command queueing & bookkeeping, but replace the connection management portions.

I'll try to get a branch pushed soon for feedback.

@benbuckman

That all sounds good but pretty heavy. Our fork of node_redis already handles the reconnection steps needed for sentinel to work. I'm not sure a pluggable connection manager is necessary.

@benbuckman

Actually, correction: The fork there does not have all the necessary pieces; they exist in another copy which I need to merge into that fork. I will do that on Monday (tomorrow).

@brycebaril
Member

I'm not entirely sure what you mean by heavy -- the goal is to encourage a lighter codebase by making it easier to plug features in without forcing them to be a part of the core library.

@benbuckman

I separated our sentinel client implementation into a new module, redis-sentinel-client:

https://github.com/DocuSignDev/node-redis-sentinel-client
https://npmjs.org/package/redis-sentinel-client

It still depends on some minor changes to node_redis (so it uses our fork); I will pair down the fork to only those necessary changes, and submit PR's for them.

(@brycebaril, my original thought was that a pluggable connection manager would be redundant/overkill. I'm probably wrong about that. I'll see what changes are actually necessary to node_redis to support a sentinel client in a little bit.)

@benbuckman

I've submitted two pull requests to support redis-sentinel-client:

#428 - Export utils and commands to share (with sentinel client, or others) - this is non-breaking/low-impact and makes node_redis better overall regardless of this particular use.

#429 - Flexible connections for sentinel support - this is potentially more breaking/controversial and related to the thread above about pluggable connection managers.

Thanks to @tanguylebarzic's early work on sentinel support, and to everyone who works on node_redis!

@ghost Unknown referenced this issue in DocuSignDev/node-redis-sentinel-client May 3, 2013
Closed

Usage with other languages e.g. Java #1

@brycebaril brycebaril was assigned May 4, 2013
@tanguylebarzic

Hi,

I missed a lot of the discussions around this... Internally, we've switched to an approach very close to the sentinel clients guidelines (http://redis.io/topics/sentinel-clients), that makes it simpler IMO. It's closer to what @jamessharp has proposed - see gist here: https://gist.github.com/tanguylebarzic/5521378
It's been running without issues for a few months now. However, it requires some changes in index.js to make it work smoothly. For example, if for some reason a master is downgraded to a sentinel, redis will answer with an reply error (READONLY You can't write against a read only slave.) that is currently not emitted as an error from the redis-client, thus making it impossible to catch and trigger an update to the connection. The pull requests by @newleafdigital certainly help.

@pavele
pavele commented Oct 1, 2013

@tanguylebarzic, your sample looks really nice. Can you give more details what changes exactly should be done to index.js that you mention.

Thanks for the help.

@benbuckman

FYI - I wrote a blog post on our dev blog, http://docusigndev.github.io/articles/redis-sentinel-client-nodejs/, about our solution to this problem (our Redis Sentinel Client). I'd love to see a conversation around the subject to figure out the best approach!

@brycebaril
Member

Hi @benbuckman have you seen #504 -- I'm hoping that will enable the correct integration of a sentinel manager (and cluster) with this library. Would very much welcome your input & help in that regard!

@asilvas
asilvas commented Jun 12, 2014

Any progress here?

@jeremiahlee
Contributor

I'm using @benbuckman's module, but agree with him that I'd like to see more convo and action on the best approach here.

@gobwas
gobwas commented Dec 9, 2014

Whats about sentinel support now? Is there any progress?

@qubyte
qubyte commented Feb 6, 2015

Sorry for the me-too... What's the progress on this? It looks like there's no active development on this module right now.

@saltukalakus

Native sentinel support in node_redis +1

@blainsmith blainsmith closed this Aug 15, 2015
@mhart
mhart commented Aug 15, 2015
@qubyte
qubyte commented Aug 17, 2015

@blainsmith Could you let us know what changed? I'd like to know if this a won't-fix or something, so some explanation would be appreciated.

@blainsmith
Contributor

Oops. I forgot to paste in my canned response to this. I was going through and cleaning up really old issues. Sorry about that. I will reopen this and tag it accordingly!

@blainsmith blainsmith reopened this Aug 17, 2015
@qubyte
qubyte commented Aug 17, 2015

@blainsmith Ah, I see. Thanks for the update. :)

@brycebaril brycebaril was unassigned by BridgeAR Jan 3, 2016
@BridgeAR BridgeAR added the Help wanted label Jan 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment