Replication timeout is responsibility of the slave alone, should we try this? #918

Closed
antirez opened this Issue Jan 30, 2013 · 5 comments

@antirez
Owner

antirez commented Jan 30, 2013

Since the master -> slave channel is one-way in Redis, currently replication timeout is only up to the slave. Basically this is what happens:

  • The master normally sends the replication stream to the slave.
  • The master also PINGs the slave from time to time, so that even when no new data is available to be transmitted, the slave still gets some news from the master.

Because of this, there is always data going from master to slave, so the slave will be able to detect when the connection is down and reconnect if possible.
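As a rough illustration of the slave-side check described above, here is a minimal sketch. The names `master_link`, `master_link_touch` and `master_link_timed_out` are hypothetical and are not taken from the Redis source; the only idea borrowed from the issue is that any byte from the master (replication data or a PING) counts as a sign of life.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical slave-side view of the link to its master. */
typedef struct {
    time_t last_io;      /* last time any byte (data or PING) arrived */
    time_t timeout_secs; /* how long silence from the master is tolerated */
} master_link;

/* Call whenever bytes arrive from the master, PINGs included. */
static void master_link_touch(master_link *l, time_t now) {
    assert(l != NULL);
    l->last_io = now;
}

/* True when the master has been silent for longer than the timeout,
 * which is the slave's cue to drop the link and reconnect. */
static bool master_link_timed_out(const master_link *l, time_t now) {
    assert(l != NULL);
    return (now - l->last_io) > l->timeout_secs;
}
```

Because the master's periodic PING keeps `last_io` fresh on an idle but healthy link, silence longer than the timeout can only mean the link is in trouble.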

On the master side, if we don't get an error from the socket, we keep sending data. The output buffer can get too large, but the limits in Redis 2.6 will detect that and close the connection with the slave.
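To make the output buffer safeguard concrete, here is a small sketch of a hard/soft limit check in the spirit of the `client-output-buffer-limit` setting. It is not the Redis implementation: the types `obuf_limit` and `slave_obuf` and the function `obuf_should_close` are assumptions made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical limits: a value of 0 disables that limit. */
typedef struct {
    size_t hard_bytes; /* close immediately at or above this size */
    size_t soft_bytes; /* close if at or above this size for too long */
    time_t soft_secs;  /* how long the soft limit may be exceeded */
} obuf_limit;

/* Hypothetical per-slave output buffer state on the master. */
typedef struct {
    size_t pending_bytes;   /* bytes queued but not yet written */
    time_t soft_reached_at; /* 0 while below the soft limit */
} slave_obuf;

/* Returns true when the master should close the slave connection. */
static bool obuf_should_close(slave_obuf *s, const obuf_limit *lim, time_t now) {
    assert(s != NULL && lim != NULL);
    if (lim->hard_bytes && s->pending_bytes >= lim->hard_bytes) return true;
    if (lim->soft_bytes && s->pending_bytes >= lim->soft_bytes) {
        if (s->soft_reached_at == 0) {
            s->soft_reached_at = now; /* start the soft-limit clock */
            return false;
        }
        return (now - s->soft_reached_at) > lim->soft_secs;
    }
    s->soft_reached_at = 0; /* back under the soft limit: reset the clock */
    return false;
}
```

Note that this only fires once enough unsent data has piled up, so with little write traffic a dead slave can go unnoticed for a long time.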

However, there is currently no way for the master, in the absence of socket errors, to detect that the slave is down and close the connection earlier. This is usually not a big problem, but I'm opening this issue to get some feedback and to reconsider the matter before 2.8 final.

@ghost ghost assigned antirez Jan 30, 2013

@charsyam

Contributor

charsyam commented Feb 1, 2013

@antirez I think this is a kind of policy problem, but in my opinion users might expect slaves to try to reconnect to the master automatically, with the timeout exposed as an option. If you agree with this, I will write the patch for it.
:) Thank you.

@antirez

Owner Author

antirez commented Feb 1, 2013

Hello @charsyam, sorry, I'm not sure I understand your message. Slaves of course reconnect automatically in every case, as they handle the timeout explicitly. Here the problem is only that a dead slave can still be seen by a master as active.

About the patch: thanks, that's appreciated, but I assigned the issue to myself because when things are very critical I usually try to address them myself. It is very little code and mostly design, so reviewing a patch tends to be as time consuming as writing the code, or more ;-)

@charsyam

Contributor

charsyam commented Feb 1, 2013

@antirez, I'm sorry that I misunderstood this issue a little. How about checking the slave's status using the PING command in sendBulkToSlave? But it would need to open one more connection to the slave (to send a ping), and that could slow replication down.

@antirez

Owner Author

antirez commented Feb 1, 2013

Yes, there are a number of solutions. We already send PINGs to slaves, suppressing the reply. It is possible to enable replies just for slaves and implement a master-driven timeout as well. But I'm not convinced, so I opened this issue just as a reminder to evaluate this matter in the future. Cheers.
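For readers following along, a master-driven timeout of the kind mentioned here could look roughly like the sketch below. This is only an illustration under the assumption that slaves answer PINGs; `slave_state`, `slave_on_ping_reply` and `master_drop_silent_slaves` are hypothetical names, and this is not necessarily how the issue was eventually fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-slave bookkeeping kept by the master. */
typedef struct {
    time_t last_reply; /* last time the slave answered a PING */
    bool   connected;  /* false once the master has given up on it */
} slave_state;

/* Call when a PING reply from the slave is read. */
static void slave_on_ping_reply(slave_state *s, time_t now) {
    assert(s != NULL);
    s->last_reply = now;
}

/* Periodic pass on the master: drop every connected slave that has not
 * replied within timeout_secs. Returns how many slaves were dropped. */
static size_t master_drop_silent_slaves(slave_state *slaves, size_t n,
                                        time_t now, time_t timeout_secs) {
    size_t dropped = 0;
    assert(slaves != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!slaves[i].connected) continue;
        if (now - slaves[i].last_reply > timeout_secs) {
            slaves[i].connected = false; /* real code would close the socket */
            dropped++;
        }
    }
    return dropped;
}
```

The trade-off is extra traffic from slave to master, in exchange for detecting a dead slave without waiting for a socket error or an oversized output buffer.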

@antirez

Owner Author

antirez commented Jun 6, 2013

Fixed, closing.

@antirez antirez closed this Jun 6, 2013
