Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-2251:Add Client side packet response timeout to avoid infinite wait. #119

Closed
wants to merge 1 commit into from
Closed

Conversation

arshadmohammad
Copy link
Contributor

Add Client side packet response timeout to avoid infinite wait.

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick look over the patch and here is initial feedback.

while (!packet.finished) {
packet.wait();
packet.wait(requestTimeout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unconditionally adds a timeout is a pretty significant semantic change. Whether or not we should wait indefinitely is a separate issue to discuss later, but from a compatibility point of view we'd at least provide a configuration option to let user opt in this feature rather than having this turned on by default. So by default there is no timeout and client still waits indefinitely, user who wants to opt in to turn on the timeout needs to explicit say so and provide a value they think make sense to them. Otherwise, we'd have to solve the problem of what the default timeout value we should set in ZK config if this feature is turned on by default, and that problem is hard and very likely there is no default value that makes everyone happy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disabled this feature by default. any non positive value will disable this feature

@@ -387,6 +389,8 @@ public void setCode(int code) {
EPHEMERALONLOCALSESSION (EphemeralOnLocalSession),
/** Attempts to remove a non-existing watcher */
NOWATCHER (-121),
/** Not received packet timely */
TIMEOUT (-122),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar error code needs to be added to C client to make both client library compatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding only error code will not be useful. Do you mean, add response timeout functionality in C code?

Copy link
Contributor

@hanm hanm Jul 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same error code should be added to C client so the C client can react to the new error code properly, instead of crash (the worst case). Though, I suspect that the C client might be just fine as this error code will not be fired on server side (it's a pure Java client thing), so maybe we are fine without touch C client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have OPERATIONTIMEOUT error code, so TIMEOUT sound too generic. Instead of using TIMEOUT, would something more descriptive better? I am thinking SERVERRESPONSETIMEOUT (server response timeout). Consider this is user facing, have a descriptive name with clear semantic sounds better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Request time out is also a kind of operation time out. OPERATIONTIMEOUT, I think it is not used any where in the java client code or server code, not sure about other client codes. leaving OPERATIONTIMEOUT as it is.
Changing TIMEOUT to SERVERRESPONSETIMEOUT is OK but I do not want to change property name. Can we use REQUESTTIMEOUT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUESTTIMEOUT sounds good to me.

<term>zookeeper.request.timeout</term>
<listitem>
<para>
<emphasis role="bold">New in 3.5.3:</emphasis>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably would not make to 3.5.3. 3.6.0 would be a more realistic value.

while (!packet.finished) {
packet.wait();
packet.wait(requestTimeout);
if (!packet.finished && ((System.currentTimeMillis()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.currentTimeMillis() is dependent on System clock. It probably micro-corrected by an external programme..

I think using System.nanoTime() to measure elapsed time is the correct solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and actually we have a Time class which we can use here that uses nanoTime internally for exact same reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected

@anmolnar
Copy link
Contributor

@hanm @arshadmohammad This PR hasn't been updated for ages unfortunately, but looks like it indicates a pretty significant bug in Zk client library. Additionally we could also fix an long outstanding flaky test by fixing this, so killing two birds with a single shot.

Although Akihiro's comment on the jira sounds reasonable: by using TCP there's no such thing as "packet loss" and we should better try to find the bug in Zookeeper code, I believe that waiting indefinitely is generally a bad thing and introducing a reasonable and configurable timeout would be beneficial.

Introducing such thing in 3.6 is no harm and I would consider it merging to 3.5 too.

@hanm
Copy link
Contributor

hanm commented Jul 17, 2018

Sounds good to me. I had some comments before and I believe those are still applicable.

@arshadmohammad
Copy link
Contributor Author

Thanks @anmolnar @hanm for your reviews, I will address the comments and update MR.

Copy link
Contributor

@hanm hanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two minor comments regarding naming of the new keeper exception error code. Rest of the patch looks great.

@@ -387,6 +389,8 @@ public void setCode(int code) {
EPHEMERALONLOCALSESSION (EphemeralOnLocalSession),
/** Attempts to remove a non-existing watcher */
NOWATCHER (-121),
/** Not received packet timely */
TIMEOUT (-122),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have OPERATIONTIMEOUT error code, so TIMEOUT sound too generic. Instead of using TIMEOUT, would something more descriptive better? I am thinking SERVERRESPONSETIMEOUT (server response timeout). Consider this is user facing, have a descriptive name with clear semantic sounds better.

/**
* @see Code#TIMEOUT
*/
public static class TimeoutException extends KeeperException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, use a more descriptive name like ServerResponseTimeoutException sounds better to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to RequestTimeoutException

import org.junit.Assert;
import org.junit.Test;

public class ClientResponseTimeoutTest extends QuorumPeerTestBase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need to start a full cluster, what about adding a bunch of mockito based tests?

I think that having a test which uses the full stack may have some value, so maybe we can keep this. But in case of failure we are not sure about the real cause

@@ -56,9 +56,15 @@
@SuppressWarnings("deprecation")
public static final String SECURE_CLIENT = ZooKeeper.SECURE_CLIENT;
public static final int CLIENT_MAX_PACKET_LENGTH_DEFAULT = 4096 * 1024; /* 4 MB */
public static final String ZOOKEEPER_REQUEST_TIMEOUT = "zookeeper.request.timeout";
/**
* Feature is disabled by default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about enabling it by default with a default computed from desired session expire timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Session timeout and this client request timeout are completely different. I think we should not mix these two.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, they are very different things. I mentioned it because configured session timeout is like a client side hint about how much time you can tolerate to be disconnected/not responsive, but it is better to keep it separate from this low level knob.
Anyway I wonder if it would be better to set a default timeout.
If there is a chance of infinite wait maybe it can be worth.
We can use a very large timeout, in the order of minutes.
Tipical session timeout is in the order of seconds

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to learn towards the conservative side to have this feature turned off by default, because I feel we don't have deep insight regarding what's the value we could use as default value (if we turn the feature on by default), and a premature setting of that value might impact existing users. For few of users who do have the problem they can tune the parameter based on their prod env.

Copy link
Contributor

@eolivelli eolivelli Jul 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, no default

@hanm
Copy link
Contributor

hanm commented Jul 26, 2018

This is good to go - I am kicking the bot and once it's green I'll commit this.

@asfgit asfgit closed this in 9f82798 Jul 27, 2018
asfgit pushed a commit that referenced this pull request Jul 27, 2018
…nite wait.

Add Client side packet response timeout to avoid infinite wait.

Author: Mohammad Arshad <arshad@apache.org>

Reviewers: Michael Han <hanm@apache.org>, nrico Olivelli <eolivelli@gmail.com>

Closes #119 from arshadmohammad/ZOOKEEPER-2251

(cherry picked from commit 9f82798)
Signed-off-by: Michael Han <hanm@apache.org>
@hanm
Copy link
Contributor

hanm commented Jul 27, 2018

committed to 3.6/3.5. thanks @arshadmohammad !

RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
…nite wait.

Add Client side packet response timeout to avoid infinite wait.

Author: Mohammad Arshad <arshad@apache.org>

Reviewers: Michael Han <hanm@apache.org>, nrico Olivelli <eolivelli@gmail.com>

Closes apache#119 from arshadmohammad/ZOOKEEPER-2251
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants