client: renew index on watch timeouts #1292

yichengq · 2014-10-12T02:24:03Z

When a watch times out, the client receives an etcd timeout error. It should then renew the index it is watching on, based on the X-Etcd-Index header of the response.

jonboulle · 2014-10-12T20:53:44Z

Do we need this in etcd proper for 0.5? Does that mean we're pushing client/ as the recommended client?

kelseyhightower · 2014-10-13T14:15:40Z

I must be hitting this bug now. Currently the go-etcd library does not produce an error when timeouts happen and watches seem to be broken after the timeout. Maybe I'm just doing it wrong: https://github.com/kelseyhightower/flannel-route-manager/blob/master/server/server.go#L126

yichengq · 2014-10-13T18:21:33Z

@jonboulle More context for this issue:
The current discovery service depends on the watch mechanism. When a watch times out, it retries. On the third retry, the request fails because the index it is watching on is out of the window, and etcd reports a Bad Request error.

jonboulle · 2014-10-13T19:27:38Z

Capturing OOB discussion:

We only send X-Etcd-Index once at the start of a response. In the case of a longstanding watch that's aborted after a timeout (e.g. a Gateway Timeout after 10 minutes), even if it's retried immediately with that X-Etcd-Index value there's a reasonable chance (particularly on a busy cluster) that the index has fallen out of the history window already. So, any good client must really incorporate "catch-up" behaviour into its watch mechanism to get back into the index window.
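To make the "fallen out of the history window" condition concrete, here is a small sketch of the arithmetic. It assumes, for illustration only, that the server retains the most recent `window` events and that every index corresponds to one event; the window of 1000 and the traffic numbers are assumed values, not measurements. `inWindow` is a hypothetical helper.

```go
package main

import "fmt"

// inWindow reports whether a watch starting at waitIndex can still be served
// from a history that holds the most recent `window` events, given the
// cluster's current index. Simplification: one event per index.
func inWindow(waitIndex, currentIndex, window uint64) bool {
	if currentIndex < window {
		return true // history is not full yet, nothing has been evicted
	}
	oldest := currentIndex - window + 1 // oldest index still retained
	// Indexes >= oldest are retained; indexes above currentIndex are
	// future events, which a watch simply waits for.
	return waitIndex >= oldest
}

func main() {
	const window = 1000     // assumed history size
	lastSeen := uint64(5000) // index returned when the watch started
	current := uint64(7000)  // 2000 writes landed during a 10-minute timeout
	fmt.Println("resume from", lastSeen+1, "in window:", inWindow(lastSeen+1, current, window))
}
```

Here the retry at 5001 fails even though it is issued immediately, which is why the client needs a catch-up path rather than a plain retry.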

jonboulle · 2014-10-14T21:02:23Z

Also, to be clear, there are (at least) two timeouts that we need to deal with:

1. 504 Gateway Timeout (e.g. in the case of discovery.etcd.io, this is what the load balancer in front of it will return)
2. The etcd server timeout (introduced in 084dcb5), which simply closes HTTP connections (semi-gracefully: per chunked transfer encoding, we send a final chunk of length 0)

jonboulle · 2014-10-14T21:04:21Z

@unihorn After thinking about it a bit more and looking at the chunked transfer encoding spec, I am wondering whether we should try to send another X-Etcd-Index in the trailer in the case of etcdserver timeouts, as a hint to the user.

(Still doesn't help with 504s, but I'm anticipating that they're dramatically less common)
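The trailer idea can be tried out with Go's standard `net/http`, which supports response trailers via the `http.TrailerPrefix` header-key prefix and exposes them on the client as `resp.Trailer` once the body has been read to EOF. This is a sketch of the proposal only, not of anything etcd actually sends; the handler, the helper names and the index values 100 and 250 are all hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newTimeoutServer is a hypothetical stand-in for an etcd member whose
// server-side watch timeout fires before any event arrives. It sends
// X-Etcd-Index as a normal header and, per the proposal, again as a trailer.
func newTimeoutServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("X-Etcd-Index", "100") // index when the watch started
		w.WriteHeader(http.StatusOK)
		w.(http.Flusher).Flush() // headers are on the wire; the body is chunked
		// ... the watch timeout elapses while the cluster advances ...
		// Keys prefixed with http.TrailerPrefix are sent as trailers
		// after the final zero-length chunk.
		w.Header().Set(http.TrailerPrefix+"X-Etcd-Index", "250")
	}))
}

// readIndexes performs a GET and returns the X-Etcd-Index header and trailer.
func readIndexes(url string) (header, trailer string, err error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	// Trailers are only populated after the body has been read to EOF.
	if _, err := io.ReadAll(resp.Body); err != nil {
		return "", "", err
	}
	return resp.Header.Get("X-Etcd-Index"), resp.Trailer.Get("X-Etcd-Index"), nil
}

func main() {
	srv := newTimeoutServer()
	defer srv.Close()
	h, t, err := readIndexes(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("header index %s, trailer index %s\n", h, t)
}
```

A client that sees the trailer could resume from the fresher index; if the connection drops before the final chunk, the trailer never arrives and the client must fall back to the header value.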

yichengq · 2014-10-14T22:37:26Z

@jonboulle
As with the discovery service, 504 is a general case that our proxy should handle too.
If the connection is closed unexpectedly, the client will miss the trailer, so we need to handle the bad path too.
I think the first step is to define what guarantee etcd should provide for index renewal. I would say that if the client has been disconnected from the server for no more than 5s, etcd should be able to keep watching.

jonboulle · 2014-10-14T22:41:56Z

@unihorn you mean, resume a watch? Don't we need to track sessions then?

yichengq · 2014-10-14T22:53:54Z

@jonboulle I mean resuming/relaunching the watch.
Personally, I dislike the idea of sessions because I think the server should not record client information; that would make etcd more complicated and limit the number of clients.
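One way to read "relaunch without sessions" is that all resume state lives in the client: it remembers the ModifiedIndex of the last event it processed and, after reconnecting, issues a fresh watch starting one past it, so the server keeps no per-client record. A minimal sketch assuming the etcd v2 keys API; `resumableWatch` and the base URL are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// resumableWatch is a hypothetical client-side record. The server stores
// nothing about this client; everything needed to resume is here.
type resumableWatch struct {
	key       string
	lastIndex uint64 // ModifiedIndex of the last event the client processed
}

// observe records an event's ModifiedIndex, ignoring stale or duplicate ones.
func (w *resumableWatch) observe(modifiedIndex uint64) {
	if modifiedIndex > w.lastIndex {
		w.lastIndex = modifiedIndex
	}
}

// relaunchURL builds a new watch request that starts right after lastIndex.
func (w *resumableWatch) relaunchURL(base string) string {
	q := url.Values{}
	q.Set("wait", "true")
	q.Set("waitIndex", strconv.FormatUint(w.lastIndex+1, 10))
	return base + "/v2/keys/" + strings.TrimPrefix(w.key, "/") + "?" + q.Encode()
}

func main() {
	w := &resumableWatch{key: "/foo"}
	w.observe(1024) // last event received before the disconnect
	fmt.Println(w.relaunchURL("http://127.0.0.1:2379"))
}
```

This only works while `lastIndex + 1` is still inside the server's history window, which is exactly the limit discussed above.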

jonboulle · 2014-10-15T21:08:41Z

@unihorn how do you propose for etcd to do resume/relaunch without sessions?

xiang90 · 2014-12-15T20:12:17Z

@yichengq @jonboulle For 0.4x, when there is a watch timeout the client can:

1. Try to watch from the last known index.
   1.1 The watch succeeds.
   1.2 The watch returns "out of window" -> do a recursive get of all the content and watch from the index of the get.

We will introduce a more reliable watch mechanism in the new API.

yichengq added this to the v0.5.0 milestone Oct 12, 2014

jonboulle changed the title ~~renew index when recursive watch timeouts~~ renew index when watch timeouts Oct 13, 2014

jonboulle changed the title ~~renew index when watch timeouts~~ client: renew index on watch timeouts Oct 13, 2014

yichengq mentioned this issue Oct 14, 2014

client: return timeout error for gateway timeout #1291

Closed

xiang90 closed this as completed Dec 15, 2014
