Watch endpoint should have a timeout option #2468

Closed
stevenschlansker opened this issue Mar 9, 2015 · 28 comments

@stevenschlansker
Contributor

Currently, the ?wait=true option to GET (to create a watch) will wait indefinitely for the key to change. This interacts poorly with socket timeouts.

If you have any non-infinite socket timeout, you must accept that a socket timeout exception can occur simply because no change happened before the timeout expired. You can't differentiate that state from the server disappearing and being unable to reply.

If you have an infinite socket timeout, and the remote server crashes, you may wait forever for a connection which is long dead.

Thoughts on how to improve this:

  • Add a "timeout" to watches which, on expiry, returns a distinct HTTP code, say 204 No Content (a client-side sketch of this follows below)
  • Periodically write a small amount of (ignored) data to the HTTP stream as a keepalive (whitespace, so the body is still valid JSON?)
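
A minimal client-side sketch of the first option, assuming a hypothetical `waitTimeout` query parameter and a hypothetical `204 No Content` reply on expiry (neither exists in etcd today):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Socket-level timeout deliberately longer than the proposed server-side
	// watch timeout, so a timeout on the wire really means something is wrong.
	client := &http.Client{Timeout: 30 * time.Second}

	// waitTimeout (20s, in nanoseconds) is the proposed parameter; it does not exist yet.
	resp, err := client.Get("http://127.0.0.1:2379/v2/keys/foo?wait=true&waitTimeout=20000000000")
	if err != nil {
		fmt.Println("transport error, server may be dead:", err)
		return
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNoContent: // proposed "nothing happened" signal
		fmt.Println("watch expired with no change; safe to re-issue the watch")
	case http.StatusOK:
		body, _ := io.ReadAll(resp.Body)
		fmt.Println("key changed:", string(body))
	default:
		fmt.Println("unexpected status:", resp.Status)
	}
}
```
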
@il9ue

il9ue commented Mar 16, 2015

Hi, I am new to etcd and am currently studying it closely.
To improve this scenario, would it be possible to apply an event/delegate model, or partial publish/subscribe, so that remote servers don't have to hold sockets open waiting for data to arrive?

@stevenschlansker
Contributor Author

It may be possible to add a pub/sub interface, but that would be a new feature and should probably be discussed separately.

@brentryan

We just built the timeout into our Java etcd client instead: when we watch, we only watch for 20 seconds max, then time out and retry, which avoids this. Once we get events we can remove this sort of workaround.
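
For illustration, a rough Go sketch of that client-side workaround against the v2 HTTP API (the endpoint, key, and 20-second bound are placeholders; a timeout is treated as "no change yet" rather than as an error, which is exactly the ambiguity this issue is about):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// watchOnce issues a single v2 watch bounded by a client-side timeout.
func watchOnce(client *http.Client, url string) (changed bool, err error) {
	resp, err := client.Get(url)
	if err != nil {
		// http.Client timeouts surface as errors with a Timeout() method.
		if terr, ok := err.(interface{ Timeout() bool }); ok && terr.Timeout() {
			return false, nil // nothing happened within the bound; caller retries
		}
		return false, err // some other transport failure
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	client := &http.Client{Timeout: 20 * time.Second} // bounded watch
	url := "http://127.0.0.1:2379/v2/keys/foo?wait=true"
	for {
		changed, err := watchOnce(client, url)
		if err != nil {
			fmt.Fprintln(os.Stderr, "watch failed:", err)
			return
		}
		if changed {
			fmt.Println("key changed; handle the event and re-watch")
		}
	}
}
```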

@xiang90
Contributor

xiang90 commented Apr 2, 2015

@stevenschlansker @brentryan

We have had some internal discussion about this:

  1. add TCP keepalive for the long-polling connections
  2. add a progress notification every N index increases, even if no events have happened
  3. add a max_index flag to stop watching once that index is reached

But these are for the v3 API.

@stevenschlansker
Contributor Author

I think options 2 and 3 do not really solve the problem, because the etcd index is intentionally time-independent while TCP connections are explicitly time-dependent. In an idle cluster you may not receive any index changes for a period of time and still observe timeouts.

Why does this need to be a v3 feature? At least the simple version (adding a timeout query parameter) is not a breaking change, so it would be nice to consider it for a minor release rather than waiting all the way for v3.

@xiang90
Contributor

xiang90 commented Apr 2, 2015

@stevenschlansker

I think options 2 and 3 do not really solve the problem

They are related concerns. The client needs to detect the server's status, and the server also needs to detect the client's status. They might not help with your specific problem, but a lot of people want to control the timeout of watchers themselves, and they need to know when and where to restart the watch.

If you have an infinite socket timeout, and the remote server crashes, you may wait forever for a connection which is long dead.

This is a client-side concern and should not be handled on the server side. The etcd server detects dead clients via TCP keepalive.

@bgrant0607

cc @hchoudh

@fasaxc
Contributor

fasaxc commented Apr 8, 2015

I'd like to see a timeout parameter and a dedicated "nothing happened" response such as a 204 at the end of it.

The current behaviour interacts very poorly with HTTP connection pooling because the server never completes the HTTP transaction if no events occur.

Since HTTP 1.x is request-response based, there's no way to time out a request without also tearing down the underlying socket. Doing that is very expensive; it wastes file descriptors at both ends and adds an additional TCP or even a TLS handshake in order to reconnect.

@xiang90
Contributor

xiang90 commented Apr 8, 2015

Since HTTP 1.x is request-response based, there's no way to time out a request without also tearing down the underlying socket.

This is a good point. We are actually considering switching to stream-based communication.

@fasaxc
Contributor

fasaxc commented Apr 8, 2015

Streaming is nice but it's hard to write a correct client for that.

I'd recommend doing something simple to allow connection reuse and then adding streaming later for advanced users via format=stream, for example.

@fasaxc
Contributor

fasaxc commented Apr 9, 2015

I've just come across another corner case that is hard to handle without a response. I want my code to be able to detect if the cluster has been replaced wholesale by a backup, for example. In that case, I'll poll against the backup server but with an index from the pre-backup server; that index will be far in the future for the new server so I'll never get a response. Without a response, I can't check the X-Etcd-Cluster-Id header and spot I polled against the wrong one.
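
For illustration, a sketch of the check being described, which only becomes possible once the server actually completes a response (the cluster ID and waitIndex values are placeholders):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// The cluster ID recorded before the backup was restored (placeholder value).
	const knownClusterID = "7e27652122e8b2ae"

	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get("http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=1000000")
	if err != nil {
		fmt.Println("watch failed:", err)
		return
	}
	defer resp.Body.Close()

	// Only possible if the server completes a response at some point:
	if id := resp.Header.Get("X-Etcd-Cluster-Id"); id != knownClusterID {
		fmt.Printf("talking to a different cluster (%s); resync from scratch\n", id)
		return
	}
	fmt.Println("same cluster; process the watch response")
}
```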

CyberDem0n pushed a commit to zalando/patroni that referenced this issue Sep 16, 2015
Current etcd implementation does not yet support a timeout option when
`wait=true`: etcd-io/etcd#2468

Originally I implemented the `watch` method for the `Etcd` class in the
following manner: if the leader key was updated just because the master
needed to update its TTL and the watch timeout had not yet expired, I
recalculated the timeout and started the `watch` call again.
Usually after a "restart" we were getting urllib3.exceptions.TimeoutError.
The only possible way to recover from such an exception is to close the
socket and establish a new connection. With plain HTTP that is relatively
cheap, but with HTTPS and some kind of authorization on the etcd side it
becomes rather expensive and should be avoided.
@stevenschlansker
Contributor Author

We also just ran into the "backup restore" edge case -- we had a storage outage and restored from backup, and all clients that were watching ended up in an indefinite hang because the cluster index had moved far into the past.

@philips
Contributor

philips commented Nov 5, 2015

@stevenschlansker Can you file a new bug? I am confused about what you mean by the "backup restore" edge case.

@philips
Contributor

philips commented Nov 5, 2015

This blog post reports some odd behavior: http://www.projectclearwater.org/adventures-in-debugging-etcd-http-pipelining-and-file-descriptor-leaks/

Eventually, Python exits, and closes TCP connection A, by sending a FIN packet – but etcd doesn’t send a FIN of its own, or send a zero-byte chunk to finish the 200 OK, as it does in the success case. This causes the socket to leak on etcd’s side – we think this is a bug in Go’s HTTP stack.

philips pushed a commit to philips/etcd that referenced this issue Nov 5, 2015
Clients that know the maximum amount of time they want to wait for a
watch response can specify waitTimeout in nanoseconds.

```
$ time curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitTimeout=2000000000'

real	0m2.008s
user	0m0.003s
sys	0m0.004s
```

Potential fix to etcd-io#2468
@bdarnell
Contributor

bdarnell commented Nov 5, 2015

Adding a waitTimeout parameter seems like a workaround rather than fixing the real problem: the client here doesn't have any opinion on how long to wait; the real issue is how long the server is willing to let possibly-abandoned file descriptors hang around. I would suggest fixing this by setting defaultWatchTimeout to something reasonable (vs the current MaxInt64) instead of (or in addition to) a client-specified timeout.

Also, in the case described on projectclearwater.org, the connection was shut down cleanly, so using CloseNotify would have worked too. This is even better than the timeout since you can free up the file descriptor immediately, but it doesn't always detect unclean disconnects promptly so you need the timeout as well.
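
A rough server-side sketch of combining the two, with a hypothetical watch handler and event channel (the 5-minute default is an arbitrary stand-in for a saner defaultWatchTimeout):

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// events is a hypothetical channel delivering watch events for the requested key.
var events = make(chan string)

func watchHandler(w http.ResponseWriter, r *http.Request) {
	// CloseNotifier (pre-context Go) reports clean client disconnects.
	closed := w.(http.CloseNotifier).CloseNotify()

	select {
	case ev := <-events:
		w.Write([]byte(ev)) // a real event: normal 200 response
	case <-closed:
		// Client went away cleanly; release the handler (and the fd) immediately.
	case <-time.After(5 * time.Minute):
		// Server-chosen watch timeout: finish the HTTP transaction with an
		// empty "nothing happened" response instead of holding the fd forever.
		w.WriteHeader(http.StatusNoContent)
	}
}

func main() {
	http.HandleFunc("/v2/keys/", watchHandler)
	log.Fatal(http.ListenAndServe(":2379", nil))
}
```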

@xiang90
Contributor

xiang90 commented Nov 5, 2015

@bdarnell

Adding a waitTimeout parameter seems like a workaround rather than fixing the real problem

I agree. So I am not convinced that we should support it now.

I would suggest fixing this by setting defaultWatchTimeout to something reasonable (vs the current MaxInt64) instead of (or in addition to) a client-specified timeout.

We tried, but a timeout at the server side breaks the connection, and the Go client library seems to have a bug that prevents it from handling that well. See golang/go#8946.

We set TCP-level keepalive and also use CloseNotify. But in his case the connections are pooled in the client itself, so the TCP connection is not closed and keepalive does not help. I am not entirely sure why CloseNotify does not work; we need to investigate this more.

@xiang90
Contributor

xiang90 commented Nov 5, 2015

@bdarnell Yicheng reminded me that you might be suggesting setting the timeout on the request context, not on the connection, so the server can still finish the HTTP response. That probably works.

@bdarnell
Contributor

bdarnell commented Nov 5, 2015

Sorry, I thought you weren't using CloseNotify because the only occurrence grep found was in client.go, but it turns out that file is a server-side implementation of the API for clients. So I'm surprised that didn't work; the CloseNotifier should have fired when the client closed its connection.

I don't follow how a timeout on the server side breaks things. When the timeout fires you should just send back an empty response, or whatever you want to use to signify to the client that the timeout elapsed without any news.

@xiang90
Contributor

xiang90 commented Nov 5, 2015

@bdarnell

I don't follow how a timeout on the server side breaks things.

I was wrong. Sorry. See #2468 (comment).
I was still thinking of the old approach, where we set the timeout on the http.Server struct (https://golang.org/pkg/net/http/#Server), which breaks the client.

@xiang90 xiang90 self-assigned this May 10, 2016
@joelnn

joelnn commented May 12, 2016

Just a note: keep in mind that TCP keep-alive timeout parameters are configured system-wide (at least on Linux), and are generally set so that the timeout only happens after 10+ minutes. It's a nice fallback, but it may not meet the need of finding dead connections within a reasonable time for some etcd use cases.

@xiang90
Contributor

xiang90 commented May 12, 2016

TCP keep-alive timeout parameters are configured system-wide

It can be configured per connection. (see 4.3 of http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/)
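
For example, Go's net package exposes this per connection (a sketch; the 30-second period is arbitrary):

```go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	ln, err := net.Listen("tcp", ":2379")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		tc := conn.(*net.TCPConn)
		// Enable keepalive probes on this connection only; nothing system-wide changes.
		tc.SetKeepAlive(true)
		tc.SetKeepAlivePeriod(30 * time.Second) // maps to TCP_KEEPIDLE/TCP_KEEPINTVL on Linux
		go func(c net.Conn) {
			defer c.Close()
			// ... serve the connection ...
		}(tc)
	}
}
```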

@joelnn

joelnn commented May 12, 2016

You are right, thanks.

That will work for the server to detect unreachable clients, but I doubt the client will be able to set it properly through gRPC, where the socket is quite hidden. I don't see any support for it in https://github.com/grpc/grpc

grpc-java has an issue open for it here grpc/grpc-java#1648

@bdarnell
Contributor

You can turn TCP keepalives on and off per socket, but the duration of the timeout is system-wide (and usually very large - hours in some cases). If you need to detect dropped connections faster than that you have to use your own timer and send some data over the connection. This is tricky to do cleanly with HTTP/1; it's easier in HTTP/2 or GRPC.

@joelnn

joelnn commented May 12, 2016

@bdarnell that was my understanding as well, but on careful re-reading of that doc, there are setsockopt options to override the system-wide parameters: TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL.
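
A sketch of overriding those per socket from Go on Linux, using golang.org/x/sys/unix (the values are arbitrary):

```go
package main

import (
	"log"
	"net"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	conn, err := net.DialTimeout("tcp", "127.0.0.1:2379", 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	raw, err := conn.(*net.TCPConn).SyscallConn()
	if err != nil {
		log.Fatal(err)
	}
	err = raw.Control(func(fd uintptr) {
		// Per-socket overrides of the system-wide keepalive defaults.
		unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_KEEPALIVE, 1)
		unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPIDLE, 30)  // start probing after 30s idle
		unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPINTVL, 10) // probe every 10s
		unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPCNT, 3)    // drop after 3 missed probes
	})
	if err != nil {
		log.Fatal(err)
	}
}
```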

@xiang90
Contributor

xiang90 commented May 12, 2016

the duration of the timeout is system-wide

You can tune that too, although it is not very portable and is usually at minute granularity. For etcd, we do application-level keepalive on peer communication.

On the client side, the transport is the responsibility of gRPC. We do application-level keepalive on long-lived streams.

@xiang90
Contributor

xiang90 commented Jun 17, 2016

This is not relevant to the v3 API, which uses a stream to support watch; clients can ask the etcd layer for periodic keepalive (progress) notifications. We do not plan to add a timeout feature to v2 as a workaround either.
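
For reference, a sketch of that v3 mechanism using clientv3's WithProgressNotify option, which asks the server to send periodic watch responses even when nothing has changed (the import path shown is for etcd v3.5; older releases used github.com/coreos/etcd/clientv3):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Ask for periodic progress notifications so an idle watch still produces traffic.
	wch := cli.Watch(context.Background(), "foo", clientv3.WithProgressNotify())
	for resp := range wch {
		if resp.IsProgressNotify() {
			fmt.Println("no events, but the stream is alive; revision:", resp.Header.Revision)
			continue
		}
		for _, ev := range resp.Events {
			fmt.Printf("%s %q = %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```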

@xiang90 xiang90 modified the milestones: unplanned, v3.0.0-maybe Jun 17, 2016
@xiang90
Contributor

xiang90 commented Nov 1, 2016

Closing. v3 cancellation has worked well so far without a timeout option.
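
A sketch of that cancellation from the client side, assuming the same clientv3 setup and imports as the previous sketch: cancelling (or timing out) the watch context closes the watch channel, so the client fully controls how long it waits:

```go
// watchWithClientTimeout shows client-driven cancellation: the caller, not the
// server, bounds how long the watch stays open.
func watchWithClientTimeout(cli *clientv3.Client, key string, d time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	for resp := range cli.Watch(ctx, key) {
		for _, ev := range resp.Events {
			fmt.Printf("%s %q = %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
	// The watch channel closes when ctx expires or is cancelled;
	// the caller decides whether to re-watch or give up.
}
```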

@xiang90 xiang90 closed this as completed Nov 1, 2016
@shuting-yst

@xiang90 so what's the solution? Could you provide some examples of using the v3 watcher to distinguish between "no events" and a crashed server that is unable to reply?
I have run into this problem with the v3 watcher.

```
...
wch := etcdClient.Watch(context.Background(), "/EtcdWatcherTest/TestWatchWithFailover/", opts...)

go func() {
	// shutdown etcd server
	// sleep 120s
	// restart etcd server
	// KV.Put(xxxx)
}()

// wch receives nothing and no error; it just waits for a very long time
...
```
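
One way to make that failure visible with clientv3 is to enable client-side gRPC keepalive and check the watch channel and responses for errors. A hedged sketch (DialKeepAliveTime/DialKeepAliveTimeout are clientv3.Config fields in recent releases; exact behaviour depends on the etcd version):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Combine client-side gRPC keepalive with error checks on the watch channel,
// so a dead or restarted server surfaces as an error instead of a silent hang.
func watchWithFailureDetection(endpoints []string, key string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:            endpoints,
		DialTimeout:          5 * time.Second,
		DialKeepAliveTime:    10 * time.Second, // ping the server when the stream is idle
		DialKeepAliveTimeout: 5 * time.Second,  // treat it as dead if no ack in this window
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	// WithRequireLeader makes the watch fail fast if the member loses its leader.
	wch := cli.Watch(clientv3.WithRequireLeader(context.Background()), key)
	for resp := range wch {
		if err := resp.Err(); err != nil {
			return err // e.g. compacted revision or no leader: caller decides whether to re-watch
		}
		for _, ev := range resp.Events {
			fmt.Printf("%s %q = %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
	// Channel closed without an error event: the context ended or the stream broke.
	return errors.New("watch channel closed; re-dial and re-watch")
}

func main() {
	if err := watchWithFailureDetection([]string{"127.0.0.1:2379"}, "/EtcdWatcherTest/TestWatchWithFailover/"); err != nil {
		fmt.Println("watch ended:", err)
	}
}
```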
