Watch endpoint should have a timeout option #2468

Closed
stevenschlansker opened this issue Mar 9, 2015 · 28 comments

@stevenschlansker
Contributor

Currently, the ?wait=true option to GET (to create a watch) will wait indefinitely for the key to change. This interacts poorly with socket timeouts.

If you have any non-infinite socket timeout, you must accept that a socket timeout exception can occur simply because no change happened before the timeout expired. You can't differentiate that state from the server disappearing and being unable to reply.

If you have an infinite socket timeout, and the remote server crashes, you may wait forever for a connection which is long dead.

Thoughts on how to improve this:

  • Add a "timeout" to watches which, on expiry, returns a distinct HTTP code, say 204 No Content (a client-side sketch of this follows below)
  • Periodically write a small amount of (ignored) data to the HTTP stream as a keepalive (whitespace, so the body is still valid JSON?)
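
A minimal client-side sketch of the first option, assuming a hypothetical `waitTimeout` query parameter and a hypothetical `204 No Content` reply on expiry (neither exists in etcd today):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Socket-level timeout deliberately longer than the proposed server-side
	// watch timeout, so a timeout on the wire really means something is wrong.
	client := &http.Client{Timeout: 30 * time.Second}

	// waitTimeout (20s, in nanoseconds) is the proposed parameter; it does not exist yet.
	resp, err := client.Get("http://127.0.0.1:2379/v2/keys/foo?wait=true&waitTimeout=20000000000")
	if err != nil {
		fmt.Println("transport error, server may be dead:", err)
		return
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNoContent: // proposed "nothing happened" signal
		fmt.Println("watch expired with no change; safe to re-issue the watch")
	case http.StatusOK:
		body, _ := io.ReadAll(resp.Body)
		fmt.Println("key changed:", string(body))
	default:
		fmt.Println("unexpected status:", resp.Status)
	}
}
```
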
@il9ue

il9ue commented Mar 16, 2015

Hi, I am new to etcd and am currently studying it closely.
To improve this scenario, would it be possible to apply an event/delegate model, or partial publish/subscribe, so that remote servers don't have to hold sockets open waiting for data to arrive?

@stevenschlansker
Contributor Author

It may be possible to add a pub/sub interface, but that would be a new feature and should probably be discussed separately.

@brentryan

We just built the timeout into our Java etcd client instead: when we watch, we only watch for 20 seconds max, then time out and retry, which avoids this. Once we get events we can remove this sort of workaround.
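
For illustration, a rough Go sketch of that client-side workaround against the v2 HTTP API (the endpoint, key, and 20-second bound are placeholders; a timeout is treated as "no change yet" rather than as an error, which is exactly the ambiguity this issue is about):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

// watchOnce issues a single v2 watch bounded by a client-side timeout.
func watchOnce(client *http.Client, url string) (changed bool, err error) {
	resp, err := client.Get(url)
	if err != nil {
		// http.Client timeouts surface as errors with a Timeout() method.
		if terr, ok := err.(interface{ Timeout() bool }); ok && terr.Timeout() {
			return false, nil // nothing happened within the bound; caller retries
		}
		return false, err // some other transport failure
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	client := &http.Client{Timeout: 20 * time.Second} // bounded watch
	url := "http://127.0.0.1:2379/v2/keys/foo?wait=true"
	for {
		changed, err := watchOnce(client, url)
		if err != nil {
			fmt.Fprintln(os.Stderr, "watch failed:", err)
			return
		}
		if changed {
			fmt.Println("key changed; handle the event and re-watch")
		}
	}
}
```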

@xiang90
Contributor

xiang90 commented Apr 2, 2015

@stevenschlansker @brentryan

We have had some internal discussion about this:

  1. add TCP keepalive for the long-polling connections
  2. add a progress notification every N index increases, even if no events have happened
  3. add a max_index flag to stop watching once that index is reached

But these are for the v3 API.

@stevenschlansker
Contributor Author

I think options 2 and 3 do not really solve the problem, because the etcd index is intentionally time-independent while TCP connections are explicitly time-dependent. In an idle cluster you may not receive any index changes for a period of time and still observe timeouts.

Why does this need to be a v3 feature? At least the simple version (adding a timeout query parameter) is not a breaking change, so it would be nice to consider it for a minor release rather than waiting all the way for v3.

@xiang90
Contributor

xiang90 commented Apr 2, 2015

@stevenschlansker

I think options 2 and 3 do not really solve the problem

They are related concerns. The client needs to detect the server's status, and the server also needs to detect the client's status. They might not help with your specific problem, but a lot of people want to control the timeout of watchers themselves, and they need to know when and where to restart the watch.

If you have an infinite socket timeout, and the remote server crashes, you may wait forever for a connection which is long dead.

This is a client-side concern and should not be handled on the server side. The etcd server detects dead clients via TCP keepalive.

@bgrant0607

cc @hchoudh

@fasaxc
Contributor

fasaxc commented Apr 8, 2015

I'd like to see a timeout parameter and a dedicated "nothing happened" response such as a 204 at the end of it.

The current behaviour interacts very poorly with HTTP connection pooling because the server never completes the HTTP transaction if no events occur.

Since HTTP 1.x is request-response based, there's no way to time out a request without also tearing down the underlying socket. Doing that is very expensive; it wastes file descriptors at both ends and adds an additional TCP or even a TLS handshake in order to reconnect.

@xiang90
Contributor

xiang90 commented Apr 8, 2015

Since HTTP 1.x is request-response based, there's no way to time out a request without also tearing down the underlying socket.

This is a good point. We are actually considering switching to stream-based communication.

@fasaxc
Contributor

fasaxc commented Apr 8, 2015

Streaming is nice but it's hard to write a correct client for that.

I'd recommend doing something simple to allow connection reuse and then adding streaming later for advanced users via format=stream, for example.

@fasaxc
Contributor

fasaxc commented Apr 9, 2015

I've just come across another corner case that is hard to handle without a response. I want my code to be able to detect if the cluster has been replaced wholesale by a backup, for example. In that case, I'll poll against the backup server but with an index from the pre-backup server; that index will be far in the future for the new server so I'll never get a response. Without a response, I can't check the X-Etcd-Cluster-Id header and spot I polled against the wrong one.
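
For illustration, a sketch of the check being described, which only becomes possible once the server actually completes a response (the cluster ID and waitIndex values are placeholders):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// The cluster ID recorded before the backup was restored (placeholder value).
	const knownClusterID = "7e27652122e8b2ae"

	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get("http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=1000000")
	if err != nil {
		fmt.Println("watch failed:", err)
		return
	}
	defer resp.Body.Close()

	// Only possible if the server completes a response at some point:
	if id := resp.Header.Get("X-Etcd-Cluster-Id"); id != knownClusterID {
		fmt.Printf("talking to a different cluster (%s); resync from scratch\n", id)
		return
	}
	fmt.Println("same cluster; process the watch response")
}
```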

CyberDem0n pushed a commit to zalando/patroni that referenced this issue Sep 16, 2015
Current etcd implementation does not yet support a timeout option when
`wait=true`: etcd-io/etcd#2468

Originally I implemented the `watch` method for the `Etcd` class in the
following manner: if the leader key was updated just because the master
needed to update its TTL and the watch timeout had not yet expired, I
recalculated the timeout and started the `watch` call again.
Usually after a "restart" we were getting urllib3.exceptions.TimeoutError.
The only possible way to recover from such an exception is to close the
socket and establish a new connection. With plain HTTP that is relatively
cheap, but with HTTPS and some kind of authorization on the etcd side it
becomes rather expensive and should be avoided.
@stevenschlansker
Contributor Author

We also just ran into the "backup restore" edge case -- we had a storage outage and restored from backup, and all clients that were watching ended up in an indefinite hang because the cluster index had moved far into the past.

@philips
Contributor

philips commented Nov 5, 2015

@stevenschlansker Can you file a new bug? I am confused about what you mean by the "backup restore" edge case.

@philips
Contributor

philips commented Nov 5, 2015

This blog post reports some odd behavior: http://www.projectclearwater.org/adventures-in-debugging-etcd-http-pipelining-and-file-descriptor-leaks/

Eventually, Python exits, and closes TCP connection A, by sending a FIN packet – but etcd doesn’t send a FIN of its own, or send a zero-byte chunk to finish the 200 OK, as it does in the success case. This causes the socket to leak on etcd’s side – we think this is a bug in Go’s HTTP stack.

philips pushed a commit to philips/etcd that referenced this issue Nov 5, 2015
Clients that know the maximum amount of time they want to wait for a
watch response can specify waitTimeout in nanoseconds.

```
$ time curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitTimeout=2000000000'

real	0m2.008s
user	0m0.003s
sys	0m0.004s
```

Potential fix to etcd-io#2468
@bdarnell
Contributor

bdarnell commented Nov 5, 2015

Adding a waitTimeout parameter seems like a workaround rather than fixing the real problem: the client here doesn't have any opinion on how long to wait; the real issue is how long the server is willing to let possibly-abandoned file descriptors hang around. I would suggest fixing this by setting defaultWatchTimeout to something reasonable (vs the current MaxInt64) instead of (or in addition to) a client-specified timeout.

Also, in the case described on projectclearwater.org, the connection was shut down cleanly, so using CloseNotify would have worked too. This is even better than the timeout since you can free up the file descriptor immediately, but it doesn't always detect unclean disconnects promptly so you need the timeout as well.
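
A rough server-side sketch of combining the two, with a hypothetical watch handler and event channel (the 5-minute default is an arbitrary stand-in for a saner defaultWatchTimeout):

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// events is a hypothetical channel delivering watch events for the requested key.
var events = make(chan string)

func watchHandler(w http.ResponseWriter, r *http.Request) {
	// CloseNotifier (pre-context Go) reports clean client disconnects.
	closed := w.(http.CloseNotifier).CloseNotify()

	select {
	case ev := <-events:
		w.Write([]byte(ev)) // a real event: normal 200 response
	case <-closed:
		// Client went away cleanly; release the handler (and the fd) immediately.
	case <-time.After(5 * time.Minute):
		// Server-chosen watch timeout: finish the HTTP transaction with an
		// empty "nothing happened" response instead of holding the fd forever.
		w.WriteHeader(http.StatusNoContent)
	}
}

func main() {
	http.HandleFunc("/v2/keys/", watchHandler)
	log.Fatal(http.ListenAndServe(":2379", nil))
}
```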

@xiang90
Contributor

xiang90 commented Nov 5, 2015

@bdarnell

Adding a waitTimeout parameter seems like a workaround rather than fixing the real problem

I agree. So I am not convinced that we should support it now.

I would suggest fixing this by setting defaultWatchTimeout to something reasonable (vs the current MaxInt64) instead of (or in addition to) a client-specified timeout.

We tried, but a timeout at the server side breaks the connection, and the Go client library seems to have a bug that prevents it from handling that well. See golang/go#8946.

We set TCP-level keepalive and also use CloseNotify. But in his case the connections are pooled in the client itself, so the TCP connection is not closed and keepalive does not help. I am not entirely sure why CloseNotify does not work; we need to investigate this more.

@xiang90
Contributor

xiang90 commented Nov 5, 2015

@bdarnell Yicheng reminded me that you might be suggesting setting the timeout on the request context, not on the connection, so the server can still finish the HTTP response. That probably works.

@bdarnell
Contributor

bdarnell commented Nov 5, 2015

Sorry, I thought you weren't using CloseNotify because the only occurrence grep found was in client.go, but it turns out that file is a server-side implementation of the API for clients. So I'm surprised that didn't work; the CloseNotifier should have fired when the client closed its connection.

I don't follow how a timeout on the server side breaks things. When the timeout fires you should just send back an empty response, or whatever you want to use to signify to the client that the timeout elapsed without any news.

@xiang90
Contributor

xiang90 commented Nov 5, 2015

@bdarnell

I don't follow how a timeout on the server side breaks things.

I was wrong. Sorry. See #2468 (comment).
I was still thinking of the old approach, where we set the timeout on the http.Server struct (https://golang.org/pkg/net/http/#Server), which breaks the client.

@xiang90 xiang90 self-assigned this May 10, 2016
@joelnn

joelnn commented May 12, 2016

Just a note: keep in mind that TCP keep-alive timeout parameters are configured system-wide (at least on Linux), and are generally set so that the timeout only happens after 10+ minutes. It's a nice fallback, but it may not meet the need of finding dead connections within a reasonable time for some etcd use cases.

@xiang90
Contributor

xiang90 commented May 12, 2016

TCP keep-alive timeout parameters are configured system-wide

It can be configured per connection. (see 4.3 of http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/)
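
For example, Go's net package exposes this per connection (a sketch; the 30-second period is arbitrary):

```go
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	ln, err := net.Listen("tcp", ":2379")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		tc := conn.(*net.TCPConn)
		// Enable keepalive probes on this connection only; nothing system-wide changes.
		tc.SetKeepAlive(true)
		tc.SetKeepAlivePeriod(30 * time.Second) // maps to TCP_KEEPIDLE/TCP_KEEPINTVL on Linux
		go func(c net.Conn) {
			defer c.Close()
			// ... serve the connection ...
		}(tc)
	}
}
```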

@joelnn

joelnn commented May 12, 2016

You are right, thanks.

That will work for the server to detect unreachable clients, but I doubt the client will be able to set it properly through gRPC, where the socket is quite hidden. I don't see any support for it in https://github.com/grpc/grpc

grpc-java has an issue open for it here grpc/grpc-java#1648

@bdarnell
Contributor

You can turn TCP keepalives on and off per socket, but the duration of the timeout is system-wide (and usually very large - hours in some cases). If you need to detect dropped connections faster than that you have to use your own timer and send some data over the connection. This is tricky to do cleanly with HTTP/1; it's easier in HTTP/2 or GRPC.

@joelnn

joelnn commented May 12, 2016

@bdarnell that was my understanding as well, but on careful re-reading of that doc, there are setsockopt options to override the system-wide parameters: TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL.
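
A sketch of overriding those per socket from Go on Linux, using golang.org/x/sys/unix (the values are arbitrary):

```go
package main

import (
	"log"
	"net"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	conn, err := net.DialTimeout("tcp", "127.0.0.1:2379", 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	raw, err := conn.(*net.TCPConn).SyscallConn()
	if err != nil {
		log.Fatal(err)
	}
	err = raw.Control(func(fd uintptr) {
		// Per-socket overrides of the system-wide keepalive defaults.
		unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_KEEPALIVE, 1)
		unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPIDLE, 30)  // start probing after 30s idle
		unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPINTVL, 10) // probe every 10s
		unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPCNT, 3)    // drop after 3 missed probes
	})
	if err != nil {
		log.Fatal(err)
	}
}
```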

@xiang90
Contributor

xiang90 commented May 12, 2016

the duration of the timeout is system-wide

You can tune that too, although it is not very portable and is usually at minute granularity. For etcd, we do application-level keepalive on peer communication.

On the client side, the transport is the responsibility of gRPC. We do application-level keepalive on long-lived streams.

@xiang90
Contributor

xiang90 commented Jun 17, 2016

This is not relevant to the v3 API, which uses a stream to support watch; clients can ask the etcd layer for periodic keepalive (progress) notifications. We do not plan to add a timeout feature to v2 as a workaround either.
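
For reference, a sketch of that v3 mechanism using clientv3's WithProgressNotify option, which asks the server to send periodic watch responses even when nothing has changed (the import path shown is for etcd v3.5; older releases used github.com/coreos/etcd/clientv3):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Ask for periodic progress notifications so an idle watch still produces traffic.
	wch := cli.Watch(context.Background(), "foo", clientv3.WithProgressNotify())
	for resp := range wch {
		if resp.IsProgressNotify() {
			fmt.Println("no events, but the stream is alive; revision:", resp.Header.Revision)
			continue
		}
		for _, ev := range resp.Events {
			fmt.Printf("%s %q = %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```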

@xiang90 xiang90 modified the milestones: unplanned, v3.0.0-maybe Jun 17, 2016
@xiang90
Contributor

xiang90 commented Nov 1, 2016

Closing. v3 cancellation has worked well so far without a timeout option.
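
A sketch of that cancellation from the client side, assuming the same clientv3 setup and imports as the previous sketch: cancelling (or timing out) the watch context closes the watch channel, so the client fully controls how long it waits:

```go
// watchWithClientTimeout shows client-driven cancellation: the caller, not the
// server, bounds how long the watch stays open.
func watchWithClientTimeout(cli *clientv3.Client, key string, d time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	for resp := range cli.Watch(ctx, key) {
		for _, ev := range resp.Events {
			fmt.Printf("%s %q = %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
	// The watch channel closes when ctx expires or is cancelled;
	// the caller decides whether to re-watch or give up.
}
```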

@xiang90 xiang90 closed this as completed Nov 1, 2016
@shuting-yst

@xiang90 so what's the solution? Could you provide some examples of using the v3 watcher to distinguish between "no events" and a crashed server that is unable to reply?
I have run into this problem with the v3 watcher.

```
...
wch := etcdClient.Watch(context.Background(), "/EtcdWatcherTest/TestWatchWithFailover/", opts...)

go func() {
	// shutdown etcd server
	// sleep 120s
	// restart etcd server
	// KV.Put(xxxx)
}()

// wch receives nothing and no error; it just waits for a very long time
...
```
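
One way to make that failure visible with clientv3 is to enable client-side gRPC keepalive and check the watch channel and responses for errors. A hedged sketch (DialKeepAliveTime/DialKeepAliveTimeout are clientv3.Config fields in recent releases; exact behaviour depends on the etcd version):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Combine client-side gRPC keepalive with error checks on the watch channel,
// so a dead or restarted server surfaces as an error instead of a silent hang.
func watchWithFailureDetection(endpoints []string, key string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:            endpoints,
		DialTimeout:          5 * time.Second,
		DialKeepAliveTime:    10 * time.Second, // ping the server when the stream is idle
		DialKeepAliveTimeout: 5 * time.Second,  // treat it as dead if no ack in this window
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	// WithRequireLeader makes the watch fail fast if the member loses its leader.
	wch := cli.Watch(clientv3.WithRequireLeader(context.Background()), key)
	for resp := range wch {
		if err := resp.Err(); err != nil {
			return err // e.g. compacted revision or no leader: caller decides whether to re-watch
		}
		for _, ev := range resp.Events {
			fmt.Printf("%s %q = %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
	// Channel closed without an error event: the context ended or the stream broke.
	return errors.New("watch channel closed; re-dial and re-watch")
}

func main() {
	if err := watchWithFailureDetection([]string{"127.0.0.1:2379"}, "/EtcdWatcherTest/TestWatchWithFailover/"); err != nil {
		fmt.Println("watch ended:", err)
	}
}
```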
