Description
What happened?
I've discovered this issue when debugging watch not retrying in openWatchClient
isHaltErr was returning true on ErrLeaderChanged
This is only applicable when auth is enabled.
w.remote.Watch(w.ctx, w.callOpts...) -> streamClientInterceptor
-> getToken -> Auth.Authenticate
-> toErr
toErr
will convert ErrGRPCLeaderChanged to ErrLeaderChanged, isHaltErr(ErrLeaderChanged)
returns true
When auth is disabled, Auth.Authenticate
isn't called and toErr
conversion doesn't happen.
What did you expect to happen?
expected watch to retry
How can we reproduce it (as minimally and precisely as possible)?
See unit and integration test in the PR.
Note, test is using ErrGRPCNoLeader, but in production we've observed ErrLeaderChanged.
Anything else we need to know?
Another side effect of this issue is that prometheus interceptor will only increment Unknown grpc code in metrics on error.
I suspect this might also affect behavior of retry_interceptor and client/v3/leasing that retries on transient error.
Etcd version (please run commands below)
etcd 3.4, 3.5 and main
$ etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: f43dccb2f
Go Version: go1.19.2
Go OS/Arch: darwin/amd64
$ etcdctl version
etcdctl version: 3.6.0-alpha.0
API version: 3.6
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here
Relevant log output
No response