clientv3: respect up/down notifications from grpc #5845
Conversation
numGets uint32
// mu protects upEps, downEps, and numGets
Maybe put mu sync.Mutex on top of upEps and numGets? And what is downEps? I don't see it in the struct fields.
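For illustration, a minimal sketch of the layout being suggested; the struct name and exact field set here are assumptions, not the PR's actual code:

```go
import "sync"

type simpleBalancer struct {
	// mu protects upEps and numGets; declaring it directly above the
	// fields it guards makes the relationship clear.
	mu      sync.Mutex
	upEps   map[string]struct{} // endpoints currently reported Up by gRPC
	numGets uint32              // counter used to round-robin Get calls
}
```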
comment was outdated; fixed
Force-pushed from 24bed9c to 4085ae1
Force-pushed from 941a85f to a4563d1
Force-pushed from 5414a8b to 38f2cd0
Fixed up to use the balancer for this functionality, since the WithBlock patch was rejected. PTAL /cc @xiang90
v := atomic.AddUint32(&b.numGets, 1)
ep := b.eps[v%uint32(len(b.eps))]
return grpc.Address{Addr: getHost(ep)}, func() {}, nil
b.mu.Lock()
it seems that we need to do more work here?
see comments on Get at https://godoc.org/google.golang.org/grpc#Balancer
Also: https://github.com/grpc/grpc-go/blob/master/balancer.go#L272-L364
I think most of that complication is from making it as general as possible. It's safe to assume FailFast is false so there's no need to implement the suggested convoluted blocking logic.
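A rough sketch of what the simpler Get could look like under that assumption. The struct and field names (simpleBalancer, upEps, upc) echo this discussion but are not the PR's exact code, and the wakeup channel handling is an assumption: upc is taken to be closed by Up() once some endpoint is available and replaced with a fresh channel when every endpoint goes down.

```go
import (
	"context"
	"sync"

	"google.golang.org/grpc"
)

type simpleBalancer struct {
	mu    sync.Mutex
	upEps map[string]struct{} // endpoints currently reported Up
	// upc is assumed to be closed by Up() once some endpoint is up and
	// replaced with a fresh channel when every endpoint goes down.
	upc chan struct{}
}

// Get blocks until at least one endpoint is up; since every call is
// non-fail-fast, the fail-fast branch of the grpc-go reference balancer
// is not needed.
func (b *simpleBalancer) Get(ctx context.Context, opts grpc.BalancerGetOptions) (grpc.Address, func(), error) {
	for {
		b.mu.Lock()
		if len(b.upEps) > 0 {
			// return any up endpoint; a round-robin counter could refine this
			for ep := range b.upEps {
				b.mu.Unlock()
				return grpc.Address{Addr: ep}, func() {}, nil
			}
		}
		upc := b.upc
		b.mu.Unlock()

		select {
		case <-upc: // an endpoint came up; retry the selection above
		case <-ctx.Done():
			return grpc.Address{}, nil, ctx.Err()
		}
	}
}
```

With FailFast effectively always off, grpc just waits inside Get, so failover reduces to Up/down keeping upEps current.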
Do we expose the failfast option to the user now?
no
ok. good.
Force-pushed from 8755254 to 8b53cea
defer b.mu.Unlock()

if b.pinAddr != nil {
	if _, ok := b.upEps[b.pinAddr.Addr]; ok || time.Since(b.pinTime) < b.pinWait {
I am not very clear about this. Why do we need to have a pinWait? If one pinned address is still up, should we use it until it fails? The current pinWait is 500ms, so a new RPC coming after 500ms will choose a new endpoint now?
OK, this code is bad; the dial timeout is already in the grpc dialer and there should only be one connecting / up endpoint at a time. I can simplify it.
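A minimal sketch of that simplification, assuming the pinned endpoint is tracked as a plain string (the PR uses a *grpc.Address; names here are illustrative): keep returning the pinned address until gRPC reports it down, with no pinWait timer at all.

```go
import (
	"sync"

	"google.golang.org/grpc"
)

type simpleBalancer struct {
	mu      sync.Mutex
	pinAddr string // currently pinned endpoint, "" if none
}

// Up is called by gRPC when a connection to addr becomes usable; the
// returned func is invoked when that connection goes down.
func (b *simpleBalancer) Up(addr grpc.Address) func(error) {
	b.mu.Lock()
	if b.pinAddr == "" {
		b.pinAddr = addr.Addr // pin the first endpoint that comes up
	}
	b.mu.Unlock()
	return func(err error) {
		b.mu.Lock()
		if b.pinAddr == addr.Addr {
			b.pinAddr = "" // unpin; the next Get picks a new endpoint
		}
		b.mu.Unlock()
	}
}
```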
Force-pushed from c78bd3f to 83e7ddb
@xiang90 all fixed, PTAL
b.mu.Unlock()
// notify client that a connection is up
select {
case b.upc <- struct{}{}:
It seems like we only need to send this once, when we create the client? Should we rename this to ready and wrap it with sync.Once?
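A quick sketch of that suggestion (readyc and notifyReady are illustrative names): close a ready channel exactly once instead of sending on upc for every connection that comes up.

```go
import "sync"

type simpleBalancer struct {
	readyc    chan struct{} // closed once the first connection is up; make() it at construction
	readyOnce sync.Once
}

// notifyReady is called from Up; sync.Once guarantees the channel is
// closed a single time no matter how many endpoints come up.
func (b *simpleBalancer) notifyReady() {
	b.readyOnce.Do(func() { close(b.readyc) })
}
```

The client can then simply block on <-b.readyc while waiting for its first connection.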
ok
LGTM
There may be a subtle bug in this if …
Force-pushed from 83e7ddb to af6b266
Added a test for the failover case. Now blocked on grpc/grpc-go#810.
Force-pushed from e4946c9 to 7985874
Force-pushed from 7985874 to 9e82185
Force-pushed from 7de19f7 to 11f2b99
Force-pushed from 11f2b99 to 7a9fa02
return func(rpcCtx context.Context, f rpcFunc) {
	for {
		err := f(rpcCtx)
		// ignore grpc conn closing on fail-fast calls; they are transient errors
Is there a dial failure error? Can connclosing happen after writing the request?
There is no explicit dial failure error; the closest thing is the helpfully unexported errConnClosing (which gets grpc.Errorf()'d into a grpc-formatted error). Transport errors seem to either go through ConnectionErrorf or be prefixed with "transport:". I guess the safest policy is to retry only on isConnClosing(err) and bail out otherwise?
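A small sketch of that policy (rpcFunc mirrors the snippet above; isConnClosing is a hypothetical helper, and matching on grpc's error text is an assumption): retry the call only while the client connection is closing, and return every other error unchanged.

```go
import (
	"context"
	"strings"
)

type rpcFunc func(ctx context.Context) error

// isConnClosing is a hypothetical check for grpc's unexported
// errConnClosing, matched here by its error message text.
func isConnClosing(err error) bool {
	return err != nil && strings.Contains(err.Error(), "the client connection is closing")
}

// retryConnClosing wraps f so it is retried only on conn-closing errors;
// any other error (including context cancellation) is returned as-is.
func retryConnClosing(f rpcFunc) rpcFunc {
	return func(ctx context.Context) error {
		for {
			err := f(ctx)
			if err == nil || !isConnClosing(err) {
				return err
			}
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
				// the connection is being torn down; try again on a fresh one
			}
		}
	}
}
```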
Force-pushed from 7a9fa02 to 3eadf96
OK, changed the retry logic to bail if …
lgtm
Partial patch; will need to revendor grpc once the fix on that end is merged.
Fixes #5842