Skip to content

Commit

Permalink
contexthelpers: Fix deadlock when nobody recvs on success channel
Browse files Browse the repository at this point in the history
[ upstream commit 4a66961 ]

It has been observed that the "sessionSuccess <- true" statement can
block forever. E.g. Cilium v1.7.9:

    goroutine 742 [chan send, 547 minutes]:
    github.com/cilium/cilium/pkg/kvstore.(*etcdClient).renewLockSession(0xc000f66000, 0x2790b40, 0xc000e247c0, 0x0, 0x0)
	    /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:657 +0x3e2
    github.com/cilium/cilium/pkg/kvstore.connectEtcdClient.func6(0x2790b40, 0xc000e247c0, 0x3aae820, 0x2)
	    /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:819 +0x3e
    github.com/cilium/cilium/pkg/controller.(*Controller).runController(0xc000f42500)
	    /go/src/github.com/cilium/cilium/pkg/controller/controller.go:205 +0xa2a
    created by github.com/cilium/cilium/pkg/controller.(*Manager).updateController
	    /go/src/github.com/cilium/cilium/pkg/controller/manager.go:120 +0xb09

This can happen when the context cancellation timer fires after a
function which consumes the context has returned, but before
"sessionSuccess <- true" is executed. After the timeout, the receiving
goroutine is closed, making the sending block forever.

Fix this by making the sessionSuccess channel buffered. Note that after
sending we don't check whether the context has been cancelled, as we
expect that any subsequent functions which consume the context will
check for the cancellation.

Fixes: 0262854 ("etcd: Fix incorrect context usage in session renewal")
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
  • Loading branch information
brb authored and gandro committed Oct 7, 2020
1 parent 6644448 commit 415a7f0
Show file tree
Hide file tree
Showing 2 changed files with 6 additions and 2 deletions.
2 changes: 1 addition & 1 deletion pkg/contexthelpers/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ type SuccessChan chan bool
// NewConditionalTimeoutContext returns a context which is cancelled when
// success is not reported within the specified timeout
func NewConditionalTimeoutContext(ctx context.Context, timeout time.Duration) (context.Context, context.CancelFunc, SuccessChan) {
ch := make(SuccessChan)
ch := make(SuccessChan, 1)
c, cancel := context.WithCancel(ctx)

go func() {
Expand Down
6 changes: 5 additions & 1 deletion pkg/contexthelpers/context_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,10 @@ func (b *ContextSuite) TestConditionalTimeoutContext(c *check.C) {
c.Errorf("context cancelled despite reporting success")
case <-time.After(100 * time.Millisecond):
}

cancel()

_, _, ch = NewConditionalTimeoutContext(context.Background(), 10*time.Millisecond)
time.Sleep(30 * time.Millisecond)
// validate that sending to success channel does not deadlock after the timeout
ch <- false
}

0 comments on commit 415a7f0

Please sign in to comment.