wireguard: Fix timeout in unit test
This commit fixes a deadlock in a unit test which ironically tests for
deadlocks. The unit test in question ensures that the `wireguard.Agent`
`UpdatePeer` method does not create a deadlock if a concurrent IPCache
update is performed.

The previous version of this test tried to ensure this by taking a
read lock on the IPCache, so that only `UpdatePeer` would make progress
(as it also just takes an `RLock`). However, that approach could lead to
a timeout when `ipCache.Upsert` was invoked before `wgAgent.UpdatePeer`:
because the underlying mutex implementation queues new readers behind a
waiting writer, `UpdatePeer` would never obtain an `RLock` while the
test itself still held its read lock.
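
The reader-behind-writer behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming the lock in question behaves like Go's standard `sync.RWMutex` (the function name is hypothetical, and `TryRLock` requires Go 1.18 or later). It holds a read lock, lets a writer queue up, and then shows that a second reader can no longer acquire the lock:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// readerBlockedByWaitingWriter (hypothetical name) reports whether a new
// read-lock attempt fails while a writer is queued behind an existing reader.
func readerBlockedByWaitingWriter() bool {
	var mu sync.RWMutex

	// Like the old test, hold a read lock for the whole scenario.
	mu.RLock()

	// A writer (standing in for ipCache.Upsert) queues up behind that read lock.
	writerDone := make(chan struct{})
	go func() {
		mu.Lock()
		mu.Unlock()
		close(writerDone)
	}()

	// Poll until the writer is waiting. As long as no writer is pending,
	// TryRLock succeeds and the extra read lock is released again.
	for mu.TryRLock() {
		mu.RUnlock()
		runtime.Gosched()
	}

	// A second reader (standing in for UpdatePeer) now cannot get the lock,
	// even though only a read lock is currently held.
	blocked := !mu.TryRLock()
	if !blocked {
		mu.RUnlock()
	}

	// Releasing the original read lock lets the writer finish.
	mu.RUnlock()
	<-writerDone
	return blocked
}

func main() {
	fmt.Println("new reader blocked by waiting writer:", readerBlockedByWaitingWriter())
}
```

In the old test the original read lock was only released after `UpdatePeer` returned, so a blocking `RLock` in place of `TryRLock` would wait forever.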

This commit addresses this by taking the `wgAgent` lock instead. This
means that `UpdatePeer` will lock the IPCache and then wait for the
`wgAgent` lock to become available. Any concurrent IPCache updates will
also be blocked until `UpdatePeer` has finished as before.
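
The lock ordering this relies on can be sketched with stand-in types. This is an illustration only: `cache`, `agent`, and `updatePeer` are hypothetical simplifications, not the real `ipcache` or `wireguard.Agent` code, and `TryLock` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// cache and agent are hypothetical stand-ins for the IPCache and the agent.
type cache struct{ mu sync.RWMutex }

type agent struct {
	mu    sync.Mutex
	cache *cache
}

// updatePeer mimics the described lock order: first the cache read lock,
// then the agent lock.
func (a *agent) updatePeer() {
	a.cache.mu.RLock()
	defer a.cache.mu.RUnlock()
	a.mu.Lock()
	defer a.mu.Unlock()
}

// cacheWriteBlockedWhileAgentLocked reports whether cache writers are held
// off while updatePeer waits for the agent lock, and released afterwards.
func cacheWriteBlockedWhileAgentLocked() bool {
	c := &cache{}
	a := &agent{cache: c}

	// Hold the agent lock, as the test does.
	a.mu.Lock()

	done := make(chan struct{})
	go func() {
		a.updatePeer()
		close(done)
	}()

	// Poll until updatePeer holds the cache read lock; from then on a
	// write-lock attempt fails.
	for c.mu.TryLock() {
		c.mu.Unlock()
		runtime.Gosched()
	}
	blocked := !c.mu.TryLock()

	// Releasing the agent lock lets updatePeer finish and drop the read lock.
	a.mu.Unlock()
	<-done

	free := c.mu.TryLock()
	if free {
		c.mu.Unlock()
	}
	return blocked && free
}

func main() {
	fmt.Println("cache writers blocked until updatePeer finished:", cacheWriteBlockedWhileAgentLocked())
}
```

Because the test goroutine holds a plain mutex rather than a read lock on the cache, the order in which the two spawned goroutines run no longer matters for termination.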

This commit also introduces some additional checks to ensure the spawned
goroutines have actually been scheduled. This is still best effort, as
there is no easy way to ensure that a certain method is blocked on a
particular mutex.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro committed May 5, 2021
1 parent 1070b19 commit fc3a3a0
Showing 1 changed file with 40 additions and 5 deletions.
45 changes: 40 additions & 5 deletions pkg/wireguard/agent/agent_test.go
@@ -122,10 +122,31 @@ func (a *AgentSuite) TestAgent_PeerConfig(c *C) {
c.Assert(containsIP(k8s1.allowedIPs, pod2IPv4), Equals, true)
c.Assert(containsIP(k8s1.allowedIPs, pod2IPv6), Equals, true)

// Test that IPCache updates are blocked by a concurrent UpdatePeer
ipCache.RLock()
// Tests that IPCache updates are blocked by a concurrent UpdatePeer.
// We test this by issuing an UpdatePeer request while holding
// the agent lock (meaning the UpdatePeer call will first take the IPCache
// lock and then wait for the agent lock to become available),
// then issuing an IPCache update (which will be blocked because
// UpdatePeer already holds the IPCache lock), and then releasing the
// agent lock to allow both operations to proceed.
wgAgent.Lock()

agentUpdated := make(chan struct{})
agentUpdatePending := make(chan struct{})
go func() {
close(agentUpdatePending)
err = wgAgent.UpdatePeer(k8s2NodeName, k8s2PubKey, k8s2NodeIPv4, k8s2NodeIPv6)
c.Assert(err, IsNil)
close(agentUpdated)
}()

// wait for the above go routine to be scheduled
<-agentUpdatePending

ipCacheUpdated := make(chan struct{})
ipCacheUpdatePending := make(chan struct{})
go func() {
close(ipCacheUpdatePending)
// Insert pod3
ipCache.Upsert(pod3IPv4Str, k8s2NodeIPv4, 0, nil, ipcache.Identity{ID: 3, Source: source.Kubernetes})
ipCache.Upsert(pod3IPv6Str, k8s2NodeIPv6, 0, nil, ipcache.Identity{ID: 3, Source: source.Kubernetes})
@@ -138,10 +159,24 @@ func (a *AgentSuite) TestAgent_PeerConfig(c *C) {
close(ipCacheUpdated)
}()

err = wgAgent.UpdatePeer(k8s2NodeName, k8s2PubKey, k8s2NodeIPv4, k8s2NodeIPv6)
c.Assert(err, IsNil)
// wait for the above go routine to be scheduled
<-ipCacheUpdatePending

// At this point we know both go routines have been scheduled. We assume
// that they are now both blocked by checking they haven't closed the
// channel yet. Thus once we release the lock we expect them to make progress
select {
case <-agentUpdated:
c.Fatal("agent update not blocked by agent lock")
case <-ipCacheUpdated:
c.Fatal("ipcache update not blocked by agent lock")
default:
}

wgAgent.Unlock()

ipCache.RUnlock()
// Ensure that both operations succeeded without a deadlock
<-agentUpdated
<-ipCacheUpdated

k8s1 = wgAgent.peerByNodeName[k8s1NodeName]