node-neigh: Fix unit test flake #16072
We don't return early if arping was skipped. This can happen when insertNeighbor() is invoked by the non-refresh path and nexthop is not new. Make sure that lastPing is updated only if arping was sent and it was successful (if hwAddr != nil condition).
Signed-off-by: Martynas Pumputis <m@lambda.lt>
test-runtime
test-net-next
err = linuxNodeHandler.NodeAdd(nodev1)
c.Assert(err, check.IsNil)
// insertNeighbor is invoked async
The same comment appears a few lines below. Was this maybe forgotten from an earlier version?
Removed redundant comments.
@@ -1026,18 +1027,41 @@ func (s *linuxPrivilegedIPv4OnlyTestSuite) TestArpPingHandling(c *check.C) {
})
c.Assert(err, check.IsNil)

wait := func(now time.Time, nodeID nodeTypes.Identity, isDelete bool) {
err := testutils.WaitUntil(func() bool {
From my reading of the code, `now` is a lower bound for when the operation happened.
However, it is used only for insertion and not deletion (since for the latter, we do not have a deletion timestamp to check).
Given this, and the overall structure of the function, I wonder if it would be better to have two versions of this function: one for deletes and one for inserts.
I think this would also improve readability on the caller side (at the cost of some code duplication).
> From my reading of the code, `now` is a lower bound for when the operation happened.
Did s/now/before/. I hope it's clearer now.
> However, it is used only for insertion and not deletion
Made `before` a `*time.Time`, so that nil can be passed when waiting for a deletion.
> Given this, and the overall structure of the function, I wonder if it would be better to have two versions of this function: one for deletes and one for inserts.
I want this waiting routine to be concise to avoid polluting the test case, and to follow the DRY principle.
OK, sounds good.
We can inspect the neighLastPingByNextHop map to check when insertNeighbor() or deleteNeighbor() was called.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
90dc1bf to 984ae0e
It's possible that in the case of multiple concurrent insertNeighbor() executions the oldest (or older) goroutine will overwrite the latest arping result due to the fine-grained locking. To fix this, avoid updating neigh entry if we detect that prev last ping timestamp is after our arping timestamp.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
test-net-next
test-runtime
Last patch looks good to me as well.
Out of curiosity, was this triggered because of the small refresh period:
option.Config.ARPPingRefreshPeriod = time.Duration(1 * time.Nanosecond)
Nope, I've set such a low refresh period to make sure that updates from the periodic goroutines (scheduled in the unit tests) don't return early due to the last ping check.
See commit msgs.
Fix #16040
Fix #16075