Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eni: Fix node manager test #11773

Merged
merged 1 commit into from May 29, 2020
Merged

eni: Fix node manager test #11773

merged 1 commit into from May 29, 2020

Conversation

errordeveloper
Copy link
Contributor

@errordeveloper errordeveloper commented May 29, 2020

The test was failing due resulting values being greater then expected,
however it is evident that minAllocate is a minimum value and should
be treated as such.

Fixed #11560.

The test was failing due resulting values being greater then expected,
however it is evident that `minAllocate` is a minimum value and should
be treated as such.

Fixes: #11560

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
@errordeveloper errordeveloper added area/CI Continuous Integration testing issue or flake area/eni Impacts ENI based IPAM. labels May 29, 2020
@errordeveloper errordeveloper requested a review from a team as a code owner May 29, 2020 12:24
@maintainer-s-little-helper

This comment has been minimized.

2 similar comments
@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@errordeveloper
Copy link
Contributor Author

errordeveloper commented May 29, 2020

Having fixed the first issues, I am still seeing it fail on something else:

FAIL: node_manager_test.go:526: ENISuite.TestNodeManagerExceedENICapacity

node_manager_test.go:556:
    c.Assert(node.Stats().AvailableIPs, check.Equals, 45)
... obtained int = 20
... expected int = 45

But I cannot really tell what's going on, maybe the tests ends up looking at the wrong actual node or something like that? Or perhaps another assertion is missing at an earlier stage? I know virtually nothing about this code, but I'd say I do understand the domain of ENI IPAM.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.004%) to 36.839% when pulling 4777edd on pr/errordeveloper/fix-11560 into 05daeef on master.

@errordeveloper
Copy link
Contributor Author

errordeveloper commented May 29, 2020

On another several attempts, different values, this time higher then expected:

FAIL: node_manager_test.go:526: ENISuite.TestNodeManagerExceedENICapacity

node_manager_test.go:556:
    c.Assert(node.Stats().AvailableIPs, check.Equals, 45)
... obtained int = 54
... expected int = 45

It looks like this is not random.

Looks like 20 relates to this:

newCiliumNode("node2", "i-testNodeManagerExceedENICapacity-1", "m4.xlarge", "us-west-1", "vpc-1", 1, 8, 20, 0)

However, 54 doesn't seem to relate to anything and that's what I'm seeing most often.

@christarazi
Copy link
Member

I think we should remove that this PR fixes the issue because the underlying cause doesn't seem to be fixed. From my previous investigation, I noticed that when a failure occurs, there's always a message like:

Instance not found! ...

which is coming from here:

scopedLog.Warning("Instance not found! Please delete corresponding ciliumnode if instance has already been deleted.")

and also from here:

scopedLog.Warning("Instance not found! Please delete corresponding ciliumnode if instance has already been deleted.")

I think the cause of these failures is that the some Stats fields are zeroed out:

		scopedLog.Warning("Instance not found! Please delete corresponding ciliumnode if instance has already been deleted.")
		// Avoid any further action
		n.stats.NeededIPs = 0
		n.stats.ExcessIPs = 0

when the condition that causes the log message to occur is hit. I can't seem to put a finger on why it happens, as most runs it works just fine. I think there's some more raciness going on with the async triggers / reconciliation code.

@christarazi
Copy link
Member

On second thought, if the new failure you mentioned @errordeveloper is occurring less often, then we could merge this fix and continue investigating.

@errordeveloper
Copy link
Contributor Author

@christarazi yes, I thin there is some sort of raciness for sure!

@christarazi
Copy link
Member

Great, merging and will keep a note on the original issue that it may come back.

@christarazi christarazi merged commit f719893 into master May 29, 2020
1.8.0 automation moved this from In progress to Merged May 29, 2020
@christarazi christarazi deleted the pr/errordeveloper/fix-11560 branch May 29, 2020 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake area/eni Impacts ENI based IPAM. release-note/ci This PR makes changes to the CI.
Projects
No open projects
1.8.0
  
Merged
Development

Successfully merging this pull request may close these issues.

CI: Travis: ENISuite.TestNodeManagerManyNodes fails
6 participants