
v1.8 backports 2020-10-14 #13564

Merged
merged 8 commits into v1.8 from pr/v1.8-backport-2020-10-14 on Oct 15, 2020

Conversation

Member

@rolinh rolinh commented Oct 14, 2020

Once this PR is merged, you can update the PR labels via:

$ for pr in 13517 13488 13534 13545 13532 13476 13560; do contrib/backporting/set-labels.py $pr done 1.8; done

aanm and others added 8 commits October 14, 2020 16:22
[ upstream commit f6e42c4 ]

In case of an error, the node manager mutex was never unlocked, which
would then cause a deadlock on this structure.

Fixes: 2eb51a3 ("eni: Refactor ENI IPAM into generic ipam.NodeManager")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
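
For illustration, a minimal Go sketch of the pattern behind this fix (the type and method names are placeholders, not the actual Cilium code): acquiring the lock and immediately deferring the unlock guarantees that error returns also release the mutex.

```go
package nodemanager

import (
	"errors"
	"sync"
)

var errNotFound = errors.New("node not found")

// NodeManager is an illustrative stand-in for the structure whose mutex
// was left locked on the error path.
type NodeManager struct {
	mutex sync.Mutex
	nodes map[string]struct{}
}

// Resync locks the mutex and immediately defers the unlock, so every
// return path, including the error path, releases it. Returning early
// without unlocking is the kind of bug that caused the deadlock.
func (m *NodeManager) Resync(name string) error {
	m.mutex.Lock()
	defer m.mutex.Unlock()

	if _, ok := m.nodes[name]; !ok {
		return errNotFound // the deferred Unlock still runs here
	}
	return nil
}
```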
[ upstream commit 1662955 ]

Fixes the following potential deadlock:

```
POTENTIAL DEADLOCK: Inconsistent locking. saw this ordering in one goroutine:
happened before
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/azure/ipam/node.go:162 ipam.(*Node).ResyncInterfacesAndIPs { n.manager.mutex.RLock() }
pkg/ipam/node.go:358 ipam.(*Node).recalculate { a, err := n.ops.ResyncInterfacesAndIPs(context.TODO(), scopedLog) }
pkg/ipam/node.go:334 ipam.(*Node).UpdatedResource { n.recalculate() }
pkg/ipam/node_manager.go:315 ipam.(*NodeManager).Update { return node.UpdatedResource(resource) }
pkg/azure/ipam/ipam_test.go:345 ipam.(*IPAMSuite).TestIpamManyNodes { mngr.Update(state[i].cn) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:475 reflect.Value.call { call(frametype, fn, args, uint32(frametype.size), uint32(retOffset)) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 reflect.Value.Call { return v.call("Call", in) }
pkg/../vendor/gopkg.in/check.v1/check.go:781 check%2ev1.(*suiteRunner).forkTest.func1 { c.method.Call([]reflect.Value{reflect.ValueOf(c)}) }
pkg/../vendor/gopkg.in/check.v1/check.go:675 check%2ev1.(*suiteRunner).forkCall.func1 { dispatcher(c) }
happened after
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/ipam/types/types.go:340 types.(*InstanceMap).ForeachAddress { m.mutex.RLock() }
pkg/azure/ipam/node.go:164 ipam.(*Node).ResyncInterfacesAndIPs { n.manager.instances.ForeachAddress(n.node.InstanceID(), func(instanceID, interfaceID, ip, poolID string, addressObj ipamTypes.Address) error { }
pkg/ipam/node.go:358 ipam.(*Node).recalculate { a, err := n.ops.ResyncInterfacesAndIPs(context.TODO(), scopedLog) }
pkg/ipam/node.go:334 ipam.(*Node).UpdatedResource { n.recalculate() }
pkg/ipam/node_manager.go:315 ipam.(*NodeManager).Update { return node.UpdatedResource(resource) }
pkg/azure/ipam/ipam_test.go:345 ipam.(*IPAMSuite).TestIpamManyNodes { mngr.Update(state[i].cn) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:475 reflect.Value.call { call(frametype, fn, args, uint32(frametype.size), uint32(retOffset)) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 reflect.Value.Call { return v.call("Call", in) }
pkg/../vendor/gopkg.in/check.v1/check.go:781 check%2ev1.(*suiteRunner).forkTest.func1 { c.method.Call([]reflect.Value{reflect.ValueOf(c)}) }
pkg/../vendor/gopkg.in/check.v1/check.go:675 check%2ev1.(*suiteRunner).forkCall.func1 { dispatcher(c) }
in another goroutine: happened before
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/ipam/types/types.go:381 types.(*InstanceMap).ForeachInterface { m.mutex.RLock() }
pkg/azure/ipam/node.go:74 ipam.(*Node).PrepareIPAllocation { err = n.manager.instances.ForeachInterface(n.node.InstanceID(), func(instanceID, interfaceID string, interfaceObj ipamTypes.InterfaceRevision) error { }
pkg/ipam/node.go:548 ipam.(*Node).determineMaintenanceAction { a.allocation, err = n.ops.PrepareIPAllocation(scopedLog) }
pkg/ipam/node.go:584 ipam.(*Node).maintainIPPool { a, err := n.determineMaintenanceAction() }
pkg/ipam/node.go:678 ipam.(*Node).MaintainIPPool { err := n.maintainIPPool(ctx) }
pkg/ipam/node_manager.go:272 ipam.(*NodeManager).Update.func1 { if err := node.MaintainIPPool(context.TODO()); err != nil { }
pkg/trigger/trigger.go:206 trigger.(*Trigger).waiter { t.params.TriggerFunc(reasons) }
happened after
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/azure/ipam/instances.go:75 ipam.(*InstancesManager).FindSubnetForAllocation { m.mutex.RLock() }
pkg/azure/ipam/node.go:115 ipam.(*Node).PrepareIPAllocation.func1 { poolID, available := n.manager.FindSubnetForAllocation(preferredPoolIDs) }
pkg/ipam/types/types.go:364 types.foreachInterface { if err := fn(instanceID, rev.Resource.InterfaceID(), rev); err != nil { }
pkg/ipam/types/types.go:386 types.(*InstanceMap).ForeachInterface { return foreachInterface(instanceID, instance, fn) }
pkg/azure/ipam/node.go:74 ipam.(*Node).PrepareIPAllocation { err = n.manager.instances.ForeachInterface(n.node.InstanceID(), func(instanceID, interfaceID string, interfaceObj ipamTypes.InterfaceRevision) error { }
pkg/ipam/node.go:548 ipam.(*Node).determineMaintenanceAction { a.allocation, err = n.ops.PrepareIPAllocation(scopedLog) }
pkg/ipam/node.go:584 ipam.(*Node).maintainIPPool { a, err := n.determineMaintenanceAction() }
pkg/ipam/node.go:678 ipam.(*Node).MaintainIPPool { err := n.maintainIPPool(ctx) }
pkg/ipam/node_manager.go:272 ipam.(*NodeManager).Update.func1 { if err := node.MaintainIPPool(context.TODO()); err != nil { }
pkg/trigger/trigger.go:206 trigger.(*Trigger).waiter { t.params.TriggerFunc(reasons) }
```

Fixes: 24cb061 ("azure: Calculate available addresses based on subnet resource")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
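
For context, a generic Go sketch of the inconsistent lock ordering the debugger reports above, using placeholder names rather than the real Cilium types. One common remedy, shown at the end, is to copy what you need under the first lock and release it before taking the second.

```go
package lockorder

import "sync"

// Placeholder locks standing in for the node-manager and instance-map
// mutexes from the trace above.
var (
	managerMu   sync.RWMutex
	instancesMu sync.RWMutex
)

// One path locks managerMu, then instancesMu...
func resync() {
	managerMu.RLock()
	defer managerMu.RUnlock()

	instancesMu.RLock() // order: manager -> instances
	defer instancesMu.RUnlock()
}

// ...while another path locks instancesMu, then managerMu. The inverted
// order across goroutines is what the lock debugger flags as a potential
// deadlock.
func prepareAllocation() {
	instancesMu.RLock()
	defer instancesMu.RUnlock()

	managerMu.RLock() // order: instances -> manager (inverted)
	defer managerMu.RUnlock()
}

// A common remedy: copy the data needed under the first lock, release it,
// and only then take the second lock, so no goroutine holds both at once.
func prepareAllocationFixed() []string {
	instancesMu.RLock()
	snapshot := []string{"interface-0"} // copy what is needed
	instancesMu.RUnlock()

	managerMu.RLock()
	defer managerMu.RUnlock()
	return snapshot
}
```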
[ upstream commit 94fd90d ]

The official kubeadm documentation [0] used to have instructions for
deploying various network providers, including Cilium, but it no longer
does. This change adds instructions for deploying Cilium on
kubeadm-managed clusters to our documentation.

[0] https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Fixes #13278

Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
[ upstream commit a5d2a4c ]

Issue: #13533
Signed-off-by: Vadim Ponomarev <velizarx@gmail.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
[ upstream commit b2e58d7 ]

Fixes: #13544
Signed-off-by: Vadim Ponomarev <velizarx@gmail.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
[ upstream commit 8f0e7fa ]

Currently, during agent startup, Cilium removes XDP from all
interfaces except `cilium_host`, `cilium_net` and `$XDP_DEV`,
regardless of whether an XDP program is attached to them.

For some drivers, e.g. Mellanox mlx5, the command
`ip link set dev $DEV xdpdrv off` causes a device reset even when no
XDP program is attached, which interrupts node and pod networking.

This patch checks for an attached XDP program before removal to avoid
such network interruption.

Fixes: #13526
Reported-by: ArthurChiao <arthurchiao@hotmail.com>
Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
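
As a hedged sketch of the check described here, written against the github.com/vishvananda/netlink package rather than the agent's actual startup code (the function name is illustrative): inspect the link's XDP attachment state and only detach when a program is really present.

```go
package xdpcheck

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// maybeRemoveXDP detaches XDP from dev only if a program is attached,
// so drivers such as mlx5 are not reset needlessly.
func maybeRemoveXDP(dev string) error {
	link, err := netlink.LinkByName(dev)
	if err != nil {
		return fmt.Errorf("lookup %s: %w", dev, err)
	}
	xdp := link.Attrs().Xdp
	if xdp == nil || !xdp.Attached {
		// No XDP program attached: skip removal and avoid the device reset.
		return nil
	}
	// Rough equivalent of `ip link set dev $DEV xdpdrv off`; fd -1 detaches.
	return netlink.LinkSetXdpFd(link, -1)
}
```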
[ upstream commit 3453877 ]

Before this commit, NAT would blindly rewrite L4 ports on non-first
IPv4 fragments, producing wrong checksums.

This also fixes the same issue in the reverse NAT path for DSR.

Signed-off-by: Yuan Liu <liuyuan@google.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
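
The change itself lives in the datapath's BPF code; purely as an illustration of the condition involved, here is a small Go sketch (a placeholder function, not Cilium code) of how a non-first IPv4 fragment is recognized: a non-zero fragment offset means the packet carries no L4 header, so its "ports" must not be rewritten.

```go
package fragcheck

import "encoding/binary"

// isNonFirstIPv4Fragment reports whether an IPv4 header describes a
// non-first fragment. Such packets carry no L4 header, so rewriting
// "ports" in them corrupts payload bytes and breaks the checksum.
func isNonFirstIPv4Fragment(ipv4Header []byte) bool {
	if len(ipv4Header) < 20 {
		return false // not a complete IPv4 header
	}
	// Bytes 6-7 hold 3 flag bits followed by a 13-bit fragment offset
	// (in 8-byte units). A non-zero offset marks a non-first fragment.
	fragField := binary.BigEndian.Uint16(ipv4Header[6:8])
	return fragField&0x1FFF != 0
}
```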
[ upstream commit e38b336 ]

This issue affects zsh users: bash doesn't attempt to expand the
`.SecurityGroups[0].GroupId` expression as a glob, while zsh does,
resulting in an error like this:

zsh: no matches found: .SecurityGroups[0].GroupId
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe

Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com>
Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
@rolinh rolinh added kind/backports This PR provides functionality previously merged into master. backport/1.8 labels Oct 14, 2020
@rolinh rolinh requested a review from a team as a code owner October 14, 2020 14:26
Member Author

rolinh commented Oct 14, 2020

test-backport-1.8

Member

@aanm aanm left a comment

Looks good for my commits. Thanks!

Member

@vadorovsky vadorovsky left a comment

Looks good for my commit. Thanks!

Contributor

@errordeveloper errordeveloper left a comment

My changes look good, thanks for backporting!

Contributor

@velp velp left a comment

My changes look good also, thanks for backporting!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 15, 2020
@tklauser tklauser merged commit ab575b1 into v1.8 Oct 15, 2020
@tklauser tklauser deleted the pr/v1.8-backport-2020-10-14 branch October 15, 2020 07:35