v1.10 backports 2021-06-22 #16614

Merged
merged 14 commits into cilium:v1.10 on Jun 25, 2021

Conversation

aditighag
Member

@aditighag aditighag commented Jun 22, 2021

Skipped due to conflicts -

Once this PR is merged, you can update the PR labels via:

$ for pr in 16545 16434 16523 16563 16530 16604 16578 16557 16548 16529 16509; do contrib/backporting/set-labels.py $pr done 1.10; done

aanm and others added 14 commits June 22, 2021 05:32
[ upstream commit 286a900 ]

The script got broken with the introduction of CRD alphav1 which
contains another occurrence of the schema version. To handle this, the
script will take into account the first occurrence of the schema
version under 'pkg/k8s'.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 67b946d ]

Fixes: 09d9e1e ("policy: Disable well-known identities for non-managed etcd")

Signed-off-by: Mauricio Vásquez <mauricio@accuknox.com>
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 0c9d55e ]

Following cilium/metallb#4, Cilium is now
tracking the code from the v0.9.6 branch of cilium/metallb:
https://github.com/cilium/metallb/tree/v0.9.6

This was done in a backwards-compatible way to ensure that older
versions of Cilium can still build by avoiding the invalidation of the
previous commit SHA (40d425d20241).

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 8b3f009 ]

Fixes: cilium#16549

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit db06a64 ]

Log the correct field for HostIP.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 27122d4 ]

Example trace seen in dmesg:

  [...]
  [ 7710.165608] enp10s0f0np0: hw csum failure
  [ 7710.165621] skb len=84 headroom=78 headlen=84 tailroom=30
                 mac=(64,14) net=(78,20) trans=98
                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
                 csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0)
                 hash(0x14006e3a sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
  [ 7710.165631] dev name=enp10s0f0np0 feat=0x0x0032b18217514ba9
  [ 7710.165635] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165638] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165641] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165644] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165646] skb headroom: 00000040: b8 ce f6 05 e7 62 b8 ce f6 05 e7 76 08 00
  [ 7710.165649] skb linear:   00000000: 45 00 00 54 8a 07 00 00 40 01 84 e8 c0 a8 a0 04
  [ 7710.165652] skb linear:   00000010: 0a 9a 00 73 00 00 23 57 00 f8 15 db cd 74 d0 60
  [ 7710.165654] skb linear:   00000020: 00 00 00 00 5c 2d 0d 00 00 00 00 00 10 11 12 13
  [ 7710.165657] skb linear:   00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23
  [ 7710.165660] skb linear:   00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33
  [ 7710.165663] skb linear:   00000050: 34 35 36 37
  [ 7710.165665] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165668] skb tailroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165672] CPU: 26 PID: 0 Comm: swapper/26 Not tainted 5.13.0-rc3+ #174
  [ 7710.165674] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F22 08/20/2020
  [ 7710.165676] Call Trace:
  [ 7710.165677]  <IRQ>
  [ 7710.165680]  dump_stack+0x7d/0x9c
  [ 7710.165683]  netdev_rx_csum_fault.part.0+0x41/0x45
  [ 7710.165686]  netdev_rx_csum_fault.cold+0xb/0x10
  [ 7710.165687]  __skb_checksum_complete+0xdd/0xf0
  [ 7710.165690]  ? skb_send_sock_locked+0x20/0x20
  [ 7710.165692]  ? reqsk_fastopen_remove+0x190/0x190
  [ 7710.165693]  nf_ip_checksum+0x5b/0x120
  [ 7710.165697]  nf_conntrack_icmpv4_error+0x112/0x160 [nf_conntrack]
  [ 7710.165706]  nf_conntrack_in.cold+0x1d/0x74 [nf_conntrack]
  [ 7710.165714]  ? nft_do_chain_inet_ingress+0x280/0x2e0 [nf_tables]
  [ 7710.165722]  ipv4_conntrack_in+0x14/0x20 [nf_conntrack]
  [ 7710.165731]  nf_hook_slow+0x44/0xb0
  [ 7710.165733]  nf_hook_slow_list+0x71/0xf0
  [ 7710.165735]  ip_sublist_rcv+0x1d1/0x1f0
  [ 7710.165737]  ? ip_sublist_rcv+0x1f0/0x1f0
  [ 7710.165739]  ip_list_rcv+0xf5/0x120
  [ 7710.165741]  __netif_receive_skb_list_core+0x228/0x250
  [ 7710.165745]  netif_receive_skb_list_internal+0x1a1/0x2b0
  [ 7710.165747]  napi_complete_done+0x7a/0x1b0
  [ 7710.165749]  mlx5e_napi_poll+0x16e/0x730 [mlx5_core]
  [ 7710.165795]  __napi_poll+0x31/0x170
  [ 7710.165796]  net_rx_action+0x22f/0x280
  [ 7710.165798]  __do_softirq+0xce/0x281
  [ 7710.165800]  irq_exit_rcu+0xa2/0xd0
  [ 7710.165803]  common_interrupt+0x8d/0xa0
  [ 7710.165805]  </IRQ>
  [ 7710.165806]  asm_common_interrupt+0x1e/0x40
  [ 7710.165808] RIP: 0010:cpuidle_enter_state+0xcc/0x360
  [...]

The trace was only reproducible with NICs using CHECKSUM_COMPLETE as
csum type for inbound packets. It has been observed with mlx5, for
example. The hw csum failure was only reproducible under the following
conditions:

 - Protocol is ICMP, e.g. triggered by Cilium health probe packets
 - Pod from one node was pinging a remote node address
 - BPF based masquerading was used to SNAT Pod IP to node IP
 - BPF NAT engine found a collision in the NAT table such that
   it was forced to select a different ICMP id, and hence caused
   L4 rewrites

In the case of ICMPv4, the bug was that BPF_F_PSEUDO_HDR was used for
updating the L4 checksum. However, ICMPv4 does not have a pseudo
header; only ICMPv6 does. The packet-based csum was okay either way, but
the flag resulted in a buggy skb->csum. Setting the flag to 0 for
ICMPv4 stopped the hw csum traces.
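
To illustrate why no pseudo header is involved for ICMPv4, here is a minimal user-space sketch in Go. It is not the Cilium BPF datapath code and does not model skb->csum; the helper names (`checksum`, `buildEcho`, `rewriteEchoID`, `echoID`) are made up for this example. It computes the RFC 1071 checksum over the ICMP message alone and patches it incrementally (RFC 1624) when the echo identifier is rewritten, as a NAT engine would have to do on an ID collision.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fold reduces a 32-bit accumulator to a 16-bit ones'-complement sum.
func fold(sum uint32) uint16 {
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return uint16(sum)
}

// checksum computes the RFC 1071 Internet checksum over b. For ICMPv4 the
// input is the ICMP message alone; no IP pseudo header is included.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8 // pad the trailing odd byte with zero
	}
	return ^fold(sum)
}

// buildEcho returns an ICMPv4 echo request (illustrative helper) with a
// valid checksum in bytes 2-3.
func buildEcho(id, seq uint16, payload []byte) []byte {
	msg := make([]byte, 8+len(payload))
	msg[0] = 8 // type 8: echo request; code and checksum start as zero
	binary.BigEndian.PutUint16(msg[4:6], id)
	binary.BigEndian.PutUint16(msg[6:8], seq)
	copy(msg[8:], payload)
	binary.BigEndian.PutUint16(msg[2:4], checksum(msg))
	return msg
}

// echoID reads the echo identifier from bytes 4-5.
func echoID(msg []byte) uint16 {
	return binary.BigEndian.Uint16(msg[4:6])
}

// rewriteEchoID replaces the identifier and patches the checksum using
// RFC 1624: HC' = ~(~HC + ~m + m'). Only the changed 16-bit word enters
// the update, because the ICMPv4 checksum covers nothing outside the message.
func rewriteEchoID(msg []byte, newID uint16) {
	oldID := echoID(msg)
	oldSum := binary.BigEndian.Uint16(msg[2:4])
	sum := uint32(^oldSum) + uint32(^oldID) + uint32(newID)
	binary.BigEndian.PutUint16(msg[2:4], ^fold(sum))
	binary.BigEndian.PutUint16(msg[4:6], newID)
}

func main() {
	msg := buildEcho(0x1234, 1, []byte("ping"))
	rewriteEchoID(msg, 0xbeef)
	// A message with a correct checksum sums to zero when re-checked.
	fmt.Println("checksum still valid:", checksum(msg) == 0)
}
```

Re-running `checksum` over the rewritten message, checksum field included, yields zero when the incremental update is correct.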

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 128f0f8 ]

In some cases, WaitUntil() effectively acts as a DoS tool.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 8260f9d ]

The test became notoriously flaky. It seems that some goroutines were
lagging behind with the updates and were overwriting the new MAC
addr entry with the obsolete one.

To fix this, retry multiple times until the correct entry is found.
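
The retry approach can be sketched roughly as follows. This is a generic illustration, not the actual test code: `retryUntil`, the attempt count, the interval, and the fake lookup are all assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryUntil calls check up to attempts times, sleeping interval between
// tries, and returns nil as soon as check reports success.
func retryUntil(attempts int, interval time.Duration, check func() bool) error {
	for i := 0; i < attempts; i++ {
		if check() {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return errors.New("condition not met after all retries")
}

func main() {
	// Hypothetical stand-in for a table lookup that briefly returns a
	// stale MAC address before the correct entry becomes visible.
	const want = "aa:bb:cc:dd:ee:ff"
	calls := 0
	lookup := func() string {
		calls++
		if calls < 3 {
			return "00:00:00:00:00:01" // obsolete entry
		}
		return want
	}

	err := retryUntil(5, 10*time.Millisecond, func() bool {
		return lookup() == want
	})
	fmt.Println("error:", err, "lookups:", calls)
}
```

Bounding the number of attempts keeps a genuinely broken state from hanging the test while still tolerating a short window of stale entries.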

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 4c4a5dc ]

The change is probably a no-op, but it should improve the last ping
timestamp precision.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit d42614e ]

Five minutes after IPsec key rotations, we clean up the old IPsec state
and print the following message:

    level=info msg="New encryption keys reclaiming SPI" spi=0 subsys=ipsec

Unfortunately, due to a bug the SPI was always 0 in that log message.
This commit changes it and also logs the old SPI value if we have it:

    level=info msg="New encryption keys reclaiming SPI" SPI=7 oldSPI=0 subsys=ipsec

Fixes: 3f12fb6 ("cilium: ipsec, add cleanup xfrm routine")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit a7d73e4 ]

Previously, we were restoring the original clusterIP
service even when the service was deleted.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 92d851d ]

The `deletePolicyService` function was previously
common to both delete policy and delete service callbacks.
Refactor the logic to pass the policy config directly, thereby
skipping the config lookup.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit a75599d ]

Make IdentitySelectionUpdated() callbacks lock-free by queueing them
while still holding the selectorcache lock (to keep FIFO order) and
calling them from a goroutine that holds no locks. This prevents
deadlocks caused by the implementation of IdentitySelectionUpdated()
taking locks such as endpoint or selectorcache locks.
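
The pattern described above can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the selectorcache implementation: the `notifier` type, its fields, and its method names are invented for this example, with `mu` standing in for the selectorcache lock.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical stand-in for a cache that must notify users
// about updates without holding its own lock during the callback.
type notifier struct {
	mu      sync.Mutex // stands in for the selectorcache lock
	version int        // state guarded by mu

	qmu    sync.Mutex // guards queue and closed only
	qcond  *sync.Cond
	queue  []func()
	closed bool
	done   chan struct{}
}

func newNotifier() *notifier {
	n := &notifier{done: make(chan struct{})}
	n.qcond = sync.NewCond(&n.qmu)
	go n.run()
	return n
}

// run drains the queue in FIFO order and invokes each callback with no
// locks held, so callbacks may freely take mu or other locks.
func (n *notifier) run() {
	defer close(n.done)
	for {
		n.qmu.Lock()
		for len(n.queue) == 0 && !n.closed {
			n.qcond.Wait()
		}
		if len(n.queue) == 0 {
			n.qmu.Unlock()
			return
		}
		cb := n.queue[0]
		n.queue = n.queue[1:]
		n.qmu.Unlock()
		cb()
	}
}

// Update changes state under mu and enqueues the callback before mu is
// released, so queue order always matches update order.
func (n *notifier) Update(cb func(version int)) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.version++
	v := n.version
	n.qmu.Lock()
	n.queue = append(n.queue, func() { cb(v) })
	n.qmu.Unlock()
	n.qcond.Signal()
}

// Version takes mu; calling it from a callback would deadlock if the
// callback were invoked while Update still held mu.
func (n *notifier) Version() int {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.version
}

// Close lets the worker drain the remaining callbacks and waits for it.
func (n *notifier) Close() {
	n.qmu.Lock()
	n.closed = true
	n.qmu.Unlock()
	n.qcond.Signal()
	<-n.done
}

func main() {
	n := newNotifier()
	for i := 0; i < 3; i++ {
		n.Update(func(v int) {
			fmt.Println("update", v, "delivered; current version is", n.Version())
		})
	}
	n.Close()
}
```

The queue is an unbounded slice rather than a buffered channel so that `Update` never blocks on a full queue while holding `mu`, which would reintroduce the deadlock the pattern is meant to avoid.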

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 876e9db ]

If hubble-ca-secret already exists, then certgen is
going to update it.

To let certgen do its job, we need to configure the update
verb in the bound ClusterRole; otherwise it will fail
with a `cannot update resource "secrets" in API group`
message.

Fixes: cilium#16508

Signed-off-by: Alex Szakaly <alex.szakaly@gmail.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
@aditighag aditighag requested a review from a team as a code owner June 22, 2021 05:44
@aditighag aditighag added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels Jun 22, 2021
@aditighag
Member Author

test-backport-1.10

Member

@pchaigno pchaigno left a comment

My changes look good 👍

Member

@aanm aanm left a comment

Looks good for my commit

Member

@brb brb left a comment

LGTM for my PR, thanks.

@aditighag aditighag closed this Jun 23, 2021
@aditighag aditighag reopened this Jun 23, 2021
@aditighag
Member Author

aditighag commented Jun 23, 2021

test-backport-1.10

https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-5.4/440/console

Fetching changes from the remote Git repository
git config remote.origin.url https://github.com/cilium/cilium # timeout=10
Fetching upstream changes from https://github.com/cilium/cilium
git --version # timeout=10
using GIT_ASKPASS to set credentials Used for GH PR Commit Status
git fetch --tags --progress https://github.com/cilium/cilium +refs/pull/:refs/remotes/origin/pr/ --depth=1 # timeout=20
ERROR: Checkout failed
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)

@aditighag
Member Author

test-backport-1.10

@aditighag
Member Author

aditighag commented Jun 24, 2021

test-backport-1.10

15:41:59      box: Progress: 80% (Rate: 207M/s, Estimated time remaining: 0:00:09)
15:41:59  An error occurred while downloading the remote file. The error
15:41:59  message, if any, is reproduced below. Please fix this error and try
15:41:59  again.
15:41:59  
15:41:59    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
15:41:59                                   Dload  Upload   Total   Spent    Left  Speed
15:41:59    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 25, 2021
@pchaigno
Member

Marking as ready to merge given tests are passing and we have several reviews already.

@errordeveloper errordeveloper merged commit 2d6c372 into cilium:v1.10 Jun 25, 2021
@aditighag
Member Author

@errordeveloper Don't forget to run -

$ for pr in 16545 16434 16523 16563 16530 16604 16578 16557 16548 16529 16509; do contrib/backporting/set-labels.py $pr done 1.10; done

@errordeveloper
Contributor

@aditighag thanks for the reminder, I've done this just now, sorry for taking so long... I wonder if we can teach @ciliumbot to do this somehow, any thoughts?

Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
10 participants