Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.12 backports 2022-10-06 #21610

Closed
wants to merge 18 commits into from

Conversation

ldelossa
Copy link
Contributor

@ldelossa ldelossa commented Oct 6, 2022

$ for pr in 21444 21588 21577 21497 21461 21463 21495 20520; do contrib/backporting/set-labels.py $pr done 1.12; done

aditighag and others added 16 commits October 6, 2022 16:44
[ upstream commit 22cf89a ]

The graceful termination feature relies on receiving
terminating endpoint events from Kubernetes, which are gated
by a flag. Reference the terminating endpoint metric in the
feature section so that users can track the metric to check
if the feature is working correctly, as a first step.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 18ef607 ]

The troubleshooting guide explains packet drops seen due to
conntrack map fill-up issue. Reference some of the conntrack
garbage collection related metrics so that users get visibility
into conntrack gc runs - how often it's being triggered, how many
entries were gc'd, etc.

Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 5b41102 ]

The mal-formed instance-type looks like below in CiliumNode spec:

```yaml
spec:
  ...
  alibaba-cloud:
    instance-type: |
      <?xml version="1.0" encoding="iso-8859-1"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
       <head>
        <title>404 - Not Found</title>
       </head>
       <body>
        <h1>404 - Not Found</h1>
       </body>
      </html>
```

which prohibits cilium-operator from allocating PodIPs for this node.

Signed-off-by: ArthurChiao <arthurchiao@hotmail.com>
Reported-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 5b40396 ]

These are the only 2 image fields that are currently not quoted. Some
users templatize image name / tag values outside of Helm. Not quoting
the image fields can cause issues if these values contain special
characters.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 8ae1562 ]

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 3fe7919 ]

This commit has no functional changes. It simply renames a few variables
in enableIPsec to make their relationships clearer.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 91fdc20 ]

This commit adds two new arguments to UpsertIPsecEndpoint to specify the
outer source and destination IP address for IPsec. It doesn't include
any functional changes.

    - UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, ...
    + UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, outerLocal, outerRemote net.IP, ...

Until now, those two outer IP addresses were carried as part of the
first two CIDR arguments, `local` and `remote`. For example, `local`
would be equal to 192.168.56.11/24 where 192.168.56.0/24 would be used
to match packets in XFRM policies and 192.168.56.11 as the outer IP
address in XFRM states.

The outer IPs are now in separate arguments and the next commit will
change the local and remote arguments to not carry the IPs.

Why this change? Because in a subsequent commit, I will need the CIDR
and IP arguments to diverge. For example, we will have
 UpsertIPsecEndpoint calls with `local=0.0.0.0/0` and
`outerLocal=192.168.56.11`.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 645da80 ]

The previous commit changed the UpsertIPsecEndpoint function as follows:

    - UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, ...
    + UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, outerLocal, outerRemote net.IP, ...

The first two CIDR arguments, `local` and `remote`, now don't need to carry
the outer IP addresses (moved to `outerLocal` and `outerRemote`). We can
therefore change calls to this function so that those two first
arguments carry only the CIDR (i.e., changed from e.g. 192.168.56.11/24
to 192.168.56.0/24). As a result, we also don't need to mask those two
arguments when we want only the CIDR part.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit b33adeb ]

gops and pprof do not use the same protocol to collect profile data.
Thus, the default port for pprof debug endpoints in `cilium-bugtool`
should not be the one used for gops, but the default one for pprof
itself.

Besides, clustermesh-apiserver does not support pprof yet, but only
gops. Thus, the help message for the pprof port option in cilium-bugtool
is fixed accordingly.

Fixes: #416319b1cd (bugtool: Default to the agent's gops port)

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 5520be1 ]

submit-backport tried to create a backport PR with reviews from all
contributors whose fixes are being backported, including people who do
not have collaborator status in the repository. GitHub only allows
reviews to be assigned to collaborators, and thus rejected the review
assignments.

This commit changes submit-backport to filter the review assignments to
only include collaborators.

Fixes:  cilium#21548
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 1019e70 ]

cilium#21120 (comment)
uncovered that WaitUntilReady() is racy when using the DemoDaemonSet.
After adding an additional DaemonSet to this manifest, the number of
expected pods (as calculated per numExpectedPods()) no longer matches the
actually deployed pods. So WaitUntilReady() exits prematurely, and we
potentially access the pods' state before they are fully set up. This
explains the CI flakes we've observed in eg.
cilium#21120.

Fix this by updating the internal manifest description of the demo_ds.yaml
manifest.

This allows WaitUntilReady() to wait for the correct number of pods to
become ready when using demo_ds. Also deleteInAnyNamespace() can clean up
_all_ daemon sets of this manifest.

Fixes: 74d9f56 ("egressgw: manager: add support for new egressGateway policy's property")
Fixes cilium#21120
Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit d45999d ]

The script originally just exited and assumed the kubelet would restart
it, but this is quite disruptive, since that restarts the agent as well.

Additionally, it was discovered that some AWS installations need the
service network to be functioning before the AWS CNI file is written,
leaving clusters in a broken state.

So, change the script to retry looking for the AWS CNI file itself.

Fixes: cilium#21243

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 4532996 ]

In order to reclaim orphan entries in the NAT table the agent uses the
connection tracking state: if a NAT entry is not backed by a connection,
it should be considered orphaned and can be reclaimed.

Currently the SNAT logic assumes that all connections that needs to be
translated are already tracked. The only exception for this is
connections originating from the local host, which are tracked by
snat_v4_track_local().

In practice egress gateway is another case that requires special
attention: in fact, all connections tunneled from a different node to a
gateway node are currently not tracked. This means that whenever the CT
garbage collector kicks in, all NAT entries for egress gateway will be
deleted, potentially leading to connections breakage.

This commit fix this by making sure we always track these connections.

While at it, also rename snat_v4_track_local() to
snat_v4_track_connection().

Towards: cilium#21346
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 19c2d59 ]

For consistency with its v4 counterpart.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit c62b209 ]

No functional changes introduced.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit 658045e ]

In snat_v4_needed(), whenever NAT is required because of an egress
gateway policy, we unconditionally set the from_endpoint parameter to
true, as egress gateway traffic is always originating from an endpoint.

This is wrong as that parameter is supposed to track whether traffic is
coming from a _local_ endpoint, and so that variable should be set to true
only if the __lookup_ip4_endpoint() lookup is successful.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
@ldelossa ldelossa requested a review from a team as a code owner October 6, 2022 18:14
@ldelossa ldelossa added backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Oct 6, 2022
@michi-covalent
Copy link
Contributor

✅ ack on a2534e4 thanks @ldelossa ! ❤️

@ldelossa
Copy link
Contributor Author

ldelossa commented Oct 6, 2022

@jibi looks like I left a git diff marker in the commit, will fix momentarily.

[ upstream commit 59ac84f ]

Move the from_endpoint parameter of snat_v4_needed() into
ipv4_nat_target, and rename the function to snat_v4_prepare_state().

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
[ upstream commit f9697d2 ]

Use the information stored in the SNAT target state to determine if the
packet is coming from a local endpoint, saving one map lookup.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
@ldelossa ldelossa force-pushed the pr/v1.12-backport-2022-10-06 branch from 945019a to 80a76e5 Compare October 6, 2022 18:25
@ldelossa
Copy link
Contributor Author

ldelossa commented Oct 6, 2022

force-pushed the pr/v1.12-backport-2022-10-06 branch from 945019a to 80a76e5 40 seconds ago

@jibi ^ force push removed git diff marker.

@joestringer
Copy link
Member

The PR description for each backport needs a section like the following so that we can properly generate release notes for the release:

```upstream-prs
$ for pr in 20721 20833; do contrib/backporting/set-labels.py $pr done 1.11; done
```

The standard procedure for backports in the backports guide should do this automatically for you, is there a reason this PR doesn't follow that process?

@ldelossa
Copy link
Contributor Author

ldelossa commented Oct 6, 2022

@joestringer ah, i think i may have deleted the original generated file the script generates. I kept my own copy and used that as input to the branch creation script.

@joestringer
Copy link
Member

@ldelossa I see. The general format of the description doesn't matter to me, but either way please ensure that the PR description has that upstream-prs snippet 🙏

@ldelossa
Copy link
Contributor Author

ldelossa commented Oct 6, 2022

Yup will do.

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4c1a0e2 LGTM, thanks!

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

788f7e8 looks fine, thanks! 💯

@squeed
Copy link
Contributor

squeed commented Oct 7, 2022

/ci-awscni-1.12

@squeed
Copy link
Contributor

squeed commented Oct 7, 2022

/ci-aks-1.12

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PR looks good. Thanks!

@squeed
Copy link
Contributor

squeed commented Oct 7, 2022

ci-awscni and ci-aks passed: all good for my commit!

edit: I'm a dummy, I need to test EKS ☕

@squeed
Copy link
Contributor

squeed commented Oct 7, 2022

/ci-eks-1.12

@pchaigno
Copy link
Member

pchaigno commented Oct 7, 2022

@squeed These will all get tested as part of CI once Louis triggers it.

@ldelossa ldelossa changed the title PRs for stable backporting: 10 v1.12 backports 2022-10-06 Oct 7, 2022
@ldelossa
Copy link
Contributor Author

ldelossa commented Oct 7, 2022

@joestringer fixed upstream code block. ✔️

@sayboras
Copy link
Member

sayboras commented Oct 8, 2022

dropped #21499 as per offline discussion.

@sayboras
Copy link
Member

sayboras commented Oct 8, 2022

Due to permission to forked repo, replace this by #21631

@sayboras sayboras closed this Oct 8, 2022
@jibi jibi mentioned this pull request Oct 10, 2022
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet