v1.8 backports 2020-06-11 #12027

Merged
merged 48 commits into v1.8 from pr/v1.8-backport-2020-06-11 on Jun 12, 2020

Conversation

@aanm (Member) commented Jun 11, 2020

Once this PR is merged, you can update the PR labels via:

```
$ for pr in 11980 11947 11974 11982 11989 11882 11765 11987 11996 11007 11757 11995 11973 11998 11993 11891 12008 11571 12020 12016 12007 12023 11894 12006 12030; do contrib/backporting/set-labels.py $pr done 1.8; done
```

@genbit this PR didn't apply cleanly; does the 1.8 branch already contain these changes?

pchaigno and others added 30 commits June 11, 2020 17:34
[ upstream commit 749d03a ]

Many tests are failing when configuring a device while BPF-based NodePort
is disabled (cf. #11972). A bug in the loader causes it to call init.sh
with the mode set to direct instead of e.g., vxlan. The cilium_vxlan
device is then not created (or is even removed if it existed), which
causes connectivity disruptions in multiple scenarios (mostly multi-node).

This commit fixes the wrong assumption that multiple devices are only
set when NodePort is enabled. That assumption might have been correct
when that code was introduced (in 3a83623).

Fixes: #11972
Fixes: 3a83623 ("bpf: add support for local NodePort via tunnel")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
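
A hedged Go sketch of the corrected decision (function and parameter names are invented here, not Cilium's actual loader code): the mode passed to init.sh should follow the tunneling configuration rather than the presence of extra devices.

```
// Hypothetical sketch; not Cilium's actual loader code.
func initModeFor(tunnelMode string, devices []string) string {
	// Before the fix, len(devices) > 0 alone selected "direct",
	// tearing down cilium_vxlan even when tunneling was enabled.
	if tunnelMode != "" && tunnelMode != "disabled" {
		return tunnelMode // e.g. "vxlan"
	}
	return "direct"
}
```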
[ upstream commit 009fb34 ]

The endpoint slice implementation did not account for the fact that a
service can map to multiple endpoint slices. Wrongly assuming that only
a single endpoint slice exists for a service could cause Cilium to fail
the translation of a service to a backend when two or more endpoint
slices existed. In some rare occurrences where a single endpoint slice
was deleted and another one was recreated for the same service, it
could cause Cilium to stop doing service translation entirely.

The added unit test easily replicates the issue: it fails on current
master, where the 2nd endpoint slice added overwrites the 1st one:

```
c.Assert(endpoints.String(), check.Equals, "2.2.2.2:8080/TCP,2.2.2.3:8080/TCP")
Expected :string = "2.2.2.2:8080/TCP,2.2.2.3:8080/TCP"
Actual   :string = "2.2.2.3:8080/TCP"
```

Fixes: c3b5ca6 ("add support for k8s endpoint slice")
Signed-off-by: André Martins <andre@cilium.io>
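
A hedged sketch of the corrected bookkeeping (all types here are invented for illustration): endpoints are tracked per slice and merged on lookup, so a second slice can no longer overwrite the first.

```
// Illustrative types only; not Cilium's actual structures.
type Backend struct{ Addr string }

// endpointSlices keeps one entry per EndpointSlice of a service.
type endpointSlices struct {
	bySlice map[string][]Backend // keyed by EndpointSlice name
}

func (e *endpointSlices) upsert(sliceName string, backends []Backend) {
	if e.bySlice == nil {
		e.bySlice = map[string][]Backend{}
	}
	e.bySlice[sliceName] = backends
}

func (e *endpointSlices) remove(sliceName string) {
	delete(e.bySlice, sliceName)
}

// merged returns the union of backends across all slices of the
// service, which is what service translation should consume.
func (e *endpointSlices) merged() []Backend {
	var out []Backend
	for _, backends := range e.bySlice {
		out = append(out, backends...)
	}
	return out
}
```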
[ upstream commit b4627b7 ]

Similar to v1.Endpoints, if Cilium is running with EndpointSlice
enabled, it should wait for the k8s watchers watching those resources
before it starts.

Fixes: c3b5ca6 ("add support for k8s endpoint slice")
Signed-off-by: André Martins <andre@cilium.io>
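
A minimal sketch of such a start-up gate, assuming a map of per-resource sync channels (the names are illustrative, not Cilium's actual watcher plumbing):

```
// Block startup until the watchers for the resources we consume have
// completed their initial sync; which resources depends on whether the
// EndpointSlice feature is enabled.
func waitForK8sCaches(endpointSliceEnabled bool, synced map[string]<-chan struct{}) {
	resources := []string{"services"}
	if endpointSliceEnabled {
		resources = append(resources, "endpointslices")
	} else {
		resources = append(resources, "endpoints")
	}
	for _, r := range resources {
		<-synced[r] // closed once the initial list/watch is done
	}
}
```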
[ upstream commit 0d89f05 ]

Encryption fixes: when setting the encrypt ctx->mark field, we need to
skip resetting it with the source identity. It's not needed in the
encryption case anyway, because we already checked that the endpoint is
remote before encoding the encryption signal.

Next, if fib lookup is not available, we discover the route at init
time and encode it in the ENCRYPT_IFACE define. If this field is
non-zero, we should use it. Otherwise, in some configurations where
there is no egress route in the main routing table, the packet will be
dropped.

Fixes: 86db0fd ("cilium: encryption, use fib_lookup to rewrite dmac/smac")
Fixes: f25d8b9 ("bpf: Preserve source identity for hairpin via stack")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit b41bbc0 ]

We observed that, in the K8sWatcher for "ciliumendpoints", the
ConvertToCiliumEndpointAddFunc call was taking an endpoint event with a
valid Encryption field and converting it to '0'.

To fix this, we make the translation more explicit.

Fixes: 720c0b0 ("pkg/k8s: do not DeepCopy when converting to CiliumEndpoint")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: André Martins <andre@cilium.io>
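
A hedged sketch of what "more explicit" means here (types invented for illustration): copy the fields we rely on one by one, so a conversion can no longer silently zero Encryption.

```
// Illustrative types; the point is the explicit field copy.
type EncryptionSpec struct{ Key int }

type CiliumEndpointStatus struct {
	Encryption EncryptionSpec
	// ...other fields elided
}

// convert explicitly carries the Encryption field over, so dropping the
// DeepCopy can no longer reset it to its zero value.
func convert(in *CiliumEndpointStatus) *CiliumEndpointStatus {
	return &CiliumEndpointStatus{
		Encryption: EncryptionSpec{Key: in.Encryption.Key},
	}
}
```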
[ upstream commit 57deb48 ]

This option check is unnecessary in the hostns case because either
bpf_sock or kube-proxy will always handle the translation of VIPs to
backends that the host firewall requires.

Fixes: f74087e ("daemon: Introduce enable-host-firewall option")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit f8da3b9 ]

Make 0532f88 ("bpf: avoid using fib lookup in nodeport for dsr v4/v6
hairpin case") generic so that the tc BPF case can also benefit from it
when only a single external device is specified or detected on the node.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 4673688 ]

- Update instructions to enable Hubble.
- Update minikube and kind guides to use experimental-install.yaml
  to deploy Cilium instead of quick-install.yaml so that Hubble is
  enabled.
- Delete outdated {hubble-install,getting-started-next-steps}.rst.
- Set SKIP_LINT for html-netlify target. There is another status
  check that performs linting, and this lets us ignore inconsistent
  spell-check results for now.

Fixes #11151

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 2fc6b97 ]

Add tooling to allow a given number of test cases out of the total to
fail. All tests currently allow 0 failures.

Related: #11313
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: André Martins <andre@cilium.io>
…l7 test.

[ upstream commit 9d263a2 ]

Add monitoring to gain more insight.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 49dca5c ]

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 566de37 ]

The ineffassign dependency was removed from the VM image in
cilium/packer-ci-build#211, in preparation for the switch to golangci-lint
implemented by #11528. However, (1) the VM image update was merged before
the golangci-lint PR was, and (2) the golangci-lint PR won't be
backported to v1.8, whereas the VM image update has already been.

We therefore need to reinstall the ineffassign dependency in the dev VM
until #11528 is merged, and in order to backport this fix to v1.8.

Fixes: cilium/packer-ci-build#211
Fixes: #11917
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 121cd37 ]

The `prometheus.io/{scrape,port}` annotations are used to tell
Prometheus to automatically scrape the Hubble metrics on the
given port.

Unfortunately, we cannot add this annotation directly on the
cilium-agent pod, as it is already used for the cilium-agent metrics
and the Prometheus annotation only supports a single port. Therefore,
this commit annotates the `hubble-metrics` service instead and changes
the Helm chart to create the `hubble-metrics` service whenever Hubble
metrics are enabled.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit b910da8 ]

This commit slightly restructures the "Monitoring & Metrics" section
in the documentation and adds the reference for the available Hubble
metrics.

The document is restructured to explain how to enable the Cilium/
Cilium Operator metrics in Helm separately from the Hubble metrics
provided by embedded Hubble. The rationale for this split
is that Cilium metrics are used to monitor Cilium itself, while Hubble
metrics are used to monitor network traffic of applications in the
cluster.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit a89dedb ]

This patch fixes up the --help sections based on the feature
requirements.

Fixes: #10944
Signed-off-by: Swaminathan Vasudevan <svasudevan@suse.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit bbf5377 ]

Fixes: #11755

Signed-off-by: Sean Winn <sean@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit c16e111 ]

Wait for each pod to increment its policy revision also after deleting
a policy, as otherwise there is a race between the policy deletion and
the following test traffic that expects the policy deletion to have
taken effect already.

This fixes CI flakes following policy deletes like:

```
08:35:50 STEP: Deleting Egress deny all clusterwide policy
08:35:51 STEP: Testing ingress connectivity from "app2" to "app1" in "2020060908341k8spolicytestclusterwidepoliciestestclusterwidecon" namespace
FAIL: Ingress connectivity should be allowed for service in 2020060908341k8spolicytestclusterwidepoliciestestclusterwidecon namespace
Expected command: kubectl exec -n 2020060908341k8spolicytestclusterwidepoliciestestclusterwidecon app2-88b7f8c4b-cvg64 -- curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5 --max-time 20 http://10.110.187.138/public -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'"
To succeed, but it failed:
Exitcode: 28
Stdout:
 	 time-> DNS: '0.000013()', Connect: '0.000000',Transfer '0.000000', total '5.000688'
Stderr:
 	 command terminated with exit code 28
```

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: André Martins <andre@cilium.io>
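
A hypothetical test helper in the spirit of this fix (not the framework's actual code): snapshot each pod's policy revision, apply the policy change (add or delete), then block until every revision advances.

```
import (
	"fmt"
	"time"
)

// waitForPolicyRevisions blocks until every pod's policy revision has
// advanced past the value recorded before the policy change.
func waitForPolicyRevisions(pods []string, before map[string]int,
	current func(pod string) int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for _, pod := range pods {
		for current(pod) <= before[pod] {
			if time.Now().After(deadline) {
				return fmt.Errorf("%s stuck at policy revision %d", pod, before[pod])
			}
			time.Sleep(100 * time.Millisecond)
		}
	}
	return nil
}
```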
[ upstream commit 6530325 ]

Update proxy image to build with Envoy 1.13.2. This fixes
CVE-2020-11080 by rejecting HTTP/2 SETTINGS frames with too many
parameters.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit c9ea0e9 ]

Let's avoid pointless runs of sockops tests where the feature is
disabled due to kernel limitations. This should save us a couple of
runs in CI.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 47f8d32 ]

The curl reaches out to a world/external resource, so retrying is
acceptable and helps with test flakes.

Fixes: #11797

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
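
A minimal sketch of such a retry wrapper (illustrative, not the test framework's actual helper):

```
// curlWithRetries retries a command that targets an external resource;
// retrying is safe because the system under test is not what flakes.
func curlWithRetries(run func() error, retries int) error {
	var err error
	for i := 0; i < retries; i++ {
		if err = run(); err == nil {
			return nil
		}
	}
	return err
}
```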
[ upstream commit 941e328 ]

This makes sure that cancelation signals or timeouts are propagated
properly.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: André Martins <andre@cilium.io>
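
A hedged sketch of the pattern (the helper name is invented): thread the caller's context into command execution so cancelation and timeouts actually reach the child process.

```
import (
	"context"
	"os/exec"
)

// runWithContext kills the child process when ctx is canceled or times out.
func runWithContext(ctx context.Context, name string, args ...string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, name, args...)
	return cmd.CombinedOutput()
}
```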
…flag

[ upstream commit f6d62b5 ]

The flag no longer has any effect and is scheduled for removal in
Cilium 1.8.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit a961b04 ]

Support instances that are not part of a Virtual Machine Scale Set.

Cilium's only assumption with regard to machines running in a VMSS was
limited to interface discovery. Also listing standard (non-VMSS)
network interfaces from the same Azure ResourceGroup is all we need for
those to work properly with Cilium.

Signed-off-by: Benjamin Pineau <benjamin.pineau@datadoghq.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 742c318 ]

Standalone (non-VMSS) instances are slightly different from VMSS in
that regard: the interfaces are attached to instances, rather than part
of the instance definition.

Signed-off-by: Benjamin Pineau <benjamin.pineau@datadoghq.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 20c092c ]

Otherwise values set during helm install get wiped out.

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 2ed8851 ]

Include Envoy-supplied error detail in NACK warning logs to facilitate debugging.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 0de65a1 ]

We are occasionally seeing log archives without most of the logs, which
is caused by this timeout being hit. Let's increase it and see if this
helps.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 7fdedb1 ]

This change adds two commands which are meant to gather Cilium logs in
case more granular methods fail.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 3aadcc0 ]

Port 12000 is used for historical reasons. Considering it is an HTTP
frontend, we should use port 80 for the ClusterIP service.

Fixes: #11250

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 2cd23fe ]

Pull the latest version, which includes a new field LinkIndex in the
Addr struct [1]. This will be used for retrieving a link by IP address.

[1]: vishvananda/netlink#543

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
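
A hedged sketch of what the new field enables, assuming netlink.AddrList accepts a nil link to dump addresses across all links:

```
import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// linkByIP resolves the link owning the given IP from one address dump,
// using the new Addr.LinkIndex field.
func linkByIP(ip net.IP) (netlink.Link, error) {
	addrs, err := netlink.AddrList(nil, netlink.FAMILY_ALL)
	if err != nil {
		return nil, err
	}
	for _, a := range addrs {
		if a.IP.Equal(ip) {
			return netlink.LinkByIndex(a.LinkIndex)
		}
	}
	return nil, fmt.Errorf("no link with address %s", ip)
}
```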
brb and others added 10 commits June 11, 2020 17:38
[ upstream commit 298cb13 ]

The param specifies to which devices bpf_host.o should be attached.
Currently, it's used by BPF NodePort, host-fw and BPF masquerading.

Also, mark --device as deprecated. If a user specifies both,
cilium-agent will log a fatal message.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
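
A hypothetical sketch of the flag handling (the real agent logs a fatal message rather than returning an error; names here are illustrative):

```
import (
	"errors"
	"log"
)

// resolveDevices tolerates the deprecated --device flag but rejects the
// ambiguous combination of both flags.
func resolveDevices(deprecatedDevice string, devices []string) ([]string, error) {
	if deprecatedDevice == "" {
		return devices, nil
	}
	if len(devices) > 0 {
		return nil, errors.New("--device and --devices must not be used together")
	}
	log.Println("--device is deprecated, use --devices instead")
	return []string{deprecatedDevice}, nil
}
```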
[ upstream commit bd31f7d ]

Previously, the constraint made it impossible to have BPF NodePort
exposed via multiple devices, as a request to a NodePort service which
was received by a non-XDP device (handled by TC) was dropped due to the
constraint.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 1fbaf82 ]

Attach XDP NodePort to a device which is used for direct routing among
nodes.

In the case of a multi-device setup, if we attach to a device other
than the direct routing device, a request to a NodePort svc whose real
destination is a remote backend won't be redirected to the other device
(only possible via hairpinning). So we need to attach to a device which
can forward to another node, i.e. the direct routing device.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit dcbce7e ]

Otherwise, we would need to make the daemon unit test suite privileged
due to the bpftool dependency for the feature detection in
initKubeProxyReplacementOptions().

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 0b5891a ]

Previously, when running with "--k8s-require-ipv{4,6}-pod-cidr=false",
retrieveNodeInformation() might have failed retrieving a self
(Cilium)Node object on a busy cluster.

To avoid this, return an error if the BPF NodePort device
auto-detection might happen. This makes the retry mechanism not give up
on waiting for the object.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 6beb6b3 ]

Disable the kube-proxy replacement on the CI 4.19 job until #11915 has
been merged, but keep bpf_sock to avoid bpf_lxc complexity issues.

Also, get rid of the non-working kernel version setting which
complicates passing the KERNEL env var to the ginkgo test runner.

Finally, disable the ipsec + vxlan test on 4.19, as it is broken with
the contemporary 4.19 setup (TODO to investigate it).

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit b2c73e7 ]

When Reinitialize()-ing the datapath, transient iptables rules are set
up to avoid dropping packets while Cilium's rules are not in place.

On rare occasions, a failure to add those rules has been observed (see
issue #11276), leading to an early exit from Reinitialize() and a
failure to set up the daemon.

But those transient rules are just used to lend a hand and keep packets
flowing for a very small window of time: it does not actually matter
much if we fail to install them, and it should not stop the daemon from
reinitializing. Let's simply log a warning and carry on if we fail to
add those rules.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
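
A hedged sketch of the behavioral change (the helper names are invented, not Cilium's actual functions):

```
import log "github.com/sirupsen/logrus"

// reinitializeRules demotes a transient-rule failure to a warning; the
// rules only bridge a tiny window, so failing to add them must not
// abort the datapath re-initialization.
func reinitializeRules(installTransient, installRegular func() error) error {
	if err := installTransient(); err != nil {
		// Previously this returned err, preventing the daemon from starting.
		log.WithError(err).Warning("cannot install transient iptables rules, continuing")
	}
	return installRegular()
}
```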
[ upstream commit 66aa41d ]

We do the same thing for IPv4 and IPv6, with just the name of the
program changing. Then we do nearly the same thing for flushing and
deleting a chain. Let's refactor (no functional change).

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
[ upstream commit 052fa31 ]

When Cilium reinitialises its daemon, transient iptables rules are set
up to avoid dropping packets while the regular rules are not in place.

On rare occasions, setting up those transient rules has been found to
fail for an unknown reason (see issue #11276). The error message states
"Chain already exists", even though we try to flush and remove any
leftovers from previous transient rules before adding the new ones. It
seems likely that removing the leftovers is failing, but we have not
been able to understand why, because we quieten the function to avoid
spurious warnings the first time we try to remove them (since none
exists yet).

It would be helpful to get more information to understand what happens
in those rare occasions where setting up transient rules fails. Let's
find a way to get more logs, without making too much noise.

We cannot warn unconditionally in remove() since we want removal in the
normal case to remain quiet. What we can do is logging when the "quiet"
flag is not passed, _or_ when the error is different from the chain
being not found, i.e. when the error is different from the one we want
to silence on start up. This means matching on the error message
returned by ip(6)tables. It looks fragile, but at least this message has
not changed since 2009, so it should be relatively stable and pretty
much the same on all supported systems. Since remove() is used for
chains other than for transient rules too, we also match on chain name
to make sure we are dealing with transient rules if ignoring the "quiet"
flag.

This additional logging could be removed once we reproduce and fix the
issue.

Alternative approaches could be:

- Uncoupling the remove() function for transient rules and regular
  rules, to avoid matching on chain name (but it sounds worse).
- Logging on failure for all rules even when the "quiet" flag is passed,
  but on "info" level instead of "warning". This would still require a
  modified version of runProg(), with also a modified version of
  CombinedOutput() in package "exec". Here I chose to limit the number
  of logs and keep the changes local.
- Listing the chain first before trying to remove it, so we only try to
  remove if it exists, but this would likely add unnecessary complexity
  and latency.

Should help with (but does not solve): #11276
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
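
A hedged sketch of the logging rule described above (the chain-name substring and the exact iptables wording are illustrative assumptions):

```
import "strings"

// shouldWarn decides whether a failed chain flush/delete deserves a log.
func shouldWarn(quiet bool, chain, combinedOutput string) bool {
	const noChain = "No chain/target/match by that name" // stable since 2009
	if !quiet {
		return true // regular removals always warn on failure
	}
	// Quiet removals (start-up cleanup) still warn for transient-rule
	// chains when the failure is not the expected "chain not found".
	return strings.Contains(chain, "TRANSIENT") &&
		!strings.Contains(combinedOutput, noChain)
}
```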
[ upstream commit e78405a ]

In an attempt to catch #11276 in CI, let's add any message related to a
failure to flush or delete the chain related to iptables transient rules
to the list of badLogMessages we want to catch.

We need to filter on the name of the chain for transient rules to avoid
false positives, which requires exporting that name.

We also need to modify the error log message, to avoid adding four
distinct logs to the list (combinations of iptables/ip6tables,
flush/delete).

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
@aanm requested a review from a team as a code owner on June 11, 2020 15:39
@aanm added the backport/1.8 and kind/backports (This PR provides functionality previously merged into master.) labels on Jun 11, 2020
@pchaigno (Member) left a comment

👍 for my changes!
@tklauser (Member) left a comment

LGTM for my changes

@joestringer (Member)

This will fix the hubble-relay version issue:
#12030

[ upstream commit ea52a9e ]

In particular, add the new hubble-relay values file to the list of files
that we fix these values up for in the make target.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
@aanm (Member, Author) commented Jun 11, 2020

test-backport-1.8

@genbit (Contributor) commented Jun 11, 2020

> @genbit this PR didn't apply cleanly; does the 1.8 branch already contain these changes?

@aanm was it due to the conflicts in Documentation/requirements.txt ?

@genbit (Contributor) commented Jun 11, 2020

@aanm I've created a PR with the two docs changes: #12032

@joestringer (Member) commented Jun 11, 2020

All failures are because of the hubble-relay image pull issue, which is likely due to a missing v1.8.0-rc2 image tag on Docker Hub. I will tag that now.

EDIT: Hmm, not that simple; it appears to be already tagged on Docker Hub. Maybe there's another lurking issue in the image generation.

Trying it out locally with a GKE cluster to see what kind of infrastructure issue it might be.

Counter-intuitively, in the CI framework we build images from the
current tree and tag them as `latest`, even if the current tree is for
instance `v1.8` (with version `v1.8.0-rc2`).

At the same time, the Helm charts in the tree specify in the files that
they will pull the current tag from `VERSION`, for instance right now
`v1.8.0-rc2`.

For `cilium` and `operator` images, we *override* the helm chart
defaults with `latest` to pick up the build/cached images from the first
step above.

However we weren't doing this for Hubble Relay, which meant that it
tried to pull the image with the tag `v1.8.0-rc2` but this image didn't
exist in the local registry.

To fix this and get the v1.8 tree to properly build with a version of
hubble-relay from v1.8 branch, we must tag, cache and deploy images that
represent the latest v1.8 development with `latest`, and pull that image
from the CI framework.

:exploding_head:

Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer (Member) commented Jun 12, 2020

Given that all tests are currently passing (except K8sHubble), I will push the fix for the hubble issue and focus-test only the K8sHubble tests. If that + smoke test pass, we should be good to go.

@joestringer (Member)

test-focus K8sHubbleTest

@joestringer (Member)

Hit known flake #11955 in Travis, ignoring.

@joestringer merged commit 2eaf354 into v1.8 on Jun 12, 2020
@joestringer deleted the pr/v1.8-backport-2020-06-11 branch on June 12, 2020 00:52