Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.8 backport 2020-07-02 #12378

Merged
merged 17 commits into from Jul 2, 2020
Merged

v1.8 backport 2020-07-02 #12378

merged 17 commits into from Jul 2, 2020

Conversation

borkmann
Copy link
Member

@borkmann borkmann commented Jul 2, 2020

v1.8 backports 2020-07-02

Once this PR is merged, you can update the PR labels via:

$ for pr in 12351 12313 12246 12361 11936 12360 12311; do contrib/backporting/set-labels.py $pr done 1.8; done

christarazi and others added 17 commits July 2, 2020 08:44
[ upstream commit ecac73d ]

This is useful in cases where a backport PR is started on the same day
that another backport PR was created, for the same branch (e.g. v1.6).
In this case, the developer would have to manually modify the script to
create a non-conflicting branch name.

This commit allows the developer instead to pass a suffix to
disambiguate the branch name, without need to modify the script.

Example usage:

```
$ ./contrib/backporting/start-backport 1.6 "-2"
```

This creates a backport branch name pr/v1.6-backport-2020-06-30-2.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 5cfa8a8 ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 9673c48 ]

Identity allocation uses cache and refcnt mechanisms, if the identity info is
already in remote kvstore and localkeys store, it will just increase the refcnt,
then notify the caller that this identity is reused. The caller will then not
bump up the identity counter. However, there is a corner case that not get
handled: refcnt from 0 to 1, which will result to negative identity count in
the metrics output.

This patch fixes the problem by returning another flag to indicate whether the
identity is first-time referenced (refcnt from 0 to 1) or not. The caller then
uses this information to determine whether or not to increase the counter.

Signed-off-by: arthurchiao <arthurchiao@hotmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit ea8b727 ]

The pod cidr should match 10.217.0.0/16 mentioned previously,
not the default value 10.0.0.0/16 set by default via helm in this tutorial.
--set global.ipam.operator.clusterPoolIPv4PodCIDR=10.217.0.0/16

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: MaiReo <sawako.saki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 5968d5a ]

The `if` condition is actually inverted which makes it impossible to
create a backport branch. This commit fixes this issue.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit c388f1f ]

Checkpatch was run with the following command:

    checkpatch.pl --no-tree --show-types \
    --ignore COMPLEX_MACRO --ignore MULTISTATEMENT_MACRO_USE_DO_WHILE \
        -f bpf/**/*.{c,h}

The following error types were fixed in the bpf/ sources:

- ASSIGN_IN_IF: do not use assignment in if condition
- CODE_INDENT: code indent should use tabs where possible
- FUNCTION_WITHOUT_ARGS: Bad function definition - xxx() should probably
  be xxx(void)
- INITIALISED_STATIC: do not initialise statics to NULL
- OPEN_BRACE: open brace '{' following struct go on the same line
- OPEN_BRACE: open brace '{' following function definitions go on the
  next line
- POINTER_LOCATION: "(foo*)" should be "(foo *)"
- SPACING: space required before the open brace '{'
- SPACING: space required after that ',' (ctx:VxV)
- SWITCH_CASE_INDENT_LEVEL: switch and case should be at the same indent
- TRAILING_WHITESPACE: trailing whitespace

The following were ignored in this patch:

- COMPLEX_MACRO, because of IPCACHE4_PREFIXES and IPCACHE6_PREFIXES that
  we do not want to enclose between parenthesis.
- MULTISTATEMENT_MACRO_USE_DO_WHILE, we do not want do {} while() for
  macros used to define struct contents.
- SPACING (some of them), when we intentionally aligned fields on
  several lines.
- TRAILING_STATEMENTS, we do not want to add line breaks everywhere in
  bpf/include/bpf/builtins.h.
- Issues reported with lower levels (warnings, minor checks).

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit ca9e91a ]

Checkpatch was run as follows:

    $ checkpatch.pl --no-tree --show-types \
        --ignore BLOCK_COMMENT_STYLE --ignore CONST_STRUCT \
        --ignore CONSTANT_CONVERSION --ignore JIFFIES_COMPARISON \
        --ignore MACRO_WITH_FLOW_CONTROL \
        --ignore PRINTK_WITHOUT_KERN_LEVEL --ignore TRAILING_SEMICOLON \
	--ignore VOLATILE \
        -f bpf/**/*.{c,h}

The following warning types were fixed in the bpf/ files:

- ARRAY_SIZE: Prefer ARRAY_SIZE(kernel_hz)
- BRACES: braces {} are not necessary for any arm of this statement
- BRACES: braces {} are not necessary for single statement blocks
- LEADING_SPACE: please, no spaces at the start of a line
- LINE_CONTINUATIONS: Avoid unnecessary line continuations
- LINE_SPACING: Missing a blank line after declarations
- RETURN_VOID: void function return statements are not generally useful
- SPACE_BEFORE_TAB: please, no space before tabs
- SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in
  line 1
- TABSTOP: Statements should start on a tabstop
- UNNECESSARY_ELSE: else is not generally useful after a break or return

The following were ignored:

- BLOCK_COMMENT_STYLE, let's not fix those just yet
- CONST_STRUCT, broken with empty struct list
- CONSTANT_CONVERSION, we do want to use __constant_htons
- CONSTANT_COMPARISON, one false positive in bpf/lib/trace.h, some cases
  in bpf/lib/icmp6.h where we bound the value and it looks better
- JIFFIES_COMPARISON, we do want jiffies
- MACRO_WITH_FLOW_CONTROL, we still want some macros with flow control
- PREFER_ALIGNED, false positive on __aligned definition
- PREFER_PACKED, false postivie on __packed definition
- PRINTK_WITHOUT_KERN_LEVEL, our printk() does not have KERN_<LEVEL>
- TRAILING_SEMICOLON, we do want semicolons in some of our macros
- VOLATILE, we know what we are doing with volatile

The following were ignored, some of which will be fixed in a later
patch. This is because they are downgraded to the "check" level (instead
of warning) when checkpatch runs on files and not on patches and commit
logs (and so I did not see them at first):

- LONG_LINE
- LONG_LINE_COMMENT
- LONG_LINE_STRING
- PREFER_FALLTHROUGH
- TYPO_SPELLING

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit b612da2 ]

Let's harmonise block comment style. The kernel network codestyle is
preferred:

    /* Single-line comment */

    /* Multi-line comment,
     * spanning several lines.
     */

The regular kernel codestyle (non-network-specific) is not reported by
checkpatch, but strongly discouraged, to keep comment blocks consistent:

    /*
     * Multi-line comment,
     * with opening marker on its own line
     */

(Note that we could detect those by tricking checkpatch into believing
the files are under a "net/" directory, but this would also enforce the
"--strict" mode, which we do not desire.)

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 7a69ee2 ]

Checkpatch was run as follows:

    $ checkpatch.pl --no-tree --strict --show-types \
        --types AVOID_BUG --types FSF_MAILING_ADDRESS \
        --types LONG_LINE --types LONG_LINE_COMMENT \
        --types SPDX_LICENSE_TAG --types TYPO_SPELLING \
        -f bpf/**/*.{c,h}

Although we do not try to fix the reports on the "check" level (logged
when the "--strict" option is used), these are some checks that would
be considered as warnings by checkpatch if not running directly on the
source files (but on patches or git commit instead). Let's process them
to be on par with what the GitHub action would report.

The following warning types were fixed in the bpf/ files:

- LONG_LINE: line length of <n> exceeds 100 columns
- LONG_LINE_COMMENT: line length of <n> exceeds 100 columns
- TYPO_SPELLING: 'xxx' may be misspelled - perhaps 'xxy'?

The following were ignored:

- LONG_LINE_STRING, not desired
- LONG_LINE: IP addresses are better on long lines in bpf/node_config.h
- PREFER_FALLTHROUGH, since "fallthrough;" is not implemented

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 9acc12c ]

Run checkpatch with option "--ignore C99_COMMENT_TOLERANCE" to disable
its default tolerance to C99-style comments (those comments starting
with "// ...").

For this error type, checkpatch.pl also reports the SPDX identifier tags
in .c files as infringing the rule. Those are left untouched.

All other occurrences are replaced with "/* ... */" style comments.

One exception: one comment in bpf/lib/icmp6.h was apparently a leftover
commented instruction and was simply removed.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit e625082 ]

This commit optimizes the node update handler in two ways:

  1) Change notifications from the node manager that do not change the
     name or address are now ignored as they are irrelevant to the peer
     service.
  2) If the name is changed, a DELETE notification with the old name is
     sent followed by an ADD notification with the new name.

Note that if the name has not changed but the address has, an UPDATE
notification is sent. This behavior is unchanged.

This set of changes will should reduce the amount of unnecessary update
notifications that result in overhead for the main consumer of the peer
service (Hubble Relay). Also, this way of handling a name change is
easier to process.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 0f10a70 ]

It's always built-in, but there is no use-case anymore after we've removed
the stand-alone LB back in the days, and it's not configurable via services
either. Removing this logic reduces the service map lookups in critical fast
path from two to one, and allows to simplify the code where we would otherwise
need to restore the dest port in the lookup.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 4a72c13 ]

Currently, the in-cluster implementation for externalTrafficPolicy=Local
services is incorrect in that E-W service communication should work always
for these cases. Reason is that we can always /directly/ select a backend
from the socket lb and never need to go the additional hop where original
client source IP would be otherwise obfuscated for non-DSR. Minimal straight
forward change would be to have the orchestrator add two service entries
into the map with different scoping. By default we do the regular lookup
where we only find the set of local endpoints. This is hit from bpf_host
side as-is today. For socket lb, we then do a scope switch and do a lookup
on the cluster internal scope which can then select all backends including
remote ones. This is least-complex and also most optimial for fast-path
since for bpf_host the second lookup can simply be optimized-out from the
compiler. All the rest wrt special service type behavior is transparent.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 846b037 ]

Pull in the lookup scope into L3n4Addr. This is needed since services are
hashed based on their frontend L3n4Addr. Also, implement support for setting
the scope via `cilium service update` CLI to allow for manual testing.

Follow-up patches will add surrogate internal entries from k8s side for
externalTrafficPolicy=Local to make backends cluster internal reachable.

Example from CLI config:

  # cilium service update --frontend="192.168.178.29:53" --backends="1.1.1.1:53" --id 11 --k8s-load-balancer --k8s-traffic-policy=local
  # cilium service update --frontend="192.168.178.29:53" --backends="8.8.8.8:53" --id 12 --k8s-load-balancer --k8s-traffic-policy=local --k8s-cluster-internal

  # ./cilium/cilium service list
  ID   Frontend              Service Type   Backend
  11   192.168.178.29:53     LoadBalancer   1 => 1.1.1.1:53
  12   192.168.178.29:53/i   LoadBalancer   1 => 8.8.8.8:53

  # ./cilium/cilium bpf lb list
  SERVICE ADDRESS       BACKEND ADDRESS
  192.168.178.29:53/i   0.0.0.0:0 (12) [LoadBalancer, Local]
                        8.8.8.8:53 (12)
  192.168.178.29:53     0.0.0.0:0 (11) [LoadBalancer, Local]
                        1.1.1.1:53 (11)

  * dig from outside node:

    23:27:47.341486 IP 192.168.178.28.48963 > 192.168.178.29.53: 7220+ [1au] A? heise.de. (49)
    23:27:47.341502 IP 192.168.178.29.48963 > 1.1.1.1.53: 7220+ [1au] A? heise.de. (49)
    23:27:47.350990 IP 1.1.1.1.53 > 192.168.178.29.48963: 7220 1/0/1 A 193.99.144.80 (61)
    23:27:47.350994 IP 192.168.178.29.53 > 192.168.178.28.48963: 7220 1/0/1 A 193.99.144.80 (61)

  * dig from inside node:

    23:31:00.994830 IP 192.168.178.29.47978 > 8.8.8.8.53: 27432+ [1au] A? heise.de. (49)
    23:31:01.004566 IP 8.8.8.8.53 > 192.168.178.29.47978: 27432 1/0/1 A 193.99.144.80 (53)

Restore from BPF map upon agent restart also works.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 40b159a ]

Finally, wire-up Kubernetes side to add surrogate entries for cluster-internal
communication in case of externalTrafficPolicy=Local. We only filter local
backends for ScopeExternal. ScopeInternal will have everything.

Example from nginx deployment with externalTrafficPolicy=Local, listing from
node apoc:

  # ./cilium/cilium service list
  ID   Frontend                 Service Type   Backend
  4    10.100.125.187:80        ClusterIP      1 => 10.217.0.252:80
                                               2 => 10.217.1.86:80
  5    192.168.178.29:30465     NodePort       1 => 10.217.0.252:80
  6    0.0.0.0:30465            NodePort       1 => 10.217.0.252:80
  7    192.168.178.29:30465/i   NodePort       1 => 10.217.0.252:80
                                               2 => 10.217.1.86:80
  8    0.0.0.0:30465/i          NodePort       1 => 10.217.0.252:80
                                               2 => 10.217.1.86:80

  # ./cilium/cilium bpf lb list
  SERVICE ADDRESS          BACKEND ADDRESS
  10.100.125.187:80        0.0.0.0:0 (4) [ClusterIP]
                           10.217.0.252:80 (4)
                           10.217.1.86:80 (4)
  192.168.178.29:30465     0.0.0.0:0 (5) [NodePort, Local]
                           10.217.0.252:80 (5)
  192.168.178.29:30465/i   0.0.0.0:0 (7) [NodePort, Local]
                           10.217.0.252:80 (7)
                           10.217.1.86:80 (7)
  0.0.0.0:30465            0.0.0.0:0 (6) [NodePort, Local]
                           10.217.0.252:80 (6)
  0.0.0.0:30465/i          10.217.1.86:80 (8)
                           10.217.0.252:80 (8)
                           0.0.0.0:0 (8) [NodePort, Local]

  # kubectl get pods --all-namespaces -o wide
  NAMESPACE     NAME                           READY   STATUS    RESTARTS   AGE    IP             NODE   NOMINATED NODE   READINESS GATES
  default       nginx6-7d4b5d6bdf-7tpk4        1/1     Running   0          127m   10.217.1.86    tank   <none>           <none>
  default       nginx6-7d4b5d6bdf-wd9hf        1/1     Running   0          127m   10.217.0.252   apoc   <none>           <none>
  [...]

As can be seen the ClusterIP has all backends, the external facing NodePort
only has local backends whereas the internal facing NodePort has all backends.
The 0.0.0.0:30465 could still be optimized away since genCartesianProduct()
has no awareness from the logic in ParseService() which adds the surrogate
entries.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 67f85e3 ]

... given kube-proxy-free setup supports them now, lets run these. Also
add tests from third node to make sure they fail to a node that has no
local backend.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[ upstream commit 30f44a5 ]

Add an 'umbrella' section on client source IP preservation with links
to the related sections.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann borkmann requested review from a team as code owners July 2, 2020 06:53
@maintainer-s-little-helper maintainer-s-little-helper bot added the kind/backports This PR provides functionality previously merged into master. label Jul 2, 2020
@borkmann
Copy link
Member Author

borkmann commented Jul 2, 2020

test-backport-1.8

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11936 -- bpf: run kernel's checkpatch.pl locally and as GitHub action, fix style (@qmonnet)

Looks all good, thanks a lot for that!

@borkmann borkmann merged commit f5b6330 into v1.8 Jul 2, 2020
@borkmann borkmann deleted the pr/v1.8-backport-2020-07-02 branch July 2, 2020 11:03
@borkmann borkmann mentioned this pull request Jul 2, 2020
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants