
v1.6 backports 2020-02-17 #10223

Merged: 14 commits into v1.6 from pr/backports-1.6-17.02.2020, Feb 21, 2020

Conversation

nebril
Member

@nebril nebril commented Feb 17, 2020

Once this PR is merged, you can update the PR labels via:

$ for pr in 10113 10131 10144 10140 10137 10171 10173 10178 10192 10190 10185; do contrib/backporting/set-labels.py $pr done 1.6; done

Note: dropped the last commit from #10185, as the status method was not available in the 1.6 branch



@nebril nebril requested a review from a team as a code owner February 17, 2020 22:19
@maintainer-s-little-helper bot added the backport/1.6 and kind/backports ("This PR provides functionality previously merged into master.") labels Feb 17, 2020
@@ -116,8 +118,8 @@ func prefixMatchesKey(prefix, key string) bool {

 // NewKVStoreBackend creates a pkg/allocator.Backend compatible instance. The
 // specific kvstore used is configured in pkg/kvstore.
-func NewKVStoreBackend(basePath, suffix string, typ allocator.AllocatorKey) (*kvstoreBackend, error) {
-	if kvstore.Client() == nil {
+func NewKVStoreBackend(basePath, suffix string, typ allocator.AllocatorKey, backend kvstore.BackendOperations) (*kvstoreBackend, error) {

exported func NewKVStoreBackend returns unexported type *allocator.kvstoreBackend, which can be annoying to use
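For context, a minimal sketch of the shape the linter is asking for: return the exported allocator.Backend interface rather than the unexported *kvstoreBackend. All type shapes below are assumptions for illustration, not the actual Cilium definitions:

```go
package allocator

import "errors"

// BackendOperations stands in for kvstore.BackendOperations (assumed shape).
type BackendOperations interface {
	Get(key string) ([]byte, error)
}

// Backend stands in for pkg/allocator.Backend (assumed shape).
type Backend interface {
	DeleteAllKeys()
}

type kvstoreBackend struct {
	basePrefix string
	backend    BackendOperations
}

func (k *kvstoreBackend) DeleteAllKeys() { /* elided */ }

// NewKVStoreBackend takes the backend explicitly, as in the diff above, but
// returns the exported interface so callers never see *kvstoreBackend.
func NewKVStoreBackend(basePath, suffix string, backend BackendOperations) (Backend, error) {
	if backend == nil {
		return nil, errors.New("no kvstore backend configured")
	}
	return &kvstoreBackend{basePrefix: basePath + suffix, backend: backend}, nil
}
```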


@nebril
Member Author

nebril commented Feb 17, 2020

never-tell-me-the-odds

Member

@aanm aanm left a comment

LGTM for my PRs

@nebril
Member Author

nebril commented Feb 17, 2020

never-tell-me-the-odds

@nebril
Member Author

nebril commented Feb 17, 2020

never-tell-me-the-odds

@nebril
Member Author

nebril commented Feb 17, 2020

test-me-please

@nebril
Member Author

nebril commented Feb 17, 2020

test-me-please

@nebril
Member Author

nebril commented Feb 17, 2020

never-tell-me-the-odds

@nebril
Member Author

nebril commented Feb 18, 2020

test-me-please

@nebril
Member Author

nebril commented Feb 18, 2020

never-tell-me-the-odds

@aanm
Member

aanm commented Feb 18, 2020

test-me-please

1 similar comment
@nebril
Member Author

nebril commented Feb 18, 2020

test-me-please

@nebril
Member Author

nebril commented Feb 18, 2020

seems like Cilium was not running in runtime vm

@nebril
Member Author

nebril commented Feb 19, 2020

test-me-please

1 similar comment
@nebril
Member Author

nebril commented Feb 19, 2020

test-me-please

Contributor

@raybejjani raybejjani left a comment

LGTM for my clustermesh stuff

@@ -36,7 +36,7 @@ var kvstoreDeleteCmd = &cobra.Command{
Fatalf("Unable to delete keys: %s", err)
Contributor

Not related to this spot, but this backport is missing the deletion of all the package-global functions in pkg/kvstore/kvstore.go (76c7fca#diff-fb5855a8577b97a3f31a2f1fe6c75009L49-L169)

@@ -143,15 +146,15 @@ func (k *kvstoreBackend) DeleteAllKeys() {
kvstore.Client().DeletePrefix(k.basePrefix)
Contributor

Oops! I forgot to change this in the original PR. I'll make a follow-up :/

@nebril
Member Author

nebril commented Feb 19, 2020

test-me-please

@aanm
Member

aanm commented Feb 19, 2020

@nebril the tests have been failing in the runtime VM for the same reason; this is probably a regression. I would suggest dropping the PRs that had conflicts, rebasing against 1.6, testing again, and opening a new PR with the PRs that had conflicts.

@nebril
Member Author

nebril commented Feb 19, 2020

@aanm yeah, I got really unlucky with provisioning errors on this PR; I wanted to make sure this isn't the underlying issue.

raybejjani and others added 14 commits February 20, 2020 16:04
[ upstream commit f868f97 ]

We need to avoid using the package level functions but need to keep the
tracing for debug. Moving the trace calls into the kvstore backends is
verbose but is the simplest way to do this.

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
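As a rough illustration of that move (the Trace helper and etcdClient shape here are assumptions, not the exact Cilium code), each backend method now carries its own trace call rather than going through a package-level wrapper:

```go
package kvstore

import log "github.com/sirupsen/logrus"

// Trace approximates a debug-trace helper; its signature is assumed here.
func Trace(op string, err error, fields log.Fields) {
	log.WithError(err).WithFields(fields).Debug(op)
}

type etcdClient struct{ /* connection state elided */ }

// get performs the raw lookup; body elided in this sketch.
func (e *etcdClient) get(key string) ([]byte, error) { return nil, nil }

// Get traces itself, which is the verbose-but-simple approach the commit
// describes: every backend method duplicates a small trace call.
func (e *etcdClient) Get(key string) ([]byte, error) {
	v, err := e.get(key)
	Trace("Get", err, log.Fields{"key": key, "value": string(v)})
	return v, err
}
```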
[ upstream commit 76c7fca ]

We need to use different backends at the same time. Currently we assume a
single connected backend instance by using the package-level functions.
We now directly call these functions on the default client instead,
allowing future changes to use non-global instances of the backends.

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
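A self-contained sketch of the call-site change; the interface, key, and Client() accessor below are illustrative stand-ins, and the real kvstore signatures may differ:

```go
package main

import "fmt"

// BackendOperations approximates kvstore.BackendOperations (assumed shape).
type BackendOperations interface {
	Get(key string) (string, error)
}

// defaultClient stands in for the single connected backend behind
// kvstore.Client(); hypothetical for this sketch.
var defaultClient BackendOperations

func Client() BackendOperations { return defaultClient }

type fakeEtcd struct{}

func (fakeEtcd) Get(key string) (string, error) { return "value-of:" + key, nil }

func main() {
	defaultClient = fakeEtcd{}
	// Old style (removed): a package-level kvstore.Get(key) wrapper bound to
	// the global backend. New style: go through the default client
	// explicitly, so a non-global instance can be substituted later.
	v, err := Client().Get("cilium/state/identities/v1/id/1234")
	fmt.Println(v, err)
}
```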
[ upstream commit 913ed22 ]

We previously used only a single pkg/kvstore/allocator.kvstoreBackend
instance system-wide. This also used a single, global etcd/consul
backend instance (in this case a backend for allocator.kvstoreBackend,
itself the kvstore backend for the identity allocator).
With clustermesh, cilium-agent needs to sync with remote etcd instances.
This means that multiple etcd clients can exist, and that the identity
allocator may use different ones. This is now possible by keeping
an internal reference instead of using the global one.

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
[ upstream commit f28f6cd ]

Clustermesh remote-cluster watching accidentally used the local caching
identity allocator in cilium-agent. This meant that while ipcache and
nodes were synced correctly, identities were not. This was because the
watch was called on the main identity allocator, adding itself to itself
as a remote cluster. This resulted in two connections to the local etcd,
and double entries when listing identities.

A new allocator instance is created to use the remote cluster etcd
backend. This is then added to the main allocator's remote-clusters list.

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
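A hedged sketch of the corrected wiring; names like NewAllocator and AddRemote are stand-ins for illustration, not the real Cilium API:

```go
package clustermesh

// Backend approximates a kvstore backend handle (assumed shape).
type Backend interface{ Name() string }

// Allocator approximates the identity allocator (assumed shape).
type Allocator struct {
	backend Backend
	remotes []*Allocator
}

func NewAllocator(b Backend) *Allocator { return &Allocator{backend: b} }

// AddRemote registers another allocator as a remote cluster.
func (a *Allocator) AddRemote(r *Allocator) { a.remotes = append(a.remotes, r) }

// watchRemoteCluster shows the fix's shape: build a dedicated allocator over
// the remote cluster's etcd backend and attach it to the main allocator.
// The bug was attaching the main (local, caching) allocator to itself, which
// opened a second connection to the local etcd and doubled identity listings.
func watchRemoteCluster(main *Allocator, remoteBackend Backend) {
	remote := NewAllocator(remoteBackend)
	main.AddRemote(remote)
}
```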
[ upstream commit 5e01fb6 ]

If "cilium bpf nat list" fails, then we will be able to rely to
the raw dumps when debugging.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
[ upstream commit 8408128 ]

When Cilium performs a connection tracking garbage collection it will
also purge NAT entries from the NAT bpf map. To perform this action,
Cilium will execute BPF map lookups from userspace by giving the memory
location of a golang structure, previously NatEntry4 now NatEntry6. The
kernel will then write into this memory location the value of the NAT
entry being looked up. As the size of NatEntry4 is only 38 bytes, as
opposed to NatEntry6 which is 50 bytes, the kernel would then write 12
bytes into random memory locations of the Cilium agent. This could cause
the Cilium agent to crash or to produce errors such as:

```
msg="Cannot create CEP" ... error="CiliumEndpoint.cilium.io \"��\\x00\\x00\\x00\\x00\\x00\\x00-multi-node-headless-854b65674d-lj45r\" is invalid: metadata.name: Invalid value: \"��\\x00\\x00\\x00\\x00\\x00\\x00-multi-node-headless-854b65674d-lj45r\":
```

Fixes: b9d2a0a ("maps/nat: Add NatKey{4,6} types")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
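To make the overrun concrete, here is a small runnable Go sketch. The field layouts are illustrative only (Go adds alignment padding, so unsafe.Sizeof will not print exactly 38 and 50), but the failure mode is the same: hand the kernel a buffer sized for the smaller struct and it writes the difference past the end.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Illustrative stand-ins for the two NAT map value types; not the real
// field layouts, just a smaller IPv4 value and a larger IPv6 value.
type natEntry4 struct {
	created, hostLocal uint64
	addr               [4]byte // IPv4 address
	port               uint16
}

type natEntry6 struct {
	created, hostLocal uint64
	addr               [16]byte // IPv6 address
	port               uint16
}

func main() {
	small := unsafe.Sizeof(natEntry4{})
	large := unsafe.Sizeof(natEntry6{})
	// If a lookup against the IPv6 map is given &natEntry4{}, the kernel
	// still copies the map's (larger) value size to that address,
	// clobbering whatever happens to live in the trailing bytes.
	fmt.Printf("v4 value: %d bytes, v6 value: %d bytes, overrun: %d bytes\n",
		small, large, large-small)
}
```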
[ upstream commit 5b9002d ]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
[ upstream commit 4630078 ]

Fix the space hack which stopped working with make v4.3 (works with
v4.2 though):

    [..]
    " [-DENABLE_HOST_REDIRECT-DENABLE_IPV4-DENABLE_IPV6-DENABLE_NAT46]";
    clang -DENABLE_HOST_REDIRECT-DENABLE_IPV4-DENABLE_IPV6-DENABLE_NAT46
    -I/home/brb/sandbox/gopath/src/github.com/cilium/cilium/bpf/include
    -I/home/brb/sandbox/gopath/src/github.com/cilium/cilium/bpf
    -D__NR_CPUS__=8 -O2 -g -target bpf -emit-llvm -Wall -Werror
    -Wno-address-of-packed-member -Wno-unknown-warning-option -c bpf_lxc.c
    -o bpf_lxc.ll; llc -march=bpf -mcpu=probe -mattr=dwarfris -o /dev/null
    bpf_lxc.ll;  \
    fi
    In file included from <built-in>:323:
    <command line>:1:20: error: ISO C99 requires whitespace after the macro
    name [-Werror,-Wc99-extensions]
    #define ENABLE_IPV4-DHAVE_LPM_MAP_TYPE 1

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
[ upstream commit 640b7a6 ]

Use the top-of-tree VERSION file to generate the chart versions and
update the pull policy using the following rules:

  * Set the helm chart versions to the VERSION in the file
  * If the VERSION file ends with ".90":
    - Set the cilium tag to 'latest'
    - Set the pullPolicy to 'Always'
  * If the VERSION file does not end with ".90":
    - Set the cilium tag to the VERSION in the file
    - Set the pullPolicy to 'IfNotPresent'
  * Set the managed-etcd version tag to the version specified at the top
    of this Makefile. This must be manually bumped; it does not appear
    to follow the standard Cilium docker image tag process.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
[ upstream commit da2e652 ]

The spanStart of waitConsumeOffQueue could be read before being written
if the buffered event was executed before waitConsumeOffQueue.Start()
ran in the lines modified by this commit. To fix this we now execute
waitConsumeOffQueue.Start() before the event is put into the queue.
Although this does not give the exact span stat, the developer or user
can derive it by subtracting the waitEnqueue span to retrieve the real
waitConsumeOffQueue span.

Fixes: add0d65 ("add eventqueue package")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
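A minimal sketch of the corrected ordering, assuming a queue built on a Go channel; the types are simplified stand-ins for the eventqueue package:

```go
package eventqueue

import "time"

// spanStat is a tiny stand-in for the real span-statistics type.
type spanStat struct{ start time.Time }

func (s *spanStat) Start() { s.start = time.Now() }

// Event carries the span measuring time spent waiting to be consumed.
type Event struct {
	WaitConsumeOffQueue spanStat
	// payload elided
}

// Queue is a simplified stand-in for the real event queue.
type Queue struct{ events chan *Event }

// Enqueue starts the consume-off-queue span *before* the channel send. The
// send happens-before the receive, so a fast consumer can no longer read
// spanStart before it is written; the span now also covers enqueue time,
// which callers can subtract back out using the waitEnqueue span.
func (q *Queue) Enqueue(ev *Event) {
	ev.WaitConsumeOffQueue.Start()
	q.events <- ev
}
```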
[ upstream commit 4b61dd5 ]

The getState function should only be called while the endpoint mutex is
held, which was not always the case. The function call was changed so
that state can be accessed concurrently, fixing the data race detected
below:

```
==================
WARNING: DATA RACE
Write at 0x00c000544f30 by goroutine 56:
  github.com/cilium/cilium/pkg/endpoint.(*Endpoint).SetStateLocked()
      /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1641 +0xe9
  github.com/cilium/cilium/pkg/endpoint.(*Endpoint).LeaveLocked()
      /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1386 +0x333
  main.(*Daemon).deleteEndpointQuiet()
      /go/src/github.com/cilium/cilium/daemon/endpoint.go:651 +0x3e4
  main.(*Daemon).regenerateRestoredEndpoints.func2()
      /go/src/github.com/cilium/cilium/daemon/state.go:388 +0x49

Previous read at 0x00c000544f30 by goroutine 165:
  github.com/cilium/cilium/pkg/endpointmanager.releaseID()
      /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1549 +0xb5
  github.com/cilium/cilium/pkg/endpointmanager.Remove.func1()
      /go/src/github.com/cilium/cilium/pkg/endpointmanager/manager.go:335 +0x88

Goroutine 56 (running) created at:
  main.(*Daemon).regenerateRestoredEndpoints()
      /go/src/github.com/cilium/cilium/daemon/state.go:382 +0x916
  main.(*Daemon).initRestore()
      /go/src/github.com/cilium/cilium/daemon/state.go:454 +0xf3
  main.runDaemon()
      /go/src/github.com/cilium/cilium/daemon/daemon_main.go:1441 +0xa17
  main.NewDaemon()
      /go/src/github.com/cilium/cilium/daemon/daemon.go:894 +0x23a4
  main.runDaemon()
      /go/src/github.com/cilium/cilium/daemon/daemon_main.go:1396 +0x314
  main.glob..func1()
      /go/src/github.com/cilium/cilium/daemon/daemon_main.go:121 +0xbf
  github.com/cilium/cilium/vendor/github.com/spf13/cobra.(*Command).execute()
      /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:766 +0x8eb
  github.com/cilium/cilium/vendor/github.com/spf13/cobra.(*Command).ExecuteC()
      /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:850 +0x41b
  main.daemonMain()
      /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:800 +0x237
  main.main()
      /go/src/github.com/cilium/cilium/daemon/main.go:18 +0x2f

Goroutine 165 (finished) created at:
  github.com/cilium/cilium/pkg/endpointmanager.Remove()
      /go/src/github.com/cilium/cilium/pkg/endpointmanager/manager.go:324 +0x121
  main.(*Daemon).deleteEndpointQuiet()
      /go/src/github.com/cilium/cilium/daemon/endpoint.go:614 +0x1a8
  main.(*Daemon).regenerateRestoredEndpoints.func2()
      /go/src/github.com/cilium/cilium/daemon/state.go:388 +0x49
==================
```

Fixes: f71d87a ("endpointmanager: signal when work is done")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
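The fix's shape, as a hedged sketch; field and method names only approximate the real endpoint package:

```go
package endpoint

import "sync"

// Endpoint is a pared-down stand-in; the real struct has many more fields.
type Endpoint struct {
	mutex sync.RWMutex
	state string
}

// getState assumes e.mutex is already held by the caller, matching the
// "Locked" convention visible in the trace above.
func (e *Endpoint) getState() string { return e.state }

// GetState is safe to call from any goroutine: it takes the read lock, so
// it cannot race with writers such as SetStateLocked. Call sites that read
// the state without the mutex held now go through this accessor.
func (e *Endpoint) GetState() string {
	e.mutex.RLock()
	defer e.mutex.RUnlock()
	return e.getState()
}
```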
[ upstream commit 69a6fce ]

Besides the ability to delete, the ipcache garbage collector also requires the
ability to dump the table. Add this to the existing probe which feeds
`SupportsDelete()`.

	level=debug msg="Detected IPCache delete operation support: true" subsys=map-ipcache
	level=debug msg="Detected IPCache dump operation support: true" subsys=map-ipcache

Fixes: #10080

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
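A speculative sketch of what such a probe can look like; every name below is a stand-in, and the real probe logic in the ipcache map package differs in detail:

```go
package ipcache

// mapHandle approximates a bpf map handle (assumed shape).
type mapHandle interface {
	// DumpOnce attempts a single walk of the map and reports any error.
	DumpOnce() error
}

var (
	supportsDelete bool
	supportsDump   bool
)

// probe records which operations the running kernel supports, mirroring the
// existing delete probe that feeds SupportsDelete().
func probe(m mapHandle) {
	supportsDump = m.DumpOnce() == nil
}

// SupportsDump reports whether the ipcache map can be dumped, which the
// ipcache garbage collector needs in addition to delete support.
func SupportsDump() bool { return supportsDump }
```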
[ upstream commit 6b9559e ]

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
[ upstream commit c58d749 ]

 * Handling of the proxy redirection was missing entirely in the to-container
   section for IPv6. Add it.

 * The to-container section case was assuming that any proxy redirection is
   indicated by ipv{46}_policy() returning a non-zero proxy port. This is no
   longer true since commit 830adba. Fix this by using a separate return code
   to indicate proxy redirection and treating the proxy port as optional.

The above deficits led to proxy redirection being ineffective when the
EnableEndpointRoutes setting was enabled.

Fixes: 830adba ("bpf: Support proxy using original source address and port.")
Fixes: 25a80df ("bpf: Add to-container section to bpf_lxc")
Fixes: #10105

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
@nebril
Member Author

nebril commented Feb 20, 2020

never-tell-me-the-odds

@nebril
Member Author

nebril commented Feb 20, 2020

Seems like the #10168 backport broke the CI; I removed it from this PR and will create a separate PR with just that backport.

@joestringer
Member

Hit #10270, retrying k8s tests.

@joestringer
Member

test-missed-k8s

@aanm aanm merged commit 8809974 into v1.6 Feb 21, 2020
@aanm aanm deleted the pr/backports-1.6-17.02.2020 branch February 21, 2020 10:27
Labels
kind/backports This PR provides functionality previously merged into master.

8 participants