Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.8 backports 2020-10-27 #13788

Merged
merged 16 commits into from
Oct 28, 2020
Merged

v1.8 backports 2020-10-27 #13788

merged 16 commits into from
Oct 28, 2020

Conversation

christarazi
Copy link
Member

Once this PR is merged, you can update the PR labels via:

$ for pr in 13667 13703 13699 13696 13702 13713 13717 13733 13716 13707 13728 13734 12268 13781; do contrib/backporting/set-labels.py $pr done 1.8; done

pchaigno and others added 16 commits October 27, 2020 15:31
[ upstream commit 228a485 ]

During the endpoint restoration process, when we parse the endpoints, we
assign them a reserved init identity if they don't already have an
identity [0]. If we later remove the endpoint (because the corresponding
K8s pod or interface are missing), we attempt to remove the identity
from the identity manager. That last operation results in the following
error message because the init identity was never added to the manager.

  level=error msg="removing identity not added to the identity manager!" identity=5 subsys=identitymanager

This commit fixes it by skipping the removal attempt from the manager in
the case of identity init.

0 - https://github.com/cilium/cilium/blob/80a71791320df34df5b6252b9680553e38d88d20/pkg/endpoint/endpoint.go#L819
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit a8e67f1 ]

When submitting the backport PR using submit-backport, the script
proposes to update the labels (i.e., remove needs-backport/X and add
backport-pending/X):

    Sending pull request...
    Everything up-to-date
    #13700

    Updating labels for PRs 13383 13608 12975

    Set labels for all PRs above? [y/N] y
    Setting labels for PR 13383... ✓
    Setting labels for PR 13608... ✓
    Setting labels for PR 12975... ✓

The choice defaults to not updating the labels. That may give the wrong
impression that it is an optional step---and if you're like me, when
you're unsure what an optional step does, you skip it. We should default
to setting the labels because we later rely on the labels being set
(e.g., when we update them after the PR is merged).

This commit changes it to the following:

    Sending pull request...
    Everything up-to-date
    #13700

    Updating labels for PRs 13383 13608 12975

    Set labels for all PRs above? [Y/n]
    Setting labels for PR 13383... ✓
    Setting labels for PR 13608... ✓
    Setting labels for PR 12975... ✓

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit ec16cab ]

If the controller that is used for label resolution fails, the
prometheus metrics will increase its cardinality since the uniquely
controller name was being used as a prometheus label. To avoid this, we
will reference these warnings with a common subsystem name,
'resolve-labels'.

Fixes: a31ab29 ("endpoint: Run labels controller under ep manager")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 5733f6a ]

This change is only adding more unit tests to better understand the
behavior of the labelsfilter as well as improving the documentation for
the expectation of filtering labels.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 4e29130 ]

Since Cilium receives CNI events when a pod is created, Cilium can
calculate the lag for kube-apiserver events by checking the time an
ADD event for that Pod was received and subtracting by the time the CNI
event for that pod was received.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit c40dfb0 ]

- Use `GOOGLE_CREDENTIALS` instead of `GOOGLE_APPLICATION_CREDENTIALS`.
- Refer to https://github.com/openshift/installer/blob/master/docs/user/gcp/iam.md
  to assign appropriate roles to service account.
- Make it a bit clearer that the firewall rule creation is time-sensitive.

Ref: #13627 (comment)

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 61100c5 ]

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 91748c7 ]

We haven't decoupled both yet.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 320b83f ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 253368a ]

This commit contains no functional changes. It just reorders some of the
functions by importance in this file so that it's easier to parse. This
should hopefully reduce the number of times needed to scroll up and
down.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 1c0f00d ]

This commit ensures that the EventQueue is fully drained, even when its
not running its loop. When endpoints are being restored, their
EventQueue is initialized, but non-running state (processing events). It
is the job of the endpoint manager to kick off the event loop by calling
Expose() on the endpoint.

This commit fixes the following commits which causes Cilium to be stuck
waiting for the EventQueue to drain (WaitToBeDrained()):

290d9e9 ("daemon: Init endpoint queue during validation")
79bf425 ("endpoint: Add function to initialize event queue")

Cilium becoming stuck is described in the following flow:
  - Endpoints began restoration
  - Endpoint's EventQueue initialized (but never run)
  - Endpoint's metadata data resolver controller kicked off
  - Visilbity and bandwidth policy events enqueued
  - Endpoint fails restoration due to some issue (e.g. interface not
    found, etc)
  - Endpoint queued for deletion because it failed restoration
  - As part of endpoint deletion, the EventQueue is stopped and drained
  - Cilium deadlocks trying to drain, but the EventQueue run loop was
    never run, which would pop events off the `events` channel, and
    close the `eventsClosed` channel

This commit fixes this deadlock by forcefully running the event loop to
drain the queue. After the `events` channel is closed (from Stop()), the
loop will terminate and the `eventsClosed` channel will close, thereby
unblocking WaitToBeDrained().

Stacktrace from `gops`:

```
goroutine 632 [chan receive, 1 minutes]:
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).WaitToBeDrained(0xc00013c960)
        /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:322 +0x1ad
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).Delete(0xc000ad6900, 0x27faee0, 0xc00062ac40, 0x27fba60, 0xc00099a120, 0x2877280, 0xc0005d2340, 0x430101, 0x0, 0x0, ...)
        /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:2194 +0x91
github.com/cilium/cilium/daemon/cmd.(*Daemon).deleteEndpointQuiet(...)
        /go/src/github.com/cilium/cilium/daemon/cmd/endpoint.go:674
github.com/cilium/cilium/daemon/cmd.(*Daemon).regenerateRestoredEndpoints.func2(0xc00062ac40, 0xc000b63214, 0xc000ad6900)
        /go/src/github.com/cilium/cilium/daemon/cmd/state.go:302 +0x7c
created by github.com/cilium/cilium/daemon/cmd.(*Daemon).regenerateRestoredEndpoints
        /go/src/github.com/cilium/cilium/daemon/cmd/state.go:296 +0x8a0
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 7b4d700 ]

During the backport process, when cherry-picking commits to backport,
the cherry-pick script may fail if there are conflicts. In that case,
the temporary file holding the backported commit is not clean up.

This commit fixes it to clean the temporary file even in case of
failure.

Suggested-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 2ec7305 ]

Fixes: b3adc4d ("k8s: delete IPs from ipcache for no running Pods")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 8909c7f ]

Open to discussion: The port-distribution metric covers "Number of packets by destination port number" which results in a high cardinality metric with arguably minimal value from a default installation. We should consider removing the metric from the default metric GSG.

Signed-off-by: Jed Salazar jed@isovalent.com
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit ce8be45 ]

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 42f19f1 ]

This commit clarifies the docs to mention the case where users are
simply interested in bumping the runtime images to update to the latest
package, and don't necessarily have "a branch with changes".

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi requested a review from a team as a code owner October 27, 2020 22:52
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.8 kind/backports This PR provides functionality previously merged into master. labels Oct 27, 2020
@christarazi christarazi added backport/1.8 kind/backports This PR provides functionality previously merged into master. labels Oct 27, 2020
@christarazi
Copy link
Member Author

test-backport-1.8

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for my PRs.

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my changes, thanks.

@jrajahalme jrajahalme merged commit 7d48cdd into v1.8 Oct 28, 2020
@jrajahalme jrajahalme deleted the pr/v1.8-backport-2020-10-27 branch October 28, 2020 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants