
v1.10 backports 2021-12-08 #18189

Merged
4 commits merged into cilium:v1.10 from pr/v1.10-backport-2021-12-08 on Dec 9, 2021

Conversation

joestringer (Member)

Once this PR is merged, you can update the PR labels via:

$ for pr in 17982 18155 18166; do contrib/backporting/set-labels.py $pr done 1.10; done

pchaigno and others added 4 commits December 8, 2021 17:11
[ upstream commit 3bd4ad6 ]

We use path filters in the CodeQL workflow to avoid running it for
unrelated changes. We're however missing the workflow file itself in the
path filters. As a result, the CodeQL workflow isn't run when the GitHub
Actions it uses are updated by dependabot. This commit fixes it.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
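For illustration, the shape of this kind of fix is to list the workflow file itself in its own path filters, so that dependabot updates to the workflow re-trigger the run. The filename and the pre-existing filter entry below are assumptions for the sketch, not Cilium's actual configuration:

    # Sketch of a CodeQL workflow path filter; filename and paths are assumed.
    on:
      pull_request:
        paths:
          - 'pkg/**'                        # assumed pre-existing filter
          - '.github/workflows/codeql.yml'  # the workflow file itself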
[ upstream commit f4d59d1 ]

Minikube 1.12.0 or later is required to use the --cni flag [1].

[1] kubernetes/minikube@9e95435
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
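For reference, the flag is passed when starting the cluster; a minimal example, with the CNI value chosen purely for illustration:

    $ minikube start --cni=cilium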
[ upstream commit 018be31 ]

Alexander reports in GitHub issue 18023 that establishing a connection
via an FQDN policy, then modifying that FQDN policy, would cause
subsequent traffic to the FQDN to be dropped, even if the new policy
still allowed the same traffic via a toFQDN statement.

This was caused by overzealous release of CIDR identities while
generating a new policy. Although the policy calculation itself keeps
all selectorcache entries alive during the policy generation phase (see
cachedSelectorPolicy.setPolicy()), after the new policy is inserted
into the PolicyCache, the distillery package would clean up the old
policy. As part of that cleanup, it would call into the individual
selector to call the RemoveSelectors() function.

The previous implementation of this logic unintentionally released the
underlying identities any time a user of a selector was released, rather
than only releasing the underlying identities when the number of users
reached zero and the selector itself would be released. This meant that
rather than the selectorcache retaining references to the underlying
identities when a policy was updated, instead the references would be
released (and all corresponding BPF resources cleaned up) at the end of
the process. This then triggered subsequent connectivity outages.

Fix it by only releasing the identity references once the cached
selector itself is removed from the SelectorCache.

Fixes: f559cf1 ("selectorcache: Release identities on selector removal")
Reported-by: Alexander Block <ablock84@gmail.com>
Suggested-by: Jarno Rajahalme <jarno@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
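To make the fix concrete, below is a minimal Go sketch of the corrected reference counting. All names (cachedSelector, removeUser) are hypothetical stand-ins for Cilium's SelectorCache internals: identity references are released only when the selector's last user goes away, not on every user removal.

    // Sketch only: names and types are hypothetical, not Cilium's real API.
    package main

    import "fmt"

    type cachedSelector struct {
        users      map[string]struct{} // policies currently referencing this selector
        identities []int               // underlying (e.g. CIDR) identity references
    }

    // removeUser detaches one user from the selector. The bug was releasing
    // s.identities on every call; the fix releases them only once the last
    // user is gone and the selector itself can be removed from the cache.
    func (s *cachedSelector) removeUser(user string, release func([]int)) {
        delete(s.users, user)
        if len(s.users) == 0 {
            release(s.identities) // safe now: no policy references remain
            s.identities = nil
        }
    }

    func main() {
        sel := &cachedSelector{
            users:      map[string]struct{}{"old-policy": {}, "new-policy": {}},
            identities: []int{16777217}, // illustrative CIDR identity
        }
        release := func(ids []int) { fmt.Println("released:", ids) }

        // Policy update: the old policy is cleaned up, but the new policy
        // still selects the same FQDN, so nothing may be released yet.
        sel.removeUser("old-policy", release)
        fmt.Println("still held:", sel.identities)

        // Only removing the final user releases the identities.
        sel.removeUser("new-policy", release)
    }

With the buggy behavior, the first removeUser call would already have released the identities and torn down the corresponding BPF state, breaking live FQDN connections even though the new policy still allowed them.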
[ upstream commit 80ff05a ]

Test that when FQDN policy is updated to a new policy which still
selects the same old FQDN destination, connectivity continues to work.

Validated to fail without the previous commit:

    K8sFQDNTest
    /home/joe/git/cilium/test/ginkgo-ext/scopes.go:473
      Validate that FQDN policy continues to work after being updated [It]
      /home/joe/git/cilium/test/ginkgo-ext/scopes.go:527

      Can't connect to to a valid target when it should work
      Expected command: kubectl exec -n default app2-58757b7dd5-rh7dd --
        curl --path-as-is -s -D /dev/stderr --fail --connect-timeout 5
            --max-time 20 --retry 5 http://vagrant-cache.ci.cilium.io
            -w "time-> DNS: '%{time_namelookup}(%{remote_ip})',
                Connect: '%{time_connect}',
                Transfer '%{time_starttransfer}',
                total '%{time_total}'"
      To succeed, but it failed:
      Exitcode: 28
      Err: exit status 28
      Stdout:
             time-> DNS: '0.000016()', Connect: '0.000000',Transfer '0.000000', total '5.000415'
      Stderr:
             command terminated with exit code 28

Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer requested review from a team as code owners December 9, 2021 01:12
@joestringer joestringer added the backport/1.10 and kind/backports (This PR provides functionality previously merged into master.) labels on Dec 9, 2021
@joestringer (Member, Author) commented Dec 9, 2021

/test-backport-1.10

Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:


Test Name: K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with direct routing Tests NodePort

Failure Output: FAIL: Can not connect to service "http://192.168.36.12:30193" from outside cluster (2/10)

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.16-net-next so I can create a new GitHub issue to track it.

@pchaigno (Member) left a comment:

My changes look good 🙏

@nbusseneau (Member) commented:

net-next hit #12511. This is unrelated to the backports at hand, merging.

@nbusseneau nbusseneau added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label on Dec 9, 2021
@nbusseneau nbusseneau merged commit 421bd59 into cilium:v1.10 Dec 9, 2021
@joestringer joestringer deleted the pr/v1.10-backport-2021-12-08 branch December 9, 2021 22:34