Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fqdn: Delay ipcache upserts until policies have been updated #14084

Merged
merged 2 commits into from Nov 24, 2020

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Nov 19, 2020

Add a map for newly allocated identities to ipcache.AllocateCIDR
functions that the caller can use to insert the IPs to the IP cache later,
after affected endpoint policy maps have been updated.

Use this new functionality on the DNS proxy and policy update code
paths, that makes sure that new policy map entries are in place before
an IP received from a DNS server or a CIDR included in a policy is placed
in IP cache. This is really straightforward as the logic for waiting was
already in place.

The DNS poller path still inserts in to IP cache before policies
have been updated.

Signed-off-by: Jarno Rajahalme jarno@covalent.io

@jrajahalme jrajahalme added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Nov 19, 2020
@jrajahalme jrajahalme requested a review from a team as a code owner November 19, 2020 07:10
@jrajahalme jrajahalme requested review from nathanjsweet and removed request for a team November 19, 2020 07:10
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.7 kind/backports This PR provides functionality previously merged into master. labels Nov 19, 2020
@jrajahalme
Copy link
Member Author

test-backport-1.7

@jrajahalme
Copy link
Member Author

Locally passed runtime FQDN tests.

@jrajahalme
Copy link
Member Author

This is not a backport even though backport labels were automatically added :-)

@tgraf tgraf force-pushed the pr/jrajahalme/1.7-policy-before-ipcache branch from cc49269 to a799ba7 Compare November 19, 2020 09:47
@tgraf
Copy link
Member

tgraf commented Nov 19, 2020

Fixed up unit tests

@tgraf tgraf force-pushed the pr/jrajahalme/1.7-policy-before-ipcache branch from a799ba7 to 4b6e697 Compare November 19, 2020 16:03
@jrajahalme
Copy link
Member Author

test-backport-1.7

Add a map for newly allocated identities to ipcache.AllocateCIDR
functions that the caller can use to upsert the IPs to ipcache later,
after affected endpoint policy maps have been updated.

Use this new functionality on the DNS proxy code path, that makes sure
that new policy map entries are in place before an IP received from a
DNS server is placed in ipcache. This is really straightforward as the
logic for waiting was already in place for delaying the forwarding of
the DNS response.

Policy update path is still allowing ipcache upserts at policy
ingestion time rather than waiting for the policy maps to be
updated. This means that new, more specific CIDRs (e.g., 10.0.0/24) in
policies can still cause momentary drops on traffic currently using a
less specific CIDR (e.g., 10.0/16).

Similarly the DNS poller path still upserts to ipcache before policies
have been updated.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/1.7-policy-before-ipcache branch from 4b6e697 to 3b1ef2b Compare November 19, 2020 17:50
@jrajahalme
Copy link
Member Author

test-backport-1.7

@jrajahalme
Copy link
Member Author

CI infra problems:

11:04:24  go: creating work dir: mkdir /tmp/go-build035857946: no space left on device 

@jrajahalme
Copy link
Member Author

test-backport-1.7

@jrajahalme
Copy link
Member Author

Build K8s 1.17 vm provisioning fail #20005 (Nov 19, 2020 9:48:01 PM), everything else succeeded, testing again.

@jrajahalme
Copy link
Member Author

test-backport-1.7

1 similar comment
@jrajahalme
Copy link
Member Author

test-backport-1.7

… regenerated by endpoints.

Move ipcache CIDR upserts and releases to the policy reaction queue,
where upserts can be executed after regenerations have been completed,
i.e. after endpoint policy maps have been updated. This way IP
addresses are mapped to newly allocated identities only after endpoint
policy maps are ready to classify them.

Correspondingly, on deletes the to-be-deleted CIDR identities are
first deleted from ipcache so that when they are deleted from endpoint
policy maps they are no longer used in classification. Releases of
CIDR identities must still be serialized with ipcache upserts via the
policy reaction queue so that they are executed in the same order
w.r.t. ipcache upserts as policy deletes and adds.

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/1.7-policy-before-ipcache branch from d2911b0 to 86ef8aa Compare November 20, 2020 20:15
@jrajahalme
Copy link
Member Author

Simplified 2nd commit a lot, re-testing

@jrajahalme
Copy link
Member Author

test-backport-1.7

@jrajahalme
Copy link
Member Author

Same failure in two different tests: Cilium agent in ImagePullBackoff in Upgrade Downgrade test. Seems like the docker rate limiting issue?

@joestringer
Copy link
Member

joestringer commented Nov 23, 2020

@jrajahalme have you rebased? I believe we only merged the v1.7 backport for the image pull secrets mitigation on Friday afternoon. The CI run on this PR looks like it was likely run before that other PR was merged.

@joestringer
Copy link
Member

Everything was green except the k8s tests which failed with imagepullbackoff due to docker ratelimits, so I will re-trigger to see if that rebases it to make CI pass.

@joestringer
Copy link
Member

joestringer commented Nov 24, 2020

test-backport-1.8

EDIT: Typo, caused the failed "Runtime-4.9" job below which can be safely ignored.

@joestringer
Copy link
Member

test-backport-1.7

@joestringer
Copy link
Member

The only failures are triggered because I accidentally ran the 1.8 backports jobs, which will guarantee to fail. All of the legitimate test checks were successful. Merging.

@joestringer joestringer merged commit 538223c into v1.7 Nov 24, 2020
@joestringer joestringer deleted the pr/jrajahalme/1.7-policy-before-ipcache branch November 24, 2020 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants