Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

policy: improve CNP initial sync #15492

Merged
merged 1 commit into from
Apr 6, 2021

Conversation

jaffcheng
Copy link
Contributor

@jaffcheng jaffcheng commented Mar 29, 2021

Check for the existence of CNP egress rules before translating
since we only do translation for egress rules.
This reduces the time cost of an add event of CNP without egress rules
and makes cilium agent start faster in a cluster with large amount of
CNPs and services.

Time cost comparison in a cluster with 10k CCNPs (all without egress rule), 44k services/endpoints:

Before:

per CNP add event time cost              20 ~ 30 ms
overall time cost of initial CNP sync    255 s

After:

per CNP add event time cost              0.5 ~ 1 ms
overall time cost of initial CNP sync    15 s

Signed-off-by: Jaff Cheng jaff.cheng.sh@gmail.com

@jaffcheng jaffcheng requested review from a team and errordeveloper March 29, 2021 11:04
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 29, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Mar 29, 2021
@aanm aanm added the release-note/misc This PR makes changes that have no direct user impact. label Mar 29, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 29, 2021
@aanm
Copy link
Member

aanm commented Mar 29, 2021

test-me-please

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Did you happen to make any time comparisons before and after this commit?

@jaffcheng
Copy link
Contributor Author

jaffcheng commented Mar 29, 2021

🚀 Did you happen to make any time comparisons before and after this commit?

Yes, in one of our clusters with 10k CCNPs (all without egress rule), 44k services/endpoints, time cost of each CNP add event reduced from 20 ~ 30ms to 0.5 ~ 1ms, and the overall time cost of initial CNP sync reduced from 4m15s (exceeds k8s-sync-timeout) to 15s.

@christarazi
Copy link
Member

rocket Did you happen to make any time comparisons before and after this commit?

Yes, in one of our clusters with 10k CCNPs (all without egress rule), 44k services/endpoints, time cost of each CNP add event reduced from 20 ~ 30ms to 0.5 ~ 1ms, and the overall time cost of initial CNP sync reduced from 4m15s (exceeds k8s-sync-timeout) to 15s.

Awesome, could you encode that in the commit msg?

@jaffcheng jaffcheng force-pushed the improve-cnp-sync-upstream branch 2 times, most recently from f22a3ce to af45b61 Compare March 30, 2021 08:04
@jaffcheng
Copy link
Contributor Author

Sure, commit msg updated, PTAL @christarazi

@aanm
Copy link
Member

aanm commented Mar 30, 2021

test-me-please

@aanm
Copy link
Member

aanm commented Apr 2, 2021

@jaffcheng we have fixed a lot of flakes in our CI. Could you rebase against master so that we can trigger the tests again? Thank you!

Check for the existence of CNP egress rules before translating
since we only do translation for egress rules.
This reduces the time cost of an add event of CNP without egress rules
and makes cilium agent start faster in a cluster with large amount of
CNPs and services.

Time cost comparison in a cluster with 10k CCNPs (all without egress rule), 44k services/endpoints:

Before:

    per CNP add event time cost              20 ~ 30 ms
    overall time cost of initial CNP sync    255 s

After:

    per CNP add event time cost              0.5 ~ 1 ms
    overall time cost of initial CNP sync    15 s

Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
@jaffcheng
Copy link
Contributor Author

Rebased

@aanm
Copy link
Member

aanm commented Apr 2, 2021

test-me-please

@pchaigno
Copy link
Member

pchaigno commented Apr 6, 2021

k8s-1.21-kernel-4.9 is failing with known flake #14959. Other CI jobs are passing and reviews are in. Merging.

@pchaigno pchaigno added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. labels Apr 6, 2021
@pchaigno pchaigno merged commit 3366a2b into cilium:master Apr 6, 2021
1.10.0 automation moved this from In progress to Done Apr 6, 2021
@jaffcheng jaffcheng deleted the improve-cnp-sync-upstream branch April 6, 2021 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants