Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: [WIN-NPM] add all cached NetworkPolicies to a Pod at once #1893

Merged
merged 20 commits into from
Apr 26, 2023

Conversation

huntergregory
Copy link
Contributor

@huntergregory huntergregory commented Apr 5, 2023

Reason for Change:
Reduces time spent on ACL SysCalls by 99% at mid/high scale.

Specifically, when all of a Pod's NetworkPolicies have been cached. For example, for any Pod CRUD after NPM has reached steady state. Further optimized in #1900.

Issue Fixed:
Second of three PRs addressing Windows limitations in #1614.

Requirements:

Notes:

Experiment

500 NetworkPolicies applied to a Pod.

Results

  • Old NPM: 2 hours to apply NetworkPolicies sequentially.
  • This PR (MaxBatchedACLsPerPod=33): 1 min 10 seconds for all Pods (even adding ACLs to the 15th Pod).

@huntergregory huntergregory added npm Related to NPM. windows labels Apr 5, 2023
@huntergregory huntergregory changed the title perf: [POC] [WIN-NPM] add all NetPols for UpdatePod at once perf: [WIN-NPM] add all NetworkPolicies to a Pod at once Apr 7, 2023
@huntergregory huntergregory changed the title perf: [WIN-NPM] add all NetworkPolicies to a Pod at once perf: [WIN-NPM] add all cached NetworkPolicies to a Pod at once Apr 7, 2023
@huntergregory huntergregory mentioned this pull request Apr 7, 2023
3 tasks
@huntergregory huntergregory force-pushed the hgregory/04-04-update-pod-quicker branch from f2620f1 to 15857d9 Compare April 20, 2023 00:09
@huntergregory huntergregory force-pushed the hgregory/04-04-update-pod-quicker branch from 15857d9 to 9ba6ecc Compare April 20, 2023 00:11
defaultResyncPeriod = 15
defaultApplyMaxBatches = 100
defaultApplyInterval = 500
defaultMaxBatchedACLsPerPod = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make this smaller, if there is an issue with applying ACLs due to some weird limit of data type they are using underneath, we can end up in a limbo of not being able to apply a bunch of network policies at once.

Smaller means more network policies are guaranteed to get applied. Wdyt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline to make this 30

}

klog.Infof("[PolicyManager] finished applying all rules to endpoint for batch %d out of %d. endpoint ID: %s", i+1, len(ruleBatches), epToModifyID)
for policyKey := range policyKeys {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you applied a portion of policies and you error out ? you need to persist the state of policies you already applied on that endpoint so we dont end up with duplicate policies?

"ListeningPort": 10091,
"ListeningAddress": "0.0.0.0",
"ApplyMaxBatches": 100,
"ResyncPeriodInMinutes": 15,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add windows ds to this yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline that there may be script dependencies on either yaml

}
}

return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return list of policyKeys you applied here


if err := dp.policyMgr.AddAllPolicies(toAddPolicies, endpoint.id, endpoint.ip); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure to update netpolrtef even if this errors out. because we could have applied a portion of the batch of policies

@huntergregory huntergregory marked this pull request as ready for review April 21, 2023 00:49
@huntergregory huntergregory requested a review from a team as a code owner April 21, 2023 00:49
@huntergregory huntergregory requested review from vakalapa and removed request for a team April 21, 2023 00:49
@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@huntergregory
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@vakalapa vakalapa merged commit 61aae03 into master Apr 26, 2023
@vakalapa vakalapa deleted the hgregory/04-04-update-pod-quicker branch April 26, 2023 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
npm Related to NPM. windows
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants