New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: Do not pre-allocate BPF maps by default #6357

Merged
merged 5 commits into from Jan 10, 2019

Conversation

@tgraf
Copy link
Member

tgraf commented Dec 1, 2018

Set the daemon option to by default have the same setting as previous
versions of Cilium (enable preallocation where supported).

Update the kubernetes YAMLs such that users who upgrade just the
DaemonSet will have minimal upgrade impact as the current maps will be
retained with pre-allocated values. At the same time, add a default
option to the ConfigMap such that new users will automatically deploy
without pre-allocation, minimizing memory usage in newer deployments.

For more information on the new setting, see the ConfigMap comments.

Individual commit logs have further information.

Related: #6351

This change is Reviewable

@tgraf tgraf requested review from cilium/agent as code owners Dec 1, 2018

@tgraf

This comment has been minimized.

Copy link
Member Author

tgraf commented Dec 1, 2018

test-me-please

Show resolved Hide resolved pkg/option/config.go Outdated
Show resolved Hide resolved pkg/bpf/bpf.go Outdated
Show resolved Hide resolved pkg/bpf/bpf.go Outdated
Show resolved Hide resolved bpf/bpf_lxc.c Outdated

@tgraf tgraf force-pushed the pr/tgraf/no-prealloc branch from 9a0d7a0 to 7be7b11 Dec 1, 2018

@tgraf

This comment has been minimized.

Copy link
Member Author

tgraf commented Dec 1, 2018

test-me-please

@coveralls

This comment has been minimized.

Copy link

coveralls commented Dec 1, 2018

Coverage Status

Coverage increased (+0.03%) to 43.925% when pulling 4883977 on pr/tgraf/no-prealloc into 5de8828 on master.

@aanm

This comment has been minimized.

Copy link
Member

aanm commented Dec 1, 2018

test-me-please

1 similar comment
@tgraf

This comment has been minimized.

Copy link
Member Author

tgraf commented Dec 2, 2018

test-me-please

@tgraf tgraf force-pushed the pr/tgraf/no-prealloc branch from 7be7b11 to 3e847e0 Dec 2, 2018

@tgraf

This comment has been minimized.

Copy link
Member Author

tgraf commented Dec 2, 2018

test-me-please

@tgraf tgraf force-pushed the pr/tgraf/no-prealloc branch from 3e847e0 to 6a2eebd Dec 2, 2018

@tgraf tgraf requested review from cilium/endpoint as code owners Dec 2, 2018

@tgraf

This comment has been minimized.

Copy link
Member Author

tgraf commented Dec 2, 2018

test-me-please

Show resolved Hide resolved bpf/lib/maps.h Outdated
@borkmann
Copy link
Member

borkmann left a comment

Aside from the one issue lgtm

@joestringer

This comment has been minimized.

Copy link
Contributor

joestringer commented Dec 7, 2018

The approach where the option flags are checked in the bpf.NewMap(...) invocations doesn't really work, because this tends to run on application init rather than after the daemon's commandlines have been parsed. I'll shift that logic into OpenOrCreateMap(...).

@joestringer joestringer force-pushed the pr/tgraf/no-prealloc branch from 6a2eebd to 721f8be Dec 7, 2018

@tgraf tgraf requested a review from cilium/kubernetes as a code owner Dec 7, 2018

@joestringer joestringer force-pushed the pr/tgraf/no-prealloc branch from 721f8be to 3441eeb Dec 7, 2018

@joestringer joestringer added pending-review and removed wip labels Dec 7, 2018

joestringer and others added some commits Jan 9, 2019

test: Enable BPF preallocation by default
Enable the upcoming BPF preallocation by default in the tests, so that
upgrade can be undertaken with minimal impact. Note that if this option
is toggled during upgrade, then it may have adverse effects.

This is being added because when it is set to false, the k8s upgrade
test is regularly failing, likely because the deletion of existing maps
causes etcd connectivity issues, which requires the etcd operator to
delete the pods and recreate them to get back into a good state. This
typically takes longer than the timeouts specified in the upgrade test,
causing the test to fail.

Further investigation is required to establish the exact behaviour for
toggling this option and how to safely enable/disable it; and also to
test the behaviour.

Signed-off-by: Joe Stringer <joe@cilium.io>
daemon: Refactor service map open
Refactor this code into the loadbalancer.go file, which is a more
natural place for all of the service-related map open logic.

Signed-off-by: Joe Stringer <joe@cilium.io>
bpf: Be less aggressive when pre-allocating BPF maps
The default so far has been to pre-allocate all BPF maps. This has performance
benefits and avoids RCU free pile up but comes at a steep cost of memory
footprint.

Disable pre-allocation for all non-LRU map types and provide an option
--preallocate-bpf-maps to re-enable the behavior.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
daemon: Retain map preallocation settings by default
Set the daemon option to by default have the same setting as previous
versions of Cilium (enable preallocation where supported).

Update the kubernetes YAMLs such that users who upgrade just the
DaemonSet will have minimal upgrade impact as the current maps will be
retained with pre-allocated values. At the same time, add a default
option to the ConfigMap such that new users will automatically deploy
without pre-allocation, minimizing memory usage in newer deployments.

For more information on the new setting, see the ConfigMap comments.

Signed-off-by: Joe Stringer <joe@cilium.io>
examples: Regenerate YAMLs
Regenerate YAMLs based on the recent changes.

Signed-off-by: Joe Stringer <joe@cilium.io>

@joestringer joestringer force-pushed the pr/tgraf/no-prealloc branch from 54f6f1e to 4883977 Jan 10, 2019

@joestringer

This comment has been minimized.

Copy link
Contributor

joestringer commented Jan 10, 2019

test-me-please

if _, err := proxymap.Proxy4Map.OpenOrCreate(); err != nil {
return err
}
if err := openServiceMaps(); err != nil {

This comment has been minimized.

@ianvernon

ianvernon Jan 10, 2019

Contributor

Thanks for putting this into a separate function!

if _, err := lbmap.RRSeq4Map.OpenOrCreate(); err != nil {
return err
}
if _, err := proxymap.Proxy4Map.OpenOrCreate(); err != nil {

This comment has been minimized.

@ianvernon

ianvernon Jan 10, 2019

Contributor

Only slight nit I have here is that proxy maps are unrelated to the loadbalancer code, but I think having this code in a separate function is better than how it is currently in master.

# If this value is modified, then during the next Cilium startup the restore
# of existing endpoints and tracking of ongoing connections may be disrupted.
# This may lead to policy drops or a change in loadbalancing decisions for a
# connection for some time. Endpoints may need to be recreated to restore

This comment has been minimized.

@ianvernon

ianvernon Jan 10, 2019

Contributor

This implies that the state is not able to be recovered over time? Or is it just that because the applications may get into a bad state themselves vs. in a bad state on the Cilium side?

This comment has been minimized.

@joestringer

joestringer Jan 10, 2019

Contributor

Basically Cilium completely forgets about endpoints that were previously managed, and they must be restarted in order for the new instance of Cilium to start managing them.

EDIT: The above is wrong, that was with clean-cilium-state enabled in the ConfigMap.

In local upgrade testing of just the Cilium component, it surprisingly seems to update pretty seamlessly (admittedly without testing services, which I'd expect to break due to loadbalancing decisions being lost). I suspect that in the upgrade test, because the etcd instance is being redeployed, some more serious problems occur during bootstrap which take some time to get back into a good state.

@tgraf

This comment has been minimized.

Copy link
Member Author

tgraf commented Jan 10, 2019

I can't approve formally as I've contributed commits. So LGTM via comment.

@tgraf tgraf merged commit 3185111 into master Jan 10, 2019

4 checks passed

Cilium-Ginkgo-Tests Build finished.
Details
Hound No violations found. Woof!
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.03%) to 43.925%
Details

1.4 automation moved this from In Progress to Done Jan 10, 2019

@tgraf tgraf deleted the pr/tgraf/no-prealloc branch Jan 10, 2019

@tgraf tgraf referenced this pull request Jan 14, 2019

Merged

1.4.0-rc3 backports #6633

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment