Fix rare crash on startup when Kubernetes initialization occurs before IP address configuration #14299

Merged
merged 1 commit into cilium:master from submit/k8s-init-after-ipam on Dec 11, 2020

Conversation

joestringer
Member

@joestringer commented Dec 8, 2020

Previously, a race condition between Kubernetes initialization logic and
the IPAM / address handling / NodeDiscovery logic could cause a crash on
startup in rare circumstances due to the k8s subsystem handling a policy
update event before the initialization was complete.

André pointed out that InitK8sSubsystem() asynchronously initializes and
begins handling k8s events, and that this is kicked off prior to the
allocation / initialization of the IP handling subsystem. Looking at the
stack trace, we see that the "index out of range" error is triggered
while handling a policyAdd event, through the ObjectCache
initialization, through the headerfile write / nodeconfig fetch logic,
and finally in the logic which serializes the node IP addressing
information into the node config file, subsequently to be used by other
datapath initialization logic:

panic: runtime error: index out of range [3] with length 0

goroutine 145 [running]:
encoding/binary.bigEndian.Uint32(...)
    /usr/local/go/src/encoding/binary/binary.go:112
github.com/cilium/cilium/pkg/byteorder.HostSliceToNetwork(...)
    /go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:134 +0x1ee
github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(...)
    /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:108 +0x5baf
...
github.com/cilium/cilium/pkg/datapath/loader.(*Loader).Reinitialize(...)
    /go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:187 +0x2c0
github.com/cilium/cilium/daemon/cmd.(*Daemon).policyAdd(...)
    /go/src/github.com/cilium/cilium/daemon/cmd/policy.go:285 +0x1ff7
github.com/cilium/cilium/daemon/cmd.(*PolicyAddEvent).Handle(...)
    /go/src/github.com/cilium/cilium/daemon/cmd/policy.go:217 +0x5c
...
created by github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run
    /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:240 +0x62

This occurs when converting the IPv4 loopback IP to network byte order,
and the index out of range implies that the IP address is not yet
initialized. The IP loopback address is initialized from
Daemon.allocateIPs() -> node.SetIPv4Loopback(...).
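
As a minimal illustration (a standalone sketch, not Cilium code), this
is the class of panic that serializing an unset IP triggers:
binary.BigEndian.Uint32 bounds-checks index 3 before reading, so a nil
or empty slice fails with exactly the "index out of range [3] with
length 0" error shown in the stack trace above.

```go
package main

import "encoding/binary"

func main() {
	// Stand-in for a node IP that has not been populated yet, e.g. before
	// Daemon.allocateIPs() has called node.SetIPv4Loopback().
	var unsetIP []byte

	// Reading the first 4 bytes in network (big-endian) byte order panics:
	//   panic: runtime error: index out of range [3] with length 0
	_ = binary.BigEndian.Uint32(unsetIP)
}
```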

I had initially considered extending EventQueue to allow an
"initialization prerequisite" to be configured in the form of a channel.
This would require the daemon to create a channel for IP initialization
and either pass it down through the stack or pass it via the Daemon
object through to the datapath initialization logic and the endpoint add
logic (including host endpoint add), and then into the endpoint
EventQueue initialization routines. Those routines would keep a
reference to the channel and wait for it to be closed when
EventQueue.Run() is called. This would ensure that the "wait" is
asynchronous and hence shouldn't slow down any of the ongoing
initialization logic; however, it's a little more invasive than I'd like
for a bugfix.
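
For reference, a rough sketch of what that rejected approach could have
looked like (hypothetical type and field names, not the real
pkg/eventqueue API): Run() spawns its processing goroutine immediately,
but that goroutine waits for a prerequisite channel to be closed before
draining any events.

```go
package eventqueue

// Hypothetical sketch only; the names below are invented for
// illustration and do not match the actual Cilium EventQueue.
type EventQueue struct {
	events chan interface{}
	// prereq would be closed by the daemon once IP addressing (and any
	// other initialization prerequisites) are ready.
	prereq <-chan struct{}
}

func (q *EventQueue) Run() {
	go func() {
		if q.prereq != nil {
			// Asynchronous wait: the caller of Run() is not blocked.
			<-q.prereq
		}
		for ev := range q.events {
			_ = ev // handle the event
		}
	}()
}
```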

Looking a little closer at this logic, I realized that
loader.Reinitialize() consumes information from NodeDiscovery.
NodeDiscovery is initialized even later than allocateIPs(), and the
NodeDiscovery object has a convenient "Registered" channel which is
closed when the Daemon completes NodeDiscovery initialization. So, if we
wait on that channel, we could ensure all prerequisite IP addressing
information would be initialized prior to datapath generation.

From there, it's just a question of whether this path is guaranteed to
be handled asynchronously or whether there are some other paths that
call Reinitialize() that could potentially end up causing some kind of
deadlock. That does not appear possible upon initial inspection:

  File        Function     Line
0 policy.go   <global>     286 if err := d.Datapath().Loader().Reinitialize(d.ctx, d, d.mtuConfig.GetDeviceMTU(), d.Datapath(), d.l7Proxy); err != nil {
1 policy.go   <global>     632 if err := d.Datapath().Loader().Reinitialize(d.ctx, d, d.mtuConfig.GetDeviceMTU(), d.Datapath(), d.l7Proxy); err != nil {
2 loader.go   <global>      34 Reinitialize(ctx context.Context, o BaseProgramOwner, deviceMTU int, iptMgr IptablesManager, p Proxy) error
3 config.go   Handle        90 if err := d.Datapath().Loader().Reinitialize(d.ctx, d, d.mtuConfig.GetDeviceMTU(), d.Datapath(), d.l7Proxy); err != nil {
4 daemon.go   init         228 if err := d.Datapath().Loader().Reinitialize(d.ctx, d, d.mtuConfig.GetDeviceMTU(), d.Datapath(), d.l7Proxy); err != nil {
5 daemon.go   func         795 if err := d.Datapath().Loader().Reinitialize(d.ctx, d, d.mtuConfig.GetDeviceMTU(), d.Datapath(), d.l7Proxy); err != nil {
6 datapath.go Reinitialize 131 func (f *fakeLoader) Reinitialize(ctx context.Context, o datapath.BaseProgramOwner, deviceMTU int, iptMgr datapath.IptablesManager, p datapath.Proxy) error {
7 base.go     Reinitialize 171 func (l *Loader) Reinitialize(ctx context.Context, o datapath.BaseProgramOwner, deviceMTU int, iptMgr datapath.IptablesManager, p datapath.Proxy) error {
  0. policyAdd (from EventQueue)
  1. policyDelete (from EventQueue)
  2. type Loader interface
  3. Daemon /config API (API only served after initialization is complete)
  4. Daemon.init() (occurs after StartDiscovery())
  5. TriggerReloadWithoutCompile (triggered from ipcache GC controller)
  6. fakeLoader (unit tests)
  7. Loader (function definition)

So, by blocking on node discovery when fetching LocalConfig(), we are
able to ensure that IP addressing configuration is properly initialized
prior to using it from datapath Reinitialize().
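
Roughly, the change amounts to something like the following sketch
(field names such as nodeDiscovery and nodeLocalConfig are
approximations for illustration, not the verbatim Cilium code):
LocalConfig() blocks on NodeDiscovery's "Registered" channel before
handing the node configuration to the datapath.

```go
// Approximate sketch of the fix described above; names are illustrative.
func (d *Daemon) LocalConfig() *datapath.LocalNodeConfiguration {
	// NodeDiscovery's Registered channel is closed once the Daemon has
	// completed node discovery, which happens after allocateIPs(), so all
	// node IP addressing (including the IPv4 loopback) is set by then.
	<-d.nodeDiscovery.Registered
	return &d.nodeLocalConfig
}
```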

Fixes: #14245

Reported-by: Jorge Arco <jorge.arcoma@gmail.com> (@jorge07)
Signed-off-by: Joe Stringer <joe@cilium.io>

@joestringer joestringer added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.8 labels Dec 8, 2020
@joestringer joestringer requested review from a team December 8, 2020 00:26
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Dec 8, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.2 Dec 8, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.7 Dec 8, 2020
@joestringer joestringer requested a review from jibi December 8, 2020 00:26
@aanm
Member

aanm commented Dec 8, 2020

test-me-please

@aanm aanm removed their assignment Dec 8, 2020
@joestringer
Member Author

retest-4.9

@joestringer joestringer added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 10, 2020
@jrajahalme jrajahalme merged commit d117d50 into cilium:master Dec 11, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.2 Dec 15, 2020
@joestringer joestringer deleted the submit/k8s-init-after-ipam branch December 15, 2020 22:38
@jibi jibi mentioned this pull request Dec 17, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.7 Dec 17, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.2 Dec 18, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Needs backport from master in 1.8.7 Dec 21, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.7 Jan 6, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.7 Jan 13, 2021