Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run setup on networks in parallel #76

Merged
merged 2 commits into from
Dec 15, 2021

Conversation

mikebrow
Copy link
Member

@mikebrow mikebrow commented Dec 7, 2021

Run setup on the networks in parallel.. partial address of #6

On average I'm seeing an approximate 30% reduction in cni setup duration by running the cni network attach requests in parallel.

DEBU[2021-12-06T13:31:36.258694903-06:00] begin cni setup                               podsandboxid=c2201c8869e4974f31e9dd59f991935243eec0f4ef7fb25d8173caabce42251d
DEBU[2021-12-06T13:31:38.307254529-06:00] cni result: {"Interfaces":{"cni0":{"IPConfigs":null,"Mac":"ea:78:3c:3a:aa:a9","Sandbox":""},

"eth0":{"IPConfigs":[{"IP":"10.88.0.233","Gateway":"10.88.0.1"},{"IP":"2001:4860:4860::e9","Gateway":"2001:4860:4860::1"}],"Mac":"b2:a1:c0:16:d5:e2","Sandbox":"/var/run/netns/cni-51d91578-476f-c880-270f-3c16ebe5997f"},

"lo":{"IPConfigs":[{"IP":"127.0.0.1","Gateway":""},{"IP":"::1","Gateway":""}],"Mac":"00:00:00:00:00:00","Sandbox":"/var/run/netns/cni-51d91578-476f-c880-270f-3c16ebe5997f"},

"veth47f7d31d":{"IPConfigs":null,"Mac":"32:dd:8a:99:66:c9","Sandbox":""}},"DNS":[{},{}],"Routes":[{"dst":"0.0.0.0/0"},{"dst":"::/0"}]}  podsandboxid=c2201c8869e4974f31e9dd59f991935243eec0f4ef7fb25d8173caabce42251d

before 2.049 seconds

DEBU[2021-12-07T11:00:01.508232056-06:00] begin cni setup                               podsandboxid=dfd1ee983bd4b0ddd15816e139108f3ae588384b8022c72093ee39bc462dece1
DEBU[2021-12-07T11:00:02.896690448-06:00] cni result: {"Interfaces":{"cni0":{"IPConfigs":null,"Mac":"ea:78:3c:3a:aa:a9","Sandbox":""},

"eth0":{"IPConfigs":[{"IP":"10.88.0.235","Gateway":"10.88.0.1"},{"IP":"2001:4860:4860::eb","Gateway":"2001:4860:4860::1"}],"Mac":"7a:9c:48:af:5a:3d","Sandbox":"/var/run/netns/cni-e09d8559-45cc-4c26-a07e-72deba10952e"},

"lo":{"IPConfigs":[{"IP":"127.0.0.1","Gateway":""},{"IP":"::1","Gateway":""}],"Mac":"00:00:00:00:00:00","Sandbox":"/var/run/netns/cni-e09d8559-45cc-4c26-a07e-72deba10952e"},

"vethf1f5b697":{"IPConfigs":null,"Mac":"ea:83:c6:7a:db:9e","Sandbox":""}},"DNS":[{},{}],"Routes":[{"dst":"0.0.0.0/0"},{"dst":"::/0"}]}  podsandboxid=dfd1ee983bd4b0ddd15816e139108f3ae588384b8022c72093ee39bc462dece1

after cni network setup parallelization 1.388 seconds

Note: the list of plugins still runs serially for each network attach request. To run the plugins in parallel would require a containernetworking/cni PR if they can be run in parallel would have to consider the possibility of plugins having a serial dependency with other plugins on the same network interface.

Signed-off-by: Mike Brown brownwm@us.ibm.com

Signed-off-by: Mike Brown <brownwm@us.ibm.com>
cni.go Show resolved Hide resolved
cni.go Outdated Show resolved Hide resolved
cni.go Outdated Show resolved Hide resolved
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
@mikebrow mikebrow merged commit 2d9d28f into containerd:main Dec 15, 2021
fuweid added a commit to fuweid/go-cni that referenced this pull request Feb 10, 2022
Before this patch, the package doesn't init CNIConfig.exec field. It is
lazy init by CNIConfig.ensureExec() during AddNetworkList. After
enhancement [1] , there are always more than two goroutines(one for
loopback and other one for eth0) to call ensureExec(). Go runtime
doesn't ensure that value-assignment is atomic, and then it is possible
to use nil and panic, like [2].

> Programs that modify data being simultaneously accessed by multiple
> goroutines must serialize such access.
>
> from https://golang.org/ref/mem Advice section.

I think it should be fixed in github.com/containernetworking/cni library
but we need to delivery containerd release 1.6 version in time. I would
like to init CNIConfig like what the `ensureExec()` [3] does instead of lazy
init.

------------

How to reproduce it:

```golang
package main

import "sync"

type Exec interface {
	FindInPath(v int)
}

type rawExec struct {
	v int
}

func (r *rawExec) FindInPath(v int) {
	r.v += v
}

type cniConfig struct {
	exec Exec
}

func (c *cniConfig) addNetwork(v int) {
	if c.exec == nil {
		c.exec = &rawExec{}
	}
	c.exec.FindInPath(v)
}

func main() {
	c := &cniConfig{}

	for i := 0; i < 100000; i++ {
		c.exec = nil // reset

		var wg sync.WaitGroup
		for j := 0; j <= 10; j++ {
			wg.Add(1)
			go func(k int) {
				defer wg.Done()

				c.addNetwork(k)
			}(j)
		}
		wg.Wait()
	}
}
```

And run it

```bash
➜  /tmp go version
go version go1.17.3 linux/amd64

➜  /tmp go run main.go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x455a40]

goroutine 264932 [running]:
main.(*rawExec).FindInPath(0x0, 0x0)
	/tmp/main.go:14
main.(*cniConfig).addNetwork(...)
	/tmp/main.go:25
main.main.func1(0x0)
	/tmp/main.go:40 +0xb6
created by main.main
	/tmp/main.go:37 +0xb8
exit status 2
➜  /tmp

```

Root Cause:

The panic meesage is `main.(*rawExec).FindInPath(0x0, 0x0)` which means
that the interface `exec` var has the underly type `rawExec` but its
value is zero `FindInPath(0x0`, like what [2] mentioned.

```golang-assembly
// go tool compile -S ./main.go
//
"".(*cniConfig).addNetwork STEXT size=148 args=0x10 locals=0x20 funcid=0x0
	...
	...

	# (AX) is cniConfig address -> checkout c.exec == nil
        0x0014 00020 (./main.go:22)     CMPQ    (AX), $0
        0x0018 00024 (./main.go:22)     JNE     95

	# if c.exec == nil
	#
	# Since the go interface is struct iface
	#
	# https://github.com/golang/go/blob/release-branch.go1.17/src/runtime/runtime2.go#L202
	#
	# type iface struct {
    	# 	tab  *itab
    	#	data unsafe.Pointer
	# }
	#
	# The value-assignment will be two instructions(one for tab and
	# the other one for data.
        0x001a 00026 (./main.go:21)     MOVQ    AX, "".c+40(SP)
        0x001f 00031 (./main.go:25)     MOVQ    BX, ""..autotmp_4+16(SP)
	#
	# AX will be rawExec type information
	#
	#   	type."".rawExec SRODATA size=120
	#		...
        #	rel 24+8 t=1 runtime.memequal64·f+0
        #	rel 32+8 t=1 runtime.gcbits.+0
        #	rel 40+4 t=5 type..namedata.*main.rawExec-+0
        #	rel 44+4 t=5 type.*"".rawExec+0
        #	rel 48+8 t=1 type..importpath."".+0
        #	rel 56+8 t=1 type."".rawExec+96
        #	rel 80+4 t=5 type..importpath."".+0
        #	rel 96+8 t=1 type..namedata.v-+0
        #	rel 104+8 t=1 type.int+0
	#
        0x0024 00036 (./main.go:23)     LEAQ    type."".rawExec(SB), AX
        0x002b 00043 (./main.go:23)     PCDATA  $1, $0
        0x002b 00043 (./main.go:23)     CALL    runtime.newobject(SB)
	#
	# CX will be itab and get FindInPath TEXT address.
	#
	# 	go.itab.*"".rawExec,"".Exec SRODATA dupok size=32
	#		...
        # 	rel 0+8 t=1 type."".Exec+0
        #	rel 8+8 t=1 type.*"".rawExec+0
        #	rel 24+8 t=-32767 "".(*rawExec).FindInPath+0
	#
        0x0030 00048 (./main.go:23)     LEAQ    go.itab.*"".rawExec,"".Exec(SB), CX
        0x0037 00055 (./main.go:23)     MOVQ    "".c+40(SP), DX
        0x003c 00060 (./main.go:23)     MOVQ    CX, (DX)
        0x003f 00063 (./main.go:23)     PCDATA  $0, $-2
        0x003f 00063 (./main.go:23)     CMPL    runtime.writeBarrier(SB), $0
        0x0046 00070 (./main.go:23)     JNE     78
        0x0048 00072 (./main.go:23)     MOVQ    AX, 8(DX)
        0x004c 00076 (./main.go:23)     JMP     87
        0x004e 00078 (./main.go:23)     LEAQ    8(DX), DI
        0x0052 00082 (./main.go:23)     CALL    runtime.gcWriteBarrier(SB)
        0x0057 00087 (./main.go:25)     PCDATA  $0, $-1
        0x0057 00087 (./main.go:25)     MOVQ    DX, AX
        0x005a 00090 (./main.go:25)     MOVQ    ""..autotmp_4+16(SP), BX

	# if c.exec != nil and try to call FindInPath
	#
	# (AX) is cniConfig
	#
        0x005f 00095 (./main.go:25)     MOVQ    (AX), CX
	#
	# 8(AX) is rawExec address (receiver)
	#
        0x0062 00098 (./main.go:25)     MOVQ    8(AX), AX
	#
	# 24(CX) is the FindInPath TEXT address.
	#
        0x0066 00102 (./main.go:25)     MOVQ    24(CX), CX
        0x006a 00106 (./main.go:25)     PCDATA  $1, $1
        0x006a 00106 (./main.go:25)     CALL    CX
        0x006c 00108 (./main.go:26)     MOVQ    24(SP), BP
        0x0071 00113 (./main.go:26)     ADDQ    $32, SP
        0x0075 00117 (./main.go:26)     RET
	...
	...
```

Since there are multiple goroutines to check `c.exec == nil`, if the
lazy-init only assigns iface.tab and the FindInPath TEXT address,
interface's value is still nil and the FindInPath will use the
interface's value as method receiver. It will panic if try to read the
value from receiver.

Reference:

[1] containerd#76
[2] moby/buildkit#2589 (comment)
[3] https://github.com/containernetworking/cni/blob/1694fd7b57e0176a98a12823a5ffc03337fdc152/libcni/api.go#L182

Fixes: containerd#76

Signed-off-by: Wei Fu <fuweid89@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants