-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
run setup on networks in parallel #76
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
AkihiroSuda
reviewed
Dec 7, 2021
AkihiroSuda
approved these changes
Dec 8, 2021
dmcgowan
reviewed
Dec 14, 2021
dmcgowan
reviewed
Dec 14, 2021
mikebrow
force-pushed
the
async-network-setu
branch
from
December 14, 2021 18:02
7e44a24
to
7be0418
Compare
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
mikebrow
force-pushed
the
async-network-setu
branch
from
December 14, 2021 19:46
7be0418
to
9caf1de
Compare
dmcgowan
approved these changes
Dec 14, 2021
fuweid
added a commit
to fuweid/go-cni
that referenced
this pull request
Feb 10, 2022
Before this patch, the package doesn't init CNIConfig.exec field. It is lazy init by CNIConfig.ensureExec() during AddNetworkList. After enhancement [1] , there are always more than two goroutines(one for loopback and other one for eth0) to call ensureExec(). Go runtime doesn't ensure that value-assignment is atomic, and then it is possible to use nil and panic, like [2]. > Programs that modify data being simultaneously accessed by multiple > goroutines must serialize such access. > > from https://golang.org/ref/mem Advice section. I think it should be fixed in github.com/containernetworking/cni library but we need to delivery containerd release 1.6 version in time. I would like to init CNIConfig like what the `ensureExec()` [3] does instead of lazy init. ------------ How to reproduce it: ```golang package main import "sync" type Exec interface { FindInPath(v int) } type rawExec struct { v int } func (r *rawExec) FindInPath(v int) { r.v += v } type cniConfig struct { exec Exec } func (c *cniConfig) addNetwork(v int) { if c.exec == nil { c.exec = &rawExec{} } c.exec.FindInPath(v) } func main() { c := &cniConfig{} for i := 0; i < 100000; i++ { c.exec = nil // reset var wg sync.WaitGroup for j := 0; j <= 10; j++ { wg.Add(1) go func(k int) { defer wg.Done() c.addNetwork(k) }(j) } wg.Wait() } } ``` And run it ```bash ➜ /tmp go version go version go1.17.3 linux/amd64 ➜ /tmp go run main.go panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x455a40] goroutine 264932 [running]: main.(*rawExec).FindInPath(0x0, 0x0) /tmp/main.go:14 main.(*cniConfig).addNetwork(...) /tmp/main.go:25 main.main.func1(0x0) /tmp/main.go:40 +0xb6 created by main.main /tmp/main.go:37 +0xb8 exit status 2 ➜ /tmp ``` Root Cause: The panic meesage is `main.(*rawExec).FindInPath(0x0, 0x0)` which means that the interface `exec` var has the underly type `rawExec` but its value is zero `FindInPath(0x0`, like what [2] mentioned. ```golang-assembly // go tool compile -S ./main.go // "".(*cniConfig).addNetwork STEXT size=148 args=0x10 locals=0x20 funcid=0x0 ... ... # (AX) is cniConfig address -> checkout c.exec == nil 0x0014 00020 (./main.go:22) CMPQ (AX), $0 0x0018 00024 (./main.go:22) JNE 95 # if c.exec == nil # # Since the go interface is struct iface # # https://github.com/golang/go/blob/release-branch.go1.17/src/runtime/runtime2.go#L202 # # type iface struct { # tab *itab # data unsafe.Pointer # } # # The value-assignment will be two instructions(one for tab and # the other one for data. 0x001a 00026 (./main.go:21) MOVQ AX, "".c+40(SP) 0x001f 00031 (./main.go:25) MOVQ BX, ""..autotmp_4+16(SP) # # AX will be rawExec type information # # type."".rawExec SRODATA size=120 # ... # rel 24+8 t=1 runtime.memequal64·f+0 # rel 32+8 t=1 runtime.gcbits.+0 # rel 40+4 t=5 type..namedata.*main.rawExec-+0 # rel 44+4 t=5 type.*"".rawExec+0 # rel 48+8 t=1 type..importpath."".+0 # rel 56+8 t=1 type."".rawExec+96 # rel 80+4 t=5 type..importpath."".+0 # rel 96+8 t=1 type..namedata.v-+0 # rel 104+8 t=1 type.int+0 # 0x0024 00036 (./main.go:23) LEAQ type."".rawExec(SB), AX 0x002b 00043 (./main.go:23) PCDATA $1, $0 0x002b 00043 (./main.go:23) CALL runtime.newobject(SB) # # CX will be itab and get FindInPath TEXT address. # # go.itab.*"".rawExec,"".Exec SRODATA dupok size=32 # ... # rel 0+8 t=1 type."".Exec+0 # rel 8+8 t=1 type.*"".rawExec+0 # rel 24+8 t=-32767 "".(*rawExec).FindInPath+0 # 0x0030 00048 (./main.go:23) LEAQ go.itab.*"".rawExec,"".Exec(SB), CX 0x0037 00055 (./main.go:23) MOVQ "".c+40(SP), DX 0x003c 00060 (./main.go:23) MOVQ CX, (DX) 0x003f 00063 (./main.go:23) PCDATA $0, $-2 0x003f 00063 (./main.go:23) CMPL runtime.writeBarrier(SB), $0 0x0046 00070 (./main.go:23) JNE 78 0x0048 00072 (./main.go:23) MOVQ AX, 8(DX) 0x004c 00076 (./main.go:23) JMP 87 0x004e 00078 (./main.go:23) LEAQ 8(DX), DI 0x0052 00082 (./main.go:23) CALL runtime.gcWriteBarrier(SB) 0x0057 00087 (./main.go:25) PCDATA $0, $-1 0x0057 00087 (./main.go:25) MOVQ DX, AX 0x005a 00090 (./main.go:25) MOVQ ""..autotmp_4+16(SP), BX # if c.exec != nil and try to call FindInPath # # (AX) is cniConfig # 0x005f 00095 (./main.go:25) MOVQ (AX), CX # # 8(AX) is rawExec address (receiver) # 0x0062 00098 (./main.go:25) MOVQ 8(AX), AX # # 24(CX) is the FindInPath TEXT address. # 0x0066 00102 (./main.go:25) MOVQ 24(CX), CX 0x006a 00106 (./main.go:25) PCDATA $1, $1 0x006a 00106 (./main.go:25) CALL CX 0x006c 00108 (./main.go:26) MOVQ 24(SP), BP 0x0071 00113 (./main.go:26) ADDQ $32, SP 0x0075 00117 (./main.go:26) RET ... ... ``` Since there are multiple goroutines to check `c.exec == nil`, if the lazy-init only assigns iface.tab and the FindInPath TEXT address, interface's value is still nil and the FindInPath will use the interface's value as method receiver. It will panic if try to read the value from receiver. Reference: [1] containerd#76 [2] moby/buildkit#2589 (comment) [3] https://github.com/containernetworking/cni/blob/1694fd7b57e0176a98a12823a5ffc03337fdc152/libcni/api.go#L182 Fixes: containerd#76 Signed-off-by: Wei Fu <fuweid89@gmail.com>
This was referenced May 13, 2022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Run setup on the networks in parallel.. partial address of #6
On average I'm seeing an approximate 30% reduction in cni setup duration by running the cni network attach requests in parallel.
before 2.049 seconds
after cni network setup parallelization 1.388 seconds
Note: the list of plugins still runs serially for each network attach request. To run the plugins in parallel would require a containernetworking/cni PR if they can be run in parallel would have to consider the possibility of plugins having a serial dependency with other plugins on the same network interface.
Signed-off-by: Mike Brown brownwm@us.ibm.com