Adds missing lock for cesTracker operation #18055

Merged
merged 1 commit into from
Jan 5, 2022

Conversation

Weil0ng
Contributor

@Weil0ng Weil0ng commented Nov 30, 2021

Fixes: #17914

Signed-off-by: Weilong Cui <cuiwl@google.com>

@Weil0ng Weil0ng added the release-note/misc This PR makes changes that have no direct user impact. label Nov 30, 2021
@Weil0ng Weil0ng requested a review from aanm November 30, 2021 05:46
@Weil0ng Weil0ng requested a review from a team as a code owner November 30, 2021 05:46
@Weil0ng
Contributor Author

Weil0ng commented Nov 30, 2021

test-1.21-5.4

Member

@joestringer joestringer left a comment


Thanks for the fix. Could you split the rename into a dedicated commit, then the bugfixes into dedicated commit(s)? It's a bit tricky to track what the intended logical changes are here when the two sets of changes are all mixed together.

Couple of concerns around the deferred locking below.

operator/pkg/ciliumendpointslice/manager.go (outdated, resolved)
operator/pkg/ciliumendpointslice/manager.go (outdated, resolved)
@Weil0ng Weil0ng changed the title Refactor the locking code of cesTracker. Adds missing lock for cesTracker operation Dec 1, 2021
@Weil0ng
Contributor Author

Weil0ng commented Dec 1, 2021

Thanks for the fix. Could you split the rename into a dedicated commit, then the bugfixes into dedicated commit(s)? It's a bit tricky to track what the intended logical changes are here when the two sets of changes are all mixed together.

Upon further investigation, I think the refactor needs to be more in-depth (taking cesTracker out and adding a layer of locked APIs to access the underlying ces instead of accessing it directly). I will do the refactor in another PR; I have renamed this one to just fix the race condition for now.

@joestringer
Member

The travis failure seems suspicious:

=== RUN   TestCepToCESCounts
=== RUN   TestCepToCESCounts/Insert_CEPs_-_1
=== RUN   TestCepToCESCounts/Insert_CEPs_-_2
=== RUN   TestCepToCESCounts/Insert_CEPs_-_3
=== RUN   TestCepToCESCounts/Insert_CEPs_-_4
=== RUN   TestCepToCESCounts/Check_same_CEP-name_with_CES_name
=== RUN   TestCepToCESCounts/Insert_CEPs_-_1#01
=== RUN   TestCepToCESCounts/Insert_CEPs_-_2#01
=== RUN   TestCepToCESCounts/Insert_CEPs_-_3#01
=== RUN   TestCepToCESCounts/Insert_CEPs_-_4#01
=== RUN   TestCepToCESCounts/Check_same_CEP-name_with_CES_name#01
--- PASS: TestCepToCESCounts (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_1 (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_2 (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_3 (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_4 (0.00s)
    --- PASS: TestCepToCESCounts/Check_same_CEP-name_with_CES_name (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_1#01 (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_2#01 (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_3#01 (0.00s)
    --- PASS: TestCepToCESCounts/Insert_CEPs_-_4#01 (0.00s)
    --- PASS: TestCepToCESCounts/Check_same_CEP-name_with_CES_name#01 (0.00s)
=== RUN   TestInsertAndRemoveCEPsInCache
=== RUN   TestInsertAndRemoveCEPsInCache/Test_Inserting_CEPs_in_cache_and_count_number_of_CEPs_and_CESs
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
The build has been terminated

@Weil0ng
Contributor Author

Weil0ng commented Dec 1, 2021

It's deadlocked... all the more reason this needs refactoring.
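
The thread does not say where this deadlock was. One common way to introduce one when adding locking to existing code is a method that takes a `sync.Mutex` and then calls another method that takes the same mutex; Go's `sync.Mutex` is not reentrant, so the second `Lock` blocks forever. A sketch of the usual remedy, with hypothetical names (`cache`, `countLocked`), is to split each operation into a locking entry point and a `...Locked` helper that assumes the lock is already held:

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical type used only to illustrate the pattern.
type cache struct {
	mu    sync.Mutex
	items map[string]int
}

func newCache() *cache {
	return &cache{items: make(map[string]int)}
}

// countLocked assumes c.mu is already held by the caller.
func (c *cache) countLocked() int {
	return len(c.items)
}

// Count is the public entry point; it takes the lock exactly once.
func (c *cache) Count() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.countLocked()
}

// Insert holds the lock and calls countLocked, not Count. Calling Count here
// would try to lock c.mu a second time and block forever.
func (c *cache) Insert(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key]++
	return c.countLocked()
}

func main() {
	c := newCache()
	c.Insert("a")
	fmt.Println(c.Insert("b")) // prints 2
}
```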

@Weil0ng
Contributor Author

Weil0ng commented Dec 2, 2021

/test

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:


Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with secondary NodePort device

Failure Output

FAIL: Request from k8s1 to service tftp://[fd05::12]:31446/hello failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

@Weil0ng
Contributor Author

Weil0ng commented Dec 2, 2021

test-race-4.9

@Weil0ng
Contributor Author

Weil0ng commented Dec 2, 2021

test-race-4.19

@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.11.0 Dec 3, 2021
@joestringer joestringer added this to Needs backport from master in 1.11.1 Dec 5, 2021
@joestringer joestringer removed this from Needs backport from master in 1.11.0 Dec 5, 2021
Member

@joestringer joestringer left a comment


nit: The commit message needs operatrion -> operation.

At a glance this LGTM, but I'm very unfamiliar with this part of the code. I'll defer to @cilium/operator for more thorough review.

@Weil0ng
Contributor Author

Weil0ng commented Dec 8, 2021

@aanm @cilium/operator friendly ping :)

@aanm
Member

aanm commented Dec 9, 2021

/test

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:


Test Name

K8sFQDNTest Validate that FQDN policy continues to work after being updated

Failure Output

FAIL: Can't connect to to a valid target when it should work

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:


Test Name

K8sFQDNTest Validate that FQDN policy continues to work after being updated

Failure Output

FAIL: Can't connect to to a valid target when it should work

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@Weil0ng
Contributor Author

Weil0ng commented Dec 10, 2021

I think CI hit the same flakiness as in #18184.

@Weil0ng
Contributor Author

Weil0ng commented Dec 10, 2021

/mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19

👍 created #18218

Fixes: cilium#17914

Signed-off-by: Weilong Cui <cuiwl@google.com>
@Weil0ng
Contributor Author

Weil0ng commented Dec 13, 2021

/test

@Weil0ng
Contributor Author

Weil0ng commented Dec 15, 2021

test-race-4.9

@Weil0ng
Contributor Author

Weil0ng commented Dec 15, 2021

test-race-4.19

@christarazi
Member

@Weil0ng JFYI something is causing the agents to crash in the CI test runs above.

@Weil0ng
Contributor Author

Weil0ng commented Dec 17, 2021

Hmm, this is what I saw in the agent log:

2021-12-15T18:48:06.788021529Z level=debug msg="Resolved reserved identity" identity=remote-node identityLabels="reserved:remote-node" isNew=false subsys=identity-cache
2021-12-15T18:48:06.788023615Z POTENTIAL DEADLOCK: Inconsistent locking. saw this ordering in one goroutine:
2021-12-15T18:48:06.788025602Z happened before
2021-12-15T18:48:06.788027555Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:52 lock.(*internalRWMutex).Lock ??? <<<<<
2021-12-15T18:48:06.788030022Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:51 lock.(*internalRWMutex).Lock ???
2021-12-15T18:48:06.788032051Z /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:803 policy.(*SelectorCache).AddFQDNSelector ???
2021-12-15T18:48:06.788034036Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:483 policy.(*L4Filter).cacheFQDNSelector ???
2021-12-15T18:48:06.788036077Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:472 policy.(*L4Filter).cacheFQDNSelectors ???
2021-12-15T18:48:06.788038036Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:576 policy.createL4Filter ???
2021-12-15T18:48:06.788040182Z /go/src/github.com/cilium/cilium/pkg/policy/rule.go:773 policy.mergeEgressPortProto ???
2021-12-15T18:48:06.788042145Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:696 policy.createL4EgressFilter ???
2021-12-15T18:48:06.788046171Z /go/src/github.com/cilium/cilium/pkg/policy/rule.go:678 policy.mergeEgress ???
2021-12-15T18:48:06.788197091Z /go/src/github.com/cilium/cilium/pkg/policy/rule.go:808 policy.(*rule).resolveEgressPolicy ???
2021-12-15T18:48:06.788210005Z /go/src/github.com/cilium/cilium/pkg/policy/rules.go:102 policy.ruleSlice.resolveL4EgressPolicy ???
2021-12-15T18:48:06.788228799Z /go/src/github.com/cilium/cilium/pkg/policy/repository.go:704 policy.(*Repository).resolvePolicyLocked ???
2021-12-15T18:48:06.788252106Z /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:119 policy.(*PolicyCache).updateSelectorPolicy ???
2021-12-15T18:48:06.788260699Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:229 endpoint.(*Endpoint).regeneratePolicy ???
2021-12-15T18:48:06.788283492Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:229 endpoint.(*Endpoint).regeneratePolicy ???
2021-12-15T18:48:06.788294974Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:816 endpoint.(*Endpoint).runPreCompilationSteps ???
2021-12-15T18:48:06.788318292Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:583 endpoint.(*Endpoint).regenerateBPF ???
2021-12-15T18:48:06.788322643Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:405 endpoint.(*Endpoint).regenerate ???
2021-12-15T18:48:06.788350724Z /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:65 endpoint.(*EndpointRegenerationEvent).Handle ???
2021-12-15T18:48:06.788360851Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:246 eventqueue.(*EventQueue).run.func1 ???
2021-12-15T18:48:06.788382898Z /usr/local/go/src/sync/once.go:69 sync.(*Once).doSlow ???
2021-12-15T18:48:06.788390138Z /usr/local/go/src/sync/once.go:60 sync.(*Once).Do ???
2021-12-15T18:48:06.788421399Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:254 eventqueue.(*EventQueue).run ???
2021-12-15T18:48:06.788426174Z 
2021-12-15T18:48:06.788428217Z happened after
2021-12-15T18:48:06.788453838Z /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:68 ipcache.GetIDMetadataByIP ??? <<<<<
2021-12-15T18:48:06.788461874Z /go/src/github.com/cilium/cilium/vendor/github.com/sasha-s/go-deadlock/deadlock.go:137 go-deadlock.(*RWMutex).RLock ???
2021-12-15T18:48:06.788494681Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:67 lock.(*internalRWMutex).RLock ???
2021-12-15T18:48:06.788502255Z /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:61 ipcache.AllocateCIDRs ???
2021-12-15T18:48:06.788549465Z /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:98 ipcache.AllocateCIDRsForIPs ???
2021-12-15T18:48:06.788561089Z /go/src/github.com/cilium/cilium/daemon/cmd/identity.go:112 cmd.cachingIdentityAllocator.AllocateCIDRsForIPs ???
2021-12-15T18:48:06.788596963Z /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:490 policy.(*fqdnSelector).allocateIdentityMappings ???
2021-12-15T18:48:06.788604236Z /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:820 policy.(*SelectorCache).AddFQDNSelector ???
2021-12-15T18:48:06.788617920Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:483 policy.(*L4Filter).cacheFQDNSelector ???
2021-12-15T18:48:06.788642908Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:472 policy.(*L4Filter).cacheFQDNSelectors ???
2021-12-15T18:48:06.788647452Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:576 policy.createL4Filter ???
2021-12-15T18:48:06.788652478Z /go/src/github.com/cilium/cilium/pkg/policy/rule.go:773 policy.mergeEgressPortProto ???
2021-12-15T18:48:06.788654697Z /go/src/github.com/cilium/cilium/pkg/policy/l4.go:696 policy.createL4EgressFilter ???
2021-12-15T18:48:06.788689687Z /go/src/github.com/cilium/cilium/pkg/policy/rule.go:678 policy.mergeEgress ???
2021-12-15T18:48:06.788694285Z /go/src/github.com/cilium/cilium/pkg/policy/rule.go:808 policy.(*rule).resolveEgressPolicy ???
2021-12-15T18:48:06.788700626Z /go/src/github.com/cilium/cilium/pkg/policy/rules.go:102 policy.ruleSlice.resolveL4EgressPolicy ???
2021-12-15T18:48:06.788725291Z /go/src/github.com/cilium/cilium/pkg/policy/repository.go:704 policy.(*Repository).resolvePolicyLocked ???
2021-12-15T18:48:06.788729636Z /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:119 policy.(*PolicyCache).updateSelectorPolicy ???
2021-12-15T18:48:06.788737567Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:229 endpoint.(*Endpoint).regeneratePolicy ???
2021-12-15T18:48:06.788759459Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:229 endpoint.(*Endpoint).regeneratePolicy ???
2021-12-15T18:48:06.788766729Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:816 endpoint.(*Endpoint).runPreCompilationSteps ???
2021-12-15T18:48:06.788788783Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:583 endpoint.(*Endpoint).regenerateBPF ???
2021-12-15T18:48:06.788796093Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:405 endpoint.(*Endpoint).regenerate ???
2021-12-15T18:48:06.788820194Z /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:65 endpoint.(*EndpointRegenerationEvent).Handle ???
2021-12-15T18:48:06.788824632Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:246 eventqueue.(*EventQueue).run.func1 ???
2021-12-15T18:48:06.788832261Z /usr/local/go/src/sync/once.go:69 sync.(*Once).doSlow ???
2021-12-15T18:48:06.788834687Z /usr/local/go/src/sync/once.go:60 sync.(*Once).Do ???
2021-12-15T18:48:06.788858078Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:254 eventqueue.(*EventQueue).run ???
2021-12-15T18:48:06.788862457Z 
2021-12-15T18:48:06.788864490Z in another goroutine: happened before
2021-12-15T18:48:06.788871551Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:52 lock.(*internalRWMutex).Lock ??? <<<<<
2021-12-15T18:48:06.788893538Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:51 lock.(*internalRWMutex).Lock ???
2021-12-15T18:48:06.788901492Z /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:102 ipcache.InjectLabels ???
2021-12-15T18:48:06.788924439Z /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:428 ipcache.(*IPCache).TriggerLabelInjection.func1 ???
2021-12-15T18:48:06.788944563Z /go/src/github.com/cilium/cilium/pkg/controller/controller.go:212 controller.(*Controller).runController ???
2021-12-15T18:48:06.788947834Z 
2021-12-15T18:48:06.788949787Z happened after
2021-12-15T18:48:06.788993859Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:52 lock.(*internalRWMutex).Lock ??? <<<<<
2021-12-15T18:48:06.789095206Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:51 lock.(*internalRWMutex).Lock ???
2021-12-15T18:48:06.789100817Z /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:949 policy.(*SelectorCache).UpdateIdentities ???
2021-12-15T18:48:06.789175110Z /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:157 ipcache.InjectLabels ???
2021-12-15T18:48:06.789180721Z /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:428 ipcache.(*IPCache).TriggerLabelInjection.func1 ???
2021-12-15T18:48:06.789213769Z /go/src/github.com/cilium/cilium/pkg/controller/controller.go:212 controller.(*Controller).runController ???
2021-12-15T18:48:06.789218704Z 
2021-12-15T18:48:06.789220798Z Other goroutines holding locks:
2021-12-15T18:48:06.789228369Z goroutine 2248 lock 0xc00091d080
2021-12-15T18:48:06.789252938Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:81 lock.(*internalMutex).Lock ??? <<<<<
2021-12-15T18:48:06.789257263Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:80 lock.(*internalMutex).Lock ???
2021-12-15T18:48:06.789265013Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:323 endpoint.(*Endpoint).regenerate ???
2021-12-15T18:48:06.789289706Z /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:65 endpoint.(*EndpointRegenerationEvent).Handle ???
2021-12-15T18:48:06.789294139Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:246 eventqueue.(*EventQueue).run.func1 ???
2021-12-15T18:48:06.789300826Z /usr/local/go/src/sync/once.go:69 sync.(*Once).doSlow ???
2021-12-15T18:48:06.789325271Z /usr/local/go/src/sync/once.go:60 sync.(*Once).Do ???
2021-12-15T18:48:06.789329521Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:254 eventqueue.(*EventQueue).run ???
2021-12-15T18:48:06.789331644Z 
2021-12-15T18:48:06.789338903Z goroutine 811 lock 0x58a2740
2021-12-15T18:48:06.789363008Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:52 lock.(*internalRWMutex).Lock ??? <<<<<
2021-12-15T18:48:06.789367481Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:51 lock.(*internalRWMutex).Lock ???
2021-12-15T18:48:06.789374394Z /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:102 ipcache.InjectLabels ???
2021-12-15T18:48:06.789399468Z /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:428 ipcache.(*IPCache).TriggerLabelInjection.func1 ???
2021-12-15T18:48:06.789403797Z /go/src/github.com/cilium/cilium/pkg/controller/controller.go:212 controller.(*Controller).runController ???
2021-12-15T18:48:06.789406072Z 
2021-12-15T18:48:06.789414192Z goroutine 2599 lock 0xc0002bf400
2021-12-15T18:48:06.789439167Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:81 lock.(*internalMutex).Lock ??? <<<<<
2021-12-15T18:48:06.789443664Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:80 lock.(*internalMutex).Lock ???
2021-12-15T18:48:06.789450437Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:323 endpoint.(*Endpoint).regenerate ???
2021-12-15T18:48:06.789476786Z /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:65 endpoint.(*EndpointRegenerationEvent).Handle ???
2021-12-15T18:48:06.789481375Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:246 eventqueue.(*EventQueue).run.func1 ???
2021-12-15T18:48:06.789488803Z /usr/local/go/src/sync/once.go:69 sync.(*Once).doSlow ???
2021-12-15T18:48:06.789491235Z /usr/local/go/src/sync/once.go:60 sync.(*Once).Do ???
2021-12-15T18:48:06.789514922Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:254 eventqueue.(*EventQueue).run ???
2021-12-15T18:48:06.789519261Z 
2021-12-15T18:48:06.789521278Z goroutine 2306 lock 0xc00091cd00
2021-12-15T18:48:06.789528740Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:81 lock.(*internalMutex).Lock ??? <<<<<
2021-12-15T18:48:06.789552361Z /go/src/github.com/cilium/cilium/pkg/lock/lock_debug.go:80 lock.(*internalMutex).Lock ???
2021-12-15T18:48:06.789556776Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:323 endpoint.(*Endpoint).regenerate ???
2021-12-15T18:48:06.789564412Z /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:65 endpoint.(*EndpointRegenerationEvent).Handle ???
2021-12-15T18:48:06.789588139Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:246 eventqueue.(*EventQueue).run.func1 ???
2021-12-15T18:48:06.789592440Z /usr/local/go/src/sync/once.go:69 sync.(*Once).doSlow ???
2021-12-15T18:48:06.789599617Z /usr/local/go/src/sync/once.go:60 sync.(*Once).Do ???
2021-12-15T18:48:06.789623262Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:254 eventqueue.(*EventQueue).run ???
2021-12-15T18:48:06.789627672Z 
2021-12-15T18:48:06.789629695Z goroutine 2599 lock 0xc000635350
2021-12-15T18:48:06.789652702Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:577 endpoint.(*Endpoint).regenerateBPF ??? <<<<<
2021-12-15T18:48:06.789661170Z /go/src/github.com/cilium/cilium/vendor/github.com/sasha-s/go-deadlock/deadlock.go:137 go-deadlock.(*RWMutex).RLock ???
2021-12-15T18:48:06.789683050Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:576 endpoint.(*Endpoint).regenerateBPF ???
2021-12-15T18:48:06.789690789Z /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:405 endpoint.(*Endpoint).regenerate ???
2021-12-15T18:48:06.789715033Z /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:65 endpoint.(*EndpointRegenerationEvent).Handle ???
2021-12-15T18:48:06.789719359Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:246 eventqueue.(*EventQueue).run.func1 ???
2021-12-15T18:48:06.789729848Z /usr/local/go/src/sync/once.go:69 sync.(*Once).doSlow ???
2021-12-15T18:48:06.789732507Z /usr/local/go/src/sync/once.go:60 sync.(*Once).Do ???
2021-12-15T18:48:06.789781998Z /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:254 eventqueue.(*EventQueue).run ???
2021-12-15T18:48:06.789786988Z 
2021-12-15T18:48:06.789788906Z 
2021-12-15T18:48:06.789790766Z 

Seems like some other locks are triggering this?
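
For readers of the trace: it reports a lock-order inversion. One goroutine takes the SelectorCache lock (in `AddFQDNSelector`) and then the ipcache metadata lock (in `GetIDMetadataByIP`), while another takes the ipcache metadata lock (in `InjectLabels`) and then the SelectorCache lock (in `UpdateIdentities`). The sketch below shows the general remedy of acquiring the two locks in one agreed order on every path. The names `state`, `addSelector`, `injectLabels`, and `run` are hypothetical stand-ins, and this is not necessarily how the issue was resolved in Cilium.

```go
package main

import (
	"fmt"
	"sync"
)

// state holds two locks that play the roles of the ipcache metadata lock and
// the SelectorCache lock from the trace. All names here are illustrative.
type state struct {
	metadataMu sync.RWMutex
	selectorMu sync.Mutex
	metadata   int
	selectors  int
}

// addSelector needs both locks. It takes metadataMu first, then selectorMu.
func (s *state) addSelector() {
	s.metadataMu.RLock()
	defer s.metadataMu.RUnlock()
	s.selectorMu.Lock()
	defer s.selectorMu.Unlock()
	s.selectors++
}

// injectLabels also needs both locks and uses the same order. If one path took
// selectorMu first, the two paths could each hold one lock and wait forever
// for the other.
func (s *state) injectLabels() {
	s.metadataMu.Lock()
	defer s.metadataMu.Unlock()
	s.metadata++
	s.selectorMu.Lock()
	defer s.selectorMu.Unlock()
	s.selectors++
}

// run starts n goroutines for each path and waits for all of them to finish.
func run(s *state, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); s.addSelector() }()
		go func() { defer wg.Done(); s.injectLabels() }()
	}
	wg.Wait()
}

func main() {
	s := &state{}
	run(s, 100)
	fmt.Println(s.metadata, s.selectors) // prints 100 200
}
```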

@christarazi
Member

@Weil0ng I think the above is related to #18237, so it doesn't seem related to your PR. Is that what's causing the agent to crash though? (I'm not sure if the presence of these logs is supposed to do that, but if so, then 👍.)

@Weil0ng
Contributor Author

Weil0ng commented Dec 18, 2021

@Weil0ng I think the above is related to #18237, so it doesn't seem related to your PR. Is that what's causing the agent to crash though? (I'm not sure if the presence of these logs is supposed to do that, but if so, then 👍.)

Hmm, that's the only abnormal msg I saw in the agent log and the only failure case is FQDN, so I think it is what's causing the test suite to fail, unless I missed something?

@pchaigno
Member

pchaigno commented Jan 5, 2022

The race detection CI jobs are not required and failing with unrelated errors. All reviews are in. Merging.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 5, 2022
@pchaigno pchaigno merged commit 155eacc into cilium:master Jan 5, 2022
@christarazi christarazi moved this from Needs backport from master to Backport pending to v1.11 in 1.11.1 Jan 12, 2022
@joestringer joestringer added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Jan 18, 2022
@joestringer joestringer moved this from Backport pending to v1.11 to Backport done to v1.11 in 1.11.1 Jan 18, 2022
Labels
backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
1.11.1
Backport done to v1.11
Development

Successfully merging this pull request may close these issues.

CI: Data race in (*cesManagerIdentity).updateCESInCache()
5 participants