reftracker: return a snapshot from AppsForRef / RefsForApp to fix concurrent-map panic#1817
Open
SAY-5 wants to merge 1 commit intocarvel-dev:developfrom
Open
reftracker: return a snapshot from AppsForRef / RefsForApp to fix concurrent-map panic#1817SAY-5 wants to merge 1 commit intocarvel-dev:developfrom
SAY-5 wants to merge 1 commit intocarvel-dev:developfrom
Conversation
…current-map panic
AppRefTracker protects its internal maps with a sync.Mutex, but
AppsForRef and RefsForApp returned a.refsToApps[refKey] / a.appsToRefs[appKey]
directly. The caller (e.g. SecretHandler.enqueueAppsForUpdate) then
iterates that returned map without holding the tracker lock:
apps, err := sch.appRefTracker.AppsForRef(reftracker.NewSecretKey(...))
...
for refKey := range apps { // concurrent modifier can fire here
...
}
Under the reported production load (1,680+ namespaces, rapid Secret
and ConfigMap churn, many reconcile goroutines) a parallel
ReconcileRefs or RemoveAppFromAllRefs mutates the very same inner
map the handler is ranging over, and the Go runtime aborts with
fatal error: concurrent map iteration and map write
crashing the kapp-controller pod (carvel-dev#1812).
Return a shallow copy of the inner set from both lookup methods so
callers can iterate without holding the tracker lock. The copy is
cheap (refs per app is small in practice; the outer app-to-refs map
stays unbounded either way), and the concurrent writers keep
exclusive ownership of the originals. A small cloneRefKeySet helper
keeps the two call sites in sync.
Fixes carvel-dev#1812
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1812.
AppRefTrackerprotects its internal maps with async.Mutex, butAppsForRefandRefsForAppreturneda.refsToApps[refKey]/a.appsToRefs[appKey]directly. The caller (e.g.SecretHandler.enqueueAppsForUpdate) then iterates that returned map without holding the tracker lock:Under the reported production load (1,680+ namespaces, rapid Secret and ConfigMap churn, many reconcile goroutines) a parallel
ReconcileRefsorRemoveAppFromAllRefsmutates the very same inner map the handler is ranging over, and the Go runtime aborts withcrashing the kapp-controller pod.
This returns a shallow copy of the inner set from both lookup methods so callers can iterate without holding the tracker lock. The copy is cheap (refs per app is small in practice; the outer app-to-refs map stays unbounded either way), and the concurrent writers keep exclusive ownership of the originals. A small
cloneRefKeySethelper keeps the two call sites in sync.