Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clusterCache: lock namespacedResources, don't miss finding live obj if obj is cluster-scoped and namespacedResources is in transition #597

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ncdc
Copy link

@ncdc ncdc commented Jul 1, 2024

  1. namespacedResources is accessed by multiple goroutines simultaneously
    and therefore must be guarded by a mutex. Because this field is read
    significantly more often than it's written, use an RWMutex.

  2. When Reconcile performs its logic to compare the desired state (target
    objects) against the actual state (live objects), it looks up each live
    object based on a key comprised of data from the target object: API
    group, API kind, namespace, and name. While group, kind, and name will
    always be accurate, there is a chance that the value for namespace is
    not. If a cluster-scoped target object has a namespace (because it
    incorrectly has a namespace from its source) or the namespace parameter
    passed into the Reconcile method has a non-empty value (indicating a
    default value to use on namespace-scoped objects that don't have it set
    in the source), AND the resInfo ResourceInfoProvider has incomplete or
    missing API discovery data, the call to IsNamespacedOrUnknown will
    return true when the information is unknown. This leads to the key being
    incorrect - it will have a value for namespace when it shouldn't. As a
    result, indexing into liveObjByKey will fail. This failure results in
    the reconciliation containing incorrect data: there will be a nil entry
    appended to targetObjs when there shouldn't be.

Fixes #171

Most likely addresses argoproj/argo-cd#17440.

Most likely addresses at least 1 user who had cluster-scoped resources get wiped out in argoproj/argo-cd#15058

ncdc added 2 commits July 1, 2024 12:07
When Reconcile performs its logic to compare the desired state (target
objects) against the actual state (live objects), it looks up each live
object based on a key comprised of data from the target object: API
group, API kind, namespace, and name. While group, kind, and name will
always be accurate, there is a chance that the value for namespace is
not. If a cluster-scoped target object has a namespace (because it
incorrectly has a namespace from its source) or the namespace parameter
passed into the Reconcile method has a non-empty value (indicating a
default value to use on namespace-scoped objects that don't have it set
in the source), AND the resInfo ResourceInfoProvider has incomplete or
missing API discovery data, the call to IsNamespacedOrUnknown will
return true when the information is unknown. This leads to the key being
incorrect - it will have a value for namespace when it shouldn't. As a
result, indexing into liveObjByKey will fail. This failure results in
the reconciliation containing incorrect data: there will be a nil entry
appended to targetObjs when there shouldn't be.

Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
namespacedResources is accessed by multiple goroutines simultaneously
and therefore must be guarded by a mutex. Because this field is read
significantly more often than it's written, use an RWMutex.

Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
@rumstead
Copy link

rumstead commented Jul 1, 2024

@ncdc awesome work! have you posted this on the Argo CD slack or plan to represent it at one of the community/contributor meetings?

@rumstead
Copy link

rumstead commented Jul 1, 2024

For argoproj/argo-cd#15058, we have namespace on the Argo CD application because we deploy namespace-scoped resources as well so we definitely hit the condition.

@ncdc
Copy link
Author

ncdc commented Jul 1, 2024

@ncdc
Copy link
Author

ncdc commented Jul 2, 2024

/cc @ash2k

Copy link

@cpanato cpanato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clusterCache.IsNamespaced() lacks locking
4 participants