Crossplane fails to synchronize claims with XRs #5400

Open
Tracked by #4828
fernandezcuesta opened this issue Feb 16, 2024 · 12 comments · Fixed by #5651

@fernandezcuesta
Contributor

fernandezcuesta commented Feb 16, 2024

What happened?

From time to time I see that claims and XRs lose sync, such as here (see the second resource):

❯ kubectl get xirsas
NAME                                           API                           SYNCED   READY   COMPOSITION                                         AGE
acm-pca-issuer-nw-eu-west-3-main               infra.nexthink.com/v1alpha3   True     True    custom.policy.xirsas.infra.nexthink.com             9m30s
acm-pca-issuer-nw-us-east-2-main               infra.nexthink.com/v1alpha3                    custom.policy.xirsas.infra.nexthink.com             9m31s
collector-traffic-nw-eu-west-3-main            infra.nexthink.com/v1alpha3   True     True    custom.auth.xirsas.infra.nexthink.com               22h
[...]

This does not change until I do a rollout restart of the Crossplane deployment. The deployment's logs appear to be stuck in a loop, continuously printing lines such as these:

crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
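
One way to confirm that the controller is looping is to count how often the same enqueue message repeats per composed resource. The sketch below is not part of the original report: `count_enqueues` is a hypothetical helper name, and the namespace and deployment name in the usage comment are assumptions that may not match your install.

```shell
# count_enqueues (hypothetical helper): read Crossplane JSON log lines on
# stdin and print "<count> <mrName>" for every "Enqueueing composite
# resource because managed resource changed" message, most frequent first.
count_enqueues() {
  grep -F 'Enqueueing composite resource because managed resource changed' \
    | sed -n 's/.*"mrName":"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Usage (namespace and deployment name are assumptions):
#   kubectl -n crossplane-system logs deployment/crossplane --since=1m | count_enqueues
```

A count that keeps climbing for the same `mrName` while nothing in the cluster is changing would be consistent with the loop described above.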

How can we reproduce it?

What environment did it happen in?

Crossplane version: universal-crossplane-1.14.5-up.1

EKS v1.27.9-eks-5e0fdde

Relevant PRs

fernandezcuesta added the bug label Feb 16, 2024
@fernandezcuesta
Contributor Author

As suggested by @haarchri, I set --enable-composition-webhook-schema-validation=false, but unfortunately it didn't help.

@haarchri
Contributor

haarchri commented Feb 16, 2024

Are you using realtime compositions? For reference, we disabled the tests for realtime compositions with #5296.

@fernandezcuesta
Contributor Author

Looks like it:

      containers:
      - args:
        - core
        - start
        - --enable-composition-functions
        - --enable-environment-configs
        - --enable-realtime-compositions
        - --enable-usages
        - --enable-composition-webhook-schema-validation=false
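
To check which flags a running Crossplane deployment was started with, something like the following could be used. This is a minimal sketch: the namespace and deployment name are assumptions, `crossplane_args` and `has_flag` are hypothetical helper names, and the check only matches a flag written exactly as given (so `--enable-realtime-compositions=true` would not match `--enable-realtime-compositions`).

```shell
NS="${CROSSPLANE_NAMESPACE:-crossplane-system}"   # assumed namespace
DEPLOY="${CROSSPLANE_DEPLOYMENT:-crossplane}"     # assumed deployment name

# Print the args of the first container, one per line.
crossplane_args() {
  kubectl -n "$NS" get deployment "$DEPLOY" \
    -o jsonpath='{range .spec.template.spec.containers[0].args[*]}{@}{"\n"}{end}'
}

# Succeed if the given flag appears verbatim in the args.
has_flag() {
  crossplane_args | grep -qxF -e "$1"
}

# Usage:
#   has_flag --enable-realtime-compositions && echo "realtime compositions enabled"
```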

@haarchri
Contributor

I think it's related to #5151.

@jbw976
Member

jbw976 commented Feb 20, 2024

@haarchri were you able to confirm positively that this behavior is related to realtime compositions? i.e. it only manifests when --enable-realtime-compositions is set? 🤔

@haarchri
Contributor

Yes, and I can reproduce this issue with realtime compositions enabled. I'm currently debugging it.

@jbw976
Member

jbw976 commented Feb 20, 2024

Awesome dude! thanks for confirming - tracking this as part of the maturing realtime compositions epic:

@haarchri
Contributor

haarchri commented Feb 20, 2024

NAME                          SYNCED   READY   COMPOSITION                                             AGE
realtime-composition1-dshdn   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   45s
realtime-composition2-4pbd5   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   45s
realtime-composition3-r8zbk   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   44s
realtime-composition4-58f9w                                                                            14s
realtime-composition5-9p5cn                                                                            14s
realtime-composition6-vnsgs                                                                            14s

After new claims are created, none of the new XRs has a SYNCED or READY condition, or any status at all. During Crossplane startup you can see the following log line: "cannot list in CompositionRevision handler".

kubectl get xnopresources
NAME                          SYNCED   READY   COMPOSITION                                             AGE
realtime-composition1-dshdn   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition2-4pbd5   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition3-r8zbk   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition4-58f9w                                                                            16m
realtime-composition5-9p5cn                                                                            16m
realtime-composition6-vnsgs                                                                            16m
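
XRs stuck like the last three in the listing above can be picked out without reading the table by eye. The following is a rough sketch, not something from the original thread: `list_xr_conditions` and `filter_missing_conditions` are hypothetical helper names, and it assumes kubectl prints `<none>` in a custom column when the condition is absent.

```shell
# Print NAME, SYNCED and READY for every resource of the given kind,
# reading the values straight from status.conditions.
list_xr_conditions() {
  kubectl get "$1" -o custom-columns='NAME:.metadata.name,SYNCED:.status.conditions[?(@.type=="Synced")].status,READY:.status.conditions[?(@.type=="Ready")].status'
}

# Read that table on stdin and print the names of resources that are
# missing either condition (skips the header row).
filter_missing_conditions() {
  awk 'NR > 1 && ($2 == "<none>" || $3 == "<none>") { print $1 }'
}

# Usage:
#   list_xr_conditions xnopresources | filter_missing_conditions
```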

The problem starts around here. I think we can also hit this issue without realtime compositions, since there is no feature flag block around this code:
https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/definition/reconciler.go#L475-L481

And then we hit the following:
https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/composite/reconciler.go#L712

If I add a long sleep here, it works, so I wonder whether the setup is too fast and we need to find a way to wait: https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/definition/reconciler.go#L474

@jbw976
Member

jbw976 commented Mar 4, 2024

think we can hit this issue also without real-time compositions

So we do think this is something that is hitting mainstream scenarios in v1.15? Is there reason to believe that we should backport any of these PRs for a v1.15 patch release?

/cc @haarchri @sttts @phisco

@haarchri
Contributor

haarchri commented Mar 8, 2024

To replicate the issue, first install Crossplane version 1.15.0 and run it with --enable-realtime-compositions.
Then follow these steps:

kubectl apply -f test/e2e/manifests/apiextensions/composition/realtime-compositions/setup
for i in {1..3}; do
  kubectl apply -f - <<EOF
apiVersion: realtime-compositions.e2e.crossplane.io/v1alpha1
kind: NopResource
metadata:
  namespace: default
  name: realtime-composition$i
  labels:
    realtime-compositions: "true"
spec:
  coolField: "I'm cool!"
  compositeDeletePolicy: Foreground
EOF
done

Wait until all claims, XRs, and managed resources are ready.
Stop Crossplane.

Start Crossplane, then apply more claims than before:

for i in {1..6}; do
  kubectl apply -f - <<EOF
apiVersion: realtime-compositions.e2e.crossplane.io/v1alpha1
kind: NopResource
metadata:
  namespace: default
  name: realtime-composition$i
  labels:
    realtime-compositions: "true"
spec:
  coolField: "I'm cool!"
  compositeDeletePolicy: Foreground
EOF
done

If the issue doesn't occur, restart Crossplane and create more claims than you did previously. You will then observe the error "cannot list in CompositionRevision handler" in the logs.
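
The wait, stop, and start steps between the two loops above could be scripted roughly as follows. This is only a sketch under assumptions: the namespace, the deployment name, the plural resource name for the claim, and the timeouts are guesses for a default install, and the function names are hypothetical. Scaling the deployment to zero and back is one way to stop and start Crossplane, not necessarily how it was done in the original reproduction.

```shell
NS="${CROSSPLANE_NAMESPACE:-crossplane-system}"   # assumed namespace
DEPLOY="${CROSSPLANE_DEPLOYMENT:-crossplane}"     # assumed deployment name
# Assumed plural name for the NopResource claim kind used above.
CLAIMS="nopresources.realtime-compositions.e2e.crossplane.io"

# Block until every labelled claim reports Ready (timeout is arbitrary).
wait_claims_ready() {
  kubectl -n default wait --for=condition=Ready --timeout=120s \
    "$CLAIMS" -l realtime-compositions=true
}

# Stop Crossplane by scaling its deployment to zero replicas.
stop_crossplane() {
  kubectl -n "$NS" scale "deployment/$DEPLOY" --replicas=0
}

# Start Crossplane again and wait for the rollout to finish.
start_crossplane() {
  kubectl -n "$NS" scale "deployment/$DEPLOY" --replicas=1 &&
    kubectl -n "$NS" rollout status "deployment/$DEPLOY" --timeout=120s
}

# Usage:
#   wait_claims_ready && stop_crossplane && start_crossplane
```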

@haarchri
Contributor

haarchri commented Mar 8, 2024

I did a long debug session with @sttts, and I cannot reproduce the issue on a build based on PR #5422, so I have a good feeling that we fixed the issue.

@negz
Member

negz commented May 21, 2024

Whoops, #5651 shouldn't have closed this (yet). It might fix the issue, but we don't know for sure.

It would help a lot if someone could try to reproduce this issue with #5651.

negz reopened this May 21, 2024