Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cannot get composite resource #5151

Open
Tracked by #4828
duizabojul opened this issue Dec 19, 2023 · 26 comments · Fixed by crossplane/crossplane-runtime#641
Open
Tracked by #4828

cannot get composite resource #5151

duizabojul opened this issue Dec 19, 2023 · 26 comments · Fixed by crossplane/crossplane-runtime#641
Assignees
Labels

Comments

@duizabojul
Copy link
Contributor

duizabojul commented Dec 19, 2023

What happened?

Please note I use realtime compositions feature of crossplane 1.14

Some of my compositions are creating other composite resources, triggering near instant reconciliation of multiple nested level of composite/composed resources.

In a fair amount of tries, composition creation get stuck because of the following error:

2023-12-19T16:33:55Z    DEBUG    crossplane    cannot get composite resource    {"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/xuserswithattachment.iam.crossplane.airnity.net", "request": {"name":"crossplane-dev-user"}, "error": "XUserWithAttachment.iam.crossplane.airnity.net \"crossplane-dev-user\" not found"} 

This error happens the exact moment composite is created by crossplane and should be reconciled.

There is maybe a race condition here and crossplane does not retry reconciliation so my composition is stuck until i delete the composite resource:

return reconcile.Result{}, errors.Wrap(resource.IgnoreNotFound(err), errGet)

How can we reproduce it?

I can make it happens regulary (once every 3/4 apply of my composition) with the setup i described: a composite resource creating other composite resources, realtime compositions feature enabled, and maybe functions are involved in the bug (server side apply?)

Enregistrement.de.l.ecran.2023-12-19.a.17.48.45.mov

Here you can see on argocd, i delete my composition, and creation get stuck after the creation of XPolicy because of this bug. Crossplane never retries and I have to trigger a reconciliation manually, by deleting the XPolicy for example.

What environment did it happen in?

Crossplane version: 1.14.4

@duizabojul duizabojul added the bug Something isn't working label Dec 19, 2023
@phisco
Copy link
Contributor

phisco commented Dec 27, 2023

Indeed it looks like for some reason the cache was not updated yet and therefore Crossplane is not able to find the XR 🤔

However, AFAICT, Crossplane should try again after 60 seconds, see here, you can set a different poll interval through the --poll-interval flag.

@negz
Copy link
Member

negz commented Jan 2, 2024

@duizabojul Could you please let us know if you can reproduce this without realtime compositions enabled?

@duizabojul
Copy link
Contributor Author

Sorry @phisco I didn't saw your comment, the main issue for me is when the bug happens, it will never reconcile again (I have default pool interval of 60s)

@negz I will try tomorrow

@duizabojul
Copy link
Contributor Author

duizabojul commented Jan 3, 2024

First try deleting an XR with realtime compositions enabled resulted in parent resource never got composed after all tree was successfully deleted. Here are the logs | grep resource-name, you can see a lot of enqueuing and deletions happening the same second, and the cannot get composite resource at the end of logs.

logs

@negz With realtime compositions disabled, i can't reproduce (made 10 tries, everything always went smooth)

@negz
Copy link
Member

negz commented Jan 3, 2024

CC @sttts

@sttts
Copy link
Contributor

sttts commented Jan 4, 2024

Is this the latest 1.14.x release? #5034 was causing a deadlock on updates to the XRD sometimes.

Nvmd. You write 1.14.4 at the bottom.

@sttts
Copy link
Contributor

sttts commented Jan 4, 2024

Wondering, could you attach full debug logs around the time this happens?

@duizabojul
Copy link
Contributor Author

Sorry for delay, here are full logs:

logs.txt

Process stopped at this resource:

apiVersion: iam.crossplane.airnity.net/v1alpha1
kind: XPolicy
metadata:
  creationTimestamp: '2024-01-09T16:21:28Z'
  generateName: external-dns-admin-
  generation: 1
  name: external-dns-admin
...

You can see the corresponding log generated the same second as resource creation:

2024-01-09T16:21:28Z	DEBUG	crossplane	cannot get composite resource	{"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/xpolicies.iam.crossplane.airnity.net", "request": {"name":"external-dns-admin"}, "error": "XPolicy.iam.crossplane.airnity.net \"external-dns-admin\" not found"}

@sttts
Copy link
Contributor

sttts commented Jan 9, 2024

@duizabojul crossplane/crossplane-runtime#641 is a potential race when starting the XR controller. Wondering, are you able to test a development version of Crossplane by compiling yourself? Would be great to get feedback.

@negz please take a careful look on the synchronization logic there. Am I right that we never waited for the informers in the engine?

@duizabojul
Copy link
Contributor Author

Of course, do you want I compile crossplane on your branch?

@sttts
Copy link
Contributor

sttts commented Jan 9, 2024

Yes, vendor in my branch via go.mod of Crossplane.

@duizabojul
Copy link
Contributor Author

I got the same behavior, random resource is stuck forever. Here are logs:

logs.txt

Stuck resource is:

apiVersion: iam.crossplane.airnity.net/v1alpha1
kind: XIrsa
metadata:
  creationTimestamp: '2024-01-10T10:33:49Z'
  name: external-dns-admin

@sttts
Copy link
Contributor

sttts commented Jan 10, 2024

How can I reproduce this? Could you prepare a minimal example that I can run here locally?

@duizabojul
Copy link
Contributor Author

I'm sorry I am really busy at work right now, I can't take much time to try to reproduce this with a simple setup. I think this should happens with any composition if you delete recreate it enough time to trigger the bug (it happens really easily in our eks setup, once every 2 or 3 delete/recreate). If not, maybe adding functions in the mix is required to reproduce. Also, it happens rarely on "parent" XR, so it should be easier to trigger bug with a composition creating child XRs.

@sttts
Copy link
Contributor

sttts commented Jan 10, 2024

ok, will see whether I can easily reproduce it. Current my assumption is that this even happens without realtime compositions, it's just less surprising.

@duizabojul
Copy link
Contributor Author

@negz are you sure this pr fix the problem? I tried it 2 weeks ago and it didn't worked (see comment). I'm maybe wrong but I don't see any commits added to pr branch since i tested it.

@phisco
Copy link
Contributor

phisco commented Jan 23, 2024

You are right @duizabojul, it was automatically closed because the description of crossplane/crossplane-runtime#641 mentioned fixing this.

@luxas
Copy link

luxas commented Jan 29, 2024

Trying to look at this now (these logs)

XRD created on line 92 (36 seconds before not found):

2024-01-09T16:20:47Z	DEBUG	crossplane	Reconciling	{"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/xirsas.iam.crossplane.airnity.net", "request": {"name":"external-dns-admin"}}

Reconcile starts on line 335:

2024-01-09T16:21:23Z	DEBUG	crossplane	Reconciling	{"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/xirsas.iam.crossplane.airnity.net", "request": {"name":"external-dns-admin"}}

I noticed that there is this on line 349 (this in fact happens multiple times afterwards too, e.g. 360-361 and 370-371)

2024-01-09T16:21:23Z	DEBUG	crossplane	Successfully deleted composite resource	{"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/xirsas.iam.crossplane.airnity.net", "request": {"name":"external-dns-admin"}, "uid": "a9b4f308-ab33-4e06-8c25-f038d9aaf788", "version": "58040904", "name": "external-dns-admin", "deletion-timestamp": "2024-01-09 16:21:23 +0000 UTC"}

and during the same second there is this on line 426:

2024-01-09T16:21:23Z	DEBUG	crossplane	Reconciling	{"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/xirsas.iam.crossplane.airnity.net", "request": {"name":"external-dns-admin"}}
2024-01-09T16:21:23Z	DEBUG	crossplane	cannot get composite resource	{"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/xirsas.iam.crossplane.airnity.net", "request": {"name":"external-dns-admin"}, "error": "XIrsa.iam.crossplane.airnity.net \"external-dns-admin\" not found"}

All in all, there are 26 registered deletions that can be found by searching for DEBUG crossplane Successfully deleted composite resource {"controller": "defined/compositeresourcedefinition.apiextensions.crossplane.io", "controller": "composite/x.

I'm new to this codebase, so I don't know why the composite/xirsas.iam.crossplane.airnity.net controller would like to delete the XIrsa XR in cache or in the API server.

Hope this helps a little to get going.

@luxas
Copy link

luxas commented Jan 29, 2024

I think that it is either of these two that garbage collect/delete the XR:

Are there some links missing between the template and XR?

Copy link

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@github-actions github-actions bot added the stale label Apr 29, 2024
@jbw976
Copy link
Member

jbw976 commented Apr 29, 2024

/fresh

@negz
Copy link
Member

negz commented May 21, 2024

I'm hopeful #5651 might have fixed this. It would be helpful if someone could try out that PR and see if it's possible to reproduce this issue.

@duizabojul
Copy link
Contributor Author

Is there a docker tag already built I can try?

@jbw976
Copy link
Member

jbw976 commented May 30, 2024

Looks like the current HEAD of master is available at xpkg.upbound.io/crossplane/crossplane:v1.17.0-rc.0.175.g93610dc7.

Instructions for installing latest from master channel can be found at https://docs.crossplane.io/latest/software/install/#install-pre-release-crossplane-versions. Note there may be an issue to clean up in that flow though after we recently transitioned our build system to Earthly in #5711 - we'll look into that ASAP 🙇‍♂️

@duizabojul
Copy link
Contributor Author

duizabojul commented Jun 1, 2024

I tried this image and I got a similar behaviour from crossplane (always with realtime compositions + functions): when a XR is deleted and recreated (quickly in my case because i use argocd) it happens that the controller does not "see" the newly created XR and nothing can make crossplane compose it again (even modifying resource to trigger an update event). The only way to get it back on track is to relaunch crossplane pod or recreate XR until crossplane see it again.

Here are logs of crossplane controller: logs.txt

If you search the composite/xgkeworkloadidentities.iam.crossplane.airnity.io argocd-app-controller logs, you will see that at 2024-06-01T20:17:32Z the resource get deleted, but never composed again on creation at 2024-06-01T20:17:45Z

# The live manifest hanging in cluster
apiVersion: iam.crossplane.airnity.io/v1alpha1
kind: XGKEWorkloadIdentity
metadata:
  annotations:
     ... # no crossplane annotations
  creationTimestamp: '2024-06-01T20:17:45Z'
  generation: 1
  name: argocd-app-controller
  resourceVersion: '3170018'
  uid: db7d53bf-ce97-40c0-bbb2-6dbf2ed00402
spec:
  compositionUpdatePolicy: Automatic
  params: ...
 # no crossplane status

@negz negz self-assigned this Jun 3, 2024
@negz
Copy link
Member

negz commented Jun 3, 2024

Interesting. I'll try reproduce this when I get a chance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Backlog
Development

Successfully merging a pull request may close this issue.

6 participants