
Leaked RouteTable due to eventually consistent DescribeRouteTables API #802

Closed
dee0 opened this issue Jul 13, 2021 · 43 comments · Fixed by crossplane/crossplane-runtime#283 or #825

@dee0

dee0 commented Jul 13, 2021

What happened?

After deleting a composite and waiting, with kubectl wait, for all of the referenced API objects to be deleted, managed resources still existed in AWS.
This is a problem for us because this wait logic is a gate used to delay deletion of our k8s cluster. If the managed resources in AWS aren't cleaned up by the time we are done waiting, they will most likely end up orphaned. (We are seeing this often.)

How can we reproduce it?

Here is the clean up logic we are performing before allowing the cluster to be deleted

Get all CompositeResourceDefinition
For each CompositeResourceDefinition
  If .spec.claimNames.kind
    Get all claims of .spec.claimNames.kind
    For each claim
      Get resource refs
      Delete claim
    Wait for all resources to be deleted
  For each composition of .spec.names.kind
    Get resource refs
    Delete composition
    Wait for all resources to be deleted
Get all managed resources ( kubectl get managed )
  For each managed resource
    Delete managed resource

We expect that once we have run through the above steps all managed resources have been deleted from AWS.
However this doesn't seem to be the case.

In the attached zip are the example

  • xrd.yaml
  • composition.yaml
  • claim.yaml

used to create the resources in AWS.

Also in the zip is events-for-vpc-14449-d66-dmi.xlsx, which shows the events around one of the leaked VPCs.

  • Events at top are most recent.
  • Events with white background are from CloudTrail
  • Events with light blue background are from my team's clean up code.
  • Events with green background are from Gardener

At line 81 you can see the first event from my team's clean up code.
By line 31, all of the resources related to the VPC have been cleaned up from k8s. There is nothing left for our clean up code to delete or wait for.
Our last attempt to wait for a resource to be cleaned up is about 9 seconds after the last event for the VPC in CloudTrail. Given all of the deletion attempts that failed with Client.DependencyViolation, I am wondering if Crossplane has given up. (And why are all those failures there in the first place?)
Anyhoo, once our cleanup code has nothing left to wait for, Gardener is free to begin the cluster deletion. You can see at line 28, about 45 seconds after my team's clean up code
finished, where it begins taking destructive action. It starts with simply going through the API server and deleting all the CRDs it can find.
Since Gardener is just blindly deleting things, if the AWS resources haven't been cleaned up at this point it is highly likely they are going to be orphaned.
14449-d66-dmi-leak.zip

What environment did it happen in?

crossplane: v1.2.2
provider-aws: v0.18.1
kubernetes: 1.19.10

@dee0
Author

dee0 commented Jul 14, 2021

So the data that I provided when I opened this defect was collected from logs from a CI/CD pipeline.

A short while ago I performed a manual test where I had a better ability to collect logs.

I deleted a claim and all of the API objects except the composition were deleted from k8s. The last conditions for the composition are
conditions:
- lastTransitionTime: "2021-07-14T02:56:38Z"
message: 'delete failed: failed to delete the VPC resource: DependencyViolation: The vpc ''vpc-0623d044576ddef33'' has dependencies and cannot be deleted.'
reason: ReconcileError
status: "False"
type: Synced
- lastTransitionTime: "2021-07-14T04:38:17Z"
reason: Deleting
status: "False"
type: Ready

However, at least at the moment, if I try to delete the VPC in the AWS console the dialog indicates that it will be deleted and that these resources will also be deleted:
rtb-07856e5a4ef84b858
rtb-0d6436f64d2c69c98
Note I have seen the case where there are dependent resources that cannot actually be deleted. In that case the AWS console tells you upfront you won't be able to delete the VPC while the dependencies exist. So my current case is really not the same as that.

That said, CloudTrail shows that Crossplane is still trying to delete the VPC and the delete attempts are failing with Client.DependencyViolation.

Those two dependent resources above are a couple of un-named route tables. Not sure why they are there.

While I am not familiar with the AWS api, at the moment I am thinking that there is a Crossplane bug where either

  • Crossplane isn't passing a 'delete dependents' flag to the AWS delete API, or
  • Crossplane isn't recursively deleting 'anonymous' dependents when it should

@dee0
Author

dee0 commented Jul 14, 2021

I examined a teammate's deploy and I see there are 3 named route tables, clearly from the composition, and one nameless one.
At the moment I don't understand where these nameless route tables are coming from or why my deployment had two while my teammate's had only one.
In any case, it seems these nameless route tables are somehow related to the problem we are seeing.

@dee0
Author

dee0 commented Jul 14, 2021

The application of the claim happens within a k8s Job. I just thought to check the logs of this job for both my teammate's deployment and mine.

I see in my teammate's deployment the job ran once. However in my deployment it ran twice. The first run failed while the second succeeded. The log for my deployment's first run ends with

+ /bin/kubectl apply -f /deployment-data/claim.yaml
vpc.aws.platform.dmi.sap.com/vpc-mc-i540621-dmi created
+ /bin/kubectl wait '--timeout=900s' '--for=condition=Ready' -n mc-provisioner VPC vpc-mc-i540621-dmi
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding

The log for the second run ends with

+ /bin/kubectl apply -f /deployment-data/claim.yaml
vpc.aws.platform.dmi.sap.com/vpc-mc-i540621-dmi unchanged
+ /bin/kubectl wait '--timeout=900s' '--for=condition=Ready' -n mc-provisioner VPC vpc-mc-i540621-dmi
vpc.aws.platform.dmi.sap.com/vpc-mc-i540621-dmi condition met

That failure of the first run is something we have been seeing since we started using Crossplane, but we have been ignoring it since k8s always re-runs the job until it succeeds, and it usually takes at most 3 tries.

I am wondering now whether the fact that my deployment has two unexpected (at least to me) nameless route tables associated with the VPC, while my teammate's deployment has just one, is somehow related to the job running multiple times. Possibly the error from the API server also causes Crossplane to do something that creates the extra nameless route table, or perhaps applying the claim multiple times causes it, even though kubectl reported vpc.aws.platform.dmi.sap.com/vpc-mc-i540621-dmi unchanged on the second run.

@dee0
Author

dee0 commented Jul 14, 2021

Hmm, just looked at CloudTrail events for vpc-14449-d66-dm (the CI/CD case) and vpc-0623d044576ddef33 (my manual case) and I think either there are two different problems or the nameless route tables are irrelevant.

In the case of vpc-14449-d66-dm only 3 route tables (the expected number given the XRD) were created, while in my manual case those nameless route tables are hanging around.

@hasheddan
Member

@dee0 are the "nameless" route tables being created by another process? provider-aws not force deleting dependent resources is by design -- we encourage management of all resources via Crossplane, and we do not want to delete resources that are not under Crossplane's management in any scenario.

@dee0
Author

dee0 commented Jul 14, 2021

@hasheddan The nameless route tables are not being created by another process; they are being created by Crossplane.
The timing, user ID, client IP, etc. for the create event in CloudTrail indicate this.

@dee0
Author

dee0 commented Jul 14, 2021

Wonder if this could be related at all to either
upbound/platform-ref-aws#3 (comment)
or the various issues linked from the discussion of that issue.

@dee0
Author

dee0 commented Jul 14, 2021

AWS provider log for my manual test (the one involving vpc-mc-i540621-dmi):
i540621-dmi-aws-provider.log

@dee0
Author

dee0 commented Jul 14, 2021

Btw the problem of leaked resources, whatever the cause(s), happens often but not always. A cluster I was comparing against yesterday was successfully cleaned up this morning.

@jbw976
Member

jbw976 commented Jul 14, 2021

It would be interesting to understand more about the unexpected / "nameless" RouteTables before any attempt is made to delete resources and clean the cluster up. I want to see if we can find early on, before deletions come into play, if there are any leaked or untracked resources.

For any unexpected RouteTable, can you confirm:

  • they don't show up in Crossplane at all? e.g. kubectl get RouteTable -A
  • they would prevent the VPC from being deleted? e.g. in the AWS UI, if you click to delete the VPC, does it list the RouteTable in the list of dependent objects?

@dee0
Author

dee0 commented Jul 15, 2021

@jbw976 I'll try to get the answers as soon as I can.
Btw, with regard to the 1st question, is running
kubectl get managed -A
equivalent? I ask because I can say for sure that the nameless route tables don't show up in that output.

@dee0
Author

dee0 commented Jul 15, 2021

Spent the last while reviewing the VPCs in AWS: 7 orphaned and the rest in use.
All 7 seem to match the pattern I reported when I opened this defect. That is

  • several deletes that fail due to dependencies
  • we run out of things to wait on and then the Gardener cluster delete process kicks in and creates a chaotic shutdown that orphans items

None of the VPCs, orphaned or otherwise, have nameless route tables associated with them.

So I am still of the opinion that we have at least two ways that VPCs are being orphaned: one where we run out of things to wait on and yet the VPC hasn't been removed from AWS, and one where nameless route tables prevent deletion.

@dee0
Author

dee0 commented Jul 15, 2021

A word on the nameless route tables. Looks like there should be at least one associated with each VPC:
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html
https://docs.aws.amazon.com/vpc/latest/userguide/WorkWithRouteTables.html

Not sure how we ended up with two nameless route tables though. I did just perform a deploy where there were no problems, and when I looked at the details of the single nameless route table associated with my VPC there was something that said it was the 'main' route table. If/when I see the case of multiple nameless route tables again, I'll look at them both more carefully to try and understand what the difference between them is.

That said, I would like to reiterate that I think there are two different cases we are dealing with:

  • The case where there are extra nameless route tables
  • The case where Crossplane just seems to give up

I would also like to ask again, why does Crossplane make so many failed attempts to delete the VPC even in the success case?

@dee0
Author

dee0 commented Jul 22, 2021

14814-2e1-dmi-leak.zip

We have made some changes so that we can capture debug logs from Crossplane. The spreadsheet in the attached zip contains log data from
  • kibana - This has the debug logging from Crossplane
  • jenkins - This contains messages showing the clean up activity
  • cloudtrail - This shows the events from the AWS perspective

Each of the above is in its own worksheet and there is also a worksheet that combines the logs from all 3 sources.

The clocks are a little out of sync so the messages in the combined tab aren't 100% in the correct order. e.g. The clocks for the cluster and jenkins are about 1/10 second out of sync.

In the combined worksheet the first deletion message is at line 24, 2021-07-22 02:33:27.457 cluster time. Here's where you can see the clocks are out of sync, because it precedes the delete in the Jenkins log.

By line 554, 2021-07-22 02:35:29.990 jenkins time, our attempts to perform a controlled cleanup have completed.

As I mentioned near the beginning of this, it seems to me the problem is down to our clean up code not being able to accurately see what is happening with the VPC.

Looking at the 'combined' worksheet :

At line 33, 2021-07-22 02:33:27.522 jenkins time, we delete the claim vpc-mc-14814-2e1-dmi.

At line 34, 2021-07-22 02:33:27.522 jenkins time, we try to wait for the managed resource VPC vpc-mc-14814-2e1-dmi-ftm9p-9k8gb.

At line 36, 2021-07-22 02:33:28.000 jenkins time, you can see this failed with a 'resource not found' error.

However we can see from log messages a couple of minutes later the VPC still exists in AWS and Crossplane is still trying to delete it. e.g. Line 504, 2021-07-22 02:35:24.000 AWS time

So I think our problem begins with Kubernetes/Crossplane returning that 'not found' error when we tried to wait on vpc-mc-14814-2e1-dmi-ftm9p-9k8gb. (@hasheddan I think this is the bug.)

Some other things of note in the combined tab

  • At line 58, 2021-07-22 02:33:30.960 cluster time, there are some stack traces. Not sure if they are relevant or not.
  • At line 551, 2021-07-22 02:35:29.990 jenkins time, our cleanup code calls kubectl get managed to try and identify any straggler items that still need to be deleted. This turned up the VPC even though previously we got a 'not found' when trying to wait on its deletion. Fwiw we call kubectl delete on it, with wait=true, and this returns instantly.

One last thing. By line 554, 2021-07-22 02:35:29.990 jenkins time, our attempts to clean things up nicely have ended and clean up is turned over to Gardener. As mentioned before, Gardener tries to delete everything it can find via the cluster API server. So after this point you will see chaos.

@dee0
Author

dee0 commented Jul 23, 2021

Just checked another case of a leaked resource and I see a pattern.

  • Delete claim
  • Try to wait for the VPC to be deleted but get not found
  • Get managed returns the VPC
  • Deleting the VPC with wait=true returns immediately
  • Despite the above, logs show Crossplane hasn't actually deleted the VPC from AWS.

Btw for each of these cases we are only creating one VPC. So it isn't the case that get managed is returning some VPC other than the one we deleted.

@dee0
Author

dee0 commented Jul 30, 2021

@hasheddan @jbw976 Have a couple more pieces of info

  • We are defining CompositeResourceDefinition with kind = VPC.
    I think the reason for the unexpected NotFound error when we try to wait on the VPC managed resource is that
    kubectl wait VPC [name]
    is actually trying to wait on a Composite Resource instead of the managed resource.
    Here is output from kubectl api-resources to further show what I am talking about
    vpcs aws.platform.dmi.sap.com/v1alpha1 true VPC
    vpcs ec2.aws.crossplane.io/v1beta1 false VPC
  • I changed our XRD's kind to XVPC and then tried to test, but then I ran into the other issue I have mentioned in this ticket: an extraneous route table was added to the VPC and this prevented clean up of the VPC. Per @jbw976's request I deleted the extraneous route table and found that Crossplane was then able to delete the VPC. Fwiw Crossplane had been trying to delete the VPC for about 30 minutes prior to me deleting that extra route table; a couple of minutes after I did, Crossplane was finally able to perform the delete.

So as I said a while ago, it seems like we are running into multiple problems. While my team can address the name collision problem easily enough, I am not so sure about the extraneous route table.

@dee0
Author

dee0 commented Jul 30, 2021

2021-07-29-extra-route-table.zip

In the attached zip are

  • vpc-events.csv - Events for the VPC from CloudTrail. The extra route table is rtb-03959f8ca1b9ebe36
  • kibana.csv - Logs from the Crossplane pods

The extra route table was created at about July 29, 2021, 09:58:23 (UTC-07:00)

@dee0
Author

dee0 commented Jul 30, 2021

grep -Pi 'route.*created' kibana.csv
turns up 4 events indicating successful creation of route tables (I expect only 3). There are two events for vpc-mc-i540621-dmi-ds67r-k6xxz; maybe that is a clue?

@dee0
Author

dee0 commented Jul 30, 2021

2nd-and-3rd-routetable-messages.zip

In the attached zip are log messages that I have extracted from kibana.csv (see above) and normalized for comparison:

  • vpc-mc-i540621-dmi-ds67r-k6xxz-events.txt has the log messages related to the route table API object vpc-mc-i540621-dmi-ds67r-k6xxz. This is the 2nd one to be created in k8s.
  • vpc-mc-i540621-dmi-ds67r-pxd9x-events.txt has the log messages for the route table API object vpc-mc-i540621-dmi-ds67r-pxd9x. This is the 3rd and final one to be created in k8s.

I normalized the contents of these files by removing the timestamps, replacing the last portion of the route table name with 'tableid', and replacing the values for resourceVersion with v1, v2, ...

Comparing the files I see

  • Crossplane creates an extra route table in AWS for vpc-mc-i540621-dmi-ds67r-k6xxz
  • After the creation of the extraneous route table, Crossplane reports an error in the processing of vpc-mc-i540621-dmi-ds67r-k6xxz.
  • There are no errors in the processing of vpc-mc-i540621-dmi-ds67r-pxd9x

@negz
Member

negz commented Jul 30, 2021

I'm thinking the spurious RouteTable creation could be a race in the relevant controller. RouteTables are one of the resources where we rely on the crossplane.io/external-name being set to determine whether we've created the desired RouteTable or not per the below code:

https://github.com/crossplane/provider-aws/blob/v0.18.1/pkg/controller/ec2/routetable/controller.go#L109

The flow is:

  1. Observe is called. It reports that the RouteTable does not exist because the external-name annotation is unset.
  2. Create is called. It sends a CreateRouteTableRequest then sets the external-name annotation.

You could imagine a race where 1. is called twice before 2. succeeds. I would imagine in this case we would:

  1. Reconcile A determines no external-name is set, so calls CreateRouteTableRequest.
  2. Reconcile B determines no external-name is set, so calls CreateRouteTableRequest.
  3. Reconcile A sets the external-name.
  4. Reconcile B tries to set the external-name, but is rejected by the API server because the resource has changed.

That said, I'm pretty sure we're running with MaxConcurrentReconciles = 1 so it's not clear to me how we'd even have two reconciles happening at once.
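
To make the suspected race concrete, here is a toy, self-contained Go model of it (illustrative only, not provider-aws code): existence is judged purely from a recorded external-name, so any reconcile that works from a stale copy without that name will create a second table.

```go
package main

import "fmt"

// Toy model of the race described above. externalName stands in for the
// crossplane.io/external-name annotation; awsRouteTables stands in for AWS.
type managedResource struct {
	resourceVersion int
	externalName    string
}

var awsRouteTables []string

// reconcile mimics Observe+Create: if no external-name is recorded, it assumes
// the external resource does not exist and creates one.
func reconcile(cached managedResource) managedResource {
	if cached.externalName == "" {
		id := fmt.Sprintf("rtb-%04d", len(awsRouteTables))
		awsRouteTables = append(awsRouteTables, id) // the AWS call succeeds
		cached.externalName = id                    // recorded only afterwards
		cached.resourceVersion++
	}
	return cached
}

func main() {
	_ = reconcile(managedResource{resourceVersion: 1})  // reconcile A creates rtb-0000 and records it
	_ = reconcile(managedResource{resourceVersion: 1})  // reconcile B works from a stale copy and creates rtb-0001
	fmt.Println("route tables in AWS:", awsRouteTables) // two tables for one managed resource
}
```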

@dee0
Author

dee0 commented Jul 30, 2021

Fwiw we haven't done anything to increase the concurrency.

Have been tracing through the code to see where an extra event might come from or where it might be allowed through. I'm up to the point where the rate limiting queue receives events. Haven't seen anything yet.

Btw my background is mostly with C++ and Java and with those languages I am used to having a thread id in log messages. That takes away a lot of the mystery when diagnosing problems of concurrency. Is there a way we can enable that for goroutines? Just thinking of how we might confirm your theory.

@dee0
Author

dee0 commented Jul 30, 2021

@negz If you haven't already would you please use a visual diff tool ( e.g. winmerge on Windows ) to compare vpc-mc-i540621-dmi-ds67r-k6xxz-events.txt and vpc-mc-i540621-dmi-ds67r-pxd9x-events.txt? ( See my post a bit earlier today )

Am hoping you will recognize something I do not, something that confirms your theory.

@negz
Member

negz commented Jul 30, 2021

@dee0 The fact that k6xxz has two CreatedExternalResource calls, each at a different resourceVersion, supports my theory above. That said, I would also expect to have seen a "the object has been modified; please apply your changes to the latest version and try again" error immediately after the second create. I note that we do have one of those in the logs but it appears to pertain to updating the resource's status - i.e. it says "cannot update managed resource status" - where I'd instead expect the error to instead be prefixed with "cannot update managed resource".

@dee0
Author

dee0 commented Jul 31, 2021

Checked the log and confirmed the worker count for the routetable controller is 1:

"2021-07-29T16:58:02.288Z","2021-07-29T16:58:02.288Z INFO controller-runtime.manager.controller.managed/routetable.ec2.aws.crossplane.io Starting workers {""reconciler group"": ""ec2.aws.crossplane.io"", ""reconciler kind"": ""RouteTable"", ""worker count"": 1}"

@dee0
Author

dee0 commented Jul 31, 2021

@hasheddan @negz I think I might have a clue about how the scenario Nic described is happening.
vpc-mc-i540621-dmi-ds67r-k6xxz-events-with-times-order-fixed.zip

In the attached zip is a file where I have taken the log messages related to the problematic managed resource vpc-mc-i540621-dmi-ds67r-k6xxz- and

  • put them in the right order
  • indented the messages that show the reconcile activity but not the 'Reconciling' message that Reconcile logs first

Note: The messages collected from Kibana aren't exactly in the right order because they are sent as datagrams to Kibana. I was able to fix the order by sorting the messages on the timestamp that was part of the original log message.

Anyhoo with the order corrected I see that when the first route table is created for vpc-mc-i540621-dmi-ds67r-k6xxz Reconcile first reports a resource version id of 34683 and then 35015.
However when the second route table for vpc-mc-i540621-dmi-ds67r-k6xxz is created, Reconcile first reports the resource version is 35012 and then 35017.

So it seems that somehow in the later call to Reconcile it fetched an out-of-date version of the RouteTable vpc-mc-i540621-dmi-ds67r-k6xxz.

Please review the log messages from
2021-07-29T16:58:22.982Z
to
2021-07-29T16:58:23.780Z
to confirm I am not misreading things.

@dee0
Author

dee0 commented Aug 1, 2021

Been reading through the code and at the moment I think the problem is with the caching.
In the log I uploaded in my last post this is the problematic revision sequence
34683 // provider-aws log message
35015 // controller-runtime.manager.events message
35012 // provider-aws log message  <-- problem
35017 // controller-runtime.manager.events message

The provider-aws messages use data from a managed resource object filled by a call to client.Get which uses the cache.
The controller-runtime.manager.events messages use a managed resource object that has been updated by the response to client.Update. Looking at the code for client.Update, I see it doesn't update the cache.

I'll try and sort out when the cache actually does get updated to see why this problem doesn't happen all the time for us.
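
One way to confirm that a reconcile is acting on a stale cached copy is to compare the manager's cache-backed read with a direct API-server read. A rough sketch, assuming the provider-aws RouteTable type and a controller-runtime manager (the helper name is mine):

```go
import (
	"context"
	"log"

	ec2v1beta1 "github.com/crossplane/provider-aws/apis/ec2/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// compareCacheToAPIServer is a debugging aid only. It reads the same
// RouteTable through the manager's cache-backed client and directly from the
// API server, then logs both resourceVersions and external-name annotations.
// A mismatch means a reconcile could be working from a copy that predates the
// Update which recorded crossplane.io/external-name.
func compareCacheToAPIServer(ctx context.Context, mgr manager.Manager, name string) {
	cached := &ec2v1beta1.RouteTable{}
	if err := mgr.GetClient().Get(ctx, client.ObjectKey{Name: name}, cached); err != nil {
		log.Printf("cached read failed: %v", err)
		return
	}
	live := &ec2v1beta1.RouteTable{}
	if err := mgr.GetAPIReader().Get(ctx, client.ObjectKey{Name: name}, live); err != nil {
		log.Printf("direct read failed: %v", err)
		return
	}
	log.Printf("cache: version=%s external-name=%q / API server: version=%s external-name=%q",
		cached.GetResourceVersion(), cached.GetAnnotations()["crossplane.io/external-name"],
		live.GetResourceVersion(), live.GetAnnotations()["crossplane.io/external-name"])
}
```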

@dee0
Author

dee0 commented Aug 1, 2021

Think I see where the concurrency problem comes from.

There are 5 categories of goroutines involved in processing resource events.
(1) Reflector::Run
Observes resources in the API server and pushes the deltas it detects into the DeltaFIFO. Updates the cache approximately every 800ms.
There is one of these for each managed resource type.

(2) sharedIndexInformer::Run
Pops updates from the DeltaFIFO and pushes notifications to the channel processorListener:addCh.
There is one of these for each managed resource type.

(3) processorListener:pop
Picks up updates from addCh and pushes them to the channel processorListener:nextCh, possibly buffering them in an expanding circular buffer first.
There is one of these for each managed resource type.

(4) processorListener:run
Picks up items from processorListener:nextCh and for each one generates a ReconcileRequest, which it adds to the rateLimitingType that is the Controller's Queue. A ReconcileRequest just contains a namespace-qualified name.
There is one of these for each managed resource type.

(5) Controller's workers
Processes ReconcileRequests as they become available in the Controller's Queue. Fetches resource items from the cache updated by goroutine (1). Blocks until there are updates from goroutine (1) that haven't been picked up by (2).
There is just one of these.

So here is what is happening

  • The Controller worker performs CreatedExternalResource in response to a ReconcileRequest
  • The Controller worker processing a subsequent ReconcileRequest fetches the resource item from the cache. While it blocks for goroutine (2) to update the cache, there is no guarantee that the update will contain the changes from the above CreatedExternalResource. e.g. there may just be an update to the ref vpcId or the ref natGatewayId. If the change from the above CreatedExternalResource isn't present (specifically, external-name hasn't been set) then the Reconciler incorrectly concludes that it needs to create the external resource even though it just did so in the previous Reconcile run.

I think ideally controller-runtime would utilize a write-through cache. At the moment I suspect that would be major surgery. Not sure, but perhaps the Reconciler could detect this situation and requeue the request.

@dee0sap
Contributor

dee0sap commented Aug 2, 2021

@hasheddan @negz Think this may be what happened with the reconciles

  1. Version 34625 in cache. Requeued because vpcId could not be resolved.
  2. Version 34626 in cache. Requeued because vpcId could not be resolved. Not sure what was changed between 34625 and 34626.
  3. Version 34626 in cache. Requeued because natGateWayId could not be resolved. Probably this is just the requeue from step 2. Note vpcId resolved.
  4. Version 34683 in cache. Requeued because natGatewayId could not be resolved.
  5. Version 34683 in cache. This is probably the requeue from step 4 above. Reference resolution completed, which means the resource was updated in the API server, which means another change event; that new version is < 35015 (see below). The resource in the API server was then updated with the external name, producing version 35015.
  6. Version 35012 in cache. External name is not set in 35012. 35012 is probably just the result of reference resolution in reconcile number 5

Read through the code yesterday and maybe adding a write-through cache isn't as big of a job as I thought. Still pretty big, but not as big. :)

@dee0sap
Contributor

dee0sap commented Aug 5, 2021

negz asked me to write up some of the options that have been floated for fixing this problem.

  • After updating the managed object with the external-name value, push the object into the cache. (See Need write back cache kubernetes-sigs/controller-runtime#1622.)

  • Don't use the cache at all.

  • After setting the external-name, have Reconcile poll the cache until it gets back an object with the external-name set. This will only work so long as provider-aws has just 1 worker. (See the sketch after this list.)

  • Add a CRD _Type_Reservation, e.g. RouteTableReservation. A reservation can be used to track whether the process of creating the external resource has started or been fulfilled. So the happy path would look like, for example,

    • Reconcile processes RouteTable x for the first time.
    • It sees external-name is unset so it creates RouteTableReservation x with state=open
    • It creates the route table in AWS and sets RouteTableReservation state to fulfilled

    In the case where a stale RouteTable was fetched from the cache, the attempt to create the RouteTableReservation would fail.
    Of course the challenge with this would be dealing with orphaned, unfulfilled RouteTableReservations in the event of a panic or other major failure.

    Btw I think the 4 comments just before this one do a good job of summarizing what is happening.
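
A minimal sketch of the "poll the cache" option above, assuming the provider-aws RouteTable type and crossplane-runtime's meta helpers (the function name is mine, and as noted this only helps while the controller runs a single worker):

```go
import (
	"context"
	"time"

	"github.com/crossplane/crossplane-runtime/pkg/meta"
	ec2v1beta1 "github.com/crossplane/provider-aws/apis/ec2/v1beta1"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForExternalNameInCache blocks the (single) worker after Create until the
// cache-backed client returns a copy of the RouteTable that has the
// crossplane.io/external-name annotation set, i.e. until the informer cache
// has caught up with the Update that recorded it.
func waitForExternalNameInCache(ctx context.Context, c client.Client, key client.ObjectKey) error {
	rt := &ec2v1beta1.RouteTable{}
	return wait.PollImmediate(100*time.Millisecond, 5*time.Second, func() (bool, error) {
		if err := c.Get(ctx, key, rt); err != nil {
			return false, err
		}
		return meta.GetExternalName(rt) != "", nil
	})
}
```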

@dee0sap
Contributor

dee0sap commented Aug 6, 2021

Another bandaid option

Instead of having ResolveReferences update the managed object in Kubernetes, have it return all of the external IDs of the references if they are available. If they are not, then requeue as is done today.

Then just set them on the managed object when setting the external-name.
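
To make the idea concrete, here is a hypothetical sketch (none of these signatures exist in crossplane-runtime or provider-aws; the types, helper names, and the spec field name are assumptions used only to illustrate deferring the reference writes until the external-name is written):

```go
import (
	"context"

	"github.com/crossplane/crossplane-runtime/pkg/meta"
	ec2v1beta1 "github.com/crossplane/provider-aws/apis/ec2/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// resolvedRefs is a hypothetical container for the external IDs that
// ResolveReferences would return instead of writing them onto the object.
type resolvedRefs struct {
	vpcID string
}

// resolveRouteTableRefs would look up the referenced VPC and return its
// external ID, or ok=false to requeue, without updating the RouteTable and
// therefore without producing an intermediate resourceVersion.
func resolveRouteTableRefs(ctx context.Context, c client.Client, rt *ec2v1beta1.RouteTable) (refs resolvedRefs, ok bool, err error) {
	// ... fetch the referenced VPC managed resource and read its external ID ...
	return resolvedRefs{}, false, nil
}

// recordCreation writes the resolved IDs and the external-name in one Update
// after the AWS call succeeds, so the cache can never hold a version that has
// the resolved references but not the external-name.
func recordCreation(ctx context.Context, c client.Client, rt *ec2v1beta1.RouteTable, refs resolvedRefs, externalName string) error {
	rt.Spec.ForProvider.VPCID = &refs.vpcID // field name assumed
	meta.SetExternalName(rt, externalName)
	return c.Update(ctx, rt)
}
```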

@dee0sap
Contributor

dee0sap commented Aug 15, 2021

@negz I am sure on our side we are hitting the issue that crossplane/crossplane-runtime#279 fixes. So I wouldn't say it is hypothetical.

From the file vpc-mc-i540621-dmi-ds67r-k6xxz-events-with-times-order-fixed.csv, which I supplied some time ago, we have the sequence below. Note the resource versions and external-name values. When that second create happens, external-name isn't set. And this makes sense: when external-name was originally set, that produced version 35015, but when the second create happens Reconcile is working with 35012.

2021-07-29T16:58:22.982Z  DEBUG  provider-aws  Reconciling  {"controller": "managed/routetable.ec2.aws.crossplane.io", "request": "/vpc-mc-i540621-dmi-ds67r-k6xxz"}
    2021-07-29T16:58:23.475Z  DEBUG  provider-aws  Successfully requested creation of external resource  {"controller": "managed/routetable.ec2.aws.crossplane.io", "request": "/vpc-mc-i540621-dmi-ds67r-k6xxz", "uid": "db33bd1c-151e-499e-a3e6-7bbda6065d2a", "version": "34683", "external-name": ""}
    2021-07-29T16:58:23.475Z  DEBUG  controller-runtime.manager.events  Normal  {"object": {"kind":"RouteTable","name":"vpc-mc-i540621-dmi-ds67r-k6xxz","uid":"db33bd1c-151e-499e-a3e6-7bbda6065d2a","apiVersion":"ec2.aws.crossplane.io/v1beta1","resourceVersion":"35015"}, "reason": "CreatedExternalResource", "message": "Successfully requested creation of external resource"}
2021-07-29T16:58:23.574Z  DEBUG  provider-aws  Reconciling  {"controller": "managed/routetable.ec2.aws.crossplane.io", "request": "/vpc-mc-i540621-dmi-ds67r-k6xxz"}
    2021-07-29T16:58:23.780Z  DEBUG  provider-aws  Successfully requested creation of external resource  {"controller": "managed/routetable.ec2.aws.crossplane.io", "request": "/vpc-mc-i540621-dmi-ds67r-k6xxz", "uid": "db33bd1c-151e-499e-a3e6-7bbda6065d2a", "version": "35012", "external-name": ""}
    2021-07-29T16:58:23.780Z  DEBUG  controller-runtime.manager.events  Normal  {"object": {"kind":"RouteTable","name":"vpc-mc-i540621-dmi-ds67r-k6xxz","uid":"db33bd1c-151e-499e-a3e6-7bbda6065d2a","apiVersion":"ec2.aws.crossplane.io/v1beta1","resourceVersion":"35017"}, "reason": "CreatedExternalResource", "message": "Successfully requested creation of external resource"}

@negz
Member

negz commented Aug 16, 2021

@dee0sap Understood - thanks for pointing that out. I haven't been able to reproduce that particular variant yet but it does make sense that it's possible.

Adding a note here that the same eventually consistent AWS API issue appears to be affecting InternetGateways too - they just happen to not block deletion of the VPC. I noticed a bunch were building up during my testing, and confirmed that the Terraform folks have observed this too per https://github.com/hashicorp/terraform-provider-aws/blob/915f5f5dc3073de0f25ed216dca421bf76e114b9/aws/resource_aws_internet_gateway.go#L365.

negz added a commit to negz/crossplane-runtime that referenced this issue Aug 16, 2021
Per crossplane-contrib/provider-aws#802 some external APIs
(including some AWS APIs) appear to experience some delay between the time a new
resource is successfully created and the time at which that resource appears in
queries.

This commit adds a new 'crossplane.io/external-create-time' annotation and a new
'ResourcePending' field in the Observation struct. Together these can be used by
an Observe method to allow for a small grace period before it determines that a
resource does not exist. For example:

```go
func Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
  err := api.Get(AsInput(mg))
  if IsNotFound(err) {
    if t := meta.GetExternalCreateTime(); t != nil && time.Since(t.Time) < 1 * time.Minute {
      // We're in the grace period - wait a bit longer for the resource to appear.
      return managed.ExternalObservation{ResourcePending: true}, nil
    }
    // The resource does not exist.
    return managed.ExternalObservation{ResourceExists: false}, nil
  }
  if err != nil {
    return managed.ExternalObservation{}, err
  }
  return managed.ExternalObservation{ResourceExists: true}, nil
}

func Create(ctx context.Context, mg resource.Managed) (managed.ExternalCreation, error) {
  _ = api.Create(AsInput(mg))
  meta.SetExternalCreateTime()
  return managed.ExternalCreation{ExternalCreateTimeSet: true}, nil
}
```

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/crossplane-runtime that referenced this issue Aug 16, 2021
negz added a commit to negz/crossplane-runtime that referenced this issue Aug 16, 2021
negz added a commit to negz/provider-aws that referenced this issue Aug 16, 2021
Per crossplane-contrib#802 there seems to be some
lag between when some EC2 networking resources (RouteTables, InternetGateways)
are created and when they actually show up in queries. This commit leverages
crossplane/crossplane-runtime#280 to allow for this.

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-aws that referenced this issue Aug 16, 2021
negz removed this from In progress in v1.4 on Aug 26, 2021
@negz
Member

negz commented Aug 31, 2021

https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html#eventual-consistency

Just dropping this link @chlunde found for posterity. It confirms the EC2 API is intended to be eventually consistent.

negz added a commit to negz/crossplane-runtime that referenced this issue Aug 31, 2021
This commit is intended to address two issues that we diagnosed while
investigating crossplane-contrib/provider-aws#802.

The first issue is that controller-runtime does not guarantee reads from cache
will return the freshest version of a resource. It's possible we could create an
external resource in one reconcile, then shortly after trigger another in which
it appears that the managed resource was never created because we didn't record
its external-name. This only affects the subset of managed resources with
non-deterministic external-names that are assigned during creation.

The second issue is that some external APIs are eventually consistent. A newly
created external resource may take some time before our ExternalClient's observe
call can confirm it exists. AWS EC2 is an example of one such API.

This commit attempts to address the first issue by making an Update to a managed
resource immediately before Create is called. This Update call will be rejected
by the API server if the managed resource we read from cache was not the latest
version.

It attempts to address the second issue by allowing managed resource controller
authors to configure an optional grace period that begins when an external
resource is successfully created. During this grace period we'll requeue and
keep waiting if Observe determines that the external resource doesn't exist,
rather than (re)creating it.

Signed-off-by: Nic Cope <negz@rk0n.org>
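
A minimal sketch of the first mitigation described in that commit message: write the (possibly stale) managed resource back before creating, so the API server's optimistic concurrency check catches stale cache reads. Simplified, and not the actual crossplane-runtime reconciler code:

```go
import (
	"context"

	"github.com/crossplane/crossplane-runtime/pkg/reconciler/managed"
	"github.com/crossplane/crossplane-runtime/pkg/resource"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createGuarded issues an Update before calling Create. If the managed
// resource read from the cache is not the latest version, the API server
// rejects the Update with a conflict and the reconcile is retried, instead of
// Create running again and producing a duplicate external resource.
func createGuarded(ctx context.Context, c client.Client, mg resource.Managed, ext managed.ExternalClient) error {
	if err := c.Update(ctx, mg); err != nil {
		return err // stale (or otherwise failed) read; requeue rather than create
	}
	_, err := ext.Create(ctx, mg)
	return err
}
```
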
negz added a commit to negz/crossplane-runtime that referenced this issue Aug 31, 2021
negz added a commit to negz/crossplane-runtime that referenced this issue Sep 1, 2021
github-actions bot pushed a commit to crossplane/crossplane-runtime that referenced this issue Sep 7, 2021
(cherry picked from commit a3a59c9)
github-actions bot pushed a commit to crossplane/crossplane-runtime that referenced this issue Sep 7, 2021
(cherry picked from commit a3a59c9)
negz added a commit to negz/crossplane-runtime that referenced this issue Sep 7, 2021
negz added a commit to negz/provider-aws that referenced this issue Sep 8, 2021
negz added a commit to negz/provider-aws that referenced this issue Sep 8, 2021
negz added a commit to negz/provider-aws that referenced this issue Sep 8, 2021
This commit fixes crossplane-contrib#802

Signed-off-by: Nic Cope <negz@rk0n.org>
negz added a commit to negz/provider-aws that referenced this issue Oct 12, 2021
negz added a commit to negz/provider-aws that referenced this issue Nov 11, 2021
tektondeploy pushed a commit to gtn3010/provider-aws that referenced this issue Mar 12, 2024
Request for REDSHIFT SERVERLESS resource