Leaked RouteTable due to eventually consistent DescribeRouteTables API #802
So the data that I provided when I opened this defect was collected from logs from a CI/CD pipeline. A short while ago I performed a manual test where I had a better ability to collect logs. I deleted a claim, and all of the API objects except the composition were deleted from k8s. The last conditions for the composition are:

However, at least at the moment, if I try to delete the VPC in the AWS console, the dialog indicates that it will be deleted and that these resources will also be deleted:

That said, CloudTrail shows that Crossplane is still trying to delete the VPC, and the delete attempts are failing with Client.DependencyViolation. Those two dependent resources above are a couple of un-named route tables. Not sure why they are there. While I am not familiar with the AWS API, at the moment I am thinking that there is a Crossplane bug where either:
I examined a teammate's deployment and I see there are 3 named route tables, clearly from the composition, and one nameless one.
The application of the claim happens within a k8s job. I just thought to check the logs of this job for both my teammate's deployment and mine. I see that in my teammate's deployment the job ran once; however, in my deployment it ran twice. The first run failed while the second succeeded. The log for my deployment's first run ends with:

The log for the second run ends with:
That failure of the first run is something we have been seeing since we started using Crossplane, but we have been ignoring it since k8s always re-runs the job until there is success, and it usually takes at most 3 tries to achieve this. I am wondering now if the fact that my deployment has two unexpected (at least to me) nameless route tables associated with the VPC, while my teammate's deployment has just one, is somehow related to the circumstances around the job running multiple times. Possibly the error from the API server also causes Crossplane to do something that creates the extra nameless route table, or applying the claim multiple times causes it, even though kubectl reports vpc.aws.platform.dmi.sap.com/vpc-mc-i540621-dmi unchanged on the second run.
Hmm, just looked at CloudTrail events for vpc-14449-d66-dm (the CI/CD case) and vpc-0623d044576ddef33 (my manual case), and I think either there are two different problems or the nameless route tables are irrelevant. In the case of vpc-14449-d66-dm only 3 route tables (the expected number given the XRD) were created, while in my manual case those nameless route tables are hanging around.
@dee0 are the "nameless" route tables being created by another process?
@hasheddan The nameless route tables are not being created by another process; they are being created by Crossplane.
Wonder if this could be related at all to either:
AWS provider log for my manual test (the one involving vpc-mc-i540621-dmi):
Btw the problem of leaked resources, whatever the cause(s), happens often but not always. A cluster I was comparing against yesterday was successfully cleaned up this morning.
It would be interesting to understand more about the unexpected / "nameless" RouteTables before any attempt is made to delete resources and clean the cluster up. I want to see if we can find early on, before deletions come into play, whether there are any leaked or untracked resources. For any unexpected RouteTable, can you confirm:
@jbw976 I'll try to get the answers as soon as I can.
Spent the last while reviewing the VPCs in AWS: 7 orphaned and the rest in use. None of the VPCs, orphaned or otherwise, have nameless route tables associated with them. So I am still of the opinion that we have at least two ways that VPCs are being orphaned: one where we run out of things to wait on and yet the VPC hasn't been removed from AWS, and one where nameless route tables prevent deletion.
A word on the nameless route tables. It looks like there should be at least one associated with each VPC. Not sure how we ended up with two nameless route tables though. I just performed a deploy where there were no problems, and when I looked at the details of the single nameless route table associated with my VPC there was something that said it was the 'main' route table. If/when I see the case of multiple nameless route tables I'll look at them both more carefully to try and understand what the diff between them is. That said, I would like to re-iterate that I think there are two different cases we are dealing with:
I would also like to ask again: why does Crossplane make so many failed attempts to delete the VPC even in the success case?
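For reference, the AWS-created 'main' route table can be identified by the Main flag on its associations. A minimal sketch using aws-sdk-go v1 — the VPC id below is just a placeholder:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := ec2.New(sess)

	// Placeholder - substitute the VPC under investigation.
	vpcID := "vpc-0623d044576ddef33"

	out, err := svc.DescribeRouteTables(&ec2.DescribeRouteTablesInput{
		Filters: []*ec2.Filter{{
			Name:   aws.String("vpc-id"),
			Values: []*string{aws.String(vpcID)},
		}},
	})
	if err != nil {
		panic(err)
	}

	// The implicit "main" route table has an association with Main=true;
	// any other nameless table is a candidate leak.
	for _, rt := range out.RouteTables {
		main := false
		for _, assoc := range rt.Associations {
			if aws.BoolValue(assoc.Main) {
				main = true
			}
		}
		fmt.Printf("%s main=%v\n", aws.StringValue(rt.RouteTableId), main)
	}
}
```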
We have made some changes so that we can capture debug logs from Crossplane. The spreadsheet in the attached zip contains log data from:

Each of the above is in its own worksheet, and there is also a worksheet that combines the logs from all 3 sources. The clocks are a little out of sync, so the messages in the combined tab aren't 100% in the correct order; e.g. the clocks for the cluster and Jenkins are about 1/10 second out of sync. In the combined worksheet the first deletion message is at line 24, 2021-07-22 02:33:27.457 cluster time. Here's where you can see the clocks are out of sync, because it precedes the delete from the Jenkins log. By line 554, 2021-07-22 02:35:29.990 jenkins time, our attempts to perform a controlled cleanup have completed.

As I mentioned near the beginning of this, it seems to me the problem is down to our clean up code not being able to accurately see what is happening with the VPC. Looking at the 'combined' worksheet:

At line 33, 2021-07-22 02:33:27.522 jenkins time, we delete the claim vpc-mc-14814-2e1-dmi.
At line 34, 2021-07-22 02:33:27.522 jenkins time, we try to wait for the managed resource VPC vpc-mc-14814-2e1-dmi-ftm9p-9k8gb.
At line 36, 2021-07-22 02:33:28.000 jenkins time, you can see this failed with a 'resource not found' error.

However, we can see from log messages a couple of minutes later that the VPC still exists in AWS and Crossplane is still trying to delete it, e.g. line 504, 2021-07-22 02:35:24.000 AWS time. So I think our problems begin with kubernetes/crossplane returning that 'not found' error when we tried to wait on vpc-mc-14814-2e1-dmi-ftm9p-9k8gb. @hasheddan I think this is the bug. Some other things of note in the combined tab:
One last thing. By line 554, 2021-07-22 02:35:29.990 jenkins time, our attempts to clean things up nicely have ended and clean up is turned over to Gardener. As mentioned before, Gardener tries to delete everything it can find via the cluster API server. So after this point you will see chaos.
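Fwiw, a deletion wait that refuses to treat "never observed" as "already deleted" would at least surface this ambiguity instead of sailing past it. A rough sketch against controller-runtime's client — the waitGone helper and its behavior are hypothetical, not our actual cleanup code:

```go
package sketch

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitGone polls until the object disappears. Unlike a bare "not found
// means done" check, it fails loudly if the object was never seen at all,
// which is exactly the ambiguous case described above.
func waitGone(ctx context.Context, c client.Client, key client.ObjectKey, obj client.Object, timeout time.Duration) error {
	seen := false
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		err := c.Get(ctx, key, obj)
		if apierrors.IsNotFound(err) {
			if !seen {
				return false, fmt.Errorf("%s was never observed; cannot confirm deletion", key)
			}
			return true, nil // we saw it exist, and now it is gone
		}
		if err != nil {
			return false, err
		}
		seen = true // object still exists; keep polling
		return false, nil
	})
}
```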
Just checked another case of a leaked resource and I see a pattern:
Btw for each of these cases we are only creating one VPC. So it isn't the case that the managed resource Get is returning some VPC other than the one we deleted.
@hasheddan @jbw976 Have a couple more pieces of info:
So as I said a while ago, it seems like we are running into multiple problems. While my team can address the name collision problem easily enough, I am not so sure about the extraneous route table.
2021-07-29-extra-route-table.zip — in the attached zip are:
The extra route table was created at about July 29, 2021, 09:58:23 (UTC-07:00).
grep -Pi 'route.*created' kibana.csv
2nd-and-3rd-routetable-messages.zip — in the attached zip are log messages that I have extracted from kibana.csv (see above) and normalized for comparison:
I normalized the contents of these files by removing the timestamps, replacing the last portion of the route table name with 'tableid', and replacing the values for resourceVersion with v1, v2, .... Comparing the files I see:
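Roughly the same normalization can be scripted. A sketch in Go — the regular expressions are guesses at the log format, not the exact patterns used:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

var (
	// Guesses at the log format, for illustration only.
	timestamp = regexp.MustCompile(`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z`)
	tableName = regexp.MustCompile(`vpc-mc-[0-9a-z-]+-rt-[0-9a-z]+`)
	rv        = regexp.MustCompile(`"resourceVersion":\s*"\d+"`)
)

func main() {
	n := 0
	seen := map[string]string{}
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		line := timestamp.ReplaceAllString(sc.Text(), "")
		line = tableName.ReplaceAllString(line, "tableid")
		// Replace each distinct resourceVersion with v1, v2, ... in
		// order of first appearance, so two files can be diffed.
		line = rv.ReplaceAllStringFunc(line, func(m string) string {
			if _, ok := seen[m]; !ok {
				n++
				seen[m] = fmt.Sprintf(`"resourceVersion": "v%d"`, n)
			}
			return seen[m]
		})
		fmt.Println(line)
	}
}
```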
I'm thinking the spurious RouteTable creation could be a race in the relevant controller. RouteTables are one of the resources where we rely on the external-name annotation, recorded at creation time, to know whether the external resource already exists. The flow is:
You could imagine a race where 1. is called twice before 2. succeeds. I would imagine in this case we would:
That said, I'm pretty sure we're running with only one reconciler worker per controller.
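To make the hypothesized interleaving concrete, here is a toy simulation in which the reconciler reads from a cache that lags its own writes. All names are made up; note that a single worker suffices if two reconciles fire before the watch event lands:

```go
package main

import "fmt"

// A toy model of the suspected race: the reconciler decides whether to
// create the external route table based on a cached copy of the managed
// resource, and the cache only catches up when a watch event arrives.
type managedResource struct {
	externalName    string
	resourceVersion int
}

func main() {
	apiServer := managedResource{resourceVersion: 1} // external-name not yet recorded
	cache := apiServer                               // cache starts in sync

	reconcile := func(cached managedResource) {
		if cached.externalName == "" {
			// Observe sees no external-name, concludes the route
			// table does not exist, and creates one.
			fmt.Printf("read rv=%d: creating a route table\n", cached.resourceVersion)
			apiServer.externalName = "rtb-0123456789abcdef0"
			apiServer.resourceVersion++
			// NOTE: the cache is deliberately NOT updated here.
		}
	}

	reconcile(cache) // 1st reconcile: creates the table, records external-name
	reconcile(cache) // 2nd reconcile before the watch event lands: it still
	                 // sees rv=1 with no external-name -> a second table
}
```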
Fwiw we haven't done anything to increase the concurrency. I have been tracing through the code to see where an extra event might come from, or where duplicates might be allowed through. I'm up to the point where the rate limiting queue receives events. Haven't seen anything yet. Btw my background is mostly with C++ and Java, and with those languages I am used to having a thread id in log messages. That takes away a lot of the mystery when diagnosing concurrency problems. Is there a way we can enable that for goroutines? Just thinking of how we might confirm your theory.
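Go deliberately doesn't expose goroutine ids, but for ad-hoc debugging one can be parsed out of runtime.Stack. A sketch of that hack (not something to ship):

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
	"sync"
)

// goid parses the current goroutine's id from the first line of
// runtime.Stack output, which looks like "goroutine 18 [running]:".
// Debugging hack only - the runtime offers no supported API for this.
func goid() int {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	buf = bytes.TrimPrefix(buf, []byte("goroutine "))
	buf = buf[:bytes.IndexByte(buf, ' ')]
	id, _ := strconv.Atoi(string(buf))
	return id
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			fmt.Printf("worker %d running on goroutine %d\n", n, goid())
		}(i)
	}
	wg.Wait()
}
```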
@negz If you haven't already, would you please use a visual diff tool (e.g. winmerge on Windows) to compare vpc-mc-i540621-dmi-ds67r-k6xxz-events.txt and vpc-mc-i540621-dmi-ds67r-pxd9x-events.txt? (See my post a bit earlier today.) Am hoping you will recognize something I do not, something that confirms your theory.
@dee0 The fact that
Checked the log and confirmed the worker count for the routetable controller is 1:

2021-07-29T16:58:02.288Z INFO controller-runtime.manager.controller.managed/routetable.ec2.aws.crossplane.io Starting workers {"reconciler group": "ec2.aws.crossplane.io", "reconciler kind": "RouteTable", "worker count": 1}
@hasheddan @negz I think I might have a clue about how the scenario Nic described is happening. In the attached zip is a file where I have taken the log messages related to the problematic managed resource vpc-mc-i540621-dmi-ds67r-k6xxz and:
Note: the log messages collected from Kibana aren't exactly in the right order, because messages are sent as datagrams to Kibana. I was able to fix the order by sorting the messages on the timestamp that was part of the original log message. Anyhoo, with the order corrected I see that when the first route table is created for vpc-mc-i540621-dmi-ds67r-k6xxz, Reconcile first reports a resource version of 34683 and then 35015. So it seems that somehow, in the later call to Reconcile, it fetched an out of date version of the RouteTable vpc-mc-i540621-dmi-ds67r-k6xxz. Please review the log messages from:
Been reading through the code, and at the moment I think the problem is with the caching. The provider-aws messages use data from a managed resource object filled by a call to client.Get, which uses the cache. I'll try and sort out when the cache actually does get updated, to see why this problem doesn't happen all the time for us.
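If the cache is indeed the culprit, controller-runtime does offer an escape hatch: the manager's GetAPIReader() returns a client.Reader that always reads straight from the API server. A sketch — the Reconciler shape and the ConfigMap stand-in are illustrative, not provider-aws's actual types:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Reconciler is a hypothetical shape: client does cached reads and direct
// writes, while apiReader (wired from mgr.GetAPIReader()) bypasses the cache.
type Reconciler struct {
	client    client.Client
	apiReader client.Reader
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	obj := &corev1.ConfigMap{} // stand-in for the managed resource type
	// Reading via the API reader costs an extra round trip per reconcile,
	// but can never return a copy older than our own last write.
	if err := r.apiReader.Get(ctx, req.NamespacedName, obj); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// ... observe / create / update as usual ...
	return ctrl.Result{}, nil
}
```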
Think I see where the concurrency problem comes from. There are 5 categories of goroutines involved in processing resource events:

(2) sharedIndexInformer::Run
(3) processorListener::pop
(4) processorListener::run
(5) Controller's workers

So here is what is happening:
I think ideally controller-runtime would utilize a write-through cache. At the moment I suspect that would be major surgery. Not sure, but perhaps the Reconciler could detect this situation and requeue the request.
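A minimal sketch of that detect-and-requeue idea, again with a ConfigMap standing in for the managed resource type: write the cached copy back unmodified before doing anything irreversible, and treat a 409 Conflict as "the cache was stale, try again":

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type Reconciler struct{ client client.Client }

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	obj := &corev1.ConfigMap{} // stand-in for the managed resource type
	if err := r.client.Get(ctx, req.NamespacedName, obj); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// A no-op update of the cached copy. If its resourceVersion is stale
	// the API server rejects the write with a conflict, which tells us to
	// requeue rather than act on old data (e.g. a missing external-name).
	if err := r.client.Update(ctx, obj); err != nil {
		if apierrors.IsConflict(err) {
			return ctrl.Result{Requeue: true}, nil
		}
		return ctrl.Result{}, err
	}

	// The cache was current - safe to proceed with observe/create/update.
	return ctrl.Result{}, nil
}
```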
@hasheddan @negz Think this may be what happened with the reconciles:
Read through the code yesterday, and maybe adding a write-through cache isn't as big of a job as I thought. Still pretty big, but not as big. :)
negz asked me to write up some of the options that have been floated for fixing this problem:
Another bandaid option: instead of having ResolveReferences update the managed object in Kubernetes, have it return all of the external ids of the references if they are available. If they are not, then requeue as is done today. Then just set them on the managed object when setting the external-name.
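The shape of that bandaid might look something like the sketch below, where resolution returns ids instead of mutating the object in Kubernetes; every name here is hypothetical:

```go
package sketch

// resolvedRefs carries the external ids gathered during reference
// resolution; the fields are illustrative only.
type resolvedRefs struct {
	vpcID             string
	internetGatewayID string
}

// resolveReferences is a hypothetical alternative to the current
// ResolveReferences: rather than updating the managed object in Kubernetes
// (a second writer racing with the reconciler), it just returns the ids.
// ok=false means "requeue, as is done today".
func resolveReferences(vpcExternalName, igwExternalName string) (refs resolvedRefs, ok bool) {
	if vpcExternalName == "" || igwExternalName == "" {
		return resolvedRefs{}, false
	}
	return resolvedRefs{vpcID: vpcExternalName, internetGatewayID: igwExternalName}, true
}

// The reconciler would then set the resolved ids on the managed object in
// the same Update that records the external-name - one write instead of two.
```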
@negz I am sure on our side we are hitting the issue that crossplane/crossplane-runtime#279 fixes, so I wouldn't say it is hypothetical. From the file vpc-mc-i540621-dmi-ds67r-k6xxz-events-with-times-order-fixed.csv, which I supplied some time ago, we have the sequence below. Note the resource versions and external-name values. When that second create is happening, external-name isn't set. And this makes sense: when external-name was set originally, that produced version 35015, but when the second create is happening, Reconcile is working with 35012.
@dee0sap Understood - thanks for pointing that out. I haven't been able to reproduce that particular variant yet, but it does make sense that it's possible. Adding a note here that the same eventually consistent AWS API issue appears to be affecting InternetGateways too - they just happen not to block deletion of the VPC. I noticed a bunch were building up during my testing, and confirmed that the Terraform folks have observed this too, per https://github.com/hashicorp/terraform-provider-aws/blob/915f5f5dc3073de0f25ed216dca421bf76e114b9/aws/resource_aws_internet_gateway.go#L365.
Per crossplane-contrib/provider-aws#802 some external APIs (including some AWS APIs) appear to experience some delay between the time a new resource is successfully created and the time at which that resource appears in queries. This commit adds a new 'crossplane.io/external-create-time' annotation and a new 'ResourcePending' field in the Observation struct. Together these can be used by an Observe method to allow for a small grace period before it determines that a resource does not exist. For example:

```go
func Observe(ctx context.Context, mg resource.Managed) (managed.ExternalObservation, error) {
	err := api.Get(AsInput(mg))
	if IsNotFound(err) {
		if t := meta.GetExternalCreateTime(); t != nil && time.Since(t.Time) < 1*time.Minute {
			// We're in the grace period - wait a bit longer for the resource to appear.
			return managed.ExternalObservation{ResourcePending: true}, nil
		}
		// The resource does not exist.
		return managed.ExternalObservation{ResourceExists: false}, nil
	}
	if err != nil {
		return managed.ExternalObservation{}, err
	}
	return managed.ExternalObservation{ResourceExists: true}, nil
}

func Create(ctx context.Context, mg resource.Managed) (managed.ExternalCreation, error) {
	_ = api.Create(AsInput(mg))
	meta.SetExternalCreateTime()
	return managed.ExternalCreation{ExternalCreateTimeSet: true}, nil
}
```

Signed-off-by: Nic Cope <negz@rk0n.org>
Per crossplane-contrib#802 there seems to be some lag between when some EC2 networking resources (RouteTables, InternetGateways) are created and when they actually show up in queries. This commit leverages crossplane/crossplane-runtime#280 to allow for this. Signed-off-by: Nic Cope <negz@rk0n.org>
Just dropping this link @chlunde found, for posterity. It confirms the EC2 API is intended to be eventually consistent.
This commit is intended to address two issues that we diagnosed while investigating crossplane-contrib/provider-aws#802. The first issue is that controller-runtime does not guarantee reads from cache will return the freshest version of a resource. It's possible we could create an external resource in one reconcile, then shortly after trigger another in which it appears that the managed resource was never created because we didn't record its external-name. This only affects the subset of managed resources with non-deterministic external-names that are assigned during creation. The second issue is that some external APIs are eventually consistent. A newly created external resource may take some time before our ExternalClient's observe call can confirm it exists. AWS EC2 is an example of one such API. This commit attempts to address the first issue by making an Update to a managed resource immediately before Create is called. This Update call will be rejected by the API server if the managed resource we read from cache was not the latest version. It attempts to address the second issue by allowing managed resource controller authors to configure an optional grace period that begins when an external resource is successfully created. During this grace period we'll requeue and keep waiting if Observe determines that the external resource doesn't exist, rather than (re)creating it. Signed-off-by: Nic Cope <negz@rk0n.org>
This commit fixes crossplane-contrib#802 See https://github.com/crossplane/crossplane-runtime/releases/tag/v0.13.1 Signed-off-by: Nic Cope <negz@rk0n.org>
What happened?
After deleting a composite and waiting, with kubectl wait, for all of the referenced api objects to be deleted, managed resources still existed in AWS.
This is a problem for us because this wait logic is a gate used to delay deletion of our k8s cluster. If the managed resources in AWS aren't cleaned up by the time we are done waiting, they will most likely end up orphaned. (We are seeing this often.)
How can we reproduce it?
Here is the clean up logic we are performing before allowing the cluster to be deleted:
We expect that once we have run through the above steps, all managed resources have been deleted from AWS. However, this doesn't seem to be the case.
In the attached zip are the example files used to create the resources in AWS. Also in the zip is events-for-vpc-14449-d66-dmi.xlsx, which shows the events around one of the leaked VPCs.
At line 81 you can see the first event from my team's clean up code.
By line 31, all of the resources related to the VPC have been cleaned up from k8s. There is nothing left for our clean up code to delete or wait for.
Our last attempt to wait for a resource to be cleaned up is about 9 seconds after the last event for the VPC in CloudTrail. Given all of the deletion attempts that failed with Client.DependencyViolation, I am wondering if Crossplane has given up. (And why are all those failures there in the first place?)
Anyhoo, once our cleanup code has nothing left to wait for, Gardener is free to begin the cluster deletion. You can see at line 28, about 45 seconds after my team's clean up code finished, where it begins taking destructive action. It starts by simply going through the API server and deleting all CRDs that it can find. Since Gardener is just blindly deleting things, if the AWS resources haven't been cleaned up at this point it is highly likely they are going to be orphaned.
14449-d66-dmi-leak.zip
What environment did it happen in?
crossplane: v1.2.2
AWS provider: provider-aws:v0.18.1
kubernetes: 1.19.10