Delete dependency handling PoC using Kyverno #4072
Replies: 2 comments
-
When applied to the "Crossplane managing EKS" use-case this would tackle the deletion ordering of child-cluster resources that were created by Crossplane, but not child-cluster resources that might have been created outside of Crossplane, correct? (for example: admin stands up EKS cluster claim, then app devs apply some
-
Correct - this is only targeted at resources that Crossplane knows about/manages, and at the logical deletion dependencies that the composition author knows about (Release depends on Cluster, etc.) but that Crossplane itself cannot derive. Reconciliation resolves the logical creation dependencies, because the Release will keep trying to install until the Cluster becomes ready, but there is no way to derive the deletion dependencies automatically: the creation process leaves no record that could be turned into a dependency graph usable for deletion. External resources that Crossplane doesn't know about are not in scope for this effort.
-
This is a description of a PoC that uses Kyverno to provide deletion dependency handling. It will block the deletion of a Crossplane Composite or Managed resource that is being used by another resource, as indicated by a label that contains the UID of the depended-on resource.
This information is for reference only - while we did use this PoC for several months in lab testing, the implementation has been changed to Python admission webhooks with a customized `kopf` framework, which is a much less resource-intensive solution.
Deletion Dependencies in Crossplane
Background
Crossplane has well-documented issues with leaving orphaned resources in the local cluster and in
the remote cloud provider when a composite resource combines cloud infrastructure and
applications. There are several issues that combine to cause these problems:
1. Kubernetes, and Crossplane, default to using Background deletion when resources are deleted, causing all Managed and Composite resources to be deleted simultaneously.
2. Resources may have implicit dependencies that are not represented in Crossplane but are resolved by eventual consistency. For example, a Release object may depend on a combination of Cluster and Nodegroup objects, and it will eventually be created once the Cluster is available. If those resources are deleted before the Release is finished deleting, the Release may be unable to finish the delete process. This causes the ProviderConfig and ProviderConfigUsage objects to be orphaned because they are still in use even though the underlying Cluster is gone.
3. Cloud provider resources such as Load Balancers, Security Groups, Elastic Network Interfaces and DNS records may be orphaned because they were allocated by a Helm chart or other resource that was not properly cleaned up, and the cloud resources are not directly visible to Crossplane.
The first issue was resolved in the 1.10 release of Crossplane with the addition of the compositeDeletePolicy
attribute on the Claim. When this attribute is set to Foreground it causes Crossplane to delete the Composite
resource using the Foreground propagation policy, which causes a "bottom up" deletion path. That is
a requirement for the next step in providing support for deletion dependencies, which is "usage tracking" or manual
specification of delete dependencies between resources.
NOTE: the Claim MUST have the `compositeDeletePolicy` set to `Foreground` for the deletion to work properly using the methods described below. If there is no Claim and the Composite is being deleted manually, the `kubectl delete` command must include `--cascade=foreground` to trigger Foreground Deletion. If the default `Background` policy is used none of these changes will have any effect.
Disclaimer
Everything described here should be considered as Proof of Concept only. This is NOT production grade and it
WILL cause problems in varied and unusual ways. There are better ways to do this but they require time
and development effort. Kyverno enables this functionality through policies but it can shut down your cluster when not
configured properly.
Overview
It is possible to control the deletion of Crossplane composite/managed resources
using a combination of a Custom Resource, Foreground Cascading Deletion, Kyverno policies,
and labels on resources in a Composition. The labels
specify dependencies between resources that Crossplane cannot identify, and a Kyverno policy uses these labels
to create instances of a Custom Resource to keep track of the dependencies. A second set of Kyverno policies is used to
monitor all DELETE API requests, look for existing dependencies on the object being deleted, and reject the API
call if there is a dependency. Kubernetes will continue to attempt to delete the resource until all
dependencies have been deleted and the policy allows the resource to be deleted.
Details
Custom Resource
A Custom Resource is used to track the dependencies between Crossplane resources. It can be as simple as:
Note that the resource has no spec attributes, as metadata annotations and labels are used to store the necessary
information.
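The original manifest is not reproduced here. As a rough sketch of what a definition for such a spec-less resource might look like, the snippet below builds a CustomResourceDefinition as a Python dictionary and prints it as JSON, which `kubectl` also accepts. The API group `example.org`, the version `v1alpha1` and the `Cluster` scope are assumptions made for illustration, not values taken from the PoC.

```python
import json

# Hypothetical API group; the group used in the original PoC is not known.
GROUP = "example.org"

# A CRD for a "Dependency" kind with no spec fields. All tracking data is
# expected to live in metadata (labels, annotations, ownerReferences).
dependency_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": f"dependencies.{GROUP}"},
    "spec": {
        "group": GROUP,
        "scope": "Cluster",  # assumption: cluster-scoped, like Crossplane managed resources
        "names": {
            "kind": "Dependency",
            "listKind": "DependencyList",
            "plural": "dependencies",
            "singular": "dependency",
        },
        "versions": [
            {
                "name": "v1alpha1",  # assumed version name
                "served": True,
                "storage": True,
                # Minimal structural schema: an object with no declared properties.
                "schema": {"openAPIV3Schema": {"type": "object"}},
            }
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps(dependency_crd, indent=2))
```

If saved as a script (the file name `dependency_crd.py` is hypothetical), the output could be piped into `kubectl apply -f -`.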
Kyverno Setup
Kyverno is being used to implement policies that control the creation of Dependencies and the deletion of
resources that have dependencies. In this scenario Kyverno is only handling the policies that are defined here, so the
deployment and its webhook configuration have been customized to be specific to this task.
NOTE that Kyverno 1.8.1 (helm chart 2.6.1) is the latest version of Kyverno that supports the policies defined below.
Changes were made in Kyverno 1.8.2 that prevent the policies from executing as desired.
Helm Chart Deployment
Kyverno should be deployed with the following settings:
These settings deploy 3 Kyverno pods and prevent Kyverno from maintaining (but not creating) the webhooks that send
admission requests to the Kyverno services. The default Kyverno webhook configurations are modified below.
ClusterRole
Define a Cluster Role that allows Kyverno to create Dependency objects:
Webhook Configuration
Kyverno will deploy a number of webhook configurations by default and two of them need to be updated to restrict
the number of admission requests that are sent to Kyverno.
Kyverno creates a `MutatingWebhookConfiguration` named `kyverno-resource-mutating-webhook-cfg` that is modified to look like this:
The major changes are the addition of the `objectSelector.matchExpressions` and the removal of all `operations` except `CREATE`. The `timeoutSeconds` has also been changed to 30 but that is not required. This configuration sends API CREATE (POST) requests to Kyverno only IF they have a `crossplane.io/composite` label AND a `dependsOnUid` label. That will significantly reduce the number of requests sent to Kyverno.
Kyverno also creates a `ValidatingWebhookConfiguration` object named `kyverno-resource-validating-webhook-cfg` which is modified to look like this:
The major changes are the addition of the `objectSelector.matchExpressions` and the removal of all `operations` except `DELETE`. The `timeoutSeconds` has also been changed to 30 but that is not required. This configuration will only send DELETE requests to Kyverno IF the resource has a `crossplane.io/composite` label. This prevents the configuration from affecting any Kubernetes resource that is NOT a Composite or Managed resource derived from a Composite/Composition.
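To make the routing rules concrete, here is a small Python model of how the API server would evaluate the two `objectSelector.matchExpressions` lists described above. It is an illustration of label-selector semantics only, not the webhook configuration itself; the use of the `Exists` operator is an assumption inferred from the phrase "have a label", and the function names are hypothetical.

```python
def matches_selector(labels, match_expressions):
    """Return True when every expression holds (expressions are ANDed)."""
    for expr in match_expressions:
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        elif op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# Assumed selectors, reconstructed from the prose description above.
MUTATING_SELECTOR = [
    {"key": "crossplane.io/composite", "operator": "Exists"},
    {"key": "dependsOnUid", "operator": "Exists"},
]
VALIDATING_SELECTOR = [
    {"key": "crossplane.io/composite", "operator": "Exists"},
]


def routed_to_kyverno(operation, labels):
    """Decide whether an admission request would be sent to Kyverno."""
    if operation == "CREATE":
        return matches_selector(labels, MUTATING_SELECTOR)
    if operation == "DELETE":
        return matches_selector(labels, VALIDATING_SELECTOR)
    return False  # all other operations were removed from the webhooks
```

With this model, a CREATE for a resource carrying only the `crossplane.io/composite` label is not routed to Kyverno, while a DELETE for the same resource is.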
Policies
Now that Kyverno is running and specific API requests are being routed to it, it needs policies to execute.
The first policy executes on resource creation and will create a `Dependency` object when the new resource has the `dependsOnUid` label.
This policy looks for incoming CREATE requests for resources that have the `crossplane.io/composite` and `dependsOnUid` labels, and if those match/preconditions succeed, it creates (generates) a `Dependency` object that keeps track of the dependency between the resource that is being created and the resource identified by the UID in the label.
The `Dependency` object is configured to have the creating resource as its ownerReference so that it will be
deleted automatically when the parent resource is deleted.
The second policy executes on resource deletion and has two components.
The first component will look for any existing `Dependency` objects that are referring to the resource that is being deleted. If any `Dependency` objects are found for that resource, the DELETE request is rejected (409 Conflict).
The second component looks for DELETE requests on `Dependency` objects and determines whether the owner of the `Dependency` still exists. If the owner still exists the DELETE request is rejected. This prevents the `Dependency` object from being deleted by Foreground Cascading Deletion before the depending object has been deleted.
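The decision logic of the two policies can be modelled in a few lines of Python. This is a simplified in-memory sketch, not the Kyverno policies themselves: the `Resource` class, the `dep-` UID prefix, and storing the depended-on UID in a `dependsOnUid` label on the `Dependency` are all assumptions made for illustration, and an allowed delete is treated as immediate removal (real deletions go through finalizers).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

COMPOSITE_LABEL = "crossplane.io/composite"
DEPENDS_ON_LABEL = "dependsOnUid"


@dataclass
class Resource:
    uid: str
    kind: str
    labels: Dict[str, str] = field(default_factory=dict)
    owner_uid: Optional[str] = None  # stands in for metadata.ownerReferences


def admit_create(store: Dict[str, Resource], res: Resource) -> None:
    """First policy: generate a Dependency when both labels are present."""
    store[res.uid] = res
    if COMPOSITE_LABEL in res.labels and DEPENDS_ON_LABEL in res.labels:
        dep = Resource(
            uid=f"dep-{res.uid}",  # hypothetical naming scheme
            kind="Dependency",
            labels={DEPENDS_ON_LABEL: res.labels[DEPENDS_ON_LABEL]},
            owner_uid=res.uid,  # owned by the resource that declared the dependency
        )
        store[dep.uid] = dep


def admit_delete(store: Dict[str, Resource], uid: str) -> Tuple[bool, str]:
    """Second policy: reject deletes that would break a recorded dependency."""
    res = store[uid]
    if res.kind == "Dependency":
        # Component 2: a Dependency may only go once its owner is gone.
        if res.owner_uid in store:
            return False, "owner still exists"
    else:
        # Component 1: block the delete while any Dependency points at this UID.
        blockers = [
            r.uid
            for r in store.values()
            if r.kind == "Dependency" and r.labels.get(DEPENDS_ON_LABEL) == uid
        ]
        if blockers:
            return False, f"409 Conflict: depended on by {blockers}"
    del store[uid]
    return True, "deleted"
```

In this model, a depended-on resource can only be removed after the depending resource and then its `Dependency` have been removed, which is the ordering the policies are meant to enforce.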
Putting It All Together
The last step is to add the `dependsOnUid` label to the resources in the Composition that need to be deleted in a specific order.
For example, in a Composition that deploys a Composite resource to create an EKS cluster, and then deploys a provider-helm `Release` managed resource into that cluster, it may be necessary to prevent the Cluster composite object from being deleted until the `Release` is deleted. That dependency can be specified in the Composition.
First save the UID of the resource that is going to be depended-on:
Now patch that UID value into the `dependsOnUid` label for the `Release` object:
When the Composition is executed by Crossplane, it will wait until the UID of the `XCluster` resource is available, and then will create the `Release`. Kyverno will see the CREATE request and the `dependsOnUid` label and will create a `Dependency` object with the associated information. `kubectl lineage` will show the relationship between the `Release` and the `Dependency` objects.
NOTE that the Claim MUST have the `compositeDeletePolicy` set to `Foreground` for the deletion to work properly. If the default `Background` policy is used the `Dependency` resources will have no effect.
When the `Claim` is deleted, Crossplane will see the `compositeDeletePolicy` is set to `Foreground` and will request deletion of the associated Composite resource using `Foreground` propagation. Deletion of the Composite will trigger deletion of all of the resources owned by the Composite, in this case the `XCluster` composite and the `Release` resource.
Deletion of the `XCluster` will be rejected by Kyverno because the `Dependency` object exists. Kubernetes will retry the deletion until it is allowed to succeed by the policy. At the same time the deletion of the `Release` object proceeds normally as it has no `Dependencies`. Kubernetes will request deletion of the `Dependency` object that is owned by the `Release` but Kyverno will block it until the `Release` itself has been deleted.
Once the `Release` and the `Dependency` objects have been deleted, the next attempt by Kubernetes to delete the `XCluster` object will succeed, and the rest of the deletion process will continue.
Note that the deletion retries by Kubernetes use an exponential backoff with a maximum of 1000 seconds, so there may be (long) pauses observed between the deletion of the `Dependency` object and the deletion of the resource that was depended-on.
`kubectl lineage` is an excellent way to track the deletion process and to observe the impact of the `Dependency` resources on the process.
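To get a feel for how long the pauses caused by the retry backoff can become, the following Python sketch computes a capped exponential schedule. Only the 1000 second maximum comes from the description above; the 5 ms starting delay and the doubling factor are assumptions based on the default per-item rate limiter in client-go, and the function name is hypothetical.

```python
def retry_delays(attempts, base_seconds=0.005, max_seconds=1000.0):
    """Delay before each retry: base * 2**n, capped at max_seconds.

    base_seconds is an assumed starting delay; max_seconds is the
    1000 second ceiling mentioned in the text.
    """
    return [min(base_seconds * (2 ** n), max_seconds) for n in range(attempts)]


if __name__ == "__main__":
    delays = retry_delays(22)
    for n, delay in enumerate(delays):
        print(f"retry {n:2d}: wait {delay:10.3f}s")
    print(f"total time spent waiting: {sum(delays):.1f}s")
```

Under these assumptions the delay stays below one second for the first several rejections and only reaches the ceiling after many consecutive rejections, so the long pauses mainly affect resources whose dependents take a long time to disappear.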