Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added finalizers addition and removal logic for nodes #466

Merged
merged 4 commits into from
Jun 22, 2021
Merged

Added finalizers addition and removal logic for nodes #466

merged 4 commits into from
Jun 22, 2021

Conversation

njtran
Copy link
Contributor

@njtran njtran commented Jun 21, 2021

Issue #, if available:

Description of changes:

  • Finalizers are added to nodes created by the allocation controller
  • We no longer have a draining/drained phase for termination labels.
  • Kubectl delete node works
  • Some testing and utils improvements

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

pkg/cloudprovider/types.go Outdated Show resolved Hide resolved
pkg/controllers/terminating/v1alpha1/controller.go Outdated Show resolved Hide resolved
pkg/controllers/terminating/v1alpha1/controller.go Outdated Show resolved Hide resolved
pkg/controllers/terminating/v1alpha1/terminate.go Outdated Show resolved Hide resolved
persisted := n.DeepCopy()
node.RemoveFinalizer(n, i)
if err := t.kubeClient.Patch(ctx, n, client.MergeFrom(persisted)); err != nil {
return fmt.Errorf("removing finalizer from node %s, %w", n.Name, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What actually happens in this case (later on)? Does it all get re-driven somehow? (Especially since the cloud instance probably is gone now, so (possibly?) some of the earlier calls will start failing (although, see other comments).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens right after the instance is terminated. If this patch fails, it will have ~1 hour to succeed to remove the finalizer, since that's when EC2 will then return a failure here. Does this answer your question?

pkg/controllers/terminating/v1alpha1/terminate.go Outdated Show resolved Hide resolved
pkg/utils/node/finalizer.go Outdated Show resolved Hide resolved
pkg/utils/node/finalizer.go Outdated Show resolved Hide resolved
@JacobGabrielson JacobGabrielson self-requested a review June 21, 2021 22:10
@@ -65,16 +66,27 @@ func NewController(kubeClient client.Client, coreV1Client corev1.CoreV1Interface

// Reconcile executes a reallocation control loop for the resource
func (c *Controller) Reconcile(ctx context.Context, object client.Object) (reconcile.Result, error) {
n := object.(*v1.Node)
if n.DeletionTimestamp == nil || !node.IsKarpenterProvisioned(n) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node.HasTerminationFinalizer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. We want to continue this reconcile loop until the finalizer is removed. Replacing the line with
if !functional.ContainsString(n.Finalizers, provisioning.KarpenterFinalizer) {

pkg/utils/node/predicates.go Outdated Show resolved Hide resolved
pkg/utils/ptr/ptr.go Outdated Show resolved Hide resolved
pkg/utils/node/finalizer.go Outdated Show resolved Hide resolved
pkg/utils/node/finalizer.go Outdated Show resolved Hide resolved
@@ -109,7 +125,7 @@ func ExpectCleanedUp(c client.Client) {
nodes := v1.NodeList{}
Expect(c.List(ctx, &nodes)).To(Succeed())
for _, node := range nodes.Items {
ExpectDeleted(c, &node)
ExpectDeletedNode(c, &node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExpectNodeDeleted sounds better to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you can't use ExpectDeleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our nodes now have the finalizers added with this code, so ExpectDeleted will only call delete, but not patch the finalizer out.

I was toying with ExpectNodeDeleted or ExpectDeletedNode and decided that the former sounds like we're not doing any deletion action, but just checking for existence. The latter sounds more like we're doing an action then checking to me.

pkg/test/expectations/expectations.go Outdated Show resolved Hide resolved
}
zap.S().Debugf("Deleted node %s", n.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on making this info?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we change the language from Terminated to Deleted?

Copy link
Contributor Author

@njtran njtran Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have Terminated instance above, but Deleted node right here. These two log lines assume that the user knows the difference between an instance and node, and are useful sometimes where the APIServer is failing.

In most cases where we terminate the instance, we should be able to remove the finalizer in quick succession. In that sense, I don't want to print two log lines back to back for the same node, but rather an additional Debug line for those who have the verbosity up.

pkg/controllers/terminating/v1alpha1/terminate.go Outdated Show resolved Hide resolved
}

// 2. Drain terminable nodes
drained, err := c.terminator.drainNode(ctx, n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we discussed this, did we decide we were going to hold off on draining until the next reconcile loop, otherwise new pods could schedule during draining.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our discussion was with the assumption that we would patch the spec at the higher level GenericController, which we recently made the realization that we only want to patch the Status here.

Since we are patching the node here, not the status, we want to cordon and patch this node before continuing on to drain.

@@ -31,6 +31,16 @@ func UnionStringMaps(maps ...map[string]string) map[string]string {
return result
}

func StringsWithout(vals []string, remove string) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this StringSliceWithout to match the next fxn name in the file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it's a little surprising this doesn't return a (modified) copy of the string slice rather than the original.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it so that it creates a new slice, so that it won't mess with the original slice.

@@ -30,7 +30,7 @@ var (
OperatingSystemLabelKey,
ProvisionerNameLabelKey,
ProvisionerNamespaceLabelKey,
ProvisionerPhaseLabel,
ProvisionerUnderutilizedKey,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: LabelKey

@njtran njtran merged commit e1d4251 into aws:main Jun 22, 2021
}
zap.S().Debugf("Deleted node %s", node.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed to debug?

@njtran njtran deleted the finalizers branch July 1, 2021 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants