Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement nodegroup updates #1469

Merged

Conversation

baldorch
Copy link
Contributor

@baldorch baldorch commented Sep 9, 2022

Description of your changes

This PR implements almost full NodeGroupUpdateVersion behaviour. It allows for updating the releaseVersion and the launchTemplate .version properties of a NodeGroup. It also late initialises the launchTemplate.version property after creation, so that reconciliation can be performed without ambiguities.

Fixes #1073 by triggering an Update on releaseVersion change
Fixes #1455 by triggering an Update on launchTemplate.version change
Fixes #1474 by implementing NodegroupUpdateConfig and forcing if configured

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

make reviewable test is running as I write this. I will tick the box as soon as it finished without errors.

How has this code been tested

Testing LaunchTemplate and ReleaseVersion Updates

I created a NodeGroup for an existing EKS Cluster (1.22) where I configured

  • launchTemplate.ID
  • version 1.22
  • releaseVersion 1.22.9-20220802
    I made sure that the launchTemplate.version was added by the controller.

I created a new LaunchTemplate version and changed the NodeGroup to use

  • releaseVersion 1.22.12-20220811
  • launchTemplate.version 2
    The update was constructed and performed as expected.

I edited the NodeGroup to again use launchTemplate.version 1 and verified that the update was performed as expected.

I updated the EKS Cluster to version 1.23 and edited the NodeGroup to use:

  • version 1.23
  • releaseVersion 1.23.7-20220802
  • launchTemplate.version 2
    I verified that the update was performed as expected.

I then just changed the releaseVersion to 1.23.9-20220811 and again verified that the update was performed as expected.

Testing NodeGroupUpdateConfig

I created a NodeGroup without configuring an updateConfig and made sure that:

  • the NodeGroupUpdateConfig got late initialised with the default value false for updateConfig.force and 1 for updateConfig.maxUnavailable
  • The NodeGroupUpdateConfigStatus was populated correctly

Then I created a Deployment and an aggressive PDB (i.e. minAvailable: "100%"), created a new LaunchTemplateVersion, updated the NodeGroup to use the new Version and waited for the update to fail - which it did with PodEvictionFailure.

I then changed the LaunchTemplateVersion back and set updateConfig.force to true and verified that the update was successful.

I also made sure that I can change the value of updateConfig.maxUnavailable to 2 and that this change is reconciled to AWS and also visible in the status of the NodeGroup.

@baldorch
Copy link
Contributor Author

baldorch commented Sep 9, 2022

I would love to add a spec.forProvider.forceUpdates property - or something similiar - to the API to implement the last missing UpdateNodegroupVersion feature. Should I do this here or should I open an issue first?

@haarchri
Copy link
Member

haarchri commented Sep 9, 2022

yes please implement
applyImmediately ??

@baldorch
Copy link
Contributor Author

baldorch commented Sep 9, 2022

The AWS API calls it force, why not stick to their terminology?

EDIT: I'll get back to this on monday. I need to check what happens if an update fails due to blocking PDBS. Of course, there is the force option, but do we need to tell the controller to stop updating if the update failed? The last thing we want to do is keep a NodeGroup permanently updating. I think.

@haarchri
Copy link
Member

haarchri commented Sep 9, 2022

yes then use force - not checked the API

@adamjohnson01
Copy link
Contributor

Creating a pull request after reviewing an existing pull request that inplements the same functionality is pretty bad etiquette and sets a bad example for open source projects. I have closed my PR in favour of yours as it is better and more comprehensive but doing things in this manner is pretty discouraging to others who want to contribute, take time to do so and then are basically told their time means nothing.

@baldorch
Copy link
Contributor Author

@adamjohnson01 I am sorry that my PR made you feel that way and it was in no way meant to devalue your work. Thank you for teaching me about the etiquette. I appreciate that, since I am quite new to this process.

@adamjohnson01
Copy link
Contributor

@baldorch, no problem. Thank you for your acknowledgement and reply!

Robert Jabs added 3 commits September 14, 2022 15:02
When we are reconciling a NodeGroup and encounter an empty LaunchTemplate Version we can think of two reasons why that version is empty: It was removed or it has never been there. We cannot (easily) differentiate between the two causes. Since leaving it empty and doing nothing is unexpected behaviour that would need an explanation, especially when the version was removed from the spec, this commit will add code that makes sure that the LaunchTemplate Version is always being reconciled into the spec.
Like this, installation with an empty LaunchTemplate Version to select the default Version of that LaunchTemplate is possible and there are no ambiguities in state later on.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
When changing the releaseVersion or the version of the LaunchTemplate, the NodeGroup is no longer up to date. This Commit extends the `eks.IsNodeGroupUpToDate` method to recognize changes in these properties and adds tests for the new functionality.
Two test cases defined a releaseVersion within the `ekstypes.Nodegroup` object  but not the `manualv1alpha1.NodeGroupParameters` object. Since this difference is now important these test cases needed adjustment as well.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
When the Version, ReleaseVersion or LaunchTemplate version of a NodeGroup changes an `UpdateNodegroupVersion` request needs to be sent to AWS to trigger an update of the NodeGroup.
This commit adds a method that detects what changes have been made and constructs an `UpdateNodegroupVersionInput` object which can be used to request the inteded update.
If no changes have been made, this method will indicate that, so that it can be used to test if a VersionUpdate should be performed.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
@baldorch
Copy link
Contributor Author

After thinking about the force option and other unimplemented update configuration, I would like to move the implementation to a separate issue and pull request. This keeps things here lean and simple.

There is a NodegroupUpdateConfig AWS EKS API Object where we can specify maxUnavailable or maxUnavailablePercentage. Although force is no property of this Object, I think the "Force" Option should live there thematically.

@baldorch
Copy link
Contributor Author

The last thing we want to do is keep a NodeGroup permanently updating. I think.

After thinking a bit more about this, I came to the conclusion that keeping the node Updating is exactly what we want. That is the purpose of a reconciler: Trying again and again to move Spec to Status.

We still have no information about why or if an update failed, that is why I've created #1475.

Robert Jabs added 6 commits September 16, 2022 14:55
This commit will add the NodeGroupUpdateConfig type that allows configuration of how many nodes are unavailable during update and if an update should be forced (i.e. if PodEvictionFailures should be ignored) or not.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
if a NodeGroupUpdateConfig is specified it will now be passed on to AWS on creation

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
If a NodeGroup is created without an update config, AWS will create an update config for us. This commit will make sure that the update config is late initialized to our resource.
Furthermore, we do not want to force updates by default. This commit will make sure that if no Force property is set the default of "false" will be added to the resource.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
This commit adds functionality to detect changes in the NodeGroupUpdateConfig and to apply these changes to the external resource.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
This commit will apply the force flag to the UpdateNodegroupVersionInput if it is configured in the UpdateConfig. Like this, PodEvictionFailure during Updates can be mitigated.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
This commit will add the current state of the update config to the NodeGroup status.

Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
@baldorch
Copy link
Contributor Author

Since this was not merged an I based my work on this branch I've added the changes for #1474 and the implementation of force to this Branch/PR.

Copy link
Member

@haarchri haarchri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks we tested this function the last day in our infrastructure is working as expected - thanks for implementation ;)

@haarchri haarchri merged commit c658b5b into crossplane-contrib:master Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants