Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

feat: add force parameter to the upgrade command #525

Merged

Conversation

serbrech
Copy link
Member

@serbrech serbrech commented Feb 15, 2019

Is this a request for help?:
No


Is this an ISSUE or FEATURE REQUEST? (choose one):
Feature Proposal


description:
We want to be able to force the upgrade process on a cluster without changing the kubernetes version. This will ensure that all components of the cluster are healthy and using the latest validated version for a given kubernetes version.

proposed cli:
For an existing kubernetes 1.12.5 cluster, we want the following command to go through each nodes of the cluster and "refresh" them if necessary (update to latest valid version).

> aks-engine upgrade --force --location westus --resource-group rg --subsctiption $sub --upgrade-version 1.12.5

Open questions:

  • What should the behavior of the --force flag be for a usual next k8s version upgrade?
    Same behavior

  • What happens for same-version upgrade?
    A: Currently, when a node is already on the requested version, the upgrade either fail validation, or the node upgrade ends up being a no-op. --force runs the upgrade process on these nodes as well

  • Should the --force flag allow other arbitrary upgrade (not valid per our get-versions matrix)?
    A : yes, using --force will allow any arbitrary version jump, or even downgrade. You're on your own.

Related
implements #522
Azure/acs-engine#3810

@serbrech serbrech self-assigned this Feb 15, 2019
@serbrech serbrech changed the title feat: add force parameter to the upgrade command [WIP] feat: add force parameter to the upgrade command Feb 15, 2019
@codecov
Copy link

codecov bot commented Feb 15, 2019

Codecov Report

Merging #525 into master will increase coverage by 0.58%.
The diff coverage is 68%.

@@            Coverage Diff             @@
##           master     #525      +/-   ##
==========================================
+ Coverage   56.69%   57.28%   +0.58%     
==========================================
  Files          91       91              
  Lines       13905    13938      +33     
==========================================
+ Hits         7884     7984     +100     
+ Misses       5355     5284      -71     
- Partials      666      670       +4

@jackfrancis jackfrancis added this to In progress in backlog Feb 19, 2019
@acs-bot acs-bot added size/L and removed size/XS labels Feb 19, 2019
@jackfrancis
Copy link
Member

This is going in the right direction. The business logic implementation should be focussed on doing actual operations, and should be decomposed in such a way so as to be general and composable. Validation should happen at a higher layer, and should inform particular implementation composition patterns. (As opposed to the validation being statically coupled to the business logic itself.)

cmd/upgrade.go Outdated Show resolved Hide resolved
@serbrech serbrech force-pushed the feature/upgrade-same-version branch 2 times, most recently from 5ce5995 to 9ea4d0f Compare February 25, 2019 19:35
@CecileRobertMichon
Copy link
Contributor

Did an online review with Stephane. To summarize:

  1. We need to make sure that the "retry failed upgrade" functionality still works without the current version in the list of available upgrades in cmd validation.

  2. Separating the PR in smaller chunks to permit easier review and faster revert if necessary might be a good idea but will leave it up to Stephane to decide if the trade off is worth the additional work to separate.

  3. Removing the upgradable validation in upgradeCluster reduces the strength of validation:

  • the cmd validation only checks if the upgrade is valid based on the version string in the apimodel which does not necessarily reflect the actual state of things and can be easily modified by the user
  • if an upgrade fails, we don't have in place validation to check that the user retries upgrade to the same version which means that the following could happen:
    User starts upgrade of 1.9.5 cluster to 1.10.7. Upgrade fails leaving some nodes in 1.10.7 and some in 1.9.5. User upgrades to 1.9.6: this will pass validation because the apimodel will still have version 1.9.5 (and 1.9.5 -> 1.9.6 is a valid upgrade) but will cause a downgrade of two of the nodes. Per-node validation would have previously caught this invalid downgrade operation but removing per-node validation makes us vulnerable in such scenarios.

Copy link
Contributor

@tariq1890 tariq1890 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, one question about unit tests being maintainable with version deprecation in the future

@serbrech
Copy link
Member Author

serbrech commented Feb 28, 2019 via email

@CecileRobertMichon
Copy link
Contributor

why not use common.GetLatestPatchVersion to dynamically set the version?

@serbrech
Copy link
Member Author

why not use common.GetLatestPatchVersion to dynamically set the version?

I think there is an easier way :
common.AllKubernetesSupportedVersions is public, so I can simply set it to a minimal map that makes sense for the test case. :)

@serbrech serbrech force-pushed the feature/upgrade-same-version branch from c48a758 to b9c1406 Compare March 1, 2019 00:30
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@acs-bot
Copy link

acs-bot commented Mar 1, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, serbrech

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@CecileRobertMichon
Copy link
Contributor

merging manually because windows 1.13 test failure was a flake.

@CecileRobertMichon CecileRobertMichon merged commit 60ad562 into Azure:master Mar 1, 2019
backlog automation moved this from In progress to Done Mar 1, 2019
juhacket pushed a commit to juhacket/aks-engine that referenced this pull request Mar 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
backlog
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants