Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

feat: enable system-assigned identity by default #3856

Merged
merged 12 commits into from
Oct 13, 2020

Conversation

jackfrancis
Copy link
Member

@jackfrancis jackfrancis commented Sep 22, 2020

Reason for Change:

This PR changes the defaults configuration assignment implementation to set useManagedIdentity to true by default, effectively enabling system-assigned identity for all node VMs.

Official Azure docs here:

https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview#managed-identity-types

In order to do so reliably, the API version for system-assigned identity has been updated to 2018-09-01-preview so that the roleAssignment resource doesn't reference a VM-derived service principal before it has replicated (that API version provides a solution for that race condition).

Note: this default change was not applied for Azure Stack, which doesn't support managed identity at this time.

Also modified the default --auth-method for all CLI gestures to cli, from client_secret, to further deprioritize the suggestion to rely upon manually curated service principals as part of the AKS Engine solution set. Using that auth method by default w/ system-assigned identity by default yields a much simpler working, default CLI expression:

$ aks-engine deploy --resource-group $RG --location $LOCATION --api-model $API_MODEL_JSON_PATH

Also updated doc examples to reflect this simpler default flow.

Issue Fixed:

Fixes #3885

Credit Where Due:

Does this change contain code from or inspired by another project?

  • No
  • Yes

If "Yes," did you notify that project's maintainers and provide attribution?

  • No
  • Yes

Requirements:

Notes:

@codecov
Copy link

codecov bot commented Sep 22, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@5e6a1f1). Click here to learn what that means.
The diff coverage is 97.29%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #3856   +/-   ##
=========================================
  Coverage          ?   72.99%           
=========================================
  Files             ?      149           
  Lines             ?    23201           
  Branches          ?        0           
=========================================
  Hits              ?    16935           
  Misses            ?     5156           
  Partials          ?     1110           
Impacted Files Coverage Δ
pkg/api/vlabs/types.go 71.81% <ø> (ø)
pkg/engine/armtype.go 85.00% <ø> (ø)
pkg/engine/params_k8s.go 85.23% <50.00%> (ø)
cmd/deploy.go 60.59% <100.00%> (ø)
cmd/generate.go 49.18% <100.00%> (ø)
cmd/root.go 64.60% <100.00%> (ø)
pkg/api/defaults.go 92.91% <100.00%> (ø)
pkg/api/types.go 93.83% <100.00%> (ø)
pkg/api/vlabs/validate.go 79.64% <100.00%> (ø)
pkg/engine/armresources.go 83.47% <100.00%> (ø)
... and 157 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e6a1f1...2ca66f4. Read the comment docs.

!cs.Properties.IsCustomCloudProfile() &&
!cs.Properties.MasterProfile.IsVirtualMachineScaleSets() &&
o.KubernetesConfig.UseManagedIdentity == nil {
o.KubernetesConfig.UseManagedIdentity = to.BoolPtr(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also enable it during upgrade? since we weren't setting a default before, most of my generated apimodels have no set value for useManagedIdentity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, addressed!

@@ -177,6 +177,13 @@ func (cs *ContainerService) setOrchestratorDefaults(isUpgrade, isScale bool) {
}
}

if !cs.Properties.IsHostedMasterProfile() &&
!cs.Properties.IsCustomCloudProfile() &&
!cs.Properties.MasterProfile.IsVirtualMachineScaleSets() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why disable for VMSS masters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to vlabs/validate.go VMSS master VMs only support user-assigned identity (not system-assigned)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually ran some real world cluster buildouts to see what happens w/ VMSS control plane + system-assigned identity, and sure enough on the first test I am unable to create a Load Balancer:

Events:
  Type     Reason                  Age                  From                  Message
  ----     ------                  ----                 ----                  -------
  Normal   EnsuringLoadBalancer    2m25s (x8 over 12m)  service-controller    Ensuring load balancer
  Warning  ListLoadBalancers       2m25s (x8 over 12m)  azure-cloud-provider  Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/aa3d3369-e814-4495-899d-d31e8d7d09ce/resourceGroups/kubernetes-westus2-95074/providers/Microsoft.Network/loadBalancers?api-version=2019-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}
  Warning  SyncLoadBalancerFailed  2m25s (x8 over 12m)  service-controller    Error syncing load balancer: failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 400, RawError: azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/aa3d3369-e814-4495-899d-d31e8d7d09ce/resourceGroups/kubernetes-westus2-95074/providers/Microsoft.Network/loadBalancers?api-version=2019-06-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}

@ritazh @sozercan were you involved in implementing VMSS masters? Do you recall if this is one of the expected symptoms that dictates that system-assigned identity is not a working solution?

@jackfrancis
Copy link
Member Author

jackfrancis commented Sep 25, 2020

Hold while we address race conditions in the template spec

  • Addressed!

@@ -177,6 +177,14 @@ func (cs *ContainerService) setOrchestratorDefaults(isUpgrade, isScale bool) {
}
}

if !isUpgrade && !isScale &&
!cs.Properties.IsHostedMasterProfile() &&
!cs.Properties.IsCustomCloudProfile() &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for taking custom clouds into account.

@jackfrancis jackfrancis force-pushed the system-assigned-identity-default branch from c0b127a to f9f0ce6 Compare October 2, 2020 00:32
@acs-bot acs-bot added size/XXL and removed size/L labels Oct 2, 2020
@@ -146,6 +146,11 @@ func (authArgs *authArgs) validateAuthArgs() error {
return errors.New("--auth-method is a required parameter")
}

// Back-compat to accommodate existing client usage patterns that assume that "client-secret" is the default
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jadarsie Do you have any concerns here for Azure Stack? Does the az CLI auth model work for Azure Stack? If not, are you comfortable with this back-compat solution that "re-sets" the default auth method to client_secret if the service principal ID and password are included in the command line arguments?

The goal for us is to default to CLI as the auth model because for most users it is easier (don't have to generate/maintain service principals, easier command statements). If there's a reason that a local az context isn't possible, then this back-compat solution should work for pre-existing scripts that pass in the id and pass but haven't ever bothered to set the --auth-method to client_secret explicitly, because it's always been the default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -270,13 +270,13 @@ func autofillApimodel(dc *deployCmd) error {
if dc.dnsPrefix == "" {
return errors.New("apimodel: missing masterProfile.dnsPrefix and --dns-prefix was not specified")
}
log.Warnf("apimodel: missing masterProfile.dnsPrefix will use %q", dc.dnsPrefix)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be a warning, it's literally a CLI feature

@@ -74,108 +74,48 @@ Note, we have launched a browser for you to login. For old experience with devic
You have logged in. Now let us find all the subscriptions to which you have access...
```

Next, we'll create a resource group. A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity. In this case, we want to deploy, manage and maintain the whole Kubernetes cluster as a single entity.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These steps are actually unnecessary, so are not in the spirit of "quick start" for first time users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I have wondered before why we weren't letting aks-engine deploy create the RG for us...

@@ -41,10 +41,6 @@ This is the AAD Pod Identity add-on. Add this add-on to your json file as shown
}
]
}
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning up these placeholder objects as we're no longer emphasizing the service principal solution

@@ -998,44 +994,6 @@ func (a *Properties) validateServicePrincipalProfile() error {
return nil
}

func (a *Properties) validateManagedIdentity() error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire validation logic is to ensure you're using k8s >= 1.12.0, which is not necessary.

@jackfrancis jackfrancis force-pushed the system-assigned-identity-default branch 2 times, most recently from a930693 to 8170a5f Compare October 9, 2020 16:41
@jackfrancis
Copy link
Member Author

/azp run pr-e2e

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jackfrancis jackfrancis force-pushed the system-assigned-identity-default branch from 08c37d6 to a92b91d Compare October 12, 2020 22:26
@@ -74,108 +74,48 @@ Note, we have launched a browser for you to login. For old experience with devic
You have logged in. Now let us find all the subscriptions to which you have access...
```

Next, we'll create a resource group. A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity. In this case, we want to deploy, manage and maintain the whole Kubernetes cluster as a single entity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I have wondered before why we weren't letting aks-engine deploy create the RG for us...

docs/tutorials/quickstart.md Outdated Show resolved Hide resolved
test/e2e/azure/cli.go Outdated Show resolved Hide resolved
jackfrancis and others added 3 commits October 13, 2020 09:59
spelling correction

Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
Copy link
Member

@mboersma mboersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@acs-bot
Copy link

acs-bot commented Oct 13, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis jackfrancis merged commit 0b6b43b into Azure:master Oct 13, 2020
@jackfrancis jackfrancis deleted the system-assigned-identity-default branch October 13, 2020 19:18
penggu pushed a commit to penggu/aks-engine that referenced this pull request Oct 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update system-assigned identity to use 2018-09-01-preview
5 participants