fix: VM and VMSS do not have user assigned identity in its dependsOn #2152
Conversation
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
@xuto2 FYI
Codecov Report
@@ Coverage Diff @@
## master #2152 +/- ##
==========================================
- Coverage 72.47% 69.43% -3.04%
==========================================
Files 140 144 +4
Lines 25589 32762 +7173
==========================================
+ Hits 18545 22748 +4203
- Misses 5976 8689 +2713
- Partials 1068 1325 +257
@@ -34,6 +34,9 @@ func CreateMasterVM(cs *api.ContainerService) VirtualMachineARM {
 	if isStorageAccount {
 		dependencies = append(dependencies, "[variables('masterStorageAccountName')]")
 	}
+	if userAssignedIDEnabled {
+		dependencies = append(dependencies, "[concat('Microsoft.ManagedIdentity/userAssignedIdentities/', variables('userAssignedID'))]")
+	}
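For readers without the surrounding file, here is a minimal self-contained sketch of the pattern in this hunk: the identity's resource ID expression is appended to the dependsOn list only when user assigned identity is enabled. The function name `masterVMDependencies` and its boolean parameters are hypothetical stand-ins for illustration, not the actual AKS Engine signature.

```go
package main

import "fmt"

// masterVMDependencies is a hypothetical helper (not the real AKS Engine
// function) that builds an ARM dependsOn list the way the hunk above does.
func masterVMDependencies(isStorageAccount, userAssignedIDEnabled bool) []string {
	var dependencies []string
	if isStorageAccount {
		dependencies = append(dependencies, "[variables('masterStorageAccountName')]")
	}
	if userAssignedIDEnabled {
		// The VM should wait for the identity resource before it is deployed.
		dependencies = append(dependencies, "[concat('Microsoft.ManagedIdentity/userAssignedIdentities/', variables('userAssignedID'))]")
	}
	return dependencies
}

func main() {
	for _, d := range masterVMDependencies(false, true) {
		fmt.Println(d)
	}
}
```

With the flag disabled the list is unchanged, so templates for clusters without a user assigned identity should not be affected under this sketch.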
Should we filter out the user assigned identity resource and dependency in the scale-up/upgrade templates? There are normalization methods in transform.go that filter out infra resources and dependencies such as NSG, VNET, route table, etc.
Sorry, I'm not familiar with that. What is the purpose of filtering resources and dependencies? Is it to prevent a redeployment from wiping existing configuration? If so, the user assigned identity is safe: after a redeployment, the identity itself, all role assignments on it, and all of its associations remain unchanged, so keeping the identity in the VM's dependencies is also fine.
Will the identity have the same principal ID after being redeployed?
Yes.
Ideally, when we scale or upgrade a cluster, we should deploy only the new VM/VMSS without redeploying cluster infrastructure such as the VNET, RouteTable, NSG, and this UserAssignedIdentity. The consequence of touching this infrastructure is nondeterministic and depends on how it is built and maintained.
If I were you, I would filter them out (look for the methods below in transform.go):
NormalizeMasterResourcesForScaling
NormalizeForK8sVMASScalingUp
NormalizeResourcesForK8sMasterUpgrade
NormalizeResourcesForK8sAgentUpgrade
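To make the suggestion concrete, here is a rough sketch of what such filtering could look like. It is not the code in transform.go: `filterIdentity` is a hypothetical helper, and it assumes resources are plain maps whose `dependsOn` is a `[]string` (a decoded JSON template would typically hold `[]interface{}` instead).

```go
package main

import (
	"fmt"
	"strings"
)

const identityType = "Microsoft.ManagedIdentity/userAssignedIdentities"

// filterIdentity is a hypothetical helper: it drops user assigned identity
// resources and removes references to them from the dependsOn list of the
// remaining resources. The kept resource maps are modified in place.
func filterIdentity(resources []map[string]interface{}) []map[string]interface{} {
	kept := make([]map[string]interface{}, 0, len(resources))
	for _, r := range resources {
		if t, _ := r["type"].(string); strings.EqualFold(t, identityType) {
			continue // skip the identity resource itself
		}
		if deps, ok := r["dependsOn"].([]string); ok {
			filtered := make([]string, 0, len(deps))
			for _, d := range deps {
				if !strings.Contains(d, identityType) {
					filtered = append(filtered, d)
				}
			}
			r["dependsOn"] = filtered
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	resources := []map[string]interface{}{
		{"type": identityType},
		{
			"type": "Microsoft.Compute/virtualMachines",
			"dependsOn": []string{
				"[variables('masterStorageAccountName')]",
				"[concat('Microsoft.ManagedIdentity/userAssignedIdentities/', variables('userAssignedID'))]",
			},
		},
	}
	for _, r := range filterIdentity(resources) {
		fmt.Println(r["type"], r["dependsOn"])
	}
}
```

Removing the dependency together with the resource matters: a dependsOn entry that points at a resource no longer defined in the template would fail ARM validation.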
I think user assigned identity is different from VNET/RouteTable/NSG.
You said "the consequence of touching these infrastructure is nondeterministic and is tied to how these infra are built/maintained". The biggest difference is that the identity team guarantees that redeploying a user assigned identity will not change the existing identity, so the behavior is deterministic. In AKS-Engine's current implementation, the definition of the user assigned identity and the role assignment on it (scoped to the cluster resource group, that is, the MC_ resource group in AKS) are not removed from the template when upgrading and scaling. As far as I can see, this brings the following benefit:
- If the identity or its role assignment is deleted by mistake, a redeployment brings them back, similar to a "reconciliation". And because redeploying a user assigned identity does not wipe anything out, a redeployment never does harm.
The main purpose of this PR is to add the user assigned identity to the VM's dependsOn, so I think we may want to keep AKS-Engine's current transformer implementation.
ping @xuto2
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: norshtein, xuto2 If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
Is this stale?
@jackfrancis not stale. Is there anything I need to do to get this PR merged into the master branch?
e1e1e27 to 185dcf4
New changes are detected. LGTM label has been removed.
@norshtein the reason it wasn't merged is the E2E test failures. I've rebased against master, let's run E2E tests again.
/azp run pr-e2e
Azure Pipelines successfully started running 1 pipeline(s).
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
just glancing through the PRs on this repo, it's green now, is there anything else blocking it?
I'm running a back-compat test to ensure this change works w/ |
185dcf4 to 166ea6c
/azp run pr-e2e
@jackfrancis were you able to run this and get any good signal?
No pipelines are associated with this pull request.
We've done some work in this surface area recently, so I'd prefer we start any net new changes from scratch at this point.
Reason for Change:
Background is described in #2082.
Issue Fixed:
Fixes #2082
Requirements:
Notes: