fix: use cloud-specific ARM endpoint for IMDS token in ORAS login #8424
Merged
Pull request overview
Updates the ORAS managed-identity login flow on Linux and Windows to use a cloud-specific ARM resource endpoint (instead of hardcoding https://management.azure.com/), preventing auth failures in sovereign clouds.
Changes:
- Added a Go helper (datamodel.GetArmResourceEndpoint) with tests to map cloud name → ARM resource endpoint.
- Exposed GetArmResourceEndpoint in the template func map and injected it into the Linux CSE env (ARM_RESOURCE_ENDPOINT) and Windows custom data ($ArmResourceEndpoint).
- Updated ORAS login token acquisition URLs (Linux oras_login_with_kubelet_identity, Windows Invoke-OrasLogin) to use the injected endpoint.
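As a rough illustration of the second bullet, the sketch below registers a function named GetArmResourceEndpoint in a Go text/template func map and renders one environment assignment from it. The helper name renderCSEEnv and the template string are assumptions made for this example; the real templates and func map in pkg/agent/baker.go are not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderCSEEnv is a hypothetical helper, not AgentBaker code. It exposes a
// template function called GetArmResourceEndpoint and renders a single
// ARM_RESOURCE_ENDPOINT assignment with it.
func renderCSEEnv(endpoint string) (string, error) {
	funcs := template.FuncMap{
		// The closure stands in for whatever value the generator resolved.
		"GetArmResourceEndpoint": func() string { return endpoint },
	}
	tmpl, err := template.New("cse").Funcs(funcs).Parse(
		`ARM_RESOURCE_ENDPOINT="{{GetArmResourceEndpoint}}"`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderCSEEnv("https://management.usgovcloudapi.net/")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because text/template (unlike html/template) does not escape its output, the endpoint URL is emitted verbatim into the rendered line.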
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| staging/cse/windows/networkisolatedclusterfunc.ps1 | Uses $ArmResourceEndpoint in the IMDS token request for ORAS login. |
| parts/windows/kuberneteswindowssetup.ps1 | Injects $ArmResourceEndpoint into Windows custom data from the Go template func map. |
| parts/linux/cloud-init/artifacts/cse_cmd.sh | Adds ARM_RESOURCE_ENDPOINT to the Linux CSE command environment. |
| parts/linux/cloud-init/artifacts/cse_helpers.sh | Uses ARM_RESOURCE_ENDPOINT in ORAS IMDS token request URL. |
| pkg/agent/baker.go | Adds GetArmResourceEndpoint to the template function map (including custom cloud override). |
| pkg/agent/datamodel/sig_config.go / pkg/agent/datamodel/sig_config_test.go | Implements and tests cloud→ARM endpoint mapping. |
9692840 to 1776e6f
fseldow reviewed Apr 29, 2026
fseldow reviewed Apr 29, 2026
1776e6f to 63df433
d95baf3 to ccc6244
The oras_login_with_kubelet_identity (Linux) and Invoke-OrasLogin (Windows) functions hardcoded https://management.azure.com/ as the ARM resource endpoint in the IMDS token request URL. This causes authentication failures in sovereign clouds (e.g. Fairfax) where the correct endpoint differs.

Added get_arm_resource_endpoint / Get-ArmResourceEndpoint helpers that resolve the ARM endpoint from the cloud environment variable (TARGET_CLOUD on Linux, $TargetEnvironment on Windows):
- AzureUSGovernmentCloud -> https://management.usgovcloudapi.net/
- AzureChinaCloud -> https://management.chinacloudapi.cn/
- USNatCloud -> https://management.azure.eaglex.ic.gov/
- USSecCloud -> https://management.azure.microsoft.scloud/
- default (public) -> https://management.azure.com/
…oints
Agent-Logs-Url: https://github.com/Azure/AgentBaker/sessions/d463ecda-08d8-4e26-8616-e577a3e40391
Co-authored-by: charleswool <65653735+charleswool@users.noreply.github.com>
…etContainerServiceFuncMap
Agent-Logs-Url: https://github.com/Azure/AgentBaker/sessions/65f0b28e-1864-49bd-87e6-6c4e93668fb2
Co-authored-by: charleswool <65653735+charleswool@users.noreply.github.com>
Address @cameronmeissner's review comment: aks-node-controller's getCSEEnv must also expose ARM_RESOURCE_ENDPOINT so the new sovereign-cloud aware oras_login_with_kubelet_identity logic works under the scriptless-NBC / aks-node-controller deployment mode as well.
- Export getArmResourceEndpoint -> GetARMResourceEndpoint in pkg/agent so it can be reused by aks-node-controller/parser.
- Add ARM_RESOURCE_ENDPOINT env var in aks-node-controller parser, delegating to GetARMResourceEndpoint and parsing CustomEnvJsonContent.resourceManagerEndpoint for AKS custom clouds.
- Add unit tests covering all branches and assert the env var in parser_test for both China and default-cloud cases.
ccc6244 to bbc7907
			}
		}
	}
	return agent.GetARMResourceEndpoint(getCloudTargetEnv(v))
Comment on lines +175 to +176
    $armEndpoint = if ([string]::IsNullOrWhiteSpace($ArmResourceEndpoint)) { "https://management.azure.com/" } else { $ArmResourceEndpoint }
    $accessUrl = "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=${armEndpoint}&client_id=$ClientID"
Comment on lines +1285 to +1286
    local arm_endpoint="${ARM_RESOURCE_ENDPOINT:-https://management.azure.com/}"
    access_url="http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=${arm_endpoint}&client_id=$client_id"
Comment on lines +1553 to +1557
	It("returns USNat endpoint for USNatCloud", func() {
		Expect(GetARMResourceEndpoint(datamodel.USNatCloud)).To(Equal("https://management.azure.eaglex.ic.gov/"))
	})
	It("returns USSec endpoint for USSecCloud", func() {
		Expect(GetARMResourceEndpoint(datamodel.USSecCloud)).To(Equal("https://management.azure.microsoft.scloud/"))
Comment on lines +1188 to +1249
func Test_getArmResourceEndpoint(t *testing.T) {
	tests := []struct {
		name string
		v    *aksnodeconfigv1.Configuration
		want string
	}{
		{
			name: "Nil config returns public endpoint",
			v:    &aksnodeconfigv1.Configuration{},
			want: "https://management.azure.com/",
		},
		{
			name: "China cloud by location",
			v: &aksnodeconfigv1.Configuration{
				ClusterConfig: &aksnodeconfigv1.ClusterConfig{Location: "chinaeast2"},
			},
			want: "https://management.chinacloudapi.cn/",
		},
		{
			name: "US Gov by location",
			v: &aksnodeconfigv1.Configuration{
				ClusterConfig: &aksnodeconfigv1.ClusterConfig{Location: "usgovvirginia"},
			},
			want: "https://management.usgovcloudapi.net/",
		},
		{
			name: "German cloud by location",
			v: &aksnodeconfigv1.Configuration{
				ClusterConfig: &aksnodeconfigv1.ClusterConfig{Location: "germanynortheast"},
			},
			want: "https://management.microsoftazure.de/",
		},
		{
			name: "AKS custom cloud with resourceManagerEndpoint in CustomEnvJsonContent",
			v: &aksnodeconfigv1.Configuration{
				CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
					CustomCloudEnvName:   helpers.AksCustomCloudName,
					CustomEnvJsonContent: `{"resourceManagerEndpoint":"https://management.azure.microsoft.fakecustomcloud/"}`,
				},
			},
			want: "https://management.azure.microsoft.fakecustomcloud/",
		},
		{
			name: "AKS custom cloud with empty CustomEnvJsonContent falls back to public",
			v: &aksnodeconfigv1.Configuration{
				CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
					CustomCloudEnvName: helpers.AksCustomCloudName,
				},
			},
			want: "https://management.azure.com/",
		},
		{
			name: "AKS custom cloud with malformed JSON falls back to public",
			v: &aksnodeconfigv1.Configuration{
				CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
					CustomCloudEnvName:   helpers.AksCustomCloudName,
					CustomEnvJsonContent: `{not-json`,
				},
			},
			want: "https://management.azure.com/",
		},
	}
cameronmeissner requested changes May 7, 2026
…ove hardcoded sovereign endpoints
Per @cameronmeissner's review:
- Hardcoding USNat/USSec ARM endpoints (eaglex.ic.gov, microsoft.scloud) in this OSS repo is not allowed.
- cs.Properties.CustomCloudEnv.ResourceManagerEndpoint is always populated by AKS RP (typeconversion.go), so the cloud-name-based fallback mapping is unnecessary.
- Drop GetARMResourceEndpoint(cloudName) helper from pkg/agent/baker.go.
- GetArmResourceEndpoint template func now returns cs.Properties.CustomCloudEnv.ResourceManagerEndpoint directly.
- aks-node-controller getArmResourceEndpoint now sources the value solely from CustomEnvJsonContent.resourceManagerEndpoint (RP-populated). When absent, returns empty; cse_helpers.sh fallback to public ARM is unchanged.
- Update unit tests accordingly.
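The aks-node-controller behaviour described in this commit message (read resourceManagerEndpoint from CustomEnvJsonContent, return empty when it is absent) might look roughly like the Go sketch below. The function name endpointFromCustomEnvJSON is hypothetical, and returning empty on malformed JSON is an assumption of this sketch rather than something the commit message states.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpointFromCustomEnvJSON is a hypothetical helper, not the actual
// aks-node-controller code. It extracts resourceManagerEndpoint from a custom
// cloud environment JSON document and returns "" when the content is empty,
// malformed (an assumption of this sketch), or lacks the field, leaving the
// caller to apply its own default.
func endpointFromCustomEnvJSON(content string) string {
	if content == "" {
		return ""
	}
	var env struct {
		ResourceManagerEndpoint string `json:"resourceManagerEndpoint"`
	}
	if err := json.Unmarshal([]byte(content), &env); err != nil {
		return ""
	}
	return env.ResourceManagerEndpoint
}

func main() {
	fmt.Printf("%q\n", endpointFromCustomEnvJSON(
		`{"resourceManagerEndpoint":"https://management.azure.microsoft.fakecustomcloud/"}`))
	fmt.Printf("%q\n", endpointFromCustomEnvJSON(`{not-json`))
}
```

Returning an empty string instead of an error keeps the decision about the default in one place, which at this commit was the script-side fallback in cse_helpers.sh.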
Per review feedback: CustomCloudEnv.ResourceManagerEndpoint is only populated by RP for AKS custom clouds (Azure Stack), not for the public sovereign clouds Fairfax (USGov) and Mooncake (China). Map those two explicitly by cloud name; their endpoints are public knowledge. Public Azure cloud still falls through to empty so that scripts keep defaulting to https://management.azure.com/.
	}
	var env struct {
		ResourceManagerEndpoint string `json:"resourceManagerEndpoint"`
	}
Comment on lines +1222 to +1226
			v: &aksnodeconfigv1.Configuration{
				CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
					CustomCloudEnvName:   helpers.AksCustomCloudName,
					CustomEnvJsonContent: `{"resourceManagerEndpoint":"https://management.azure.microsoft.fakecustomcloud/"}`,
				},
		want string
	}{
		{
			name: "Nil config returns empty (public cloud default)",
cameronmeissner approved these changes May 8, 2026
Per review nit: explicitly return https://management.azure.com/ as the final fallback in both pkg/agent/baker.go GetArmResourceEndpoint and aks-node-controller/parser/helper.go getArmResourceEndpoint, instead of relying on script-side defaults.
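Taken together, the last three commit messages suggest a resolution chain: an RP-populated custom cloud endpoint, then an explicit mapping for Fairfax and Mooncake, then the public endpoint as the final fallback. The Go sketch below shows one way such a chain could be written. The function name resolveARMEndpoint, its signature, and the precedence of the custom endpoint over the cloud name are assumptions; only the cloud names and URLs come from this page.

```go
package main

import "fmt"

// Final fallback for the public Azure cloud.
const publicARMEndpoint = "https://management.azure.com/"

// resolveARMEndpoint is a hypothetical helper, not the merged AgentBaker
// code. customEndpoint stands in for an RP-populated ResourceManagerEndpoint
// (AKS custom clouds); cloudName is the target cloud name.
func resolveARMEndpoint(cloudName, customEndpoint string) string {
	// Assumed precedence: an explicit custom cloud endpoint wins.
	if customEndpoint != "" {
		return customEndpoint
	}
	// Only the two sovereign clouds named in the commit message are mapped.
	switch cloudName {
	case "AzureUSGovernmentCloud":
		return "https://management.usgovcloudapi.net/"
	case "AzureChinaCloud":
		return "https://management.chinacloudapi.cn/"
	}
	// Explicit final fallback instead of relying on script-side defaults.
	return publicARMEndpoint
}

func main() {
	fmt.Println(resolveARMEndpoint("AzureChinaCloud", ""))
	fmt.Println(resolveARMEndpoint("AzurePublicCloud", ""))
}
```

With the fallback returned here, the script-side defaults in the Linux and Windows snippets become a second line of defence rather than the primary source of the public endpoint.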
fseldow approved these changes May 8, 2026
cameronmeissner approved these changes May 8, 2026
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #