fix: use cloud-specific ARM endpoint for IMDS token in ORAS login #8424

Merged
cameronmeissner merged 7 commits into main from yuewu2/ni-cloud-fix on May 8, 2026
@charleswool charleswool commented Apr 29, 2026

The oras_login_with_kubelet_identity (Linux) and Invoke-OrasLogin (Windows) functions hardcoded https://management.azure.com/ as the ARM resource endpoint in the IMDS token request URL. This causes authentication failures in sovereign clouds (e.g. Fairfax) where the correct endpoint differs.

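Added get_arm_resource_endpoint / Get-ArmResourceEndpoint helpers that resolve the ARM endpoint from the cloud environment variable (TARGET_CLOUD on Linux, $TargetEnvironment on Windows):

  • AzureUSGovernmentCloud -> https://management.usgovcloudapi.net/
  • AzureChinaCloud -> https://management.chinacloudapi.cn/
  • USNatCloud -> https://management.azure.eaglex.ic.gov/
  • USSecCloud -> https://management.azure.microsoft.scloud/
  • default (public) -> https://management.azure.com/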

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #


Copilot AI left a comment


Pull request overview

Updates the ORAS managed-identity login flow on Linux and Windows to use a cloud-specific ARM resource endpoint (instead of hardcoding https://management.azure.com/), preventing auth failures in sovereign clouds.

Changes:

  • Added a Go helper (datamodel.GetArmResourceEndpoint) with tests to map cloud name → ARM resource endpoint.
  • Exposed GetArmResourceEndpoint into the template func map and injected it into Linux CSE env (ARM_RESOURCE_ENDPOINT) and Windows custom data ($ArmResourceEndpoint).
  • Updated ORAS login token acquisition URLs (Linux oras_login_with_kubelet_identity, Windows Invoke-OrasLogin) to use the injected endpoint.
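The injection step in the second bullet can be sketched with Go's standard text/template package. This is a minimal illustration, not the repository's code: only the names GetArmResourceEndpoint and ARM_RESOURCE_ENDPOINT come from this PR, while renderCSEEnv and the one-line template text are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// renderCSEEnv renders a one-line environment fragment for the CSE command.
// The function and the template text are illustrative assumptions; only the
// names GetArmResourceEndpoint and ARM_RESOURCE_ENDPOINT appear in the PR.
func renderCSEEnv(endpoint string) (string, error) {
	// Expose the endpoint to the template through the function map.
	funcs := template.FuncMap{
		"GetArmResourceEndpoint": func() string { return endpoint },
	}
	// Funcs must be registered before Parse so the name resolves.
	tmpl, err := template.New("cse").Funcs(funcs).Parse(
		`ARM_RESOURCE_ENDPOINT="{{GetArmResourceEndpoint}}"`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderCSEEnv("https://management.chinacloudapi.cn/")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Because the script only reads an environment variable, the cloud-resolution logic stays in Go and the shell and PowerShell sides need no per-cloud knowledge.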

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| staging/cse/windows/networkisolatedclusterfunc.ps1 | Uses $ArmResourceEndpoint in the IMDS token request for ORAS login. |
| parts/windows/kuberneteswindowssetup.ps1 | Injects $ArmResourceEndpoint into Windows custom data from the Go template func map. |
| parts/linux/cloud-init/artifacts/cse_cmd.sh | Adds ARM_RESOURCE_ENDPOINT to the Linux CSE command environment. |
| parts/linux/cloud-init/artifacts/cse_helpers.sh | Uses ARM_RESOURCE_ENDPOINT in ORAS IMDS token request URL. |
| pkg/agent/baker.go | Adds GetArmResourceEndpoint to the template function map (including custom cloud override). |
| pkg/agent/datamodel/sig_config.go / pkg/agent/datamodel/sig_config_test.go | Implements and tests cloud→ARM endpoint mapping. |

Comment thread pkg/agent/baker.go
Comment thread pkg/agent/datamodel/sig_config_test.go Outdated
Comment thread staging/cse/windows/networkisolatedclusterfunc.ps1 Outdated
Comment thread parts/linux/cloud-init/artifacts/cse_helpers.sh Outdated
Comment thread pkg/agent/datamodel/sig_config.go Outdated
Comment thread parts/linux/cloud-init/artifacts/cse_helpers.sh Outdated
@charleswool charleswool changed the title fix: [NI]use cloud-specific ARM endpoint for IMDS token in ORAS login fix: use cloud-specific ARM endpoint for IMDS token in ORAS login Apr 29, 2026
Copilot AI review requested due to automatic review settings April 29, 2026 03:34
@charleswool charleswool force-pushed the yuewu2/ni-cloud-fix branch from 1776e6f to 63df433 Compare April 29, 2026 03:34
Ubuntu and others added 4 commits May 4, 2026 05:23
The oras_login_with_kubelet_identity (Linux) and Invoke-OrasLogin (Windows)
functions hardcoded https://management.azure.com/ as the ARM resource endpoint
in the IMDS token request URL. This causes authentication failures in sovereign
clouds (e.g. Fairfax) where the correct endpoint differs.

Added get_arm_resource_endpoint / Get-ArmResourceEndpoint helpers that resolve
the ARM endpoint from the cloud environment variable (TARGET_CLOUD on Linux,
$TargetEnvironment on Windows):
- AzureUSGovernmentCloud -> https://management.usgovcloudapi.net/
- AzureChinaCloud        -> https://management.chinacloudapi.cn/
- USNatCloud             -> https://management.azure.eaglex.ic.gov/
- USSecCloud             -> https://management.azure.microsoft.scloud/
- default (public)       -> https://management.azure.com/
…etContainerServiceFuncMap

Agent-Logs-Url: https://github.com/Azure/AgentBaker/sessions/65f0b28e-1864-49bd-87e6-6c4e93668fb2

Co-authored-by: charleswool <65653735+charleswool@users.noreply.github.com>
Address @cameronmeissner's review comment: aks-node-controller's
getCSEEnv must also expose ARM_RESOURCE_ENDPOINT so the new sovereign-
cloud aware oras_login_with_kubelet_identity logic works under the
scriptless-NBC / aks-node-controller deployment mode as well.

- Export getArmResourceEndpoint -> GetARMResourceEndpoint in pkg/agent
  so it can be reused by aks-node-controller/parser.
- Add ARM_RESOURCE_ENDPOINT env var in aks-node-controller parser,
  delegating to GetARMResourceEndpoint and parsing
  CustomEnvJsonContent.resourceManagerEndpoint for AKS custom clouds.
- Add unit tests covering all branches and assert the env var in
  parser_test for both China and default-cloud cases.
Copilot AI review requested due to automatic review settings May 4, 2026 05:30
@charleswool charleswool force-pushed the yuewu2/ni-cloud-fix branch from ccc6244 to bbc7907 Compare May 4, 2026 05:30

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Comment thread aks-node-controller/parser/helper.go Outdated
}
}
}
return agent.GetARMResourceEndpoint(getCloudTargetEnv(v))
Comment on lines +175 to +176
$armEndpoint = if ([string]::IsNullOrWhiteSpace($ArmResourceEndpoint)) { "https://management.azure.com/" } else { $ArmResourceEndpoint }
$accessUrl = "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=${armEndpoint}&client_id=$ClientID"
Comment on lines +1285 to +1286
local arm_endpoint="${ARM_RESOURCE_ENDPOINT:-https://management.azure.com/}"
access_url="http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=${arm_endpoint}&client_id=$client_id"
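Both snippets above interpolate the endpoint directly into the query string. For comparison, here is a hedged Go sketch of the same IMDS token URL built with net/url, which percent-encodes the resource value. The function name imdsTokenURL is hypothetical; the host, path, api-version, and the fallback to https://management.azure.com/ are taken from the two snippets.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// defaultARMEndpoint mirrors the fallback used by both scripts above.
const defaultARMEndpoint = "https://management.azure.com/"

// imdsTokenURL (hypothetical name) builds the IMDS managed-identity token
// request URL, falling back to the public ARM endpoint when none is injected.
func imdsTokenURL(armEndpoint, clientID string) string {
	if strings.TrimSpace(armEndpoint) == "" {
		armEndpoint = defaultARMEndpoint
	}
	q := url.Values{}
	q.Set("api-version", "2018-02-01")
	q.Set("resource", armEndpoint)
	q.Set("client_id", clientID)
	u := url.URL{
		Scheme:   "http",
		Host:     "169.254.169.254",
		Path:     "/metadata/identity/oauth2/token",
		RawQuery: q.Encode(), // keys are sorted and values percent-encoded
	}
	return u.String()
}

func main() {
	fmt.Println(imdsTokenURL("", "abc"))
	fmt.Println(imdsTokenURL("https://management.usgovcloudapi.net/", "abc"))
}
```

Note that Values.Encode sorts parameters by key, so the parameter order differs from the scripts; the order of query parameters is not significant for this request.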
Comment thread pkg/agent/baker_test.go Outdated
Comment on lines +1553 to +1557
It("returns USNat endpoint for USNatCloud", func() {
	Expect(GetARMResourceEndpoint(datamodel.USNatCloud)).To(Equal("https://management.azure.eaglex.ic.gov/"))
})
It("returns USSec endpoint for USSecCloud", func() {
	Expect(GetARMResourceEndpoint(datamodel.USSecCloud)).To(Equal("https://management.azure.microsoft.scloud/"))
Comment on lines +1188 to +1249
func Test_getArmResourceEndpoint(t *testing.T) {
	tests := []struct {
		name string
		v    *aksnodeconfigv1.Configuration
		want string
	}{
		{
			name: "Nil config returns public endpoint",
			v:    &aksnodeconfigv1.Configuration{},
			want: "https://management.azure.com/",
		},
		{
			name: "China cloud by location",
			v: &aksnodeconfigv1.Configuration{
				ClusterConfig: &aksnodeconfigv1.ClusterConfig{Location: "chinaeast2"},
			},
			want: "https://management.chinacloudapi.cn/",
		},
		{
			name: "US Gov by location",
			v: &aksnodeconfigv1.Configuration{
				ClusterConfig: &aksnodeconfigv1.ClusterConfig{Location: "usgovvirginia"},
			},
			want: "https://management.usgovcloudapi.net/",
		},
		{
			name: "German cloud by location",
			v: &aksnodeconfigv1.Configuration{
				ClusterConfig: &aksnodeconfigv1.ClusterConfig{Location: "germanynortheast"},
			},
			want: "https://management.microsoftazure.de/",
		},
		{
			name: "AKS custom cloud with resourceManagerEndpoint in CustomEnvJsonContent",
			v: &aksnodeconfigv1.Configuration{
				CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
					CustomCloudEnvName:   helpers.AksCustomCloudName,
					CustomEnvJsonContent: `{"resourceManagerEndpoint":"https://management.azure.microsoft.fakecustomcloud/"}`,
				},
			},
			want: "https://management.azure.microsoft.fakecustomcloud/",
		},
		{
			name: "AKS custom cloud with empty CustomEnvJsonContent falls back to public",
			v: &aksnodeconfigv1.Configuration{
				CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
					CustomCloudEnvName: helpers.AksCustomCloudName,
				},
			},
			want: "https://management.azure.com/",
		},
		{
			name: "AKS custom cloud with malformed JSON falls back to public",
			v: &aksnodeconfigv1.Configuration{
				CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
					CustomCloudEnvName:   helpers.AksCustomCloudName,
					CustomEnvJsonContent: `{not-json`,
				},
			},
			want: "https://management.azure.com/",
		},
	}

@cameronmeissner cameronmeissner left a comment


we shouldn't hardcode resource manager endpoints

Comment thread pkg/agent/baker.go Outdated
…ove hardcoded sovereign endpoints

Per @cameronmeissner's review:
- Hardcoding USNat/USSec ARM endpoints (eaglex.ic.gov, microsoft.scloud)
  in this OSS repo is not allowed.
- cs.Properties.CustomCloudEnv.ResourceManagerEndpoint is always populated
  by AKS RP (typeconversion.go), so the cloud-name-based fallback mapping
  is unnecessary.

- Drop GetARMResourceEndpoint(cloudName) helper from pkg/agent/baker.go.
- GetArmResourceEndpoint template func now returns
  cs.Properties.CustomCloudEnv.ResourceManagerEndpoint directly.
- aks-node-controller getArmResourceEndpoint now sources the value solely
  from CustomEnvJsonContent.resourceManagerEndpoint (RP-populated). When
  absent, returns empty; cse_helpers.sh fallback to public ARM is unchanged.
- Update unit tests accordingly.
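The behavior described for aks-node-controller at this commit (read resourceManagerEndpoint from CustomEnvJsonContent, return empty when it is absent) can be sketched as follows. This is an illustration under assumptions, not the merged helper: the function name endpointFromCustomEnv is hypothetical, and treating malformed JSON the same as a missing value is an assumption based on the test names quoted in this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpointFromCustomEnv (hypothetical name) extracts resourceManagerEndpoint
// from the custom-cloud environment JSON. It returns "" when the content is
// empty, malformed, or lacks the field, leaving the default to the caller.
func endpointFromCustomEnv(content string) string {
	if content == "" {
		return ""
	}
	var env struct {
		ResourceManagerEndpoint string `json:"resourceManagerEndpoint"`
	}
	if err := json.Unmarshal([]byte(content), &env); err != nil {
		// Assumption: a parse failure is treated like a missing value.
		return ""
	}
	return env.ResourceManagerEndpoint
}

func main() {
	fmt.Printf("%q\n", endpointFromCustomEnv(
		`{"resourceManagerEndpoint":"https://management.azure.microsoft.fakecustomcloud/"}`))
	fmt.Printf("%q\n", endpointFromCustomEnv(`{not-json`))
}
```

Returning an empty string instead of an error keeps node provisioning from failing on a bad payload, at the cost of silently using the default endpoint.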
Comment thread pkg/agent/baker.go
Per review feedback: CustomCloudEnv.ResourceManagerEndpoint is only
populated by RP for AKS custom clouds (Azure Stack), not for the public
sovereign clouds Fairfax (USGov) and Mooncake (China). Map those two
explicitly by cloud name; their endpoints are public knowledge.

Public Azure cloud still falls through to empty so that scripts keep
defaulting to https://management.azure.com/.
Copilot AI review requested due to automatic review settings May 8, 2026 02:46

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

}
var env struct {
	ResourceManagerEndpoint string `json:"resourceManagerEndpoint"`
}
Comment on lines +1222 to +1226
v: &aksnodeconfigv1.Configuration{
	CustomCloudConfig: &aksnodeconfigv1.CustomCloudConfig{
		CustomCloudEnvName:   helpers.AksCustomCloudName,
		CustomEnvJsonContent: `{"resourceManagerEndpoint":"https://management.azure.microsoft.fakecustomcloud/"}`,
	},
	want string
}{
	{
		name: "Nil config returns empty (public cloud default)",
Comment thread aks-node-controller/parser/helper.go
Comment thread pkg/agent/baker.go
Per review nit: explicitly return https://management.azure.com/ as the
final fallback in both pkg/agent/baker.go GetArmResourceEndpoint and
aks-node-controller/parser/helper.go getArmResourceEndpoint, instead of
relying on script-side defaults.
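Taken together, the last three commit messages describe a resolution order that can be sketched in Go as below. This is a reading of those messages, not the merged code: resolveARMEndpoint is a hypothetical name, and the precedence (custom-cloud endpoint first, then the two named sovereign clouds, then public) is an assumption. The cloud names and endpoint URLs are the ones listed earlier in this PR.

```go
package main

import "fmt"

const (
	publicARMEndpoint = "https://management.azure.com/"
	usGovARMEndpoint  = "https://management.usgovcloudapi.net/"
	chinaARMEndpoint  = "https://management.chinacloudapi.cn/"
)

// resolveARMEndpoint (hypothetical name) picks the ARM resource endpoint.
// Assumed precedence: an RP-populated custom-cloud endpoint wins, then the
// explicitly mapped sovereign clouds, then the public endpoint as the
// explicit final fallback.
func resolveARMEndpoint(cloudName, customCloudEndpoint string) string {
	if customCloudEndpoint != "" {
		return customCloudEndpoint
	}
	switch cloudName {
	case "AzureUSGovernmentCloud":
		return usGovARMEndpoint
	case "AzureChinaCloud":
		return chinaARMEndpoint
	}
	return publicARMEndpoint
}

func main() {
	fmt.Println(resolveARMEndpoint("AzureChinaCloud", ""))
	fmt.Println(resolveARMEndpoint("AzurePublicCloud", ""))
}
```

Returning the public endpoint from Go, rather than an empty string, means the value the scripts receive is always usable and the script-side default becomes a second line of defense.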
@cameronmeissner cameronmeissner merged commit 6bb3dc3 into main May 8, 2026
40 of 42 checks passed
@cameronmeissner cameronmeissner deleted the yuewu2/ni-cloud-fix branch May 8, 2026 15:53
6 participants