
feat(e2e): add HTTPS_PROXY + private DNS test scenario#8470

Merged
r2k1 merged 3 commits into main from artur/e2e-https-proxy-private-dns
May 8, 2026

Conversation

@r2k1
Contributor

@r2k1 r2k1 commented May 7, 2026

Summary

Add E2E test for node bootstrapping with HTTPS_PROXY configured and private DNS zone for the API server FQDN. Regression coverage for IcM 603699115 / ADO#31707996.

Changes

New test: Test_Ubuntu2204_HTTPSProxy_PrivateDNS

Validates that a node bootstraps successfully when HTTPS_PROXY is set and the API server FQDN resolves via a private DNS zone.

Proxy infrastructure (all non-network-isolated clusters)

  • Python-based HTTP CONNECT proxy deployed as a DaemonSet on the system pool (mcr.microsoft.com/cbl-mariner/base/python:3)
  • Private DNS zone for the API server FQDN, linked to the cluster VNet
  • Both run in the prepareCluster DAG, so the setup is consistent across clusters

Signature refactor

  • BootstrapConfigMutator and AKSNodeConfigMutator now accept *Cluster as their first parameter
  • This lets scenarios read cluster properties (e.g., ProxyURL)
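Here is a minimal sketch of the new mutator shape. It uses simplified stand-in types: `Cluster`, `HTTPProxyConfig` and `NodeBootstrappingConfiguration` below carry only the fields needed here, and `httpsProxyMutator` and `applyMutator` are hypothetical. Setting only `HTTPSProxy` (and not `HTTPProxy`) is an assumption based on the scenario name.

```go
package main

import "fmt"

// Simplified stand-ins; the real e2e Cluster and
// datamodel.NodeBootstrappingConfiguration have many more fields.
type Cluster struct {
	ProxyURL string // URL of the in-cluster CONNECT proxy
}

type HTTPProxyConfig struct {
	HTTPSProxy *string
}

type NodeBootstrappingConfiguration struct {
	HTTPProxyConfig *HTTPProxyConfig
}

// BootstrapConfigMutator mirrors the post-refactor shape:
// the active cluster is passed as the first argument.
type BootstrapConfigMutator func(*Cluster, *NodeBootstrappingConfiguration)

// httpsProxyMutator points the node's HTTPS proxy at the cluster's proxy.
func httpsProxyMutator(c *Cluster, nbc *NodeBootstrappingConfiguration) {
	proxy := c.ProxyURL
	nbc.HTTPProxyConfig = &HTTPProxyConfig{HTTPSProxy: &proxy}
}

// applyMutator shows how a harness could thread the cluster into a mutator.
func applyMutator(c *Cluster, nbc *NodeBootstrappingConfiguration, m BootstrapConfigMutator) error {
	if m == nil {
		return nil
	}
	if c == nil {
		return fmt.Errorf("cluster must not be nil when applying a mutator")
	}
	m(c, nbc)
	return nil
}
```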

Bug fix

  • createNewAKSClusterWithRetry now also retries on NotFound errors raised during managed identity reconciliation after a cluster is deleted

Test Result

--- PASS: Test_Ubuntu2204_HTTPSProxy_PrivateDNS/default (251.99s)
PASS

Fixes: ADO#31707996

Contributor

Copilot AI left a comment


Pull request overview

Adds an E2E regression scenario covering node bootstrapping with HTTPS_PROXY configured while the API server FQDN resolves via an Azure Private DNS zone, and wires shared proxy + private DNS infrastructure into cluster preparation. Also refactors scenario mutator signatures to accept the active *Cluster, and broadens AKS cluster creation retry logic to include a specific managed-identity reconciliation NotFound case.

Changes:

  • Add Test_Ubuntu2204_HTTPSProxy_PrivateDNS scenario and pass cluster-derived proxy URL into HTTPProxyConfig.
  • Provision a lightweight HTTP CONNECT proxy (ConfigMap + DaemonSet) and set up a private DNS zone for the API server FQDN during prepareCluster (non-network-isolated clusters).
  • Update mutator function signatures to accept *Cluster, and improve cluster-create retry logic via isRetryableClusterError.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:

  • e2e/types.go: Update mutator function types to accept *Cluster.
  • e2e/test_helpers.go: Thread *Cluster into mutator invocation and pre-provision wrapper scenarios.
  • e2e/scenario_win_test.go: Adapt Windows scenarios/helpers to new mutator signatures.
  • e2e/scenario_test.go: Add Ubuntu 22.04 HTTPS proxy + private DNS scenario; update existing scenarios for new signatures.
  • e2e/scenario_gpu_managed_experience_test.go: Update GPU scenario mutator signatures.
  • e2e/scenario_gpu_daemonset_test.go: Update GPU DaemonSet scenario mutator signature.
  • e2e/scenario_cse_perf_test.go: Update performance scenario mutator signatures.
  • e2e/kube.go: Add proxy ConfigMap/DaemonSet creation + proxy URL discovery via Kubernetes API.
  • e2e/cluster.go: Add cluster-scoped ProxyURL, run proxy + private DNS setup in prepareCluster, and expand retryable cluster-create errors.

Comment thread e2e/kube.go Outdated
    pods, err := k.Typed.CoreV1().Pods("default").List(ctx, metav1.ListOptions{
        LabelSelector: "app=" + proxyAppLabel,
    })
    if err != nil || len(pods.Items) == 0 {
Comment thread e2e/kube.go Outdated
    const (
        hostNetworkDebugAppLabel = "debug-mariner-tolerated"
        podNetworkDebugAppLabel  = "debugnonhost-mariner-tolerated"
        proxyAppLabel            = "e2e-tinyproxy"
r2k1 force-pushed the artur/e2e-https-proxy-private-dns branch from 3c5c071 to ee5e9b5 on May 7, 2026 23:12
Copilot AI review requested due to automatic review settings May 7, 2026 23:16
r2k1 force-pushed the artur/e2e-https-proxy-private-dns branch from ee5e9b5 to e06302b on May 7, 2026 23:16
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Comment thread e2e/kube.go
Comment on lines +308 to +316
    // proxy is not available on network-isolated clusters
    if !isNetworkIsolated {
        if err := k.ensureProxyConfigMap(ctx); err != nil {
            return err
        }
        proxyDS := daemonsetProxy(ctx)
        if err := k.CreateDaemonset(ctx, proxyDS); err != nil {
            return err
        }
r2k1 force-pushed the artur/e2e-https-proxy-private-dns branch from e06302b to c75a491 on May 7, 2026 23:24
Copilot AI review requested due to automatic review settings May 8, 2026 01:23
r2k1 force-pushed the artur/e2e-https-proxy-private-dns branch from c75a491 to 38f8a94 on May 8, 2026 01:23
Add E2E test for node bootstrapping with HTTPProxyConfig set and
private DNS zone for the API server FQDN. Regression coverage for
IcM 603699115 / ADO#31707996.

Changes:
- Refactor BootstrapConfigMutator and AKSNodeConfigMutator to accept
  *Cluster parameter, enabling scenarios to access cluster properties
- Deploy Python-based CONNECT proxy DaemonSet on all non-isolated
  clusters using mcr.microsoft.com/cbl-mariner/base/python:3
- Create private DNS zone for API server FQDN on all non-isolated
  clusters, linked to VNet with A record
- Add Test_Ubuntu2204_HTTPProxy_PrivateDNS scenario
- Fix cluster creation retry to handle NotFound errors

Test verified: node boots, CSE completes, kubelet starts, node Ready,
test pod runs. Proxy receives CONNECT traffic from CSE outbound check.

Fixes: ADO#31707996

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
r2k1 force-pushed the artur/e2e-https-proxy-private-dns branch from 38f8a94 to a1bebdc on May 8, 2026 01:26
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Comment thread e2e/aks_model.go Outdated
    err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 2*time.Minute, true, func(ctx context.Context) (bool, error) {
        resp, err := config.Azure.PrivateZonesClient.Get(ctx, nodeResourceGroup, privateZoneName, nil)
        if err != nil {
            return false, nil
Comment thread e2e/cluster.go
Comment on lines +845 to +853
    var aRecords []*armprivatedns.ARecord
    for _, ip := range ips {
        if parsed := net.ParseIP(ip); parsed != nil && parsed.To4() != nil {
            aRecords = append(aRecords, &armprivatedns.ARecord{IPv4Address: to.Ptr(ip)})
        }
    }
    if len(aRecords) == 0 {
        return fmt.Errorf("no IPv4 addresses for %q", fqdn)
    }
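Here is a self-contained version of the IPv4 filtering step. It uses a local `aRecord` type in place of `armprivatedns.ARecord` and an injectable `lookup` function; `net.LookupHost` has the matching signature for real use. `recordsForFQDN` is a hypothetical helper. One deliberate deviation from the excerpt: the sketch stores `To4().String()`, so an IPv4-mapped IPv6 string such as `::ffff:10.0.0.1` is normalized to dotted-quad form before it becomes an A record.

```go
package main

import (
	"fmt"
	"net"
)

// aRecord is a stand-in for armprivatedns.ARecord.
type aRecord struct {
	IPv4Address string
}

// ipv4Records keeps only addresses that parse as IPv4.
func ipv4Records(fqdn string, ips []string) ([]aRecord, error) {
	var records []aRecord
	for _, ip := range ips {
		// ParseIP returns nil for invalid input, and To4 on a nil IP returns nil.
		if v4 := net.ParseIP(ip).To4(); v4 != nil {
			records = append(records, aRecord{IPv4Address: v4.String()})
		}
	}
	if len(records) == 0 {
		return nil, fmt.Errorf("no IPv4 addresses for %q", fqdn)
	}
	return records, nil
}

// recordsForFQDN resolves fqdn with lookup (e.g. net.LookupHost) and builds A records.
func recordsForFQDN(fqdn string, lookup func(string) ([]string, error)) ([]aRecord, error) {
	ips, err := lookup(fqdn)
	if err != nil {
		return nil, fmt.Errorf("resolving %q: %w", fqdn, err)
	}
	return ipv4Records(fqdn, ips)
}
```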
Comment thread e2e/cluster.go
Comment on lines +438 to +448
    // that can be resolved by retrying, such as 409 Conflict (concurrent operations)
    // and NotFound during managed identity reconciliation (stale references after cluster deletion).
    func isRetryableClusterError(err error) bool {
        var respErr *azcore.ResponseError
        if !errors.As(err, &respErr) {
            return false
        }
        if respErr.StatusCode == 409 {
            return true
        }
        return respErr.ErrorCode == "NotFound" && strings.Contains(err.Error(), "Reconcile managed identity credential failed")
r2k1 and others added 2 commits May 8, 2026 14:34
…APIServer

createPrivateZone and createPrivateDNSLink already handle 409 conflicts
internally. The outer PollUntilContextTimeout loops had dead-code 409
checks — inner functions swallow the 409 and return either success or
a differently-typed error that never matches the outer check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
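The dead-code argument in this commit message takes only a few lines to show. If the inner create helper already maps 409 to success, an outer 409 check can never fire. `statusError`, `isConflict` and `ensureExists` are hypothetical stand-ins for `createPrivateZone`/`createPrivateDNSLink` and the SDK's error type.

```go
package main

import (
	"errors"
	"fmt"
)

// statusError is a hypothetical error carrying an HTTP status code.
type statusError struct{ Code int }

func (e *statusError) Error() string { return fmt.Sprintf("HTTP %d", e.Code) }

// isConflict reports whether err (or anything it wraps) is a 409.
func isConflict(err error) bool {
	var se *statusError
	return errors.As(err, &se) && se.Code == 409
}

// ensureExists treats 409 Conflict (already exists / concurrent create) as
// success, so callers never observe a 409 from it.
func ensureExists(create func() error) error {
	if err := create(); err != nil && !isConflict(err) {
		return fmt.Errorf("create failed: %w", err)
	}
	return nil
}
```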
Previously all errors were swallowed during polling, turning
non-transient failures (403, invalid names) into misleading timeouts.
Now only 404/NotFound is retried; other errors surface immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread e2e/cluster.go
        }
        return k.GetProxyURL(ctx)
    }, debugDeps...)
    if !isNetworkIsolated {
Contributor


This surprises me. Probably due to me not quite understanding things. I thought for network-isolated clusters, we would need a private DNS for the API server setup. But this code only sets up the private DNS for non-network-isolated clusters.

Contributor Author


Network-isolated clusters block all outbound and use private endpoints with NSG allow-rules for specific IPs. They don't use HTTP_PROXY — the proxy pattern is for clusters that have internet access through a forward proxy. The private DNS zone here mirrors public DNS for the API server FQDN, which only matters when nodes use a system resolver + proxy combination.

@r2k1
Contributor Author

r2k1 commented May 8, 2026

Re: timmy-wright's question about private DNS only for non-isolated clusters:

Network-isolated clusters already have their own private DNS setup via addPrivateEndpointForACR, which creates a privatelink.azurecr.io zone. The private DNS zone we add here is for the API server FQDN. It mirrors public DNS resolution so that nodes with HTTP_PROXY configured can resolve the API server via the system resolver (nslookup) rather than through the proxy.

Network-isolated clusters block outbound entirely and use a different connectivity model (private endpoints + NSG rules for specific IPs), so the proxy + private DNS pattern doesn't apply to them.

@r2k1 r2k1 merged commit 098e897 into main May 8, 2026
30 of 32 checks passed
@r2k1 r2k1 deleted the artur/e2e-https-proxy-private-dns branch May 8, 2026 04:50
ganeshkumarashok pushed a commit that referenced this pull request May 12, 2026
PR #8470 (merged after this branch's last sync with main) changed
BootstrapConfigMutator from func(*datamodel.NodeBootstrappingConfiguration)
to func(*Cluster, *datamodel.NodeBootstrappingConfiguration). The new
Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG_H100_NoReboot was using
the old single-arg signature, which broke compilation after merging
main.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
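For branches that still hold single-argument mutators, the migration is mechanical. This sketch uses stand-in types, and the `Tags` field is hypothetical. One option is to add an ignored `*Cluster` parameter directly. The other is to wrap the old function with a small adapter such as `fromLegacy`, which is hypothetical and not part of the e2e package.

```go
package main

// Stand-ins for the e2e types; Tags is a hypothetical field.
type Cluster struct{ ProxyURL string }

type NodeBootstrappingConfiguration struct{ Tags []string }

// BootstrapConfigMutator has the post-#8470 two-argument signature.
type BootstrapConfigMutator func(*Cluster, *NodeBootstrappingConfiguration)

// fromLegacy adapts a pre-#8470 single-argument mutator by ignoring the cluster.
func fromLegacy(f func(*NodeBootstrappingConfiguration)) BootstrapConfigMutator {
	return func(_ *Cluster, nbc *NodeBootstrappingConfiguration) {
		f(nbc)
	}
}
```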

3 participants