-
Notifications
You must be signed in to change notification settings - Fork 260
dualstack Overlay ACN pipeline #2032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| -nodepoolSelector="yournodepoolname" | ||
| To run the test use one of the following commands: | ||
| go test -count=1 test/integration/datapath/datapath_linux_test.go -timeout 3m -tags connection -run ^TestDatapathLinux$ -tags=connection,integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at make test-integration, it uses -mod=readonly, which ensures that it doesnt read the vendor directory. Would that be more beneficial here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be no vendor directory
| const ( | ||
| LinuxDeployIPV4 = "../manifests/datapath/linux-deployment.yaml" | ||
| LinuxDeployIPv6 = "../manifests/datapath/linux-deployment-ipv6.yaml" | ||
| podLabelKey = "app" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This podLabelKey field is re-defined here, so it is throwing an error using local debugging:

The reason is, the datapath_win_test.go file, is not named datapath_windows_test.go
Go knows which OS you're using, and if you name your files *_linux_* or *_windows_*, it will ignore them if that's not your OS
*_win_* however, is not recognized by Go for windows
So can you rename the datapath_win_test.go file to datapath_windows_test.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today I learned! Does GO know based off of what GOOS is set as? If so, that would avoid the overwriting from occurring if the filename change happens.
Windows clusters can be run on a linux VM. Wouldn't that break this setup? If so, perhaps they shouldn't be under the same build tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does GO know based off of what GOOS is set as
yes
*_windows.go is equivalent to adding a build tag //go:build windows in the file
https://pkg.go.dev/go/build
| app: goldpinger | ||
| type: goldpinger-host | ||
| spec: | ||
| tolerations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trying to make goldpinger pods on my winodws node pool (don't want that, since it's for linux only, should target linux nodes only)
Can you make it so the goldpinger pods only get created on linux nodes, by using taints/tolerations?
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see you have this flag in your tests, which is good:
-nodepoolSelector="yournodepoolname"
Keep that, but also I think you should make sure, there is no way, that this goldpinger daemonset can land on windows VMs, since goldpinger does not support windows
| } | ||
|
|
||
| t.Run("all linux pods can ping each other", func(t *testing.T) { | ||
| clusterCheckCtx, cancel := context.WithTimeout(ctx, 3*time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you get another context here? No other test has its context reassigned. Is this to leverage goldpinger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's specifically used for goldpinger to get clusterState
|
|
||
| // get node allocated IPs and check whether it includes ipv4 and ipv6 address | ||
| // node status addresses object will return three objects; two of them are ip addresses object(one is ipv4 and one is ipv6) | ||
| if len(nodes.Items[index].Status.Addresses) < dualstackNodeStatusAddr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should explicitly check for the number of type InternalIP. There are multiple types of addresses that can be applied to a node and you are capturing all of them with this. https://github.com/kubernetes/api/blob/v0.27.3/core/v1/types.go#L5468-L5518
|
|
||
| // return podLabelSelector and nodeLabelSelector | ||
| func createLabelSelectors() (string, string) { | ||
| return fmt.Sprintf("%s=%s", podLabelKey, *podPrefix), fmt.Sprintf("%s=%s", nodepoolKey, *nodepoolSelector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpayne3506 @paulyufan2 i guess this function can be extracted as I see John is also using the same thing in his windows tests.
| require.NoError(t, err, "could not get k8s node list: %v", err) | ||
| } | ||
|
|
||
| createPodFlag := !(apierrors.IsAlreadyExists(err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is createPodFlag is using alreadyexist err of list the nodes ? I don't think nodelist will give an error of type already exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should be checking for namespace creation(create if not exists) as the podnamespace is a flag
|
|
||
| dualstack-windows-up: ## Brings up windows nodes on dualstack overlay cluster | ||
| $(AZCLI) aks nodepool add -g $(GROUP) -n npwin \ | ||
| --cluster-name $(CLUSTER) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation is not aligned with the other attributes.
| if *testProfile == LinuxDeployIPV4 { | ||
| daemonset, err = k8sutils.MustParseDaemonSet(gpDaemonset) | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| } else { | ||
| daemonset, err = k8sutils.MustParseDaemonSet(gpDaemonsetIPv6) | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This can be replaced with(easier to scale if we need to add more test profiles.)
var gpDaemonset= map[string]string{
"LinuxDeployIPV4": "filename_linux",
"DualStack" : "filename"
}
...
daemonset, err = k8sutils.MustParseDaemonSet(gpDaemonset[*testProfile])
if err != nil {
t.Fatal(err)
}
...
| podsClient := clientset.CoreV1().Pods(*podNamespace) | ||
|
|
||
| checkPodIPsFn := func() error { | ||
| podList, err := podsClient.List(ctx, metav1.ListOptions{LabelSelector: "app=goldpinger"}) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| if len(podList.Items) == 0 { | ||
| return errors.New("no pods scheduled") | ||
| } | ||
|
|
||
| for _, pod := range podList.Items { | ||
| if pod.Status.Phase == apiv1.PodPending { | ||
| return errors.New("some pods still pending") | ||
| } | ||
| } | ||
|
|
||
| for _, pod := range podList.Items { | ||
| if pod.Status.PodIP == "" { | ||
| return errors.New("a pod has not been allocated an IP") | ||
| } | ||
| } | ||
|
|
||
| return nil | ||
| } | ||
| err := defaultRetrier.Do(ctx, checkPodIPsFn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function WaitForPodsRunning does the same thing, we can use it here.
| runAsUser: 0 | ||
| volumeMounts: | ||
| - mountPath: /var/run/azure-cns | ||
| - mountPath: /var/run/azure-network |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might break the existing cilium overlay load test pipeline fi privileged pods are used to get the json file. ( I don't think they do, but good to check)
| for _, ip := range networks.Endpoints { | ||
| pod := ip.PODName | ||
| ipv4 := ip.IPAddresses[0].IP | ||
| azureCnsPodIps[ipv4] = pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would pass in both cases, but want to make sure the variable names though. In case of dual stack the first IP address is IPv4, i thought it would be IPv6 ?
| // check node status | ||
| nodeConditions := nodes.Items[index].Status.Conditions | ||
| if nodeConditions[len(nodeConditions)-1].Type != corev1.NodeReady { | ||
| return errors.Wrapf(err, "node %s status is not ready", nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because of error ? Or we expect the node to be ready after a certain time. ?
If it is going to be ready after a certain time, should we add a retry ?
a4a2d8d to
e75c993
Compare
| name: "DualStack_Overlay_Linux_tests" | ||
| displayName: "DualStack Overlay Linux Tests" | ||
| - task: AzureCLI@2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest move the windows part to a separate PR. Let's keep the first PR for adding linux coverage for dualstack
| @@ -0,0 +1,32 @@ | |||
| parameters: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, existing
.pipeline/singletenancy/overlay/* should be moved to cilium folder. Cilium folder can have overlay folder if needed. @tamilmani1989 @vipul-21
We should create
.pipeline/singletenancy/azcnioverlay/v4/.yaml
.pipeline/singletenancy/azcnioverlay/dualstack/.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to keep the singletenancy folder as it is. Maybe add a parent to it which tells us that this is PR pipeline code and under that we can have different folder structure like:
pr_pipeline(or something better)/singletenancy/azcnioverlay/cilium
pr_pipeline(or something better)/singletenancy/azcnioverlay/v4/.yaml
pr_pipeline(or something better)/multitenany/azcnioverlay/dualstack/.yaml
Thoughts ?
| --yes | ||
| @$(MAKE) set-kubeconf | ||
|
|
||
| dualstack-windows-up: ## Brings up windows nodes on dualstack overlay cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned above, let's move windows out of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these make targets are misleading, they do not bring up a cluster, or (as far as I can tell) do anything dualstack-specific. it adds a windows nodepool. it should just be called add windows nodepool.
hack/aks/README.md
Outdated
| swift-up Bring up a SWIFT AzCNI cluster | ||
| windows-cniv1-up Bring up a Windows AzCNIv1 cluster | ||
| dualstack-overlay-up Brings up an dualstack overlay cluster | ||
| dualstack-windows-up Brings up windows nodes on dualstack overlay cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dualstack-overlay-byocni-up: is missing
| @@ -0,0 +1,86 @@ | |||
| apiVersion: apps/v1 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not reuse test\integration\manifests\goldpinger\deployment.yaml ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original goldpinger deployment yaml does not have:
- name: HOSTS_TO_RESOLVE
value: "1.1.1.1 8.8.8.8 www.bing.com"
That we need to use to test host resolution
| @@ -0,0 +1,88 @@ | |||
| apiVersion: apps/v1 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be added to goldpinger folder to be consistent with existing structure?
| kind create cluster --config ./test/kind/kind.yaml | ||
|
|
||
| # install azure Linux CNS and CNI dropgz images | ||
| install-azure-images: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're going to create standalone tooling for installing our components, let's fully commit to it and write some small kustomize payloads for them instead of hacking it with the integration test like this.
| --yes | ||
| @$(MAKE) set-kubeconf | ||
|
|
||
| dualstack-windows-up: ## Brings up windows nodes on dualstack overlay cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these make targets are misleading, they do not bring up a cluster, or (as far as I can tell) do anything dualstack-specific. it adds a windows nodepool. it should just be called add windows nodepool.
|
separate this PR to two PRs. Close this PR |

Reason for Change:
This PR is the datapath for ACN Linux test and dualstack Linux node properties check and cns state file check.
To run them locally, use following commands on dualstack overlay setup:
Linux Test Commands:
go test -timeout 30m -tags load -run ^TestLoad$ -tags=load
go test -timeout 30m -tags load -cni cniv2 -run ^TestValidateState$ -tags=load
go test -timeout 30m -tags load -run ^TestDualStackProperties$ -tags=load
go test -count=1 test/integration/datapath/datapath_linux_test.go -timeout 1m -tags connection -run ^TestDatapathLinux$ -tags=connection,integration -v
Windows Test Commands
go test -timeout 30m -tags load -run ^TestLoad$ -tags=load -os=windows
go test -timeout 30m -tags load -run ^TestDualStackProperties$ -tags=load -os=windows -cni cniv2
go test -timeout 30m -tags load -cni cniv2 -run ^TestValidateState$ -tags=load -os=windows
go test -count=1 test/integration/datapath/datapath_windows_test.go -timeout 3m -tags connection -run ^TestDatapathWin$ -tags=connection -v
Logs:
https://msazure.visualstudio.com/One/_build/results?buildId=76033872&view=results
Issue Fixed:
Requirements:
Notes: