Skip to content

Conversation

@paulyufan2
Copy link
Contributor

@paulyufan2 paulyufan2 commented Jun 24, 2023

Reason for Change:

This PR is the datapath for ACN Linux test and dualstack Linux node properties check and cns state file check.
To run them locally, use following commands on dualstack overlay setup:

Linux Test Commands:
go test -timeout 30m -tags load -run ^TestLoad$ -tags=load
go test -timeout 30m -tags load -cni cniv2 -run ^TestValidateState$ -tags=load
go test -timeout 30m -tags load -run ^TestDualStackProperties$ -tags=load
go test -count=1 test/integration/datapath/datapath_linux_test.go -timeout 1m -tags connection -run ^TestDatapathLinux$ -tags=connection,integration -v

Windows Test Commands
go test -timeout 30m -tags load -run ^TestLoad$ -tags=load -os=windows
go test -timeout 30m -tags load -run ^TestDualStackProperties$ -tags=load -os=windows -cni cniv2
go test -timeout 30m -tags load -cni cniv2 -run ^TestValidateState$ -tags=load -os=windows
go test -count=1 test/integration/datapath/datapath_windows_test.go -timeout 3m -tags connection -run ^TestDatapathWin$ -tags=connection -v

Logs:
https://msazure.visualstudio.com/One/_build/results?buildId=76033872&view=results

Issue Fixed:

Requirements:

Notes:

@paulyufan2 paulyufan2 requested a review from a team as a code owner June 24, 2023 01:25
@paulyufan2 paulyufan2 requested a review from camrynl June 24, 2023 01:25
@paulyufan2 paulyufan2 added cni Related to CNI. ci Infra or tooling. labels Jun 24, 2023
-nodepoolSelector="yournodepoolname"
To run the test use one of the following commands:
go test -count=1 test/integration/datapath/datapath_linux_test.go -timeout 3m -tags connection -run ^TestDatapathLinux$ -tags=connection,integration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want to add -mod=mod also to the test parameters, otherwise, golang makes you do go mod vendor which changes a bunch of files in the vendor directory
image

If you add the -mod=mod argurment however, no go mod vendor is needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at make test-integration, it uses -mod=readonly, which ensures that it doesnt read the vendor directory. Would that be more beneficial here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be no vendor directory

const (
LinuxDeployIPV4 = "../manifests/datapath/linux-deployment.yaml"
LinuxDeployIPv6 = "../manifests/datapath/linux-deployment-ipv6.yaml"
podLabelKey = "app"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This podLabelKey field is re-defined here, so it is throwing an error using local debugging:
image

The reason is, the datapath_win_test.go file, is not named datapath_windows_test.go
Go knows which OS you're using, and if you name your files *_linux_* or *_windows_*, it will ignore them if that's not your OS
*_win_* however, is not recognized by Go for windows

So can you rename the datapath_win_test.go file to datapath_windows_test.go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I learned! Does GO know based off of what GOOS is set as? If so, that would avoid the overwriting from occurring if the filename change happens.

Windows clusters can be run on a linux VM. Wouldn't that break this setup? If so, perhaps they shouldn't be under the same build tag.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does GO know based off of what GOOS is set as

yes

*_windows.go is equivalent to adding a build tag //go:build windows in the file
https://pkg.go.dev/go/build

app: goldpinger
type: goldpinger-host
spec:
tolerations:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is trying to make goldpinger pods on my winodws node pool (don't want that, since it's for linux only, should target linux nodes only)
Can you make it so the goldpinger pods only get created on linux nodes, by using taints/tolerations?
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you have this flag in your tests, which is good:
-nodepoolSelector="yournodepoolname"

Keep that, but also I think you should make sure, there is no way, that this goldpinger daemonset can land on windows VMs, since goldpinger does not support windows

}

t.Run("all linux pods can ping each other", func(t *testing.T) {
clusterCheckCtx, cancel := context.WithTimeout(ctx, 3*time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you get another context here? No other test has its context reassigned. Is this to leverage goldpinger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's specifically used for goldpinger to get clusterState


// get node allocated IPs and check whether it includes ipv4 and ipv6 address
// node status addresses object will return three objects; two of them are ip addresses object(one is ipv4 and one is ipv6)
if len(nodes.Items[index].Status.Addresses) < dualstackNodeStatusAddr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should explicitly check for the number of type InternalIP. There are multiple types of addresses that can be applied to a node and you are capturing all of them with this. https://github.com/kubernetes/api/blob/v0.27.3/core/v1/types.go#L5468-L5518

@paulyufan2 paulyufan2 changed the title Datapath linux ACN pipeline dualstack and v4 Overlay Linux ACN pipeline Jun 29, 2023
@paulyufan2 paulyufan2 changed the title dualstack and v4 Overlay Linux ACN pipeline dualstack and v4 Overlay ACN pipeline Jun 29, 2023

// return podLabelSelector and nodeLabelSelector
func createLabelSelectors() (string, string) {
return fmt.Sprintf("%s=%s", podLabelKey, *podPrefix), fmt.Sprintf("%s=%s", nodepoolKey, *nodepoolSelector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpayne3506 @paulyufan2 i guess this function can be extracted as I see John is also using the same thing in his windows tests.

require.NoError(t, err, "could not get k8s node list: %v", err)
}

createPodFlag := !(apierrors.IsAlreadyExists(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is createPodFlag is using alreadyexist err of list the nodes ? I don't think nodelist will give an error of type already exists

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should be checking for namespace creation(create if not exists) as the podnamespace is a flag


dualstack-windows-up: ## Brings up windows nodes on dualstack overlay cluster
$(AZCLI) aks nodepool add -g $(GROUP) -n npwin \
--cluster-name $(CLUSTER) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation is not aligned with the other attributes.

Comment on lines +110 to +120
if *testProfile == LinuxDeployIPV4 {
daemonset, err = k8sutils.MustParseDaemonSet(gpDaemonset)
if err != nil {
t.Fatal(err)
}
} else {
daemonset, err = k8sutils.MustParseDaemonSet(gpDaemonsetIPv6)
if err != nil {
t.Fatal(err)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This can be replaced with(easier to scale if we need to add more test profiles.)

var gpDaemonset= map[string]string{
	"LinuxDeployIPV4":  "filename_linux",
	"DualStack"      :  "filename"   
}
...
daemonset, err = k8sutils.MustParseDaemonSet(gpDaemonset[*testProfile])
if err != nil {
	t.Fatal(err)
}
...

Comment on lines +205 to +231
podsClient := clientset.CoreV1().Pods(*podNamespace)

checkPodIPsFn := func() error {
podList, err := podsClient.List(ctx, metav1.ListOptions{LabelSelector: "app=goldpinger"})
if err != nil {
return err
}

if len(podList.Items) == 0 {
return errors.New("no pods scheduled")
}

for _, pod := range podList.Items {
if pod.Status.Phase == apiv1.PodPending {
return errors.New("some pods still pending")
}
}

for _, pod := range podList.Items {
if pod.Status.PodIP == "" {
return errors.New("a pod has not been allocated an IP")
}
}

return nil
}
err := defaultRetrier.Do(ctx, checkPodIPsFn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function WaitForPodsRunning does the same thing, we can use it here.

runAsUser: 0
volumeMounts:
- mountPath: /var/run/azure-cns
- mountPath: /var/run/azure-network
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might break the existing cilium overlay load test pipeline fi privileged pods are used to get the json file. ( I don't think they do, but good to check)

for _, ip := range networks.Endpoints {
pod := ip.PODName
ipv4 := ip.IPAddresses[0].IP
azureCnsPodIps[ipv4] = pod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would pass in both cases, but want to make sure the variable names though. In case of dual stack the first IP address is IPv4, i thought it would be IPv6 ?

// check node status
nodeConditions := nodes.Items[index].Status.Conditions
if nodeConditions[len(nodeConditions)-1].Type != corev1.NodeReady {
return errors.Wrapf(err, "node %s status is not ready", nodeName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because of error ? Or we expect the node to be ready after a certain time. ?
If it is going to be ready after a certain time, should we add a retry ?

@paulyufan2 paulyufan2 changed the title dualstack and v4 Overlay ACN pipeline dualstack Overlay ACN pipeline Jul 1, 2023
@paulyufan2 paulyufan2 force-pushed the datapath_Linux_acn branch from a4a2d8d to e75c993 Compare July 7, 2023 17:39
name: "DualStack_Overlay_Linux_tests"
displayName: "DualStack Overlay Linux Tests"
- task: AzureCLI@2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest move the windows part to a separate PR. Let's keep the first PR for adding linux coverage for dualstack

@@ -0,0 +1,32 @@
parameters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, existing
.pipeline/singletenancy/overlay/* should be moved to cilium folder. Cilium folder can have overlay folder if needed. @tamilmani1989 @vipul-21
We should create
.pipeline/singletenancy/azcnioverlay/v4/.yaml
.pipeline/singletenancy/azcnioverlay/dualstack/
.yaml

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to keep the singletenancy folder as it is. Maybe add a parent to it which tells us that this is PR pipeline code and under that we can have different folder structure like:
pr_pipeline(or something better)/singletenancy/azcnioverlay/cilium
pr_pipeline(or something better)/singletenancy/azcnioverlay/v4/.yaml
pr_pipeline(or something better)/multitenany/azcnioverlay/dualstack/.yaml

Thoughts ?

--yes
@$(MAKE) set-kubeconf

dualstack-windows-up: ## Brings up windows nodes on dualstack overlay cluster
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, let's move windows out of this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these make targets are misleading, they do not bring up a cluster, or (as far as I can tell) do anything dualstack-specific. it adds a windows nodepool. it should just be called add windows nodepool.

swift-up Bring up a SWIFT AzCNI cluster
windows-cniv1-up Bring up a Windows AzCNIv1 cluster
dualstack-overlay-up Brings up an dualstack overlay cluster
dualstack-windows-up Brings up windows nodes on dualstack overlay cluster
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dualstack-overlay-byocni-up: is missing

@@ -0,0 +1,86 @@
apiVersion: apps/v1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not reuse test\integration\manifests\goldpinger\deployment.yaml ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original goldpinger deployment yaml does not have:

              - name: HOSTS_TO_RESOLVE
                value: "1.1.1.1 8.8.8.8 www.bing.com"

That we need to use to test host resolution

@@ -0,0 +1,88 @@
apiVersion: apps/v1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be added to goldpinger folder to be consistent with existing structure?

kind create cluster --config ./test/kind/kind.yaml

# install azure Linux CNS and CNI dropgz images
install-azure-images:
Copy link
Collaborator

@rbtr rbtr Jul 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're going to create standalone tooling for installing our components, let's fully commit to it and write some small kustomize payloads for them instead of hacking it with the integration test like this.

--yes
@$(MAKE) set-kubeconf

dualstack-windows-up: ## Brings up windows nodes on dualstack overlay cluster
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these make targets are misleading, they do not bring up a cluster, or (as far as I can tell) do anything dualstack-specific. it adds a windows nodepool. it should just be called add windows nodepool.

@rbtr rbtr mentioned this pull request Aug 3, 2023
4 tasks
@paulyufan2
Copy link
Contributor Author

separate this PR to two PRs. Close this PR

@paulyufan2 paulyufan2 closed this Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Infra or tooling. cni Related to CNI.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants